Online Subspace Learning (and recent applications in background subtraction)

In this post I will review two recent papers that I have been reading up on background subtraction, which follow the assumption that the background in an image approximately lies on a low dimensional subspace, under different lighting conditions. The two works namely –GROUSEGRASTA have shown interesting results for realtime background subtraction, which could potentially be useful and more robust than standard techniques which are fast like using the median filter, or mixture of Gaussians per pixel. Online Subspace Learning: The underlying premise of these methods is that if a dynamical system is very high dimensional, it can often be well approximated by a low dimensional subspace. As a result, instead of actually observing all the dimensions (which maybe impractical), one can randomly subsample the dimensions and make inferences on the lower dimensional data without losing a lot of information, i.e. learn the subspace accurately. All the three works I will focus on use the fact that the space of all subspaces is called the Grassmannian manifold. So the optimization of the cost function that will be proposed in each case will involve some gradient descent on the curved space.

(side note) The Grassmannian is an interesting manifold which I encounter in my work (example 1, 2) quite often but in a slightly different context. The Grassmannian provides a nice way to represent shapes in the context of human silhouettes for activity analysis where a shape is represented as a subspace.  

More info on the Grassmann manifold and its properties here: The Geometry of algorithms with orthogonality constraints GROUSE: (Grassmannian Rank-One Update Subspace Estimation) To track a d-dimensional subspace of \mathcal{R}^n that evolves over time denoted by S[t]. At every time instant we observe a vector v_t \in S[t]  at locations \Omega_t \subset {1,\dots,n}. Basically randomly picking dimensions within the large n dimensional vector to accurately approximate the d – dimensional subspace. Let U be a basis for S. The cost function is as follows: F(S;t) = \underset{a}{\text{min}} ||U_{\Omega_t}a - v_{\Omega_t}||^2 Here U lies on the Grassmann manifold as it represents the subspace. To solve for U at each step, we need to find the gradient on the Grassmannian manifold. To do this, the authors represent the cost function in terms of U rather than U_{\Omega_t} as follows: F(S;t) = \underset{a}{\text{min}} ||\Delta_{\Omega_t}(Ua - v_t||^2 we have now pushed the indexing into \Delta_{\Omega_t}. Next, we take the derivate of this expression with respect to you and use that information to obtain the gradient of the function on the Grassmannian. This allows us to traverse along the direction of the gradient to find the best update of U using the following algorithm. Screen Shot 2014-06-29 at 1.17.23 PM   The important contribution of this work is that one can operate on a small subset of dimensions and still recover the entire subspace. This allows us to search and update the subspace much quicker than traditional methods. It is also been shown that such a technique is an excellent proxy for the amount of energy that lies in the subspace, provided that the number of dimensions (|\Omega|) is greater than the true subspace dimension (d). GRASTA : Grasmannian Robust Adaptive Subspace Tracking Algorithm Using the same idea of subspace tracking, the GRASTA attempts to address the problem of background subtraction. They make the assumption that in modeling a scene, the foreground belong to the objects that change more rapidly than the background. The simplest approach for BG-Subtraction is to keep a running mean or median of the previous frames which is efficient and easy to implement, but is not very responsive to quick changes in the scene like lighting  etc. Using the median gives a 0-dimensional model for the background where we can measure the residual of a given frame as its distance from a point (the median). This model can be made more powerful by keeping a d-dimensional subspace or affine set. Here each image is vectorized into a long vector, and such frames are observed. This matrix can be expressed in an d-dimensional orthonormal basis with weights (similar to above). Finally the model is described as below: At each time step t, a vector frame is generated as follows: v_t = U_t w_t + s_t + \eta_t where U is the orthonormal basis, w_t is the weight/coefficient vector, s_t is the sparse outlier vector that models the foreground (since only a small part of the long vector v_t is the foreground), and the last term is the noise. The problem then is to find out the optimal basis U_t and the weight vectors along with the sparse term. This is solved using the gradient descent on the Grassmannian as earlier and Alternating Direction Method of Multipliers (ADMM) respectively. The “robustness” as in GRASTA is from the fact that the cost function is formulated using the \ell_1 norm as opposed to \ell_2 norm as is done in the case of GROUSE.