Sparsity and Nonnegativity in Least Squares Problems and Matrix Factorizations


Many tasks in machine learning and signal processing rely on linear models in which data points are explained as the result of linear interactions between latent features. To recover these features and extract useful information, we can leverage a priori knowledge or assumptions about the underlying structure of the data. In this thesis, we focus mainly on two assumptions: nonnegativity, meaning that the data points are generated by purely additive interactions of features, and sparsity, meaning that only a few features interact to generate a given data point. We also consider the assumption of separability, which states that for each feature there is at least one pure data point, that is, a data point generated by that feature alone. We develop variants of linear models that leverage sparsity and nonnegativity to improve performance or to handle cases that existing approaches cannot. We study the theoretical implications of these assumptions and design algorithms to tackle the corresponding optimization problems. We focus in particular on exact algorithms that are guaranteed to recover the underlying solution.
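As a toy illustration of the nonnegativity and sparsity assumptions (all names below are hypothetical, not notation from the thesis), a data point can be written as a nonnegative combination of only a few latent features:

```python
# Toy sketch (hypothetical names): a data point generated as a sparse,
# nonnegative combination of latent features.

# Three latent features (the columns of a hypothetical dictionary W).
features = [
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 1.0],
    [2.0, 1.0, 0.0],
]

# Sparse, nonnegative weight vector h: only 2 of its 3 entries are nonzero.
h = [0.5, 0.0, 2.0]

# The data point x = W h is built from additive interactions only.
x = [sum(w[i] * hi for w, hi in zip(features, h)) for i in range(3)]

print(x)  # every entry is a nonnegative sum of feature contributions
```

Nonnegativity means no feature can "cancel out" another, and sparsity means most entries of the weight vector are exactly zero; both properties are visible directly in this tiny example.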

The contributions of this thesis are centered around nonnegative least squares (NNLS) problems and nonnegative matrix factorization (NMF). First, we propose a new variant of separable NMF that leverages the fact that, when the separability assumption holds, there is usually more than one pure data point per feature. We propose two algorithms based on this stronger assumption. Next, we propose a branch-and-bound algorithm that solves sparsity-constrained NNLS problems exactly, as opposed to existing approaches, which often rely on heuristics. We also propose a biobjective extension of this algorithm that computes a set of solutions with different tradeoffs between reconstruction error and sparsity. Then, we introduce an algorithmic framework to enforce matrix-wise sparsity constraints in NNLS problems with multiple right-hand sides, and we prove that it is optimal in some cases. Finally, we explore a variant of NMF that combines the separability and sparsity assumptions, and we provide an algorithm with recovery guarantees to tackle it. This variant is the first to handle the case where a feature is an additive linear combination of other features.
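To make the exact-solution idea concrete, here is a minimal pure-Python sketch (a hypothetical toy, not the branch-and-bound algorithm of the thesis) that solves a cardinality-constrained NNLS problem with at most one nonzero coefficient by exhaustively scoring every candidate support; for a single active column, the optimal nonnegative coefficient has a closed form:

```python
# Toy sketch: exact sparsity-constrained NNLS with at most one nonzero
# coefficient, i.e. min_x ||A x - b||^2 s.t. x >= 0 and ||x||_0 <= 1.
# With a single active column a_j, the optimal nonnegative coefficient
# is max(0, <a_j, b> / <a_j, a_j>), so we can try every support exactly.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def sparse_nnls_k1(A_cols, b):
    """Exhaustively try each column as the single support element."""
    best = (dot(b, b), [0.0] * len(A_cols))  # (error, x) for x = 0
    for j, a in enumerate(A_cols):
        coef = max(0.0, dot(a, b) / dot(a, a))
        resid = [bi - coef * ai for ai, bi in zip(a, b)]
        err = dot(resid, resid)
        if err < best[0]:
            x = [0.0] * len(A_cols)
            x[j] = coef
            best = (err, x)
    return best

# b is exactly twice the second column, so exhaustive search recovers it.
A_cols = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
b = [2.0, 2.0]
err, x = sparse_nnls_k1(A_cols, b)
print(err, x)  # zero error with the single column [1, 1] active
```

Enumerating supports is exponential in general; the point of a branch-and-bound approach is to prune most of this search tree while keeping the same exactness guarantee, in contrast to greedy heuristics that may return a suboptimal support.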

Although we focus on the models and algorithms rather than on the applications, our underlying motivation is feature extraction in images, notably hyperspectral unmixing, that is, the identification of materials and their distribution in a hyperspectral image. All our proposed models are validated empirically on the unmixing of real-world hyperspectral images.

PhD Thesis