Nowadays, the amount of audio, video, and image data created and shared on the internet is increasing dramatically. From the perspective of machine learning, such multimedia data can be seen as a virtually unlimited supply of training data for various statistical modelling and labelling tasks such as music (audio) analysis and recognition, speech recognition, and computer vision. The analysis of real-world audio and visual data is a rather challenging problem due to: the high-dimensional nature of the data; the presence of gross errors (e.g., artifacts and pixel corruptions due to noisy recordings, appearance changes due to different illumination/pose conditions, etc.); the presence of a large number of outliers (i.e., Big outliers); and the limitations of current computational resources. To fully exploit the potential of the multimedia data deluge and address the aforementioned challenges, my theoretical and algorithmic research work mainly focuses on:
- Parsimony-aware learning models (exploiting sparsity and low rank) that depend on only a small number of variables describing the data. Such models and algorithms reduce computational and experimental requirements while exhibiting improved interpretability and generalisation performance (a minimal sparse-regression sketch follows this list).
- Robust subspace learning and error correction in the presence of gross, non-Gaussian errors and Big outliers (see the robust PCA sketch following this list).
- Linear and multilinear component analysis by resorting to constrained and regularised matrix/tensor decompositions and factorisations (see the factorisation sketch following this list). Component analysis refers to a set of methods that decompose a signal into components that are relevant for a given task (e.g., feature selection/extraction, dimensionality reduction, source separation, clustering, classification, etc.).
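
To make the sparsity-aware direction concrete, the following is a minimal, illustrative sketch of sparse regression (the Lasso) solved with the standard iterative soft-thresholding algorithm (ISTA). This is a generic textbook solver under stated assumptions, not a specific method of mine; the step size, regularisation weight `lam`, and toy data are illustrative choices.

```python
import numpy as np

def ista_lasso(X, y, lam=0.1, n_iter=500):
    """Sparse regression: min_w 0.5*||y - X w||_2^2 + lam*||w||_1, via ISTA."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = w - step * (X.T @ (X @ w - y))   # gradient step on the smooth term
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-thresholding
    return w

# Toy usage: recover a 5-sparse weight vector from noisy linear measurements.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))
w_true = np.zeros(50); w_true[:5] = 1.0
y = X @ w_true + 0.01 * rng.standard_normal(100)
w_hat = ista_lasso(X, y, lam=0.5)
print(np.flatnonzero(np.abs(w_hat) > 1e-3))  # indices of the recovered support
```

The soft-thresholding step is exactly what drives most coefficients to zero, which is the source of the interpretability and reduced variable count mentioned above.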
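The robustness direction is well illustrated by robust PCA via principal component pursuit: an observation matrix M is decomposed into a low-rank part L (the underlying subspace) plus a sparse part S (the gross, non-Gaussian errors and outliers). The sketch below uses the standard inexact augmented Lagrange multiplier scheme with the usual default λ = 1/√max(m, n); it is a generic reference implementation under these assumptions, not a production solver.

```python
import numpy as np

def soft(X, tau):
    """Entrywise soft-thresholding (proximal operator of the l1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding (proximal operator of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft(s, tau)) @ Vt

def rpca(M, n_iter=200, tol=1e-7):
    """Principal component pursuit: min ||L||_* + lam*||S||_1  s.t.  M = L + S."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))           # standard default regularisation
    mu = (m * n) / (4.0 * np.abs(M).sum())   # common heuristic penalty parameter
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(n_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)    # update the low-rank component
        S = soft(M - L + Y / mu, lam / mu)   # update the sparse error component
        R = M - L - S                        # primal residual
        Y += mu * R                          # dual (multiplier) update
        if np.linalg.norm(R) / np.linalg.norm(M) < tol:
            break
    return L, S
```

Because the errors are isolated in S rather than averaged into the fit, the recovered subspace L is insensitive to even large-magnitude corruptions, unlike classical least-squares PCA.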
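Finally, a simple instance of constrained component analysis is non-negative matrix factorisation (NMF), which decomposes a non-negative data matrix into interpretable additive parts. The sketch below implements the classical Lee-Seung multiplicative updates for the Frobenius objective; the rank and iteration count are illustrative, and multilinear (tensor) analogues such as CP/Tucker decompositions follow the same alternating-update spirit.

```python
import numpy as np

def nmf(V, rank, n_iter=300, eps=1e-9):
    """Constrained factorisation V ~ W @ H with W, H >= 0 (Lee-Seung updates)."""
    rng = np.random.default_rng(0)
    W = rng.random((V.shape[0], rank))
    H = rng.random((rank, V.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # multiplicative update keeps H >= 0
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # multiplicative update keeps W >= 0
    return W, H

# Toy usage: factorise a random non-negative matrix into rank-5 components.
V = np.abs(np.random.default_rng(1).standard_normal((60, 40)))
W, H = nmf(V, rank=5)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative reconstruction error
```

Here the non-negativity constraint plays the role of the "constrained and regularised" structure in the bullet above: it forces the learned components to combine additively, which is what makes them usable for feature extraction, source separation, and clustering.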