2) Computer Audition

Computer audition is the research field concerned with machine understanding of audio. My research in this field focuses on the following topics:

  1. Bio-inspired auditory representations. Recent psychoacoustical findings indicate that most of the perceptual properties of both speech and music are encoded by slow temporal and spectro-temporal modulations. Motivated by this fact, we develop auditory representations that use computational auditory models to map an audio signal to a high-dimensional representation of its slow temporal and spectro-temporal modulations. These representations serve as robust and descriptive music features in content-based music classification, annotation, and segmentation tasks.
  2. Content-based music classification and annotation. We develop novel machine learning methods, mainly exploiting low-rank structure and sparsity, that extract information from the audio signal to automatically classify or annotate music with descriptive tags such as genres, emotions, and instruments.
  3. Music structural segmentation. A music piece can be described in terms of shorter, possibly repeated sections, which are often labeled according to their musical function in the piece. Automatic music structure analysis aims to describe a music piece in terms of such sections by analysing the audio signal, and it is a core component of many music information retrieval tasks, e.g., music annotation and cover song identification. To this end, we proposed casting music structure analysis as a subspace segmentation problem on audio features. Following this line of research, we developed several novel subspace clustering and segmentation algorithms.
  4. Audio source identification/Audio forensics. Recorded speech signals convey information not only about the speakers’ identity and the spoken language, but also about the acquisition devices used to record them. It is therefore reasonable to identify the acquisition device by analysing the recorded speech signal. To this end, we proposed suitable audio features and collaborative representations for telephone handset identification.
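
As a rough illustration of the temporal-modulation idea in topic 1, the sketch below computes a simple modulation spectrum with NumPy. It is not the computational auditory model used in this research: a plain short-time Fourier transform stands in for the auditory filterbank, and the function name `temporal_modulation_spectrum` and all of its parameters are hypothetical choices made for this example only.

```python
import numpy as np

def temporal_modulation_spectrum(signal, sample_rate, frame_len=512, hop=128,
                                 max_mod_hz=20.0):
    """Return (mod_freqs, spectrum); spectrum[k, m] is the strength of
    modulation frequency mod_freqs[m] in the envelope of acoustic bin k.

    Assumption: an STFT magnitude replaces a proper auditory filterbank.
    """
    # Stage 1: STFT magnitudes give one temporal envelope per acoustic bin.
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    envelopes = np.abs(np.fft.rfft(frames, axis=1)).T  # (bins, frames)

    # Remove each envelope's mean so the 0 Hz term does not dominate.
    envelopes = envelopes - envelopes.mean(axis=1, keepdims=True)

    # Stage 2: a second FFT along time measures how fast each envelope varies.
    frame_rate = sample_rate / hop
    mod_freqs = np.fft.rfftfreq(n_frames, d=1.0 / frame_rate)
    spectrum = np.abs(np.fft.rfft(envelopes, axis=1))

    # Keep only the slow modulations.
    keep = mod_freqs <= max_mod_hz
    return mod_freqs[keep], spectrum[:, keep]

# Usage: a 1 kHz tone whose amplitude is modulated at 4 Hz.
sr = 8000
t = np.arange(2 * sr) / sr
tone = (1 + 0.8 * np.cos(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
mod_freqs, spectrum = temporal_modulation_spectrum(tone, sr)
bin_idx, mod_idx = np.unravel_index(np.argmax(spectrum), spectrum.shape)
print(bin_idx * sr / 512, mod_freqs[mod_idx])
```

The strongest entry of the result should fall at the carrier's acoustic frequency and close to the 4 Hz modulation rate, within the limits of the coarse modulation-frequency resolution of a two-second excerpt.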