[NIPS4B] Neural Information Processing Scaled for Bioacoustics

Prof. Yann LeCun - New York University, USA (confirmed)
"Shallow and deep architectures, sparse representations for bioacoustic classification"

Intelligent perceptual tasks such as audition require the construction of good internal representations. Theoretical and empirical evidence suggests that the perceptual world is best represented by a multi-stage hierarchy in which features in successive stages are increasingly global, invariant, and abstract. An important challenge for machine learning is to devise "deep learning" methods for multi-stage architectures that can automatically learn good feature hierarchies from labeled and unlabeled data. A class of such methods that combine unsupervised sparse coding and supervised refinement will be described. We demonstrate the use of these deep learning methods to train convolutional networks (ConvNets). ConvNets are biologically inspired architectures consisting of multiple stages of filter banks interspersed with non-linear operations and spatial pooling operations, analogous to the simple cells and complex cells in the mammalian auditory cortex. A number of applications will be shown.
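
As an illustration of the stage structure described in this abstract (filter bank, non-linearity, pooling), here is a minimal pure-Python sketch of a single ConvNet stage applied to a 1-D signal. It is not the speaker's implementation: the function names, the toy signal, and the hand-set filters are all hypothetical, and in a real ConvNet the filters would be learned rather than fixed.

```python
# Minimal sketch of ONE ConvNet stage on a 1-D signal (illustrative only).
# All names and values below are hypothetical; real filters are learned.

def conv1d_valid(signal, kernel):
    """Slide one filter over the signal (valid mode, no padding)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def rectify(values):
    """Point-wise non-linearity (half-wave rectification)."""
    return [max(0.0, v) for v in values]

def max_pool(values, width):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + width])
            for i in range(0, len(values) - width + 1, width)]

def convnet_stage(signal, filter_bank, pool_width):
    """Filter bank -> non-linearity -> pooling, one feature map per filter."""
    return [max_pool(rectify(conv1d_valid(signal, f)), pool_width)
            for f in filter_bank]

# Toy input: a short oscillating signal (assumed for illustration).
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]
# Two hand-set filters: a difference detector and a local average.
filter_bank = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]

features = convnet_stage(signal, filter_bank, pool_width=2)
print(features)
```

Stacking several such stages, each taking the previous stage's feature maps as input, gives the multi-stage hierarchy the abstract refers to.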

Prof. Ofer Tchernichovski - Hunter College - CUNY, NY, USA (confirmed)
"Physiological brain processes that underlie song learning"

Human language, as well as birdsong, relies on the ability to imitate vocal sounds and arrange them in new sequences. During developmental song learning, the songbird brain produces highly variable song patterns, which allow vocal exploration to guide learning. Tracking song development continuously shows that exploratory variability is regulated on fine time scales, such that each song element becomes less variable independently as it approaches the target (adult tutor) song. Therefore, multiple localized reinforcement-learning processes can explain how the bird learns to match specific song elements. However, we found that vocal exploration alone cannot explain how birds learn to match vocal combinatorial sequences. Combining an experimental approach in zebra finches with an analysis of the natural development of vocal transitions in Bengalese finches and pre-lingual human infants, we found a common, stepwise pattern of acquiring vocal transitions across species. The results point to a common generative process that is conserved across species, suggesting that the long-noted gap between perceptual and motor combinatorial capabilities in human infants may arise partly from the challenges of constructing new pairwise vocal transitions. Therefore, learning vocal sequences is likely to be constrained by a neuronal growth process, perhaps one of establishing connections between representations of song gestures.

Prof. Hervé Glotin (J. Razik, S. Paris, O. Adam and Y. Doh) - USTV, Institut Universitaire de France, CNRS LSIS, FR (confirmed)
"Whale songs classification using Sparse Coding"

Humpback whale song relies on the ability of these whales to copy and recombine vocal sounds and to arrange them in new sequences, at many tropical sites all over the planet. We present the advantages of sparse coding for representing these song sequences in order to track their structure and evolution, and to automatically recognize the area from which a song was emitted. This representation may also help to understand the learning processes that can explain how whales build new songs. Demonstrations are conducted on real recordings (we thank C. Clark and O. Lammers for sharing some of their samples).
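
To make the notion of a sparse code concrete, the following pure-Python sketch encodes a signal x over a dictionary D by minimizing 0.5 * ||x - D z||^2 + lam * ||z||_1 with ISTA (iterative soft-thresholding). ISTA is only one common solver, chosen here for brevity; the abstract does not say which algorithm the authors use, and the dictionary, signal, and parameter values are hypothetical.

```python
# Minimal sparse-coding sketch using ISTA (illustrative only).
# The solver choice, dictionary, signal, and lam are assumptions.
import math

def soft_threshold(v, t):
    """Shrink v toward zero by t; values within [-t, t] become exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def ista(x, atoms, lam, n_iter=500):
    """Return a sparse code z such that sum_k z[k] * atoms[k] approximates x."""
    dim, n_atoms = len(x), len(atoms)
    # Step size 1/L, where L is the squared Frobenius norm of the dictionary,
    # an upper bound on the largest eigenvalue of D^T D.
    L = sum(a * a for atom in atoms for a in atom)
    z = [0.0] * n_atoms
    for _ in range(n_iter):
        recon = [sum(z[k] * atoms[k][i] for k in range(n_atoms))
                 for i in range(dim)]
        resid = [r - xi for r, xi in zip(recon, x)]
        grad = [sum(atoms[k][i] * resid[i] for i in range(dim))
                for k in range(n_atoms)]
        z = [soft_threshold(z[k] - grad[k] / L, lam / L)
             for k in range(n_atoms)]
    return z

# Hypothetical overcomplete dictionary: 4 unit-norm atoms in 3 dimensions.
s = 1.0 / math.sqrt(2.0)
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [s, s, 0.0]]
x = [2.0, 0.0, 0.0]  # toy signal aligned with the first atom

code = ista(x, atoms, lam=0.1)
print(code)
```

In this toy case only the coefficient of the first atom remains non-zero, which is the kind of compact description that makes sparse codes convenient for comparing and tracking sequences of song units.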

Prof. Gérald Pollack - McGill University, Montréal, CA (confirmed)
"Neuroethology of hearing in crickets: embedded neural processes to avoid bats"

Many behavioral studies on crickets have identified the relationships between signal structure and behavioral effectiveness, as well as the neural basis for sound reception and analysis. We will present behavioral studies on signal recognition: the relationships between stimulus structure and behavioral effectiveness, the roles of sound frequency and stimulus temporal structure, and positive and negative phonotaxis to cricket-like and bat-like signals, respectively.
We will then cover early auditory processing: separate channels for processing mate-attraction signals and predator-derived signals (ultrasound), and the temporal response properties of receptor neurons and first-order interneurons. Finally, we will discuss descending brain neurons, which convey the results of processing in the brain to the motor centers that control behavior.

Dr. Xanadu Halkias - CNRS LSIS, USTV, Toulon, FR (confirmed)
"Classification of Mysticete Sounds using Machine Learning Techniques"

Classification of mysticete sounds has long been a challenging task in the bioacoustics field. The diverse nature of the signals, due to inherent variations as well as the use of different recording apparatus and low signal-to-noise ratio conditions, often leads to systems that are not able to generalize across different species and that require either manual interaction or hyper-tuning in order to fit the underlying distributions. This talk presents a Restricted Boltzmann Machine (RBM) and a Sparse Auto-Encoder (SAE) used to learn discriminative structure tokens for the different calls, which can then be used in a classification framework.
