ICML 2013 Workshop on Machine Learning for Bioacoustics

Biodiversity assessment remains one of the most difficult challenges faced by ecologists and conservation biologists. There is a critical need to describe and quantify the spatio-temporal dynamics of biodiversity over ecologically meaningful scales. There is an equal need to provide timely syntheses and interpretations that support responsible decisions, reducing the risks that anthropogenic activities pose to endangered species, populations and habitats.

This task has become even more urgent as habitat loss and global environmental change accelerate as a result of commercial and industrial activity worldwide. The field of animal bioacoustics has received increasing attention because of its diverse potential benefits to science and society. Regulatory agencies increasingly require it as a tool for timely monitoring and mitigation of the environmental impacts of human activities. The rising expectations placed on bioacoustic research have coincided with a dramatic increase in the spatial, temporal and spectral scales of acoustic data collection. The bottleneck is no longer access to raw data. It is the inability to efficiently process, visualize and interpret large volumes of data within an advanced data management system.

This workshop brings together world-class scientists with expertise in animal bioacoustics, digital signal processing and machine learning to address the emerging field of bioacoustic machine learning, from basic to applied research.

The features and biological significance of animal sounds are constrained by the physics of sound production and propagation, but they have evolved through natural selection. Further insights have come from analyzing and modeling animal sounds in relation to critical life functions (e.g. communicating, mating, migrating, navigating), social context, and individual, species and population identification. More recently, researchers have been exploring possible links and correlations between the dynamics of animal sound development and the evolution of human speech. This work has led to both quantitative and qualitative advances. Examples include the use of MRI to monitor bird song ontogeny and human brain activity associated with linguistic metaphors, and the use of genetic algorithms to identify a possible common framework in the evolution of human and non-human cultural relationships. From an applied perspective, regulatory agencies already use basic semi-automated systems for near-real-time acoustic detection of species of concern to dynamically monitor and mitigate human activities, and demand for such near-real-time capabilities is growing.

Although most existing applications lend themselves to widely used, advanced acoustic signal processing methods, the field has yet to successfully integrate robust signal processing and machine learning algorithms because of multiple and diverse challenges. Specifically, the dynamic and variable conditions under which raw data are collected and analyzed, in both wild and captive environments, often require real-time or near-real-time systems that minimize manual interaction and supervision. This requirement is closely tied to the development and use of online algorithms and stochastic optimization techniques. Such methods allow field researchers to assess computational and accuracy trade-offs without compromising the data collection process. Ultimately, results from intelligent, open-access systems could offer significant societal benefits by raising public awareness of natural phenomena. They could also expose potentially hazardous interactions between wildlife and humans, allowing swift mitigation.

An additional and critical issue in present bioacoustic analysis strategies is the inability to provide comprehensive, accurate species validation across the full suite of signals present in very large sets of raw data. Extracting ground truth typically requires manual annotation by experts, which becomes intractable at these data volumes. This inherent bottleneck significantly limits our ability to characterize a species' complete signal variability across the multiple dimensions of its acoustic signals. It thereby constrains our ability to process data at scales commensurate with spatio-temporal-spectral biodiversity needs. Advanced unsupervised learning algorithms offer a possible solution to this problem because they could provide rapid computational access to the unique underlying characteristics of species-specific features, accelerating the recognition task. Success at this stage could then be combined with supervised methods to yield a robust, iterative system for automatically processing very large amounts of data and visualizing the resulting data products over appropriate ecological scales.

Moreover, automatic and accurate species recognition remains a top priority in the field, and it is a highly complex and challenging task. To be effective, it must mirror the hierarchical acoustic structures so often found in animal acoustic signaling behaviors, which calls for both discriminative and generative approaches. Depending on the species under study, shallow or deep architectures might be favored. However, the diversity of vocal repertoires across species, combined with their underlying biological structures, suggests that any analysis and modeling would benefit greatly from integrating sparsity constraints to increase the discriminative power of the models.

Finally, the lack of standardization and of a unified comparative framework combines with the differing environments and contexts of large-scale data collection to create a distinctive domain adaptation and transfer learning problem. In this setting, the proposed machine learning methods must generalize adequately both within and across species.

In conclusion, applying machine learning to the recognition, analysis and modeling of bioacoustic signals in large data sets promises significant theoretical and applied advances. These include a deeper understanding of complex, learned animal vocal behaviors and a better quantitative description of biodiversity over ecologically meaningful spatio-temporal-spectral scales.