The Machine Learning Meets Bioacoustics Workshop : the Bird Challenge


Welcome to the ICML4B BIRD CHALLENGE (deadline: 16th of June)

The data for this challenge are copyright of Fernand Deroussen and Jerome Sueur of the Musee National d'Histoire Naturelle; their use is restricted to this challenge. The competition test data were graciously provided by Jerome Sueur.
Training data extracted from:
Fernand Deroussen naturophonia.fr
Deroussen, F., 2001. Oiseaux des jardins de France. Nashvert Production, Charenton, France
Deroussen, F., Jiguet, F., 2006. La sonotheque du Museum: Oiseaux de France, les passereaux. Nashvert production, Charenton, France

Here is the link to the KAGGLE WEB SITE, with the challenge description, RUN SUBMISSION and SCORING.
We are pleased to announce that the challenge on bird classification at the ICML workshop is now running on the Kaggle web site [ https://www.kaggle.com/c/the-icml-2013-bird-challenge ].
=> Each of your runs (their number is essentially unlimited) is thus scored automatically. The leaderboard is calculated on approximately 33% of the test data.

The final ranking will be based on the other 67%, so the final standings may differ from the leaderboard. As is usual on Kaggle, competitors select 5 models at the end of the competition (16th of June).

The data sets are given below; they are also available on Kaggle in another format (CSV). If you participate in this challenge, please email icml4b@gmail.com to register (registration is free) and to receive any updates on these data during the challenge.

Task description : for each test file, you have to detect which of the 35 bird species given in the official training set (below) sing in it.
Runs submission : each run is submitted to the Kaggle web site according to the details given there. Each run gives the Pij probabilities (a 90x35 matrix) :
Pij = P('The species j sings in the test file i') , with the species j and the test files i each sorted in alphabetical order.
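
As an illustration of the run layout described above, here is a minimal Python sketch that arranges per-file, per-species scores into a matrix with rows (test files) and columns (species) both in alphabetical order. The function name `build_run`, the `(file, species)` dictionary layout, and the choice to fill missing pairs with 0.0 are assumptions made for this example; the exact file format expected by Kaggle is described on the Kaggle web site and is not reproduced here.

```python
def build_run(scores, test_files, species):
    """Arrange scores into a matrix P[i][j] = P(species j sings in test file i).

    scores     : dict mapping (file_name, species_name) -> probability (hypothetical layout)
    test_files : iterable of test file names (90 in the challenge)
    species    : iterable of species names (35 in the challenge)
    """
    files = sorted(test_files)   # rows i in alphabetical order
    classes = sorted(species)    # columns j in alphabetical order
    matrix = []
    for f in files:
        row = []
        for s in classes:
            # Assumption: a missing (file, species) pair is scored 0.0.
            p = float(scores.get((f, s), 0.0))
            # Clip to [0, 1] so that every entry is a valid probability.
            row.append(min(1.0, max(0.0, p)))
        matrix.append(row)
    return files, classes, matrix


if __name__ == "__main__":
    demo_scores = {("b.wav", "wren"): 0.9, ("a.wav", "robin"): 1.4}
    files, classes, P = build_run(demo_scores, ["b.wav", "a.wav"], ["wren", "robin"])
    for name, row in zip(files, P):
        print(name, row)
```

With the real data, `test_files` would hold the 90 test file names and `species` the 35 class names, giving a 90x35 matrix.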

You can submit up to 5 runs to the official contest, chosen from those you submitted to Kaggle up to the 16th of June. If possible, at least one of your runs should use the provided MFCC features.
Evaluation will be based on ROC.
* You are free to use any additional external data or recordings (such as the Wikipedia WAV samples linked in the list below, or a taxonomy for hierarchical classification, ...), but in that case you must state it in your run description (.txt), and such a run, with a modified training set, will not count for the challenge prize.
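
To make the ROC-based evaluation mentioned above more concrete, the following Python sketch computes the area under the ROC curve for one list of binary labels and scores, using the pairwise-ranking formulation (the fraction of positive/negative pairs in which the positive is scored higher, with ties counted as one half). That the score is the area under the curve, and how it is aggregated over species and files, are assumptions of this example; the official scoring is the one implemented on Kaggle.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparisons.

    labels : list of 0/1 ground-truth values (1 = species present)
    scores : list of predicted probabilities, same length as labels
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined without both positive and negative examples")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive correctly ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The quadratic loop is adequate for a 90x35 run; a sort-based implementation would be preferable for much larger test sets.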

Sorted list of the 35 bird species (= classes) with wikipedia links

link to the TRAIN SET : 35 .WAV files and suggested FEATURES (see also Kaggle for various formats)

link to the TEST SET : 90 .WAV files and suggested FEATURES (see also Kaggle for various formats)

** This TEST SET includes the results for three test files, to be used as a small development set (please include them in their correct places in your run).

The suggested MFCC features were computed with settings chosen to minimize the signal reconstruction error, averaged over all the species. The scripts are given here : MFCC SCRIPTS
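
For readers unfamiliar with MFCC features, here is a minimal NumPy sketch of a generic MFCC computation (framing, Hamming window, power spectrum, triangular mel filterbank, logarithm, DCT-II). It is not the organizers' script: the frame length, hop size, number of filters and number of coefficients below are assumptions chosen for illustration, and the official settings are those in the MFCC SCRIPTS linked above.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):          # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    return fbank


def mfcc(signal, sample_rate, frame_len=512, hop=256, n_filters=26, n_coeffs=13):
    """Return an array of shape (n_frames, n_coeffs); all defaults are illustrative."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Index matrix that slices the signal into overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=frame_len, axis=1)) ** 2 / frame_len
    energies = power @ mel_filterbank(n_filters, frame_len, sample_rate).T
    log_energies = np.log(energies + 1e-10)    # small offset avoids log(0)
    # DCT-II basis (unnormalized) applied to the log filterbank energies.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_coeffs)[:, None] * (2 * n[None, :] + 1) / (2 * n_filters))
    return log_energies @ basis.T


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    features = mfcc(np.sin(2 * np.pi * 440.0 * t), sr)   # one second of a 440 Hz tone
    print(features.shape)
```

The signal must be at least `frame_len` samples long; in practice it would be read from one of the .WAV files rather than synthesized.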

METADATA : you may use metadata or other training sets to enhance your model, such as :
PHYLOGENETIC tree : it may reveal acoustic cues shared between species. The phylogenetic tree of the target species, with distances, is given there.
WEATHER : the weather, wind speed, humidity, sun conditions... for each test set file.

SUBMISSION : The ground truth of the test set will be used to compute the ROC score. * Participants must not hand-label the test set to tune their model: all parameters must be set automatically.
The guidelines for submitting results are simple and are explained on Kaggle.
You are invited to send a working note with your partial results by the 30th of May, following the ICML template: it will be published in the Proceedings of this ICML workshop.

Last updated: 2013/05/08
Glotin Hervé <glotin@univ-tln.fr>