Dataset Documentation and Retrieval
These workshop datasets have been provided by the Scripps Institution of Oceanography for the 8th DCLDE Workshop. They consist of acoustic recordings from multiple deployments of high-frequency acoustic recording packages (Wiggins and Hildebrand, 2007) deployed in the Western North Atlantic (US EEZ) and Gulf of Mexico. Separate sets of development data are provided for mysticetes and odontocetes. The mysticete data have been decimated to 1 kHz bandwidth (2 kHz sample rate) and the odontocete data have 100 kHz of bandwidth (200 kHz sample rate). Data were selected to cover multiple seasons and locations while providing high species diversity and call counts. To learn how to access these datasets, please see Dataset Retrieval.
As for previous workshops, this common dataset of underwater recordings is being made available to encourage researchers to compare results. Large training and testing datasets are designed to cover the range of spatial, temporal, and recording variability that may be encountered by researchers in the field. We hope that these datasets will provide an opportunity to develop detectors and classifiers that will perform robustly across different tasks and under novel conditions.
To encourage friendly competition among the research teams, an online trial test system will be made available in mid-February on a small section of the test set, so that each team can gauge its progress during the challenge against the community's current results.
To receive updates on this task and its evaluation, please register by sending an email to firstname.lastname@example.org with the subject: DCLDEeval.
High-Frequency Data Content Descriptions
The high-frequency dataset consists of marked encounters with echolocation clicks of species commonly found along the US Atlantic Coast, and in the Gulf of Mexico:
- Mesoplodon europaeus - Gervais' beaked whale
- Ziphius cavirostris - Cuvier's beaked whale
- Mesoplodon bidens - Sowerby's beaked whale
- Lagenorhynchus acutus - Atlantic white-sided dolphin
- Grampus griseus - Risso's dolphin
- Globicephala macrorhynchus - Short-finned pilot whale
- Stenella sp. - Stenellid dolphins
- Delphinid type A
- Delphinid type B
- Unidentified delphinid - delphinid other than those described above
The goal for these datasets is to identify acoustic encounters by species during times when animals were echolocating. Analysts examined data for echolocation clicks and approximated the start and end times of acoustic encounters. Any period that was separated from another one by five minutes or more was marked as a separate encounter. Whistle activity was not considered. Consequently, while the use of whistle information during echolocation activity is appropriate, reporting a species based on whistles in the absence of echolocation activity will be considered a false positive for this classification task.
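The five-minute rule above can be illustrated with a minimal sketch. It assumes the input is a list of individual click detection times, which is an assumption about a participant's detector output rather than a description of how the analysts worked; the function name and the sample times are hypothetical.

```python
from datetime import datetime, timedelta

# Gap rule from the text: periods separated by five minutes or more
# are treated as separate encounters.
MIN_GAP = timedelta(minutes=5)

def group_into_encounters(click_times, min_gap=MIN_GAP):
    """Group click timestamps into (start, end) encounters.

    A new encounter starts whenever the gap to the previous click is
    at least `min_gap`.
    """
    encounters = []
    for t in sorted(click_times):
        if encounters and t - encounters[-1][1] < min_gap:
            encounters[-1][1] = t  # extend the current encounter
        else:
            encounters.append([t, t])  # open a new encounter
    return [(start, end) for start, end in encounters]

# Hypothetical click times, for illustration only
clicks = [
    datetime(2015, 10, 14, 5, 55, 0),
    datetime(2015, 10, 14, 5, 57, 30),
    datetime(2015, 10, 14, 6, 2, 30),  # exactly 5 min after the previous click
    datetime(2015, 10, 14, 6, 3, 0),
]
encounters = group_into_encounters(clicks)
```

With these sample times the first two clicks form one encounter and the last two form another, because the third click falls exactly five minutes after the second.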
Low-Frequency Data Content Descriptions
The low-frequency dataset consists of annotated data for specific calls from two mysticete species found along the US Atlantic coast:
- Balaenoptera musculus - North Atlantic blue whale tonal calls (Mellinger and Clark, 2003)
- Eubalaena glacialis - North Atlantic right whale up-call (Parks and Tyack, 2005)
The goal for this dataset is to identify specific blue whale tonal calls, and right whale up-calls. Analysts examined data using long-term spectral averages as well as manual scanning of the data for individual calls.
Acoustic data are provided as wav files, with the filename encoding the site, deployment, and starting timestamp of each file.
High-frequency example: WAT_HZ_01_151014_055500.x.wav
- WAT_HZ_01 indicates the first deployment at site HZ (Heezen), within the Western Atlantic (WAT) project. Other project names are HAT (Cape Hatteras), JAX (Jacksonville), GofMX (Gulf of Mexico).
- Recording started October 14th, 2015 at 05:55:00. All times are UTC.
Low frequency files are similar but contain additional fields in the filename related to decimation from the original high-frequency dataset.
Low-frequency example: HAT_A_02_121021_000000.d100.x.wav
- HAT_A_02 indicates the second deployment at site A, within the HAT (Cape Hatteras) project.
- Recording started October 21st, 2012 at 00:00:00, and has been decimated by a factor of 100.
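The naming scheme for both file types can be decoded with a short sketch. The regular expression is inferred only from the two example filenames above, so other files in the dataset may not match it; the function name `parse_filename` and the returned keys are hypothetical. The sample rate is derived on the assumption that decimation starts from the 200 kHz rate stated for the high-frequency data.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from the two example filenames in the text; other
# filenames in the dataset may deviate from it.
FILENAME_RE = re.compile(
    r"^(?P<project>[A-Za-z]+)_(?P<site>[A-Za-z0-9]+)_(?P<deployment>\d+)"
    r"_(?P<date>\d{6})_(?P<time>\d{6})"
    r"(?:\.d(?P<decimation>\d+))?\.x\.wav$"
)

ORIGINAL_RATE_HZ = 200_000  # high-frequency sample rate stated in the text

def parse_filename(name):
    """Split a recording filename into its documented fields."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized filename: {name}")
    start = datetime.strptime(m["date"] + m["time"], "%y%m%d%H%M%S")
    decimation = int(m["decimation"]) if m["decimation"] else 1
    return {
        "project": m["project"],
        "site": m["site"],
        "deployment": int(m["deployment"]),
        "start": start.replace(tzinfo=timezone.utc),  # all times are UTC
        "decimation": decimation,
        # Assumption: decimation is relative to the 200 kHz original
        "sample_rate_hz": ORIGINAL_RATE_HZ // decimation,
    }

hf = parse_filename("WAT_HZ_01_151014_055500.x.wav")
lf = parse_filename("HAT_A_02_121021_000000.d100.x.wav")
```

For the low-frequency example, a decimation factor of 100 yields 2000 Hz, which agrees with the 2 kHz sample rate quoted for the mysticete data.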
Recording Locations ** Updated FEB 3rd **
Data were recorded at different locations in the Western North Atlantic and Gulf of Mexico, as shown in the figure below. The accompanying table lists the coordinates and depth of each site. These data were collected between 2011 and 2015, and the time period for each recording can be inferred directly from the data.
Preamplifiers for HARPs have been calibrated and two Matlab routines are provided to show how to apply the appropriate transfer function. All necessary files will be available soon.
- gettransferfn(filename, BinsHz)- Assuming that the transfer function folder is in the same folder as this function, it will parse the filename and load the appropriate transfer function. The function will be sampled at the frequency bin center frequencies provided in BinsHz and the appropriate offsets will be returned.
- tfadjustexample()- This function prompts the user for a filename, reads the first 1/10th of a second of data and produces a plot of sound pressure level after applying the transfer function.
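The idea behind these routines, sampling a tabulated transfer function at the bin center frequencies and applying the resulting offsets, can be sketched in Python. This is not a port of the Matlab code: the function names, the linear interpolation on a linear frequency axis, the clamping outside the table, and the assumption that offsets are added in dB are all illustrative choices, and the calibration values are made up.

```python
from bisect import bisect_left

def sample_transfer_function(tf_freqs_hz, tf_db, bins_hz):
    """Linearly interpolate a tabulated transfer function at bins_hz.

    tf_freqs_hz must be sorted ascending; bins outside the table are
    clamped to the nearest end value.
    """
    offsets = []
    for f in bins_hz:
        i = bisect_left(tf_freqs_hz, f)
        if i == 0:
            offsets.append(tf_db[0])
        elif i == len(tf_freqs_hz):
            offsets.append(tf_db[-1])
        else:
            f0, f1 = tf_freqs_hz[i - 1], tf_freqs_hz[i]
            w = (f - f0) / (f1 - f0)
            offsets.append(tf_db[i - 1] + w * (tf_db[i] - tf_db[i - 1]))
    return offsets

def apply_transfer_function(uncalibrated_db, offsets_db):
    """Add the per-bin offsets (dB) to uncalibrated spectrum levels (dB)."""
    return [level + off for level, off in zip(uncalibrated_db, offsets_db)]

# Made-up calibration table and spectrum, for illustration only
tf_freqs = [10.0, 100.0, 1000.0]
tf_vals = [80.0, 82.0, 86.0]
bins = [10.0, 55.0, 550.0, 2000.0]
offsets = sample_transfer_function(tf_freqs, tf_vals, bins)
calibrated = apply_transfer_function([20.0, 21.0, 22.0, 23.0], offsets)
```

The provided Matlab routines remain the reference for the actual file format and calibration procedure.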
Format of the Annotation
Comma separated value files are used as input to routines that compute the precision and recall as well as coverage and fragmentation for encounters (see Roch et al., 2011 for details). The following species abbreviations should be used:
|Me||Mesoplodon europaeus - Gervais' beaked whale|
|Zc||Ziphius cavirostris - Cuvier's beaked whale|
|Mb||Mesoplodon bidens - Sowerby's beaked whale|
|La||Lagenorhynchus acutus - Atlantic white-sided dolphin|
|Gg||Grampus griseus - Risso's dolphin|
|Gma||Globicephala macrorhynchus - Short-finned pilot whale|
|Ssp||Stenella sp. - Stenellid dolphin|
|UDA||Delphinid type A|
|UDB||Delphinid type B|
|Bm||Balaenoptera musculus - blue whale|
|Eg||Eubalaena glacialis - North Atlantic right whale|
For encounter level tests, the result file should contain comma separated value (CSV) entries with each line as follows:
Time stamps are provided as follows: YYYY-MM-DDTHH:MM:SS with an optional decimal and fractional seconds following the seconds field
Example for Cuvier's beaked whale detection at HAT site A:
Call level results for blue and right whales are similar, with the addition of:
Spaces between fields may be included or omitted. A scoring script will be provided by the conference organizers in March so that participants can evaluate their algorithms' performance on the development data. Ground truth data based on trained analyst annotations are provided for the development dataset.
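The timestamp layout described above (YYYY-MM-DDTHH:MM:SS with optional fractional seconds) can be read and written with the Python standard library. The following is a minimal sketch; the helper names `parse_timestamp` and `format_timestamp` are hypothetical and are not part of the workshop tools.

```python
from datetime import datetime

def parse_timestamp(text):
    """Parse YYYY-MM-DDTHH:MM:SS with optional fractional seconds."""
    text = text.strip()
    fmt = "%Y-%m-%dT%H:%M:%S.%f" if "." in text else "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(text, fmt)

def format_timestamp(moment):
    """Format a datetime, adding fractional seconds only when non-zero."""
    base = moment.strftime("%Y-%m-%dT%H:%M:%S")
    if moment.microsecond:
        frac = f"{moment.microsecond:06d}".rstrip("0")
        return f"{base}.{frac}"
    return base

whole = parse_timestamp("2015-10-14T05:55:00")
fractional = parse_timestamp("2015-10-14T05:55:00.25")
round_trip = format_timestamp(fractional)
```

Choosing the format string based on whether a decimal point is present keeps the sketch compatible with Python versions whose `datetime.fromisoformat` is strict about the number of fractional digits.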
Analyst Annotations Retrieval (UPDATED on JAN 23rd)
The annotations of the Low-frequency and High-frequency data can be obtained from the following Google Drive links:
Low-Frequency Dataset (WAV files)
Low-frequency data can be obtained from the following Google Drive links:
High-Frequency Dataset (WAV files)
Purchase a disk (approximately $130.00) from the link below:
Please have it shipped to:
Scripps Institution of Oceanography
Attn: Erin O'Neill
Ritter Hall/Room 208
8635 Discovery Way
La Jolla, CA 92037
Once the high frequency dataset has been copied onto your disk, the disk will be shipped to you: please send your name and address to us at email@example.com
Evaluation Dataset (WAV files)
A separate evaluation dataset will be provided online (approximately 400 GB for high-frequency, a few GB for low-frequency) at a later date, without labels. Participants wishing to be part of the algorithm comparison will be able to submit their detector CSV files via the web site. This official evaluation dataset will contain additional weeks of data from the sites that have been included in the development set, as well as data from a deployment that was not present in the development set.
Metrics / Scoring Tools
The scoring tool is designed to compare detections with the ground truth files provided for the workshop. A link to the scoring tool will appear here in advance of the workshop. It accepts files in the workshop CSV format:
For the high-frequency task, the result file should contain comma separated value (CSV) entries with each line as follows (see the DCLDE dataset description for further details on species abbreviations, etc.):
Time stamps are provided in ISO 8601 format: YYYY-MM-DDTHH:MM:SS with an optional decimal and fractional seconds following the seconds field. Example for Cuvier's beaked whale detection at HAT site A:
Low-frequency task results for blue and right whales are similar. Low-frequency ground truth files contain an additional field describing detection quality ('good' or 'poor'). Poor-quality calls will not contribute to classification scores.
Spaces between fields may be included or omitted.
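Two of the rules above, optional spaces around fields and the exclusion of poor-quality calls, can be sketched as follows. The field order in the sample rows and the assumption that the quality label is the last field are hypothetical; consult the ground truth files for the actual layout.

```python
def split_fields(line):
    """Split one CSV line, ignoring optional spaces around fields."""
    return [field.strip() for field in line.split(",")]

def keep_scored_rows(lines):
    """Drop ground-truth rows whose quality field is 'poor'.

    Assumes the quality label is the last field of each row.
    """
    rows = [split_fields(line) for line in lines if line.strip()]
    return [row for row in rows if row[-1].lower() != "poor"]

# Hypothetical rows; the field order here is illustrative only
ground_truth = [
    "Eg, 2012-10-21T00:00:10, 2012-10-21T00:00:11, good",
    "Eg,2012-10-21T00:03:02,2012-10-21T00:03:03,poor",
    "Bm , 2012-10-21T00:07:40 , 2012-10-21T00:07:58 , good",
]
scored = keep_scored_rows(ground_truth)
```

Stripping each field after splitting makes rows with and without spaces compare equal, so the same downstream code handles both styles.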
More details on the metrics will be distributed in February.
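Until those details are published, the encounter-level precision and recall mentioned earlier can be pictured with a simple overlap-based sketch. The matching rule used here (any temporal overlap counts as a match) is an assumption and may differ from the official scoring tool; coverage and fragmentation are not computed, and the intervals are made up.

```python
def overlaps(a, b):
    """True when intervals a=(start, end) and b=(start, end) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

def precision_recall(detections, truth):
    """Encounter-level precision and recall for one species.

    detections, truth: lists of (start, end) pairs in the same time unit.
    """
    hit_detections = sum(any(overlaps(d, t) for t in truth) for d in detections)
    hit_truth = sum(any(overlaps(t, d) for d in detections) for t in truth)
    precision = hit_detections / len(detections) if detections else 0.0
    recall = hit_truth / len(truth) if truth else 0.0
    return precision, recall

# Made-up intervals in seconds from the start of a recording
truth = [(0, 600), (1200, 1500), (3000, 3300)]
detections = [(100, 300), (350, 500), (1400, 1600), (2000, 2100)]
precision, recall = precision_recall(detections, truth)
```

In this example three of the four detections overlap a ground truth encounter and two of the three ground truth encounters are found, giving a precision of 0.75 and a recall of two thirds.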
- GLOTIN Hervé, LIS, Univ Toulon, France
- HILDEBRAND John, Scripps, California, USA
- Mellinger, D. K., Carson, C. D., Clark, C. W. (2000). Characteristics of minke whale (Balaenoptera acutorostrata) pulse trains recorded near Puerto Rico. Mar. Mamm. Sci. 16(4), 739-756.
- Mellinger, D. K., Clark, C. W. (2003). Blue whale (Balaenoptera musculus) sounds from the North Atlantic. J Acoust Soc Am 114(2), 1108-1119.
- Parks, S. E., Tyack, P. L. (2005). Sound production by North Atlantic right whales (Eubalaena glacialis) in surface active groups. J Acoust Soc Am 117(5), 3297-3306.
- Roch, M. A., Brandes, T. S., Patel, B., Barkley, Y., Baumann-Pickering, S. and Soldevilla, M. S. (2011). Automated extraction of odontocete whistle contours. J Acoust Soc Am 130, 2212-23, doi:10.1121/1.3624821.
- Wiggins, S. M. and Hildebrand, J. A. (2007). High-frequency Acoustic Recording Package (HARP) for broad-band, long-term marine mammal monitoring. In Intl. Symp. Underwater Tech., pp. 551-557. Tokyo, Japan.