
toddstep/birdclef


birdclef

Algorithm overview for the BirdCLEF 2021 - Birdcall Identification data:

  • data split of audio recordings:
    • train: 56863 recordings
    • validation: 2997 recordings
    • test: 3014 recordings
  • outputs for 397 bird species
  • pretrained Inception neural network backbone
  • recording-level metrics (based on Deep CNN framework for audio event recognition using weakly labeled web data):
    • MultiCategoryAccuracy: 1 for a recording if its top-scoring species is in either the primary or secondary bird labels; 0 otherwise.
    • CrossentropyModified: species in a recording's secondary bird labels are effectively ignored; the remaining species are used in the loss computation.
  • training augmentation (the code for the random masks of the time/frequency slices is based on the documentation for Tensorflow IO's audio package):
    • for each recording longer than 5 seconds, remove a random amount of audio from its beginning
    • apply a mask to a random time slice
    • apply a mask to a random frequency slice
    • randomly change the decibel threshold
    • add waveform from another recording of the same species
  • tunable hyperparameters:
    • (initial) learning rate (of Adam optimizer)
    • maximum time slice mask
    • maximum frequency slice mask
    • maximum decibel threshold reduction
    • dropout rate for final layer of Inception
  • fixed hyperparameter:
    • weight multiplied to loss for features with positive labels
  • recording-level accuracy on the validation data (accuracy on a blind test set would likely be lower):
    • 0.7718
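The two recording-level metrics above can be sketched in NumPy as follows. This is an illustrative sketch, not the repo's TensorFlow implementation: the function names mirror the metric names, and the exact per-species binary cross-entropy form and the placement of pos_weight are assumptions.

```python
import numpy as np

NUM_SPECIES = 397  # number of output bird species in the model

def multi_category_accuracy(scores, primary, secondary):
    """1.0 if the top-scoring species is the primary label or one of
    the secondary labels; 0.0 otherwise."""
    top = int(np.argmax(scores))
    return 1.0 if (top == primary or top in secondary) else 0.0

def crossentropy_modified(scores, primary, secondary,
                          num_species=NUM_SPECIES, pos_weight=1.0):
    """Cross-entropy over per-species scores in [0, 1], with the
    secondary-label species masked out of the loss; pos_weight scales
    the loss term for the positively labeled species."""
    targets = np.zeros(num_species)
    targets[primary] = 1.0
    mask = np.ones(num_species)
    for j in secondary:  # secondary-label species are ignored
        mask[j] = 0.0
    eps = 1e-7
    p = np.clip(scores, eps, 1.0 - eps)
    bce = -(pos_weight * targets * np.log(p)
            + (1.0 - targets) * np.log(1.0 - p))
    return float(np.sum(bce * mask) / np.sum(mask))
```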

Implementation details:

  • In the Amazon SageMaker console's left panel, click on Domains
  • Click on Create Domain
  • Select Set up for single user (Quick setup). Click on Set up
  • After the environment is set up, click on Open Studio
  • Click on JupyterLab
  • Click on Create JupyterLab space
  • Enter a name for the space. Leave Sharing as Private. Click on Create space
  • Set Storage (GB) to 90. Set Instance to ml.t3.large.
  • In the Service Quotas console, click on AWS services
  • Search for and select Amazon SageMaker
  • Search for and select ml.g5.4xlarge for spot training job usage
  • Click on Request increase at account level
  • For Increase quota value, enter 3
  • Click on Request

Prepare data on an AWS instance that has FFmpeg installed:

  • Install the Kaggle CLI (pip install kaggle)
  • Download Kaggle token from Settings->Account->API to ~/.kaggle/kaggle.json
  • Agree to Kaggle birdsong data rules.
  • Download Kaggle birdsong data and put into SageMaker bucket (requires access to the S3 bucket):
```shell
kaggle competitions download -c birdclef-2021
aws s3 cp birdclef-2021.zip s3://sagemaker-{REGION}-{ACCOUNT}/
```

Train model:

Analyze model:

  • Wait for the hyperparameter tuning job to complete:
    • The tuning jobs can be monitored at Training | Amazon SageMaker AI
    • Best tuning job:
      • Hyperparameter values:
        • learning_rate: 0.0003819051384479275
        • time_mask_param: 199
        • freq_mask_param: 87
        • reduce_db_param: 0.4612848134279439
        • feat_drop_rate: 0.36573757215236335
        • pos_weight: 1 (fixed)
      • 24 training epochs (the best epoch was selected by comparing validation accuracy at a precision of 0.01)
  • Go to SageMaker Studio.
  • Open previously created JupyterLab space
  • Estimate optimal threshold level for a 1% targeted false-positive rate using Test Scores Analysis. Ideally, this would be done on a dataset not previously used in tuning the model.
    • Note: for simplicity, the secondary labels are not accounted for in this analysis.
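The threshold estimate for a targeted 1% false-positive rate can be sketched as below. This is an illustrative NumPy sketch, not the repo's Test Scores Analysis notebook; the function name and the pooled-scores setup are hypothetical. Given scores the model assigns to negative examples (species absent from a recording), the threshold is the (1 - target_fpr) quantile of that score distribution:

```python
import numpy as np

def threshold_for_target_fpr(negative_scores, target_fpr=0.01):
    """Return the score threshold at which roughly `target_fpr` of the
    negative (species-absent) scores would be flagged as positive:
    the (1 - target_fpr) quantile of the negative-score distribution."""
    return float(np.quantile(np.asarray(negative_scores), 1.0 - target_fpr))
```

In practice the negative scores would come from species not in a recording's primary label; as noted above, secondary labels are ignored for simplicity in this analysis.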

Create endpoint:

Prepare web demo:

Deploy on AWS Serverless Application Model (SAM):

```shell
sam build -u
sam deploy --guided
```

About

Birdsong modeling for Kaggle's birdclef data
