
toddstep/birdclef


birdclef

Algorithm overview for the BirdCLEF 2021 - Birdcall Identification data:

  • data split of audio recordings:
    • train: 56863 recordings
    • validation: 2997 recordings
    • test: 3014 recordings
  • outputs for 397 bird species
  • pretrained Inception neural network backbone
  • recording-level metrics (based on Deep CNN framework for audio event recognition using weakly labeled web data):
    • MultiCategoryAccuracy: 1 for a recording if its top-scoring species is in either the primary or secondary bird labels; 0 otherwise.
    • CrossentropyModified: species in a recording's secondary bird labels are effectively ignored; the remaining species are used in the loss computation.
  • training augmentation (the code for the random masks of the time/frequency slices is based on the documentation for Tensorflow IO's audio package):
    • for each recording longer than 5 seconds, remove a random amount of audio from its beginning
    • apply a mask to a random time slice
    • apply a mask to a random frequency slice
    • randomly change the decibel threshold
    • add waveform from another recording of the same species
  • tunable hyperparameters:
    • (initial) learning rate (of Adam optimizer)
    • maximum time slice mask
    • maximum frequency slice mask
    • maximum decibel threshold reduction
    • dropout rate for final layer of Inception
  • fixed hyperparameter:
    • weight multiplied to loss for features with positive labels
  • recording-level accuracy on the validation data (accuracy on a blind test set would likely be lower):
    • 0.7718
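The two recording-level metrics above can be sketched in NumPy as follows. This is an illustrative sketch, not the repo's TensorFlow implementation: the function names mirror the metric names, and the exact per-species binary cross-entropy form and the placement of pos_weight are assumptions.

```python
import numpy as np

NUM_SPECIES = 397  # number of output bird species in the model

def multi_category_accuracy(scores, primary, secondary):
    """1.0 if the top-scoring species is the primary label or one of
    the secondary labels; 0.0 otherwise."""
    top = int(np.argmax(scores))
    return 1.0 if (top == primary or top in secondary) else 0.0

def crossentropy_modified(scores, primary, secondary,
                          num_species=NUM_SPECIES, pos_weight=1.0):
    """Cross-entropy over per-species scores in [0, 1], with the
    secondary-label species masked out of the loss; pos_weight scales
    the loss term for the positively labeled species."""
    targets = np.zeros(num_species)
    targets[primary] = 1.0
    mask = np.ones(num_species)
    for j in secondary:  # secondary-label species are ignored
        mask[j] = 0.0
    eps = 1e-7
    p = np.clip(scores, eps, 1.0 - eps)
    bce = -(pos_weight * targets * np.log(p)
            + (1.0 - targets) * np.log(1.0 - p))
    return float(np.sum(bce * mask) / np.sum(mask))
```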

Implementation details:

  • In the Amazon SageMaker console's left panel, click on Domains
  • Click on Create Domain
  • Select Set up for single user (Quick setup). Click on Set up
  • After the environment is set up, click on Open Studio
  • Click on JupyterLab
  • Click on Create JupyterLab space
  • Enter a name for the space. Leave Sharing as Private. Click on Create space
  • Set Storage (GB) to 90. Set Instance to ml.t3.large.
  • In the Service Quotas console, click on AWS services
  • Search for and select Amazon SageMaker
  • Search for and select ml.g5.4xlarge for spot training job usage
  • Click on Request increase at account level
  • For Increase quota value, enter 3
  • Click on Request

Prepare data on an AWS instance that has FFmpeg installed:

  • Install the Kaggle CLI (pip install kaggle)
  • Download Kaggle token from Settings->Account->API to ~/.kaggle/kaggle.json
  • Agree to Kaggle birdsong data rules.
  • Download Kaggle birdsong data and put into SageMaker bucket (requires access to the S3 bucket):
```shell
kaggle competitions download -c birdclef-2021
aws s3 cp birdclef-2021.zip s3://sagemaker-{REGION}-{ACCOUNT}/
```

Train model:

Analyze model:

  • Wait for the hyperparameter tuning job to complete:
    • The tuning jobs can be monitored at Training | Amazon SageMaker AI
    • Best tuning job:
      • Hyperparameter values:
        • learning_rate: 0.0003819051384479275
        • time_mask_param: 199
        • freq_mask_param: 87
        • reduce_db_param: 0.4612848134279439
        • feat_drop_rate: 0.36573757215236335
        • pos_weight: 1 (fixed)
      • 24 training epochs (the best epoch was selected by comparing validation accuracy at a precision of 0.01)
  • Go to SageMaker Studio.
  • Open previously created JupyterLab space
  • Estimate optimal threshold level for a 1% targeted false-positive rate using Test Scores Analysis. Ideally, this would be done on a dataset not previously used in tuning the model.
    • Note: for simplicity, the secondary labels are not accounted for in this analysis.
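The threshold estimate for a targeted 1% false-positive rate can be sketched as below. This is an illustrative NumPy sketch, not the repo's Test Scores Analysis notebook; the function name and the pooled-scores setup are hypothetical. Given scores the model assigns to negative examples (species absent from a recording), the threshold is the (1 - target_fpr) quantile of that score distribution:

```python
import numpy as np

def threshold_for_target_fpr(negative_scores, target_fpr=0.01):
    """Return the score threshold at which roughly `target_fpr` of the
    negative (species-absent) scores would be flagged as positive:
    the (1 - target_fpr) quantile of the negative-score distribution."""
    return float(np.quantile(np.asarray(negative_scores), 1.0 - target_fpr))
```

In practice the negative scores would come from species not in a recording's primary label; as noted above, secondary labels are ignored for simplicity in this analysis.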

Create endpoint:

Prepare web demo:

Deploy on AWS Serverless Application Model (SAM):

```shell
sam build -u
sam deploy --guided
```

About

Birdsong modeling for Kaggle's birdclef data
