
TweetyNet Missing Annotations #153

Open
lmendes14 opened this issue Apr 28, 2023 · 4 comments

Comments

@lmendes14

When running TweetyNet in PyHa through generate_automated_labels() from the IsoAutio module, some files produce no annotations and also raise no errors.

We found this issue when running it on the BirdCLEF2023 training data.

We used the following parameters when running TweetyNet:

```python
isolation_parameters_tweety = {
    "model": "tweetynet",
    "tweety_output": True,
    "verbose": True,
}
```

When we called generate_automated_labels(), we passed in the default parameters.

I've uploaded all of the .wav files that produced neither annotations nor errors to the shared E4E Google Drive, under the folder 'BirdCLEF2023_Missing_Files_Issue'.

@JacobGlennAyers
Contributor

Did you use the spectrogram and local score array output function to see exactly what the local score arrays look like?

@JacobGlennAyers
Contributor

Specifically, the spectrogram_visualization() function might help you get some insight. Keep in mind that this TweetyNet model was trained on South American Xeno-canto clips, so it is likely just missing African birds. Also, better datasets have become available since this model was trained, which your team should consider looking into. Specifically, we have the data science team's annotations of nearly 2,000 audio clips, as well as the dataset annotated by COSMOS students last summer. Sam should know where these are located.

@Sean1572
Contributor

Sean1572 commented Apr 29, 2023

I recommended creating this issue because the system didn't throw an error (for example, a zero-division error) for zero detections. If the model doesn't detect a bird, the user should see an error, so a file that doesn't appear to have been processed at all is worth looking into.

As for TweetyNet, wasn't it trained on BirdVox and another European bird dataset? Which South American datasets was TweetyNet trained on?
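The suggestion above — surfacing a visible error instead of silently skipping a clip — could be sketched as a small check inside PyHa's processing loop. This is a minimal sketch; `check_detections`, `clip_name`, and `annotations` are hypothetical names, not part of PyHa's actual API:

```python
import warnings

def check_detections(clip_name, annotations):
    """Warn loudly when a clip yields zero detections instead of failing silently.

    annotations is assumed to be the list (or DataFrame rows) of detections
    produced for a single clip.
    """
    if len(annotations) == 0:
        warnings.warn(
            f"TweetyNet produced zero annotations for {clip_name}; "
            "the clip was processed but nothing was detected.",
            stacklevel=2,
        )
    return annotations

# A clip with no detections now triggers a visible warning rather than silence:
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_detections("XC123456.wav", [])
```

A warning (rather than a raised exception) keeps a long batch run alive while still telling the user which clips came back empty.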

@JacobGlennAyers
Contributor

I think the easiest way to accomplish this would be to add a parameter that enables a kind of "failure report": build a set of clip names as the clips are iterated through in generate_automated_labels(). Once you have the output dataframe, create a second set from its FILE NAME column. Taking the set difference (all clips minus successful clips) gives you the list of failed clips.
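The set-difference approach described above might look like this. A minimal sketch: the real output DataFrame comes from generate_automated_labels(), and the exact column name ("FILE NAME") is assumed from the comment above:

```python
import pandas as pd

def find_failed_clips(all_clip_names, automated_df):
    """Return the clips that produced no annotations.

    all_clip_names: iterable of every clip passed to generate_automated_labels()
    automated_df: the output DataFrame, assumed to have a "FILE NAME" column
    listing each clip that received at least one annotation.
    """
    successful = set(automated_df["FILE NAME"])
    # Set difference: everything we tried minus everything that succeeded.
    return set(all_clip_names) - successful

# Toy DataFrame standing in for the real generate_automated_labels() output:
df = pd.DataFrame({"FILE NAME": ["a.wav", "a.wav", "b.wav"]})
failed = find_failed_clips(["a.wav", "b.wav", "c.wav"], df)
```

Here `failed` would contain only the clip that never appeared in the output, which could then be written out as the failure report.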
