
TweetyNet Missing Annotations #153

Open
lmendes14 opened this issue Apr 28, 2023 · 4 comments

Comments

@lmendes14

When running TweetyNet in PyHa through generate_automated_labels() from the IsoAutio module, some files produce no annotations and also raise no errors.

We found this issue when running it on the BirdCLEF2023 training data.

We used the following parameters when running TweetyNet:

```python
isolation_parameters_tweety = {
    "model": "tweetynet",
    "tweety_output": True,
    "verbose": True,
}
```

When we called generate_automated_labels(), we passed in the default parameters.

I've uploaded all of the .wav files that produced neither annotations nor errors to the shared E4E Google Drive, under the folder 'BirdCLEF2023_Missing_Files_Issue'.

@JacobGlennAyers
Contributor

Did you use the spectrogram and local score array output function to see exactly what the local score arrays look like?

@JacobGlennAyers
Contributor

Specifically, the spectrogram_visualization() function might help you get some insight. Keep in mind that this TweetyNet model was trained on South American Xeno-canto clips, so it is likely just missing African birds. Also, better datasets have become available since this model was trained, which your team should consider looking into. Specifically, we have the data science team's annotations of nearly 2,000 audio clips, as well as the dataset annotated by COSMOS students last summer. Sam should know where these are located.

@Sean1572
Contributor

Sean1572 commented Apr 29, 2023

I recommended creating this issue because the system didn't throw an error (for example, a zero-division error) for zero detections. If the model doesn't detect a bird, the user should see an error, so a file that doesn't appear to have been processed at all is worth looking into.

As for TweetyNet, wasn't it trained on BirdVox and another European bird dataset? Which South American datasets was TweetyNet trained on?
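The suggestion above — surfacing a visible error instead of silently skipping a clip — could be sketched as a small check inside PyHa's processing loop. This is a minimal sketch; `check_detections`, `clip_name`, and `annotations` are hypothetical names, not part of PyHa's actual API:

```python
import warnings

def check_detections(clip_name, annotations):
    """Warn loudly when a clip yields zero detections instead of failing silently.

    annotations is assumed to be the list (or DataFrame rows) of detections
    produced for a single clip.
    """
    if len(annotations) == 0:
        warnings.warn(
            f"TweetyNet produced zero annotations for {clip_name}; "
            "the clip was processed but nothing was detected.",
            stacklevel=2,
        )
    return annotations

# A clip with no detections now triggers a visible warning rather than silence:
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_detections("XC123456.wav", [])
```

A warning (rather than a raised exception) keeps a long batch run alive while still telling the user which clips came back empty.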

@JacobGlennAyers
Contributor

I think the easiest way to accomplish this would be to add a parameter that enables a kind of "failure report": build a set of clip names as the clips are iterated through in generate_automated_labels(). Once you have the output dataframe, create a second set from its FILE NAME column. Taking the set difference (all clips minus successful clips) gives you the list of failed clips.
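The set-difference approach described above might look like this. A minimal sketch: the real output DataFrame comes from generate_automated_labels(), and the exact column name ("FILE NAME") is assumed from the comment above:

```python
import pandas as pd

def find_failed_clips(all_clip_names, automated_df):
    """Return the clips that produced no annotations.

    all_clip_names: iterable of every clip passed to generate_automated_labels()
    automated_df: the output DataFrame, assumed to have a "FILE NAME" column
    listing each clip that received at least one annotation.
    """
    successful = set(automated_df["FILE NAME"])
    # Set difference: everything we tried minus everything that succeeded.
    return set(all_clip_names) - successful

# Toy DataFrame standing in for the real generate_automated_labels() output:
df = pd.DataFrame({"FILE NAME": ["a.wav", "a.wav", "b.wav"]})
failed = find_failed_clips(["a.wav", "b.wav", "c.wav"], df)
```

Here `failed` would contain only the clip that never appeared in the output, which could then be written out as the failure report.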
