Language detection glitches in video classifier #58

Open
mgdigital opened this issue Nov 10, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@mgdigital
Collaborator

mgdigital commented Nov 10, 2023

Describe the bug
We get quite a few incorrect language detections from the video classifier (especially for TV shows) due to how it looks for 3-letter ISO language codes in the section after the episode number.

To Reproduce
A title such as "Series Name EP05E01 Episode Name Includes San Francisco" is detected as Sanskrit, because "San" matches the language code "san". Likewise, an episode name containing "Mac" is detected as Macedonian ("mac"). Incidentally, I'm not sure Sanskrit should even be a candidate here, as I think it's an ancient written language...?
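
To make the failure mode concrete, here is a minimal sketch of the kind of matching described above. It is not the classifier's actual code: the function name, the tiny code table, and the episode-marker pattern are all assumptions made for illustration.

```python
import re

# Illustrative subset of 3-letter ISO 639-2 codes (assumption: the real
# classifier uses a much larger table).
LANGUAGE_CODES = {
    "san": "Sanskrit",
    "mac": "Macedonian",
    "fre": "French",
    "ger": "German",
}

# Hypothetical episode marker; accepts both "S05E01" and the "EP05E01"
# form used in the example above.
EPISODE_MARKER = re.compile(r"\b(?:S|EP)\d{1,2}E\d{1,2}\b", re.IGNORECASE)


def naive_detect_languages(title: str) -> list[str]:
    """Treat every token after the episode marker that matches a code as a language."""
    match = EPISODE_MARKER.search(title)
    if match is None:
        return []
    tail = title[match.end():]
    tokens = re.findall(r"[A-Za-z]+", tail)
    return [LANGUAGE_CODES[t.lower()] for t in tokens if t.lower() in LANGUAGE_CODES]


print(naive_detect_languages(
    "Series Name EP05E01 Episode Name Includes San Francisco"
))  # ['Sanskrit']
```

Because the match is case-insensitive and ignores context, any ordinary word that collides with a code produces a false positive.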

Expected behavior
Ordinary 3-letter words that happen to match a language code should not be treated as language codes. Ideally this should not come at the cost of missing genuine language codes. Perhaps something is needed to detect the episode-name part of the title, though it could be tricky to do this reliably.
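
One possible direction, sketched under stated assumptions rather than proposed as the fix: accept a 3-letter code only when it is written in all capitals, or when it appears after the first technical tag (resolution, codec, source), on the guess that the episode name ends before those tags begin. The tag list, the code subset, and all names below are hypothetical.

```python
import re

# Illustrative subset of 3-letter codes (assumption).
LANGUAGE_CODES = {"san", "mac", "fre", "ger", "ita"}

# Hypothetical list of technical tags that usually follow the episode name.
TECH_TAG = re.compile(
    r"^(?:\d{3,4}p|x26[45]|h26[45]|hdtv|web|webrip|bluray|dvdrip)$",
    re.IGNORECASE,
)
EPISODE_MARKER = re.compile(r"\b(?:S|EP)\d{1,2}E\d{1,2}\b", re.IGNORECASE)


def detect_language_codes(title: str) -> list[str]:
    """Return codes that are either ALL CAPS or located after a technical tag."""
    match = EPISODE_MARKER.search(title)
    if match is None:
        return []
    tokens = re.findall(r"[A-Za-z0-9]+", title[match.end():])
    seen_tech_tag = False
    found = []
    for token in tokens:
        if TECH_TAG.match(token):
            seen_tech_tag = True
            continue
        if token.lower() not in LANGUAGE_CODES:
            continue
        # Title-case words inside the episode name ("San", "Mac") are skipped.
        if token.isupper() or seen_tech_tag:
            found.append(token.lower())
    return found


print(detect_language_codes(
    "Series Name EP05E01 Episode Name Includes San Francisco"
))  # []
print(detect_language_codes(
    "Series.Name.S05E01.Episode.Name.1080p.fre.x264"
))  # ['fre']
```

This still has gaps: an all-caps title such as "BIG MAC" would be misread, and a lowercase code placed before any technical tag would be missed, so it trades some recall for fewer false positives.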

@mgdigital mgdigital added the bug Something isn't working label Nov 10, 2023
@nilsherzig
The TMDB API provides endpoints for alternative and country-dependent titles for shows and movies. It might be possible to infer the language of the file from this information.
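
A rough sketch of how that could work, assuming the alternative titles have already been fetched and that each entry carries a country code and a title (the field names `iso_3166_1` and `title` below are assumptions about the response shape, and the sample data is made up). Note that this yields countries, not languages; mapping a country to a language would be a further heuristic that is left out here.

```python
import re


def _normalize(text: str) -> str:
    """Lowercase and collapse punctuation/separators into single spaces."""
    return re.sub(r"[\W_]+", " ", text.lower()).strip()


def countries_matching_title(release_name: str, alternative_titles: list[dict]) -> list[str]:
    """Return country codes whose alternative title appears in the release name."""
    haystack = f" {_normalize(release_name)} "
    matches = set()
    for entry in alternative_titles:
        needle = _normalize(entry.get("title", ""))
        # Pad with spaces so only whole-word sequences match.
        if needle and f" {needle} " in haystack:
            matches.add(entry["iso_3166_1"])
    return sorted(matches)


# Hypothetical sample data, not real API output.
sample_titles = [
    {"iso_3166_1": "DE", "title": "Beispielserie"},
    {"iso_3166_1": "US", "title": "Example Show"},
]
print(countries_matching_title("Beispielserie.S01E01.720p", sample_titles))  # ['DE']
```

If the release name uses the German title, that is weak evidence the file is a German release; it would work best as one signal alongside the code-based detection rather than as a replacement.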
