Different Translation on Yandex #79

Open
benjieperez opened this issue Mar 13, 2023 · 11 comments

Comments

@benjieperez

BTW, thanks for this library. It's really nice and suits my ongoing project.

My question: when I compare Yandex Translate on the live site with the library itself, they return different translations.

This is the text.

{
'original_text': 'And as surely as that, we will achieve the country all Filipinos deserve. God bless the Philippines, God bless our work. Maraming, maraming salamat po sa inyong lahat!',
'translated_text': 'Und so sicher werden wir das Land erreichen, das alle Filipinos verdienen. Gott segne die Philippinen, Gott segne unsere Arbeit. Marschieren, marschieren salamat po sa inyong lahat!'
}

image

But when I split the full text into single sentences, the translations are actually the same.
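
For illustration, a minimal sketch of the sentence-by-sentence workaround described above. The helper names `split_sentences` and `translate_by_sentence` are hypothetical and not part of translatepy, and the regex splitter is a naive assumption that will mis-split abbreviations. The `translate` argument is any callable that maps a string to its translation.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naively split text on whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def translate_by_sentence(text: str, translate: Callable[[str], str]) -> str:
    """Translate each sentence separately, then join the results with spaces.

    `translate` is supplied by the caller (hypothetical hook), so language
    detection runs once per sentence instead of once for the whole text.
    """
    return " ".join(translate(sentence) for sentence in split_sentences(text))
```

With translatepy, `translate` could plausibly be something like `lambda s: YandexTranslate().translate(s, "German").result`, assuming the `translate` method and `.result` attribute behave as in the library's README.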

@ZhymabekRoman
Contributor

That's strange. Have you tried the same text in the Yandex Translate mobile application? translatepy uses the Android backend API.

@benjieperez
Author

Yeah, I tried Yandex Translate through translatepy directly and it outputs 'Und so sicher werden wir das Land erreichen, das alle Filipinos verdienen. Gott segne die Philippinen, Gott segne unsere Arbeit. Marschieren, marschieren salamat po sa inyong lahat!'. That is different from Yandex Translate in the browser.

It only occurs if the string is multilingual.

@benjieperez
Author

I used a different user agent in yandex.py and it actually gave me the right translation.

image
image

@benjieperez
Author

Maybe a user agent randomizer would fix this?

@ZhymabekRoman
Contributor

ZhymabekRoman commented Mar 13, 2023

> Maybe a user agent randomizer would fix this?

Hmmm, I think yes, we can fix it.

@Animenosekai, can we rewrite the current user agent generation algorithm so that it builds a random combination of user agent data (OS, browser, version, etc.) instead of just picking a random user agent string from useragents?
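
A rough sketch of what such a combination-based generator might look like. Everything here is an assumption for illustration: the `PLATFORMS` and `CHROME_MAJOR_VERSIONS` lists are hypothetical sample data, and `random_user_agent` is not an existing translatepy function.

```python
import random
from typing import Optional

# Hypothetical sample data; a real implementation would keep these lists current.
PLATFORMS = [
    "Windows NT 10.0; Win64; x64",
    "Macintosh; Intel Mac OS X 10_15_7",
    "X11; Linux x86_64",
]
CHROME_MAJOR_VERSIONS = [108, 109, 110, 111]


def random_user_agent(rng: Optional[random.Random] = None) -> str:
    """Build a Chrome-style user agent from randomly chosen components."""
    source = rng if rng is not None else random
    platform = source.choice(PLATFORMS)
    major = source.choice(CHROME_MAJOR_VERSIONS)
    build = source.randint(0, 9999)
    return (
        f"Mozilla/5.0 ({platform}) AppleWebKit/537.36 (KHTML, like Gecko) "
        f"Chrome/{major}.0.{build}.0 Safari/537.36"
    )
```

Passing a seeded `random.Random` instance makes the output reproducible, which helps when checking whether a particular user agent changes the response.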

@benjieperez
Author

I have this simple user agent randomizer.

f'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.{random.randint(0, 9999)} Safari/537.{random.randint(0, 99)}'

Let me know if this would help. Thanks!

@Animenosekai
Owner

woooow why didn't I get this notification sooner...

> @Animenosekai, can we rewrite the current user agent generation algorithm so that it builds a random combination of user agent data (OS, browser, version, etc.) instead of just picking a random user agent string from useragents?

Yup, this seems totally doable!

I don't have much time right now, but I'll do it whenever I get the time 🍡

@ZhymabekRoman
Contributor

So I did some research and came up with some interesting results. The original text, "And as surely as that, we will achieve the country all Filipinos deserve. God bless the Philippines, God bless our work. Maraming, maraming salamat po sa inyong lahat!", contains Tagalog. translatepy recognises it as English, and the web version of Yandex Translate also detects it as English in the first lookup. In a second lookup, Yandex web runs language detection on part of the text: in our case it sends "salamat po sa inyong lahat!" for detection, and that is returned as Tagalog. The user agent has nothing to do with it.
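
The behaviour described above can be approximated with a two-pass check. This is only a sketch of the general idea, not Yandex's actual algorithm: `two_pass_detect` is a hypothetical helper, `detect` is any caller-supplied function that returns a language code for a string, and the second pass here re-checks every sentence rather than the specific fragment Yandex web was seen sending.

```python
import re
from typing import Callable, List, Tuple


def two_pass_detect(
    text: str, detect: Callable[[str], str]
) -> Tuple[str, List[Tuple[str, str]], bool]:
    """Detect the language of the whole text, then of each sentence.

    Returns the overall language, the (sentence, language) pairs, and a flag
    that is True when any sentence disagrees with the overall result.
    """
    overall = detect(text)  # first pass: whole text
    # Naive sentence split (assumption): whitespace after '.', '!' or '?'.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    per_sentence = [(s, detect(s)) for s in sentences]  # second pass
    mixed = any(lang != overall for _, lang in per_sentence)
    return overall, per_sentence, mixed
```

When the flag comes back True, the text is likely multilingual, and translating each sentence with its own detected source language may give results closer to the web version.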

@Animenosekai
Owner

> So I did some research and came up with some interesting results. The original text, "And as surely as that, we will achieve the country all Filipinos deserve. God bless the Philippines, God bless our work. Maraming, maraming salamat po sa inyong lahat!", contains Tagalog. translatepy recognises it as English, and the web version of Yandex Translate also detects it as English in the first lookup. In a second lookup, Yandex web runs language detection on part of the text: in our case it sends "salamat po sa inyong lahat!" for detection, and that is returned as Tagalog. The user agent has nothing to do with it.

So they have a two-pass system for language detection, where they check the language a second time??

@ZhymabekRoman
Contributor

> So they have a two-pass system for language detection, where they check the language a second time??

Yes. I just checked the Android application, and it also correctly detects the language of the text as Tagalog.

@ZhymabekRoman
Contributor

Today, Yandex itself incorrectly detects the language of the text, so translatepy has nothing to do with it:
image
