Added Faroese language #1914

Open: wants to merge 6 commits into base branch master

isolveit-aps

I added initial support for the Faroese language. I implemented the most common phonetic rules of Faroese and set things up so that the phonemes, rules, and exceptions can be improved more easily. The current status is early testing, but the voice is already mostly understandable and now pronounces numbers according to the correct rules.

@jaacoppi
Collaborator

jaacoppi commented Jun 9, 2024

Looks good. Things to do before approval:

  1. add a commit to update ChangeLog.md to include this new language
  2. update your branch to solve the "This branch is out-of-date with the base branch" warning.

@jaacoppi
Collaborator

jaacoppi commented Jun 9, 2024

Also: include the language in the Windows build files in src/windows.

@isolveit-aps
Author

Thank you, Juho, for the response.

Regarding the inclusion of the language in the Windows build files in src/windows, is there a guide or documentation available for this part? I'm assuming that I need to add my language to the src/windows/installer/Product.wxs file. Specifically, for the Guid value, should I generate a new GUID myself, or should I use a pre-existing value?

Additionally, I have added some new phonemes and the pronunciation of around 225,000 words in Faroese. Can I use this opportunity to include that update in the same pull request, or would you prefer that I create a separate pull request for these additions?

I haven't used git extensively, so I'm still getting familiar with the workflow and processes in the open-source environment. Any guidance you can provide would be greatly appreciated.

Thank you again for your help.

@isolveit-aps
Author

I believe I have completed the requested steps: I aligned my branch with the base branch and updated the changelog.
I also added the language to src/windows, though I'm not sure I did it correctly, as I couldn't find instructions for this.

@jaacoppi
Collaborator

Now that the PR is ready for acceptance I ran the pipeline.

The following tests FAILED:
11 - language-phonemes (Failed)
You can run the test with: ./tests/language-phonemes.test
Let us know if you need help fixing or understanding it.

The pipeline testlog says: 
 testing fo 

13c22fcd8aa140bd22e3299fdcc75b5b2c2308ca != 25a10409481c8874d4c0b9c46a70e185d0b5f40f
make: *** [Makefile:3150: tests/language-phonemes.check] Error 1
Error: Process completed with exit code 2.

All projects have their own processes and culture. The important part is to use the commit messages and PR description to explain what the commits do and why. You have done it well.

I don't think we have documentation for the Windows installer. Copying what other languages do should be OK.
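
On the earlier GUID question: WiX component GUIDs are generally expected to be unique per component, so a copied entry would normally get a freshly generated value rather than a reused one. A minimal sketch of generating one, assuming the copied entry carries its own Guid attribute as the question suggests; the helper name `new_component_guid` is hypothetical and not part of the repository:

```python
import uuid


def new_component_guid() -> str:
    """Return a fresh random (version 4) GUID in uppercase, hyphenated form."""
    return str(uuid.uuid4()).upper()


if __name__ == "__main__":
    # Prints a different value on every run; paste it into the copied entry.
    print(new_component_guid())
```

Any other GUID generator would do equally well; the only point is that the value should not duplicate one already used in the installer file.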

As for the large word list, do you think it's necessary? Check what other languages have done; most have many rules and fewer word exceptions. On the other hand, Russian, for example, needs a large dictionary because of the way word stress works in Russian. I don't know how Faroese works. It might be easier to have general rules for 99% of the words and then just fix the exceptions.
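
The "general rules plus exceptions" idea can be sketched as a lookup that consults an exception list first and falls back to letter-to-sound rules. This is a toy illustration only: the rule table, the exception entry, and the output symbols are invented placeholders, not real Faroese data and not the eSpeak NG rule format.

```python
# Invented placeholder data: NOT real Faroese and NOT the eSpeak NG format.
RULES = {"ch": "C", "a": "A", "b": "B", "c": "K", "h": "H"}
EXCEPTIONS = {"bach": "B-A-X"}  # whole-word override for an irregular word


def to_phonemes(word: str) -> str:
    """Return the exception entry if present, else apply longest-match rules."""
    word = word.lower()
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]

    keys = sorted(RULES, key=len, reverse=True)  # try longer patterns first
    out = []
    i = 0
    while i < len(word):
        for key in keys:
            if word.startswith(key, i):
                out.append(RULES[key])
                i += len(key)
                break
        else:
            # No rule matched: pass the character through unchanged.
            out.append(word[i])
            i += 1
    return "-".join(out)


if __name__ == "__main__":
    print(to_phonemes("cab"))   # rules only
    print(to_phonemes("bach"))  # exception overrides the rules
```

With this shape, a large dictionary and a small one differ only in how many entries sit in the exception table; the better the rules, the fewer entries are needed.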

@isolveit-aps
Author

Thank you, Juho. I forgot about language-phonemes.test now that the pronunciations have been improved. I will update the expected value to match the new output.

Regarding the phoneme dictionary: it was prepared by professional linguists over a 3½-year period from 2019 to 2022 (http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.495.pdf), so I assumed it would be of great value to add it to fo_list, especially as it didn't seem to affect the performance of espeak-ng. The Faroese language does have grammar rules, and many of those rules have exceptions. Since I'm not a linguist, I made the rules match as well as I could, but with the phoneme dictionary the speech improved considerably. On the other hand, I agree that it would be very good to have rules that work for 99% of the words and a list for the exceptions, and I think that should be the aim. Meanwhile, as I work on those improvements, I was hoping that this version could be accepted, as Faroese is somewhat neglected in the TTS world, and a good voice in espeak would open up many opportunities, instead of training TTS voices with Norwegian or Icelandic as the base for Faroese :)
Please let me know what you think about this.

Regarding language-phonemes.test: this should just require updating the checksum in the .test file, since the phonemes produced after adding the dictionary and the new phonemes no longer give the same checksum, right?
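
For illustration, a checksum-based regression check of this kind can be sketched as follows. The 40-hex-digit values in the pipeline log look like SHA-1 digests, but whether the real test hashes its output exactly this way is an assumption; the function names here are hypothetical and not taken from the test script.

```python
import hashlib


def phoneme_checksum(phoneme_output: str) -> str:
    """Hash the phoneme output; SHA-1 is assumed from the 40-hex-digit log values."""
    return hashlib.sha1(phoneme_output.encode("utf-8")).hexdigest()


def check(phoneme_output: str, expected: str) -> None:
    """Fail in the same 'actual != expected' style as the pipeline log."""
    actual = phoneme_checksum(phoneme_output)
    if actual != expected:
        raise AssertionError(f"{actual} != {expected}")


if __name__ == "__main__":
    old_output = "placeholder phoneme output"    # hypothetical stand-in text
    new_output = "placeholder phoneme output 2"  # output after a change
    expected = phoneme_checksum(old_output)
    check(old_output, expected)      # passes: output unchanged
    try:
        check(new_output, expected)  # fails until the expected value is updated
    except AssertionError as err:
        print("mismatch:", err)
```

Under this scheme any intentional change to the phoneme output alters the digest, so the stored expected value has to be regenerated after the new output has been reviewed.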
