Problem with custom word list #33

Open
ghost opened this issue May 30, 2022 · 3 comments
Labels: bug (Something isn't working), words (Word selection, word lists, language support, etc.)

Comments

ghost commented May 30, 2022

I have a test file with 60 words in it, and when I run toipe -f words -n 10, some words occur multiple times. Also, the words at the end of the text file don't show up at all (the words on the first line or two occur the most).

Samyak2 added the words (Word selection, word lists, language support, etc.) and bug (Something isn't working) labels on May 30, 2022.
Samyak2 changed the title from "Problem with custom prompt" to "Problem with custom word list" on May 30, 2022.

Samyak2 (owner) commented May 30, 2022

Hey @SamDc73. Could you check that your word list satisfies these assumptions?

also the words that are at the end of the text file don't even show up at all [the words at the first line or two occurs the most]

There's no bias toward words that occur at the beginning; each word is chosen uniformly at random. It's possible that the words at the end of the file are being skipped because they don't satisfy the assumptions.
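Repeated words alone are not necessarily a sign of bias. As a rough check, assuming each of the 10 words is drawn independently and uniformly from the 60-word list (that is, with replacement; the comment above does not say whether toipe excludes repeats), the probability that a test contains at least one repeated word is:

$$
P(\text{at least one repeat}) = 1 - \prod_{i=0}^{9} \frac{60 - i}{60} = 1 - \frac{60 \cdot 59 \cdots 51}{60^{10}} \approx 1 - 0.452 = 0.548
$$

Under that assumption, slightly more than half of all 10-word tests would show a duplicate even with perfectly uniform selection.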

ghost (author) commented Jun 1, 2022

Could you please (if you have the time, of course) test it?
Here's the word list that I'm using: https://pastebin.com/0vP3XMKC
The command I'm running is "toipe -f Notes/typing/words -n 10".
Just run it a couple of times; it's very obvious that a couple of words occur more than once, and the words at the end barely show up.

Samyak2 (owner) commented Jun 2, 2022

@SamDc73 thank you for posting the word list!

First, the word list was not sorted alphabetically, which the 3rd assumption in the list requires. I sorted it using:

LC_COLLATE=en_US.UTF-8 sort -d -o /path/to/wordlist /path/to/wordlist
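To check whether a list is already in the required order before rewriting it, sort's -C option can be used: it prints nothing and reports the result through its exit status. A minimal sketch, assuming a GNU or POSIX sort; check_sorted is a hypothetical helper name, not part of toipe:

```shell
#!/bin/sh
# Hypothetical helper (not part of toipe): report whether a word list is already
# sorted with the same collation and -d flag as the sort command above.
check_sorted() {
  # -C checks order silently: exit status 0 if sorted, 1 if out of order,
  # greater than 1 if sort itself failed (for example, a missing file).
  # Note: if LC_ALL is set in the environment, it overrides LC_COLLATE.
  if LC_COLLATE=en_US.UTF-8 sort -d -C "$1"; then
    echo "sorted"
    return 0
  else
    status=$?
    if [ "$status" -eq 1 ]; then
      echo "not sorted"
    fi
    return "$status"
  fi
}
```

Usage: check_sorted /path/to/wordlist prints "sorted" or "not sorted"; if sort cannot read the file, it prints its own error message instead.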

Then, some of the words in the file had trailing spaces. I removed them using:

sed --in-place 's/[[:space:]]\+$//' /path/to/wordlist

(You can do this in any text editor too, but this command automates it.)
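To see which lines the sed command would change before editing the file in place, grep can list them with their line numbers. A small sketch; show_trailing_space is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print every line of a word list that ends in whitespace,
# prefixed with its line number, i.e. the lines the sed command above would modify.
show_trailing_space() {
  # [[:space:]]$ matches a space or tab at the end of a line. It also matches a
  # carriage return, so files with Windows (CRLF) line endings are flagged too.
  # grep exits with status 1 when no line matches, meaning the file is clean.
  grep -n '[[:space:]]$' "$1"
}
```

Running show_trailing_space /path/to/wordlist with no output means there is nothing to strip.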

Here's the fixed word list for reference.
absolute
another
appreciated
appreciated
available
because
beginner
behavior
believe
competitive
criteria
currently
decisions
default
dependencies
description
differences
discovered
duplicates
dynamically
engineering
environmental
everything
extensions
favorite
financially
focused
guideline
inflation
league
navigate
obvious
occurring
parse
population
preference
prescription
prompt
question
robotics
separate
slight
theoretically
variable
visualization

After fixing the file, I found a bug in toipe. I was using complete word lists (with words for all 26 letters), so the code assumed that too, and this introduced a bias. I will fix this by using a better word selection algorithm that doesn't require as many assumptions and isn't so hyper-optimized :)
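For illustration only, one selection scheme that needs none of those assumptions is to load every non-blank line and pick indices uniformly at random, with replacement. The awk sketch below shows that idea under that assumption; it is not the algorithm toipe adopted, and pick_words is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper (not toipe's code): print N words (default 10) chosen
# uniformly at random, with replacement, from a word list.
# It needs no sorting and no particular set of starting letters.
pick_words() {
  awk -v n="${2:-10}" '
    BEGIN { srand() }                # seed from the time of day (same second => same picks)
    NF    { words[++count] = $1 }    # store each non-blank line; $1 drops stray whitespace
    END   {
      if (count == 0) exit 1         # empty list: nothing to choose from
      for (i = 0; i < n; i++)
        print words[int(rand() * count) + 1]   # rand() is in [0, 1), so index is 1..count
    }' "$1"
}
```

For example, pick_words /path/to/wordlist 10 prints ten words, one per line. Because every stored word has the same chance on each draw, position in the file has no effect on how often a word appears.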

Thank you again for bringing this to my notice and sending the word list.
