
Chinese quotes treated as a single word #5358

Open
2 tasks done
extoplasm opened this issue May 3, 2024 · 29 comments
Labels
bug Something isn't working waiting for update Pull requests or issues that require changes/comments before continuing

Comments

@extoplasm
Contributor

Did you clear cache before opening an issue?

  • I have cleared my cache

Is there an existing issue for this?

  • I have searched the existing issues

Does the issue happen when logged in?

Yes

Does the issue happen when logged out?

Yes

Does the issue happen in incognito mode when logged in?

Yes

Does the issue happen in incognito mode when logged out?

Yes

Account name

extoplasm

Account config

{"theme":"alduin","themeLight":"serika","themeDark":"serika_dark","autoSwitchTheme":false,"customTheme":false,"customThemeColors":["#323437","#e2b714","#e2b714","#646669","#000000","#d1d0c5","#ca4754","#7e2a33","#ca4754","#7e2a33"],"favThemes":[],"showKeyTips":true,"smoothCaret":"medium","quickRestart":"off","punctuation":false,"numbers":false,"words":10,"time":60,"mode":"quote","quoteLength":[0],"language":"chinese_simplified","fontSize":1.5,"freedomMode":true,"difficulty":"normal","blindMode":false,"quickEnd":false,"caretStyle":"default","paceCaretStyle":"default","flipTestColors":false,"layout":"default","funbox":"none","confidenceMode":"off","indicateTypos":"off","timerStyle":"mini","liveSpeedStyle":"off","liveAccStyle":"off","liveBurstStyle":"off","colorfulMode":false,"randomTheme":"off","timerColor":"main","timerOpacity":"1","stopOnError":"off","showAllLines":false,"keymapMode":"off","keymapStyle":"staggered","keymapLegendStyle":"lowercase","keymapLayout":"qwerty","keymapShowTopRow":"layout","fontFamily":"JetBrains_Mono","smoothLineScroll":false,"alwaysShowDecimalPlaces":false,"alwaysShowWordsHistory":false,"singleListCommandLine":"manual","capsLockWarning":true,"playSoundOnError":"off","playSoundOnClick":"9","soundVolume":"1.0","startGraphsAtZero":true,"showOutOfFocusWarning":true,"paceCaret":"pb","paceCaretCustomSpeed":1,"repeatedPace":true,"accountChart":["on","on","on","on"],"minWpm":"off","minWpmCustomSpeed":100,"highlightMode":"letter","typingSpeedUnit":"wpm","ads":"result","hideExtraLetters":false,"strictSpace":false,"minAcc":"off","minAccCustom":90,"monkey":false,"repeatQuotes":"off","oppositeShiftMode":"off","customBackground":"","customBackgroundSize":"cover","customBackgroundFilter":[0,1,1,1,1],"customLayoutfluid":"qwerty#dvorak#colemak","monkeyPowerLevel":"off","minBurst":"off","minBurstCustomSpeed":100,"burstHeatmap":true,"britishEnglish":false,"lazyMode":false,"showAverage":"off","tapeMode":"off","maxLineWidth":0}

Current Behavior

[screenshot]

When typing in Chinese, the entire quote is treated as one word, so pressing space at any point finishes the test. Also, every quote ends up in the short category.

Expected Behavior

Every character, excluding punctuation, could be counted as a word.

Steps To Reproduce

  1. Change the language to Chinese simplified
  2. Go to quote mode
  3. Press space
  4. The test finishes

Environment

  • OS: Windows 10
  • Browser: Google Chrome
  • Browser Version: Version 124.0.6367.119 (Official Build) (64-bit)

Anything else?

No response

@extoplasm extoplasm added the bug Something isn't working label May 3, 2024
@faq0
Contributor

faq0 commented May 3, 2024

If that can be fixed, I believe the spaces in the "words" section should be typed automatically too, as sentences in Simplified Chinese do not include spaces.
For example, in "只有 出现 革命 存在 发生 方法…", users should not need to hit the spacebar before entering the next word.

@extoplasm
Contributor Author

I reckon you keep the words section the same; it's good to separate the words there.

But in the quotes section, just count every character in a sentence as a word.

@Miodec
Member

Miodec commented May 6, 2024

The quotes use full-width commas rather than spaces, and as @faq0 said, Simplified Chinese does not use spaces, so I'm not sure what should be done here.

@Miodec Miodec added the waiting for update Pull requests or issues that require changes/comments before continuing label May 6, 2024
@faq0
Contributor

faq0 commented May 6, 2024

I believe there are some commonly used full-width punctuation marks in Simplified Chinese that could be treated as exceptions in quote mode.
For example, these include "，。！？“”：；《》—", with the Unicode code points \uff0c\u3002\uff01\uff1f\u201c\u201d\uff1a\uff1b\u300a\u300b\u2014.
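As a rough sketch of how those exceptions could be applied, the TypeScript below builds a set from the code points listed above and counts the remaining characters of a quote. `FULL_WIDTH_PUNCTUATION` and `countQuoteCharacters` are hypothetical names, not Monkeytype code, and the set covers only the marks listed in this comment.

```typescript
// The full-width punctuation listed above, written as code points.
const FULL_WIDTH_PUNCTUATION: ReadonlySet<string> = new Set([
  "\uff0c", "\u3002", "\uff01", "\uff1f", "\u201c", "\u201d",
  "\uff1a", "\uff1b", "\u300a", "\u300b", "\u2014",
]);

// Hypothetical helper: count the characters of a quote that would be treated
// as "words", skipping the punctuation exceptions and any whitespace.
function countQuoteCharacters(quote: string): number {
  let count = 0;
  // Array.from iterates by code point, not by UTF-16 code unit.
  for (const ch of Array.from(quote)) {
    if (!FULL_WIDTH_PUNCTUATION.has(ch) && ch.trim() !== "") count++;
  }
  return count;
}

console.log(countQuoteCharacters("我能吞下玻璃，而不伤身体。")); // 11
```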

But the zen and custom modes might need other rules, as the punctuation marks there are not limited to these characters.

However, I have noticed that many Chinese typing practice websites actually count each symbol as a character, which counts towards the WPM. That might be an easy way to handle this.
[screenshot]

@extoplasm
Contributor Author

The punctuation isn't an issue; there isn't much punctuation in the quotes anyway. I reckon you can count every character as a word and either strip out the full-width punctuation or convert it to its English equivalent when counting the words, although this would be rough to implement.

It's really up to you, but as a quasi-Mandarin speaker, this is just my suggestion.
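A minimal sketch of the "convert it to its English equivalent" option, assuming a hand-written mapping. The `TO_ASCII` table and `toAsciiPunctuation` are hypothetical; 《》 have no direct ASCII counterpart, so mapping them to double quotes is an arbitrary choice.

```typescript
// Hypothetical mapping from full-width punctuation to English equivalents.
// 《》 have no direct ASCII counterpart; mapping them to '"' is arbitrary.
const TO_ASCII: Readonly<Record<string, string>> = {
  "\uff0c": ",", "\u3002": ".", "\uff01": "!", "\uff1f": "?",
  "\u201c": '"', "\u201d": '"', "\uff1a": ":", "\uff1b": ";",
  "\u300a": '"', "\u300b": '"', "\u2014": "-",
};

// Replace each mapped character; everything else passes through unchanged.
function toAsciiPunctuation(text: string): string {
  return Array.from(text, (ch) => TO_ASCII[ch] ?? ch).join("");
}

console.log(toAsciiPunctuation("你好，世界！")); // 你好,世界!
```

Converting rather than stripping keeps the punctuation typeable on a standard layout, at the cost of the quote no longer matching the original text exactly.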

@Miodec
Member

Miodec commented May 8, 2024

So, what's the solution? If you want to add spaces, you would need to edit the quotes themselves.

@extoplasm
Contributor Author

What do you mean? I'm saying we count each character as a word, as Mandarin doesn't separate words with spaces. E.g. "猴子打字" ("monkey type", lol) would be counted as 4 separate words.
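A sketch of that splitting rule, assuming any Unicode punctuation (general category `P`) and whitespace should be dropped rather than counted; `splitIntoCharacterWords` is a hypothetical helper. `Array.from` iterates by code point, so rare characters outside the Basic Multilingual Plane still count as one word. The `\p{P}` escape needs the `u` flag and an ES2018 or later target.

```typescript
// Any Unicode punctuation (general category P) or whitespace character.
const PUNCTUATION_OR_SPACE = /[\p{P}\s]/u;

// One "word" per character, with punctuation and whitespace dropped.
function splitIntoCharacterWords(quote: string): string[] {
  return Array.from(quote).filter((ch) => !PUNCTUATION_OR_SPACE.test(ch));
}

console.log(splitIntoCharacterWords("猴子打字")); // [ '猴', '子', '打', '字' ]
```

Using the Unicode category instead of a fixed list also catches punctuation that a hand-picked set would miss, which matters for zen and custom text.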

@extoplasm
Contributor Author

Also, adding spaces wouldn't be accurate. I'm not sure how the word counting works, but a special case could be added to split the text differently (after removing the punctuation, of course).

@Miodec
Member

Miodec commented May 9, 2024

So this should be the case for all Chinese text, not just quotes, right?

Is this because you need multiple keypresses per character? Maybe we can count each keypress as a character, instead of each character as a word.

@faq0
Contributor

faq0 commented May 9, 2024

So this should be the case for all Chinese text, not just quotes, right?

Yes.

Maybe we can count each keypress as a character, instead of each character as a word.

This would be good in most cases, but I believe that should only be how speed is calculated, not accuracy.
In fact, there are multiple typing methods for Simplified Chinese, which can result in different numbers of keystrokes.

For example, take the quote "我能吞下玻璃而不伤身体":
In Full Pinyin, it would be "wonengtunxiabolierbushangshenti" (31 letters).
In Double Pinyin, it would be "wongtpxwboliorbuuhufti" (2 keys per character, 22 letters in total).
In Wubi, it would be 4 keys per character, 44 keys in total. But in this case, less time is needed to select the desired Chinese character in the candidate window.
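The counts above can be checked with a few lines of TypeScript. The syllable lists are copied from this comment; splitting the Double Pinyin string into one two-letter code per character is an assumption, and the final selection key is not counted, matching the totals above. The number of Chinese characters stays at 11 whichever method is used, while the keystroke count changes.

```typescript
// The example quote and the per-character input codes from the comment above.
const sentence = "我能吞下玻璃而不伤身体";
const fullPinyin = ["wo", "neng", "tun", "xia", "bo", "li", "er", "bu", "shang", "shen", "ti"];
// Assumed split of "wongtpxwboliorbuuhufti" into one two-letter code per character.
const doublePinyin = ["wo", "ng", "tp", "xw", "bo", "li", "or", "bu", "uh", "uf", "ti"];

// Total letters typed (the final selection key is not included).
const keystrokes = (codes: string[]): number =>
  codes.reduce((sum, code) => sum + code.length, 0);

const characterCount = Array.from(sentence).length;

console.log(characterCount);           // 11
console.log(keystrokes(fullPinyin));   // 31
console.log(keystrokes(doublePinyin)); // 22
console.log(characterCount * 4);       // 44 (Wubi at 4 keys per character)
```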

@extoplasm
Contributor Author

Yes, I agree with faq0 on the speed calculation part, but the main issue is that in quotes the entire sentence is counted as one word. I'm suggesting that we split the quote by character instead of by space, because when someone presses space the test ends and the progress is inaccurate.

@Miodec
Member

Miodec commented May 13, 2024

Yes, I agree with faq0 on the speed calculation part, but the main issue is that in quotes the entire sentence is counted as one word. I'm suggesting that we split the quote by character instead of by space, because when someone presses space the test ends and the progress is inaccurate.

If you split by character, then the website will require you to press space between every character. When you type quotes normally (not on Monkeytype), when do you press space?

@extoplasm
Contributor Author

In Chinese there is no such thing as a space, lol.
If it's like that, then there might not be an easy solution.
Perhaps make a special case? I'm like 50% sure it's the same for any Asian language, so this could also help when adding quotes for other languages.

@faq0
Contributor

faq0 commented May 13, 2024

If you split by character, then the website will require you to press space between every character. When you type quotes normally (not on Monkeytype), when do you press space?

We might not press space for every character.
Instead, there is a candidate window (the IME window) for choosing from a list of characters.

We might not press the space key at all.
If I want to type the character "我" in Full Pinyin, that would be:
What I type: w o <spacebar>.
In this case, the candidate window looks like this (using Microsoft Pinyin IME as an example):
[screenshot]
I have to select the desired character from the candidate list, where "1" = "我", "2" = "喔", etc. I can also press the spacebar as an alternative way to select the first option (the spacebar is more commonly used than "1" for selecting the first option).

We might not press a key for every character.
In a longer sentence, such as "我能吞下玻璃而不伤身体", I can type the whole sentence in one go. In Full Pinyin, this would be:
What I type: w o n e n g t u n x i a b o l i e r b u s h a n g s h e n t i <spacebar>.
[screenshot]
Luckily, in this case my desired sentence is the first candidate, so I can just press the spacebar.
However, if that isn't the case, I may have to select each character (or word) one by one, following the syllable boundaries that the IME marks with apostrophes. For example,
[screenshot]

This means that there are many ways to type a sentence, some of which don't involve the spacebar at all. I believe that Monkeytype should just detect the number of keystrokes when a character itself is typed.

@Miodec
Member

Miodec commented May 13, 2024

In Chinese there is no such thing as a space, lol. If it's like that, then there might not be an easy solution. Perhaps make a special case? I'm like 50% sure it's the same for any Asian language, so this could also help when adding quotes for other languages.

What if I just disable space then? Monkeytype won't try to "move to the next word", because there would be no "next word", and "moving to the next word" won't even be triggered by space. The only thing space would be doing is interacting with the input manager, like it already does.
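A minimal sketch of that idea, assuming a per-language flag decides what space does. The `NO_SPACE_LANGUAGES` set and `onSpace` are hypothetical: `chinese_simplified` matches the `language` value in the account config above, while `chinese_traditional` is a guess at a language name.

```typescript
type SpaceAction = "advanceWord" | "passToInputMethod";

// Hypothetical list of languages typed without spaces. "chinese_simplified"
// is the config value from this issue; the other name is a guess.
const NO_SPACE_LANGUAGES: ReadonlySet<string> = new Set([
  "chinese_simplified",
  "chinese_traditional",
]);

// For these languages, space never moves to the next word; it is simply left
// for the IME (e.g. to pick the first candidate).
function onSpace(language: string): SpaceAction {
  return NO_SPACE_LANGUAGES.has(language) ? "passToInputMethod" : "advanceWord";
}

console.log(onSpace("chinese_simplified")); // passToInputMethod
console.log(onSpace("english"));            // advanceWord
```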

@extoplasm
Contributor Author

extoplasm commented May 13, 2024

I believe that Monkeytype should just detect the number of keystrokes when a character itself is typed.

what does this mean?

@extoplasm
Contributor Author

What if I just disable space then? Monkeytype won't try to "move to the next word", because there would be no "next word", and "moving to the next word" won't even be triggered by space. The only thing space would be doing is interacting with the input manager, like it already does.

This should be good enough, haha.

@faq0
Contributor

faq0 commented May 13, 2024

what does this mean?

Keystrokes per second would be calculated from the number of keystrokes and shown on the final speed chart, while accuracy and WPM would be calculated from the Chinese characters actually typed.
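A sketch of that split between metrics, under stated assumptions: `keystrokes` counts every physical key press including IME keys, `typed` is the text the IME committed, one character counts as one "word", and accuracy is a simple position-by-position comparison (not how Monkeytype currently computes it). All names are hypothetical, and the timing in the example is made up.

```typescript
// Hypothetical result metrics: keystrokes drive keystrokes-per-second,
// committed Chinese characters drive speed and accuracy.
interface TypedResult {
  keystrokes: number; // every physical key press, including IME keys
  target: string;     // the quote
  typed: string;      // characters committed by the IME
  seconds: number;    // test duration
}

interface Metrics {
  keystrokesPerSecond: number;
  charactersPerMinute: number; // one character counted as one "word"
  accuracy: number;            // percent of typed characters matching the quote
}

function computeMetrics(r: TypedResult): Metrics {
  const target = Array.from(r.target);
  const typed = Array.from(r.typed);
  let correct = 0;
  typed.forEach((ch, i) => {
    if (ch === target[i]) correct++;
  });
  return {
    keystrokesPerSecond: r.keystrokes / r.seconds,
    charactersPerMinute: (correct * 60) / r.seconds,
    accuracy: typed.length === 0 ? 0 : (correct / typed.length) * 100,
  };
}

// Illustrative input: 31 pinyin letters plus one space to select, in 8 seconds.
console.log(computeMetrics({
  keystrokes: 32,
  target: "我能吞下玻璃而不伤身体",
  typed: "我能吞下玻璃而不伤身体",
  seconds: 8,
}));
// { keystrokesPerSecond: 4, charactersPerMinute: 82.5, accuracy: 100 }
```

Because the speed figures come from committed characters, Full Pinyin, Double Pinyin and Wubi users typing the same quote in the same time get the same character speed, while their keystroke rates differ.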

@faq0
Contributor

faq0 commented May 13, 2024

What if I just disable space then? Monkeytype won't try to "move to the next word", because there would be no "next word", and "moving to the next word" won't even be triggered by space. The only thing space would be doing is interacting with the input manager, like it already does.

This should be a good idea, as long as it can deal with the speed and accuracy correctly.

@extoplasm
Contributor Author

Another problem might be that a single misspelt character leaves the test unable to finish: with space disabled, there's no way to force-finish the test, and Monkeytype does not let you finish on a misspelt word.

@extoplasm
Contributor Author

I'm pretty sure you have to both split the quote by character and disable spaces.

@extoplasm
Contributor Author

I've done some thinking, and this problem is present in nearly all text-input-based websites. Here is how WorldServer handles it:

For Chinese and Japanese, WorldServer has a special way to count words. Each character is considered a word. For these languages we are, effectively, counting characters. When a user sees "Words" in the WorldServer UI (for example, in scoping) for Chinese and Japanese source languages it actually means "Characters".

https://docs.rws.com/791662/251856/sdl-worldserver-11-0-1/word-counting-algorithm
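A rough sketch of the quoted rule, assuming Han, Hiragana and Katakana characters each count as one word, other non-space runs count as one word each, and punctuation is ignored. This is an approximation for illustration, not WorldServer's actual algorithm; `countWords` is a hypothetical name.

```typescript
// Characters counted as one word each under the quoted rule (Chinese/Japanese).
const CJK_CHAR = /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}]/u;
const WHITESPACE = /\s/u;
const PUNCTUATION = /\p{P}/u;

function countWords(text: string): number {
  let words = 0;
  let inRun = false; // inside a run of non-CJK, non-space text
  for (const ch of Array.from(text)) {
    if (CJK_CHAR.test(ch)) {
      words++; // each CJK character is its own word
      inRun = false;
    } else if (WHITESPACE.test(ch)) {
      inRun = false;
    } else if (PUNCTUATION.test(ch)) {
      continue; // punctuation neither starts nor ends a word
    } else if (!inRun) {
      words++; // start of a space-delimited word
      inRun = true;
    }
  }
  return words;
}

console.log(countWords("我能吞下玻璃，而不伤身体。")); // 11
console.log(countWords("monkey type 猴子打字"));      // 6
```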

The best way, IMO, is to count every character as a word, remove the "spaces" when presenting the text to the user, and auto-advance to the next word whenever they type a character.

Is there a way to auto-advance to the next word?

Where in the codebase is the code that handles moving to the next word?
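A minimal model of "every character is a word" with auto-advance, assuming the IME hands over committed text that may contain several characters at once. `CharacterWordTest` is hypothetical, not Monkeytype's word-handling code. One design choice here: a wrong character is recorded as an error and the test still moves on, which would avoid the test getting stuck on a misspelt character.

```typescript
// Punctuation and whitespace are not treated as words.
const NON_WORD = /[\p{P}\s]/u;

class CharacterWordTest {
  private readonly words: string[];
  private index = 0;
  readonly results: boolean[] = []; // true = correct character

  constructor(quote: string) {
    // Every remaining character is its own "word".
    this.words = Array.from(quote).filter((ch) => !NON_WORD.test(ch));
  }

  // Called with text committed by the IME, possibly several characters at once.
  commit(text: string): void {
    for (const ch of Array.from(text)) {
      if (NON_WORD.test(ch) || this.done) continue;
      this.results.push(ch === this.words[this.index]);
      this.index++; // auto-advance: no space needed between characters
    }
  }

  get done(): boolean {
    return this.index >= this.words.length;
  }
}

const session = new CharacterWordTest("猴子打字。");
session.commit("猴子");
session.commit("打学"); // second character typed wrong
console.log(session.done, session.results); // true [ true, true, true, false ]
```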

@Miodec
Member

Miodec commented May 29, 2024

@extoplasm Which languages should use this per character way of calculating speed?

@Miodec
Member

Miodec commented May 29, 2024

Also, are the calculated speeds accurate if you just change the typing speed unit to cpm in the settings?

@extoplasm
Contributor Author

@extoplasm Which languages should use this per character way of calculating speed?

Japanese and Chinese, off the top of my head.

Also, are the calculated speeds accurate if you just change the typing speed unit to cpm in the settings?

Not sure, I can't test right now since I'm not at home.

@faq0
Contributor

faq0 commented May 30, 2024

Also, are the calculated speeds accurate if you just change the typing speed unit to cpm in the settings?

In the "words" section they work (I don't know whether the numbers are accurate, but the calculation does work), but even after multiple attempts, the results cannot be uploaded due to a "Result data doesn't make sense" error.

[screenshots]

The speed calculation doesn't even work in quotes.

When one character is mistyped, the test does not automatically proceed to the result page (for a quote that shows as 1 word in total). I have to press the spacebar manually, and it then shows a CPM of 0 but an accuracy of 95%.
[screenshot]

Even when no characters are mistyped, it still shows the "Result data doesn't make sense" error.
[screenshots]

Plus, I've noticed some wrongly written characters in the quotes section. How do I report these?

@extoplasm
Contributor Author

Plus, I've noticed some wrongly written characters in the quotes section. How do I report these?

Is that my bad... oops.
You don't need to report these, just make a PR.

@faq0
Contributor

faq0 commented May 31, 2024

Is that my bad... oops.
You don't need to report these, just make a PR.

PR opened: #5465. I added some quotes as well.

@Miodec
Member

Miodec commented Jun 3, 2024

Looking at the data, it seems you're reporting fewer keypresses than characters typed. It looks like the input system is swallowing some of the keypress events (which seems to be the same issue someone else just opened about Korean typing).
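One plausible way keypresses could end up undercounted is a keydown handler that skips events produced during IME composition (browsers flag these with `isComposing`, and Chrome typically reports their `key` as `"Process"`). The sketch below replays recorded events through such a filter. The event shapes are simplified stand-ins for the DOM `KeyboardEvent` and `CompositionEvent`, `tally` is a hypothetical name, and the example sequence is illustrative, since the exact events vary by browser and IME.

```typescript
// Simplified stand-ins for DOM events so the logic can run outside a browser.
type RecordedEvent =
  | { type: "keydown"; key: string; isComposing: boolean }
  | { type: "compositionend"; data: string };

interface Tally {
  allKeypresses: number;       // every keydown
  countedKeypresses: number;   // keydowns a composition-skipping handler keeps
  committedCharacters: number; // characters delivered by compositionend
}

function tally(events: RecordedEvent[]): Tally {
  const t: Tally = { allKeypresses: 0, countedKeypresses: 0, committedCharacters: 0 };
  for (const e of events) {
    if (e.type === "keydown") {
      t.allKeypresses++;
      // A handler that ignores IME-processed keys would drop these.
      if (!e.isComposing && e.key !== "Process") t.countedKeypresses++;
    } else {
      t.committedCharacters += Array.from(e.data).length;
    }
  }
  return t;
}

// Illustrative sequence for typing 我 as "w o <space>" with a pinyin IME.
const typingWo: RecordedEvent[] = [
  { type: "keydown", key: "Process", isComposing: false },
  { type: "keydown", key: "Process", isComposing: true },
  { type: "keydown", key: "Process", isComposing: true },
  { type: "compositionend", data: "我" },
];

console.log(tally(typingWo));
// { allKeypresses: 3, countedKeypresses: 0, committedCharacters: 1 }
```

In this replay the filtered count (0) ends up below the committed characters (1), the same kind of mismatch described above.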


3 participants