part-of-speech virtually always "Unknown" when creating Anki vocabulary notes #988

Open
mlidbom opened this issue May 22, 2024 · 8 comments
Labels
kind/bug The issue or PR is regarding a bug

Comments

@mlidbom

mlidbom commented May 22, 2024

Description
With the Jitendex dictionary installed, look up the word 緩り and add it to Anki. Although the word has the part-of-speech "noun", the field in the Anki note shows "Unknown". The same goes for virtually all words I've added. I think it may have been populated in some rare cases, but I'm not sure.

Browser version
Latest Edge

Yomitan version
24.5.14.1

Exported settings file
yomitan-settings-2024-05-22-21-31-26.json

@mlidbom mlidbom added the kind/bug The issue or PR is regarding a bug label May 22, 2024
@stephenmk

I wrote about this two years ago.

> [T]he part-of-speech field only contains a limited and modified subset of [the part-of-speech tags]. [...] these values are used behind-the-scenes for de-conjugating words into their dictionary forms so that they may be queried by yomichan. Part-of-speech tags that are not used for de-conjugation are not added to this part-of-speech list. [...]
>
> [...] I don't think this {part-of-speech} handlebar should even exist. All of this information already exists in a complete form within the {glossary} field. The part-of-speech of a given word can also vary depending on the sense in which it is used. For example, 亜 can be a prefix or a noun.

The part-of-speech handlebar shouldn't even exist. That information is only used by yomitan for deinflecting words that may be inflected (adjectives and verbs). There's no need to display it to end users.

@mlidbom
Author

mlidbom commented May 22, 2024

I make heavy use of the part-of-speech tagging when studying. It's important to me. Parsable metadata in a field is very different from the information in principle being present in another field with reams of text.

To me the difference is vital, since the metadata is parsed by the anki addon that I'm developing and is used in many places to show abbreviated versions of the vocabulary information.

So, as is, I simply have to manually type it in for every word I add to Anki. This is a real pain.

@stephenmk

The part-of-speech field within yomitan dictionaries exists to provide deinflection information to the yomitan parser. It was never intended to provide a parsable metadata field containing all of the part-of-speech information for a particular entry. (It wouldn't make a lot of sense to create such a field because different senses within a particular entry may contain different part-of-speech tags).

The handlebar produces "Unknown" so often because the dictionary files simply don't provide the information. So this aspect is not a bug in yomitan. What you are requesting is really a new feature to assist the anki addon you're developing.
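
To illustrate the behavior described here, a rough Python sketch follows. It is not Yomitan's template code: the function name, the rule identifiers `v1` and `v5`, and the label table are assumptions made for illustration. Only the labels "Ichidan verb" and "Godan verb" and the "Unknown" fallback are taken from this thread.

```python
# Hypothetical mapping from deinflection rule identifiers to display labels.
# The identifiers and this table are assumptions for illustration only.
RULE_LABELS = {
    "v1": "Ichidan verb",
    "v5": "Godan verb",
}


def part_of_speech_field(deinflection_rules):
    """Render a part-of-speech string from an entry's deinflection rules."""
    labels = []
    for rule in deinflection_rules:
        # Fall back to the raw identifier when no label is known.
        label = RULE_LABELS.get(rule, rule)
        if label not in labels:
            labels.append(label)
    # Entries that cannot be inflected carry no rules, so the field has
    # nothing to show and falls back to "Unknown".
    return ", ".join(labels) if labels else "Unknown"


print(part_of_speech_field(["v1"]))  # Ichidan verb
print(part_of_speech_field([]))      # Unknown
```

Under these assumptions, a noun entry that ships with an empty rule list always renders as "Unknown", regardless of what its glossary says.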

@mlidbom
Author

mlidbom commented May 22, 2024

I think the field should be populated with the union of all the part-of-speech tags from the senses. It seems to be the only thing that makes sense for a field that represents the entry as a whole, and it is exactly what I need and what I expect most people need: to know which word types this can be used as without reading through every sense, which takes far more time. It makes perfect sense to me to have such a field.
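
A minimal Python sketch of the union described above, assuming each sense is available as a dict with a `pos` list. That shape, the function name, and the sample data are hypothetical and do not reflect the actual Yomitan or Jitendex data format.

```python
def union_of_pos_tags(senses):
    """Collect every part-of-speech tag across senses, in first-seen order."""
    union = []
    for sense in senses:
        for tag in sense.get("pos", []):
            if tag not in union:
                union.append(tag)
    return union


# Hypothetical entry whose two senses carry different part-of-speech tags.
entry_senses = [
    {"pos": ["prefix"], "gloss": "first sense"},
    {"pos": ["noun"], "gloss": "second sense"},
]

print(", ".join(union_of_pos_tags(entry_senses)))  # prefix, noun
```

A list is used instead of a set so the tags keep the order in which the senses introduce them.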

If, for some reason, this is unacceptable, then I agree that it would be better to remove support entirely than to have it populate "Unknown" all the time. But really, if anyone doesn't want this information in a separate field, they don't have to use the field. Some, me included, find it very helpful and would very much like it to be properly populated. I really don't see a downside to fixing it so that it is populated.

@Kuuuube
Member

Kuuuube commented May 22, 2024

Just tested a few words and this handlebar appears to work fine for what it does. I'm getting outputs like Ichidan verb or Godan verb. But you won't ever get noun as an output. That's just how it works.

I do think this isn't very good UX, and I agree with stephen that it shouldn't exist.

As for what you're requesting here... The data doesn't exist to give you that output. Unfortunately Yomitan is not magic.

@Kuuuube
Member

Kuuuube commented May 22, 2024

If you just want a list of all the dictionary "tags" that are within each gloss that might be possible. I'm not super familiar with the dictionary format to say for sure if that's reasonable. Unsure if these are custom defined by jitendex or if it's a standard thing we can pull out.

@mlidbom
Author

mlidbom commented May 22, 2024

> If you just want a list of all the dictionary "tags" that are within each gloss that might be possible. I'm not super familiar with the dictionary format to say for sure if that's reasonable. Unsure if these are custom defined by jitendex or if it's a standard thing we can pull out.

That sounds good to me. I'm guessing that the worst that could happen is that some tags I don't care about come along for the ride. I would love it if this was implemented.

But if these "tags" can contain entries that are not POS information, then perhaps renaming the field would be a good idea....
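
One way to keep non-POS tags out of the field would be to filter on a tag category. The sketch below assumes each tag is a dict with a `name` and a `category`, and that part-of-speech tags are marked with the category `partOfSpeech`. Both the shape and the category names are assumptions for illustration, not a confirmed description of the dictionary format.

```python
def pos_tags_only(tags, pos_category="partOfSpeech"):
    """Return the names of tags whose category marks them as part-of-speech."""
    return [tag["name"] for tag in tags if tag.get("category") == pos_category]


# Hypothetical tags gathered from an entry's glosses.
gloss_tags = [
    {"name": "n", "category": "partOfSpeech"},
    {"name": "uk", "category": "misc"},
    {"name": "adj-i", "category": "partOfSpeech"},
]

print(pos_tags_only(gloss_tags))  # ['n', 'adj-i']
```

If the tags carry no usable category, the same filter could instead check names against a hand-maintained allow-list in the add-on.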

@stephenmk

For the record, I'm not convinced that feature (the union set of all PoS tags) would be useful and I'm not interested in adding it to jitendex.
