support accessing more fields in mecab node #84

Open: sophiefy wants to merge 6 commits into base: master


@sophiefy sophiefy commented Sep 22, 2023

This PR may resolve #76.

Here's an example:

from fugashi import Tagger

tagger = Tagger()
text = "麩菓子は、麩を主材料とした日本の菓子。"
node_list = tagger.parseToNodeList(text)

# wcost is the word cost of the node; cost is the cumulative path cost up to it.
print("surface\twcost\tcost")
for node in node_list:
    print("{}\t{}\t{}".format(node.surface, node.wcost, node.cost))

output:

surface	wcost	cost
麩	5477	12007
菓子	3841	19459
は	-904	19947
、	-7508	16317
麩	5727	26362
を	-1331	24371
主材	5114	33181
料	5720	38627
と	-286	41321
し	2517	45155
た	3045	46329
日本	-2903	47649
の	-294	48738
菓子	3841	55480
。	-3217	55216
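
To help read the two columns, here is a minimal sketch that derives the connection cost implied between consecutive nodes. It assumes MeCab's usual definition, where a node's cost is the previous node's cost plus the connection cost plus the node's own wcost, and that the BOS node starts at cost 0. The helper name implied_connection_costs is hypothetical and not part of fugashi; the rows are the first four lines of the output above.

```python
# First four rows copied from the output above: (surface, wcost, cost).
rows = [
    ("麩", 5477, 12007),
    ("菓子", 3841, 19459),
    ("は", -904, 19947),
    ("、", -7508, 16317),
]


def implied_connection_costs(rows, start_cost=0):
    """Return (surface, connection_cost) pairs.

    Assumption: cost == previous cost + connection cost + wcost,
    and the BOS node has cost `start_cost` (0 here).
    """
    result = []
    prev_cost = start_cost
    for surface, wcost, cost in rows:
        # Whatever is left after removing the previous total and the
        # word cost is attributed to the connection between the nodes.
        result.append((surface, cost - prev_cost - wcost))
        prev_cost = cost
    return result


for surface, connection in implied_connection_costs(rows):
    print("{}\t{}".format(surface, connection))
```

This also shows why cost can decrease from one node to the next: a negative wcost (as for は or 、) can outweigh the connection cost.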

@sophiefy
Author

I just modified parseToNodeList() and nbestToNodeList() to keep BOS/EOS in the node list, because I think they could be useful for visualizing and analyzing the results.


@polm
Owner

polm commented Sep 23, 2023

Thank you for the PR. The code to add the fields looks fine, though there should be tests for it, even trivial ones. I can add them if you're not sure how to.

For the BOS/EOS nodes, returning those by default would break existing code that expects them to be removed and is not OK. We can put the functionality behind a parameter that is off by default.

@sophiefy
Author

sophiefy commented Sep 23, 2023

> though there should be tests for it, even trivial ones. I can add them if you're not sure how to.

I'm not sure how to add tests, so it would be great if you could demonstrate that :)

> For the BOS/EOS nodes, returning those by default would break existing code that expects them to be removed and is not OK. We can put the functionality behind a parameter that is off by default.

I just added a strip parameter to parseToNodeList() and nbestToNodeList(). It defaults to True, which strips the BOS/EOS nodes, so existing behavior is unchanged.
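
For readers following along, here is a rough sketch of how such a strip flag could behave. It is not the code in this PR: FakeNode and filter_nodes are hypothetical stand-ins so the example runs without fugashi or a dictionary installed. The stat values 2 and 3 are MeCab's MECAB_BOS_NODE and MECAB_EOS_NODE constants.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Node status values as defined in mecab.h.
MECAB_BOS_NODE = 2
MECAB_EOS_NODE = 3


@dataclass
class FakeNode:
    """Hypothetical stand-in for a fugashi node (surface and stat only)."""
    surface: str
    stat: int = 0  # 0 is MECAB_NOR_NODE, a normal node


def filter_nodes(nodes: Iterable[FakeNode], strip: bool = True) -> List[FakeNode]:
    """Return the nodes as a list, dropping BOS/EOS when strip is True.

    strip=True mirrors the existing default behavior, so callers that
    never pass the flag see no change.
    """
    if not strip:
        return list(nodes)
    return [n for n in nodes if n.stat not in (MECAB_BOS_NODE, MECAB_EOS_NODE)]


lattice = [
    FakeNode("", MECAB_BOS_NODE),
    FakeNode("麩"),
    FakeNode("菓子"),
    FakeNode("", MECAB_EOS_NODE),
]

print([n.surface for n in filter_nodes(lattice)])
print(len(filter_nodes(lattice, strip=False)))
```

Keeping the default at True means only callers who explicitly pass strip=False receive the BOS/EOS nodes.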

@polm
Owner

polm commented Apr 15, 2024

Apologies for taking so long to get to this, but I've added some tests. I am still not sure about the BOS/EOS thing, especially giving them surfaces, so I'll think about it a little more.

Development

Successfully merging this pull request may close these issues.

Add access to more Node fields
2 participants