
Filter single-character tokens out of segmentation results #16

Open
lyle-w opened this issue Mar 18, 2021 · 2 comments

Comments


lyle-w commented Mar 18, 2021

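How can I filter single-character tokens out of the segmentation results? If the source text is only one character long, it should be returned as-is. If it has several characters, e.g. "我是中国人", the results should keep only "我是中国人", "我是", "中国人" and "中国", and drop "人".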

magese (Owner) commented Mar 25, 2021

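> How can I filter single-character tokens out of the segmentation results? If the source text is only one character long, it should be returned as-is. If it has several characters, e.g. "我是中国人", the results should keep only "我是中国人", "我是", "中国人" and "中国", and drop "人".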

You can do this with Solr's built-in Length Filter.

Example:

<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <!-- keep only tokens whose length is between 2 and 7 characters (inclusive) -->
  <filter class="solr.LengthFilterFactory" min="2" max="7"/>
</analyzer>
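| Parameter | Value | Description |
|---|---|---|
| min | int, required | Minimum token length; shorter tokens are discarded |
| max | int, required; must be ≥ min | Maximum token length; longer tokens are discarded |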

Just add this filter to the filter list of your IK analyzer configuration.
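The filtering rule can be sketched in plain Java. It assumes LengthFilter keeps a token only when min ≤ length ≤ max, with length counted in Java `char`s, so each Chinese character in 我是中国人 counts as 1. This is not Solr or IK code: the class and method names are hypothetical, and the token list comes from the question rather than from real IK output. Note that LengthFilter alone also drops a one-character source such as "人", so it does not cover the first part of the question (return a single-character source as-is). The hypothetical `filterKeepingSingleCharSource` only illustrates that extra rule.

```java
import java.util.ArrayList;
import java.util.List;

public class LengthFilterSketch {

    /** Mirrors the LengthFilter rule: keep a token only if min <= length <= max (both inclusive). */
    public static List<String> lengthFilter(List<String> tokens, int min, int max) {
        if (max < min) {
            throw new IllegalArgumentException("max must be >= min");
        }
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            int len = token.length(); // length in UTF-16 chars
            if (len >= min && len <= max) {
                kept.add(token);
            }
        }
        return kept;
    }

    /**
     * Hypothetical extra rule (not a Solr/IK feature): if the source text is a single
     * character, return it unchanged; otherwise apply the length filter.
     */
    public static List<String> filterKeepingSingleCharSource(
            String source, List<String> tokens, int min, int max) {
        if (source.length() == 1) {
            return List.of(source);
        }
        return lengthFilter(tokens, min, max);
    }

    public static void main(String[] args) {
        // Token list taken from the question, not actual IK output.
        List<String> tokens = List.of("我是中国人", "我是", "中国人", "中国", "人");
        System.out.println(lengthFilter(tokens, 2, 7)); // prints [我是中国人, 我是, 中国人, 中国]
        System.out.println(lengthFilter(List.of("人"), 2, 7)); // prints []
        System.out.println(filterKeepingSingleCharSource("人", List.of("人"), 2, 7)); // prints [人]
    }
}
```

With `min="2"` the multi-character example behaves as requested. The single-character case shows why a plain LengthFilter alone is not enough when the whole input is one character.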

lyle-w (Author) commented Mar 30, 2021

Thanks a lot 🙏
