yihong-chen/chinese-word-segmentation

chinese-word-segmentation

Simple Chinese word segmentation with experiments on the PKU dataset.

Methods

  • Pattern-based word segmentation
  • CRF++ tagging
  • LSTM tagging
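
Both tagging methods above treat segmentation as per-character sequence labeling. The sketch below shows one common way to set that up, assuming a B/M/E/S tag scheme (begin, middle, end, single-character word). The README does not say which scheme the repository uses, so the scheme and the helper names `words_to_tags` and `tags_to_words` are illustrative assumptions, not the author's code.

```python
from typing import List


def words_to_tags(words: List[str]) -> List[str]:
    """Map a segmented sentence to one B/M/E/S tag per character (assumed scheme)."""
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # skip empty tokens rather than emit bogus tags
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from raw characters and their predicted tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word closes on E or S
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


words = ["北京", "大学", "的", "学生"]
tags = words_to_tags(words)
print(tags)
print(tags_to_words("".join(words), tags))
```

A tagger (CRF++ or an LSTM) would be trained to predict the tag sequence from the characters, and `tags_to_words` turns its predictions back into a segmentation.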

Performance

F1

  • Pattern-based segmentation: 0.87
  • CRF++ Tagging: 0.93
  • LSTM Tagging: 0.86
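
For reference, word-level F1 for segmentation is usually computed by comparing word spans between the gold and predicted segmentations. The sketch below illustrates that convention. The README does not describe the scoring script behind the numbers above, so this is an assumed, minimal version, and the names `word_spans` and `segmentation_f1` are hypothetical.

```python
from typing import List, Set, Tuple


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Return the (start, end) character offsets of each word in a sentence."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def segmentation_f1(
    gold: List[List[str]], pred: List[List[str]]
) -> Tuple[float, float, float]:
    """Corpus-level precision, recall and F1 over matching word spans."""
    gold_total = pred_total = correct = 0
    for gold_words, pred_words in zip(gold, pred):
        gold_set = word_spans(gold_words)
        pred_set = word_spans(pred_words)
        gold_total += len(gold_set)
        pred_total += len(pred_set)
        correct += len(gold_set & pred_set)  # a word counts only if both boundaries match
    precision = correct / pred_total if pred_total else 0.0
    recall = correct / gold_total if gold_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


gold = [["北京", "大学", "的", "学生"]]
pred = [["北京大学", "的", "学生"]]
print(segmentation_f1(gold, pred))
```

In this toy example the prediction merges two gold words, so only two of its three words match gold spans: precision is 2/3, recall is 2/4, and F1 is 4/7.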

The simple LSTM tagger (F1 0.86) does not outperform CRF++ (0.93), or even pattern-based segmentation (0.87).

Tips for improving the performance of the LSTM tagger on the segmentation task
