This repository has been archived by the owner on Sep 4, 2019. It is now read-only.

tokenmill/ltlangpack

 
 


Lithuanian language processing tools to be used in NLP, search or other applications.

Sentence detection

Folder: sentence-detect

OpenNLP model for Lithuanian sentence detection.

Scripts to help with building the model:

  • add - append new text to the model (see the comment inside the script)
  • train - build the model from the example corpora
  • evaluate - evaluate the quality of sentence detection
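
To make "detection quality" concrete, here is a minimal sketch of how sentence boundary detection is commonly scored with precision, recall and F1. It is not the repository's evaluate script. The function name `boundary_scores` and the character offsets are hypothetical, chosen only for illustration.

```python
def boundary_scores(gold, predicted):
    """Return (precision, recall, f1) for two collections of sentence-end offsets."""
    gold_set, pred_set = set(gold), set(predicted)
    # A predicted boundary counts as correct only if it matches a gold offset exactly.
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical offsets: the detector finds 3 of the 4 gold boundaries
# and reports one boundary (58) that is not in the gold data.
gold = [12, 40, 77, 103]
predicted = [12, 40, 58, 103]
precision, recall, f1 = boundary_scores(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Exact offset matching is the strictest choice; a real evaluation may instead compare sentence spans.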

Snowball

The Snowball version of the Porter stemmer for the Lithuanian language was moved to this page.

Language detection

Folder: language-detect

N-grams for Lithuanian language detection, used in Apache Tika (see https://issues.apache.org/jira/browse/TIKA-582).
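
As a rough illustration of how n-gram language detection can work, the sketch below ranks character trigrams by frequency and compares a text against per-language profiles with an "out-of-place" rank distance. This is a generic technique under stated assumptions, not Apache Tika's code or profile format. The function names, the tiny sample texts and the `top_k` value are all hypothetical.

```python
from collections import Counter

TOP_K = 300  # assumed profile size; also used as the penalty for unseen trigrams


def trigram_profile(text, top_k=TOP_K):
    """Return the text's character trigrams, most frequent first."""
    # Lowercase, collapse whitespace, and mark word boundaries with "_".
    padded = "_" + "_".join(text.lower().split()) + "_"
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    return [gram for gram, _ in counts.most_common(top_k)]


def out_of_place(doc_profile, lang_profile, penalty=TOP_K):
    """Sum of rank differences; trigrams missing from the language cost `penalty`."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    return sum(abs(i - rank[g]) if g in rank else penalty
               for i, g in enumerate(doc_profile))


def detect(text, profiles):
    """Return the language whose profile is closest to the text."""
    doc = trigram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))


# Toy training texts, for illustration only; real profiles need far more data.
profiles = {
    "lt": trigram_profile(
        "Lietuva yra valstybė Europoje. Aš gyvenu Vilniuje ir kalbu lietuviškai."),
    "en": trigram_profile(
        "The quick brown fox jumps over the lazy dog while the children read their books."),
}

print(detect("Aš gyvenu Vilniuje", profiles))
print(detect("the children read their books", profiles))
```

With profiles this small the result is only reliable for text close to the samples; the accuracy of a real detector depends on the size and quality of the corpus behind each profile.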

License

Copyright (C) 2011 UAB TokenMill

Distributed under the Eclipse Public License.


Languages

  • Shell 100.0%