romanows/LanguageModeling

Java code for Kneser-Ney backoff language modeling (educational-only)

Overview

This project demonstrates how statistical backoff language models are
trained and applied. ARPA-format language models can be loaded and
used, and new language models can be trained with a version of the
Kneser-Ney training algorithm. A document explaining the theoretical
and mathematical aspects of backoff language models and Kneser-Ney
training is included in the doc/ directory.
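
To sketch what "applied" means here: a backoff model stores explicit
probabilities for the n-grams it has seen and, for anything unseen,
falls back to a shorter history scaled by a backoff weight. The snippet
below illustrates that recursion under assumed data structures; all
class, field, and method names are hypothetical and are not this
project's API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Minimal sketch of backoff language model lookup, assuming n-gram
 * log10 probabilities and backoff weights have already been loaded
 * (e.g., from an ARPA file). Names are illustrative only.
 */
public class BackoffSketch {
    // Maps a space-joined n-gram to its log10 probability, as stored in ARPA files.
    private final Map<String, Double> logProb = new HashMap<>();
    // Maps a space-joined history to its log10 backoff weight.
    private final Map<String, Double> backoff = new HashMap<>();

    /** Returns log10 P(word | history), recursing to shorter histories as needed. */
    public double logProbability(List<String> history, String word) {
        String ngram = history.isEmpty() ? word : String.join(" ", history) + " " + word;
        Double p = logProb.get(ngram);
        if (p != null) {
            return p; // the full n-gram was seen in training; use its stored probability
        }
        if (history.isEmpty()) {
            // No history left and the word is unknown; a real model would map to <unk>.
            return Double.NEGATIVE_INFINITY;
        }
        // Back off: add the history's log10 backoff weight (0.0 if absent,
        // i.e., a weight of 1) and retry with the earliest word dropped.
        double bow = backoff.getOrDefault(String.join(" ", history), 0.0);
        return bow + logProbability(history.subList(1, history.size()), word);
    }
}
```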

This code should not be used for real-world language model training; use
SRILM instead. It may, however, be useful as an educational aid for
understanding language modeling and aspects of the Kneser-Ney approach
that aren't clear in the literature. Despite these caveats, the light
testing included in this project indicates that models trained by this
code achieve perplexity at least as low as that of models generated by
SRILM.
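
For reference, the perplexity being compared is the standard per-word
measure: the exponentiated average negative log probability the model
assigns to a held-out corpus, where lower is better. A minimal
self-contained sketch (not this project's code) of that computation:

```java
/**
 * Perplexity sketch: given the log10 probability a model assigned to
 * each word of a test corpus, perplexity is 10 raised to the negated
 * average log10 probability.
 */
public final class PerplexitySketch {
    public static double perplexity(double[] log10Probs) {
        double sum = 0.0;
        for (double lp : log10Probs) {
            sum += lp;
        }
        double avgLog10 = sum / log10Probs.length;
        return Math.pow(10.0, -avgLog10);
    }

    public static void main(String[] args) {
        // Three words, each assigned probability 0.1 (log10 = -1.0):
        // the perplexity is exactly 10.
        double[] lps = {-1.0, -1.0, -1.0};
        System.out.println(perplexity(lps)); // prints 10.0
    }
}
```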

Continued development will be aimed at correcting bugs, clarifying
fundamental concepts in the code and documentation, and improving the
efficiency of the current algorithms. In particular, the current
Kneser-Ney training implementation is extremely inefficient, and I'd
like to implement and document a more efficient method.
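
One reason Kneser-Ney training is costly is that lower-order n-gram
counts are replaced by continuation counts: how many distinct contexts
an n-gram follows, rather than how often it occurs. The sketch below
shows that step for unigrams over a stream of bigrams; the naming is
illustrative and is not this project's implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of Kneser-Ney unigram continuation probabilities: for each
 * word w, count the distinct words that precede it, then divide by the
 * total number of distinct bigram types. Informally, this is
 * N1+(. w) / N1+(. .), so frequent-but-context-bound words (like
 * "Francisco") get low unigram probability.
 */
public final class ContinuationCountSketch {
    public static Map<String, Double> continuationProbs(Iterable<String[]> bigrams) {
        Map<String, Set<String>> predecessors = new HashMap<>();
        Set<String> bigramTypes = new HashSet<>();
        for (String[] bg : bigrams) { // bg[0] immediately precedes bg[1]
            predecessors.computeIfAbsent(bg[1], k -> new HashSet<>()).add(bg[0]);
            bigramTypes.add(bg[0] + " " + bg[1]);
        }
        Map<String, Double> probs = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : predecessors.entrySet()) {
            probs.put(e.getKey(), e.getValue().size() / (double) bigramTypes.size());
        }
        return probs;
    }
}
```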

Brian Romanowski
[email protected]

Dependencies

This code requires the ItemCounter project to be added to the build path.

License

This code is licensed under one of the BSD variants; see LICENSE.txt
for full details. The language model document is released under a
Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported
License.
