
accraze/text2token

text2token

text2token is a Node.js module that breaks a corpus of text into lines and tokens.

Install

$ npm install text2token

Usage

The module has one method, text2token. It takes the path to a text file and returns an object with two properties: lines, a list of each line in the file, and tokens, a list of all unique tokens.

$ node
> var lib = require('text2token');

> var converted = lib.text2token('./src/bigtext.txt');

> converted.tokens
  [ '©',
  '2015',
  'GitHub,',
  'Inc.',
  'Terms',
  'Privacy',
  'Security',
  ..........

> converted.lines
[ '© 2015 GitHub, Inc. Terms Privacy Security Contact Help',
  'Status API Training Shop Blog About Pricing',
  'The quick brown fox jumped over the lazy dog',
  .......
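
To make the shape of the returned object concrete, here is a minimal sketch of how lines and unique tokens could be derived from a text. It is not the module's actual source: the names splitText and splitFile are hypothetical, and splitting on whitespace, dropping empty lines, and keeping tokens in first-seen order are assumptions based only on the output shown above.

```javascript
// Hypothetical sketch; NOT text2token's actual implementation.
const fs = require('fs');

// Split raw text into non-empty lines and unique whitespace-separated tokens.
function splitText(text) {
  const lines = text
    .split(/\r?\n/)                       // handle both LF and CRLF endings
    .map((line) => line.trim())
    .filter((line) => line.length > 0);   // assumption: empty lines are dropped

  // A Set drops repeated tokens while preserving first-seen order.
  const tokens = [...new Set(lines.flatMap((line) => line.split(/\s+/)))];

  return { lines, tokens };
}

// Read a file from disk and split it (assumes UTF-8 encoding).
function splitFile(path) {
  return splitText(fs.readFileSync(path, 'utf8'));
}

const sample = '© 2015 GitHub, Inc. Terms\nThe quick brown fox\nThe lazy dog\n';
const result = splitText(sample);
console.log(result.lines);
console.log(result.tokens);
```

In this sketch punctuation stays attached to its token (for example 'GitHub,'), which matches the sample output above; a stricter tokenizer would need its own rules.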

MIT License 2015-2016 © Andy Craze & Contributors