SeonbeomKim/Python-Byte_Pair_Encoding

Python-Byte_Pair_Encoding

Byte Pair Encoding (BPE)

Env

  • Python 3
  • NumPy 1.15
  • tqdm
  • multiprocessing

Paper

Command

  • Learn BPE from documents
python bpe_learn.py 
	-train_path 1_document 2_document ... K_document
	-voca_out_path voca_path/voca_file_name
	-bpe_out_path 1_BPE_document 2_BPE_document ... K_BPE_document
	-train_voca_threshold 1 
	-num_merges 30000 
	-multi_proc=-1 (-1: use all processes, 1: do not use multiprocessing)
	-final_voca_size 30000 or -final_voca_threshold 50
  • Apply BPE to documents
python bpe_apply.py
	-data_path 1_document 2_document ... K_document
	-voca_path voca_path/voca_file_name
	-bpe_out_path 1_BPE_document 2_BPE_document ... K_BPE_document
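
For readers unfamiliar with what the two commands compute, the following is a minimal sketch of the standard BPE merge procedure. It is not the code in `bpe_learn.py` or `bpe_apply.py`: the function names (`learn_bpe`, `apply_bpe`, `get_pair_counts`, `merge_pair`) and the `</w>` end-of-word marker are assumptions made for illustration, and the vocabulary thresholds and multiprocessing options of the scripts are not modeled. Here `num_merges` plays the same role as the `-num_merges` flag.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Return the ordered list of merges learned from word frequencies."""
    # Each word starts as a tuple of characters plus an end-of-word marker
    # ("</w>" is an illustrative choice, not taken from the repository).
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges


def apply_bpe(word, merges):
    """Segment one word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = next(iter(merge_pair(pair, {symbols: 1})))
    return list(symbols)


if __name__ == "__main__":
    # Toy word frequencies, made up for this example.
    word_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    merges = learn_bpe(word_freqs, num_merges=3)
    print(apply_bpe("newest", merges))  # ['n', 'e', 'w', 'est</w>']
```

The sketch recounts all pairs after every merge, which is simple but slow on large corpora; practical implementations usually update the counts incrementally.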

Reference
