distilling

Experiments with distilling large language models.

alpha values

  • alpha_ce : Linear weight for the KLDivLoss between the student and teacher token classification outputs.
  • alpha_clm : Linear weight for the CrossEntropyLoss between the student token classification outputs and the labels.
  • alpha_mse : Linear weight for the MSELoss between the student and teacher token classification logits.
  • alpha_cos : Linear weight for the CosineEmbeddingLoss between the student and teacher last-layer hidden_states (works only when the student and teacher have the same hidden_size). A sketch of how these weights combine into a single training loss follows this list.
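
The snippet below is a minimal sketch of how the four alpha weights might combine into one training loss, in the style of common PyTorch distillation recipes. The function name, variable names (student_logits, teacher_logits, labels, hidden states), and the temperature parameter are illustrative assumptions, not the repository's actual API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      student_hidden, teacher_hidden,
                      alpha_ce=0.5, alpha_clm=0.5, alpha_mse=0.0,
                      alpha_cos=0.0, temperature=2.0):
    # alpha_ce: KL divergence between temperature-softened student and teacher
    # token distributions.
    loss_ce = nn.KLDivLoss(reduction="batchmean")(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
    ) * (temperature ** 2)

    # alpha_clm: standard causal LM cross-entropy of the student against the labels.
    loss_clm = nn.CrossEntropyLoss()(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )

    # alpha_mse: direct MSE between student and teacher logits.
    loss_mse = nn.MSELoss()(student_logits, teacher_logits)

    # alpha_cos: cosine alignment of last-layer hidden states
    # (requires identical hidden_size for student and teacher).
    student_flat = student_hidden.view(-1, student_hidden.size(-1))
    teacher_flat = teacher_hidden.view(-1, teacher_hidden.size(-1))
    target = torch.ones(student_flat.size(0), device=student_flat.device)
    loss_cos = nn.CosineEmbeddingLoss()(student_flat, teacher_flat, target)

    return (alpha_ce * loss_ce + alpha_clm * loss_clm
            + alpha_mse * loss_mse + alpha_cos * loss_cos)
```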

STEPS

  1. Use create_data.py (for a small dataset) to create a pickle of the input_ids list, or use create_data_for_large_dataset.py (for a large dataset) to create shards of .npy files (a data-format sketch follows this list).
  2. Use run_distillation.py to start distillation training. For training on a large dataset, set the --train_on_large_dataset param.
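
Below is a minimal sketch of the data formats step 1 describes (a pickle of input_ids for the small-dataset path, .npy shards for the large-dataset path). The tokenizer choice, output file names, and shard size are illustrative assumptions, not the scripts' actual defaults.

```python
import pickle
import numpy as np
from transformers import GPT2TokenizerFast

# Assumed tokenizer for illustration only.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
texts = ["first training document", "second training document"]
input_ids = [tokenizer.encode(t) for t in texts]

# Small-dataset path (create_data.py): one pickle holding the list of input_ids.
with open("input_ids.pkl", "wb") as f:
    pickle.dump(input_ids, f)

# Large-dataset path (create_data_for_large_dataset.py): shards of .npy files.
shard_size = 1  # examples per shard; purely illustrative
for i in range(0, len(input_ids), shard_size):
    shard = np.array(input_ids[i:i + shard_size], dtype=object)
    np.save(f"shard_{i // shard_size}.npy", shard, allow_pickle=True)
```

Once the data is prepared, step 2 launches run_distillation.py; pass --train_on_large_dataset when the .npy shard path was used.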
