
## Running the Evaluation scripts

The pre-training script will perform evaluations if you set the `do_eval` argument to `True` and `evaluation_strategy` to `"step"`. However, you can also re-run evaluations, or run them separately, by using the `evaluation.py` script.

To speed up experiments, the `evaluation.py` script expects you to set a folder path where the dataset will be stored locally. The dataset folder must contain a list of Parquet files; you can achieve this by simply cloning the dataset from the Hub to a local directory:

```bash
git lfs install
git clone https://huggingface.co/datasets/nicholasKluge/Pt-Corpus-tokenized
```
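
Once the Parquet files are on disk, a minimal sketch of how a local folder of Parquet files could be read with the `datasets` library is shown below; the glob paths (one folder per split, as described next) are assumed examples, not paths required by `evaluation.py`.

```python
from datasets import load_dataset

# Assumed local layout: one folder of Parquet shards per split.
data_files = {
    "train": "Pt-Corpus-tokenized/train/*.parquet",
    "test": "Pt-Corpus-tokenized/test/*.parquet",
}

# Load both splits from the local Parquet files instead of the Hub.
dataset = load_dataset("parquet", data_files=data_files)
eval_dataset = dataset["test"]
```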

Then, you should separate the dataset into train and test folders. Alternatively, you can modify the script to load the dataset directly from the Hub, like this. If the dataset is set to be saved in your cache folder, you will only need to download it once:

```python
from datasets import load_dataset

eval_dataset = load_dataset("nicholasKluge/Pt-Corpus-tokenized", split="test")
```
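
If you go with the local-folder route instead, a short sketch of producing the train and test folders from the cached Hub copy could look like this; the output paths are illustrative assumptions and should match whatever folder you point the script to.

```python
import os

from datasets import load_dataset

# Download once from the Hub (cached afterwards), then write each split
# back out as local Parquet files that the evaluation script can read.
dataset = load_dataset("nicholasKluge/Pt-Corpus-tokenized")

# Assumed output layout; adjust the paths to match your setup.
for split in ("train", "test"):
    os.makedirs(f"Pt-Corpus-tokenized/{split}", exist_ok=True)
    dataset[split].to_parquet(f"Pt-Corpus-tokenized/{split}/data.parquet")
```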

Note: Our scripts do not support streaming, since much of the arithmetic that sets up the training run uses the length of the dataloaders as a factor. If you want to allow streaming (recommended for larger datasets, but slower than having the dataset loaded in memory), you will need to modify how these calculations are made, for example by hard-coding the number of steps, the number of examples in each training split, and so on.
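
To make concrete what would need to change, here is a hedged sketch of the kind of step arithmetic that normally relies on `len(train_dataloader)`; the function and the example numbers are illustrative assumptions, not the repository's actual code.

```python
import math

def compute_max_train_steps(
    num_examples: int,
    per_device_batch_size: int,
    gradient_accumulation_steps: int,
    num_train_epochs: int,
) -> int:
    """Derive the total number of optimizer steps from a known dataset size.

    With an in-memory dataset, this figure usually comes from
    len(train_dataloader); under streaming, len() is unavailable, so the
    number of examples has to be supplied (hard-coded) explicitly.
    """
    steps_per_epoch = math.ceil(
        num_examples / (per_device_batch_size * gradient_accumulation_steps)
    )
    return num_train_epochs * steps_per_epoch

# Hypothetical example: a training split of 1,900,000 examples.
print(compute_max_train_steps(1_900_000, 4, 8, 1))
```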

You can run the script as follows:

```bash
python evaluation.py \
  --logger_name "TeenyTinyLlama" \
  --model_checkpoint_path "nicholasKluge/TeenyTinyLlama-460m" \
  --revision "step100000" \
  --attn_implementation "flash_attention_2" \
  --per_device_eval_batch_size 16 \
  --completed_steps 100000 \
  --total_energy_consumption 3.34
```

These are the arguments you pass to this script:

| Argument | Description |
| --- | --- |
| `logger_name` | The logger name |
| `model_checkpoint_path` | Path to the model checkpoint to be used for evaluation |
| `revision` | The revision of the model (e.g., "step100000") |
| `attn_implementation` | The attention implementation to use for evaluation |
| `per_device_eval_batch_size` | The evaluation batch size per device |
| `completed_steps` | The number of completed training steps (e.g., 100000) |
| `total_energy_consumption` | The total energy consumption accumulated thus far |
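
For orientation, a minimal `argparse` sketch mirroring the table above is shown below; the types, defaults, and required flags are assumptions for illustration and may not match `evaluation.py` exactly.

```python
import argparse

# Hypothetical parser mirroring the table above; evaluation.py may differ
# in types, defaults, and which flags are required.
parser = argparse.ArgumentParser(description="Re-run evaluations for a model checkpoint.")
parser.add_argument("--logger_name", type=str, required=True)
parser.add_argument("--model_checkpoint_path", type=str, required=True)
parser.add_argument("--revision", type=str, default="main")
parser.add_argument("--attn_implementation", type=str, default="eager")
parser.add_argument("--per_device_eval_batch_size", type=int, default=16)
parser.add_argument("--completed_steps", type=int, default=0)
parser.add_argument("--total_energy_consumption", type=float, default=0.0)
args = parser.parse_args()
```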

## Benchmark Evaluation

The `lm-evaluation-harness-pt.ipynb` notebook showcases how to evaluate a model on the Laiviet version of the LM-Evaluation-Harness. To run it, execute the notebook's cells in an environment with access to a GPU (e.g., Colab). Evaluations on Portuguese benchmarks are available in the `New-EVAL` folder.
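
Outside the notebook, recent upstream releases of the LM-Evaluation-Harness (v0.4+) expose a Python entry point roughly like the sketch below; the Laiviet fork used here may have a different interface, and the task name is only a placeholder rather than one of the fork's Portuguese task IDs.

```python
import lm_eval

# Rough sketch against the upstream harness's Python API (lm-eval >= 0.4);
# the Laiviet fork may differ, and "hellaswag" is only a placeholder task.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=nicholasKluge/TeenyTinyLlama-460m",
    tasks=["hellaswag"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])
```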
