Commit

updated readme
Christianfoley committed Dec 13, 2023
1 parent 83d967d commit 4216956
Showing 3 changed files with 6 additions and 4 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -155,7 +155,7 @@ The arguments to run this are at the bottom of the script where the argparse ar

To generate the radar plots, we copy code from [this colab notebook by Lmsys](https://colab.research.google.com/drive/15O3Y8Rxq37PuMlArE291P4OC6ia37PQK#scrollTo=5i8R0l-XqkgO).

-We provide a customized copy in this repo; run [generate_mt_bench_plots.ipynb](mt_bench/generate_mt_bench_plots.ipynb) to replicate the plots.
+We provide a customized copy in this repo; run [generate_mt_bench_plots.ipynb](visualization_notebooks/generate_mt_bench_plots.ipynb) to replicate the plots.

## MT-Bench Results
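The radar plots produced by that notebook compare per-category MT-Bench scores across models on a polar axis. A minimal sketch of that kind of figure with matplotlib, using made-up scores purely for illustration (the real notebook reads scores from the MT-Bench judgment outputs), might look like:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical per-category scores; the actual values come from the judged
# answer files loaded in generate_mt_bench_plots.ipynb.
categories = ["writing", "roleplay", "reasoning", "math",
              "coding", "extraction", "stem", "humanities"]
scores = {"baseline":  [7.1, 6.8, 4.2, 3.0, 3.5, 5.9, 7.4, 8.0],
          "finetuned": [7.6, 7.2, 3.8, 2.6, 3.1, 5.5, 7.0, 7.8]}

# One angle per category; repeat the first point at the end to close the polygon.
angles = np.linspace(0, 2 * np.pi, len(categories), endpoint=False).tolist()
angles += angles[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
for name, vals in scores.items():
    vals = vals + vals[:1]
    ax.plot(angles, vals, label=name)
    ax.fill(angles, vals, alpha=0.15)

ax.set_xticks(angles[:-1])
ax.set_xticklabels(categories)
ax.set_ylim(0, 10)
ax.legend(loc="lower right")
plt.savefig("mt_bench_radar.png", dpi=200)
```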

@@ -193,4 +193,4 @@ To evaluate our model on MT-Bench, do the following setup in your favorite python

## Training Curves, Hyper-Parameters, and Ablations

-To replicate the training curves in the paper, run through [this ipynb](training_curves/create_training_visualizations.ipynb). The batching graphs were pulled from WandB, so the files are directly included in the same directory as the notebook.
+To replicate the training curves in the paper, run through [this ipynb](visualization_notebooks/create_training_visualizations.ipynb). The batching graphs were pulled from WandB, so the files are directly included in the same directory as the notebook.
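As a rough illustration of the kind of plotting done with those WandB exports (the file name and column names below are assumptions, not the real export schema), a training curve can be reproduced from a CSV export in a few lines:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical WandB CSV export sitting next to the notebook; the actual
# file names and column names in the repo may differ.
df = pd.read_csv("train_loss_export.csv")

plt.plot(df["Step"], df["train/loss"], label="train loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.savefig("training_curve.png", dpi=200)
```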
4 changes: 3 additions & 1 deletion visualization_notebooks/README.md
@@ -2,4 +2,6 @@

- [profanity.ipynb](profanity.ipynb): An analysis of profanity usage statistics between different models. We find that further finetuning increases profanity usage, likely because the model forgets its value alignment.
- [human_feedback.ipynb](human_feedback.ipynb): An analysis of our human feedback surveys. We find that humans vastly prefer our model's outputs for rap, and are evenly split for pop.
-- [swift_LM.ipynb](swift_LM.ipynb): A data analysis & visualization of ground-truth n-gram frequency between baseline and Taylor Swift finetuned models. We find that lyre-swift (the model finetuned from lyre) tends to plagiarize less.
+- [swift_LM.ipynb](swift_LM.ipynb): A data analysis & visualization of ground-truth n-gram frequency between baseline and Taylor Swift finetuned models. We find that lyre-swift (the model finetuned from lyre) tends to plagiarize less.
+- [create_training_visualizations.ipynb](create_training_visualizations.ipynb): Analysis notebook for visualizing data from finetuning ablations and monitoring.
+- [generate_mt_bench_plots.ipynb](generate_mt_bench_plots.ipynb): Analysis of task-specific catastrophic forgetting; figure generation from MT-Bench.
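The plagiarism comparison in swift_LM.ipynb boils down to measuring how many of a model's generated n-grams appear verbatim in the ground-truth lyrics. A minimal sketch of such a measurement (function names and the toy strings are illustrative, not the notebook's actual code):

```python
from collections import Counter

def ngrams(text, n=4):
    """Return the multiset of word n-grams in a string."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def overlap_rate(generated, reference, n=4):
    """Fraction of generated n-grams that also occur in the reference lyrics.

    Higher values suggest more verbatim copying of the ground truth.
    """
    gen = ngrams(generated, n)
    ref = ngrams(reference, n)
    if not gen:
        return 0.0
    copied = sum(count for gram, count in gen.items() if gram in ref)
    return copied / sum(gen.values())

# Toy usage with placeholder strings standing in for model output and
# ground-truth lyrics.
print(overlap_rate("we are never ever getting back together",
                   "we are never ever ever getting back together"))
```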
2 changes: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
"import matplotlib.pyplot as plt\n",
"\n",
"import sys\n",
-"sys.path.insert(0, \"../training_curves\")"
+"sys.path.insert(0, \"../data/training_curves\")"
]
},
{
