Commit

updated readme
Christianfoley committed Dec 13, 2023
1 parent 83d967d commit 4216956
Showing 3 changed files with 6 additions and 4 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -155,7 +155,7 @@ The arguments to run this are at the bottom of the script where the argparse ar

To generate the radar plots, we copy code from [this colab notebook by Lmsys](https://colab.research.google.com/drive/15O3Y8Rxq37PuMlArE291P4OC6ia37PQK#scrollTo=5i8R0l-XqkgO).

-We provide a customized copy in this repo; run [generate_mt_bench_plots.ipynb](mt_bench/generate_mt_bench_plots.ipynb) to replicate the plots.
+We provide a customized copy in this repo; run [generate_mt_bench_plots.ipynb](visualization_notebooks/generate_mt_bench_plots.ipynb) to replicate the plots.

## MT-Bench Results
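The radar plots produced by that notebook compare per-category MT-Bench scores across models on a polar axis. A minimal sketch of that kind of figure with matplotlib, using made-up scores purely for illustration (the real notebook reads scores from the MT-Bench judgment outputs), might look like:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical per-category scores; the actual values come from the judged
# answer files loaded in generate_mt_bench_plots.ipynb.
categories = ["writing", "roleplay", "reasoning", "math",
              "coding", "extraction", "stem", "humanities"]
scores = {"baseline":  [7.1, 6.8, 4.2, 3.0, 3.5, 5.9, 7.4, 8.0],
          "finetuned": [7.6, 7.2, 3.8, 2.6, 3.1, 5.5, 7.0, 7.8]}

# One angle per category; repeat the first point at the end to close the polygon.
angles = np.linspace(0, 2 * np.pi, len(categories), endpoint=False).tolist()
angles += angles[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
for name, vals in scores.items():
    vals = vals + vals[:1]
    ax.plot(angles, vals, label=name)
    ax.fill(angles, vals, alpha=0.15)

ax.set_xticks(angles[:-1])
ax.set_xticklabels(categories)
ax.set_ylim(0, 10)
ax.legend(loc="lower right")
plt.savefig("mt_bench_radar.png", dpi=200)
```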

@@ -193,4 +193,4 @@ To evaluate our model on MT-Bench, do the following setup in your favorite python

## Training Curves, Hyper-Parameters, and Ablations

-To replicate the training curves in the paper, run through [this ipynb](training_curves/create_training_visualizations.ipynb). The batching graphs were pulled from WandB, so the files are directly included in the same directory as the notebook.
+To replicate the training curves in the paper, run through [this ipynb](visualization_notebooks/create_training_visualizations.ipynb). The batching graphs were pulled from WandB, so the files are directly included in the same directory as the notebook.
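As a rough illustration of the kind of plotting done with those WandB exports (the file name and column names below are assumptions, not the real export schema), a training curve can be reproduced from a CSV export in a few lines:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical WandB CSV export sitting next to the notebook; the actual
# file names and column names in the repo may differ.
df = pd.read_csv("train_loss_export.csv")

plt.plot(df["Step"], df["train/loss"], label="train loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.savefig("training_curve.png", dpi=200)
```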
4 changes: 3 additions & 1 deletion visualization_notebooks/README.md
@@ -2,4 +2,6 @@

- [profanity.ipynb](profanity.ipynb): An analysis of profanity usage statistics between different models. We find that further finetuning increases profanity usage, likely because the model forgets its value alignment.
- [human_feedback.ipynb](human_feedback.ipynb): An analysis of our human feedback surveys. We find that humans vastly prefer our model's outputs for rap, and are evenly split for pop.
-- [swift_LM.ipynb](swift_LM.ipynb): A data analysis & visualization of ground-truth n-gram frequency between baseline and Taylor Swift finetuned models. We find that lyre-swift (the model finetuned from lyre) tends to plagiarize less.
+- [swift_LM.ipynb](swift_LM.ipynb): A data analysis & visualization of ground-truth n-gram frequency between baseline and Taylor Swift finetuned models. We find that lyre-swift (the model finetuned from lyre) tends to plagiarize less.
+- [create_training_visualizations.ipynb](create_training_visualizations.ipynb): Analysis notebook for visualizing data from finetuning ablations and monitoring.
+- [generate_mt_bench_plots.ipynb](generate_mt_bench_plots.ipynb): Analysis of task-specific catastrophic forgetting; figure generation from MT-Bench.
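The plagiarism comparison in swift_LM.ipynb boils down to measuring how many of a model's generated n-grams appear verbatim in the ground-truth lyrics. A minimal sketch of such a measurement (function names and the toy strings are illustrative, not the notebook's actual code):

```python
from collections import Counter

def ngrams(text, n=4):
    """Return the multiset of word n-grams in a string."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def overlap_rate(generated, reference, n=4):
    """Fraction of generated n-grams that also occur in the reference lyrics.

    Higher values suggest more verbatim copying of the ground truth.
    """
    gen = ngrams(generated, n)
    ref = ngrams(reference, n)
    if not gen:
        return 0.0
    copied = sum(count for gram, count in gen.items() if gram in ref)
    return copied / sum(gen.values())

# Toy usage with placeholder strings standing in for model output and
# ground-truth lyrics.
print(overlap_rate("we are never ever getting back together",
                   "we are never ever ever getting back together"))
```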
2 changes: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
"import matplotlib.pyplot as plt\n",
"\n",
"import sys\n",
-"sys.path.insert(0, \"../training_curves\")"
+"sys.path.insert(0, \"../data/training_curves\")"
]
},
{
