🚀 Feature
Allow epoch to be used optionally as the x-axis of the training/eval charts, making it easier to compare runs with different numbers of training pairs.
Motivation
I often experiment with adding or removing subsets of training data, either at dataset creation time by adding or deleting categories of prompts, or via the sampling slider in the H2O LLM Studio UI. Currently, all training and eval loss and perplexity graphs (and any other metric graphs) use step as the x-axis. It would be easier to compare training runs with different numbers of training pairs if the x-axis could be switched to epoch, so that two runs that are X% of the way through training would line up. The image below shows how hard it is to compare two runs with dissimilar numbers of training pairs (for the shorter run I removed various prompt templates teaching specific skills/tasks from the training dataset, to see whether that improves performance on the one task I care about, which makes up the entire eval set).
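For reference, a minimal sketch of the kind of remapping I have in mind: fractional epoch is just step scaled by samples processed per step over dataset size. The names below (batch_size, num_training_samples) are hypothetical and not taken from the LLM Studio codebase.

```python
# Sketch: convert global step indices to fractional epochs so runs with
# different dataset sizes share a comparable x-axis.
# batch_size / num_training_samples are illustrative names, not LLM Studio internals.

def steps_to_epochs(steps, batch_size, num_training_samples):
    """Map each global step to a fractional epoch value."""
    samples_per_step = batch_size  # multiply by grad accumulation if applicable
    return [step * samples_per_step / num_training_samples for step in steps]

# Two runs with different dataset sizes line up on the epoch axis:
run_a_steps = [0, 100, 200, 300]  # 3,000 training pairs, batch size 10
run_b_steps = [0, 50, 100, 150]   # 1,500 training pairs, batch size 10
print(steps_to_epochs(run_a_steps, 10, 3000))  # [0.0, 0.33..., 0.66..., 1.0]
print(steps_to_epochs(run_b_steps, 10, 1500))  # [0.0, 0.33..., 0.66..., 1.0]
```

The chart x-values could be remapped this way before plotting, with a toggle to switch between step and epoch.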