
Update gpt-3.5-turbo token limit in Chat_finetuning_data_prep.ipynb #1178

Merged
merged 4 commits on Jun 7, 2024
examples/Chat_finetuning_data_prep.ipynb: 10 changes (5 additions, 5 deletions)
@@ -207,7 +207,7 @@
"2. **Number of Messages Per Example**: Summarizes the distribution of the number of messages in each conversation, providing insight into dialogue complexity.\n",
"3. **Total Tokens Per Example**: Calculates and summarizes the distribution of the total number of tokens in each conversation. Important for understanding fine-tuning costs.\n",
"4. **Tokens in Assistant's Messages**: Calculates the number of tokens in the assistant's messages per conversation and summarizes this distribution. Useful for understanding the assistant's verbosity.\n",
"5. **Token Limit Warnings**: Checks if any examples exceed the maximum token limit (4096 tokens), as such examples will be truncated during fine-tuning, potentially resulting in data loss.\n"
"5. **Token Limit Warnings**: Checks if any examples exceed the maximum token limit (16,385 tokens), as such examples will be truncated during fine-tuning, potentially resulting in data loss.\n"
]
},
{
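For reference, the token counts behind steps 3-5 of the list above come from encoding each message with tiktoken. Below is a minimal sketch in the spirit of the notebook's num_tokens_from_messages helper; the overhead constants (3 tokens per message, +3 reply-priming tokens) follow OpenAI's published counting recipe for cl100k_base chat models, and the two-line dataset is hypothetical:

import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

def num_tokens_from_messages(messages, tokens_per_message=3, tokens_per_name=1):
    # Per-message overhead plus the encoded length of every field value;
    # the +3 at the end accounts for the assistant reply priming tokens.
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    return num_tokens + 3

# Hypothetical dataset in the fine-tuning chat format
dataset = [
    {"messages": [{"role": "user", "content": "Hi"},
                  {"role": "assistant", "content": "Hello! How can I help?"}]},
]

MAX_TOKENS = 16385  # gpt-3.5-turbo fine-tuning context window per this PR
convo_lens = [num_tokens_from_messages(ex["messages"]) for ex in dataset]
n_too_long = sum(l > MAX_TOKENS for l in convo_lens)

Any example whose count exceeds the limit is truncated (not rejected) during fine-tuning, which is why the notebook surfaces the count as a warning.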
@@ -240,7 +240,7 @@
"mean / median: 1610.2, 10.0\n",
"p5 / p95: 6.0, 4811.200000000001\n",
"\n",
"1 examples may be over the 4096 token limit, they will be truncated during fine-tuning\n"
"0 examples may be over the 16,385 token limit, they will be truncated during fine-tuning\n"
]
}
],
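print_distribution, called repeatedly in the hunk above, is the notebook's small summary helper. A sketch of the shape it plausibly takes, assuming numpy and percentile arguments chosen to match the p5 / p95 labels in the printed output:

import numpy as np

def print_distribution(values, name):
    # Summary statistics for a list of per-example counts
    print(f"\n#### Distribution of {name}:")
    print(f"min / max: {min(values)}, {max(values)}")
    print(f"mean / median: {np.mean(values)}, {np.median(values)}")
    print(f"p5 / p95: {np.quantile(values, 0.05)}, {np.quantile(values, 0.95)}")

print_distribution([26, 480, 1610, 8032], "num_total_tokens_per_example")  # hypothetical counts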
@@ -267,8 +267,8 @@
"print_distribution(n_messages, \"num_messages_per_example\")\n",
"print_distribution(convo_lens, \"num_total_tokens_per_example\")\n",
"print_distribution(assistant_message_lens, \"num_assistant_tokens_per_example\")\n",
"n_too_long = sum(l > 4096 for l in convo_lens)\n",
"print(f\"\\n{n_too_long} examples may be over the 4096 token limit, they will be truncated during fine-tuning\")"
"n_too_long = sum(l > 16,385 for l in convo_lens)\n",
"print(f\"\\n{n_too_long} examples may be over the 16,385 token limit, they will be truncated during fine-tuning\")"
]
},
{
@@ -300,7 +300,7 @@
],
"source": [
"# Pricing and default n_epochs estimate\n",
"MAX_TOKENS_PER_EXAMPLE = 4096\n",
"MAX_TOKENS_PER_EXAMPLE = 16385\n",
"\n",
"TARGET_EPOCHS = 3\n",
"MIN_TARGET_EXAMPLES = 100\n",
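The cell this hunk touches estimates billable training tokens by capping each conversation at MAX_TOKENS_PER_EXAMPLE, which is exactly why the bump from 4096 to 16385 matters for cost estimates. A sketch of that arithmetic; the epoch-clamping constants below (MAX_TARGET_EXAMPLES, MIN_DEFAULT_EPOCHS, MAX_DEFAULT_EPOCHS) and the token counts are assumptions, since the rest of the cell is truncated in this view:

MAX_TOKENS_PER_EXAMPLE = 16385
TARGET_EPOCHS = 3
MIN_TARGET_EXAMPLES = 100
MAX_TARGET_EXAMPLES = 25000   # assumed cap on total trained examples
MIN_DEFAULT_EPOCHS = 1        # assumed lower clamp on epochs
MAX_DEFAULT_EPOCHS = 25       # assumed upper clamp on epochs

convo_lens = [520, 310, 18000]  # hypothetical per-example token counts

n_epochs = TARGET_EPOCHS
n_train_examples = len(convo_lens)
if n_train_examples * TARGET_EPOCHS < MIN_TARGET_EXAMPLES:
    # Small dataset: raise epochs so the total examples seen reaches the target
    n_epochs = min(MAX_DEFAULT_EPOCHS, MIN_TARGET_EXAMPLES // n_train_examples)
elif n_train_examples * TARGET_EPOCHS > MAX_TARGET_EXAMPLES:
    # Large dataset: lower epochs to bound the total examples seen
    n_epochs = max(MIN_DEFAULT_EPOCHS, MAX_TARGET_EXAMPLES // n_train_examples)

# Billing counts at most MAX_TOKENS_PER_EXAMPLE tokens per example per epoch
n_billing_tokens = sum(min(MAX_TOKENS_PER_EXAMPLE, l) for l in convo_lens)
print(f"~{n_billing_tokens} billable tokens per epoch, ~{n_epochs * n_billing_tokens} total")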