Maximum input token count 4919 exceeds limit of 4096 for train data in 03_Model_customization/03_continued_pretraining_titan_text.ipynb #224
Comments
Fix Maximum input token count 4919 exceeds limit of 4096 for train data in 03_Model_customization/03_continued_pretraining_titan_text.ipynb aws-samples#224
I face the same issue. I tried reducing the chunk size to 10000, but I still get the same error after about 2 hours of training.
I am also getting this error.
I was able to fix the issue by reducing the chunk size and chunk overlap to 5000 and 1000, respectively. The 5000 was an assumption; I am sure it would still create the model with anything below 10000 (someone above observed that the model would not get created with a chunk size of 10000): `text_splitter = RecursiveCharacterTextSplitter(chunk_size=5000, chunk_overlap=1000)` followed by `docs = text_splitter.split_documents(document)`
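To illustrate why a smaller chunk size avoids the error, here is a minimal, self-contained sketch (not the notebook's exact code, and without the langchain dependency): it splits text into overlapping character chunks and checks each chunk against the 4096-token limit, assuming a coarse heuristic of roughly 4 characters per token. The splitting logic and the 4-chars-per-token ratio are assumptions for illustration only.

```python
def split_with_overlap(text, chunk_size=5000, chunk_overlap=1000):
    """Split text into fixed-size character chunks with overlap,
    mimicking (loosely) what a character-based text splitter does."""
    chunks = []
    step = chunk_size - chunk_overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

CHARS_PER_TOKEN = 4   # coarse assumption; real tokenizers vary
TOKEN_LIMIT = 4096    # Titan Text Lite training-record limit from the error

text = "x" * 50_000
chunks = split_with_overlap(text, chunk_size=5000, chunk_overlap=1000)
est_tokens = [len(c) // CHARS_PER_TOKEN for c in chunks]
print(max(est_tokens))  # worst-case estimated tokens per chunk
```

With chunk_size=5000, each chunk is at most ~1250 estimated tokens, comfortably under 4096; a chunk_size of 20000+ characters would push estimates past the limit, which is consistent with the failures reported above for larger chunk sizes.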
"Maximum input token count 4919 exceeds limit of 4096 for train data" in model-customization-job/amazon.titan-text-lite-v1:0:4k/nhjsh25oes0i in notebook 03_Model_customization/03_continued_pretraining_titan_text.ipynb