
Maximum input token count 4919 exceeds limit of 4096 for train data in 03_Model_customization/03_continued_pretraining_titan_text.ipynb #224

Open
FireballDWF opened this issue Apr 3, 2024 · 3 comments · May be fixed by FireballDWF/amazon-bedrock-workshop#3

Comments

@FireballDWF
Contributor

"Maximum input token count 4919 exceeds limit of 4096 for train data" in model-customization-job/amazon.titan-text-lite-v1:0:4k/nhjsh25oes0i in notebook 03_Model_customization/03_continued_pretraining_titan_text.ipynb

FireballDWF added a commit to FireballDWF/amazon-bedrock-workshop that referenced this issue Apr 3, 2024
Fix Maximum input token count 4919 exceeds limit of 4096 for train data in 03_Model_customization/03_continued_pretraining_titan_text.ipynb aws-samples#224
@HiDhineshRaja

I'm facing the same issue. I tried reducing the chunk size to 10000, but I still got the same error after training for about 2 hours.

@jicowan

jicowan commented Apr 25, 2024

I am also getting this error.

@nmudkey000

nmudkey000 commented May 10, 2024

I was able to fix the issue by reducing the chunk size and chunk overlap to 5000 and 1000, respectively. The value 5000 was an assumption; I expect the model would still be created with anything below 10000 (someone above observed that the job fails with a chunk size of 10000):

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=5000,     # reduced so chunks stay under the 4096-token limit
    chunk_overlap=1000,  # overlap for continuity across chunks
)

docs = text_splitter.split_documents(document)
```
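As a sanity check before submitting the customization job, you could estimate each chunk's token count locally and flag any that are likely over the 4096-token limit. This is only a sketch: it uses a rough ~4-characters-per-token heuristic, which is an assumption and not Titan's actual tokenizer, so treat flagged chunks as warnings rather than exact counts.

```python
# Hypothetical pre-flight check: estimate token counts per chunk so
# oversized chunks can be caught before the training job is submitted.
# The 4-chars-per-token ratio is a rough heuristic, not Titan's tokenizer.

MAX_TOKENS = 4096

def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token (heuristic)."""
    return max(1, len(text) // 4)

def oversized_chunks(chunks, limit=MAX_TOKENS):
    """Return (index, estimated_tokens) for chunks likely over the limit."""
    return [
        (i, estimate_tokens(c))
        for i, c in enumerate(chunks)
        if estimate_tokens(c) > limit
    ]

# Example: a ~30000-character chunk (~7500 estimated tokens) gets flagged,
# while a 1000-character chunk does not.
chunks = ["a" * 30000, "b" * 1000]
print(oversized_chunks(chunks))  # → [(0, 7500)]
```

If any chunks are flagged, lowering `chunk_size` (as above) or splitting on smaller separators should bring them under the limit.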
