
Maximum input token count 4919 exceeds limit of 4096 for train data in 03_Model_customization/03_continued_pretraining_titan_text.ipynb #224

Open
FireballDWF opened this issue Apr 3, 2024 · 3 comments · May be fixed by FireballDWF/amazon-bedrock-workshop#3

Comments

@FireballDWF
Contributor

"Maximum input token count 4919 exceeds limit of 4096 for train data" in model-customization-job/amazon.titan-text-lite-v1:0:4k/nhjsh25oes0i in notebook 03_Model_customization/03_continued_pretraining_titan_text.ipynb

FireballDWF added a commit to FireballDWF/amazon-bedrock-workshop that referenced this issue Apr 3, 2024
Fix Maximum input token count 4919 exceeds limit of 4096 for train data in 03_Model_customization/03_continued_pretraining_titan_text.ipynb aws-samples#224
@HiDhineshRaja

I'm facing the same issue. I tried reducing the chunk size to 10000, but I still got the same error after training for about 2 hours.

@jicowan

jicowan commented Apr 25, 2024

I am also getting this error.

@nmudkey000

nmudkey000 commented May 10, 2024

I was able to fix the issue by reducing the chunk size and chunk overlap to 5000 and 1000, respectively. The value 5000 was an assumption; I expect the model would still be created with anything below 10000 (someone above observed that the job fails with a chunk size of 10000):

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=5000,     # reduced so chunks stay under the 4096-token limit
    chunk_overlap=1000,  # overlap for continuity across chunks
)

docs = text_splitter.split_documents(document)
```
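As a sanity check before submitting the customization job, you could estimate each chunk's token count locally and flag any that are likely over the 4096-token limit. This is only a sketch: it uses a rough ~4-characters-per-token heuristic, which is an assumption and not Titan's actual tokenizer, so treat flagged chunks as warnings rather than exact counts.

```python
# Hypothetical pre-flight check: estimate token counts per chunk so
# oversized chunks can be caught before the training job is submitted.
# The 4-chars-per-token ratio is a rough heuristic, not Titan's tokenizer.

MAX_TOKENS = 4096

def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token (heuristic)."""
    return max(1, len(text) // 4)

def oversized_chunks(chunks, limit=MAX_TOKENS):
    """Return (index, estimated_tokens) for chunks likely over the limit."""
    return [
        (i, estimate_tokens(c))
        for i, c in enumerate(chunks)
        if estimate_tokens(c) > limit
    ]

# Example: a ~30000-character chunk (~7500 estimated tokens) gets flagged,
# while a 1000-character chunk does not.
chunks = ["a" * 30000, "b" * 1000]
print(oversized_chunks(chunks))  # → [(0, 7500)]
```

If any chunks are flagged, lowering `chunk_size` (as above) or splitting on smaller separators should bring them under the limit.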
