Commit
fix Maximum input token count exceeds limit aws-samples#224
Fix Maximum input token count 4919 exceeds limit of 4096 for train data in 03_Model_customization/03_continued_pretraining_titan_text.ipynb aws-samples#224
FireballDWF committed Apr 3, 2024
1 parent 4910f7a commit e7a5c00
Showing 1 changed file with 2 additions and 2 deletions.
@@ -268,8 +268,8 @@
 "# - in our testing Character split works better with this PDF data set\n",
 "text_splitter = RecursiveCharacterTextSplitter(\n",
 "    # Set a really small chunk size, just to show.\n",
-"    chunk_size = 20000, # 4096 tokens * 6 chars per token = 24,576 \n",
-"    chunk_overlap = 2000, # overlap for continuity across chunks\n",
+"    chunk_size = 4000, # when set to 20000, got error \"Maximum input token count 4919 exceeds limit of 4096\". Original comment was 4096 tokens * 6 chars per token = 24,576\n",
+"    chunk_overlap = 1000, # overlap for continuity across chunks\n",
 ")\n",
 "\n",
 "docs = text_splitter.split_documents(document)"
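The fix above hard-codes `chunk_size = 4000`. A minimal sketch (not part of the commit) of deriving a character chunk size from a model's token limit instead: the error message implies roughly 20000 chars / 4919 tokens ≈ 4 chars per token, not the 6 assumed in the original comment. The function name, ratio, and safety margin below are illustrative assumptions.

```python
def safe_chunk_size(token_limit: int,
                    chars_per_token: float = 4.0,
                    margin: float = 0.8) -> int:
    """Estimate a character chunk_size that stays under a token limit.

    chars_per_token ~= 4.0 is inferred from the reported error
    (20000-char chunks produced ~4919 tokens); margin leaves headroom
    for text that tokenizes less efficiently than average.
    """
    return int(token_limit * chars_per_token * margin)

# For a 4096-token limit this yields 13107 characters -- well under the
# failing 20000; the commit's 4000 is a more conservative choice still.
print(safe_chunk_size(4096))
```

Passing the result as `chunk_size=` to `RecursiveCharacterTextSplitter` (with `chunk_overlap` scaled down proportionally) would keep chunks under the limit without hard-coding a constant.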