
Merge pull request #1320 from zc277584121/master
refine the use cases
jaelgu committed May 16, 2024
2 parents 2b37ba2 + bebbe72 commit 7daee87
Showing 3 changed files with 104 additions and 91 deletions.
58 changes: 34 additions & 24 deletions bootcamp/tutorials/integration/rag_with_milvus_and_haystack.ipynb
@@ -42,7 +42,7 @@
"metadata": {},
"outputs": [],
"source": [
"# ! pip install --upgrade --quiet pymilvus milvus-haystack"
"! pip install --upgrade --quiet pymilvus milvus-haystack"
]
},
{
@@ -72,6 +72,7 @@
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"sk-***********\""
]
},
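A small optional variation, not part of this diff: rather than hard-coding the key in the notebook, it can be read interactively. The `getpass` prompt below is an assumption for illustration, not something the tutorial itself does.

```python
import os
from getpass import getpass

# Prompt for the key only if it is not already set in the environment.
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")
```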
@@ -86,7 +87,7 @@
"source": [
"## Prepare the data\n",
"\n",
"We use a popular online [essay](https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt) to be used as private knowledge in our RAG, which is a good data source for a simple RAG pipeline.\n",
"We use an online content about [Leonardo Da Vinci](https://www.gutenberg.org/cache/epub/7785/pg7785.txt) as a store of private knowledge for our RAG pipeline, which is a good data source for a simple RAG pipeline.\n",
"\n",
"Download it and save it as a local text file."
]
@@ -97,16 +98,14 @@
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"import os\n",
"import urllib.request\n",
"\n",
"url = \"https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt\"\n",
"file_path = \"./paul_graham_essay.txt\"\n",
"url = \"https://www.gutenberg.org/cache/epub/7785/pg7785.txt\"\n",
"file_path = \"./davinci.txt\"\n",
"\n",
"if not os.path.exists(file_path):\n",
" response = requests.get(url)\n",
" with open(file_path, \"wb\") as file:\n",
" file.write(response.content)"
" urllib.request.urlretrieve(url, file_path)"
]
},
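Because the diff interleaves the removed `requests` version with the added `urllib` version, the updated download cell appears to reduce to roughly the following (reconstructed from the added lines; treat it as a sketch rather than the exact committed cell):

```python
import os
import urllib.request

url = "https://www.gutenberg.org/cache/epub/7785/pg7785.txt"
file_path = "./davinci.txt"

# Download the book once and reuse the local copy on later runs.
if not os.path.exists(file_path):
    urllib.request.urlretrieve(url, file_path)
```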
{
@@ -141,18 +140,18 @@
"output_type": "stream",
"text": [
"Converting markdown files to Documents: 100%|█| 1/\n",
"Calculating embeddings: 100%|█| 12/12 [00:07<00:00\n",
"E20240515 20:52:54.001194 4751750 milvus_local.cpp:189] [SERVER][GetCollection][] Collecton HaystackCollection not existed\n",
"E20240515 20:52:54.001967 4751750 milvus_local.cpp:189] [SERVER][GetCollection][] Collecton HaystackCollection not existed\n",
"E20240515 20:52:54.002050 4751750 milvus_local.cpp:189] [SERVER][GetCollection][] Collecton HaystackCollection not existed\n",
"E20240515 20:52:54.002069 4751750 milvus_local.cpp:189] [SERVER][GetCollection][] Collecton HaystackCollection not existed\n"
"Calculating embeddings: 100%|█| 9/9 [00:05<00:00, \n",
"E20240516 10:40:32.945937 5309095 milvus_local.cpp:189] [SERVER][GetCollection][] Collecton HaystackCollection not existed\n",
"E20240516 10:40:32.946677 5309095 milvus_local.cpp:189] [SERVER][GetCollection][] Collecton HaystackCollection not existed\n",
"E20240516 10:40:32.946704 5309095 milvus_local.cpp:189] [SERVER][GetCollection][] Collecton HaystackCollection not existed\n",
"E20240516 10:40:32.946725 5309095 milvus_local.cpp:189] [SERVER][GetCollection][] Collecton HaystackCollection not existed\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of documents: 368\n"
"Number of documents: 277\n"
]
}
],
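The indexing cell itself is collapsed in this diff; only its progress output and document count are visible. The sketch below shows what an indexing pipeline with `milvus-haystack` typically looks like in this kind of tutorial; the converter choice is suggested by the "Converting markdown files to Documents" progress bar, but the splitter settings, the `./milvus.db` URI, and the component names are assumptions rather than the hidden cell.

```python
from haystack import Pipeline
from haystack.components.converters import MarkdownToDocument
from haystack.components.embedders import OpenAIDocumentEmbedder
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.writers import DocumentWriter
from milvus_haystack import MilvusDocumentStore

file_path = "./davinci.txt"  # downloaded in the previous step

# Milvus Lite keeps the collection in a local file; drop_old recreates it on each run.
document_store = MilvusDocumentStore(
    connection_args={"uri": "./milvus.db"},
    drop_old=True,
)

indexing_pipeline = Pipeline()
indexing_pipeline.add_component("converter", MarkdownToDocument())
indexing_pipeline.add_component("splitter", DocumentSplitter(split_by="sentence", split_length=2))
indexing_pipeline.add_component("embedder", OpenAIDocumentEmbedder())
indexing_pipeline.add_component("writer", DocumentWriter(document_store))
indexing_pipeline.connect("converter", "splitter")
indexing_pipeline.connect("splitter", "embedder")
indexing_pipeline.connect("embedder", "writer")

indexing_pipeline.run({"converter": {"sources": [file_path]}})
print("Number of documents:", document_store.count_documents())
```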
@@ -199,7 +198,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 8,
"metadata": {
"collapsed": false,
"jupyter": {
@@ -214,21 +213,32 @@
"name": "stdout",
"output_type": "stream",
"text": [
" I wrote simple games, a program to predict how high my model rockets would fly, and a word processor that my father used to write at least one book. There was only room in memory for about 2 pages of text, so he'd write 2 pages at a time and then print them out, but it was a lot better than a typewriter.\n",
"----------\n",
" They had hardly any plot, just characters with strong feelings, which I imagined made them deep.\n",
"\n",
"The first programs I tried writing were on the IBM 1401 that our school district used for what was then called \"data processing.\n",
"). The\n",
"composition of this oil-painting seems to have been built up on the\n",
"second cartoon, which he had made some eight years earlier, and which\n",
"was apparently taken to France in 1516 and ultimately lost.\n",
"----------\n",
"\n",
"This \"Baptism of Christ,\" which is now in the Accademia in Florence\n",
"and is in a bad state of preservation, appears to have been a\n",
"comparatively early work by Verrocchio, and to have been painted\n",
"in 1480-1482, when Leonardo would be about thirty years of age.\n",
"\n",
"So in the summer of 1995, after I submitted the camera-ready copy of ANSI Common Lisp to the publishers, we started trying to write software to build online stores. At first this was going to be normal desktop software, which in those days meant Windows software.\n",
"To about this period belongs the superb drawing of the \"Warrior,\" now\n",
"in the Malcolm Collection in the British Museum.\n",
"----------\n",
"\" Although he\n",
"completed the cartoon, the only part of the composition which he\n",
"eventually executed in colour was an incident in the foreground\n",
"which dealt with the \"Battle of the Standard.\" One of the many\n",
"supposed copies of a study of this mural painting now hangs on the\n",
"south-east staircase in the Victoria and Albert Museum.\n",
"----------\n"
]
}
],
"source": [
"question = \"what is the first programs the author tried writing?\"\n",
"question = 'Where is the painting \"Warrior\" currently stored?'\n",
"\n",
"retrieval_pipeline = Pipeline()\n",
"retrieval_pipeline.add_component(\"embedder\", OpenAITextEmbedder())\n",
@@ -258,7 +268,7 @@
},
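The rest of the retrieval cell is collapsed behind the hunk above. A minimal sketch of how such a pipeline is usually completed with `MilvusEmbeddingRetriever` (the `top_k=3` value matches the three retrieved chunks in the output, but it and the local `./milvus.db` URI are assumptions):

```python
from haystack import Pipeline
from haystack.components.embedders import OpenAITextEmbedder
from milvus_haystack import MilvusDocumentStore, MilvusEmbeddingRetriever

document_store = MilvusDocumentStore(connection_args={"uri": "./milvus.db"})

question = 'Where is the painting "Warrior" currently stored?'

retrieval_pipeline = Pipeline()
retrieval_pipeline.add_component("embedder", OpenAITextEmbedder())
retrieval_pipeline.add_component("retriever", MilvusEmbeddingRetriever(document_store=document_store, top_k=3))
retrieval_pipeline.connect("embedder.embedding", "retriever.query_embedding")

# Embed the question, retrieve the closest chunks, and print them with separators.
results = retrieval_pipeline.run({"embedder": {"text": question}})
for doc in results["retriever"]["documents"]:
    print(doc.content)
    print("-" * 10)
```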
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 9,
"metadata": {
"collapsed": false,
"jupyter": {
Expand All @@ -273,7 +283,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"RAG answer: The first programs the author tried writing were on the IBM 1401 used by the school district for data processing.\n"
"RAG answer: The painting \"Warrior\" is currently stored in the Malcolm Collection in the British Museum.\n"
]
}
],
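The generation cell that produces this answer is also collapsed. The following is a sketch of the usual Haystack 2.x RAG assembly with `PromptBuilder` and `OpenAIGenerator`; the prompt template, model defaults, and component names are assumptions, not the committed cell.

```python
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.embedders import OpenAITextEmbedder
from haystack.components.generators import OpenAIGenerator
from milvus_haystack import MilvusDocumentStore, MilvusEmbeddingRetriever

document_store = MilvusDocumentStore(connection_args={"uri": "./milvus.db"})

# Jinja template: retrieved documents plus the user question become the LLM prompt.
prompt_template = """Answer the question based on the provided context.
Context:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}
Question: {{ question }}
Answer:"""

rag_pipeline = Pipeline()
rag_pipeline.add_component("embedder", OpenAITextEmbedder())
rag_pipeline.add_component("retriever", MilvusEmbeddingRetriever(document_store=document_store, top_k=3))
rag_pipeline.add_component("prompt_builder", PromptBuilder(template=prompt_template))
rag_pipeline.add_component("generator", OpenAIGenerator())
rag_pipeline.connect("embedder.embedding", "retriever.query_embedding")
rag_pipeline.connect("retriever.documents", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder.prompt", "generator.prompt")

question = 'Where is the painting "Warrior" currently stored?'
result = rag_pipeline.run({"embedder": {"text": question}, "prompt_builder": {"question": question}})
print("RAG answer:", result["generator"]["replies"][0])
```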
@@ -39,7 +39,7 @@
"metadata": {},
"outputs": [],
"source": [
"# ! pip install --upgrade --quiet langchain langchain-core langchain-community langchain-text-splitters langchain-milvus langchain-openai bs4"
"! pip install --upgrade --quiet langchain langchain-core langchain-community langchain-text-splitters langchain-milvus langchain-openai bs4"
]
},
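Only the pip cell of the LangChain notebook is visible in this commit view. As a hedged sketch of what the installed `langchain-milvus` and `langchain-openai` packages are typically combined for (the local URI, `auto_id`, and the sample texts are assumptions for illustration):

```python
from langchain_milvus import Milvus
from langchain_openai import OpenAIEmbeddings

# Milvus Lite keeps the vector store in a local file.
vectorstore = Milvus(
    embedding_function=OpenAIEmbeddings(),
    connection_args={"uri": "./milvus_demo.db"},
    auto_id=True,
)
vectorstore.add_texts(["Leonardo da Vinci painted the Mona Lisa."])
docs = vectorstore.similarity_search("Who painted the Mona Lisa?", k=1)
print(docs[0].page_content)
```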
{
