diff --git a/guide/14-deep-learning/use_mistral_llm_for_text_classification_and_entity_recognition.ipynb b/guide/14-deep-learning/use_mistral_llm_for_text_classification_and_entity_recognition.ipynb new file mode 100644 index 000000000..724913dc9 --- /dev/null +++ b/guide/14-deep-learning/use_mistral_llm_for_text_classification_and_entity_recognition.ipynb @@ -0,0 +1,615 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Use Mistral LLM for Text Classification and Entity Recognition" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "toc": true + }, + "source": [ + "

Table of Contents
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "## Introduction to Mistral model " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Mistral 7B is a decoder-based language model trained using almost 7 billion parameters designed to deliver both efficiency and high performance for real-world applications.\n", + "\n", + "Employing attention mechanisms like Sliding Window Attention, Mistral 7B can train with an 8k context length and a fixed cache size, resulting in a theoretical attention span of 128K tokens. This capability allows the model to focus on crucial parts of the text efficiently. Moreover, the model incorporates Grouped Query Attention (GQA) to accelerate inference and reduce cache size, thereby expediting its inference process. Additionally, its Byte-fallback tokenizer ensures consistent representation of characters, eliminating the need for out-of-vocabulary tokens.\n", + "\n", + "Such design features in its architecture equip Mistral 7B for exceptional performance, particularly in tasks related to language comprehension and generation. In this guide we see how we can use the Mistral LLM for text classification and named entity recognition." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Mistral Implementation in arcgis.learn " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Install the model backbone " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Follow these steps to download and install the Mistral model backbone:\n", + "\n", + "1. Download the mistral model backbone.\n", + "\n", + "2. Extract the downloaded zip file.\n", + "\n", + "3. Open the anaconda prompt and move to the folder that contains arcgis_mistral_backbone-1.0.0-py_0.tar.bz2\n", + "\n", + "4. 
Run:\n", + "\n", + " ```conda install --offline arcgis_mistral_backbone-1.0.0-py_0.tar.bz2```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Mistral with the TextClassifier model " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import the TextClassifier class from the arcgis.learn.text module " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from arcgis.learn.text import TextClassifier" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Initialize the TextClassifier model with a databunch " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Prepare a databunch for the TextClassifier model using the prepare_textdata method in arcgis.learn." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from arcgis.learn import prepare_textdata\n", + "data = prepare_textdata(\"path_to_data_folder\", task=\"classification\", train_file=\"input_csv_file.csv\",\n", + " text_columns=\"text_column\", label_columns=\"label_column\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once the data is prepared, the TextClassifier model object can be instantiated with the following parameters:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "data: The databunch created using the prepare_textdata method.\n", + "\n", + "backbone: To use mistral as the model backbone, use backbone=\"mistral\".\n", + "\n", + "prompt: Text string describing the task and its guardrails. This is an optional parameter."
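The `data` databunch above is built from a training CSV that pairs one text column with one label column. A minimal sketch of such a file, written with the Python standard library (the file name, column names, and rows here are hypothetical, not part of arcgis.learn):

```python
import csv

# Hypothetical training rows in (text_column, label_column) order.
rows = [
    ("The service was excellent", "positive"),
    ("I waited an hour for a reply", "negative"),
]

with open("input_csv_file.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    # Header row: these names are what text_columns/label_columns refer to.
    writer.writerow(["text_column", "label_column"])
    writer.writerows(rows)
```

The text_columns and label_columns arguments of prepare_textdata then refer to the header names written here.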
+ ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "classifier_model = TextClassifier(\n", + " data=data,\n", + " backbone=\"mistral\",\n", + " prompt=\"Classify all the input sentences into the defined labels; do not make up your own labels.\"\n", + ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Initialize the TextClassifier model without a databunch " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A TextClassifier model with a mistral backbone can also be created without a large dataset, using only a few examples." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below are the parameters to be passed into TextClassifier:\n", + "\n", + "backbone: To use mistral as the model backbone, use backbone=\"mistral\".\n", + "\n", + "examples: User-defined examples to provide to the mistral model, in Python dictionary format:\n", + "```\n", + "{\n", + "    \"label_1\": [\"input_text_example_1\", \"input_text_example_2\", ...],\n", + "    \"label_2\": [\"input_text_example_1\", \"input_text_example_2\", ...],\n", + "    ...\n", + "}\n", + "```\n", + " \n", + "prompt: Text string describing the task and its guardrails. This is an optional parameter.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "classifier_model = TextClassifier(\n", + " data=None,\n", + " backbone=\"mistral\",\n", + " prompt=\"Classify all the input sentences into the defined labels; do not make up your own labels.\",\n", + " examples={\n", + " \"positive\": [\"Good! 
it was a wonderful experience!\", \"I really adore your work\"],\n", + " \"negative\": [\"The customer support was unhelpful\", \"I don't like your work\"]\n", + " }\n", + ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Classify the text using the mistral model " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To classify text using the mistral model, use the predict method from the TextClassifier class. The input to the method is a text string or a list of text strings." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "classifier_model.predict(\"text_to_classify\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load the model " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To load a saved mistral model, use the from_model method of the TextClassifier class." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "classifier_model = TextClassifier.from_model(r'path_to_dlpk_file')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Save the model " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following method saves the model weights and creates a Deep Learning Package (.dlpk)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "classifier_model.save(\"name_of_the_mistral_model\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Mistral with an EntityRecognizer model " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Import the EntityRecognizer class from the arcgis.learn.text module " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from arcgis.learn.text import EntityRecognizer" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Initialize the EntityRecognizer model with a databunch " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Prepare the databunch for the EntityRecognizer model using the prepare_textdata method in arcgis.learn." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from arcgis.learn import prepare_textdata\n", + "data = prepare_textdata(\"path_to_data_file\", task=\"entity_recognition\", dataset_type='ner_json')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Once the data is prepared, the EntityRecognizer model object can be instantiated with the following parameters:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "data: The databunch created using the prepare_textdata method.\n", + "\n", + "backbone: To use mistral as the model backbone, use backbone=\"mistral\".\n", + "\n", + "prompt: Text string describing the task and its guardrails. This is an optional parameter." 
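Besides a databunch, EntityRecognizer also accepts few-shot examples (covered in a later section) that pair each sentence with a dictionary mapping entity classes to the entities it contains. A small illustrative check, not part of the library, that every listed entity actually occurs in its sentence (the sentences and classes here are hypothetical):

```python
# Hypothetical few-shot examples: (sentence, {entity_class: [entities]}).
examples = [
    ("Jim stays in London", {"name": ["Jim"], "location": ["London"]}),
    ("Sara moved to Paris", {"name": ["Sara"], "location": ["Paris"]}),
]

# Each named entity should be a literal substring of its sentence;
# otherwise the model is given an example it cannot ground.
for sentence, entities in examples:
    for label, spans in entities.items():
        for span in spans:
            assert span in sentence, f"{span!r} not found in {sentence!r}"
```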
+ ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "entity_recognizer_model = EntityRecognizer(\n", + " data=data,\n", + " backbone=\"mistral\",\n", + " prompt=\"Tag the named entities in the input sentences using only the given classes; no other class should be tagged.\"\n", + ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Initialize the EntityRecognizer model without a databunch " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An EntityRecognizer model with a mistral backbone can also be created without a large dataset by using only a few examples." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below are the parameters to be passed into EntityRecognizer:\n", + "\n", + "backbone: To use mistral as the model backbone, use backbone=\"mistral\".\n", + "\n", + "examples: User-defined examples for the mistral model, in Python list format:\n", + "\n", + "```\n", + "[\n", + "    (\"input_text_sentence\", \n", + "     {\n", + "        \"class_1\": [\"Named Entity\", ...],\n", + "        \"class_2\": [\"Named entity\", ...],\n", + "        ...\n", + "     }\n", + "    )\n", + "    ...\n", + "]\n", + "```\n", + "\n", + "Note: The EntityRecognizer class with the \"mistral\" backbone needs at least six examples to work effectively.\n", + " \n", + "prompt: Text string describing the task and its guardrails. 
This is an optional parameter.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "entity_recognizer_model = EntityRecognizer(\n", + " data=None,\n", + " backbone=\"mistral\",\n", + " prompt=\"Tag the named entities in the input sentences using only the given classes; no other class should be tagged.\",\n", + " examples=[\n", + " (\n", + " 'Jim stays in London',\n", + " {\n", + " 'name': ['Jim'],\n", + " 'location': ['London']\n", + " }\n", + " ),\n", + " ...\n", + " ]\n", + ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Extract entities using the mistral model " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To extract named entities using the mistral model, use the extract_entities method from the EntityRecognizer class. The input to the method is a text string or a list of text strings." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "entity_recognizer_model.extract_entities(\"text_to_tag\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load the model " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To load a saved mistral model, use the from_model method of the EntityRecognizer class." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "entity_recognizer_model = EntityRecognizer.from_model(r'path_to_dlpk_file')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Save the model " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This method saves the model weights and creates a Deep Learning Package (.dlpk)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "entity_recognizer_model.save(\"name_of_the_mistral_model\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Conclusion " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this guide we demonstrated the steps to initialize and perform inference using the Mistral LLM as a backbone with the TextClassifier and EntityRecognizer models in arcgis.learn." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## References " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Mistral-7B HuggingFace: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
\n", + "Mistral-7B MistralAI: https://mistral.ai/news/announcing-mistral-7b" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.8" + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": true, + "sideBar": true, + "skip_h1_title": false, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": true, + "toc_position": {}, + "toc_section_display": true, + "toc_window_display": true + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}