Support custom specified models for HF model parsers #863

Merged: saqadri merged 1 commit into main on Jan 10, 2024
Conversation

@saqadri (Contributor) commented on Jan 10, 2024

Support custom specified models for HF model parsers

The HF prompt schema has been updated in #850 to support a "model" property, which can specify the model to use for a given HF task.

This change updates the model parsers to use this setting.
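
Conceptually, each parser now looks the model up from the prompt's model settings and falls back to the task default when none is given. A rough, standalone sketch of that lookup against the serialized prompt shape used in the test plan below (the helper name is hypothetical and is not the actual `get_hf_model`):

```python
def model_name_from_prompt(prompt: dict) -> str | None:
    """Return the HF model named in the prompt's settings, or None for the task default.

    Mirrors the metadata shape in the test plan below:
    metadata.model.settings.model -> "Salesforce/blip-image-captioning-base".
    """
    model = (prompt.get("metadata") or {}).get("model")
    if not isinstance(model, dict):
        # "model" may be omitted or may just be a parser name string; no override.
        return None
    return (model.get("settings") or {}).get("model")
```

The per-parser diffs below then cache one pipeline per resolved model name.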

Test Plan:

Tested with the following prompt:

    {
      "name": "image_caption",
      "input": {
        "attachments": [
          {
            "data": "https://s3.amazonaws.com/lastmileai.aiconfig.public/uploads/2024_1_10_18_41_31/1742/bear_with_honey.png",
            "mime_type": "image/png"
          },
          {
            "data": "https://s3.amazonaws.com/lastmileai.aiconfig.public/uploads/2024_1_10_18_41_31/7275/fox_in_forest.png",
            "mime_type": "image/png"
          }
        ]
      },
      "metadata": {
        "model": {
          "name": "HuggingFaceImage2TextTransformer",
          "settings": {
            "max_new_tokens": 10,
            "model": "Salesforce/blip-image-captioning-base"
          }
        },
        "parameters": {}
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 0,
          "data": "a bear sitting on a rock eating honey",
          "metadata": {}
        },
        {
          "output_type": "execute_result",
          "execution_count": 1,
          "data": "a red fox in the woods",
          "metadata": {}
        }
      ]
    },

Validated that the model setting was respected and worked.
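
For context, a config containing a prompt like this is exercised by loading it and running the prompt by name; roughly (file name hypothetical, exact call details may differ):

```python
import asyncio

from aiconfig import AIConfigRuntime


async def main():
    # Load the aiconfig that contains the "image_caption" prompt above.
    config = AIConfigRuntime.load("hf_image_caption.aiconfig.json")
    outputs = await config.run("image_caption")
    print(outputs)


asyncio.run(main())
```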


Stack created with Sapling. Best reviewed with ReviewStack.

```python
self.pipelines[model_name] = pipeline(task="image-to-text", model=model_name)
captioner = self.pipelines[model_name]
model_name = get_hf_model(aiconfig, prompt, self)
key = model_name if model_name is not None else "__default__"
```
Member:

nit: double underscores are usually associated with dunder methods or special built-in values in Python. Would prefer a different value here to avoid any confusion.

Contributor:

can make a little enum class or something, like

```python
from enum import Enum

class Sentinel(Enum):
    DEFAULT = "DEFAULT"
```

Then check for that value wherever needed and supply the real default value in a specific use case
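
For illustration, a minimal sketch of that pattern (names hypothetical):

```python
from enum import Enum


class Sentinel(Enum):
    DEFAULT = "DEFAULT"


def cache_key(model_name: str | None):
    # An enum member can never collide with a real HF model name, unlike a
    # magic "__default__" string; the real default model is supplied wherever
    # the pipeline is actually constructed.
    return Sentinel.DEFAULT if model_name is None else model_name
```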

@Ankush-lastmile (Member) commented on Jan 10, 2024

I don't see the ASR model parser updated; did that not need updating?

@Ankush-lastmile (Member) left a comment

Assuming the `model_name == model_parser.id():` case was tested, LGTM, though I left a couple of comments.

@jonathanlastmileai (Contributor) left a comment

Generally LGTM, just would suggest addressing nits and doing a thorough test plan

```python
self.pipelines[model_name] = pipeline(task="image-to-text", model=model_name)
captioner = self.pipelines[model_name]
model_name = get_hf_model(aiconfig, prompt, self)
key = model_name if model_name is not None else "__default__"
```
Contributor:

can make a little enum class or something, like

```python
from enum import Enum

class Sentinel(Enum):
    DEFAULT = "DEFAULT"
```

Then check for that value wherever needed and supply the real default value in a specific use case

```python
self.summarizers[model_name] = pipeline("summarization", model=model_name)
summarizer = self.summarizers[model_name]
model_name = get_hf_model(aiconfig, prompt, self)
key = model_name if model_name is not None else "__default__"
```
Contributor:

same here

```python
should_stream = (options.stream if options else False) and (
    not "stream" in completion_data or completion_data.get("stream") != False
)
```
Contributor:

not for this diff but this stream logic should be cleaned up
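
For example, one possible cleanup that keeps the current behavior (a sketch only, not part of this diff):

```python
def should_stream(options, completion_data: dict) -> bool:
    """Equivalent to the original expression, split into named steps."""
    # The caller must have asked to stream...
    requested = bool(options.stream) if options else False
    # ...and the completion settings must not explicitly set stream=False
    # (an absent "stream" key still allows streaming). Keeps the original
    # `!= False` comparison.
    allowed = completion_data.get("stream", True) != False
    return requested and allowed
```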

Author (Contributor):

+1

Contributor:

I have a bootcamp issue to track this in #861.

```python
self.translators[model_name] = pipeline("translation", model_name)
translator = self.translators[model_name]
model_name = get_hf_model(aiconfig, prompt, self)
key = model_name if model_name is not None else "__default__"
```
Contributor:

same here

Comment on lines +7 to +11
```python
if TYPE_CHECKING:
    from aiconfig import AIConfigRuntime


def get_hf_model(aiconfig: "AIConfigRuntime", prompt: Prompt, model_parser: ParameterizedModelParser) -> str | None:
```
Contributor:

If possible it's much better to untangle the circular import. Do we even have a cycle in this case?
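
For reference, the `TYPE_CHECKING` guard already keeps the import out of the runtime import graph, which is the usual way to break such a cycle if one exists. A minimal standalone illustration (the `prompt_count` helper is hypothetical, and it assumes `AIConfigRuntime` exposes a `prompts` list as in the aiconfig schema):

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by type checkers, never at runtime, so it cannot
    # participate in a circular import.
    from aiconfig import AIConfigRuntime


def prompt_count(aiconfig: "AIConfigRuntime") -> int:
    # The string annotation keeps the reference lazy at runtime.
    return len(aiconfig.prompts)
```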

@rossdanlm (Contributor) left a comment

Sorry, just FYI there will be some merge conflicts due to #862. I can commandeer and resolve if you want, just let me know.

Edit: I realize that, unlike Facebook's commandeer, if I take this over it will create a new PR that will need to be approved again. In that case it makes sense for you to resolve it yourself. Just make sure all imports are included; they don't need to be alphabetical or anything, I can always fix forward.

saqadri added a commit that referenced this pull request Jan 10, 2024
Fix parameterization issues

Now that we have non-text input model parsers, we can see some issues in
our parameterization.

1) ASR and Image-to-Text model parsers should NOT be
`ParameterizedModelParser` instances, since their inputs cannot be
parameterized (see the sketch after this list).

2) Parameter resolution logic shouldn't throw errors in the case where
it's parsing prompts belonging to regular model parsers.
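
A minimal sketch of the intended split, using parser names from the test plans (the base-class import path is an assumption; the real module layout may differ):

```python
# Assumption: both base classes are importable from the aiconfig package.
from aiconfig import ModelParser, ParameterizedModelParser


class HuggingFaceImage2TextTransformer(ModelParser):
    """Attachment (image) inputs: no prompt template to parameterize."""


class HuggingFaceTextTranslationTransformer(ParameterizedModelParser):
    """Text inputs: templates can still resolve {{parameters}}."""
```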

Test Plan:
Ran the translation prompt for this aiconfig:

```
prompts: [
{
      "name": "image_caption",
      "input": {
        "attachments": [
          {
            "data": "https://s3.amazonaws.com/lastmileai.aiconfig.public/uploads/2024_1_10_18_41_31/1742/bear_with_honey.png",
            "mime_type": "image/png"
          },
          {
            "data": "https://s3.amazonaws.com/lastmileai.aiconfig.public/uploads/2024_1_10_18_41_31/7275/fox_in_forest.png",
            "mime_type": "image/png"
          }
        ]
      },
      "metadata": {
        "model": {
          "name": "HuggingFaceImage2TextTransformer",
          "settings": {
            "max_new_tokens": 10,
            "model": "Salesforce/blip-image-captioning-base"
          }
        },
        "parameters": {}
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 0,
          "data": "a bear sitting on a rock eating honey",
          "metadata": {}
        },
        {
          "output_type": "execute_result",
          "execution_count": 1,
          "data": "a red fox in the woods",
          "metadata": {}
        }
      ]
    },
    {
      "name": "translation",
      "input": "Once upon a time, in a lush and vibrant forest, there lived a magnificent creature known as the Quick Brown Fox. This fox was unlike any other, possessing incredible speed and agility that awed all the animals in the forest. With its fur as golden as the sun and its eyes as sharp as emeralds, the Quick Brown Fox was admired by everyone, from the tiniest hummingbird to the mightiest bear. The fox had a kind heart and would often lend a helping paw to those in need. The Quick Brown Fox had a particular fondness for games and challenges. It loved to test its skills against others, always seeking new adventures to satisfy its boundless curiosity. Its favorite game was called \"The Great Word Hunt,\" where it would embark on a quest to find hidden words scattered across the forest. \n\nOne day it got very old and died",
      "metadata": {
        "model": {
          "name": "HuggingFaceTextTranslationTransformer",
          "settings": {
            "model": "Helsinki-NLP/opus-mt-en-fr",
            "max_length": 100,
            "min_length": 50,
            "num_beams": 1
          }
        },
        "parameters": {}
      }
    }
]
```

Before:

```
File "/opt/homebrew/lib/python3.11/site-packages/aiconfig/util/params.py", line 235, in get_prompt_template
    raise Exception(f"Cannot get prompt template string from prompt: {prompt.input}")
Exception: Cannot get prompt template string from prompt: attachments=[Attachment(data='https://s3.amazonaws.com/lastmileai.aiconfig.public/uploads/2024_1_10_18_41_31/1742/bear_with_honey.png', mime_type='image/png', metadata=None), Attachment(data='https://s3.amazonaws.com/lastmileai.aiconfig.public/uploads/2024_1_10_18_41_31/7275/fox_in_forest.png', mime_type='image/png', metadata=None)] data=None
```

After:
* Beautiful translation

---
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/lastmile-ai/aiconfig/pull/866).
* #826
* __->__ #866
* #863
@saqadri merged commit 5e533b0 into main on Jan 10, 2024
saqadri added a commit that referenced this pull request Jan 10, 2024
Gradio Local Editor cookbook

---
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/lastmile-ai/aiconfig/pull/826).
* __->__ #826
* #866
* #863