"Efficient JSON generation following a JSON Schema" example running continuously, doesn't stop #826

mpetruc · 2024-04-18T17:02:19Z


Hi,
I just discovered your library, and it seems awesome! I tried running one of the examples in the README, but I'm obviously doing something wrong, because the generator never stops while GPU utilization stays at 100%. Here's the code:

import outlines.models
import outlines.models.transformers
import outlines
import torch
model_name = '/OpenHermes-2.5-Mistral-7B-2'
model = outlines.models.transformers(
    model_name,
    device="cuda",
    model_kwargs={'torch_dtype': torch.float16, 'attn_implementation': "flash_attention_2"},
)
schema = '''{
    "title": "Character",
    "type": "object",
    "properties": {
        "name": {
            "title": "Name",
            "maxLength": 10,
            "type": "string"
        },
        "age": {
            "title": "Age",
            "type": "integer"
        },
        "armor": {"$ref": "#/definitions/Armor"},
        "weapon": {"$ref": "#/definitions/Weapon"},
        "strength": {
            "title": "Strength",
            "type": "integer"
        }
    },
    "required": ["name", "age", "armor", "weapon", "strength"],
    "definitions": {
        "Armor": {
            "title": "Armor",
            "description": "An enumeration.",
            "enum": ["leather", "chainmail", "plate"],
            "type": "string"
        },
        "Weapon": {
            "title": "Weapon",
            "description": "An enumeration.",
            "enum": ["sword", "axe", "mace", "spear", "bow", "crossbow"],
            "type": "string"
        }
    }
}'''

generator = outlines.generate.json(model, schema)
character = generator("Give me a character description")
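
Once the call returns, the result can be sanity-checked against the schema. The following is a minimal sketch using only the standard library, assuming the result is a plain dict (or a JSON string already passed through json.loads). The helper check_against_schema and the trimmed SCHEMA are hypothetical, are not part of outlines, and cover only required, type, enum, maxLength and local $ref lookups:

```python
import json

# Trimmed, illustrative copy of the schema above (two fields only).
SCHEMA = json.loads("""
{
    "type": "object",
    "properties": {
        "name": {"type": "string", "maxLength": 10},
        "armor": {"$ref": "#/definitions/Armor"}
    },
    "required": ["name", "armor"],
    "definitions": {
        "Armor": {"enum": ["leather", "chainmail", "plate"], "type": "string"}
    }
}
""")


def check_against_schema(obj, schema):
    """Return a list of problems; an empty list means these basic checks passed."""
    problems = []
    definitions = schema.get("definitions", {})
    for key in schema.get("required", []):
        if key not in obj:
            problems.append(f"missing required field: {key}")
    for key, spec in schema.get("properties", {}).items():
        if key not in obj:
            continue
        # Resolve local references such as "#/definitions/Armor".
        if "$ref" in spec:
            spec = definitions[spec["$ref"].rsplit("/", 1)[-1]]
        value = obj[key]
        expected = spec.get("type")
        if expected == "string" and not isinstance(value, str):
            problems.append(f"{key}: expected a string")
        elif expected == "integer" and (isinstance(value, bool) or not isinstance(value, int)):
            problems.append(f"{key}: expected an integer")
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{key}: {value!r} is not an allowed value")
        if "maxLength" in spec and isinstance(value, str) and len(value) > spec["maxLength"]:
            problems.append(f"{key}: longer than {spec['maxLength']} characters")
    return problems


good = check_against_schema({"name": "Aria", "armor": "plate"}, SCHEMA)
bad = check_against_schema({"name": "A very long name", "armor": "silk"}, SCHEMA)
print(good)
print(bad)
```

For anything beyond a quick check, a full JSON Schema validator would be the safer choice, since this sketch ignores most of the specification.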

I'm using:

python 3.9.5
transformers 4.37.2
outlines 0.0.39
torch 2.0.1

16 GB RTX 3080
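
A version list like the one above can be gathered programmatically, which helps when comparing environments. This is a small sketch using the standard library's importlib.metadata (Python 3.8+); the helper name collect_versions is made up for illustration, and the package names are the ones listed in this post:

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages=("transformers", "outlines", "torch")):
    """Map each package name to its installed version, or 'not installed'."""
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            info[name] = "not installed"
    return info


for name, ver in collect_versions().items():
    print(name, ver)
```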

Any suggestions? Thank you!

Answered by mpetruc


mpetruc · 2024-04-18T17:13:46Z


I ran one more experiment: adding 'low_cpu_mem_usage': True to model_kwargs seemed to do the trick. Generation finished in 4 seconds. Here's the final command, in case someone runs into the same issue:

model = outlines.models.transformers(
    model_name,
    device="cuda",
    model_kwargs={'torch_dtype': torch.float16, 'low_cpu_mem_usage': True, 'attn_implementation': "flash_attention_2"},
)

Thank you!
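
To compare runs with and without a given setting, the generation call can be wrapped in a simple timer. The sketch below uses only the standard library; timed_call is a hypothetical helper, and the demo times the built-in sum as a stand-in, since the real generator needs the model and a GPU:

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in demo; with the code from this thread the call would be
# timed_call(generator, "Give me a character description").
total, seconds = timed_call(sum, [1, 2, 3])
print(f"result={total}, took {seconds:.6f}s")
```

Note that this only measures a call that returns; it does not interrupt a generation that hangs.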
