Endless generation bug popped up during migration to Guide in vLLM integration #856

Closed
br3no opened this issue May 2, 2024 · 5 comments · Fixed by #874
Labels
bug structured generation Linked to structured generation vLLM Things involving vLLM support

Comments

@br3no
Contributor

br3no commented May 2, 2024

Describe the issue as clearly as possible:

While updating the vLLM integration with outlines (cf. vllm-project/vllm#4109) to move to the new Guide API a test started failing. The symptom is that a regular expression that matches IPv4 addresses does not stop generating.

The original regular expression was: `r"((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.){3}(25[0-5]|(2[0-4]|1\d|[1-9]|)\d)"`
The issue can still be observed with the much simpler expression: `r"((25|(2|1\d|\d))\.){3}(25|(2|1\d|\d))"`

The issue was discussed in a call with @rlouf.

Things we tried that didn't change the result:

  • moving to outlines 0.0.36
  • changing the `RegexGuide` code to remove the `byte_fsm`, replacing it with `regex_pattern.to_fsm()`

Steps/code to reproduce the bug:

#
# To reproduce the error you need to pull the correct vllm fork, where the change to `Guides` is being worked on (https://github.com/br3no/vllm/tree/3715-outlines-guides) 
# 

import asyncio
import re
import pytest
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

from vllm.entrypoints.openai.protocol import CompletionRequest
from vllm.entrypoints.openai.serving_completion import OpenAIServingCompletion
from unittest.mock import Mock

pytestmark = pytest.mark.asyncio


async def test_guided_decoding_with_outlines():
    guided_regex = (r"((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.){3}"
                    r"(25[0-5]|(2[0-4]|1\d|[1-9]|)\d)")

    args = AsyncEngineArgs(
        model = "HuggingFaceH4/zephyr-7b-beta",
        dtype = "bfloat16",
        max_model_len = 8192,
        enforce_eager = True,
        max_num_seqs = 128)
    
    engine = AsyncLLMEngine.from_engine_args(args)

    openai_serving_completion = OpenAIServingCompletion(
        engine, [args.model])
    
    raw_request_mock = Mock()
    future_false = asyncio.Future()
    future_false.set_result(False)
    raw_request_mock.is_disconnected = Mock(return_value=future_false)
    completion = await openai_serving_completion.create_completion(
        CompletionRequest(
            model=args.model,
            prompt=f"Give an example IPv4 address with this regex: {guided_regex}",
            n=1,
            temperature=1.0,
            max_tokens=20,
            guided_regex=guided_regex,
            guided_decoding_backend="outlines"
        ),
        raw_request_mock
    )

    for choice in completion.choices:
        assert len(choice.text) <= 15
        assert re.match(guided_regex, choice.text) is not None

Expected result:

The test should pass. Instead it fails, generating over-long outputs such as `10.0.0.1282552550000`.
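For reference, the failing output violates the length assertion in the test above but not the `re.match` assertion, because `re.match` only anchors at the start of the string. A small stdlib-only check:

```python
import re

guided_regex = (r"((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.){3}"
                r"(25[0-5]|(2[0-4]|1\d|[1-9]|)\d)")

bad_output = "10.0.0.1282552550000"

# re.match anchors only at the start, so it still finds a valid
# IPv4 prefix ("10.0.0.128") inside the over-generated string:
print(re.match(guided_regex, bad_output) is not None)      # True
# A dotted-quad IPv4 address is at most 15 characters, which is
# why the test additionally bounds the length:
print(len(bad_output) <= 15)                               # False
# re.fullmatch shows the whole string is not a valid address:
print(re.fullmatch(guided_regex, bad_output) is not None)  # False
```

This is why the test needs both assertions: without the length bound, the prefix match alone would let the endless-generation bug slip through.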

Error message:

No response

Outlines/Python version information:


0.0.39
Python 3.8.18 (default, Oct  2 2023, 15:02:11) 
[GCC 9.4.0]
accelerate==0.27.2
ai2-olmo==0.2.5
aiohttp==3.9.3
aioprometheus==23.12.0
aiosignal==1.3.1
annotated-types==0.6.0
antlr4-python3-runtime==4.9.3
anyio==4.2.0
async-timeout==4.0.3
attrs==23.2.0
awscli==1.32.83
boto3==1.34.83
botocore==1.34.83
cached_path==1.6.2
cachetools==5.3.3
certifi==2024.2.2
charset-normalizer==3.3.2
click==8.1.7
cloudpickle==3.0.0
cmake==3.29.2
codespell==2.2.6
colorama==0.4.4
cupy-cuda12x==12.1.0
diskcache==5.6.3
distro==1.9.0
docutils==0.16
einops==0.7.0
exceptiongroup==1.2.0
fastapi==0.109.2
fastrlock==0.8.2
filelock==3.13.1
frozenlist==1.4.1
fsspec==2024.2.0
google-api-core==2.18.0
google-auth==2.29.0
google-cloud-core==2.4.1
google-cloud-storage==2.16.0
google-crc32c==1.5.0
google-resumable-media==2.7.0
googleapis-common-protos==1.63.0
h11==0.14.0
hiredis==2.3.2
httpcore==1.0.4
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.20.3
idna==3.6
importlib-resources==6.1.1
importlib_metadata==7.0.2
iniconfig==2.0.0
interegular==0.3.3
isort==5.13.2
Jinja2==3.1.3
jmespath==1.0.1
joblib==1.3.2
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
lark==1.1.9
libnacl==2.1.0
llvmlite==0.41.1
lm-format-enforcer==0.9.3
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mpmath==1.3.0
msgpack==1.0.7
multidict==6.0.5
mypy==1.9.0
mypy-extensions==1.0.0
nest-asyncio==1.6.0
networkx==3.1
ninja==1.11.1.1
numba==0.58.1
numpy==1.24.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
omegaconf==2.3.0
openai==1.13.3
orjson==3.9.13
outlines==0.0.39
packaging==23.2
peft==0.9.0
pillow==10.3.0
pkgutil_resolve_name==1.3.10
pluggy==1.4.0
prometheus_client==0.20.0
proto-plus==1.23.0
protobuf==4.25.2
psutil==5.9.8
py==1.11.0
py-cpuinfo==9.0.0
pyasn1==0.6.0
pyasn1_modules==0.4.0
pydantic==2.6.1
pydantic_core==2.16.2
Pygments==2.17.2
pynvml==11.5.0
pytest==8.0.2
pytest-asyncio==0.23.5
pytest-forked==1.6.0
pytest-rerunfailures==13.0
pytest-shard==0.1.2
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
PyYAML==6.0.1
quantile-python==1.1
ray==2.9.2
redis==5.0.3
referencing==0.33.0
regex==2023.12.25
requests==2.31.0
rich==13.7.1
rpds-py==0.17.1
rsa==4.7.2
ruff==0.1.5
s3transfer==0.10.1
safetensors==0.4.2
scipy==1.10.1
sentencepiece==0.1.99
six==1.16.0
sniffio==1.3.0
starlette==0.36.3
sympy==1.12
tensorizer==2.9.0a0
tiktoken==0.6.0
tokenizers==0.19.1
toml==0.10.2
tomli==2.0.1
torch==2.1.2
tqdm==4.66.1
transformers==4.40.0
triton==2.1.0
types-PyYAML==6.0.12.12
types-requests==2.31.0.6
types-setuptools==69.1.0.20240308
types-urllib3==1.26.25.14
typing_extensions==4.9.0
urllib3==1.26.18
uvicorn==0.27.0.post1
uvloop==0.19.0
-e git+ssh://[email protected]/br3no/vllm.git@e6ffb1af2e904436473568f9d43131c998e55063#egg=vllm
watchfiles==0.21.0
websockets==12.0
xformers==0.0.23.post1
yapf==0.32.0
yarl==1.9.4
zipp==3.17.0

Context for the issue:

No response

@lapp0
Contributor

lapp0 commented May 10, 2024

Thanks for the issue and for your work integrating Outlines' new Guide interface into vLLM, @br3no!

Could you please check PR #884? I believe I've reproduced your issue and resolved it there; your test case passes with the PR on my end.

You can prototype the fix with `pip install git+https://github.com/lapp0/outlines@issue-856`

Please feel free to ping me if you run into any issues with this PR, or with integrations between Outlines and vLLM.

@br3no
Contributor Author

br3no commented May 10, 2024

@lapp0 Your PR is very similar to mine: #874. The only difference is that in my PR the eos token and nonexistent states both map to the virtual final state -1, while in yours the eos token maps to -1 and nonexistent states always transition to themselves.

I believe they are functionally equivalent.

I don't really mind which one gets merged; the important thing is getting this fix!
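The difference described above can be sketched with a toy transition function (hypothetical names and a toy FSM, not the actual code from either PR; `EOS` and the state set are assumptions for illustration):

```python
FINAL = -1               # virtual final state, as in both PRs
EOS = "<eos>"            # assumed end-of-sequence token (illustrative)
TRANSITIONS = {(0, "a"): 1}   # toy (state, token) -> next-state map
KNOWN_STATES = {0, 1}

def next_state_874(state, token):
    # Sketch of #874: EOS and nonexistent states both map to -1.
    if token == EOS or state not in KNOWN_STATES:
        return FINAL
    return TRANSITIONS.get((state, token), FINAL)

def next_state_884(state, token):
    # Sketch of #884: EOS maps to -1; a nonexistent state
    # transitions to itself.
    if token == EOS:
        return FINAL
    if state not in KNOWN_STATES:
        return state
    return TRANSITIONS.get((state, token), FINAL)

# Both strategies stop generation at EOS; they differ only in where
# an unknown state lands, and both destinations are dead sinks.
print(next_state_874(99, "a"), next_state_884(99, "a"))  # -1 99
```

Under this sketch the two are indeed functionally equivalent: an unknown state can never re-enter a live state in either variant.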

@br3no
Contributor Author

br3no commented May 10, 2024

And we should probably create a new issue for this: #874 (comment)

Right?

@lapp0
Contributor

lapp0 commented May 10, 2024

Oops, sorry I missed your PR. Yes, they're functionally identical.

And I agree, this calls for a new issue. Does this seem reasonable? #885

@br3no
Contributor Author

br3no commented May 10, 2024

Yes!

rlouf pushed a commit that referenced this issue May 11, 2024
Fixes #856

The code this PR removes introduced an artificial and erroneous loop
transition in every final state, which was always traversed regardless
of what had been generated.

The comment doesn't make sense in my opinion, as the `if` above just
handles exactly this case.

Removing this piece of code fixes the bug that surfaced in the upgrade
of outlines in the vLLM integration.
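The effect of such a spurious loop can be sketched with a toy automaton (illustrative only, not the actual outlines code): with a self-loop added in the final state, ordinary tokens remain legal after the match is complete, so generation never has to stop.

```python
EOS = "<eos>"

def make_transitions(buggy: bool):
    # Toy FSM over single-character "tokens" that accepts exactly
    # "ab"; state 2 is final, and -1 is the virtual final state
    # reached via the EOS token.
    trans = {
        (0, "a"): 1,
        (1, "b"): 2,
        (2, EOS): -1,  # in a final state, only EOS should be legal
    }
    if buggy:
        # The removed code effectively added loop transitions in
        # final states, keeping ordinary tokens legal there.
        for tok in ("a", "b"):
            trans[(2, tok)] = 2
    return trans

def allowed(trans, state):
    # Tokens the guide would permit in the given state.
    return sorted(tok for (s, tok) in trans if s == state)

print(allowed(make_transitions(buggy=False), 2))  # ['<eos>']
print(allowed(make_transitions(buggy=True), 2))   # ['<eos>', 'a', 'b']
```

In the buggy variant the sampler can keep picking `a` or `b` forever instead of being forced to emit EOS, which is exactly the endless-generation symptom from the original report.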