Endless generation bug popped up during migration to Guide in vLLM integration #856

Closed
br3no opened this issue May 2, 2024 · 5 comments · Fixed by #874
Labels
bug structured generation Linked to structured generation vLLM Things involving vLLM support

Comments

@br3no
Contributor

br3no commented May 2, 2024

Describe the issue as clearly as possible:

While updating the vLLM integration with outlines (cf. vllm-project/vllm#4109) to move to the new Guide API a test started failing. The symptom is that a regular expression that matches IPv4 addresses does not stop generating.

The original regular expression was: `r"((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.){3}(25[0-5]|(2[0-4]|1\d|[1-9]|)\d)"`
The issue can still be observed with the much simpler expression: `r"((25|(2|1\d|\d))\.){3}(25|(2|1\d|\d))"`

The issue was discussed in a call with @rlouf.

Things we tried that didn't change the result:

  • moving to outlines 0.0.36
  • changing the `RegexGuide` code to remove the `byte_fsm`, replacing it with `regex_pattern.to_fsm()`

Steps/code to reproduce the bug:

#
# To reproduce the error you need to pull the correct vllm fork, where the change to `Guides` is being worked on (https://github.com/br3no/vllm/tree/3715-outlines-guides) 
# 

import asyncio
import re
import pytest
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

from vllm.entrypoints.openai.protocol import CompletionRequest
from vllm.entrypoints.openai.serving_completion import OpenAIServingCompletion
from unittest.mock import Mock

pytestmark = pytest.mark.asyncio


async def test_guided_decoding_with_outlines():
    guided_regex = (r"((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.){3}"
                    r"(25[0-5]|(2[0-4]|1\d|[1-9]|)\d)")

    args = AsyncEngineArgs(
        model = "HuggingFaceH4/zephyr-7b-beta",
        dtype = "bfloat16",
        max_model_len = 8192,
        enforce_eager = True,
        max_num_seqs = 128)
    
    engine = AsyncLLMEngine.from_engine_args(args)

    openai_serving_completion = OpenAIServingCompletion(
        engine, [args.model])
    
    raw_request_mock = Mock()
    future_false = asyncio.Future()
    future_false.set_result(False)
    raw_request_mock.is_disconnected = Mock(return_value=future_false)
    completion = await openai_serving_completion.create_completion(
        CompletionRequest(
            model=args.model,
            prompt=f"Give an example IPv4 address with this regex: {guided_regex}",
            n=1,
            temperature=1.0,
            max_tokens=20,
            guided_regex=guided_regex,
            guided_decoding_backend="outlines"
        ),
        raw_request_mock
    )

    for choice in completion.choices:
        assert len(choice.text) <= 15
        assert re.match(guided_regex, choice.text) is not None

Expected result:

The test should pass. Instead it fails, generating over-long outputs such as `10.0.0.1282552550000`.
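For reference, the failing output violates the length assertion in the test above but not the `re.match` assertion, because `re.match` only anchors at the start of the string. A small stdlib-only check:

```python
import re

guided_regex = (r"((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.){3}"
                r"(25[0-5]|(2[0-4]|1\d|[1-9]|)\d)")

bad_output = "10.0.0.1282552550000"

# re.match anchors only at the start, so it still finds a valid
# IPv4 prefix ("10.0.0.128") inside the over-generated string:
print(re.match(guided_regex, bad_output) is not None)      # True
# A dotted-quad IPv4 address is at most 15 characters, which is
# why the test additionally bounds the length:
print(len(bad_output) <= 15)                               # False
# re.fullmatch shows the whole string is not a valid address:
print(re.fullmatch(guided_regex, bad_output) is not None)  # False
```

This is why the test needs both assertions: without the length bound, the prefix match alone would let the endless-generation bug slip through.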

Error message:

No response

Outlines/Python version information:


0.0.39
Python 3.8.18 (default, Oct  2 2023, 15:02:11) 
[GCC 9.4.0]
accelerate==0.27.2
ai2-olmo==0.2.5
aiohttp==3.9.3
aioprometheus==23.12.0
aiosignal==1.3.1
annotated-types==0.6.0
antlr4-python3-runtime==4.9.3
anyio==4.2.0
async-timeout==4.0.3
attrs==23.2.0
awscli==1.32.83
boto3==1.34.83
botocore==1.34.83
cached_path==1.6.2
cachetools==5.3.3
certifi==2024.2.2
charset-normalizer==3.3.2
click==8.1.7
cloudpickle==3.0.0
cmake==3.29.2
codespell==2.2.6
colorama==0.4.4
cupy-cuda12x==12.1.0
diskcache==5.6.3
distro==1.9.0
docutils==0.16
einops==0.7.0
exceptiongroup==1.2.0
fastapi==0.109.2
fastrlock==0.8.2
filelock==3.13.1
frozenlist==1.4.1
fsspec==2024.2.0
google-api-core==2.18.0
google-auth==2.29.0
google-cloud-core==2.4.1
google-cloud-storage==2.16.0
google-crc32c==1.5.0
google-resumable-media==2.7.0
googleapis-common-protos==1.63.0
h11==0.14.0
hiredis==2.3.2
httpcore==1.0.4
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.20.3
idna==3.6
importlib-resources==6.1.1
importlib_metadata==7.0.2
iniconfig==2.0.0
interegular==0.3.3
isort==5.13.2
Jinja2==3.1.3
jmespath==1.0.1
joblib==1.3.2
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
lark==1.1.9
libnacl==2.1.0
llvmlite==0.41.1
lm-format-enforcer==0.9.3
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mpmath==1.3.0
msgpack==1.0.7
multidict==6.0.5
mypy==1.9.0
mypy-extensions==1.0.0
nest-asyncio==1.6.0
networkx==3.1
ninja==1.11.1.1
numba==0.58.1
numpy==1.24.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
omegaconf==2.3.0
openai==1.13.3
orjson==3.9.13
outlines==0.0.39
packaging==23.2
peft==0.9.0
pillow==10.3.0
pkgutil_resolve_name==1.3.10
pluggy==1.4.0
prometheus_client==0.20.0
proto-plus==1.23.0
protobuf==4.25.2
psutil==5.9.8
py==1.11.0
py-cpuinfo==9.0.0
pyasn1==0.6.0
pyasn1_modules==0.4.0
pydantic==2.6.1
pydantic_core==2.16.2
Pygments==2.17.2
pynvml==11.5.0
pytest==8.0.2
pytest-asyncio==0.23.5
pytest-forked==1.6.0
pytest-rerunfailures==13.0
pytest-shard==0.1.2
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
PyYAML==6.0.1
quantile-python==1.1
ray==2.9.2
redis==5.0.3
referencing==0.33.0
regex==2023.12.25
requests==2.31.0
rich==13.7.1
rpds-py==0.17.1
rsa==4.7.2
ruff==0.1.5
s3transfer==0.10.1
safetensors==0.4.2
scipy==1.10.1
sentencepiece==0.1.99
six==1.16.0
sniffio==1.3.0
starlette==0.36.3
sympy==1.12
tensorizer==2.9.0a0
tiktoken==0.6.0
tokenizers==0.19.1
toml==0.10.2
tomli==2.0.1
torch==2.1.2
tqdm==4.66.1
transformers==4.40.0
triton==2.1.0
types-PyYAML==6.0.12.12
types-requests==2.31.0.6
types-setuptools==69.1.0.20240308
types-urllib3==1.26.25.14
typing_extensions==4.9.0
urllib3==1.26.18
uvicorn==0.27.0.post1
uvloop==0.19.0
-e git+ssh://[email protected]/br3no/vllm.git@e6ffb1af2e904436473568f9d43131c998e55063#egg=vllm
watchfiles==0.21.0
websockets==12.0
xformers==0.0.23.post1
yapf==0.32.0
yarl==1.9.4
zipp==3.17.0

Context for the issue:

No response

@lapp0
Contributor

lapp0 commented May 10, 2024

Thanks for the issue and for your work integrating Outlines' new Guide interface into vLLM, @br3no!

Could you please check PR #884? I believe I've reproduced your issue and resolved it there; your test case passes with the PR on my end.

You can prototype the fix with `pip install git+https://github.com/lapp0/outlines@issue-856`

Please feel free to ping me if you run into any issues with this PR, or with integrations between Outlines and vLLM.

@br3no
Contributor Author

br3no commented May 10, 2024

@lapp0 Your PR is very similar to mine: #874. The only difference is that in my PR the eos token and nonexistent states both map to the virtual final state -1, while in yours the eos token maps to -1 and nonexistent states always transition to themselves.

I believe they are functionally equivalent.

I don't really mind which one gets merged; the important thing is getting this fix!
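The difference described above can be sketched with a toy transition function (hypothetical names and a toy FSM, not the actual code from either PR; `EOS` and the state set are assumptions for illustration):

```python
FINAL = -1               # virtual final state, as in both PRs
EOS = "<eos>"            # assumed end-of-sequence token (illustrative)
TRANSITIONS = {(0, "a"): 1}   # toy (state, token) -> next-state map
KNOWN_STATES = {0, 1}

def next_state_874(state, token):
    # Sketch of #874: EOS and nonexistent states both map to -1.
    if token == EOS or state not in KNOWN_STATES:
        return FINAL
    return TRANSITIONS.get((state, token), FINAL)

def next_state_884(state, token):
    # Sketch of #884: EOS maps to -1; a nonexistent state
    # transitions to itself.
    if token == EOS:
        return FINAL
    if state not in KNOWN_STATES:
        return state
    return TRANSITIONS.get((state, token), FINAL)

# Both strategies stop generation at EOS; they differ only in where
# an unknown state lands, and both destinations are dead sinks.
print(next_state_874(99, "a"), next_state_884(99, "a"))  # -1 99
```

Under this sketch the two are indeed functionally equivalent: an unknown state can never re-enter a live state in either variant.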

@br3no
Contributor Author

br3no commented May 10, 2024

And we should probably create a new issue for this: #874 (comment)

Right?

@lapp0
Contributor

lapp0 commented May 10, 2024

Oops, sorry I missed your PR. Yes, they're functionally identical.

And I agree, this calls for a new issue. Does this seem reasonable? #885

@br3no
Contributor Author

br3no commented May 10, 2024

Yes!

rlouf pushed a commit that referenced this issue May 11, 2024
Fixes #856

The code this PR removes introduced an artificial and erroneous loop
transition in every final state, which was always traversed regardless
of what had been generated.

The comment doesn't make sense in my opinion, as the `if` above just
handles exactly this case.

Removing this piece of code fixes the bug that surfaced in the upgrade
of outlines in the vLLM integration.
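The effect of such a spurious loop can be sketched with a toy automaton (illustrative only, not the actual outlines code): with a self-loop added in the final state, ordinary tokens remain legal after the match is complete, so generation never has to stop.

```python
EOS = "<eos>"

def make_transitions(buggy: bool):
    # Toy FSM over single-character "tokens" that accepts exactly
    # "ab"; state 2 is final, and -1 is the virtual final state
    # reached via the EOS token.
    trans = {
        (0, "a"): 1,
        (1, "b"): 2,
        (2, EOS): -1,  # in a final state, only EOS should be legal
    }
    if buggy:
        # The removed code effectively added loop transitions in
        # final states, keeping ordinary tokens legal there.
        for tok in ("a", "b"):
            trans[(2, tok)] = 2
    return trans

def allowed(trans, state):
    # Tokens the guide would permit in the given state.
    return sorted(tok for (s, tok) in trans if s == state)

print(allowed(make_transitions(buggy=False), 2))  # ['<eos>']
print(allowed(make_transitions(buggy=True), 2))   # ['<eos>', 'a', 'b']
```

In the buggy variant the sampler can keep picking `a` or `b` forever instead of being forced to emit EOS, which is exactly the endless-generation symptom from the original report.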