
Add left recursion check: quit early instead of going into an infinite loop #7083

Merged
merged 3 commits into ggerganov:master from left-recursion on May 14, 2024

Conversation

@nuchi (Contributor) commented May 5, 2024

Fixes #6492

Adds a check to see if the grammar has left recursion, and quits instead of going into an infinite loop.

I'm not sure if this is the best place for it, happy to move it around if not.

attn: @HanClinto

@github-actions bot commented May 5, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 558 iterations 🚀

Expand details for performance related PR only
  • Concurrent users: 8, duration: 10m
  • HTTP request: avg=8367.81ms p(95)=20438.81ms fails=, finish reason: stop=496 truncated=62
  • Prompt processing (pp): avg=101.6tk/s p(95)=436.12tk/s
  • Token generation (tg): avg=34.96tk/s p(95)=49.27tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=left-recursion commit=1bbf54a13c93e30d32f22326d4c9f46890fa1eb2

prompt_tokens_seconds: [chart of llamacpp:prompt_tokens_seconds over the 10m run]
predicted_tokens_seconds: [chart of llamacpp:predicted_tokens_seconds over the 10m run]

Details

kv_cache_usage_ratio: [chart of llamacpp:kv_cache_usage_ratio over the 10m run]
requests_processing: [chart of llamacpp:requests_processing over the 10m run]

@mofosyne added the labels bugfix (fixes an issue or bug) and review complexity: medium (generally requires more time to grok but manageable by beginner to medium expertise level) on May 9, 2024
llama.cpp Outdated
LLAMA_LEFT_REC_FOUND_CYCLE = 3,
};

static void detect_left_recursion(
Collaborator:

Minor formatting note -- internal functions and structures such as these should go in the section marked grammar - internal.

Also, because this is a grammar-specific function, it should probably be renamed llama_grammar_detect_left_recursion

@HanClinto (Collaborator) commented:

Nice work, @nuchi !

Very sorry for missing this when you first opened it -- I have adjusted my notification settings and hopefully I won't miss such things in the future.

This looks really good. I especially like your extensions of the integration tests -- well done!

Reading your code, I was initially worried that it might falsely flag examples of non-left recursion, but you handled that perfectly.

I like your approach of checking for left recursion as part of the grammar initialization. My first approach was to detect left recursion only at inference time (inside of llama_grammar_advance_stack), but that is non-deterministic -- because if a user has left recursion buried deep in their grammar, it's possible one could generate text that doesn't hit it -- but your check is exhaustive, which I think is better than how I was going to approach it.

Looking at the code, I think it can be simplified quite a bit. For instance, you're doing a full walk through the tree every time, but it's possible to exit early as soon as recursion is found.

Also, the custom enumeration is cool (and very thorough!), but it might be a bit of overkill for our needs. All we need to do is track the list of rule references visited on each branch, and check whether each one has been visited before. I took your base code and stripped it down a bit:

static bool llama_grammar_detect_left_recursion(
        const std::vector<std::vector<llama_grammar_element>> & rules,
        size_t                                                  rule_index,
        std::vector<size_t>                                   * rules_visited) {

    if (rules_visited == nullptr) {
        rules_visited = new std::vector<size_t>();
    }

    // Check to see if this rule index has already been visited
    if (std::find(rules_visited->begin(), rules_visited->end(), rule_index) != rules_visited->end()) {
        // If it has, then we're in a loop
        return true;
    }

    // Mark that we have looked at this rule
    rules_visited->emplace_back(rule_index);

    // Recurse through this rule's elements
    const std::vector<llama_grammar_element> & rule = rules[rule_index];
    size_t i = 0;
    do {
        // If the first element in a rule sequence is a rule reference, recurse into that rule
        if (rule[i].type == LLAMA_GRETYPE_RULE_REF) {
            // Create a new vector to store the list of rules visited in this branch
            std::vector<size_t> rules_visited_branch(*rules_visited);
            if (llama_grammar_detect_left_recursion(rules, (size_t)rule[i].value, &rules_visited_branch)) {
                // Early exit if we detect left recursion
                return true;
            }
        }
        // Skip over all other elements in this sequence -- even if they are rule references.
        // We only care about detecting left recursion -- not all recursion.
        while (!llama_grammar_is_end_of_sequence(&rule[i])) {
            i++;
        }
        i++;
    } while (i < rule.size());

    return false;
}

And then this would be called from llama_grammar_init in this way:

    // Check for left recursion
    for (size_t i = 0; i < n_rules; i++) {
        if (llama_grammar_detect_left_recursion(vec_rules, i, nullptr)) {
            throw std::runtime_error(format("unsupported grammar, left recursion detected for nonterminal at index %d", (int)i));
        }
    }

This is not new code -- I only adapted what you already had here and tried to reduce it to its bare essentials.

Overall I love your approach, and it's much better than the direction I was originally taking. Well done!

Comment on lines 31 to 43
static bool test_build_grammar_fails(const std::string & grammar_str) {
    bool grammar_fails = false;
    try {
        build_grammar(grammar_str);
        fprintf(stderr, "❌ Expected build failure, but succeeded: %s\n", grammar_str.c_str());
    } catch (const std::exception & err) {
        grammar_fails = true;
        fprintf(stdout, "✅︎\n");
    }
    return grammar_fails;
}

Collaborator:

Minor formatting adjustment so that we print the grammar on success as well.

Suggested change
static bool test_build_grammar_fails(const std::string & grammar_str) {
    bool grammar_fails = false;
    try {
        build_grammar(grammar_str);
        fprintf(stderr, "❌ Expected build failure, but succeeded: %s\n", grammar_str.c_str());
    } catch (const std::exception & err) {
        grammar_fails = true;
        fprintf(stdout, "✅︎\n");
    }
    return grammar_fails;
}

static bool test_build_grammar_fails(const std::string & grammar_str) {
    fprintf(stderr, "⚫ Testing failure for grammar: %s\n", grammar_str.c_str());
    bool grammar_fails = false;
    try {
        build_grammar(grammar_str);
        fprintf(stderr, " ❌ Expected build failure, but succeeded\n");
    } catch (const std::exception & err) {
        grammar_fails = true;
        fprintf(stdout, " ✅︎\n");
    }
    return grammar_fails;
}

@HanClinto (Collaborator) commented:

I especially like how you extended the integration tests here. Super well done!

@nuchi (Contributor, Author) commented May 10, 2024

Great suggestions, thank you, good idea to ditch the enum. I'll incorporate them with maybe some minor changes — I also just realized I missed an edge case where the leftmost nonterminal(s) might be empty (in which case I also need to recurse into the next-leftmost nonterminal).

Very sorry for missing this when you first opened it -- I have adjusted my notification settings and hopefully I won't miss such things in the future.

Sorry about that and probably my fault — I edited your @ into my initial post, so maybe that's why it didn't ping you.

Finally, I'll bring up that it's also a possibility to return nullptr and leave the handling to the caller. That'd be messy to do right now, since callers aren't expecting it, but it might be preferable in the case of e.g. a server accepting grammars over the wire, where we don't want the server to fall over on a bad request.

@HanClinto (Collaborator) commented May 10, 2024

I also just realized I missed an edge case where the leftmost nonterminal(s) might be empty (in which case I also need to recurse into the next-leftmost nonterminal).

Oh nice! Were you able to build this into another test case?

Very sorry for missing this when you first opened it -- I have adjusted my notification settings and hopefully I won't miss such things in the future.

Sorry about that and probably my fault — I edited your @ into my initial post, so maybe that's why it didn't ping you.

That might have contributed -- I was also being e-mailed on every PR, which is cool sometimes, but it added a bit too much noise, so now I'm back down to participating and mentions. :)

Finally I bring up that it's also a possibility to return nullptr and leave handling to the caller. That'd be messy to do right now since callers aren't expecting it, but also it might be preferable in the case of e.g. a server accepting grammars over the wire where we don't want the server to fall over on a bad request.

I agree -- that feels cleaner.

Along those same lines, you might consider moving this into llama_sampling_init() (in sampling.cpp) where the other grammar correctness checks are made -- i.e., right next to where the root node check is made. Checking for left recursion could come right after that, and return a nullptr in the failing case. Then all of the validation checks are made in the same place.

The only awkward bit with that implementation is that you wouldn't already have the rules split up into those nice vectors, so it might mean a bit more pointer math at that stage of processing, but it's something to consider.

nuchi added 2 commits May 11, 2024 16:30
…internal" section, add handling for edge case where a leftmost nonterminal may be empty
@nuchi (Contributor, Author) commented May 11, 2024

Pushed an update but I still intend to take a look at the interface and maybe return nullptr. I may also look at removing some code repetition.

@nuchi (Contributor, Author) commented May 12, 2024

Actually, I might leave it as is for now if that looks good to you. I have some ideas about interface changes but I think that'd be a larger change that's decoupled from this one. As of now, the changes here are self-contained and strictly improvements to the status quo, in that the thrown exception here is what would have been a stack overflow or OOM in the status quo.

@HanClinto (Collaborator) commented:

Actually, I might leave it as is for now if that looks good to you. I have some ideas about interface changes but I think that'd be a larger change that's decoupled from this one. As of now, the changes here are self-contained and strictly improvements to the status quo, in that the thrown exception here is what would have been a stack overflow or OOM in the status quo.

Yes, I think that's a fair way to approach it. Especially with the server (that can accept dynamic grammars as part of the query request), I don't think that a bad grammar should be able to crash the server -- it should probably gracefully return an error.

That said, such functionality can be added later, and as you say -- everything here is an incremental improvement from the status quo.

@HanClinto (Collaborator) left a review:

Looks good to me!

@mofosyne (Collaborator) commented May 14, 2024

@nuchi added tests/test-grammar-integration.cpp and the CI approved. @HanClinto and @nuchi both agree that this PR is good enough as a starting point for further incremental improvements.

At a glance the reviewed code does not have any obvious issues, and there have already been back-and-forth reviews.

Safe to merge.

@mofosyne mofosyne merged commit e0f5561 into ggerganov:master May 14, 2024
64 checks passed
@nuchi nuchi deleted the left-recursion branch May 14, 2024 07:03
@mofosyne (Collaborator) commented May 14, 2024

@nuchi looks like 'CI / windows-latest-cmake (avx512, -DLLAMA_NATIVE=OFF -' is failing on the main branch. See if you can track down the issue and make a new PR for that.

 24 - test-json-schema-to-grammar (Timeout)

@mofosyne (Collaborator) commented May 14, 2024

Actually, scratch that. The commit ahead showed that this particular error did not happen again.

It's a timeout, so it must be a fluke. I wonder if I can just try kicking off that test again.

edit: CI all passes. So it's definitely a fluke.

teleprint-me pushed a commit to teleprint-me/llama.cpp that referenced this pull request May 17, 2024
…e loop (ggerganov#7083)

* Add left recursion check: quit early instead of going into an infinite loop

* Remove custom enum, rename left recursion check and move to "grammar internal" section, add handling for edge case where a leftmost nonterminal may be empty

* Remove unnecessary declaration
Labels
bugfix (fixes an issue or bug); review complexity: medium (generally requires more time to grok but manageable by beginner to medium expertise level)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Segfault when parsing grammars with left-recursion
3 participants