
Add left recursion check: quit early instead of going into an infinite loop #7083

Merged
merged 3 commits into ggerganov:master from left-recursion on May 14, 2024

Conversation

@nuchi (Contributor) commented May 5, 2024

Fixes #6492

Adds a check to see if the grammar has left recursion, and quits instead of going into an infinite loop.

I'm not sure if this is the best place for it, happy to move it around if not.

attn: @HanClinto

@github-actions bot commented May 5, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 558 iterations 🚀

Expand details for performance related PR only
  • Concurrent users: 8, duration: 10m
  • HTTP request: avg=8367.81ms p(95)=20438.81ms fails=, finish reason: stop=496 truncated=62
  • Prompt processing (pp): avg=101.6tk/s p(95)=436.12tk/s
  • Token generation (tg): avg=34.96tk/s p(95)=49.27tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=left-recursion commit=1bbf54a13c93e30d32f22326d4c9f46890fa1eb2

prompt_tokens_seconds: [chart of llamacpp:prompt_tokens_seconds over the 10m run]
predicted_tokens_seconds: [chart of llamacpp:predicted_tokens_seconds over the 10m run]

Details

kv_cache_usage_ratio: [chart of llamacpp:kv_cache_usage_ratio over the 10m run]
requests_processing: [chart of llamacpp:requests_processing over the 10m run]

@mofosyne added the labels bugfix (fixes an issue or bug) and review complexity: medium (generally requires more time to grok but manageable by beginner to medium expertise level) on May 9, 2024
llama.cpp Outdated
LLAMA_LEFT_REC_FOUND_CYCLE = 3,
};

static void detect_left_recursion(
Collaborator:

Minor formatting note -- internal functions and structures such as these should go in the section marked grammar - internal.

Also, because this is a grammar-specific function, it should probably be renamed llama_grammar_detect_left_recursion

@HanClinto (Collaborator) commented:

Nice work, @nuchi !

Very sorry for missing this when you first opened it -- I have adjusted my notification settings and hopefully I won't miss such things in the future.

This looks really good. I especially like your extensions of the integration tests -- well done!

Reading your code, I was initially worried that it might falsely flag examples of non-left recursion, but you handled that perfectly.

I like your approach of checking for left recursion as part of the grammar initialization. My first approach was to detect left recursion only at inference time (inside of llama_grammar_advance_stack), but that is non-deterministic -- because if a user has left recursion buried deep in their grammar, it's possible one could generate text that doesn't hit it -- but your check is exhaustive, which I think is better than how I was going to approach it.

Looking at the code, I think it can be simplified quite a bit. For instance, you're doing a full walk through the tree every time, but it's possible to exit early as soon as recursion is found.

Also, the custom enumeration is cool (and very thorough!), but it might be a bit of overkill for our needs. All we need to do is track the list of rule references visited on each branch, and check whether each one has been visited before. I took your base code and stripped it down a bit:

static bool llama_grammar_detect_left_recursion(
        const std::vector<std::vector<llama_grammar_element>> & rules,
        size_t                                                  rule_index,
        std::vector<size_t>                                   * rules_visited) {

    if (rules_visited == nullptr) {
        rules_visited = new std::vector<size_t>();
    }

    // Check to see if this rule index has already been visited
    if (std::find(rules_visited->begin(), rules_visited->end(), rule_index) != rules_visited->end()) {
        // If it has, then we're in a loop
        return true;
    }

    // Mark that we have looked at this rule
    rules_visited->emplace_back(rule_index);

    // Recurse through this rule's elements
    const std::vector<llama_grammar_element> & rule = rules[rule_index];
    size_t i = 0;
    do {
        // If the first element in a rule sequence is a rule reference, recurse into that rule
        if (rule[i].type == LLAMA_GRETYPE_RULE_REF) {
            // Create a new vector to store the list of rules visited in this branch
            std::vector<size_t> rules_visited_branch(*rules_visited);
            if (llama_grammar_detect_left_recursion(rules, (size_t)rule[i].value, &rules_visited_branch)) {
                // Early exit if we detect left recursion
                return true;
            }
        }
        // Skip over all other elements in this sequence -- even if they are rule references.
        // We only care about detecting left recursion -- not all recursion.
        while (!llama_grammar_is_end_of_sequence(&rule[i])) {
            i++;
        }
        i++;
    } while (i < rule.size());

    return false;
}

And then this would be called from llama_grammar_init in this way:

    // Check for left recursion
    for (size_t i = 0; i < n_rules; i++) {
        if (llama_grammar_detect_left_recursion(vec_rules, i, nullptr)) {
            throw std::runtime_error(format("unsupported grammar, left recursion detected for nonterminal at index %d", (int)i));
        }
    }

This is not new code -- I only adapted what you already had here and tried to reduce it to its bare essentials.

Overall I love your approach, and it's much better than the direction I was originally taking. Well done!

Comment on lines 31 to 43
static bool test_build_grammar_fails(const std::string & grammar_str) {
    bool grammar_fails = false;
    try {
        build_grammar(grammar_str);
        fprintf(stderr, "❌ Expected build failure, but succeeded: %s\n", grammar_str.c_str());
    } catch (const std::exception & err) {
        grammar_fails = true;
        fprintf(stdout, "✅︎\n");
    }
    return grammar_fails;
}

Collaborator:

Minor formatting adjustment so that we print the grammar on success as well.

Suggested change
static bool test_build_grammar_fails(const std::string & grammar_str) {
    bool grammar_fails = false;
    try {
        build_grammar(grammar_str);
        fprintf(stderr, "❌ Expected build failure, but succeeded: %s\n", grammar_str.c_str());
    } catch (const std::exception & err) {
        grammar_fails = true;
        fprintf(stdout, "✅︎\n");
    }
    return grammar_fails;
}

static bool test_build_grammar_fails(const std::string & grammar_str) {
    fprintf(stderr, "⚫ Testing failure for grammar: %s\n", grammar_str.c_str());
    bool grammar_fails = false;
    try {
        build_grammar(grammar_str);
        fprintf(stderr, " ❌ Expected build failure, but succeeded\n");
    } catch (const std::exception & err) {
        grammar_fails = true;
        fprintf(stdout, " ✅︎\n");
    }
    return grammar_fails;
}

@HanClinto (Collaborator) commented:

I especially like how you extended the integration tests here. Super well done!

@nuchi (Contributor, Author) commented May 10, 2024

Great suggestions, thank you, good idea to ditch the enum. I'll incorporate them with maybe some minor changes — I also just realized I missed an edge case where the leftmost nonterminal(s) might be empty (in which case I also need to recurse into the next-leftmost nonterminal).

Very sorry for missing this when you first opened it -- I have adjusted my notification settings and hopefully I won't miss such things in the future.

Sorry about that and probably my fault — I edited your @ into my initial post, so maybe that's why it didn't ping you.

Finally, I'll bring up that it's also a possibility to return nullptr and leave the handling to the caller. That'd be messy to do right now, since callers aren't expecting it, but it might be preferable in the case of e.g. a server accepting grammars over the wire, where we don't want the server to fall over on a bad request.

@HanClinto (Collaborator) commented May 10, 2024

I also just realized I missed an edge case where the leftmost nonterminal(s) might be empty (in which case I also need to recurse into the next-leftmost nonterminal).

Oh nice! Were you able to build this into another test case?

Very sorry for missing this when you first opened it -- I have adjusted my notification settings and hopefully I won't miss such things in the future.

Sorry about that and probably my fault — I edited your @ into my initial post, so maybe that's why it didn't ping you.

That might have contributed -- I was also being e-mailed on every PR, which is cool sometimes, but it added a bit too much noise, so now I'm back down to participating and mentions. :)

Finally I bring up that it's also a possibility to return nullptr and leave handling to the caller. That'd be messy to do right now since callers aren't expecting it, but also it might be preferable in the case of e.g. a server accepting grammars over the wire where we don't want the server to fall over on a bad request.

I agree -- that feels cleaner.

Along those same lines, you might consider moving this into llama_sampling_init() (in sampling.cpp) where the other grammar correctness checks are made -- i.e., right next to where the root node check is made. Checking for left recursion could come right after that, and return a nullptr in the failing case. Then all of the validation checks are made in the same place.

The only awkward bit with that implementation is that you wouldn't already have the rules split up into those nice vectors, so it might mean a bit more pointer math at that stage of processing, but it's something to consider.

nuchi added 2 commits May 11, 2024 16:30
…internal" section, add handling for edge case where a leftmost nonterminal may be empty
@nuchi (Contributor, Author) commented May 11, 2024

Pushed an update but I still intend to take a look at the interface and maybe return nullptr. I may also look at removing some code repetition.

@nuchi (Contributor, Author) commented May 12, 2024

Actually, I might leave it as is for now if that looks good to you. I have some ideas about interface changes but I think that'd be a larger change that's decoupled from this one. As of now, the changes here are self-contained and strictly improvements to the status quo, in that the thrown exception here is what would have been a stack overflow or OOM in the status quo.

@HanClinto (Collaborator) commented:

Actually, I might leave it as is for now if that looks good to you. I have some ideas about interface changes but I think that'd be a larger change that's decoupled from this one. As of now, the changes here are self-contained and strictly improvements to the status quo, in that the thrown exception here is what would have been a stack overflow or OOM in the status quo.

Yes, I think that's a fair way to approach it. Especially with the server (that can accept dynamic grammars as part of the query request), I don't think that a bad grammar should be able to crash the server -- it should probably gracefully return an error.

That said, such functionality can be added later, and as you say -- everything here is an incremental improvement from the status quo.

@HanClinto (Collaborator) left a review:

Looks good to me!

@mofosyne (Collaborator) commented May 14, 2024

@nuchi added tests/test-grammar-integration.cpp and the CI approved. @HanClinto and @nuchi both agree that this PR is good enough as a starting point for further incremental improvements.

At a glance the reviewed code does not have any obvious issues, and there have already been back-and-forth reviews.

Safe to merge.

@mofosyne mofosyne merged commit e0f5561 into ggerganov:master May 14, 2024
64 checks passed
@nuchi nuchi deleted the left-recursion branch May 14, 2024 07:03
@mofosyne (Collaborator) commented May 14, 2024

@nuchi looks like 'CI / windows-latest-cmake (avx512, -DLLAMA_NATIVE=OFF -' is failing on the main branch. See if you can track down the issue and make a new PR for that.

 24 - test-json-schema-to-grammar (Timeout)

@mofosyne (Collaborator) commented May 14, 2024

Actually, scratch that. The commit ahead showed that this particular error did not happen again.

It's a timeout, so it must be a fluke. I wonder if I can just try kicking off that test again.

edit: CI all passes. So it's definitely a fluke.

teleprint-me pushed a commit to teleprint-me/llama.cpp that referenced this pull request May 17, 2024
…e loop (ggerganov#7083)

* Add left recursion check: quit early instead of going into an infinite loop

* Remove custom enum, rename left recursion check and move to "grammar internal" section, add handling for edge case where a leftmost nonterminal may be empty

* Remove unnecessary declaration
Labels
bugfix (fixes an issue or bug); review complexity: medium (generally requires more time to grok but manageable by beginner to medium expertise level)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Segfault when parsing grammars with left-recursion
3 participants