• Top
    • Documentation
    • Books
    • Boolean-reasoning
    • Projects
      • Apt
      • Zfc
      • Acre
      • Milawa
      • Smtlink
      • Abnf
        • Deftreeops
        • Defdefparse
        • Defgrammar
        • Tree-utilities
        • Notation
        • Grammar-parser
          • Grammar-parser-implementation
          • Grammar-parser-correctness
            • Grammar-parser-completeness
              • Grammar-parser-disambiguation
              • Grammar-parser-constraints-from-tree-matching
              • Grammar-parser-constraints-from-parsing
              • Grammar-parser-disambiguating-restrictions
              • Grammar-parser-parsing-failure-propagation
              • Parse-alt/conc/rep/elem/group/option-when-tree-/-tree-list-match-lemmas
              • Parse-grammar-when-tree-match
              • Parse-*-in-either-range-when-tree-list-match
              • Parse-in-either-range-when-tree-match
              • Parse-element-when-tree-match-induction-step-1+2+3+4+5
              • Pred-concatenation
              • Pred-conc-rest
              • Pred-alternation
              • Pred-alt-rest
              • Pred-conc-rest-comp
              • Pred-alt-rest-comp
              • Parse-elements-when-tree-match
              • Pred-repetition
              • Pred-element
              • Parse-*-rule-/-*cwsp-cnl-when-tree-list-match-and-restriction
              • Parse-option-when-tree-match-induction-step-3
              • Parse-option-when-tree-match-induction-step-2
              • Parse-option-when-tree-match-induction-step-1+2+3
              • Parse-group-when-tree-match-induction-step-3
              • Parse-group-when-tree-match-induction-step-2
              • Parse-group-when-tree-match-induction-step-1+2+3
              • Parse-bin/dec/hex-val-when-tree-match
              • Parse-alt-rest-when-tree-list-match
              • Parse-alt-rest-comp-when-tree-match-induction-step-2
              • Parse-alt-rest-comp-when-tree-match-induction-step-1
              • Parse-*bit-when-tree-list-match
              • Parse-*-dot-1*bit-when-tree-list-match
              • Parse-rulelist-when-tree-match-and-restriction
              • Parse-exact-when-tree-match
              • Parse-element-when-tree-match-induction-step-3
              • Parse-element-when-tree-match-induction-step-2
              • Parse-element-when-tree-match-induction-step-1
              • Parse-conc-rest-when-tree-list-match
              • Parse-conc-rest-comp-when-tree-match-induction-step-2
              • Parse-conc-rest-comp-when-tree-match-induction-step-1+2
              • Parse-conc-rest-comp-when-tree-match-induction-step-1
              • Parse-bin-val-rest-when-tree-match
              • Parse-alternation-when-tree-match
              • Parse-alt-rest-comp-when-tree-match-induction-step-1+2
              • Parse-alt-rest-comp-when-tree-match
              • Parse-1*-dot-1*bit-when-tree-list-match
              • Pred-option
              • Pred-group
              • Parse-?%i-when-tree-match
              • Parse-*digit-when-tree-list-match
              • Parse-*cwsp-when-tree-list-match
              • Parse-wsp-when-tree-match
              • Parse-repetition-when-tree-match-induction-step-2
              • Parse-repetition-when-tree-match-induction-step-1+2
              • Parse-repetition-when-tree-match-induction-step-1
              • Parse-repetition-when-tree-match
              • Parse-option-when-tree-match-induction-step-1
              • Parse-ichar-when-tree-match
              • Parse-htab-when-tree-match
              • Parse-hex-val-when-tree-match
              • Parse-group-when-tree-match-induction-step-1
              • Parse-equal-/-equal-slash-when-tree-match
              • Parse-element-when-tree-match-induction-step-5
              • Parse-element-when-tree-match-induction-step-4
              • Parse-element-when-tree-match
              • Parse-dot-1*bit-when-tree-match
              • Parse-dec-val-when-tree-match
              • Parse-cr-when-tree-match
              • Parse-concatenation-when-tree-match-induction-step-2
              • Parse-concatenation-when-tree-match
              • Parse-conc-rest-comp-when-tree-match
              • Parse-cnl-when-tree-match
              • Parse-bin-val-when-tree-match
              • Parse-alternation-when-tree-match-induction-step-2
              • Parse-alpha-when-tree-match
              • Parse-1*cwsp-when-tree-list-match
              • Parse-1*bit-when-tree-list-match
              • Parse-*wsp/vchar-when-tree-list-match
              • Parse-*hexdig-when-tree-list-match
              • Parse-*digit-star-*digit-when-tree-match
              • Parse-*cwsp-cnl-when-tree-match
              • Parse-*-dot-1*hexdig-when-tree-list-match
              • Parse-*-dot-1*digit-when-tree-list-match
              • Parse-*-alpha/digit/dash-when-tree-list-match
              • Parse-sp-when-tree-match
              • Parse-rule-/-*cwsp-cnl-when-tree-match
              • Parse-rule-when-tree-match
              • Parse-num-val-when-tree-match
              • Parse-lf-when-tree-match
              • Parse-in-range-when-tree-match
              • Parse-ichar2-when-tree-match
              • Parse-hex-val-rest-when-tree-match
              • Parse-defined-as-when-tree-match
              • Parse-dec-val-rest-when-tree-match
              • Parse-crlf-when-tree-match
              • Parse-conc-rest-when-tree-list-match-induction-step-2
              • Parse-case-sensitive-string-when-tree-match
              • Parse-case-insensitive-string-when-tree-match
              • Parse-bit-when-tree-match
              • Parse-alt-rest-when-tree-list-match-induction-step-2
              • Parse-1*hexdig-when-tree-list-match
              • Parse-1*digit-when-tree-list-match
              • Parse-1*-dot-1*hexdig-when-tree-list-match
              • Parse-1*-dot-1*digit-when-tree-list-match
              • Parse-?repeat-when-tree-match
              • Parse-wsp/vchar-when-tree-match
              • Parse-vchar-when-tree-match
              • Parse-rulename-when-tree-match
              • Parse-repeat-when-tree-match
              • Parse-quoted-string-when-tree-match
              • Parse-prose-val-when-tree-match
              • Parse-option-when-tree-match
              • Parse-hexdig-when-tree-match
              • Parse-group-when-tree-match
              • Parse-element-when-tree-match-induction-step-6
              • Parse-element-when-tree-match-base-case
              • Parse-dquote-when-tree-match
              • Parse-dot-1*hexdig-when-tree-match
              • Parse-dot-1*digit-when-tree-match
              • Parse-digit-when-tree-match
              • Parse-dash-1*hexdig-when-tree-match
              • Parse-dash-1*digit-when-tree-match
              • Parse-dash-1*bit-when-tree-match
              • Parse-cwsp-when-tree-match
              • Parse-concatenation-when-tree-match-induction-step-1
              • Parse-conc-rest-when-tree-list-match-induction-step-1
              • Parse-conc-rest-comp-when-tree-match-base-case
              • Parse-comment-when-tree-match
              • Parse-cnl-wsp-when-tree-match
              • Parse-char-val-when-tree-match
              • Parse-alternation-when-tree-match-induction-step-1
              • Parse-alt-rest-when-tree-list-match-induction-step-1
              • Parse-alt-rest-comp-when-tree-match-base-case
              • Parse-alpha/digit/dash-when-tree-match
              • Parse-option-when-tree-match-base-case
              • Parse-group-when-tree-match-base-case
              • Parse-any-of-cons
            • Grammar-parser-soundness
        • Meta-circular-validation
        • Parsing-primitives-defresult
        • Parsing-primitives-seq
        • Operations
        • Examples
        • Differences-with-paper
        • Constructor-utilities
        • Grammar-printer
        • Parsing-tools
      • Vwsim
      • Isar
      • Wp-gen
      • Dimacs-reader
      • Pfcs
      • Legacy-defrstobj
      • C
      • Proof-checker-array
      • Soft
      • Farray
      • Rp-rewriter
      • Instant-runoff-voting
      • Imp-language
      • Sidekick
      • Ethereum
      • Leftist-trees
      • Java
      • Riscv
      • Taspi
      • Bitcoin
      • Zcash
      • Des
      • X86isa
      • Sha-2
      • Yul
      • Proof-checker-itp13
      • Regex
      • ACL2-programming-language
      • Json
      • Jfkr
      • Equational
      • Cryptography
      • Axe
      • Poseidon
      • Where-do-i-place-my-book
      • Aleo
      • Bigmems
      • Builtins
      • Execloader
      • Solidity
      • Paco
      • Concurrent-programs
      • Bls12-377-curves
    • Debugging
    • Community
    • Std
    • Proof-automation
    • Macro-libraries
    • ACL2
    • Interfacing-tools
    • Hardware-verification
    • Software-verification
    • Math
    • Testing-utilities
  • Grammar-parser-correctness

Grammar-parser-completeness

Completeness theorems for the parser of ABNF grammars.

For every terminated tree rooted at rulelist that satisfies the disambiguating restrictions, parse-grammar succeeds on the string at the leaves of the tree and returns that tree:

Theorem: parse-grammar-when-tree-match

(defthm parse-grammar-when-tree-match
  (implies
       (and (tree-match-element-p tree (element-rulename *rulelist*)
                                  *grammar*)
            (tree-terminatedp tree)
            (tree-rulelist-restriction-p tree))
       (equal (parse-grammar (tree->string tree))
              (tree-fix tree))))

This is proved by proving the following, for each parsing function out of which parse-grammar is built: if a (list of) terminated tree(s) matches a certain syntactic entity and possibly satisfies certain disambiguating restrictions, then running the parsing function on the append of (i) the string at the leaves of the tree(s) and (ii) some remaining input possibly satisfying certain hypotheses explained below, succeeds and yields that (list of) tree(s) and that remaining input. More precisely, the parsing function yields the (list of) tree(s) fixed with tree-fix or tree-list-fix and the remaining input fixed with nat-list-fix; an alternative formulation is to avoid these fixing functions but include the hypotheses that the (list of) tree(s) satisfies treep or tree-listp and that the remaining input satisfies nat-listp.

For example, the completeness theorem parse-alpha-when-tree-match says that running parse-alpha on the append of (i) the leaves of a terminated tree that matches ALPHA, and (ii) some remaining input, succeeds and yields (the fixing of) that tree and (the fixing of) that remaining input. Since ALPHA is not involved in the disambiguating restrictions, parse-alpha-when-tree-match has no hypothesis related to those disambiguating restrictions. This theorem also has no hypothesis on the remaining input, as explained below.

The completeness theorem of parse-any does not involve trees but makes an analogous statement: running parse-any on the cons of a natural number and some remaining natural numbers, returns (the fixing of) that natural number and (the fixing of) the remaining natural numbers.

Hypotheses on the Remaining Input

In all the completeness theorems, the remaining input (following the string at the leaves of the tree(s)) is denoted by the variable rest-input. The hypotheses on the remaining input, when present, are that certain parsing functions fail on the remaining input.

If a parsing function ignores the remaining input, the corresponding completeness theorem has no hypotheses on the remaining input. This is the case for parsing functions that, like parse-alpha mentioned above, parse a fixed number of natural numbers from the input. This is also the case for parsing functions that parse a fixed number of natural numbers after parsing a variable number of natural numbers: for example, parse-prose-val always parses the closing angle bracket (which is a single character) after parsing a variable number of characters after parsing the opening angle bracket.

In contrast, if a parsing function ``examines'' (part of) the remaining input, the corresponding completeness theorem has hypotheses on the remaining input. If a parsing function examines part of the remaining input, but that part of the remaining input is absent from the returned tree(s) (by hypothesis of the function's completeness theorem), it means that the function attempts but fails to parse further into the remaining input, backtracking and returning a (list of) tree(s) only for the input that precedes the remaining input. Thus, the completeness theorem for the function must include hypotheses stating or implying such parsing failures. Without these hypotheses, parsing further into the remaining input might succeed, extending the tree(s) and rendering the theorem untrue. Some concrete examples are given below.

A parsing function for a repetition of zero or more instances of some syntactic entity always examines the remaining input to decide when to stop. The function's completeness theorem has a hypothesis on the remaining input stating or implying that parsing another instance of the syntactic entity fails. For example, parse-*bit stops when parse-bit fails; this parsing failure occurs in the remaining input. The completeness theorem parse-*bit-when-tree-list-match has the hypothesis that parse-bit fails on the remaining input. Without this hypothesis, the theorem would not hold because, if another BIT could be parsed from the remaining input, then parse-*bit would return (at least) an additional tree beyond the list of trees hypothesized in the theorem.

A parsing function for an optional occurrence of some syntactic entity may examine the remaining input. This happens when parsing the syntactic entity fails, in which case the function returns a tree without leaves, because the optional entity is absent. The function's completeness theorem has a hypothesis on the remaining input stating or implying that parsing the syntactic entity fails. For example, parse-?%i may fail to parse "%i"; this parsing failure occurs in the remaining input, because the function returns a tree without leaves, reflecting the absence of the syntactic entity. The completeness theorem parse-?%i-when-tree-match has the hypothesis that parse-ichar2 (with arguments #\% and #\i) fails on the remaining input. Without this hypothesis, the theorem would not hold because, if "%i" could be parsed from the remaining input but the tree hypothesized by the theorem had no leaves, then parse-?%i would return a tree with leaves instead.

The kind of hypothesis on the remaining input described in the previous paragraph, for the completeness theorems of parsing functions that parse optional entities, is stronger than needed. If the parsing function succeeds in parsing the optional entity, then it does not examine the remaining input, returning a tree with leaves. So the hypothesis on the remaining input could be weakened to require the parsing failure to happen only if the tree has no leaves. However, in syntactically valid ABNF grammars, the stronger hypothesis is always satisfied (e.g. [ "%i" ] cannot be followed by "%i"), so there is no loss in using the stronger hypothesis; the stronger hypothesis keeps the completeness theorems simpler without precluding the eventual proof of the top-level completeness theorem.

If a parsing function calls, or may call, another parsing function as its last action, the former's completeness theorem ``inherits'' the hypotheses on the remaining input from the latter's completeness theorem. If the hypotheses were not inherited, the called function may successfully parse some of the remaining input, returning more or different subtrees than hypothesized by the calling function's completeness theorem, rendering the theorem untrue. For example, since parse-1*bit calls parse-*bit as its last action, parse-1*bit-when-tree-list-match inherits from parse-*bit-when-tree-list-match the hypothesis that parse-bit fails on the remaining input; otherwise, parse-*bit could return additional BIT trees and so parse-1*bit could return additional BIT trees as well. As another example, since parse-dot-1*bit calls parse-1*bit as its last action, parse-dot-1*bit-when-tree-match inherits from parse-*bit-when-tree-list-match the hypothesis that parse-bit fails on the remaining input; otherwise, parse-1*bit could return additional BIT trees and so parse-dot-1*bit could return a tree with additional BIT subtrees. As a third example, since parse-bin/dec/hex-val may call parse-bin-val (and parse-dec-val and parse-hex-val) as its last action, parse-bin/dec/hex-val-when-tree-match inherits from parse-bin-val-when-tree-match (and parse-dec-val-when-tree-match and parse-hex-val-when-tree-match) various parsing failure hypotheses on the remaining input.

As a slight generalization of the situation described in the previous paragraph, a parsing function may call, as its last action, another parsing function that may return a tree without leaves. In this case, the calling parsing function's completeness theorem inherits the hypotheses on the remaining input also from the completeness theorem of the parsing function that it calls just before the last one. For example, parse-elements calls parse-*cwsp as its last action, and thus parse-elements-when-tree-match inherits from parse-*cwsp-when-tree-list-match the hypothesis that parse-cwsp fails on the remaining input, as explained earlier. But since parse-*cwsp may return a tree with no leaves (if no instances of c-wsp follow the alternation), parse-elements-when-tree-match also inherits the hypotheses on the remaining input from parse-alternation-when-tree-match.

As illustrated by the parse-bin/dec/hex-val-when-tree-match example above, when a parsing function may call a set of different parsing functions as its last action, the calling function's completeness theorem inherits from the called functions' completeness theorems all the hypotheses on the remaining input. The resulting hypotheses, in the calling function's completeness theorem, are stronger than needed: they could be weakened to require an inherited parsing failure hypothesis only if the subtree(s) correspond(s) to the called function. For example, in parse-bin/dec/hex-val-when-tree-match, the failure of parse-bit could be required only if the (bin-val / dec-val / hex-val) tree has a bin-val subtree. However, in syntactically valid ABNF grammars, the stronger hypotheses are always satisfied (e.g. dec-val cannot be followed by BIT), so there is no loss in using the stronger hypotheses; the stronger hypotheses keep the completeness theorems simpler without precluding the eventual proof of the top-level completeness theorem.

In the rules in [RFC:4], certain repeated and optional syntactic entities ``nest to the right'', e.g. 1*BIT nests to the right inside 1*("." 1*BIT). When this kind of nesting occurs, the completeness theorem of the parsing function for the outer repetition or option has not only the parsing failure hypotheses on the remaining input relative to the outer repetition or option, but also the parsing failure hypotheses on the remaining input relative to the inner repetition or option. For example, parse-1*-dot-1*bit-when-tree-list-match (and parse-*-dot-1*bit-when-tree-list-match) includes not only the failure of parse-dot-1*bit (actually, a stronger hypothesis, as explained below), but also the failure of parse-bit. As another example, parse-bin-val-rest-when-tree-match includes the failure of parse-bit for the 1*BIT repetition as well as (stronger hypotheses, as explained below, implying) the failure of (both alternatives inside) the [ 1*("." 1*BIT) "/" ("-" 1*BIT) ] option.

Besides the cases already mentioned of stronger hypotheses on the remaining input (that keep the theorems simpler while not precluding the top-level proof), there are other cases in which completeness theorems have stronger parsing failure hypotheses than needed. An example, as hinted above, is parse-1*-dot-1*bit-when-tree-list-match: instead of having a hypothesis requiring the failure of parse-1*-dot-1*bit, the theorem has the stronger hypothesis that parse-ichar with argument #\. fails. However, in syntactically valid ABNF grammars, the stronger hypotheses are always satisfied (e.g. 1*("." 1*BIT) cannot be followed by "." unless that is followed by BIT), so there is no loss in using the stronger hypotheses; the stronger hypotheses keep the completeness theorems simpler without precluding the eventual proof of the top-level completeness theorem.

The two alternatives parsed by parse-equal-/-equal-slash are one a prefix of the other. Therefore, parse-equal-/-equal-slash-when-tree-match has the hypothesis that parse-ichar with argument #\/ fails on the remaining input. Without this hypothesis, the tree hypothesized in the theorem could match just "=", but the remaining input could start with "/", in which case parse-equal-/-equal-slash would return a tree matching "=/" instead, rendering the theorem untrue. This hypothesis on the remaining input is stronger than needed: it could be weakened to requiring the parsing failure only if the tree matches "=". However, in syntactically valid ABNF grammars, the stronger hypothesis is always satisfied (i.e. "=/" cannot be followed by "/"), so there is no loss in using the stronger hypothesis; the stronger hypothesis keeps this completeness theorem simpler without precluding the eventual proof of the top-level completeness theorem.

Since (i) parse-rule calls parse-cnl as its last action, (ii) parse-cnl always returns a tree with leaves, and (iii) parse-cnl-when-tree-match has no hypotheses on the remaining input, it may seem that parse-rule-when-tree-match needs no hypotheses on the remaining input. However, before calling parse-cnl, parse-rule calls parse-elements, which may parse the ending c-nl and then attempt and fail to parse WSP after c-nl (this is how the grammar parser implementation resolves the rulelist ambiguity). Thus, parse-rule may actually examine part of the remaining input. The needed hypothesis is that parse-wsp fails on the remaining input: if WSP could be parsed from the remaining input, parse-rule would put that with the c-nl under an additional c-wsp instance under elements and under rule, thus returning a different tree than hypothesized in the theorem.

An analogous discussion to the one in the previous paragraph applies to parse-*cwsp-cnl. Thus, parse-*cwsp-cnl-when-tree-match has the hypothesis that parse-wsp fails on the remaining input.

Hypotheses on the Tree(s)

Most completeness theorems include hypotheses saying that the trees are terminated. This ensures that the strings at the leaves of the trees consist of natural numbers and not rule names, since the parsing functions operate on natural numbers. A few completeness theorems do not need those hypotheses because the corresponding syntactic entities can only be matched by trees whose leaves are natural numbers, e.g. parse-exact-when-tree-match, parse-ichar-when-tree-match, and parse-?%i-when-tree-match.

The completeness theorems parse-*-rule-/-*cwsp-cnl-when-tree-list-match-and-restriction and parse-rulelist-when-tree-match-and-restriction, as suggested by the ending of their names, have hypotheses saying that the (list of) tree(s) satisfies the disambiguating restrictions. Without these hypotheses, the theorems would not hold because there would be multiple choices of trees for certain inputs, but the parser only produces one choice for each input.

Proof Methods

The completeness theorems of the more ``basic'' parsing functions parse-any, parse-exact, parse-in-range, parse-ichar, and parse-ichar2 are proved by expanding the necessary definitions. The proofs for parse-exact and parse-in-range use parse-any-of-cons as a rewrite rule, obviating the need to expand parse-any. The proofs for parse-exact and parse-in-range also expand some tree-related functions, which seems odd because it should be possible to treat trees as abstract data types; there may be ways to avoid that, perhaps by adding some tree lemmas.

Each of the other completeness theorems is proved, generally speaking, by reducing the tree matching hypothesis to one or more subtree matching facts, reducing the call to the parsing function on the (list of) tree(s) to calls to other parsing functions on (lists of) subtrees, and using the already proved theorems for the called parsing functions to show that the calling parsing function returns the right results. The completeness theorems are used as rewrite rules (implicitly, since they are enabled). The subtree matching facts to which the tree matching hypothesis reduces are used to relieve the hypotheses of these rewrite rules.

For example, parse-cr-when-tree-match is proved as follows. The hypothesis that the tree matches CR is reduced to the fact that its (only) subtree matches %x0D. The call to parse-cr in the conclusion is reduced to a call to parse-exact with argument #x0d on the append of (i) the string at the leaves of the subtree and (ii) the remaining input. The rewrite rule parse-exact-when-tree-match then applies (whose hypothesis is relieved via the aforementioned fact that the subtree matches %0D), rewriting the call to parse-exact to the triple consisting of nil (i.e. success), the subtree, and the remaining input. Therefore, parse-cr, by its definition, returns the triple consisting of nil (i.e. success), the CR tree with the %x0D subtree, and the remaining input. Thus, parse-cr-when-tree-match is proved.

This reduction approach works also when there are multiple subtrees. For example, parse-crlf-when-tree-match is proved as follows. The hypothesis that the tree matches CRLF is reduced to the first subtree matching CR and the second subtree matching LF. The call to parse-crlf is reduced to a call to parse-cr on the string at the leaves of the first subtree and a call to parse-lf on the string at the leaves of the second subtree. Then parse-cr-when-tree-match and parse-lf-when-tree-match apply.

Since completeness theorems generally have hypotheses about the trees being terminated, in order to apply completeness theorems as rewrite rules to subtrees (in the reduction approach outlined above), the hypothesis that the trees are terminated must be reduced to facts that subtrees are terminated to relieve the hypotheses of the rewrite rules. Thus, we enable tree-terminatedp just before the completeness theorems and we disable it just after. The existing enabled rewrite rules take care of tree-list-terminatedp and tree-list-list-terminatedp.

Similarly, when calls to parsing functions on trees are reduced to calls to parsing functions on subtrees, the strings at the leaves of the trees must be reduced to strings at the leaves of the subtrees. Thus, we enable tree->string, tree-list->string, and tree-list-list->string just before the completeness theorems and we disable them just after. The existing enabled rules about tree-list->string and tree-list-list->string do not suffice: if these two functions are not enabled, the proofs of some completeness theorems fails.

The tree-match-element-p hypotheses of the completeness theorems are expanded via explicit :expand hints; just enabling tree-match-element-p does not perform the expansion (presumably due to ACL2's heuristics for expanding recursive functions). Since many repetitions consist of one element, the rewrite rule tree-list-match-repetition-p-of-1-repetition is used in many completeness proofs: we enable it just before the completeness theorems and disabled it just after. There is no direct use of the definitions of tree-list-list-match-alternation-p and tree-list-list-match-concatenation-p because the alternations and concatenations in the completeness theorems always have an explicit list structure and thus rewrite rules like tree-list-list-match-alternation-p-of-cons-alternation suffice. Repetition are handled via the rules tree-list-match-repetition-p-of-0+-reps-when-consp and tree-list-match-repetition-p-of-1+-repetitions where needed, as explained below.

If a parsing function may backtrack, its completeness theorem uses a disambiguation theorem as a rewrite rule, by explicitly enabling it (with just one exception: parse-alpha-when-tree-match uses fail-1st-range-when-match-2nd-range with a :use hint, because the latter does not quite apply as a rewrite rule there). For example, in the proof of the completeness theorem parse-wsp-when-tree-match, the hypothesis that the tree matches WSP reduces to two cases for the subtree: either the subtree matches SP or the subtree matches HTAB. In the first case, the completeness theorem parse-sp-when-tree-match implies that parse-sp succeeds, so parse-wsp succeeds and parse-wsp-when-tree-match is proved. In the second case, in order to use, in a similar way, the completeness theorem parse-htab-when-tree-match, we need to show that parse-sp fails, so that parse-wsp reduces to parse-htab by backtracking and then parse-wsp-when-tree-match is proved using parse-htab-when-tree-match. The disambiguation theorem fail-sp-when-match-htab serves to show that, in the second case above, parse-sp fails.

All the completeness theorems for parsing functions that may backtrack follow this proof pattern, which motivates the formulation of the disambiguation theorems. In particular, it motivates the ``asymmetric'' use of trees and parsing functions to show incompatibility (as opposed to showing incompatibility between parsing functions or between trees).

Some completeness theorems use some disambiguation theorems not to show that the parsing function must backtrack, but to relieve hypotheses of other completeness theorems. For example, in the completeness theorem parse-1*-dot-1*bit-when-tree-list-match, the disambiguation theorem fail-bit-when-match-*-dot-1*bit serves to relieve the parse-bit failure hypothesis of the parse-dot-1*bit-when-tree-match completeness theorem.

If a parsing function parses a repetition of one or more elements (e.g. parse-1*bit), its completeness theorem (e.g. parse-1*bit-when-tree-list-match) is proved by using tree-list-match-repetition-p-of-1+-repetitions to reduce the matching to a single element and to a repetition of zero or more elements, and then using the completeness theorems for the element (e.g. parse-bit-when-tree-match) and for the repetition of zero or more elements (e.g. parse-*bit-when-tree-list-match).

If a parsing function is singly recursive (e.g. parse-*bit), i.e. it parses a repetition of zero or more elements, its completeness theorem is proved by induction on the length of the list of trees that matches the repetition; induction on the parsing function does not work, because the argument of the parsing function is not a variable (it is (append (tree-list->string trees) rest-input)). We enable tree-list-match-repetition-p-of-0+-reps-when-consp to handle the induction step of the proof. We also disable the rewrite rule acl2::nat-list-fix-of-append because it interferes with the proof by preventing nat-list-fix from being eliminated via the rewrite rule that shows that the parsing function fixes the input (e.g. the theorem parse-*bit-of-nat-list-fix-input).

The proof of the mutually recursive parsing functions (i.e. parse-alternation, parse-concatenation, etc.) is more complex. As with the singly recursive parsing functions, a straightforward induction on the mutually recursive parsing functions does not work because their arguments are not variables (they are (append (tree->string tree) rest-input) and (append (tree-list->string trees) rest-input)). In analogy with the completeness theorems for the singly recursive parsing functions, we could try an induction on the sizes of the trees (variables tree and trees), but the formulation seems somewhat complicated, due to the presence of multiple parsing functions.

Instead, we take the desired formulation of each completeness theorem of the mutually recursive parsing functions, we add a hypothesis that a new variable input is equal to (append (tree->string tree) rest-input) or (append (tree-list->string trees) rest-input), we universally quantify the tree or trees variable and the rest-input variable into a define-sk predicate with argument input, and we prove all these predicates by induction on the mutually recursive functions. That is, we prove that the parsing functions satisfy their ``completeness properties'' for every way in which their input can be ``split'' into (the string at the leaves of) a (list of) tree(s) and some remaining input. The predicates capture these completeness properties.

The predicates are pred-alternation, pred-concatenation, etc. They are not guard-verified because they only serve to prove the completeness of the mutually recursive parsing functions. The consequents of the implications in their bodies call the parsing functions on (append (tree->string tree) rest-input) or (append (tree-list->string trees) rest-input), and not on the shorter input that the antecedents assert to be equal, for a practical reason: if we used input in the consequents, the rewrite rules generated by define-sk (e.g. pred-alternation-necc) would have tree, trees, and rest-input as free variables, making their use harder. The predicates also include the fact that the remaining input satisfies nat-listp, instead of using nat-list-fix in the consequent: this is because the nat-list-fix approach causes the proofs to fail (perhaps due to some interaction with the equality with input), so we use nat-listp instead in the predicates.

We prove by induction on the mutually recursive parsing functions that all the predicates hold for every input argument: see parse-alt/conc/rep/elem/group/option-when-tree-/-tree-list-match-lemmas. These are completeness lemmas, from which the completeness theorems are proved with ease. The completeness theorem have the same formulation as the ones for the other, non-recursive or singly recursive parsing functions; in particular, they use nat-list-fix instead of nat-listp on the remaining input.

The induction proof of the conpleteness lemmas generates 5 base cases and 26 induction steps. We prove each of them separately (these are the theorems whose names end with -base-case and -induction-step-N where N is a number, e.g. parse-element-when-tree-match-base-case and parse-alternation-when-tree-match-induction-step-2. Attempting to prove the completeness lemmas by induction in one shot fails, perhaps due to interference between the different hints used for the base cases and induction steps; however, it may be possible to find a way to prove the lemmas in one shot.

The formulation of each base case and induction step is derived directly from the output generated by the defthm-parse-alt/conc/rep/elem/group/option-flag form. Attempting to prove each base case and induction step in one shot fails, perhaps because of the equality between input and (append (tree->string tree) rest-input) or (append (tree-list->string trees) rest-input). So, we prove each base case and induction step via a local lemma where the predicate in the conclusion of the base case or induction step is replaced with its definition; the base case or induction step is then proved just by expanding the predicate definition and using the local lemma as a rewrite rule.

The proof of each local lemma for base cases and induction steps is similar to the proofs of the completeness theorems of the non-recursive and singly recursive parsing functions. In addition, these local lemmas use (implicitly) the rewrite rules generated by define-sk (e.g. pred-alternation-necc); some disambiguation theorems are sometimes enabled to relieve the hypotheses of these define-sk rewrite rules. The induction steps that involve lists of trees (as opposed to single trees) use a :cases hint to split on whether the lists are empty or not.

Earlier we explained that some completeness theorems have stronger parsing failure hypotheses on the remaining input than needed, in order to keep the theorems simpler. These theorems enable certain parsing failure propagation theorems to turn the stronger hypotheses into the facts needed to show the weaker parsing failures within the parsing functions. For example, in the completeness theorem parse-*-dot-1*bit-when-tree-list-match, the parsing failure propagation theorem fail-dot-1*bit-when-fail-dot is used to turn the hypothesis that parse-ichar with argument #\. fails into the fact that parse-dot-1*bit fails, needed to show that parse-*-dot-1*bit stops before the remaining input.

Induction Schema Change

At some point, the mutually recursive functions parse-alternation and companions were slightly changed to use Seq's :s= operator to insert an mbt check on the lengths: see the `Termination' section in grammar-parser-implementation for motivation. Before the change, the code of those parsing functions did not use any Seq macros, and had explicit mbt tests. After the change, the flag induction schema generated from the mutually recursive functions changed slightly. As a result, the lemmas for induction steps, described above, no longer aligned exactly with the new induction schema. However, the alignment could be easily restored by combining some of the previous induction step lemmas into single ones: these have names like ...-induction-step-1+2, which combines ...-induction-step-1 and ...-induction-step-2. It may be possible to streamline these lemmas and proofs in the future (in particular, prove just one lemma for each induction step in the new induction schema, instead of combining them), but for now this approach works fine.

Subtopics

Grammar-parser-disambiguation
Disambiguation theorems for the parser of ABNF grammars.
Grammar-parser-constraints-from-tree-matching
Tree maching constraint theorems for the parser of ABNF grammars.
Grammar-parser-constraints-from-parsing
Parsing constraint theorems for the parser of ABNF grammars.
Grammar-parser-disambiguating-restrictions
Restrictions on parse trees that correspond to how the parser of ABNF grammars resolves the rulelist ambiguity.
Grammar-parser-parsing-failure-propagation
Parsing failure propagation theorems for the parser of ABNF grammars.
Parse-alt/conc/rep/elem/group/option-when-tree-/-tree-list-match-lemmas
Completeness lemmas for the mutually recursive parsing functions.
Parse-grammar-when-tree-match
Top-level completeness theorem of the parser of ABNF grammars.
Parse-*-in-either-range-when-tree-list-match
Completeness theorem for parse-*-in-either-range.
Parse-in-either-range-when-tree-match
Completeness theorem for parse-in-either-range.
Parse-element-when-tree-match-induction-step-1+2+3+4+5
Combination of parse-element-when-tree-match-induction-step-1, parse-element-when-tree-match-induction-step-2, parse-element-when-tree-match-induction-step-3, parse-element-when-tree-match-induction-step-4, and parse-element-when-tree-match-induction-step-5.
Pred-concatenation
Completeness property for parse-concatenation.
Pred-conc-rest
Completeness property for parse-conc-rest.
Pred-alternation
Completeness property for parse-alternation.
Pred-alt-rest
Completeness property for parse-alt-rest.
Pred-conc-rest-comp
Completeness property for parse-conc-rest-comp.
Pred-alt-rest-comp
Completeness property for parse-alt-rest-comp.
Parse-elements-when-tree-match
Completeness theorem for parse-elements.
Pred-repetition
Completeness property for parse-repetition.
Pred-element
Completeness property for parse-element.
Parse-*-rule-/-*cwsp-cnl-when-tree-list-match-and-restriction
Completeness theorem for parse-*-rule-/-*cwsp-cnl.
Parse-option-when-tree-match-induction-step-3
Third induction step of the completeness lemma for parse-option.
Parse-option-when-tree-match-induction-step-2
Second induction step of the completeness lemma for parse-option.
Parse-option-when-tree-match-induction-step-1+2+3
Combination of parse-option-when-tree-match-induction-step-1, parse-option-when-tree-match-induction-step-2, and parse-option-when-tree-match-induction-step-3.
Parse-group-when-tree-match-induction-step-3
Third induction step of the completeness lemma for parse-group.
Parse-group-when-tree-match-induction-step-2
Second induction step of the completeness lemma for parse-group.
Parse-group-when-tree-match-induction-step-1+2+3
Combination of parse-group-when-tree-match-induction-step-1, parse-group-when-tree-match-induction-step-2, and parse-group-when-tree-match-induction-step-3.
Parse-bin/dec/hex-val-when-tree-match
Completeness theorem for parse-bin/dec/hex-val.
Parse-alt-rest-when-tree-list-match
Completeness theorem for parse-alt-rest.
Parse-alt-rest-comp-when-tree-match-induction-step-2
Second induction step of the completeness lemma for parse-alt-rest-comp.
Parse-alt-rest-comp-when-tree-match-induction-step-1
First induction step of the completeness lemma for parse-alt-rest-comp.
Parse-*bit-when-tree-list-match
Completeness theorem for parse-*bit.
Parse-*-dot-1*bit-when-tree-list-match
Completeness theorem for parse-*-dot-1*bit.
Parse-rulelist-when-tree-match-and-restriction
Completeness theorem for parse-rulelist.
Parse-exact-when-tree-match
Completeness theorem for parse-exact.
Parse-element-when-tree-match-induction-step-3
Third induction step of the completeness lemma for parse-element.
Parse-element-when-tree-match-induction-step-2
Second induction step of the completeness lemma for parse-element.
Parse-element-when-tree-match-induction-step-1
First induction step of the completeness lemma for parse-element.
Parse-conc-rest-when-tree-list-match
Completeness theorem for parse-conc-rest.
Parse-conc-rest-comp-when-tree-match-induction-step-2
Second induction step of the completeness lemma for parse-conc-rest-comp.
Parse-conc-rest-comp-when-tree-match-induction-step-1+2
Combination of parse-conc-rest-comp-when-tree-match-induction-step-1 and parse-conc-rest-comp-when-tree-match-induction-step-2.
Parse-conc-rest-comp-when-tree-match-induction-step-1
First induction step of the completeness lemma for parse-conc-rest-comp.
Parse-bin-val-rest-when-tree-match
Completeness theorem for parse-bin-val-rest.
Parse-alternation-when-tree-match
Completeness theorem for parse-alternation.
Parse-alt-rest-comp-when-tree-match-induction-step-1+2
Combination of parse-alt-rest-comp-when-tree-match-induction-step-1 and parse-alt-rest-comp-when-tree-match-induction-step-2.
Parse-alt-rest-comp-when-tree-match
Completeness theorem for parse-alt-rest-comp.
Parse-1*-dot-1*bit-when-tree-list-match
Completeness theorem for parse-1*-dot-1*bit.
Pred-option
Completeness property for parse-option.
Pred-group
Completeness property for parse-group.
Parse-?%i-when-tree-match
Completeness theorem for parse-?%i.
Parse-*digit-when-tree-list-match
Completeness theorem for parse-*digit.
Parse-*cwsp-when-tree-list-match
Completeness theorem for parse-*cwsp.
Parse-wsp-when-tree-match
Completeness theorem for parse-wsp.
Parse-repetition-when-tree-match-induction-step-2
Second induction step of the completeness lemma for parse-repetition.
Parse-repetition-when-tree-match-induction-step-1+2
Combination of parse-repetition-when-tree-match-induction-step-1 and parse-repetition-when-tree-match-induction-step-2.
Parse-repetition-when-tree-match-induction-step-1
First induction step of the completeness lemma for parse-repetition.
Parse-repetition-when-tree-match
Completeness theorem for parse-repetition.
Parse-option-when-tree-match-induction-step-1
First induction step of the completeness lemma for parse-option.
Parse-ichar-when-tree-match
Completeness theorem for parse-ichar.
Parse-htab-when-tree-match
Completeness theorem for parse-htab.
Parse-hex-val-when-tree-match
Completeness theorem for parse-hex-val.
Parse-group-when-tree-match-induction-step-1
First induction step of the completeness lemma for parse-group.
Parse-equal-/-equal-slash-when-tree-match
Completeness theorem for parse-equal-/-equal-slash.
Parse-element-when-tree-match-induction-step-5
Fifth induction step of the completeness lemma for parse-element.
Parse-element-when-tree-match-induction-step-4
Fourth induction step of the completeness lemma for parse-element.
Parse-element-when-tree-match
Completeness theorem for parse-element.
Parse-dot-1*bit-when-tree-match
Completeness theorem for parse-dot-1*bit.
Parse-dec-val-when-tree-match
Completeness theorem for parse-dec-val.
Parse-cr-when-tree-match
Completeness theorem for parse-cr.
Parse-concatenation-when-tree-match-induction-step-2
Second induction step of the completeness lemma for parse-concatenation.
Parse-concatenation-when-tree-match
Completeness theorem for parse-concatenation.
Parse-conc-rest-comp-when-tree-match
Completeness theorem for parse-conc-rest-comp.
Parse-cnl-when-tree-match
Completeness theorem for parse-cnl.
Parse-bin-val-when-tree-match
Completeness theorem for parse-bin-val.
Parse-alternation-when-tree-match-induction-step-2
Second induction step of the completeness lemma for parse-alternation.
Parse-alpha-when-tree-match
Completeness theorem for parse-alpha.
Parse-1*cwsp-when-tree-list-match
Completeness theorem for parse-1*cwsp.
Parse-1*bit-when-tree-list-match
Completeness theorem for parse-1*bit.
Parse-*wsp/vchar-when-tree-list-match
Completeness theorem for parse-*wsp/vchar.
Parse-*hexdig-when-tree-list-match
Completeness theorem for parse-*hexdig.
Parse-*digit-star-*digit-when-tree-match
Completeness theorem for parse-*digit-star-*digit.
Parse-*cwsp-cnl-when-tree-match
Completeness theorem for parse-*cwsp-cnl.
Parse-*-dot-1*hexdig-when-tree-list-match
Completeness theorem for parse-*-dot-1*hexdig.
Parse-*-dot-1*digit-when-tree-list-match
Completeness theorem for parse-*-dot-1*digit.
Parse-*-alpha/digit/dash-when-tree-list-match
Completeness theorem for parse-*-alpha/digit/dash.
Parse-sp-when-tree-match
Completeness theorem for parse-sp.
Parse-rule-/-*cwsp-cnl-when-tree-match
Completeness theorem for parse-rule-/-*cwsp-cnl.
Parse-rule-when-tree-match
Completeness theorem for parse-rule.
Parse-num-val-when-tree-match
Completeness theorem for parse-num-val.
Parse-lf-when-tree-match
Completeness theorem for parse-lf.
Parse-in-range-when-tree-match
Completeness theorem for parse-in-range.
Parse-ichar2-when-tree-match
Completeness theorem for parse-ichar2.
Parse-hex-val-rest-when-tree-match
Completeness theorem for parse-hex-val-rest.
Parse-defined-as-when-tree-match
Completeness theorem for parse-defined-as.
Parse-dec-val-rest-when-tree-match
Completeness theorem for parse-dec-val-rest.
Parse-crlf-when-tree-match
Completeness theorem for parse-crlf.
Parse-conc-rest-when-tree-list-match-induction-step-2
Second induction step of the completeness lemma for parse-conc-rest.
Parse-case-sensitive-string-when-tree-match
Completeness theorem for parse-case-sensitive-string.
Parse-case-insensitive-string-when-tree-match
Completeness theorem for parse-case-insensitive-string.
Parse-bit-when-tree-match
Completeness theorem for parse-bit.
Parse-alt-rest-when-tree-list-match-induction-step-2
Second induction step of the completeness lemma for parse-alt-rest.
Parse-1*hexdig-when-tree-list-match
Completeness theorem for parse-1*hexdig.
Parse-1*digit-when-tree-list-match
Completeness theorem for parse-1*digit.
Parse-1*-dot-1*hexdig-when-tree-list-match
Completeness theorem for parse-1*-dot-1*hexdig.
Parse-1*-dot-1*digit-when-tree-list-match
Completeness theorem for parse-1*-dot-1*digit.
Parse-?repeat-when-tree-match
Completeness theorem for parse-?repeat.
Parse-wsp/vchar-when-tree-match
Completeness theorem for parse-wsp/vchar.
Parse-vchar-when-tree-match
Completeness theorem for parse-vchar.
Parse-rulename-when-tree-match
Completeness theorem for parse-rulename.
Parse-repeat-when-tree-match
Completeness theorem for parse-repeat.
Parse-quoted-string-when-tree-match
Completeness theorem for parse-quoted-string.
Parse-prose-val-when-tree-match
Completeness theorem for parse-prose-val.
Parse-option-when-tree-match
Completeness theorem for parse-option.
Parse-hexdig-when-tree-match
Completeness theorem for parse-hexdig.
Parse-group-when-tree-match
Completeness theorem for parse-group.
Parse-element-when-tree-match-induction-step-6
Sixth induction step of the completeness lemma for parse-element.
Parse-element-when-tree-match-base-case
Base case of the completeness lemma for parse-element.
Parse-dquote-when-tree-match
Completeness theorem for parse-dquote.
Parse-dot-1*hexdig-when-tree-match
Completeness theorem for parse-dot-1*hexdig.
Parse-dot-1*digit-when-tree-match
Completeness theorem for parse-dot-1*digit.
Parse-digit-when-tree-match
Completeness theorem for parse-digit.
Parse-dash-1*hexdig-when-tree-match
Completeness theorem for parse-dash-1*hexdig.
Parse-dash-1*digit-when-tree-match
Completeness theorem for parse-dash-1*digit.
Parse-dash-1*bit-when-tree-match
Completeness theorem for parse-dash-1*bit.
Parse-cwsp-when-tree-match
Completeness theorem for parse-cwsp.
Parse-concatenation-when-tree-match-induction-step-1
First induction step of the completeness lemma for parse-concatenation.
Parse-conc-rest-when-tree-list-match-induction-step-1
First induction step of the completeness lemma for parse-conc-rest.
Parse-conc-rest-comp-when-tree-match-base-case
Base case of the completeness lemma for parse-conc-rest-comp.
Parse-comment-when-tree-match
Completeness theorem for parse-comment.
Parse-cnl-wsp-when-tree-match
Completeness theorem for parse-cnl-wsp.
Parse-char-val-when-tree-match
Completeness theorem for parse-char-val.
Parse-alternation-when-tree-match-induction-step-1
First induction step of the completeness lemma for parse-alternation.
Parse-alt-rest-when-tree-list-match-induction-step-1
First induction step of the completeness lemma for parse-alt-rest.
Parse-alt-rest-comp-when-tree-match-base-case
Base case of the completeness lemma for parse-alt-rest-comp.
Parse-alpha/digit/dash-when-tree-match
Completeness theorem for parse-alpha/digit/dash.
Parse-option-when-tree-match-base-case
Base case of the completeness lemma for parse-option.
Parse-group-when-tree-match-base-case
Base case of the completeness lemma for parse-group.
Parse-any-of-cons
Completeness theorem for parse-any.