• Top
    • Documentation
    • Books
    • Boolean-reasoning
    • Projects
    • Debugging
    • Community
    • Std
    • Proof-automation
    • Macro-libraries
    • ACL2
    • Interfacing-tools
    • Hardware-verification
    • Software-verification
      • Kestrel-books
        • Crypto-hdwallet
        • Apt
        • Error-checking
        • Fty-extensions
        • Isar
        • Kestrel-utilities
        • Set
        • C
          • Syntax-for-tools
            • Formalized-subset
            • Mapping-to-language-definition
            • Input-files
            • Compilation-database
            • Printer
            • Output-files
            • Abstract-syntax-operations
            • Implementation-environments
            • Abstract-syntax
            • Concrete-syntax
            • Disambiguation
            • Validation
            • Gcc-builtins
            • Preprocessing
              • Preprocessor
                • Pproc
                • Preprocessor-states
                • Preprocessor-printer
                • Pproc-directive
                • Pread-token
                • Pproc-files
                • Pread-token/newline
                • Pproc-group-part
                • Preprocessor-lexer
                  • Plex-lexeme
                    • Plex-block-comment
                    • Plex-pp-number
                    • Plex-line-comment
                    • Plex-identifier
                    • Plex-escape-sequence
                    • Plex-spaces
                    • Plex-*-hexadecimal-digit
                    • Plex-*-s-char
                    • Plex-*-c-char
                    • Plex-header-name
                    • Plex-*-q-char
                    • Plex-*-h-char
                    • Plex-character-constant
                    • Plex-hexadecimal-digit
                    • Plex-string-literal
                    • Plex-hex-quad
                  • Pproc-file
                  • Pproc-include-directive
                  • Pproc-text-line
                  • Pproc-*-group-part
                  • Punread-token
                  • *pproc-files-max*
                  • String-plexeme-list-alist-to-filepath-filedata-map
                  • String-plexeme-list-alist
                  • Read-input-file-to-preproc
                  • Preprocessor-reader
                  • Preprocessor-messages
                • External-preprocessing
              • Parsing
            • Atc
            • Transformation-tools
            • Language
            • Representation
            • Insertion-sort
            • Pack
          • Soft
          • Bv
          • Imp-language
          • Ethereum
          • Event-macros
          • Java
          • Riscv
          • Bitcoin
          • Zcash
          • Yul
          • ACL2-programming-language
          • Prime-fields
          • Json
          • Syntheto
          • File-io-light
          • Cryptography
          • Number-theory
          • Axe
          • Lists-light
          • Builtins
          • Solidity
          • Helpers
          • Htclient
          • Typed-lists-light
          • Arithmetic-light
        • X86isa
        • Axe
        • Execloader
      • Math
      • Testing-utilities
    • Preprocessor-lexer

    Plex-lexeme

    Lex a lexeme during preprocessing.

    Signature
    (plex-lexeme headerp ppstate) 
      → 
    (mv erp lexeme? span new-ppstate)
    Arguments
    headerp — Guard (booleanp headerp).
    ppstate — Guard (ppstatep ppstate).
    Returns
    lexeme? — Type (plexeme-optionp lexeme?).
    span — Type (spanp span).
    new-ppstate — Type (ppstatep new-ppstate), given (ppstatep ppstate).

    This is the top-level lexing function for the preprocessor. It returns the next lexeme found in the parser state, or nil if we reached the end of the file; an error is returned if lexing fails.

    Lexing in the preprocessor is context-dependent [C17:5.1.1.2/1, footnote 7]: when expecting a header name, a " or a < are interpreted differently (i.e. as starting a header name) than usual (i.e. as starting a string literal or a punctuator). (Note that header names surrounded with double quotes are not the same as string literals, because the latter allow escapes while the former do not.) Thus, this lexing function takes a boolean flag indicating whether we are expecting a header name or not.

    This lexing function is similar to lex-lexeme, with the necessary differences, including the handling of the context header flag.

    Definitions and Theorems

    Function: plex-lexeme

    (defun plex-lexeme (headerp ppstate)
     (declare (xargs :stobjs (ppstate)))
     (declare (xargs :guard (and (booleanp headerp)
                                 (ppstatep ppstate))))
     (b* (((reterr) nil (irr-span) ppstate)
          ((erp char pos ppstate)
           (pread-char ppstate))
          ((unless char)
           (retok nil (make-span :start pos :end pos)
                  ppstate)))
      (cond
       ((utf8-= char 32)
        (plex-spaces pos ppstate))
       ((utf8-= char 9)
        (retok (plexeme-horizontal-tab)
               (make-span :start pos :end pos)
               ppstate))
       ((utf8-= char 11)
        (retok (plexeme-vertical-tab)
               (make-span :start pos :end pos)
               ppstate))
       ((utf8-= char 12)
        (retok (plexeme-form-feed)
               (make-span :start pos :end pos)
               ppstate))
       ((utf8-= char 10)
        (retok (plexeme-newline (newline-lf))
               (make-span :start pos :end pos)
               ppstate))
       ((utf8-= char 13)
        (b* (((erp char2 pos2 ppstate)
              (pread-char ppstate)))
          (cond ((not char2)
                 (retok (plexeme-newline (newline-cr))
                        (make-span :start pos :end pos)
                        ppstate))
                ((utf8-= char2 10)
                 (retok (plexeme-newline (newline-crlf))
                        (make-span :start pos :end pos2)
                        ppstate))
                (t (b* ((ppstate (punread-char ppstate)))
                     (retok (plexeme-newline (newline-cr))
                            (make-span :start pos :end pos)
                            ppstate))))))
       ((and (utf8-<= (char-code #\0) char)
             (utf8-<= char (char-code #\9)))
        (plex-pp-number nil (code-char char)
                        pos ppstate))
       ((utf8-= char (char-code #\.))
        (b* (((erp char2 & ppstate)
              (pread-char ppstate)))
          (cond ((not char2)
                 (retok (plexeme-punctuator ".")
                        (make-span :start pos :end pos)
                        ppstate))
                ((and (utf8-<= (char-code #\0) char2)
                      (utf8-<= char2 (char-code #\9)))
                 (plex-pp-number t (code-char char2)
                                 pos ppstate))
                ((utf8-= char2 (char-code #\.))
                 (b* (((erp char3 pos3 ppstate)
                       (pread-char ppstate)))
                   (cond ((not char3)
                          (b* ((ppstate (punread-char ppstate)))
                            (retok (plexeme-punctuator ".")
                                   (make-span :start pos :end pos)
                                   ppstate)))
                         ((utf8-= char3 (char-code #\.))
                          (retok (plexeme-punctuator "...")
                                 (make-span :start pos :end pos3)
                                 ppstate))
                         (t (b* ((ppstate (punread-char ppstate))
                                 (ppstate (punread-char ppstate)))
                              (retok (plexeme-punctuator ".")
                                     (make-span :start pos :end pos)
                                     ppstate))))))
                (t (b* ((ppstate (punread-char ppstate)))
                     (retok (plexeme-punctuator ".")
                            (make-span :start pos :end pos)
                            ppstate))))))
       ((utf8-= char (char-code #\u))
        (b* (((erp char2 pos2 ppstate)
              (pread-char ppstate)))
          (cond ((not char2)
                 (retok (plexeme-ident (ident "u"))
                        (make-span :start pos :end pos)
                        ppstate))
                ((utf8-= char2 (char-code #\'))
                 (plex-character-constant (cprefix-locase-u)
                                          pos ppstate))
                ((utf8-= char2 (char-code #\"))
                 (plex-string-literal (eprefix-locase-u)
                                      pos ppstate))
                ((utf8-= char2 (char-code #\8))
                 (b* (((erp char3 & ppstate)
                       (pread-char ppstate)))
                   (cond ((not char3)
                          (retok (plexeme-ident (ident "u8"))
                                 (make-span :start pos :end pos2)
                                 ppstate))
                         ((utf8-= char3 (char-code #\"))
                          (plex-string-literal (eprefix-locase-u8)
                                               pos ppstate))
                         (t (b* ((ppstate (punread-char ppstate))
                                 (ppstate (punread-char ppstate)))
                              (plex-identifier char pos ppstate))))))
                (t (b* ((ppstate (punread-char ppstate)))
                     (plex-identifier char pos ppstate))))))
       ((utf8-= char (char-code #\U))
        (b* (((erp char2 & ppstate)
              (pread-char ppstate)))
          (cond ((not char2)
                 (retok (plexeme-ident (ident "U"))
                        (make-span :start pos :end pos)
                        ppstate))
                ((utf8-= char2 (char-code #\'))
                 (plex-character-constant (cprefix-upcase-u)
                                          pos ppstate))
                ((utf8-= char2 (char-code #\"))
                 (plex-string-literal (eprefix-upcase-u)
                                      pos ppstate))
                (t (b* ((ppstate (punread-char ppstate)))
                     (plex-identifier char pos ppstate))))))
       ((utf8-= char (char-code #\L))
        (b* (((erp char2 & ppstate)
              (pread-char ppstate)))
          (cond ((not char2)
                 (retok (plexeme-ident (ident "L"))
                        (make-span :start pos :end pos)
                        ppstate))
                ((utf8-= char2 (char-code #\'))
                 (plex-character-constant (cprefix-upcase-l)
                                          pos ppstate))
                ((utf8-= char2 (char-code #\"))
                 (plex-string-literal (eprefix-upcase-l)
                                      pos ppstate))
                (t (b* ((ppstate (punread-char ppstate)))
                     (plex-identifier char pos ppstate))))))
       ((or (and (utf8-<= (char-code #\A) char)
                 (utf8-<= char (char-code #\Z)))
            (and (utf8-<= (char-code #\a) char)
                 (utf8-<= char (char-code #\z)))
            (= char (char-code #\_)))
        (plex-identifier char pos ppstate))
       ((utf8-= char (char-code #\'))
        (plex-character-constant nil pos ppstate))
       ((utf8-= char (char-code #\"))
        (if headerp (b* ((ppstate (punread-char ppstate)))
                      (plex-header-name ppstate))
          (plex-string-literal nil pos ppstate)))
       ((utf8-= char (char-code #\/))
        (b* (((erp char2 pos2 ppstate)
              (pread-char ppstate)))
          (cond ((not char2)
                 (retok (plexeme-punctuator "/")
                        (make-span :start pos :end pos)
                        ppstate))
                ((utf8-= char2 (char-code #\*))
                 (plex-block-comment pos ppstate))
                ((utf8-= char2 (char-code #\/))
                 (plex-line-comment pos ppstate))
                ((utf8-= char2 (char-code #\=))
                 (retok (plexeme-punctuator "/=")
                        (make-span :start pos :end pos2)
                        ppstate))
                (t (b* ((ppstate (punread-char ppstate)))
                     (retok (plexeme-punctuator "/")
                            (make-span :start pos :end pos)
                            ppstate))))))
       ((utf8-= char (char-code #\#))
        (b* (((erp char2 pos2 ppstate)
              (pread-char ppstate)))
          (cond ((not char2)
                 (retok (plexeme-punctuator "#")
                        (make-span :start pos :end pos)
                        ppstate))
                ((utf8-= char2 (char-code #\#))
                 (retok (plexeme-punctuator "##")
                        (make-span :start pos :end pos2)
                        ppstate))
                (t (b* ((ppstate (punread-char ppstate)))
                     (retok (plexeme-punctuator "#")
                            (make-span :start pos :end pos)
                            ppstate))))))
       ((or (utf8-= char (char-code #\[))
            (utf8-= char (char-code #\]))
            (utf8-= char (char-code #\())
            (utf8-= char (char-code #\)))
            (utf8-= char (char-code #\{))
            (utf8-= char (char-code #\}))
            (utf8-= char (char-code #\~))
            (utf8-= char (char-code #\?))
            (utf8-= char (char-code #\,))
            (utf8-= char (char-code #\;)))
        (retok
            (plexeme-punctuator (acl2::implode (list (code-char char))))
            (make-span :start pos :end pos)
            ppstate))
       ((utf8-= char (char-code #\*))
        (b* (((erp char2 pos2 ppstate)
              (pread-char ppstate)))
          (cond ((not char2)
                 (retok (plexeme-punctuator "*")
                        (make-span :start pos :end pos)
                        ppstate))
                ((utf8-= char2 (char-code #\=))
                 (retok (plexeme-punctuator "*=")
                        (make-span :start pos :end pos2)
                        ppstate))
                (t (b* ((ppstate (punread-char ppstate)))
                     (retok (plexeme-punctuator "*")
                            (make-span :start pos :end pos)
                            ppstate))))))
       ((utf8-= char (char-code #\^))
        (b* (((erp char2 pos2 ppstate)
              (pread-char ppstate)))
          (cond ((not char2)
                 (retok (plexeme-punctuator "^")
                        (make-span :start pos :end pos)
                        ppstate))
                ((utf8-= char2 (char-code #\=))
                 (retok (plexeme-punctuator "^=")
                        (make-span :start pos :end pos2)
                        ppstate))
                (t (b* ((ppstate (punread-char ppstate)))
                     (retok (plexeme-punctuator "^")
                            (make-span :start pos :end pos)
                            ppstate))))))
       ((utf8-= char (char-code #\!))
        (b* (((erp char2 pos2 ppstate)
              (pread-char ppstate)))
          (cond ((not char2)
                 (retok (plexeme-punctuator "!")
                        (make-span :start pos :end pos)
                        ppstate))
                ((utf8-= char2 (char-code #\=))
                 (retok (plexeme-punctuator "!=")
                        (make-span :start pos :end pos2)
                        ppstate))
                (t (b* ((ppstate (punread-char ppstate)))
                     (retok (plexeme-punctuator "!")
                            (make-span :start pos :end pos)
                            ppstate))))))
       ((utf8-= char (char-code #\=))
        (b* (((erp char2 pos2 ppstate)
              (pread-char ppstate)))
          (cond ((not char2)
                 (retok (plexeme-punctuator "=")
                        (make-span :start pos :end pos)
                        ppstate))
                ((utf8-= char2 (char-code #\=))
                 (retok (plexeme-punctuator "==")
                        (make-span :start pos :end pos2)
                        ppstate))
                (t (b* ((ppstate (punread-char ppstate)))
                     (retok (plexeme-punctuator "=")
                            (make-span :start pos :end pos)
                            ppstate))))))
       ((utf8-= char (char-code #\:))
        (b* (((erp char2 pos2 ppstate)
              (pread-char ppstate)))
          (cond ((not char2)
                 (retok (plexeme-punctuator ":")
                        (make-span :start pos :end pos)
                        ppstate))
                ((utf8-= char2 (char-code #\>))
                 (retok (plexeme-punctuator ":>")
                        (make-span :start pos :end pos2)
                        ppstate))
                (t (b* ((ppstate (punread-char ppstate)))
                     (retok (plexeme-punctuator ":")
                            (make-span :start pos :end pos)
                            ppstate))))))
       ((utf8-= char (char-code #\&))
        (b* (((erp char2 pos2 ppstate)
              (pread-char ppstate)))
          (cond ((not char2)
                 (retok (plexeme-punctuator "&")
                        (make-span :start pos :end pos)
                        ppstate))
                ((utf8-= char2 (char-code #\&))
                 (retok (plexeme-punctuator "&&")
                        (make-span :start pos :end pos2)
                        ppstate))
                ((utf8-= char2 (char-code #\=))
                 (retok (plexeme-punctuator "&=")
                        (make-span :start pos :end pos2)
                        ppstate))
                (t (b* ((ppstate (punread-char ppstate)))
                     (retok (plexeme-punctuator "&")
                            (make-span :start pos :end pos)
                            ppstate))))))
       ((utf8-= char (char-code #\|))
        (b* (((erp char2 pos2 ppstate)
              (pread-char ppstate)))
          (cond ((not char2)
                 (retok (plexeme-punctuator "|")
                        (make-span :start pos :end pos)
                        ppstate))
                ((utf8-= char2 (char-code #\|))
                 (retok (plexeme-punctuator "||")
                        (make-span :start pos :end pos2)
                        ppstate))
                ((utf8-= char2 (char-code #\=))
                 (retok (plexeme-punctuator "|=")
                        (make-span :start pos :end pos2)
                        ppstate))
                (t (b* ((ppstate (punread-char ppstate)))
                     (retok (plexeme-punctuator "|")
                            (make-span :start pos :end pos)
                            ppstate))))))
       ((utf8-= char (char-code #\+))
        (b* (((erp char2 pos2 ppstate)
              (pread-char ppstate)))
          (cond ((not char2)
                 (retok (plexeme-punctuator "+")
                        (make-span :start pos :end pos)
                        ppstate))
                ((utf8-= char2 (char-code #\+))
                 (retok (plexeme-punctuator "++")
                        (make-span :start pos :end pos2)
                        ppstate))
                ((utf8-= char2 (char-code #\=))
                 (retok (plexeme-punctuator "+=")
                        (make-span :start pos :end pos2)
                        ppstate))
                (t (b* ((ppstate (punread-char ppstate)))
                     (retok (plexeme-punctuator "+")
                            (make-span :start pos :end pos)
                            ppstate))))))
       ((utf8-= char (char-code #\-))
        (b* (((erp char2 pos2 ppstate)
              (pread-char ppstate)))
          (cond ((not char2)
                 (retok (plexeme-punctuator "-")
                        (make-span :start pos :end pos)
                        ppstate))
                ((utf8-= char2 (char-code #\>))
                 (retok (plexeme-punctuator "->")
                        (make-span :start pos :end pos2)
                        ppstate))
                ((utf8-= char2 (char-code #\-))
                 (retok (plexeme-punctuator "--")
                        (make-span :start pos :end pos2)
                        ppstate))
                ((utf8-= char2 (char-code #\=))
                 (retok (plexeme-punctuator "-=")
                        (make-span :start pos :end pos2)
                        ppstate))
                (t (b* ((ppstate (punread-char ppstate)))
                     (retok (plexeme-punctuator "-")
                            (make-span :start pos :end pos)
                            ppstate))))))
       ((utf8-= char (char-code #\>))
        (b* (((erp char2 pos2 ppstate)
              (pread-char ppstate)))
          (cond ((not char2)
                 (retok (plexeme-punctuator ">")
                        (make-span :start pos :end pos)
                        ppstate))
                ((utf8-= char2 (char-code #\>))
                 (b* (((erp char3 pos3 ppstate)
                       (pread-char ppstate)))
                   (cond ((not char3)
                          (retok (plexeme-punctuator ">>")
                                 (make-span :start pos :end pos2)
                                 ppstate))
                         ((utf8-= char3 (char-code #\=))
                          (retok (plexeme-punctuator ">>=")
                                 (make-span :start pos :end pos3)
                                 ppstate))
                         (t (b* ((ppstate (punread-char ppstate)))
                              (retok (plexeme-punctuator ">>")
                                     (make-span :start pos :end pos2)
                                     ppstate))))))
                ((utf8-= char2 (char-code #\=))
                 (retok (plexeme-punctuator ">=")
                        (make-span :start pos :end pos)
                        ppstate))
                (t (b* ((ppstate (punread-char ppstate)))
                     (retok (plexeme-punctuator ">")
                            (make-span :start pos :end pos)
                            ppstate))))))
       ((utf8-= char (char-code #\%))
        (b* (((erp char2 pos2 ppstate)
              (pread-char ppstate)))
         (cond
          ((not char2)
           (retok (plexeme-punctuator "%")
                  (make-span :start pos :end pos)
                  ppstate))
          ((utf8-= char2 (char-code #\=))
           (retok (plexeme-punctuator "%=")
                  (make-span :start pos :end pos2)
                  ppstate))
          ((utf8-= char2 (char-code #\>))
           (retok (plexeme-punctuator "%>")
                  (make-span :start pos :end pos2)
                  ppstate))
          ((utf8-= char2 (char-code #\:))
           (b* (((erp char3 & ppstate)
                 (pread-char ppstate)))
             (cond ((not char3)
                    (retok (plexeme-punctuator "%:")
                           (make-span :start pos :end pos2)
                           ppstate))
                   ((utf8-= char3 (char-code #\%))
                    (b* (((erp char4 pos4 ppstate)
                          (pread-char ppstate)))
                      (cond ((not char4)
                             (b* ((ppstate (punread-char ppstate)))
                               (retok (plexeme-punctuator "%:")
                                      (make-span :start pos :end pos2)
                                      ppstate)))
                            ((utf8-= char4 (char-code #\:))
                             (retok (plexeme-punctuator "%:%:")
                                    (make-span :start pos :end pos4)
                                    ppstate))
                            (t (b* ((ppstate (punread-char ppstate))
                                    (ppstate (punread-char ppstate)))
                                 (retok (plexeme-punctuator "%:")
                                        (make-span :start pos :end pos2)
                                        ppstate))))))
                   (t (b* ((ppstate (punread-char ppstate)))
                        (retok (plexeme-punctuator "%:")
                               (make-span :start pos :end pos2)
                               ppstate))))))
          (t (b* ((ppstate (punread-char ppstate)))
               (retok (plexeme-punctuator "%")
                      (make-span :start pos :end pos)
                      ppstate))))))
       ((utf8-= char (char-code #\<))
        (if headerp (b* ((ppstate (punread-char ppstate)))
                      (plex-header-name ppstate))
          (b* (((erp char2 pos2 ppstate)
                (pread-char ppstate)))
            (cond ((not char2)
                   (retok (plexeme-punctuator "<")
                          (make-span :start pos :end pos)
                          ppstate))
                  ((utf8-= char2 (char-code #\<))
                   (b* (((erp char3 pos3 ppstate)
                         (pread-char ppstate)))
                     (cond ((not char3)
                            (retok (plexeme-punctuator "<<")
                                   (make-span :start pos :end pos2)
                                   ppstate))
                           ((utf8-= char3 (char-code #\=))
                            (retok (plexeme-punctuator "<<=")
                                   (make-span :start pos :end pos3)
                                   ppstate))
                           (t (b* ((ppstate (punread-char ppstate)))
                                (retok (plexeme-punctuator "<<")
                                       (make-span :start pos :end pos2)
                                       ppstate))))))
                  ((utf8-= char2 (char-code #\=))
                   (retok (plexeme-punctuator "<=")
                          (make-span :start pos :end pos2)
                          ppstate))
                  ((utf8-= char2 (char-code #\:))
                   (retok (plexeme-punctuator "<:")
                          (make-span :start pos :end pos2)
                          ppstate))
                  ((utf8-= char2 (char-code #\%))
                   (retok (plexeme-punctuator "<%")
                          (make-span :start pos :end pos2)
                          ppstate))
                  (t (b* ((ppstate (punread-char ppstate)))
                       (retok (plexeme-punctuator "<")
                              (make-span :start pos :end pos)
                              ppstate)))))))
       (t (retok (plexeme-other char)
                 (make-span :start pos :end pos)
                 ppstate)))))

    Theorem: plexeme-optionp-of-plex-lexeme.lexeme?

    (defthm plexeme-optionp-of-plex-lexeme.lexeme?
      (b* (((mv acl2::?erp ?lexeme? ?span ?new-ppstate)
            (plex-lexeme headerp ppstate)))
        (plexeme-optionp lexeme?))
      :rule-classes :rewrite)

    Theorem: spanp-of-plex-lexeme.span

    (defthm spanp-of-plex-lexeme.span
      (b* (((mv acl2::?erp ?lexeme? ?span ?new-ppstate)
            (plex-lexeme headerp ppstate)))
        (spanp span))
      :rule-classes :rewrite)

    Theorem: ppstatep-of-plex-lexeme.new-ppstate

    (defthm ppstatep-of-plex-lexeme.new-ppstate
      (implies (ppstatep ppstate)
               (b* (((mv acl2::?erp ?lexeme? ?span ?new-ppstate)
                     (plex-lexeme headerp ppstate)))
                 (ppstatep new-ppstate)))
      :rule-classes :rewrite)

    Theorem: ppstate->size-of-plex-lexeme-uncond

    (defthm ppstate->size-of-plex-lexeme-uncond
      (b* (((mv acl2::?erp ?lexeme? ?span ?new-ppstate)
            (plex-lexeme headerp ppstate)))
        (<= (ppstate->size new-ppstate)
            (ppstate->size ppstate)))
      :rule-classes :linear)

    Theorem: ppstate->size-of-plex-lexeme-cond

    (defthm ppstate->size-of-plex-lexeme-cond
      (b* (((mv acl2::?erp ?lexeme? ?span ?new-ppstate)
            (plex-lexeme headerp ppstate)))
        (implies (and (not erp) lexeme?)
                 (<= (ppstate->size new-ppstate)
                     (1- (ppstate->size ppstate)))))
      :rule-classes :linear)