• Top
    • Documentation
    • Books
    • Boolean-reasoning
    • Projects
    • Debugging
    • Community
    • Std
    • Proof-automation
    • Macro-libraries
    • ACL2
    • Interfacing-tools
    • Hardware-verification
    • Software-verification
      • Kestrel-books
        • Crypto-hdwallet
        • Apt
        • Error-checking
        • Fty-extensions
        • Isar
        • Kestrel-utilities
        • Set
        • C
          • Syntax-for-tools
            • Formalized-subset
            • Mapping-to-language-definition
            • Input-files
            • Compilation-database
            • Printer
            • Output-files
            • Abstract-syntax-operations
            • Implementation-environments
            • Abstract-syntax
            • Concrete-syntax
            • Disambiguation
            • Validation
            • Gcc-builtins
            • Preprocessing
              • Preprocessor
                • Preprocessor-states
                • Pread-token/newline
                • Pread-token
                • Preprocessor-lexer
                • Pproc-files
                • Preprocessor-printer
                • Pproc-file
                • Filepath-plexeme-list-alist-to-filepath-filedata-map
                • Filepath-plexeme-list-alist
                • Preprocessor-reader
                  • Pread-char
                    • Punread-chars
                    • Update-ppstate-for-char
                    • Punread-char
                  • Preprocessor-messages
                • External-preprocessing
              • Parsing
            • Atc
            • Transformation-tools
            • Language
            • Representation
            • Insertion-sort
            • Pack
          • Soft
          • Bv
          • Imp-language
          • Ethereum
          • Event-macros
          • Java
          • Riscv
          • Bitcoin
          • Zcash
          • Yul
          • ACL2-programming-language
          • Prime-fields
          • Json
          • Syntheto
          • File-io-light
          • Cryptography
          • Number-theory
          • Axe
          • Lists-light
          • Builtins
          • Solidity
          • Helpers
          • Htclient
          • Typed-lists-light
          • Arithmetic-light
        • X86isa
        • Axe
        • Execloader
      • Math
      • Testing-utilities
    • Preprocessor-reader

    Pread-char

    Read a character during preprocessing.

    Signature
    (pread-char ppstate) → (mv erp char? pos new-ppstate)
    Arguments
    ppstate — Guard (ppstatep ppstate).
    Returns
    char? — Type (nat-optionp char?).
    pos — Type (positionp pos).
    new-ppstate — Type (ppstatep new-ppstate), given (ppstatep ppstate).

    This is almost identical to read-char (see its documentation first), but with some differences.

    Here we perform phases 1 and 2 of [C17:5.1.1.2].

    Here the implementation-defined mapping from physical source file multibyte characters to the source character set amounts to UTF-8 decoding.

    Unlike read-char, here we do not normalize the three kinds of new lines to LF, because we want to preserve line endings. However, we need to be careful about how we increment the current position. If we read a CR, we check whether it is followed by LF. If that is the case, the line ending is the combination of CR and LF, and thus we do not change the position upon reading the CR (as if the CR took no visual space); when this function is called again next, it will read the LF, which always changes the position to the next line. If instead the CR is not followed by LF, then the CR is the whole line ending, and we change the position to the next line. Reading an LF always changes the position to the next line, whether it is preceded by CR (see above) or not; in the latter case, the LF is the whole line ending.

    If the next character is a question mark, we check the subsequent characters (if any) to see whether we have a trigraph sequence, which we turn into the single character it denotes [C17:5.2.1.1].

    If the next character is a backslash, we check whether it is followed by new-line characters, namely CR (not followed by LF), LF, or CR LF. If that is the case, the backslash and the new-line character(s) are skipped, and we recursively try to read another character. We also ensure that the backslash-new-line being deleted is not at the end of the file [C17:5.1.1.2/2].

    Definitions and Theorems

    Function: pread-char

    (defun pread-char (ppstate)
     (declare (xargs :stobjs (ppstate)))
     (declare (xargs :guard (ppstatep ppstate)))
     (let ((__function__ 'pread-char))
      (declare (ignorable __function__))
      (b*
       (((reterr) nil (irr-position) ppstate)
        (ppstate.bytes (ppstate->bytes ppstate))
        (ppstate.position (ppstate->position ppstate))
        (ppstate.chars-read (ppstate->chars-read ppstate))
        (ppstate.chars-unread (ppstate->chars-unread ppstate))
        (ppstate.size (ppstate->size ppstate))
        ((when (> ppstate.chars-unread 0))
         (b*
          (((unless (> ppstate.size 0))
            (raise
             "Internal error: ~
                          there are ~x0 unread characters ~
                          but the size of the preprocessing state is 0."
             ppstate.chars-unread)
            (reterr t))
           ((unless (< ppstate.chars-read
                       (ppstate->chars-length ppstate)))
            (raise "Internal error: index ~x0 out of bound ~x1."
                   ppstate.chars-read
                   (ppstate->chars-length ppstate))
            (reterr t))
           (char+pos (ppstate->char ppstate.chars-read ppstate))
           (ppstate
                (update-ppstate->chars-unread (1- ppstate.chars-unread)
                                              ppstate))
           (ppstate (update-ppstate->chars-read (1+ ppstate.chars-read)
                                                ppstate))
           (ppstate (update-ppstate->size (1- ppstate.size)
                                          ppstate)))
          (retok (char+position->char char+pos)
                 (char+position->position char+pos)
                 ppstate)))
        ((unless (consp ppstate.bytes))
         (if
          (> ppstate.size 0)
          (prog2$
           (raise
            "Internal error: ~
                                there are no unread characters and no more bytes, ~
                                but the size of the preprocessing state is ~x0."
            ppstate.size)
           (reterr t))
          (retok nil ppstate.position ppstate)))
        ((unless (> ppstate.size 0))
         (raise
          "Internal error: ~
                    there are more bytes but ~
                    the size of the preprocessing state is 0.")
         (reterr t))
        (byte (car ppstate.bytes))
        (bytes (cdr ppstate.bytes))
        ((unless (< ppstate.chars-read
                    (ppstate->chars-length ppstate)))
         (raise "Internal error: index ~x0 out of bound ~x1."
                ppstate.chars-read
                (ppstate->chars-length ppstate))
         (reterr t))
        ((when (utf8-= byte 13))
         (b*
          ((lf-follows-cr-p (and (consp bytes)
                                 (utf8-= (car bytes) 10)))
           (new-position (if lf-follows-cr-p ppstate.position
                           (position-inc-line 1 ppstate.position)))
           (ppstate
             (update-ppstate-for-char 13 bytes new-position 1 ppstate)))
          (retok 13 ppstate.position ppstate)))
        ((when (utf8-= byte 10))
         (b*
          ((new-position (position-inc-line 1 ppstate.position))
           (ppstate
             (update-ppstate-for-char 10 bytes new-position 1 ppstate)))
          (retok 10 ppstate.position ppstate)))
        ((when (utf8-= byte (char-code #\?)))
         (b*
          (((mv char new-bytes num-chars/bytes)
            (if
             (and (consp bytes)
                  (utf8-= (car bytes) (char-code #\?))
                  (consp (cdr bytes)))
             (b*
              (((unless (>= ppstate.size 3))
                (raise
                 "Internal error: ~
                                    there are three or more bytes ~
                                    but the size of the preprocessing state is ~x0."
                 ppstate.size)
                (mv 0 nil 1)))
              (cond ((utf8-= (cadr bytes) (char-code #\=))
                     (mv (char-code #\#) (cddr bytes) 3))
                    ((utf8-= (cadr bytes) (char-code #\())
                     (mv (char-code #\[) (cddr bytes) 3))
                    ((utf8-= (cadr bytes) (char-code #\/))
                     (mv (char-code #\\) (cddr bytes) 3))
                    ((utf8-= (cadr bytes) (char-code #\)))
                     (mv (char-code #\]) (cddr bytes) 3))
                    ((utf8-= (cadr bytes) (char-code #\'))
                     (mv (char-code #\^) (cddr bytes) 3))
                    ((utf8-= (cadr bytes) (char-code #\<))
                     (mv (char-code #\{) (cddr bytes) 3))
                    ((utf8-= (cadr bytes) (char-code #\!))
                     (mv (char-code #\|) (cddr bytes) 3))
                    ((utf8-= (cadr bytes) (char-code #\>))
                     (mv (char-code #\}) (cddr bytes) 3))
                    ((utf8-= (cadr bytes) (char-code #\-))
                     (mv (char-code #\~) (cddr bytes) 3))
                    (t (mv byte bytes 1))))
             (mv byte bytes 1)))
           (new-position
                (position-inc-column num-chars/bytes ppstate.position))
           (ppstate (update-ppstate-for-char
                         char new-bytes
                         new-position num-chars/bytes ppstate)))
          (retok char ppstate.position ppstate)))
        ((when (utf8-= byte (char-code #\\)))
         (b*
          (((when (endp bytes))
            (reterr
                 (msg "File ends with backslash instead of new line.")))
           ((when (utf8-= (car bytes) 10))
            (b*
             (((unless (>= ppstate.size 2))
               (raise
                "Internal error: ~
                                there are two or more bytes ~
                                but the size of the preprocessing state is ~x0."
                ppstate.size)
               (reterr t))
              (new-bytes (cdr bytes))
              (new-position (position-inc-line 1 ppstate.position))
              (new-size (- ppstate.size 2))
              (ppstate (update-ppstate->bytes new-bytes ppstate))
              (ppstate (update-ppstate->position new-position ppstate))
              (ppstate (update-ppstate->size new-size ppstate))
              ((unless (consp new-bytes))
               (reterr (msg "File ends with backslash and new line."))))
             (pread-char ppstate)))
           ((when (utf8-= (car bytes) 13))
            (if
             (and (consp (cdr bytes))
                  (utf8-= (cadr bytes) 10))
             (b*
              (((when (endp (cddr bytes)))
                (reterr
                 (msg
                  "File ends with backslash and new line ~
                                          instead of new line.")))
               ((unless (>= ppstate.size 3))
                (raise
                 "Internal error: ~
                                    there are three or more bytes ~
                                    but the size of the preprocessing state is ~x0."
                 ppstate.size)
                (reterr t))
               (new-bytes (cddr bytes))
               (new-position (position-inc-line 1 ppstate.position))
               (new-size (- ppstate.size 3))
               (ppstate (update-ppstate->bytes new-bytes ppstate))
               (ppstate (update-ppstate->position new-position ppstate))
               (ppstate (update-ppstate->size new-size ppstate))
               ((unless (consp new-bytes))
                (reterr
                     (msg "File ends with backslash and new line."))))
              (pread-char ppstate))
             (b*
              (((when (endp (cdr bytes)))
                (reterr
                 (msg
                  "File ends with backslash and new line ~
                                        instead of new line.")))
               ((unless (>= ppstate.size 2))
                (raise
                 "Internal error: ~
                                  there are three or more bytes ~
                                  but the size of the preprocessing state is ~x0."
                 ppstate.size)
                (reterr t))
               (new-bytes (cdr bytes))
               (new-position (position-inc-line 1 ppstate.position))
               (new-size (- ppstate.size 2))
               (ppstate (update-ppstate->bytes new-bytes ppstate))
               (ppstate (update-ppstate->position new-position ppstate))
               (ppstate (update-ppstate->size new-size ppstate))
               ((unless (consp new-bytes))
                (reterr
                     (msg "File ends with backslash and new line."))))
              (pread-char ppstate))))
           (new-position (position-inc-column 1 ppstate.position))
           (ppstate (update-ppstate-for-char
                         byte bytes new-position 1 ppstate)))
          (retok byte ppstate.position ppstate)))
        ((when (or (utf8-= byte 9)
                   (utf8-= byte 11)
                   (utf8-= byte 12)
                   (and (utf8-<= 32 byte)
                        (utf8-<= byte 126))))
         (b* ((new-position (position-inc-column 1 ppstate.position))
              (ppstate (update-ppstate-for-char
                            byte bytes new-position 1 ppstate)))
           (retok byte ppstate.position ppstate)))
        ((when (utf8-= (logand byte 224) 192))
         (b*
          (((unless (consp bytes))
            (reterr-msg
             :where (position-to-msg ppstate.position)
             :expected
             (msg
              "another byte after ~
                                              the first byte ~x0 ~
                                              of the form 110... ~
                                              (i.e. between 192 and 223) ~
                                              of a two-byte UTF-8 encoding"
              byte)
             :found "end of file"))
           ((unless (>= ppstate.size 2))
            (raise
             "Internal error: ~
                          there are two or more bytes ~
                          but the size of the preprocessing state is ~x0."
             ppstate.size)
            (reterr t))
           (byte2 (car bytes))
           (bytes (cdr bytes))
           ((unless (utf8-= (logand byte2 192) 128))
            (reterr-msg
             :where (position-to-msg ppstate.position)
             :expected
             (msg
              "a byte of the form 10... ~
                                              (i.e. between 128 and 191) ~
                                              after the first byte ~x0 ~
                                              of the form 110... ~
                                              (i.e. between 192 and 223) ~
                                              of a two-byte UTF-8 encoding"
              byte)
             :found (msg "the byte ~x0" byte2)))
           (code (+ (ash (logand byte 31) 6)
                    (logand byte2 63)))
           ((when (< code 128))
            (reterr-msg
             :where (position-to-msg ppstate.position)
             :expected
             (msg
              "a value between 80h and 7FFh ~
                                              UTF-8-encoded in the two bytes ~
                                              (~x0 ~x1)"
              byte byte2)
             :found (msg "the value ~x0" code)))
           (new-position (position-inc-column 1 ppstate.position))
           (ppstate (update-ppstate-for-char
                         code bytes new-position 2 ppstate)))
          (retok code ppstate.position ppstate)))
        ((when (utf8-= (logand byte 240) 224))
         (b*
          (((unless (consp bytes))
            (reterr-msg
             :where (position-to-msg ppstate.position)
             :expected
             (msg
              "another byte after ~
                                              the first byte ~x0 ~
                                              of the form 1110... ~
                                              (i.e. between 224 to 239) ~
                                              of a three-byte UTF-8 encoding"
              byte)
             :found "end of file"))
           ((unless (>= ppstate.size 2))
            (raise
             "Internal error: ~
                          there are two or more bytes ~
                          but the size of the preprocessing state is ~x0."
             ppstate.size)
            (reterr t))
           (byte2 (car bytes))
           (bytes (cdr bytes))
           ((unless (utf8-= (logand byte2 192) 128))
            (reterr-msg
             :where (position-to-msg ppstate.position)
             :expected
             (msg
              "a byte of the form 10... ~
                                              (i.e. between 128 and 191) ~
                                              after the first byte ~x0 ~
                                              of the form 1110... ~
                                              (i.e. between 224 and 239) ~
                                              of a three-byte UTF-8 encoding"
              byte)
             :found (msg "the byte ~x0" byte2)))
           ((unless (consp bytes))
            (reterr-msg
             :where (position-to-msg ppstate.position)
             :expected
             (msg
              "another byte after ~
                                              the first byte ~x0 ~
                                              of the form 1110... ~
                                              (i.e. between 224 to 239) ~
                                              and the second byte ~x1 ~
                                              of the form 10... ~
                                              (i.e. between 128 and 191) ~
                                              of a three-byte UTF-8 encoding"
              byte byte2)
             :found "end of file"))
           ((unless (>= ppstate.size 3))
            (raise
             "Internal error: ~
                          there are three or more bytes ~
                          but the size of the preprocessing state is ~x0."
             ppstate.size)
            (reterr t))
           (byte3 (car bytes))
           (bytes (cdr bytes))
           ((unless (utf8-= (logand byte3 192) 128))
            (reterr-msg
             :where (position-to-msg ppstate.position)
             :expected
             (msg
              "a byte of the form 10... ~
                                              (i.e. between 128 and 191) ~
                                              after the first byte ~x0 ~
                                              of the form 1110... ~
                                              (i.e. between 224 and 239) ~
                                              and the second byte ~x1 ~
                                              of the form 10... ~
                                              (i.e. between 128 and 191) ~
                                              of a three-byte UTF-8 encoding"
              byte byte2)
             :found (msg "the byte ~x0" byte3)))
           (code (+ (ash (logand byte 15) 12)
                    (ash (logand byte2 63) 6)
                    (logand byte3 63)))
           ((when (< code 2048))
            (reterr-msg
             :where (position-to-msg ppstate.position)
             :expected
             (msg
              "a value between 800h and FFFFh ~
                                              UTF-8-encoded in the three bytes ~
                                              (~x0 ~x1 ~x2)"
              byte byte2 byte3)
             :found (msg "the value ~x0" code)))
           ((when (or (and (utf8-<= 8234 code)
                           (utf8-<= code 8238))
                      (and (utf8-<= 8294 code)
                           (utf8-<= code 8297))))
            (reterr-msg
             :where (position-to-msg ppstate.position)
             :expected
             "a Unicode character with code ~
                                         in the range 9-13 or 32-126 ~
                                         or 128-8233 or 8239-8293 or ~
                                         or 8298-55295 or 57344-1114111"
             :found (char-to-msg code)))
           (new-position (position-inc-column 1 ppstate.position))
           (ppstate (update-ppstate-for-char
                         code bytes new-position 3 ppstate)))
          (retok code ppstate.position ppstate)))
        ((when (utf8-= (logand 248 byte) 240))
         (b*
          (((unless (consp bytes))
            (reterr-msg
             :where (position-to-msg ppstate.position)
             :expected
             (msg
              "another byte after ~
                                              the first byte ~x0 ~
                                              of the form 11110... ~
                                              (i.e. between 240 to 247) ~
                                              of a four-byte UTF-8 encoding"
              byte)
             :found "end of file"))
           ((unless (>= ppstate.size 2))
            (raise
             "Internal error: ~
                          there are two or more bytes ~
                          but the size of the preprocessing state is ~x0."
             ppstate.size)
            (reterr t))
           (byte2 (car bytes))
           (bytes (cdr bytes))
           ((unless (utf8-= (logand byte2 192) 128))
            (reterr-msg
             :where (position-to-msg ppstate.position)
             :expected
             (msg
              "a byte of the form 10... ~
                                              (i.e. between 128 and 191) ~
                                              after the first byte ~x0 ~
                                              of the form 11110... ~
                                              (i.e. between 240 and 247) ~
                                              of a four-byte UTF-8 encoding"
              byte)
             :found (msg "the byte ~x0" byte2)))
           ((unless (consp bytes))
            (reterr-msg
             :where (position-to-msg ppstate.position)
             :expected
             (msg
              "another byte after ~
                                              the first byte ~x0 ~
                                              of the form 11110... ~
                                              (i.e. between 240 to 247) ~
                                              and the second byte ~x1 ~
                                              of the form 10... ~
                                              (i.e. between 128 and 191) ~
                                              of a four-byte UTF-8 encoding"
              byte byte2)
             :found "end of file"))
           ((unless (>= ppstate.size 3))
            (raise
             "Internal error: ~
                          there are three or more bytes ~
                          but the size of the preprocessing state is ~x0."
             ppstate.size)
            (reterr t))
           (byte3 (car bytes))
           (bytes (cdr bytes))
           ((unless (utf8-= (logand byte3 192) 128))
            (reterr-msg
             :where (position-to-msg ppstate.position)
             :expected
             (msg
              "a byte of the form 10... ~
                                              (i.e. between 128 and 191) ~
                                              after the first byte ~x0 ~
                                              of the form 11110... ~
                                              (i.e. between 240 and 247) ~
                                              and the second byte ~x1 ~
                                              of the form 10... ~
                                              (i.e. between 128 and 191) ~
                                              of a four-byte UTF-8 encoding"
              byte byte2)
             :found (msg "the byte ~x0" byte3)))
           ((unless (consp bytes))
            (reterr-msg
             :where (position-to-msg ppstate.position)
             :expected
             (msg
              "another byte after ~
                                              the first byte ~x0 ~
                                              of the form 11110... ~
                                              (i.e. between 240 to 247) ~
                                              and the second byte ~x1 ~
                                              of the form 10... ~
                                              (i.e. between 128 and 191) ~
                                              and the third byte ~x2 ~
                                              of the form 10... ~
                                              (i.e. between 128 and 191) ~
                                              of a four-byte UTF-8 encoding"
              byte byte2 byte3)
             :found "end of file"))
           ((unless (>= ppstate.size 4))
            (raise
             "Internal error: ~
                          there are four or more bytes ~
                          but the size of the preprocessing state is ~x0."
             ppstate.size)
            (reterr t))
           (byte4 (car bytes))
           (bytes (cdr bytes))
           ((unless (utf8-= (logand byte4 192) 128))
            (reterr-msg
             :where (position-to-msg ppstate.position)
             :expected
             (msg
              "a byte of the form 10... ~
                                              (i.e. between 128 and 191) ~
                                              after the first byte ~x0 ~
                                              of the form 11110... ~
                                              (i.e. between 240 and 247) ~
                                              and the second byte ~x1 ~
                                              of the form 10... ~
                                              (i.e. between 128 and 191) ~
                                              and the third byte ~x2 ~
                                              of the form 10... ~
                                              (i.e. between 128 and 191) ~
                                              of a four-byte UTF-8 encoding"
              byte byte2 byte3)
             :found (msg "the byte ~x0" byte4)))
           (code (+ (ash (logand byte 7) 18)
                    (ash (logand byte2 63) 12)
                    (ash (logand byte3 63) 6)
                    (logand byte4 63)))
           ((when (or (< code 65536) (> code 1114111)))
            (reterr-msg
             :where (position-to-msg ppstate.position)
             :expected
             (msg
              "a value between 10000h and 10FFFFh ~
                                              UTF-8-encoded in the four bytes ~
                                              (~x0 ~x1 ~x2 ~x3)"
              byte byte2 byte3 byte4)
             :found (msg "the value ~x0" code)))
           (new-position (position-inc-column 1 ppstate.position))
           (ppstate (update-ppstate-for-char
                         code bytes new-position 3 ppstate)))
          (retok code ppstate.position ppstate))))
       (reterr-msg
            :where (position-to-msg ppstate.position)
            :expected "a byte in the range 9-13 or 32-126 or 192-223"
            :found (msg "the byte ~x0" byte)))))

    Theorem: nat-optionp-of-pread-char.char?

    (defthm nat-optionp-of-pread-char.char?
      (b* (((mv acl2::?erp
                ?char? acl2::?pos ?new-ppstate)
            (pread-char ppstate)))
        (nat-optionp char?))
      :rule-classes :rewrite)

    Theorem: positionp-of-pread-char.pos

    (defthm positionp-of-pread-char.pos
      (b* (((mv acl2::?erp
                ?char? acl2::?pos ?new-ppstate)
            (pread-char ppstate)))
        (positionp pos))
      :rule-classes :rewrite)

    Theorem: ppstatep-of-pread-char.new-ppstate

    (defthm ppstatep-of-pread-char.new-ppstate
      (implies (ppstatep ppstate)
               (b* (((mv acl2::?erp
                         ?char? acl2::?pos ?new-ppstate)
                     (pread-char ppstate)))
                 (ppstatep new-ppstate)))
      :rule-classes :rewrite)

    Theorem: pread-char.char?-type-prescription

    (defthm pread-char.char?-type-prescription
      (b* (((mv acl2::?erp
                ?char? acl2::?pos ?new-ppstate)
            (pread-char ppstate)))
        (or (equal char? nil) (natp char?)))
      :rule-classes :type-prescription)

    Theorem: ppstate->size-of-pread-char-uncond

    (defthm ppstate->size-of-pread-char-uncond
      (b* (((mv acl2::?erp
                ?char? acl2::?pos ?new-ppstate)
            (pread-char ppstate)))
        (<= (ppstate->size new-ppstate)
            (ppstate->size ppstate)))
      :rule-classes :linear)

    Theorem: ppstate->size-of-pread-char-cond

    (defthm ppstate->size-of-pread-char-cond
      (b* (((mv acl2::?erp
                ?char? acl2::?pos ?new-ppstate)
            (pread-char ppstate)))
        (implies (and (not erp) char?)
                 (<= (ppstate->size new-ppstate)
                     (1- (ppstate->size ppstate)))))
      :rule-classes :linear)