Token filtering.
(filter-tokens trees) → token-trees
As formalized in lexp, lexing organizes a Unicode character sequence into a sequence of lexemes. Comments and whitespace are discarded for parsing, which only applies to tokens. This process of discarding comments and whitespace is formalized via this function, which goes through a sequence of CSTs and only retains the ones for tokens.
We ignore any CST that is not a lexeme, but this never happens for a list of lexeme CSTs. We could strengthen the guard of this function to require lexeme CSTs.
Function:
(defun filter-tokens (trees) (declare (xargs :guard (abnf::tree-listp trees))) (let ((__function__ 'filter-tokens)) (declare (ignorable __function__)) (b* (((when (endp trees)) nil) (tree (car trees)) ((unless (and (abnf::tree-case tree :nonleaf) (equal (abnf::tree-nonleaf->rulename? tree) (abnf::rulename "lexeme")))) (filter-tokens (cdr trees))) (tree (caar (abnf::tree-nonleaf->branches tree)))) (if (abnf-tree-with-root-p tree "token") (cons tree (filter-tokens (cdr trees))) (filter-tokens (cdr trees))))))
Theorem:
(defthm tree-listp-of-filter-tokens (b* ((token-trees (filter-tokens trees))) (abnf::tree-listp token-trees)) :rule-classes :rewrite)
Theorem:
(defthm filter-tokens-of-tree-list-fix-trees (equal (filter-tokens (abnf::tree-list-fix trees)) (filter-tokens trees)))
Theorem:
(defthm filter-tokens-tree-list-equiv-congruence-on-trees (implies (abnf::tree-list-equiv trees trees-equiv) (equal (filter-tokens trees) (filter-tokens trees-equiv))) :rule-classes :congruence)