Fix unsupported regex patterns causing cvc5 theory-combination errors#699
Open
Fix unsupported regex patterns causing cvc5 theory-combination errors#699
Conversation
Previously, patterns using unsupported features like \S, \d, \w left `re_*_str` (RegLan-typed) as uninterpreted UFs in the SMT encoding. cvc5 rejects uninterpreted RegLan UFs with "Regular expression terms are not supported in theory combination", producing an internal error that aborts all verification results. Fix: promote `re_fullmatch_bool`, `re_match_bool`, `re_search_bool` from inline prelude functions to factory functions. Their concreteEval returns `Str.InRegEx(s, compiled_regex)` on success, or `.none` on failure — leaving an uninterpreted Bool UF instead of a RegLan UF. An uninterpreted Bool UF produces `unknown` gracefully. Also fix `parseCharClass` to handle backslash escape sequences inside character classes (e.g. `[\.\-]`), mirroring the top-level atom parser. Previously `\-` was misread as a range operator, triggering a spurious patternError and leaving the pattern uninterpreted. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extract `parseEscape` from the duplicated backslash-handling logic in `parseAtom` (top-level) and `parseCharClass`. Both call sites now delegate to a single function parameterised by a `context` string (e.g. `" in character class"`) used in error messages. Also add `#guard_msgs` tests for incomplete escape sequences at the top level (`"\\"`) and inside character classes (`"[a\\"`) to pin the error messages produced by `parseEscape`. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…te cases DiffTestCore was missing `canBeFalseAndIsReachable` in its noMatch check after the VCOutcome refactor in #487. In default deductive mode, a failing assertion produces `canBeFalseAndIsReachable` (validity counterexample found, sat check not run), not `alwaysFalseAndReachable`. This caused 205 valid noMatch cases to fall through to `smtError:unknown` (investigate bucket). Fix: use `VCResult.isFailure` which covers all failure outcomes correctly. Also make diff_test.py exit non-zero when there are investigate cases, so CI catches regressions like this automatically. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
These functions could produce uninterpreted RegLan UFs in SMT when concreteEval fires .none (unsupported patterns, non-constant arguments), causing cvc5 to abort with "Regular expression terms are not supported in theory combination". Now that re_*_bool handles all matching, these are dead code and a latent hazard. Also removes the unused mkModeCompileFunc helper. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Root cause: Patterns using unsupported features (
\S,\d,\w,\t, non-greedy quantifiers, lookaheads, etc.) leftre_*_str(RegLan-typed) as uninterpreted UFs in the SMT encoding. cvc5 rejects these with"Regular expression terms are not supported in theory combination", which aborted all verification results with an internal error.Fix 1 —
re_*_boolas factory functions: Promotedre_fullmatch_bool,re_match_bool,re_search_boolfrominlineprelude functions to factory functions. TheirconcreteEvalreturnsStr.InRegEx(s, compiled_regex)on success, or.noneon failure — leaving an uninterpreted Bool UF instead of a RegLan UF. An uninterpreted Bool UF producesunknowngracefully (sound over-approximation).Fix 2 —
parseCharClassescape handling: Backslash escape sequences inside character classes (e.g.[a-zA-Z_\.\-/0-9:]) were misread:\-triggered a spurious range check, causing apatternErrorand leaving the pattern uninterpreted. Added escape handling toparseCharClassmirroring the top-level atom parser.By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.