Skip to content

Fix unsupported regex patterns causing cvc5 theory-combination errors#699

Open
shigoel wants to merge 5 commits intomainfrom
shilpi/unsupported-regex-smt-fix
Open

Fix unsupported regex patterns causing cvc5 theory-combination errors#699
shigoel wants to merge 5 commits intomainfrom
shilpi/unsupported-regex-smt-fix

Conversation

@shigoel
Copy link
Copy Markdown
Contributor

@shigoel shigoel commented Mar 27, 2026

Summary

  • Root cause: Patterns using unsupported features (\S, \d, \w, \t, non-greedy quantifiers, lookaheads, etc.) left re_*_str (RegLan-typed) as uninterpreted UFs in the SMT encoding. cvc5 rejects these with "Regular expression terms are not supported in theory combination", which aborted all verification results with an internal error.

  • Fix 1 — re_*_bool as factory functions: Promoted re_fullmatch_bool, re_match_bool, re_search_bool from inline prelude functions to factory functions. Their concreteEval returns Str.InRegEx(s, compiled_regex) on success, or .none on failure — leaving an uninterpreted Bool UF instead of a RegLan UF. An uninterpreted Bool UF produces unknown gracefully (sound over-approximation).

  • Fix 2 — parseCharClass escape handling: Backslash escape sequences inside character classes (e.g. [a-zA-Z_\.\-/0-9:]) were misread: \- triggered a spurious range check, causing a patternError and leaving the pattern uninterpreted. Added escape handling to parseCharClass mirroring the top-level atom parser.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Previously, patterns using unsupported features like \S, \d, \w left
`re_*_str` (RegLan-typed) as uninterpreted UFs in the SMT encoding.
cvc5 rejects uninterpreted RegLan UFs with "Regular expression terms
are not supported in theory combination", producing an internal error
that aborts all verification results.

Fix: promote `re_fullmatch_bool`, `re_match_bool`, `re_search_bool`
from inline prelude functions to factory functions. Their concreteEval
returns `Str.InRegEx(s, compiled_regex)` on success, or `.none` on
failure — leaving an uninterpreted Bool UF instead of a RegLan UF.
An uninterpreted Bool UF produces `unknown` gracefully.

Also fix `parseCharClass` to handle backslash escape sequences inside
character classes (e.g. `[\.\-]`), mirroring the top-level atom parser.
Previously `\-` was misread as a range operator, triggering a spurious
patternError and leaving the pattern uninterpreted.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@shigoel shigoel requested a review from a team March 27, 2026 23:54
@shigoel shigoel added the Python label Mar 27, 2026
Extract `parseEscape` from the duplicated backslash-handling logic in
`parseAtom` (top-level) and `parseCharClass`.  Both call sites now
delegate to a single function parameterised by a `context` string
(e.g. `" in character class"`) used in error messages.

Also add `#guard_msgs` tests for incomplete escape sequences at the
top level (`"\\"`) and inside character classes (`"[a\\"`) to pin
the error messages produced by `parseEscape`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@shigoel shigoel marked this pull request as draft March 28, 2026 00:23
…te cases

DiffTestCore was missing `canBeFalseAndIsReachable` in its noMatch check
after the VCOutcome refactor in #487. In default deductive mode, a failing
assertion produces `canBeFalseAndIsReachable` (validity counterexample found,
sat check not run), not `alwaysFalseAndReachable`. This caused 205 valid
noMatch cases to fall through to `smtError:unknown` (investigate bucket).

Fix: use `VCResult.isFailure` which covers all failure outcomes correctly.

Also make diff_test.py exit non-zero when there are investigate cases, so
CI catches regressions like this automatically.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@shigoel shigoel marked this pull request as ready for review March 28, 2026 01:35
These functions could produce uninterpreted RegLan UFs in SMT when
concreteEval fires .none (unsupported patterns, non-constant arguments),
causing cvc5 to abort with "Regular expression terms are not supported
in theory combination". Now that re_*_bool handles all matching, these
are dead code and a latent hazard.

Also removes the unused mkModeCompileFunc helper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@shigoel shigoel enabled auto-merge March 28, 2026 01:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant