Skip to content

Add API for regex queries against all files in a project (or multiple projects)#2183

Open
rmunn wants to merge 10 commits intodevelopfrom
feat/api-for-file-regex
Open

Add API for regex queries against all files in a project (or multiple projects)#2183
rmunn wants to merge 10 commits intodevelopfrom
feat/api-for-file-regex

Conversation

@rmunn
Copy link
Contributor

@rmunn rmunn commented Feb 27, 2026

Fixes #1973.

New API endpoint /api/project/queryRegexFiles takes a projectType parameter, and two regexes, file and count. Optional third regex fileExclude which can make the file regex simpler to write in some circumstances.

To test, try URLs like http://localhost/api/project/queryRegexFiles?projectType=FLEx&file=^C[abc]chedSettings&count=version&fileExclude=FLExProject.CustomProperties, which should return some 130 results. Or http://localhost/api/project/queryRegexFiles?projectType=FLEx&file=FLExProject.*&count=version&fileExclude=FLExProject.CustomProperties which should return 2 results on a default dev install with just two projects (remove the fileExclude param and you should see 4 results instead).

If this is correct, we'll use this approach to implement the regexcount
command (which will pass one regex to Mercurial for filenames, and a
second regex to grep for finding patterns in those files).
@rmunn rmunn self-assigned this Feb 27, 2026
@coderabbitai
Copy link

coderabbitai bot commented Feb 27, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 129d5db4-feeb-4e8d-85f5-d05b1b2bb39e

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This change introduces a new API endpoint for administrators to execute regex queries across multiple projects of a specified type, aggregating match counts from the Hg service. It extends the backend service layer with query parameter support for Hg command server calls and adds handling for the new "regexcount" command in the shell script runner.

Changes

Cohort / File(s) Summary
API Endpoint
backend/LexBoxApi/Controllers/ProjectController.cs
Added QueryRegexFiles GET endpoint requiring admin authentication; accepts projectType, file, regex, and optional fileExclude query parameters; aggregates and returns total regex match count across all projects of the specified type.
Service Layer
backend/LexBoxApi/Services/HgService.cs, backend/LexCore/ServiceInterfaces/IHgService.cs
Added GetRegexCount public method to query regex matches via Hg command server; extended ExecuteHgCommandServerCommand and MaybeExecuteHgCommandServerCommand to support optional query parameters; added dependency on Microsoft.AspNetCore.WebUtilities for query string formatting.
Command Handler
hgweb/command-runner.sh
Added "regexcount" to the allowed commands list; implemented preflight validation parsing and decoding URL query parameters (file, fileExclude, regex) with 400 error responses for missing required parameters (file, regex); added regexcount case handler with logic to apply include/exclude glob rules based on presence of fileExclude parameter.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 With whisker-twitched precision,
I query files with fine decision,
Across all projects, counts align,
Admin magic, regex divine,
Data patterns perfectly mine! ✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 5.88% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Linked Issues check ❓ Inconclusive The PR implements the core regexcount functionality and admin-only API endpoint, but issue #1973 requests a JSON map result format while implementation returns an int count, and the single-project endpoint requirement status is unclear. Clarify whether the response format aligns with #1973 requirements (JSON map of project IDs to counts vs. single int) and confirm both all-projects and single-project endpoints are implemented as specified.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: adding an API for regex queries against files in projects, which is the primary objective of this PR.
Out of Scope Changes check ✅ Passed All changes are directly related to implementing the regex query API functionality requested in issue #1973, with no extraneous modifications detected.
Description check ✅ Passed The PR description clearly relates to the changeset, describing a new API endpoint that matches the QueryRegexFiles method added to ProjectController and the supporting changes in HgService.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/api-for-file-regex

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot added the 📦 Lexbox issues related to any server side code, fw-headless included label Feb 27, 2026
rmunn added 3 commits March 2, 2026 13:15
Now special characters like +, *, ?, and so on will make it through
unscathed to the command-runner script, and from there to hg.
One endpoint, queryRegexFilesOneProject, is just for ease of testing.
@rmunn rmunn marked this pull request as ready for review March 2, 2026 07:46
@rmunn rmunn requested a review from myieye March 2, 2026 07:46
@github-actions
Copy link
Contributor

github-actions bot commented Mar 2, 2026

C# Unit Tests

162 tests  ±0   162 ✅ ±0   19s ⏱️ ±0s
 23 suites ±0     0 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit bf6bbda. ± Comparison against base commit ef61e00.

♻️ This comment has been updated with latest results.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/LexBoxApi/Controllers/ProjectController.cs`:
- Around line 134-143: In QueryRegexFiles (ProjectController.QueryRegexFiles)
validate the incoming query parameters 'file' and 'regex' before iterating
projects or calling hgService.GetRegexCount: check
string.IsNullOrWhiteSpace(file) and string.IsNullOrWhiteSpace(regex) and
immediately return a BadRequest with a clear message when either is
missing/empty, so you never call hgService.GetRegexCount with invalid inputs
that could cause downstream EnsureSuccessStatusCode to produce a 500.
- Around line 134-146: QueryRegexFiles currently aggregates counts into a single
int but should return a per-project map; change the return shape to a dictionary
(e.g., ActionResult<Dictionary<string,int>> or Dictionary<int,int> depending on
desired key) and build a result map while iterating
lexBoxDbContext.Projects.AsAsyncEnumerable(): for each project call
hgService.GetRegexCount(project.Code, file, regex, fileExclude), coerce null to
0, and add an entry keyed by project.Code (or project.Id) with that count, then
return Ok(resultMap) instead of the aggregated count.

In `@backend/LexCore/ServiceInterfaces/IHgService.cs`:
- Line 26: Update the IHgService signature and controller to accept and
propagate a CancellationToken: change the interface method Task<int?>
GetRegexCount(ProjectCode code, string fileRegex, string contentRegex, string?
fileExcludeRegex = null) to include a final CancellationToken parameter (e.g.
CancellationToken cancellationToken), update the controller action
QueryRegexFiles to accept CancellationToken cancellationToken in its signature
or parameters, and pass that token into the hgService.GetRegexCount(...) call so
the long-running scan can be cancelled; ensure method implementations are
updated to accept and forward the token as well.

In `@hgweb/command-runner.sh`:
- Line 85: The test in the if-statement checking variables fileInclude and
contentRegex uses the non-POSIX operator -o; update the condition in the if
inside command-runner.sh (the if that reads "if [ -z \"$fileInclude\" -o -z
\"$contentRegex\" ]; then") to use the portable shell-level || operator between
the two tests (i.e., replace -o with ||) so the expression becomes
POSIX-compliant and passes shellcheck; ensure quoting of "$fileInclude" and
"$contentRegex" is preserved.
- Around line 131-133: The pipeline currently uses "chg cat -r tip ... | grep -c
-P \"$contentRegex\"" which counts matching lines not total matches; change it
to emit each match and count them by replacing the grep -c step with a command
that uses grep -oP "$contentRegex" piped to wc -l so multiple matches on a
single line are counted; apply this change in both branches where "chg cat -r
tip ... --include=\"re:$fileInclude\"" and the branch with
"--exclude=\"re:$fileExclude\"" are used, keeping the rest of the pipeline
intact.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef61e00 and e0eac29.

📒 Files selected for processing (4)
  • backend/LexBoxApi/Controllers/ProjectController.cs
  • backend/LexBoxApi/Services/HgService.cs
  • backend/LexCore/ServiceInterfaces/IHgService.cs
  • hgweb/command-runner.sh

rmunn added 6 commits March 2, 2026 15:04
Desired output shape is dictionary with project codes as keys and counts
as values, not just a single total count.
Instead of counting lines matched, we're more likely to want to count
multiple matches on the same line as counting multiple times. So we'll
have grep output one line per match, and use `wc -l` to count them.
Rest of script is using double-bracket syntax for if statements, so this
part should be consistent as well.
We can just retrieve the project codes in an array, since they're not
going to be all that large, and then iterate over them synchronously.
@rmunn
Copy link
Contributor Author

rmunn commented Mar 4, 2026

@myieye - Finished implementing all AI review suggestions, including the one about not tying up a DB connection by just grabbing the project codes and then iterating synchronously over them. Anything else you'd like me to tackle?

Copy link
Collaborator

@myieye myieye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think new names would be quite valuable. I don't think the API surface is self-explanatory at this point.

return result;
}

[HttpGet("queryRegexFiles")]
Copy link
Collaborator

@myieye myieye Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the names used in this API non self-explanatory.
As it is, without extra documentation in the Swagger UI, I don't think anyone could know quite for sure how this API works.

I think this would be a much more understandable API:

[HttpGet("projectMatchCounts")]
[AdminRequired]
public async Task<ActionResult<Dictionary<string, int>>> GetProjectMatchCounts(
    CancellationToken token,
    [FromQuery] ProjectType projectType,
    [FromQuery] string includeFileRegex,
    [FromQuery] string matchCountRegex,
    [FromQuery] string? excludeFileRegex)

With these names I think we're ok without extra API documentation. I seem to remember that it's not dead simple to add descriptions to the Swagger API/UI.

I see you have naming closer to my suggestions in HgServer.GetRegexCount. But these are my favorite names, because:
"projectMatchCounts" communicates that the result type is actually by project
"matchCountRegex" communicates a tad more clearly to me than "contentRegex" that this is the regex that really determines the count result.

I'd be fine renaming "includeFileRegex" to just "fileRegex", though I slightly prefer the more verbose name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do countProjectMatches for the API name.

The regex parameter names you suggested sound fine, I think.

return int.TryParse(str, out int result) ? result : null;
}

public async Task<int?> GetRegexCount(ProjectCode code, string fileRegex, string contentRegex, CancellationToken token = default, string? fileExcludeRegex = null)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The position of the cancellation token is strange. So, how about we put it at the end?

Suggested change
public async Task<int?> GetRegexCount(ProjectCode code, string fileRegex, string contentRegex, CancellationToken token = default, string? fileExcludeRegex = null)
public async Task<int?> GetRegexCount(ProjectCode code, string fileRegex, string contentRegex, string? fileExcludeRegex = null, CancellationToken token = default)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, moving the token to the end makes sense. The reasons I had put it in the middle don't actually make sense, now that I think about it, since we're never actually calling this from other code, just passing in the query params that the API endpoint received.

return int.TryParse(str, out int result) ? result : null;
}

public async Task<int?> GetRegexCount(ProjectCode code, string fileRegex, string contentRegex, CancellationToken token = default, string? fileExcludeRegex = null)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once my naming comment regarding the method and params in the controller is settled, I think those names should probably propagate to here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

Comment on lines 569 to +576
var httpClient = _hgClient.Value;
var baseUri = _options.Value.HgCommandServer;
var response = await httpClient.GetAsync($"{baseUri}{code}/{command}", HttpCompletionOption.ResponseHeadersRead, token);
var uri = $"{baseUri}{code}/{command}";
if (queryParams is not null)
{
uri = QueryHelpers.AddQueryString(uri, queryParams);
}
var response = await httpClient.GetAsync(uri, HttpCompletionOption.ResponseHeadersRead, token);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should be in a shared internal method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Execute and MaybeExecute do share a lot of code.

And while I'm at it, I might fix something that's been bugging me, where the HTTP result returned from command-runner.sh doesn't get passed through to the client but gets turned into a 500 error no matter what. I'll see if there's a simple way to pass it through.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

📦 Lexbox issues related to any server side code, fw-headless included

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FW Project analysis querys

2 participants