Add API for regex queries against all files in a project (or multiple projects)#2183
Add API for regex queries against all files in a project (or multiple projects)#2183
Conversation
If this is correct, we'll use this approach to implement the regexcount command (which will pass one regex to Mercurial for filenames, and a second regex to grep for finding patterns in those files).
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis change introduces a new API endpoint for administrators to execute regex queries across multiple projects of a specified type, aggregating match counts from the Hg service. It extends the backend service layer with query parameter support for Hg command server calls and adds handling for the new "regexcount" command in the shell script runner. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Now special characters like +, *, ?, and so on will make it through unscathed to the command-runner script, and from there to hg.
One endpoint, queryRegexFilesOneProject, is just for ease of testing.
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/LexBoxApi/Controllers/ProjectController.cs`:
- Around line 134-143: In QueryRegexFiles (ProjectController.QueryRegexFiles)
validate the incoming query parameters 'file' and 'regex' before iterating
projects or calling hgService.GetRegexCount: check
string.IsNullOrWhiteSpace(file) and string.IsNullOrWhiteSpace(regex) and
immediately return a BadRequest with a clear message when either is
missing/empty, so you never call hgService.GetRegexCount with invalid inputs
that could cause downstream EnsureSuccessStatusCode to produce a 500.
- Around line 134-146: QueryRegexFiles currently aggregates counts into a single
int but should return a per-project map; change the return shape to a dictionary
(e.g., ActionResult<Dictionary<string,int>> or Dictionary<int,int> depending on
desired key) and build a result map while iterating
lexBoxDbContext.Projects.AsAsyncEnumerable(): for each project call
hgService.GetRegexCount(project.Code, file, regex, fileExclude), coerce null to
0, and add an entry keyed by project.Code (or project.Id) with that count, then
return Ok(resultMap) instead of the aggregated count.
In `@backend/LexCore/ServiceInterfaces/IHgService.cs`:
- Line 26: Update the IHgService signature and controller to accept and
propagate a CancellationToken: change the interface method Task<int?>
GetRegexCount(ProjectCode code, string fileRegex, string contentRegex, string?
fileExcludeRegex = null) to include a final CancellationToken parameter (e.g.
CancellationToken cancellationToken), update the controller action
QueryRegexFiles to accept CancellationToken cancellationToken in its signature
or parameters, and pass that token into the hgService.GetRegexCount(...) call so
the long-running scan can be cancelled; ensure method implementations are
updated to accept and forward the token as well.
In `@hgweb/command-runner.sh`:
- Line 85: The test in the if-statement checking variables fileInclude and
contentRegex uses the non-POSIX operator -o; update the condition in the if
inside command-runner.sh (the if that reads "if [ -z \"$fileInclude\" -o -z
\"$contentRegex\" ]; then") to use the portable shell-level || operator between
the two tests (i.e., replace -o with ||) so the expression becomes
POSIX-compliant and passes shellcheck; ensure quoting of "$fileInclude" and
"$contentRegex" is preserved.
- Around line 131-133: The pipeline currently uses "chg cat -r tip ... | grep -c
-P \"$contentRegex\"" which counts matching lines not total matches; change it
to emit each match and count them by replacing the grep -c step with a command
that uses grep -oP "$contentRegex" piped to wc -l so multiple matches on a
single line are counted; apply this change in both branches where "chg cat -r
tip ... --include=\"re:$fileInclude\"" and the branch with
"--exclude=\"re:$fileExclude\"" are used, keeping the rest of the pipeline
intact.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
backend/LexBoxApi/Controllers/ProjectController.csbackend/LexBoxApi/Services/HgService.csbackend/LexCore/ServiceInterfaces/IHgService.cshgweb/command-runner.sh
Desired output shape is dictionary with project codes as keys and counts as values, not just a single total count.
Instead of counting lines matched, we're more likely to want to count multiple matches on the same line as counting multiple times. So we'll have grep output one line per match, and use `wc -l` to count them.
Rest of script is using double-bracket syntax for if statements, so this part should be consistent as well.
We can just retrieve the project codes in an array, since they're not going to be all that large, and then iterate over them synchronously.
|
@myieye - Finished implementing all AI review suggestions, including the one about not tying up a DB connection by just grabbing the project codes and then iterating synchronously over them. Anything else you'd like me to tackle? |
myieye
left a comment
There was a problem hiding this comment.
I think new names would be quite valuable. I don't think the API surface is self-explanatory at this point.
| return result; | ||
| } | ||
|
|
||
| [HttpGet("queryRegexFiles")] |
There was a problem hiding this comment.
I find the names used in this API non self-explanatory.
As it is, without extra documentation in the Swagger UI, I don't think anyone could know quite for sure how this API works.
I think this would be a much more understandable API:
[HttpGet("projectMatchCounts")]
[AdminRequired]
public async Task<ActionResult<Dictionary<string, int>>> GetProjectMatchCounts(
CancellationToken token,
[FromQuery] ProjectType projectType,
[FromQuery] string includeFileRegex,
[FromQuery] string matchCountRegex,
[FromQuery] string? excludeFileRegex)With these names I think we're ok without extra API documentation. I seem to remember that it's not dead simple to add descriptions to the Swagger API/UI.
I see you have naming closer to my suggestions in HgServer.GetRegexCount. But these are my favorite names, because:
"projectMatchCounts" communicates that the result type is actually by project
"matchCountRegex" communicates a tad more clearly to me than "contentRegex" that this is the regex that really determines the count result.
I'd be fine renaming "includeFileRegex" to just "fileRegex", though I slightly prefer the more verbose name.
There was a problem hiding this comment.
Let's do countProjectMatches for the API name.
The regex parameter names you suggested sound fine, I think.
| return int.TryParse(str, out int result) ? result : null; | ||
| } | ||
|
|
||
| public async Task<int?> GetRegexCount(ProjectCode code, string fileRegex, string contentRegex, CancellationToken token = default, string? fileExcludeRegex = null) |
There was a problem hiding this comment.
The position of the cancellation token is strange. So, how about we put it at the end?
| public async Task<int?> GetRegexCount(ProjectCode code, string fileRegex, string contentRegex, CancellationToken token = default, string? fileExcludeRegex = null) | |
| public async Task<int?> GetRegexCount(ProjectCode code, string fileRegex, string contentRegex, string? fileExcludeRegex = null, CancellationToken token = default) |
There was a problem hiding this comment.
Sure, moving the token to the end makes sense. The reasons I had put it in the middle don't actually make sense, now that I think about it, since we're never actually calling this from other code, just passing in the query params that the API endpoint received.
| return int.TryParse(str, out int result) ? result : null; | ||
| } | ||
|
|
||
| public async Task<int?> GetRegexCount(ProjectCode code, string fileRegex, string contentRegex, CancellationToken token = default, string? fileExcludeRegex = null) |
There was a problem hiding this comment.
Once my naming comment regarding the method and params in the controller is settled, I think those names should probably propagate to here.
| var httpClient = _hgClient.Value; | ||
| var baseUri = _options.Value.HgCommandServer; | ||
| var response = await httpClient.GetAsync($"{baseUri}{code}/{command}", HttpCompletionOption.ResponseHeadersRead, token); | ||
| var uri = $"{baseUri}{code}/{command}"; | ||
| if (queryParams is not null) | ||
| { | ||
| uri = QueryHelpers.AddQueryString(uri, queryParams); | ||
| } | ||
| var response = await httpClient.GetAsync(uri, HttpCompletionOption.ResponseHeadersRead, token); |
There was a problem hiding this comment.
Maybe this should be in a shared internal method?
There was a problem hiding this comment.
Yes, Execute and MaybeExecute do share a lot of code.
And while I'm at it, I might fix something that's been bugging me, where the HTTP result returned from command-runner.sh doesn't get passed through to the client but gets turned into a 500 error no matter what. I'll see if there's a simple way to pass it through.
Fixes #1973.
New API endpoint
/api/project/queryRegexFilestakes aprojectTypeparameter, and two regexes,fileandcount. Optional third regexfileExcludewhich can make thefileregex simpler to write in some circumstances.To test, try URLs like http://localhost/api/project/queryRegexFiles?projectType=FLEx&file=^C[abc]chedSettings&count=version&fileExclude=FLExProject.CustomProperties, which should return some 130 results. Or http://localhost/api/project/queryRegexFiles?projectType=FLEx&file=FLExProject.*&count=version&fileExclude=FLExProject.CustomProperties which should return 2 results on a default dev install with just two projects (remove the
fileExcludeparam and you should see 4 results instead).