Skip to content

Remove quick-fingerprinting analysis (moved to bioinfo research)#259

Open
doron-st wants to merge 12 commits intomasterfrom
doronst/fingerprint_wes
Open

Remove quick-fingerprinting analysis (moved to bioinfo research)#259
doron-st wants to merge 12 commits intomasterfrom
doronst/fingerprint_wes

Conversation

@doron-st
Copy link
Copy Markdown
Collaborator

@doron-st doron-st commented Mar 22, 2026

Remove quick-figerprinting analyssis from repo


Note

Medium Risk
Moderate risk because it deletes an end-to-end fingerprinting CLI/test surface and changes how bcftools mpileup region restriction is constructed (including new -R BED handling), which could alter variant-calling scope in downstream workflows.

Overview
Removes the quick-fingerprinting feature and its system test fixtures: deletes the quick_fingerprinting pipeline entrypoint, the QuickFingerprinter implementation, the example JSON config, and the associated system test/resources.

Updates hit-fraction variant calling and report dependencies: extends VariantHitFractionCaller.call_variants to accept an optional regions_bed (and treats .bed/.bed.gz passed via region as -R for backward compatibility), refactors major-alt extraction to use column apply, and switches run_no_gt_report.py to import ugbio_comparison.vcf_comparison_utils instead of comparison_utils.

Written by Cursor Bugbot for commit 4a25f59. This will update automatically on new commits. Configure here.

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Regression: call_variants now misuses previously-ignored region parameter
    • call_variants now treats BED-like region values as -R targets, preserving compatibility with callers that pass BED paths in the region argument.
  • ✅ Fixed: Ground truth filtered by BED only, not intersection
    • When both region and regions_bed are set, ground-truth filtering now uses a precomputed BED intersection so denominator and called variants are restricted to the same locus set.

Create PR

Or push these changes by commenting:

@cursor push 5882cc3db9
Preview (5882cc3db9)
diff --git a/ugvc/comparison/quick_fingerprinter.py b/ugvc/comparison/quick_fingerprinter.py
--- a/ugvc/comparison/quick_fingerprinter.py
+++ b/ugvc/comparison/quick_fingerprinter.py
@@ -46,8 +46,16 @@
 
     def prepare_ground_truth(self):
         ground_truths_to_check = {}
-        if self.regions_bed is None:
-            self.sp.print_and_run(f"echo {self.region} | sed 's/:/\t/' | sed 's/-/\t/' > {self.out_dir}/region.bed")
+        region_bed = f"{self.out_dir}/region.bed"
+        regions_bed_in_region = None
+        if self.region != "":
+            self.sp.print_and_run(f"echo {self.region} | sed 's/:/\t/' | sed 's/-/\t/' > {region_bed}")
+        if self.regions_bed is not None and self.region != "":
+            regions_bed_in_region = f"{self.out_dir}/regions_bed_in_region.bed"
+            self.sp.print_and_run(
+                f"bedtools intersect -a {self.regions_bed} -b {region_bed} | "
+                f"sort -k 1,1 -k 2,2n > {regions_bed_in_region}"
+            )
 
         for sample_id in self.ground_truth_vcfs:
             ground_truth_vcf = optional_cloud_sync(self.ground_truth_vcfs[sample_id], self.out_dir)
@@ -62,8 +70,9 @@
             )
             self.vpu.index_vcf(ground_truth_in_hcr)
             if self.regions_bed is not None:
+                regions_to_use = regions_bed_in_region if regions_bed_in_region is not None else self.regions_bed
                 self.sp.print_and_run(
-                    f"bcftools view {ground_truth_in_hcr} -R {self.regions_bed} -Oz -o {ground_truth_to_check_vcf}"
+                    f"bcftools view {ground_truth_in_hcr} -R {regions_to_use} -Oz -o {ground_truth_to_check_vcf}"
                 )
             else:
                 self.sp.print_and_run(
@@ -78,7 +87,7 @@
                 )
             elif self.region != "":
                 self.sp.print_and_run(
-                    f"bedtools intersect -a {hcr} -b {self.out_dir}/region.bed | "
+                    f"bedtools intersect -a {hcr} -b {region_bed} | "
                     f"sort -k 1,1 -k 2,2n > {hcr_in_region}"
                 )
             else:

diff --git a/ugvc/comparison/variant_hit_fraction_caller.py b/ugvc/comparison/variant_hit_fraction_caller.py
--- a/ugvc/comparison/variant_hit_fraction_caller.py
+++ b/ugvc/comparison/variant_hit_fraction_caller.py
@@ -21,7 +21,13 @@
         self.region = region
 
     def call_variants(self, cram: str, output_vcf: str, region: str, min_af: float, regions_bed: str | None = None) -> None:
-        regions_arg = f"-R {regions_bed}" if regions_bed is not None else (f"-r {region}" if region else "")
+        if regions_bed is not None:
+            regions_arg = f"-R {regions_bed}"
+        elif region and (region.endswith(".bed") or region.endswith(".bed.gz")):
+            # Backward compatibility: some callers pass a BED path in `region`.
+            regions_arg = f"-R {region}"
+        else:
+            regions_arg = f"-r {region}" if region else ""
         self.sp.print_and_run(
             f"bcftools mpileup {cram} -f {self.ref} -a ad,format/dp --skip-indels -d 500 {regions_arg} "
             + f"| bcftools view -i 'AD[0:1] / format/DP >= {min_af}' -Oz -o {output_vcf}"

This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.

@doron-st
Copy link
Copy Markdown
Collaborator Author

@cursor push 5882cc3

@doron-st doron-st requested a review from gilhornung March 23, 2026 11:31
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

if self.regions_bed is not None:
self.sp.print_and_run(
f"bedtools intersect -a {hcr} -b {self.out_dir}/region.bed | "
f"bedtools intersect -a {hcr} -b {self.regions_bed} | "
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HCR intersected with full bed, not region-restricted bed

Low Severity

When both regions_bed and region_str are provided, the ground truth VCF is correctly filtered using regions_bed_in_region (the intersection of regions_bed with the region), but the HCR bed is intersected with the full self.regions_bed instead of self.regions_bed_in_region. This inconsistency means hcr_in_region may contain regions outside the specified region_str, producing an overly broad output file.

Fix in Cursor Fix in Web

@doron-st doron-st changed the title allow fingerprinting WES data by providing exome intervals bed Remove quick-fingerprinting analysis (moved to bioinfo research) Mar 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants