Merged_updates by Cristianetaniguti · Pull Request #50 · Breeding-Insight/BIGr

Cristianetaniguti · 2026-03-13T14:24:42Z

The indels_support and check_ped_updated branches are merged here and submitted to development.
The GitHub Actions tests need to be fixed before this can be merged into development.
@josuechinchilla, your updates are now here.

Copilot

Pull request overview

This PR merges updates from indels_support and check_ped_updated into development, expanding BIGr’s DArTag/MADC handling (sanity checks + additional metadata support), improving documentation, and enhancing some utilities (VCF filtering, concordance reporting).

Changes:

Add check_madc_sanity() plus a basic test and documentation; integrate sanity checks into madc2vcf_targets().
Extend madc2vcf_targets() with markers_info to populate CHROM/POS/REF/ALT directly (incl. indel metadata).
Update check_ped(), filterVCF(), and imputation_concordance() behaviors/docs (new options, new checks, additional outputs).

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 14 comments.

| File | Description |
| --- | --- |
| tests/testthat/test-check_madc_sanity.R | Adds a test file for `check_madc_sanity()`. |
| man/madc2vcf_targets.Rd | Expands `madc2vcf_targets()` documentation for new args and behavior. |
| man/check_ped.Rd | Updates `check_ped()` documentation to reflect new behavior/output. |
| man/check_madc_sanity.Rd | Adds generated Rd for new `check_madc_sanity()` function. |
| R/utils.R | Adjusts `check_botloci()` to rewrite `AlleleID` after padding changes. |
| R/madc2vcf_targets.R | Adds MADC sanity checks + `markers_info` branch for CHROM/POS/REF/ALT. |
| R/madc2vcf_all.R | Adds input validation + passes botloci into hap-seq helper; padding checks. |
| R/imputation_concordance.R | Adds plotting/printing options and ggplot2-based plot. |
| R/filterVCF.R | Adds optional pre-filter quality-rate outputs; refactors messages and export logic. |
| R/check_ped.R | Refactors pedigree checks and adds interactive/global-env save behavior. |
| R/check_madc_sanity.R | Implements new MADC sanity-check helper with messages and checks. |
| NEWS.md | Adds 0.6.3 release notes. |
| NAMESPACE | Exports `check_madc_sanity`. |
| DESCRIPTION | Bumps version/roxygen note; updates cph affiliation. |
| BIGr.Rproj | Adds ProjectId metadata. |
| .gitignore | Adds `.DS_Store`. |

Comments suppressed due to low confidence (1)

R/filterVCF.R:488

There is a large block of commented-out experimental code after the end of filterVCF() (custom VCF reading tests, hard-coded local paths, etc.). Keeping this in the package source makes maintenance harder and bloats the file. Please remove it or move it to a vignette/dev script under dev/ if it needs to be kept for reference.

#This is not reliable, so no longer use this shortcut to get dosage matrix
#test2 <- vcfR2genlight(vcf)


#####Testing custom VCF reading function######
# Open the gzipped VCF file
#con <- gzfile("/Users/ams866/Desktop/output.vcf", "rt")

# Read in the entire file
#lines <- readLines(con)
#close(con)
# Read in the entire file
#lines <- readLines("/Users/ams866/Desktop/output.vcf")
# Filter out lines that start with ##
#filtered_lines <- lines[!grepl("^##", lines)]
# Create a temporary file to write the filtered lines
#temp_file <- tempfile()
#writeLines(filtered_lines, temp_file)
# Read in the filtered data using read.table or read.csv
#vcf_data <- read.table(temp_file, header = TRUE, sep = "\t", comment.char = "", check.names = FALSE)
# Clean up the temporary file
#unlink(temp_file)

##Extract INFO column and Filter SNPs by those values
#Update the filtering options by the items present in the INFO column?

# Load required library
#library(dplyr)

# Split INFO column into key-value pairs
#vcf_data_parsed <- vcf_data %>%
#  mutate(INFO_PARSED = strsplit(INFO, ";")) %>%
#  unnest(INFO_PARSED) %>%
#  separate(INFO_PARSED, into = c("KEY", "VALUE"), sep = "=") %>%
#  spread(KEY, VALUE)

#Filter by DP
#filtered_vcf_data <- vcf_data_parsed %>%
#  filter(as.numeric(DP) > 10)

# View the filtered dataframe
#print(filtered_vcf_data)

##Extracting and filtering by FORMAT column
# Identify the columns that are not sample columns
#non_sample_cols <- c("#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT")
# Identify the sample columns
#sample_cols <- setdiff(names(vcf_data), non_sample_cols)
# Extract FORMAT keys
#format_keys <- strsplit(as.character(vcf_data$FORMAT[1]), ":")[[1]]
# Split SAMPLE columns based on FORMAT
#vcf_data_samples <- vcf_data %>%
#  mutate(across(all_of(sample_cols), ~strsplit(as.character(.), ":"))) %>%
#  mutate(across(all_of(sample_cols), ~map(., ~setNames(as.list(.), format_keys)))) %>%
#  unnest_wider(all_of(sample_cols), names_sep = "_")

# View the parsed dataframe
#print(head(vcf_data_samples))

# Create separate dataframes for each FORMAT variable
#format_dfs <- lapply(format_keys, function(format_key) {
#  vcf_data_samples %>%
#    select(ID, ends_with(paste0("_", format_key))) %>%
#    column_to_rownames("ID")
#})

# Assign names to the list elements
#names(format_dfs) <- format_keys

# Access the separate dataframes
#gt_df <- format_dfs$GT  # Genotype dataframe
#ad_df <- format_dfs$AD  # Allelic depths dataframe

#*I think the above method is okay if you only need to filter at the INFO level,
#*But I think if you want to filter for FORMAT, that vcfR is probably best,
#*Will need to explore further if I can easily just filter for MPP by checking if it is above a
#*threshold, and then converting the GT and UD values to NA if so...
#*If that is efficient and works, then I will just use this custom VCF method...

man/check_madc_sanity.Rd

+\value{
+A list with:
+\describe{
+\item{checks}{Named logical vector with entries
+\code{Columns}, \code{FixAlleleIDs}, \code{IUPACcodes}, \code{LowerCase}, \code{Indels}.}
+\item{indel_clone_ids}{Character vector of \code{CloneID}s where ref/alt lengths differ.
+Returns \code{character(0)} if none, or \code{NULL} when required columns are missing.}
+}
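
The return shape documented above can be illustrated with a mock object. This is only a sketch: the `madc` data frame, its `Ref`/`Alt` column names, and all values are made up for illustration, and the length comparison is an assumption about how "ref/alt lengths differ" might be evaluated, not the package's implementation.

```r
# Hypothetical input; the Ref/Alt column names are assumptions for this sketch
madc <- data.frame(
  CloneID = c("chr1_000100", "chr1_000200", "chr2_000300"),
  Ref = c("A", "AT", "G"),
  Alt = c("G", "A", "GTT"),
  stringsAsFactors = FALSE
)

# Flag markers whose reference and alternative alleles differ in length
is_indel <- nchar(madc$Ref) != nchar(madc$Alt)

# Mock result with the structure described in the Rd \value section
res <- list(
  checks = c(Columns = TRUE, FixAlleleIDs = FALSE, IUPACcodes = FALSE,
             LowerCase = FALSE, Indels = any(is_indel)),
  indel_clone_ids = madc$CloneID[is_indel]
)

# A caller can branch on a single named check
if (isTRUE(res$checks[["Indels"]])) {
  message("Indels found in: ", paste(res$indel_clone_ids, collapse = ", "))
}
```

Accessing the check with `[[` and wrapping it in `isTRUE()` keeps the condition a single logical value even if an entry is `NA`.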


R/check_madc_sanity.R

R/madc2vcf_targets.R

+    if(!all(rownames(ad_df)%in% df$BI_markerID))
+      warning("Not all MADC CloneID was found in the markers_info file. These markers will be removed.")
+
+    matched <- df[match(rownames(ad_df), df$BI_markerID),]
+
+    new_df <- data.frame(
+      CHROM = matched$Chr,
+      POS = matched$Pos
+    )
+
+    #Get read count sums
+    new_df$TotalRef <- rowSums(ref_df)
+    new_df$TotalAlt <- rowSums(alt_df)
+    new_df$TotalSize <- rowSums(size_df)
+
+    ref_base <- matched$Ref
+    alt_base <- matched$Alt
+  }
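
One point worth checking in the excerpt above: `match()` returns `NA` for row names that are absent from `BI_markerID`, so `df[match(...), ]` yields all-`NA` rows for those markers. Whether they are dropped later is not visible in this excerpt. A minimal sketch of dropping them explicitly, using made-up data and only the column names that appear in the diff, could look like this:

```r
# Hypothetical allele-depth matrix; row names stand in for MADC CloneIDs
ad_df <- matrix(1:6, nrow = 3,
                dimnames = list(c("m1", "m2", "m3"), c("s1", "s2")))

# Hypothetical markers_info table using the column names from the diff
df <- data.frame(
  BI_markerID = c("m1", "m3"),
  Chr = c("chr1", "chr2"),
  Pos = c(100L, 250L),
  Ref = c("A", "AT"),
  Alt = c("G", "A"),
  stringsAsFactors = FALSE
)

idx <- match(rownames(ad_df), df$BI_markerID)  # NA where a marker is missing
missing_ids <- rownames(ad_df)[is.na(idx)]

if (length(missing_ids) > 0) {
  warning(length(missing_ids),
          " MADC CloneID(s) not found in markers_info; dropping: ",
          paste(missing_ids, collapse = ", "))
}

# Keep only markers present in both inputs so the rows stay aligned
keep <- !is.na(idx)
matched <- df[idx[keep], , drop = FALSE]
ad_kept <- ad_df[keep, , drop = FALSE]
```

The same `keep` vector would need to be applied to any other per-marker matrices (such as the reference and alternative count tables) before computing row sums.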


R/check_ped.R

+    # Silent automatic mode
+    assign(corrected_name, data, envir = .GlobalEnv)
+    assign(report_name, input_ped_report, envir = .GlobalEnv)
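
Writing into `.GlobalEnv` from a package function is a side effect that callers may not expect. As a sketch of one alternative, assuming a hypothetical helper `check_ped_sketch()` (not the package's `check_ped()`) and a stand-in correction step, the results can be returned as a list with the global assignment kept as an explicit opt-in:

```r
# Hypothetical helper for illustration; not the package's check_ped()
check_ped_sketch <- function(ped, save_to_global = FALSE,
                             corrected_name = "ped_corrected",
                             report_name = "ped_report") {
  # Stand-in "correction": drop exact duplicate rows
  dup <- duplicated(ped)
  corrected <- ped[!dup, , drop = FALSE]
  report <- list(n_duplicates = sum(dup))

  if (save_to_global) {
    # Side effect happens only when the caller asks for it
    assign(corrected_name, corrected, envir = globalenv())
    assign(report_name, report, envir = globalenv())
  }

  invisible(list(corrected = corrected, report = report))
}

# Made-up pedigree with one duplicated row
ped <- data.frame(id = c("A", "B", "B"),
                  sire = c(NA, "A", "A"),
                  dam = c(NA, NA, NA))
out <- check_ped_sketch(ped)
```

Here `out$corrected` and `out$report` carry the same information the global objects would, without touching the user's workspace.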


R/utils.R

        report$CloneID <- paste0(sub("_(.*)", "", report$CloneID), "_",
                                 sprintf(paste0("%0", pad_botloci, "d"), as.integer(sub(".*_", "", report$CloneID)))
        )
+        report$AlleleID <- paste0(report$CloneID, "|", sapply(strsplit(report$AlleleID, "[|]"), "[[",2))
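
To make the effect of this change concrete, here is a small worked example on made-up IDs. The `pad_botloci` width of 6 and the `CloneID|suffix` layout of `AlleleID` are assumptions for illustration; `vapply()` is used in place of `sapply()` only to make the return type explicit.

```r
# Hypothetical report with unpadded positions
report <- data.frame(
  CloneID = c("chr1_123", "chr2_4567"),
  AlleleID = c("chr1_123|Ref", "chr2_4567|Alt"),
  stringsAsFactors = FALSE
)
pad_botloci <- 6  # assumed padding width

# Split CloneID at the underscore and zero-pad the numeric part
prefix <- sub("_(.*)", "", report$CloneID)
position <- as.integer(sub(".*_", "", report$CloneID))
report$CloneID <- paste0(prefix, "_",
                         sprintf(paste0("%0", pad_botloci, "d"), position))

# Rebuild AlleleID so its prefix matches the padded CloneID
allele_suffix <- vapply(strsplit(report$AlleleID, "[|]"), `[[`, character(1), 2)
report$AlleleID <- paste0(report$CloneID, "|", allele_suffix)

print(report)
```

Without the added line, `AlleleID` would keep the old unpadded prefix (`chr1_123|Ref`) while `CloneID` became `chr1_000123`, so the two columns would no longer agree.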


R/filterVCF.R

R/imputation_concordance.R

+    plot_df <- data.frame(
+      ID = imputed_genos$ID,
+      Concordance = percentage_match * 100
+    )
+
+    concordance_plot <- ggplot(plot_df,
+                               aes(x = reorder(ID, Concordance),
+                                   y = Concordance)) +
+      geom_bar(stat = "identity") +
+      labs(title = "Imputation Concordance by Sample",
+           x = "Sample ID",
+           y = "Concordance (%)") +
+      theme_minimal() +
+      theme(axis.text.x = element_text(angle = 90, hjust = 1))
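
A self-contained sketch of the same plot on made-up data is shown below. The sample IDs and concordance values are invented, `geom_col()` is used as the equivalent of `geom_bar(stat = "identity")`, and the plotting step is guarded so the snippet still runs when ggplot2 is not installed.

```r
# Made-up sample IDs and concordance fractions
imputed_ids <- c("S1", "S2", "S3")
percentage_match <- c(0.98, 0.91, 0.95)

plot_df <- data.frame(ID = imputed_ids,
                      Concordance = percentage_match * 100)

# The x-axis order that reorder() produces: ascending concordance
ordered_ids <- levels(reorder(plot_df$ID, plot_df$Concordance))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  concordance_plot <- ggplot2::ggplot(
    plot_df,
    ggplot2::aes(x = reorder(ID, Concordance), y = Concordance)
  ) +
    ggplot2::geom_col() +
    ggplot2::labs(title = "Imputation Concordance by Sample",
                  x = "Sample ID",
                  y = "Concordance (%)") +
    ggplot2::theme_minimal() +
    ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90, hjust = 1))
}
```

Namespacing the calls with `ggplot2::` avoids relying on the package being attached, which matters inside package code where ggplot2 is imported rather than loaded with `library()`.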


NEWS.md

R/filterVCF.R

-    if (!is.null(output.file)) {
-      output_name <- paste0(output.file, ".vcf.gz")
+  cat("Exporting VCF\n")
+  if (!class(vcf.file) == "vcfR"){
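
A note on the class test in the added line: comparing `class(x)` with `==` returns one logical value per class entry, and `if()` raises an error in R 4.2.0 and later when its condition has length greater than one. The check works when the object has a single class, but `inherits()` always returns a single value. The sketch below uses a made-up S3 class `vcfR_like` and a hypothetical file path as stand-ins, since the real vcfR object is not constructed here.

```r
# Hypothetical stand-in object carrying two class entries
vcf_obj <- structure(list(), class = c("vcfR_like", "list"))
vcf_path <- "example.vcf.gz"  # hypothetical path, never opened

# inherits() returns a single TRUE/FALSE regardless of how many classes x has
is_vcf_object <- function(x) inherits(x, "vcfR_like")

# The == comparison yields one value per class entry
n_values <- length(class(vcf_obj) == "vcfR_like")

# Decide per input whether it would still need to be read from disk
needs_reading <- c(path = !is_vcf_object(vcf_path),
                   object = !is_vcf_object(vcf_obj))
print(needs_reading)
```

In the function itself the equivalent condition would be `if (!inherits(vcf.file, "vcfR"))`.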


R/check_ped.R

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Cristianetaniguti and others added 11 commits October 3, 2025 15:11

indels support for madc2vcf_targets

778aefa

updated check_ped to save corrected dataframe and report

1b761b9

reorganized report and fixed language

743043a

bugfix - if hapDB padding is not matching report

0b97b46

added option to print plot or list to imputation_concordance

82279af

ignore DS_STore

6b81982

added option to print pre-filtering depth and genotyping rate

8205e4e

added calculation for Ho

31248e3

up version

757b01c

Merge branch 'check_ped_update' of https://github.com/Breeding-Insigh…

0934210

…t/BIGr into ped_indels_update

merge dev branches

e18b2c6

Cristianetaniguti requested review from alex-sandercock and Copilot March 13, 2026 14:24

Cristianetaniguti added the bug (Something isn't working), enhancement (New feature or request), and in_progress (not ready to merge) labels Mar 13, 2026

Copilot started reviewing on behalf of Cristianetaniguti March 13, 2026 14:25

Merge branch 'development' into ped_indels_update

768ab93

Copilot AI reviewed Mar 13, 2026

alex-sandercock and others added 4 commits March 27, 2026 09:02

Potential fix for pull request finding

ee50981

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Potential fix for pull request finding

d3a4061

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Potential fix for pull request finding

f765c7c

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Potential fix for pull request finding

87bb1fc

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Cristianetaniguti closed this Mar 27, 2026

Cristianetaniguti wants to merge 16 commits into development from ped_indels_update

3 participants
