Madc2vcf and Pedigree functions updates by Cristianetaniguti · Pull Request #54 · Breeding-Insight/BIGr

Cristianetaniguti · 2026-03-27T13:47:15Z

Updates on madc2vcf functions

Details:

If verbose = TRUE, the functions print informative messages as they run
Both functions, madc2vcf_targets (target markers) and madc2vcf_all (target + off-target markers), now call the check_madc_sanity function. It tests:
- [Columns] Whether the MADC has the expected columns
- [allNArow | allNAcol] Whether any rows or columns are entirely NA (this often happens when the MADC is opened in Excel before being loaded into R)
- [IUPACcodes] Presence of IUPAC codes in AlleleSequence
- [LowerCase] Presence of lowercase bases in AlleleSequence
- [Indels] Presence of indels
- [ChromPos] Whether CloneID follows the format Chr_Pos
- [RefAltSeqs] Whether every Ref allele has a corresponding Alt allele and vice versa
- [OtherAlleles] Whether "Other" appears in the MADC AlleleID
madc2vcf_targets doesn’t run if the MADC column names are not correct
madc2vcf_targets ignores "Other" alleles, but it tells the user whether they exist and points to madc2vcf_all in case they want to extract them as well
See the table for the madc2vcf_targets requirements according to the MADC content:

Check	Check status	get_REF_ALT	Requires
IUPAC	TRUE	TRUE	markers_info REF/ALT
IUPAC	TRUE	FALSE	-
IUPAC	FALSE	TRUE	botloci or markers_info REF/ALT
IUPAC	FALSE	FALSE	-
Indels	TRUE	TRUE	markers_info REF/ALT
Indels	TRUE	FALSE	-
Indels	FALSE	TRUE	botloci or markers_info REF/ALT
Indels	FALSE	FALSE	-
ChromPos	TRUE	TRUE	botloci or markers_info REF/ALT
ChromPos	TRUE	FALSE	-
ChromPos	FALSE	TRUE	markers_info CHR/POS/REF/ALT or markers_info CHR/POS + botloci
ChromPos	FALSE	FALSE	markers_info CHR/POS
FixAlleleIDs	TRUE	TRUE	botloci or markers_info REF/ALT
FixAlleleIDs	TRUE	FALSE	-
FixAlleleIDs	FALSE	TRUE	markers_info REF/ALT
FixAlleleIDs	FALSE	FALSE	-

madc2vcf_targets has a new argument, collapse_matches_counts. If TRUE, it collapses the RefMatch read counts into Ref and the AltMatch read counts into Alt. Default is FALSE.
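To make the collapsing behavior concrete, here is a minimal base R sketch. It is not the BIGr implementation: the helper name `collapse_matches()`, the toy data, and the assumed AlleleID layout `<CloneID>|<Type>_<number>` are all illustrative assumptions.

```r
# Toy MADC-like table. The AlleleID layout "<CloneID>|<Type>_<number>" is an
# assumption made for this illustration only.
madc <- data.frame(
  AlleleID = c("chr1_100|Ref_0001", "chr1_100|Alt_0002",
               "chr1_100|RefMatch_0001", "chr1_100|AltMatch_0001"),
  CloneID  = "chr1_100",
  sample1  = c(10, 5, 2, 1),
  sample2  = c(0, 8, 3, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of BIGr.
collapse_matches <- function(madc, sample_cols) {
  # Allele type: the text after "|" with the trailing "_<digits>" removed
  type <- sub("_[0-9]+$", "", sub("^.*\\|", "", madc$AlleleID))
  # Map RefMatch -> Ref and AltMatch -> Alt
  target <- sub("Match$", "", type)
  keep <- target %in% c("Ref", "Alt")
  # Sum the read counts per marker and collapsed allele type
  aggregate(madc[keep, sample_cols, drop = FALSE],
            by = list(CloneID = madc$CloneID[keep], Allele = target[keep]),
            FUN = sum)
}

collapsed <- collapse_matches(madc, c("sample1", "sample2"))
print(collapsed)
```

In this toy case the Ref row ends up holding the Ref plus RefMatch counts for each sample, and the Alt row the Alt plus AltMatch counts.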

Still in progress:

Users now have the option to generate a multiallelic VCF with the new function madc2vcf_multi
madc2vcf_all and madc2vcf_multi don’t run if:
- MADC column names are not correct
- The MADC is a raw MADC
- The MADC has IUPAC codes
See the table for the madc2vcf_all and madc2vcf_multi requirements according to the MADC content:

Check	Check status	Requires
Indels	TRUE	markers_info REF/ALT/IndelPos/IndelLength + botloci
Indels	FALSE	botloci
ChromPos	TRUE	botloci
ChromPos	FALSE	markers_info CHR/POS + botloci
RefAltSeqs	TRUE	botloci
RefAltSeqs	FALSE	botloci + microhapdb

Test scripts were added, and simulated MADC files were created to test all scenarios in all madc2vcf functions. Test files are located at: https://github.com/Breeding-Insight/BIGapp-PanelHub/tree/long_seq/test_madcs
Updated the version, the package Imports in DESCRIPTION, and the function documentation


Copilot

Pull request overview

This PR enhances BIGr’s MADC→VCF conversion and pedigree utilities by adding structured verbose messaging, stronger MADC sanity validation, and new/refined arguments and documentation to support more robust input handling.

Changes:

Added vmsg() and integrated verbose, stepwise messaging into madc2vcf_targets(), get_countsMADC(), and madc2vcf_all().
Introduced check_madc_sanity() and integrated it into MADC→VCF workflows, plus expanded tests and roxygen docs.
Extended APIs (madc2vcf_targets(), get_countsMADC(), madc2vcf_all(), imputation_concordance(), filterVCF(), check_ped()) with new parameters and updated manuals.
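As a rough illustration of the kind of checks check_madc_sanity() is described as performing, the sketch below implements a few of them in base R. It is not the package's code: the helper name `sketch_madc_checks()`, the set of required columns, and the toy data are assumptions for illustration.

```r
# Hypothetical helper illustrating a few of the listed checks; not BIGr's code.
# The required column set used here is an assumption.
sketch_madc_checks <- function(madc) {
  seqs <- madc$AlleleSequence
  c(
    Columns      = all(c("AlleleID", "CloneID", "AlleleSequence") %in% names(madc)),
    # Rows or columns that are entirely NA
    allNArow     = any(rowSums(is.na(madc)) == ncol(madc)),
    allNAcol     = any(colSums(is.na(madc)) == nrow(madc)),
    # IUPAC ambiguity codes (anything beyond A, C, G, T)
    IUPACcodes   = any(grepl("[RYSWKMBDHVN]", seqs, ignore.case = TRUE)),
    LowerCase    = any(grepl("[a-z]", seqs)),
    OtherAlleles = any(grepl("Other", madc$AlleleID, fixed = TRUE))
  )
}

# Toy input with one IUPAC code, one lowercase sequence, and one "Other" allele
madc <- data.frame(
  AlleleID       = c("chr1_100|Ref_0001", "chr1_100|Alt_0002", "chr1_100|Other_0003"),
  CloneID        = "chr1_100",
  AlleleSequence = c("ACGT", "ACRT", "acGT"),
  sample1        = c(10, 5, NA),
  stringsAsFactors = FALSE
)

checks <- sketch_madc_checks(madc)
print(checks)
```

Returning a named logical vector keeps each check independently inspectable, so calling code can branch on a single flag such as `checks[["IUPACcodes"]]`.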

Reviewed changes

Copilot reviewed 23 out of 24 changed files in this pull request and generated 22 comments.

File	Description
tests/testthat/test-madc2vcf_targets.R	Expands MADC→VCF target tests using external PanelHub fixtures.
tests/testthat/test-check_madc_sanity.R	Adds tests for `check_madc_sanity()` using external fixtures.
man/vmsg.Rd	Documents new verbose message helper.
man/madc2vcf_targets.Rd	Updates targets conversion documentation for new args and behavior.
man/madc2vcf_all.Rd	Adds `markers_info` argument to docs.
man/imputation_concordance.Rd	Documents new plotting/printing options.
man/get_countsMADC.Rd	Updates docs for new args and behavior.
man/get_counts.Rd	Adds internal docs for new helper `get_counts()`.
man/filterVCF.Rd	Documents new `quality.rates` parameter and example edits.
man/check_ped.Rd	Updates pedigree check docs to reflect new behavior/output.
man/check_madc_sanity.Rd	Documents `check_madc_sanity()` checks and return structure.
R/utils.R	Adds `vmsg()` and `url_exists()`.
R/madc2vcf_targets.R	Refactors `madc2vcf_targets()` with sanity checks, markers_info support, new args, and verbose metadata header.
R/madc2vcf_all.R	Adds input validation, sanity checks, and `markers_info` support in validation flow.
R/imputation_concordance.R	Adds `plot` and `print_result` options and ggplot2-based plotting.
R/get_countsMADC.R	Adds `madc_object`, collapsing behavior, verbose messaging; introduces `get_counts()` helper.
R/filterVCF.R	Adds `quality.rates` reporting and adjusts I/O/filter logging.
R/check_ped.R	Refactors pedigree validation and introduces optional interactive/global save behavior.
R/check_madc_sanity.R	Adds new exported sanity checker and an updated `check_botloci()` implementation.
NEWS.md	Adds release notes for 0.6.4 and related changes.
NAMESPACE	Exports new functions and adds new imports.
DESCRIPTION	Updates authorship/copyright entry and roxygen note version.
BIGr.Rproj	Adds `ProjectId`.
.gitignore	Adds `.DS_Store`.

R/madc2vcf_targets.R

Copilot · 2026-03-27T13:57:06Z

tests/testthat/test-madc2vcf_targets.R

+  test_that("ALFALFA — clean fixed allele ID MADC", {
+    out <- tempfile(fileext = ".vcf")
+    expect_no_error(


Multiple test_that() calls are nested inside the outer test_that("simu alfalfa", ...). testthat doesn’t reliably support nested test_that() and this can cause misreporting or skipped tests. Split the inner blocks into separate top-level test_that() calls (and share fixture setup via helper code).
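A minimal sketch of the suggested layout follows, with each test_that() at top level and the fixture built by a shared helper. The helper `make_toy_madc()` and its toy data are hypothetical stand-ins for the PanelHub fixtures, and the sketch assumes a testthat version that provides expect_no_error().

```r
library(testthat)

# Hypothetical fixture helper shared by the tests below; it stands in for
# loading a real MADC fixture file.
make_toy_madc <- function() {
  data.frame(
    AlleleID = c("chr1_100|Ref_0001", "chr1_100|Alt_0002"),
    CloneID  = "chr1_100",
    sample1  = c(10, 5),
    stringsAsFactors = FALSE
  )
}

# Each scenario is its own top-level test_that() call instead of being nested
test_that("toy MADC has the expected columns", {
  madc <- make_toy_madc()
  expect_true(all(c("AlleleID", "CloneID") %in% names(madc)))
})

test_that("toy MADC can be written to a temporary file", {
  madc <- make_toy_madc()
  out <- tempfile(fileext = ".csv")
  expect_no_error(write.csv(madc, out, row.names = FALSE))
  expect_true(file.exists(out))
})
```

Keeping the setup in a plain function means every test builds its own copy of the fixture, so a failure in one block cannot hide or skip the others.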

Copilot · 2026-03-27T13:57:06Z

R/filterVCF.R

+#This is not reliable, so no longer use this shortcut to get dosage matrix
+#test2 <- vcfR2genlight(vcf)
+
+
+#####Testing custom VCF reading function######
+# Open the gzipped VCF file
+#con <- gzfile("/Users/ams866/Desktop/output.vcf", "rt")
+
+# Read in the entire file
+#lines <- readLines(con)
+#close(con)
+# Read in the entire file
+#lines <- readLines("/Users/ams866/Desktop/output.vcf")
+# Filter out lines that start with ##
+#filtered_lines <- lines[!grepl("^##", lines)]
+# Create a temporary file to write the filtered lines
+#temp_file <- tempfile()
+#writeLines(filtered_lines, temp_file)
+# Read in the filtered data using read.table or read.csv
+#vcf_data <- read.table(temp_file, header = TRUE, sep = "\t", comment.char = "", check.names = FALSE)
+# Clean up the temporary file
+#unlink(temp_file)
+
+##Extract INFO column and Filter SNPs by those values
+#Update the filtering options by the items present in the INFO column?
+
+# Load required library
+#library(dplyr)
+
+# Split INFO column into key-value pairs


There is a large block of commented-out experimental code (including local file paths) kept at the end of this exported source file. This makes maintenance harder and adds noise to the package. Consider moving it to a vignette/dev note or removing it once the approach is finalized.

Suggested change

-#This is not reliable, so no longer use this shortcut to get dosage matrix
-#test2 <- vcfR2genlight(vcf)
-#####Testing custom VCF reading function######
-# Open the gzipped VCF file
-#con <- gzfile("/Users/ams866/Desktop/output.vcf", "rt")
-# Read in the entire file
-#lines <- readLines(con)
-#close(con)
-# Read in the entire file
-#lines <- readLines("/Users/ams866/Desktop/output.vcf")
-# Filter out lines that start with ##
-#filtered_lines <- lines[!grepl("^##", lines)]
-# Create a temporary file to write the filtered lines
-#temp_file <- tempfile()
-#writeLines(filtered_lines, temp_file)
-# Read in the filtered data using read.table or read.csv
-#vcf_data <- read.table(temp_file, header = TRUE, sep = "\t", comment.char = "", check.names = FALSE)
-# Clean up the temporary file
-#unlink(temp_file)
-##Extract INFO column and Filter SNPs by those values
-#Update the filtering options by the items present in the INFO column?
-# Load required library
-#library(dplyr)
-# Split INFO column into key-value pairs

Copilot · 2026-03-27T13:57:06Z

R/imputation_concordance.R

+#' @examples
+#' result <- imputation_concordance(
+#'   reference_genos = ref,
+#'   imputed_genos = test,
+#'   snps_2_exclude = snps,


The roxygen examples reference ref, test, and snps, but these objects aren’t defined in the example block. This will fail during R CMD check example execution. Make the example self-contained or wrap it in \dontrun{} / \donttest{}.

Copilot · 2026-03-27T13:57:06Z

man/imputation_concordance.Rd

+result <- imputation_concordance(
+  reference_genos = ref,
+  imputed_genos = test,
+  snps_2_exclude = snps,
+  missing_code = 5,
+  verbose = TRUE,
+  plot = TRUE
+)



The Rd example calls imputation_concordance(reference_genos = ref, imputed_genos = test, ...) but ref, test, and snps are undefined in the example section. This will fail R CMD check. Provide a runnable minimal example or guard with \dontrun{} / \donttest{}.

Suggested change

-result <- imputation_concordance(
-  reference_genos = ref,
-  imputed_genos = test,
-  snps_2_exclude = snps,
-  missing_code = 5,
-  verbose = TRUE,
-  plot = TRUE
-)
+# Minimal example data
+ref <- data.frame(
+  ID   = paste0("sample", 1:3),
+  snp1 = c(0, 1, 2),
+  snp2 = c(2, 2, 0)
+)
+test <- data.frame(
+  ID   = paste0("sample", 1:3),
+  snp1 = c(0, 1, 2),
+  snp2 = c(2, 1, 0)
+)
+# Exclude a SNP from the concordance calculation
+snps <- "snp2"
+result <- imputation_concordance(
+  reference_genos  = ref,
+  imputed_genos    = test,
+  snps_2_exclude   = snps,
+  missing_code     = NULL,
+  verbose          = TRUE,
+  plot             = FALSE
+)

R/check_madc_sanity.R

Copilot · 2026-03-27T13:57:10Z

R/check_madc_sanity.R

+      first <- all(grepl("^[A-Za-z]", sapply(pos, "[", 1)))
+      second <- suppressWarnings(all(sapply(pos, function(x) as.numeric(x[2])) > 0))


ChromPos validation doesn’t match the documented Chr_Pos requirement: it currently only checks the first token starts with any letter and can yield NA when as.numeric() returns NAs (because all() over NAs yields NA). This can force downstream code into the "ChromPos invalid" path unexpectedly. Tighten the prefix check (e.g., ^chr case-insensitive) and coerce non-numeric positions to FALSE (not NA).

Suggested change

-      first <- all(grepl("^[A-Za-z]", sapply(pos, "[", 1)))
-      second <- suppressWarnings(all(sapply(pos, function(x) as.numeric(x[2])) > 0))
+      first_tokens <- sapply(pos, "[", 1)
+      second_tokens <- sapply(pos, "[", 2)
+      first <- all(grepl("^chr", first_tokens, ignore.case = TRUE))
+      suppressWarnings(num_pos <- as.numeric(second_tokens))
+      second <- all(!is.na(num_pos) & num_pos > 0)
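The NA behavior this comment describes can be reproduced in a few lines of base R. The token values are made up for illustration; only the `as.numeric()` and `all()` semantics are being demonstrated.

```r
# Position tokens as they might come out of splitting CloneID on "_";
# "abc" is a deliberately non-numeric value.
tokens  <- c("100", "abc", "250")
num_pos <- suppressWarnings(as.numeric(tokens))  # 100, NA, 250

# Original pattern: the comparison yields TRUE, NA, TRUE, so all() returns NA
naive <- all(num_pos > 0)

# Suggested pattern: non-numeric positions count as failures, giving FALSE
strict <- all(!is.na(num_pos) & num_pos > 0)

print(naive)
print(strict)
```

An NA result matters because `if (naive)` would stop with a "missing value where TRUE/FALSE needed" error, whereas `strict` is always a plain TRUE or FALSE.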

R/get_countsMADC.R

Copilot · 2026-03-27T13:57:10Z

R/check_ped.R

+    #### interactive save ####
+    cat(paste0("\nDo you want to save the corrected pedigree as dataframe `", corrected_name, "`? (y/n): "))
+    ans <- tolower(trimws(readline()))
+    if (ans == "y") {


check_ped() prompts with readline() when verbose = TRUE (the default). This will hang in non-interactive runs (CI, scripts, Shiny). Gate prompting behind interactive() and/or add an explicit argument controlling whether to prompt/save.
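One way to gate the prompt could look like the sketch below. The helper `ask_to_save()` and its `prompt` argument are hypothetical names chosen for illustration, not part of the documented check_ped() signature.

```r
# Hypothetical helper: `prompt` is an illustrative argument name, not part of
# the documented check_ped() interface.
ask_to_save <- function(object_name, prompt = TRUE) {
  # Never block on user input in non-interactive sessions (CI, Rscript, Shiny)
  if (!prompt || !interactive()) {
    return(FALSE)
  }
  ans <- tolower(trimws(readline(
    paste0("Save the corrected pedigree as `", object_name, "`? (y/n): ")
  )))
  identical(ans, "y")
}

# With prompting disabled the helper returns immediately without reading input
saved <- ask_to_save("ped_corrected", prompt = FALSE)
print(saved)
```

Separating the prompt decision from `verbose` lets callers keep the informative messages while still running unattended.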

Copilot · 2026-03-27T13:57:10Z

R/madc2vcf_all.R

                      "verbose= ", verbose,')">')

-  if(!is.null(madc)) report <- read.csv(madc, check.names = FALSE) else stop("Please provide a MADC file")
+  report <- read.csv(madc, check.names = FALSE)
+  checks <- check_madc_sanity(report)


madc still defaults to NULL, but the function now calls read.csv(madc, ...) unconditionally. If madc is omitted, this will fail with a cryptic read.csv error instead of a clear message. Add an explicit is.null(madc) check that stop()s with a helpful error before reading.
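A sketch of the explicit guard, under the assumption that the file is read with read.csv() as in the diff above. The wrapper name `read_madc()` is hypothetical, and the file-existence check is an extra not mentioned in the review comment.

```r
# Hypothetical wrapper; read_madc() is an illustrative name, not a BIGr function.
read_madc <- function(madc = NULL) {
  # Fail early with a clear message when no path is supplied
  if (is.null(madc)) {
    stop("Please provide a MADC file", call. = FALSE)
  }
  # Extra guard (an addition in this sketch): the path must exist
  if (!file.exists(madc)) {
    stop("MADC file not found: ", madc, call. = FALSE)
  }
  read.csv(madc, check.names = FALSE)
}

# Omitting the path now fails with the explicit message
msg <- tryCatch(read_madc(), error = function(e) conditionMessage(e))
print(msg)
```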

Cristianetaniguti and others added 22 commits October 3, 2025 15:11

indels support for madc2vcf_targets

778aefa

updated check_ped to save corrected dataframe and report

1b761b9

reorganized report and fixed language

743043a

bugfix - if hapDB padding is not matching report

0b97b46

added option to print plot or list to imputation_concordance

82279af

ignore DS_STore

6b81982

added option to print pre-filtering depth and genotyping rate

8205e4e

added calculation for Ho

31248e3

up version

757b01c

Merge branch 'check_ped_update' of https://github.com/Breeding-Insigh…

0934210

…t/BIGr into ped_indels_update

merge dev branches

e18b2c6

Merge branch 'development' into ped_indels_update

768ab93

opt messages

5c0b590

messages ok

9afb265

targets okay

c31118d

targets ok

5d54f0d

Potential fix for pull request finding

ee50981

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Potential fix for pull request finding

d3a4061

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Potential fix for pull request finding

f765c7c

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Potential fix for pull request finding

87bb1fc

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Merge branch 'ped_indels_update' into madc2vcf_all_updates

7c12d49

Merge pull request #53 from Breeding-Insight/madc2vcf_all_updates

df6fe92

Madc2vcf updates

Cristianetaniguti requested a review from Copilot March 27, 2026 13:47

Copilot started reviewing on behalf of Cristianetaniguti March 27, 2026 13:47

Copilot AI reviewed Mar 27, 2026

Cristianetaniguti and others added 5 commits March 27, 2026 10:33

Update R/madc2vcf_targets.R

6059c10

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update R/check_madc_sanity.R

b09b0c1

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update R/get_countsMADC.R

409dbd3

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update R/get_countsMADC.R

e6fce19

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update R/check_madc_sanity.R

669ac4e

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Cristianetaniguti added 3 commits March 27, 2026 10:50

fix tests

bbfbee2

Merge branch 'ped_indels_update' of https://github.com/Breeding-Insig…

38c3564

…ht/BIGr into ped_indels_update

madc2vcf_all indels support okay

55ee61a

Madc2vcf and Pedigree functions updates #54
Cristianetaniguti wants to merge 30 commits into development from ped_indels_update

3 participants
