Merged_updates#50

Closed
Cristianetaniguti wants to merge 16 commits into development from ped_indels_update

@Cristianetaniguti

  • Branches indels_support and check_ped_updated have been merged and submitted to development.
  • The GitHub Actions tests must be fixed before this is merged into development.
  • @josuechinchilla your updates are now here.

@Cristianetaniguti added the labels bug (Something isn't working), enhancement (New feature or request), and in_progress (not ready to merge) on Mar 13, 2026

Copilot AI left a comment

Pull request overview

This PR merges updates from indels_support and check_ped_updated into development, expanding BIGr’s DArTag/MADC handling (sanity checks + additional metadata support), improving documentation, and enhancing some utilities (VCF filtering, concordance reporting).

Changes:

  • Add check_madc_sanity() plus a basic test and documentation; integrate sanity checks into madc2vcf_targets().
  • Extend madc2vcf_targets() with markers_info to populate CHROM/POS/REF/ALT directly (incl. indel metadata).
  • Update check_ped(), filterVCF(), and imputation_concordance() behaviors/docs (new options, new checks, additional outputs).

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 14 comments.

Summary per file:

| File | Description |
| --- | --- |
| tests/testthat/test-check_madc_sanity.R | Adds a test file for check_madc_sanity(). |
| man/madc2vcf_targets.Rd | Expands madc2vcf_targets() documentation for new args and behavior. |
| man/check_ped.Rd | Updates check_ped() documentation to reflect new behavior/output. |
| man/check_madc_sanity.Rd | Adds generated Rd for new check_madc_sanity() function. |
| R/utils.R | Adjusts check_botloci() to rewrite AlleleID after padding changes. |
| R/madc2vcf_targets.R | Adds MADC sanity checks + markers_info branch for CHROM/POS/REF/ALT. |
| R/madc2vcf_all.R | Adds input validation + passes botloci into hap-seq helper; padding checks. |
| R/imputation_concordance.R | Adds plotting/printing options and ggplot2-based plot. |
| R/filterVCF.R | Adds optional pre-filter quality-rate outputs; refactors messages and export logic. |
| R/check_ped.R | Refactors pedigree checks and adds interactive/global-env save behavior. |
| R/check_madc_sanity.R | Implements new MADC sanity-check helper with messages and checks. |
| NEWS.md | Adds 0.6.3 release notes. |
| NAMESPACE | Exports check_madc_sanity. |
| DESCRIPTION | Bumps version/roxygen note; updates cph affiliation. |
| BIGr.Rproj | Adds ProjectId metadata. |
| .gitignore | Adds .DS_Store. |

Comments suppressed due to low confidence (1)

R/filterVCF.R:488

  • There is a large block of commented-out experimental code after the end of filterVCF() (custom VCF reading tests, hard-coded local paths, etc.). Keeping this in the package source makes maintenance harder and bloats the file. Please remove it or move it to a vignette/dev script under dev/ if it needs to be kept for reference.
#This is not reliable, so no longer use this shortcut to get dosage matrix
#test2 <- vcfR2genlight(vcf)


#####Testing custom VCF reading function######
# Open the gzipped VCF file
#con <- gzfile("/Users/ams866/Desktop/output.vcf", "rt")

# Read in the entire file
#lines <- readLines(con)
#close(con)
# Read in the entire file
#lines <- readLines("/Users/ams866/Desktop/output.vcf")
# Filter out lines that start with ##
#filtered_lines <- lines[!grepl("^##", lines)]
# Create a temporary file to write the filtered lines
#temp_file <- tempfile()
#writeLines(filtered_lines, temp_file)
# Read in the filtered data using read.table or read.csv
#vcf_data <- read.table(temp_file, header = TRUE, sep = "\t", comment.char = "", check.names = FALSE)
# Clean up the temporary file
#unlink(temp_file)

##Extract INFO column and Filter SNPs by those values
#Update the filtering options by the items present in the INFO column?

# Load required library
#library(dplyr)

# Split INFO column into key-value pairs
#vcf_data_parsed <- vcf_data %>%
#  mutate(INFO_PARSED = strsplit(INFO, ";")) %>%
#  unnest(INFO_PARSED) %>%
#  separate(INFO_PARSED, into = c("KEY", "VALUE"), sep = "=") %>%
#  spread(KEY, VALUE)

#Filter by DP
#filtered_vcf_data <- vcf_data_parsed %>%
#  filter(as.numeric(DP) > 10)

# View the filtered dataframe
#print(filtered_vcf_data)

##Extracting and filtering by FORMAT column
# Identify the columns that are not sample columns
#non_sample_cols <- c("#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT")
# Identify the sample columns
#sample_cols <- setdiff(names(vcf_data), non_sample_cols)
# Extract FORMAT keys
#format_keys <- strsplit(as.character(vcf_data$FORMAT[1]), ":")[[1]]
# Split SAMPLE columns based on FORMAT
#vcf_data_samples <- vcf_data %>%
#  mutate(across(all_of(sample_cols), ~strsplit(as.character(.), ":"))) %>%
#  mutate(across(all_of(sample_cols), ~map(., ~setNames(as.list(.), format_keys)))) %>%
#  unnest_wider(all_of(sample_cols), names_sep = "_")

# View the parsed dataframe
#print(head(vcf_data_samples))

# Create separate dataframes for each FORMAT variable
#format_dfs <- lapply(format_keys, function(format_key) {
#  vcf_data_samples %>%
#    select(ID, ends_with(paste0("_", format_key))) %>%
#    column_to_rownames("ID")
#})

# Assign names to the list elements
#names(format_dfs) <- format_keys

# Access the separate dataframes
#gt_df <- format_dfs$GT  # Genotype dataframe
#ad_df <- format_dfs$AD  # Allelic depths dataframe

#*I think the above method is okay if you only need to filter at the INFO level,
#*But I think if you want to filter for FORMAT, that vcfR is probably best,
#*Will need to explore further if I can easily just filter for MPP by checking if it is above a
#*threshold, and then converting the GT and UD values to NA if so...
#*If that is efficient and works, then I will just use this custom VCF method...


Comment on lines +14 to +21
\value{
A list with:
\describe{
\item{checks}{Named logical vector with entries
\code{Columns}, \code{FixAlleleIDs}, \code{IUPACcodes}, \code{LowerCase}, \code{Indels}.}
\item{indel_clone_ids}{Character vector of \code{CloneID}s where ref/alt lengths differ.
Returns \code{character(0)} if none, or \code{NULL} when required columns are missing.}
}
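
The return contract documented in the hunk above (a character vector of CloneIDs, `character(0)` when there are no indels, `NULL` when required columns are missing) could be sketched roughly as follows. This is only an illustration, not the package's implementation: the helper name `find_indel_clone_ids` and the column names `Ref` and `Alt` are assumptions.

```r
# Hypothetical helper (not part of BIGr): returns the CloneIDs whose
# ref/alt allele strings differ in length.
find_indel_clone_ids <- function(df, required = c("CloneID", "Ref", "Alt")) {
  # NULL signals that the check could not be run at all
  if (!all(required %in% names(df))) return(NULL)
  is_indel <- nchar(as.character(df$Ref)) != nchar(as.character(df$Alt))
  # which() drops NA comparisons; unique() collapses repeated CloneIDs
  unique(as.character(df$CloneID[which(is_indel)]))
}

# Toy data, invented for illustration
toy <- data.frame(
  CloneID = c("Chr01_000123", "Chr01_000456", "Chr02_000789"),
  Ref = c("A", "AT", "G"),
  Alt = c("G", "A", "GCC"),
  stringsAsFactors = FALSE
)

indels <- find_indel_clone_ids(toy)                              # two indel markers
none <- find_indel_clone_ids(toy[1, ])                           # character(0)
missing_cols <- find_indel_clone_ids(toy[, c("CloneID", "Ref")]) # NULL
```

Keeping `character(0)` and `NULL` distinct lets a caller tell "checked, nothing found" apart from "could not check".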
Comment on lines +235 to +252
if(!all(rownames(ad_df)%in% df$BI_markerID))
warning("Not all MADC CloneID was found in the markers_info file. These markers will be removed.")

matched <- df[match(rownames(ad_df), df$BI_markerID),]

new_df <- data.frame(
CHROM = matched$Chr,
POS = matched$Pos
)

#Get read count sums
new_df$TotalRef <- rowSums(ref_df)
new_df$TotalAlt <- rowSums(alt_df)
new_df$TotalSize <- rowSums(size_df)

ref_base <- matched$Ref
alt_base <- matched$Alt
}
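
For context on the hunk above: the warning says unmatched markers "will be removed", but base R `match()` returns `NA` for unmatched IDs, and indexing a data frame with `NA` yields an all-`NA` row instead of dropping it. Whether the PR filters those rows elsewhere is not visible in this hunk. A minimal sketch with toy data (the values are invented; only the column names `BI_markerID`, `Chr`, and `Pos` come from the hunk):

```r
# Toy stand-ins for the objects used in the hunk
ad_df <- data.frame(S1 = c(10, 4, 7), row.names = c("m1", "m2", "m3"))
df <- data.frame(
  BI_markerID = c("m3", "m1"),
  Chr = c("Chr02", "Chr01"),
  Pos = c(500L, 100L),
  stringsAsFactors = FALSE
)

idx <- match(rownames(ad_df), df$BI_markerID)  # 2, NA, 1
matched <- df[idx, ]                           # the "m2" row is all NA, not removed

# One possible way to actually drop the unmatched markers
keep <- !is.na(idx)
ad_kept <- ad_df[keep, , drop = FALSE]
matched_kept <- df[idx[keep], ]
```

If rows are dropped this way, the same `keep` mask would presumably also need to be applied to `ref_df`, `alt_df`, and `size_df` so that the row sums stay aligned.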
Comment on lines +232 to +234
# Silent automatic mode
assign(corrected_name, data, envir = .GlobalEnv)
assign(report_name, input_ped_report, envir = .GlobalEnv)
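
Writing into `.GlobalEnv` from package code is something CRAN policy discourages, so one alternative is to return the results and make any environment write opt-in. The sketch below only illustrates that design; `save_results` and its arguments are hypothetical and do not appear in the PR.

```r
# Hypothetical helper: always returns the results, and writes to an
# environment only when the caller explicitly supplies one.
save_results <- function(data, report, envir = NULL, prefix = "ped") {
  out <- list(corrected = data, report = report)
  if (!is.null(envir)) {
    assign(paste0(prefix, "_corrected"), data, envir = envir)
    assign(paste0(prefix, "_report"), report, envir = envir)
  }
  invisible(out)
}

# Usage with a throwaway environment instead of the global one
target <- new.env()
res <- save_results(data.frame(id = 1:2), list(n_errors = 0L), envir = target)
```
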
report$CloneID <- paste0(sub("_(.*)", "", report$CloneID), "_",
sprintf(paste0("%0", pad_botloci, "d"), as.integer(sub(".*_", "", report$CloneID)))
)
report$AlleleID <- paste0(report$CloneID, "|", sapply(strsplit(report$AlleleID, "[|]"), "[[",2))
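
To make the padding logic above easier to follow, here is the same transformation run on toy values. The IDs and the padding width of 6 are invented for illustration. Note that `sub("_(.*)", "", x)` cuts at the first underscore while `sub(".*_", "", x)` cuts at the last one, so a chromosome name that itself contains an underscore would lose its middle part.

```r
# Toy report, invented for illustration
report <- data.frame(
  CloneID = c("Chr01_123", "Chr02_4567"),
  AlleleID = c("Chr01_123|Ref_0001", "Chr02_4567|Alt_0002"),
  stringsAsFactors = FALSE
)
pad_botloci <- 6  # assumed padding width

# Split each CloneID into chromosome prefix and integer position
prefix <- sub("_(.*)", "", report$CloneID)
position <- as.integer(sub(".*_", "", report$CloneID))

# Re-join with the position zero-padded to pad_botloci digits
report$CloneID <- paste0(prefix, "_", sprintf(paste0("%0", pad_botloci, "d"), position))

# Rebuild AlleleID from the padded CloneID and the part after the "|"
suffix <- vapply(strsplit(report$AlleleID, "[|]"), `[[`, character(1), 2)
report$AlleleID <- paste0(report$CloneID, "|", suffix)
```
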
Comment on lines +137 to +150
plot_df <- data.frame(
ID = imputed_genos$ID,
Concordance = percentage_match * 100
)

concordance_plot <- ggplot(plot_df,
aes(x = reorder(ID, Concordance),
y = Concordance)) +
geom_bar(stat = "identity") +
labs(title = "Imputation Concordance by Sample",
x = "Sample ID",
y = "Concordance (%)") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
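
A self-contained version of the plot above, for trying it outside the package. The sample IDs and concordance values are toy data, and the `ggplot2::` prefixes plus the `requireNamespace()` guard are assumptions about how the dependency might be handled. `geom_col()` is the shorthand for `geom_bar(stat = "identity")`.

```r
# Toy inputs standing in for imputed_genos$ID and percentage_match
sample_ids <- c("S1", "S2", "S3")
percentage_match <- c(0.98, 0.91, 0.95)

plot_df <- data.frame(ID = sample_ids, Concordance = percentage_match * 100)

# Order samples by concordance up front (same effect as reorder() inside aes())
plot_df$ID <- reorder(plot_df$ID, plot_df$Concordance)

# Only build the plot when ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  concordance_plot <- ggplot2::ggplot(plot_df, ggplot2::aes(x = ID, y = Concordance)) +
    ggplot2::geom_col() +
    ggplot2::labs(title = "Imputation Concordance by Sample",
                  x = "Sample ID",
                  y = "Concordance (%)") +
    ggplot2::theme_minimal() +
    ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90, hjust = 1))
}
```
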
if (!is.null(output.file)) {
output_name <- paste0(output.file, ".vcf.gz")
cat("Exporting VCF\n")
if (!class(vcf.file) == "vcfR"){
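
On the class test in the last line above: comparing `class(x)` with `==` returns one logical per class entry, and since R 4.2.0 an `if()` condition of length greater than one is an error. `inherits()` always returns a single `TRUE` or `FALSE`. A small sketch with a toy S3 object (the class names `child` and `parent` are invented):

```r
# Toy object carrying two classes
x <- structure(list(), class = c("child", "parent"))

cmp <- class(x) == "vcfR"        # logical vector of length 2
is_vcfr <- inherits(x, "vcfR")   # single FALSE

# A length-one condition that is safe to use in if()
if (!is_vcfr) {
  msg <- "input is not a vcfR object"
}
```
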
alex-sandercock and others added 4 commits March 27, 2026 09:02
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>