42 commits
778aefa
indels support for madc2vcf_targets
Cristianetaniguti Oct 3, 2025
1b761b9
updated check_ped to save corrected dataframe and report
Nov 4, 2025
743043a
reorganized report and fixed language
Nov 4, 2025
0b97b46
bugfix - if hapDB padding is not matching report
Cristianetaniguti Nov 14, 2025
82279af
added option to print plot or list to imputation_concordance
Feb 26, 2026
6b81982
ignore DS_STore
Mar 3, 2026
8205e4e
added option to print pre-filtering depth and genotyping rate
Mar 3, 2026
31248e3
added calculation for Ho
Mar 4, 2026
757b01c
up version
Cristianetaniguti Mar 13, 2026
0934210
Merge branch 'check_ped_update' of https://github.com/Breeding-Insigh…
Cristianetaniguti Mar 13, 2026
e18b2c6
merge dev branches
Cristianetaniguti Mar 13, 2026
768ab93
Merge branch 'development' into ped_indels_update
Cristianetaniguti Mar 13, 2026
5c0b590
opt messages
Cristianetaniguti Mar 13, 2026
9afb265
messages ok
Cristianetaniguti Mar 14, 2026
c31118d
targets okay
Cristianetaniguti Mar 25, 2026
5d54f0d
targets ok
Cristianetaniguti Mar 27, 2026
ee50981
Potential fix for pull request finding
alex-sandercock Mar 27, 2026
d3a4061
Potential fix for pull request finding
alex-sandercock Mar 27, 2026
f765c7c
Potential fix for pull request finding
alex-sandercock Mar 27, 2026
87bb1fc
Potential fix for pull request finding
alex-sandercock Mar 27, 2026
7c12d49
Merge branch 'ped_indels_update' into madc2vcf_all_updates
Cristianetaniguti Mar 27, 2026
df6fe92
Merge pull request #53 from Breeding-Insight/madc2vcf_all_updates
Cristianetaniguti Mar 27, 2026
6059c10
Update R/madc2vcf_targets.R
Cristianetaniguti Mar 27, 2026
b09b0c1
Update R/check_madc_sanity.R
Cristianetaniguti Mar 27, 2026
409dbd3
Update R/get_countsMADC.R
Cristianetaniguti Mar 27, 2026
e6fce19
Update R/get_countsMADC.R
Cristianetaniguti Mar 27, 2026
669ac4e
Update R/check_madc_sanity.R
Cristianetaniguti Mar 27, 2026
bbfbee2
fix tests
Cristianetaniguti Mar 27, 2026
38c3564
Merge branch 'ped_indels_update' of https://github.com/Breeding-Insig…
Cristianetaniguti Mar 27, 2026
55ee61a
madc2vcf_all indels support okay
Cristianetaniguti Mar 27, 2026
bf5ff4c
madc2vcf_all support indel
Cristianetaniguti Mar 31, 2026
291ae8e
add support for Others
Cristianetaniguti Apr 1, 2026
84852da
up version
Cristianetaniguti Apr 1, 2026
96a4ed1
add madc2vcf_multi
Cristianetaniguti Apr 1, 2026
cec168d
fix checks
Cristianetaniguti Apr 1, 2026
0be2e0f
fix checks 2
Cristianetaniguti Apr 1, 2026
33fc87c
add VariantAnnotation to test env
Cristianetaniguti Apr 1, 2026
77107ba
ignore madc2vcf_multi tests in actions
Cristianetaniguti Apr 1, 2026
ccf9e77
more messages and tests
Cristianetaniguti Apr 2, 2026
8a00c9e
bugfix
Cristianetaniguti Apr 2, 2026
f2013e3
update man
Cristianetaniguti Apr 2, 2026
b01c12b
minor version up
Cristianetaniguti Apr 2, 2026
5 changes: 5 additions & 0 deletions .github/workflows/R-CMD-check.yaml
@@ -43,7 +43,12 @@ jobs:
extra-packages: |
any::rcmdcheck
any::covr
any::polyRAD
needs: check

- name: Install VariantAnnotation (no Suggests)
run: pak::pkg_install("bioc::VariantAnnotation", dependencies = c("Depends", "Imports", "LinkingTo"))
shell: Rscript {0}
- uses: r-lib/actions/check-r-package@v2

- name: Generate test coverage report
1 change: 1 addition & 0 deletions .gitignore
@@ -3,3 +3,4 @@
.RData
.Ruserdata
revdep/
.DS_Store
1 change: 1 addition & 0 deletions BIGr.Rproj
@@ -1,4 +1,5 @@
Version: 1.0
ProjectId: 0eeaab63-2615-4da7-b10a-927160fc78a3
Copilot AI Mar 27, 2026

Adding ProjectId to the .Rproj file is usually machine-specific and can cause noisy diffs for collaborators (RStudio may regenerate it). If it’s not intentionally needed by the project, consider removing it to keep the repo stable across environments.

Suggested change
ProjectId: 0eeaab63-2615-4da7-b10a-927160fc78a3

RestoreWorkspace: No
SaveWorkspace: No
10 changes: 6 additions & 4 deletions DESCRIPTION
@@ -1,6 +1,6 @@
Package: BIGr
Title: Breeding Insight Genomics Functions for Polyploid and Diploid Species
-Version: 0.6.3
+Version: 0.7.0
Authors@R: c(person(given='Alexander M.',
family='Sandercock',
email='sandercock.alex@gmail.com',
@@ -23,7 +23,7 @@ Authors@R: c(person(given='Alexander M.',
person(given='Dongyan',
family='Zhao',
role='ctb'),
-person('Cornell', 'University',
+person('University', "of Florida",
role=c('cph'),
comment = "Breeding Insight"))
Maintainer: Alexander M. Sandercock <sandercock.alex@gmail.com>
@@ -44,7 +44,7 @@ URL: https://github.com/Breeding-Insight/BIGr
BugReports: https://github.com/Breeding-Insight/BIGr/issues
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
-RoxygenNote: 7.3.2
+RoxygenNote: 7.3.3
Depends: R (>= 4.4.0)
biocViews:
Imports:
@@ -62,12 +62,14 @@ Imports:
janitor,
quadprog,
tibble,
-stringr
+stringr,
+ggplot2
Suggests:
covr,
spelling,
rmdformats,
knitr (>= 1.10),
rmarkdown,
+polyRAD,
testthat (>= 3.0.0)
RdMacros: Rdpack
14 changes: 14 additions & 0 deletions NAMESPACE
@@ -4,6 +4,7 @@ export(allele_freq_poly)
export(calculate_Het)
export(calculate_MAF)
export(check_homozygous_trios)
export(check_madc_sanity)
export(check_ped)
export(check_replicates)
export(dosage2vcf)
@@ -15,12 +16,15 @@ export(get_countsMADC)
export(imputation_concordance)
export(madc2gmat)
export(madc2vcf_all)
export(madc2vcf_multi)
export(madc2vcf_targets)
export(merge_MADCs)
export(solve_composition_poly)
export(thinSNP)
export(updog2vcf)
export(vmsg)
import(dplyr)
import(ggplot2)
Copilot AI Apr 1, 2026

NAMESPACE imports ggplot2 (import(ggplot2)), but ggplot2 is not listed in DESCRIPTION Imports. This will make the package fail to load if ggplot2 is not installed. Either add ggplot2 to Imports or move plotting to a Suggests + requireNamespace() path.

Suggested change
import(ggplot2)

import(janitor)
Comment on lines +25 to 28
Copilot AI Mar 27, 2026

import(ggplot2) was added, but ggplot2 is not listed in DESCRIPTION under Imports:. This will cause R CMD check to fail with a missing dependency. Either add ggplot2 to Imports (if it's a hard dependency) or move plotting behind requireNamespace("ggplot2", quietly = TRUE) and list it in Suggests.

import(parallel)
import(quadprog)
@@ -33,12 +37,22 @@ importFrom(Biostrings,DNAString)
importFrom(Biostrings,reverseComplement)
importFrom(Rdpack,reprompt)
importFrom(Rsamtools,bgzip)
importFrom(dplyr,"%>%")
importFrom(dplyr,across)
importFrom(dplyr,case_when)
importFrom(dplyr,filter)
importFrom(dplyr,group_by)
importFrom(dplyr,mutate)
importFrom(dplyr,select)
importFrom(dplyr,summarise)
importFrom(dplyr,where)
importFrom(pwalign,nucleotideSubstitutionMatrix)
importFrom(pwalign,pairwiseAlignment)
importFrom(readr,read_csv)
importFrom(reshape2,dcast)
importFrom(reshape2,melt)
importFrom(stats,cor)
importFrom(stats,reorder)
importFrom(stats,setNames)
importFrom(utils,packageVersion)
importFrom(utils,read.csv)
86 changes: 85 additions & 1 deletion NEWS.md
@@ -1,6 +1,90 @@
# BIGr 0.7.0

## New function `madc2vcf_multi`

- New function `madc2vcf_multi` to convert a DArTag MADC file to a VCF using the polyRAD pipeline for multiallelic genotyping
- Runs `check_madc_sanity` before loading the data and stops with informative errors if:
  - Required columns are missing
  - IUPAC (non-ATCG) codes are present in AlleleSequence
  - Ref/Alt sequences are unpaired (`RefAltSeqs = FALSE`)
  - Allele IDs have not been fixed by HapApp (`FixAlleleIDs = FALSE`)
  - CloneIDs do not follow `Chr_Pos` format and no `markers_info` is provided
- New argument `markers_info`: optional path or URL to a CSV with `CloneID`/`BI_markerID`, `Chr`, and `Pos` columns; required when CloneIDs do not follow the `Chr_Pos` format
- Runs `check_botloci` to validate and reconcile CloneIDs between the MADC and botloci file, automatically fixing padding mismatches
- A corrected temp file is written and passed to `readDArTag` only when needed (all-NA rows/columns detected, CloneIDs remapped by `check_botloci`, or botloci IDs remapped)
- Accepts paths or URLs for `madc_file`, `botloci_file`, and `markers_info`
- Estimates overdispersion with `polyRAD::TestOverdispersion`, iterates priors with `polyRAD::IterateHWE`, and exports the result with `polyRAD::RADdata2VCF`
- `polyRAD` is a soft dependency (listed under `Suggests`); an informative error is raised if it is not installed
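
The soft-dependency behaviour described in the last item can be sketched roughly as follows. This is a minimal illustration, not the code used in `madc2vcf_multi`: the helper name `require_soft_dep` and its `purpose` argument are hypothetical, and only base R's `requireNamespace()` is assumed.

```r
# Hypothetical helper (not part of BIGr): stop with an informative message
# when an optional package listed under Suggests is not installed.
require_soft_dep <- function(pkg, purpose) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      "Package '", pkg, "' is required for ", purpose,
      " but is not installed. Please install it and try again.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Example: guard an optional code path before doing any work.
# A deliberately nonexistent package name is used to trigger the error.
res <- tryCatch(
  require_soft_dep("aPackageThatDoesNotExist", "multiallelic genotyping"),
  error = function(e) conditionMessage(e)
)
```

Checking at the top of the function keeps the failure early and explicit, so users without the optional package see a clear message instead of a missing-function error partway through the conversion.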

# BIGr 0.6.6

## Updates on `madc2vcf_all`

- New arguments for controlling processing of `Other` alleles:
  - `add_others`: if `TRUE` (default), alleles labeled "Other" in the MADC are included in off-target SNP extraction
  - `others_max_snps`: discards Other alleles with more than this many SNP differences relative to the Ref sequence (default: 5)
  - `others_rm_with_indels`: discards Other alleles containing insertions or deletions relative to the Ref sequence (default: `TRUE`)
- Other alleles that carry a different base at the target SNP position are now reported as a third allele in the VCF instead of being silently dropped
- The target position is now correctly removed from Other-allele alignments, preventing duplicate VCF positions and marker IDs
- Fixed a bug where Other alleles with "Ref_" or "Alt_" in their AlleleID would corrupt the target SNP REF/ALT fields and read depth counts in `merge_counts`
- Improved verbose messages throughout: counts of Other alleles found, kept, and discarded (by the indel filter and by the max SNP filter) are now reported; multiallelic target SNPs with a third allele from Other alleles are counted and reported
- Added a debug-level message (level 3) listing each Other allele added and its genomic position
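
To make the two filters concrete, here is a simplified sketch of the keep/discard decision. It is an assumption-laden illustration rather than the package's logic: the helper `keep_other_allele` is hypothetical, and a length difference is used as a stand-in for indel detection, whereas the description above refers to alignments against the Ref sequence.

```r
# Hypothetical helper (not BIGr's implementation): decide whether an
# "Other" allele passes the indel and max-SNP filters.
keep_other_allele <- function(ref_seq, other_seq,
                              others_max_snps = 5,
                              others_rm_with_indels = TRUE) {
  # Simplification: a length difference is treated as evidence of an indel.
  has_indel <- nchar(ref_seq) != nchar(other_seq)
  if (has_indel) {
    # Without an alignment, SNP differences cannot be counted here, so
    # alleles with indels are kept only when the indel filter is off.
    return(!others_rm_with_indels)
  }
  ref_bases <- strsplit(toupper(ref_seq), "")[[1]]
  other_bases <- strsplit(toupper(other_seq), "")[[1]]
  # Count positions where the Other allele differs from the Ref sequence.
  n_snps <- sum(ref_bases != other_bases)
  n_snps <= others_max_snps
}

one_snp   <- keep_other_allele("ACGTACGT", "ACGTACGA")
six_snps  <- keep_other_allele("AAAAAAAA", "CCCCCCAA")
with_gap  <- keep_other_allele("ACGT", "ACGTT")
```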

# BIGr 0.6.5

## Updates on madc2vcf functions
Details:

- Both `madc2vcf_targets` and `madc2vcf_all` (targets + off-targets) now run `check_madc_sanity`. It tests:
  - [Columns] Whether the MADC has the expected columns
  - [allNArow | allNAcol] Presence of rows and columns that are entirely NA (this often happens when the MADC is opened in Excel before being loaded in R)
  - [IUPACcodes] Presence of IUPAC codes in AlleleSequence
  - [LowerCase] Presence of lowercase bases in AlleleSequence
  - [Indels] Presence of indels
  - [ChromPos] Whether CloneID follows the format Chr_Pos
  - [RefAltSeqs] Whether every Ref allele has a corresponding Alt allele and vice versa
  - [OtherAlleles] Whether "Other" exists in the MADC AlleleID
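
Several of these checks reduce to simple pattern tests on the MADC table. The sketch below illustrates that idea in base R under stated assumptions: the function `sketch_madc_checks` is hypothetical, the AlleleID layout `CloneID|Ref_0001` / `CloneID|Alt_0002` / `CloneID|Other_0003` is assumed for the example, and the real `check_madc_sanity` may use different rules and return a different structure.

```r
# Hypothetical, simplified versions of a few checks; not BIGr's implementation.
sketch_madc_checks <- function(madc) {
  seqs <- madc$AlleleSequence
  # Assumed AlleleID layout: "<CloneID>|Ref_0001", "<CloneID>|Alt_0002", ...
  ref_ids <- unique(madc$CloneID[grepl("\\|Ref(_|$)", madc$AlleleID)])
  alt_ids <- unique(madc$CloneID[grepl("\\|Alt(_|$)", madc$AlleleID)])
  list(
    allNArow     = any(rowSums(is.na(madc)) == ncol(madc)), # TRUE if found
    allNAcol     = any(colSums(is.na(madc)) == nrow(madc)), # TRUE if found
    IUPACcodes   = any(grepl("[^ACGTacgt]", seqs)),         # TRUE if found
    LowerCase    = any(grepl("[acgt]", seqs)),              # TRUE if found
    ChromPos     = all(grepl("^.+_[0-9]+$", madc$CloneID)), # TRUE if Chr_Pos
    RefAltSeqs   = setequal(ref_ids, alt_ids),              # TRUE if paired
    OtherAlleles = any(grepl("\\|Other", madc$AlleleID))    # TRUE if found
  )
}

# Toy input: one IUPAC base (R), one lowercase sequence, one unpaired Ref.
madc <- data.frame(
  AlleleID = c("chr1_100|Ref_0001", "chr1_100|Alt_0002",
               "chr1_250|Ref_0001", "chr1_250|Other_0003"),
  CloneID = c("chr1_100", "chr1_100", "chr1_250", "chr1_250"),
  AlleleSequence = c("ACGT", "ACGA", "ACRT", "acgt"),
  stringsAsFactors = FALSE
)
checks <- sketch_madc_checks(madc)
```

Returning a named list of logical flags makes it straightforward for a caller to decide which inputs (for example `markers_info` or a botloci file) are needed before conversion.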

- Better messages if `verbose = TRUE` in `madc2vcf_all`
- `madc2vcf_all` support for indels: `markers_info` with the indel positions is required; only the target indel is extracted, and off-targets are ignored for that tag
- `madc2vcf_targets` does not run if the MADC column names are incorrect
- `madc2vcf_targets` ignores Other alleles, but informs the user whether they exist and directs them to `madc2vcf_all` if they want to extract them as well
- See the table below for `madc2vcf_targets` requirements according to the MADC content:

| Check | Check status | get_REF_ALT | Requires |
| -- | -- | -- | -- |
| IUPAC | TRUE | TRUE | markers_info REF/ALT |
|   | TRUE | FALSE | - |
|   | FALSE | TRUE | botloci or markers_info REF/ALT |
|   | FALSE | FALSE | - |
| Indels | TRUE | TRUE | markers_info REF/ALT |
|   | TRUE | FALSE | - |
|   | FALSE | TRUE | botloci or markers_info REF/ALT |
|   | FALSE | FALSE | - |
| ChromPos | TRUE | TRUE | botloci or markers_info REF/ALT |
|   | TRUE | FALSE | - |
|   | FALSE | TRUE | markers_info CHR/POS/REF/ALT or markers_info CHR/POS + botloci |
|   | FALSE | FALSE | markers_info CHR/POS |
| FixAlleleIDs | TRUE | TRUE | botloci or markers_info REF/ALT |
|   | TRUE | FALSE | - |
|   | FALSE | TRUE | markers_info REF/ALT |
|   | FALSE | FALSE | - |

# BIGr 0.6.4

- Add function `vmsg` to organize messages printed on the console
- Add metadata to VCF header from madc2vcf_targets
Comment on lines +74 to +77
Copilot AI Mar 27, 2026

NEWS.md adds a BIGr 0.6.4 section, but DESCRIPTION still reports version 0.6.3. Align these so tooling/users don’t see conflicting package versions.

- Add argument `madc_object` to `get_countsMADC` to avoid reading the MADC file twice and to directly use the padding-corrected MADC output from `check_botloci`
- Organize messages from `madc2vcf_targets` checks
- Add arguments `collapse_matches_counts` and `verbose` to the `madc2vcf_targets` function

# BIGr 0.6.3

- Ignore tags when targets are indels
- New function to check MADC files: `check_madc_sanity`. Currently, it checks for the presence of required columns, whether fixed allele IDs were assigned, the presence of IUPAC codes, lowercase sequence bases, indels, and chromosome and position information.
- Added new argument `markers_info`, which allows users to provide a CSV file with marker information such as CHROM, POS, marker type, and position of indels. For BI species, this information is available from [PanelHub](https://github.com/Breeding-Insight/BIGapp-PanelHub).
- Checked inputs for `madc2vcf_all`.
- Updated affiliation in `DESCRIPTION`.

# BIGr 0.6.2
