Madc2vcf and Pedigree functions updates #54
base: development
Changes from all commits
778aefa
1b761b9
743043a
0b97b46
82279af
6b81982
8205e4e
31248e3
757b01c
0934210
e18b2c6
768ab93
5c0b590
9afb265
c31118d
5d54f0d
ee50981
d3a4061
f765c7c
87bb1fc
7c12d49
df6fe92
6059c10
b09b0c1
409dbd3
e6fce19
669ac4e
bbfbee2
38c3564
55ee61a
bf5ff4c
291ae8e
84852da
96a4ed1
cec168d
0be2e0f
33fc87c
77107ba
ccf9e77
8a00c9e
f2013e3
b01c12b
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -3,3 +3,4 @@ | |
| .RData | ||
| .Ruserdata | ||
| revdep/ | ||
| .DS_Store | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,5 @@ | ||
| Version: 1.0 | ||
| ProjectId: 0eeaab63-2615-4da7-b10a-927160fc78a3 | ||
|
|
||
| RestoreWorkspace: No | ||
| SaveWorkspace: No | ||
|
|
||
| Original file line number | Diff line number | Diff line change | ||
|---|---|---|---|---|
|
|
@@ -4,6 +4,7 @@ export(allele_freq_poly) | |||
| export(calculate_Het) | ||||
| export(calculate_MAF) | ||||
| export(check_homozygous_trios) | ||||
| export(check_madc_sanity) | ||||
| export(check_ped) | ||||
| export(check_replicates) | ||||
| export(dosage2vcf) | ||||
|
|
@@ -15,12 +16,15 @@ export(get_countsMADC) | |||
| export(imputation_concordance) | ||||
| export(madc2gmat) | ||||
| export(madc2vcf_all) | ||||
| export(madc2vcf_multi) | ||||
| export(madc2vcf_targets) | ||||
| export(merge_MADCs) | ||||
| export(solve_composition_poly) | ||||
| export(thinSNP) | ||||
| export(updog2vcf) | ||||
| export(vmsg) | ||||
| import(dplyr) | ||||
| import(ggplot2) | ||||
|
||||
| import(ggplot2) |
Copilot
AI
Mar 27, 2026
`import(ggplot2)` was added, but ggplot2 is not listed in `DESCRIPTION` under `Imports:`. This will cause `R CMD check` to fail with a missing dependency. Either add ggplot2 to `Imports` (if it's a hard dependency) or move plotting behind `requireNamespace("ggplot2", quietly = TRUE)` and list it in `Suggests`.
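A minimal sketch of the `Suggests` route, in case it helps. The function name `plot_depth_hist` and its `depth` argument are hypothetical and not part of BIGr; the only point is the `requireNamespace()` guard combined with fully qualified `ggplot2::` calls, which would let `import(ggplot2)` be dropped from `NAMESPACE`.

```r
# Hypothetical helper (not part of BIGr): plots only when ggplot2 is installed.
plot_depth_hist <- function(depth) {
  # Soft-dependency guard: fail with an informative error instead of
  # relying on import(ggplot2) in NAMESPACE.
  if (!requireNamespace("ggplot2", quietly = TRUE)) {
    stop("Package 'ggplot2' is required for plotting. ",
         "Install it with install.packages(\"ggplot2\").",
         call. = FALSE)
  }
  df <- data.frame(depth = depth)
  # Fully qualified calls, so nothing needs to be attached or imported.
  ggplot2::ggplot(df, ggplot2::aes(x = depth)) +
    ggplot2::geom_histogram(bins = 30)
}
```

With this pattern, ggplot2 would go under `Suggests:` in `DESCRIPTION`, the same way the changelog describes `polyRAD` being handled.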
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,90 @@ | ||
| # BIGr 0.7.0 | ||
|
|
||
| ## New function `madc2vcf_multi` | ||
|
|
||
| - New function `madc2vcf_multi` to convert a DArTag MADC file to a VCF using the polyRAD pipeline for multiallelic genotyping | ||
| - Runs `check_madc_sanity` before loading the data and stops with informative errors if: | ||
| - Required columns are missing | ||
| - IUPAC (non-ATCG) codes are present in AlleleSequence | ||
| - Ref/Alt sequences are unpaired (`RefAltSeqs = FALSE`) | ||
| - Allele IDs have not been fixed by HapApp (`FixAlleleIDs = FALSE`) | ||
| - CloneIDs do not follow `Chr_Pos` format and no `markers_info` is provided | ||
| - New argument `markers_info`: optional path or URL to a CSV with `CloneID`/`BI_markerID`, `Chr`, and `Pos` columns; required when CloneIDs do not follow the `Chr_Pos` format | ||
| - Runs `check_botloci` to validate and reconcile CloneIDs between the MADC and botloci file, automatically fixing padding mismatches | ||
| - A corrected temp file is written and passed to `readDArTag` only when needed (all-NA rows/columns detected, CloneIDs remapped by `check_botloci`, or botloci IDs remapped) | ||
| - Accepts paths or URLs for `madc_file`, `botloci_file`, and `markers_info` | ||
| - Estimates overdispersion with `polyRAD::TestOverdispersion`, iterates priors with `polyRAD::IterateHWE`, and exports the result with `polyRAD::RADdata2VCF` | ||
| - `polyRAD` is a soft dependency (listed under `Suggests`); an informative error is raised if it is not installed | ||
|
|
||
| # BIGr 0.6.6 | ||
|
|
||
| ## Updates on `madc2vcf_all` | ||
|
|
||
| - New arguments for controlling processing of `Other` alleles: | ||
| - `add_others`: if `TRUE` (default), alleles labeled "Other" in the MADC are included in off-target SNP extraction | ||
| - `others_max_snps`: discards Other alleles with more than this many SNP differences relative to the Ref sequence (default: 5) | ||
| - `others_rm_with_indels`: discards Other alleles containing insertions or deletions relative to the Ref sequence (default: `TRUE`) | ||
| - Other alleles that carry a different base at the target SNP position are now reported as a 3rd allele in the VCF instead of being silently dropped | ||
| - Target position is now correctly removed from Other allele alignments, preventing duplicate VCF positions and marker IDs | ||
| - Fixed a bug where Other alleles with "Ref_" or "Alt_" in their AlleleID would corrupt the target SNP REF/ALT fields and read depth counts in `merge_counts` | ||
| - Improved verbose messages throughout: counts of Other alleles found, kept, and discarded (by indel filter and by max SNP filter) are now reported; multiallelic target SNPs with a 3rd allele from Others are counted and reported | ||
| - Debug-level message (level 3) listing each Other allele added and its genomic position | ||
|
|
||
| # BIGr 0.6.5 | ||
|
|
||
| ## Updates on madc2vcf functions | ||
| Details: | ||
|
|
||
| - Both `madc2vcf_targets` and `madc2vcf_all` (targets + off-targets) now run `check_madc_sanity`. It tests: | ||
| - [Columns] Whether the MADC has the expected columns | ||
| - [allNArow | allNAcol] Presence of rows or columns that are all NA (this often happens when the MADC is opened in Excel before being loaded in R) | ||
| - [IUPACcodes] Presence of IUPAC codes in AlleleSequence | ||
| - [LowerCase] Presence of lowercase bases in AlleleSequence | ||
| - [Indels] Presence of indels | ||
| - [ChromPos] Whether CloneID follows the Chr_Pos format | ||
| - [RefAltSeqs] Whether every Ref allele has a corresponding Alt allele and vice versa | ||
| - [OtherAlleles] Whether "Other" appears in the MADC AlleleID | ||
|
|
||
| - Better messages if `verbose = TRUE` in `madc2vcf_all` | ||
| - `madc2vcf_all` now supports indels: `markers_info` with indel positions is required; only the target indel is extracted, and off-targets are ignored for that tag | ||
| - `madc2vcf_targets` doesn’t run if: | ||
| - MADC column names are not correct | ||
| - `madc2vcf_targets` ignores Other alleles, but informs the user whether they exist and directs them to `madc2vcf_all` in case they want to extract them as well | ||
| - See the table for `madc2vcf_targets` requirements according to MADC content: | ||
|
|
||
| | check status | get_REF_ALT | Requires | ||
| -- | -- | -- | -- | ||
| IUPAC | TRUE | TRUE | markers_info REF/ALT | ||
| | TRUE | FALSE | - | ||
| | FALSE | TRUE | botloci or markers_info REF/ALT | ||
| | FALSE | FALSE | - | ||
| Indels | TRUE | TRUE | markers_info REF/ALT | ||
| | TRUE | FALSE | - | ||
| | FALSE | TRUE | botloci or markers_info REF/ALT | ||
| | FALSE | FALSE | - | ||
| ChromPos | TRUE | TRUE | botloci or markers_info REF/ALT | ||
| | TRUE | FALSE | - | ||
| | FALSE | TRUE | markers_info CHR/POS/REF/ALT or markers_info CHR/POS/ + botloci | ||
| | FALSE | FALSE | markers_info CHR/POS | ||
| FixAlleleIDs | TRUE | TRUE | botloci or markers_info REF/ALT | ||
| | TRUE | FALSE | - | ||
| | FALSE | TRUE | markers_info REF/ALT | ||
| | FALSE | FALSE | - | ||
|
|
||
| # BIGr 0.6.4 | ||
|
|
||
| - Add function `vmsg` to organize messages printed on the console | ||
| - Add metadata to VCF header from madc2vcf_targets | ||
|
Comment on lines +74 to +77
|
||
| - Add argument `madc_object` to `get_countsMADC` to avoid reading the MADC file twice and to use the padding-fixed MADC output from `check_botloci` directly | ||
| - Organize messages from `madc2vcf_targets` checks | ||
| - Add arguments `collapse_matches_counts` and `verbose` to the `madc2vcf_targets` function | ||
|
|
||
| # BIGr 0.6.3 | ||
|
|
||
| - Ignore tags when targets are indels | ||
| - New function to check MADC files: `check_madc_sanity`. Currently, it checks for the presence of required columns, whether fixed allele IDs were assigned, the presence of IUPAC codes, lowercase sequence bases, indels, and chromosome and position information. | ||
| - Added new argument `markers_info`, which allows users to provide a CSV file with marker information such as CHROM, POS, marker type, and position of indels. For BI species, this information is available from [PanelHub](https://github.com/Breeding-Insight/BIGapp-PanelHub). | ||
| - Checked inputs for `madc2vcf_all`. | ||
| - Updated affiliation in `DESCRIPTION`. | ||
|
|
||
| # BIGr 0.6.2 | ||
|
|
||
|
|
||
Adding `ProjectId` to the `.Rproj` file is usually machine-specific and can cause noisy diffs for collaborators (RStudio may regenerate it). If it’s not intentionally needed by the project, consider removing it to keep the repo stable across environments.