How does TCRconvertR work?
TCRconvertR essentially performs a merge
between the
input data and a lookup table that includes the naming conventions for
each gene. These lookup tables are constructed from IMGT reference FASTA
files and account for the specific naming peculiarities of each
platform. The built-in lookup tables are located under
inst/extdata/
.
TCRconvert performs a merge
between the input data and a
lookup table that includes gene names with each nomenclature. These
lookup tables are constructed from IMGT reference FASTA files and
account for the specific naming peculiarities of each platform. The
built-in lookup tables are located under inst/extdata/
. The
code used to build the lookup tables, which demonstrates the conversion
logic, is within the build_lookup_from_fastas
function.
What input columns are required?
TCRconvertR expects at least one V, D, J, and/or C gene column. You can use standard 10X and Adaptive column names or custom names.
What if I have missing genes?
NA
values in the input dataframe will remain
NA
in the output. Genes that are not found in the lookup
table (which is based on the IMGT reference), will be converted to
NA
. The built-in lookup tables are located under
inst/extdata/
.
How are alleles added from 10X data?
Since 10X does not provide allele-level information, all genes are
assigned the allele *01
.
How are C genes converted to Adaptive?
Adaptive does not capture constant (“C”) gene information. If
converting to the Adaptive format, all C genes will be set to
NA
.
What column names should I use for my IMGT-formatted data?
IMGT does not have standard column names, so it’s assumed that the
10X names are used: c(v_gene
, d_gene
,
j_gene
, c_gene
). To use other names, specify
them as a character vector with frm_cols
.
Can I input AIRR files?
Yes, just specify the AIRR column names c(v_call
,
d_call
, j_call
, c_call
) using
frm_cols
. You must still specify the input naming
convention with frm
.
What if I have custom column names?
If you’re using non-standard column names that do not match 10X,
Adaptive, or Adaptive V2 formats, specify them with
frm_cols
.
What about odd names (e.g. TRAV14DV4
,
TCRAV01-02/12-02
)?
Gene names containing “OR” or “DV” are accounted for in the lookup tables.
Combinations of gene names, like TCRAV01-02/12-02
, will
be converted to NA
because they are not in the IMGT
reference.
Are non-human species supported?
Mouse and rhesus macaque are supported out-of-the-box. For other species, see “Using a custom reference” in the usage pages.
The rhesus and mouse lookup tables were built from IMGT reference FASTAs and gene tables. Mouse genes cover both “Mouse” and “Mouse C57BL/6J” as listed in IMGT.
What if my Adaptive data lacks
x_resolved
/xMaxResolved
columns?
Create them by combining x_gene
/xGeneName
and x_allele
/xGeneAllele
with *
as a separator. Example code:
library(dplyr)
# Adaptive
new_df <- adaptive_df %>%
mutate(v_resolved = case_when(
!is.na(v_allele) ~ paste0(v_gene, "*", v_allele),
.default = v_gene
),
d_resolved = case_when(
!is.na(d_allele) ~ paste0(d_gene, "*", d_allele),
.default = d_gene
),
j_resolved = case_when(
!is.na(j_allele) ~ paste0(j_gene, "*", j_allele),
.default = j_gene
))
# Adaptive v2
new_df <- adaptive_df %>%
mutate(vMaxResolved = case_when(
!is.na(vGeneAllele) ~ paste0(vGeneName, "*", vGeneAllele),
.default = vGeneName
),
dMaxResolved = case_when(
!is.na(dGeneAllele) ~ paste0(dGeneName, "*", dGeneAllele),
.default = dGeneName
),
jMaxResolved = case_when(
!is.na(jGeneAllele) ~ paste0(jGeneName, "*", jGeneAllele),
.default = jGeneName
))