RepLoad 10X data #374

TheRaspberryFox · 2023-08-13T19:36:02Z

Hello,

Great package. However, I am running into an issue when loading my files with repLoad. My files are filtered_contig_annotations.csv files from 10X, and repLoad appears to require the column headers fwr1, fwr1_nt, cdr1, cdr1_nt, fwr2, fwr2_nt, cdr2, cdr2_nt, fwr3, fwr3_nt, which my files do not have. I only have the following column names:

"barcode" "is_cell" "contig_id" "high_confidence" "length" "chain" "v_gene" "d_gene" "j_gene" "c_gene" "full_length" "productive" "cdr3" "cdr3_nt" "reads" "umis" "raw_clonotype_id" "raw_consensus_id"

Is there a way to read in the data with the columns that I have?

Thanks

margaretc-ho · 2024-02-20T17:02:27Z

I have the same question! I am following these instructions: https://immunarch.com/articles/web_only/load_10x.html#prepare-10x-data and trying to read in data downloaded from this 10X Genomics dataset (it seems like a very standard dataset). The filtered_contig_annotations file has the following columns:
barcode is_cell contig_id high_confidence length chain v_gene d_gene j_gene c_gene full_length productive cdr3 cdr3_nt reads umis raw_clonotype_id raw_consensus_id donor origin
and not the columns that repLoad is looking for. I get the following error:

> file_path = "/Users/homc/Library/CloudStorage/TEST_BCellFlu_data"
> immdata_10x <- repLoad(file_path)

== Step 1/3: loading repertoire files... ==

Processing "/Users/homc/Library/CloudStorage/TEST_BCellFlu_data" ...
  -- [1/5] Parsing "/Users/homc/Library/CloudStorage/TEST_BCellFlu_data/sc5p_v2_hs_B_flu_aggregated_5gex_b_vdj_b_clonotypes.csv" -- unsupported format, skipping
  -- [2/5] Parsing "/Users/homc/Library/CloudStorage/TEST_BCellFlu_data/sc5p_v2_hs_B_flu_aggregated_5gex_b_vdj_b_consensus_annotations.csv" -- 10x (consensus)
  -- [3/5] Parsing "/Users/homc/Library/CloudStorage/TEST_BCellFlu_data/sc5p_v2_hs_B_flu_aggregated_5gex_b_vdj_b_filtered_contig_annotations.csv" -- 10x (filt.contigs)
Error in `df[, vec_names]`:
! Can't subset columns that don't exist.
✖ Columns `cdr1_nt`, `cdr1`, `cdr2_nt`, `cdr2`, `fwr1_nt`, etc. don't exist.
Backtrace:
 1. immunarch::repLoad(file_path)
 2. immunarch (local) .process_batch(batches[[batch_i]], .mode, .coding)
 3. immunarch (local) .read_repertoire(.filepath, .mode, .coding, ...)
 4. immunarch (local) parse_fun(.path, .mode, ...)
 5. immunarch:::parse_repertoire(...)
 9. tibble:::`[.tbl_df`(df, , vec_names)

Any suggestions? @vadimnazarov

margaretc-ho · 2024-02-20T17:12:11Z

It seems this is a common issue that many others run into when trying to load 10X Genomics data:
#363
#358
We all seem to get the same error because of the column names.
So far, there seems to be no solution.
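
Until there is a confirmed fix, one possible workaround is to add the missing columns to the file, filled with NA, before calling repLoad. The sketch below is untested against immunarch and rests on assumptions: that the parser only needs these columns to exist, and that the column list quoted in the first post is complete (the error message ends in "etc.", so there may be more). `pad_10x_columns` is a hypothetical helper, not part of immunarch.

```r
# Hypothetical helper (not part of immunarch): add the framework/CDR columns
# named in this thread to a 10x contig table, filled with NA, if they are absent.
pad_10x_columns <- function(df) {
  # Column names taken from the first post; the list may be incomplete.
  required <- c("fwr1", "fwr1_nt", "cdr1", "cdr1_nt",
                "fwr2", "fwr2_nt", "cdr2", "cdr2_nt",
                "fwr3", "fwr3_nt")
  missing <- setdiff(required, names(df))
  for (col in missing) {
    df[[col]] <- rep(NA_character_, nrow(df))
  }
  df
}

# Toy table that mimics a file without the fwr*/cdr1/cdr2 headers.
old_style <- data.frame(
  barcode = c("AAAC-1", "AAAG-1"),
  chain = c("IGH", "IGK"),
  cdr3 = c("CARDYW", "CQQYNSYPLTF"),
  stringsAsFactors = FALSE
)
padded <- pad_10x_columns(old_style)

# Show which columns were added.
print(setdiff(names(padded), names(old_style)))
```

For a real file, the idea would be to read it with read.csv, pass it through the helper, write it to a separate folder with write.csv(..., row.names = FALSE), and point repLoad at that folder. The padded columns carry no information, so any analysis that relies on FWR or CDR1/CDR2 sequences would still not be possible.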
