RepLoad 10X data #374

Open
TheRaspberryFox opened this issue Aug 13, 2023 · 2 comments

Comments

@TheRaspberryFox

Hello,

Great package. However, I am running into an issue when loading my files with repLoad. My files are filtered_contig_annotations.csv files from 10X. repLoad appears to require the column headers fwr1, fwr1_nt, cdr1, cdr1_nt, fwr2, fwr2_nt, cdr2, cdr2_nt, fwr3, and fwr3_nt, but my files contain only the following columns:

"barcode" "is_cell" "contig_id" "high_confidence" "length" "chain" "v_gene" "d_gene" "j_gene" "c_gene" "full_length" "productive" "cdr3" "cdr3_nt" "reads" "umis" "raw_clonotype_id" "raw_consensus_id"

Is there a way to load the data with only the columns that I have?

Thanks

@margaretc-ho

margaretc-ho commented Feb 20, 2024

I have the same question! I am following the instructions at https://immunarch.com/articles/web_only/load_10x.html#prepare-10x-data and trying to load data downloaded from a 10X Genomics dataset (it seems like a very standard dataset). Its filtered_contig_annotations file has the following columns:
barcode is_cell contig_id high_confidence length chain v_gene d_gene j_gene c_gene full_length productive cdr3 cdr3_nt reads umis raw_clonotype_id raw_consensus_id donor origin
It does not have the additional columns that repLoad is looking for, so I get this error:

> file_path = "/Users/homc/Library/CloudStorage/TEST_BCellFlu_data"
> immdata_10x <- repLoad(file_path)

== Step 1/3: loading repertoire files... ==

Processing "/Users/homc/Library/CloudStorage/TEST_BCellFlu_data" ...
  -- [1/5] Parsing "/Users/homc/Library/CloudStorage/TEST_BCellFlu_data/sc5p_v2_hs_B_flu_aggregated_5gex_b_vdj_b_clonotypes.csv" -- unsupported format, skipping
  -- [2/5] Parsing "/Users/homc/Library/CloudStorage/TEST_BCellFlu_data/sc5p_v2_hs_B_flu_aggregated_5gex_b_vdj_b_consensus_annotations.csv" -- 10x (consensus)
  -- [3/5] Parsing "/Users/homc/Library/CloudStorage/TEST_BCellFlu_data/sc5p_v2_hs_B_flu_aggregated_5gex_b_vdj_b_filtered_contig_annotations.csv" -- 10x (filt.contigs)
Error in `df[, vec_names]`:
! Can't subset columns that don't exist.
✖ Columns `cdr1_nt`, `cdr1`, `cdr2_nt`, `cdr2`, `fwr1_nt`, etc. don't exist.
Backtrace:
 1. immunarch::repLoad(file_path)
 2. immunarch (local) .process_batch(batches[[batch_i]], .mode, .coding)
 3. immunarch (local) .read_repertoire(.filepath, .mode, .coding, ...)
 4. immunarch (local) parse_fun(.path, .mode, ...)
 5. immunarch:::parse_repertoire(...)
 9. tibble:::`[.tbl_df`(df, , vec_names)

Any suggestions? @vadimnazarov

@margaretc-ho

margaretc-ho commented Feb 20, 2024

This seems to be a common issue for others trying to load 10X Genomics data; see
#363
#358
We all seem to get the same error because of the column names.
So far, there seems to be no solution.
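
One workaround that might be worth trying (it is not confirmed anywhere in this thread) is to pad the file with the absent columns before calling repLoad. Below is a minimal sketch in base R. It assumes that empty (NA) fwr/cdr columns are acceptable to the parser, which is untested here. `pad_missing_columns` is a hypothetical helper, not an immunarch function, and the toy data frame only stands in for a real filtered_contig_annotations.csv.

```r
# Columns the original report lists as required by repLoad.
required_cols <- c("fwr1", "fwr1_nt", "cdr1", "cdr1_nt", "fwr2", "fwr2_nt",
                   "cdr2", "cdr2_nt", "fwr3", "fwr3_nt")

# Hypothetical helper (not part of immunarch): append every column from
# `cols` that is absent in `df`, filled with NA. Existing columns are untouched.
pad_missing_columns <- function(df, cols = required_cols) {
  missing_cols <- setdiff(cols, names(df))
  for (col in missing_cols) {
    df[[col]] <- rep(NA_character_, nrow(df))
  }
  df
}

# Toy stand-in for a contig annotation table without the region columns.
contigs <- data.frame(
  barcode = c("AAAC-1", "AAAG-1"),
  chain   = c("IGH", "IGK"),
  cdr3    = c("CARDY", "CQQYW"),
  stringsAsFactors = FALSE
)

padded <- pad_missing_columns(contigs)

# Show which required columns were absent, then the padded header.
print(setdiff(required_cols, names(contigs)))
print(names(padded))
```

With a real file, the same helper could be applied between `read.csv(path, stringsAsFactors = FALSE)` and `write.csv(padded, out_path, row.names = FALSE)`, with repLoad then pointed at the padded copy. Whether repLoad parses that copy, and whether downstream analyses tolerate the empty region columns, would still need to be checked.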
