Question about put pattern and internal reads together to analyse #382

Dkaaaaa opened this issue Dec 4, 2023 · 3 comments


Dkaaaaa commented Dec 4, 2023

Hi,
I am confused about the results and usage of the zUMIs pipeline.
Here is my YAML file:
pm1-1.yaml.txt
First, my input is paired-end reads that start with a specific pattern sequence. The read 1 file contains 1657623 reads in total, while the STAR output filtered.tagged.Log.final.out reports 846533 input reads. Q1: I think this is because of the filtering conditions and the cDNA range set for read 1 in my YAML file. Am I right?
image
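
For Q1, one way to see how many reads could survive a pattern-based filter is to count how many read 1 sequences begin with the pattern. The following is a minimal sketch, not zUMIs' actual filtering logic: it assumes the pattern is expected at the very start of read 1, as described in the post, and the function names are hypothetical.

```python
import gzip
import io

PATTERN = "ATTGCGCAATG"  # the 11 bp pattern named in this thread


def count_pattern_reads(fastq_handle, pattern=PATTERN):
    """Return (total_reads, reads_starting_with_pattern) for a FASTQ text stream."""
    total = 0
    with_pattern = 0
    for line_number, line in enumerate(fastq_handle):
        # In a 4-line FASTQ record the sequence is the second line.
        if line_number % 4 == 1:
            total += 1
            if line.startswith(pattern):
                with_pattern += 1
    return total, with_pattern


def count_pattern_reads_in_file(path, pattern=PATTERN):
    """Same count for a gzip-compressed FASTQ file on disk (hypothetical helper)."""
    with gzip.open(path, "rt") as handle:
        return count_pattern_reads(handle, pattern)


# Tiny in-memory demo: one read starts with the pattern, one does not.
demo = io.StringIO(
    "@r1\nATTGCGCAATGAAAA\n+\nIIIIIIIIIIIIIII\n"
    "@r2\nGGGGATTGCGCAATG\n+\nIIIIIIIIIIIIIII\n"
)
print(count_pattern_reads(demo))
```

Comparing the second number with the "Number of input reads" in the STAR log gives a rough idea of how much of the drop the pattern alone could explain; barcode and UMI quality filters may remove further reads.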

Secondly, I found that the read IDs in
"pm1.filtered.tagged.unmapped.bam" (flags include: 4),
"pm1.filtered.tagged.Aligned.out.bam" (flags include: 0, 16) and
"pm1.filtered.Aligned.GeneTagged.sorted.bam" (flags include: 0, 16) are the same.
Q2: Why are the read IDs in pm1.filtered.tagged.unmapped.bam the same as those in pm1.filtered.tagged.Aligned.out.bam and pm1.filtered.Aligned.GeneTagged.sorted.bam?
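
The flag values quoted here are SAM FLAG bitfields, so each number is a sum of individual bits (for example 4 means unmapped and 16 means reverse strand). A small sketch for decoding them is shown below; the bit values come from the SAM specification, while the short label strings and the function name are my own choices.

```python
# Bit values defined by the SAM specification; the labels are informal.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


for value in (0, 4, 16, 256, 272):
    print(value, decode_flag(value))
```

A flag of 0 decodes to an empty list, meaning a single-end read mapped to the forward strand. The same function can be used to double-check any other flag values seen in the BAM files.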
What's more, "pm1.filtered.tagged.Aligned.toTranscriptome.out.bam" (flags include: 0, 16, 252, 276) is missing some reads that are present in the three BAM files above. The attached file lists the missing reads, extracted from pm1.filtered.Aligned.GeneTagged.sorted.bam:
miss-in-toTranscriptome.bam.txt
I also checked the mapping position of the first missing read. Below is the position of ENSG0000014267 in my reference transcriptome, and I see no problem with it.
Snipaste_2023-12-04_13-23-32
Q3: Why are these reads missing from pm1.filtered.tagged.Aligned.toTranscriptome.out.bam?
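
The comparison described for Q3 can be sketched as a set difference over read names. The snippet below is only an illustration: it assumes both BAM files have first been converted to SAM text (for instance with `samtools view`), and the function names are hypothetical.

```python
def read_names(sam_lines):
    """Collect the read names (first column) from SAM-formatted text lines."""
    names = set()
    for line in sam_lines:
        # Skip header lines (they start with "@") and blank lines.
        if line.startswith("@") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def missing_reads(reference_sam, query_sam):
    """Return read names present in reference_sam but absent from query_sam."""
    return read_names(reference_sam) - read_names(query_sam)


# Tiny in-memory demo with made-up records.
genome_sam = ["@HD\tVN:1.6", "r1\t0\tchr1\t100", "r2\t16\tchr1\t200"]
transcriptome_sam = ["r1\t0\tENST1\t10"]
print(missing_reads(genome_sam, transcriptome_sam))
```

Any iterable of lines works as input, so an open file handle can be passed directly instead of a list.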

Finally, I have separated my raw reads into paired patterned_reads and paired internal_reads. Note that my data is similar to Smart-seq3, but my protocol captures mRNA via the 3' polyA tail.
The pm1-1.yaml above was run with the patterned_reads: read 1 was set for BC and UMI only, and read 2 was set for cDNA. Now I want to analyze my internal reads together with them; my new YAML file is pm1-2.yaml.txt. I set the paired internal reads as file3 and file4, with cDNA range 1-150. When I run with this YAML file, I get the errors below. Q4: How should I set things up to analyze my patterned_reads and internal reads together?
image
image
Below is my YAML file.
pm1-2.yaml.txt
Below is my new STAR filtered.tagged.Log.final.out. It again reports 846533 input reads, so it seems that file3 and file4 were not included in the analysis. In addition, the number of uniquely mapped reads is lower than in the run without the internal reads.
image

I am quite puzzled by all of the above and look forward to your reply. Thanks a lot!
Dka

@cziegenhain
Collaborator

Hi,

As mentioned in your other issue, the particular 11 bp pattern "ATTGCGCAATG" is reserved for the processing of Smart-seq3 data. Our pipeline is hardcoded for this case, and I am unfortunately unable to provide support for custom protocols that you might be trying to process.
Sorry about this,

Christoph


Dkaaaaa commented Dec 4, 2023

I am still puzzled by your answer.
Below is the Smart-seq3 YAML.
image

What is the function of file3 and file4 in this pipeline?
What if I do not separate my data into pattern reads and internal reads, and instead just set up file1 and file2 like this:
  file1:
    name: /home/ccy/1-scrna-data-2023-11-14/rawdata/star-test-1/patterns_and_internal_1.fq.gz
    base_definition:
      - BC(12-17,33-40,56-63)
      - UMI(64-69)
    find_pattern: ATTGCGCAATG
  file2:
    name: /home/ccy/1-scrna-data-2023-11-14/rawdata/star-test-1/patterns_and_internal_2.fq.gz
    base_definition:
      - cDNA(1-150)
What happens to the reads in file1 that do not start with the pattern? Will they be used for mapping, or will they be dropped?


bioinfotec commented Jan 9, 2024

@Dkaaaaa zUMIs filters out some low-quality reads based on the barcode and UMI before passing reads to STAR, and I think that is why the number of input reads is lower than the number of reads in the read 1 file.
