Very low detection rate with custom-cDNA using native barcoding LSK114 #57

Open
patbohn opened this issue Jul 20, 2023 · 0 comments

Hi, I want to use tailfindr to estimate poly(A) tail lengths in PCR-cDNA data generated with custom primers and the native ligation Kit 14 (LSK114 with EXP-NBD114). The library structure looks like this:

Native Adapter (motor) - barcode - [ TSO - mRNA - polyA - revcomp_RT_primer] - (optional: revcomp_barcode)

Alternatively, the reverse complement (revcomp) of the insert can ligate to the motor protein adapter to yield:

Native Adapter (motor) - barcode - [RT_primer - polyT - revcomp_mRNA - revcomp_TSO] - (optional: revcomp_barcode)

Note that if there is no barcode on the 3' end of the DNA, the revcomp_RT_primer or the revcomp_TSO will be only partially present, because the last ~15 nt are not sequenced/basecalled, i.e.:

Native Adapter (motor) - barcode - [ TSO - mRNA - polyA - revcomp_RT_primer_wo_last_15nt]

Native Adapter (motor) - barcode - [RT_primer - polyT - revcomp_mRNA - revcomp_TSO_wo_last_15nt]
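For reference, the sequences expected in each orientation can be derived with a small base-R helper. This is only a sketch: `revcomp` and `drop_3prime` are hypothetical helper names (not tailfindr functions), and the input is the end_primer string I pass to find_tails, used here purely as sample input.

```r
# Reverse-complement a DNA string (N is kept as N).
revcomp <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]
  paste(rev(chartr("ACGTN", "TGCAN", bases)), collapse = "")
}

# Drop the last n bases, mimicking the ~15 nt at the 3' end of a strand
# that are not sequenced/basecalled. Returns "" if nothing is left.
drop_3prime <- function(seq, n = 15) {
  substr(seq, 1, max(nchar(seq) - n, 0))
}

primer <- "CTAAGAGCAAGAAGAAGCC"      # sample input: end_primer from my command
primer_rc <- revcomp(primer)         # sequence as it appears on the opposite strand
primer_rc_visible <- drop_3prime(primer_rc, 15)  # part left if 15 nt are lost
```

With a primer this short, very little of it remains once ~15 nt are lost, which may matter for how long the primer sequences should be specified.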

One more thing to note: our TSO contains a randomized sequence, i.e. the actual 3' end of the TSO is GCTCTTCCGATCTNNNNNNNTATAGGG. I am not sure whether tailfindr can accept degenerate (N) bases in a primer sequence.
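Independently of tailfindr, the randomized TSO can be located in basecalled reads by turning each N into a character class. A minimal base-R sketch, assuming exact matching (so it will miss reads with basecalling errors in the TSO); the toy read is made up for illustration:

```r
# TSO 3' end as given above; N marks a randomized position.
tso <- "GCTCTTCCGATCTNNNNNNNTATAGGG"

# Turn each N into a character class so the pattern matches any base there.
tso_pattern <- gsub("N", "[ACGT]", tso, fixed = TRUE)

# Toy read (made up): flanking bases around a TSO whose N positions are all A.
toy_read <- paste0("TTGCA", gsub("N", "A", tso, fixed = TRUE), "CCATG")

hit <- regexpr(tso_pattern, toy_read)
has_tso   <- hit[1] > 0                 # TRUE when the pattern occurs in the read
tso_start <- as.integer(hit[1])         # 1-based start of the match
tso_width <- attr(hit, "match.length")  # number of bases matched
```

This says nothing about how tailfindr handles N internally; it only checks what is present in the basecalls.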

I am using this command to perform the analysis:

    library(tailfindr)

    # fast5_dir and out_dir are defined earlier as paths to the input and output folders
    df <- find_tails(
      fast5_dir = fast5_dir,
      save_dir = out_dir,
      csv_filename = 'cdna_tails.csv',
      num_cores = 4,
      dna_datatype = 'custom-cdna',
      save_plots = TRUE,
      plotting_library = 'rbokeh',
      plot_debug_traces = TRUE,
      front_primer = "GCTCTTCCGATCT",
      end_primer = "CTAAGAGCAAGAAGAAGCC"
    )

To find the right settings for front_primer and end_primer (i.e. how long I should specify them), I wanted to evaluate the primer matching from the plots. However, when I run the command on 1000 reads, a tail is detected ("tail_is_valid" == TRUE) for only 20 of them, and only 7 of these have tail_start, tail_end and tail_length values.

In the plots for these reads, the detected regions don't align with actual polyA/T tails. In some of them I can't see a tail at all, which may be related to a biological condition in which we expect short tails of ~10 nt. However, from the basecalled data alone we do see poly-A/T tails that we would expect tailfindr to detect.

In a parallel workflow using cutadapt, we detected adapters, oriented reads and trimmed polyA tails for ~50% of all reads, so we expect higher detection rates with tailfindr as well. I suspect that our primer sequences (or tailfindr's internal detection) may not be suited to our library setup. If so, where should we change this?
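The detection counts quoted above (20 of 1000 reads valid, 7 with a tail length) can be tallied from the returned data frame roughly as follows. This is a sketch on a made-up toy data frame: it assumes the result has a logical `tail_is_valid` column and a numeric `tail_length` column, as in my output, and `summarise_tails` is a hypothetical helper name.

```r
# Count reads with a valid tail and reads with a measured tail length.
summarise_tails <- function(df) {
  valid    <- !is.na(df$tail_is_valid) & df$tail_is_valid
  measured <- valid & !is.na(df$tail_length)
  c(reads         = nrow(df),
    valid         = sum(valid),
    measured      = sum(measured),
    frac_measured = sum(measured) / nrow(df))
}

# Toy data (made up), standing in for the data frame returned by find_tails.
toy <- data.frame(
  tail_is_valid = c(TRUE, TRUE, FALSE, NA),
  tail_length   = c(42.5, NA, NA, NA)
)
res <- summarise_tails(toy)
```

Running the same summary on the cutadapt-trimmed reads would give a like-for-like comparison of the two detection rates.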

Attached is a plot of a read in which a polyA/T tail exists but is not detected properly.

tailfindr_troubleshooting.zip

PS: With ONT now advertising the native barcoding LSK114 kit for direct cDNA sequencing, I guess that this type of library (possibly with different RT and TSO primers) may become more common.

Thanks in advance and best wishes,
Patrick

@patbohn patbohn changed the title Plots not generated and tail lengths all NA Very low detection rate with custom-cDNA Kit14 Jul 20, 2023
@patbohn patbohn changed the title Very low detection rate with custom-cDNA Kit14 Very low detection rate with custom-cDNA using native barcoding LSK114 Jul 20, 2023