List of ideas to improve assemblies #57

d4straub · 2021-08-19T08:24:13Z

This is a collection of ideas that should be considered after the DSL2 conversion #56 is finished. The list is subject to change. Any ideas or discussions are welcome.

Preprocessing (check out nf-core/mag, any other examples out there?)

Filtlong to filter ONT reads by quality (e.g. >7)
Bowtie2 to remove Illumina PhiX reads
Nanolyse (alternatively Minimap2) to remove ONT Lambda reads
add option to down-sample reads, because sometimes this can actually improve assembly
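
As a rough illustration of the quality-filter idea above, here is a minimal Python sketch that keeps reads whose mean quality exceeds a threshold. It assumes Phred+33 encoded quality strings and averages per-base error probabilities rather than raw Phred scores. The function names are hypothetical, and this is not a description of how Filtlong scores reads internally.

```python
import math


def mean_phred(quality_string, offset=33):
    """Mean read quality, averaging error probabilities (not raw Phred scores)."""
    if not quality_string:
        return 0.0
    probs = [10 ** (-(ord(c) - offset) / 10) for c in quality_string]
    mean_error = sum(probs) / len(probs)
    return -10 * math.log10(mean_error)


def filter_reads(records, min_quality=7.0):
    """Yield (name, sequence, quality) tuples with mean quality above min_quality.

    `records` is any iterable of 3-tuples; FASTQ parsing is left out on purpose.
    """
    for name, seq, qual in records:
        if mean_phred(qual) > min_quality:
            yield name, seq, qual
```

Averaging error probabilities penalises reads with a few very poor stretches more strongly than averaging the Phred values directly would.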

Assemblers:

MEGAHIT (a5-miseq, see "Add A5-miseq support" #23, ...) to have an alternative short-read assembler
Trycycler to have better hybrid and long read assembly than Unicycler
Flye (Tulip, Redbean, Raven) to have more long read assemblers at hand
Pilon to polish Nanopore-derived contigs with Illumina reads (for long read assemblers)

Assembly QC:

BUSCO to check completeness and contamination of assemblies (and possibly bins)
MaxBin2 (or any other binner) to separate the assembly into bins (cleanup if contaminated). In contrast to other binners, MaxBin2 outputs "Completeness, Genome size, GC content" for each bin it finds, which comes in very handy when judging whether there is real contamination.

Structural:

Use only the most polished assembly for Prokka & QUAST (currently assemblies before polishing are used!)
By default, run all (or at least many) assemblers that are appropriate for a data set, including polishing (Medaka & Pilon). That allows easy comparison (e.g. with QUAST and BUSCO) of the performance of different assemblers and choosing the best assembly.

Defaults

In my opinion, --skip_kraken2 should be either removed (i.e. using --krakendb to determine whether Kraken2 is used) or a simple default (small, fast, but helpful) value should be chosen for --krakendb, e.g. "https://genome-idx.s3.amazonaws.com/kraken/16S_Greengenes13.5_20200326.tgz". This is a very small 16S database but should be sufficient to detect serious bacterial contamination.


Daniel-VM · 2023-10-18T14:38:10Z

Working on Flye and Pilon!

erinyoung · 2023-10-18T15:51:20Z

> add option to down-sample reads, because sometimes this can actually improve assembly

Filtlong can down-sample reads to the longest/highest quality reads and rasusa can downsample randomly.

I know there are more papers about the ideal depth for assembly, but I can only find this old one for now (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0060204).

In my own experience, there are a lot more sequencing artifacts once you get above 100X.
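
To make the depth discussion concrete, here is a minimal Python sketch of random down-sampling to a target depth. It assumes the genome size is known or estimated up front and that read lengths are available as a list. The function names and the default of 100X (taken from the experience described above) are illustrative only, and this is not a description of rasusa's actual implementation.

```python
import random


def estimated_depth(read_lengths, genome_size):
    """Total sequenced bases divided by genome size."""
    return sum(read_lengths) / genome_size


def random_downsample(read_lengths, genome_size, target_depth=100, seed=0):
    """Return sorted indices of randomly chosen reads.

    Reads are added in random order until their combined length
    reaches target_depth * genome_size, or the reads run out.
    """
    target_bases = target_depth * genome_size
    order = list(range(len(read_lengths)))
    random.Random(seed).shuffle(order)  # fixed seed keeps runs reproducible
    kept, total = [], 0
    for i in order:
        if total >= target_bases:
            break
        kept.append(i)
        total += read_lengths[i]
    return sorted(kept)
```

If the data set is already below the target depth, every read is kept, so the step is a no-op for low-coverage samples.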

erinyoung · 2023-10-18T15:57:25Z

Another idea I recommend adding is a rotation step. This ensures all bacterial chromosomes at least start at dnaA.

A case in point: these are two chromosomes from a clonal outbreak. They are actually very similar, but one wasn't rotated correctly.

There are a few tools that rotate circular sequences. I think circlator fixstart (abandonware) and dnaapler are the ones that I use most.
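
The rotation itself is simple once the start coordinate is known. The sketch below assumes the 0-based position of the desired start (for example the first base of dnaA) has already been found by some other means. It does not locate the gene and does not handle a gene on the reverse strand, both of which the dedicated tools take care of. The function names are hypothetical.

```python
def rotate_circular(sequence, start):
    """Rotate a circular sequence so that it begins at 0-based index `start`."""
    if not sequence:
        return sequence
    start %= len(sequence)
    return sequence[start:] + sequence[:start]


def same_circular_sequence(a, b):
    """True if `b` is a rotation of `a` (same strand only)."""
    return len(a) == len(b) and b in a + a
```

For example, `rotate_circular("GGGATGCCC", 3)` gives `"ATGCCCGGG"`, and two otherwise identical chromosomes that differ only in their start point pass `same_circular_sequence`.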

erinyoung · 2023-10-18T16:03:09Z

For Assembly QC, I'm a fan of gfastats (for metrics about the created GFA files) and NanoPlot. They have a lot of overlapping features, but gfastats does indicate whether a sequence is circular. NanoPlot already has a module in MultiQC.
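
As a sketch of what a circularity check on a GFA file can look like: in GFA1, one common convention is that a circular contig carries a link (`L` line) from a segment back to itself in the same orientation. The Python below flags segments on that assumption only. It is not a description of how gfastats determines circularity, and the function name is hypothetical.

```python
def circular_segments(gfa_text):
    """Map each segment name to True if it has a same-orientation self-link."""
    segments = []
    circular = set()
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S" and len(fields) >= 2:
            segments.append(fields[1])
        elif fields[0] == "L" and len(fields) >= 5:
            src, src_orient, dst, dst_orient = fields[1:5]
            # A segment linked to itself in the same orientation closes a circle.
            if src == dst and src_orient == dst_orient:
                circular.add(src)
    return {name: name in circular for name in segments}
```

A per-contig table like this could be reported alongside the QUAST and BUSCO results.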

d4straub · 2023-10-19T07:20:25Z

I actually had very good experience with dragonflye for nanopore assembly (in nf-core modules: https://nf-co.re/modules/dragonflye). The results were close to identical to the Trycycler results, but dragonflye ran very fast (a few minutes), while Trycycler was a chore with many manual interventions.

Daniel-VM · 2023-10-19T12:48:56Z

Those are really good points @erinyoung and @d4straub 🙌🏾 🙌🏾 .

Downsample step

Yep, a downsampling step is indeed necessary. We could try random subsampling with rasusa. De Maio N et al., 2019 mentioned that the random strategy generates better assemblies than the filtering strategy. But it always depends on the input data and the goal.
Nevertheless, we can think about adding Filtlong or NanoFilt in the quality filtering step (after adapter trimming with porechop?).

Rotation step

Sure, but I think that Circlator is not supported either... What do you suggest? Adding circlator together with dnaapler, or just dnaapler?

dragonflye - Longreads assembly

Interesting, I haven't tried this tool yet. But if it avoids the manual intervention of Trycycler, then it would be great to add this module. I know that Flye supports not only ONT but also PacBio reads.
dragonflye works with ONT reads only, doesn't it?

Daniel-VM · 2023-10-19T12:52:18Z

I have found these two papers that may help us to decide. Both include a detailed flowchart with some of the tools we already have included and additional tools/strategies:

Molina-Mora J.A. et al., 2020

LaSarre B. et al., 2022

d4straub · 2023-10-19T14:14:26Z

Trycycler will require a large effort to automate. For example, see rrwick/Trycycler#47.
So I think Dragonflye is the way to go for now.

erinyoung · 2023-11-07T17:20:04Z

Here's a blog post from Dr. Wick about depth and quality : https://rrwick.github.io/2023/11/06/accuracy-vs-depth-update.html

You can see in the plot that accuracy improved up to ~100× depth, after which additional reads brought no benefit. In fact, some of the genomes got a bit worse with higher depth, which was surprising.

d4straub added the enhancement (New feature or request), help wanted (Extra attention is needed), and question (Further information is requested) labels · Aug 19, 2021

d4straub added this to the 2.1.0 milestone · Aug 19, 2021

d4straub mentioned this issue Aug 20, 2021: Merging template updates 2.1 #56 (Merged, 18 tasks)

This was referenced Oct 13, 2021:

new module: trycycler nf-core/modules#829 (Open)
new module: flye nf-core/modules#830 (Closed)
new module: pilon nf-core/modules#831 (Closed)
new module: filtlong nf-core/modules#832 (Closed)

Daniel-VM mentioned this issue Oct 25, 2023: New module - dragonflye - for long-read assembly #101 (Closed)

Daniel-VM mentioned this issue Mar 18, 2024: Add flye to nf-core/bacass #115 (Open)
