Essentiality function not working #129

Open
sbastkowski opened this issue Dec 15, 2021 · 1 comment
Comments

@sbastkowski
Contributor

Hi,

I'm having trouble with the essentiality function: it returns no essential genes because it fails to find an intersection point. I've attached the histogram for one sample; you can see that there are 2 peaks, but the first is not recognised. I've had this issue with several samples. Could you suggest a solution for this?
Regards and thanks for your help.
Sarah
[Attached image: Screenshot 2021-12-09 at 12 09 42]

@lbarquist
Contributor

Hi Sarah,

So, the first thing this script does is look for the first local minimum in the histogram, which it uses to heuristically separate the 'essential' distribution (typically with a mode at 0) from the 'non-essential' distribution (mode somewhere positive) for curve fitting. In this case, your 'essential' distribution doesn't have a mode at 0 but is shifted right for some reason. Basically, it means you're getting quite a few insertion sites called in what are supposed to be essential genes. The script is finding the local minimum at ~0, trying to fit an exponential distribution to basically nothing, failing, and giving you nonsense as a result. My first thought is that it might help to trim the 3' and 5' ends, as the gene termini tend to tolerate insertions, though I'd be surprised if this explains the effect.
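
To make the failure mode concrete, here is a rough Python sketch of the "split at the first local minimum" heuristic. It is not the actual script's code: the function names, the bin count, and the choice to treat the first bin as a candidate minimum are all assumptions made for illustration.

```python
import numpy as np

def first_local_minimum(counts):
    """Index of the first bin that is not higher than its right neighbour
    and is lower than its left neighbour (bin 0 has no left neighbour).
    Assumption: the real script may define the minimum differently."""
    for i in range(len(counts) - 1):
        rises_after = counts[i] <= counts[i + 1]
        falls_before = i == 0 or counts[i] < counts[i - 1]
        if rises_after and falls_before:
            return i
    return None

def split_by_first_minimum(values, bins=10, value_range=None):
    """Split values into (essential, non_essential, cutoff) at the left edge
    of the first local-minimum bin. Hypothetical helper, illustration only."""
    values = np.asarray(values, dtype=float)
    counts, edges = np.histogram(values, bins=bins, range=value_range)
    i = first_local_minimum(counts)
    if i is None:
        return None
    cutoff = edges[i]
    return values[values < cutoff], values[values >= cutoff], cutoff

# Toy data built from per-bin counts (made-up numbers, not real TraDIS data).
centers = 0.05 + 0.1 * np.arange(10)
mode_at_zero = np.repeat(centers, [30, 10, 2, 5, 12, 20, 12, 5, 2, 1])
shifted_right = np.repeat(centers, [1, 10, 30, 10, 2, 5, 12, 20, 12, 5])

ess_ok, _, cut_ok = split_by_first_minimum(mode_at_zero, 10, (0.0, 1.0))
ess_bad, _, cut_bad = split_by_first_minimum(shifted_right, 10, (0.0, 1.0))
print(len(ess_ok), len(ess_bad))
```

With the mode at 0, the cutoff lands in the valley between the two peaks and the 'essential' side has data to fit. With the first peak shifted right, the very first bin already counts as the minimum, so the 'essential' side is empty and any fit to it is meaningless.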

I've never seen this before across a lot of different datasets with different transposons and organisms, which leads me to think it's probably something in your data that's causing this. The possibilities I can think of are that there's an issue with the read mapping and you're getting a lot of false insertion sites (this could, for instance, be an issue with soft-clipping), or that you've sequenced stuff that's not just insertion sites but includes some genomic DNA contamination. If you want to email me directly with some more details of the data you're working on, I'd be happy to discuss it more.
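
As a quick check on the soft-clipping possibility, something like the sketch below could report what fraction of alignments carry soft-clipped ends, given the CIGAR strings from the mapped reads. The helper names and the `min_bases` threshold are hypothetical, and how you extract the CIGAR column from your BAM/SAM file is left to whatever tooling you already use.

```python
import re

# One CIGAR operation: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clipped_bases(cigar):
    """Return (leading, trailing) soft-clipped base counts for one CIGAR.
    Hard clips (H) are ignored because they can sit outside the soft clips."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return (0, 0)  # e.g. "*" for an unmapped read
    leading = ops[0][0] if ops[0][1] == "S" else 0
    trailing = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return (leading, trailing)

def fraction_soft_clipped(cigars, min_bases=1):
    """Fraction of alignments with at least min_bases soft-clipped at either end."""
    cigars = list(cigars)
    if not cigars:
        return 0.0
    clipped = sum(1 for c in cigars if max(soft_clipped_bases(c)) >= min_bases)
    return clipped / len(cigars)

# Made-up CIGAR strings for illustration.
example = ["50M", "5S45M", "48M2S", "50M"]
print(fraction_soft_clipped(example))
```

A high fraction would not prove anything on its own, but since the reported alignment position excludes soft-clipped bases, clipping at the read start is one way an insertion site could end up called at the wrong coordinate.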

-Lars
