PWM detection issue? [HELP] [QUESTION] #41

Open
EdPym opened this issue Nov 7, 2023 · 2 comments
Labels: help wanted, question

EdPym commented Nov 7, 2023

Describe the bug
The individual Motif finder appears to detect binding sites that don't match the PWM.

Here are two results from a search.
1518 | aatAAATCAGAGCTAaag | 0.769912 | + | → | n.d. | n.d | n.d
463 | gtcAAACTAAAGGACcgg | 0.769912 | + | → | n.d. | n.d | n.d

The G (7th position) and C (10th position) are absolutely required in the PWM, so I'm not sure why site 463 is found. It has an A at position 7 and a G at position 10.

PWM = MA0451.1

Seq
atatcccaaggccgcaaagtcaacaagtcggcagcaaatttccctttgtccggcgatgtgttttttttttagccataactcgctgcattgtttgggccaagtttttcttctgccaaattgcggagatgatgcggggattatgcgctgattgcgtgcaattatggacatcctgcgaggccccgaggaacttcctgctaaatcctttcatccgcctacagaacccctttgtgtcccgttcgccgggagtccttgacgggtccttcgactattcgcttacagcagcttgcgtaaaatttcataaccctacgagcggctcttccgcggaatccctggcattatcctttttacctcttgccaatccgttggctaaaaaacggcttcgacttccgcgtaactgctggacaacaaagacaaaaaacggcgaaaggacggcgatttccaggtagcattgcgaattccgtcaaactaaaggaccggttatataacgggtttatatggccagaatctctgcatctccacgaccgccagaagctgcgtaaaactgcaggctctgttttgatttctgcaacttcagttaattgcccgggatggccagcaattgccggcaattataaaacagcgcagatgtgactcagcttccatatctaactctatatctcatgccgaaaatcGagggtggggagcggaggggcggggtgcgtgggtgacttgcctgccagggaaagggggcgggggttcagcgggtgataaatgtgcgtgatttggaatgaatgcgcatcgattaaaaccgcagggcaatcaatttagcgccttttacgccaaattggctcgtacacaaccaattaatgtcagcgggtgaactgacaccatcgcccaccaccgcatcccccttCcccctgttggccatccacccccgaaaaacaattacaacaacgaagacaagcagagggactgctgcagattccgctcaataaacctccaataaagcgaatccagcgtgaggcgtcgacgtctaattgctgttaactcgtcaactaggagaacgctccatcctcgccgttgtgcggctccttggacgcctgattaaacggattggagatgcgaggtgtacagtcgagcctccgtaagggcaaccaaaagtaaaaaacatcgactatttgaaatacaaagttttatatgtacatataatttatcaggctccggatgtaacttaattaaaacatttccttttcataaaatattgctagctgatagctgctcaaaagaacaataaaggtaataaattatgtttgcttgcaaacaattttcaatcaaaaaagtatgcgttccatcttagttaataattaattacctggataaagacttttgaaacatatcatagcgtttctttgcatattcaatactaaccaattttttataaatgAagttacaccgtttgtcgtcttgtcaagtagtatcttcacaataagtataatacagaatcaagatagtaaaataaaacaaaaaaCcgtgtgaataaatcagagctaaagacgtcggac

EdPym added the bug label Nov 7, 2023

Jumitti (Owner) commented Nov 7, 2023

Hi @EdPym

Yes, you are totally right. But this is not a bug. The Rel Score is calculated like this:

[Image: relscore equation]

Roughly, this means that I sum the probability of the observed nucleotide at each position. So yes, at the same score, some of the patterns found are less relevant than others, or are even false positives. Your case is a very good example.

With a simpler PWM:
A [ 1.0000 0.5000 0.5000 0.0000 0.0000 ]
T [ 0.0000 0.0000 0.0000 0.0000 0.0000 ]
G [ 0.0000 0.5000 0.5000 1.0000 0.0000 ]
C [ 0.0000 0.0000 0.0000 0.0000 1.0000 ]

For AGGAC: 1 + 0.5 + 0.5 + 0 + 1 = 3
For ATTGC: 1 + 0 + 0 + 1 + 1 = 3

In this example, let's assume that A at position 1 and G at position 4 are obligatory. For AGGAC there is no G at position 4, yet the score is 3. In ATTGC both the A and the G are in place, but the score is also 3.
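The arithmetic above can be reproduced with a few lines of Python. This is only a minimal sketch of the "sum the probabilities" idea on the toy PWM from this comment; the names `PWM` and `raw_score` are made up for illustration, and the normalization that turns the sum into the RelScore is not included.

```python
# Toy PWM from the comment above: one list of per-position probabilities
# per nucleotide (hypothetical variable name, not the tool's data structure).
PWM = {
    "A": [1.0, 0.5, 0.5, 0.0, 0.0],
    "T": [0.0, 0.0, 0.0, 0.0, 0.0],
    "G": [0.0, 0.5, 0.5, 1.0, 0.0],
    "C": [0.0, 0.0, 0.0, 0.0, 1.0],
}


def raw_score(site, pwm=PWM):
    """Sum the probability of the observed nucleotide at each position."""
    return sum(pwm[nt][i] for i, nt in enumerate(site.upper()))


for site in ("AGGAC", "ATTGC"):
    # Both sites reach the same total although they differ at position 4.
    print(site, raw_score(site))
```

Both sites print a total of 3.0, which is why a purely additive score cannot tell them apart.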

I'm working on a way to discriminate between such cases more effectively. I have added an LCS option. It lets you look at the number of consecutive nucleotides shared between the pattern found and the PWM. It requires a lot of resources, so it may crash the software. I am also working on a standalone version, which will let us drop Streamlit and get more computing power. For your example, though, it works, and you will see that you may ultimately have other, more interesting targets.
In your example:

Position | Sequence | RelScore | LCS | LCS length | LCS RelScore | Strand | Direction
463 | gtcAAACTAAAGGACcgg | 0.769912 | TAAAGG | 6 | 0.300885 | + |
1518 | aatAAATCAGAGCTAaag | 0.769912 | AAATCAGAGC | 10 | 0.747788 | + |

It is important to understand that the RelScore is a global score, computed over the whole site. The LCS also gets a RelScore, but it is calculated only on the retained part, so the LCS RelScore is a local score.
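To make the global versus local distinction concrete, here is a rough sketch on the toy PWM from earlier in this comment. It is one possible reading of "consecutive nucleotides shared with the PWM", not the tool's actual LCS code: a position counts as matching when the observed nucleotide has the highest (non-zero) probability in its column, and the local score is simply the raw sum over the longest such run. The function name `longest_matching_run` and this matching rule are assumptions.

```python
# Toy PWM from the comment above (hypothetical variable name).
PWM = {
    "A": [1.0, 0.5, 0.5, 0.0, 0.0],
    "T": [0.0, 0.0, 0.0, 0.0, 0.0],
    "G": [0.0, 0.5, 0.5, 1.0, 0.0],
    "C": [0.0, 0.0, 0.0, 0.0, 1.0],
}


def longest_matching_run(site, pwm=PWM):
    """Return (start, run, local_sum) for the longest stretch of consecutive
    positions where the observed nucleotide is a top-scoring one.

    `start` is 0-based; `local_sum` is the raw probability sum over the run.
    The matching rule is an assumption made for this sketch.
    """
    site = site.upper()
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, nt in enumerate(site):
        column_max = max(row[i] for row in pwm.values())
        if column_max > 0 and pwm[nt][i] == column_max:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # a mismatch ends the current run
    run = site[best_start:best_start + best_len]
    local_sum = sum(pwm[nt][best_start + k] for k, nt in enumerate(run))
    return best_start, run, local_sum


for site in ("AGGAC", "ATTGC"):
    print(site, longest_matching_run(site))
```

Under this rule the two sites, tied on the summed score, no longer look identical: the retained part is AGG for AGGAC and GC for ATTGC. Whether that ranking is the desired one depends on which positions are considered obligatory, which this sketch does not model.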

Jumitti added the help wanted and question labels and removed the bug label Nov 7, 2023
Jumitti changed the title from "PWM detection issue? [BUG]" to "PWM detection issue? [HELP] [QUESTION]" Nov 7, 2023

EdPym (Author) commented Nov 17, 2023 via email
