How to use viz_sequence. #110

Open
FritzPeleke opened this issue Sep 7, 2020 · 1 comment

Comments

@FritzPeleke

Hi @AvantiShri,
I have two questions.
First, I have used DeepLIFT to compute importance scores for my sequences (true positive and true negative predictions). I would like a logo representation like the one in the example notebook that uses deeplift.visualization. How would you suggest I do that? I want the sequence logo to highlight the nucleotides with very high scores, i.e. those responsible for shifting the decision toward a particular class. I am working on a binary classification problem in which sequences are either expressed or unexpressed. My sequence inputs have shape (1, 4, 1000, 1).

Second, how do I make the nucleotides that contribute the same to the true positives and the true negatives look like they contribute nothing? I want to focus only on the positions that contribute specifically to either true negatives or true positives.
Below is a small part of my DeepLIFT code that calculates the contribution scores.

scores_tp = np.array(deeplift_contrib_func(
    task_idx=1,
    input_data_list=[tp_data],
    input_references_list=[tp_shuff_data],
    batch_size=10, progress_update=1000))
scores_tn = np.array(deeplift_contrib_func(
    task_idx=1,
    input_data_list=[tn_data],
    input_references_list=[tn_shuff_data],
    batch_size=10, progress_update=1000))
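
A minimal sketch of how one sequence's scores might be prepared for a logo plot, assuming each example has shape (1, 4, 1000, 1) as described above, so that `scores_tp[i]` is a single sequence. The helper names `scores_to_logo_array` and `top_positions` are hypothetical, not part of DeepLIFT, and the toy array stands in for real scores.

```python
import numpy as np

def scores_to_logo_array(scores_one_seq, onehot_one_seq=None):
    """Reshape one sequence's scores to (length, 4), the layout a logo plot expects.

    Assumes the input has singleton axes around a (4, length) core,
    e.g. (1, 4, 1000, 1). If a one-hot encoding of the same sequence is
    given, scores are kept only on the observed base at each position.
    """
    arr = np.squeeze(np.asarray(scores_one_seq, dtype=float))
    if arr.ndim != 2:
        raise ValueError("expected a 2-D array after squeezing, got %r" % (arr.shape,))
    if arr.shape[0] == 4 and arr.shape[1] != 4:
        arr = arr.T  # (4, length) -> (length, 4)
    if arr.shape[1] != 4:
        raise ValueError("expected one axis of size 4, got %r" % (arr.shape,))
    if onehot_one_seq is not None:
        onehot = np.squeeze(np.asarray(onehot_one_seq, dtype=float))
        if onehot.shape[0] == 4 and onehot.shape[1] != 4:
            onehot = onehot.T
        arr = arr * onehot
    return arr

def top_positions(logo_array, k=10):
    """Indices of the k positions with the largest absolute total score."""
    per_position = np.abs(logo_array.sum(axis=1))
    return np.argsort(per_position)[::-1][:k]

# Toy stand-in for scores_tp[0]; real scores would come from deeplift_contrib_func.
toy_scores = np.zeros((1, 4, 1000, 1))
toy_scores[0, 2, 5, 0] = 3.0
toy_scores[0, 1, 7, 0] = -5.0

logo = scores_to_logo_array(toy_scores)
print(logo.shape)  # (1000, 4)
print(top_positions(logo, k=2))
```

With DeepLIFT installed, an array like `logo` could then be passed to `viz_sequence.plot_weights(logo, subticks_frequency=50)` from `deeplift.visualization`, in the same way the example notebook plots its scores. The `subticks_frequency` value here is only an illustrative choice for a 1000 bp sequence.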

@AvantiShri
Collaborator

Hi Fritz, based on your email to me it seems like you figured out the visualization. I am not sure I understand your second question i.e. "how do i make the nucleotides that contribute the same for the true positives and true negative look like the contribute nothing" - aren't the sequences different between the true positive and true negative sequences? If so, in what sense is a given nucleotide contributing "the same" to both true positives and true negatives? Do you mean to identify locations within the sequence that are specifically contributing to true positives? If so, you could probably compare the average score across all true positives to the average score across all true negatives - but I'm not sure this is what you are looking for.
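
The comparison of averages suggested above could be sketched roughly as follows. This assumes score arrays of shape (n, 1, 4, length, 1), and it assumes that position i means the same thing in every sequence, which may not hold for your data. The function names and the `threshold` parameter are hypothetical and would need tuning.

```python
import numpy as np

def mean_position_scores(scores):
    """Average total score per position across sequences.

    Assumes scores has shape (n, 1, 4, length, 1); returns shape (length,).
    """
    arr = np.asarray(scores, dtype=float)
    if arr.ndim != 5 or arr.shape[1] != 1 or arr.shape[2] != 4 or arr.shape[4] != 1:
        raise ValueError("expected shape (n, 1, 4, length, 1), got %r" % (arr.shape,))
    arr = arr.reshape(arr.shape[0], 4, -1)   # drop the singleton axes
    return arr.sum(axis=1).mean(axis=0)      # sum over bases, mean over sequences

def differential_mask(scores_tp, scores_tn, threshold):
    """Difference of mean scores (TP minus TN) and a mask of positions
    where the absolute difference is at least `threshold`."""
    diff = mean_position_scores(scores_tp) - mean_position_scores(scores_tn)
    keep = np.abs(diff) >= threshold
    return diff, keep

def mask_scores(scores, keep):
    """Zero out scores at positions where keep is False."""
    arr = np.asarray(scores, dtype=float)
    return arr * keep.reshape(1, 1, 1, -1, 1)

# Toy data: position 1 scores the same in both groups, position 4 only in TPs.
toy_tp = np.zeros((2, 1, 4, 6, 1))
toy_tn = np.zeros((3, 1, 4, 6, 1))
toy_tp[:, 0, 0, 1, 0] = 1.0
toy_tn[:, 0, 0, 1, 0] = 1.0
toy_tp[:, 0, 2, 4, 0] = 2.0

diff, keep = differential_mask(toy_tp, toy_tn, threshold=0.5)
print(keep)  # only position 4 is kept
masked_tp = mask_scores(toy_tp, keep)
```

Positions with similar average scores in both groups are zeroed by `mask_scores`, so a logo drawn from `masked_tp` would show only the positions where the two groups differ on average.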
