Calculation of <prob_topics> in <Latent Dirichlet Allocation.ipynb> #2

Open
DaveRockt opened this issue Jun 29, 2018 · 4 comments

@DaveRockt

DaveRockt commented Jun 29, 2018

Hello,

Is the calculation of the conditional probability of assigning each topic, <prob_topics>, correct?
It does not seem to match the formula in the referenced Wikipedia article.

Best,
David

@hammadshaikhha
Owner

@DaveRockt Thanks for raising the issue. I only just noticed it. Could you be a bit more specific and compare the formula I have with the one on Wikipedia?

I will review my LDA notebook in the meantime and see if I can spot a mistake.

@DaveRockt
Author

DaveRockt commented Jul 8, 2018

Thank you for your answer and sorry for not being specific:

In the formula on Wikipedia, I cannot find what you call 'denom1'. In the meantime, however, I found out that after normalising, I get the same result as you.
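One possible explanation, sketched here under an assumption: if 'denom1' is a term that takes the same value for every topic (for example a document-length denominator), it cancels when the weights are normalised, which would explain why both versions agree. The counts, `alpha`, `beta`, and all variable names below are made up for illustration and are not taken from the notebook.

```python
import numpy as np

# Hypothetical counts for one word position with K = 3 topics (illustrative only).
alpha, beta, vocab_size = 0.1, 0.01, 5
doc_topic = np.array([2.0, 0.0, 3.0])      # topic counts in this document, current word removed
topic_word = np.array([1.0, 4.0, 0.0])     # counts of this word under each topic
topic_total = np.array([10.0, 12.0, 8.0])  # total words assigned to each topic

K = doc_topic.size
word_term = (topic_word + beta) / (topic_total + vocab_size * beta)

# Variant A: unnormalised weights without a document-side denominator.
weights_a = (doc_topic + alpha) * word_term

# Variant B: additionally divide by a denominator that is identical for all topics
# (assumed here to play the role of 'denom1').
doc_denom = doc_topic.sum() + K * alpha
weights_b = weights_a / doc_denom

# After normalising, the topic-independent factor cancels.
prob_a = weights_a / weights_a.sum()
prob_b = weights_b / weights_b.sum()
print(np.allclose(prob_a, prob_b))
```

If 'denom1' depended on the topic, the two variants would no longer agree, so this check only covers the topic-independent case.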

However, I have another question:

In your code in the 'Main part of LDA algorithm' under 'Add in current word back into count matrixes', you use 'init_topic_assign'. Does this make sense? Shouldn't you use the newly assigned topic instead?
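To make the question concrete, here is a minimal sketch of one collapsed Gibbs update as it is commonly written: remove the word from the counts under its old topic, compute the conditional, sample, then add the word back under the newly sampled topic. This is not the notebook's code; the function `resample_word`, the toy corpus, and every variable name are hypothetical.

```python
import numpy as np

def resample_word(d, i, docs, z, doc_topic, topic_word, topic_total, alpha, beta, rng):
    """Resample the topic of word i in document d (illustrative sketch)."""
    w = docs[d][i]
    old = z[d][i]

    # Remove the current word from the counts under its old topic.
    doc_topic[d, old] -= 1
    topic_word[old, w] -= 1
    topic_total[old] -= 1

    # Conditional probability of each topic for this word.
    vocab_size = topic_word.shape[1]
    weights = (doc_topic[d] + alpha) * (topic_word[:, w] + beta) / (topic_total + vocab_size * beta)
    prob_topics = weights / weights.sum()
    new = int(rng.choice(len(prob_topics), p=prob_topics))

    # Add the word back under the NEW topic, not the initial one.
    z[d][i] = new
    doc_topic[d, new] += 1
    topic_word[new, w] += 1
    topic_total[new] += 1
    return new

# Toy corpus: word ids per document, K = 2 topics, vocabulary of 4 words.
docs = [[0, 1, 2], [2, 3, 3]]
K, V = 2, 4
rng = np.random.default_rng(0)
z = [[int(rng.integers(K)) for _ in doc] for doc in docs]

# Build count matrices from the initial assignments.
doc_topic = np.zeros((len(docs), K), dtype=int)
topic_word = np.zeros((K, V), dtype=int)
topic_total = np.zeros(K, dtype=int)
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        doc_topic[d, z[d][i]] += 1
        topic_word[z[d][i], w] += 1
        topic_total[z[d][i]] += 1

# A few full sweeps over the corpus.
for _ in range(5):
    for d, doc in enumerate(docs):
        for i in range(len(doc)):
            resample_word(d, i, docs, z, doc_topic, topic_word, topic_total, 0.1, 0.01, rng)
```

If the word were added back under its initial topic, the count matrices would stop matching the current assignments in `z`, which is the concern raised here.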

Thank you for this project by the way, it really helped me to understand LDA better.

Best,
David

@hammadshaikhha
Owner

Hi David,

Sorry for the late reply. I skimmed over the code and you may be right. One way to check would be to run LDA from a Python library on the same data set and see whether the results match. Could you work on doing that?

If there is indeed a mistake in the code, feel free to fix it and do a pull request.

@DaveRockt
Author

DaveRockt commented Jul 17, 2018 via email
