kl_divergence inside QBC does not work #134

Open
terry07 opened this issue Nov 21, 2017 · 7 comments

@terry07

terry07 commented Nov 21, 2017

I am trying to get results from the kl_divergence disagreement method, but every run fails with this error: "mtrand.pyx", line 1121, in mtrand.RandomState.choice - ValueError: a must be non-empty

Any ideas?
Thanks in advance.

@yangarbiter
Collaborator

It seems like the avg_kl is empty in this case.
https://github.com/ntucllab/libact/blob/master/libact/query_strategies/query_by_committee.py#L208

Can you make sure that the unlabeled pool is not empty?

@terry07
Author

terry07 commented Nov 22, 2017

Thanks for the pointer. Using some debug flags, I noticed that the avg_kl ndarray consists entirely of NaN values except for a single entry. Is this the expected behavior?
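
For readers following along, here is a minimal sketch of how NaN scores could lead to the "a must be non-empty" error. It assumes the selection step picks at random among the indices whose score is close to the maximum; the variable names and values below are hypothetical, not taken from libact.

```python
import numpy as np

# Hypothetical scores: all NaN except one entry, as reported above.
avg_kl = np.array([np.nan, 0.12, np.nan, np.nan])

# np.max propagates NaN, so the "best" score is NaN.
best = np.max(avg_kl)

# Nothing compares close to NaN, so the candidate list is empty.
candidates = np.where(np.isclose(avg_kl, best))[0]

rng = np.random.RandomState(0)
try:
    rng.choice(candidates)   # choosing from an empty array
    error = None
except ValueError as exc:    # the exact message depends on the NumPy version
    error = exc

print(best, candidates, error)
```

Under that assumption, the unlabeled pool does not have to be empty for the error to appear; a single NaN in the scores is enough.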

@yangarbiter
Collaborator

yangarbiter commented Nov 22, 2017

I don't think that is the expected behavior.

I guess these NaN values are generated by the log function at L156.

One thing to check is the probability output of the students at L204. Which model are you using for the students?
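
A small sketch of how a log can produce NaN here, assuming the textbook form of the KL divergence, sum(p * log(p / q)); the actual expression in the library may differ. The arrays are made up for illustration.

```python
import numpy as np

p = np.array([0.5, 0.5, 0.0])   # one student's prediction, with an exact zero
q = np.array([0.4, 0.4, 0.2])   # committee consensus

# Silence the warnings so the NaN shows up in the values instead.
with np.errstate(divide="ignore", invalid="ignore"):
    terms = p * np.log(p / q)   # last term: 0 * log(0) = 0 * -inf = nan
    kl = terms.sum()            # a single NaN term makes the whole sum NaN

print(terms, kl)
```

Tree ensembles and other models can output exact zeros for some classes, which is why the choice of student model matters.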

@terry07
Author

terry07 commented Nov 22, 2017

I am using ExtraTrees and SVC. I also tried LogisticRegression, as in the corresponding example script, but I got the same error.

@yangarbiter
Collaborator

Can you use a Python debugger to inspect the variable proba (L205) and check whether every value in it is a valid probability (0<p<1, and each row sums to 1)?

Thanks.

@terry07
Author

terry07 commented Nov 23, 2017

The dimensions of the exported proba are: (935, 3, 8) -> (number of unlabeled instances, number of students, number of classes)

The result of print np.sum(proba[:,0]) , np.sum(proba[:,1]) , np.sum(proba[:,2]) is 935.0 935.0 935.0, and none of the values violates the probability constraints.
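
Note that summing each student's whole slice to the number of instances does not rule out exact zeros or badly normalized individual rows. A stricter check might look like the sketch below; check_proba is a hypothetical helper, and the toy array stands in for the real (935, 3, 8) one.

```python
import numpy as np

def check_proba(proba, atol=1e-8):
    """Report basic validity checks for an array whose last axis is classes."""
    proba = np.asarray(proba, dtype=float)
    return {
        "all_finite": bool(np.isfinite(proba).all()),
        "in_unit_interval": bool(((proba >= 0) & (proba <= 1)).all()),
        # Every (instance, student) row must sum to 1 on its own.
        "rows_sum_to_one": bool(np.allclose(proba.sum(axis=-1), 1.0, atol=atol)),
        # Exact zeros are valid probabilities but break log-based scores.
        "n_exact_zeros": int((proba == 0).sum()),
    }

# Toy array: 2 instances, 2 students, 3 classes.
proba = np.array([
    [[0.2, 0.3, 0.5], [1.0, 0.0, 0.0]],
    [[0.1, 0.1, 0.8], [0.0, 0.5, 0.5]],
])
report = check_proba(proba)
print(report)
```

In this toy case the per-student totals equal the number of instances and every row is a valid distribution, yet three entries are exactly zero.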

@yangarbiter
Collaborator

The probabilities in proba and consensus should also never be exactly 0: https://github.com/ntucllab/libact/blob/master/libact/query_strategies/query_by_committee.py#L153. Maybe a small epsilon should be added to every probability in the output.
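
A minimal sketch of the epsilon idea, assuming the textbook KL form sum(p * log(p / q)). The helper names smooth_proba and kl_divergence and the value of eps are illustrative choices, not part of libact.

```python
import numpy as np

def smooth_proba(proba, eps=1e-10):
    """Add a small epsilon to every probability, then renormalize each row."""
    proba = np.asarray(proba, dtype=float) + eps
    return proba / proba.sum(axis=-1, keepdims=True)

def kl_divergence(p, q):
    """KL(p || q) along the last axis; assumes strictly positive inputs."""
    return np.sum(p * np.log(p / q), axis=-1)

p = smooth_proba([0.5, 0.5, 0.0])   # a student's output with an exact zero
q = smooth_proba([0.4, 0.4, 0.2])   # committee consensus
kl = kl_divergence(p, q)

print(kl)   # finite instead of NaN
```

Renormalizing after adding epsilon keeps each row a valid distribution; the shift in the resulting score is on the order of eps.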

I would suggest using https://github.com/gotcha/ipdb to trace the code and find out where the NaN values first appear.
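
As a complement to stepping through with a debugger, NumPy itself can be asked to raise at the first invalid floating-point operation. This is a different technique from ipdb, shown here as a rough sketch on made-up arrays; first_bad_operation is a hypothetical helper.

```python
import numpy as np

def first_bad_operation(p, q):
    """Return the message of the first floating-point error, or None."""
    try:
        # Turn divide-by-zero and invalid-value warnings into exceptions.
        with np.errstate(divide="raise", invalid="raise"):
            np.sum(p * np.log(p / q))
    except FloatingPointError as exc:
        return str(exc)
    return None

bad = first_bad_operation(np.array([0.5, 0.5, 0.0]), np.array([0.4, 0.4, 0.2]))
ok = first_bad_operation(np.array([0.5, 0.5]), np.array([0.4, 0.6]))
print(bad, ok)
```

Wrapping the query call in the same np.errstate context would make the traceback point at the line where the first NaN or infinity is produced.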
