BUG: stats.zipf: incorrect pmf values #20692
Comments
Thanks for the report. Could you add a minimal working example showing which parameter values trigger the error? I have a hard time interpreting the Wolfram Alpha computation. The same holds for the other issue you opened.
Thanks for adding the MWE, that helps a lot.
For example, if the following code is executed:

import numpy as np
import scipy.stats as stats
x = np.arange(0, 16).astype(np.int32)
dist = stats.zipf(9.0)
pmf = dist.pmf(x)
print(pmf)

The values are as expected.

[0.00000000e+00 9.97995633e-01 1.94921022e-03 5.07034310e-05

However, if the shape parameter is an integer, an incorrect value is obtained for x>=11, as shown below.

import numpy as np
import scipy.stats as stats
x = np.arange(0, 16).astype(np.int32)
dist = stats.zipf(9)
pmf = dist.pmf(x)
print(pmf)

[0.00000000e+00 9.97995633e-01 1.94921022e-03 5.07034310e-05

When executing the following lines:

def _pmf(self, k, a):
    # zipf.pmf(k, a) = 1/(zeta(a) * k**a)
    Pk = 1.0 / special.zeta(a, 1) / k**a
    return Pk

the input value k and the shape parameter a are both integer types, which may be causing the overflow.
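The suspected overflow can be checked in isolation. The snippet below is a minimal sketch, not the SciPy code path itself: it assumes both `k` and `a` reach the power operation as 32-bit integers, and the float cast at the end is only one possible workaround, not necessarily the fix adopted in SciPy.

```python
import numpy as np

# Same kind of input as in the report, stored as 32-bit integers
# (assumption: k and a both arrive at k**a as int32).
k = np.arange(11, 16).astype(np.int32)
a = np.int32(9)

# Integer power stays in int32 and wraps around silently for arrays.
wrapped = k ** a
print(wrapped.dtype, wrapped)

# Casting to float64 before raising to the power avoids the wraparound.
safe = k.astype(np.float64) ** a
print(safe)
```

Since 11**9 = 2357947691 exceeds the largest int32 value, the first wrapped entry comes out negative, while the float64 version keeps the true magnitude.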
That is indeed the issue. The maximum value for int32 is |
Describe your issue.
The pmf of stats.zipf may return incorrect values.
If the shape parameter 'a' is set to the integer 9, pmf returns incorrect values for some inputs.
SciPy returns the following for integer inputs 0 through 15:
0.000000000000000000e+00
9.979956327307618613e-01
1.949210220177269260e-03
5.070343101817618650e-05
3.807051211283729024e-06
5.109737639581500849e-07
9.903013870737536426e-08
2.473126213304208103e-08
7.435646897038533250e-09
2.576001169444504624e-09
9.979956327307618846e-10
0.000000000000000000e+00
1.154001579655571043e-09
4.953901915407116765e-10
0.000000000000000000e+00
0.000000000000000000e+00
In Wolfram, the value at 11 is as follows:
PDF[zipfdistribution[8], 11]
1/(2357947691 ζ(9))≈4.23248×10^-10
(Note that Wolfram's shape parameter is offset by 1 from SciPy's, so zipfdistribution[8] corresponds to a = 9.)
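The Wolfram value can be cross-checked without SciPy. The following is a minimal sketch using only the standard library. Approximating ζ(9) by a 1000-term partial sum is a choice made here for illustration (the neglected tail is below 1e-24), and the variable names are hypothetical.

```python
import math

# zeta(9) approximated by a partial sum; the tail beyond 1000 terms
# is bounded by 1000**-8 / 8, i.e. below 1e-24.
zeta9 = math.fsum(1.0 / n**9 for n in range(1, 1001))

# Python ints have arbitrary precision, so 11**9 cannot overflow here.
k_pow_a = 11 ** 9

# pmf(k, a) = 1 / (zeta(a) * k**a) evaluated at k = 11, a = 9.
reference_pmf = 1.0 / (k_pow_a * zeta9)
print(k_pow_a, reference_pmf)
```

The printed denominator is 2357947691, the same integer that appears in the Wolfram expression above.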
In my experiment, I obtained the following.
0
0.9979956327307621568646761321051
0.001949210220177269837626320570518
5.070343101817620062311010171748e-5
3.807051211283730151613907364292e-6
5.109737639581502243147141796378e-7
9.903013870737539184201191741694e-8
2.473126213304208857623746328563e-8
7.435646897038535452370912820884e-9
2.576001169444505442417827654193e-9
9.979956327307621568646761321051e-10
4.232475709872573915655519654635e-10
1.93418239662842562191429526205e-10
9.411058434986077082628878685352e-11
The cause of this bug is that nothing prevents integer overflow when the shape parameter and the input values to pmf are both integer types.
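This explanation can be tested against the reported numbers by emulating 32-bit wraparound in plain Python. The sketch below rests on two assumptions: that k**a is evaluated in two's-complement int32, and that negative results are clipped to zero before being returned. The helper `wrap_int32` and the `results` dictionary are hypothetical names introduced for illustration.

```python
import math

INT32_MAX = 2**31 - 1

def wrap_int32(n):
    """Reduce a Python int to the value a two's-complement int32 would hold."""
    return (n + 2**31) % 2**32 - 2**31

# zeta(9) via a partial sum (tail below 1e-24).
zeta9 = math.fsum(1.0 / n**9 for n in range(1, 1001))

results = {}
for k in range(9, 16):
    exact = k ** 9                      # true value of k**a for a = 9
    wrapped = wrap_int32(exact)         # what int32 arithmetic would store
    naive = 1.0 / zeta9 / wrapped       # pmf computed from the wrapped power
    # Assumption: out-of-range values are clipped to [0, 1].
    results[k] = min(max(naive, 0.0), 1.0)
    print(k, exact > INT32_MAX, wrapped, results[k])
```

Under these assumptions the emulation gives 0 at k = 11, 14, and 15, and roughly 1.154e-09 and 4.954e-10 at k = 12 and 13, which is the pattern in the SciPy output listed above.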
Reproducing Code Example
Error message
SciPy/NumPy/Python version and system information