This repository has been archived by the owner on Dec 6, 2023. It is now read-only.

[BUG] Results from liblinear and SAGAClassifier do not match #113

Open
arthurmensch opened this issue Mar 2, 2017 · 5 comments

Comments

@arthurmensch

I wanted to check the differences between the output of SAGAClassifier and that of liblinear (LogisticRegression in sklearn). It turns out that the two estimators return very different coefficients when solving what I thought was the same problem. I suspect that I am not setting the regularisation parameters correctly; would you be able to enlighten me?

    import numpy as np
    from scipy import sparse
    from sklearn.datasets import load_iris, make_classification
    from sklearn.linear_model import LogisticRegression
    from lightning.classification import SAGAClassifier

    iris = load_iris()
    X, y = iris.data, iris.target
    X = np.concatenate([X] * 10)
    y = np.concatenate([y] * 10)

    X_bin = X[y <= 1]
    y_bin = y[y <= 1] * 2 - 1

    X_sparse, y_sparse = make_classification(n_samples=100, n_features=50,
                                             random_state=0)
    X_sparse = sparse.csr_matrix(X_sparse)

    for penalty in ['l2', 'l1']:
        for (X, y) in ((X_bin, y_bin), (X_sparse, y_sparse)):
            n_samples = X.shape[0]
            for alpha in [1e-4, 0.001, 0.01, 0.1, 1, 10, 100]:
                liblinear = LogisticRegression(
                    C=1. / (n_samples * alpha),
                    solver='liblinear',
                    multi_class='ovr',
                    max_iter=500,
                    fit_intercept=False,
                    penalty=penalty, random_state=0, tol=1e-24)

                if penalty == 'l1':
                    lalpha = 0
                    lbeta = alpha
                    lpenalty = 'l1'
                else:
                    lalpha = alpha
                    lbeta = 0
                    lpenalty = None
                lsaga = SAGAClassifier(loss='log',
                                       beta=lbeta, penalty=lpenalty,
                                       alpha=lalpha,
                                       max_iter=5000,
                                       random_state=0)

                lsaga.fit(X, y)
                liblinear.fit(X, y)
                print('[penalty=%s, alpha=%s, solver=liblinear]' % (penalty, alpha),
                      liblinear.coef_[0, :4])
                print('[penalty=%s, alpha=%s, solver=lightning]' % (penalty, alpha),
                      lsaga.coef_[0, :4])
                print('-------------------------------')

returns

[penalty=l2, alpha=0.0001, solver=liblinear] [-1.02531282 -3.15400719  4.88951892  2.4742867 ]
[penalty=l2, alpha=0.0001, solver=lightning] [-0.96940347 -2.93327997  4.42703281  2.06674462]
-------------------------------
[penalty=l2, alpha=0.001, solver=liblinear] [-0.70002361 -2.29196084  3.47376739  1.6533943 ]
[penalty=l2, alpha=0.001, solver=lightning] [-0.81731435 -2.54960375  3.90450615  1.82232541]
-------------------------------
[penalty=l2, alpha=0.01, solver=liblinear] [-0.44234397 -1.48544888  2.23987699  1.01147785]
[penalty=l2, alpha=0.01, solver=lightning] [-0.48432713 -1.47725009  2.27903945  1.0521148 ]
-------------------------------
[penalty=l2, alpha=0.1, solver=liblinear] [-0.22184315 -0.77409488  1.17300667  0.51035061]
[penalty=l2, alpha=0.1, solver=lightning] [-0.24775234 -0.78256783  1.16843028  0.50506395]
-------------------------------
[penalty=l2, alpha=1, solver=liblinear] [-0.05236005 -0.22889097  0.36543505  0.15461335]
[penalty=l2, alpha=1, solver=lightning] [-0.03478164 -0.22241039  0.38099032  0.15941418]
-------------------------------
[penalty=l2, alpha=10, solver=liblinear] [ 0.00451328 -0.02524653  0.05703947  0.0231134 ]
[penalty=l2, alpha=10, solver=lightning] [ 0.00504563 -0.02507184  0.05748536  0.02325224]
-------------------------------
[penalty=l2, alpha=100, solver=liblinear] [ 0.00194671 -0.0018153   0.00675508  0.00263555]
[penalty=l2, alpha=100, solver=lightning] [ 0.00196397 -0.00180612  0.00676533  0.00263853]
-------------------------------
[penalty=l2, alpha=0.0001, solver=liblinear] [-0.82022128  0.92745245  0.42971564 -1.32700661]
[penalty=l2, alpha=0.0001, solver=lightning] [-0.2659946   0.39512438  0.2258429  -0.81652081]
-------------------------------
[penalty=l2, alpha=0.001, solver=liblinear] [-0.46367915  0.56980419  0.26686545 -0.89167593]
[penalty=l2, alpha=0.001, solver=lightning] [-0.22980838  0.36281145  0.20797067 -0.74839268]
-------------------------------
[penalty=l2, alpha=0.01, solver=liblinear] [-0.14942186  0.2984446   0.13926796 -0.51601786]
[penalty=l2, alpha=0.01, solver=lightning] [-0.1077949   0.26593897  0.11853815 -0.47146184]
-------------------------------
[penalty=l2, alpha=0.1, solver=liblinear] [-0.01941695  0.15948197  0.04128331 -0.15189835]
[penalty=l2, alpha=0.1, solver=lightning] [-0.02838471  0.15804439  0.02516528 -0.14709557]
-------------------------------
[penalty=l2, alpha=1, solver=liblinear] [-0.00669644  0.04957467 -0.00320788 -0.00985278]
[penalty=l2, alpha=1, solver=lightning] [-0.0062626   0.04967    -0.00297072 -0.00952412]
-------------------------------
[penalty=l2, alpha=10, solver=liblinear] [-0.00093277  0.00708531 -0.00238096  0.00041233]
[penalty=l2, alpha=10, solver=lightning] [-0.0009186   0.00712768 -0.00248968  0.00046571]
-------------------------------
[penalty=l2, alpha=100, solver=liblinear] [ -9.60703453e-05   7.45333077e-04  -2.91400764e-04   7.26364227e-05]
[penalty=l2, alpha=100, solver=lightning] [ -9.66109354e-05   7.45361637e-04  -2.92248813e-04   7.30893945e-05]
-------------------------------
[penalty=l1, alpha=0.0001, solver=liblinear] [-0.32907481 -5.48237105  6.84810188  0.        ]
[penalty=l1, alpha=0.0001, solver=lightning] [-0.97892471 -2.97072992  4.4728889   2.07649284]
-------------------------------
[penalty=l1, alpha=0.001, solver=liblinear] [ 0.         -4.09888584  4.5791252   0.        ]
[penalty=l1, alpha=0.001, solver=lightning] [-0.88400508 -2.86859767  4.29820189  1.88896483]
-------------------------------
[penalty=l1, alpha=0.01, solver=liblinear] [ 0.         -2.51730703  2.81037122  0.        ]
[penalty=l1, alpha=0.01, solver=lightning] [-0.23747537 -2.27213624  2.99581196  0.24480634]
-------------------------------
[penalty=l1, alpha=0.1, solver=liblinear] [ 0.         -1.05958628  1.22152771  0.        ]
[penalty=l1, alpha=0.1, solver=lightning] [ 0.         -1.07317208  1.23880475  0.        ]
-------------------------------
[penalty=l1, alpha=1, solver=liblinear] [ 0.  0.  0.  0.]
[penalty=l1, alpha=1, solver=lightning] [ 0.  0.  0.  0.]
-------------------------------
[penalty=l1, alpha=10, solver=liblinear] [ 0.  0.  0.  0.]
[penalty=l1, alpha=10, solver=lightning] [ 0.  0.  0.  0.]
-------------------------------
[penalty=l1, alpha=100, solver=liblinear] [ 0.  0.  0.  0.]
[penalty=l1, alpha=100, solver=lightning] [ 0.  0.  0.  0.]
-------------------------------
[penalty=l1, alpha=0.0001, solver=liblinear] [-0.44078639  0.01458268  1.32293892 -1.10445545]
[penalty=l1, alpha=0.0001, solver=lightning] [-0.25885727  0.38927315  0.22696332 -0.81394116]
-------------------------------
[penalty=l1, alpha=0.001, solver=liblinear] [-0.07313686  0.06330551  0.64000195 -0.76230004]
[penalty=l1, alpha=0.001, solver=lightning] [-0.16498383  0.31122609  0.21626616 -0.72676074]
-------------------------------
[penalty=l1, alpha=0.01, solver=liblinear] [ 0.          0.          0.09711175 -0.36113837]
[penalty=l1, alpha=0.01, solver=lightning] [ 0.          0.11207056  0.08627249 -0.30509918]
-------------------------------
[penalty=l1, alpha=0.1, solver=liblinear] [ 0.  0.  0.  0.]
[penalty=l1, alpha=0.1, solver=lightning] [ 0.  0.  0.  0.]
-------------------------------
[penalty=l1, alpha=1, solver=liblinear] [ 0.  0.  0.  0.]
[penalty=l1, alpha=1, solver=lightning] [ 0.  0.  0.  0.]
-------------------------------
[penalty=l1, alpha=10, solver=liblinear] [ 0.  0.  0.  0.]
[penalty=l1, alpha=10, solver=lightning] [ 0.  0.  0.  0.]
-------------------------------
[penalty=l1, alpha=100, solver=liblinear] [ 0.  0.  0.  0.]
[penalty=l1, alpha=100, solver=lightning] [ 0.  0.  0.  0.]
-------------------------------
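The snippet above maps liblinear's C to lightning's alpha via C = 1. / (n_samples * alpha). For the l2 case this follows from dividing liblinear's objective, (1/2)||w||^2 + C * sum(log-loss), by C * n. A minimal numerical sanity check of that equivalence (a sketch only: it assumes the SAGA-style objective is the mean log loss plus (alpha/2)||w||^2, uses scipy's L-BFGS as a neutral reference solver, and the function names are mine):

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target
X, y = X[y <= 1], y[y <= 1] * 2 - 1   # binary problem, labels in {-1, +1}
n_samples = X.shape[0]
alpha = 0.01

# liblinear solves (1/2)||w||^2 + C * sum(log-loss); with
# C = 1 / (n * alpha) this equals n*C times the SAGA-style objective
liblinear = LogisticRegression(C=1. / (n_samples * alpha), solver='liblinear',
                               fit_intercept=False, penalty='l2',
                               tol=1e-12, max_iter=500)
liblinear.fit(X, y)

def saga_style_objective(w):
    # mean log(1 + exp(-y * Xw)) + (alpha / 2) * ||w||^2
    z = y * (X @ w)
    return np.logaddexp(0.0, -z).mean() + 0.5 * alpha * (w @ w)

def saga_style_grad(w):
    z = y * (X @ w)
    s = -y / (1.0 + np.exp(z))    # derivative of the log loss w.r.t. Xw
    return X.T @ s / n_samples + alpha * w

res = minimize(saga_style_objective, np.zeros(X.shape[1]),
               jac=saga_style_grad, method='L-BFGS-B',
               options={'gtol': 1e-10})
print(np.max(np.abs(res.x - liblinear.coef_.ravel())))  # tiny if the mapping is right
```

Since both solvers minimize the same strictly convex objective (up to the constant factor n*C), the coefficients should agree to within solver tolerance.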

@arthurmensch
Author

arthurmensch commented Mar 2, 2017

I should add that SAGClassifier (with the l2 penalty) seems to behave the same way.

    import numpy as np
    from scipy import sparse
    from sklearn.datasets import load_iris, make_classification
    from sklearn.linear_model import LogisticRegression
    from lightning.classification import SAGClassifier, SAGAClassifier

    iris = load_iris()
    X, y = iris.data, iris.target
    X = np.concatenate([X] * 10)
    y = np.concatenate([y] * 10)

    X_bin = X[y <= 1]
    y_bin = y[y <= 1] * 2 - 1

    X_sparse, y_sparse = make_classification(n_samples=1000, n_features=50,
                                             random_state=0)
    X_sparse = sparse.csr_matrix(X_sparse)

    for (X, y) in ((X_bin, y_bin), (X_sparse, y_sparse)):
        for penalty in ['l2']:
            n_samples = X.shape[0]
            for alpha in np.logspace(-3, 3, 5):
                liblinear = LogisticRegression(
                    C=1. / (n_samples * alpha),
                    solver='liblinear',
                    multi_class='ovr',
                    max_iter=500,
                    fit_intercept=False,
                    penalty=penalty, random_state=0, tol=1e-24)

                if penalty == 'l1':
                    lalpha = 0
                    lbeta = alpha
                    lpenalty = 'l1'
                else:
                    lalpha = alpha
                    lbeta = 0
                    lpenalty = None
                lsag = SAGClassifier(loss='log',
                                     beta=lbeta, penalty=lpenalty,
                                     alpha=lalpha,
                                     max_iter=3000,
                                     random_state=0)

                lsaga = SAGAClassifier(loss='log',
                                       beta=lbeta, penalty=lpenalty,
                                       alpha=lalpha,
                                       max_iter=3000,
                                       random_state=0)

                lsaga.fit(X, y)
                lsag.fit(X, y)
                liblinear.fit(X, y)
                print('[penalty=%s, alpha=%s, solver=liblinear]' % (penalty, alpha),
                      liblinear.coef_[0, :4])
                print('[penalty=%s, alpha=%s, solver=lightning saga]' % (penalty, alpha),
                      lsaga.coef_[0, :4])
                print('[penalty=%s, alpha=%s, solver=lightning sag]' % (penalty, alpha),
                      lsag.coef_[0, :4])
                print('-------------------------------')
--------
[penalty=l2, alpha=0.001, solver=liblinear] [-0.70002361 -2.29196084  3.47376739  1.6533943 ]
[penalty=l2, alpha=0.001, solver=lightning saga] [-0.81731435 -2.54960375  3.90450615  1.82232541]
[penalty=l2, alpha=0.001, solver=lightning sag] [-1.20407646 -4.19130767  6.18008358  2.65898967]
-------------------------------
[penalty=l2, alpha=0.0316227766017, solver=liblinear] [-0.32789047 -1.11596898  1.68437735  0.74534804]
[penalty=l2, alpha=0.0316227766017, solver=lightning saga] [-0.37339261 -1.12060485  1.68188194  0.74885022]
[penalty=l2, alpha=0.0316227766017, solver=lightning sag] [-0.33469617 -1.11047128  1.6849291   0.74818174]
-------------------------------
[penalty=l2, alpha=1.0, solver=liblinear] [-0.05236005 -0.22889097  0.36543505  0.15461335]
[penalty=l2, alpha=1.0, solver=lightning saga] [-0.03478164 -0.22241039  0.38099032  0.15941418]
[penalty=l2, alpha=1.0, solver=lightning sag] [-0.05261102 -0.2286055   0.36511032  0.15468706]
-------------------------------
[penalty=l2, alpha=31.6227766017, solver=liblinear] [ 0.00430706 -0.00666999  0.02016419  0.00797502]
[penalty=l2, alpha=31.6227766017, solver=lightning saga] [ 0.00447114 -0.00657897  0.02025385  0.00799994]
[penalty=l2, alpha=31.6227766017, solver=lightning sag] [ 0.00431818 -0.00666541  0.02017375  0.00797861]
-------------------------------
[penalty=l2, alpha=1000.0, solver=liblinear] [ 0.0002283  -0.00016419  0.00069642  0.00026974]
[penalty=l2, alpha=1000.0, solver=lightning saga] [ 0.00022848 -0.00016409  0.00069653  0.00026977]
[penalty=l2, alpha=1000.0, solver=lightning sag] [ 0.00022836 -0.00016415  0.00069646  0.00026975]
-------------------------------
--------
[penalty=l2, alpha=0.001, solver=liblinear] [-0.07378359 -0.08176338  0.15239206  0.150337  ]
[penalty=l2, alpha=0.001, solver=lightning saga] [-0.11194055 -0.08520462  0.17792485  0.13378968]
[penalty=l2, alpha=0.001, solver=lightning sag] [-0.10485795 -0.0808454   0.17528227  0.15793832]
-------------------------------
[penalty=l2, alpha=0.0316227766017, solver=liblinear] [-0.00666195 -0.057333    0.07570911  0.06161208]
[penalty=l2, alpha=0.0316227766017, solver=lightning saga] [-0.00618323 -0.07040822  0.10739897  0.01089067]
[penalty=l2, alpha=0.0316227766017, solver=lightning sag] [-0.00214345 -0.05492471  0.07050465  0.06115048]
-------------------------------
[penalty=l2, alpha=1.0, solver=liblinear] [ 0.00666119 -0.00775256  0.01907765  0.01085946]
[penalty=l2, alpha=1.0, solver=lightning saga] [ 0.00667705 -0.0076178   0.01907703  0.01098435]
[penalty=l2, alpha=1.0, solver=lightning sag] [ 0.00669436 -0.00763707  0.01888839  0.01085222]
-------------------------------
[penalty=l2, alpha=31.6227766017, solver=liblinear] [ 0.00041758 -0.00028051  0.00096995  0.00054382]
[penalty=l2, alpha=31.6227766017, solver=lightning saga] [ 0.00041997 -0.00028113  0.0009699   0.00054591]
[penalty=l2, alpha=31.6227766017, solver=lightning sag] [ 0.00041891 -0.00028166  0.00096977  0.00054561]
-------------------------------
[penalty=l2, alpha=1000.0, solver=liblinear] [  1.35809058e-05  -8.90077970e-06   3.13036897e-05   1.75473968e-05]
[penalty=l2, alpha=1000.0, solver=lightning saga] [  1.35823360e-05  -8.90193905e-06   3.13035749e-05   1.75491991e-05]
[penalty=l2, alpha=1000.0, solver=lightning sag] [  1.35823230e-05  -8.90193147e-06   3.13035523e-05   1.75492055e-05]

@fabianp
Member

fabianp commented Mar 2, 2017

Add tol=-1 to the kwargs of SAG*Classifier :-)

Also, note that lightning's definition of the logistic loss is very slightly different from liblinear's. In particular, the one in lightning is an approximation using some heuristics; see https://github.com/scikit-learn-contrib/lightning/blob/master/lightning/impl/sgd_fast.pyx, class Log. This typically only matters if you look beyond 10^-4 (e.g. if you plot convergence plots).
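The kind of heuristic approximation described above can be sketched as a piecewise cutoff on log(1 + exp(-z)) (a minimal sketch; the +/-18 thresholds are illustrative, in the style of Bottou-type SGD code, and the function names are mine):

```python
import numpy as np

def log_loss_exact(z):
    # exact logistic loss log(1 + exp(-z)), computed stably
    return np.logaddexp(0.0, -np.asarray(z, dtype=float))

def log_loss_approx(z):
    # piecewise cutoff: for large z, log(1 + exp(-z)) ~= exp(-z);
    # for very negative z, it is ~= -z; otherwise evaluate directly
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    hi, lo = z > 18.0, z < -18.0
    mid = ~(hi | lo)
    out[hi] = np.exp(-z[hi])
    out[lo] = -z[lo]
    out[mid] = np.log1p(np.exp(-z[mid]))
    return out

z = np.linspace(-30.0, 30.0, 201)
print(np.max(np.abs(log_loss_approx(z) - log_loss_exact(z))))
```

The maximum deviation from the exact loss is tiny (on the order of exp(-18) at the cutoffs), which is consistent with the remark that the difference only shows up beyond roughly 10^-4 in convergence plots.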

@arthurmensch
Author

Indeed, sorry for the noise; it works now. Do you use this approximation for speed or for underflow control?

@fabianp
Member

fabianp commented Mar 2, 2017

Mathieu did this, but I would say speed, as underflow can be controlled in other ways (liblinear does this pretty well).

@mblondel
Member

mblondel commented Mar 6, 2017

If I am not mistaken, the log loss trick comes from scikit-learn, which itself comes from Leon Bottou's SGD code.
