sklearn compatibility #101

petrovicboban · 2023-04-21T12:28:33Z

No description provided.

petrovicboban · 2023-04-21T12:33:52Z

@edwardwliu can you figure out why test is failing here? I've tried to do it, but it goes deep into logic (categorical feature mapping).

edwardwliu · 2023-04-22T17:46:00Z

@petrovicboban The issue was that find_match() wasn't identifying matching values across different np.int and np.float64 types. I added a cast, but it's possible this could be handled in a cleaner manner. The specific test case appears to be passing now.
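
For illustration, one way to make value matching insensitive to integer versus float dtypes is to cast both inputs to a common float type before building the lookup. The function below is a hypothetical stand-in, not the package's actual find_match(); its name, signature, and return value are assumptions.

```python
import numpy as np


def find_match_sketch(arr_a, arr_b):
    """Hypothetical stand-in for find_match(): for each value in arr_a,
    return the index of its first occurrence in arr_b."""
    # Cast both inputs to float64 so that, for example, np.int64(3) and
    # np.float64(3.0) are treated as the same value. Accepts lists and arrays.
    a = np.asarray(arr_a, dtype=np.float64)
    b = np.asarray(arr_b, dtype=np.float64)

    # Map each distinct value in arr_b to the index of its first occurrence.
    lookup = {}
    for idx, value in enumerate(b.tolist()):
        lookup.setdefault(value, idx)

    # Raises KeyError if a value of arr_a does not occur in arr_b.
    return [lookup[value] for value in a.tolist()]
```

This sketch does not handle NaN values, which never compare equal to themselves.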

petrovicboban · 2023-04-22T19:24:12Z

@edwardwliu thanks! I thought the issue was in the calling function, because the test was failing when arr_b was a numpy array. In all other tests, it was a list.

petrovicboban · 2023-04-22T20:50:26Z

@edwardwliu
it seems that fit() should not change parameters that are set during __init__(), and that's exactly what we do.
Here is the test failure: https://github.com/forestry-labs/Rforestry/actions/runs/4774656551/jobs/8488479086

I guess we should skip setting those parameters in __init__() and set them in fit() only, but I'm not sure about the implications of that.
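
The rule behind this failure can be checked by hand: snapshot the constructor parameters before fit() and compare them afterwards. The sketch below uses a deliberately non-compliant toy class and a small helper; both are hypothetical and only mimic the shape of the problem, and get_params() here is a minimal stand-in for the scikit-learn method of the same name.

```python
import copy


class MutatingEstimator:
    """Toy estimator (hypothetical) that overwrites a constructor parameter in fit()."""

    def __init__(self, max_depth=None, max_obs=None):
        self.max_depth = max_depth
        self.max_obs = max_obs

    def get_params(self):
        # Minimal stand-in for scikit-learn's get_params().
        return {"max_depth": self.max_depth, "max_obs": self.max_obs}

    def fit(self, X, y):
        if self.max_depth is None:
            # Replaces the value the user passed to __init__().
            self.max_depth = round(len(X) / 2) + 1
        return self


def params_changed_by_fit(estimator, X, y):
    """Return the names of constructor parameters whose values differ after fit()."""
    before = copy.deepcopy(estimator.get_params())
    estimator.fit(X, y)
    after = estimator.get_params()
    return sorted(name for name in before if before[name] != after[name])
```

An empty list from params_changed_by_fit() means fit() left the constructor parameters alone.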

edwardwliu · 2023-04-22T23:12:38Z

@petrovicboban I see, let's only use the following parameters in fit() and remove them from __init__() if overlapping:

feature_weights
deep_feature_weights
observation_weights
lin_feats
monotonic_constraints
groups

petrovicboban · 2023-04-23T10:06:30Z

@edwardwliu it looks like we need to move these too:

        if self.max_depth is None:
            self.max_depth = round(nrow / 2) + 1

        if self.interaction_depth is None:
            self.interaction_depth = self.max_depth

        if self.max_obs is None:
            self.max_obs = y.size

because they depend on the arguments of fit()

edwardwliu · 2023-04-24T14:29:19Z

I see, yes let's move those parameters to fit() for now as well. Conceptually, the params max_depth and interaction_depth could be agnostic to the data provided in fit(), but the implementation in this package requires this change.
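
A minimal sketch of this arrangement, assuming the usual scikit-learn convention that values derived from data are stored on trailing-underscore attributes instead of being written back to constructor parameters. The class name and the attributes max_depth_, interaction_depth_, and max_obs_ are illustrative assumptions; the real estimator would presumably also inherit from sklearn.base.BaseEstimator, which is left out here to keep the sketch dependency-free.

```python
class ForestSketch:
    """Illustrative only: __init__() stores parameters, fit() resolves defaults."""

    def __init__(self, max_depth=None, interaction_depth=None, max_obs=None):
        # Store the arguments unchanged; no derived values are computed here.
        self.max_depth = max_depth
        self.interaction_depth = interaction_depth
        self.max_obs = max_obs

    def fit(self, X, y, feature_weights=None, deep_feature_weights=None,
            observation_weights=None, lin_feats=None,
            monotonic_constraints=None, groups=None):
        nrow = len(X)

        # Resolve data-dependent defaults into fitted attributes, leaving the
        # constructor parameters exactly as the user passed them.
        self.max_depth_ = (
            round(nrow / 2) + 1 if self.max_depth is None else self.max_depth
        )
        self.interaction_depth_ = (
            self.max_depth_ if self.interaction_depth is None else self.interaction_depth
        )
        self.max_obs_ = len(y) if self.max_obs is None else self.max_obs

        # Training would consume the fit-time keyword arguments here (omitted).
        return self
```

After fitting on ten rows with all defaults, max_depth stays None while max_depth_ holds the resolved value.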

petrovicboban · 2023-04-25T20:29:03Z

@edwardwliu can you check the current test failures? One is related to your recent change (groups) and one is from check_estimator()

edwardwliu · 2023-04-25T22:36:37Z

@petrovicboban Both of these test cases should be passing now. The estimator test was failing because we do not currently support the scipy.sparse.csc_matrix data type. In addition, it may be useful to parameterize checks while I debug.
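
One way to reject unsupported sparse input early is an explicit check at the top of fit(). The helper below is a hypothetical sketch and not the package's code; scipy.sparse.issparse is the only SciPy call it relies on, and the helper name and error message wording are assumptions.

```python
import numpy as np
from scipy import sparse


def ensure_dense(X):
    """Return X as a NumPy array, rejecting SciPy sparse input (hypothetical helper)."""
    if sparse.issparse(X):
        # Fail with a clear message instead of an obscure error deeper in fit().
        raise TypeError(
            "Sparse input is not supported; convert it with X.toarray() first."
        )
    return np.asarray(X)
```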

petrovicboban · 2023-04-26T13:07:16Z

@edwardwliu this looks good for now. It fails only on the estimator's pickle check, probably because the saved_forest_ attribute is expected during the process but is added only after a call to translate_tree. I've tried several solutions, but none of them worked.

After we solve that, I need to improve __init__ parameter validation.
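
For the pickle failure, one possible direction, sketched under the assumption that the check fails because saved_forest_ exists on some instances and not on others, is to define __getstate__() so the lazily built attribute is materialized before the state is serialized. Everything here other than the names saved_forest_ and translate_tree is hypothetical, including what translate_tree() computes.

```python
import pickle


class LazyForestSketch:
    """Toy class (hypothetical) with an attribute that is only built on demand."""

    def fit(self, X, y):
        self.n_rows_ = len(X)
        return self

    def translate_tree(self):
        # Stand-in for the real translation step; builds the cached attribute.
        self.saved_forest_ = [{"tree": 0, "n_rows": self.n_rows_}]

    def __getstate__(self):
        # For a fitted instance, build the cached attribute before the state is
        # captured so the original and the unpickled copy carry the same attributes.
        if hasattr(self, "n_rows_") and not hasattr(self, "saved_forest_"):
            self.translate_tree()
        return self.__dict__.copy()

    def __setstate__(self, state):
        self.__dict__.update(state)
```

If the class inherits from sklearn.base.BaseEstimator, which defines its own __getstate__(), the override would presumably need to call super().__getstate__() rather than copy __dict__ directly.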

github-actions · 2023-05-22T17:41:06Z

☂️ Python Coverage

current status: ✅

Overall Coverage

Statements	Covered	Coverage	Threshold	Status
814	646	79%	60%	🟢

New Files

No new covered files...

Modified Files

File	Coverage	Status
Python/random_forestry/forestry.py	79%	🟢
Python/random_forestry/preprocessing.py	72%	🟢
Python/random_forestry/validators/init.py	100%	🟢
Python/random_forestry/validators/base_validator.py	91%	🟢
Python/random_forestry/validators/fit_validator.py	87%	🟢
Python/random_forestry/validators/predict_validator.py	89%	🟢
TOTAL	86%	🟢

updated for commit: dc2031a

petrovicboban · 2023-05-22T17:56:04Z

> @petrovicboban Both of these test cases should be passing now. The estimator test was failing because we do not currently support the scipy.sparse.csc_matrix data type. In addition, it may be useful to parameterize checks while I debug.

It's passing with:

    def _more_tags(self):
        return {
            "_xfail_checks": {
                "check_estimators_pickle": "To be fixed later",
                "check_n_features_in": "To be fixed later",
                "check_estimators_nan_inf": "To be fixed later",
                "check_dtype_object": "To be fixed later",
            },
        }

Ilia-Shutov · 2023-08-22T05:57:20Z

A new PR based on this one was created: #146

petrovicboban self-assigned this Apr 21, 2023

petrovicboban added the enhancement (New feature or request) and Python (Python changes) labels Apr 21, 2023

petrovicboban force-pushed the bp/sklearn branch 2 times, most recently from 70b6587 to a1b0f64 Compare April 21, 2023 19:24

petrovicboban force-pushed the bp/sklearn branch from ab807cc to 0e8115b Compare April 25, 2023 20:22

petrovicboban force-pushed the bp/sklearn branch 2 times, most recently from ae7357e to 15b4b49 Compare April 26, 2023 12:41

edwardwliu mentioned this pull request Apr 28, 2023

Include tests for cross-language exact reproducibility #102

Merged

petrovicboban force-pushed the bp/sklearn branch from 5ec4c63 to b750c35 Compare May 1, 2023 08:36

petrovicboban force-pushed the bp/sklearn branch from b750c35 to 4481241 Compare May 10, 2023 18:11

This was linked to issues May 11, 2023

Compatibility with sklearn estimators #98

Open

has_nas method defined, but not used #111

Open

petrovicboban marked this pull request as ready for review May 23, 2023 20:56

Boban Petrovic and others added 4 commits May 24, 2023 13:49

Initial work on sklearn compatibility

a975c71

cast int to float for dict keys

fb8285c

Fix some "predict" non-compatibility

7db3446

Move parameters from __init__() to fit()

f426600

edwardwliu and others added 7 commits May 24, 2023 13:49

update group_memberships parameter

8278236

do not allow sparse matrices

4d4fd60

More changes on sklearn compatibility

cfe2210

Change in traslate_tree

fa8ce82

Add more parameters validations

de080af

Allow NaNs in X

e596b22

Code cleanup

317414d

petrovicboban force-pushed the bp/sklearn branch 5 times, most recently from d39b195 to a849644 Compare May 24, 2023 18:09

petrovicboban added 2 commits May 24, 2023 14:10

Add magic method __eq__

5f48df4

Reorganize tests

dc2031a

petrovicboban force-pushed the bp/sklearn branch from a849644 to dc2031a Compare May 24, 2023 18:10

petrovicboban marked this pull request as draft June 5, 2023 14:52

Ilia-Shutov mentioned this pull request Aug 22, 2023

Python: RandomForest conforms to sklearn Estimator interface #146

Open

Ilia-Shutov closed this Aug 22, 2023
