
automatic detection of multioutput datasets #1001

Open
jhmenke wants to merge 7 commits into development
Conversation

jhmenke
Contributor

@jhmenke jhmenke commented Jan 14, 2020

What does this PR do?

Slight variation of #903 depending on the shape of the target array.
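A minimal sketch of the shape-based detection this PR describes (the helper name is mine; TPOT's internal check may differ):

```python
import numpy as np

def is_multi_output(y):
    """Treat a 2-D target array with more than one column as multi-output."""
    y = np.asarray(y)
    return y.ndim == 2 and y.shape[1] > 1
```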

What are the relevant issues?

#971

@coveralls

Coverage Status

Coverage remained the same at 96.747% when pulling 3dd6475 on jhmenke:development into 410d88c on EpistasisLab:development.

@weixuanfu
Contributor

Neat! Thank you for this PR. I think it should work for detecting multi-output datasets. But I checked this link, and I think not all regressors in our default config support multi-output regression. I think we need a workaround for this issue.

@jhmenke
Contributor Author

jhmenke commented Jan 15, 2020

There are several ways to go about this:

  • Emit a warning if the standard regression/classification config is chosen for multi-output data, which would require the user to create their own config for multi-output datasets.
  • Maintain a list of regressors and classifiers that don't support multi-output, and remove them automatically when such a dataset is detected.
  • Wrap these regressors and classifiers in MultiOutputRegressor/MultiOutputClassifier in the default config. I'm not sure about the performance losses there.
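The second option could look roughly like this (the set contents below are placeholders for illustration, not a vetted list of unsupported estimators):

```python
# Hypothetical, hand-curated set of estimators lacking multi-output support.
SINGLE_OUTPUT_ONLY = {
    'sklearn.svm.LinearSVC',
    'sklearn.linear_model.LogisticRegression',
}

def filter_config(config_dict, multi_output_target):
    """Drop estimators known not to handle multi-output targets."""
    if not multi_output_target:
        return config_dict
    return {op: params for op, params in config_dict.items()
            if op not in SINGLE_OUTPUT_ONLY}
```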

@weixuanfu
Contributor

Thank you for your ideas. I prefer the third one, but I think a better solution based on it is to automatically wrap pipelines generated from the current default config in MultiOutputRegressor/MultiOutputClassifier when TPOT detects that y is a multi-output target. Please let me know if you have any ideas.

@jhmenke
Contributor Author

jhmenke commented Jan 20, 2020

While working on it, I think I spotted a bug:

import_hash[import_str].append(dep_op_str)

This should be dep_import_str as the key, not import_str, right? It seems this also changes some tests.
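A toy reproduction of the suspected mix-up (identifiers borrowed from the snippet above; the example values are invented):

```python
from collections import defaultdict

# Maps an import path to the operator names imported from it.
import_hash = defaultdict(list)

import_str = 'tpot.builtins'          # parent operator's import path (example)
dep_import_str = 'sklearn.ensemble'   # dependency's import path (example)
dep_op_str = 'ExtraTreesClassifier'   # dependency operator name (example)

# The original line used import_str as the key, filing the dependency under
# the wrong module; keying by dep_import_str groups it correctly.
import_hash[dep_import_str].append(dep_op_str)
```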

@jhmenke
Contributor Author

jhmenke commented Jan 21, 2020

If I run the failing tests manually, they pass. It must be some unsuitable combination due to the random seed; I'm not sure how to debug this.

@weixuanfu
Contributor

While working on it, I think I spotted a bug:

import_hash[import_str].append(dep_op_str)

This should be dep_import_str as the key, not import_str, right? It seems this also changes some tests.

Yes, that is a bug. Hmm, I thought I fixed it a while ago, but the fix may not have been merged into the master/dev branch.

@@ -501,6 +500,35 @@ def _fit_init(self):
self._last_optimized_pareto_front_n_gens = 0
self._setup_config(self.config_dict)

if multi_output_target:
Contributor

Modifying _config_dict may not work when the user supplies a customized configuration instead of the default one. So I think a practical way is to modify the _compile_to_sklearn function (here). If multi_output_target is True, then
sklearn_pipeline = MultiOutputClassifier(estimator=sklearn_pipeline) or sklearn_pipeline = MultiOutputRegressor(estimator=sklearn_pipeline). I think this may be a more general solution for multi-output datasets.

Contributor Author

This seems better. To be honest, I didn't find a good place to put my code, which is why I settled on the _fit_init function.

Contributor Author

Okay, I looked into it, but the code would be a mess. Several functions would have to take multi_output_target as a new argument (most of them in export_utils.py), since they don't have access to the data or the TPOT object.

In my opinion, _fit_init seems to be the least intrusive point to include the checks.

Contributor

Thank you for looking into it. You are right. I think TPOT's exported code should also include MultiOutputRegressor/MultiOutputClassifier, which would require a lot of changes in TPOT. I will look into it when I get some time next week.

Contributor Author

Any updates?

Contributor

@jhmenke I am sorry for overlooking this. I did not get a chance to look into this issue due to my busy schedule. I agree that TPOT needs some major changes to include MultiOutputRegressor/MultiOutputClassifier. I may get some time in March to add those changes. You are welcome to push any changes in the meantime.

Contributor Author

Could we use my PR as a temporary fix until you have time to thoroughly refactor the code? I can prepare an update with the current development branch.

Contributor

Sorry for the delay. I think we can use it as a temporary solution in a minor release.

Contributor Author

Okay, I merged the current master/development into this. It should be good to go as an interim solution then.

jhmenke and others added 3 commits May 9, 2020 14:00
Contributor

@weixuanfu weixuanfu left a comment

I think we cannot add this support for now, since it needs more changes.

if model in single_output_classifiers:
    if 'sklearn.multioutput.MultiOutputClassifier' not in self._config_dict.keys():
        self._config_dict['sklearn.multioutput.MultiOutputClassifier'] = {"estimator": {}}
    self._config_dict['sklearn.multioutput.MultiOutputClassifier']['estimator'][model] = self._config_dict[model]
Contributor

There is only one sklearn.multioutput.MultiOutputClassifier entry in self._config_dict, and its estimator dict keeps being updated up to the last model in single_output_classifiers, so the remaining single-output models should also be removed from the top level of the config.
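One reading of the review comment above, sketched as a standalone helper (the function name and structure are mine, not TPOT code): each single-output classifier is moved under the MultiOutputClassifier's estimator dict and removed from the top level, so it cannot be sampled on its own.

```python
MOC = 'sklearn.multioutput.MultiOutputClassifier'

def wrap_single_output(config_dict, single_output_classifiers):
    """Move single-output classifiers under MultiOutputClassifier's estimator dict."""
    config = dict(config_dict)
    for model in list(config):
        if model in single_output_classifiers:
            config.setdefault(MOC, {'estimator': {}})
            # Transfer the model's hyperparameter space and drop the top-level entry.
            config[MOC]['estimator'][model] = config.pop(model)
    return config
```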

@jessegmeyerlab

Is this the closest pull request to multi-output support? Any plans to finish it?

@jhmenke
Contributor Author

jhmenke commented Sep 9, 2020

Is this the closest pull to multi-output? any plans to finish it?

Sorry, no time for this right now. Feel free to work with my branch. I think the last TODO is cloning the MultiOutputClassifier for every estimator that only supports single outputs, as per weixuanfu's review.

@windowshopr

Would love to see multioutput regression/classification working with tpot! Keep going! Haha
