
enhance freqai to accept categorical_features #8785

Open
mhgutier opened this issue Jun 14, 2023 · 4 comments
@mhgutier

Describe your environment

(if applicable)

  • Operating system: macOS Ventura 13.4
  • Python Version: Python 3.9.6
  • CCXT version: ccxt==1.93.6
  • Freqtrade Version: freqtrade 2023.5.1

Describe the enhancement

I'm using categorical features, however they are turned into negative values during normalization and LightGBM converts them to NaN.


freqtrade | [LightGBM] [Warning] Met negative value in categorical features, will convert it to NaN
(the warning above is repeated many times in the log)
freqtrade | /home/ftuser/.local/lib/python3.10/site-packages/lightgbm/basic.py:1780: UserWarning: Overriding the parameters from Reference Dataset.
freqtrade | _log_warning('Overriding the parameters from Reference Dataset.')
freqtrade | /home/ftuser/.local/lib/python3.10/site-packages/lightgbm/basic.py:1513: UserWarning: categorical_column in param dict is overridden.
freqtrade | _log_warning(f'{cat_alias} in param dict is overridden.')

The expectation is that the categorical features keep their original values (0, 1, 2).

Explain the enhancement you would like
Is there a way to exclude categorical features from normalization in the training_features?

Below is the fit() method I use to let LightGBM recognize categorical_feature:

# (method of a custom freqai prediction model; assumes `pd`, `LGBMRegressor`,
# `Dict`, `Any` and `FreqaiDataKitchen` are imported at module level)
def fit(self, data_dictionary: Dict, dk: FreqaiDataKitchen, **kwargs) -> Any:
        """
        Most regressors use the same function names and arguments e.g. user
        can drop in LGBMRegressor in place of CatBoostRegressor and all data
        management will be properly handled by Freqai.
        :param data_dictionary: the dictionary constructed by DataHandler to hold
                                all the training and test data/labels.
        """
        categ_columns = [c for c in data_dictionary["train_features"].columns if "categ" in c]
        pd.set_option('display.max_rows', None)
        pd.set_option('display.max_columns', None)
        pd.set_option('display.width', None)
        pd.set_option('display.max_colwidth', None)
        print(categ_columns)
        print(data_dictionary["train_features"][categ_columns][-50:])
        if self.freqai_info.get('data_split_parameters', {}).get('test_size', 0.1) == 0:
            eval_set = None
            eval_weights = None
        else:
            eval_set = (data_dictionary["test_features"], data_dictionary["test_labels"])
            eval_weights = data_dictionary["test_weights"]
        X = data_dictionary["train_features"]
        y = data_dictionary["train_labels"]
        train_weights = data_dictionary["train_weights"]

        init_model = self.get_init_model(dk.pair)

        model = LGBMRegressor(**self.model_training_parameters)

        model.fit(X=X, 
                  y=y, 
                  eval_set=eval_set, 
                  sample_weight=train_weights,
                  eval_sample_weight=[eval_weights], 
                  init_model=init_model,
                  categorical_feature=categ_columns)

        return model

@xmatthias added the "Question" and "freqAI" labels on Jun 14, 2023
@mhgutier
Author

mhgutier commented Jun 15, 2023

Is it possible to enhance freqai to accept categorical features in feature engineering via a column-name convention such as "%-categ-"?
The user would apply a one-hot encoder or label encoder to the categorical features themselves; in my case I used the codes (0, 1, 2).
The categorical features should be excluded from normalization (a sketch of this exclusion follows the feature-engineering examples below).
Then, in the model, the user can pick these features up in fit() by adding the line:
categ_columns = [c for c in data_dictionary["train_features"].columns if "categ" in c]

and passing them to model.fit() with:
categorical_feature=categ_columns

dataframe["%-categ-SUPERTd-period"]=np.where(dataframe['SUPERTd-period'] == 1, 1, np.where(dataframe['SUPERTd-period'] == -1, 2,0))
dataframe['%-categ-macd'] = np.where(dataframe['macd'] > dataframe['signal'], 1, np.where(dataframe['macd'] <= dataframe['signal'], 2, 0))
dataframe['%-categ-stoch1'] = np.where(dataframe['STOCH_K'] > dataframe['STOCH_D'], 1, np.where(dataframe['STOCH_K'] < dataframe['STOCH_D'], 2, 0))
dataframe['%-categ-stoch2'] = np.where(dataframe['STOCH_K'] <= 30, 1, np.where(dataframe['STOCH_K'] >= 70, 2, 0))
dataframe['%-categ-rsi'] = np.where(dataframe['rsi'] <= 30, 1, np.where(dataframe['rsi'] >= 70, 2, 0))

@mhgutier changed the title from "categorical features are turning into negative (-) and lightgbm converts them to Nan" to "enhance freqai to accept categorical_features" on Jun 16, 2023
@robcaulk
Member

Hello,

Yes, I recognize this, because I identified the problem during a Discord conversation with you. If you remember, I asked you to begin using the new pipeline, PR #8692, so you can take full control over the normalization range, switching to (0, 1) from the present (-1, 1) and thus avoiding the negative values entirely.

As I mentioned on Discord, your pipeline would be:

    # imports assumed for this snippet: datasieve provides Pipeline, the scaler is sklearn's MinMaxScaler
    from datasieve.pipeline import Pipeline
    from sklearn.preprocessing import MinMaxScaler
    import datasieve.transforms as ds

    def define_data_pipeline(self) -> Pipeline:
        """
        User defines their custom feature pipeline here (if they wish)
        """
        feature_pipeline = Pipeline([
            ('const', ds.VarianceThreshold()),
            ('qt', ds.SKLearnWrapper(MinMaxScaler(feature_range=(0, 1)))),
            ('di', ds.DissimilarityIndex(di_threshold=1))
        ])
        return feature_pipeline

While I agree it's not the final solution to your problem, it is certainly a step in the correct direction. It makes no sense to spend time modifying the stable-branch pipeline now, since it's being torn out and replaced as shown in #8692.

As for automating something and using a keyword like categ, I'm not necessarily opposed to it, but it's not quite so simple, since there are implications to ensuring users can remove outliers from the data.

As for your current problem, please use the aforementioned code with #8692 (make sure to reinstall as the PR has new dependencies) to ensure you don’t have any negative values, then please let us know if it fixes your problem or not.

@mhgutier
Author

Thanks @robcaulk, haha, I didn't know it was you. Thanks for always helping me out... will check the pipeline.

@robcaulk
Member

Also, I should add: if you aren't making use of SVM, DI, or PCA, then you actually don't need the normalization or the pipeline at all. Decision trees do not require normalized data; it is these other methods that do. You can determine whether you are using them by checking your config to see if they are set to true or false.

@robcaulk reopened this on Jun 16, 2023