[python-package] Can't continued training on Dataset with SequenceDataset(lgb.Sequence) #6413

Open
eromoe opened this issue Apr 11, 2024 · 0 comments

eromoe commented Apr 11, 2024

Description

Continued training fails when the Dataset is built from a list of lgb.Sequence objects.

Reproducible example

I create the dataset from a list of lgb.Sequence objects:

import lightgbm as lgb
import numpy as np


class PartitionSequence(lgb.Sequence):
    def __init__(self, data: np.ndarray, batch_size=4096):
        self.data = data
        self.batch_size = batch_size

    def __getitem__(self, idx):
        return self.data[idx]

    def __len__(self):
        return len(self.data)


....
....
data.append(PartitionSequence(X.values, batch_size))
dataset = lgb.Dataset(data, label=y, 
                      feature_name=list(X.columns), 
                      weight=weight, 
                      position=dates, 
                      categorical_feature=cat_cols,
                      free_raw_data=free_raw_data)

When I continue training in a loop, passing the previous model as init_model:

model = None
for train_data in iter_read(....):
    model = lgb.train(
        params,
        train_data,
        init_model=model,
        num_boost_round=num_boost_round,
        keep_training_booster=True,
    )

I get this error:

Exception has occurred: ValueError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
Cannot convert data list to numpy array.
  File "C:\envs\quant\Lib\site-packages\lightgbm\basic.py", line 1174, in predict
    data = np.array(data)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (24,) + inhomogeneous part.

The above exception was the direct cause of the following exception:

  File "C:\envs\quant\Lib\site-packages\lightgbm\basic.py", line 1176, in predict
    raise ValueError('Cannot convert data list to numpy array.') from err
  File "C:\envs\quant\Lib\site-packages\lightgbm\basic.py", line 1972, in _set_init_score_by_predictor
    init_score: Union[np.ndarray, scipy.sparse.spmatrix] = predictor.predict(
  File "C:\envs\quant\Lib\site-packages\lightgbm\basic.py", line 2801, in _set_predictor
    self._set_init_score_by_predictor(
  File "C:\envs\quant\Lib\site-packages\lightgbm\engine.py", line 202, in train
    ._set_predictor(predictor) \
  File "E:\Workspace\github_me\聚宽项目\notebook_exports.py", line 2271, in train_lgb_ensemble
    m = lgb.train(params, ds,
  File "E:\Workspace\github_me\聚宽项目\train_test_model_scroll.py", line 224, in <module>
    model = train_lgb_ensemble(params, train_data,
  File "C:\envs\quant\Lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\envs\quant\Lib\runpy.py", line 196, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,
ValueError: Cannot convert data list to numpy array.

Stepping into the code with a debugger (two screenshots omitted), it appears that a list of lgb.Sequence objects can't be converted into an np.array.
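To illustrate the shape problem: the error message suggests the list holds 24 sequences that do not all have the same number of rows, so no rectangular array can be formed from the list itself. The sketch below is a minimal illustration, not LightGBM's own code. `ArraySequence` is a hypothetical stand-in for an lgb.Sequence subclass (so the snippet runs without LightGBM), and `materialize` is a hypothetical helper that reads each sequence in slices of `batch_size` rows and stacks them into one 2-D array. Whether feeding such a dense array back into continued training is a suitable workaround has not been verified here.

```python
import numpy as np


class ArraySequence:
    """Hypothetical stand-in for an lgb.Sequence subclass (assumption)."""

    def __init__(self, data, batch_size=4096):
        self.data = data
        self.batch_size = batch_size

    def __getitem__(self, idx):
        # Supports both integer and slice indexing via the wrapped array.
        return self.data[idx]

    def __len__(self):
        return len(self.data)


def materialize(seqs):
    """Hypothetical helper: stack row batches from every sequence into one 2-D array."""
    parts = []
    for seq in seqs:
        for start in range(0, len(seq), seq.batch_size):
            parts.append(np.asarray(seq[start:start + seq.batch_size]))
    return np.vstack(parts)


# Two partitions with different row counts but the same number of columns.
seqs = [ArraySequence(np.ones((3, 2))), ArraySequence(np.zeros((5, 2)))]
dense = materialize(seqs)
print(dense.shape)  # (8, 2)
```

Reading in batches mirrors how a sequence is meant to be consumed, but it still loads every row into memory, which may defeat the purpose of using sequences for large data.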

Environment info

LightGBM version or commit hash:
lightgbm 4.3.0

@jameslamb jameslamb added the bug label Apr 11, 2024
@jameslamb jameslamb changed the title Can't continued training on Dataset with SequenceDataset(lgb.Sequence) [python-package] Can't continued training on Dataset with SequenceDataset(lgb.Sequence) Apr 11, 2024