[🐛BUG] FEARec model gets stuck in an infinite loop during training #2020

Open
yin214 opened this issue Mar 18, 2024 · 2 comments
Labels: bug (Something isn't working)


yin214 commented Mar 18, 2024

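Describe the bug
When trained on the sports dataset, the FEARec model gets stuck in an infinite loop and hangs.

To reproduce
Steps to reproduce the bug:

  1. Extra yaml file you used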

hidden_dropout_prob: 0.5         # (float) The probability of an element to be zeroed.
attn_dropout_prob: 0.5           # (float) The probability of an attention score to be zeroed.


global_ratio: 0.6                  # (float) The ratio of frequency components
dual_domain: False               # (bool) Frequency domain processing or not
std: False                       # (bool) Use the specific time index or not
spatial_ratio: 0.1                 # (float) The ratio of the spatial domain and frequency domain
fredom: True                    # (bool)  Regularization in the frequency domain or not
fredom_type: None                # (str)  The type of loss in different scenarios
topk_factor: 5                   # (int)  To aggregate time delayed sequences with high autocorrelation


epochs: 100  # maximum number of training epochs
train_batch_size: 8192
eval_batch_size: 8192

learning_rate: 0.001
# training_neg_sample_num: 1 # number of negative samples
eval_step: 1 # run evaluation after every training epoch
stopping_step: 10
valid_metric: recall@20

topk: [1,5,10,20]

neg_sampling: ~

eval_args: {'split':{'RS': [0.8,0.1,0.1]}, 'order': 'TO', 'mode': 'full'}
  2. Your run script
    python run_recbole.py --model=FEARec --dataset=sports --config_files=./config_files/fearec.yaml --checkpoint_dir='./saved/FEARec/sports'

Expected behavior
I ran several other datasets and none of them showed this problem.

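Screenshots
Training hangs in this state and makes no further progress.
[Screenshot 2024-03-18 215932]
The infinite loop appears to be at lines 213 to 223 of the model code: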

            while True:
                sample_index = random.choice(targets_index)
                cur_item_list = interaction[self.ITEM_SEQ][i].to("cpu")
                sample_item_list = dataset.inter_feat[self.ITEM_SEQ][sample_index]
                are_equal = torch.equal(cur_item_list, sample_item_list)
                sample_item_length = dataset.inter_feat[self.ITEM_SEQ_LEN][sample_index]
                if not are_equal or lens == 1:
                    #print("helllo")
                    sem_pos_lengths.append(sample_item_length)
                    sem_pos_seqs.append(sample_item_list)
                    break
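
Assuming `lens` is the number of candidates in `targets_index`, the loop above only exits when it draws a sequence that differs from the current one, or when `lens == 1`. If there are two or more candidates and every one of them is identical to the current sequence, no draw can satisfy the condition, which would explain the hang. Below is a minimal sketch of one way to guarantee termination; it is not necessarily how the maintainers fixed it. Plain Python lists stand in for the tensors, and `pick_semantic_positive` is a hypothetical helper name, not part of RecBole.

```python
import random


def pick_semantic_positive(cur_seq, candidate_seqs):
    """Pick a candidate sequence that differs from cur_seq, if one exists.

    cur_seq and candidate_seqs are plain lists (a simplification of the
    tensors in the snippet above). The function always returns.
    """
    # Filter up front so termination does not depend on random draws.
    different = [seq for seq in candidate_seqs if seq != cur_seq]
    if different:
        return random.choice(different)
    # No candidate differs (or there are no candidates at all):
    # fall back to the current sequence instead of retrying forever.
    return cur_seq


# Two identical candidates: a retry-until-different loop would never exit here.
print(pick_semantic_positive([1, 2, 3], [[1, 2, 3], [1, 2, 3]]))
# One differing candidate: it is the only possible choice.
print(pick_semantic_positive([1, 2, 3], [[1, 2, 3], [4, 5]]))
```

The fallback returns the current sequence, which matches what the original code does in the `lens == 1` case when the single candidate equals the current sequence.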

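Links
Add a link to code that reproduces the bug, such as Colab or another online Jupyter platform. (Optional)

Environment (please complete the following information):
I hit this bug on two different machines.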

yin214 added the bug (Something isn't working) label Mar 18, 2024

yin214 commented Mar 18, 2024

# Basic Information
USER_ID_FIELD: user_id          # (str) Field name of user ID feature.
ITEM_ID_FIELD: item_id          # (str) Field name of item ID feature.
RATING_FIELD: rating            # (str) Field name of rating feature.
TIME_FIELD: timestamp           # (str) Field name of timestamp feature.
seq_len: ~                      # (dict) Field name of sequence feature: maximum length of each sequence
LABEL_FIELD: label              # (str) Expected field name of the generated labels for point-wise dataLoaders. 
threshold: ~                    # (dict) 0/1 labels will be generated according to the pairs.
NEG_PREFIX: neg_                # (str) Negative sampling prefix for pair-wise dataLoaders.

# Sequential Model Needed
ITEM_LIST_LENGTH_FIELD: item_length   # (str) Field name of the feature representing item sequences' length. 
LIST_SUFFIX: _list              # (str) Suffix of field names which are generated as sequences.
MAX_ITEM_LIST_LENGTH: 50       # (int) Maximum length of each generated sequence.
POSITION_FIELD: position_id     # (str) Field name of the generated position sequence.

user_inter_num_interval: "[10,inf)"
item_inter_num_interval: "[10,inf)"

load_col:                       # (dict) The suffix of atomic files: (list) field names to be loaded.
    inter: [user_id, item_id, rating, timestamp]
    item: [item_id, categories]
selected_features: [categories]
item_attribute: categories

@TayTroye (Collaborator)

@yin214 Hello! Thanks for your careful check! We have fixed this bug in #2024
