time_range for the finetuning experiment #31

yihong-chen · 2021-05-03T17:01:13Z

Thanks again for this awesome repo; it has helped me a lot. I have a question about which time_range to use when sampling subgraphs for testing. For example, in finetune_OAG_PF.py (https://github.com/acbull/GPT-GNN/blob/master/example_OAG/finetune_OAG_PF.py), the following line prepares the input to the GNN:

    node_feature, node_type, edge_time, edge_index, edge_type, x_ids, ylabel = node_classification_sample(randint(), test_pairs, test_range)

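Here test_range is used to filter out nodes while sampling the subgraph, as shown at line 128 of data.py (https://github.com/acbull/GPT-GNN/blob/8ac50bf0720e5260b175f55e338c09e3a3adf729/example_OAG/GPT_GNN/data.py#L128):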

    if source_time > np.max(list(time_range.keys())) or source_id in layer_data[source_type]:
        continue
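A minimal sketch of how that filter behaves, assuming test_range is a dict keyed by publication year (as the `.keys()` call suggests) and using made-up paper ids and years. Only nodes newer than the latest test year are skipped, so papers from the test years themselves, including test papers assigned to other batches, pass the check:

```python
import numpy as np

# Hypothetical toy data: paper id -> publication year.
paper_time = {"p1": 2014, "p2": 2017, "p3": 2018, "p4": 2019}

# Assumed shape of test_range: a dict keyed by year.
test_range = {2017: True, 2018: True}


def allowed_neighbors(candidates, time_range, already_sampled):
    """Mimic the data.py check: skip nodes newer than the latest year in
    time_range, and nodes already present in the sampled layer."""
    max_time = np.max(list(time_range.keys()))
    kept = []
    for node_id in candidates:
        source_time = paper_time[node_id]
        if source_time > max_time or node_id in already_sampled:
            continue
        kept.append(node_id)
    return kept


kept = allowed_neighbors(["p1", "p2", "p3", "p4"], test_range, already_sampled={"p2"})
print(kept)  # ['p1', 'p3']; p3 is a test-year paper and is still kept
```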

It looks like some test edges (ones that are not prediction targets for the current batch but may be prediction targets for other batches) can still be included in the sampled subgraph, even after the masking step at line 114 of finetune_OAG_PF.py (https://github.com/acbull/GPT-GNN/blob/8ac50bf0720e5260b175f55e338c09e3a3adf729/example_OAG/finetune_OAG_PF.py#L114):

    '''
        (3) Mask out the edge between the output target nodes (paper) with output source nodes (L2 field)
    '''
    # Each edge i is [target_idx, source_idx]; the first args.batch_size
    # papers are this batch's prediction targets, so only edges touching
    # those papers are dropped.
    masked_edge_list = []
    for i in edge_list['paper']['field']['rev_PF_in_L2']:
        if i[0] >= args.batch_size:  # i[0] is the paper index
            masked_edge_list += [i]
    edge_list['paper']['field']['rev_PF_in_L2'] = masked_edge_list

    masked_edge_list = []
    for i in edge_list['field']['paper']['PF_in_L2']:
        if i[1] >= args.batch_size:  # i[1] is the paper index
            masked_edge_list += [i]
    edge_list['field']['paper']['PF_in_L2'] = masked_edge_list
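A toy reproduction of the concern, assuming each edge is stored as [target_idx, source_idx] and that the first batch_size papers are the current batch's targets; the indices and the set of "other" test papers are hypothetical. The masking rule only looks at the current batch's indices, so label edges of other test papers that were sampled into the subgraph survive:

```python
batch_size = 2  # papers 0 and 1 are this batch's output targets

# Hypothetical sampled subgraph, edges as [target_idx, source_idx].
edge_list = {
    "paper": {"field": {"rev_PF_in_L2": [[0, 0], [1, 1], [2, 0], [3, 2]]}},
    "field": {"paper": {"PF_in_L2": [[0, 0], [1, 1], [0, 2], [2, 3]]}},
}

# Hypothetical: sampled papers 2 and 3 are test papers of other batches.
other_test_papers = {2, 3}


def mask_batch_targets(edge_list, batch_size):
    """Same rule as the excerpt: keep an edge only if its paper index is
    outside the current batch's targets."""
    pf = edge_list["paper"]["field"]["rev_PF_in_L2"]
    fp = edge_list["field"]["paper"]["PF_in_L2"]
    edge_list["paper"]["field"]["rev_PF_in_L2"] = [e for e in pf if e[0] >= batch_size]
    edge_list["field"]["paper"]["PF_in_L2"] = [e for e in fp if e[1] >= batch_size]
    return edge_list


masked = mask_batch_targets(edge_list, batch_size)
leaked = [e for e in masked["paper"]["field"]["rev_PF_in_L2"] if e[0] in other_test_papers]
print(leaked)  # [[2, 0], [3, 2]]: labels of other test papers remain visible
```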

I'm not sure how this affects the evaluation. Looking forward to your feedback.


acbull · 2021-05-03T17:28:04Z

Hi Yihong: Thanks for pointing that out. Yes, it's indeed an issue, since we shouldn't know the other batches' outputs. The reason I added the label nodes is to allow some kind of label propagation, but that definitely shouldn't expose the ground truth of other test nodes. I'll modify the masking code to remove them. The simplest way is to remove all the links to the labels during masking, and that should be fine.
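A minimal sketch of the suggested fix, dropping every paper/L2-field edge from the sampled subgraph rather than only those touching the current batch. The function name, the relation tuple, and the toy edge_list (including the extra `rev_PF_in_L1` entry) are hypothetical, not code from the repo:

```python
from typing import Dict, List, Tuple

EdgeList = Dict[str, Dict[str, Dict[str, List[List[int]]]]]

# Hypothetical: the label relations for the L2-field fine-tuning task.
LABEL_RELATIONS: Tuple[Tuple[str, str, str], ...] = (
    ("paper", "field", "rev_PF_in_L2"),
    ("field", "paper", "PF_in_L2"),
)


def mask_all_label_edges(edge_list: EdgeList) -> EdgeList:
    """Remove every label edge in the sampled subgraph, not only the ones
    touching this batch's targets, so no test label can leak in."""
    for target_type, source_type, relation in LABEL_RELATIONS:
        if relation in edge_list.get(target_type, {}).get(source_type, {}):
            edge_list[target_type][source_type][relation] = []
    return edge_list


# Toy subgraph; the extra relation only shows that other edges are untouched.
edge_list = {
    "paper": {"field": {"rev_PF_in_L2": [[0, 0], [2, 0], [3, 2]],
                        "rev_PF_in_L1": [[0, 5]]}},
    "field": {"paper": {"PF_in_L2": [[0, 0], [0, 2], [2, 3]]}},
}
masked = mask_all_label_edges(edge_list)
print(masked["paper"]["field"]["rev_PF_in_L2"])  # []
print(masked["paper"]["field"]["rev_PF_in_L1"])  # [[0, 5]]
```

The trade-off is that this also gives up the label-propagation signal mentioned above; a finer-grained variant could keep edges only for papers known to belong to the training split.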


yihong-chen · 2021-05-03T17:39:16Z

Hi @acbull, Thanks for the very quick feedback and the suggested solution :)
