time_range for the finetuning experiment #31
Hi Yihong:
Thanks for pointing that out. Yes, it is indeed an issue, since we should
not know the other batches' outputs.
The reason I added the label nodes is to allow some kind of label
propagation, but it definitely shouldn't expose the ground truth of other
test nodes. I'll modify the masking code to remove them. The simplest way is
to remove all the links to the label nodes during masking, and that should
be fine.
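A minimal sketch of what such a masking change might look like, not the actual fix in the repo. It assumes `edge_list[target_type][source_type][relation]` is a list of `[target_id, source_id]` pairs, as in the snippet quoted below. The function name `mask_label_links` and the optional `held_out_papers` argument are hypothetical, introduced only for illustration.

```python
def mask_label_links(edge_list, held_out_papers=None):
    """Drop paper <-> L2-field links from a sampled subgraph (in place).

    Assumption: each edge is a [target_id, source_id] pair, so the paper
    index is e[0] in 'rev_PF_in_L2' and e[1] in 'PF_in_L2'.

    If held_out_papers is None, every label link is removed (the simple
    option). Otherwise only links touching a paper index contained in
    held_out_papers are removed.
    """
    def keep(paper_idx):
        # With no held-out set given, nothing is kept.
        return held_out_papers is not None and paper_idx not in held_out_papers

    pf = edge_list['paper']['field']
    fp = edge_list['field']['paper']
    pf['rev_PF_in_L2'] = [e for e in pf['rev_PF_in_L2'] if keep(e[0])]
    fp['PF_in_L2'] = [e for e in fp['PF_in_L2'] if keep(e[1])]
    return edge_list
```

Calling `mask_label_links(edge_list)` removes all label links. Passing a set of subgraph-local test paper indices would instead keep the training label links, which preserves some label propagation, but it requires bookkeeping that the sampler would have to provide.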
…On Mon, May 3, 2021, 13:01 Yihong Chen ***@***.***> wrote:
Thanks again for this awesome repo. It helps me a lot. I've got a question
regarding which time_range to use when sampling subgraphs for testing. For
example, in finetune_OAG_PF.py
<https://github.com/acbull/GPT-GNN/blob/master/example_OAG/finetune_OAG_PF.py>,
this line is used to prepare the input to the GNN:
    node_feature, node_type, edge_time, edge_index, edge_type, x_ids, ylabel = node_classification_sample(randint(), test_pairs, test_range)
where test_range is used to filter out nodes when sampling the subgraph,
as shown at L128 of data.py
<https://github.com/acbull/GPT-GNN/blob/8ac50bf0720e5260b175f55e338c09e3a3adf729/example_OAG/GPT_GNN/data.py#L128>
    if source_time > np.max(list(time_range.keys())) or source_id in layer_data[source_type]:
        continue
It looks like some test edges (which are not prediction targets for the
current batch but might be prediction targets for other batches) might
be included in the sampled subgraph even after the masking process at Line
114 of finetune_OAG_PF.py
<https://github.com/acbull/GPT-GNN/blob/8ac50bf0720e5260b175f55e338c09e3a3adf729/example_OAG/finetune_OAG_PF.py#L114>.
    '''
        (3) Mask out the edge between the output target nodes (paper) with output source nodes (L2 field)
    '''
    masked_edge_list = []
    for i in edge_list['paper']['field']['rev_PF_in_L2']:
        if i[0] >= args.batch_size:
            masked_edge_list += [i]
    edge_list['paper']['field']['rev_PF_in_L2'] = masked_edge_list

    masked_edge_list = []
    for i in edge_list['field']['paper']['PF_in_L2']:
        if i[1] >= args.batch_size:
            masked_edge_list += [i]
    edge_list['field']['paper']['PF_in_L2'] = masked_edge_list
I'm not sure how this will affect the evaluation. Looking forward to
your feedback on this.
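To make the concern concrete, here is a toy sketch (not code from the repo) that applies the same `>= batch_size` filter to a handful of made-up label links. The edge values and the `other_test_papers` set are assumptions for illustration only. The repo does not track such a set, which is the point of the question.

```python
batch_size = 2  # toy value; the script uses args.batch_size

# Toy [paper_idx, field_idx] label links in a sampled subgraph.
# Papers 0 and 1 are the current batch's targets, paper 3 is a training
# paper, and paper 4 is assumed to be a test paper targeted by another batch.
rev_pf_in_l2 = [[0, 10], [1, 11], [3, 10], [4, 11]]
other_test_papers = {4}  # hypothetical bookkeeping, not present in the repo

# Same filter as the quoted masking step: keep links whose paper index
# lies outside the current batch's target range.
masked = [e for e in rev_pf_in_l2 if e[0] >= batch_size]

# Label links of other test papers that survive the mask.
leaked = [e for e in masked if e[0] in other_test_papers]
print(leaked)  # prints [[4, 11]]
```

The mask removes the links of papers 0 and 1 only, so the ground-truth link of paper 4 stays in the subgraph.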
Hi @acbull, thanks for the very quick feedback and the suggested solution :)