
Cannot run make_graphfeat.sh script #8

Open
VincentBt opened this issue Nov 29, 2022 · 2 comments

Comments


VincentBt commented Nov 29, 2022

Hi @linminhtoo, thank you for the work! I'd like to reproduce the results of the paper by following the README file. I'm trying to generate the graph features (they are not in the Google Drive, contrary to the statement "We again provide them in our Drive"), but I cannot execute bash scripts/retrosim/make_graphfeat.sh, as it raises the following exception:

File "trainEBM.py", line 477, in main
    raise ValueError(f"Model {args.model_name} not supported!")

I've looked at the history of the .sh files and of trainEBM.py, and I guess it's simply a problem of trainEBM.py not properly handling the case where model_name = None?
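For illustration, here is a minimal, hypothetical sketch (not the actual trainEBM.py code; the dispatch logic and supported-name check are assumptions) of how an unhandled model_name = None would hit the raise seen in the traceback:

```python
# Hypothetical sketch of a model-name dispatch like the one in trainEBM.py.
# "GraphEBM_1MPN" is the one model name mentioned in this thread; everything
# else here is illustrative only.
def build_model(model_name):
    if model_name == "GraphEBM_1MPN":
        return object()  # stand-in for constructing the real model
    # model_name = None (or any unrecognised name) falls through to here
    raise ValueError(f"Model {model_name} not supported!")
```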

Contributor

linminhtoo commented Dec 15, 2022

Hello @VincentBt, sorry for my late reply. I've since graduated and am working full-time, so I haven't been checking these repos as regularly. Please feel free to message me on LinkedIn if my replies are slow.

Yes, you are right - we decided not to upload the graph feats anymore (we used to) because they take up too much space and it's easier to just generate them from scratch. I've made a PR to remove that incorrect statement in the README.

As for the generation itself, you're also right: the bash script somehow has incorrect arguments (it definitely was working before, hahaha...).

It's been a long time since I last ran it, but the idea is that the PyTorch Dataset class we've defined will always attempt to precompute (or load precomputed files from disk) whenever it's initialised. See the class here:

class ReactionDatasetSMILES(Dataset):
    """Dataset class for SMILES/Graph representation of reactions, should be good for both GNN and Transformer"""
    ...
    # this line calls the precompute function
    self.precompute()

Now, I admit it's a convoluted way of doing it (back when I was still young in college...), but the idea is to run trainEBM.py far enough that the Dataset gets initialised, which then triggers the graph feat precompute function. This should really be a separate script of its own, which I might get around to refactoring some day, haha.

Here, you can see that we first look for precomputed files; if they don't exist at the expected paths, we proceed with the precomputation:

def precompute(self):
    if self.args.representation == 'graph':
        # for graph, we want to cache since the pre-processing is very heavy
        cache_smi = self.root / f"{self.rxn_smis_filename}.{self.args.cache_suffix}.cache_smi.pkl"
        cache_mask = self.root / f"{self.rxn_smis_filename}.{self.args.cache_suffix}.cache_mask.pkl"
        cache_feat = self.root / f"{self.rxn_smis_filename}.{self.args.cache_suffix}.cache_feat.npz"
        cache_feat_index = self.root / f"{self.rxn_smis_filename}.{self.args.cache_suffix}.cache_feat_index.npz"
        if all(os.path.exists(cache) for cache in [cache_smi, cache_mask, cache_feat, cache_feat_index]):
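The cache-or-precompute logic above can be sketched standalone like this (path names and suffixes mirror the excerpt; the load/precompute bodies are stand-ins, not the real implementation):

```python
# Sketch of the "load caches if all four exist, else precompute" pattern.
import os
from pathlib import Path

def get_cache_paths(root: Path, rxn_smis_filename: str, cache_suffix: str):
    # Same four cache files as in the precompute() excerpt above.
    stem = f"{rxn_smis_filename}.{cache_suffix}"
    return [
        root / f"{stem}.cache_smi.pkl",
        root / f"{stem}.cache_mask.pkl",
        root / f"{stem}.cache_feat.npz",
        root / f"{stem}.cache_feat_index.npz",
    ]

def load_or_precompute(root: Path, rxn_smis_filename: str, cache_suffix: str) -> str:
    caches = get_cache_paths(root, rxn_smis_filename, cache_suffix)
    if all(os.path.exists(cache) for cache in caches):
        return "loaded"      # all four cache files exist -> load from disk
    return "precomputed"     # otherwise fall through to precomputation
```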

If we really only want to make the graph feats, then we could set the number of training epochs to 0 so that no training happens. For the model name, you could provide --model_name "GraphEBM_1MPN" together with the correct --representation "graph". Alternatively, this also means that if you run an actual GraphEBM training, the code should do the graph feat precomputation needed before training can happen. Just make sure to give the correct paths for storing the graph feats so you won't end up with multiple copies of those massive files on your storage (or HPC cluster).
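Put together, a hypothetical invocation might look like the following. The --model_name and --representation flags come from the discussion above, while --epochs is an assumed name for the training-epochs argument; check trainEBM.py's argument parser for the real flag and the cache-path arguments.

```shell
# Hypothetical sketch: run trainEBM.py just far enough to trigger the
# graph feat precomputation, with training disabled via 0 epochs.
python trainEBM.py \
    --model_name "GraphEBM_1MPN" \
    --representation "graph" \
    --epochs 0
```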

@linminhtoo

Hello @VincentBt, I wanted to check in to see whether you're still facing any issues with using our work? :)
