[Initiative]: Annotate Ersilia's models following BioModels standards #1059

miquelduranfrigola · 2024-03-11T18:42:30Z

Summary

We have partnered with BioModels at EMBL-EBI (Hinxton) to explore potential ways to incorporate Ersilia's models into the well-established BioModels resource.

Of note, BioModels model annotation is based on ontologies as reported in the Ontology Lookup Service. We expect to reach similar standards thanks to the current project.

Scope

Initiative 🐋

Objective(s)

The objectives of the project are the following:

Incorporate Ersilia's models into BioModels (metadata only).
Adopt an ontology-based model annotation procedure for Ersilia that is harmonized with that of BioModels.
Set the basis for a more ambitious incorporation of models based on ONNX format.

Team

Role & Responsibility	Username(s)
DRI / Lead Developer	@Zainab-ik
Project Manager	@miquelduranfrigola

@Zainab-ik is currently doing an internship at EMBL-EBI in the BioModels team.

Importantly, @Zainab-ik will meet with @miquelduranfrigola twice a week to report progress and decide next steps. Before each meeting, @Zainab-ik will update the corresponding model issues and, after the meeting, action items will be reflected in the issues.

Timeline

The project timeline is still up for discussion. These are some tentative milestones:

Incorporate metadata only of a simple model into BioModels (i.e. antimalarial activity prediction).
Incorporate metadata only of a more complex model into BioModels, potentially involving multiple outputs (i.e. H3D models).
Define ontology-based rules to improve Ersilia's metadata, harmonizing it with BioModels standards.
Incorporate metadata only for a substantial number of models.
Incorporate at least one model in ONNX format.

Documentation

A backlog of models can be found in the Ersilia BioModels Spreadsheet. This spreadsheet should act as a centralized resource to keep track of progress.

The shared folder in Google Drive can be accessed here.

miquelduranfrigola · 2024-03-11T18:45:04Z

Hello @Zainab-ik, as discussed, let's start by conceiving an issue template to prompt discussion about each model individually.

I suggest that we start by doing this in the current antimalarial model, then we can replicate the template to other models as we see fit. In my opinion, the template should not be too complex.

miquelduranfrigola · 2024-03-12T10:36:44Z

@Zainab-ik here are some questions in preparation for our meeting with Sheriff. Feel free to add more:

What is the minimum and maximum number of qualifiers in a model? How many are recommended?
Is there a convention for naming models? Is it the year & title of publication?
Is there a structure or guidelines for model descriptions?
Many papers have extra analysis not directly related to the model. For example, dimensionality reduction with UMAP, or clustering. Do we need to include these in the metadata?
Do you have any experience with the chemical information ontology?

Zainab-ik · 2024-03-12T10:55:20Z

I'll be working on the issue template.
Note: The Ersilia BioModels spreadsheet seems to be empty.

Zainab-ik · 2024-03-12T11:00:23Z

Adding to the questions above for our meeting with Sheriff:

For the machine learning ontology, which standard should we stick to: OBCS, MCRO, or STATO?
In situations where there are no related ontology terms for a metadata term, what's the way forward?
When the ontology term and the research paper term differ, which should we stick to?
BioModels identification numbers: how are they assigned? This might not be so important, but we can ask.
Requesting new ontology terms, for example for ZairaChem, etc.

miquelduranfrigola · 2024-03-12T11:29:21Z

I'd be working on the issue template. Note: The Ersilia BioModel spreadsheet seems to be empty.

Yes it is empty for now. Please add the two models that we are currently working on and then we will add more.

Zainab-ik · 2024-03-13T14:23:04Z

Update
After meeting with Sheriff:

Convention for naming models: first author, year, and a one-line description.
For example: Swanson2023 - ADMET Properties Predictions. Reference - eos7d58
Model descriptions are free text, with no fixed standard. The Ersilia model description is best used.
Extra analysis not related to the model's function should not be included in the annotation. It's a high-level annotation with a focus on discoverability.
The Chemical Information Ontology (CHEMINF) is quite interesting and suits the models better. It provides qualitative attributes for chemical entities.
A much better standard for ML ontologies would be STATO, as it encompasses quite a number of ML terms. I'm still exploring other options.
Other important ontologies to consider: Bioassay Ontology, Software Ontology.
If a metadata term doesn't exist in the ontology search (a non-resolvable situation), a new term can be added to it.
The ontology search term should be the standard in case of a metadata clash.
A resolver, identifiers.org, is used to standardize references, e.g. PubMed URLs.

@miquelduranfrigola Am I missing anything?
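
The naming convention and the identifiers.org referencing described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the PubMed accession in the example is a placeholder, not a real reference for any model.

```python
import re


def biomodels_name(first_author: str, year: int, description: str) -> str:
    """Build a name as '<FirstAuthor><Year> - <one-line description>'."""
    # Remove internal whitespace so multi-word surnames stay one token.
    author = re.sub(r"\s+", "", first_author.strip())
    return f"{author}{year} - {description.strip()}"


def compact_identifier_url(prefix: str, accession: str) -> str:
    """Return an identifiers.org URL for the compact identifier 'prefix:accession'."""
    return f"https://identifiers.org/{prefix}:{accession}"


print(biomodels_name("Swanson", 2023, "ADMET Properties Predictions"))
# Placeholder accession, used only to show the URL shape.
print(compact_identifier_url("pubmed", "12345678"))
```

Going through the resolver keeps references stable even if the underlying database changes its own URL scheme.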

miquelduranfrigola · 2024-03-15T13:05:46Z

Thanks @Zainab-ik - this is very useful. I don't think anything is missing.
Perhaps just mention that BAO is also an important ontology to consider.

Zainab-ik · 2024-03-18T15:45:27Z

Update!!!

Regarding citation:
Sheriff mentioned there's an option to indicate the Modeller while uploading the annotation files. The modeller is the one who incorporates the model into the Ersilia Model Hub. He mentioned he'd have a discussion with @GemmaTuron regarding this.
Model Annotation
I've completed the first 2 annotations and compared them with the initial annotation. I think ours is more detailed.
I included more model properties and used ontologies closer to the chemistry terms.
However, there are a couple of things to be done before finalizing.
Some ontologies aren't registered with the resolver, so I'm making requests for them at the moment. The annotations will be updated after the ontologies are published in the resolver registry. We are using the resolver for safe referencing and to standardize the URLs.
Although not yet finalized, I've added the 2 models, eos80ch and eos7kp, for review.

Zainab-ik · 2024-03-19T19:19:36Z

GitHub Issue Template

While discussing with @miquelduranfrigola, he suggested I create an issue template, open it for each model I'm annotating, link the issues to this main issue to keep track of the work, and finally close them after the model is uploaded to the BioModels repository.

Using the Ersilia issue template as a sample, I came up with a draft and I'd like a review before incorporating it into each model repository.
BioModel Incorporation Issue

One question about usage: would we have to open the issue in each model repository rather than in the general repository?

GemmaTuron · 2024-03-20T14:41:38Z

Hi @Zainab-ik

After our meeting today, please:

go ahead and open the issues in the two models we are working on following your proposed template. We will try it out and once we are happy with it, we will upload it to all repos as a template
Add the publications of the models in the folder
Finish the model annotations for both and add any questions / comments you might have on the issues, so we can initiate a discussion

From my side, I'll prioritize some further models for annotation. And we have decided that, once we have completed the annotation of at least 10 models, we will start thinking about:

validation of the models
automatically storing biomodel annotations in Ersilia

Zainab-ik · 2024-03-20T17:05:58Z

Following the meeting.

Step 1: I've created the issues in each model repository, linked here: eos80ch and eos7kbp.
Step 2: Both models' publications are uploaded in the folder and linked here: eos80ch and eos7kpb.

I'll work on completing the annotation. I've sorted out the compact identifiers with the EBI team. I'll also try uploading one model to BioModels with Sheriff to give a sample of what the issue template information would look like.

GemmaTuron · 2024-03-21T07:47:23Z

Hi @Zainab-ik

Thanks! This is looking good. As I stated in the model issues, I suggest we have two issues: one for discussion, and one we will only open once we know which data from BioModels we want to store in Ersilia as well.
If you agree, then let's go ahead and use the open issues to create those "discussion" issues around models eos80ch and eos7kbp so we can fully annotate these two and then proceed onto the next ones.
I'd say the second issue, to collect data from BioModels for storing in Ersilia, can be built once we have at least 10 models annotated and know better the kind of information we want to collect.

Zainab-ik · 2024-03-21T16:53:42Z

Hi @GemmaTuron

I've created the discussion issue around eos80ch and eos7kpb.

I've completed the annotation of eos80ch and I'd like your review before uploading.
Annotation of eos7kpb should be completed before tomorrow.
I'll make changes to the uploaded file since it's not a Google Sheet.

GemmaTuron · 2024-03-22T09:20:16Z

Thanks @Zainab-ik !
I have a few suggestions on the discussion template, let me know your thoughts

Zainab-ik · 2024-03-22T14:14:57Z

Hi @GemmaTuron

I've worked around the suggestions.
Completed the annotation for the 2 models, updated the link, and added metadata information for eos7kbp. I'm clear on the eos80ch model, and it's been uploaded. I'd share when it's available to the public, that'd be by tomorrow.

Do I go ahead and start working on the priority models in the sheet?

Also, there's an option of opening an account on BioModels to review submissions.
BioModels offers some ways to collaborate on, review, or access models:

1. Invite your team/colleagues/contributors to open an account on BioModels and then add them as model contributors. You can also grant write or read permission to these contributors.
2. You can also request a reviewer account. This option is for when the scientific manuscript is in the middle of the review process and reviewers ask to look into your model. Of course, this type of account only gives read-access permission.

I think 1 applies to us. I could share my submission for review.
Either @GemmaTuron or @miquelduranfrigola or both can have an account. What do you think?

miquelduranfrigola · 2024-03-25T07:23:39Z

@GemmaTuron feel free to take the lead here 👍
Thanks @Zainab-ik for a very clear update.

GemmaTuron · 2024-03-25T09:32:05Z

Hi @Zainab-ik !

Thanks, good start! Feedback from today's meeting:

Let's consolidate both models, eos80ch and eos7kbp. Some fields are only present in one but apply to both (like Blood, Machine Learning...). We will annotate these two models with the maximum depth possible and make the changes we discussed in the meeting
Get feedback from the BioModels team on which fields are redundant and should not be added (like Machine Learning)
Improve DOME annotation with more granularity
Start working on the three other models in the list

If you are done with all the tasks before our next meeting, I suggest you have a look at the model incorporation that is still midway, but this is lower priority

Zainab-ik · 2024-03-25T14:11:18Z

Feedback from BioModels (Sheriff) !!

For proprietary data, the URL should be added if it's available. If not, the proprietary data should still be mentioned in the metadata for transparency. For eos7kbp, I added it and annotated it with a suitable ontology since there's no URL available.
General output can be added for comprehensiveness; however, specific output is preferred. I retained the general output, let me know if I should do otherwise.
Broader terms like machine learning and artificial intelligence should be added to enhance findability. BioModels will undergo an upgrade and broader terms might not be essential in future, but for now it's important.
Infectious disease is a central theme across all Ersilia models, so it is important to annotate the models with infectious disease, and it should be a standard.
Since active/inactive can be central to training data, it should be included.
For terms with synonyms, such as compound/molecule, one is enough. I picked compound since it's a more suitable term.

I've incorporated all the feedback into the two models. I believe both models are fully annotated.

Based on the feedback

The following are/would be standard metadata in all models:

Infectious diseases
Compounds
Small molecules
Active & Inactive
Machine learning & Artificial Intelligence
Drug discovery
SMILES (input)
Ersilia implementation
PubMed ID
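
As a minimal sketch, the standard list above could be checked automatically against a model's annotation. The term spellings, the `missing_standard_terms` helper, and the example annotation are assumptions made for illustration, not part of any existing Ersilia or BioModels tooling.

```python
# Standard metadata terms as listed in this comment (lower-cased for matching).
STANDARD_TERMS = {
    "infectious diseases",
    "compounds",
    "small molecules",
    "active",
    "inactive",
    "machine learning",
    "artificial intelligence",
    "drug discovery",
    "smiles",
    "ersilia implementation",
    "pubmed id",
}


def missing_standard_terms(annotated_terms):
    """Return the standard terms absent from an annotation, sorted by name."""
    present = {term.strip().lower() for term in annotated_terms}
    return sorted(STANDARD_TERMS - present)


# Hypothetical, incomplete annotation used only to exercise the check.
example = ["Infectious diseases", "Compounds", "SMILES", "Machine learning"]
print(missing_standard_terms(example))
```

A check like this would only compare labels; it says nothing about whether each term is mapped to the right ontology entry.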

Zainab-ik · 2024-03-25T17:29:46Z

Update!!!

DOME annotation completed and both models are up on BioModels.
eos7kpb - https://www.ebi.ac.uk/biomodels/MODEL2403270001
eos80ch - https://www.ebi.ac.uk/biomodels/MODEL2403270002

This has been linked in the respective repository.

Zainab-ik · 2024-03-28T14:40:34Z

eos46ev !!!

I opened an issue here and listed a few comments from both papers and the Ersilia implementation, also listed below:

For this model, 4 ML algorithms were used in building the models. I added all of them to the metadata, considering that the final model (deployed to the web server) is a combination of all.
Although the paper states that XGBoost is the best, the final model is a fusion of 4 algorithms: Random Forest, Deep Neural Network, Support Vector Machine, and XGBoost.
Looking at the repository, I realised Ersilia only implemented the XGBoost model. Does that make the other algorithms unimportant as metadata?

More detailed comments/questions are in the issue here
The curation/annotation is completed and can be accessed here

Zainab-ik · 2024-03-28T15:21:45Z

eos4e40 !!!

I opened an issue here, and added a comment below:

Halicin was discovered with the DNN model. As an important part of the paper, would it be an important metadata? And in which category (property or output)?
Halicin has bactericidal activity against Mycobacterium tuberculosis and carbapenem-resistant Enterobacteriaceae. Do these classify as biological properties of the model?

A quick question

I realized the use of the terms active, inactive, hit, and non-hit when describing data binarization depends on the paper. How do we pick a standard then? They are all mapped to ontology terms except non-hit.

The curation/annotation can be accessed here

Zainab-ik · 2024-03-28T16:42:26Z

eos5xng !!!

I opened an issue here, and added a comment below:

1. ESKAPE pathogen inhibition is the experimental validation of the AI model, if I'm right? If yes, then those pathogens do not classify as a taxonomy in the metadata.
2. For the model training and prediction, both classification and regression tasks were performed. The Ersilia model only performs classification, so that should be the only one included in the metadata, right?
3. Both RMSE and MAE scores are evaluation metrics for regression tasks. If 2 is yes, then both methods would apply.

The curation/annotation is completed and linked here

Zainab-ik · 2024-04-02T12:01:21Z

An open-ended Question

"How much of the model properties, i.e. core model properties (e.g., packages, libraries, open source software), should be curated and annotated?"
Examples below:

XGBoost Python package
Keras deep learning Python package
TensorFlow
AtomPairs fingerprints, etc.

GemmaTuron · 2024-04-03T08:49:38Z

Hi @Zainab-ik,

Good job, thanks for the updates, please find below some comments:

I do not understand this sentence: For Proprietary data, URL should be added if it's available. If not, it should be included in the metadata for transparency. For eos7kbp, I added it and annotated it with a suitable ontology since there's no URL available. As it is proprietary data, it will never have an available URL as the data is not shared. What do you mean you have added it?
Regarding the updated models, please do not update them on BioModels until I have revised them and given the final OK. Remember to use this excel to track progress: if the model is still "To review", it means it has not yet been approved. This way we can be sure all the information in BioModels is 100% correct
Some of the links in the BioModels website seem broken, could you check that?
Let's consolidate the tags for all models. Can you share with me what is the list of available tags?
Are Active / Inactive properties or Outputs?

Zainab-ik · 2024-04-03T09:07:16Z

Hi @Zainab-ik,

Good job, thanks for the updates, please find below some comments:

Thank you @GemmaTuron

I do not understand this sentence: For Proprietary data, URL should be added if it's available. If not, it should be included in the metadata for transparency. For eos7kbp, I added it and annotated it with a suitable ontology since there's no URL available. As it is proprietary data, it will never have an available URL as the data is not shared. What do you mean you have added it?

For this, I added the H3D Proprietary term as a metadata and annotated it with a suitable ontology and the ontology link. I didn't mean that I added the proprietary data link. Sheriff mentioned the term should be added for transparency.

Regarding the updated models, please do not update them on BioModels until I have revised them and given the final OK. Remember to use this excel to track progress, if the model is still "To review" means it has not yet been approved - this way we can be sure all the information in biomodels is 100% correct

Noted @GemmaTuron. That was uploaded as a sample to give an insight into how the overview would look and to see if there are any comments or changes the Ersilia team would like. I'd appreciate feedback on that. The upload can always be updated.

Some of the links in the BioModels website seem broken, could you check that?

I'll inform the BioModels team. Could you please specify which links, so I can mention them exactly?

Let's consolidate the tags for all models. Can you share with me what is the list of available tags?

This is the list of tags available. A new one can be proposed if that'd be more suitable for Ersilia models.

Are Active / Inactive properties or Outputs?

They are properties. More like data properties very relevant to the model.

Zainab-ik · 2024-04-23T10:39:57Z

I created a new tag in BioModels called Ersilia and that'd be attached to all models.

Antimicrobial models annotation

eos24jm - issue
eos5cl7 - issue
eos18ie - issue

Questions

Can all the drug discovery models be referred to as QSAR models?
If an animal model is used to perform experimental validation of the model, should that be added as a biological property of the model, i.e., taxonomy?

Zainab-ik · 2024-04-25T22:06:58Z

SARS-COV2 model annotation

eos8fth - issue
eos4cxk - issue
eos9f6t - issue

Regarding eos9f6t - The publication here is the same as eose40, but this is SARS-COV2 inhibition. The paper discusses antibiotics, but SARS-COV2 should be antiviral. Can you clarify, please?

GemmaTuron · 2024-04-26T10:41:54Z

Hi @Zainab-ik !

Good job thanks for keeping it up! I have answered your questions in the respective models and below the general ones:

I created a new tag in BioModels called Ersilia and that'd be attached to all models. - Fantastic!
Questions
Can all the drug discovery models be referred to as a QSAR model? - Mmm, at the moment most of the models we have are QSAR, yes, but that might not be true in the future. @miquelduranfrigola what do you say here?
If an animal model is used to perform experimental validation of the model, should that be added as a biological properties of the mode i.e.,taxonomy - I don't think so, this is related to the validation but not to how the dataset for the model was built.
The publication here is the same as eose40 but this is SARS-COV2 Inhibition. the paper is discussing antibiotics but SARS-COV2 should be antiviral. Can you clarify please. - The antiviral model does not have a publication per se, but they developed it in parallel with the antibiotic predictor, using ChemProp. Since the antibiotic prediction paper is the one which describes the original ChemProp development, it is the most appropriate citation

Zainab-ik · 2024-04-26T15:56:53Z

Hi @Zainab-ik !

Good job thanks for keeping it up! I have answered your questions in the respective models and below the general ones:

Thank you @GemmaTuron

I created a new tag in BioModels called Ersilia and that'd be attached to all models. - Fantastic!
Questions

Can all the drug discovery models be referred to as a QSAR model? Mmm at the moment, most of the models we have are QSAR yes, but that might not be true in the future. @miquelduranfrigola what do you say here?

That's great. That'd mean QSAR should be a constant metadata term, right? Just a thought: can a generative model classify as QSAR too?

If an animal model is used to perform experimental validation of the model, should that be added as a biological properties of the model i.e.,taxonomy I don't think so, this is related to the validation but not how the dataset for the model was built.

Okay, that's clarified. What if an experimental method (in-vivo, precisely) is used to generate the dataset? Should the experimental method and the in-vivo model then be added as metadata?

The publication here is the same as eose40 but this is SARS-COV2 Inhibition. the paper is discussing antibiotics but SARS-COV2 should be antiviral. Can you clarify please. - The antiviral model does not have a publication per se, but they developed it in parallel with the antibiotic predictor, using the ChemProp. Since the antibiotic prediction paper is the one which describes the original ChemProp development, is the most appropriate citation

The metadata would be the same except for the organism and output, plus an added antiviral metadata.

Zainab-ik · 2024-05-01T12:52:30Z

@GemmaTuron All SARS-COV2 model annotations (eos8fth, eos4cxk, eos9f6t) are ready for review.

Zainab-ik · 2024-05-07T11:01:50Z

Hi @GemmaTuron

A few clarifications from the meeting:

Experimental methods come up in both data generation and model validation. Proposed representation in the annotation and curation:

If it's data generation - a DOME annotation identifying the data source, using metadata like in-vivo or in-vitro, e.g.,
in-vivo model - data source
in-vitro model - data source
If it's model validation - a DOME annotation identifying the evaluation, e.g.,
in-vivo model - model validation
Does this best describe the experimentation part of the model?

Organisms without taxonomy are annotated as properties, right?
Model validation data sources aren't an essential part of the model and shouldn't be metadata.
All models are QSAR at the moment, so QSAR should be a constant metadata term.
Removal of not-so-important metadata, e.g., hits.
Evaluation metrics that were not used shouldn't be added.
Hackathon schedule.

GemmaTuron · 2024-05-07T11:04:26Z

Hi @Zainab-ik !
I have reviewed the models, please amend them and then upload to BioModels. A few general comments from our meeting:

There are general fields that do not add information. Please revise all the models and let's agree on which fields we do not want to add (Hit, Compound Identification...). Please list them here so we know we won't be using them
Likewise, there are general fields that should be everywhere, like QSAR
The in vitro model and in vivo model should only refer to the model validation in the laboratory. Please make sure to annotate the models accordingly and use the DOME to specify
The libraries used for model validation should not be listed as data sources

After redoing the current models to review, let's get back to the old ones before we move onto the new ones. Feel free to reopen the issues and note the changes that should be made

Zainab-ik · 2024-05-07T11:09:50Z

A clarification regarding in-vivo and in-vitro: if it's used for data generation, it's not to be added, right @GemmaTuron?

GemmaTuron · 2024-05-07T11:19:47Z

A clarification regarding the in-vivo and in-vitro, if it's used for data generation, it's not to be added, right @GemmaTuron

exactly, all data has been eventually generated experimentally, so it is not that relevant to collect this information

Zainab-ik · 2024-05-08T10:43:56Z

General fields that do not add information:

Hit
Molecular representation
Chemical libraries
I think the MACCS key is similar to RDKit and shouldn't be added either.
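
A short sketch of how the agreed exclusions could be applied when revising existing annotations. The helper name and the example terms are hypothetical, and MACCS is deliberately left out of the exclusion set because it is still under discussion.

```python
# Fields listed above as not adding information (lower-cased for matching).
NON_INFORMATIVE = {"hit", "molecular representation", "chemical libraries"}


def drop_non_informative(terms):
    """Keep annotation terms that are not on the exclusion list, preserving order."""
    return [t for t in terms if t.strip().lower() not in NON_INFORMATIVE]


# Hypothetical annotation used only to show the effect of the filter.
annotation = ["QSAR", "Hit", "Molecular representation", "XGBoost"]
print(drop_non_informative(annotation))
```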

GemmaTuron · 2024-05-08T10:49:00Z

Hi @Zainab-ik

I agree with most of them, but MACCS keys are a different type of descriptor. If the model is using RDKit descriptors we should annotate that, and if it is using MACCS we should annotate it. Maybe we should think about whether we want to annotate all the different descriptors used

Zainab-ik · 2024-05-08T11:09:05Z

That's right. The only challenge is that MACCS and RDKit are the only descriptors present in OLS that can be annotated.

Zainab-ik · 2024-05-09T08:47:36Z

New Models

eos4zfy - issue
eos6hy3 - issue
This publication is the same as eos4cxk, and the same rules that apply to eos4e40 and eos9f6t can apply here, right?
eos42ez - issue
This publication is the same as eos18ie, and the same rule applies.
eos31ve - issue
This is also the same publication as eos9yy1

Zainab-ik · 2024-05-09T09:18:57Z

Antimicrobial and COVID models uploaded to BioModels

eos3804 - https://www.ebi.ac.uk/biomodels/MODEL2405080001
eos18ie - https://www.ebi.ac.uk/biomodels/MODEL2405080002
eos5cl7 - https://www.ebi.ac.uk/biomodels/MODEL2405080003
eos24jm - https://www.ebi.ac.uk/biomodels/MODEL2405080004
eos4cxk - https://www.ebi.ac.uk/biomodels/MODEL2405080005
eos9f6t - https://www.ebi.ac.uk/biomodels/MODEL2405080006

GemmaTuron · 2024-05-09T14:35:24Z

Hey @Zainab-ik

Before starting with new models, can you have a look at the existing ones and make sure they all comply with the latest decisions we have made? Note down here any changes that had to be made in the annotations.

thanks!

Zainab-ik · 2024-05-09T14:54:33Z

Yes, working on that.

Zainab-ik · 2024-05-10T09:25:35Z

Previous Model review
Summary - Removed general metadata and confirmed experimental validation

eos46ev - removed unnecessary metadata, e.g., molecular representation; confirmed there's no experimental validation.
eos5xng - edited the metadata. Removed: hit selection, chemical library, compound, validation dataset, in-silico approach (it's also a general term). Added in-vitro experimental validation.
eos4e40 - model was validated experimentally in-vivo and in-vitro, so both metadata were added; QSAR added; data source confirmed; non-specific metadata removed, e.g., chemical library, molecular representation.
NCATS CYP Models: eos44zp, eos5jz9, eos7nno, eos3ev6.
Metadata removed: chemical library, hit, molecular representation.
NCATS Permeability Models: eos81ew, eos9tyg.
Metadata removed: molecular representation, permeability assay (there's already a PAMPA metadata).
NCATS Stability Models: eos5505, eos9yy1.
Metadata removed: in-silico model, molecular representation, chemical library, CYP metabolism (doesn't fit the context of the model), compound stability.
NCATS Solubility model: eos74bo.
Metadata revised: organic molecule, hit

Zainab-ik · 2024-05-12T19:44:00Z

Regarding the first 2 models, eos7kpb and eos80ch:

eos7kpb:
Physicochemical Assays
Clearance
Solubility assay
Cytotoxicity
Aqueous solubility
Permeability assay
Microsomal metabolic stability
These metadata aren't integral to the ZairaChem model, so I wanted to run them by you first.
eos80ch:
Removed the following metadata: compound screening, phenotype, molecular representation, parasites.

GemmaTuron · 2024-05-13T09:35:45Z

Hi @Zainab-ik

Good on the corrections, as we discussed let's leave all the biological endpoints on eos7kpb

Zainab-ik · 2024-05-14T11:31:50Z

Update:
eos4zfy ready for review.

BioModels Upload;

All revised models have been re-uploaded
New model upload

eos6hy3 - https://www.ebi.ac.uk/biomodels/MODEL2405130001
eos42ez - https://www.ebi.ac.uk/biomodels/MODEL2405130002
eos31ve - https://www.ebi.ac.uk/biomodels/MODEL2405130005

To-dos

Create a google form for the upcoming hackathon - https://forms.gle/4xadZBjvP2SfgY1b9
Design a flier - here
Prepare a slide deck

Zainab-ik · 2024-05-14T13:52:23Z

Automating Metadata Annotation using Zooma
This process involves automatically mapping the right ontology term to each metadata term to speed up the annotation process.
For this process, I'll be starting with these two models.

Steps:

Extract relevant metadata manually
Paste the metadata into Zooma to annotate
Compare annotation accuracy with manual annotation.

Comments/Observations

Biological component mapping for organism has high accuracy
Biological component mapping for property is average
Computational component mapping for property is low.
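
The comparison in the last step could be quantified roughly as below, assuming each metadata term has one manually chosen ontology ID and at most one automatic suggestion. The function name, the category labels, and every ID in the toy rows are made up for illustration; they are not results from the actual Zooma runs.

```python
from collections import defaultdict


def agreement_by_category(rows):
    """Fraction of terms per category where the automatic ID equals the manual ID.

    rows: iterable of (category, manual_term_id, automatic_term_id or None).
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for category, manual_id, auto_id in rows:
        totals[category] += 1
        # A missing suggestion (None) counts as a disagreement.
        if auto_id is not None and auto_id == manual_id:
            matches[category] += 1
    return {category: matches[category] / totals[category] for category in totals}


# Toy rows with placeholder ontology IDs (not real mappings).
rows = [
    ("organism", "ONT:0001", "ONT:0001"),
    ("biological property", "ONT:0002", "ONT:0002"),
    ("biological property", "ONT:0003", "ONT:0099"),
    ("computational property", "ONT:0004", None),
]
print(agreement_by_category(rows))
```

Exact ID matching is a strict criterion; a suggestion that is a parent or synonym of the manual term would count as a miss here.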

Zainab-ik · 2024-05-17T09:17:22Z

Coloring molecules model annotation

eos6ao8 - issue
eos1af5 - issue
eos43at - issue

All ready for review.

More permeability model annotation

eos97yu - issue
eos2hbd - issue
Ready for review.

Zainab-ik · 2024-05-21T11:08:06Z

Models uploaded to BioModels

eos4zfy - https://www.ebi.ac.uk/biomodels/MODEL2405210002
eos96ia - https://www.ebi.ac.uk/biomodels/MODEL2405210003
eos8d8a - https://www.ebi.ac.uk/biomodels/MODEL2405210004
eos6ao8 - https://www.ebi.ac.uk/biomodels/MODEL2405210005
eos1af5 - https://www.ebi.ac.uk/biomodels/MODEL2405210006
eos43at - https://www.ebi.ac.uk/biomodels/MODEL2405210007

Zainab-ik · 2024-05-21T11:31:34Z

New model Annotation - In Progress

eos2lqb - issue
eos6oli - issue
eos7d58 - issue
eos8lok - issue

Note: I've been working with a lot of regression models recently, which is quite exciting. One of the evaluation metrics is root-mean-square error (RMSE), which, from my reading, I believe is also known as RMSD. On OLS, RMSE doesn't exist but RMSD does, and I've been using that in my annotation.

GemmaTuron · 2024-05-23T10:16:58Z

Hi @Zainab-ik !

I'm having a look at the models you are annotating, let me know when the excel files are ready - RMSE and RMSD are the same ;)
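
For reference, a minimal sketch of the metric in question: the root of the mean squared difference between observed and predicted values. The function names are illustrative, and `rmsd` is simply an alias because both names describe the same computation.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# RMSD (root-mean-square deviation) is the same formula under another name.
rmsd = rmse

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```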

Zainab-ik · 2024-05-23T10:24:55Z

Alright, Thanks @GemmaTuron

Zainab-ik · 2024-05-27T21:29:35Z

Hi @GemmaTuron
All models are ready for review except eos7d58. It has a broad range of outputs, and I'd like to confirm whether all of them are incorporated into the Ersilia version.

Zainab-ik · 2024-05-29T10:27:49Z

Grover Models

Is Grover a framework/code base like Chemprop that's fine-tuned and trained on different datasets for different outputs?
What is labelled and unlabelled molecular data?
What's the difference between pre-training and training an ML/DL model?
There's no clear mention of how the models were evaluated, except for comparison with other models based on the mean and standard deviation. There's also a mention of % relative improvement; can that be classified as accuracy? Are these regarded as the model evaluation metrics? (In the author-feedback section, AUC-ROC was mentioned as the metric for comparison.) - This is the metric for Grover.
How is the fine-tuning task evaluated? Say Grover was trained to predict water solubility, as is the case for grover-esol (eos8451): how is the model's performance judged to be good or not? - In the supplementary file, ROC-AUC is the metric for the classification tasks, RMSE is the metric for the physical chemistry regression tasks, and MAE is the metric for the quantum mechanics regression tasks. (It feels like I'm answering myself 🙂.)
Can you kindly clarify validation loss and training loss?
Thanks.
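As a rough illustration of the three metrics named in the supplementary file, the sketch below computes MAE and ROC-AUC in plain Python and records which metric goes with which task type. The dictionary `METRIC_BY_TASK`, the function names, and the toy numbers are assumptions made for illustration; they are not taken from the Grover code base.

```python
# Task type -> evaluation metric, as stated in the supplementary file.
METRIC_BY_TASK = {
    "classification": "ROC-AUC",
    "physical chemistry regression": "RMSE",
    "quantum mechanics regression": "MAE",
}


def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(labels, scores):
    """ROC-AUC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy values (made up) to show the calls.
print(mae([1.0, 2.0, 3.0], [2.0, 2.0, 5.0]))             # 1.0
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))      # 0.75
```

For ROC-AUC, higher is better (0.5 is random ranking, 1.0 is perfect); for RMSE and MAE, lower is better, and the values are in the units of the predicted property.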

General comments about the Grover model

The metadata is determined by what task the Grover model is fine-tuned on.
Grover was leveraged for molecular property prediction and task-specific fine-tuning. We'd be annotating the task-specific fine-tuned models, taking note of the specific dataset, the type of task (classification/regression), and the predictions.
In the context of data-splitting for fine-tuning, active and inactive suits...
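One way to picture the task-specific fields listed above is as a small record per fine-tuned model. This is only a sketch: the class `FineTuningAnnotation` and its field names are hypothetical and do not reflect the actual Ersilia or BioModels metadata schema. The example values for eos8451 follow the grover-esol description earlier in this thread, assuming the dataset is ESOL.

```python
from dataclasses import dataclass, asdict

ALLOWED_TASK_TYPES = {"classification", "regression"}


@dataclass(frozen=True)
class FineTuningAnnotation:
    """Hypothetical record of the fields noted for a fine-tuned model."""

    ersilia_id: str          # Ersilia model identifier
    base_model: str          # model that was fine-tuned
    dataset: str             # task-specific dataset
    task_type: str           # "classification" or "regression"
    predicted_property: str  # what the model outputs

    def __post_init__(self):
        # Reject task types outside the two the thread mentions.
        if self.task_type not in ALLOWED_TASK_TYPES:
            raise ValueError(f"unknown task type: {self.task_type!r}")


# grover-esol predicts water solubility, a regression task.
esol = FineTuningAnnotation(
    ersilia_id="eos8451",
    base_model="GROVER",
    dataset="ESOL",
    task_type="regression",
    predicted_property="water solubility",
)
print(asdict(esol))
```

Keeping the base model and the task-specific fields separate makes it clear which parts of the metadata are shared across all Grover models and which change with each fine-tuning dataset.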

Zainab-ik · 2024-05-30T18:33:51Z

eos7w6n - This is the base model (GROVER) that was fine-tuned on task-specific datasets.

Grover Models - Annotation in Progress (Metadata extraction and curation done)

eos3xip - issue
eos6o0z - issue
eos85a3 - issue
eos8451 - issue
eos157v - issue
eos481p - issue
eos2mhp - issue
eos6fza - issue
eos5smc - issue
eos7w6n - issue
eos77w8 - issue
eos1amr - issue

Zainab-ik · 2024-05-31T15:44:29Z

All models ready for review.

miquelduranfrigola added the documentation Improvements or additions to documentation label Mar 11, 2024

miquelduranfrigola assigned GemmaTuron and miquelduranfrigola Mar 11, 2024

miquelduranfrigola unassigned GemmaTuron Mar 11, 2024

miquelduranfrigola assigned Zainab-ik Mar 12, 2024
