BRENDA content collaboration #2

Midnighter · 2018-02-20T10:40:32Z

Hi,

I'm currently working on upgrading my parser for the BRENDA flat file download. I've implemented a few SQLAlchemy models that seemed fitting for the content. Is there any interest on your side in the content of BRENDA?

jonrkarr · 2018-11-23T20:42:11Z

Rik van Rosmalen has also written a BRENDA parser
https://gitlab.com/wurssb/brenda-parser

Currently it dumps all of BRENDA into either a SQLite DB or a JSON file.

One of the main issues right now is that BRENDA's download does not include a metabolite reference table or any cross-references. However, UniChem does cross-reference metabolites to BRENDA via InChI, and all of its data is open. This could make integration possible.

jonrkarr · 2020-04-22T23:39:15Z

@Midnighter, we're finally starting to work on BRENDA. We're trying to determine if BRENDA contains a record of the reaction associated with each K_cat and K_m (which SABIO-RK clearly displays). Neither the website nor the text file shows this information, but BRENDA's SBML output seems to contain it. I suspect that the SBML output contains inferred kinetic parameters, rather than directly measured kinetic constants. Do you know what information is encoded in the SBML output?

Any code we write will be shared via this repo.

We tried to use Rik's code. Unfortunately, it appears to be out of date with respect to the current format of the BRENDA text file.

Midnighter · 2020-04-23T08:58:41Z

> We're trying to determine if BRENDA contains a record of the reaction associated with each K_cat and K_m

I don't fully understand what you want to achieve. Given a specific Kcat or Km value, you want to list all reactions (by EC-code) that have this value? This should be possible with a SQL query. However, there are many reactions in BRENDA that specify Kcat and Km as ranges rather than fixed values. The same EC-code can also have different Kcat and Km values in different organisms, of course.

I still haven't finished my BRENDA work as it was not high priority to me. I do have a branch that uses pyparsing to go over the flat file and it's quite promising. I can try to deliver a working version by the end of May.

jonrkarr · 2020-04-23T13:31:02Z

SABIO-RK contains information about the exact reaction associated with each measured kinetic parameter. In addition, SABIO-RK often presents pairs of kinetic parameters that were measured together (e.g., paired k_cat, K_m).

In contrast, the BRENDA website, text file, and SOAP interface present coarser information. This is why we have preferred to work with SABIO-RK, even though SABIO-RK is also difficult to scrape. The BRENDA website only displays the EC number associated with each kinetic measurement, and the website doesn't present pairs of parameters.

It appears that BRENDA annotates reactions more coarsely than SABIO-RK. However, BRENDA's SBML output suggests that the underlying BRENDA database might have finer-grained reaction information than what is presented in the BRENDA website, text file, and SOAP interface. We haven't found any documentation about the SBML output. We're trying to understand what those files mean, and if they are a way to pull more information out of BRENDA than what is provided in the text file.

Midnighter · 2020-04-23T14:03:57Z

I have not found a way to reliably scrape all SBML output files from BRENDA; I think this previously required paid access. It would, of course, be preferable to the terrible text format.

With regard to the information that you are looking for: BRENDA gives entries for the K_cat value divided by the K_m value, for example,

KKM	#2# 314 (#2# recombinant isozyme, pH 7.5, 30°C <45>) <45>

So one could look at the matching K_m value (by protein and citation), in this case

KM	#2# 0.165 {GMP}  (#2# recombinant isozyme, pH 7.5, 30°C <45>) <45>

FYI, this is for EC-code 2.7.4.8 and this specific entry is for

PR	#2# Bacillus subtilis   <45>

So that would give you what you are looking for?
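For illustration, the single-line entries quoted above could be split into fields with a regular expression. This is a minimal sketch, not a full BRENDA parser: `ENTRY_RE` and `parse_entry` are hypothetical names, and the pattern assumes only the layout of the `KKM` and `KM` lines quoted here (field code, `#protein#`, value, optional `{substrate}`, parenthesised comment, `<reference>`). Wrapped entries are handled only in so far as whitespace is collapsed first.

```python
import re

# Hypothetical pattern, derived only from the KM/KKM lines quoted in this thread.
ENTRY_RE = re.compile(
    r'^(?P<field>[A-Z]+)\s+'               # field code, e.g. KM, TN, KKM
    r'#(?P<proteins>[\d,]+)#\s+'           # protein id(s), e.g. #2#
    r'(?P<value>\S+)'                      # value, kept as text (may be a range)
    r'(?:\s+\{(?P<substrate>[^}]*)\})?'    # optional {substrate}
    r'\s+\((?P<comment>.*)\)'              # parenthesised comment
    r'\s+<(?P<refs>[\d,]+)>$'              # literature reference(s), e.g. <45>
)


def parse_entry(text):
    """Return a dict of fields, or None if the text does not fit the pattern."""
    # Collapse tabs, repeated spaces, and line breaks from wrapped entries.
    normalized = " ".join(text.split())
    match = ENTRY_RE.match(normalized)
    if match is None:
        return None
    return {
        "field": match.group("field"),
        "proteins": match.group("proteins").split(","),
        "value": match.group("value"),
        "substrate": match.group("substrate"),
        "comment": match.group("comment"),
        "refs": match.group("refs").split(","),
    }


km = parse_entry("KM\t#2# 0.165 {GMP}  (#2# recombinant isozyme, pH 7.5, 30°C <45>) <45>")
kkm = parse_entry("KKM\t#2# 314 (#2# recombinant isozyme, pH 7.5, 30°C <45>) <45>")
```

With both entries parsed, the `KKM` value can be looked up against the `KM` entry that shares the same `proteins`, `comment`, and `refs`.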

jonrkarr · 2020-04-23T14:16:30Z

Basically, we're trying to infer the link between the SP entries and the TN, KM, and KKM entries.

I don't think the BRENDA text files provide enough information to reconstruct this.

Each PR entry can be associated with multiple SP entries
Each PR entry can have multiple associated KM, TN, and KKM entries, far more than the number of substrates or products of a single reaction.
Each RF entry can be associated with many PR, KM, TN, and KKM entries

This is what motivated us to look at the other BRENDA outputs, to try to extract this mapping out of BRENDA.

jonrkarr · 2020-04-23T14:21:14Z

I'll contact BRENDA to ask them about the SBML output. I can share what I learn.

Midnighter · 2020-04-23T14:22:36Z

It would be super nice to just get a database dump rather than having to jump through so many hoops.

jonrkarr · 2020-04-23T14:28:50Z

I'm looking to understand if the text file lacks relationships between KM and TN entries that the underlying database captures, and if these relationships are captured, I'd like to obtain this information.

A database dump would be nice. Any format with this relational information would be an improvement.

Midnighter · 2020-04-23T14:37:14Z

I still think it's possible to tell these apart, however, if you look at the comment in each entry.

TN	#2# 52 {GMP}  (#2# recombinant isozyme, pH 7.5, 30°C <45>) <45>

There is only one entry in each section that has the same protein reference #2#, comment (...) and literature reference <45>.

I'm not sure what you gain from the SP entry. The substrate is already provided in the KM and TN entries.

So if you start with KM or TN entries you should be able to identify all the information that you need?

I've only looked at a few examples, though, so I'm easily proven wrong. Also, it'd be painful to parse the information in this way so something structured is definitely preferable 👍
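A rough sketch of that matching, assuming the entries have already been parsed into fields: group by (protein, substrate, comment, reference) and pair a KM with a TN only when the group holds exactly one of each. The `Entry` type and `pair_entries` function are hypothetical names for illustration; the two sample entries are the GMP lines quoted in this thread.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    field: str                # "KM" or "TN"
    protein: str              # protein id, e.g. "2"
    value: str                # kept as text, since BRENDA also reports ranges
    substrate: Optional[str]
    comment: str
    reference: str


def pair_entries(entries):
    """Pair KM and TN entries that share protein, substrate, comment, and reference."""
    groups = defaultdict(lambda: defaultdict(list))
    for entry in entries:
        key = (entry.protein, entry.substrate, entry.comment, entry.reference)
        groups[key][entry.field].append(entry.value)

    pairs, ambiguous = [], []
    for key, by_field in groups.items():
        kms = by_field.get("KM", [])
        tns = by_field.get("TN", [])
        if len(kms) == 1 and len(tns) == 1:
            pairs.append((key, kms[0], tns[0]))
        elif kms and tns:
            # More than one candidate on either side: do not guess.
            ambiguous.append(key)
    return pairs, ambiguous


entries = [
    Entry("KM", "2", "0.165", "GMP", "recombinant isozyme, pH 7.5, 30°C", "45"),
    Entry("TN", "2", "52", "GMP", "recombinant isozyme, pH 7.5, 30°C", "45"),
]
pairs, ambiguous = pair_entries(entries)
```

Keys with several candidates are reported separately rather than paired, so the caller can decide whether to drop them or check them by hand.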

jonrkarr · 2020-04-23T14:52:15Z

It shouldn't be this hard.

Inferring the reaction associated with each `KM`, `TN` entry from the substrate information

The substrate of each KM or TN entry doesn't contain information about the entire reaction. The reaction can't be inferred from the substrate because the metabolite can participate in multiple reactions.

For example, you can't infer the reaction associated with this TN

TN      #114# 1646 {NADH}  (#114# cosubstrate acetaldehyde, pH 8.0, 60°C <215>)
        <215>

because multiple SP entries involve NADH

SP      #96# hexaldehyde + NADH + H+ = 1-hexanol + NAD+ (#96# 7% activity
        compared to benzyl alcohol <156>) <156>
SP      #96# hydrocinnamaldehyde + NADH + H+ = hydrocinnamyl alcohol + NAD+
        (#96# 12% activity compared to benzyl alcohol <156>) {r} <156>
SP      #96# nonyl aldehyde + NADH + H+ = 1-nonanol + NAD+ (#96# 25% activity
        compared to benzyl alcohol <156>) <156>
SP      #96# octyl aldehyde + NADH + H+ = 1-octanol + NAD+ (#96# 29% activity
        compared to benzyl alcohol <156>) <156>

Inferring pairs of `KM`, `TN`, `KKM`, `SP` from unique tuples of substrates, comments, and references

This is an interesting idea. This might work for inferring relationships between KM and TN entries. I don't think this will work for inferring relationships between KKM and other entries because they don't include substrates. The SP entries don't appear to have the same comments as KM and TN entries.

Example from 1.1.1.1:

KKM	#115# 3.6 (#115# cosubstrate NADP+, pH 8.0, 60°C <215>) <215>
KKM	#115# 67.2 (#115# cosubstrate NADP+, pH 8.0, 60°C <215>) <215>
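To make the ambiguity concrete, here is a small sketch that counts how many KKM entries share the same (protein, comment, reference) key. The tuple layout and the `ambiguous_keys` helper are assumptions for illustration; the data are the two 1.1.1.1 lines above.

```python
from collections import Counter

# (protein id, value, comment, reference), taken from the two KKM lines above.
kkm_entries = [
    ("115", "3.6", "cosubstrate NADP+, pH 8.0, 60°C", "215"),
    ("115", "67.2", "cosubstrate NADP+, pH 8.0, 60°C", "215"),
]


def ambiguous_keys(entries):
    """Return the (protein, comment, reference) keys shared by more than one entry."""
    counts = Counter(
        (protein, comment, reference)
        for protein, _value, comment, reference in entries
    )
    return [key for key, count in counts.items() if count > 1]


duplicates = ambiguous_keys(kkm_entries)
```

Both values fall under one key, so nothing in these fields says which KM or TN entry each KKM value belongs to.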

Midnighter · 2020-04-23T15:00:20Z

Okay, that's a clear counter example. Let's see if you get a reply from BRENDA. I tried once some years back and never got an answer. I was probably not persistent enough.

Given the way that the textual data is structured, I would definitely manually check a number of examples to see if the associations presented by BRENDA are correct...

jonrkarr · 2020-04-23T15:02:20Z

FYI, I think the SBML output would also be difficult to use. It times out easily. You'd have to figure out how to make the queries small enough not to time out. One possibility is to iterate over each EC and each organism.

for ec_code in ec_codes:
    for organism in organisms:
        get_sbml(ec_code, organism)
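A slightly fuller sketch of that loop, with retries on timeout. The download itself is left to a caller-supplied `fetch` function, since the actual BRENDA SBML endpoint and its parameters are not documented in this thread; `fetch_all_sbml`, `fetch`, and `max_attempts` are all hypothetical names.

```python
from itertools import product


def fetch_all_sbml(ec_codes, organisms, fetch, max_attempts=3):
    """Call fetch(ec_code, organism) for every pair, retrying on TimeoutError.

    `fetch` is a caller-supplied function (hypothetical here) that returns the
    SBML text for one EC number and one organism.
    """
    results = {}
    failed = []
    for ec_code, organism in product(ec_codes, organisms):
        for _attempt in range(max_attempts):
            try:
                results[(ec_code, organism)] = fetch(ec_code, organism)
                break
            except TimeoutError:
                continue  # try the same small query again
        else:
            # Every attempt timed out; record the pair instead of aborting.
            failed.append((ec_code, organism))
    return results, failed
```

Keeping the failed pairs in a list makes it possible to rerun only those queries later instead of starting over.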

jonrkarr · 2020-04-23T15:45:08Z

Also, the SBML output is missing some of the information from the HTML preview of the SBML:

No enzyme info (UniProt id)
No comments
No references

The SBML does give insight into how to parse temperature and pH from the comments:

r'(^|,[ \n])(\d+(\.\d+)?)°C(,[ \n]|$)'
r'(^|,[ \n])pH[ \n](\d+(\.\d+)?)(,[ \n]|$)'
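As a sketch of how these two patterns could be applied, the snippet below runs them over a comment string. It assumes the comment has already been stripped of its leading protein id (`#2# `) and trailing reference (` <45>`), since the patterns only match when the value sits between `, ` separators or at the start or end of the string. `extract_conditions` is a hypothetical helper name.

```python
import re

TEMPERATURE_RE = re.compile(r'(^|,[ \n])(\d+(\.\d+)?)°C(,[ \n]|$)')
PH_RE = re.compile(r'(^|,[ \n])pH[ \n](\d+(\.\d+)?)(,[ \n]|$)')


def extract_conditions(comment):
    """Return (temperature in °C, pH) as floats, or None where no match is found."""
    temperature = TEMPERATURE_RE.search(comment)
    ph = PH_RE.search(comment)
    return (
        float(temperature.group(2)) if temperature else None,
        float(ph.group(2)) if ph else None,
    )


conditions = extract_conditions("recombinant isozyme, pH 7.5, 30°C")
```

Comments that do not state a temperature or pH in this exact form yield `None` for that value rather than a guess.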

jonrkarr · 2020-04-23T18:49:49Z

I'm looking into your suggestion about matching tuples of protein ids, comments, and references. This might work for pairing k_cats with K_ms, but I don't think this works for inferring the reaction associated with each k_cat/K_m. It doesn't look like these relationships have been encoded into the text file. While you can find pairs of entries with overlapping protein ids, substrates, comments, and references, it appears to be difficult to unambiguously resolve relationships. I think trying to infer relationships is likely to infer false relationships that are not present in the underlying database. At least for our purposes, we're hesitant to add additional interpretation on top of the BRENDA data.

In spite of these problems, I think BRENDA is doing exactly what you've suggested to build the SBML output. However, I think this is difficult to replicate because we don't know the details of how BRENDA is encoded into the text file.

jonrkarr · 2020-04-27T17:00:34Z

I got a response from the BRENDA team:

Recently, they have begun to track the specific reaction associated with each KM and TN. However, I don't think we have a way to access this information, or to discern which entries have this metadata.
For the oldest curated entries (entries curated > 15 years ago), there is no way to discern the reaction associated with KM and TN because these entries don't have sufficient metadata to attempt to infer the associated reaction. The BRENDA team is slowly filling in this missing metadata.
For most entries, the organism, comments, and references can potentially be used to infer the specific reaction associated with each KM and TN. However, there's no way to avoid inferring false relationships.
We don't have any timestamps that we can use to discern when an entry was curated.

For Datanator, we're hesitant to infer false relationships. We want Datanator to be as free of interpretation as possible so that our downstream projects have as much control over the representation of experimental data as possible.

Midnighter · 2020-04-27T17:11:08Z

Thanks for the input. Any word on accessing all SBML or other structured data set?

jonrkarr · 2020-04-27T17:44:45Z

The BRENDA team didn't respond to my question about the SBML output. I suspect the reactions in the SBML output are inferred from common enzymes, comments, and references. I think the temperature and pH are also inferred by similar string pattern matching of the comments.

There's no other more structured output available. In any case, this wouldn't have the missing relationships because they have never been recorded.

If you're looking for a more structured dataset, I recommend SABIO-RK.

jonrkarr added the enhancement label Mar 23, 2018
