Within the ChEMBL database we spend a lot of time manually curating links between FDA approved drugs and their efficacy targets. With collaborators from the University of New Mexico and the Institute of Cancer Research, we have now published an analysis of these drug efficacy targets:
Santos R, Ursu O, Gaulton A, Bento AP, Donadi RS, Bologa CG, Karlsson A, Al-Lazikani B, Hersey A, Oprea TI & Overington JP. A comprehensive map of molecular drug targets.
Nature Reviews Drug Discovery (2016) doi:10.1038/nrd.2016.230
In the article we address the complexities of assigning drug targets, describe the 667 human proteins and 189 pathogen proteins through which 1,578 FDA-approved drugs act and map each drug to its therapeutic indication via the WHO ATC classification system.
We show that 70% of small-molecule drugs still act through privileged families (GPCRs, ion channels, kinases and nuclear receptors), highlight differences in innovation across therapeutic areas, examine the conservation of targets across model organisms and demonstrate that only 5% of identified cancer driver genes are targeted by current cancer therapies.
As an aside, the drug-target data within ChEMBL is used in a number of other platforms such as Pharos (the portal for the NIH Illuminating the Druggable Genome project), Open Targets (a resource for pre-competitive target validation) and DrugCentral (a drug compendium from the University of New Mexico), all of which have papers in the 2017 Database Issue of Nucleic Acids Research, alongside ChEMBL:
This paper describes some of the additions to ChEMBL over the last few releases (ChEMBL_18 to ChEMBL_22) such as drug indications and clinical candidates, patent bioactivity data from BindingDB, drug metabolism information and richer assay annotation. A number of papers from our collaborators will also feature in the 2017 NAR database issue, so watch this space...
ChEMBL_22_1 data update: We would like to inform users that an update to ChEMBL_22 has been released.
The new version, ChEMBL_22_1, corrects an issue with the targets assigned to some BindingDB assays in ChEMBL (src_id = 37). If you are using the BindingDB data from ChEMBL, we recommend downloading this update. It also incorporates the molfile/canonical SMILES correction announced previously.
Updates have been made to BindingDB data in the ASSAYS, ACTIVITIES, CHEMBL_ID_LOOKUP, LIGAND_EFF and PREDICTED_BINDING_DOMAINS tables. Corrections have also been made to molfiles and canonical_smiles in the COMPOUND_STRUCTURES table. No changes have been made to other data sets or to other drug/compound/target tables in ChEMBL_22.
As you can see, the corresponding fragment is highlighted. You can add all the parameters available in the standard "image" endpoint, such as format (png or svg), engine (rdkit or indigo), ignoreCoords (to recompute coordinates from scratch) and dimensions (to change the image size).
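To make the parameter list concrete, here is a minimal sketch of assembling a request for the standard image endpoint in Python. It assumes the endpoint lives at https://www.ebi.ac.uk/chembl/api/data/image/<ChEMBL ID> and takes the options above as query-string arguments, with ignoreCoords passed as 1; the helper name build_image_url is hypothetical, and the path of the highlighting variant is not reproduced here.

```python
from urllib.parse import urlencode

# Assumed base URL of the standard ChEMBL "image" endpoint; check the API docs.
IMAGE_BASE_URL = "https://www.ebi.ac.uk/chembl/api/data/image/"


def build_image_url(chembl_id, fmt="svg", engine="rdkit",
                    ignore_coords=False, dimensions=None):
    """Hypothetical helper: build an image URL from the options listed above."""
    if fmt not in ("png", "svg"):
        raise ValueError("format must be 'png' or 'svg'")
    if engine not in ("rdkit", "indigo"):
        raise ValueError("engine must be 'rdkit' or 'indigo'")
    params = {"format": fmt, "engine": engine}
    if ignore_coords:
        # Ask the server to recompute 2D coordinates from scratch (assumed flag value).
        params["ignoreCoords"] = 1
    if dimensions is not None:
        # Requested image size in pixels.
        params["dimensions"] = int(dimensions)
    # urlencode keeps the insertion order of the dict.
    return IMAGE_BASE_URL + chembl_id + "?" + urlencode(params)


print(build_image_url("CHEMBL25", fmt="png", ignore_coords=True, dimensions=300))
```

The validation mirrors the two choices named in the text (png/svg, rdkit/indigo) so that a typo fails locally instead of producing a confusing server response.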
3. Document terms (keywords)
We used the pytextrank package to extract the most relevant terms from all document abstracts stored in ChEMBL, along with their significance score for each document (the code we used to perform the extraction is available).
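pytextrank implements the TextRank algorithm. The sketch below is neither pytextrank nor the code we used for the extraction. It is a deliberately simplified, pure-Python TextRank that ranks single words only, uses a small hypothetical stop-word list and skips part-of-speech filtering and phrase merging. It illustrates how a per-term significance score can come out of PageRank run over a word co-occurrence graph.

```python
import re
from collections import defaultdict

# A tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "as", "by", "for", "from", "in", "is",
              "of", "on", "or", "that", "the", "to", "was", "were", "with"}


def textrank_terms(text, window=2, damping=0.85, iterations=50, top_n=5):
    """Rank single-word terms by a simplified TextRank (PageRank over co-occurrence)."""
    words = [w for w in re.findall(r"[a-z0-9-]+", text.lower())
             if w not in STOP_WORDS]
    # Undirected graph: link words that fall inside a sliding window of `window` tokens.
    neighbours = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1:i + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)
    if not neighbours:
        return []
    # Start every node at 1.0, then iterate the PageRank update.
    scores = {word: 1.0 for word in neighbours}
    for _ in range(iterations):
        scores = {
            word: (1 - damping) + damping * sum(
                scores[n] / len(neighbours[n]) for n in neighbours[word])
            for word in neighbours
        }
    # Highest-scoring terms first.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]
```

Calling textrank_terms on an abstract returns (term, score) pairs, best first; words that are linked to many different neighbours (such as a repeatedly mentioned target class) end up with the highest scores.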
The current protocol is fairly simple (measuring the overlap in compounds and targets between two documents) and not very granular (it can be difficult to choose the N most relevant documents from the 50 documents that the protocol returns). We are currently investigating alternative methods, such as topic modelling.
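The post does not spell out how the overlap is scored, so the following is only a sketch of one plausible version. It assumes overlap is measured with the Jaccard index on each document's compound set and target set, combined with a hypothetical equal weighting, and truncated to the 50 best hits mentioned above. The document IDs and set contents in the usage example are made up.

```python
def jaccard(a, b):
    """Size of the intersection over size of the union (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def document_similarity(doc_a, doc_b, compound_weight=0.5):
    """Weighted mix of compound overlap and target overlap between two documents.

    Each document is a dict with hypothetical 'compounds' and 'targets' sets.
    """
    compound_score = jaccard(doc_a["compounds"], doc_b["compounds"])
    target_score = jaccard(doc_a["targets"], doc_b["targets"])
    return compound_weight * compound_score + (1 - compound_weight) * target_score


def most_similar(query_id, documents, limit=50):
    """Return up to `limit` (doc_id, score) pairs, best first, excluding the query."""
    query = documents[query_id]
    scored = [(doc_id, document_similarity(query, doc))
              for doc_id, doc in documents.items() if doc_id != query_id]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


documents = {
    "DOC1": {"compounds": {"C1", "C2", "C3"}, "targets": {"T1"}},
    "DOC2": {"compounds": {"C2", "C3", "C4"}, "targets": {"T1"}},
    "DOC3": {"compounds": {"C9"}, "targets": {"T1", "T2"}},
}
print(most_similar("DOC1", documents))  # [('DOC2', 0.75), ('DOC3', 0.25)]
```

The example also shows the granularity problem: any document sharing a popular target gets a non-zero score, so a fixed cut-off says little about which of the returned documents are genuinely related.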
The ChEMBL 22 release brings lots of new data, but we have also released some new software, so if you are interested in the technical details please read on.
1. First of all, please note that ChEMBL 22 is the last release for which we provide Oracle 9i dumps. Oracle 9i has been out of support for nearly a decade and shouldn't be in use anymore, but please let us know if this is a problem. We will also do our best to provide Oracle 12c dumps for the next release.
2. If you are using the Python API client, please upgrade it by running:
[sudo] pip install -U chembl_webresource_client
This will upgrade the client to the latest version, which fixes some minor bugs and adds the ability to search document abstracts. It will also create a new cache, so you will see the new ChEMBL data immediately. Without the upgrade, you will need to clear your cache manually.
4. As our API is maturing, we have started preparing a collection of embeddable widgets written in JS/CSS/HTML that you can use on your website, blog or web application. These will form the basis of our new ChEMBL website. An example widget providing some basic information about a ChEMBL compound can be found below; the code used to embed it is:
Another example is an assay co-occurrence matrix for compounds extracted from a single document. Again, the code to embed it is: <object data="https://glados-ebitest.rhcloud.com/document_report_card/CHEMBL1151960/embed/assay_network/" width="800px" height="800px"></object>
In addition to the regular updates to the Scientific Literature, PubChem, FDA Orange Book and USP Dictionary of USAN and INN Investigational Drug Names, this release of ChEMBL also includes the following new data:
We have worked with the BindingDB team to integrate the bioactivity data that they have extracted from more than 1000 granted US patents published from 2013 onwards (https://www.bindingdb.org/bind/ByPatent.jsp) into ChEMBL. This data is incorporated into ChEMBL in the same way as literature-extracted bioactivity information, but with a new source (SRC_ID = 37, BindingDB Database) and a document type of 'PATENT'. In total this data set provides 99K bioactivity measurements for 68K compounds.
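Because the patent data sits in the same tables as literature data, it can be isolated with an ordinary query on source and document type. The sketch below uses Python's sqlite3 against a local database such as the SQLite dump. The table and column names (docs.src_id, docs.doc_type, activities.doc_id, activities.molregno) follow the ChEMBL schema as we understand it, so check them against the release's schema documentation. The in-memory stand-in, its rows and the helper make_demo_connection are hypothetical and exist only so the example runs without a download.

```python
import sqlite3

# Count BindingDB patent bioactivities: documents with src_id = 37 and
# doc_type = 'PATENT', joined to their activity records.
PATENT_QUERY = """
SELECT COUNT(act.activity_id) AS n_activities,
       COUNT(DISTINCT act.molregno) AS n_compounds
FROM activities act
JOIN docs d ON d.doc_id = act.doc_id
WHERE d.src_id = 37
  AND d.doc_type = 'PATENT'
"""


def patent_activity_counts(conn):
    """Return (number of activities, number of distinct compounds) from patents."""
    return conn.execute(PATENT_QUERY).fetchone()


def make_demo_connection():
    """Hypothetical in-memory stand-in holding only the columns the query touches."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE docs (doc_id INTEGER PRIMARY KEY, doc_type TEXT, src_id INTEGER);
        CREATE TABLE activities (activity_id INTEGER PRIMARY KEY,
                                 doc_id INTEGER, molregno INTEGER);
        INSERT INTO docs VALUES (1, 'PATENT', 37), (2, 'PUBLICATION', 1);
        INSERT INTO activities VALUES (10, 1, 100), (11, 1, 100), (12, 1, 101), (13, 2, 102);
    """)
    return conn


print(patent_activity_counts(make_demo_connection()))  # (3, 2)
```

Against a real release you would pass a connection to the downloaded database file instead of make_demo_connection(); the same WHERE clause can be inverted to exclude the patent data from a literature-only analysis.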
We have compiled, from multiple sources, a list of drugs that have been withdrawn in one or more countries due to safety or efficacy issues. Where available, the year of withdrawal, the applicable countries/areas and the reasons for the withdrawal are captured. Withdrawal information is shown on the Compound Report Card, and a new icon has been added to the availability type section of the Molecule Features image to denote withdrawn drugs (e.g., https://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL408).
Indication information has now been extended to cover clinical candidates. This information has been extracted from ClinicalTrials.gov and is included in the 'Browse Drug Indications' view and on Compound Report Cards.
Drug Metabolism Viewer:
An additional section has been added to Compound Report Cards to display drug metabolism schemes (e.g., https://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL1064). These schemes can be opened in an expanded view by clicking the link above the image. Where known, enzyme information is shown on edges, and clicking on an edge of interest provides additional information about the reaction, including references. Clicking on a node links to the Compound Report Card for that metabolite.
For cases where assay data has been measured against a variant protein (e.g., site-directed mutagenesis or drug-resistance studies), we have created a VARIANT_SEQUENCES table to store the variant protein sequence used in the assay (the target for the assay will still be the wild-type protein). Since the exact protein sequence used in an assay is rarely reported in the medicinal chemistry literature, these sequences have been re-created by introducing the specified point mutation into the current UniProt sequence for the target. The resulting sequence is therefore not guaranteed to be the exact sequence used in the assay, but it provides a more robust way to document the relevant mutation(s) than the residue name and position given in most publications and ChEMBL assay descriptions (which quickly become obsolete when sequences change). In cases where the reported residue positions could not be reconciled with any UniProt sequence, variant sequence information has not been included in ChEMBL. Further sequences (requiring more curation) will be added in future releases. Assays with variant sequence information available are linked to the VARIANT_SEQUENCES table via the VARIANT_ID column. Please note that this information is not yet displayed in the ChEMBL interface.
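The reconstruction step can be pictured with a small sketch: apply each point mutation to a reference sequence, but only if the wild-type residue at the stated position matches, and otherwise report that the mutation cannot be reconciled. This is an illustration of the idea, not ChEMBL's curation pipeline. The function name, the "T3A"-style notation and the toy sequence are hypothetical, and real cases (insertions, deletions, numbering offsets between isoforms) are ignored.

```python
import re

# Point mutations written as <wild-type residue><1-based position><new residue>, e.g. T315I.
MUTATION_PATTERN = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def apply_point_mutations(sequence, mutations):
    """Return the variant sequence, or None if any mutation cannot be reconciled.

    A mutation is reconciled only if the residue at the stated position in the
    reference sequence matches the stated wild-type residue.
    """
    residues = list(sequence)
    for mutation in mutations:
        match = MUTATION_PATTERN.match(mutation)
        if match is None:
            raise ValueError(f"unrecognised mutation: {mutation!r}")
        wild_type, position, variant = match.group(1), int(match.group(2)), match.group(3)
        index = position - 1  # convert 1-based residue numbering to a 0-based index
        # Check against the unmodified reference sequence.
        if not 0 <= index < len(sequence) or sequence[index] != wild_type:
            return None  # stated residue/position does not fit this reference
        residues[index] = variant
    return "".join(residues)


print(apply_point_mutations("MKTAYIAK", ["T3A", "I6V"]))  # MKAAYVAK
print(apply_point_mutations("MKTAYIAK", ["K3A"]))  # None
```

Returning None rather than guessing mirrors the policy described above: when the reported positions cannot be matched to the reference sequence, no variant sequence is recorded.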
We recommend you review the ChEMBL_22 release notes for a comprehensive overview of all updates and changes in ChEMBL 22, including schema changes, and as always, we greatly appreciate the reporting of any omissions or errors.
Keep an eye on the ChEMBL Twitter and blog accounts for news and updates.
We are currently seeking multiple talented individuals to join the Chemogenomics team here at EMBL-EBI, both to work on our group resources (ChEMBL, SureChEMBL) and support external projects (FP7 HeCaToS and NIH Illuminating the Druggable Genome). If you are interested in applying for these positions (or for more information) please follow the links below. The closing date for all positions is 12th June.
In case you have been too busy to notice, ChEMBL_21 has arrived with the usual additions, improvements and enhancements, both on the data/annotation side and on the interface/services. To complement this, we have also updated the target prediction models, which can be downloaded from our FTP site here.
The good news is that, besides the increase in training data (compounds and targets), the new models were built using the latest stable versions of RDKit (2015.09.2) and scikit-learn (0.17). The latter was upgraded from the much older 0.14 version, which caused incompatibility issues (see MultiLabelBinarizer) for several of you when trying to use the models.
We've also put together a quick Jupyter Notebook demo on how to get predictions from the models here:
This scientific clickbait title introduces our promised blog post about the integration of UniChem into our ChEMBL python client. UniChem is a very important resource, as it contains information about 134 million (and counting) unique compound structures and cross references between various chemistry resources. Since UniChem is developed in-house and provides its own web services, we thought it would make sense to integrate it with our python client library. Before we present a systematic translation between raw HTTP calls described in the UniChem API documentation and client calls, let us provide some preliminary information:
In order to install the client, you should use pip:
pip install -U chembl_webresource_client
Once you have it installed, you can import the unichem module:
from chembl_webresource_client.unichem import unichem_client as unichem
OK, so how do you resolve an InChIKey to an InChI string? It's very simple:
Of course, to resolve an InChIKey to an InChI, the client connects to the UniChem database via REST, retrieves the results and returns them to you. This is done efficiently and abstracts away all network-related issues, such as setting up the HTTP session, handling retries and caching results. From the user's point of view, it looks like a standard function call executed locally, but in fact a working internet connection is required to fetch the results (unless they are already cached). We've just heard that there is another InChI resolver written entirely in CSS3; click here to find out.
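We won't reproduce the client's internals here. The sketch below is a hypothetical, simplified illustration of the caching-and-retry pattern that this kind of client hides from you, written in plain Python with the HTTP request replaced by an injected fetch function so it runs offline. The class name, the default of three attempts and the exponential backoff schedule are assumptions, not the client's actual settings.

```python
import time


class CachedRetryingFetcher:
    """Hypothetical illustration of caching and retrying around a network call.

    `fetch` is any callable that takes a URL and returns a result, raising an
    OSError (e.g. ConnectionError) on network failure; it stands in for a real
    HTTP request.
    """

    def __init__(self, fetch, max_retries=3, backoff_seconds=0.5):
        self.fetch = fetch
        self.max_retries = max_retries
        self.backoff_seconds = backoff_seconds
        self.cache = {}

    def get(self, url):
        if url in self.cache:
            return self.cache[url]  # served locally, no network needed
        for attempt in range(1, self.max_retries + 1):
            try:
                result = self.fetch(url)
            except OSError:
                if attempt == self.max_retries:
                    raise  # give up after the last attempt
                # Exponential backoff: wait 1x, 2x, 4x ... the base delay.
                time.sleep(self.backoff_seconds * 2 ** (attempt - 1))
            else:
                self.cache[url] = result
                return result
```

Injecting the fetch function keeps the retry and cache logic testable without a network connection; a second call for the same URL is answered from the in-memory cache, which is why cached lookups keep working offline.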
OK, we admit that resolving an InChIKey to an InChI may not be the most important UniChem use case, so we invite you to take a look at the Jupyter notebook providing examples of all available methods: