
Chemical Space is Big, Really Big.

by Darryl B McConnell

Finding a hit for a new drug target can be one of the biggest challenges for a drug discovery program. This is especially the case for drug targets dissimilar to those that have come before. The main reason for this is that chemical space is big, really big.

how big is big?

Thanks to work by Blum and Reymond, Ruddigkeit and Hall, we can now enumerate molecules from mathematical graphs, applying geometrical strain and functional group stability criteria, to generate unbiased, chemically meaningful molecules. GDB-17 is a database of all such molecules with up to 17 heavy atoms (C, N, O, S, and halogens), of which there are estimated to be 166.4 billion. It built on the earlier GDB-11 and GDB-13 databases and gives us an estimate of the size of chemical space up to 17 heavy atoms. On average, chemical space in GDB-17 grows 5-fold per additional heavy atom, and this factor is used here to estimate the size of chemical space beyond 17 heavy atoms. Note that these numbers do not include stereoisomers. For reference, Osimertinib, the EGFR kinase inhibitor, has a molecular weight of almost exactly 500 Da (one of Lipinski's favourite numbers) and contains 37 heavy atoms (molecular formula C₂₈H₃₃N₇O₂). There are a mind-boggling 1.6×10²⁵ compounds with the same number of heavy atoms as Osimertinib or fewer.

Figure: The Cumulative Size of Chemical Space per Heavy Atom
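The extrapolation beyond GDB-17 described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the GDB enumeration itself: it takes the 166.4 billion molecules with up to 17 heavy atoms as a fixed starting point and multiplies by a constant 5-fold per extra heavy atom. Because the per-size counts grow geometrically, the running total is dominated by the largest size class, so it also grows roughly 5-fold per heavy atom. The names `cumulative_space`, `GDB17_CUMULATIVE` and `GROWTH_PER_ATOM` are hypothetical, not part of any library.

```python
# Hypothetical sketch of the 5-fold extrapolation (a simplification, not GDB-17 itself).
GDB17_CUMULATIVE = 166.4e9  # molecules with <= 17 heavy atoms (GDB-17)
GDB17_MAX_HAC = 17          # largest heavy atom count enumerated in GDB-17
GROWTH_PER_ATOM = 5.0       # assumed constant fold-increase per extra heavy atom


def cumulative_space(max_hac: int) -> float:
    """Estimated number of molecules with <= max_hac heavy atoms (max_hac >= 17)."""
    if max_hac < GDB17_MAX_HAC:
        # Below 17 heavy atoms the enumerated GDB counts should be used instead.
        raise ValueError("use the enumerated GDB counts below 17 heavy atoms")
    return GDB17_CUMULATIVE * GROWTH_PER_ATOM ** (max_hac - GDB17_MAX_HAC)


if __name__ == "__main__":
    # Osimertinib (C28H33N7O2) has 37 heavy atoms.
    print(f"{cumulative_space(37):.1e}")
```

For Osimertinib's 37 heavy atoms this gives about 1.6×10²⁵, in line with the figure quoted above.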

From the GDB-17-derived cumulative chemical space sizes above, we can work out the expected number of chemically meaningful molecules at each heavy atom count (see figure below). For example, there are almost a billion other compounds besides aspirin with a heavy atom count of 13 (837 million, as estimated by GDB-17). Chemical space skyrockets as one moves to the sizes of contemporary drugs. For example, at Osimertinib's size of 37 heavy atoms, chemical space contains 1.3×10²⁵ molecules. Molecules as large as Everolimus, with its impressive 68 heavy atoms (a size class that also includes many PROTACs), share their heavy atom count with an astronomical 6×10⁴⁶ molecules. Chemical space is clearly vastly, hugely, mind-bogglingly big.

Figure: The Chemical Space Size for each Heavy Atom Count
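The per-heavy-atom numbers follow from the same model as the difference between consecutive cumulative totals, which with 5-fold growth is 4/5 of the cumulative value. Again this is a hedged sketch: the constant growth factor is the simplifying assumption stated earlier, it only applies above 17 heavy atoms (below that, for example for aspirin, the enumerated GDB-17 counts are used directly), and `space_at_hac` is a hypothetical helper name.

```python
# Hypothetical sketch: molecules with exactly `hac` heavy atoms under the 5-fold model.
GDB17_CUMULATIVE = 166.4e9  # molecules with <= 17 heavy atoms (GDB-17)
GDB17_MAX_HAC = 17
GROWTH_PER_ATOM = 5.0       # assumed constant fold-increase per heavy atom


def space_at_hac(hac: int) -> float:
    """Estimated number of molecules with exactly `hac` heavy atoms (hac > 17).

    Computed as C(h) - C(h-1) = C(h) * (1 - 1/GROWTH_PER_ATOM),
    where C(h) is the extrapolated cumulative count.
    """
    if hac <= GDB17_MAX_HAC:
        raise ValueError("use the enumerated GDB-17 counts up to 17 heavy atoms")
    cumulative = GDB17_CUMULATIVE * GROWTH_PER_ATOM ** (hac - GDB17_MAX_HAC)
    return cumulative * (1 - 1 / GROWTH_PER_ATOM)


if __name__ == "__main__":
    for name, hac in [("Osimertinib", 37), ("Everolimus", 68)]:
        print(f"{name} ({hac} heavy atoms): {space_at_hac(hac):.1e}")
```

Run as a script, it prints roughly 1.3e+25 for Osimertinib and 5.9e+46 for Everolimus, which round to the 1.3×10²⁵ and 6×10⁴⁶ figures in the text.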

needles in haystacks

Finding a hit for a given target has been likened to finding a needle in a haystack. Four key factors determine the chance of finding a hit for a new drug target of interest (assuming at least one small-molecule binding pocket is present):

  1. the coverage of chemical space by the compound library used for screening.
  2. the similarity of the binding pocket on the new target to pockets on historical drug targets.
  3. the sensitivity (and signal-to-noise ratio) of the assay technology used for screening.
  4. the ability to robustly confirm that a given hit binds to the intended target in a dose-dependent manner.

It is point number 1, the coverage of chemical space, that will be discussed further in this post.

chemical space coverage

If one could screen all possible compounds against all proteins, drug molecules would likely be found for all druggable targets. In the absence of a screening library containing all possible compounds, the main goal for general compound libraries is to maximize the coverage of chemical space. Chemical space coverage is the ratio of the number of compounds in the library to be screened to the size of the entire chemical space. Relative chemical space coverage is the percentage of chemical space covered by a given compound library at a given molecular size (typically measured as heavy atom count).
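Both definitions can be written down directly. This sketch assumes you already have each library compound's heavy atom count and an estimate of the chemical space size at each count (from GDB-17 or the 5-fold extrapolation); relative coverage is then computed per heavy atom count. The function names and the toy numbers in the usage example are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def absolute_coverage(library_size: int, total_space: float) -> float:
    """Chemical space coverage: library size divided by the size of chemical space."""
    return library_size / total_space


def relative_coverage(
    library_hacs: Iterable[int], space_sizes: Mapping[int, float]
) -> Dict[int, float]:
    """Percentage of chemical space covered at each heavy atom count.

    library_hacs: heavy atom count of every compound in the library.
    space_sizes:  heavy atom count -> estimated number of molecules of that size.
    Heavy atom counts without a size estimate are skipped.
    """
    counts = Counter(library_hacs)
    return {
        hac: 100.0 * n / space_sizes[hac]
        for hac, n in sorted(counts.items())
        if hac in space_sizes
    }


if __name__ == "__main__":
    # Toy numbers, NOT GDB-17 values: three compounds in two size classes.
    toy_sizes = {13: 1_000_000, 15: 10_000_000}
    print(relative_coverage([13, 13, 15], toy_sizes))
```

Keeping coverage per heavy atom count, rather than as one number, makes it explicit that a library's small compounds and large compounds sit in chemical spaces of very different sizes.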

Fragment-based screening maximizes chemical space coverage by using libraries of molecules with low heavy atom counts (called fragments), where chemical space is still relatively small and therefore much more easily covered. Pioneered by Steve Fesik in 1996, fragment-based drug discovery combines fragment libraries with sensitive detection methods (needed because fragments typically bind only weakly) to identify hits for protein targets of interest. Let's compare the relative chemical space coverage of a typical fragment library with that of a typical high-throughput screening (HTS) library.

Figure: Relative Coverage of Chemical Space, Fragment Library versus HTS Library

A typical fragment library has an average compound size of about 15 heavy atoms, whereas for an HTS library the average is typically double, at around 30 heavy atoms. Going from 15 to 30 heavy atoms, the GDB-17-estimated chemical space increases 10 billion-fold (that's 10 orders of magnitude, folks). For the calculation, let's assume that our fragment library consists of a thousand compounds while our HTS library has a million compounds (it's easier to make big libraries of big molecules). Even with a thousand times fewer compounds, our fragment library covers its relative chemical space 10 million times more effectively (a 10¹⁰-fold smaller space divided by 10³-fold fewer compounds gives 10⁷). Of course, real libraries contain a distribution of compound sizes and vary in their composition of chemical functionalities, but such effects are likely to be minor compared with the enormous differences in relative chemical space coverage that come with increasing compound size.
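The fragment-versus-HTS comparison is simple ratio arithmetic, sketched below. It takes the text's round numbers as inputs (1,000 versus 1,000,000 compounds and a 10-billion-fold larger chemical space at 30 versus 15 heavy atoms) and assumes, as the text does, that every compound sits at its library's average size. `coverage_advantage` is a hypothetical helper, not an established formula name.

```python
def coverage_advantage(n_small: int, n_large: int, space_fold_increase: float) -> float:
    """How many times better the small-compound library covers its size class.

    Relative coverage is library size / chemical space at that size, so
    (n_small / S_small) / (n_large / S_large)
        = (n_small / n_large) * (S_large / S_small).
    """
    return (n_small / n_large) * space_fold_increase


if __name__ == "__main__":
    # Round numbers from the text: 1,000 fragments (~15 heavy atoms),
    # 1,000,000 HTS compounds (~30 heavy atoms), 10-billion-fold larger space.
    advantage = coverage_advantage(1_000, 1_000_000, 1e10)
    print(f"fragment library advantage: {advantage:.0e}-fold")
```

Varying the inputs shows why size dominates: the library-size ratio enters linearly, while the chemical space term grows exponentially with heavy atom count.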

To maximize the coverage of chemical space, build big libraries of small compounds.

This concept has been taken to the extreme by colleagues at Astex with so-called minifrag libraries (libraries of ultra-low-molecular-weight compounds with heavy atom counts of 5–7). A library of 80 minifrags averaging 6.4 heavy atoms showed a marked increase in hit rate compared with an X-ray screening set of 440 larger (but still very small) compounds averaging 10.6 heavy atoms. The relative chemical space coverage of the minifrag library lies impressively between 1 and 7%, versus between 0.004 and 0.03% for the X-ray screening set.


2 comments

marcus 09/01/2024 - 4:55 pm

Beautifully outlined, as always, Darryl. Thanks very much.
For those of you who are interested in learning more about chemical space, staff from BASF, BioSolveIT, and the NIH launched the "Chemical Space Club" on LinkedIn a little while ago.
You are cordially invited to contribute your thoughts and research.
https://www.linkedin.com/groups/9004052
A very happy 2024! Marcus Gastreich

Darryl B McConnell 09/01/2024 - 6:09 pm

Thanks Marcus, and congratulations on pushing the exploration of chemical space together with the Enamine folks. And thanks for the heads-up on the LinkedIn group.
