As Numerous as the Stars in the Sky

Maybe not quite as numerous as the stars in the sky, but chemists at the University of Berne have created a database of almost 1 billion drug-like molecules with 13 or fewer heavy atoms. Writing in the Journal of the American Chemical Society, Jean-Louis Reymond and Lorenz Blum describe a new searchable database, GDB-13, of molecules containing up to 13 atoms of C, N, O, S, and Cl.

The search for novel leads is one of the key challenges in drug discovery. Reymond and Blum had previously developed the chemical universe database GDB-11, which describes 26.4 million structures containing 11 or fewer atoms of C, N, O, and F that satisfy simple rules for chemical stability and synthetic feasibility. The limiting factor in computing GDB-11 was the elimination of unstable or chemically impossible molecules from the initial list, most of which contained multiple heteroatoms.

To speed up computation of GDB-13, a very fast ‘element-ratio’ filter was used, with cut-off values of (N + O)/C < 1.0, N/C < 0.571, and O/C < 0.666. Fluorine was left out of GDB-13 since it was rarely found and had not proved attractive to the group when following up output from GDB-11. With these modifications, the algorithm was fast enough to compute the database up to 13 heavy atoms, producing 910 million molecules in 40,000 CPU hours.

The molecular enumeration was dominated by monocyclic, bicyclic, and tricyclic molecules, most of which were heterocyclic; 54% of GDB-13 molecules have at least one three- or four-membered ring. Essentially all of the molecules are drug-like (Lipinski or Vieth criteria) and many are also lead-like or fragment-like.

A chlorine/sulphur set (67.3 million compounds) was also generated. It enumerates molecules with chlorine atoms as aromatic substituents and sulphur atoms in aromatic heterocycles, sulphones, sulphonamides, and thioureas. This set is felt to be of particular interest for virtual screening because of the distinct molecular shapes and functional groups that are possible with these larger atoms.

Despite a large fraction of chemical space being excluded to accelerate computation, the authors believe that, with 977,468,314 entries, GDB-13 is the largest publicly available database of virtual molecules ever reported.
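As a rough illustration of the ‘element-ratio’ filter mentioned above, the three cut-offs can be checked from nothing more than the counts of carbon, nitrogen, and oxygen atoms in a candidate molecule. The sketch below is not the authors' code: the function name, the constant names, and the decision to reject carbon-free formulas (where the ratios are undefined) are assumptions made for illustration. Only the three cut-off values come from the article.

```python
# Cut-off values quoted for GDB-13; all other names here are hypothetical.
MAX_NO_PER_C = 1.0    # (N + O)/C < 1.0
MAX_N_PER_C = 0.571   # N/C < 0.571
MAX_O_PER_C = 0.666   # O/C < 0.666


def passes_element_ratio_filter(c: int, n: int, o: int) -> bool:
    """Return True if the atom counts satisfy all three cut-offs.

    Assumption (not from the article): formulas without carbon are
    rejected, because the ratios are undefined when C = 0.
    """
    if c <= 0:
        return False
    return (
        (n + o) / c < MAX_NO_PER_C
        and n / c < MAX_N_PER_C
        and o / c < MAX_O_PER_C
    )


if __name__ == "__main__":
    # (C, N, O) counts for a few made-up candidate formulas.
    candidates = [(6, 1, 1), (7, 4, 0), (3, 0, 2), (2, 1, 1)]
    for c, n, o in candidates:
        verdict = "keep" if passes_element_ratio_filter(c, n, o) else "reject"
        print(f"C{c} N{n} O{o}: {verdict}")
```

Because the check needs only three integers per candidate, it is far cheaper than any structure-level stability test, which is consistent with the article's description of it as a very fast pre-filter.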
The database is available free of charge at http://www.gdb.unibe.ch and should provide a rich source of inspiration for previously undescribed bioactive fragments. For those unable to resist, fluorine atoms can be added during optimisation.