Viewing Large Collections of Chemical Structures with Molecule Cloud
Visualization techniques are growing in importance and sophistication as datasets grow larger. A widely used technique for gleaning the gist of large amounts of text is the word cloud, in which the appearance of each word, typically its size, correlates with some metric such as frequency of use.
Tag Cloud for Molecules
Ertl and Rohde at Novartis recently described an adaptation of the word cloud to chemical structures, dubbing it "Molecule Cloud". A Molecule Cloud works like a word cloud in which chemical structure scaffolds take the place of words and a molecular readout determines appearance.
Building a Molecule Cloud
The authors describe an algorithm for generating Molecule Clouds from a collection of chemical structures and associated data:
1. Determine the scaffold for every structure in the collection. A scaffold consists of a substituent-free ring system. Structures without rings are reduced to their longest chain.
2. Tabulate the number of occurrences of each scaffold, and place the counts on a logarithmic scale. Ignore extremely common scaffolds such as benzene.
3. Select the top 100-250 scaffolds for display.
4. Scale each scaffold image according to its logarithmic count, and place the images in the display area so that they do not overlap.
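The counting and scaling steps above can be sketched in a few lines of Python. This is an illustration only, not the authors' Java implementation: it assumes scaffolds have already been extracted and are supplied as SMILES strings, and the function names, the default size range, and the linear-in-log mapping from count to image size are all hypothetical choices made for this example.

```python
import math
from collections import Counter

def rank_scaffolds(scaffolds, top_n=100, ignore=("c1ccccc1",)):
    """Count scaffolds, skip ignored ones (benzene by default), keep the top_n."""
    counts = Counter(s for s in scaffolds if s not in ignore)
    return counts.most_common(top_n)

def log_sizes(ranked, min_size=20.0, max_size=200.0):
    """Map each count to an image size that is linear in log(count).

    The size range and the interpolation are assumptions for illustration.
    """
    if not ranked:
        return {}
    logs = {s: math.log(c) for s, c in ranked}
    lo, hi = min(logs.values()), max(logs.values())
    span = hi - lo
    sizes = {}
    for s, value in logs.items():
        # If every scaffold has the same count, draw them all at max_size.
        frac = (value - lo) / span if span > 0 else 1.0
        sizes[s] = min_size + frac * (max_size - min_size)
    return sizes

# Toy collection: benzene is present but ignored.
scaffolds = (["c1ccccc1"] * 50 + ["c1ccncc1"] * 100
             + ["C1CCCCC1"] * 10 + ["c1ccoc1"])
ranked = rank_scaffolds(scaffolds, top_n=3)
sizes = log_sizes(ranked)
```

With a logarithmic scale, a scaffold that occurs 100 times is drawn larger than one that occurs 10 times, but not ten times larger, which keeps rare scaffolds legible next to common ones.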
Scaffold Layout in Detail
The authors found the fourth and final step, finding an aesthetically pleasing layout of structures, to be the most challenging part of building a Molecule Cloud. The reported procedure has two stages. First, scaffolds are placed into the display area in decreasing order of image area. Then an overlap score is calculated for each image and used in subsequent optimization steps.
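A minimal sketch of this kind of layout follows. The source gives only the outline (largest images first, then an overlap score per image), so the details here are assumptions: each image is treated as an axis-aligned rectangle, the overlap score is the total overlapping area with the images already placed, and the optimization is a simple random search over candidate positions. The `Box` class and the function names are hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass
class Box:
    name: str
    w: float
    h: float
    x: float = 0.0
    y: float = 0.0

def overlap_area(a, b):
    """Area of the intersection of two axis-aligned rectangles."""
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return dx * dy if dx > 0 and dy > 0 else 0.0

def overlap_score(box, others):
    """Total area that `box` shares with every other box."""
    return sum(overlap_area(box, o) for o in others if o is not box)

def layout(boxes, width, height, tries=200, seed=0):
    """Place boxes largest-first, keeping the lowest-overlap position tried.

    Random search is an assumed stand-in for the optimization step.
    The boxes are modified in place and returned in placement order.
    """
    rng = random.Random(seed)
    placed = []
    for box in sorted(boxes, key=lambda b: b.w * b.h, reverse=True):
        best = None
        for _ in range(tries):
            box.x = rng.uniform(0, max(0.0, width - box.w))
            box.y = rng.uniform(0, max(0.0, height - box.h))
            score = overlap_score(box, placed)
            if best is None or score < best[0]:
                best = (score, box.x, box.y)
            if score == 0:
                break  # no overlap: good enough for this sketch
        _, box.x, box.y = best
        placed.append(box)
    return placed

boxes = [Box("a", 10, 10), Box("b", 50, 40), Box("c", 20, 20)]
placed = layout(boxes, width=200, height=200)
```

Placing the largest images first leaves the small ones to fill the remaining gaps, which is easier than trying to fit a large image into a canvas already crowded with small ones.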
Resources
The authors' Java implementation of Molecule Cloud is available on request from Peter Ertl.
Conclusions
Molecule Cloud offers an efficient method to visually inspect large chemical structure data collections using a metaphor already familiar to many computer users.