gChem: Easily Convert Names and CAS Numbers to Chemical Structures in Google Spreadsheets
Note: the information on this page is out-of-date.
ChemCell is group of Excel macros for interconverting chemical structures and names in Excel. The high level of interest in this tool is understandable: there are hundreds and probably thousands of mission-critical chemical databases in existence today that consist of nothing more than a list of chemical names and associated data. And they're all stored in spreadsheets of one kind or another.
Getting these spreadsheet-based databases into a format offering more power requires automated name to structure conversion.
This article describes our solution to the problem, "gChem". gChem makes it possible to interconvert names and chemical structure representations in Google Spreadsheets with simple, easy-to-use formulas.
Quickstart
Note: We've submitted gChem for approval in the global Google Spreadsheets Scripts collection, with the intent of calling it 'gChem'. We'll update this article once approval is finalized.
Once approved, installing gChem will easy. Simply select Insert->Script from any open spreadsheet and search for 'gChem'.
Until then, you can install gChem by creating a custom script from the source below (Tools->Scripts->Script Editor).
gChem gives your Google Spreadsheets four new formulas that accept as an argument a chemical name or Chemical Abstracts Service (CAS) Registry number:
- getInChI Return a standard InChI.
- getInChIKey Return a standard InChIKey.
- getSMILES Return a SMILES string.
- getSynonyms Return up to ten alternate names.
Applying these formulas is a simple as:
which results in the following on pressing return:
A sample spreadsheet, generated with gChem and listing some of the top-selling pharmaceuticals from 2009 is available online (HTML, Excel).
Technical Details
Google Spreadsheets offers the ability to create custom formulas written in JavaScript. Even better, these scripts can retrieve content from arbitrary URLs.
ChemCell used the National Cancer Institute's Chemical Identifier Resolver (CIR). gChem does the same thing.
Here's the source of our first version of gChem:
/**
* gChem: Power tools for chemistry in Google Spreadsheets
*/
function getCAS(id) {
var content = lookup(id, 'cas');
return content ? content : 'NOT FOUND';
};
function getSMILES(id) {
var content = lookup(id, 'smiles');
return content ? content : 'NOT FOUND';
};
function getInChIKey(id) {
var content = lookup(id, 'stdinchikey');
return content ? content.split('=')[1] : 'NOT FOUND';
};
function getInChI(id) {
var content = lookup(id, 'stdinchi');
return content ? content : 'NOT FOUND';
};
function getSynonyms(id) {
var content = lookup(id, 'names');
if (!content) {
return 'NOT FOUND';
}
var names = content.split('\n');
if (names.length <= 5) {
return content;
}
names.splice(5);
return names.join('\n') + '\n...';
};
function lookup(id, representation) {
var url = 'http://cactus.nci.nih.gov/chemical/structure/' + id + '/' + representation;
var result;
try {
var response = UrlFetchApp.fetch(url);
result = response.getContentText();
} catch (error) {
// do nothing
}
return result;
};
Conclusions
gChem combines three key Web technologies: Google Spreadsheets, JavaScript, and the Chemical Identifier Resolver web service. It's not hard to imagine using other public-facing web services to extend the basic concepts found in gChem.
Update March 9, 2011: Added function getCAS, which returns zero, one, or many CAS numbers based on a name.