Cheminformatics

A Guide to Cheminformatics

SMILES

Simplified Molecular Input Line Entry System (SMILES) encodes a molecular structure as a single line of text. Although designed primarily for small organic molecules, SMILES can also capture the connectivity and stereochemical configuration of other kinds of molecules.

SMILES is used in many contexts, including copying and pasting structures between software applications, storing them in chemical databases, and serving (in canonical form) as unique molecular identifiers.

Examples

Molecules Encoded by SMILES
Structure SMILES
CCCCCC
C1CCCCC1
N1CCC(=O)CC1
Clc1cc2cnccc2cc1
O=C(O)[C@H](O)[C@@H](O)C(=O)O
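
To make the structure of these strings more concrete, the following minimal sketch splits a SMILES string into its lexical pieces: atoms, bonds, ring-closure digits, and branch parentheses. It assumes a simplified token set that covers only the constructs appearing in the examples above (organic-subset atoms, bracket atoms, bond symbols, ring closures, and branches); the names TOKEN and tokenize are illustrative and do not come from any particular toolkit.

```python
import re

# Simplified token pattern (an assumption for illustration, not the full
# SMILES language). Each alternative is a named group so the token kind
# can be read back from the match object.
TOKEN = re.compile(r"""
    (?P<bracket>\[[^\[\]]+\])             # bracket atom, e.g. [C@H]
  | (?P<atom>Cl|Br|[BCNOPSFI]|[bcnops])   # organic-subset atom
  | (?P<bond>[-=#:/\\])                   # explicit bond symbol
  | (?P<ring>%\d\d|\d)                    # ring-closure label
  | (?P<open>\()                          # branch start
  | (?P<close>\))                         # branch end
""", re.VERBOSE)


def tokenize(smiles):
    """Split a SMILES string into (kind, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(smiles):
        match = TOKEN.match(smiles, pos)
        if match is None:
            raise ValueError(
                f"unexpected character {smiles[pos]!r} at index {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for example in ["CCCCCC", "C1CCCCC1", "N1CCC(=O)CC1"]:
        print(example, [text for _, text in tokenize(example)])
```

Listing "Cl" and "Br" before the single-letter atoms matters: otherwise "Cl" would be read as a carbon followed by an unrecognized "l".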

Language Definition

SMILES can be defined using a two-level approach. A syntactic specification first defines which strings are well formed. A semantic specification then describes what those strings mean.

SMILES syntax is most precisely defined in terms of a formal grammar.
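
As an illustration of what a grammar-based definition looks like, the sketch below writes down a deliberately simplified grammar and a recursive-descent recognizer that follows it rule by rule. The grammar is an assumption made for this example: it omits disconnected structures (the dot), the internal structure of bracket atoms, and other features of the full language. The class and function names are hypothetical.

```python
import re

# Simplified grammar assumed for this sketch (not the complete language):
#
#   chain         ::= branched_atom ( bond? branched_atom )*
#   branched_atom ::= atom ringbond* branch*
#   ringbond      ::= bond? ring_label
#   branch        ::= '(' bond? chain ')'
#
ATOM = re.compile(r"\[[^\[\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")
BOND = re.compile(r"[-=#:/\\]")
RING = re.compile(r"%\d\d|\d")


class Recognizer:
    """Recursive-descent recognizer: one method per grammar rule."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def _accept(self, pattern):
        # Consume the pattern at the current position if it matches.
        match = pattern.match(self.text, self.pos)
        if match:
            self.pos = match.end()
            return True
        return False

    def chain(self):
        if not self.branched_atom():
            return False
        while True:
            save = self.pos
            self._accept(BOND)            # optional bond
            if not self.branched_atom():
                self.pos = save           # undo a dangling bond
                return True

    def branched_atom(self):
        if not self._accept(ATOM):
            return False
        while self.ringbond():
            pass
        while self.branch():
            pass
        return True

    def ringbond(self):
        save = self.pos
        self._accept(BOND)                # optional bond
        if self._accept(RING):
            return True
        self.pos = save
        return False

    def branch(self):
        save = self.pos
        if self.text.startswith("(", self.pos):
            self.pos += 1
            self._accept(BOND)            # optional bond
            if self.chain() and self.text.startswith(")", self.pos):
                self.pos += 1
                return True
        self.pos = save
        return False


def is_well_formed(smiles):
    """True if the whole string matches the simplified grammar."""
    recognizer = Recognizer(smiles)
    return recognizer.chain() and recognizer.pos == len(smiles)
```

In this sketch a string such as C1CC, whose ring-closure label is never paired, still counts as syntactically well formed. Checking that ring closures pair up is left to a later, semantic step, which mirrors the two-level split described above.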

Parsing SMILES

Although many SMILES parsers are hand-crafted, a parser generator operating on a formal grammar offers many advantages. For example, Smidge, a lightweight SMILES parser for JavaScript that can tokenize and validate SMILES strings, was auto-generated from a formal grammar.
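
For contrast, a hand-crafted parser for a small subset of SMILES might look like the sketch below. It is not Smidge and does not reflect Smidge's API; the function name parse, the token pattern, and the (atoms, bonds) return shape are all assumptions chosen for illustration. The parser reads tokens left to right, bonds each new atom to the previous one, uses a stack to return to the branch point when a parenthesis closes, and pairs ring-closure labels to add ring bonds.

```python
import re

# One capturing group per token kind (simplified subset, assumed here):
# atom, bond symbol, ring-closure label, '(' and ')'.
TOKEN = re.compile(
    r"(\[[^\[\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops])"
    r"|([-=#:/\\])|(%\d\d|\d)|(\()|(\))"
)


def parse(smiles):
    """Return (atoms, bonds); each bond is (index, index, symbol)."""
    atoms = []        # atom symbols in order of appearance
    bonds = []        # symbol is "" when the bond is implicit
    stack = []        # atoms to return to when a branch closes
    open_rings = {}   # ring label -> (atom index, bond symbol)
    prev = None       # atom that the next atom will bond to
    pending = ""      # explicit bond symbol awaiting its second atom
    pos = 0
    while pos < len(smiles):
        match = TOKEN.match(smiles, pos)
        if match is None:
            raise ValueError(f"unexpected character at index {pos}")
        atom, bond, ring, lparen, rparen = match.groups()
        if atom:
            atoms.append(atom)
            if prev is not None:
                bonds.append((prev, len(atoms) - 1, pending))
            prev, pending = len(atoms) - 1, ""
        elif prev is None:
            raise ValueError("a SMILES string must start with an atom")
        elif bond:
            if pending:
                raise ValueError(f"two bond symbols in a row at index {pos}")
            pending = bond
        elif ring:
            if ring in open_rings:
                # Second occurrence of the label closes the ring.
                start, symbol = open_rings.pop(ring)
                bonds.append((start, prev, pending or symbol))
            else:
                open_rings[ring] = (prev, pending)
            pending = ""
        elif lparen:
            stack.append(prev)
        else:  # closing parenthesis: resume from the branch point
            if not stack:
                raise ValueError(f"unmatched ')' at index {pos}")
            prev = stack.pop()
        pos = match.end()
    if open_rings or stack or pending:
        raise ValueError("unclosed ring, unclosed branch, or dangling bond")
    return atoms, bonds
```

For N1CCC(=O)CC1, this sketch yields seven atoms and seven bonds, including the double bond between the ring carbon and the oxygen. Every error case is handled by a separate hand-written check, which hints at why deriving a parser from a formal grammar can be attractive.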

Resources