Simplified Molecule Input Line Entry System (SMILES) encodes a molecular structure as a single line of text. Although designed primarily for small organic molecules, SMILES can also capture the connectivity and stereochemical configuration of other kinds of molecules.
SMILES is used in many contexts, including copy/paste between software applications, persistence in chemical databases, and unique molecular identifiers.
Examples
Structure | SMILES |
---|---|
(structure image) | CCCCCC |
(structure image) | C1CCCCC1 |
(structure image) | N1CCC(=O)CC1 |
(structure image) | Clc1cc2cnccc2cc1 |
(structure image) | O=C(O)[C@H](O)[C@@H](O)C(=O)O |
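The example strings rely on two pairing conventions: parentheses enclose branches, and a repeated digit marks the two atoms joined by a ring bond. The following is a minimal sketch of a check for both, not a full validator. The function name `check_pairing` is hypothetical, and the sketch assumes single-digit ring closures only, ignoring `%nn` closures and all other SMILES rules.

```python
EXAMPLES = [
    "CCCCCC",
    "C1CCCCC1",
    "N1CCC(=O)CC1",
    "Clc1cc2cnccc2cc1",
    "O=C(O)[C@H](O)[C@@H](O)C(=O)O",
]


def check_pairing(smiles: str) -> bool:
    """Return True if branches balance and every ring-closure digit is paired.

    Assumption: only single-digit ring closures outside bracket atoms are
    considered. This is an illustrative subset, not a SMILES validator.
    """
    open_rings = set()   # ring digits seen an odd number of times so far
    depth = 0            # current branch nesting depth
    in_bracket = False   # True while inside a [...] bracket atom
    for ch in smiles:
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif in_bracket:
            continue     # digits inside brackets are not ring closures
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # closing a branch that was never opened
        elif ch in "0123456789":
            # The first occurrence opens a ring bond, the second closes it.
            open_rings ^= {ch}
    return depth == 0 and not open_rings and not in_bracket


if __name__ == "__main__":
    for s in EXAMPLES:
        print(s, check_pairing(s))
```

All five example strings pass this check, while a string such as `C1CCCCC`, whose ring bond is never closed, does not.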
Language Definition
SMILES can be defined using a two-level approach: a syntactic specification first enumerates the acceptable patterns of the language, and a semantic specification then describes what those patterns mean.
SMILES syntax is most precisely defined in terms of a formal grammar.
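To illustrate what a grammar-based definition looks like, here is a small sketch of a heavily simplified grammar with a recursive-descent recognizer. The grammar is an assumption made for illustration and covers only a subset of SMILES: it omits `%nn` ring closures, disconnected structures written with `.`, and the internal structure of bracket atoms. The names `accepts` and `parse_chain` are hypothetical.

```python
import re

# Simplified grammar (illustrative subset only):
#   chain   ::= atom ( branch | bond? digit | bond? atom )*
#   branch  ::= '(' bond? chain ')'
#   atom    ::= organic | bracket
#   organic ::= 'Cl' | 'Br' | 'B' | 'C' | 'N' | 'O' | 'P' | 'S' | 'F' | 'I'
#             | 'b' | 'c' | 'n' | 'o' | 'p' | 's'
#   bracket ::= '[' any characters except brackets ']'
#   bond    ::= '-' | '=' | '#' | ':' | '/' | '\'
ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]|\[[^\[\]]+\]")
BOND = "-=#:/\\"
DIGITS = "0123456789"


def parse_chain(s: str, pos: int) -> int:
    """Parse one chain starting at pos; return the end index, or -1 on failure."""
    m = ATOM.match(s, pos)
    if m is None:
        return -1
    pos = m.end()
    while pos < len(s):
        if s[pos] == ")":
            return pos  # end of an enclosing branch; the caller consumes ')'
        if s[pos] == "(":
            inner = pos + 1
            if inner < len(s) and s[inner] in BOND:
                inner += 1  # optional bond at the start of a branch
            end = parse_chain(s, inner)
            if end == -1 or end >= len(s) or s[end] != ")":
                return -1
            pos = end + 1
            continue
        nxt = pos + 1 if s[pos] in BOND else pos  # optional bond symbol
        if nxt < len(s) and s[nxt] in DIGITS:
            pos = nxt + 1  # ring-closure digit
            continue
        m = ATOM.match(s, nxt)
        if m is None:
            return -1
        pos = m.end()
    return pos


def accepts(s: str) -> bool:
    """Return True if the whole string matches the simplified grammar."""
    return parse_chain(s, 0) == len(s)
```

A recognizer like this checks syntax only. It accepts `C1CC`, for example, even though the ring bond is never closed, because pairing ring closures belongs to the semantic level of the definition.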
Parsing SMILES
Although many SMILES parsers are hand-crafted, a parser generator operating on a formal grammar offers many advantages. For example, Smidge, a lightweight SMILES parser for JavaScript that can tokenize and validate SMILES strings, was auto-generated from a formal grammar.
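For comparison, a hand-crafted tokenizer can be sketched in a few lines. The example below is not Smidge and does not reflect its API. It is a minimal Python sketch with hypothetical token names (`bracket_atom`, `atom`, `bond`, `ring`, `branch`, `dot`), and it assumes a reduced token set that does not inspect the contents of bracket atoms.

```python
import re

# One named group per token kind; alternatives are tried left to right,
# so 'Cl' and 'Br' are listed before the single-letter atoms.
TOKEN = re.compile(
    r"""
      (?P<bracket_atom>\[[^\[\]]+\])
    | (?P<atom>Cl|Br|[BCNOPSFI]|[bcnops])
    | (?P<bond>[-=\#:/\\])
    | (?P<ring>%\d\d|\d)
    | (?P<branch>[()])
    | (?P<dot>\.)
    """,
    re.VERBOSE,
)


def tokenize(smiles: str) -> list[tuple[str, str]]:
    """Split a SMILES string into (kind, text) tokens.

    Raises ValueError at the first character that starts no known token.
    """
    tokens = []
    pos = 0
    while pos < len(smiles):
        m = TOKEN.match(smiles, pos)
        if m is None:
            raise ValueError(
                f"unexpected character {smiles[pos]!r} at index {pos}"
            )
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    for kind, text in tokenize("N1CCC(=O)CC1"):
        print(kind, text)
```

A tokenizer of this kind only splits the input into tokens. Deciding whether the token sequence is well formed still requires a parser, which is the part a parser generator would derive from the grammar.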