Signals Blog by Metamolecular2021-01-05T00:50:14+00:00Metamolecular, LLCurn:uuid:EB08D4CF-EAE0-4700-9ED1-CDD634424BA3Kemga, a Solution to the Chemical eCommerce Problemtag:metamolecular.com,2017-01-04:/blog/2017/01/04/kemga-a-solution-to-the-chemical-ecommerce-problem2017-01-04T18:00:00+00:00
2017-01-04T18:00:00+00:00
<p>Companies selling small organic molecule products face <a href="/blog/2016/12/20/the-chemical-ecommerce-problem/">special challenges when building eCommerce websites</a>. Off-the-shelf solutions lack vital functionality, and Web developers with the skill and flexibility to do the job are scarce. Even with the right tools and people, long-term maintenance costs can stack up fast. This article describes a practical solution.
<!-- more --></p>
<h4 id="introducing-kemga">Introducing Kemga</h4>
<p><a href="https://kemga.com">Kemga</a> is the eCommerce platform for companies that sell small organic molecules. Kemga is a Web service, which means there's no software to install or maintain. Kemga is fully integrated, which means that all software and services are included. Kemga is also chemistry-centric, meaning that it understands and uses chemical structures natively.</p>
<h4 id="features">Features</h4>
<p>Focussing on structures allows Kemga to offer a rich set of chemistry features, including:</p>
<ul>
<li>exact structure search and substructure search, allowing customers to find what they're looking for</li>
<li>color 2D structure images for clear product presentation</li>
<li>chemical property assignment (molecular weight, molecular formula, etc.) for easy reference</li>
<li>automatic structure-based categorization ("imidazoles", "pyridines", etc.) to highlight important product lines</li>
<li>optional CAS number conversion and/or assignment for quick product identification</li>
<li>optional systematic nomenclature conversion and/or assignment to eliminate manual structure entry</li>
</ul>
<p>As an eCommerce platform, Kemga also offers these business features:</p>
<ul>
<li>built-in shopping cart for better customer engagement</li>
<li>search engine optimization to bring in new customers</li>
<li>responsive page layout that works with a wide range of devices</li>
<li>style and branding to match your current website</li>
<li>served from any top-level host name (e.g., mycompany.com) to promote your brand</li>
</ul>
<p>Test all of Kemga's features with <a href="https://kemga.com/">the online demonstration</a>.</p>
<h4 id="getting-started">Getting Started</h4>
<p>A chemical eCommerce site can be started with as little as a catalog file containing product identifiers. Supported formats include Excel, CSV, and SDF. If no machine-readable structures are present, they can be generated from <a href="/cheminformatics/smiles/">SMILES</a>, <a href="/cheminformatics/what-is-a-cas-number/">CAS Numbers</a>, and systematic names contained within the catalog file. Quantity, pricing, and other product metadata are read from the catalog file. Updates use the same process. Kemga rebuilds your website based on the contents of the file, eliminating the need for redundant data entry.</p>
<p>A Kemga preview is available to qualifying companies. Find out more <a href="mailto:kemga@metamolecular.com">by email</a> or <a href="/contact/">online</a>.</p>
The Chemical eCommerce Problemtag:metamolecular.com,2016-12-20:/blog/2016/12/20/the-chemical-ecommerce-problem2016-12-20T10:30:00+00:00
2016-12-20T10:30:00+00:00
<p>Building an eCommerce website for small molecule products isn't easy. Most of the issues can be traced to chemical information management, a job for which off-the-shelf eCommerce systems are ill-suited. This article describes the problem in detail, and a future article will describe a practical solution.
<!-- more --></p>
<h4 id="structure-search">Structure Search</h4>
<p>The most pressing need of a chemical eCommerce system is <em>structure search</em>. Structure search allows a customer to find products that contain a structural motif of interest. The results of a structure search for commercially-available materials can markedly influence the scope of a customer project. For example, many projects require ready access to families of starting materials sharing a common substructure. Identifying a source of related building blocks can motivate many repeat purchases.</p>
<p>Structure search presents a pivotal opportunity to match customer with product. When implemented properly, structure search can become a business-critical feature.</p>
<h4 id="machine-readable-structures">Machine-Readable Structures</h4>
<p>Before structure search can even be implemented, each product must be associated with a <em>machine-readable structure</em>. A machine-readable structure represents a molecule in a format suitable for manipulation by software. The two most useful formats are <a href="/cheminformatics/smiles/">SMILES</a> and <a href="/cheminformatics/molfile-format/">Molfile</a>. Both formats have been standardized over the course of many years, and are well supported by a wide range of software.</p>
<p>Given a collection of products, each of which is linked to a machine-readable structure, a chemical eCommerce system can implement structure search. There are two basic approaches that differ markedly in their performance characteristics. Both approaches attempt an atom-by-atom match between a query structure drawn by the customer and a target catalog structure. However, one approach precompiles an index to avoid performing the search on every product structure. Indexes are most suitable for large collections of more than a few thousand products.</p>
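<p>As a rough illustration of the indexed approach, the sketch below precomputes a lookup table from structural features to product identifiers, then uses it to narrow a query to a short list of candidates. The feature labels (such as <code>ring:benzene</code>), the product identifiers, and the function names are all hypothetical. Production systems typically derive fingerprints from each structure with a cheminformatics toolkit, and the atom-by-atom match itself is not shown here.</p>
<pre><code class="lang-javascript">// Hypothetical catalog: each product lists features precomputed from its
// structure by a cheminformatics toolkit (not shown).
const products = [
  { id: "A-100", features: ["ring:benzene", "atom:N", "atom:O"] },
  { id: "A-200", features: ["ring:pyridine", "atom:N"] },
  { id: "A-300", features: ["ring:benzene", "atom:Cl"] },
];

// Build the index once: feature -> set of ids of products with that feature.
function buildIndex(productList) {
  const index = new Map();
  for (const product of productList) {
    for (const feature of product.features) {
      if (!index.has(feature)) {
        index.set(feature, new Set());
      }
      index.get(feature).add(product.id);
    }
  }
  return index;
}

// Keep only the ids of products that have every query feature.
// Only these candidates need the expensive atom-by-atom match.
function candidates(index, allIds, queryFeatures) {
  let result = allIds;
  for (const feature of queryFeatures) {
    const ids = index.get(feature) || new Set();
    result = result.filter((id) => ids.has(id));
  }
  return result;
}

const structureIndex = buildIndex(products);
const productIds = products.map((product) => product.id);
console.log(candidates(structureIndex, productIds, ["ring:benzene", "atom:N"]));
</code></pre>
<p>A product missing any query feature cannot contain the query structure, so it is ruled out without a full match. Products that survive the filter are only candidates and still have to be confirmed atom by atom.</p>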
<h4 id="more-than-just-structure-search">More than Just Structure Search</h4>
<p>It turns out that a chemical eCommerce system needs machine-readable structures for much more than just structure search. Examples include:</p>
<ul>
<li><strong>2D Structure.</strong> Customers expect to see a structure image when browsing product selections.</li>
<li><strong>Systematic Name.</strong> An IUPAC name on a product summary page further confirms identity and can promote search engine discoverability.</li>
<li><strong>Chemical Properties.</strong> Customers often use product catalogs as a source of information about products. Properties such as molecular weight and molecular formula are especially handy.</li>
<li><strong>Chemical Identifiers.</strong> Cross-checking a catalog entry with another database gives added confidence that the right product is being purchased. Both <a href="/cheminformatics/what-is-a-cas-number/">CAS Numbers</a> and InChI are helpful in this regard.</li>
</ul>
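<p>To give a feel for the Chemical Properties item, here is a minimal sketch that computes molecular weight from a molecular formula. It assumes the formula has already been derived from the machine-readable structure by other software. The function name and the short table of atomic weights are illustrative only, and parentheses, charges, and isotopes are not handled.</p>
<pre><code class="lang-javascript">// Rounded standard atomic weights for a few common elements.
const ATOMIC_WEIGHTS = {
  H: 1.008, C: 12.011, N: 14.007, O: 15.999,
  F: 18.998, S: 32.06, Cl: 35.45, Br: 79.904,
};

// Compute molecular weight from a simple formula such as "C2H6O".
function molecularWeight(formula) {
  let total = 0;
  let consumed = 0;
  // Each match is an element symbol followed by an optional count.
  for (const match of formula.matchAll(/([A-Z][a-z]?)(\d*)/g)) {
    const symbol = match[1];
    const count = match[2] === "" ? 1 : Number(match[2]);
    if (!(symbol in ATOMIC_WEIGHTS)) {
      throw new Error("Unsupported element: " + symbol);
    }
    total += ATOMIC_WEIGHTS[symbol] * count;
    consumed += match[0].length;
  }
  // Reject input containing characters that no match accounted for.
  if (consumed !== formula.length) {
    throw new Error("Unparseable formula: " + formula);
  }
  return total;
}

console.log(molecularWeight("C2H6O").toFixed(2)); // ethanol, about 46.07
</code></pre>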
<h4 id="manual-labor">Manual Labor</h4>
<p>The unmistakable need for machine-readable structures raises the question of how these structures will be generated. Companies whose products number in the dozens might get by with manual curation. In other words, each 2D structure, chemical property, name, and identifier is assigned by a person. The data may be tracked in a spreadsheet or other ad hoc database. Publishing the data might involve copying it from one location (e.g., a spreadsheet) to another (e.g., a data entry form on the company website).</p>
<p>Unfortunately, manual curation fails for catalogs of even a few hundred entries. Human error inevitably introduces inconsistent reporting and lets bad data seep into the catalog. These errors can be very difficult to spot, and vexing to customers and company management alike.</p>
<p>Accuracy aside, manual curation of a product catalog can be harmful to a company in other ways. The most obvious is scale: limiting product selection to that which can be managed by hand prevents a company from growing sales through product line expansion. Manual processes are also slow, which can delay the effective release date of new products. Finally, the need to convert catalog data by hand can harm marketing efforts, which require wide publication of product data in a variety of formats.</p>
<h4 id="automation">Automation</h4>
<p>Given the right skills and tools, manual labor can be almost entirely eliminated from a chemical eCommerce system. Machine-readable structures can be generated from either CAS numbers or IUPAC names alone. From these machine-readable structures, an eCommerce system with structure search can be built. Product summary pages complete with 2D structure, IUPAC name, CAS number, and molecular weight/formula can be generated without human intervention. Moreover, catalogs in a wide variety of formats can be published to ensure the largest possible audience. The size of the catalog isn't constrained by data management issues. New products can be introduced to customers as soon as they become available. And customers can use a variety of search methods to find what they're looking for.</p>
<p>Unfortunately, high up-front and maintenance costs put automation out of reach for many companies.</p>
<h4 id="niche-market">Niche Market</h4>
<p>Many eCommerce websites are built through minor customization of an existing content management system (CMS). A handful of off-the-shelf components are bundled together with the CMS, yielding the finished site. However, chemistry applications require components that don't usually exist. <a href="/blog/2013/05/22/substructure-search-for-websites/">An earlier article</a> explained this problem in detail. To summarize, a structure searchable website requires the coordination of four key pieces of software within a content management system.</p>
<p>Compounding this problem is the highly specialized nature of chemical data processing, or <a href="/cheminformatics/">cheminformatics</a>, in which simple, reasonable-sounding solutions often fail. For example, on more than one occasion an experienced developer who was a cheminformatics novice has suggested searching and indexing SMILES like ordinary text as a way to implement substructure search. Many cheminformatics novices don't understand the business case for implementing structure search in the first place, and so resort to guessing. Even worse, many problems won't surface until a system has been in use for some time. For example, products available as different salt forms need to be handled in a consistent way - from the beginning. And so on.</p>
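<p>A small sketch shows why treating SMILES as ordinary text breaks down. The same molecule can be written as several different but equally valid SMILES strings, and element symbols can overlap as text. The catalog entries and the function name below are made up for illustration.</p>
<pre><code class="lang-javascript">// Naive approach: treat SMILES as plain text and look for a substring.
function naiveSubstructureSearch(querySmiles, catalog) {
  return catalog.filter((entry) => entry.smiles.includes(querySmiles));
}

const catalog = [
  { name: "phenol (one valid SMILES)", smiles: "c1ccccc1O" },
  { name: "phenol (another valid SMILES)", smiles: "c1ccc(O)cc1" },
  { name: "sodium chloride", smiles: "[Na+].[Cl-]" },
];

// False negative: both phenol entries contain a benzene ring,
// but the text "c1ccccc1" appears in only the first one.
const benzeneHits = naiveSubstructureSearch("c1ccccc1", catalog);
console.log(benzeneHits.map((entry) => entry.name));

// False positive: the text "C" matches the "Cl" in sodium chloride,
// which contains no carbon atom at all.
const carbonHits = naiveSubstructureSearch("C", catalog);
console.log(carbonHits.map((entry) => entry.name));
</code></pre>
<p>Avoiding both kinds of error requires parsing each SMILES into atoms and bonds and then comparing structures, which is the job of a cheminformatics toolkit rather than a text index.</p>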
<p>Perhaps the most vexing problem is tracking down and evaluating software options. Good cheminformatics software also tends to carry a hefty price tag. Free alternatives do exist, but filtering the good from the bad takes experience, something that cheminformatics novices lack by definition. Resources that compile information about various cheminformatics packages are often scattered, out of date, or incomplete. Just fully understanding the software that must be assembled can consume valuable time and money.</p>
<p>Taken together, these difficulties can result in an expensive proposition for companies trying to roll their own chemical eCommerce solutions. First, a team of developers willing and able to extend its standard CMS with nonstandard cheminformatics components - usually on a tight budget - needs to be hired. Next, this team needs to sort through a broad but often shallow body of documentation in search of the appropriate cheminformatics components. Finally, the required components need to somehow be integrated within the CMS to yield a chemical eCommerce site. Even if everything comes together on time and within budget, ancillary data management tasks can inflate already stretched budgets. Most importantly, all of this work is typically performed by cheminformatics novices, resulting in inevitable cost overruns and delays.</p>
<h4 id="point-solutions">Point Solutions</h4>
<p>In the face of what may seem like an insurmountable technical challenge, many companies turn to point solutions. Given the importance of structure search, for example, one approach would be to simply solve that one problem. As a simple and inexpensive approach, Metamolecular has introduced <a href="/chemserver/">ChemServer</a>. This self-contained module is capable of performing exact- and substructure matching within either PHP or Ruby environments. Catalogs of up to a few thousand structures in size can be searched without an index step or database modifications.</p>
<p>This solution works well if structure search is the only cheminformatics problem to be solved, but most companies will face many, if not all, of the requirements described above. For example, images and chemical properties still need to be assigned to products. The site's database still needs to be populated with machine-readable chemical structures for each product. Moreover, ancillary data management (e.g., creation of PDF or SD file catalogs) will remain an expensive and error-prone process that can stunt sales.</p>
<h4 id="conclusions">Conclusions</h4>
<p>Point solutions to the chemical eCommerce problem may appear attractive due to their low initial cost. But integration and the burden of manual data management can spawn new problems. What's needed is a fundamentally new approach to the chemical eCommerce problem. The next article in this series will introduce one.</p>
Shorten Long IUPAC Chemical Names with this One Weird CSS Tricktag:metamolecular.com,2013-11-06:/blog/2013/11/06/shorten-long-iupac-chemical-names-with-this-one-weird-css-trick2013-11-06T11:30:00+00:00
2013-12-01T20:00:00+00:00
<p><a href="/blog/2013/10/31/computer-translation-of-iupac-chemical-nomenclature/">IUPAC Nomenclature</a>, although valuable as a naming system for organic chemicals, often produces very long names. Generally speaking, the larger the structure, the longer the IUPAC name. The enormous length variation in IUPAC names poses a particular problem for fixed-width HTML elements such as those used in grids and tables. Fortunately, an easy workaround comes in the form of the CSS3 <code>text-overflow</code> property.
<!-- more --></p>
<h4 id="using-code-text-overflow-code-">Using <code>text-overflow</code></h4>
<p>Setting an element's <code>text-overflow</code> property to <code>ellipsis</code> instructs the browser to truncate overflowing text and replace what's removed with the ellipsis (…) character. As <a href="http://css-tricks.com/snippets/css/truncate-string-with-ellipsis/">discussed elsewhere</a>, three other properties need to be set: <code>width</code>, <code>white-space</code>, and <code>overflow</code>.</p>
<div class="code-caption">Truncating IUPAC Names with CSS</div>
<pre><code class="lang-css"><span class="selector-class">.iupac-name</span> {
<span class="attribute">width</span>: <span class="number">250px</span>;
<span class="attribute">white-space</span>: nowrap;
<span class="attribute">overflow</span>: hidden;
<span class="attribute">text-overflow</span>: ellipsis;
}
</code></pre>
<div class="run-demo"><a href="/examples/20131106/name-shorten-1.html" target="_blank">Run in New Tab</a></div>
<h4 id="selectively-showing-the-full-name">Selectively Showing the Full Name</h4>
<p>If you viewed the <a href="/examples/20131106/name-shorten-1.html">demo</a>, you probably realized that naive truncation of IUPAC names isn't really that helpful. Forced into a small enough space, most names will be truncated. This of course renders them all unreadable.</p>
<p>We need to give the user a way to reveal the full IUPAC name. The <code>title</code> attribute offers a simple solution.</p>
<div class="code-caption">Truncated IUPAC Name with Tooltip</div>
<pre><code class="lang-xml"><span class="tag">&lt;<span class="name">div</span> <span class="attr">class</span>=<span class="string">"iupac-name"</span> <span class="attr">title</span>=<span class="string">"(3R,5R)-7-[2-(4-fluorophenyl)-3-phenyl-4-(phenylcarbamoyl)-5-propan-2-ylpyrrol-1-yl]-3,5-dihydroxyheptanoic acid"</span>&gt;</span>(3R,5R)-7-[2-(4-fluorophenyl)-3-phenyl-4-(phenylcarbamoyl)-5-propan-2-ylpyrrol-1-yl]-3,5-dihydroxyheptanoic acid<span class="tag">&lt;/<span class="name">div</span>&gt;</span>
</code></pre>
<div class="run-demo"><a href="/examples/20131106/name-shorten-2.html" target="_blank">Run in New Tab</a></div>
<p>Now when a user hovers over the truncated IUPAC name, a tooltip showing the full name appears.</p>
<p>The <code>title</code> attribute may not be the best solution in every situation. For example, users on touchscreen devices will be unable to see the full IUPAC name. </p>
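<p>One possible workaround for touchscreens, offered only as a sketch, is to let a tap or click toggle an extra class that lifts the truncation. The class name <code>iupac-name-expanded</code> and the function name are hypothetical. The approach assumes a matching CSS rule that resets <code>white-space</code> to <code>normal</code> and <code>overflow</code> to <code>visible</code>, placed after the <code>.iupac-name</code> rule so that it takes precedence.</p>
<pre><code class="lang-javascript">// Hypothetical class name. Pair it with a CSS rule such as:
// .iupac-name-expanded { white-space: normal; overflow: visible; }
const EXPANDED_CLASS = "iupac-name-expanded";

// Make one truncated name expandable by click or tap.
function makeExpandable(element) {
  element.addEventListener("click", () => {
    element.classList.toggle(EXPANDED_CLASS);
  });
}

// In a browser, wire up every element that uses the .iupac-name class.
if (typeof document !== "undefined") {
  document.querySelectorAll(".iupac-name").forEach(makeExpandable);
}
</code></pre>
<p>A second tap removes the class again, restoring the truncated view.</p>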
<h4 id="conclusions">Conclusions</h4>
<p>Displaying IUPAC names in fixed-width page elements can wreak havoc on page layouts. This article illustrates a simple solution using standard HTML and CSS. Many more complex solutions are available.</p>
<p>However, before applying any solution, it's worth reflecting on the purpose of showing the IUPAC name in the first place. "Requirements" around showing IUPAC names in web pages sometimes stem from old habits rooted in print media. A <a href="http://chemwriter.com">well-rendered chemical structure</a> often conveys all of the information a chemist needs.</p>
Ryan Scientific Does Chemical Structure Searchtag:metamolecular.com,2013-11-05:/blog/2013/11/05/ryan-scientific-does-chemical-structure-search2013-11-05T09:20:00+00:00
2016-12-02T09:11:00+00:00
<p>Implementing <a href="/blog/2013/05/22/substructure-search-for-websites/">chemical structure search</a> isn't an end in itself - it's something that fits into a larger goal of providing excellent customer service. This video by <a href="http://ryansci.com">Ryan Scientific</a> shows how the pieces fit together.
<!-- more --></p>
<div class="video-wrapper">
<iframe width="620" height="465" src="http://www.youtube.com/embed/B5MvfWAoCKc" frameborder="0" allowfullscreen></iframe>
</div>
<p>By the way, don't miss the cameo by <a href="http://chemwriter.com">ChemWriter</a>!</p>
Computer Translation of IUPAC Chemical Nomenclaturetag:metamolecular.com,2013-10-31:/blog/2013/10/31/computer-translation-of-iupac-chemical-nomenclature2013-10-31T12:00:00+00:00
2013-10-31T12:00:00+00:00
<p>Few methods for conveying organic chemical structures can match the scope of IUPAC nomenclature. Central to patents, papers, and reports, IUPAC names have the rare distinction of being readable by humans and machines alike. This article, the first in a series on IUPAC Nomenclature translation, introduces some of the foundational works in the field.
<!-- more --></p>
<h4 id="eugene-garfield">Eugene Garfield</h4>
<p>Although several name-to-structure software systems have been developed over the last thirty years, their origins can be traced to a <a href="http://dx.doi.org/10.1038/192192a0">single 1961 paper</a> by <a href="http://www.garfield.library.upenn.edu/">Eugene Garfield</a> that was subsequently <a href="http://garfield.library.upenn.edu/essays/v6p489y1983.pdf">republished</a>.</p>
<p>Garfield's paper described a computer program capable of converting systematic organic nomenclature into empirical formulas. As explained in a much later <a href="http://www.webofstories.com/play/eugene.garfield/1">interview</a>, Garfield's immediate interest was to use the formulas as unique keys in his growing <em>Index Chemicus</em>. Nevertheless, the ultimate goal was to develop a program capable of producing structural diagrams from arbitrary systematic names.</p>
<p>A <a href="http://dx.doi.org/10.1021/c160006a021">later paper</a> described an eight-step algorithm for generating molecular formulas from systematic chemical names.</p>
<h4 id="chemical-abstracts-service">Chemical Abstracts Service</h4>
<p>In 1967, Chemical Abstracts Service (CAS) <a href="http://dx.doi.org/10.1021/c160026a009">published</a> the first widely-applicable set of rules for converting systematic organic nomenclature into machine-readable structures (a process the authors termed "Nomenclature Translation"). The algorithm begins at the first character of a name and works its way rightward, one character at a time. State is accumulated along the way by recognizing name components such as locants, punctuation, and name roots.</p>
<p>The CAS group later described a <a href="http://dx.doi.org/10.1021/c160055a009">software implementation</a> of the original algorithm. Written in assembly language, it weighed in at 205K of machine code, ran on an IBM 360/370, and could process 4900 names per minute.</p>
<h4 id="grammar-based-translators">Grammar-Based Translators</h4>
<p>Although <a href="/blog/2013/10/07/create-a-smiles-parser-and-grammar-with-pegjs/">grammar-based analysis</a> might seem like an obvious choice for chemical nomenclature translation given its systematic nature, the first in-depth studies did not appear until 1988. A series of papers by Kirby's group at the University of Hull comprehensively surveyed the field, developed a detailed context-free grammar for an important subset of IUPAC nomenclature, and described a working software implementation.</p>
<ul>
<li><a href="http://dx.doi.org/10.1021/ci00062a009">Introduction and background to a grammar-based approach</a> Comprehensive review of the field of systematic nomenclature translation.</li>
<li><a href="http://dx.doi.org/10.1021/ci00062a010">Development of a formal grammar</a> Illustrates the process of building a systematic nomenclature grammar starting with saturated hydrocarbons.</li>
<li><a href="http://pubs.acs.org/doi/suppl/10.1021/ci00062a010/suppl_file/ci00062a010_si_001.pdf">Development of a formal grammar (Supporting Information)</a> Provides the first published example of an IUPAC nomenclature grammar.</li>
<li><a href="http://dx.doi.org/10.1021/ci00062a011">Syntax analysis and semantic processing</a> Implementation of a nomenclature translator using a Simple Left to Right (SLR) backtracking algorithm together with the preceding grammar to produce a semantic tree.</li>
<li><a href="http://dx.doi.org/10.1021/ci00066a004">Concise connection tables to structure diagrams</a> Description of the temporary data structure obtained immediately after parsing a name, and how to transform it into a connection table.</li>
<li><a href="http://dx.doi.org/10.1021/ci00066a005">Steroid nomenclature</a> Expansion of the grammar and parser to include steroids.</li>
<li><a href="http://dx.doi.org/10.1021/ci00001a028">(Semi)automatic name correction</a> A combination of loosened grammar rules and ad-hoc procedures can be used to correct many common errors in systematic names.</li>
</ul>
<h4 id="name-struct">Name=Struct</h4>
<p>Rejecting grammar-based approaches as too rigid for the loose way systematic nomenclature has been used in practice, CambridgeSoft worked on a different approach to the problem, publishing a <a href="http://dx.doi.org/10.1021/ci990062c">description</a> in 1999. Following a set of principles derived in large part from vendor catalogs and name queries received by a company-run web service, Name=Struct consisted of two main steps:</p>
<ol>
<li>Divide a name into a set of recognized fragments of maximum length, proceeding left to right, one character at a time.</li>
<li>Given a set of fragments, assemble the corresponding structure.</li>
</ol>
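<p>The first step can be pictured with a toy longest-match tokenizer. This is only a sketch of the general idea and not the Name=Struct implementation: the tiny fragment list and the function name are invented for illustration, and a real lexicon would be far larger. The second step, assembling a structure from the fragments, is not shown.</p>
<pre><code class="lang-javascript">// Toy lexicon of name fragments (hypothetical and far from complete).
const FRAGMENTS = [
  "meth", "eth", "prop", "but", "an", "ane", "ol", "yl",
  "chloro", "di", "-", "1", "2", "3",
];

// Split a name into fragments, scanning left to right and always taking
// the longest fragment that matches at the current position.
function tokenize(name, fragments) {
  const tokens = [];
  let position = 0;
  while (position !== name.length) {
    let best = "";
    for (const fragment of fragments) {
      if (name.startsWith(fragment, position)) {
        if (fragment.length > best.length) {
          best = fragment;
        }
      }
    }
    if (best === "") {
      throw new Error("Unrecognized text at position " + position);
    }
    tokens.push(best);
    position += best.length;
  }
  return tokens;
}

console.log(tokenize("ethane", FRAGMENTS));
console.log(tokenize("2-chloropropan-1-ol", FRAGMENTS));
</code></pre>
<p>In "ethane" both "an" and "ane" match after "eth", and the longer fragment wins. A purely greedy scan like this one never backtracks, so a poor early choice cannot be undone.</p>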
<p>Although conceptually simple, the Name=Struct approach required close attention to the ways name fragments relate to one another. The complete implementation to produce in-memory structure representations from arbitrary names consisted of roughly 30,000 lines of C++.</p>
<h4 id="opsin">OPSIN</h4>
<p><a href="http://opsin.ch.cam.ac.uk/">OPSIN</a> currently stands as the only broadly-applicable, open source systematic name translation software. A <a href="http://dx.doi.org/10.1021/ci100384d">2011 paper</a> by Murray-Rust's group at Cambridge describes OPSIN's design and implementation in Java. A high-level overview can be given as:</p>
<ol>
<li>Tokenization into "Words" via a backtracking, grammar-based automaton similar to that described by the Kirby group. Multiple valid parses may be detected, although in practice this was true for fewer than 10% of all names.</li>
<li>Generation, processing, and assembly of structure fragments.</li>
</ol>
<p>OPSIN's recall and accuracy were found to be competitive with that of the CambridgeSoft implementation (in the form of ChemDraw 12). Machine-readable datasets used to determine the accuracy of OPSIN's results are available in the <a href="http://pubs.acs.org/doi/suppl/10.1021/ci100384d">supporting information</a>, as are the specific failure cases.</p>
<p>OPSIN continues to be actively maintained with an up-to-date source code repository hosted on <a href="https://bitbucket.org/dan2097/opsin/">BitBucket</a> and a <a href="http://opsin.ch.cam.ac.uk/">Web-based demo</a>.</p>
<h4 id="conclusions">Conclusions</h4>
<p>Systematic nomenclature translation continues to play an important role in many cheminformatics workflows today. As shown by the diversity and complexity of the approaches disclosed over the last 50+ years, the problem remains difficult to solve comprehensively.</p>
Styling the ChemWriter Editor with CSStag:metamolecular.com,2013-10-23:/blog/2013/10/23/styling-the-chemwriter-editor-with-css2013-10-23T10:45:00+00:00
2013-10-23T10:45:00+00:00
<p>Customizing the appearance of page elements through CSS has been a staple of Web development for many years. ChemWriter's <code>Editor</code> component renders itself using DOM elements, making fine-grained customization possible using CSS. Read on to learn how to use this capability.
<!-- more --></p>
<h4 id="about-chemwriter-css">About chemwriter.css</h4>
<p>The <a href="https://chemwriter.com/sdk/chemwriter.css">chemwriter.css</a> file defines on-screen appearance of <code>Editor</code> components. To prevent collisions with other definitions appearing on the same page, each class name uses the prefix <code>chemwriter-</code>.</p>
<p>For example, the appearance of <code>Editor</code>'s top-level DOM element is controlled by a set of rules covering size, border, background color, and box sizing model.</p>
<div class="code-caption">ChemWriter Editor top-level DOM element styling</div>
<pre><code class="lang-css"><span class="comment">/*
* Main Editor window
*/</span>
<span class="selector-class">.chemwriter-editor</span> {
<span class="attribute">width</span>: <span class="number">100%</span>;
<span class="attribute">height</span>: <span class="number">100%</span>;
<span class="attribute">position</span>: relative;
<span class="attribute">border</span>: <span class="number">2px</span> solid gray;
<span class="attribute">border-radius</span>: <span class="number">5px</span>;
<span class="attribute">background-color</span>: <span class="number">#dcdcdc</span>;
<span class="attribute">box-sizing</span>: border-box;
<span class="attribute">-moz-box-sizing</span>: border-box;
}
</code></pre>
<p>Changing any attribute is a simple matter of modifying the CSS class. Although in principle it's possible to do so by directly editing the <code>chemwriter.css</code> file, this is not recommended because the stylesheet regularly gets updated with new releases.</p>
<p>A far better approach would be to <em>override</em> the <code>chemwriter-editor</code> class with the specific change of interest.</p>
<div class="code-caption">Overriding chemwriter.css</div>
<pre><code class="lang-css"><span class="comment">/*
* Ensure that this definition loads after chemwriter.css
*/</span>
<span class="selector-class">.chemwriter-editor</span> {
<span class="attribute">border</span>: <span class="number">2px</span> solid red;
}
</code></pre>
<p>However, all classes defined in <code>chemwriter.css</code> are subject to change at any time. The situation is similar to undocumented API methods. They are intended for internal use only and can change without notice - wrecking anything built on top.</p>
<p>What's needed is a set of <em>stable</em> ChemWriter CSS classes that are safe to override.</p>
<h4 id="introducing-chemwriter-user-css">Introducing chemwriter-user.css</h4>
<p>To provide a safe mechanism for customizing ChemWriter component appearance, <a href="http://chemwriter.com/sdk/chemwriter-user.css">chemwriter-user.css</a> has been introduced. This file defines those classes that can be safely overridden. Unlike <code>chemwriter.css</code>, <code>chemwriter-user.css</code> can either be edited directly, or used as a template.</p>
<div class="code-caption">Currently-usable CSS classes</div>
<pre><code class="lang-css"><span class="comment">/* Main Editor window */</span>
<span class="selector-class">.chemwriter-editor</span> { }
<span class="comment">/* Editor button */</span>
<span class="selector-class">.chemwriter-button</span> { }
<span class="comment">/* Hovering over an enabled button */</span>
<span class="selector-class">.chemwriter-button-enabled</span><span class="selector-pseudo">:hover</span> { }
<span class="comment">/* Disabled Editor button */</span>
<span class="selector-class">.chemwriter-button-disabled</span> { }
<span class="comment">/* Pressed button */</span>
<span class="selector-class">.chemwriter-button-pressed</span> { }
<span class="comment">/* About button */</span>
<span class="selector-class">.chemwriter-button-about</span> { }
<span class="comment">/* Editor button icon. Font "ChemWriter Symbols" is defined in */</span>
<span class="comment">/* chemwriter.css. To change button icons, use a different font. */</span>
<span class="selector-class">.chemwriter-icon</span> { }
<span class="comment">/* The small triangle that appears to the lower-right of button icons. */</span>
<span class="selector-class">.chemwriter-detail-disclosure</span> { }
<span class="comment">/* Use a small triangle shape */</span>
<span class="selector-class">.chemwriter-detail-disclosure</span><span class="selector-pseudo">:after</span> { }
<span class="comment">/* The main structure display area. */</span>
<span class="selector-class">.chemwriter-document-view</span> { }
<span class="comment">/* IE8 only */</span>
<span class="selector-class">.chemwriter-fallback-content</span> { }
</code></pre>
<div class="run-demo"><a href="http://chemwriter.com/sdk/chemwriter-user.css" target="_blank">View full stylesheet</a></div>
<h4 id="styling-the-editor">Styling the Editor</h4>
<p>On a site that uses a dark color theme, the <code>Editor</code> may look better with a dark theme of its own. This can be accomplished by overriding a few of the class definitions contained in <code>chemwriter-user.css</code>.</p>
<div class="code-caption">Dark-Themed ChemWriter Editor</div>
<pre><code class="lang-css"><span class="selector-tag">body</span> {
<span class="attribute">background-color</span>: black;
}
<span class="selector-class">.chemwriter-editor</span> {
<span class="attribute">background-color</span>: <span class="number">#808080</span>;
<span class="attribute">border</span>: <span class="number">1px</span> solid <span class="number">#b0b0b0</span>;
}
<span class="selector-class">.chemwriter-button</span> {
<span class="attribute">color</span>: <span class="number">#e0e0e0</span>;
<span class="attribute">text-shadow</span>: <span class="number">0</span> <span class="number">1px</span> <span class="number">#000000</span>;
}
<span class="selector-class">.chemwriter-button-disabled</span> {
<span class="attribute">color</span>: <span class="number">#a0a0a0</span>;
}
</code></pre>
<div class="run-demo"><a href="/examples/20131023/chemwriter-dark.html" target="_blank">Run in New Tab</a></div>
<h4 id="conclusions">Conclusions</h4>
<p>ChemWriter components such as <code>Editor</code> are built from standard HTML5 elements and can be styled with CSS. The newest release makes it possible to reliably customize many aspects of <code>Editor</code>'s appearance.</p>
Validating CAS Numbers in JavaScripttag:metamolecular.com,2013-10-11:/blog/2013/10/11/validating-cas-numbers-in-javascript2013-10-11T12:00:00+00:00
2013-10-11T12:00:00+00:00
<p><a href="/cheminformatics/what-is-a-cas-number/">CAS Registry Numbers</a> are used extensively in chemistry and commerce to identify chemical substances. One reason is that CAS Numbers tolerate human data entry errors thanks to a built-in <a href="http://en.wikipedia.org/wiki/Check_digit">check digit</a>. This article shows how to validate arbitrary CAS numbers in JavaScript through comparison of expected and actual check digits.
<!-- more --></p>
<h4 id="anatomy-of-a-cas-number">Anatomy of a CAS Number</h4>
<p>A CAS Number is represented by a character sequence containing three groups of digits separated by hyphens. The first group contains two to seven digits. The second group contains two digits. The last group contains a single check digit.</p>
<p>For example, the CAS number for Imatinib, <code>152459-95-5</code>, contains six digits in the first block, and a check digit of "5".</p>
<p>Chemical Abstracts Service reports that CAS numbers are <a href="http://www.cas.org/content/chemical-substances/faqs#q4">sequentially assigned</a> as they enter the CAS Registry. Unlike some chemical identifiers such as InChI Key, no other chemical meaning can be ascribed to a substance's CAS Number.</p>
<h4 id="calculating-the-check-digit">Calculating the Check Digit</h4>
<p>CAS Numbers are represented generally as a sequence of digits numbered right-to-left, excluding the rightmost digit, which is labeled "R".</p>
<pre><code class="bash">N<sub>i</sub> ... N<sub>5</sub> N<sub>4</sub> N<sub>3</sub> - N<sub>2</sub> N<sub>1</sub> - R</code>
</pre>
<p>The check digit <code>R</code> is calculated by multiplying each digit <code>N<sub>i</sub></code> by its position <code>i</code>, summing the products, and taking the result mod 10.</p>
<pre><code class="bash">(i×N<sub>i</sub> + ... + 5×N<sub>5</sub> + 4×N<sub>4</sub> + 3×N<sub>3</sub> + 2×N<sub>2</sub> + 1×N<sub>1</sub>) mod 10
</code>
</pre>
<p>We can write a JavaScript function that will calculate the CAS Number check digit.</p>
<div class="code-caption">Calculation of the CAS Number check digit</div>
<pre><code class="lang-javascript"><span class="keyword">var</span> getCheckDigit = <span class="function"><span class="keyword">function</span>(<span class="params">cas</span>) </span>{
<span class="keyword">var</span> match = cas.match(<span class="regexp">/^([0-9]{2,7})-([0-9]{2})-[0-9]$/</span>);
<span class="keyword">var</span> digits = (match[<span class="number">1</span>] + match[<span class="number">2</span>]).split(<span class="string">''</span>).reverse();
<span class="keyword">var</span> sum = <span class="number">0</span>;
<span class="keyword">for</span> (<span class="keyword">var</span> i = <span class="number">0</span>; i < digits.length; i++) {
sum += (i + <span class="number">1</span>) * <span class="built_in">parseInt</span>(digits[i]);
}
<span class="keyword">return</span> sum % <span class="number">10</span>;
};
</code></pre>
<div class="run-demo"> </div>
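<p>As a quick check of the formula, the sketch below recomputes the weighted sum for Imatinib term by term and compares the expected check digit with the final digit of the CAS Number. The helper name <code>weightedTerms</code> and the shape of its return value are illustrative assumptions, not part of the code above.</p>

```javascript
// Sketch: break a CAS Number into its weighted terms, following the formula above.
// `weightedTerms` is a hypothetical helper name used only for this illustration.
function weightedTerms(cas) {
  var match = /^([0-9]{2,7})-([0-9]{2})-([0-9])$/.exec(cas);
  if (!match) {
    return null; // not in the three-group CAS Number format
  }
  // Digits N1, N2, ... read right-to-left, excluding the check digit R.
  var digits = (match[1] + match[2]).split('').reverse();
  var terms = digits.map(function (digit, i) {
    return (i + 1) * parseInt(digit, 10);
  });
  var sum = terms.reduce(function (a, b) { return a + b; }, 0);
  return {
    terms: terms,
    sum: sum,
    expected: sum % 10,
    actual: parseInt(match[3], 10)
  };
}

var imatinib = weightedTerms('152459-95-5');
console.log(imatinib.terms.join(' + ') + ' = ' + imatinib.sum);
// 5 + 18 + 27 + 20 + 20 + 12 + 35 + 8 = 145
console.log(imatinib.expected === imatinib.actual); // true, since 145 mod 10 is 5
```

<p>Returning <code>null</code> for malformed input, rather than throwing, is a design choice made for this sketch only.</p>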
<h4 id="sample-page">Sample Page</h4>
<p>Adding a bit of HTML results in a fully-functional CAS number validator.</p>
<div class="code-caption">CAS Number Validator</div>
<pre><code class="lang-xml"><span class="meta"><!DOCTYPE html></span>
<span class="tag"><<span class="name">html</span>></span>
<span class="tag"><<span class="name">head</span>></span>
<span class="tag"><<span class="name">title</span>></span>CAS Number Validator<span class="tag"></<span class="name">title</span>></span>
<span class="tag"><<span class="name">script</span>></span><span class="javascript">
<span class="keyword">var</span> validator = {
<span class="attr">regex</span>: <span class="regexp">/^([0-9]{2,7})-([0-9]{2})-[0-9]$/</span>,
<span class="attr">getCheckDigit</span>: <span class="function"><span class="keyword">function</span>(<span class="params">cas</span>) </span>{
<span class="keyword">var</span> match = cas.match(<span class="keyword">this</span>.regex);
<span class="keyword">var</span> digits = (match[<span class="number">1</span>] + match[<span class="number">2</span>]).split(<span class="string">''</span>).reverse();
<span class="keyword">var</span> sum = <span class="number">0</span>;
<span class="keyword">for</span> (<span class="keyword">var</span> i = <span class="number">0</span>; i < digits.length; i++) {
sum += (i + <span class="number">1</span>) * <span class="built_in">parseInt</span>(digits[i]);
}
<span class="keyword">return</span> sum % <span class="number">10</span>;
},
<span class="attr">isValid</span>: <span class="function"><span class="keyword">function</span>(<span class="params">cas</span>) </span>{
<span class="keyword">if</span> (!cas.match(<span class="keyword">this</span>.regex)) {
<span class="keyword">return</span> <span class="literal">false</span>;
}
<span class="keyword">return</span> <span class="keyword">this</span>.getCheckDigit(cas).toString() === cas.slice(<span class="number">-1</span>);
},
<span class="attr">validate</span>: <span class="function"><span class="keyword">function</span>(<span class="params"></span>) </span>{
<span class="keyword">var</span> cas = <span class="built_in">document</span>.querySelector(<span class="string">'#cas'</span>).value;
<span class="keyword">if</span> (<span class="keyword">this</span>.isValid(cas)) {
alert(cas + <span class="string">' is a valid CAS Number.'</span>);
} <span class="keyword">else</span> {
alert(cas + <span class="string">' is an invalid CAS Number.'</span>);
}
}
}
</span><span class="tag"></<span class="name">script</span>></span>
<span class="tag"></<span class="name">head</span>></span>
<span class="tag"><<span class="name">body</span>></span>
<span class="tag"><<span class="name">h1</span>></span>CAS Number Validator<span class="tag"></<span class="name">h1</span>></span>
<span class="tag"><<span class="name">input</span> <span class="attr">id</span>=<span class="string">"cas"</span> <span class="attr">type</span>=<span class="string">"text"</span>></span>
<span class="tag"><<span class="name">input</span> <span class="attr">type</span>=<span class="string">"submit"</span> <span class="attr">value</span>=<span class="string">"Check CAS Number"</span> <span class="attr">onclick</span>=<span class="string">"validator.validate()"</span>></span>
<span class="tag"></<span class="name">body</span>></span>
<span class="tag"></<span class="name">html</span>></span>
</code></pre>
<div class="run-demo"><a href="/examples/20131011/cas-number-validator.html" target="_blank">Run in New Tab</a></div>Create a SMILES Grammar and Parser with PEG.jstag:metamolecular.com,2013-10-07:/blog/2013/10/07/create-a-smiles-parser-and-grammar-with-pegjs2013-10-07T10:00:00+00:00
2013-10-07T10:00:00+00:00
<p>Most SMILES parsers in use today were hand crafted. In other words, a team of developers transcribed a written specification into detailed instructions written in a general purpose programming language. The task is tedious, error-prone, and time-consuming - exactly the kind of work that computers excel at.
<!-- more --></p>
<p>Parser generators offer an automated alternative capable of transforming a high-level language specification into running code. This article, part of a continuing series, demonstrates the process of building a SMILES grammar and auto-generated parser with <a href="http://pegjs.majda.cz/">PEG.js</a>.</p>
<h4 id="baby-talk">Baby Talk</h4>
<p>SMILES is a non-trivial language capable of representing a large swath of known chemistry. Rather than diving straight into a full grammar, let's start with a subset consisting of a few basic features.</p>
<p>Consider a dialect of SMILES that encodes only unbranched, saturated carbon chains:</p>
<table>
<caption>Straight-chain saturated hydrocarbon SMILES subset</caption>
<thead>
<tr>
<th scope="col">String</th>
<th scope="col">Substance</th>
</tr>
</thead>
<tbody>
<tr>
<td>C</td>
<td>Methane</td>
</tr>
<tr>
<td>CC</td>
<td>Ethane</td>
</tr>
<tr>
<td>CCC</td>
<td>Propane</td>
</tr>
</tbody>
</table>
<p>Using the <a href="http://pegjs.majda.cz/online">online PEG.js tool</a>, we can define a grammar for this language.</p>
<div class="code-caption">Straight-chain saturated hydrocarbon SMILES subset grammar</div>
<pre><code class="lang-bash">SMILES = atom+
atom = <span class="string">'C'</span>
</code></pre>
<div class="run-demo"> </div>
<p>This grammar can be entered into the left-hand side of the online PEG tool, followed by sample input to the right. Parsing the string 'CCC' returns the expected result.</p>
<div class="code-caption">JSON result of parsing 'CCC'</div>
<pre><code class="lang-json">[
<span class="string">"C"</span>,
<span class="string">"C"</span>,
<span class="string">"C"</span>
]
</code></pre>
<div class="run-demo"> </div>
<h4 id="supporting-more-atom-types">Supporting More Atom Types</h4>
<p>SMILES supports a range of atom types in the so-called "organic subset". Let's add them as well.</p>
<div class="code-caption">Straight-Chain Organic Subset Atoms</div>
<pre><code class="lang-bash">SMILES = atom+
atom = <span class="string">'B'</span><span class="string">'r'</span>? / <span class="string">'C'</span><span class="string">'l'</span>? / <span class="string">'N'</span> / <span class="string">'O'</span> / <span class="string">'P'</span> / <span class="string">'S'</span> / <span class="string">'F'</span> / <span class="string">'I'</span>
</code></pre>
<div class="run-demo"> </div>
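<p>In PEG.js, alternatives are separated by a forward slash (<code>/</code>) and are tried left-to-right; the first alternative that matches wins. Writing chlorine as <code>'C''l'?</code> lets a single alternative match either <code>C</code> or <code>Cl</code>: the optional <code>l</code> is consumed whenever it directly follows <code>C</code>.</p>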
<p>Running the resulting parser on the string 'BrCCCl' returns the expected result.</p>
<div class="code-caption">JSON result of parsing 'BrCCCl'</div>
<pre><code class="lang-json">[
[
<span class="string">"B"</span>,
<span class="string">"r"</span>
],
[
<span class="string">"C"</span>,
<span class="string">""</span>
],
[
<span class="string">"C"</span>,
<span class="string">""</span>
],
[
<span class="string">"C"</span>,
<span class="string">"l"</span>
]
]
</code></pre>
<div class="run-demo"> </div>
<p>Our grammar states that each atom consists of one or two characters. PEG.js fulfilled this request by returning an array of arrays, each containing one or two matched characters.</p>
<p>But for any real SMILES parser, element symbols should be represented as <code>String</code>s. How can we get PEG.js to do this?</p>
<h4 id="mixing-code-with-grammar">Mixing Code with Grammar</h4>
<p>PEG.js supports the transformation of matched elements using inline JavaScript functions. For example, to get our parser to return one <code>String</code> for each element symbol, we'd use named arguments together with a function that produces a string from an array.</p>
<div class="code-caption">PEG.js grammar containing inlined JavaScript function</div>
<pre><code class="lang-bash">SMILES = atom+
atom = symbol:(<span class="string">'B'</span><span class="string">'r'</span>? / <span class="string">'C'</span><span class="string">'l'</span>? / <span class="string">'N'</span> / <span class="string">'O'</span> / <span class="string">'P'</span> / <span class="string">'S'</span> / <span class="string">'F'</span> / <span class="string">'I'</span>) {
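<span class="comment">// Single-symbol alternatives such as 'N' yield a string rather than an array.</span>
<span class="built_in">return</span> [].concat(symbol).join(<span class="string">''</span>);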
}
</code></pre>
<div class="run-demo"> </div>
<p>The new parser now returns an array of <code>String</code>s.</p>
<div class="code-caption">JSON result of parsing 'BrCCCl'</div>
<pre><code class="lang-json">[
<span class="string">"Br"</span>,
<span class="string">"C"</span>,
<span class="string">"C"</span>,
<span class="string">"Cl"</span>
]
</code></pre>
<div class="run-demo"> </div>
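<p>To make the matching behavior concrete, here is a hand-written sketch in plain JavaScript that mimics the grammar above. It is not the code PEG.js generates, and the function name <code>parseAtoms</code> and its error messages are assumptions made for illustration.</p>

```javascript
// Hand-written sketch (not PEG.js output) that mimics the grammar:
//   SMILES = atom+
//   atom   = 'B''r'? / 'C''l'? / 'N' / 'O' / 'P' / 'S' / 'F' / 'I'
// `parseAtoms` is a hypothetical name used only for this illustration.
function parseAtoms(input) {
  var atoms = [];
  var pos = 0;
  while (pos < input.length) {
    var ch = input.charAt(pos);
    if (ch === 'B' || ch === 'C') {
      // Optional second character: 'r' after 'B', 'l' after 'C'.
      var second = ch === 'B' ? 'r' : 'l';
      if (input.charAt(pos + 1) === second) {
        atoms.push(ch + second);
        pos += 2;
      } else {
        atoms.push(ch);
        pos += 1;
      }
    } else if ('NOPSFI'.indexOf(ch) !== -1) {
      atoms.push(ch);
      pos += 1;
    } else {
      throw new Error('Unexpected "' + ch + '" at position ' + pos);
    }
  }
  if (atoms.length === 0) {
    throw new Error('Expected at least one atom'); // atom+ requires one or more
  }
  return atoms;
}

console.log(JSON.stringify(parseAtoms('BrCCCl'))); // ["Br","C","C","Cl"]
```

<p>Every new language feature would require another hand-written branch like these, which is the maintenance burden a generated parser avoids.</p>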
<p>Notice how this approach leads to both a <em>grammar</em> and a <em>parser</em>. Both remain synchronized throughout the development cycle. Not only can we parse the SMILES language, but we can easily communicate how both the language and parser work to non-experts. Fixing parsing bugs automatically results in fixing the grammar - and vice versa.</p>
<p><a href="/smidge/">Smidge</a> is a complete SMILES parser developed using the procedure described here.</p>
<h4 id="using-the-parser">Using the Parser</h4>
<p>The parser generated by PEG.js is a standalone JavaScript module that accepts arbitrary SMILES input. To obtain the parser, click the "Download Parser" button in the lower-right of the online tool.</p>
<p>A parser can also be produced from a command-line build tool. Given an environment with both Node.js and the <a href="https://npmjs.org/package/pegjs"><code>pegjs</code></a> package, a short program prints the parser source code.</p>
<div class="code-caption">PEG.js from the command line</div>
<pre><code class="lang-javascript"><span class="keyword">var</span> PEG = <span class="built_in">require</span>(<span class="string">'pegjs'</span>);
<span class="keyword">var</span> grammar = <span class="string">'SMILES = atom+\natom = \'C\''</span>;
<span class="keyword">var</span> parser = PEG.buildParser(grammar);
<span class="built_in">console</span>.log(parser.toSource());
</code></pre>
<div class="run-demo"> </div>
<h4 id="conclusions">Conclusions</h4>
<p>Parsers for SMILES and many other languages can be developed with <a href="http://pegjs.majda.cz">PEG.js</a> via a two-step iterative procedure:</p>
<ol>
<li>Define a grammar component capable of matching a SMILES language feature.</li>
<li>Write an inline JavaScript function to process the captured feature.</li>
</ol>
<p>An important advantage over the more traditional manual approach is that using the parser generator results in both a working parser and a grammar. The grammar can in turn be used as high-level documentation and as a starting point for <a href="/blog/2013/09/18/visualizing-the-smiles-language-with-railroad-diagrams/">automated tools</a>.</p>
Visualizing the SMILES Language with Railroad Diagramstag:metamolecular.com,2013-09-18:/blog/2013/09/18/visualizing-the-smiles-language-with-railroad-diagrams2013-09-18T12:00:00+00:00
2013-09-18T12:00:00+00:00
<p><a href="/cheminformatics/smiles/">SMILES</a> is a language for encoding chemical structures. Like any language, a "grammar" (or set of rules) determines how SMILES components can be arranged. Unfortunately, most descriptions of SMILES grammar begin and end with text narratives. Text has its place, but pictures can be far more effective.
<!-- more --></p>
<p>How can we graphically represent the rules for making a valid SMILES string?</p>
<h4 id="railroad-diagrams">Railroad Diagrams</h4>
<p><a href="http://en.wikipedia.org/wiki/Syntax_diagram">Railroad Diagrams</a> have been used to describe many computer languages. An excellent example can be found in the <a href="http://www.json.org/">JSON Specification</a>. The fundamental unit of JSON, <code>object</code>, is defined graphically as:</p>
<figure>
<a href="http://json.org"><img src="/images/posts/20130918/object.gif" alt="JSON Object"></a>
<figcaption>Railroad Diagram Example (<a href="http://www.json.org/">json.org</a>)</figcaption>
</figure>
<p>Reading Railroad Diagrams is simple. Start on the left. Follow the horizontal line rightward until reaching a square, oval, or branching path. Exit to the right. Applying these rules to the above diagram gives these valid JSON <code>objects</code>:</p>
<pre><code class="lang-json">{}
{<span class="attr">"color"</span>:<span class="string">"red"</span>}
{<span class="attr">"width"</span>:<span class="number">10</span>,<span class="attr">"height"</span>:<span class="number">5</span>}
</code></pre>
<div class="run-demo"> </div>
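<p>These strings can be checked against a real JSON parser. The sketch below uses the built-in <code>JSON.parse</code>; the helper name <code>isJsonObject</code> and the extra invalid sample are assumptions added for illustration.</p>

```javascript
// Sketch: confirm that each string is a JSON object using the built-in parser.
// `isJsonObject` is a hypothetical helper name used only for this illustration.
function isJsonObject(text) {
  try {
    var value = JSON.parse(text);
    // Objects only: exclude null, arrays, numbers, and strings.
    return value !== null && typeof value === 'object' && !Array.isArray(value);
  } catch (e) {
    return false; // JSON.parse throws a SyntaxError on invalid input
  }
}

var samples = [
  '{}',
  '{"color":"red"}',
  '{"width":10,"height":5}',
  '{"color"}' // invalid: a string must be followed by ":" and a value
];

samples.forEach(function (text) {
  console.log(text + ' -> ' + isJsonObject(text));
});
```

<p>The last sample corresponds to leaving the diagram's track after the string without passing through the colon and value, which the diagram does not permit.</p>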
<h4 id="a-railroad-diagram-for-smiles">A Railroad Diagram for SMILES</h4>
<p>A <a href="/cheminformatics/smiles/railroad-diagram/">full Railroad Diagram for SMILES</a> is available online as a hyperlinked document. To my knowledge, it is the only published, complete example of such a diagram for the SMILES language. This diagram is a work in progress and is based on the <a href="http://www.opensmiles.org/opensmiles.html">OpenSMILES specification</a>.</p>
<p>The SMILES Railroad Diagram is made up of several interlocking modules. A few of them are described in detail here to illustrate interpretation.</p>
<figure>
<a href="/cheminformatics/smiles/railroad-diagram/"><img src="/images/posts/20130918/smiles.png" alt="SMILES"></a>
<figcaption>Top-Level SMILES Railroad Diagram</figcaption>
</figure>
<p>A <code>SMILES</code> consists of one mandatory <code>Atom</code> optionally followed by any number of <code>Chain</code> or <code>Branch</code> elements in any order. The terms <code>Atom</code>, <code>Chain</code>, and <code>Branch</code> are themselves defined elsewhere within the <a href="/cheminformatics/smiles/railroad-diagram/">full diagram</a>.</p>
<figure>
<a href="/cheminformatics/smiles/railroad-diagram/"><img src="/images/posts/20130918/atom.png" alt="Atom"></a>
<figcaption>Atom Railroad Diagram</figcaption>
</figure>
<p>An <code>Atom</code> is exactly one of the following: <code>OrganicSymbol</code>; <code>AromaticSymbol</code>; <code>AtomSpec</code>; or <code>WILDCARD</code>.</p>
<figure>
<a href="/cheminformatics/smiles/railroad-diagram/"><img src="/images/posts/20130918/organicsymbol.png" alt="OrganicSymbol"></a>
<figcaption>OrganicSymbol Railroad Diagram</figcaption>
</figure>
<p>An <code>OrganicSymbol</code> represents one of the chemical elements making up the "Organic Subset" that is widely used in organic chemistry: B; C; N; O; P; S; F; Cl; Br; and I. The branching notation used here is useful both to compact the graphical presentation and to aid in developing <a href="/blog/2013/09/11/smidge-a-lightweight-smiles-validator-and-parser-written-in-javascript/">automated parsers</a>.</p>
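<p>One way to read the branching is as a regular expression, where each optional branch becomes a <code>?</code>. The sketch below assumes that <code>B</code> may be followed by <code>r</code> and <code>C</code> by <code>l</code>, consistent with the ten symbols listed above; the names <code>ORGANIC_SYMBOL</code> and <code>isOrganicSymbol</code> are illustrative only.</p>

```javascript
// Sketch: the OrganicSymbol alternatives expressed as a regular expression.
// Both names below are hypothetical and used only for this illustration.
var ORGANIC_SYMBOL = /^(?:Br?|Cl?|[NOPSFI])$/;

function isOrganicSymbol(symbol) {
  return ORGANIC_SYMBOL.test(symbol);
}

['B', 'Br', 'C', 'Cl', 'N', 'O', 'P', 'S', 'F', 'I', 'Si', 'c'].forEach(function (s) {
  console.log(s + ' -> ' + isOrganicSymbol(s));
});
```

<p>The final two inputs are rejected: <code>Si</code> is not in the subset, and lowercase <code>c</code> belongs to <code>AromaticSymbol</code> rather than <code>OrganicSymbol</code>.</p>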
<p>The <a href="/cheminformatics/smiles/railroad-diagram/">full diagram</a> defines every element of the SMILES language in terms of similar Railroad Diagrams.</p>
<h4 id="testing">Testing</h4>
<p>Specific SMILES examples can be parsed and validated using <a href="/smidge/">Smidge</a>. This tool is based on the same underlying grammar used to generate the SMILES Railroad Diagram described here. What holds for the diagrams should hold for the parser, and vice versa. As the SMILES grammar is refined, both <a href="/smidge/">Smidge</a> and the <a href="/cheminformatics/smiles/railroad-diagram/">SMILES Railroad Diagram</a> will immediately reflect those changes.</p>
<h4 id="conclusions">Conclusions</h4>
<p>Railroad Diagrams are extremely useful, both for beginners as a learning tool, and as a communication medium for experts. This article hasn't described how the SMILES Railroad Diagrams were generated, nor the development of the required grammar. Future articles will address these points.</p>
Smidge: A Lightweight SMILES Validator and Parser Written in JavaScripttag:metamolecular.com,2013-09-11:/blog/2013/09/11/smidge-a-lightweight-smiles-validator-and-parser-written-in-javascript2013-09-11T12:00:00+00:00
2013-09-11T12:00:00+00:00
<p>The previous article in this series described <a href="/blog/2013/09/10/parsing-smiles-from-scratch-in-javascript/">some advantages of starting with a formal grammar when writing a SMILES parser</a>. Before diving too far into technical details, have a look at <a href="/smidge/">Smidge</a>, a new browser-based tool that validates and parses SMILES strings into an array of JSON token objects.
<!-- more --></p>
<figure>
<a href="/smidge/"><img src="/images/cheminformatics/smidge-screenshot.png"></a>
<figcaption><a href="/smidge/">Smidge</a>, a SMILES parser written in JavaScript</figcaption>
</figure>
<p>Given an invalid SMILES string, Smidge responds with a message indicating both the position of the error and the valid characters at that position.</p>
<p>If the entered string represents valid SMILES, Smidge responds with a graphical interpretation of all tokens, together with a JSON array containing these tokens.</p>
<p>The core Smidge parser was auto-generated from an enhanced grammar notation capable of combining syntax definitions with processing instructions. Future articles in this series will describe the creation of Smidge in detail.</p>