This is Signals, a blog about chemistry and software by Metamolecular, LLC.

How to Balance Any Chemical Equation

February 20th, 2013

The art of balancing chemical equations is taught very early in chemistry degree programs, and understandably so. Correctly balancing a chemical equation is the first step in a great number of chemistry problems including reaction setup, percentage yield determination, and equilibrium constant calculations, among others.

Although the most popular method, “balancing by inspection”, works in simple cases, a large number of exceptions and traps makes this method frustrating to learn and difficult to apply to even moderately complex equations. For practical purposes, many equations simply can’t be balanced by inspection.

What if there were a systematic method for balancing any chemical equation, regardless of complexity?

A paper published by Lawrence Thorne in 2010 describes such a method. This matrix-based approach balances a large number of equations that can’t be balanced by inspection, or even other matrix approaches. An introduction to using this method is given in the video presentation (slidedeck here).

Although operationally simple, Thorne’s method does require a lot of arithmetic, which can become tedious. Matrix algebra can be done on many scientific calculators or spreadsheets, but setup requires technical skill and data entry is lengthy at best.
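To give a feel for how the arithmetic can be handed to a computer, here is a minimal Python sketch. It is not a reproduction of Thorne's procedure; it uses the generic idea behind matrix-based balancing, which is to build an element-by-species composition matrix and find a vector in its null space. The function names (`parse_formula`, `null_space_vector`, `balance`) are illustrative, the parser assumes simple formulas without parentheses or charges, and only the first null-space vector is used, so equations with more than one independent solution are not handled.

```python
import re
from fractions import Fraction
from functools import reduce
from math import gcd


def parse_formula(formula):
    """Return {element: count} for a simple formula such as 'C3H8'."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)
    return counts


def null_space_vector(matrix):
    """Return one non-zero rational x with matrix @ x = 0, or None."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_cols = len(rows[0])
    pivot_cols = []
    r = 0
    # Gauss-Jordan elimination to reduced row echelon form.
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [v / rows[r][c] for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
        if r == len(rows):
            break
    free_cols = [c for c in range(n_cols) if c not in pivot_cols]
    if not free_cols:
        return None
    # Set the first free variable to 1 and solve for the pivot variables.
    free = free_cols[0]
    solution = [Fraction(0)] * n_cols
    solution[free] = Fraction(1)
    for row_index, c in enumerate(pivot_cols):
        solution[c] = -rows[row_index][free]
    return solution


def balance(reactants, products):
    """Return smallest whole-number coefficients, reactants first."""
    species = reactants + products
    compositions = [parse_formula(s) for s in species]
    elements = sorted({e for comp in compositions for e in comp})
    # One row per element; product columns are negated so that a
    # balanced equation corresponds to matrix @ coefficients = 0.
    matrix = [
        [(1 if j < len(reactants) else -1) * compositions[j].get(e, 0)
         for j in range(len(species))]
        for e in elements
    ]
    solution = null_space_vector(matrix)
    if solution is None:
        raise ValueError("no non-trivial solution")
    # Clear denominators, then divide out any common factor.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (x.denominator for x in solution), 1)
    integers = [int(x * scale) for x in solution]
    divisor = reduce(gcd, integers)
    integers = [v // divisor for v in integers]
    if any(v <= 0 for v in integers):
        raise ValueError("no all-positive solution found")
    return integers


print(balance(["C3H8", "O2"], ["CO2", "H2O"]))  # [1, 5, 3, 4]
```

Exact fractions are used instead of floating point so that the final coefficients come out as clean integers.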

For those so inclined, ReactionMate is an iOS app that offers a convenient user interface for Thorne’s method.

ChemWriter Keyboard Shortcuts for Faster Structure Drawing

February 6th, 2013

ChemWriter is a chemical structure editor that can be embedded into web pages. One of our main goals was to make chemical structure entry as fast as it can be. Toward this end, a number of keyboard shortcuts were built in.

Keyboard shortcuts are accessed by hovering the mouse cursor over an existing atom and pressing a key, either with or without the shift key. All ChemWriter keyboard shortcuts are listed below:

  • a: Benzene (aromatic) ring.
  • b: Boron atom.
  • c: Carbon atom.
  • f: Fluorine atom.
  • h: Hydrogen atom.
  • i: Iodine atom.
  • l: Chlorine atom.
  • n: Nitrogen atom.
  • o: Oxygen atom.
  • p: Phosphorus atom.
  • r: Bromine atom.
  • s: Sulfur atom.
  • t: Tin atom.
  • z: Silicon atom.
  • delete/backspace: Delete current atom or selection.

Product Preview: ChemWriter App for iPad

January 22nd, 2013

Within three short years, tablet computers have gone from clumsy gadgets few cared about or bought to workhorses rapidly replacing laptops and desktops. Key to this transformation has been a booming app economy. Although many software categories such as entertainment, business, and time management have been inundated with new apps, niche areas such as chemistry research and advanced education have experienced a more muted uptake.

Chemical structure editors are essential tools in modern organic chemistry and related fields. This article offers a preview of a new chemical structure editor app for iPad® devices now under development at Metamolecular.

About ChemWriter

ChemWriter® is a chemical structure editor originally built for use on Web pages. As such, it’s a tool for software developers. ChemWriter isn’t currently something that most chemists would have a need to buy for themselves. However, given numerous questions I’ve fielded on this very topic, the demand for alternatives in this space is quite clear.

Over the last five years, ChemWriter has evolved significantly, mostly driven by excellent customer feedback. A lot has been learned about what works and what doesn’t for browser-based chemical structure editors.

Now it’s time to apply that experience to building a great chemical structure editor app for iPad devices.

Fast and Fluid

Users of tablet computers expect a different experience than users of desktops, and it’s important to deliver on those expectations. Although tablet computing resources like memory and processor power are quite constrained compared to desktops, it’s critical that these limitations never show themselves. From the first moment an app is launched, every user interaction with it needs to be fast, fluid, and intuitive.

For this reason, the ChemWriter app is being written from scratch around the touch screen. Details of the development process may be the subject of future posts. Suffice it to say that the ChemWriter app is being written with a keen respect for the iPad’s strengths and limitations.

Features and Pricing

A two-tiered pricing model is currently envisioned in which basic drawing functionality would be available in a free app. An in-app purchase would then enable premium features.

Free Features
  • Draw molecules with an emphasis on fast, efficient structure creation.
  • Draw non-molecule shapes including straight/curved reaction arrows, boxes, geometrical shapes, and curves.
  • Search select databases by exact structure and substructure.
  • Obtain structure from name.
Premium Features
  • Save as Scalable Vector Graphics (SVG) files.
  • Save as high-resolution PNG images.
  • Save as ChemDraw (*.cdx) files.
  • Load ChemDraw files.
  • Calculations including molecular weight, InChI, and SMILES.
  • iCloud and Dropbox synchronization.

Initial releases would focus on enabling the fast and efficient drawing of chemical structures through a touch interface. A major problem to be addressed is how to enable precision structure drawing using a non-precision implement (the finger).

Also available in the first release would be the ability to read and write files in a variety of formats. Chemistry formats including molfiles and ChemDraw® files would both be supported, as would image formats including SVG and PNG. The app would support seamlessly moving these file representations into and out of popular cloud-based storage utilities including iCloud® and Dropbox®.

Subsequent releases would focus on calculations and derivation of other kinds of information from chemical structures.

Conclusions

Tools for drawing and using chemical structures are essential in organic chemistry and related fields. A new app based on ChemWriter is in development that aims to bring high-quality structure drawing and analysis to iPad devices.

Optimizing Organic Reactions with Design of Experiments and Principal Component Analysis

January 16th, 2013

Few topics in organic chemistry are more important than reaction optimization. The availability of an efficient reaction can add millions of dollars to the bottom line of a company. Likewise, access to a practical reaction often opens up entirely new areas of scientific study, as was the case with Suzuki coupling.

A recent paper by CatScI combines two powerful, though not widely known, techniques for reaction optimization: Design of Experiments (DoE) and Principal Component Analysis (PCA).

The Reaction Optimization Problem

A reaction can be thought of as a system accepting a number of inputs (parameters) and providing one or more outputs. Example inputs might include: temperature; solvent; pH; catalyst; and time. Example outputs might include: yield; selectivity; purity; and cost. The goal of reaction optimization is to select the best inputs to achieve a given output.

It’s all too easy to forget that even the simplest reactions can accept multiple inputs. Yet limitations of time and money make the exhaustive exploration of every input impractical.

Given very real practical constraints, what’s the best way to optimize multiple reaction inputs to give the desired outputs?

Why Single Variable Optimization Can Fail

A popular way to deal with the multi-variable nature of reaction optimization is the One Variable at a Time (OVAT) approach. Here, all experimental inputs except one, let’s say time, are kept constant. An output, let’s say yield, is then recorded at multiple time values. In this way, the “optimal” reaction time is revealed. This reaction time is then kept constant and another variable is chosen, for example temperature. The process continues until all inputs have been probed and a set of optimal inputs has been determined.

Many reactions have been optimized using the OVAT approach, so what’s the problem?

Understanding the problem with OVAT can be made easier by visualizing reaction space. For the highly simplified example above, reaction space can be represented as a plot in which time and temperature are axes. The color of a point represents yield at a given time and temperature, with red representing higher yield and black representing lower yield.

The study might begin at Point “S” with an initial discovery or literature procedure. Using a systematic approach, two temperatures on either side of “S” are probed. Likewise, yields at two times on either side of “S” are determined.

All modifications to the original procedure resulted in equal or lower yields. The conclusion would likely be that the original conditions were optimal - case closed.

And this is where the problem lies. Given infinite resources, we might perform a more comprehensive study and discover the following “response surface”:

The OVAT study identified a local maximum at Point “S”. It failed to identify a much better combination of time and temperature at Point “M”.
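The trap can be reproduced numerically. The Python sketch below uses a made-up response surface with two peaks: a modest one near Point “S” and a taller one near Point “M”. The yield function, the peak positions, and the grid of times and temperatures are all invented for illustration and do not come from the paper. An OVAT search started at “S” stays there, while an exhaustive scan of the same grid finds “M”.

```python
from math import exp


def yield_percent(time_h, temp_c):
    """Hypothetical response surface: a local peak plus a taller global peak."""
    local_peak = 60 * exp(-((time_h - 2) ** 2 / 2 + (temp_c - 40) ** 2 / 200))
    global_peak = 95 * exp(-((time_h - 8) ** 2 / 2 + (temp_c - 90) ** 2 / 200))
    return local_peak + global_peak


TIMES = list(range(0, 11))       # hours, hypothetical grid
TEMPS = list(range(0, 101, 10))  # degrees C, hypothetical grid


def ovat(start_time, start_temp):
    """Optimize time at fixed temperature, then temperature at fixed time,
    and repeat until neither setting changes."""
    time_h, temp_c = start_time, start_temp
    while True:
        best_time = max(TIMES, key=lambda t: yield_percent(t, temp_c))
        best_temp = max(TEMPS, key=lambda T: yield_percent(best_time, T))
        if (best_time, best_temp) == (time_h, temp_c):
            return time_h, temp_c
        time_h, temp_c = best_time, best_temp


def exhaustive():
    """Evaluate every grid point; only affordable because this is a toy."""
    grid = ((t, T) for t in TIMES for T in TEMPS)
    return max(grid, key=lambda point: yield_percent(*point))


point_s = ovat(2, 40)
point_m = exhaustive()
print(point_s, round(yield_percent(*point_s)))  # (2, 40) 60
print(point_m, round(yield_percent(*point_m)))  # (8, 90) 95
```

Along either axis through “S” the taller peak contributes almost nothing, so every single-variable change looks like a step backwards. Only a move in both variables at once reveals “M”.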

Of course, resources are always limited. So other than blind luck, how can the risk of getting stuck on a local maximum be reduced?

Billions and Billions

The CatScI paper describes an interesting thought experiment. Given a generic Suzuki reaction, how many runs would be needed to fully map reaction space, and therefore identify an absolute yield maximum?

Before calculating the number of runs, it’s important to distinguish between two fundamentally different types of reaction parameters (the term used in the paper for inputs). These are “discrete” parameters and “continuous” parameters.

A continuous parameter can take any value within an infinitely divisible range. Continuous parameters in the Suzuki reaction might include: temperature; time; and palladium precatalyst loading.

In contrast, a discrete parameter can only be selected from a list of values. Discrete parameters in the Suzuki reaction might include: ligand identity; base identity; palladium source; and order of addition.

To calculate the number of runs needed to map the Suzuki reaction space, let’s only choose two values for continuous parameters - one high value and one low value. Note that even this simplification leaves vast regions unmapped.

For discrete parameters, each possible value will need to be present in the set of runs.

The combinatorial increase in the number of runs can be calculated easily:

Although after 51.2 million runs we’d have a good idea of the gross topology of the reaction space, this knowledge would come with an impractical price tag. Furthermore, we’d be missing a lot of information. For example, just increasing the number of values tested for each continuous parameter from two to four gives 6.6 billion runs. And this number doesn’t take into account duplicate runs needed to reduce error.
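The arithmetic behind numbers like these is just a product: the number of values tested for each continuous parameter, raised to the number of continuous parameters, multiplied by the number of options for every discrete parameter. The Python sketch below uses a made-up toy reaction (three continuous parameters, ten ligands, five solvents) rather than the paper's actual counts, and the helper name `full_factorial_runs` is illustrative.

```python
from math import prod


def full_factorial_runs(levels_per_continuous, n_continuous, discrete_option_counts):
    """Runs needed to test every combination of settings exactly once."""
    return levels_per_continuous ** n_continuous * prod(discrete_option_counts)


# Hypothetical toy reaction: 3 continuous parameters, 10 ligands, 5 solvents.
toy_discrete = [10, 5]
print(full_factorial_runs(2, 3, toy_discrete))  # 400
print(full_factorial_runs(4, 3, toy_discrete))  # 3200

# Going from two to four values per continuous parameter multiplies the
# total by 2 ** n_continuous, whatever the discrete parameters are.
# A factor of 2 ** 7 = 128 would take 51.2 million to about 6.6 billion:
print(51_200_000 * 2 ** 7)  # 6553600000
```

The last line suggests that the two totals quoted above are consistent with seven continuous parameters. That count is inferred here from the ratio of the totals, not taken from the paper.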

Referring back to the combinatorial diagram above, notice how the number of ligands and solvents, and the fact that ligand and solvent identities are both discrete parameters, greatly contributed to the increase of required runs.

Reducing Discrete Parameters (and Reaction Runs) Through Principal Component Analysis

To arrive at a practical number of runs needed to map reaction space, the CatScI authors turned to Principal Component Analysis (PCA). At its simplest level, PCA is a mathematical technique that can be used to convert a discrete parameter into one or more continuous parameters.

In an early application of PCA to chemistry, Rolf Carlson’s group attempted to solve the problem of choosing a reaction solvent. Noting the inherent problems in working with loosely-defined concepts such as “polarity”, Carlson’s group wanted a better way to classify solvents.

Starting from experimentally-derived physical properties, an approach was developed that was capable of placing each of 63 common solvents into a two-dimensional “solvent space”. In principle, even more axes (dimensions) could be added to increase precision.

This work made it possible to refer to organic solvents, not by name or ill-defined concepts like polarity, but by their coordinates on a grid constructed mathematically from measured physical properties.

A more recent application of this technique using computationally-derived properties led to the development of a “ligand space” for 348 known monodentate phosphine ligands by Fey and coworkers.

Although the underlying math is quite complicated, the pattern is simple enough. Given any set of discrete parameters, define a group of measurements for each item. The number of measurements can be as large as necessary, and can even be computationally-derived. Then, using PCA, derive a continuous, multidimensional space in which each item can be assigned a coordinate.
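The pattern can be sketched in a few dozen lines of plain Python. The descriptor values below are placeholders invented for illustration, not measured solvent or ligand properties, and the item names are hypothetical. To avoid a dependency on a numerical library, the principal axes are found by power iteration with deflation, which stands in for the eigen-decomposition a statistics package would normally perform.

```python
from math import sqrt


def standardize(rows):
    """Center each descriptor column and scale it to unit variance."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stdevs = [sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
              for col, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in rows]


def covariance(rows):
    """Covariance matrix of already-centered data."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenvector(matrix, iterations=500):
    """Power iteration: return the dominant eigenvalue and eigenvector."""
    d = len(matrix)
    vector = [1.0] * d
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(v * v for v in product))
        vector = [v / norm for v in product]
    eigenvalue = sum(vector[i] * sum(matrix[i][j] * vector[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, vector


def pca(rows, n_components=2):
    """Return the principal axes and each item's coordinates on them."""
    data = standardize(rows)
    matrix = covariance(data)
    d = len(matrix)
    components = []
    for _ in range(n_components):
        eigenvalue, vector = top_eigenvector(matrix)
        components.append(vector)
        # Deflation: remove the axis just found so the next pass finds the next one.
        matrix = [[matrix[i][j] - eigenvalue * vector[i] * vector[j]
                   for j in range(d)] for i in range(d)]
    scores = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
              for row in data]
    return components, scores


# Hypothetical descriptors: three made-up measurements for five made-up items.
names = ["item_A", "item_B", "item_C", "item_D", "item_E"]
descriptors = [
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 0.4],
    [3.0, 6.2, 0.9],
    [4.0, 7.9, 0.3],
    [5.0, 10.1, 0.8],
]

components, scores = pca(descriptors)
for name, (pc1, pc2) in zip(names, scores):
    print(f"{name}: ({pc1:.2f}, {pc2:.2f})")
```

Each item ends up with a pair of coordinates, which is what allows a named solvent or ligand to be treated as a point in a continuous space.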

PCA itself is not without limitations. For example, the effectiveness of the approach greatly depends on the set of descriptors assigned to each item in a discrete parameter set. Moreover, the most appropriate descriptor set can vary from application to application.

Bearing these and other PCA limitations in mind, the big win was being able to convert the discrete parameters of solvent and ligand into continuous parameters. This conversion made feasible a DoE study that would have otherwise been impractical.

DoE and PCA In Practice

To test the approach, the CatScI group optimized a Buchwald-Hartwig sulfamidation reaction with the goal of identifying a cost-effective ligand without intellectual property (IP) restrictions:

A variant of DoE was used in which an initial broad and shallow iteration revealed reaction space “hot spots” that were then probed with progressively narrow focus in subsequent iterations:

  1. With the goal of finding an alternative ligand, 35 runs were made in which nine ligands were tested with nine solvents and two palladium precatalysts.
  2. Twelve runs in the hot spot from Iteration 1 then identified an additional four ligands giving high or complete conversion.
  3. Twelve more runs identified four more ligands.
  4. Analysis of the ligands in Iteration 3 identified one with no intellectual property restrictions and an acceptable cost.
  5. Nineteen runs were made with the aim of optimizing various stoichiometries using the ligand found in Iteration 4.

Helpful Resources

Although an excellent illustration of the utility of combining PCA and DoE, the CatScI paper does little to answer specific implementation questions. What software, if any, was used for the DoE? How exactly were ligands chosen using the Fey phosphine ligand space? What special statistical analysis, if any, was used to progress from iteration to iteration?

These and other questions are not answered. This is not to fault the authors, because the paper is clearly marked as a “Concept Article”. Still, even at this level of detail the study offers a compelling argument with plenty of jumping off points.

But the question remains: how can a chemist interested in trying either DoE, PCA, or both together in the lab get started?

A number of resources are available, some of which are summarized below:

Introductions to DoE with a Chemistry Emphasis
Introduction to Principal Component Analysis
Interactive Tutorial
  • Stat-Ease [Step-by-step interactive tutorial using DoE software]
Blog Posts on DoE Applied to Chemistry
Some Other Applications of DoE to Organic Synthesis
Software with Free Trials

Conclusions

Design of Experiments (DoE) offers compelling advantages over single-variable optimization for organic synthesis. Principal Component Analysis (PCA) can greatly reduce the number of runs required to map reaction space during a DoE study by converting discrete parameters such as solvent and ligand into continuous ones. Used together, the two techniques make a broad survey of reaction space practical where it otherwise would not be.

Interactive, Browser-Based 3D Molecule Visualizations with GLmol and WebGL

January 10th, 2013

Although many tools for 3D visualization of small molecules and biopolymers have been released as desktop applications, relatively few programs are available for use in Web applications. GLmol is one such tool that takes advantage of fast in-browser 3D graphics capabilities now available through WebGL. This article introduces GLmol by discussing its main features, and provides fully-functional examples of deployment and scripting.

Application Demo

The GLmol download package contains a sample page illustrating the most important features:

  • Load PDB files from a local directory or by URL
  • View proteins using a variety of standard representations, including: thick ribbons; thin ribbons; strands; and cylinders/plates.
  • Zoom, pan, translate, and slab
  • Show additional information, including crystal packing, unit cell, and sidechains.
  • A variety of coloring options.
  • Screenshot capture

Deployment Demo: Embedding a Protein Structure

GLmol can read PDB files loaded via asynchronous HTTP calls. This use is illustrated in the protein embedding demo:

Note that same origin policies may prevent direct loading of files from sources other than the original host in some situations.

Scripting Demo: Spinning Molecule

GLmol is written entirely in JavaScript. As such, the software lends itself to a variety of interesting scripting techniques by default. The spinning molecule demo shows how to combine GLmol with the requestAnimationFrame API to animate a scene:

WebGL Support

WebGL is now supported on all modern browsers except Internet Explorer. Microsoft has so far not publicly indicated whether WebGL would be supported in IE 11.

Other Software

Other pure JavaScript 3D molecule display components have been described. One of the authors of Jmol has developed a Java- and WebGL-free version of the software called JSmol. Jolecule is a JavaScript application for 3D molecular visualization that also requires no WebGL. iChemLabs offers a set of 3D Structure Canvases based on WebGL.

Those wanting a more detailed look at how to use WebGL in the context of a molecule display element may find a recent tutorial helpful.

Conclusions

GLmol offers many possibilities for both interactive and scripted 3D molecule visualizations. Avoiding the Java Plugin and its accompanying complications, GLmol offers an attractive alternative to the popular Jmol applet. GLmol’s liberal open source license (MIT or GPL3) makes it an appealing starting point for further development.

Opening SD Files from Dropbox with StructureMate

January 2nd, 2013

StructureMate is a new iPhone/iPad app for browsing SD files. One of the first questions new StructureMate users will have is how to open their own SD files. Although email is one way to move files between devices, Dropbox offers some important advantages.

Useful though it may be, the Dropbox app itself can only read a limited selection of file types. How can you open SD files in StructureMate using the Dropbox service?

Step 1. Find the SD File in the Dropbox App

Dropbox lets you add files into a familiar folder hierarchy. From your computer, add an SD file to a convenient location in your Dropbox folder. Then browse to that folder in the Dropbox app:

Step 2. Select the SD File in the Dropbox App

Tap on the entry for the SD file you’d like to open. Dropbox will attempt to read the file, and will then tell you it’s unable to do so with a message like “Unable to view file”.

Although Dropbox itself can’t read SD files, it can delegate the opening of these files to an app that can. Tap on the button in the lower-right of your screen:

Step 3. Open the SD file in StructureMate

You’ll be given the option to open the SD file with other apps that can read SD files. If you’ve installed StructureMate, it will be one of those options. Simply tap the StructureMate icon and the SD file will load.

Research ideas often come when you’re not in the office or lab. Using StructureMate and Dropbox together, it’s possible to always have access to your chemistry data, regardless of which device you happen to have at hand.
