Building an eCommerce website for small molecule products isn't easy. Most of the issues can be traced to chemical information management, a job for which off-the-shelf eCommerce systems are ill-suited. This article describes the problem in detail, and a future article will describe a practical solution.
The most pressing need of a chemical eCommerce system is structure search. Structure search allows a customer to find products that contain a structural motif of interest. The results of a structure search for commercially available materials can markedly influence the scope of a customer project. For example, many projects require ready access to families of starting materials sharing a common substructure. Identifying a source of related building blocks can motivate many repeat purchases.
Structure search presents a pivotal opportunity to match customer with product. When implemented properly, structure search can become a business-critical feature.
Before structure search can even be implemented, each product must be associated with a machine-readable structure. A machine-readable structure represents a molecule in a format suitable for manipulation by software. The two most useful formats are SMILES and Molfile. Both formats have been standardized over the course of many years, and are well supported by a wide range of software.
Given a collection of products, each of which is linked to a machine-readable structure, a chemical eCommerce system can implement structure search. There are two basic approaches that differ markedly in their performance characteristics. Both approaches attempt an atom-by-atom match between a query structure drawn by the customer and a target catalog structure. However, one approach precompiles an index to avoid performing the search on every product structure. Indexes are most suitable for large collections of more than a few thousand products.
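The two approaches can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular product works: the `Molecule` class is a hypothetical toy graph that ignores hydrogens, bond orders, and aromaticity, and the "index" is a simple per-product element count standing in for the structural fingerprints that real cheminformatics toolkits typically precompute. All class and function names are invented for this example.

```python
from collections import Counter


class Molecule:
    """Toy hydrogen-suppressed graph: element symbols plus bonded index pairs."""

    def __init__(self, atoms, bonds):
        self.atoms = list(atoms)
        self.neighbors = [set() for _ in self.atoms]
        for i, j in bonds:
            self.neighbors[i].add(j)
            self.neighbors[j].add(i)


def is_substructure(query, target):
    """Approach 1: atom-by-atom backtracking match of query onto target."""
    mapping = {}  # query atom index -> target atom index

    def extend(q):
        if q == len(query.atoms):
            return True  # every query atom has been placed
        for t in range(len(target.atoms)):
            if t in mapping.values() or query.atoms[q] != target.atoms[t]:
                continue
            # Each already-mapped neighbor of q must be bonded to t in the target.
            if all(mapping[n] in target.neighbors[t]
                   for n in query.neighbors[q] if n in mapping):
                mapping[q] = t
                if extend(q + 1):
                    return True
                del mapping[q]  # dead end: undo and try the next target atom
        return False

    return extend(0)


class Catalog:
    """Approach 2: precompute a cheap screen so most products skip the full match."""

    def __init__(self, products):
        self.products = products  # product name -> Molecule
        # The "index": element counts per product, built once up front.
        self.index = {name: Counter(m.atoms) for name, m in products.items()}

    def search(self, query):
        needed = Counter(query.atoms)
        hits = []
        for name, mol in self.products.items():
            counts = self.index[name]
            # Screen: a target lacking the query's elements cannot match.
            if any(counts[e] < n for e, n in needed.items()):
                continue
            if is_substructure(query, mol):
                hits.append(name)
        return hits


ethanol = Molecule("CCO", [(0, 1), (1, 2)])
acetic_acid = Molecule("CCOO", [(0, 1), (1, 2), (1, 3)])
benzene = Molecule("CCCCCC", [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)])

catalog = Catalog({"ethanol": ethanol, "acetic acid": acetic_acid, "benzene": benzene})
query = Molecule("CCO", [(0, 1), (1, 2)])  # a C-C-O fragment
hits = catalog.search(query)
```

Here benzene is rejected by the screen alone, so the more expensive atom-by-atom match runs only on the two oxygen-containing products. The screen can only rule products out; every survivor still needs the full match.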
More than Just Structure Search
It turns out that a chemical eCommerce system needs machine-readable structures for much more than just structure search. Examples include:
- 2D Structure. Customers expect to see a structure image when browsing product selections.
- Systematic Name. An IUPAC name on a product summary page further confirms identity and can promote search engine discoverability.
- Chemical Properties. Customers often use product catalogs as a source of information about products. Properties such as molecular weight and molecular formula are especially handy.
- Chemical Identifiers. Cross-checking a catalog entry with another database gives added confidence that the right product is being purchased. Both CAS Numbers and InChI are helpful in this regard.
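As a small illustration of deriving a property rather than typing it in, the sketch below computes a molecular weight from a molecular formula. It assumes the formula has already been obtained from the machine-readable structure by a cheminformatics toolkit, handles only simple formulas without parentheses, charges, or isotopes, and uses an abridged table of atomic weights. The function names are hypothetical.

```python
import re

# Abridged standard atomic weights in g/mol; extend as the catalog requires.
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
    "S": 32.06, "Cl": 35.45,
}


def parse_formula(formula):
    """Split a simple formula such as 'C2H6O' into {'C': 2, 'H': 6, 'O': 1}."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = {}
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts


def molecular_weight(formula):
    """Sum atomic weights; raises KeyError for elements missing from the table."""
    return sum(ATOMIC_WEIGHTS[s] * n for s, n in parse_formula(formula).items())


ethanol_mw = round(molecular_weight("C2H6O"), 3)  # 46.069
```

Because the value is computed, it stays consistent with the formula shown on the same product page.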
The unmistakable need for machine-readable structures raises the question of how these structures will be generated. Companies whose products number in the dozens might get by with manual curation. In other words, each 2D structure, chemical property, name, and identifier is assigned by a person. The data may be tracked in a spreadsheet or other ad hoc database. Publishing the data might involve copying it from one location (e.g., a spreadsheet) to another (e.g., a data entry form on the company website).
Unfortunately, manual curation fails for catalogs of even a few hundred entries. Human error inevitably lets inconsistencies and bad data seep into the catalog. These errors can be very difficult to spot, and they are vexing to customers and company management alike.
Accuracy aside, manual curation of a product catalog can be harmful to a company in other ways. The most obvious is scale: limiting product selection to that which can be managed by hand prevents a company from growing sales through product line expansion. Manual processes are also slow, which can delay the effective release date of new products. Finally, the need to convert catalog data by hand can harm marketing efforts, which require wide publication of product data in a variety of formats.
Given the right skills and tools, manual labor can be almost entirely eliminated from a chemical eCommerce system. Machine-readable structures can be generated from either CAS numbers or IUPAC names alone. From these machine-readable structures, an eCommerce system with structure search can be built. Product summary pages complete with 2D structure, IUPAC name, CAS number, and molecular weight/formula can be generated without human intervention. Moreover, catalogs in a wide variety of formats can be published to ensure the largest possible audience. The size of the catalog isn't constrained by data management issues. New products can be introduced to customers as soon as they become available. And customers can use a variety of search methods to find what they're looking for.
Unfortunately, high up-front and maintenance costs put automation out of reach for many companies.
Many eCommerce websites are built through minor customization of an existing content management system (CMS). A handful of off-the-shelf components are bundled together with the CMS, yielding the finished site. However, chemistry applications require components that don't usually exist. An earlier article explained this problem in detail. To summarize, a structure-searchable website requires the coordination of four key pieces of software within a content management system.
Compounding this problem is the highly specialized nature of chemical data processing, or cheminformatics, in which simple, reasonable-sounding solutions often fail. For example, on more than one occasion an experienced developer who was a cheminformatics novice has suggested searching and indexing SMILES like ordinary text as a way to implement substructure search. Many cheminformatics novices don't understand the business case for implementing structure search in the first place, and so resort to guesswork. Even worse, many problems won't surface until a system has been in use for some time. For example, products available as different salt forms need to be handled in a consistent way from the beginning.
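A short sketch shows why treating SMILES as ordinary text breaks down. The same molecule can be written as several different, equally valid SMILES strings, and a substring can match characters that mean something else entirely. The function name below is hypothetical, and the code deliberately implements the flawed approach in order to expose it.

```python
# Three valid SMILES strings that all describe the same molecule, ethanol.
ETHANOL_SMILES = ["CCO", "OCC", "C(O)C"]


def naive_text_search(query, smiles):
    """The flawed idea: treat SMILES as plain text and look for a substring."""
    return query in smiles


# False negatives: only one of the three spellings of ethanol is found.
results = [naive_text_search("CCO", s) for s in ETHANOL_SMILES]
# results == [True, False, False]

# False positive: a search for carbon ("C") hits "Cl", the SMILES for
# hydrogen chloride, which contains no carbon at all.
carbon_in_hcl = naive_text_search("C", "Cl")
# carbon_in_hcl == True
```

Avoiding both kinds of error requires parsing each string into a molecular graph and matching atoms and bonds, which is the job of a cheminformatics toolkit rather than a text index.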
Perhaps the most vexing problem is tracking down and evaluating software options. Good cheminformatics software tends to carry a hefty price tag. Free alternatives do exist, but filtering the good from the bad takes experience, something that cheminformatics novices lack by definition. Resources that compile information about various cheminformatics packages are often scattered, out of date, or incomplete. Just fully understanding the software that must be assembled can consume valuable time and money.
Taken together, these difficulties can result in an expensive proposition for companies trying to roll their own chemical eCommerce solutions. First, a team of developers willing and able to extend its standard CMS with nonstandard cheminformatics components - usually on a tight budget - needs to be hired. Next, this team needs to sort through a broad but often shallow body of documentation in search of the appropriate cheminformatics components. Finally, the required components need to somehow be integrated within the CMS to yield a chemical eCommerce site. Even if everything comes together on time and within budget, ancillary data management tasks can inflate already stretched budgets. Most importantly, all of this work is typically performed by cheminformatics novices, resulting in inevitable cost overruns and delays.
In the face of what may seem like an insurmountable technical challenge, many companies turn to point solutions. Given the importance of structure search, for example, one approach would be to simply solve that one problem. As a simple and inexpensive approach, Metamolecular has introduced ChemServer. This self-contained module is capable of performing exact-structure and substructure matching within either PHP or Ruby environments. Catalogs of up to a few thousand structures in size can be searched without an index step or database modifications.
This solution works well if structure search is the only cheminformatics problem to be solved, but most companies will face many, if not all, of the requirements described above. For example, images and chemical properties still need to be assigned to products. The site's database still needs to be populated with machine-readable chemical structures for each product. Moreover, ancillary data management (e.g., creation of PDF or SD file catalogs) will remain an expensive and error-prone process that can stunt sales.
Point solutions to the chemical eCommerce problem may appear attractive due to their low initial cost. But integration and the burden of manual data management can spawn new problems. What's needed is a fundamentally new approach to the chemical eCommerce problem. The next article in this series will introduce one.