CTS-Lite

UC Davis Fiehn Lab

Documentation

CTS-Lite is a lightweight Chemical Translation Service, and the successor to the Fiehn Lab's original CTS. It allows users to easily match InChIs, InChIKeys, SMILES, Molecular Formulas, and PubChem CIDs against a curated subset of the PubChem database containing 10.6 million compounds.

Using CTS-Lite

To use CTS-Lite, simply enter your queries into the input box on the main page. You can separate entries using spaces, tabs, or newlines. When you click Match, the results will be displayed in the results section. Results can be downloaded in JSON and CSV formats using the buttons provided.

Note: Queries are limited to 100,000 entries per request. With ClassyFire enabled, the limit is reduced to 100 entries (see Chemical Classification). Very large queries may take some time to process. Please be patient while the server handles your request.

REST API

Request Formats

To make queries using the REST API, use the following formats:

JSON (Standard):

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"queries":"query1 query2 ..."}' \
  "cts-lite.metabolomics.us/match"

CSV:

curl -X POST \
  -H "Content-Type: application/json" \
  -H "Accept: text/csv" \
  -d '{"queries":"query1 query2 ..."}' \
  "cts-lite.metabolomics.us/match"
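The same requests can be prepared from Python. The following is a minimal sketch using only the standard library; the helper name build_match_request is hypothetical, and the https:// scheme is an assumption, since the curl examples above give only the host and path.

```python
import json
import urllib.request

# Assumption: the service is reachable over HTTPS (the docs show only host and path).
MATCH_URL = "https://cts-lite.metabolomics.us/match"


def build_match_request(queries, as_csv=False):
    """Build (but do not send) a POST request for the /match endpoint."""
    # Queries travel as one whitespace-separated string, as in the curl examples.
    body = json.dumps({"queries": " ".join(queries)}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if as_csv:
        # Ask for CSV instead of the default JSON response.
        headers["Accept"] = "text/csv"
    return urllib.request.Request(MATCH_URL, data=body, headers=headers, method="POST")


request = build_match_request(["XLYOFNOQVPJJNP-UHFFFAOYSA-N", "will_fail"], as_csv=True)
# To send it: urllib.request.urlopen(request).read()
```

Building the request separately from sending it makes the headers and body easy to inspect before any network call is made.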

Request Parameters

Disable top hit only:

"cts-lite.metabolomics.us/match?top_hit_only=false"

Disable first block matches:

"cts-lite.metabolomics.us/match?first_block_matches=false"

Disable RDKit conversion:

"cts-lite.metabolomics.us/match?rdkit_conversion=false"

Enable ClassyFire chemical classification:

"cts-lite.metabolomics.us/match?classyfire=true"
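If you set several of these parameters at once, a small helper can assemble the URL. This is a sketch under two assumptions: that the service is served over HTTPS, and that the parameters can be combined in a single query string. The helper name build_match_url is hypothetical; the defaults mirror those described in this documentation.

```python
from urllib.parse import urlencode

# Assumption: HTTPS scheme; the docs list only the host and path.
BASE_URL = "https://cts-lite.metabolomics.us/match"

# Documented defaults for each setting.
DEFAULTS = {
    "top_hit_only": True,
    "first_block_matches": True,
    "rdkit_conversion": True,
    "classyfire": False,
}


def build_match_url(**settings):
    """Return the /match URL, adding only settings that differ from the defaults."""
    unknown = set(settings) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    params = {
        name: str(settings[name]).lower()  # True -> "true", False -> "false"
        for name in DEFAULTS
        if name in settings and settings[name] != DEFAULTS[name]
    }
    return BASE_URL + ("?" + urlencode(params) if params else "")


url = build_match_url(top_hit_only=False, classyfire=True)
```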

Response Formats

Example query: XMBWDFGMSWQBCA-UHDFADDYSA-N will_fail

JSON

[
  {
    "query": "XMBWDFGMSWQBCA-UHDFADDYSA-N",
    "query_type": "inchikey",
    "found_match": true,
    "match_level": "First Block",
    "matches": [
      {
        "identifier": "24841",
        "inchikey": "XMBWDFGMSWQBCA-UHFFFAOYSA-N",
        "inchi": "InChI=1S/HI/h1H",
        "smiles": "I",
        "compound_name": "Hydrogen iodide",
        "molecular_formula": "HI",
        "exact_mass": 127.9123,
        "literature_count": 4430,
        "patent_count": 329042
      }
    ],
    "error_message": ""
  },
  {
    "query": "will_fail",
    "query_type": "unidentified",
    "found_match": false,
    "match_level": "",
    "matches": null,
    "error_message": "Invalid query type, could not identify, see documentation"
  }
]
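Note that a failed query carries "matches": null rather than an empty list, so client code should guard against it. A minimal sketch of reading the response above (the function name best_matches is hypothetical, and the sample is trimmed to a few fields):

```python
import json

# Trimmed copy of the example response shown above.
SAMPLE_RESPONSE = """
[
  {"query": "XMBWDFGMSWQBCA-UHDFADDYSA-N", "query_type": "inchikey",
   "found_match": true, "match_level": "First Block",
   "matches": [{"identifier": "24841", "compound_name": "Hydrogen iodide"}],
   "error_message": ""},
  {"query": "will_fail", "query_type": "unidentified",
   "found_match": false, "match_level": "", "matches": null,
   "error_message": "Invalid query type, could not identify, see documentation"}
]
"""


def best_matches(response_text):
    """Map each query to its first match, or None when nothing matched."""
    results = {}
    for entry in json.loads(response_text):
        matches = entry["matches"] or []  # "matches" is null for failed queries
        results[entry["query"]] = matches[0] if matches else None
    return results


by_query = best_matches(SAMPLE_RESPONSE)
```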

CSV

query,query_type,converted_query,found_match,match_level,error_message,pubchem_cid,inchikey,inchi,smiles,compound_name,molecular_formula,exact_mass,literature_count,patent_count
XMBWDFGMSWQBCA-UHDFADDYSA-N,inchikey,,true,First Block,,24841,XMBWDFGMSWQBCA-UHFFFAOYSA-N,InChI=1S/HI/h1H,I,Hydrogen iodide,HI,127.9123,4430,329042
will_fail,unidentified,,false,,"Invalid query type, could not identify, see documentation",,,,,,,,,
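The CSV output quotes fields that contain commas (such as the error message above), so it is safer to read it with a CSV parser than to split on commas. A short sketch using Python's csv module on the sample rows; the function name read_rows is hypothetical:

```python
import csv
import io

# The example CSV response shown above.
SAMPLE_CSV = '''query,query_type,converted_query,found_match,match_level,error_message,pubchem_cid,inchikey,inchi,smiles,compound_name,molecular_formula,exact_mass,literature_count,patent_count
XMBWDFGMSWQBCA-UHDFADDYSA-N,inchikey,,true,First Block,,24841,XMBWDFGMSWQBCA-UHFFFAOYSA-N,InChI=1S/HI/h1H,I,Hydrogen iodide,HI,127.9123,4430,329042
will_fail,unidentified,,false,,"Invalid query type, could not identify, see documentation",,,,,,,,,
'''


def read_rows(csv_text):
    """Parse the CSV response into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


rows = read_rows(SAMPLE_CSV)
# All values arrive as strings, so booleans are compared as text.
matched = [row for row in rows if row["found_match"] == "true"]
```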
                    

Query Types

Query types are parsed using the following logic:

  • InChIKeys must be in the format XXXXXXXXXXXXXX-XXXXXXXXXX-X (14-10-1, all uppercase letters)
  • InChIs must start with InChI= (case-sensitive)
  • SMILES are first identified by the presence of structural characters: = # - / \ : . @ + [ ] ( )
  • Converted SMILES are SMILES queries that failed to match, but were then matched using their converted InChIKey. The SMILES are converted to InChIKeys using RDKit.
  • PubChem CIDs are identified as queries that contain only digits
  • Molecular Formulas are recognized by a leading letter that cannot begin a SMILES string: ADEGHKLMRTUVWXYZ
  • SMILES/Mol. Formula: some queries, like C, are ambiguous and can be either SMILES or Molecular Formulas. In these cases, the query is first matched as a SMILES, and then as a Molecular Formula.
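The rules above can be approximated on the client side, for example to pre-screen a list before submitting it. This is only a sketch: the order in which the rules are checked and every label except "inchikey" are assumptions, and it does not reproduce the malformed or unidentified handling.

```python
import re

INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")  # 14-10-1, uppercase
SMILES_CHARS = set("=#-/\\:.@+[]()")                      # structural characters
FORMULA_ONLY_STARTS = set("ADEGHKLMRTUVWXYZ")             # cannot begin a SMILES


def guess_query_type(query):
    """Rough client-side guess at how a query would be typed (assumed rule order)."""
    if INCHIKEY_RE.match(query):
        return "inchikey"
    if query.startswith("InChI="):  # case-sensitive prefix
        return "inchi"
    if re.fullmatch(r"[0-9]+", query):
        return "pubchem_cid"
    if any(ch in SMILES_CHARS for ch in query):
        return "smiles"
    if query[:1] in FORMULA_ONLY_STARTS:
        return "molecular_formula"
    # Everything else: ambiguous (like "C") or not recognized by these rules.
    return "ambiguous_or_unidentified"


guesses = {q: guess_query_type(q) for q in ["24841", "C(=O)O", "HI", "C"]}
```

The InChIKey and InChI checks run before the SMILES check because both contain characters ("-" and "=") that would otherwise be read as SMILES structure.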

Malformed Queries

Malformed queries are identified as follows:

  • InChIKeys that do not fit the 14-10-1 format above but still match the looser regex pattern: ^[a-zA-Z]{12,16}-[a-zA-Z]{9,11}-[a-zA-Z]{0,2}$
  • InChIs that start with InChI= but with improper capitalization (for example, inchi=)
  • Unidentified queries are those that fit none of the above criteria
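A sketch of the two malformed checks, assuming that a query is tested against the strict formats first and is flagged as malformed only when those fail. The function name and return labels are hypothetical.

```python
import re

STRICT_INCHIKEY = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
LOOSE_INCHIKEY = re.compile(r"^[a-zA-Z]{12,16}-[a-zA-Z]{9,11}-[a-zA-Z]{0,2}$")


def malformed_reason(query):
    """Return why a query looks malformed, or None if neither check applies."""
    # Close to an InChIKey (wrong case or block lengths) but not a valid one.
    if LOOSE_INCHIKEY.match(query) and not STRICT_INCHIKEY.match(query):
        return "malformed inchikey"
    # Right prefix letters, wrong capitalization.
    if query.lower().startswith("inchi=") and not query.startswith("InChI="):
        return "malformed inchi"
    return None


reasons = [malformed_reason(q) for q in ["xlyofnoqvpjjnp-uhfffaoysa-n", "inchi=1S/HI/h1H"]]
```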

Top Hit Only

By default, CTS-Lite will return only the top hit per query. This setting can be toggled in the settings panel, or by adding the top_hit_only=false parameter to the API request.

For each query, the top hit is determined by ranking the hits on a weighted relevance score:
(0.7 * literature_count) + (0.3 * patent_count).
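If you request all hits with top_hit_only=false, the same score can be recomputed locally to reorder them. A minimal sketch, assuming each hit is a dict with the literature_count and patent_count fields shown in the JSON response; the function names and sample hits are hypothetical.

```python
def relevance_score(hit):
    """Weighted relevance score: 0.7 * literature_count + 0.3 * patent_count."""
    return 0.7 * hit["literature_count"] + 0.3 * hit["patent_count"]


def top_hit(hits):
    """Return the highest-scoring hit, or None for an empty list."""
    return max(hits, key=relevance_score) if hits else None


# Hypothetical hits, for illustration only.
hits = [
    {"compound_name": "A", "literature_count": 100, "patent_count": 0},
    {"compound_name": "B", "literature_count": 0, "patent_count": 200},
]
best = top_hit(hits)  # A scores 70, B scores 60
```

Because literature counts are weighted more heavily, a compound with fewer total records can still outrank one that appears mostly in patents.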

Match Levels

When first block matching is enabled (the default), "InChIKey" and "Converted SMILES" queries that find no exact match fall back to matching on the first block of the InChIKey, which is its first fourteen characters. Such results are reported with the First Block match level.

For example, the query XLYOFNOQVPJJNP-XXXXXXXXXX-X would be a first block match with Water, whose key is XLYOFNOQVPJJNP-UHFFFAOYSA-N.

To disable first block matching, use the settings cog in the web UI or add first_block_matches=false to the API request.

All other query types can only be Exact matches.
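The first block comparison can be sketched as a prefix check on the first fourteen characters. The function names are hypothetical, and this illustrates the idea rather than the server's implementation.

```python
def first_block(inchikey):
    """Return the first block of an InChIKey (its first fourteen characters)."""
    return inchikey[:14]


def is_first_block_match(query_key, candidate_key):
    """True when the keys differ but share the same first block."""
    return query_key != candidate_key and first_block(query_key) == first_block(candidate_key)


# The Water example from above.
water_match = is_first_block_match(
    "XLYOFNOQVPJJNP-XXXXXXXXXX-X", "XLYOFNOQVPJJNP-UHFFFAOYSA-N"
)
```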

RDKit Conversion

By default, CTS-Lite will attempt to convert failed SMILES queries into InChIKeys using RDKit. It will then retry the lookup against the database using the converted InChIKey. A successful conversion match is returned with the query type Converted SMILES.

Because SMILES are non-canonical, the same compound can have many SMILES representations, but PubChem only stores one of them. Converting to InChIKey ensures the lookup is format-independent.

Disable RDKit conversion with the setting under the cog icon next to the "Match" button, or by adding the rdkit_conversion=false parameter to the API request.
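To see why the conversion helps, you can reproduce it locally with RDKit. This sketch assumes RDKit is installed with InChI support and falls back to returning None otherwise; it is not the server's code, and the function name is hypothetical.

```python
try:
    from rdkit import Chem
    HAVE_RDKIT = True
except ImportError:  # RDKit is an optional third-party dependency
    HAVE_RDKIT = False


def smiles_to_inchikey(smiles):
    """Return the InChIKey for a SMILES string, or None if it cannot be converted."""
    if not HAVE_RDKIT:
        return None
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit could not parse the SMILES
        return None
    return Chem.MolToInchiKey(mol) or None  # empty string means conversion failed


# Two different SMILES spellings of ethanol should give the same key.
same_key = smiles_to_inchikey("OCC") == smiles_to_inchikey("CCO")
```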

Chemical Classification (ClassyFire)

BETA: Please be aware that the ClassyFire implementation in CTS-Lite is in a beta state. Please report any issues to the issue tracker.

CTS-Lite can optionally provide chemical taxonomy classifications from ClassyFire, developed by the Wishart Research Group.

ClassyFire is disabled by default due to the additional network latency it causes. Enable it from the settings panel, or via the API with the classyfire=true parameter. Requests with ClassyFire enabled are limited to 100 entries.
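If you have more than 100 queries to classify, one option is to split them into several requests. A minimal sketch; the helper name chunk_queries is hypothetical, and each returned string is intended as the "queries" value of one request.

```python
def chunk_queries(queries, limit=100):
    """Split queries into whitespace-joined payloads of at most `limit` entries."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [
        " ".join(queries[start:start + limit])
        for start in range(0, len(queries), limit)
    ]


# 250 placeholder CIDs become three payloads of 100, 100, and 50 entries.
payloads = chunk_queries([str(cid) for cid in range(1, 251)])
```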

Each matched compound receives the following ClassyFire fields (some may be empty and therefore omitted):

  • Kingdom (organic vs inorganic, e.g., Organic compounds)
  • Superclass (general structural identifiers, e.g., Organoheterocyclic compounds)
  • Class (specific structural features, e.g., Imidazopyrimidines)
  • Subclass (even more specific features, e.g., Purines and purine derivatives)
  • Direct Parent (most specific class, e.g., Xanthines)
  • Description (text description of the compound's chemical class)

If a compound is not found by ClassyFire, or if the ClassyFire service is unavailable, the reason is reported in an error field.

A single query can match many compounds (for example, a molecular formula with top_hit_only=false may return hundreds of hits). To keep latency reasonable, only the top 3 matches of each identifier are classified.

Note: ClassyFire adds significant network latency proportional to the number of unique matched compounds in your query.

API Details When Using ClassyFire

Enable ClassyFire by adding the classyfire=true parameter to an API request.

You can stream ClassyFire results by adding the stream=true parameter. Instead of returning a single JSON array upon completion, the server sends results live as NDJSON (one JSON value per line). This may help avoid timeouts on large requests.
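A streamed response can be consumed line by line. The sketch below assumes each non-empty line is one complete JSON object; the exact shape of each streamed record is not specified here, so the sample lines are hypothetical.

```python
import json


def iter_ndjson(lines):
    """Yield one parsed JSON value per non-empty line of an NDJSON stream."""
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        line = line.strip()
        if line:  # skip blank keep-alive or trailing lines
            yield json.loads(line)


# Hypothetical streamed lines, for illustration only.
sample_stream = [b'{"query": "24841"}\n', b"\n", b'{"query": "will_fail"}\n']
records = list(iter_ndjson(sample_stream))
```

Any iterable of lines works as input, including the response object returned by urllib.request.urlopen, which can be iterated line by line as data arrives.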

JSON responses include a classyfire object on each matched compound.

CSV outputs append seven additional columns:

...,classyfire_kingdom,classyfire_superclass,classyfire_class,classyfire_subclass,classyfire_direct_parent,classyfire_description,classyfire_error