Ebbels T, Keun H, Beckonert O, Antti H, Bollard M, Holmes E, Lindon J, Nicholson J
Toxicity classification from metabonomic data using a density superposition approach: 'CLOUDS'
Analytica Chimica Acta: 2003 490:109-122
Predicting and avoiding the potential toxicity of candidate drugs is of fundamental importance to the pharmaceutical industry. The consortium for metabonomic toxicology (COMET) project aims to construct databases and metabolic models of drug toxicity using ca. 100,000 600 MHz ¹H NMR spectra of biofluids from laboratory rats and mice treated with model toxic compounds. Chemometric methods are being used to characterise the time-related and dose-specific effects of toxins on the endogenous metabolite profiles. Here we present a probabilistic approach to the classification of a large data set of COMET samples using Classification Of Unknowns by Density Superposition (CLOUDS), a novel non-neural implementation of a classification technique developed from probabilistic neural networks. NMR spectra of urine from rats from 19 different treatment groups, collected over 8 days, were processed to produce a data matrix with 2844 samples and 205 spectral variables. The spectra were normalised to account for gross concentration differences in the urine and regions corresponding to non-endogenous metabolites (0.4% of the data) were treated as missing values. Modelling the data according to organ of effect (control, liver, kidney or other organ), with a 50/50 train/test set split, over 90% of the test samples were classified as belonging to the correct group. In particular, samples from liver and kidney treatments were classified with 77 and 90% success, respectively, with only a 2% misclassification rate between these classes. Further analysis of the data, counting each of the 19 treatment groups as separate classes, resulted in a mean success rate across groups of 74%. Finally, as a severe test, the data were split into 88 classes, each representing a particular toxin at a particular time point.
Fifty-four percent of the spectra from non-control samples were classified correctly, a particularly successful result when compared to the null success rate of approximately 1% expected from random class assignment. The CLOUDS technique has advantages when modelling complex multi-dimensional distributions: it gives a probabilistic rather than absolute class description of the data and is particularly amenable to the inclusion of prior knowledge such as uncertainties in the data descriptors. This work shows that it is possible to construct viable and informative models of metabonomic data using the CLOUDS methodology, delineating the whole time course of toxicity. These models will be useful in building hybrid expert systems for predicting toxicology, which are the ultimate goal of the COMET project. © 2003 Elsevier Science B.V. All rights reserved.
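The abstract does not give the CLOUDS equations, so the following is only a minimal sketch of the general idea it describes: a class density built by superposing kernels centred on the training samples, as in a probabilistic neural network, and converted into class probabilities. The isotropic Gaussian kernel, the single smoothing parameter `sigma`, the equal class priors, and the function name `class_posteriors` are all assumptions made for illustration and are not taken from the paper. The handling of missing values mentioned above is not covered.

```python
import numpy as np

def class_posteriors(X_train, y_train, X_test, sigma=1.0):
    """Kernel-density class probabilities (illustrative sketch, not the published method).

    Assumes an isotropic Gaussian kernel with one shared width `sigma`
    and equal prior probability for every class.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)

    # Squared Euclidean distance from every test sample to every training sample.
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)

    # Log-kernel values; subtracting the row maximum avoids underflow and
    # cancels when the densities are normalised below.
    log_k = -d2 / (2.0 * sigma ** 2)
    log_k -= log_k.max(axis=1, keepdims=True)
    k = np.exp(log_k)

    # Superpose (average) the kernels of each class. The Gaussian
    # normalising constant is shared by all classes and is omitted.
    dens = np.stack([k[:, y_train == c].mean(axis=1) for c in classes], axis=1)

    # Normalise across classes to obtain probabilities for each test sample.
    post = dens / dens.sum(axis=1, keepdims=True)
    return classes, post

# Synthetic two-class example (made-up data, not COMET spectra).
rng = np.random.default_rng(0)
X_a = rng.normal(0.0, 0.5, size=(50, 2))
X_b = rng.normal(3.0, 0.5, size=(50, 2))
X_train = np.vstack([X_a, X_b])
y_train = np.array(["A"] * 50 + ["B"] * 50)

classes, post = class_posteriors(X_train, y_train, [[0.0, 0.0], [3.0, 3.0]], sigma=0.5)
predicted = classes[post.argmax(axis=1)]
```

Each row of `post` sums to one, which gives the probabilistic rather than absolute class description the abstract refers to; taking the `argmax` recovers a hard class label when one is needed.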