Utility of multispectral imaging for nuclear classification of routine clinical histopathology imagery
© Boucheron et al; licensee BioMed Central Ltd. 2007
Published: 10 July 2007
Volume 8 Supplement 1
We present an analysis of the utility of multispectral versus standard RGB imagery for routine H&E stained histopathology images, in particular for pixel-level classification of nuclei. Our multispectral imagery has 29 spectral bands, spaced 10 nm apart within the visible range of 420–700 nm. It has been hypothesized that the additional spectral bands contain further information useful for classification as compared to the 3 standard bands of RGB imagery. We present analyses of our data designed to test this hypothesis.
For classification using all available image bands, we find the best performance (equal tradeoff between detection rate and false alarm rate) is obtained from either the multispectral or our "ccd" RGB imagery, with an overall increase in performance of 0.79% compared to the next best performing image type. For classification using single image bands, the single best multispectral band (in the red portion of the spectrum) gave a performance increase of 0.57%, compared to performance of the single best RGB band (red). Additionally, red bands had the highest coefficients/preference in our classifiers. Principal components analysis of the multispectral imagery indicates only two significant image bands, which is not surprising given the presence of two stains.
Our results indicate that, for a pixel-level nuclear classification task, multispectral imagery of routine H&E stained histopathology provides minimal additional spectral information beyond that of standard RGB imagery.
The use of multispectral imaging capabilities is relatively new to the field of cyto- and histo-pathology, particularly for transmitted brightfield microscopy [1, 2]. Recent publications (e.g., [3–6]) have begun to explore the use of extra information contained in such spectral data (29–33 wavelengths in the visible spectrum, from 400 nm to 720 nm, spaced 10 nm apart), in particular for multiply stained (>2 stains) specimens. Specifically, there have been comparisons of spectral unmixing algorithms (to separate constituent dyes) which demonstrate the advantage of multispectral data [5, 7]. The added benefit of multispectral imaging for analysis of routine H&E cyto/histopathology imagery, however, is still largely unknown, although some promising results are presented in .
While the use of multispectral light microscopy is new to cyto/histopathology, many researchers have used single or dual narrow-band filters to enhance imagery for particular stains, most using a red filter (or the red channel of an RGB image) for enhancement of Hematoxylin or Feulgen staining [8–12], and some using a green filter for enhancement of Feulgen staining [13–16].
We present analyses of our multispectral data designed to test the hypothesis that the additional spectral bands contain more information useful for classification as compared to the 3 standard bands of RGB microscopy imagery. The work presented here is an extension of the work presented in .
Wilcoxon p-values for performances of multispectral versus RGB imagery.
We have shown in this section, using a pairwise Wilcoxon signed rank test, that only a few performance differences between multispectral and RGB imagery are statistically significant. Furthermore, we note that these statistically significant differences are increases of 0.46%, 0.76%, and 0.38% in favor of multispectral imagery over rgbequal, truecolor, and ccd, respectively, for MED; 0.32% in favor of multispectral over rgbequal for SAM; 0.58% in favor of multispectral over rgbequal and 0.35% in favor of ccd over multispectral for AFE; 1.06% in favor of multispectral over rgbequal for LSVM; and 1.7%, 1.1%, and 1.1% in favor of multispectral over rgbequal, truecolor, and ccd, respectively, for NLSVM.
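As an illustration of the kind of paired test described above, the following is a minimal sketch of a Wilcoxon signed-rank computation on per-image performance scores from two image types. The function name, the handling of zeros and ties, and the use of a normal approximation for the two-sided p-value are assumptions made for this sketch; the text does not state how the p-values were actually computed.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (w_plus, w_minus, p_value) for paired samples x and y.

    Zero differences are dropped, tied absolute differences share their
    average rank, and the two-sided p-value uses the normal approximation
    without tie or continuity correction (an assumption of this sketch).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving ties their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Normal approximation to the null distribution of the smaller rank sum.
    mean = n * (n + 1) / 4
    std = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / std
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, w_minus, p_value
```

In use, `x` and `y` would hold the performance of one classifier on the same images for two image types (for example, multispectral and rgbequal), so that each pair of scores comes from the same image.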
Similarly, we note that for RGB images, the red channels yield the best performance (Figure 2B); we choose the AFE classifier for presentation here since it consistently yields the highest performance scores, though the other three classifiers display the same trends. While it may seem contradictory that in RGB imagery the green channel outperforms the blue channel when the opposite is true in multispectral imagery, it is important to remember how the multispectral bands are allocated to each of the RGB bands. Consider, for example, the allocation of bands in rgbequal imagery: the bands from 510 nm to 600 nm are averaged to yield the green channel. Referring to Figure 2A we see that these bands have a large variation in performance. Thus, to obtain the green channel, we are averaging multispectral bands, several of which have relatively good performance. A similar situation occurs with the truecolor and ccd imagery, albeit with a weighting applied to each band.
We find the analysis of performance on single image bands satisfactory from an intuitive standpoint. Since the nuclei are stained with the blue-colored Hematoxylin, which blocks red light, the red portions of the spectrum have the best contrast and perform the best for this nuclear classification task. While green light is also blocked by the Hematoxylin, it is blocked by the Eosin as well, rendering the green portion of the spectrum less informative for the task at hand.
Wilcoxon p-values for performances of the best multispectral band versus the red RGB channel (multispectral bands: 590 nm, 600 nm, 620 nm, 660 nm).
We have shown in this section that performance differences between single multispectral image bands and single RGB image bands are not statistically significant. This would seem to indicate that the individual multispectral image bands are not yielding any more specific spectral information than are the individual RGB image bands for this nuclear classification task.
We note also that with RGB imagery, the FLDA classifier weights the red channel the most, followed by the blue, and finally green channels. Similarly, the AFE classifier chooses the red channel most often, followed in turn by blue and green. Comparing the multispectral plots for the AFE and FLDA classifiers, there are striking similarities in the relative use/weighting of bands, particularly in the red portion of the spectrum (i.e., 580–650 nm). The more prevalent use of green and blue bands in the AFE classifier, compared to FLDA, may be due to the classifier's ability to extract local features, making those bands more useful beyond the raw spectral attributes used by the FLDA classifier. Overall, considering the disparate nature of these two classifiers, we find it very interesting that they both display similar preferences for particular image bands.
We use the analysis in this section as a complement to the analysis of performance on single image bands. Specifically, we have shown that the image bands that yielded the better performances are also the image bands chosen preferentially in both the FLDA and AFE classifiers. While it might be more qualitatively satisfying if the plots of Figures 3 and 4 bore more resemblance to those of Figure 2, it is important to remember that these two analyses are very distinct from one another. In the case of Figure 2, we are limiting the classifiers to a single image band and optimizing the performance, whereas for Figures 3 and 4 we are providing the classifiers with a choice of all available image bands and optimizing performance. As a more intuitive example, for the FLDA classifier, even if a specific image band X performs well when used alone, this same image band X may not yield as much information as, say, the linear combination of bands Y and Z. We have shown, therefore, in this analysis, a classifier preference for image bands which yield better performance when used singly in classification.
We use Principal Components Analysis (PCA) as a dimensionality reduction method to see how many "important" bands actually exist within our multispectral image stacks. We choose PCA rather than another dimensionality reduction technique, such as Independent Components Analysis (ICA), since PCA has a well established ranking for the resulting vectors. While there has been at least one ranking method suggested for ICA, the ratio of between-class to within-class variance , there is not a universally accepted ranking for ICA vectors. While ICA may yield a better separation of the independent causes in our data (i.e., the two stains), we are interested in the use of a dimensionality reduction technique mainly to help interpret the (lack of) differences in performance we have presented for our multispectral and RGB imagery.
We have thus found that PCA indicates the presence of 2 dominant eigenvalues when we consider the principal components responsible for 97% of the variation in the data. This indicates the presence of only 2 information-bearing bands in the imagery for this nuclear classification task, providing insight into the approximately equivalent performance of the RGB and multispectral imagery. We have also shown that these 2 informative bands demonstrate a direct relationship to the two image stains. Interestingly, the first component is responsible for 93% of the total variation; this band is generally correlated with Hematoxylin, but is sometimes correlated instead with Eosin. The possibility that other image bands may contain important diagnostic information for further analysis is still an open question.
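The step of counting how many principal components account for a given share of the variation can be sketched as follows. This is a minimal illustration, not the authors' code: the function name is hypothetical, and the eigenvalues in the usage note are made-up numbers chosen only so that the first component carries 93% of the total, as reported above.

```python
def components_for_variance(eigenvalues, fraction):
    """Return how many leading principal components are needed for the
    cumulative eigenvalue sum to reach `fraction` of the total variance."""
    if not eigenvalues:
        raise ValueError("eigenvalues must not be empty")
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running >= fraction * total:
            return count
    return len(ordered)


# Hypothetical eigenvalues (illustrative only, not measured data).
example_eigenvalues = [93.0, 4.5, 1.0, 0.8, 0.7]
```

With these illustrative values, `components_for_variance(example_eigenvalues, 0.97)` returns 2, mirroring the situation described in the text where two components carry 97% of the variation.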
We have demonstrated the performance of different image types and different classifiers in a nuclear classification task. Results seem to indicate only slight performance differences (less than 1%) using multispectral imagery as opposed to our derived RGB imagery; while these performance increases are small, we report them here since they are a direct result of our experiments, and may be statistically significant. These conclusions hold for both classification using all available image bands as well as using single image bands, indicating that the multispectral bands do not contain much more discriminatory spectral information than do the RGB bands for this nuclear classification task. There are, undoubtedly, a number of metrics that could be used in a study such as this, and we may have been able to find a metric for which multispectral would fare better (or worse) than presented here. However, we wanted to use a metric that provides an equal trade-off between two commonly used metrics (detection rate and false alarm rate). We have also shown that the single image bands with the best performance are the image bands chosen more often/weighted more heavily by the AFE and FLDA classifiers. Finally, we have shown through the use of PCA as a dimensionality reduction method that only 2 image bands are carrying 97% of the variation in our image data, and appear to be correlated with the two image stains. This result provides some insight into the roughly equivalent performance of RGB imagery to multispectral. While the results presented here are intriguing, they are by no means complete, since we are considering only a single pixel-level classification task. Future work will continue to compare multispectral with RGB imagery for further classification tasks, as well as other image analysis tasks, including object-level analysis.
In particular, we are currently researching methods to segment (i.e., delineate) individual nuclei using the results of these pixel-level classifications.
One could foresee many methods for the derivation of RGB imagery from multispectral. We use here:
1. rgbequal: created by (approximately) equally allocating the 29 bands to R, G, and B, similar to the approach in , reflecting a rough approximation of the three spectral ranges associated with the three colors red, green, and blue, albeit with some ambiguity in allocation of intermediate colors (e.g., yellow).
2. truecolor: created by converting the illumination wavelength for each band into the constituent RGB values as perceived by humans, then averaging the contribution to R, G, and B for each band. This method utilizes the MatlabCentral  function spectrumRGB.
3. ccd: a modification of truecolor imagery to better match the spectral response of common 3-CCD color cameras used in microscopy setups for biomedical research. This method also utilizes the spectrumRGB function.
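The rgbequal derivation can be sketched for a single pixel as below. Only the green range (510 nm to 600 nm) is stated in the text; the blue and red ranges used here are assumptions that simply assign the remaining bands, and the names `WAVELENGTHS`, `RGBEQUAL_RANGES`, and `rgbequal_pixel` are hypothetical. The truecolor and ccd variants apply a per-band weighting that is not specified here and is therefore not sketched.

```python
# 29 bands from 420 nm to 700 nm in 10 nm steps.
WAVELENGTHS = list(range(420, 701, 10))

# Inclusive wavelength ranges (nm) averaged into each channel.
# Green is taken from the text; red and blue are assumed for illustration.
RGBEQUAL_RANGES = {
    "R": (610, 700),
    "G": (510, 600),
    "B": (420, 500),
}


def rgbequal_pixel(spectrum):
    """Average the bands of one 29-band pixel spectrum into an (R, G, B) tuple."""
    if len(spectrum) != len(WAVELENGTHS):
        raise ValueError("expected one value per spectral band")
    rgb = []
    for channel in ("R", "G", "B"):
        low, high = RGBEQUAL_RANGES[channel]
        values = [v for w, v in zip(WAVELENGTHS, spectrum) if low <= w <= high]
        rgb.append(sum(values) / len(values))
    return tuple(rgb)
```

Applying `rgbequal_pixel` to every pixel of a multispectral stack would yield a three-band image under these assumed ranges.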
We describe here the six pixel-level classifiers used in this study. We choose these classifiers based on their established performance and use for multispectral data, sparsity of parameters to optimize, computational efficiency, and the use of (primarily) spectral information. The use of primarily spectral information is important in these analyses since the basic hypothesis in question deals with the spectral information content of our imagery. The exceptions to these characteristics are noted in the classifier descriptions to follow.
• Maximum Likelihood (ML): Maximizes the likelihood of a pixel belonging to a certain class. That is, a pixel is assigned the label of the class that it is most likely to be a member of. Likelihood is defined probabilistically, using the estimated joint probability density or mass function. We assume a Gaussian density model, and estimate the mean and covariance matrix for each class. These assumptions result in a quadratic discrimination boundary.
• Minimum Euclidean Distance (MED): Minimizes the Euclidean distance between an observation and the class means.
• Spectral Angle Mapper (SAM): Minimizes the angle between an observation and the class means.
• Fisher Linear Discriminant Analysis (FLDA): Projects the multi-dimensional data to one dimension, maximizes a function representing the difference between the projected class means, and normalizes by the within-class scatter along a direction perpendicular to the decision hyperplane . This is also equivalent to a Maximum Likelihood formulation assuming equal covariance matrices for each class, resulting in a linear discrimination boundary.
• An Automated Feature Extraction (AFE) tool called GENIE: GENIE is based on evolutionary computation, and is designed to explore the entire feature space of multispectral data, and evolve a solution best fit for the classification task. More practically speaking, GENIE selects an initial set of algorithms consisting of randomly selected operators and randomly selected data planes as input. Throughout the evolution process, only appropriate algorithms with appropriate data input will survive. GENIE has the ability to use information from both the spectral and spatial domain, which renders it unique among the six classifiers. For more information, see Reference .
• Support Vector Machine (SVM): Constructs a linear hyperplane that maximizes the margin between classes. In the case of nonlinear SVMs, the data is first mapped to a higher dimensional space where a linear hyperplane is computed to separate the classes, using a kernel function which defines the inner product operation in the higher dimensional space . We have implemented an SVM using SVM light , with a linear kernel (LSVM) using all training data as input, and a quadratic kernel (NLSVM) using a randomly selected 10% of our training data as input (to speed the training process to a reasonable time). For this classifier, the kernel parameters must be explicitly optimized for the training data; this is the only classifier used in this study which requires optimization of parameters.
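The two simplest classifiers in the list above, MED and SAM, can be sketched for a single pixel as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, class means are assumed to have been estimated beforehand from training pixels, and the three-band values in the usage note are invented for brevity.

```python
import math


def med_classify(pixel, class_means):
    """Return the label whose mean spectrum is nearest in Euclidean distance."""
    def distance(a, b):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    return min(class_means, key=lambda label: distance(pixel, class_means[label]))


def sam_classify(pixel, class_means):
    """Return the label whose mean spectrum makes the smallest angle with the pixel."""
    def angle(a, b):
        dot = sum(p * q for p, q in zip(a, b))
        norms = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
        # Clamp to [-1, 1] to guard against floating-point round-off.
        return math.acos(max(-1.0, min(1.0, dot / norms)))
    return min(class_means, key=lambda label: angle(pixel, class_means[label]))


# Hypothetical three-band class means (illustrative only).
example_means = {
    "nuclei": [0.2, 0.3, 0.1],
    "background": [0.8, 0.7, 0.9],
}
```

The two rules can disagree: a pixel such as `[1.0, 1.5, 0.5]` has the same spectral shape as the hypothetical nuclei mean but is five times brighter, so `sam_classify` assigns it to "nuclei" while `med_classify` assigns it to "background".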
Before discussing our performance metric and results, we would like to briefly discuss how these pixel-level nuclear classifications will be used. We are currently working towards a hierarchical image analysis system, where we will alternate classification and segmentation of the imagery in an interactive system eliciting user feedback. Current active research involves nuclear segmentation, i.e., the proper delineation of all nuclei contained in the image. As such, it is necessary to achieve an accurate classification of all nuclei pixels if we are to use shape and other appropriate metrics to their best advantage in the nuclear segmentation process.
Humans inherently incorporate higher-level information in their analysis of imagery; since we are considering the nuclear classification performance based on primarily spectral information, it is difficult, if not impossible, to specify the expected level of performance for a human expert. The issues of human performance in diagnosis, particularly the inter- and intra-observer variability (see [26, 27] and the references therein) will be an important consideration in our future work and is indeed a strong motivation for a computerized quantitative analysis.
We choose a general metric of classification performance that equally penalizes both types of classification errors: 1) true (nuclei) pixels incorrectly labeled as false (non-nuclei) and 2) false pixels incorrectly labeled as true. In particular, the performance metric is defined as
\[ P = 0.5\left(R_d + (1 - R_f)\right), \tag{1} \]
where $R_d$ is the fraction of true pixels classified correctly (detection rate), $R_f$ is the fraction of false pixels classified incorrectly (false alarm rate), and the factor of 0.5 scales the metric to the range [0, 1]. Note that a perfect segmentation will yield a performance score of 1 (100%), while a score of 0.5 (50%) can be obtained by a trivial solution of all pixels labeled as a single class (true or false). This metric is an equal tradeoff between detection rate and false alarm rate.
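A minimal sketch of the performance metric in Equation (1), computed from per-pixel ground truth and predicted labels, is given below. The function name and the representation of labels as booleans (True for nuclei, False for non-nuclei) are assumptions made for illustration.

```python
def performance(truth, predicted):
    """Return P = 0.5 * (R_d + (1 - R_f)) for two equal-length boolean sequences.

    R_d is the fraction of true pixels predicted true (detection rate);
    R_f is the fraction of false pixels predicted true (false alarm rate).
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    n_true = sum(1 for t in truth if t)
    n_false = len(truth) - n_true
    if n_true == 0 or n_false == 0:
        raise ValueError("ground truth must contain both classes")
    detected = sum(1 for t, p in zip(truth, predicted) if t and p)
    false_alarms = sum(1 for t, p in zip(truth, predicted) if not t and p)
    r_d = detected / n_true
    r_f = false_alarms / n_false
    return 0.5 * (r_d + (1.0 - r_f))
```

For a ground truth containing both classes, a perfect labeling scores 1.0, while labeling every pixel as a single class (all true or all false) scores 0.5, matching the two cases noted in the text.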
As a compromise between the necessity of comprehensive ground truth for proper quantification of classification accuracy, and the tedious and time-consuming aspect of human delineation of such ground truth, we have marked a 200 × 200 pixel window in each of our 58 histology images. This window is used to determine classification performance for each image.
MED: Minimum Euclidean Distance
SAM: Spectral Angle Mapper
FLDA: Fisher Linear Discriminant Analysis
AFE: Automated Feature Extraction
SVM: Support Vector Machine
LSVM: Linear Support Vector Machine
NLSVM: Non-Linear Support Vector Machine
PCA: Principal Components Analysis
ICA: Independent Components Analysis
The authors would like to gratefully acknowledge James Theiler, Steven Brumby, Reid Porter, and Jiyun Byun for their excellent ideas and feedback about our test design and results, as well as Carola Zalles for her pathology tutelage.
LEB would like to acknowledge her funding support from NSF IGERT Grant DGE-0221713 for her first two years of Ph.D. research, and Los Alamos National Laboratory for subsequent funding.
This article has been published as part of BMC Cell Biology Volume 8 Supplement 1, 2007: 2006 International Workshop on Multiscale Biological Imaging, Data Mining and Informatics. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2121/8?issue=S1
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.