Random subwindows and extremely randomized trees for image classification in cell biology
© Marée et al; licensee BioMed Central Ltd. 2007
Published: 10 July 2007
Volume 8 Supplement 1
With the improvements in biosensors and high-throughput image acquisition technologies, life science laboratories are able to perform an increasing number of experiments that involve the generation of a large number of images at different imaging modalities/scales. This stresses the need for computer vision methods that automate image classification tasks.
We illustrate the potential of our image classification method in cell biology by evaluating it on four datasets of images related to protein distributions or subcellular localizations, and red-blood cell shapes. Accuracy results are quite good without any specific pre-processing or incorporation of domain knowledge. The method is implemented in Java and available upon request for evaluation and research purposes.
Our method is directly applicable to any image classification problem. We foresee the use of this automatic approach as a baseline method and first try on various biological image classification problems.
With the improvements in biosensors and high-throughput image acquisition technologies, life science laboratories are able to perform an increasing number of experiments that involve the generation of a large amount of images at different imaging modalities/scales: from atomic resolution for macromolecules (such as in protein crystallization), to subcellular locations (such as in location proteomics), up to human body organs or regions (such as in radiography).
In cell biology, the analysis of results of imaging experiments may provide biologists with new insights for a better understanding of all cellular components and behaviors. However, visual classification (also called visual examination, phenotyping, recognition, categorization, labelling, sorting) of images into several classes with some shared characteristics (also called phenotypes, groups, types, categories, labels, etc.) is tedious. Indeed, manual classification of such a large number of images is time-consuming and repetitive, and it is not always reliable, because experimental conditions, variable image quality, and human subjectivity or tiredness lead to considerable interobserver variations and misclassifications. In other words, manual examination could be a source of bias and would cause a bottleneck for high-throughput experiments, so systems that automate image classification tasks would greatly help biologists. Ideally these systems should proceed faster than humans in most cases, with the same accuracy (or even better accuracy when patterns are indistinguishable by human experts), and greatly reduce the number of images that require human inspection (for example, only those for which the automatic system has low confidence in its prediction).
In the computer vision community, image classification is a very active field. Given a set of training images labelled into a finite number of classes by an expert, the goal of an automatic image classification method is to build a model that will be able to predict accurately the class of new, unseen images. Such techniques have been applied to various problems where the goal is to identify a specific object (e.g. the face of a given individual, a particular building, someone's car), and current research aims at developing generic methods for the categorization, detection and segmentation of classes of objects or scenes with shared characteristics in terms of their shapes, colors, and/or textures (cars, airplanes, horses, indoor/outdoor scenes, etc.).
In the context of biomedical studies and cell biology, such automatic methods could for example help to study the phenotypic effects of drugs in human (red-blood) cells, where a class could denote the shape of a cell (stomatocyte, discocyte, or echinocyte). In various cytopathology studies, one may want to automatically recognize various cellular types to quantify their distributions in a certain state (e.g. cellular sorting in serous cytology). Another promising example is the automatic identification of subcellular location patterns (e.g. cytoplasm, mitochondria, nucleoli, etc.), using fluorescent tagging and fluorescence microscopy, as an essential first step to understand the function of various proteins [5, 6]. Other recent examples of biological studies that can be formulated as image classification problems include the recognition of the different phases of the cell division cycle (interphase, prophase, metaphase, anaphase, etc.) by measuring nucleus shape and intensity changes in time-lapse microscopy image data [7, 8], the microscopic analysis of urine particles (e.g. squamous epithelial cells, white blood cells, red blood cells, etc.), the study of protein distributions following a retinal detachment from confocal microscopy images, the annotation of fruitfly gene expression patterns over the entire course of embryogenesis obtained by in situ mRNA hybridization, etc.
Until recently, image classification systems usually relied on a pre-processing step, specific to the particular problem and application domain, which aims at computing a certain number of numerical features from the initially huge number of pixels in images. Such features could for instance correspond to statistics of pixel intensities (mean, standard deviation, skewness, kurtosis, correlation between adjacent pixels, etc.), or to various measures computed from previously segmented objects or "blobs" (ratio of area to perimeter, measure of straightness and curvature of boundaries, distance between objects, etc.). This reduced set is then used as new input variables (also called features, signatures, descriptors) for traditional learning algorithms (for example a nearest neighbor or neural network classifier), possibly tuned for the specific application. The learning algorithm then tries to build from the data a model that associates features with predefined classes. The limitation of this approach is clear: a given set of features is suitable only for certain specific applications, but unsuitable for others, and the choice of which set of features to use for a given application is not obvious. Thus, when considering a new application or, more dramatically, when new image classes are of interest, it is often necessary to manually adapt the pre-processing step by taking into account the specific characteristics of the new task. Recently, several works have tried to overcome this limitation by combining several different types of features that describe different aspects of an image, and by applying feature selection techniques. In [5, 7, 12] several hundred image features are extracted, corresponding to texture descriptions, pixel intensity distributions, edges, responses to various filters, etc.
However, these approaches that use global features may not work properly with cluttered and partially occluded images and they may not be robust to various image transformations (such as translation, orientation, scale, and viewpoint changes) that may appear in many applications. Meanwhile, it has been shown recently that generic methods developed by the object recognition community perform very well on medical images even though they were not tuned for such tasks.
Many recent object recognition methods rely on a "local features" scheme [14–16]. First, interest points or image regions are detected (e.g. by using a detector of peaks in local image variation) whose neighbourhood has high informational content and which are thought to be robustly detectable in images under varying conditions.
Then, the appearance of the interest points or regions is encoded by a feature vector of numerical values computed in their neighbourhood. Such descriptors are often designed to be discriminative, concise and insensitive to various transformations that global feature methods are generally not able to cope with. These descriptors are sometimes compressed by dimensionality reduction techniques (such as Principal Component Analysis) because local regions contain too much data for the traditional learning methods that are not able to deal with very high numbers of variables. These local feature vectors are then stored in a database for use during the recognition step.
To predict the class of a new image, each feature vector computed from the image is classified using a nearest-neighbor algorithm against the feature vectors in the database. The majority class among the classes assigned to local feature vectors is then assigned to the image.
In previous work, we proposed a generic approach for image classification that largely follows the aforementioned scheme but differs from other methods in several notable points. First, the method uses a large set of randomly extracted image subwindows (or patches) and describes them by high-dimensional feature vectors composed of raw pixel values. Then, the method uses an ensemble of extremely randomized decision trees to build a subwindow classification model. To predict the class of a new image, the method aggregates subwindow class predictions given by the decision trees and uses majority voting to assign a class to the image. Details about the method and its rationale are given in the Methods section.
Our approach was evaluated on various image classification datasets involving the classification of digits, faces, objects, buildings, photographs, etc. Moreover, we successfully applied it to a database of 10,000 X-ray images, with classification results very close to the best ones.
In this paper, we evaluate the potential of our image classification method in cell biology by evaluating its performance on four datasets of images related to protein distributions or subcellular locations and (red-blood) cells. The application of our method is straightforward (without incorporation of domain knowledge) and we compare its results with human classification (when available) and automated methods designed specifically for a given task. We discuss properties of the method such as attractive computational efficiency and possible interpretation.
The performance of our method is given for four image classification tasks: two of them correspond to sub-cellular protein localizations, the third one to red-blood cell shapes, and the last one to protein distributions in retina cells and layers. Details about these datasets are given in the Methods Section.
We measure the accuracy of the models in predicting the class of unseen images. In all experiments, we build T = 10 trees using the default filtering parameter value (k = √256 = 16 for greyscale images, k = √768 ≈ 28 for color images), except for the RBC task, where we observed that its maximum value (k = 256) achieved better accuracy. The number of extracted subwindows is given for each problem. Details about our method and its parameters are given in the Methods Section.
Since for this experiment there are no results available from the literature, we applied a nearest neighbor classifier with Euclidean distance and an Extra-Tree classifier on resized versions (200 × 100) of the global images (without subwindow extraction) to provide some baseline for comparison. With these methods, we obtained error rates of 33.33% and 11.82% (T = 500, k = √20000 ≈ 141) respectively, which shows that the nearest neighbor classifier is here not able to deal with the high-dimensional feature vectors and the small number of images. On the other hand, the significant improvement of our method with respect to the Extra-Tree classifier confirms the interest of the subwindow sampling and voting scheme of our method.
Random guessing on this dataset would give about 90% error rate, while the reported human classification error rate on this task is 17%. We obtain with our method an error rate of 16.63% ± 2.75 (when using N_ls = N_test = 2000).
We can compare these results with those of the first publication of this team based on this dataset, which range from 25% down to 15.6% depending on the number of features used and the parameters of the learning algorithm (a neural network classifier). Subsequently, K. Huang and R.F. Murphy improved these results down to 8.5% by using an unweighted majority-voting ensemble model of all possible combinations of eight classifiers, with several parameters optimized on this specific dataset.
In the literature, error rates on this dataset range from 31% to 13.5%, while the error rate of human experts is estimated to be above 20%. On the other hand, with the protocol we used and due to the unbalanced number of images in each of the three classes, a method always guessing the most frequent class would achieve a 35.7% error rate. With our method, we obtained the best results by constraining the random subwindow sizes between 80% and 100% of the image size instead of the full range of sizes, with a mean error rate over all subsets of 20.92% ± 1.53 with 100 subwindows extracted from each image.
Notice that the method that obtains the best results on this dataset also uses a local appearance approach, but with a distance measure between patches that incorporates invariances with respect to transformations that are known a priori: cell border line thickness, six affine transformations, and additive image brightness.
A previously published method computes different sets of MPEG-7 features within fixed-size square tiles, applies Independent Component Analysis to the feature vectors, and uses a Support Vector Machine classifier. Its results range from 65.6% down to 16.2% classification error rate on a dataset of 433 retinal images labelled into 9 classes. We obtain a 10% leave-one-out error rate using 5000 subwindows extracted from each image, with random subwindow sizes smaller than 10% of the image size. Our 5 misclassification errors are confusions between "normal" and "1 day" conditions, and between "3 day" and "7 day" conditions. Our accuracy results are not directly comparable to those of that method because the numbers of images and classes are not equivalent. However, they illustrate the ability of our method to capture the characteristics of these 4 classes using only a dozen images per class, hence its potential for this type of imaging experiment. A more in-depth validation of our method on this type of problem would require a larger set of images representing additional experimental conditions (e.g. when different treatments are used).
We think our method is attractive for cell biology studies in view of its properties that we summarize hereafter.
First, without integrating any domain knowledge or complex pre-processing techniques, our experiments show that our generic method obtains quite good results on average on four problems with images of different quality and representing various patterns. As one could have expected, these results are however not as good as the best results published in the literature, which were obtained with methods tailored to one specific dataset and/or after substantial research efforts (sometimes years of research).
Interestingly, our method is competitive with respect to classification by human experts on the HeLa cells and RBC tasks. In biological studies where the number of images to classify is very large, and where the perfect classification of molecules or cells is not required (but rather an estimation of distributions of types of cells, for example), the method would thus be quite useful. Indeed it is directly applicable to any image classification problem, it is reasonably fast, it can run on regular computers, and it could easily take advantage of parallel architectures, if available.
In the case of particular applications that require better prediction results than the ones obtained with the default settings of our method, its enhancement or tailoring is conceivable. Integration of domain knowledge would be possible. For example, in the case of protein subcellular localizations, the combination of the image classification and the classification of the amino acid sequence of the protein with a similar approach might improve results. Domain knowledge could also be incorporated implicitly through the description of the subwindows with domain specific features, and the exploitation of more generic image classification features (e.g. Haralick texture descriptors, Sobel edge features, etc.) may also be useful. Generation of synthetic versions of the subwindows [27–30] might be another way to improve robustness (e.g. to illumination changes or noise) by providing the learning method a richer training set to generalize from.
Beyond misclassification error rates, the method could highlight discriminative subwindows in images, hence it could be used as an exploratory tool for further biological interpretation. Preliminary results were given on the retinal dataset. For a specific study, this function should be applied on larger sets of images and corroborated by domain experts to assess its practical usefulness.
We illustrated the potential of our generic image classification method on different kinds of problems in cell biology. Thanks to its computational efficiency and competitive accuracy results on average with respect to human classification and tailored methods, we foresee the use of this automatic approach as a baseline method and a first try on various biological image classification problems where a manual approach could be a source of bias and would cause a bottleneck for high-throughput experiments. Moreover, preliminary results show that minor parameter tuning could possibly improve the default results on specific problems. Extension of this approach to image sequence classification and segmentation also deserves to be studied.
We first describe the four image classification tasks and protocols used to evaluate our method. Our image classification method is explained afterwards.
The subcellular localization of proteins is an essential step for the understanding of their function. The use of computer vision techniques for the recognition of patterns of subcellular fluorescence  is promising if combined with high throughput imaging systems [1, 5, 6]. In order to illustrate the potential of our method in that domain, we collected images from the website of an ongoing project about the localization of novel GFP-tagged human cDNA products to subcellular compartments of the eukaryotic cell [32, 33].
As we collected the dataset ourselves, we had to define a protocol to assess the classification performance. We used a leave-one-out error estimation as the dataset is rather small. That is, one model is built using all the images except one and the model is used to predict the class of the remaining image. The process is repeated for all the images, and the total number of prediction errors is counted. The total misclassification error rate is reported as a percentage.
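The leave-one-out protocol described above can be sketched in a few lines of Python. This is only an illustrative sketch and not the code used in our experiments: the names `leave_one_out_error`, `fit_majority` and `predict_majority` are hypothetical, and the majority-class learner is a trivial stand-in for any classifier exposing a fit/predict pair.

```python
from collections import Counter


def leave_one_out_error(samples, labels, fit, predict):
    """Return the leave-one-out misclassification rate, in percent.

    samples and labels are lists of equal length; fit(train_x, train_y)
    returns a model and predict(model, sample) returns a class label.
    """
    errors = 0
    for i in range(len(samples)):
        # Build a model on every sample except the i-th one.
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        # Test the model on the single held-out sample.
        if predict(model, samples[i]) != labels[i]:
            errors += 1
    return 100.0 * errors / len(samples)


# Hypothetical stand-in learner: always predicts the most frequent training class.
def fit_majority(train_x, train_y):
    return Counter(train_y).most_common(1)[0][0]


def predict_majority(model, sample):
    return model
```

With four samples labelled a, a, a, b, the stand-in learner misclassifies only the held-out b sample, so the function returns 25.0.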
Given a set of training images labeled into a finite number of classes, the goal of an automatic image classification method is to build a model (training phase) that will be able to predict accurately the class of new, unseen images. The main characteristics of our method  are summarized as follows.
During the training phase, a large number (N_ls) of square subwindows of random sizes are extracted at random positions from each training image (see examples for LifeDB images in Figure 1). This random subwindow extraction provides a rich representation of images corresponding to various overlapping regions, both local and global, whatever the task and content of images. Each subwindow is then resized to a fixed size (16 × 16), to improve robustness to scale changes, and described by a high-dimensional feature vector of its raw pixel values (i.e. 256 numerical values in the case of greyscale images, 768 in color images) to avoid discarding potentially useful information while being generic. Each subwindow is then labeled with the class of its parent image.
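As an illustration, the extraction step for greyscale images could look like the following Python sketch. It is not the PiXiT implementation. The function names, the `min_frac`/`max_frac` size bounds and the per-image count `n_per_image` are hypothetical, and nearest-neighbour resampling is an assumption, since the text above does not state which interpolation is used for resizing.

```python
import random


def extract_subwindow(image, rng, out_size=16, min_frac=0.0, max_frac=1.0):
    """Extract one random square subwindow and describe it by raw pixel values.

    image is a list of rows of grey levels. The side of the square is drawn
    between min_frac and max_frac of the smaller image dimension.
    """
    height, width = len(image), len(image[0])
    max_side = min(height, width)
    lo = max(1, int(min_frac * max_side))
    hi = max(lo, int(max_frac * max_side))
    side = rng.randint(lo, hi)             # random size
    top = rng.randint(0, height - side)    # random position
    left = rng.randint(0, width - side)
    features = []
    for r in range(out_size):
        # Nearest-neighbour resampling to out_size x out_size (an assumption).
        src_r = top + (r * side) // out_size
        for c in range(out_size):
            src_c = left + (c * side) // out_size
            features.append(image[src_r][src_c])
    return features  # 256 values for a 16 x 16 greyscale subwindow


def extract_training_set(images, labels, n_per_image, seed=0):
    """Build the subwindow learning set; each subwindow inherits its image's class."""
    rng = random.Random(seed)
    X, y = [], []
    for image, label in zip(images, labels):
        for _ in range(n_per_image):
            X.append(extract_subwindow(image, rng))
            y.append(label)
    return X, y
```

Restricting `min_frac` and `max_frac` corresponds to constraining the range of subwindow sizes, as done for some of the tasks in the Results section.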
A subwindow classification model is then built by an ensemble of extremely randomized decision trees (Extra-Trees) algorithm. This machine learning method has been shown effective (in terms of accuracy and computational efficiency) in a large variety of high-dimensional problems such as proteomic mass spectra classification and DNA sequence classification. Starting with the whole learning set of subwindows at the top-node, the Extra-Trees algorithm builds an ensemble of T fully-developed decision trees according to the classical top-down decision tree induction procedure. The main difference between this algorithm and other tree methods is that while growing a tree, it splits nodes by choosing both attributes and cut-points at random. In the case of subwindow image classification, a binary test within a tree node simply compares the value of a pixel (intensity of a grey level or of a certain color component) at a fixed location within a subwindow to a cut-point value. In order to filter irrelevant attributes, the filtering parameter k corresponds to the number of attributes (i.e. pixel locations) chosen at random at each node, where k can take all possible values from 1 to the number of attributes describing the subwindows. For each of these k attributes, a pixel intensity value threshold is randomly chosen. The score of each binary test is then computed on the current subwindow subset according to an information criterion, and the best test among the k tests is chosen to split the current node. The procedure is repeated recursively on subwindow subsets until the tree is fully developed. T fully-developed trees are built according to this scheme and saved (learning images and subwindows are no longer required for prediction).
Classification of a new image similarly entails extraction and description of N_test subwindows from this image, and the application of the model to these subwindows. Aggregation of subwindow predictions is then performed to classify the image, by assigning to the image the majority class among the classes assigned to each subwindow by each one of the T trees.
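The node-splitting procedure described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not our Java implementation: cut-points are drawn uniformly between the minimum and maximum value of the attribute in the current subset, plain Shannon information gain stands in for the information criterion, and all function names are hypothetical. Class labels are assumed to be strings or integers, because tuples are used for internal nodes.

```python
import math
import random
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(labels, left, right):
    total = len(labels)
    return (entropy(labels)
            - (len(left) / total) * entropy(left)
            - (len(right) / total) * entropy(right))


def pick_split(X, y, k, rng):
    """Draw k random (attribute, cut-point) tests; return the best as (score, attr, cut)."""
    best = None
    for attr in rng.sample(range(len(X[0])), k):
        values = [row[attr] for row in X]
        lo, hi = min(values), max(values)
        if lo == hi:
            continue  # attribute is constant in this subset: no valid test
        cut = rng.uniform(lo, hi)  # random cut-point (assumed uniform)
        left = [label for row, label in zip(X, y) if row[attr] < cut]
        right = [label for row, label in zip(X, y) if row[attr] >= cut]
        if not left or not right:
            continue
        score = information_gain(y, left, right)
        if best is None or score > best[0]:
            best = (score, attr, cut)
    return best


def build_tree(X, y, k, rng):
    """Grow one fully developed tree; leaves are class labels, nodes are tuples."""
    if len(set(y)) == 1:
        return y[0]
    split = pick_split(X, y, k, rng)
    if split is None:  # no usable test among the k draws: majority-class leaf
        return Counter(y).most_common(1)[0][0]
    _, attr, cut = split
    left = [(row, label) for row, label in zip(X, y) if row[attr] < cut]
    right = [(row, label) for row, label in zip(X, y) if row[attr] >= cut]
    return (attr, cut,
            build_tree([r for r, _ in left], [l for _, l in left], k, rng),
            build_tree([r for r, _ in right], [l for _, l in right], k, rng))


def predict_tree(tree, row):
    while isinstance(tree, tuple):
        attr, cut, left, right = tree
        tree = left if row[attr] < cut else right
    return tree


def build_ensemble(X, y, n_trees, k, seed=0):
    rng = random.Random(seed)
    return [build_tree(X, y, k, rng) for _ in range(n_trees)]
```

Each row of `X` would be the raw-pixel feature vector of one subwindow, so a test `row[attr] < cut` compares one pixel value at a fixed location to a cut-point, as in the description above.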
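The aggregation step amounts to a simple vote count, which can be sketched as follows. The function name `classify_image` and the input layout (one list of tree votes per subwindow) are hypothetical choices made for this illustration.

```python
from collections import Counter


def classify_image(subwindow_votes):
    """Assign an image the majority class over all subwindows and all trees.

    subwindow_votes holds one list per subwindow, containing the class
    predicted for that subwindow by each of the T trees.
    """
    tally = Counter()
    for votes in subwindow_votes:
        tally.update(votes)
    predicted_class = tally.most_common(1)[0][0]
    return predicted_class, tally
```

For three subwindows and T = 3 trees with votes (a, a, b), (b, b, b) and (a, b, a), class a receives 4 votes and class b receives 5, so the image is assigned class b. The returned tally also gives the vote distribution, which could serve as a rough confidence indicator.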
The method provides an interesting way to help domain experts to focus on discriminative regions in the images. Indeed, individual subwindow votes are available when we predict the class of a new image. We can observe for each subwindow the distribution of votes for all classes assigned by the decision trees. The subwindows that receive the highest number of votes for a given class can then be considered as the most specific ones for that class, and their visualization on top of the image can bring potentially useful information about that class. Also, it is possible to generate a class-specific confidence map where each pixel corresponds to the sum of votes for that class received by every subwindow (correctly classified or only the most specific ones) the pixel belongs to. These functions are illustrated on the Retinal detachment images in the Results section.
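A class-specific confidence map of this kind could be accumulated as in the sketch below. The function name and the `(top, left, side, votes)` representation of a subwindow are assumptions made for illustration only; any filtering of subwindows (for example keeping only the most specific ones) would be done before calling it.

```python
def confidence_map(height, width, subwindows, target_class):
    """Sum, for each pixel, the votes for target_class of the subwindows covering it.

    subwindows is an iterable of (top, left, side, votes), where votes maps
    each class to the number of trees that voted for it on that subwindow.
    """
    heat = [[0] * width for _ in range(height)]
    for top, left, side, votes in subwindows:
        n = votes.get(target_class, 0)
        if n == 0:
            continue
        for r in range(top, top + side):
            for c in range(left, left + side):
                heat[r][c] += n
    return heat
```

Pixels covered by several subwindows that voted strongly for the class accumulate high values, which is what makes discriminative regions stand out when the map is displayed over the image.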
The important parameters of the method are the number of subwindows extracted during learning (N_ls) and prediction (N_test), the number of trees T, and the extra-trees filtering parameter k. As a first try, we generally use a few hundred thousand learning subwindows, a hundred or so subwindows per test image, and we build ten trees using the filtering parameter equal to the rounded square root of the number of attributes (the suggested default value). As a general rule, we observe that the more subwindows we extract and trees we build, the better the accuracy is. Higher values of the filtering parameter also generally improve accuracy results. The parameter values could be adjusted in order to comply with desired computational efficiency requirements, given that the complexity of the decision tree ensemble learning is on the order of k · T · N_ls · log N_ls and that the prediction step is essentially proportional to N_test · T · log N_ls. Note that the approach scales very well and, moreover, it is easy to parallelize.
The above image classification method was implemented as a user-friendly Java software called PiXiT. This software is freely available for research purposes. Screenshots of the software are shown in Figure 2. This software comes together with Annotor, a software developed by Vincent Botta which helps to annotate image databases. This second Java software allows users to annotate images through polygon labelling and to export individual annotations into directories of classes of images that can be imported into PiXiT to build classifiers.
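To make the default setting and the complexity estimates concrete, the small Python sketch below computes the default filtering parameter and the two cost terms. The function names are hypothetical, the base-2 logarithm is an arbitrary choice (the base only changes a constant factor), and the returned costs are unitless orders of magnitude, not running times.

```python
import math


def default_k(n_attributes):
    """Default filtering parameter: rounded square root of the number of attributes."""
    return round(math.sqrt(n_attributes))


def learning_cost(k, n_trees, n_ls):
    """Order of magnitude of ensemble learning: k * T * N_ls * log(N_ls)."""
    return k * n_trees * n_ls * math.log2(n_ls)


def prediction_cost(n_test, n_trees, n_ls):
    """Order of magnitude of classifying one image: N_test * T * log(N_ls)."""
    return n_test * n_trees * math.log2(n_ls)
```

For 16 × 16 subwindows this gives k = 16 for greyscale images (256 attributes) and k = 28 for color images (768 attributes). The estimates also show that doubling the number of trees T doubles both the learning and the prediction cost.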
Raphaël Marée is supported by the GIGA interdisciplinary cluster of Genoproteomics of the University of Liège with the help of the Walloon Region and the European Regional Development Fund. Pierre Geurts is a research associate of the FNRS, Belgium. Red blood cell database courtesy of Thomas Deselaers, RWTH Aachen University, Germany. The PiXiT software is maintained by PEPITe SA.
This article has been published as part of BMC Cell Biology Volume 8 Supplement 1, 2007: 2006 International Workshop on Multiscale Biological Imaging, Data Mining and Informatics. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2121/8?issue=S1
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.