Reducing image classification error and quantifying the contributions of components in a learning pipeline
Authors
Chowdhury, Aritra
Issue Date
2018-08
Type
Electronic thesis
Thesis
Language
ENG
Keywords
Computer science
Alternative Title
Abstract
In the second part of the thesis, we try to understand the error contributions from different components of image classification pipelines. In the first work of this part, we quantify the quality of individual image samples and of the entire dataset. This involves computing a score that quantifies the quality of an image sample, derived from the class probabilities produced by machine learning algorithms. The dataset is annotated by a pathologist, with samples labeled good or bad depending on whether they have a high or low amount of signal. Texture-based features are used to extract information from the images, and classification algorithms are used to classify the data. The score therefore provides a way to filter a dataset based on the quality of its image samples.
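As a rough illustration of a probability-based quality score, the sketch below averages the "good" class probability assigned to each image by several classifiers and then filters the dataset by a threshold. The averaging rule, the threshold of 0.5, and all names (`sample_quality`, `dataset_quality`, `filter_by_quality`, the image identifiers and probabilities) are assumptions made for illustration; the abstract does not specify how the score is actually computed.

```python
def sample_quality(prob_good_per_model):
    """Mean probability of the 'good' class across classifiers (assumed scoring rule)."""
    return sum(prob_good_per_model) / len(prob_good_per_model)


def dataset_quality(scores):
    """Mean of the per-sample quality scores (assumed aggregation)."""
    return sum(scores.values()) / len(scores)


def filter_by_quality(scores, threshold=0.5):
    """Keep the identifiers of samples whose score reaches the threshold."""
    return [name for name, score in scores.items() if score >= threshold]


# Hypothetical 'good' class probabilities from two classifiers per image.
probabilities = {
    "img_a": [0.9, 0.8],
    "img_b": [0.2, 0.4],
    "img_c": [0.6, 0.7],
}
scores = {name: sample_quality(p) for name, p in probabilities.items()}
kept = filter_by_quality(scores)  # identifiers of samples judged good enough
```

In practice the probabilities would come from classifiers trained on the texture features and the pathologist's labels; here they are hard-coded so that the example runs on its own.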
The final work proposes methods to quantify the contribution of errors in image classification pipelines. In particular, we propose an agnostic methodology to compute the contribution of errors from different components of the pipeline. We show that algorithm selection and hyper-parameter optimization algorithms may be used for error quantification. We empirically demonstrate that random search can efficiently and accurately quantify the contribution of errors from different components in image classification pipelines.
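One way to picture error quantification with random search is sketched below. Configurations are sampled at random, and the contribution of a stage is taken to be the gap between the average of the best errors reachable for each of its options and the best error found overall. This definition is an assumption chosen for illustration and is not necessarily the one used in the thesis; the configuration space, the option names, and the `toy_error` function with its numbers are likewise hypothetical.

```python
import random

# Hypothetical configuration space; option names are illustrative only.
SPACE = {
    "feature_extraction": ["cnn_a", "cnn_b", "texture"],
    "dimensionality_reduction": ["none", "pca"],
    "learning_algorithm": ["svm_linear", "random_forest"],
}

# Made-up per-option penalties standing in for a real validation error.
PENALTY = {
    "cnn_a": 0.05, "cnn_b": 0.08, "texture": 0.20,
    "none": 0.03, "pca": 0.02,
    "svm_linear": 0.01, "random_forest": 0.04,
}


def toy_error(config):
    """Placeholder for the validation error of a fully specified pipeline."""
    return sum(PENALTY[option] for option in config.values())


def random_search(space, evaluate, n_samples, seed=0):
    """Sample configurations uniformly at random and record their errors."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_samples):
        config = {stage: rng.choice(options) for stage, options in space.items()}
        trials.append((config, evaluate(config)))
    return trials


def stage_contributions(space, trials):
    """Assumed definition: mean best error per option of a stage, minus the global best."""
    global_best = min(error for _, error in trials)
    contributions = {}
    for stage, options in space.items():
        best_per_option = []
        for option in options:
            errors = [e for cfg, e in trials if cfg[stage] == option]
            if errors:  # an option may not have been sampled at all
                best_per_option.append(min(errors))
        mean_best = sum(best_per_option) / len(best_per_option)
        contributions[stage] = mean_best - global_best
    return contributions


trials = random_search(SPACE, toy_error, n_samples=200)
contributions = stage_contributions(SPACE, trials)
```

With this toy error, the feature extraction stage receives the largest contribution because its options differ the most. A real study would replace `toy_error` with cross-validated error on the image data.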
In the second work, we minimize the error in image classification pipelines. Here we show how the classification error can be reduced by considering the pipeline as a whole instead of optimizing individual components. The problem selected to demonstrate this is automatic microstructure recognition. The dataset comes from the domain of materials science and consists of images of microstructures at the micrometer scale. We perform two classification tasks: the first differentiates between dendritic and non-dendritic microstructures, and the second differentiates between longitudinal and transverse cross sections of microstructures. The objective in both tasks is to find the set of algorithms and hyper-parameters that minimizes the classification error. The approach is an exhaustive grid search of the configuration space of algorithms and corresponding hyper-parameters for three stages of the pipeline: feature extraction, dimensionality reduction and learning algorithms. This exhaustive grid search classifies both tasks with a sufficiently high degree of accuracy. We demonstrate that pre-trained convolutional neural networks combined with support vector machines with a linear kernel can be used for characterizing microstructures. We also show that a methodology based on exhaustive grid search can be used to find the best algorithms and hyper-parameters for microstructure recognition in materials science.
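The exhaustive grid search over the three pipeline stages can be sketched as follows. This is a minimal illustration rather than the thesis implementation: the option names in `SPACE`, the `grid_search` helper, and the `toy_error` function with its made-up numbers are all assumptions, and hyper-parameters are folded into the option names for brevity.

```python
from itertools import product

# Hypothetical configuration space; option names are illustrative only.
SPACE = {
    "feature_extraction": ["cnn_a", "cnn_b", "texture"],
    "dimensionality_reduction": ["none", "pca"],
    "learning_algorithm": ["svm_linear", "random_forest"],
}

# Made-up per-option penalties standing in for a real validation error.
PENALTY = {
    "cnn_a": 0.05, "cnn_b": 0.08, "texture": 0.20,
    "none": 0.03, "pca": 0.02,
    "svm_linear": 0.01, "random_forest": 0.04,
}


def toy_error(config):
    """Placeholder for the cross-validated error of a full pipeline."""
    return sum(PENALTY[option] for option in config.values())


def grid_search(space, evaluate):
    """Evaluate every combination of stage options; return (best_config, best_error)."""
    stages = list(space)
    best_config, best_error = None, float("inf")
    for combo in product(*(space[stage] for stage in stages)):
        config = dict(zip(stages, combo))
        error = evaluate(config)
        if error < best_error:
            best_config, best_error = config, error
    return best_config, best_error


best_config, best_error = grid_search(SPACE, toy_error)
```

Because every combination is evaluated jointly, the search optimizes the pipeline as a whole rather than one stage at a time. The cost grows with the product of the number of options per stage, which is 3 × 2 × 2 = 12 evaluations in this toy space.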
The first work involves an image classification problem in the domain of blood vessel characterization. Here we show how optimizing one component of the pipeline, namely data pre-processing, reduces the classification error. The objective is to differentiate between vascular morphologies occurring in 2D fluorescence microscopy samples of neuropathological tissue. The main source of error in this problem is imbalance in the training dataset. We address it by building artificial parametric 3D models of blood vessels to augment the training dataset and reduce the imbalance. Pre-trained convolutional neural networks are used as the feature extraction algorithm in this work. We show that combining the artificial and natural data increases classification accuracy and, as a result, reduces the generalization error.
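To make the imbalance argument concrete, the sketch below counts how many artificial samples each class would need in order to match the largest class. Balancing fully to the majority class is an assumption made for illustration, and the class names and counts are hypothetical; the abstract does not state how many artificial samples were generated.

```python
from collections import Counter


def synthetic_needed(labels):
    """Number of artificial samples per class needed to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


# Hypothetical class labels of the natural training images.
labels = ["morph_a"] * 5 + ["morph_b"] * 2 + ["morph_c"] * 3
needed = synthetic_needed(labels)
```

The resulting counts would tell a generator of artificial vessel models how many samples to produce for each under-represented morphology.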
Image classification is a pattern recognition approach in computer vision. The objective is to differentiate between classes of images based on the quantification of contextual information in the images, and it is performed using data analytic pipelines. These pipelines are organized as interdependent, individual components. The components include image acquisition, image preprocessing, feature extraction, feature preprocessing, dimensionality reduction and learning algorithms, among others. Steps may be added to or removed from the pipeline. The quality of an image classification pipeline is measured by the validation or test error on unseen image instances, which estimates the generalization error. Even though the error appears to result from applying the learning algorithm, it is actually an accumulation of errors introduced in different parts of the pipeline, from the acquisition of the image from the sample, through the feature preprocessing and feature extraction algorithms, to the learning algorithm that performs the classification. These errors propagate down the pipeline and accumulate in the classification error. In this dissertation, we attempt to holistically optimize, quantify and understand the contribution of error from the various components of image classification pipelines.
Description
August 2018
School of Science
Full Citation
Publisher
Rensselaer Polytechnic Institute, Troy, NY