Learn what data augmentation is, whether it improves accuracy, and how it can be leveraged with VisionERA.
Machine learning (ML), a subset of artificial intelligence (AI), has progressed rapidly in recent years. ML algorithms make it possible to build highly accurate applications and tools that predict outcomes without being explicitly programmed to do so. Two main types of modeling are typically performed in ML: Classification and Regression. In Classification problems, the goal is to assign an input to one of the available target classes. For example, the input could be an image for an image classifier or a piece of text for an NLP (Natural Language Processing) classifier.
For a classification model, accuracy is a metric that measures the model's performance: the number of correct predictions divided by the total number of predictions. It is simple to compute and understand, and is therefore commonly used for assessing classifier models. In general, the higher the accuracy, the better the classifier, so companies strive to build highly accurate models or to improve the accuracy of existing ones. The accuracy of a classifier can be improved through various data- and model-related strategies, and one such strategy is 'Data Augmentation.'
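As a minimal illustration of the accuracy metric described above, the sketch below divides the number of correct predictions by the total number of predictions. The function name `accuracy` and the sample labels are made up for illustration and do not come from any particular library or dataset.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(1 for truth, guess in zip(y_true, y_pred) if truth == guess)
    return correct / len(y_true)


# Hypothetical labels, invented for this example only.
y_true = ["rose", "tulip", "rose", "daisy", "tulip"]
y_pred = ["rose", "tulip", "daisy", "daisy", "tulip"]

# Four of the five predictions match the true labels.
print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
```

In practice, a library routine such as scikit-learn's `accuracy_score` would normally be used instead of a hand-written function.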
In this article, we will discuss how Data Augmentation can improve classification accuracy as well as its limitations.
An ML classifier is an algorithm that automatically assigns data to one or more of a set of "labels" or "classes." A classification model, in turn, is built with a selected ML classifier to perform a specific classification task, that is, to predict the category of an input. An example in NLP is an email classifier that labels and filters incoming emails as 'Spam' or 'Not Spam.' Classification models are used in many applications, such as Optical Character Recognition (OCR), Face or Speech Recognition, and Medical Imaging. Generally, classical ML models for classification tend to be lighter on computing, while Deep Learning (DL) models are loosely inspired by the human brain and perform well with less human intervention. DL models are therefore better suited to analyzing images, videos, and unstructured data, where classical ML models become difficult to train and use.
Deep learning models are built from neural networks and provide an advanced approach to machine learning for tackling challenging tasks. They are complex to set up and computationally demanding, so they require powerful hardware such as GPUs, which can run many operations in parallel. Although DL models take more time to implement, they can handle big data efficiently and, once trained, produce predictions quickly. Their quality tends to improve over time as more data becomes available.
Generally, DL models can process millions of data points at a remarkable pace and uncover hidden patterns and correlations that would likely go unnoticed by the human eye. With DL, we can build advanced applications in Speech Recognition, Image Analysis and Reconstruction, Object Detection, Comprehension, etc. Despite this, DL models are still evolving, and several limitations currently hinder their mass applicability. For example, these models are challenging to scale, and a classifier can easily misclassify inputs, especially images that show the object in different orientations or colors. Another limitation is the size of the training dataset. This is a common challenge with DL models, and it can be overcome by feeding the model additional data. For instance, a speech recognition model would require additional data in the form of long conversations or multiple dialects to obtain the desired results.
Similarly, to detect the presence of a disease, an image classifier would require a significantly larger number of images to identify the onset or progress of the disease. Large organizations either already have more data or can acquire new data for training. Still, this is sometimes not possible because of data privacy laws or simply because relevant data is unavailable.
DL models need to be fed more diverse data to improve their generalization. When such data is hard to obtain, Data Augmentation can be quite helpful in increasing the size of the dataset, although collecting more real data remains the preferred solution. This matters especially in computer vision, where models handle image processing tasks such as image classification, segmentation, and object detection, all of which require large volumes of data.
A DL model trained on a large number of diverse observations is more likely to be highly accurate. Data scientists therefore use Data Augmentation to increase the size of the existing dataset for training the DL model. Data Augmentation techniques are especially popular in computer vision applications but can be equally powerful for NLP.
In image classification, Data Augmentation applies image manipulation techniques such as cropping, rotating, and flipping to the existing images to increase the size of the dataset. Similarly, in text data augmentation, an expanded dataset can be obtained by replacing words or phrases with synonyms while taking care to preserve grammatical correctness.
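The transformations named above can be sketched in plain Python on a toy example. Here an "image" is assumed to be a small grid of pixel values stored as a list of rows, and the synonym table is a made-up stand-in for a real thesaurus. All function names are hypothetical, and real projects would normally rely on an image library and a proper lexical resource.

```python
import random


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Rotate the image a quarter turn clockwise."""
    return [list(column) for column in zip(*image[::-1])]


def crop(image, top, left, height, width):
    """Cut out a height x width window whose corner is at (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


# Toy synonym table, invented for illustration only.
SYNONYMS = {"quick": ["fast", "rapid"], "happy": ["glad", "cheerful"]}


def replace_synonyms(sentence, synonyms, rng):
    """Swap each known word for a randomly chosen synonym.

    Deliberately naive: it ignores case, punctuation, and grammar.
    """
    words = sentence.split()
    return " ".join(rng.choice(synonyms[w]) if w in synonyms else w for w in words)


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

# One original image yields several new training samples.
augmented_images = [
    flip_horizontal(image),
    rotate_90_clockwise(image),
    crop(image, 0, 1, 2, 2),
]

sentence = "the quick dog is happy"
augmented_sentence = replace_synonyms(sentence, SYNONYMS, random.Random(0))
print(augmented_images)
print(augmented_sentence)
```

The naive word swap shows why text augmentation needs care: a synonym that fits one context can break the grammar or change the meaning in another.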
It is widely known that inadequate training data results in poor approximation. An over-constrained model will likely underfit the limited training dataset, while an under-constrained model will likely overfit it, resulting in poor performance in both cases. It is also possible for a DL model to perform well during training on limited and similar data but, once tested on real-life data with a wide variety of samples, to yield poor accuracy and incorrect predictions. With Data Augmentation, a predictive model is exposed to more diverse data than before, which, in theory, should enable the DL model to learn better. Training the model on the additional data can increase accuracy (or any other metric) by a few percent, provided there is sufficient data to train on in the first place.
The following section demonstrates how to augment an image dataset for training an image classifier. We will first build a deep learning model with an available dataset and compute its accuracy. Next, we will build a similar deep learning model with augmented images and compute its accuracy. Finally, by comparing the accuracies of the two models, we can conclude whether Data Augmentation has improved the model.
In computer vision tasks, image transformations can be applied either before training, using selected techniques, or on the go, using data generators that produce batches of randomly transformed images (the augmented data) to feed the neural network during training. In NLP, Data Augmentation needs to be applied cautiously because of the grammatical structure of the text. Hence, an augmented text dataset is usually prepared beforehand and later fed into data loaders for training the model.
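The on-the-go approach can be sketched as a Python generator that transforms each image at the moment it is drawn into a batch, so the stored dataset never changes. This is an illustrative sketch only: the function `augmented_batches`, the 50% flip probabilities, and the toy list-of-rows images are assumptions, not the behavior of any specific framework.

```python
import random


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Mirror the image top to bottom."""
    return [list(row) for row in image[::-1]]


def augmented_batches(images, labels, batch_size, rng):
    """Yield (images, labels) batches forever, transforming images on the fly."""
    indices = list(range(len(images)))
    while True:
        rng.shuffle(indices)  # new sample order on every pass over the data
        for start in range(0, len(indices), batch_size):
            batch_indices = indices[start:start + batch_size]
            batch_images = []
            for i in batch_indices:
                # Work on a copy so the stored dataset is never modified.
                image = [list(row) for row in images[i]]
                if rng.random() < 0.5:
                    image = flip_horizontal(image)
                if rng.random() < 0.5:
                    image = flip_vertical(image)
                batch_images.append(image)
            yield batch_images, [labels[i] for i in batch_indices]


# Toy dataset: three 2x2 "images" with made-up labels.
images = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]]
labels = ["roses", "daisy", "tulips"]

batches = augmented_batches(images, labels, batch_size=2, rng=random.Random(0))
batch_images, batch_labels = next(batches)
print(batch_labels, batch_images)
```

Because the random choices are redrawn on every pass, the network sees a slightly different version of each image in every epoch, which is the main advantage over a fixed, pre-generated set of augmented images.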
Let us take an example of a DL model for image classification. To test the effect of Data Augmentation, we will use the flower image dataset, which is reasonably sized (3670 images spread across five classes: 'roses', 'daisy', 'dandelion', 'sunflowers', and 'tulips'). This is a multiclass image classification task.
The dataset is sufficient to train a decent image classifier. We will train the DL model first and later create augmented images using the TensorFlow framework.
With TensorFlow, it is possible to randomly apply different transformations such as horizontal/vertical flipping, rotation, zoom, width/height shifts, and shear to an image dataset. The augmented images can be created before model training begins (static transformations) or during training (dynamic, on-the-go transformations). In this article, we will use dynamic transformations.
Here is the sample code we will use for building the image classification model, with the model metric set to 'accuracy'.
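A model of this kind could be sketched as follows, assuming a small convolutional network built with the Keras API in TensorFlow. Only the five classes and the 'accuracy' metric come from the text. The architecture, the 180x180 input size, the optimizer, and the names `build_model`, `NUM_CLASSES`, and `IMAGE_SIZE` are assumptions made for illustration, so this sketch should not be expected to reproduce the exact accuracies reported in this article.

```python
NUM_CLASSES = 5          # roses, daisy, dandelion, sunflowers, tulips
IMAGE_SIZE = (180, 180)  # assumed input resolution, not stated in the article


def build_model(num_classes=NUM_CLASSES, image_size=IMAGE_SIZE):
    """Build and compile a small CNN classifier (assumed architecture)."""
    # Imported here so the settings above can be read without TensorFlow.
    import tensorflow as tf

    layers = tf.keras.layers
    model = tf.keras.Sequential([
        tf.keras.Input(shape=image_size + (3,)),  # RGB images
        layers.Rescaling(1.0 / 255),              # scale pixels to [0, 1]
        layers.Conv2D(16, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes),                # one raw score (logit) per class
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],                     # the metric named in the text
    )
    return model
```

Training would then be a call such as `build_model().fit(train_ds, validation_data=val_ds, epochs=10)`, where `train_ds`, `val_ds`, and the epoch count are hypothetical placeholders for the prepared training and validation splits.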
Upon training this model, we got 99.8% training accuracy and 67.3% validation accuracy.
Next, we will augment the dataset during the training process using the TensorFlow framework. For this demo, we will use three transformations: flipping, rotation, and zoom, as shown in the code.
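One way to apply these three transformations on the go is with Keras preprocessing layers, sketched below. This is an assumption about the approach rather than the article's exact code: the conclusion mentions a Keras image generator, which would be an alternative route to the same effect. The flip mode and the 0.1 rotation and zoom factors are illustrative values, and `build_augmentation` and `AUGMENTATION` are hypothetical names.

```python
# Illustrative settings; none of these values are given in the article.
AUGMENTATION = {
    "flip_mode": "horizontal",  # mirror images left to right
    "rotation_factor": 0.1,     # rotate by up to 10% of a full turn either way
    "zoom_factor": 0.1,         # zoom in or out by up to 10%
}


def build_augmentation(config=AUGMENTATION):
    """Return a Keras pipeline that randomly flips, rotates, and zooms images."""
    # Imported here so the settings above can be read without TensorFlow.
    import tensorflow as tf

    layers = tf.keras.layers
    return tf.keras.Sequential(
        [
            layers.RandomFlip(config["flip_mode"]),
            layers.RandomRotation(config["rotation_factor"]),
            layers.RandomZoom(config["zoom_factor"]),
        ],
        name="augmentation",
    )
```

Placed in front of the classifier's layers, such a pipeline transforms each batch randomly during training and passes images through unchanged at inference time, so the validation images are not altered.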
[Figure: a few of the transformed images]
Upon training the previous model with augmented images, we got 81% training accuracy and 71.2% validation accuracy, an increase of 3.9 percentage points over the previous validation accuracy of 67.3%.
The performance of most ML models, and DL models especially, depends on the training data: its quality, quantity, and relevance. However, a lack of relevant data is one of the most common challenges in implementing deep learning in organizations, as collecting such data can be costly and time-consuming. Companies can leverage Data Augmentation to reduce their reliance on training data collection and preparation and to build more accurate deep learning models faster.
In conclusion, there is a visible difference in the validation accuracy of the two models. Here is a summary of the training results:
Metric | Without Augmentation | With Augmentation
Training Accuracy | 99.8% | 81%
Validation Accuracy | 67.3% | 71.2%
It is essential to remember that although the training accuracy was higher for the model without augmentation, that model did not perform well on the validation data, i.e., it was overfitting. Such a model is more likely to fail on real data. With augmentation, although the training accuracy dropped, the model generalized better, with validation accuracy increasing by almost 4 percentage points. Data Augmentation techniques tend to be most effective when little data is available: a significant difference in validation accuracy can be observed when Data Augmentation is applied to a small dataset with few training samples.
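The comparison can be made concrete by computing, for each model, the gap between training and validation accuracy, a common rough indicator of overfitting. The short sketch below uses only the four accuracy figures reported in this article. The helper name `generalization_gap` is made up for illustration.

```python
# Accuracies (in percent) as reported in the summary of training results.
results = {
    "without augmentation": {"train": 99.8, "val": 67.3},
    "with augmentation": {"train": 81.0, "val": 71.2},
}


def generalization_gap(train_acc, val_acc):
    """Training accuracy minus validation accuracy, in percentage points."""
    return round(train_acc - val_acc, 1)


for name, acc in results.items():
    gap = generalization_gap(acc["train"], acc["val"])
    print(f"{name}: gap = {gap} percentage points")

# Change in validation accuracy brought by augmentation.
val_gain = round(
    results["with augmentation"]["val"] - results["without augmentation"]["val"], 1
)
print(f"validation gain = {val_gain} percentage points")
```

The gap shrinks from 32.5 to 9.8 percentage points while validation accuracy rises by 3.9 points, which is why the augmented model is the better choice despite its lower training accuracy.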
In this article, we explored Data Augmentation and how it can boost Deep Learning model accuracy. We also saw an example of Data Augmentation for multiclass image classification. Using Keras's ImageDataGenerator, we expanded the existing dataset and retrained the model with augmented images.
Data Augmentation is beneficial, especially when the available dataset is extremely small or lacks diversity in samples, and when acquiring additional data is expensive and/or time-consuming. On the other hand, it has its limitations. We cannot expect Data Augmentation to boost accuracy beyond a reasonable limit. The increase in model accuracy varies from case to case, as it depends on the type of dataset and the project goals. Hence, Data Augmentation can improve an existing model when used judiciously.
About us: VisionERA is an Intelligent Document Processing (IDP) platform that uses Data Augmentation for Image Classification to handle various types of documents. It can extract and validate data in bulk volumes with minimal intervention. The platform can also be tailored to the requirements of any industry and use case through its custom DIY workflow feature. It is a scalable and flexible platform providing end-to-end document automation for any organization.
Looking for a document processing solution that uses the enhanced capabilities of image classification using deep learning? Set up a demo today or simply send us a query through the Contact Us page!