Car Damage Detection Using Deep Learning & Computer Vision

24 July 2023

12 min read

Building a two-model computer vision solution for car damage detection tasks and comparing instance vs semantic segmentation algorithms.

Computer vision, as a field of AI, is becoming increasingly important in the insurance industry. It can help automate car damage detection processes and reduce costs while improving accuracy and enhancing customer experience.

Computer vision technology brings many opportunities, including the replacement of manual inspection to a certain extent. That’s why the Intelliarts team found it promising to start working on an automated car damage assessment project.

Let’s take an in-depth look at how to solve vehicle damage detection tasks with computer vision, based on the experience of the Intelliarts team of ML engineers: which algorithms can be used, how a model is trained and evaluated, and which popular algorithm may produce better outcomes in a computer vision project.

What algorithms can solve car damage detection tasks?

For this purpose, engineers use image segmentation algorithms. Their job is to assign a class to each pixel in the image based on visual characteristics such as color, texture, intensity, or shape. In the case of vehicle inspection, the class is either “with damage” or “without damage.” The goal of image segmentation is to turn an image into a more meaningful representation, separating objects from the background so that they are easy to analyze.

So what’s the difference between instance and semantic segmentation? The two major approaches to image segmentation are the following:

Instance segmentation

With this computer vision technique, each individual object is identified and labeled with a unique identifier. The first step of instance segmentation is object detection. In this phase, a computer vision algorithm attempts to detect all objects in the image and give each of them a bounding box, i.e., a rectangle surrounding the object. While classifying the areas inside the bounding boxes, the algorithm calculates the confidence, i.e., the likelihood that an object of a specific class, e.g., car, tree, or human, is inside the bounding box.

In the second step of the technique, the algorithm performs segmentation in each of the bounding boxes and labels each pixel, indicating whether it belongs to an object or not.

Another requirement of instance segmentation is the usage of pixel-wise masks. These are binary images that are used to identify the location of objects or regions of interest within an image. Each pixel in the mask is assigned a value of either 0 or 1, indicating whether that pixel belongs to the object or region of interest. Pixel-wise masks can be generated manually by annotating images.

The way this algorithm works ensures that multiple instances of the same object are differentiated from one another, even if they overlap or are partially obscured by other objects in the image.

Semantic segmentation

The semantic segmentation technique involves dividing an image into multiple segments, each of which corresponds to a particular object or region of interest within the image, and classifying them separately. Unlike traditional image segmentation methods that simply partition an image into arbitrary regions based on pixel similarity, semantic segmentation aims to associate each segment with a meaningful semantic label, such as a person, car, building, tree, etc.

Semantic segmentation treats multiple objects belonging to the same class as a single entity. It can indicate the boundaries of, for example, all people, all cars, or all buildings in the image, if necessary. It’s important to note that semantic segmentation only detects damage without distinguishing between individual damaged areas. In contrast, instance segmentation can distinguish multiple distinct damages from each other.

Once trained, the semantic segmentation model can be used to segment new images by propagating them through the network and generating a pixel-level segmentation mask. The latter works similarly to pixel-wise masks in instance segmentation yet assigns a label to each pixel in an image instead of generating multiple masks, one for each instance of an object.
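To make the two output formats concrete, here is a minimal NumPy sketch. It assumes a toy 4×6 image with two separate dents; the array values are illustrative and do not come from a real model.

```python
import numpy as np

# Hypothetical instance segmentation output: one binary mask per detected damage.
dent_a = np.zeros((4, 6), dtype=np.uint8)
dent_a[0:2, 0:2] = 1   # first dent, top-left corner
dent_b = np.zeros((4, 6), dtype=np.uint8)
dent_b[2:4, 4:6] = 1   # second dent, bottom-right corner
instance_masks = np.stack([dent_a, dent_b])   # shape: (instances, height, width)

# Semantic segmentation output: a single label map, 0 = no damage, 1 = damage.
# Merging the instance masks drops the information about which dent is which.
semantic_map = instance_masks.max(axis=0)

print(instance_masks.shape[0])   # number of separate damages
print(int(semantic_map.sum()))   # number of damaged pixels, all dents merged
```

The instance output keeps one mask per dent, while the semantic map can only say which pixels are damaged.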

A comparison of real-life instance and semantic segmentation algorithms, along with the insights obtained, is provided in the sections below.

The difference between semantic segmentation and instance segmentation algorithms

Types of car damage that can be detected with computer vision

During the AI car damage inspection process, algorithms use digital pictures of a vehicle being examined to detect the following types of damage:

Metal damage

Deterioration of the bumper, hood, doors, trunk (dickey), and other metal car parts is classified as metal damage:

Dents and deformations. Concavity on the metal surface caused by pressing the metal body inside is known as a dent. It’s often caused during car crashes when there is destructive pressure from external objects.
Scratch. When a hard or sharp object is moved against a metallic surface, it causes scratches. It’s the most prevalent type of car damage, and it may largely vary in its severity.
Tear. If a vehicle sustains extreme forces, the metal parts may be split into pieces. This damage is called a tear. It often affects the external surface of a car part as well as results in additional destruction to the inside of the car.

Metal damage is quite easy to detect because it is clearly visible. The tricky part is assessing the magnitude of the destruction. Yet, deep learning techniques in computer vision allow the system to solve even such complex tasks.

Glass damage

Deteriorations of the windshield, back glass, car windows, headlight, and taillight are classified as forms of glass damage:

Crack. This surface damage may be caused by physical forces, extreme temperatures, weather conditions, extreme pressure, or combinations of these factors. It typically looks like a net of circles and lines that indicate surface areas where the glass integrity was compromised.
Chip. When a small chunk of glass comes out of the surface, it’s called a chip. Professionals refer to it as a “stone break” or “pit.” This damage doesn’t penetrate the glass all of the way, and consequently, it doesn’t cause longer cracks.
Spider crack. If there was a heavy physical impact caused to glass by an object, like a rock, a spider crack is formed. It often looks like a small hole in the center, where the force was applied — an impact point. Multiple cracks are formed around this area, giving the glass surface the appearance of a spider web.

Usually, the severity of glass destruction can be both spotted and assessed quite accurately.

Other Damage

Damages that are not categorized as either metal or glass damage or are not forms of impact damage fall within this definition. Examples include dislocation or replacement of car parts, gaps between car parts, missing parts, cosmetic damage done, for example, to car paint, and more. The capabilities of computer vision algorithms to spot non-destructive damage are limited, but they still may have something to offer.

Computer vision-assisted vs manual car damage assessment

While manual vehicle inspection is prevalent, computer vision-assisted inspection is gradually becoming complementary to it. In some cases, it even completely replaces assessment by a human worker. Let’s have a look at the most significant pros and cons of both types of car damage detection.

Main pros and cons of computer vision-powered car damage detection

Vehicle damage assessment with AI has the following strengths to offer:

Workforce utilization and safety. With AI-driven technologies, the involvement of labor in the vehicle inspection cycle can be minimized. Not to mention that the entire claim processing can be done remotely. This results in better worker safety, as inspectors are not exposed to hazards such as electricity, moving parts, slips and falls, and chemicals.
Cost savings. Automating simple insurance claim processing can significantly lower operating costs. After all, everyday cases like scratches or glass cracks are easy to handle for the AI system, yet time-consuming for human inspectors. Using AI for vehicle inspection can lead to considerable savings for businesses.
High efficiency. Typically, it takes a computer vision system only a few seconds to process a single image. It makes working with batches of images incredibly time-efficient, allowing for more inspections to be completed in less time.

Here are two limitations of computer vision systems for car inspection:

Accuracy depends on the image quality. Low image sharpness or inappropriate lighting may considerably degrade image processing. Besides, there is still a risk of vehicle owners intentionally submitting fake images or taking photos in a way that distorts the apparent severity of the damage.
Some types of damage cannot be detected. For damage that is hidden or that doesn’t fall into the glass or metal categories, image processing may not be enough for an accurate estimation. AI-assisted vehicle inspection is also at a higher risk of fraud, but engineers may mitigate the issue by applying additional fraud-detection algorithms.

AI damage inspections have great usability. Yet, it’s important to understand that the scope of its usage has restrictions.

Main pros and cons of manual car damage detection

The strengths manual vehicle damage assessment is appreciated for are the following:

High reliability of damage assessment in specific cases. Trained inspectors can quite accurately estimate the severity of deterioration and the potential repair costs. They also can spot hidden damages, making human-powered assessment better for cases of inspection of flooded or extensively used vehicles.
Widely used approach. Since manual vehicle inspection is recognized as an industry-standard approach, finding specialists and establishing the optimal workflow should not present a difficulty.

Here are two limitations of the conventional approach to car inspection:

High rate of labor resource usage. In the conventional inspection cycle, manual labor is involved in nearly every step. This increases the probability of human error and brings difficulties related to human resource management.
Time-consuming procedures. Both document approval and manual car damage inspection take considerable time. As a result, completing a single claim may take days or even weeks.

Let’s summarize the key points of both car damage detection techniques and compare them against each other.

Computer vision powered car damage detection vs manual car damage detection

It’s difficult to claim that the CV-assisted method of car damage detection will completely replace the manual one in a short while. Yet, its capabilities are enough to take on large batches of assessment cases. It’s also interesting to know how machine learning optimizes insurance claims processing, which is one of the 5 machine learning applications.

Machine learning metrics for the evaluation of a trained model’s performance

Machine learning metrics are quantitative measures of how well a model is solving a given task. They help in assessing the model’s performance and provide insights into the model’s strengths and weaknesses.

Values of machine learning metrics are calculated based on outcomes that the model shows after its testing on a previously unused dataset. This way, engineers can assess the potential performance of the model on real-life data. Obtained results guide further decision-making, as engineers may need to rework the model multiple times before its performance is proved satisfactory.

Here is the list of main image segmentation metrics to consider using while testing a model for solving damage detection tasks:

1. MIoU (mean intersection over union)

This metric measures the average overlap between the predicted and ground truth segmentation masks for each class in the dataset. MIoU is calculated by computing the IoU for each class and then taking the mean across all classes. IoU = intersection of the predicted and ground truth masks for a class / union of the predicted and ground truth masks for a class.
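The formula above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming label maps where each pixel holds a class id (0 = no damage, 1 = damage); the function names and the convention that two empty masks score 1.0 are assumptions made for this example.

```python
import numpy as np

def iou(pred_mask, true_mask):
    """Intersection over union of two boolean masks of the same shape."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    true_mask = np.asarray(true_mask, dtype=bool)
    union = np.logical_or(pred_mask, true_mask).sum()
    if union == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return float(np.logical_and(pred_mask, true_mask).sum() / union)

def mean_iou(pred_labels, true_labels, num_classes):
    """Average the per-class IoU over all class ids in [0, num_classes)."""
    pred_labels = np.asarray(pred_labels)
    true_labels = np.asarray(true_labels)
    scores = [iou(pred_labels == c, true_labels == c) for c in range(num_classes)]
    return float(np.mean(scores))

truth = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1]])
pred = np.array([[0, 0, 0, 1],
                 [0, 0, 1, 1]])

print(iou(pred == 1, truth == 1))            # damage class: 3 shared pixels / 4 in the union
print(mean_iou(pred, truth, num_classes=2))  # mean of the background and damage IoU
```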

2. Pixel accuracy

This metric measures the percentage of pixels in an image that are correctly classified by the model. Pixel accuracy = the number of correctly classified pixels / the total number of pixels in the image.
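As a minimal sketch, pixel accuracy takes only a couple of lines of NumPy. The toy example below is an assumption chosen to show a known caveat of this metric: when damage covers only a small part of the image, a prediction that finds no damage at all can still score very high.

```python
import numpy as np

def pixel_accuracy(pred_labels, true_labels):
    """Share of pixels whose predicted class id equals the true class id."""
    pred_labels = np.asarray(pred_labels)
    true_labels = np.asarray(true_labels)
    return float((pred_labels == true_labels).mean())

# Toy 10x10 image with a single damaged pixel.
truth = np.zeros((10, 10), dtype=np.uint8)
truth[0, 0] = 1

# A prediction that reports "no damage" everywhere.
pred = np.zeros((10, 10), dtype=np.uint8)

accuracy = pixel_accuracy(pred, truth)
damage_found = int(np.logical_and(pred == 1, truth == 1).sum())
print(accuracy)       # 99 of 100 pixels are correct
print(damage_found)   # yet none of the damaged pixels was found
```

This is one reason to report overlap-based metrics alongside pixel accuracy.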

3. Dice coefficient

In image segmentation, the Dice coefficient measures the overlap between the pixels of the predicted and true segmentation masks on a scale of 0 to 1, where 0 means no overlap and 1 indicates a perfect match. Dice coefficient = 2 * the number of pixels shared by the predicted and true masks / (the number of pixels in the predicted mask + the number of pixels in the true mask).
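A minimal NumPy sketch of the Dice coefficient for a single class is shown below. The function name and the convention that two empty masks score 1.0 are assumptions for this example.

```python
import numpy as np

def dice_coefficient(pred_mask, true_mask):
    """Dice = 2 * |pred AND true| / (|pred| + |true|) for boolean masks."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    true_mask = np.asarray(true_mask, dtype=bool)
    total = pred_mask.sum() + true_mask.sum()
    if total == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    shared = np.logical_and(pred_mask, true_mask).sum()
    return float(2 * shared / total)

truth = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=bool)
pred = np.array([[0, 0, 0, 1],
                 [0, 0, 1, 1]], dtype=bool)

score = dice_coefficient(pred, truth)
print(score)   # 2 * 3 shared pixels / (3 predicted + 4 true pixels)
```

For the same pair of masks, Dice and IoU are linked by Dice = 2 * IoU / (1 + IoU), so the two metrics always rank a pair of predictions the same way on a single image.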

In our research on the performance of instance segmentation and semantic segmentation models detailed below, the Intelliarts team used the MIoU and Dice coefficient metrics to measure the test results.

You may also find it helpful to learn about applications of Machine Learning in the insurance sector.

Examples of real-life AI architectures

Selecting the right AI architecture is a crucial step in any ML project. A wisely chosen solution can benefit the outcomes by offering better accuracy of the segmentation process, higher speed of processing, and high efficiency of resource usage. Besides, it may happen that some architectures are better suited for real-time or near-real-time applications, while others may be more suitable for batch processing of large datasets, which also should be carefully considered.

Some image segmentation architectures have been extensively tested and are quite popular. Among them are Mask R-CNN and U-net, the two algorithms the Intelliarts team used to test instance segmentation and semantic segmentation techniques for solving car damage detection tasks. They allow engineers to fine-tune ready-made weights, i.e., models that are pre-trained on a large dataset. This way, training a model from the ground up, which is a resource-intensive task, is unnecessary.

Let’s have an insight into what these algorithms are:

Mask R-CNN

Mask R-CNN (Region-based Convolutional Neural Network with masks) is a deep learning architecture for object detection and instance segmentation. It’s built upon the Faster R-CNN object detection model and adds a segmentation branch, i.e., a subset of layers that predicts a mask for each detected object.

Mask R-CNN works in two stages. In the first stage, it generates region proposals using a Region Proposal Network (RPN), which suggests regions of the image that are likely to contain objects. In the second stage, it performs object detection and segmentation by simultaneously predicting class labels, bounding boxes, and masks for each proposal.
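The second stage yields a class label, a bounding box, a confidence score, and a soft mask for every proposal, so a typical post-processing step keeps only confident detections and binarizes their masks. The sketch below is a hedged illustration: the dictionary keys and the (N, 1, H, W) mask layout are assumptions modeled on the output format of torchvision’s Mask R-CNN implementation, and the numbers are made up.

```python
import numpy as np

def filter_detections(detections, score_threshold=0.5, mask_threshold=0.5):
    """Keep confident detections and turn their soft masks into binary masks."""
    keep = detections["scores"] >= score_threshold
    return {
        "boxes": detections["boxes"][keep],
        "labels": detections["labels"][keep],
        "scores": detections["scores"][keep],
        "masks": detections["masks"][keep] >= mask_threshold,
    }

# Made-up output for two proposals on a tiny 2x2 image.
detections = {
    "boxes": np.array([[10, 20, 110, 80], [5, 5, 15, 15]], dtype=float),  # x1, y1, x2, y2
    "labels": np.array([1, 1]),         # hypothetical class id 1 = damage
    "scores": np.array([0.9, 0.3]),     # confidence per proposal
    "masks": np.array([
        [[[0.8, 0.2], [0.7, 0.1]]],     # per-pixel mask probabilities, proposal 1
        [[[0.6, 0.6], [0.6, 0.6]]],     # per-pixel mask probabilities, proposal 2
    ]),
}

kept = filter_detections(detections)
print(kept["scores"])        # only the confident proposal remains
print(kept["masks"][0, 0])   # its binary mask
```

Both thresholds are illustrative defaults; in practice they are tuned on validation data.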

U-Net

U-Net is a convolutional neural network architecture designed for image segmentation tasks. It’s also incredibly popular for solving medical image segmentation tasks such as brain tumor segmentation, cell segmentation, and lung segmentation. It has also been adapted for other image segmentation applications, such as road segmentation in autonomous driving.

The U-Net architecture has a distinctive “U” shape, which is formed by the downsampling and upsampling operations. The network has a contracting path, which captures context and downsamples the input image, and an expansive path, which enables precise localization and upsamples the feature maps. In essence, the network can retrieve detailed information about the objects being segmented while also capturing the context and global structure of the image.
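The “U” shape is easier to see by tracing tensor shapes through the network. The sketch below is plain Python, not a trainable model. It assumes size-preserving (padded) convolutions, 2×2 pooling and upsampling, and channel counts that double at each level; real implementations vary, and the function name is hypothetical.

```python
def unet_shapes(height, width, depth=4, base_channels=64):
    """Trace (stage, channels, height, width) through a U-Net-style network."""
    if height % 2 ** depth or width % 2 ** depth:
        raise ValueError("height and width must be divisible by 2 ** depth")
    shapes = []
    channels, h, w = base_channels, height, width
    # Contracting path: each level halves the resolution and doubles the channels.
    for _ in range(depth):
        shapes.append(("down", channels, h, w))
        channels, h, w = channels * 2, h // 2, w // 2
    shapes.append(("bottleneck", channels, h, w))
    # Expansive path: each level doubles the resolution and halves the channels.
    # In a real U-Net, a skip connection also concatenates the feature map of
    # the "down" level that has the same resolution.
    for _ in range(depth):
        channels, h, w = channels // 2, h * 2, w * 2
        shapes.append(("up", channels, h, w))
    return shapes

for stage in unet_shapes(256, 256):
    print(stage)
```

The printed shapes shrink down to the bottleneck and then grow back to the input resolution, which is what allows the network to output one prediction per input pixel.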

Instance segmentation (Mask R-CNN) vs semantic segmentation (U-net) based on real-life observation

In our recent research, the Intelliarts team tested two popular neural network architectures used for image segmentation tasks — Mask R-CNN and U-net. Both computer vision algorithms were trained and then tested using the same datasets. We used precleaned and prepacked data from the publicly available Coco car damage detection and the Segme image datasets.

Despite the fact that Mask R-CNN has a more complex architecture and processes the region proposals rather than the entire image, testing revealed that actually, U-net, as a semantic segmentation-based algorithm, performs better.

U-net showed optimal outcomes in the first part of the test, when the AI algorithms were used to identify car damage and evaluate its magnitude. In the second part, when Intelliarts engineers had the computer vision models identify and recognize damaged car parts, U-net performed better as well. This brought our team to the conclusion that the semantic segmentation model, particularly the tested U-net, is currently a better choice when it comes to vehicle damage inspection.

You can try an online demo that presents an interactive playground for the trained AI model. The demo shows the capabilities of a computer vision-enabled model to detect car damage based on an input image or video frame.

Benefits of car damage detection with computer vision

Let’s review some business benefits of incorporating computer vision for car damage assessment:

Higher customer satisfaction. One of the most frequent client complaints is that their insurer takes far too long to process insurance claims. AI systems can partially relieve inspectors from handling loads of claims. This would increase service quality and, consequently, customer satisfaction.
Optimized expenses. Setting up a computer vision system for vehicle assessment may seem to be a costly investment. Yet, it’s an incredibly cost-effective intervention in the long run, as processing minor cases with computer vision takes fewer resources compared to manual inspection.
Lower rate of manual labor involvement. Labor-intensive processes may be challenging to set up and maintain in the optimal state. Hiring and training staff, human error, and labor safety are only a few of the factors that should be considered carefully.
Enhanced digital transformation. Computer vision-enabled car inspection is one of the ways in which companies in the insurance sector may get a technological advantage over competitors. So, this option should not be overlooked.
Improved brand image. Technological advancements and a wide choice of options are what many customers appreciate. Merely knowing that the insurer offers an option such as online vehicle damage claim processing already makes clients trust their provider more.

It’s easy to notice that the usage of computer vision technology in insurance promises plenty of business opportunities.

Car damage assessment with computer vision

First of all, let’s find out the difference between car damage assessment based on computer vision and car damage detection.

Car damage detection focuses on recognizing the presence of damage itself and identifying damaged areas. At the same time, car damage assessment using computer vision involves evaluating the extent and cost of damage to a vehicle, based on identified damage, with the prospect of making repair decisions.

Basically, car damage detection is a core part of car damage assessment.

Handling car damage assessment claims often involves a large load of manual document processing and property damage assessment. No wonder insurance companies may experience an increased load of vehicle damage claims, especially during peak seasons. That’s exactly when AI-driven automated car damage inspection comes to the rescue.

As detailed previously, to conduct damage detection, computer vision algorithms process post-accident vehicle imagery. Smart systems can recognize a car, validating whether it’s the same one stated in the documents, identify damaged car elements, and estimate preliminary damage and repair costs.

While the conventional car damage assessment workflow is complicated and time-consuming, the computer vision-enabled one is narrowed down to only the following four steps:

Building a well-performing system for car damage detection with computer vision may be complex. Don’t hesitate to reach out to our team of well-seasoned AI and ML engineers should you need assistance.

How to train an AI model

Deep neural networks are widely used to solve computer vision and other tasks. Many state-of-the-art solutions built on this technology are successfully used in insurance and other niches. Training a deep learning model encompasses the following steps:

Data preparation

AI algorithms require considerable amounts of digital data contained in photos and videos to train on. There is a strong correlation between the volume and quality of data and the training outcomes. There is even a concept known as garbage in, garbage out (GIGO), which means that nonsense input data produces nonsense output. So it’s recommended to find or prepare extensive datasets. In the case of vehicle assessment, there is a need for high-quality samples of damaged vehicles with varying deteriorations, from diverse angles, with different lighting, etc.

Data annotation, i.e., the categorization and labeling of data for AI applications, tells the model what exactly the visual material shows and why it is important. To train the AI model for vehicle assessment, engineers need datasets whose images are annotated in the following two ways:

Damage detection. When a diverse set of different damage types is collected, it’s necessary to label and tag objects of interest with relevant metadata using a polygon or a brush. Adding bounding boxes is not required as the model generates them itself. Not to mention that unnecessary bounding boxes may create overlaps and confuse the model being trained.
Car part detection. Such data annotation is a necessary parallel stage to damage detection. Tagging will help the model recognize particular vehicle parts that were exposed to damage and even calculate their area in preferred units of measurement.
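Polygon annotations like those described above are usually converted into pixel-wise masks before training. The sketch below shows one simple way to do it, using an even-odd point-in-polygon test on pixel centers. The function name, the toy polygon, and the scale of 0.25 cm² per pixel are assumptions for illustration; annotation tools and imaging libraries ship faster rasterizers.

```python
import numpy as np

def polygon_to_mask(polygon, height, width):
    """Rasterize a polygon [(x, y), ...] into a binary mask (even-odd rule)."""
    mask = np.zeros((height, width), dtype=np.uint8)
    n = len(polygon)
    for row in range(height):
        for col in range(width):
            px, py = col + 0.5, row + 0.5   # test the center of each pixel
            inside = False
            for i in range(n):
                x1, y1 = polygon[i]
                x2, y2 = polygon[(i + 1) % n]
                # Does this edge cross the horizontal line through the pixel center?
                if (y1 > py) != (y2 > py):
                    x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
                    if px < x_cross:
                        inside = not inside
            mask[row, col] = inside
    return mask

# Hypothetical annotation: a rectangular dent outlined by four polygon vertices.
dent_polygon = [(1, 1), (4, 1), (4, 3), (1, 3)]
mask = polygon_to_mask(dent_polygon, height=5, width=6)
print(mask)

# With a known image scale, the pixel count converts to a physical area.
cm2_per_pixel = 0.25   # assumed scale, depends on camera distance and resolution
area_cm2 = float(mask.sum() * cm2_per_pixel)
print(area_cm2)
```

The same conversion applies to car part annotations, which is how the area of a damaged part can be expressed in the preferred units of measurement.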

Image labeling is one of the most time- and resource-intensive aspects of training an AI model for computer vision applications. Selecting objects on a pixel-by-pixel basis and assigning correct labels that describe the objects and their various attributes or features is time-consuming work. The default method is manual labeling, in which humans annotate images by hand. Yet, the assistance of machine learning algorithms may somewhat automate and simplify the task.

In our case, we used already correctly labeled images from the open-source datasets mentioned above. This approach is known as using precleaned and prepackaged data. Other methods of data collection include custom data crowdsourcing, building a private collection, and collecting data automatically through web scraping and web crawling.

There are some data collection best practices to keep in mind. They include determining the problem and the goal of the ML project, establishing data pipelines, establishing storage mechanisms, evaluating the collected data, and collecting concise data aligned with the project’s goals.

Discover more about deep learning and object detection from this case study by Intelliarts AI.

Training

Once data collection and annotation are complete, the prepared set of training data is fed into the computer vision model. At this stage, it’s crucial to identify the errors the model makes so that the needed adjustments can be made later. The aim is to avoid a poor bias/variance trade-off, which results in overfitting or underfitting problems.

The underfitting problem occurs when a model cannot capture the relationship between the input and output variables accurately. It can be solved by making the model more complex.

The overfitting problem is a scenario in which the model fits the training data so closely that it fails to generalize. As a result, it performs poorly when new data differs considerably from the training data. This problem can be solved by simplifying the model, expanding the training dataset, or using data augmentation.

After training on the initial dataset, the model moves to the validation phase. In this phase, the AI algorithm works with the validation dataset, which enables engineers to verify their assumptions about the performance of the model. Any shortcomings, overlooked variables, and other errors should be revealed at this stage.
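Comparing the training loss with the validation loss is a common way to spot the two problems described above. The sketch below is a rough heuristic, not a rule: the function name and both thresholds are assumptions for illustration and would need tuning for a real loss function and dataset.

```python
def diagnose_fit(train_losses, val_losses, high_loss=0.5, max_gap=0.2):
    """Classify a training run from its final-epoch losses (rough heuristic)."""
    train, val = train_losses[-1], val_losses[-1]
    if train > high_loss:
        # The model cannot even fit the data it was trained on.
        return "underfitting"
    if val - train > max_gap:
        # The model fits the training data but does not generalize.
        return "overfitting"
    return "ok"

# Made-up loss curves, one value per epoch.
print(diagnose_fit([0.9, 0.8, 0.7], [0.95, 0.85, 0.75]))   # both losses stay high
print(diagnose_fit([0.6, 0.2, 0.05], [0.6, 0.4, 0.5]))     # validation loss lags far behind
print(diagnose_fit([0.6, 0.3, 0.2], [0.65, 0.35, 0.25]))   # losses are low and close
```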

Testing

After training and validation are finished successfully, the computer vision model should be tested one last time. Normally, the final or holdout set is composed of data that the model hasn’t worked with yet. The data is labeled so engineers can calculate the model’s accuracy. The model is run on this dataset only once, and the outcomes are regarded as the accuracy the model can be expected to show on real-world data. It’s important to find out whether the trained model is capable of producing accurate results with acceptable consistency.
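A holdout set only works if it is separated from the data before training starts. Below is a minimal sketch of a three-way split using only the Python standard library; the 70/15/15 proportions, the seed, and the function name are assumptions for illustration.

```python
import random

def split_dataset(items, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Shuffle items once and split them into train, validation, and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)   # fixed seed keeps the split reproducible
    n_test = round(len(items) * test_fraction)
    n_val = round(len(items) * val_fraction)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test

# Hypothetical dataset of 100 image ids.
train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

Because every item lands in exactly one of the three lists, the test images are never seen during training or validation.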

Usually, the entire training process is repeated from step one several times, as developers may need to prepare another dataset or modify the model. After multiple tries, the model that shows optimal results is chosen and is considered ready to go live.

Should you require assistance with training an AI model, don’t hesitate to reach out to our experienced team of AI and ML engineers.

Benefits and challenges of a two-model approach

As with any other technology, there are some pros and cons of a two-model approach to be aware of. Benefits you should expect here include:

Flexibility in design. Each model can be optimized for specific tasks or types of damage.
Improved generalization. By leveraging the strengths of two models, it may reduce the weaknesses inherent in relying on a single model.
Layered detection. One model can focus on broader damage detection, while the other can focus on finer, detailed damage assessment.
Continuous learning. While one model is updated or retrained, the other can still be active, ensuring uninterrupted service.

At the same time, limitations of a two-model approach you should consider include:

Increased resource usage. Two models might require more computational power, memory, and storage.
Maintenance overhead. Two models need to be updated, trained, and fine-tuned instead of one.
Synchronization issues. Both models must receive timely and consistent data updates.
Latency. Processing through two models might introduce additional latency, especially if they need to be run sequentially.

Choosing a one-model, two-model, or multi-model approach to developing an AI-based solution is up to you. In the scenario detailed here, the Intelliarts team stuck to a two-model approach, which we describe in the section below.

The two-model AI solution by Intelliarts

Intelliarts engineers built a software solution composed of two AI models, one of which is used for AI car damage detection and the other for car part detection. So, when a user inputs an image of a damaged car into the resulting solution, it indicates the damage and identifies the affected car part separately. This way, the outcomes of image processing are identified damages and specified car parts. If they intersect, the solution outputs a result such as “left door: dent.” The outcomes are then compared against similar cases in a prepared image database with repair cost estimations.
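The intersection step can be sketched with boolean masks. This is a hedged illustration rather than the Intelliarts implementation: the function name, the dictionary layout, the output format, and the toy masks are all assumptions.

```python
import numpy as np

def describe_damage(part_masks, damage_masks, min_overlap=1):
    """Pair each damage with every car part whose mask it overlaps."""
    findings = []
    for part, part_mask in part_masks.items():
        for damage, damage_mask in damage_masks.items():
            overlap = int(np.logical_and(part_mask, damage_mask).sum())
            if overlap >= min_overlap:
                findings.append(f"{part}: {damage}")
    return findings

def empty():
    return np.zeros((4, 6), dtype=bool)

# Hypothetical output of the car part model on a 4x6 image.
left_door, hood = empty(), empty()
left_door[:, 0:3] = True
hood[:, 3:6] = True

# Hypothetical output of the damage model on the same image.
dent, scratch = empty(), empty()
dent[1:3, 1:2] = True
scratch[0:1, 4:6] = True

findings = describe_damage(
    {"left door": left_door, "hood": hood},
    {"dent": dent, "scratch": scratch},
)
print(findings)
```

The `min_overlap` parameter is a hypothetical knob for ignoring overlaps of only a few pixels, which can appear along the border between two parts.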

The value of the AI solution for car damage detection

The finished software solution is capable of recognizing particular car parts, detecting and categorizing multiple types of damage, such as metal or glass damage and dislocation or replacement of car parts, evaluating the severity of the damage, and indicating the estimated repair costs. The functionality of a trained AI model should be enough to solve most of the simple car damage insurance claims with little human supervision.

Also read: Automated Claims Processing with Machine Learning

Needless to say, choosing a particular algorithm or a combination of algorithms, training AI models, and then building a finished software solution should all be tightly linked to the specialized tasks the solution must perform.

For product owners, the end goal of computer vision projects is to apply technology in the insurance industry and get business benefits that include the following:

Cost optimization. Automated detection can be done at a much lower cost compared with manual inspection, resulting in significant savings for insurance companies.
Reduction of labor-intensive tasks. Automating the lion’s share of car inspection cases can reduce the workload for insurance company staff and allow them to focus on other critical tasks.
Improved accuracy. Automated systems can analyze images of the car with greater precision, identifying even small damages that may have been overlooked by human inspectors.
Faster claim processing. Automated damage detection can speed up the claim-processing time, enabling insurance companies to settle claims faster. This may result in improved customer satisfaction and retention rates.

Partnering with a trusted ML service provider is half the success when it comes to implementing a complex computer vision project.

Final thoughts

Vehicle inspection is one of the resource-intensive tasks in the insurance sector. To automate claim processing when car damage detection is involved, businesses may use computer vision techniques. Image segmentation algorithms can perform such tasks. Yet, it’s necessary to carefully choose an optimal neural network to build a model on, which may be Mask R-CNN, U-net, or any other, to train the model properly on annotated data, and then to estimate its performance.

Solving computer vision tasks requires strong expertise and substantial experience in the AI and ML field. Here at Intelliarts AI, we have already assisted a number of our clients with their projects by delivering well-performing AI systems. Should you need assistance with developments in computer vision or other areas of machine learning, feel free to reach out to us.

FAQ

1. What types of data are used to train the computer vision models?

You can use labeled images, video frames, augmented data, 3D scans, and infrared imagery.

2. In which industries and applications can a two-model car damage detection solution be used?

Examples of domains include auto insurance, car rentals, auto repair, vehicle inspections, fleet management, and used car sales.

3. How can I explore or implement a two-model computer vision solution for car damage detection in my project or business?

After trying a demo, you can gather diverse datasets for fine-tuning, select complementary architectures, integrate the models into your workflow, and continuously evaluate performance to arrive at the best two-model solution for your case.

By Volodymyr Mudryi

DS/ML Engineer
