Predictive Maintenance Data Analytics [Case Study]

21 July 2021
Find out why predictive maintenance data analytics is so important in building a Machine Learning-powered solution for EV charging stations.

In this article, we describe a recent project we carried out for a customer in the automotive industry. We saw an opportunity to implement a machine learning-powered solution for predictive maintenance and anomaly detection for a global EV charging company. The customer agreed to give the data science project a chance in order to improve the maintenance of EV charging stations and reduce the risk of downtime.

EV charging stations

Our partner offers modern electric vehicle charging solutions with premium 24/7 customer service, including turnkey certified charging stations, a software platform with a wide variety of management features for businesses and organizations, and a trusted EV driver mobile application. Since its inception, the company has gained the trust of its clients, which include proven market leaders in the automotive industry.

Our customer offers a 3-step process to design and deploy the perfect EV charging solution for each unique scenario. The steps of the process include assessing the most fitting solution for the client’s site, facilitating the installation of charging stations, and offering consultations on ensuring optimal performance and top experience for the end-users. The drivers can use a custom mobile application that displays the location of the closest EV charging station, station ID, the provided power level, whether the station is occupied or free at the moment, and much more.

Business challenge

The EV charging company we partnered with always aims to keep user experience at the highest level. That is why they constantly work on maintaining EV charging stations in top condition and reducing possible downtime. Unfortunately, EV chargers can break down at any moment, just like any other device, which causes inconvenience for the end-users.

All EV charging stations operate on an open standard that lets them communicate with the network and report on charging sessions and the overall condition of each station. The standard is OCPP (Open Charge Point Protocol), an application protocol for communication between an EV charging station and a central management system, also called a charging station network. The principle is the same as with cell phones and cell phone networks: charging station owners can switch between OCPP-based networks, while the protocol always stays the same.

Machine learning technology is able to detect anomalies in the behavior of these stations, and it is possible to build a predictive maintenance data analysis solution that will minimize downtime and prevent serious damage to the devices. Finding anomalies would eventually lead us to the causes of failures and help us to predict and prevent them.

To learn how machine learning might be applied to keep assets in top shape, read our article “Predictive Maintenance Part 1: The Domain Overview”. If you want to dive deeper into the predictive maintenance challenge and learn how machine learning can help businesses and organizations in any industry, check our article “Predictive Maintenance Part 2: Machine Learning techniques to solve a maintenance problem”.

There are several possible ways to achieve this goal with ML algorithms, so the first step was to frame the problem correctly. ML algorithms fall into two main types:

  • Supervised ML algorithms learn from a set of labeled examples and are a great fit for making predictions. They can help with classification problems and regression problems.

  • Unsupervised ML algorithms are for unlabeled data and for cases where we need to find hidden characteristics in a dataset. They can help to find and extract anomalous data.

Problem framing for a machine learning project is very important to the success of an entire solution. Here is a set of recommendations for business and tech experts if you are interested in more details.

In this case, we started with problem framing, which consisted of the following steps:

  • Articulating the problem (using predictive maintenance to determine when the station will break down).

  • Looking for data that was already labeled (finding out whether our partner already had labeled data suitable for a machine learning solution).

  • Finding out that data comes directly from charging devices in EV Stations.

  • Determining quantifiable outputs (as an output, we agreed to expect the date when the station will go out of order).

Together with our partner, we decided that the ultimate goal of the ML project would be to implement a predictive maintenance solution for the different EV charging stations they install for their clients. Only after precise problem framing did we move on to data analysis.


After a series of meetings with Intelliarts, our partner made the strategic decision to perform a thorough predictive maintenance data analysis first. For every project that relies on data, we use the CRISP-DM methodology, which we find more useful than alternatives like KDD and SEMMA because it includes a very important “Business Understanding” phase. Previously, we compared all three methodologies, so if you want to know more about why we picked CRISP-DM, you can check out this article for a deeper breakdown.

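Here are the stages of the CRISP-DM methodology: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.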

CRISP-DM methodology

Following this methodology, once we had defined a task that machine learning could solve, we moved on to understanding the data.

There are three types of EV charging stations, named L1, L2, and L3 (Level 1, Level 2, and Level 3), which differ in charging speed. We received access to the database with the latest production updates. The data investigation was split into a high-level overview and a deep investigation. After these two steps, we figured out that about a dozen collections out of a total of over a hundred were suitable for machine learning.

The EV charging company had data records covering six years, from 2015 to 2020:

  • 25,000 records in 2015

  • 50,000 records in 2016

  • 150,000 records in 2017

  • 300,000 records in 2018

  • 550,000 records in 2019

  • 1,000,000 records in 2020

After some research, we found that the data from the beginning of 2020 was the most suitable for this project, so we selected it for further work.

Our partner has business relationships with numerous station vendors, so, as a first step, we decided to focus exclusively on Level 2 BTC stations. We made this decision because of the popularity of this vendor and because its data was the best covered in the database for ML needs. However, we still faced some major data labeling issues, which are common for projects of this kind:

  • Some stations had mismatched identifiers

  • In some cases, it was hard to distinguish between real sessions and fake or test ones

  • Charging session status was flagged as “Invalid” in a variety of situations, rather than having a particular status for each particular situation – because of that it was sometimes hard to distinguish what had really caused a charging session failure

  • If the power value was less than a certain number, the car was marked as fully charged

  • Two or more charging sessions were sometimes combined into one

  • The information about station reboots looked unrealistic in some cases

  • Some charging sessions had a start time later than their stop time
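Several of the issues listed above can be caught with simple automated checks. The following is only a minimal sketch in Python, not the code used in the project. The `Session` fields, the station IDs, and the sample records are hypothetical, since the article does not describe the real schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Session:
    # Hypothetical field names; the real database schema is not shown in the article.
    station_id: str
    start: datetime
    stop: datetime
    status: str


def quality_issues(session, known_station_ids):
    """Return a list of data quality problems found in one charging session."""
    issues = []
    if session.station_id not in known_station_ids:
        issues.append("unknown station id")
    if session.start > session.stop:
        issues.append("start after stop")
    if session.status == "Invalid":
        issues.append("ambiguous Invalid status")
    return issues


# Invented sample records for illustration only.
sessions = [
    Session("ST-1", datetime(2020, 3, 1, 8, 0), datetime(2020, 3, 1, 9, 30), "Completed"),
    Session("ST-1", datetime(2020, 3, 2, 10, 0), datetime(2020, 3, 2, 9, 0), "Completed"),
    Session("ST-9", datetime(2020, 3, 3, 7, 0), datetime(2020, 3, 3, 7, 5), "Invalid"),
]
known_ids = {"ST-1", "ST-2"}
report = {i: quality_issues(s, known_ids) for i, s in enumerate(sessions)}
```

Running checks like these before modeling makes it easier to see how much of the dataset is affected by each issue.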

To sum up, the quality of those data labels was not sufficient to create the effective ML-powered predictive maintenance solution that was planned initially. The biggest challenge was that issues were reported manually, at an arbitrary time after the actual event (charging, breakdown, or maintenance) occurred. Because of that, the Remaining Useful Life (RUL), an estimate of how much operating time each component of the system has left, couldn’t be calculated precisely enough. The labels showing the maintenance mode of a station didn’t always mean that the station was actually broken. In some cases, it was impossible to find out what actually caused the failure.
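To illustrate why the reporting delay matters, here is a small sketch in Python. It assumes one simple definition of an RUL label, the time from an observation to the next recorded failure, which is our assumption for this example and not necessarily the definition used in the project. The function name and all dates are invented.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def remaining_useful_life(observed_at, failure_times):
    """Time from an observation to the next recorded failure.

    failure_times must be sorted. Returns None when no later failure is known.
    """
    idx = bisect_left(failure_times, observed_at)
    if idx == len(failure_times):
        return None
    return failure_times[idx] - observed_at


# Invented example: the true failure times of one station.
failures = [datetime(2020, 2, 10, 12, 0), datetime(2020, 6, 1, 9, 0)]
session_time = datetime(2020, 2, 7, 12, 0)
rul = remaining_useful_life(session_time, failures)

# The same failures, but each one reported manually two days late.
reported = [t + timedelta(days=2) for t in failures]
rul_late = remaining_useful_life(session_time, reported)
```

In this example the label changes from three days to five days purely because of the reporting delay, so a model trained on such labels would learn a shifted target.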

As a result of the initial data analysis, we understood that the labeling process required significant improvement: failed charging sessions need clear flags and distinct failure types, because ambiguous data doesn’t work for an efficient predictive maintenance solution. Ideally, the labeling process should be automated to minimize human interaction and reduce potential errors in the data collection flow. Other key factors for building such a solution are collecting properly labeled data for at least one to two years and improving the data collection pipeline.

The second goal was to implement an anomaly detection solution to detect abnormal behavior of the stations. We decided that the same data selection (Level 2 BTC stations in 2020) was a good place to start. The only features that described session behavior and could be used for anomaly detection were power, charging time, parking time, and information about the charging periods that took place during a charging session. Let’s take a look at the data from one of the charging stations and see what insights we might get out of it:

Energy distribution
Charging time between 50 and 3000 minutes

The most interesting finding was that some charging sessions lasted a really long time, up to 30,130 minutes (almost 21 days), which could be considered an anomaly and required further investigation. A power level of 10 kWh per hour (that is, 10 kW) was also a huge value for that kind of EV, so it required deeper investigation too. There were other suspicious charging station behaviors that we decided to investigate as well.
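A quick way to surface values like these before applying heavier algorithms is a robust outlier rule. The sketch below uses the modified z-score based on the median and the median absolute deviation (MAD), with the commonly used cutoff of 3.5. This is an illustrative technique we chose for the example, not the method used in the project, and the session data is invented.

```python
from statistics import median


def robust_outliers(values, threshold=3.5):
    """Return indices whose modified z-score (median and MAD based) exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread around the median, nothing can be flagged
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold]


# Invented sessions: charging time in minutes and delivered energy in kWh.
minutes = [55, 120, 240, 180, 95, 30130, 300]
energy_kwh = [3.0, 7.0, 14.0, 11.0, 5.5, 40.0, 50.0]

# Average power in kW is energy divided by duration in hours.
power_kw = [e / (m / 60) for e, m in zip(energy_kwh, minutes)]

duration_outliers = robust_outliers(minutes)
power_outliers = robust_outliers(power_kw)
```

Here the extremely long session is flagged on duration, and both that session (very low average power) and the 10 kW session are flagged on power. Unlike a mean and standard deviation rule, the median-based score is not distorted by the extreme values it is trying to find.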

We used the following unsupervised anomaly detection algorithms to find anomalies:

  • DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is useful for finding arbitrarily shaped clusters and clusters with noise, also known as outliers. In this algorithm, if a point is close to many points of a particular cluster, it belongs to that cluster. DBSCAN determines the number of clusters while detecting the outliers. It is very robust to outliers, performs well with arbitrarily shaped clusters, is very effective when the distribution of values in the feature space cannot be assumed, and works well for searching outliers in a multidimensional feature space. On the other hand, it can be computationally demanding on large datasets and is very sensitive to its parameters (the neighborhood radius and the minimum number of points).

  • The isolation forest algorithm structures data points as nodes of isolation trees, assuming that anomalies are rare events with feature values that differ a lot from those of expected data points. It is a precise algorithm with only a few parameters, which makes it easy to optimize, and it is very effective when the distribution of values in the feature space cannot be assumed. However, when the algorithm isn’t optimized correctly, you can easily waste time on training and money on computing power.

  • The local outlier factor (LOF) gives an anomaly score to each data point. The score is obtained by measuring the local density deviation of a given data point with respect to the data points around it. This algorithm can provide great results out of the box for various domains. However, in higher dimensions its detection accuracy suffers.
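To make the first of these algorithms more concrete, here is a minimal pure-Python sketch of DBSCAN. It is for illustration only: in practice a library implementation such as scikit-learn's `DBSCAN` would normally be used, and this is not the code from the project. The feature choice (charging hours and average power in kW), the parameter values, and the data points are all invented.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN. Returns one cluster label per point, -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:  # j is a core point, keep expanding
                queue.extend(nbrs)
    return labels


# Invented sessions: (charging hours, average power in kW).
sessions = [
    (2.0, 3.4), (2.2, 3.6), (1.9, 3.5), (2.1, 3.3),   # first group of cars
    (6.0, 7.0), (6.3, 7.1), (5.8, 6.9), (6.1, 7.2),   # second group of cars
    (500.0, 0.1),                                     # extremely long session
]
labels = dbscan(sessions, eps=0.6, min_samples=3)
```

With these parameters the two groups form clusters 0 and 1, and the extremely long session is labeled as noise. Real features would usually need scaling first, because DBSCAN compares raw distances and one feature with a large range can dominate the result.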

Two component PCA

After the anomaly detection analysis, we found two clusters with normal behavior, corresponding to two types of cars charging at the Level 2 EV charging stations, and one cluster with abnormal behavior. The stations also sent a number of short charging sessions that could be considered abnormal. Additionally, there was information about station reboots, but it was unclear what exactly caused them.

The next step was to understand the data from the business side. This task was assigned to the product management team which collaborates closely with the station technical experts. The team of station experts should validate our findings to make a decision about our next steps: whether we continue the investigation or start building software with our findings in mind.

In case you’re interested in another case study on manufacturing, check this post on the problem of false defect detection and how we solved it.


An effective predictive maintenance solution in manufacturing requires clear and properly labeled data covering at least a few years. Ambiguous data for many of the parameters makes it hard to build correct assumptions. It is quite common for companies to collect historical data, but to build an effective machine learning solution, the collection process has to be adjusted accordingly. Our partners faced challenges similar to those of any company that sets out to introduce a solution like this.

We advised our partners to update their data collection pipeline to get all the necessary information from the OCPP communication protocol. You can learn more about the creation of a data collection pipeline for ML-powered anomaly detection solutions in this article. This is how the data pipeline might look:

Data transportation process

A proper data warehouse architecture should also be implemented for better results. In our case, the data was transformed during the collection phase and saved in an aggregated state, but the raw OCPP data was never stored as is, and this could be improved. Storing raw data is very important for an organization because it preserves all the information in its original state. With raw data, the organization can transform the data into different forms, perform deep analysis, generate reports, and merge it with other data sources to get more insights.
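The idea of keeping raw messages and deriving aggregates from them later can be sketched as follows. This is a simplified illustration that assumes the JSON flavor of OCPP, in which a request frame is an array of the form `[2, uniqueId, action, payload]`. The function names, the JSON Lines file layout, the station ID, and the sample messages are hypothetical, and a production system would use a proper storage service instead of a local file.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def store_raw(frame_text, station_id, log_path):
    """Append one OCPP frame, unmodified, with receive metadata to a JSON Lines file."""
    record = {
        "received_at": datetime.now(timezone.utc).isoformat(),
        "station_id": station_id,
        "raw": frame_text,  # stored exactly as received
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def count_actions(log_path):
    """Derive one possible aggregate from the raw log: request frames per action."""
    counts = {}
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            frame = json.loads(json.loads(line)["raw"])
            if frame[0] == 2:  # 2 marks a request: [2, uniqueId, action, payload]
                counts[frame[2]] = counts.get(frame[2], 0) + 1
    return counts


# Invented sample traffic from one station.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "ocpp_raw.jsonl"
    store_raw('[2, "101", "StatusNotification", {"connectorId": 1, '
              '"errorCode": "NoError", "status": "Charging"}]', "ST-1", log)
    store_raw('[2, "102", "MeterValues", {"connectorId": 1, "meterValue": []}]', "ST-1", log)
    store_raw('[3, "101", {}]', "ST-1", log)  # a response frame, kept but not counted
    action_counts = count_actions(log)
```

Because the log keeps every frame untouched, new aggregates or features can be computed later without asking the stations to resend anything.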

Data warehouse architecture suitable for ML projects might look like this:

Data Warehouse Architecture

Predictive maintenance data analysis is a critical task that needs to be completed before building a powerful ML-based solution. There are some common issues for organizations, which we discussed in this article, but ultimately knowing and understanding the data makes a business more resilient. The EV charging company got data analysis and consultations from Intelliarts at the right moment to improve its data collection strategy.

Outcomes of the project so far:

  • Our partners now know that the raw data should be collected in a proper format for at least one year. They know exactly what type of information is needed for the desired solution and are going to increase the variety of sensors in their stations.

  • The data labeling should be automated with manual input minimized, while the labels themselves should have only one meaning and more label categories should be added.

  • The raw data should be stored separately from application storage. That storage could be used as a source of data for machine learning tasks.
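The labeling recommendation above can be sketched as a small rule-based labeler in which every label has exactly one meaning. The label categories, the event fields, and the rule order below are hypothetical examples and not the taxonomy agreed with the partner.

```python
from enum import Enum


class SessionOutcome(Enum):
    # Hypothetical categories: each one describes a single, distinct situation.
    COMPLETED = "completed"
    TEST_SESSION = "test_session"
    STATION_FAULT = "station_fault"
    COMMUNICATION_LOSS = "communication_loss"
    USER_ABORTED = "user_aborted"


def label_session(event):
    """Assign exactly one outcome from machine-reported fields, in priority order."""
    if event.get("is_test"):
        return SessionOutcome.TEST_SESSION
    if event.get("error_code", "NoError") != "NoError":
        return SessionOutcome.STATION_FAULT
    if event.get("connection_lost"):
        return SessionOutcome.COMMUNICATION_LOSS
    if event.get("stopped_by_user"):
        return SessionOutcome.USER_ABORTED
    return SessionOutcome.COMPLETED


# Invented events for illustration.
outcomes = [
    label_session({"error_code": "NoError"}),
    label_session({"error_code": "GroundFailure"}),
    label_session({"is_test": True, "error_code": "GroundFailure"}),
    label_session({"stopped_by_user": True}),
]
```

Compared with a single catch-all “Invalid” status, distinct labels assigned automatically at the time of the event keep the cause of each failed session recoverable for later modeling.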

At the moment, the project is ongoing and our partner continues to collect data for it. The data analysis revealed aspects that could be improved to build an ML-powered predictive maintenance and anomaly detection solution faster and eventually gain a strategic advantage in the competitive EV charger market.

We at Intelliarts love to help companies solve the challenges with big data strategy design and implementation, so if you have questions related to ML pipelines in particular or other areas of data science — feel free to reach out.



Alexander Barinov
Managing Partner