The first part of this article presented the CAN environment, the message injection threat, and how we managed to extract the necessary data from CAN messages. In part 2, we will use this data to train Machine Learning models under normal conditions and try to detect message injection attacks!
A quick look at the extracted variables confirmed our intuition regarding the interactions between the physical variables in a car. Below, you can see a matrix where we colored the pairwise correlations for a subset of the extracted variables: red means high correlation, white means no correlation.
We can see strong correlations between some of the variables. For example and as expected, the throttle and acceleration variables are highly correlated. We also observed that these correlations are very stable over time.
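Computing such a correlation matrix is straightforward once the variables are extracted. Here is a minimal sketch with synthetic signals; the variable names mirror the kind we extracted, but the values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for reverse-engineered CAN variables (values are made up).
rng = np.random.default_rng(0)
n = 1000
throttle = rng.uniform(0, 100, n)
acceleration = throttle * 0.05 + rng.normal(0, 0.5, n)  # physically tied to throttle
steering = rng.uniform(-180, 180, n)                    # unrelated to the other two

df = pd.DataFrame({"throttle": throttle,
                   "acceleration": acceleration,
                   "steering": steering})

# Pairwise Pearson correlations, ready to be plotted as a red/white heatmap.
corr = df.corr()
print(corr.round(2))
```

On real drives, recomputing this matrix over different time windows is what let us check that the correlations stay stable over time.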
Learning The Normality
One way to learn the normality of a system is to:
- find a task,
- train a model to do the task on normal data,
- monitor how well it does on new unseen data,
- consider poor performance on new data to be an anomaly.
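As a concrete toy version of these four steps, here is a sketch using a simple linear model and synthetic data (the numbers, the RPM-to-speed relation, and the 3-sigma threshold are all illustrative assumptions, not values from our dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# 1. Pick a task (toy version): predict speed from RPM.
rpm_train = rng.uniform(800, 4000, 500)
speed_train = rpm_train * 0.02 + rng.normal(0, 1, 500)  # synthetic "normal" relation

# 2. Train a model on normal data only.
model = LinearRegression().fit(rpm_train.reshape(-1, 1), speed_train)

# 3. Calibrate a threshold from the normal prediction-error distribution.
errors = np.abs(model.predict(rpm_train.reshape(-1, 1)) - speed_train)
threshold = errors.mean() + 3 * errors.std()

# 4. On new data, poor performance (error above threshold) is flagged as an anomaly.
def is_anomalous(rpm, reported_speed):
    return abs(model.predict([[rpm]])[0] - reported_speed) > threshold

print(is_anomalous(3000, 60.0))  # consistent with the learned relation: normal
print(is_anomalous(3000, 10.0))  # reported speed far too low for this RPM: anomaly
```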
Choosing a task is an art in itself. One important aspect: the task has to be hard enough to force the model to learn interesting concepts about its input data.
Our First Proof-of-Concept: A Car Speed Prediction Algorithm
In our first algorithm, the task we chose was a very concrete and practical one.
We trained a Deep Learning model to predict the speed value from the values of the other variables (more precisely, from past values of the other variables).
The model learned to predict the current speed of the car from current and past values of the RPM, throttle, steering angle, etc.
We used a Deep Learning architecture called LSTM (Long Short-Term Memory), which is very popular and, more importantly, very good at learning from sequential data.
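As an illustration, a minimal PyTorch version of such a predictor might look like the sketch below. The layer size, window length, and feature count are assumptions for the example, not the exact architecture we used:

```python
import torch
import torch.nn as nn

# Assumed setup: 4 input signals (RPM, throttle, steering angle, acceleration)
# sampled over a 20-step window; the real values depend on the extracted CAN data.
WINDOW, N_FEATURES = 20, 4

class SpeedPredictor(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(N_FEATURES, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)   # regression output: predicted speed

    def forward(self, x):                  # x: (batch, WINDOW, N_FEATURES)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])       # predict speed from the last time step

model = SpeedPredictor()
x = torch.randn(8, WINDOW, N_FEATURES)     # a batch of dummy signal windows
speed = model(x)                           # shape: (8, 1)
```

Training would then minimize the absolute error between `speed` and the value read on the bus, which is exactly the metric monitored below.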
In the graph below, you can see the speed predictor in action on unseen data (a different car drive not used for training). We can see in orange the prediction and in blue the actual “real” value that was read on the CAN bus. We can see that there is clearly enough information in the RPM and other variables to predict the speed value of the car with relatively good accuracy. In more practical terms, we could say that the neural network has found a mathematical formula that approximates the speed as a function of the RPM, throttle, and acceleration values.
We can now use a performance metric for the task and monitor it to find anomalies. The simplest is the absolute error between the prediction and the measured truth. We can see that our speed predictor approximates the real speed value with a reasonable error (in red), less than 5 mph most of the time. It is important to note that, in this experiment, our dataset was relatively small (about 3 hours' worth of driving data). The prediction should get much better with more training data and more reverse-engineered variables (gear ratio, etc.).
Now, if an attacker (like the one we described earlier) tries to inject low-speed messages at a time when the car is actually at high speed, we would see a very big difference between our prediction (unaffected by the attacker) and the values measured on the bus (poisoned by the attacker). Below is a simulation of such an attack on our previous dataset, with an obvious spike in prediction error at the time of the attack:
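This injection scenario is easy to reproduce numerically. Here is a minimal sketch with synthetic traces; the speeds, noise levels, and attack window are hypothetical values chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical traces (mph): the model's prediction, driven by RPM/throttle/...,
# and the speed values read on the bus. Under normal conditions they agree.
predicted = np.full(100, 60.0) + rng.normal(0, 1.5, 100)
measured = np.full(100, 60.0) + rng.normal(0, 1.5, 100)

# Simulated injection: on samples 40-60 the attacker floods low-speed messages,
# so the bus reports ~5 mph while the car is actually doing ~60 mph.
measured[40:60] = 5.0

# The prediction error spikes exactly in the attack window.
error = np.abs(predicted - measured)
print("normal max error:", error[:40].max())
print("attack min error:", error[40:60].min())
```

The prediction itself is untouched by the attack because it is computed from the other variables, which is why the error spike is so clean.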
A more complete anomaly detection system could also learn to predict, and monitor other variables. One prediction can be made for each variable of interest and individual prediction errors can be monitored to detect anomalies (more on that later).
This anomaly detection system seems quite robust against the threat we established earlier. If an attacker has to inject specific speed values to perform his attack and remain undetected, he would have to manipulate all the sensor values that speed is correlated with, and do it:
- in real-time,
- in a physically-coherent manner (within the limits of what the model has been able to learn).
Going Further, Learning From Raw Data
One interesting addition in this project was a promising anomaly detection Proof-of-Concept that skips the only remaining “manual” part of the process: the reverse-engineering phase.
In our first anomaly detection algorithm, we had to know where the variables were located and extract them to feed our Machine Learning models. This required a lot of process knowledge.
This time, we built a completely generic anomaly detection system that learns from the bytes (raw data) with no prior knowledge of where and how variables are stored.
Instead of physical-variable predictors, we built byte predictors.
Above, as an example of such a predictor, is the prediction of the second byte of messages with CAN ID 0F3. We get many such predictors and, as expected, some bytes are unpredictable.
We then keep only the most interesting bytes, based on their individual prediction performance. If a prediction is usually good, monitoring its error should be particularly interesting. As a consequence, we also assign to each byte a weight that is inversely proportional to its standard prediction error under normal conditions.
To build a consolidated (unique) anomaly score, we compute the weighted average of the individual byte prediction errors.
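A minimal numeric sketch of this weighting scheme is shown below; the per-byte error values are made up for illustration, and the normal-condition errors are summarized by their means:

```python
import numpy as np

# Hypothetical mean prediction errors for 4 monitored bytes on normal data:
# two bytes are well predicted, two are noisy and hard to predict.
normal_mean_error = np.array([2.0, 2.0, 40.0, 40.0])

# Weight each byte inversely to its normal-condition prediction error,
# so the well-predicted bytes dominate the consolidated score.
weights = 1.0 / normal_mean_error
weights /= weights.sum()

def anomaly_score(byte_errors):
    """Consolidated score: weighted average of per-byte prediction errors."""
    return float(np.dot(weights, byte_errors))

baseline = anomaly_score(normal_mean_error)  # score under normal conditions
# Simulated attack: bytes forced to a fixed value, so the well-predicted
# bytes suddenly show large errors while the noisy ones stay at their baseline.
attacked = anomaly_score(np.array([120.0, 120.0, 40.0, 40.0]))
print(baseline, attacked)
```

The down-weighting matters: without it, the two inherently noisy bytes would drown out the signal from the bytes that are actually worth monitoring.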
Here is an example of the byte prediction anomaly score on different datasets (four 10-minute car drives). In one simulated attack, we chose 3 random bytes and set their values to their known minimum (usually 0).
We can see that the byte prediction error gets much worse than usual at the time of the attack. That’s it for this Proof-of-Concept!
Conclusion & Perspective
In this experiment, we showed that it is possible to learn the normality of a process by learning the normal interactions of key physical variables, and to use deviations from this model to detect simple injection attacks.
We also developed a promising Proof-Of-Concept for a generic anomaly detection system that learns from the bytes without knowing where these key variables are stored.
Since last year, we have been hard at work applying these ideas to the realm of ICS, IoT, and traditional networks, testing new things, and adapting some of the ideas we got from this project.
We are convinced that process variable tracking is an interesting way to protect cyber-physical systems like ICS/SCADA, and we look forward to sharing more information on this topic with you.