Reverse Engineering Software and Systems
As part of our aptent support business, Chipworks applies software reverse engineering to generate evidence of patent infringement, which is used by IP groups and outside counsel in patent licensing negotiations and litigations. This evidence is in the form of a claim chart which maps relevant patent claim elements to the infringing product. We apply software reverse engineering to analyze and document infringement on a wide variety of products, from consumer electronics to communication devices and automotive systems.
We recently went inside an automotive electronic control unit (ECU). These ECUs comprise a wide range of units like the powertrain control module (PCM), body control module (BCM), electric power steering (EPS), airbag control unit (ACU), or electronic brake control module (EBCM). In light of the recent media attention targeting the automotive industry, we thought it would be interesting to share some of our findings in relation to the perceived quality of the systems software against other semiconductor-based systems we have analyzed.
Since our overall findings are a bit mixed, we will spare the blushes of the leading automotive companies by not publishing the make or type of module we analyzed.
Reverse Engineering an Automotive Module
First we disassembled the module, reverse engineered the PCB, scoped its signals while operational, and reverse engineered its software.
Two types of software analysis were done. We extracted the raw binary code from the module and analyzed it statically (called “dead” code analysis). We also analyzed the code while it was in action on the target device (called “dynamic” or “live” code analysis).
We then rated several aspects of the unit (e.g., physical protection, code complexity) for quality. The results were surprising to us.
Physical Protection: Good
The module has a heat sink which also plays the role of a mechanical cover. The CPU detects the presence of the module and stops running the code if it is removed. Below is an example of one of the holes used for screwing the module to the PCB.
Note the resistor connecting the screw pad to the CPU.
Code and Data Space Protection: Good
The code space is checked for corruption and tampering by calculating and checking a CRC over the complete code space. The twist on this particular implementation is that there are several routines doing similar CRC checks, making tampering with the code more difficult.
The content of the RAM data space cannot, of course, be checked for correctness since it changes all the time. Instead, the memory integrity is checked by writing and reading back standard 0xAA – 0x55 patterns in the background. The obligatory enforcement of the atomic instructions has been observed here.
Compared with other consumer and communication devices that come across our desks, both the code and data space protection were a notch higher.
Code Complexity: Bad
Inner workings of the code have been analyzed, and a number of software routines completely dissected. The surprise here was just how much code was created for what was supposed to be simple processing of a few sensor values. A lot of the code improved sensor resolution and produced more precise reading; correction, normalization, and adjustment routines were abundant.
This left us with a question – why was it necessary to go to great pains to provide 0.003% precision when 0.3% would have sufficed for the purpose?
This observation was further confirmed by the sheer number of routines in the code – over 700 in total. Around 200 of them were involved in the comparatively simple task of processing two sensor inputs and outputting two variables. This certainly looks like overkill.
The calling tree, showing which routines call which others, looks rather intimidating.
The overwhelming impression was that precision was put far ahead of code simplicity. Occam’s razor was rather dull in the process of this code development.
A good coding practice is to sprinkle the code with debug logs. Granted, they do increase the code size, but they can also save your bacon when you need to debug a problem in the field on a live system. Usually, the execution of the debug logs is skipped and turned on only when needed.
We did not see evidence of debug logs in this code.
The CPU in question did not have a JTAG or any other debug port. The only way to debug it effectively is to use an in-circuit emulator (ICE). The only problem with the ICE is that the CPU needs to be completely removed and replaced with a bulky ICE module. Such a contraption can not fit mechanically in the narrow space provided for the module. Couple this with the above mentioned physical protection, and the result is a very difficult setup to debug in the field.
Error Handling: Ugly
Another good coding practice for mission critical applications is to check for software errors and log any found. A balance needs to be achieved between too much error checking and too little. Too much leads to code bloat (more code, more bugs), and too little will let code bugs lurk undetected.
The voluminous code we inspected is certainly not a result of too many error checks. In many of the routines inspected, there were no checks for software errors.
We did, however, find checks verifying the raw sensor values and final output values, but not much more. Even these errors were not logged immediately when they were detected, potentially making the debugging more difficult.
However, the most worrisome error checks were those where values were checked if they were outside a maximum range allowed. One would think that alarm bells would be going off upon detecting any such errors, and they would be logged and appropriately handled. Instead, all that was done was to simply cap the value to the maximum (or minimum) and forward it on as if nothing had happened!