29. October 2015

Secure Firmware Updates (Part II)

In part one of this blog post I indicated why secure remote firmware updates are important, and how a firmware update mechanism may be threatened. Let us look at some ways to mitigate such threats.

Attacks on channels

End-to-end security is an approach that addresses many of the threats shown in the first part (the arrows in the diagram). The idea is that the sender encrypts the data - in this case firmware updates - at its origin and transmits the encrypted data over some channel to the receiver - in this case the gateway - which decrypts it there. Intermediate stations, e.g. proxies or servers of the Internet provider, only see encrypted data, never clear text.

Actually, encryption is only relevant for firmware updates if their code contains confidential data, e.g. intellectual property such as a unique sensor fusion algorithm for a medical device. Otherwise encryption, and thus confidentiality, is of secondary importance. It is more important that data can be signed by the sender in such a way that the receiver can check whether the data indeed comes from the correct sender (which is typically well-known, as IoT devices rarely need to be "promiscuous"), and whether the data has been transmitted without changes (authenticity and integrity).

Thanks to end-to-end security, the channel can almost be ignored regarding security threats - at least with respect to authenticity, integrity and confidentiality. This massively simplifies the overall problem!

For our Limmat platform, we have implemented the following update mechanism:

compress, encrypt and sign the update image

upload image to Microsoft Azure Blob Storage

device regularly polls Azure for a new update image

if new update image is found:

download update image

store update image in dedicated flash area

reboot device

bootloader finds new update image in dedicated flash area

check signature of update image

decrypt and decompress update image to executable flash area

(if a power outage occurs, update process simply repeats on next reboot)

remove update image from dedicated flash area

write log entry to Microsoft Azure Table Storage to indicate success

start application code

application code communicates over TLS when necessary
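The bootloader part of this list can be sketched in a few lines of Python. This is only a simulation under stated assumptions, not the Limmat implementation: the names `Flash`, `boot`, `toy_open` and `PowerLoss` are hypothetical, `toy_open` is a placeholder for "check signature, then decrypt" and contains no real cryptography, and discarding a rejected image is an assumption that the list above does not spell out. The point it illustrates is the ordering: the update image is removed from the dedicated flash area only after installation has completed, so an interrupted update simply repeats on the next reboot.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Optional


class PowerLoss(Exception):
    """Simulates a power outage in the middle of an update."""


@dataclass
class Flash:
    """Simulated flash memory with the two areas named in the list above."""
    update_area: Optional[bytes] = None  # dedicated area for the downloaded image
    executable: bytes = b""              # area the application code runs from


def toy_open(image: bytes) -> Optional[bytes]:
    """Placeholder for 'check signature, then decrypt'. NOT real cryptography:
    it merely accepts images that start with a fixed marker."""
    marker = b"SIGNED:"
    return image[len(marker):] if image.startswith(marker) else None


def boot(flash: Flash,
         open_image: Callable[[bytes], Optional[bytes]],
         power_fails: bool = False) -> bool:
    """Run the bootloader once; return True if an update was installed."""
    if flash.update_area is None:
        return False                      # no update pending, start the application
    payload = open_image(flash.update_area)
    if payload is None:
        flash.update_area = None          # assumption: rejected images are discarded
        return False
    flash.executable = zlib.decompress(payload)
    if power_fails:
        raise PowerLoss()                 # outage before the image was removed
    flash.update_area = None              # removed only after a complete install
    return True


if __name__ == "__main__":
    flash = Flash(update_area=b"SIGNED:" + zlib.compress(b"firmware v2"),
                  executable=b"firmware v1")
    try:
        boot(flash, toy_open, power_fails=True)   # first attempt is interrupted
    except PowerLoss:
        pass
    boot(flash, toy_open)                         # next reboot repeats the update
    print(flash.executable)
```

The simulation does not model a partially written executable area; it only shows that the pending image survives the outage and that the second boot finishes the job.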

For signing and encrypting an update image we have used NaCl. This library provides authenticated encryption in an efficient manner, using elliptic curves. Because the NaCl algorithms are efficiently implementable in software, it is possible to achieve a high degree of security even on low-cost microcontrollers. We have implemented the performance-critical parts of the code in Cortex-M4 assembly language.

The NaCl algorithm family is becoming more and more popular. For example, Google has proposed adding it to the TLS standard and Apple uses it in its HomeKit Accessory Protocol (see also my blog post here).

A question we are frequently asked is whether hardware acceleration would be useful for such algorithms. We tend to be a bit skeptical in that regard: at least for firmware updates, hardware acceleration of these algorithms would probably be of little to no benefit.

For distributing the signed and encrypted firmware updates we have used Microsoft's Azure Blob Storage Service. However, it would be easy to use similar services such as Amazon S3 or an on-premise company Web server.

Attacks on Internet ports

Firewalls: Firewalls protect the interface between the Internet and endpoints. They make it possible to expose only gateways to the Internet, while keeping other devices on the gateway's local network invisible to the Internet at large - and thus protected from direct attacks from the Internet. Of course, Internet ports should be opened as selectively as possible.

For our firmware update mechanism, only one port needs to be opened (e.g. port 443 for HTTPS), and only in the outgoing direction. Direction is important, because incoming ports are difficult to protect against denial of service attacks from the Internet (which would endanger availability of the update mechanism). Every connection attempt to an open, incoming port leads to resource consumption in the gateway, in terms of memory and processor cycles. Thus Internet ports of IoT devices should never be opened in incoming direction; see also Clemens Vasters' blog post on this topic. Note that this constraint does not imply that the IoT device cannot be used as a server; see my blog post here.
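To illustrate what "outgoing only" means for the device, here is a minimal polling sketch using only the Python standard library. It is an illustration under assumptions, not our firmware code: the function names and the URL used in the usage note are hypothetical, and the use of an ETag with a conditional GET is one possible way to detect a new image, not necessarily the one used on the device. The device always initiates the connection; it never listens on a port.

```python
import urllib.error
import urllib.request


def build_poll_request(url, last_etag=None):
    """Build an outgoing GET request; with an ETag it becomes a conditional GET."""
    headers = {}
    if last_etag:
        headers["If-None-Match"] = last_etag   # ask only for images we have not seen
    return urllib.request.Request(url, headers=headers, method="GET")


def poll_once(url, last_etag=None, opener=urllib.request.urlopen):
    """Poll once. Return (etag, image_bytes) for a new image, or None if unchanged."""
    request = build_poll_request(url, last_etag)
    try:
        with opener(request, timeout=30) as response:
            return response.headers.get("ETag"), response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:                    # Not Modified: no new update image
            return None
        raise
```

A device would call something like `poll_once("https://updates.example.invalid/image.bin", last_etag)` at regular intervals and hand any returned image to the update mechanism. Since the image is signed, the result of the poll does not have to be trusted.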

Attacks on endpoints

Attacks on gateways

No shared secrets: The cost of a security attack should clearly be higher than the benefit for the attacker. Among other things, this means that a successful attack on one device should not imply that the same kind of attack has now become trivial for all other devices of the same type ("if you own one, you own them all"). Hacking one device, especially if the hack is detected or even destroys the device, is typically of much lower value than having hacked all of them. The technical consequence of this rule is that devices should not contain shared secrets, such as symmetric encryption keys.

Physical access control: In many cases an IoT device, such as a gateway, can be installed in a physically protected area, e.g. in a factory protected through physical access control.

SoCs as secure elements: Unfortunately, many IoT devices will be located in public spaces (smart city) or in homes with little physical protection. Such a device could be dismantled and broken open. This means that in such scenarios, the entire box and the printed circuit board contained in it cannot be regarded as secure anymore. For example, a memory chip could be removed and another one - e.g. with different firmware - could be soldered in. Thanks to highly integrated system-on-chip (SoC) components, it is now possible to integrate processor core(s), memory and peripheral circuits on the same chip. This greatly mitigates the threat, because many attacks on the internals of chips require possession of the device, specialized know-how and expensive lab equipment. A SoC can thus be used as a "reasonably secure element" if it provides no data buses to the outside world and if its debug interface (usually JTAG) can be switched off permanently. There must be no way to reverse this, e.g. by applying overvoltages. The other requirement is that it should be possible to create true, i.e. non-deterministic, random numbers on the SoC, so that private keys can be generated in place and never need to leave the chip.
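Generating a private key in place needs little more than 32 unpredictable bytes. The following sketch assumes a Curve25519-style key as used by NaCl; the function name is hypothetical, and the operating system's random source (`secrets.token_bytes`) stands in for the SoC's hardware random number generator. The bit masking ("clamping") is normally applied by the X25519 scalar multiplication itself and is shown here only to make the key format visible.

```python
import secrets


def generate_private_key() -> bytes:
    """Return a fresh 32-byte Curve25519-style private key.

    Assumption: secrets.token_bytes() stands in for an on-chip true random
    number generator; on a microcontroller the bytes would come from hardware.
    """
    key = bytearray(secrets.token_bytes(32))
    key[0] &= 248     # clear the three lowest bits
    key[31] &= 127    # clear the highest bit
    key[31] |= 64     # set the second-highest bit
    return bytes(key)
```

Only the public key derived from this value would ever be exported; the private key stays on the chip.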

Systems that are immune to side-channel attacks: Some attacks on chip internals are relatively inexpensive, because they only require the non-destructive measuring of power consumption, radiation or other information that is leaked by a chip while it is operating. This can enable side-channel attacks that analyze the leaked information for patterns that allow deducing secret data.

The hardware design of a system can help mitigate such attacks. For example, a SoC contains many different subsystems operating from a common power supply, which makes an attack more difficult than on a dedicated crypto processor.

Software, in particular the software parts that access secret data, can be designed in a way that leaked information becomes useless to an attacker. For example, cryptographic code can be implemented in a timing-invariant way, meaning that its execution time and control flow do not depend on the secret data, which makes it much harder to deduce that data through timing measurements or simple power analysis. The NaCl algorithms have been designed by Dan Bernstein and his colleagues to be easily (and efficiently) implementable in a timing-invariant way. Nevertheless, care is needed to ensure this property in an implementation.
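The difference between a leaky and a timing-invariant implementation is easiest to see in a byte comparison, e.g. of an authentication tag. The sketch below is purely illustrative: the two function names are hypothetical, and Python itself gives no hard timing guarantees, so real firmware would implement this in C or assembly. For production Python code, the standard library's `hmac.compare_digest` serves this purpose.

```python
import hmac


def leaky_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: running time reveals the position of the first mismatch."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False          # returns sooner the earlier the mismatch occurs
    return True


def timing_invariant_equal(a: bytes, b: bytes) -> bool:
    """Visit every byte and accumulate differences; no secret-dependent branch in the loop."""
    if len(a) != len(b):          # lengths are assumed to be public
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y             # stays 0 only if all bytes are equal
    return diff == 0


if __name__ == "__main__":
    tag = bytes(range(16))
    print(timing_invariant_equal(tag, tag), hmac.compare_digest(tag, tag))
```

Both functions compute the same result; only the second one avoids a loop whose duration depends on the data being compared.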

Trusted execution environments: For more complex hardware, with external memory chips attached to a SoC, the external chips must be considered untrusted. They may be used anyway: e.g. an encrypted and signed firmware update may be stored on an external flash chip. Alternatively, trust may be re-established by checking the untrusted system elements on demand. This is the idea behind trusted execution environments, as supported by trusted platform modules (TPMs). TPMs are designed for PCs and servers, but more light-weight variations for integration into SoCs will probably appear over the next couple of years. Separate TPM chips would hardly be attractive for high-volume, mass-market IoT microcontrollers due to the space that they take up and their additional power consumption and cost.

Attacks on cloud services

Cloud services are particularly attractive endpoints for attacks, because they usually consolidate data from many devices, persons or companies. Enormous damage can be caused e.g. if all credit data of customers is stolen, or if all traffic signals of a city can be remotely controlled by attackers.

Requiring independent security audits from the cloud provider can help mitigate these risks. In some cases, it is possible to only handle encrypted data in the cloud, so that it does not act as an endpoint itself. But in general, cloud security remains a difficult topic. In particular, the management, storage, updating and revocation of keys is not yet solved in a satisfactory way, especially if the solution must scale to huge numbers of IoT devices.

Attacks on developer organization

Keep it simple: To minimize accidental vulnerabilities at the developer organization, avoiding unnecessary complexity is as obviously important as it is difficult to achieve in practice. Many security holes have nothing to do with malicious developers, or even with buggy security software, but with the difficulty of correctly using such software. Keeping a system as simple as possible is hard work and does not happen automatically - only complexity builds up seemingly on its own.

Peer reviews: Process, architecture, design and code reviews among peers can uncover many security holes, whether introduced inadvertently or maliciously. Such reviews should be integrated into the organization's processes, so that e.g. two persons need to sign off on a new firmware update after having participated in such reviews.

Tools: Before a firmware update is deployed, it may be checked automatically as part of the build or at least deployment process, e.g. using static code analysis tools.

Partitioning: If the complexity of the firmware becomes large, e.g. due to integration of third-party open source code components, there comes a point where it is not realistic anymore to trust the entire firmware. Then it is necessary to divide the firmware up into loosely coupled partitions so that the malfunction of one partition cannot critically affect other partitions. Ideally, there only remains one small partition whose trustworthiness is critical - so that less strict development, deployment and firmware update processes can be applied to the other partitions. Partitioning can greatly benefit from hardware support. For example, some form of TPM as mentioned earlier could be used, or more flexible mechanisms such as ARM's TrustZone. But even without expensive hardware support, software approaches can provide varying degrees of partitioning: separation kernels, virtual machine monitors, operating system processes or language runtimes. For example, a sandbox for high-level languages like C# or Java can prevent entire categories of security vulnerabilities such as buffer overruns. The code in a sandbox may be entirely untrusted; only the implementation of the runtime needs to be trusted.

Various hardware mechanisms: There are a number of possible hardware mechanisms that are neither necessary nor sufficient for ensuring security, such as hardware accelerators for encryption or one-time programmable memories. Nevertheless, some of them can be useful for erecting further hurdles for an attacker who has successfully overcome other hurdles, e.g. to make it more difficult for a disgruntled developer to introduce a backdoor. This is an example of the more generic defense in depth approach: always assume that an attacker is able to overcome any given hurdle, so make the attack more expensive by adding yet more hurdles. However, if the security features of a microcontroller needed to do this require a thousand pages of documentation, then a good idea has been stretched so far that it has become almost self-defeating.

Conclusions

If you trust a device, trust its current firmware, and trust the origin of a firmware update, then the firmware can safely update itself, resulting in a trusted state again. This is possible even if the communication channels cannot be trusted, because the device can check the signature of firmware updates.

Being able to trust the first iteration of a device's firmware is an interesting challenge. It requires extending the threat model with the process for the initial setup of a device: the point where a device gets its production firmware, gets its own unique identity, and is assigned to its owner.

For low-cost, mass-market IoT devices, SoCs as secure elements are often sufficient to achieve a good level of security. For more complex devices, it becomes necessary to partition hardware resources into isolated partitions, such that most of these partitions are uncritical in the sense that even if the software they execute had been successfully attacked, the attack could not easily spread to other partitions, and thus the attacker could not obtain full control over the device.

It should have become apparent that security is a system-level property that requires a threat model, an architecture-level view regarding possible mitigations, and careful implementation in order to not undermine the promises of an otherwise good architecture. There is a tension between keeping things as simple as possible on the one hand - because complexity is a fundamental cause of many vulnerabilities - and a plethora of possible mitigation mechanisms that are neither necessary nor sufficient in theory, but help achieve better defense in depth in practice - at the cost of higher complexity.

Cuno Pfister, Oberon microsystems AG

Tags: IoT, NaCl, firmware

Comments: 1

#1
Stephan Koch (Sunday, 08 November 2015 18:51)

Hello Cuno
Well written article!
Best Stephan