Log4j, open source issues and my dead hard drives
What you will learn
- Why Log4j is a problem for embedded developers.
- Challenges in using open source.
- How Bill killed 28TB of storage.
Unfortunately, I recently killed a pair of 14TB hard drives on one of my servers. How this relates to open source and Log4j isn’t too complicated.
The hard drives decided to kick the bucket because they were getting too hot. This happened because the case fan that was cooling the hard drive assembly decided it was a good time to die. Of course, there were no alarms, and I only discovered the problem when the drives gave errors as I accessed certain files. This is a tertiary server that I don’t access much, which is why the issue wasn’t discovered earlier. More on that later, as well as what I ended up doing about it.
What is Log4j?
So the first piece of this puzzle is Log4j (Fig. 1). Log4j is an open-source project from the Apache Software Foundation. This Java-based framework provides logging services and is integrated into a large number of other projects and products, many of them open source themselves.
Logging is central to the Internet of Things (IoT). Not all IoT systems use Log4j; it is just one of many middleware components available to developers. Even so, many other systems depend on the quality of its code for functionality and security. The latter has recently become a problem, with CVE-2021-45105, CVE-2021-45046 and CVE-2021-44228 all still being worked through.
CVE stands for Common Vulnerabilities and Exposures. The CVE program manages the definitions and catalogs these issues, giving each one a unique number. For example, CVE-2021-44832 covers a Log4j2 vulnerability to Remote Code Execution (RCE) attacks.
Projects that use Log4j remain exposed to bugs like these until the version of Log4j they are using is fixed or the application itself mitigates the problem. Some problems are inherent to the underlying system. In those cases, a release with a bug fix is often all that is needed to secure an application, which is why it is generally recommended to keep up to date with the latest versions of software.
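As a rough illustration of the "is my version fixed?" question, here is a minimal Python sketch. It assumes a plain numeric `major.minor.patch` version string on the main Log4j 2.x line. The "first fixed release" numbers are taken from the public Apache advisories as I understand them and should be checked against the official security page. The helper names are hypothetical, and the sketch deliberately ignores the separate backport releases for older Java versions.

```python
# Illustrative only: first main-line Log4j 2.x release that addressed each CVE.
# Verify these against the official Apache Log4j security page before relying on them.
FIXED_IN = {
    "CVE-2021-44228": (2, 15, 0),
    "CVE-2021-45046": (2, 16, 0),
    "CVE-2021-45105": (2, 17, 0),
    "CVE-2021-44832": (2, 17, 1),
}


def parse_version(text):
    """Turn '2.14.1' into (2, 14, 1). Assumes purely numeric components."""
    return tuple(int(part) for part in text.split("."))


def unpatched_cves(version_text):
    """Return the CVEs from FIXED_IN that the given 2.x version predates."""
    version = parse_version(version_text)
    if version[0] != 2:
        # Log4j 1.x and the backport lines need a separate analysis.
        raise ValueError("this sketch only covers the Log4j 2.x main line")
    return sorted(cve for cve, fixed in FIXED_IN.items() if version < fixed)


if __name__ == "__main__":
    for candidate in ("2.14.1", "2.16.0", "2.17.1"):
        print(candidate, unpatched_cves(candidate))
```

A check like this only says which fixes a version predates. Whether an application is actually exploitable also depends on how it uses the library.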
Embedded developers are an interesting lot. They tend to prefer stable development tools over diving into the latest technologies used with smartphone apps or cloud systems. Technologies such as hypervisors were once rare, but have become ubiquitous in mid- to high-end platforms. Things like serverless computing still live only in the cloud. Even keeping up to date with supporting tools and software can be a hassle.
Process changes and fixes
Changes can cause all sorts of headaches, even for projects that don’t require certification. Those that do often require recertification when changes are made. Changes require testing and must be deployed. This can be time consuming and expensive.
Embedded developers write a lot of code, but most systems are built using other components, from RTOS to middleware. The open source components of this mix are where things like Log4j come in. Of course, proprietary solutions can reduce reliance on open source solutions, but the issues we discuss are not limited to open source platforms. They also occur with closed proprietary solutions.
The catch is who is responsible for fixing problems and distributing patches. Usually closed, proprietary software is the responsibility of the company that supplies that software. Sometimes the source is available, allowing developers using the software to make changes. For open source software, the source code is available, but whether any fixes are merged back into the main development branch is up to the open source maintainers. In general, developers do not want to take on the maintenance and support of open or closed solutions.
Some open source projects have corporate support. Linux is a classic example, but most open source projects lack both funding and people to do the work. Lately there has been more discussion, and more action, regarding projects with little corporate support, even though these are projects used by many companies internally or as part of their products.
Recently, a developer corrupted their own open-source project, essentially breaking the software that depended on it to try to make others aware of the lack of support and the importance of the project. Those who just used the updated source code without verification had a problem. Rolling back to an earlier version allowed software that depended on the project to work, with future updates possibly fixing the issue.
Of course, someone could essentially take the old software and start a new project based on it, since the software has an open-source license. Even then, there is always the issue of support going forward.
Open source support
Supporting open source development is at the heart of this issue, along with determining your application’s dependencies. There is actually software to track dependencies. This may include tracking the different types of licenses involved, but that’s another story for another time.
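To make the dependency question concrete, here is a minimal Python sketch of the kind of lookup such tools perform: given a map of direct dependencies, find every component that depends on a given library, directly or transitively. The component names and the `DEPENDENCIES` map are made up for illustration. Real tools build this graph from build files or package manifests.

```python
# Hypothetical dependency map: each component lists its direct dependencies.
DEPENDENCIES = {
    "gateway-app": ["mqtt-client", "config-lib"],
    "mqtt-client": ["logging-lib"],
    "config-lib": [],
    "logging-lib": [],
    "sensor-firmware": ["config-lib"],
}


def depends_on(component, target, graph, seen=None):
    """Return True if component depends on target, directly or transitively."""
    if seen is None:
        seen = set()
    if component in seen:
        return False  # already explored; also guards against cycles
    seen.add(component)
    for dep in graph.get(component, []):
        if dep == target or depends_on(dep, target, graph, seen):
            return True
    return False


def affected_by(target, graph):
    """List every component that would be touched by a problem in target."""
    return sorted(
        name for name in graph
        if name != target and depends_on(name, target, graph)
    )


if __name__ == "__main__":
    print(affected_by("logging-lib", DEPENDENCIES))
```

In this made-up graph, a flaw in `logging-lib` reaches `gateway-app` even though the application never references the library directly, which is exactly the situation many projects found themselves in with Log4j.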
What developers need to keep in mind when choosing their tools and supporting software is what the actual costs will be, where the money is spent, etc. Free software, open source software, and closed/proprietary software all have their own set of costs and issues.
So how does all this open source stuff mix up with my dead hard drives?
Well, a while ago I wrote about how I used an open source project called Centreon to monitor my various systems, including the server with the overheating hard drives. I have now configured it to track the temperature of every hard drive on all my servers (Fig. 2). This took a bit of effort because the systems use different controllers and have different ways of logging data. Centreon does not use Log4j, but it is a similar kind of application: it basically pulls information from various sources, compares it to my settings, and reports warnings and errors.
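The pull, compare, and report loop can be sketched in a few lines of Python. This is not Centreon's API or configuration format. The drive names, readings, and threshold values below are all assumptions chosen for illustration, and real limits should come from the drive vendor's specifications.

```python
# Assumed thresholds in degrees Celsius; not vendor limits.
WARNING_C = 45
CRITICAL_C = 55


def classify(temp_c, warning=WARNING_C, critical=CRITICAL_C):
    """Map a temperature reading to a monitoring status."""
    if temp_c >= critical:
        return "CRITICAL"
    if temp_c >= warning:
        return "WARNING"
    return "OK"


def check_drives(readings):
    """Return (drive, temperature, status) for every drive that is not OK.

    readings is a dict of drive name -> temperature in Celsius. How the
    readings are collected differs per controller and is out of scope here.
    """
    alerts = []
    for drive, temp in sorted(readings.items()):
        status = classify(temp)
        if status != "OK":
            alerts.append((drive, temp, status))
    return alerts


if __name__ == "__main__":
    # Hypothetical readings from one server.
    sample = {"sda": 38, "sdb": 47, "sdc": 61}
    for drive, temp, status in check_drives(sample):
        print(f"{status}: {drive} is at {temp} C")
```

The comparison itself is trivial. The effort is in collecting the readings from each controller and in remembering to check the temperature at all.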
This type of system is typical for an enterprise or server farm, but rare for a lab or home environment like mine. Also, while the system was already configured to track dozens of details like disk space, it was not configured to check system fans or the temperature of anything, including disk drives.
As I discovered, there are dozens of details that I could and should check through the system. In reality, however, I just wait for it to email me about any issues. It has done pretty well overall. Still, I hope this is a cue for you to think about what you should be monitoring in your digital twin or IoT device. With a little thought, you may find that more information would be helpful. It could be the battery level or an intrusion alert.
Either way, I had backups for most of the data on the drives, and in terms of usable storage it was only 14TB because it was a RAID 1 setup. The mirroring didn’t help, because the drives are adjacent and both overheated. Fortunately, the backups are on another server in a different room. You can never have enough backups.