Equifax's database was breached through a library vulnerability. Apache Struts is to blame.
Naturally, the software community turns its focus to fixing the library and laments the oversight of not patching sooner. A "Struts-centric" view is necessary at this stage to identify other affected systems and prevent further exploitation. However, looking at one specific library stops us from seeing the root cause of the issue: current development technology trends and established practices.
First of all, how exactly did Struts fail us? It turned out that the REST plugin, combined with the XStream handler, allows remote code execution.
Here's a breakdown of how it works:
A RESTful API endpoint is just a web link that you send requests to. Suppose that there is a link that creates a comment. A "classical" link would take on this format:
https://www.target-site.com/create_comment?name=Bob&message=Test
A strict RESTful format would use the HTTP methods (GET, POST, PUT, DELETE) to indicate resource actions (read, create, update, delete). The body of the HTTP request then carries the rest of the information, and XML is one common format for it. A RESTful request would look like this:
POST /comment HTTP/1.1
Host: www.target-site.com
Content-Type: application/xml

<CreateCommentRequest>
  <Name>Bob</Name>
  <Message>Test</Message>
</CreateCommentRequest>
The XStream handler that ships with the Struts REST plugin is responsible for parsing this XML payload. In this case, the entire XML document describes a comment creation request. The plugin then hands the application a Java object that contains the Name and Message attributes.
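To make the conversion concrete, here is a minimal hand-written sketch of the same step using only the JDK's DOM parser. This is not how XStream does it (XStream maps XML to objects through reflection); the class names CommentRequest and CommentRequestParser are ours, made up for illustration. The point is only to show what "XML in, Java object out" means when the code reads exactly the two fields it expects.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical data holder for the example request; not a Struts class.
final class CommentRequest {
    final String name;
    final String message;

    CommentRequest(String name, String message) {
        this.name = name;
        this.message = message;
    }
}

// Hypothetical hand-written parser, assumed here only for illustration.
final class CommentRequestParser {

    static CommentRequest parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Ask the JDK parser to apply its secure-processing limits.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Element root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            if (!"CreateCommentRequest".equals(root.getTagName())) {
                throw new IllegalArgumentException("unexpected root element: " + root.getTagName());
            }
            // Only the two expected fields are read; any other element is ignored.
            return new CommentRequest(textOf(root, "Name"), textOf(root, "Message"));
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            // Covers malformed XML and parser configuration problems.
            throw new IllegalArgumentException("not a valid comment request", e);
        }
    }

    private static String textOf(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            throw new IllegalArgumentException("missing element: " + tag);
        }
        return nodes.item(0).getTextContent();
    }
}
```

Calling CommentRequestParser.parse(...) on the request body above yields an object whose name is "Bob" and whose message is "Test". Nothing in the input can choose which class gets created, which is the property a general-purpose deserializer gives up.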
So far so good.
Keep in mind that Struts has powerful (read: vulnerable) serialization routines. Any object can be expressed as plain XML text. This is called serialization. The reverse direction, deserialization (PHP calls it unserialization), converts the XML back into a live object in memory. A lot of "library magic" is necessary for this to function.
Picture that we can sequence a person's DNA and also record their current knowledge and personality. All of this information is printed on paper. We then alter the printout before cloning the person, both biologically and socially, so that we can swap in the clone without any family or friends being able to tell the difference.
For the sake of illustration, we'll use a much simplified notation. Imagine the Comment class has a "print" function:
public class Comment {
    private String name;
    private String message;

    // Body omitted in this simplified example.
    public void print() { /* ... */ }
}
In this simplified notation, a serialized instance of the Comment class, here one that an attacker has tampered with, would look like this:
<Comment>
  <Name>Bob</Name>
  <Message>Test</Message>
  <__class_function_public_print>
    INJECTED_CODE
  </__class_function_public_print>
</Comment>
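Of course, properly written software would not serialize object methods. But an attacker can synthesize a document just like the one above. Struts then dutifully restores the object from the XML, except with a modified function. If the "print" method is invoked anywhere else in the code, the injected code is executed. (This is the simplification at work: a real payload does not carry new method bodies. It names classes that already exist on the server and arranges their fields so that restoring the object runs a command of the attacker's choosing.) Continuing our cloning example, the replacement person is almost the same as before, but with killer intent.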
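The core hazard can be sketched in a few lines of Java. This is a toy under stated assumptions, not the Struts or XStream code: ToyRestorer, SideEffectGadget and the allow-list variant are names we made up, and the "gadget" here only bumps a counter where a real one would do damage. What it shows is that a restorer which instantiates whatever class the input names has handed control to whoever wrote the input, and that checking the name against an explicit allow-list first is one common way to take that control back.

```java
import java.util.Set;

// Harmless data class, loosely mirroring the article's Comment example.
final class Comment {
    String name = "";
    String message = "";

    Comment() {}
}

// Hypothetical stand-in for a "gadget": a class already on the classpath
// whose construction has a side effect.
final class SideEffectGadget {
    static int timesConstructed = 0;

    SideEffectGadget() {
        timesConstructed++; // a real gadget would do something harmful here
    }
}

// Hypothetical toy restorer, assumed for illustration only.
final class ToyRestorer {

    // Naive: instantiates whatever class the untrusted input names.
    static Object restoreNaively(String classNameFromInput) {
        try {
            return Class.forName(classNameFromInput)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot restore " + classNameFromInput, e);
        }
    }

    // Stricter: rejects any class name outside the allow-list before using reflection.
    static Object restoreFromAllowList(String classNameFromInput, Set<String> allowed) {
        if (!allowed.contains(classNameFromInput)) {
            throw new IllegalArgumentException("class not allowed: " + classNameFromInput);
        }
        return restoreNaively(classNameFromInput);
    }
}
```

With the naive method, passing the gadget's class name runs its constructor. With the allow-list variant and an allow-list containing only the Comment class name, the same input is rejected before any attacker-chosen class is touched.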
Wait a second, this sounds awfully familiar. We've dealt with this type of vulnerability many times before. Not just in Java, but in PHP as well. This entire exploit can be summed up with one phrase - Object Injection.
Many WordPress plugins serialize entire objects into the database. With the help of SQL injection techniques, an attacker can tamper with these stored objects so that restoring them runs code paths the developer never intended. PHP magic methods such as __wakeup, which run automatically when an object is unserialized, are particularly exploited.
Similar to how Struts is now patched, multiple layers of PHP have been fortified, from the application plugin code, to frameworks, to the PHP execution core. Sometimes these vulnerabilities resurface and after some incidents they are fixed again.
As a safety measure, we have altogether abandoned the use of serialize and unserialize functions in our coding practice. Instead, we use json_encode and json_decode to avoid mixing in function code.
But how should the rest of the developer world react? Object-oriented programming and serialization are their way of life. A "class" or "prototype" neatly organizes a programmer's thoughts. The actual plumbing is left to layers of libraries and abstraction. Even when facing a more hands-on inter-process communication task, the programmer is likely to dump out the entire object and expect to get it back on the receiving end.
What further propagates this Lazy Serialization mindset is the use of MVC in web frameworks. We have previously discussed the issues of MVC, one of them being that the Model does not know what the View wants and blindly provides all the information. In fact, we once audited an Angular-Node project where a Node.js backend was the Model and the Angular frontend, the View, rendered the JSON objects. We found that sensitive database entries were included in the JSON responses even though they were never displayed.
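One way to avoid this kind of leak is to build each response from an explicit list of fields instead of dumping the whole model object. The sketch below is in Java to match the earlier example, not the Node.js code we audited; UserRecord, UserView and the field names are hypothetical and chosen only to illustrate the idea.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model row; the field names are illustrative, not from the audited project.
final class UserRecord {
    final String displayName;
    final String email;
    final String passwordHash;

    UserRecord(String displayName, String email, String passwordHash) {
        this.displayName = displayName;
        this.email = email;
        this.passwordHash = passwordHash;
    }
}

// Hypothetical view-side projection: each page names the fields it needs.
final class UserView {

    // Returns only what a public profile page displays.
    static Map<String, String> forProfilePage(UserRecord user) {
        Map<String, String> out = new LinkedHashMap<>();
        out.put("displayName", user.displayName);
        // email and passwordHash are deliberately left out of the response.
        return out;
    }
}
```

The map returned by forProfilePage is what would be encoded as JSON and sent to the frontend. A field added to UserRecord later stays out of the response until someone deliberately adds it to the projection.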
We must stop the Lazy Serialization mentality or, in a broader sense, stop being lazy. OOP and MVC can be powerful when used correctly. But their power should be treated as a convenience, not as a license for ignorance. There is a cost associated with data transfer. Be aware of that cost and move around only what is needed.