Let’s talk about how Truxton uses a database and depots (D&D) to manage data in your digital forensics investigation. If you thought D&D meant something else (http://dnd.wizards.com/), you’re probably a forensic technician or computer programmer. Truxton puts data into three categories: meta-data, raw data, and something in-between we call snippets.
Meta-data. This is the stuff that is actively searched, and it is stored in a SQL database. Truxton uses PostgreSQL (https://www.postgresql.org/) for the database because of its stability, cost, and maturity, and we use it in a somewhat unique way in order to scale to very large volumes of data. Things in the database include filenames, file content locations, phone numbers, geographic coordinates, and so on: the items of information that investigators care about and will search for.
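To make the idea of "searchable meta-data" concrete, here is a minimal sketch of a case database that records where each file's contents live and which phone numbers were found in it. It assumes a hypothetical schema (the `file` and `phone_number` tables and their columns are invented for illustration) and uses Python's built-in `sqlite3` instead of PostgreSQL so it runs without a server; it does not reproduce Truxton's actual schema or the "somewhat unique" way it uses PostgreSQL.

```python
import sqlite3

# Hypothetical schema for illustration only. Truxton uses PostgreSQL and its
# real schema is not described here; sqlite3 keeps the sketch self-contained.
SCHEMA = """
CREATE TABLE file (
    id             INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    depot_id       INTEGER NOT NULL,  -- which depot holds the contents
    content_offset INTEGER NOT NULL,  -- byte offset of the contents in that depot
    content_length INTEGER NOT NULL   -- size of the contents in bytes
);
CREATE TABLE phone_number (
    file_id INTEGER NOT NULL REFERENCES file(id),
    number  TEXT NOT NULL
);
-- Index the columns investigators search by.
CREATE INDEX idx_file_name ON file(name);
CREATE INDEX idx_phone_number ON phone_number(number);
"""


def open_case_db() -> sqlite3.Connection:
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def files_with_phone_number(conn: sqlite3.Connection, number: str) -> list[str]:
    """Return the names of files in which a phone number was found."""
    rows = conn.execute(
        "SELECT DISTINCT f.name FROM file AS f "
        "JOIN phone_number AS p ON p.file_id = f.id "
        "WHERE p.number = ? ORDER BY f.name",
        (number,),
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = open_case_db()
    # Made-up rows, for illustration only.
    conn.executemany(
        "INSERT INTO file VALUES (?, ?, ?, ?, ?)",
        [(1, "contacts.vcf", 0, 0, 512), (2, "notes.txt", 0, 512, 128)],
    )
    conn.executemany(
        "INSERT INTO phone_number VALUES (?, ?)",
        [(1, "555-0100"), (2, "555-0100"), (2, "555-0199")],
    )
    print(files_with_phone_number(conn, "555-0100"))  # ['contacts.vcf', 'notes.txt']
```

Note that the database only says where the bytes are (`depot_id`, `content_offset`, `content_length`); the bytes themselves stay out of it.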
Raw data is stored in big files we call depots. Files and free space from seized media are placed in these depots one after another, and the database holds the information that maps file contents to locations in a depot. This is a well-worn technique for storing billions of blobs (I’ve used it for almost 20 years now). Facebook uses it in their Haystack system (https://code.facebook.com/posts/685565858139515/needle-in-a-haystack-efficient-storage-of-billions-of-photos/), and Amazon uses it in their S3 storage (https://en.wikipedia.org/wiki/Amazon_S3). The technique balances efficient storage of many things against efficient management of disk space. When you load up a file system with billions of files, finding a file by name becomes challenging; the file system becomes a really bad database. Truxton can put a million files into a depot, but the underlying file system only has to manage one file. That means the file system can find the depot quickly and has far fewer things to manage. It is also far easier to move one file than a million.
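The core of the depot technique fits in a few lines: append each blob to the end of one big file and remember its offset and length. The sketch below assumes a hypothetical `Depot` class and `Extent` record and uses a plain dict where Truxton would use the database; it illustrates the general append-and-index approach, not Truxton's on-disk depot format.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Where one blob lives inside a depot (hypothetical record layout)."""
    offset: int
    length: int


class Depot:
    """Append-only container: many blobs, one file on disk.

    A minimal sketch of the general technique, not Truxton's depot format.
    """

    def __init__(self, path: str):
        self.path = path
        # Create the file if needed; append mode never truncates existing data.
        open(path, "ab").close()

    def append(self, data: bytes) -> Extent:
        """Write data at the end of the depot and return its location."""
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(data)
        return Extent(offset, len(data))

    def read(self, extent: Extent) -> bytes:
        """Read back exactly the bytes described by an extent."""
        with open(self.path, "rb") as f:
            f.seek(extent.offset)
            data = f.read(extent.length)
        if len(data) != extent.length:
            raise ValueError("depot is shorter than the recorded extent")
        return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        depot = Depot(os.path.join(tmp, "case.depot"))
        # A dict stands in for the database that maps files to depot locations.
        index = {}
        for name, contents in [("a.txt", b"hello"), ("b.jpg", b"\xff\xd8\xff\xe0")]:
            index[name] = depot.append(contents)
        print(index["b.jpg"])              # Extent(offset=5, length=4)
        print(depot.read(index["a.txt"]))  # b'hello'
```

Reopening the file on every call keeps the sketch simple; a real store would more likely keep a handle open and batch writes. Either way, the file system sees one file no matter how many blobs go in.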
Snippets are information that is difficult to extract and only sometimes needed. They won’t be searched for, but they are still useful. Truxton snippets are XML files stored in depots. Truxton uses snippets for file stitching (carving highly fragmented files), NT login password hash extraction, the list of installed applications, etc. We chose XML because it fits our goal that users should be able to extend Truxton using any programming or scripting language, and XML is supported by nearly everything. For example, a snippet supporting NT login password hashes would be the “syskey”, which is derived from the class names of the JD, Skew1, GBG and Data registry keys (http://moyix.blogspot.com/2008/02/syskey-and-sam.html). Anyone who wanted these values would normally have to do a deep parse of Windows registry files, which is a difficult thing to do. Parsing XML is far easier.
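As an illustration of why XML makes snippets easy to consume, here is a sketch that writes and reads a syskey snippet with Python's standard `xml.etree.ElementTree`. The element and attribute names (`snippet`, `type="syskey"`, `field`, `name`) are hypothetical, since the post does not describe Truxton's snippet schema, and the values in the test are placeholders rather than real registry data. Turning the four class names into the actual syskey requires a further descrambling step, described in the linked post, which is omitted here.

```python
import xml.etree.ElementTree as ET

# The four registry keys whose class names feed the syskey.
SYSKEY_FIELDS = ("JD", "Skew1", "GBG", "Data")


def make_syskey_snippet(class_names: dict[str, str]) -> str:
    """Serialize the four class names as a (hypothetical) XML snippet."""
    root = ET.Element("snippet", type="syskey")
    for field in SYSKEY_FIELDS:
        ET.SubElement(root, "field", name=field).text = class_names[field]
    return ET.tostring(root, encoding="unicode")


def read_syskey_snippet(xml_text: str) -> dict[str, str]:
    """Parse the snippet back into a {key name: class name} dict."""
    root = ET.fromstring(xml_text)
    if root.get("type") != "syskey":
        raise ValueError("not a syskey snippet")
    return {f.get("name"): f.text or "" for f in root.findall("field")}


if __name__ == "__main__":
    # Placeholder values, not taken from a real registry.
    xml_text = make_syskey_snippet(
        {"JD": "11111111", "Skew1": "22222222", "GBG": "33333333", "Data": "44444444"}
    )
    print(xml_text)
    print(read_syskey_snippet(xml_text)["Skew1"])  # 22222222
```

A consumer written in any other language with an XML parser can read the same snippet without touching the registry hive format.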
When Truxton processes incoming media, it stores file names, dates, hashes, content locations, etc. in the database. Bulk data such as file contents goes into depots. When the desktop displays a photograph found on the media, all of its meta-data comes from the database and the image itself comes from the depot. Snippets are a key ingredient in non-linear-time processing (I’ll blog about that later) and in report generation.
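The ingest-then-display flow can be sketched end to end: on ingest, append the contents to a depot and record the name, hash, and location in the database; on display, look up the meta-data first and then read the bytes from the depot. The functions `ingest` and `load_for_display` and the `file` table are hypothetical names for illustration, `sqlite3` again stands in for PostgreSQL, and re-hashing on read is an optional sanity check added here, not something the post says Truxton does.

```python
import hashlib
import os
import sqlite3
import tempfile

# Hypothetical table: meta-data lives here, contents live in the depot.
SCHEMA = """
CREATE TABLE file (
    name           TEXT PRIMARY KEY,
    sha256         TEXT NOT NULL,
    content_offset INTEGER NOT NULL,
    content_length INTEGER NOT NULL
)
"""


def ingest(conn: sqlite3.Connection, depot_path: str, name: str, contents: bytes) -> None:
    """Append contents to the depot and record meta-data plus location."""
    with open(depot_path, "ab") as depot:
        depot.seek(0, os.SEEK_END)
        offset = depot.tell()
        depot.write(contents)
    conn.execute(
        "INSERT INTO file (name, sha256, content_offset, content_length) "
        "VALUES (?, ?, ?, ?)",
        (name, hashlib.sha256(contents).hexdigest(), offset, len(contents)),
    )


def load_for_display(
    conn: sqlite3.Connection, depot_path: str, name: str
) -> tuple[dict[str, str], bytes]:
    """Fetch meta-data from the database, then the bytes from the depot."""
    row = conn.execute(
        "SELECT sha256, content_offset, content_length FROM file WHERE name = ?",
        (name,),
    ).fetchone()
    if row is None:
        raise KeyError(name)
    sha256, offset, length = row
    with open(depot_path, "rb") as depot:
        depot.seek(offset)
        contents = depot.read(length)
    # Optional check that the recorded location still points at the right bytes.
    if hashlib.sha256(contents).hexdigest() != sha256:
        raise ValueError(f"hash mismatch for {name}")
    return {"name": name, "sha256": sha256}, contents


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        depot_path = os.path.join(tmp, "case.depot")
        conn = sqlite3.connect(":memory:")
        conn.execute(SCHEMA)
        ingest(conn, depot_path, "notes.txt", b"meet at noon")
        ingest(conn, depot_path, "photo.jpg", b"\xff\xd8 placeholder image bytes")
        meta, image = load_for_display(conn, depot_path, "photo.jpg")
        print(meta["name"], len(image))
```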