Databases and Depots

Let’s talk about how Truxton uses databases and depots (D&D) to manage data in your digital forensics investigation. If you thought D&D meant something else (http://dnd.wizards.com/), you’re probably a forensic technician or computer programmer. Truxton puts data into three categories: meta-data, raw data, and something in-between we call snippets.

Meta-data. This is the stuff that is actively searched, and it is stored in a SQL database. Truxton uses PostgreSQL (https://www.postgresql.org/) for the database because of its stability, cost, and maturity. We use it in a somewhat unusual way in order to go big. Things in the database include filenames, file content locations, phone numbers, geographic coordinates, etc.: the items of information that investigators care about and will search for.
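To make "actively searched meta-data" concrete, here is a minimal sketch. It uses Python's built-in SQLite as a stand-in for PostgreSQL, and the `entity` table, its columns, and the sample values are all hypothetical; the post does not describe Truxton's actual schema.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL. The table and column names
# are hypothetical, not Truxton's real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE entity (
           id INTEGER PRIMARY KEY,
           kind TEXT NOT NULL,
           value TEXT NOT NULL,
           source_file TEXT NOT NULL
       )"""
)
# Index the searchable column so lookups do not scan the whole table.
conn.execute("CREATE INDEX entity_value_idx ON entity (value)")

# Made-up sample items of the kind investigators search for.
conn.executemany(
    "INSERT INTO entity (kind, value, source_file) VALUES (?, ?, ?)",
    [
        ("phone", "+1-703-555-0100", "/Users/a/contacts.db"),
        ("filename", "IMG_0042.JPG", "/DCIM/100APPLE/IMG_0042.JPG"),
        ("geo", "38.9696,-77.3861", "/DCIM/100APPLE/IMG_0042.JPG"),
    ],
)

def search(value):
    """Return (kind, source_file) pairs for an exact value match."""
    return conn.execute(
        "SELECT kind, source_file FROM entity WHERE value = ?", (value,)
    ).fetchall()
```

Note that each row records where the item came from, so a search hit can always be traced back to a source file.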

Raw data is stored in big files we call depots. Files and free space from seized media are placed in these depots one after another. The database contains the information that maps file contents to locations in a depot. This is a well-worn technique (I’ve used it for almost 20 years now) for storing billions of blobs. Facebook uses it in their Haystack system (https://code.facebook.com/posts/685565858139515/needle-in-a-haystack-efficient-storage-of-billions-of-photos/); Amazon uses it in their S3 storage (https://en.wikipedia.org/wiki/Amazon_S3). The technique balances the efficient storage of many things with the efficient management of disk space. When you load up a file system with billions of files, finding a file by name becomes challenging; the file system becomes a really bad database. Truxton can put a million files into a depot, but the underlying file system only has to manage one file. This means the file system can find the depot file quickly and has far fewer things to manage. It is also far easier to move one file than a million.
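The depot idea can be sketched in a few lines: blobs are appended one after another to a single file, and each append returns an (offset, length) location that would be recorded in the database. This is a simplified illustration under that assumption only; the `Depot` class is hypothetical, and a real depot would also need durability, concurrency control, and integrity checks.

```python
import os
import tempfile

class Depot:
    """Hypothetical append-only blob store: many blobs, one file on disk."""

    def __init__(self, path):
        self.path = path
        # Create the depot file if it does not exist yet.
        open(self.path, "ab").close()

    def append(self, data: bytes):
        """Append a blob to the end of the depot; return its (offset, length)."""
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(data)
        return offset, len(data)

    def read(self, offset: int, length: int) -> bytes:
        """Read one blob back using a location stored elsewhere (the database)."""
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length)

depot_path = os.path.join(tempfile.mkdtemp(), "depot_0001.dat")
depot = Depot(depot_path)

# A dict stands in for the database table that maps files to depot locations.
index = {}
index["a.txt"] = depot.append(b"hello")
index["b.txt"] = depot.append(b"world")
```

The file system only ever sees `depot_0001.dat`, no matter how many blobs are inside it.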

Snippets are information that is difficult to get and only sometimes needed. They won’t be searched for, but they are still useful. Truxton snippets are XML files stored in depots. Truxton uses snippets in file stitching (highly fragmented file carving), NT login password hash extraction, installed-application lists, etc. We chose XML because it fits our goal that users should be able to extend Truxton using any programming or scripting language, and XML is supported by nearly everything. For example, a snippet in support of NT login password hashes would hold the “syskey”, which is derived from the class names of the JD, Skew1, GBG and Data registry keys (http://moyix.blogspot.com/2008/02/syskey-and-sam.html). If someone wanted access to these values, they would normally have to do a deep parse of Windows registry files, which is a difficult thing to do. Parsing XML is far easier.
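To show why parsing XML is the easy path, here is a sketch of a consumer reading a syskey snippet with Python's standard library. The snippet layout (`<snippet>`, `<key name=... class=...>`) is a hypothetical format invented for this example, and the hex strings are made-up placeholders, not real registry class names.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet layout; the element and attribute names are assumptions.
# The hex values are placeholders, not real key class names.
SNIPPET = """<snippet type="syskey">
  <key name="JD" class="a1b2c3d4"/>
  <key name="Skew1" class="e5f60718"/>
  <key name="GBG" class="293a4b5c"/>
  <key name="Data" class="6d7e8f90"/>
</snippet>"""

def read_syskey_parts(xml_text):
    """Return {key name: class string} in document order."""
    root = ET.fromstring(xml_text)
    return {k.get("name"): k.get("class") for k in root.findall("key")}

parts = read_syskey_parts(SNIPPET)
```

Any language with an XML parser can do the same, with no registry-hive parsing involved.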

When Truxton processes incoming media, it will store file names, dates, hashes, content location, etc. in the database. Big data like file contents go into depots. When the desktop displays a photograph found in media, all of the data items come from the database, and the image comes from the depot. Snippets are a key ingredient in non-linear-time processing (I’ll blog about that later) and report generation.
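The split described above (searchable fields in the database, bytes in a depot) can be sketched end to end. This assumes an in-memory SQLite table in place of PostgreSQL and a `BytesIO` buffer in place of a depot file; the `file` table and the `ingest`/`display` functions are hypothetical names for illustration.

```python
import hashlib
import io
import sqlite3

# Hypothetical sketch: SQLite stands in for PostgreSQL, BytesIO for a depot file.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE file (name TEXT, sha256 TEXT, depot_offset INTEGER, length INTEGER)"
)
depot = io.BytesIO()

def ingest(name, content):
    """Store searchable metadata in the database and the bytes in the depot."""
    depot.seek(0, io.SEEK_END)
    offset = depot.tell()
    depot.write(content)
    digest = hashlib.sha256(content).hexdigest()
    db.execute(
        "INSERT INTO file VALUES (?, ?, ?, ?)", (name, digest, offset, len(content))
    )

def display(name):
    """Metadata comes from the database; the content comes from the depot."""
    sha256, offset, length = db.execute(
        "SELECT sha256, depot_offset, length FROM file WHERE name = ?", (name,)
    ).fetchone()
    depot.seek(offset)
    return {"name": name, "sha256": sha256, "content": depot.read(length)}

ingest("notes.txt", b"meet at noon")
ingest("photo.jpg", b"\xff\xd8 jpeg bytes")
```

Only the final `depot.read` touches raw data; everything else a viewer shows can come from the database.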

Truxton’s Digital Forensics World View

Written by Sam Blackburn


Truxton believes that information should get into a digital forensic investigator’s brain as quickly as possible in a format they can understand. Every piece of Truxton was designed and implemented with this in mind. “Information” – Go one step beyond raw data and extract the information from it. “Quickly” – Don’t just process data at a screaming rate, thoroughly exploit it and add new forensic techniques with minimal effort. “Format” – Present the information in a context that is easily understood. Geographical data is most easily understood on a map, and juries rarely understand buzzwords.

Technology always outpaces the law, both sides of the law. Bad guys innovate, so do good guys. Law and policy always lag behind for various reasons (https://www.youtube.com/watch?v=tyeJ55o3El0). Digital forensics teams generally follow the ideas pioneered by medical labs. A specimen comes in, it is processed with sterile equipment, test results are analyzed, findings are documented, the specimen is stored away, and the equipment is sterilized. No risk of contamination. Data analysts, on the other hand, want everything from everywhere. They have a real fear that what they are trying to learn is sitting in some data they weren’t allowed to have. The law says you have to have a warrant to gather data. You don’t get access to the entire media, just portions of it. This sucks. Truxton is designed to handle ALL of the data from every investigation for all time. An analyst’s dream, a technician’s nightmare.

How can this divide be crossed? Well, Truxton of course. No surprise there. I mean, hey, you’re reading my blog.

Forensic technicians load the media into Truxton, perform their magic, apply warrant restrictions, then export the findings and the data investigators are allowed to have. These results are then imported into the investigator’s Truxton. This one-way transfer can take place via DVDs, thumb drives, etc. The investigators are the keepers of corporate knowledge, not the forensic technicians.
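Here is a minimal sketch of the technician-to-investigator handoff, assuming warrant restrictions can be expressed as a list of directories on the seized media. The findings, the `within_warrant`/`export_findings`/`import_findings` helpers, and the JSON-lines format are all hypothetical; the post does not describe Truxton's real export format.

```python
import json
from pathlib import PurePosixPath

# Hypothetical findings from one piece of seized media.
FINDINGS = [
    {"path": "/Users/suspect/Documents/ledger.csv", "kind": "file"},
    {"path": "/Users/suspect/Photos/IMG_0001.JPG", "kind": "file"},
    {"path": "/Users/spouse/Documents/diary.txt", "kind": "file"},
]

def within_warrant(path, allowed_roots):
    """True if path is, or lies under, one of the directories the warrant covers."""
    p = PurePosixPath(path)
    for root in map(PurePosixPath, allowed_roots):
        if p == root or root in p.parents:
            return True
    return False

def export_findings(findings, allowed_roots):
    """Technician side: keep only in-scope findings, one JSON object per line."""
    return "\n".join(
        json.dumps(f) for f in findings if within_warrant(f["path"], allowed_roots)
    )

def import_findings(exported_text):
    """Investigator side: read the exported lines back into records."""
    return [json.loads(line) for line in exported_text.splitlines() if line]

exported = export_findings(
    FINDINGS, ["/Users/suspect/Documents", "/Users/suspect/Photos"]
)
imported = import_findings(exported)
```

Filtering happens before export, so out-of-scope items never reach the investigator's side of the one-way transfer.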

For a rather short time, I was the head of software for a product called Drugfire (https://en.wikipedia.org/wiki/Drugfire). It matched bullets from crime scenes with the weapons that fired them. The beauty was that we could produce “cold hits”, where seemingly unrelated cases could be linked based on ballistics (well, scratches really). We could do this because the images of the bullets were shared amongst customers or already in a central database. In the digital forensics world, there’s no equivalent. Everything is siloed. Since Truxton keeps the metadata from digital media in a database, it can be shared by exporting from one Truxton and importing into another. No file contents required. Let me say that again, NO FILE CONTENTS REQUIRED. Truxton’s exploitation of the media produces INFORMATION, not data: phone numbers, serial numbers, geographic coordinates, etc., and where the information came from. The information is correlated, and any “cold hits” can be traced back to the source file in the media. Then, and only then, are file contents needed.
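The "no file contents required" idea can be illustrated as follows: export entities with their provenance but strip the bytes, then match an importing instance's entities against them by value. The record fields, case IDs, and the `export_metadata`/`cold_hits` helpers are hypothetical; this is a sketch of the concept, not Truxton's exchange format.

```python
import json

# Hypothetical extracted entities. "content" stands for file bytes that must
# never leave this instance; every other field is shareable metadata.
ENTITIES = [
    {"kind": "password", "value": "correct-horse-battery-staple",
     "case": "HRN-001", "source_file": "/data/wifi.conf",
     "content": b"<file bytes>"},
    {"kind": "phone", "value": "+1-703-555-0100",
     "case": "HRN-001", "source_file": "/data/contacts.db",
     "content": b"<file bytes>"},
]

SHAREABLE_FIELDS = ("kind", "value", "case", "source_file")

def export_metadata(entities):
    """Serialize entities with provenance but WITHOUT any file contents."""
    return json.dumps([{f: e[f] for f in SHAREABLE_FIELDS} for e in entities])

def cold_hits(local_entities, imported_json):
    """Pair local entities with imported ones that share the same value.

    Each hit keeps both provenances, so it can be traced back to the source
    files before anyone asks for the contents.
    """
    by_value = {}
    for e in json.loads(imported_json):
        by_value.setdefault(e["value"], []).append(e)
    hits = []
    for e in local_entities:
        for other in by_value.get(e["value"], []):
            hits.append(((e["case"], e["source_file"]),
                         (other["case"], other["source_file"])))
    return hits

exported = export_metadata(ENTITIES)
LOCAL = [{"kind": "wifi_key", "value": "correct-horse-battery-staple",
          "case": "OTM-007", "source_file": "/etc/hostapd.conf"}]
hits = cold_hits(LOCAL, exported)
```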

By sharing or centralizing the entities that Truxton extracts, we can automatically link a long password found on a phone in Herndon, VA with the password of a WiFi access point in Ottumwa, IA. One of the views in Truxton shows you everything in your current investigation that also occurs in any other investigation.
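One way to think about the "occurs in any other investigation" view is as an inverted index from entity value to the set of cases containing it. The sketch below assumes each case is just a set of extracted values; the case names and helpers are hypothetical. In a SQL database the same question would more naturally be answered with a join or `GROUP BY` over an entity table.

```python
from collections import defaultdict

def build_index(cases):
    """Map each entity value to the set of case IDs it appears in."""
    index = defaultdict(set)
    for case_id, values in cases.items():
        for v in values:
            index[v].add(case_id)
    return index

def shared_with_other_cases(current_case, cases):
    """Return {value: other case IDs} for values that also occur elsewhere."""
    index = build_index(cases)
    shared = {}
    for v in cases[current_case]:
        others = index[v] - {current_case}
        if others:
            shared[v] = others
    return shared

# Hypothetical cases, echoing the Herndon and Ottumwa example above.
CASES = {
    "herndon-phone": {"correct-horse-battery-staple", "+1-703-555-0100"},
    "ottumwa-router": {"correct-horse-battery-staple", "00:11:22:33:44:55"},
    "unrelated": {"+1-202-555-0123"},
}
```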

But alas, dear reader, a policy may (will?) prevent us from reaching investigative Shangri-La. What can we do in the meantime? Truxton has an automated alert system. Alerts are generated from BOLOs (“be on the lookout” notices), which contain investigator contact information, a case description, and search criteria. Every time a piece of media is loaded, Truxton goes through its list of BOLOs. When an alert is generated, it contains everything you need to contact the person who entered the BOLO. In the case above, an investigator would have to create a BOLO for the particular password they were interested in, then distribute that BOLO to all other running instances of Truxton. This greatly reduces the chance of a true “cold hit”, but it remains a useful feature.
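The BOLO check described above can be sketched like this: each BOLO carries contact details, a case description, and search criteria, and every newly loaded piece of media is run against all of them. The `Bolo` and `Alert` classes, the use of regular expressions as the search criteria, and the sample contact details are assumptions made for illustration.

```python
import re
from dataclasses import dataclass

@dataclass
class Bolo:
    """Hypothetical BOLO: who to contact, what case, what to look for."""
    investigator: str
    contact: str
    case_description: str
    patterns: list  # regular expressions matched against extracted values

@dataclass
class Alert:
    bolo: Bolo          # carries everything needed to contact the BOLO's author
    matched_value: str
    media_id: str

def check_bolos(media_id, extracted_values, bolos):
    """Run every BOLO against the values extracted from newly loaded media."""
    alerts = []
    for bolo in bolos:
        compiled = [re.compile(p) for p in bolo.patterns]
        for value in extracted_values:
            if any(rx.fullmatch(value) for rx in compiled):
                alerts.append(Alert(bolo, value, media_id))
    return alerts

# Made-up BOLO and media contents.
bolos = [Bolo("Det. A. Example", "a.example@pd.example.gov",
              "Burglary ring, Herndon VA", [r"correct-horse-battery-staple"])]
alerts = check_bolos("ottumwa-router-1",
                     ["admin", "correct-horse-battery-staple"], bolos)
```

Because the alert embeds the whole BOLO, whoever receives it has the contact information without a separate lookup.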

So there you have it: Truxton believes all digital forensic investigators should have access to all information for all time and be able to correlate it with all investigations throughout the nation. But we realize that policies will restrict this, either by keeping data unsharable or by automatically aging the data. But at least there is symmetry.