Friday, August 19, 2011

Fuzzy Hashing and E-Discovery

Recent work has made me consider an interesting role fuzzy hashes could play in E-Discovery.

In the last year I've worked a few intellectual property theft cases where Company A has sued Company B claiming Company B stole IP from Company A in the form of documents, design drawings, spreadsheets, contracts, etc.

In these cases Company A has requested that Company B turn over all documents that may pertain to Company A or Company A's work product, etc. with specific search terms provided and so on.

Company B argues they can't comply with Company A's request because they have documents relating to Company A and Company A's work product as a result of market research for the purposes of strategic planning and that turning over all of those documents would damage Company B.

In such cases, if Company A is concerned that Company B has stolen specific documents, maybe a better approach would be to request that Company B run ssdeep or another fuzzy hashing tool against all of their documents and turn over the fuzzy hashes.

Company A can then review the fuzzy hash results from Company B without knowing anything about the documents those hashes came from. They can compare the set of hashes provided by Company B against the set of fuzzy hashes generated from their own documents and make an argument to the judge to compel Company B to turn over those documents that match beyond a certain threshold.

24:DZL3MxMsqTzquAxQ+BP/te7hMHg9iGCTMyzGVmZWImQjXIvTvT/X7FJf8XLVw:J3oy+x/te7qmNmlYvX/xp8W

Sunday, August 14, 2011

Facebook Artifact Parser

If you have a Facebook account, take a look under the hood some time by viewing the source in your browser while you're logged in. Imagine having to deal with all of that for a digital forensics investigation. It's mind numbing, especially if all you want is who said what and when. I spent the better part of today brushing up on Python's regular expression implementation and put together this Facebook Artifact Parser that does a decent job of parsing through Facebook artifacts found on disk (as of the time of this writing).

In my case, I made use of this by first recovering several MB worth of Facebook artifacts from disk and I combined all of these elements into one file. Having done that, run this script from the command line giving the name of the file as the only argument. It works on multiple files as well.

Sunday, August 7, 2011

Yahoo! Messenger Decoder Updated

I'm working yet another case that involves Yahoo! Messenger Archives. I tried using JAD Software's excellent Internet Evidence Finder for this and it worked pretty well, but in the interest of double-checking my tools, I brushed off my old yahoo_msg_decoder.py script that I'd written a few years ago. It used to be interactive, meaning it was run with no arguments and would prompt for a username and a filename to parse, this was less than ideal for parsing a large number of files.

I have remedied that situation. The script now takes three arguments, one optional. The first is the username for the archive. Yahoo! Messenger Archives are xor'd with the username. The second argument is the name of the other party to the conversation and the third argument is the name of the dat file to process.

The nice thing about this is that you can now create a for loop like the following from a Linux environment and parse multiple files at once:

for i in $(ls *.dat); do echo; echo "== Parsing $i =="; yahoo_msg_decoder.py --username=joebob --other_party=billybob --file=$i; echo "== Finished parsing $i =="; echo; done


The output of this for loop can be redirected to a file.

My script is still not perfect. On some dat files it doesn't properly xor the data and yields garbage. I have not determined why that is the case yet.

As for IEF, I'm not sure why, but running it over the same dat files as my script, it dropped some portions of the conversation. I will be reporting the issue to JAD. But it's yet another reminder of the importance of testing your tools and confirming results.

update: After posting this, I remembered that Jeff Bryner had written a utility for this and it is still vastly superior to my own. I just verified that the link I have to his yim2text still works. Check it out.

Other thoughts from Lean In

My previous posts in this series have touched on the core issues that Sheryl Sandberg addresses in her book  Lean In: Women, Work, and the W...