Monday, April 25, 2011

Scalpel and Foremost

The crew over at Digital Forensics Solutions announced the release of a new version of Scalpel with some exciting new features. Check out their post for the full details, but here are the three I was most interested in:

  • Parallel architecture to take full advantage of multicore processors

  • Beta support for NVIDIA CUDA-based GPU acceleration of header / footer searches

  • An asynchronous IO architecture for significantly faster IO throughput


    Digital forensics is time consuming so any speed gains we can make are welcome ones.

    Over the last few days, I've had a chance to play with the new version of scalpel on my 64-bit Ubuntu system with 7GB of RAM. I downloaded the source and followed the directions in the readme to configure and compile the binary.

    I then ran some carves against a 103GB disk image from a recent case. The command line I used was:

    scalpel -b -c /etc/scalpel.conf -o scalpel-out/ -q 4096 sda1.dd

    The -q option is similar to foremost's -q option: it tells scalpel to look for the header values specified in the config file only at the start of each cluster. In my test, I used the two doc file signatures in the example config file supplied with scalpel. The nice thing about scalpel's -q is that you can provide the cluster size. With foremost, -q scans the start of each sector by default, so you also have to add -b with the cluster size to get similar functionality out of foremost.

    I ran scalpel with the Linux time command so I could determine how long the command took to complete. Scalpel carved 6464 items that had byte signatures matching those in the configuration file. According to the time command, this took 52 minutes and 40 seconds.
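For anyone who wants a record of the run time without relying on the shell's built-in time output, here is a minimal sketch of a wrapper. The name run_timed is hypothetical and not part of scalpel or foremost, and it assumes a date command that supports +%s (GNU and BSD date both do). It only measures wall-clock time, which is what the figures in this post refer to.

```shell
#!/bin/sh
# run_timed is a hypothetical helper for illustration only.
# It runs the given command, then reports elapsed wall-clock time on stderr
# and passes the command's exit status through.
run_timed() {
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    elapsed=$((end - start))
    printf 'elapsed: %dm %ds\n' $((elapsed / 60)) $((elapsed % 60)) >&2
    return "$status"
}

# Example with the command line used above. It is commented out so the
# sketch runs even where scalpel and the image are not present:
# run_timed scalpel -b -c /etc/scalpel.conf -o scalpel-out/ -q 4096 sda1.dd
```

Writing the timing to stderr keeps it separate from anything the carver itself prints to stdout.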

    Manually verifying that all 6464 files are Word docs would be time consuming. In lieu of that, I followed Andrew Case's suggestion and used the following command from within the scalpel-out directory:

    for i in $(find . | grep doc$); do file $i; done | grep -i corrupt | wc -l

    The result was that 2707 of the 6464 files were found to be "corrupt" according to the file command. This is not an exact measure of the accuracy of scalpel's work, but it gives us a ballpark figure: 2707 / 6464 works out to a false positive rate of roughly 42%. Just remember, these are rough figures, not exactly scientific.
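The count and the percentage can also be scripted. The sketch below is one way to do it, not the method used for the numbers in this post. Both function names are hypothetical. count_flagged assumes the carved files end in .doc and that the file command describes damaged ones with the word "corrupt", as it did in this test.

```shell
#!/bin/sh
# Both helper names are hypothetical, for illustration only.

# Count .doc files under a directory whose `file` description mentions
# "corrupt". -b drops the file name from the output, so only the
# description is matched. "|| true" keeps a zero count from looking
# like a failure, since grep exits non-zero when nothing matches.
count_flagged() {
    find "$1" -type f -name '*.doc' -exec file -b {} + | grep -ic corrupt || true
}

# Percentage of part within total, to one decimal place.
percent() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", part / total * 100 }'
}

# With the figures reported above:
percent 2707 6464    # prints 41.9
```

Using find with -exec avoids the word splitting that the for loop would do on any path containing whitespace. That is unlikely with carved files, but it costs nothing.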

    Next I configured foremost to use the exact same configuration file options and similar command line arguments (recall I had to use -b with foremost) and ran the carve against the same image. The command line I used was:

    foremost -c /etc/foremost.conf -i sda1.dd -o foremost-out -q -b 4096

    Again, I used the time command to measure how long this took. Foremost finished 47 minutes and 32 seconds later, having carved 6464 files. I used the same measure of accuracy as with scalpel, running the following command from within the foremost-out directory:

    for i in $(find . | grep doc$); do file $i; done | grep -i corrupt | wc -l

    The result was that 2743 files came back as "corrupt" according to the file command. Interesting. Both tools used the exact same signatures and both carved exactly the same number of files, yet foremost flagged 36 more of them as corrupt (about 42.4% of its output versus 41.9% for scalpel), which makes it roughly half a percentage point less accurate by this measure. At 47 minutes compared to scalpel's 52 minutes, though, it was almost 10% faster.
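For anyone checking the arithmetic behind the comparison, here is a short sketch that uses only the figures reported in this post. The helper name to_seconds is hypothetical.

```shell
#!/bin/sh
# to_seconds is a hypothetical helper: minutes and seconds -> total seconds.
to_seconds() {
    echo $(( $1 * 60 + $2 ))
}

scalpel_secs=$(to_seconds 52 40)     # 52m40s
foremost_secs=$(to_seconds 47 32)    # 47m32s

# Time saved by foremost, as a percentage of scalpel's run time.
speedup=$(awk -v s="$scalpel_secs" -v f="$foremost_secs" \
    'BEGIN { printf "%.1f", (s - f) / s * 100 }')

# Gap between the two "corrupt" rates, in percentage points
# (2743 flagged by foremost, 2707 by scalpel, out of 6464 carved files each).
gap=$(awk 'BEGIN { printf "%.2f", (2743 - 2707) / 6464 * 100 }')

echo "foremost finished ${speedup}% sooner; corrupt-rate gap: ${gap} points"
```

This prints a speed difference of 9.7% and a gap of 0.56 percentage points. Both are small enough that a single run cannot separate the tools.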

    Conclusion:
    It's hard to draw conclusions from one simple test. I think it's great that scalpel is under active development. For those who can take advantage of the CUDA support, it could be a huge win in terms of time, and time is against us these days in the digital forensics world.

    The other big plus is having a second tool that we can use to check the results of the first. I will continue to experiment with scalpel and look forward to future developments. I thank the developers of both tools for their contributions to the community.
