Monday, December 26, 2011

Check the uids and gids

While working on body-outliers, the Python script I wrote to run statistical analysis on fls bodyfiles in an effort to find malicious files in compromised file systems, I noticed something I had been ignoring completely, but that stuck out like a sore thumb when reviewing the data: the user and group IDs of files in Unix and Linux file systems.

When attackers build the kits they intend to drop on remote hosts as backdoors, packet sniffers, key loggers, etc., they often use tar and gzip to create compressed archives of those files. They can then issue a command like wget to download the archive to the compromised host, where they will "untar" the archive and move their malicious binaries into the desired paths on the system.

One of the "features" of tar, as the manpage tells us, is that "by default, newly-created files are owned by the user running tar." This means that if the attacker is logged into his own system as a non-root user and is compiling binaries that will replace legitimate binaries on the target system, those binaries will retain his user and group ID information when they are tar'd up. When the archive is later extracted as root, GNU tar restores the ownership recorded in the archive by default, so those IDs follow the files onto the target. Of course, a careful, thoughtful attacker can take a variety of countermeasures to change this, but many are not so careful.
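To make the point concrete, here is a minimal sketch using Python's standard tarfile module. It builds a small archive in memory and then reads back the uid and gid recorded for each member. The member name kit/sshd and the 1000/1000 ownership are made up for illustration; they are not taken from the case discussed here.

```python
import io
import tarfile


def build_archive(uid, gid):
    """Return the bytes of a gzip'd tar archive holding one file owned by uid/gid."""
    payload = b"#!/bin/sh\necho placeholder\n"
    info = tarfile.TarInfo(name="kit/sshd")  # hypothetical member name
    info.size = len(payload)
    info.mode = 0o755
    info.uid = uid  # ownership is stored in the archive header
    info.gid = gid
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as archive:
        archive.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def member_owners(archive_bytes):
    """Return (name, uid, gid) for every member recorded in the archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as archive:
        return [(m.name, m.uid, m.gid) for m in archive.getmembers()]


if __name__ == "__main__":
    for name, uid, gid in member_owners(build_archive(1000, 1000)):
        print(f"{name}  uid={uid}  gid={gid}")
```

The same member_owners function could be pointed at the bytes of an archive recovered from a compromised host to see what ownership the attacker's build system recorded.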

As a result, when they install malicious code on target systems, there's a chance those binaries will be installed with user IDs and group IDs (henceforth uid and gid) that don't match other files in those locations. These are obvious outliers. As I was working on the next version of body-outliers, I wrote code to calculate the average uid and gid values on a per-directory basis, calculate the standard deviation, then alert on the outliers. But this sort of statistical analysis didn't make sense for uids and gids, because for the most part they are uniform throughout the file system. There are a few exceptions like /tmp, /var/spool/cron, /var/spool/mail and many custom software packages, but many system directories like /dev, /bin, /usr, etc. are uid 0 and gid 0, meaning the files are owned by the root account and belong to the root group. So I modified my code to do another form of statistical analysis: calculating distributions.

Calculating distributions is just fancy talk for counting the occurrences of a thing, say, how many files are uid 0, how many are uid 1000, and so on, then displaying this information. This type of analysis lends itself well to finding oddball uid and gid files in compromised *nix file systems. On the hacked system I spoke of during my SECTor 2011 talk (video, slides), finding these unusual uid and gid files correlates very well to finding attacker code for precisely the reasons described above.
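The counting itself is easy to sketch. The following is not the code from body-ugid-dist.py, just a minimal illustration of the idea. It assumes the pipe-delimited Sleuth Kit 3.x bodyfile layout (MD5|name|inode|mode|UID|GID|size|atime|mtime|ctime|crtime), and the function name id_distribution and the sample lines are made up for the example.

```python
import posixpath
from collections import Counter, defaultdict

# Made-up sample lines in the assumed bodyfile layout:
# MD5|name|inode|mode|UID|GID|size|atime|mtime|ctime|crtime
SAMPLE = [
    "0|/etc/cron.daily/logrotate|2001|r/rrwxr-xr-x|0|0|180|0|0|0|0",
    "0|/etc/cron.daily/man-db|2002|r/rrwxr-xr-x|0|0|1309|0|0|0|0",
    "0|/etc/cron.daily/dnsquery|2003|r/rrwxr-xr-x|1000|1000|4096|0|0|0|0",
]


def id_distribution(lines, meta="uid"):
    """Count uid (or gid) values per directory from bodyfile lines."""
    dist = defaultdict(Counter)
    for line in lines:
        line = line.rstrip("\r\n")
        if "|" not in line:
            continue  # not a bodyfile line
        # Split from the right so a "|" inside the name does not shift fields.
        _md5, rest = line.split("|", 1)
        fields = rest.rsplit("|", 9)
        if len(fields) != 10:
            continue  # malformed line
        name = fields[0]
        if posixpath.basename(name) in (".", ".."):
            continue  # skip the . and .. directory entries
        try:
            owner = int(fields[3] if meta == "uid" else fields[4])
        except ValueError:
            continue  # non-numeric id field
        dist[posixpath.dirname(name) or "/"][owner] += 1
    return dist


if __name__ == "__main__":
    for path, counts in sorted(id_distribution(SAMPLE).items()):
        print("Path: ", path)
        # Rarest ids first, so the oddballs lead each listing.
        for owner, count in sorted(counts.items(), key=lambda kv: kv[1]):
            print(f"Count: {count:7d}  uid: {owner:5d}")
```

Grouping by directory rather than across the whole file system is what makes the oddballs visible: a uid 1000 file is unremarkable under /home but stands out in a directory where everything else is uid 0.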

Here's a sample run of the script, which I'm calling body-ugid-dist.py, against the same bodyfile as the one in the SECTor talk. The output has been trimmed down a bit:
./body-ugid-dist.py --file sda1_bodyfile.txt --meta uid
[+] Checking command line arguments.
[+] sda1_bodyfile.txt may be a bodyfile.
[+] Discarded 0 files named .. or .
[+] Discarded 0 bad lines from sda1_bodyfile.txt.
[+] Added 20268 paths to meta.

...

Path:  /etc/cron.daily
==========================
Count:       1  uid:  1000
Count:       9  uid:     0

...

Path:  /usr/lib
==========================
Count:       1  uid:    10
Count:       1  uid:    37
Count:       1  uid:  1000
Count:    2082  uid:     0

...

In actuality, this script returns 499 lines of output, representing about 350 "Counts," most of which were specific to the custom application running on the system. The overall bodyfile had more than 200 thousand lines, so this is a considerable reduction in data, which is vital to any investigation. What the above output tells us is that of the 10 files in /etc/cron.daily, nine are uid 0 and one is uid 1000. That's a lead that may be worth pursuing and indeed, in this case, it is malicious code. The next entry shows that /usr/lib contains 2085 files, 2082 of them uid 0 and three others that are one-offs and certainly worth looking into. In that case, two of the three are malicious code.
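If even that much output is more than you want to read by eye, one possible next step is to report only the ids that are not the dominant owner in each directory. This is a heuristic of my own for illustration, not something body-ugid-dist.py does, and the function name minority_owners is hypothetical. The counts below are the ones from the sample run above.

```python
from collections import Counter

# Per-directory uid counts taken from the sample run above.
DISTRIBUTION = {
    "/etc/cron.daily": Counter({0: 9, 1000: 1}),
    "/usr/lib": Counter({0: 2082, 10: 1, 37: 1, 1000: 1}),
}


def minority_owners(distribution):
    """Return, per directory, the ids that are not the most common owner."""
    leads = {}
    for directory, counts in distribution.items():
        if not counts:
            continue
        dominant = counts.most_common(1)[0][0]
        others = {owner: n for owner, n in counts.items() if owner != dominant}
        if others:
            leads[directory] = others  # uniform directories are left out
    return leads


if __name__ == "__main__":
    for directory, others in minority_owners(DISTRIBUTION).items():
        print(directory, others)
```

Directories with legitimately mixed ownership, such as /tmp or the mail spool, will still show up, so the result is a list of leads to check rather than a verdict.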

body-ugid-dist.py is available from my github repo. Unfortunately, it's only going to be useful for *nix cases. Running it is quite simple; the usage is shown below:
./body-ugid-dist.py 
usage: body-ugid-dist.py [-h] --file FILENAME [--meta META]

This script parses an fls bodyfile and returns the uid or gid distribution on
a per directory basis.

optional arguments:
  -h, --help       show this help message and exit
  --file FILENAME  An fls bodyfile, see The Sleuth Kit.
  --meta META      --meta can be "uid" or "gid." Default is "uid"
I wrote about this previously for the SANS Digital Forensics Blog. If this kind of analysis interests you, join me for SANS 508: Advanced Computer Forensic Analysis & Incident Response in Phoenix in February of 2012.
