Saturday, July 11, 2015

Cracking repeating XOR key crypto

My last post here, XOR'd play: Normalized Hamming Distance, was a lengthy bit about the reliability of Normalized Hamming Distance to determine the size of a repeating XOR key that was used to encrypt a string of text and was based on my experience working on the Matasano Crypto Challenges at cryptopals.com.

After a few weeks of focusing on other things, I returned to that effort and finished problem six from challenge set one, which says to use Normalized Hamming Distance to determine the probable key size and then use English letter frequency analysis to determine the probable bytes that make up the key. That's the short version.

The script I wrote for challenge applies some extra math to the problem of determining key size, because I found normalized Hamming Distance alone to not be very reliable. However, even with the extra math, the problem is still one of probability, not certainty. If you have no idea about the key size, the problem is even more expansive, though not completely open ended because if it's repeating key, it has to be no more than half the size of the encrypted byte stream.

Let's look at an example. Below I have already populated two variables, $plaintext and $key and am using XOR-Encrypt.ps1 to encrypt the $plaintext value with the $key and return a base64 encoded string. In the second command, I'm populating the $ciphertext variable with that base64 encoded string.

Click to enlarge
Now that we have a string of plaintext encrypted with a repeating XOR key, let's try cracking it.


Click to enlarge
In the screenshot above, I've run the Crack-XORRepeatingKeyCrypto.ps1 script with about the minimum set of command line parameters. The VERBOSE output line tells me that I didn't provide a MaxKeySize value, so the script has calculated the number of bytes in the ciphertext and has set the MaxKeySize to half of that value. If we're dealing with repeating key XOR crypto, the key must be at most half the size of the ciphertext, otherwise the key can't repeat (completely).

The script applies normalized Hamming Distance to find the top n most likely key sizes, five is the default, which you can see listed towards the bottom of the screen. Next, the script calculates the most frequently occurring greatest common denominator, hereafter MFOGCD, of those top n probable key sizes. In my testing, calculating the MFOGCD of the top n normalized average Hamming Distance values can be used to more accurately find the correct key size. This is a good example, as you can see, the correct key size is 30, but its normalized average Hamming Distance is fourth in the list of the top five.

My script has a programmatic bias towards smaller key sizes, especially where they are the MFOGCD. This formula returns the correct key size more than 90% of the time in my testing where I was able to control for the MaxKeySize, whereas normalized average Hamming Distance alone only returns the correct key size about 40% of the time where MaxKeySize is controlled. If MaxKeySize is unknown, your mileage may vary.

After the script determines the most probable key size, it transposes the ciphertext into blocks aligned on the key size byte boundaries. In other words, if the script determines the key size is 30 bytes, the first block will be comprised of ciphertext bytes 0, 29, 59, 89, 119, etc. The next block begins with ciphertext byte 1, then 30, 60, 90, 120, etc. This process repeats until all ciphertext bytes have been allocated to their respective blocks.

Next the script moves into the actual brute-force phase, XORing each byte of the transposed blocks against ASCII printable characters. That's the default, you can apply all bytes 0x00 - 0xFF, via the -includeNonPrintable command line switch. If you think this takes longer, you're right.

After each transposed block is XOR'd with each byte, the English letter frequency calculation is applied and the byte that results in the highest score, is deemed to be the most probable byte for that position in the key and the process repeats for each byte in the key.

Once all the likely key bytes have been determined, the original ciphertext is XOR'd against that key and the likely plaintext is returned. You can see this above in the ProbableDecryptedValue property of the output. At first glance, the output looks pretty good, but if you read it carefully, you'll see it's not exactly correct. Remember we're dealing with probabilities.

If you look at that ProbableKey property, you can see the probable key and you may be able to guess which byte is incorrect. You could use the XOR-Decrypt.ps1 script, which can take a user supplied key string and decrypt the ciphertext if you want to be more exact. Without a key, XOR-Decrypt.ps1, assumes a single byte key and attempts to brute force it, but I digress.

So that's the script when it's working well, but the probabilities sometimes don't come out so nicely. Here's another example:


Click to enlarge
If you look at the ProbableDecryptedValue property here, you can plainly see this does not look like English plaintext. What happened? The script worked as written. It calculated the MFOGCD among the top five most probable key sizes, but that value, 16, was not also listed in the top five most probable key sizes, so it didn't try 16 as a key size. Instead, it selected the smallest key among the keys with the smallest normalized average Hamming Distance, 416 and that turned out to be horribly wrong, but all is not lost.

One approach that I've tried at this point, is to run the script again, but add the -MaxKeySize parameter with an argument matching the smallest key size from the above run, 48. Here's the output of that run:

Click to enlarge

This looks more like English plaintext, but it's still not perfect. Note the ProbableKeySize property is 16 bytes, that was the MFOGCD from the previous run so I'm not surprised to see it is the probable key size. 48 still has the lowest normalized average Hamming Distance, but remember, the script has a bias towards smaller keys, especially when those keys are also the MFOGCD in the list of the top n ProbableKeySizes.

Our key is obviously not 100% correct, but again there may be enough of the correct value there that you can make some guesses and use the XOR-Decrypt.ps1 script to try them out until you have the correct key.

What happens if you run the script again and pass -MaxKeySize as 16, unfortunately, if you do that, the most probable key size becomes two because that is the MFOGCD and that yields even more inaccurate results.

What's the practical use for this script? There is some malware that uses repeating XOR key encryption because it's easy to implement. You may be able to apply it in your analysis. Does it work against code?

Here's an example:
Click to enlarge

Crack-XORRepeatingKeyCrypto.ps1's accuracy could be improved at a performance cost. If you look at XOR-Decrypt.ps1, you'll see that it has multiple scoring functions, a sort of mini-neural network. One for English letter frequency, one for English bigram frequency and one for English trigram frequency. All of the values returned from those functions factor into determining the most probable key for XOR-Decrypt.ps1, but because of the way Crack-XORRepeatingKeyCrypto.ps1 works, determining the most probable byte for each position of the key against the transposed blocks, bigrams and trigrams can't be used because we're not looking at the bytes in context, so bigrams and trigrams won't follow their natural frequency of occurrence.

Crack-XORRepeatingKeyCrypto.ps1 could factor in bigram and trigram scores when it moves to the stage of decrypting the original ciphertext, using what it has calculated as the most probable key, but that would require maintaining a list of the top n most probable keys and trying each of them, scoring them and choosing the one with the best score.

Maybe if time permits, I'll trying making this change, but I'm afraid of what it may do to the run time.

Lastly, the script includes a .SYNOPSIS section with help and examples, like any PowerShell script should. So do a Get-Help -Full .\Crack-XORRepeatingKeyCrypto.ps1 | more for additional information.

5 comments:

  1. Hello, i have a similar text and i would really appreciate your help since your code isn't working with me.
    Thank you!
    It is encoded using Xor an put as base64, this text contains an email address.

    NwMXGRwQB0wcB1MHBw0PHFMVGhkLUxAAAUwcCxYXFgUaFlNEVSkXBRwcEBZZBRwRBwlZMCVFBhk
    LUx0KAR4cUxIBBwkKABZFGA0QH1MBEEEaGxIXDA4dFl4AG0EKEAoJGQ05EAoHEB4OEgcGHUIfAV
    9FFA8aHB4VFAsXsNpFERlZEBwBEEwIBhZFAwMMAFMEAwkDUwYRHAAQALDMVRwWBgFFGg4NFh0MB
    0waFlMIEB8KEhQAW0w3HAYWVRoWBgBFBwkaHB0RFA8NFgEKGx9ZFxILBkwVFgBFBQAMAFMHBwkf
    AFMBtsUVEhoWVQ0PFhBFAAIcUwMXGhwWABoRHAMXUxcAVR4cHRcAD0EPHAYWWw==

    ReplyDelete
  2. Sorry, I don't check on the blog much these days and just saw your post. Happy to report that I think I may have your key. It took a little experimenting, trial and error, but setting a max key size to 16, looks like I got it. Note the script is written to find English text and your string appears to be French. Despite the language difference, it appears the letter frequencies and n-gram analysis translate somewhat from English to French.

    PS > ./Crack-XORRepeatingKeyCrypto.ps1 -String $data -Encoding base64 -MaxKeySize 16


    ProbableKeySize : 6
    ProbableKey : ulysse
    ProbableDecryptedValue : Bonjour et bravo pour cet exercice ! Envoyez votre CV sur notre adresse mail de-charybde-en-scylla@cyberwatch.fr, accompagné du code que vous avez utilisé pour obtenir ce message. Nous vous recontacterons dans les plus
    brefs délais avec une proposition de rendez-vous.
    Top5KeySizes : 12:5:6:14:2
    Top5NAvgHDs : 2.75 : 2.79622641509434 : 2.85606060606061 : 2.86904761904762 : 2.88059701492537

    ReplyDelete
  3. Hello I tried running your program and I can't seem to get the input, I think the key size is 5, any help would be appreciated."FkxSHToWXlYbORoJUhA2FkxFGyBPCUQbN10JRxY3Fm5WDD9XR0BeJVlcXxpyRUxdGnJZXEFeNlNa
    QxsgV11WEisWR1YbNlNNExwgU0hXXiZZCUcWNxZLXAomWUQTETQWXVsbcllKVh88GAl8CyAWTVIX
    Pk8JVR87WlxBG3JBSEBeM1hHXAs8VUxXXjNCCUcWNxZKWxc/U1oTETQWRFoaPF9OWwp8FmhdGnJC
    QVZeIVlcXRpyQUZGEjYWQVILPEIJXAsgFlxdCTdaSlwTNxZNQRszW1oddBZZCUoRJxZCXRElFl5b
    B3JGTFwOPlMJXxc5UwlFFz1aTF0dNwkJegpyX1oTHDdVSEYNNxZAR140U0xfDXJRRlwafBZhRhMz
    WFoTGDtYTRMIO1lFVhAxUwlXGzdGRUpeIVddWg00T0BdGXwWa0YKckRMXhEkUwlHFjcWWlIKO0VP
    Uh0mX0ZdUnJXR1deJl5MEx8xQglRGzFZRFYNfBgHExY9WkVcCXw8flsbPBZZVhEiWkwTCjNaQhMK
    PRZMUh06FkZHFjdEBRMKOlNQExA3QExBXiFXUBMJOlddEwo6U1ATEzdXRx1eBl5MSl4hV1ATDT1b
    TEcWO1hOExs+RUwTHzxSCUoRJxFbVl43TllWHSZTTRMKPRZDRg0mFkJdESUWXlsfJhZdWxsrFkRW
    HzwYI2cWO1hCExE0FkBHUHJ3CVcXNV9dUhJyVUZeDidCTEFQcnNFVh0mREBQHz4WS0EfO1gHOS09
    W0xHFz9TWhMXJhZAQF4mXkwTDjdZWV8bclhGExE8UwlaEzNRQF0bIRZIXQcmXkBdGXJZTxMJOlkJ
    VxFyQkFWXiZeQF0ZIRZHXF49WEwTHTNYCVoTM1FAXRt8PGdcXj1YTBMQPUREUhJyVUZGEjYWQVII
    NxZNXBA3Fl1bHyYYCXcRck9GRl45WEZEUnJCQVoNcltGQRA7WE4dUHwWYBMJM0UJXBByVwlHDDNf
    RxMKOlddEwk3WF0TCjpERkYZOhZIEx07QlATCjpXXRMJPUNFVxB1QglWBjtFXRMXNBZAR14lV1pd
    WSYWT1wMck9GRlByfwlRESdRQUdeMxZdWh05U10TGCBZRBMfcltIXV4lXkYTCT1DRVdePl9CVhIr
    FktWXjZTSFdeO1AJWgpyQUhAEHVCCVURIBZQXAt8FmATDDdXTRMLIhZGXV4/TwlEESBdBx1QclcJ
    RBY9WkwTGDtTRVdePVAJQB07U0dHFzRfShMXPEdcWgwrFl1bHyYWRl0SKxZMSxchQloTHDdVSEYN
    NxZGVV4rWVwdXhxZXh9eO1AJShEnFl5aDToWUFwLclVGRhI2FkFSCDcWS1YbPBZHXAw/V0UdUHwW
    YBMdM1gJQww9W0BAG3JPRkZeGxZNXF48WV0dXgZeTBMJPURFV147RQlSEHJfR1UXPF9dVhIrFktW
    CiZTWxMOPldKVl4iRExQFyFTRUpeMFNKUgshUwlKEScWXlYMN1gOR1BYfwlYED1BCVoKdUUJXREm
    FkZBGjtYSEEHfBZrRgpyQUFcXjdATEFePllfVhpyWVtXFzxXW0pBWHldWxsgPGpbDDtFXVwOOlNb
    EzM9REpcE2gWC2ARP1NdWhM3RQlaCnJfWhMKOlMJQxs9RkVWXjxZCVwQNxZAXh81X0dWDXJXR0oK
    Ol9HVF49UAlEFj0WTVxeJl5MEwo6X0dUDXJYRhMRPFMJUB88FkBeHzVfR1ZcWGVdVgkzRF0TMzdY
    U1obIQwJfBZ+FmhfHzwYBx1eJVMOQRtyUUZdEDMWQVIINxZaRh06FkgTCT1YTVYMNENFEwkzRAlH
    ETVTXVsbIBgjdhA2Fl1aCj5TWgleGl9aExMzVUFaEDcWXlINclhMRRsgFllWDDRTSkcbNhoJRxY9
    Q05bXjtCCVQbPFNbUgo3UglSXiVeRl8bclBAVhI2FkZVXiBTWlYfIFVBExc8QkYTCTpXXRMcN1VI
    XhtyXUdcCTwWSEBecGJcQRc8UQl+HzFeQF0bIRQHEyo9UkhKXiVTCVAfPloJRxY3WwkRHT1bWUYK
    N0RaEVBYd0VSEHJiXEEXPFETEzdyWkBYG3JFRl8IO1hOEw4gWUtfGz9FBRM9PVtEUhA2U1sdXhNY
    TRM7PF9OXh9yX1oTCjpTCV4RIUIJVxc0UEBQCz5CCUMMPVRFVhNyX0cTCjpTCUQRIFpNHXQRWURe
    HzxSTEFeFlNHXRchQkZdRHJzR1oZP1cJWg08EV0TGjtQT1odJ1pdH147Qg5AXjtbWVwNIV9LXxt8
    Fn1bG3J3RFYMO1VIXQ1+Fl1bG3JkXEANO1dHQFJyQkFWXhRETF0dOhoJRxY3Fm5WDD9XR0BSclNf
    VgwrWUdWXiZeQF0VIRZsXRc1W0gTFyEWXF0cIFNIWB8wWkwddBNaSF1eBkNbWhA1DAl0ET1SBxMy
    N0IJXhtyQltKXjNYTRMJNxFFX145WEZEXjRZWxMNJ0RMH14lWUcUCnJBTAx0E1pIXV4GQ1taEDUM
    CWcWN08JXBA+TwlRGzNCCV4bckNZExw3VUhGDTcWYBQTckVEUgwmU1sTCjpXRxMKOlNQEx8gUwc5
    PTpEQEAKPUZBVgxye0ZBHT1bExMwPRoJRxY3TwlRGzNCCUoRJxZcQ14wU0pSCyFTCUoRJxFbVl42
    X09VGyA="

    ReplyDelete
    Replies
    1. I suspect you've figured this out already, again, I don't check this blog for comments very often. Here's what I have:

      PS > .\Crack-XORRepeatingKeyCrypto.ps1 -MaxKeySize 6 -File ~\Desktop\XORbit.txt -Encoding base64


      Probable Key Size : 5
      Probable Key : 6)3~R
      Probable Key Bytes : 54 41 51 126 82
      Probable Decrypted Value : each week, and every week the Germans would send our desperately needed bread to the bottom of the ocean. Our daily failure was announced at the chimes of midnight. And the sound would haunt our unwelcome dreams.
      Do you know why people like violence? It is because it feels good. Humans find violence deeply satisfying. But remove the satisfaction, and the act becomes... hollow.
      When people talk to each other, they never say what they mean. They say something else and you're expected to just know what they mean.
      Think of it. A digital computer. Electrical brain.
      Sometimes it is the people no one imagines anything of who
      Probable Decrypted Bytes : 32 101 97 99 104 32 119 101 101 107 44 32 97 110 100 32 101 118 101 114 121 32 119 101 101 107 32 116 104 101 32 71 101 114 109 97 110 115 32 119 111 117 108 100 32 115 101 110 100 32 111 117 114 32 100 101 115 112
      101 114 97 116 101 108 121 32 110 101 101 100 101 100 32 98 114 101 97 100 32 116 111 32 116 104 101 32 98 111 116 116 111 109 32 111 102 32 116 104 101 32 111 99 101 97 110 46 32 79 117 114 32 100 97 105 108 121 32
      102 97 105 108 117 114 101 32 119 97 115 32 97 110 110 111 117 110 99 101 100 32 97 116 32 116 104 101 32 99 104 105 109 101 115 32 111 102 32 109 105 100 110 105 103 104 116 46 32 65 110 100 32 116 104 101 32 115 111
      117 110 100 32 119 111 117 108 100 32 104 97 117 110 116 32 111 117 114 32 117 110 119 101 108 99 111 109 101 32 100 114 101 97 109 115 46 10 68 111 32 121 111 117 32 107 110 111 119 32 119 104 121 32 112 101 111 112
      108 101 32 108 105 107 101 32 118 105 111 108 101 110 99 101 63 32 73 116 32 105 115 32 98 101 99 97 117 115 101 32 105 116 32 102 101 101 108 115 32 103 111 111 100 46 32 72 117 109 97 110 115 32 102 105 110 100 32
      118 105 111 108 101 110 99 101 32 100 101 101 112 108 121 32 115 97 116 105 115 102 121 105 110 103 46 32 66 117 116 32 114 101 109 111 118 101 32 116 104 101 32 115 97 116 105 115 102 97 99 116 105 111 110 44 32 97
      110 100 32 116 104 101 32 97 99 116 32 98 101 99 111 109 101 115 46 46 46 32 104 111 108 108 111 119 46 10 87 104 101 110 32 112 101 111 112 108 101 32 116 97 108 107 32 116 111 32 101 97 99 104 32 111 116 104 101 114
      44 32 116 104 101 121 32 110 101 118 101 114 32 115 97 121 32 119 104 97 116 32 116 104 101 121 32 109 101 97 110 46 32 84 104 101 121 32 115 97 121 32 115 111 109 101 116 104 105 110 103 32 101 108 115 101 32 97 110
      100 32 121 111 117 39 114 101 32 101 120 112 101 99 116 101 100 32 116 111 32 106 117 115 116 32 107 110 111 119 32 119 104 97 116 32 116 104 101 121 32 109 101 97 110 46 10 84 104 105 110 107 32 111 102 32 105 116 46
      32 65 32 100 105 103 105 116 97 108 32 99 111 109 112 117 116 101 114 46 32 69 108 101 99 116 114 105 99 97 108 32 98 114 97 105 110 46 10 83 111 109 101 116 105 109 101 115 32 105 116 32 105 115 32 116 104 101 32 112
      101 111 112 108 101 32 110 111 32 111 110 101 32 105 109 97 103 105 110 101 115 32 97 110 121 116 104 105 110 103 32 111 102 32 119 104 111 32
      Top 5 KeySizes : 5:6:4:2:3
      Top 5 NAvgHDs : 2.65645161290323 : 3.34142394822006 : 3.38225806451613 : 3.39102564102564 : 3.5176282051282

      Delete

Other thoughts from Lean In

My previous posts in this series have touched on the core issues that Sheryl Sandberg addresses in her book  Lean In: Women, Work, and the W...