Optimising a stochastic hill climbing attack?


Viewing 11 posts - 1 through 11 (of 11 total)
#91868
pufferfish101007
Participant

Hi all, I’m making use of my half-term break to have a look through @madness’ book on classical cryptography, which is available for free in the BOSS library, and doing some programming to prepare for the later challenges. To solve challenge 3B I wanted to try out the stochastic hill climbing attack for monoalphabetic substitution ciphers that it describes, and it worked surprisingly well! However, on further investigation with various ciphertexts I discovered that it doesn’t always work – sometimes it produces absolute gibberish, which is probably to be expected. Increasing the threshold (how many iterations the fitness should remain the same before exiting) seems to yield more consistently correct results, but that may be a coincidence, and I’m trying to figure out the best way to almost guarantee a correct decipherment without sacrificing too much performance.
From some googling (other search engines are available) I’ve discovered that hill-climbing algorithms can get stuck in local optima if the function isn’t convex. However, I’m unclear whether this applies to stochastic hill climbing as well as simple hill climbing, and also whether the function relating the key space to textual fitness is convex or not – I’m finding it difficult to visualise a 27(?)-dimensional function in my head.
So, coming on to the actual questions that I have:
– is the function fitness = f(keya, … keyz) convex (or is there a nice way for me to figure this out myself?)
– if it isn’t, would it be better to run several threads and take the fittest result, or just to increase the threshold?
– if increasing the threshold is the solution, is there a nice way to figure out the ‘optimal’ threshold? I feel like the shorter the text, the larger the optimal threshold, but again that could be totally wrong.
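
For reference, my climber looks roughly like this (just a sketch – the fitness function is left abstract, and the key convention, ciphertext → plaintext, is simply the one I happen to use):

```python
import random

ALPHA = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def decrypt(ciphertext, key):
    # convention: key[i] is the plaintext letter for ciphertext letter ALPHA[i]
    return "".join(key[ord(c) - 65] for c in ciphertext)

def hill_climb(ciphertext, fitness, threshold=2000):
    """Stochastic hill climb over substitution keys: swap two key letters,
    keep the swap only if fitness strictly improves, and stop once
    `threshold` consecutive swaps have failed to improve anything."""
    key = list(ALPHA)
    random.shuffle(key)
    best = fitness(decrypt(ciphertext, key))
    stale = 0
    while stale < threshold:
        i, j = random.sample(range(26), 2)
        key[i], key[j] = key[j], key[i]
        f = fitness(decrypt(ciphertext, key))
        if f > best:
            best, stale = f, 0
        else:
            key[i], key[j] = key[j], key[i]  # undo the swap
            stale += 1
    return "".join(key), best
```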

I appreciate this might be too advanced for some people but I’d really appreciate any guidance (and thanks a tonne for actually reading this mammoth of a post).
Thanks & happy deciphering,
pufferfish

#91909
pufferfish101007
Participant

Partially answering my own question here:
– If there are local optima, increasing the threshold won’t help because the hill climbing always goes uphill
– This is backed up by some testing: if I log the counter every time it gets to a multiple of 1000, once it gets to a sufficiently large number (usually only about 1000), it doesn’t go back down, i.e. it has reached an optimum
– This suggests that the keyspace-fitness mapping is not convex and so does have local optima
– So the best solution would be to run multiple hill-climbers in parallel and pick the fittest
Some confirmation that this is correct would be nice – I do wonder if anyone else has actually resorted to this method of decipherment at this stage in the competition, or if it’s just me…
As a side note about improving performance – I tried utilising the ‘faster’ method outlined in Thomas Jakobsen’s paper (which doesn’t involve decoding the text at any point in the attack and instead shuffles the distribution matrix around) and found it to be significantly slower (which is probably [at least in part] due to the fact that I’m using tetragram fitness rather than bigram fitness – and also computing power has increased by a lot since 1995!)
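
Roughly what I mean by running multiple climbers and picking the fittest, written generically (a sketch only – in practice I’d use multiprocessing rather than a sequential loop):

```python
import random

def climb_once(start, neighbour, score, patience=1000):
    """One stochastic hill climb: propose random neighbours, keep strict
    improvements, give up after `patience` consecutive rejections."""
    state = start()
    best = score(state)
    stale = 0
    while stale < patience:
        cand = neighbour(state)
        s = score(cand)
        if s > best:
            state, best, stale = cand, s, 0
        else:
            stale += 1
    return state, best

def climb_multi(start, neighbour, score, restarts=10, patience=1000):
    # independent restarts from random starting points: a single climb can get
    # trapped on a local maximum, but the fittest of many usually won't be
    runs = (climb_once(start, neighbour, score, patience) for _ in range(restarts))
    return max(runs, key=lambda r: r[1])
```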

#91915
madness
Participant

@PuffFishy

For monoalphabetic substitutions, I stop at 10000 and automatically restart if the answer is gibberish. For speed, I use C language, and so I can break 3B in a second or two.

For other ciphers, I modified the simulated annealing approach by doing something like this:

if (f > bestfit) or ((f > 1.05*bestfit) and (randrange(100) < 5)):
etc.

If the 1.05 is adjusted to be the height of local maxima (compared to the nearest valley), then it doesn’t get stuck.
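
Spelled out as a runnable function, for anyone who wants it. One thing worth noting: the rule as written assumes the fitness is negative (e.g. a sum of log-probabilities), so that 1.05*bestfit sits slightly *below* bestfit – for positive fitness, f > 1.05*bestfit would already imply f > bestfit.

```python
from random import randrange

def accept(f, bestfit):
    """Always take a strict improvement; 5% of the time, also take a key
    whose fitness is within 5% of the best seen so far.
    Assumes negative fitness values (e.g. summed log-probabilities)."""
    return (f > bestfit) or ((f > 1.05 * bestfit) and (randrange(100) < 5))
```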

#91921
BobD
Participant

Check out the Simulated Annealing page on Wikipedia. Adopting this refinement allows the search to skip to nearby hills more readily while the ‘temperature’ is higher, and less readily as it ‘cools down’. I find that this increases the success rate substantially, usually reaching the desired result in 1500–2000 iterations.
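
A bare-bones version of the idea, for anyone who wants the shape of it without wading through the article (a sketch – the temperature and cooling constants here are placeholders, not tuned values):

```python
import math
import random

def anneal(start, neighbour, score, t0=10.0, cooling=0.995, steps=2000):
    """Simulated annealing: while the temperature t is high, worse moves are
    accepted with probability exp((candidate - current)/t), letting the search
    hop between nearby hills; as t cools it degenerates into hill climbing."""
    state = start()
    f = score(state)
    best_state, best_f = state, f
    t = t0
    for _ in range(steps):
        cand = neighbour(state)
        fc = score(cand)
        if fc >= f or random.random() < math.exp((fc - f) / t):
            state, f = cand, fc
            if f > best_f:
                best_state, best_f = state, f
        t *= cooling
    return best_state, best_f
```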

#92037
Rob27
Participant

#92053
Texal
Participant

Personally, I go through all possible keys instead of taking a “wait until there are no further increases in fitness” approach, which seems to work fine for me. One of the ways I’ve found to increase the success probability is by optimising how I generate my initial key (so you hopefully end up closer to the global maximum). I generate a key based on the actual frequency of letters in the ciphertext, but still maintaining a bit of randomness, instead of a straight random key. It does seem to have improved my success rate!
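
The initial-key idea in sketch form (not my exact code – the ciphertext → plaintext key convention and the number of random swaps are just illustrative):

```python
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
ENGLISH_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ"  # most to least frequent

def initial_key(ciphertext):
    """Seed the climber by pairing the ciphertext's most frequent letters
    with the most frequent English letters, then adding a little randomness
    so repeated restarts don't all begin from the same key."""
    counts = {c: ciphertext.count(c) for c in ALPHABET}
    by_freq = sorted(counts, key=counts.get, reverse=True)
    key = [""] * 26
    for ct_letter, pt_letter in zip(by_freq, ENGLISH_ORDER):
        key[ord(ct_letter) - 65] = pt_letter   # key maps ciphertext -> plaintext
    for _ in range(3):                          # a few random swaps
        i, j = random.sample(range(26), 2)
        key[i], key[j] = key[j], key[i]
    return "".join(key)
```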

#92057
f6exb_the_frenchy
Participant

I write in Python and, as it is not very fast, the program takes an extract of maybe 300 to 700 characters. If it finds a good key, then it decrypts all the ciphertext. Sometimes fewer than 200 letters are enough.
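
In sketch form, assuming the solver returns a 26-letter ciphertext → plaintext key string (`solve` here stands in for whatever climber you already have):

```python
def crack_from_extract(ciphertext, solve, sample_size=500):
    # hill-climb on a short extract only, then apply the recovered key to the
    # whole ciphertext; fitness evaluation dominates the running time, so the
    # attack scales with sample_size rather than with the full text length
    key = solve(ciphertext[:sample_size])
    return "".join(key[ord(c) - 65] for c in ciphertext)
```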

#92094
pufferfish101007
Participant

woah thanks for all the responses! @BobD thanks, I’ll need to look into simulated annealing when I get some time… @madness I think I’ll need to do more reading before I can understand what you’re doing with the simulated annealing 😛
@Rob27 I discovered the comparative easiness of spreadsheets this week – impressive that you’ve done hill climbing in that manner! I’m already using quadgrams – intriguing that you’ve only got stuck on a local optimum once.
@Texal you go through all ~4e23 keys?????!!!!! I agree that starting with an educated guess as the key is a good idea.

As for myself, I have gone with running several concurrent threads and using the fittest – this consistently yields the correct solution without sacrificing speed (ie it still runs in a few seconds [depending on the device]).

#92354
Rob27
Participant

@pufferfish101007 Just to add, for some of the cyphers, interestingly, there can be MORE THAN ONE key that gives the correct plain text solution, hence not always getting stuck in local maxima. This is particularly the case for Playfair, Bifid, ADFGVX and similar encryption methods. For example, 2013’s 8B challenge which was a Playfair cypher has an official decrypt key of “ADLERBCFGHIKMNOPQSTUVWXYZ”. I found this with a dictionary word attack. However when hill climbing I also found that “ZXUWVSQMPNROTEHLKFIGDCYBA”, “QMSPNOTREHXUZWVKFLIGCYDBA” and “ACBYDNQPMSGKIFLHOETRVXWUZ” all give the same plain text solution. There are no doubt some other solutions too.
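
One family of equivalent keys is easy to verify: cyclically rotating the rows and/or columns of the 5×5 square preserves every Playfair rule (same-row, same-column and rectangle), because the rules only depend on positions modulo 5. A quick check (my own sketch, not a claim about how the keys above are related – it assumes pre-processed text: uppercase, no J, even length, no doubled digraphs):

```python
def playfair_encrypt(square, text):
    """Encrypt digraphs with a 25-letter square given as a flat string.
    Assumes standard Playfair pre-processing has already been done."""
    pos = {ch: (i // 5, i % 5) for i, ch in enumerate(square)}
    out = []
    for a, b in zip(text[::2], text[1::2]):
        (ra, ca), (rb, cb) = pos[a], pos[b]
        if ra == rb:    # same row: shift right
            out += [square[ra * 5 + (ca + 1) % 5], square[rb * 5 + (cb + 1) % 5]]
        elif ca == cb:  # same column: shift down
            out += [square[((ra + 1) % 5) * 5 + ca], square[((rb + 1) % 5) * 5 + cb]]
        else:           # rectangle: swap columns
            out += [square[ra * 5 + cb], square[rb * 5 + ca]]
    return "".join(out)

def rotate(square, dr, dc):
    # cyclically rotate the 5x5 square dr rows down and dc columns right
    return "".join(square[((r + dr) % 5) * 5 + (c + dc) % 5]
                   for r in range(5) for c in range(5))
```

All 25 rotations of the ADLER square encrypt identically, so each is an equally valid decryption key. (The alternative keys listed above may arise from other symmetries too; this just demonstrates one mechanism.)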

#92368
Ron0Studios
Participant

Perhaps not an optimisation, but you can try running a genetic algorithm instead. I think they converge faster.

#92432
Dvorak
Participant

I’ll throw in my (rambling and incoherent) thoughts on this too. My team has settled on using stochastic hill-climbing for all the ciphers so far this year, and haven’t had it fail yet.
TLDR: I do think the answer to your optimisation problem is the modified annealing that madness suggests, perhaps described most simply as letting the algorithm settle for a slightly worse key every so often, in order to avoid local maxima. I believe the pseudocode for this is in the Cadenus chapter of madness’ (brilliant) book.

As has been said, for monoalphabetic (and often polyalphabetic) ciphers there is no need for annealing, a threshold of 10k – 100k works perfectly well. For other ciphers though it’s a different story.

More complex hill-climbers – optimal threshold
You’re right to assume that the optimal threshold is dependent on the length of the ciphertext; however, through some iterative testing over the summer I found that a much more significant factor is the period (e.g. key length in Vigenère or transposition). I’ve used a threshold of 100 * period in these hill-climbers, and for polyalphabetic texts with a length/period of >80 I’ve got consistently good results. Here are the results of testing on all of the polyalphabetic exercises in madness’ book (the time values are in seconds {shocking!} – but in my defence the program is run on free remote servers for the purposes of collaboration, which increases processing time by a factor of 10–20, so please view them in relation to each other)

cipher length   period   length/period   success?         time (s)
214             7        30              failed           –
154             3        51              failed           –
396             7        56              failed           –
327             5        65              failed           –
850             10       85              quite close      1440
509             5        100             almost perfect   1140
831             6        140             almost perfect   637
2279            15       152             almost perfect   3900
1181            7        169             almost perfect   698
970             5        194             almost perfect   338
647             3        216             perfect          119
713             2        356             perfect          68.6
2062            4        516             almost perfect   273

– almost perfect: Almost all (99%) letters correct, keys easily reconstructible
– quite close: Quite a few letters wrong, but keys can be reconstructed
– failed: Not enough recognisable English to confirm that the program worked

As you can see the greater the value of length/period, the better the result. If your threshold was solely based on length, long periods would likely return more gibberish since the key space has 26*period dimensions, so there are many ‘adjacent’ keys.
The time to decrypt was determined more by the period (the polyalphabetic decryption runs on every cycle of the hill-climber, so its cost multiplies into the overall complexity, and the longer the period, the more slices have to be monoalphabetically decrypted).

More complex hill-climbers – optimal annealing
All this aside (I seem to have got distracted), the biggest difficulty when moving from monoalphabetic to transposition is that the fitness function over key space has much more noise, so there are many more local maxima. The solution to this is to allow the program to select a slightly worse key every so often. But how much worse, and how often?

These optimisation questions are quite tricky to solve, and I doubt I’ve got the answer. With some testing I did get values for a step-down frequency (percentage of slightly worse keys chosen, 5 in madness’ example #post-91915) and a margin (how close the fitness needed to be to the current best key to be considered ‘slightly’ worse, also 5% in madness’ example). My values were different to madness’ choices in the example algorithms: most interestingly (to me at least), the optimal-ish margin I found was actually dependent on the period (key length)! I think this can be rationalised by considering the fact that long periods have higher local maxima, so a more extreme step down is required to move towards the global maximum.

I’ll keep the exact function I use secret until the end of this year’s challenge, just in case it turns out to actually be a good choice. After that I’ll be too old to compete so I’ll discuss it, but it’s probably not the best choice anyway.

Wow I’ve gone on too long, if you’re not bored reading this then congrats. Happy hill-climbing all.
