De-obfuscated: Decoding the Encoder

Justin Majetich
5 min read · Oct 22, 2019

I can’t say who hurt Jim Hague, but clearly the man was determined to pay it forward with his submission to the 1986 International Obfuscated C Code Contest, an annual competition that solicits the most unruly, obscure, and inefficient code from around the world. That year, Hague’s program took home the prize for “Worst abuse of the C preprocessor.” Here’s a look inside the bleakly titled script, “hague.c”:

At first glance, I have no idea what’s happening here, but I see a clue in the macros at the head of the script. It appears Mr. Hague has hidden a good portion of the script’s functionality beneath a layer of macros. For example, every time “DAHDIT” appears in the code, we can read “for” in its place. Luckily, we won’t have to work through the macros instance by instance: the preprocessor will expand them for us automatically. But let’s not get ahead of ourselves. Before excavating the contents, let’s consider what the program does.

Who Is She?

To do this, I run the script through the compiler and am admittedly shocked when the code compiles, albeit with a myriad of warnings. Now that we have an executable, let’s run it and see what happens:

Not much. Still, I haven’t been returned to the command line, and after a few puzzled seconds, I decide to punch out some input:

It appears the program has translated my input into Morse code (“?” sufficing for spaces). A quick consultation with an online Morse code translator confirms this is indeed the case. Very oblique, Mr. Hague.

While I have little use for a Morse code encoder, knowing what the program does will go a long way in helping me understand what’s happening under the hood. As I mentioned above, a quick run through the preprocessor will expand our macros and make things a bit more legible:

Decoding the Encoder

Now, I’m starting to recognize some familiar syntax, but the formatting is unspeakable. I spend a few minutes moving things around, and the shape of a program emerges:

What do we have here? Up top, we see a string “_DAH_” defined with global scope. This is presumably our reference tool for converting user input into Morse. Next is our main function. It begins with the declaration of several char pointers, their purposes obscured behind more DITs and DAHs. The declarations are followed by three absolutely cantankerous for loops. In the outer loop, space is allocated for “_DIT”, presumably to store user input ingested later in this line via the “gets” function.

Middle Loop

The middle loop seems to handle printing. The loop initiates with “DAH_” set to “_DIT” (which I’ve deduced above to contain user input) and will run as long as “*DAH_” is not a null-byte. The third portion of the for statement contains a call to “__DIT()” (a print function, as I’ll explain below) with a ternary statement as its argument. Depending on the value of “*_DIT_” in this iteration, “__DIT()” will be called to print “.”, “-”, or “?”: the former two when “*_DIT_” is non-null, which I take to mean the input character was found in “_DAH_”, and the latter for anything else. This print is followed by the printing of a space and the incrementation of “DAH_”.

Inner Loop and Code B

The inner loop is much more opaque to me. “*DIT_” is set to “2”, but without an understanding of what the variable “DIT_” is, this doesn’t tell me much. I do notice another ternary operator in this for loop’s condition. This statement seems to be checking if the value of “*DAH_” is a lowercase character and, if so, converting it to uppercase with some bit math. The line of code which lies at the heart of these three loops increments “*DIT_” by either the offset of “*_DIT_” from “a” or by 0, depending on whether or not “*_DIT_” is a lowercase character.

__DIT() and _DAH()

This leaves us with our two satellite functions, “_DAH()” and “__DIT()”. Let’s start with “__DIT()”. This function appears to work similarly to the standard C library’s “putchar()”, printing one character at a time. There are times when Hague’s program calls directly on “__DIT()” to print, such as in the middle loop. However, “_DAH()” also contains a call to “__DIT()”. It seems that “_DAH()” does some data manipulation before routing its input to print with “__DIT()”. Notably, “_DAH()” is recursive, calling itself with “DIT_” bit-shifted right by one for as long as “DIT_ > 3”. When this condition is false, “__DIT()” is called to print a null-byte. There’s also some activity in the return statement of “_DAH()”. Here, the lowest bit of “DIT_” is tested and the result, “1” or “0”, triggers a return of “-” or “.”, respectively, to the “_DAH()” call in main’s middle for loop.

Conclusion

While I’m able to de-obfuscate the program to an extent, it’s still largely illegible and many of its incremental workings remain shrouded within a cloud of DITs, DAHs and underscores. Alas, Hague won’t fold so easily. I may revisit this program, replacing variable names and editing it further into readability, but for now, Jimmy wins.

To be continued…
