This invitation is extended to anyone interested in helping crack the code. If we get it figured out, we can all submit the result and be entered into the drawing to win the silly prizes.
I'll work on putting together some more materials here and begin the list of clues and possible thoughts for code-breaking, possibly later tonight.
EDIT: Perhaps I'll end up doing this by myself here, but feedback & input is welcome. Hopefully the approaches that I go over will be helpful to someone new to data decryption & ciphers who is interested in the subject and would like to know more about what kinds of methodologies may be employed.
The code repeats three times, and is itself comprised of a repeating pattern, in two tracks, made up of 3 distinct symbols: dot, dash and bar. If the symbols are translated into numerical representations so as to be more easily read where dot=2, dash=1, bar=3, then the two tracks look like this:
Track 1's repeating pattern is "1232"
Track 2's repeating pattern is "3212"
Repeated, the two patterns produce the exact same stream of values except that they are offset by 2 positions. Out of 4 total positions in each pattern, 2 works out to exactly half the pattern being shifted.
The result of adding coinciding positions from each track is always 4. Track 1, position 1 = [1] where track 2, position 1 = [3], so [1] + [3] = 4. For position 2 we get 2+2=4, for position 3 we get 3+1=4, and so on. It is unlikely that the number 4 is significant in the final outcome because we arbitrarily selected the numbers 1, 2 and 3 to represent the symbols simply for analysis, and there is no evidence that the coders would have selected the same numbers, or even numbers at all.
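Here's a quick way to double-check both observations (the half-pattern offset and the constant sum). It's only a sketch: I'm assuming each track is just its 4-symbol pattern repeated out to 21 symbols, and the helper name is my own.

```python
from itertools import cycle, islice

def repeat_pattern(pattern, length):
    """Repeat a short digit pattern until it is `length` symbols long."""
    return [int(d) for d in islice(cycle(pattern), length)]

LENGTH = 21  # assumed number of symbols per track
track1 = repeat_pattern("1232", LENGTH)
track2 = repeat_pattern("3212", LENGTH)

# Track 2 should read the same as track 1 taken two positions later.
offset_ok = all(track2[i] == track1[i + 2] for i in range(LENGTH - 2))

# Collect every distinct sum of coinciding positions.
sums = {a + b for a, b in zip(track1, track2)}

print(offset_ok, sums)
```

If the set of sums holds a single value, the two tracks really are mirror images of each other around that value's midpoint.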
Now what if we changed our numbers from 1,2,3 to 1,2,4 - the first three powers of two in the base-2 (binary) system? Then the tracks would show:
In this case, the sum of each position alternates between 5 and 4.
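The renumbering can be sketched like this. The mapping table is my own shorthand for "swap 3 for 4 and leave 1 and 2 alone"; nothing says the coders used numbers at all.

```python
# Assumed renumbering: dash stays 1, dot stays 2, bar changes from 3 to 4.
REMAP = {1: 1, 2: 2, 3: 4}

def remap(pattern):
    """Translate a pattern written with 1/2/3 into the 1/2/4 numbering."""
    return [REMAP[int(d)] for d in pattern]

track1 = remap("1232")
track2 = remap("3212")

# Sum of coinciding positions across one full repeat of the pattern.
sums = [a + b for a, b in zip(track1, track2)]

print(track1, track2, sums)
```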
Still not very interesting.
Now what if we looked at the tracks as if they were data sets? Then the values might represent a plotted position on a graph over time. If this is the case, and the values of track 1 are plotted out, then we get:
These appear to be triangle waves, which are among the most simple waveforms to replicate. Triangle waves, as with any periodic waveform really, may be mathematically represented as a sum of sine waves of varying frequency and amplitude. BUT they say it's supposed to be sophisticated, so it can't be that easy, right? Agreed. Though I will say that the code itself is too simple to represent anything too terribly sophisticated.
Now what if we took the two plotted tracks and laid them over one another? We would see this:
That looks like crap, but it expresses the criss-cross pattern formed by the two triangle waves, with the "O" at the center of each intersection where they meet. Essentially, the pattern looks like a big XXXXXX rather than the individual \/\/\/\/\/ waves.
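For anyone who wants to redraw the overlay, here is a rough text-plot sketch. The function names, the 9-column width and the choice of "*" for a single track are my own assumptions; "O" marks the spots where both tracks sit on the same level.

```python
from itertools import cycle, islice

def repeat_pattern(pattern, length):
    """Repeat a short digit pattern until it is `length` symbols long."""
    return [int(d) for d in islice(cycle(pattern), length)]

def plot(tracks, levels=(3, 2, 1)):
    """Return one text row per level: '*' = one track, 'O' = both tracks."""
    rows = []
    for level in levels:
        row = ""
        for column in zip(*tracks):
            hits = column.count(level)  # how many tracks sit on this level
            row += "O" if hits > 1 else "*" if hits == 1 else " "
        rows.append(row)
    return rows

track1 = repeat_pattern("1232", 9)
track2 = repeat_pattern("3212", 9)

for row in plot([track1, track2]):
    print(row)
```

The middle row is where the two waves cross, which is what gives the XXXX look.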
Still not very interesting.
Now what if we analyze the number of symbols in each track? Well there are 21 total symbols in each. We know that 21's divisors are 1, 3, 7 and 21. Well, 1 isn't a very interesting divisor because we tried breaking up the patterns into 1-digit sized chunks and didn't see anything meaningful. 21 isn't very useful because it's obvious that there is no way to decode something that represents only itself. So 3 and 7 are where we're at.
Now we can look at 3 chunks of 7 digit groupings to end up with:
But I still see a lot of patternous repetition in there that makes me think this is probably not very useful. But what if we break it into 7 chunks of 3..?
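The divisor hunt and the two groupings can be reproduced with a few lines. Treat this as a sketch: the track string is just the "1232" pattern repeated out to 21 symbols, which is my assumption about how the track reads, and the function names are mine.

```python
def divisors(n):
    """Every whole number that divides n evenly."""
    return [d for d in range(1, n + 1) if n % d == 0]

def chunks(symbols, size):
    """Cut a string into consecutive pieces of `size` symbols."""
    return [symbols[i:i + size] for i in range(0, len(symbols), size)]

track1 = ("1232" * 6)[:21]  # assumed: the pattern repeated to 21 symbols

print(divisors(len(track1)))
print(chunks(track1, 7))  # 3 chunks of 7
print(chunks(track1, 3))  # 7 chunks of 3
```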
"LEGO Leg" and "Go LEGO L"? Well it doesn't sound terribly meaningful, but I think "Go LEGO" is one of their catch phrases, is it not? Their website shows "PLAY ON" under their logo. I don't see "go lego" on their website anywhere, but google reveals that the phrase is used repeatedly, so perhaps it's an older catch phrase. I've been into Legos since about 1977, so I've seen most of their marketing materials over the years - it definitely rings a bell!
This could be close to something, but doesn't look like it's quite there!
Okay, here's something odd. The photo of the disc on Mars, which is the first one I posted, does not contain the same code as the second photo. The first one distinctly shows THREE tracks of data which do NOT REPEAT as they do in the clear photo below. This leads me to think that the code to break is actually the code on the Mars disc and not this friggen forgery or whatever it is..
Unfortunately, the Mars photo is not clear enough to correctly transcribe the smaller code symbols with any degree of confidence. Now THAT looks like a much better code to try and crack..
Okay, well one of the clues given on the originally linked page is that the code contains dots and lines - that's binary. That also means that the code can be transcribed from the original, crappy image sent from Mars because that much distinction can be made. It's clear that a three-level distinction would have been impossible, but this shouldn't be too bad, so here we go. Track 1 is the one closest to the edge of the disc:
Taking 1 bit from each track in order of position and setting them into groups of three bits (which allows up to 8 possible values), we would get 25 sets. Suppose we wanted to have enough bits to transcribe each grouping into a letter of the alphabet; then our bit-word would have to be made up of 5 bits, which yields 32 possibilities. This also works out conveniently because the total number of bits, both cumulatively (75) and per track (25), is evenly divisible by 5. Of course this also means that the values 27 to 32 will be undefined, so if those values appear, we won't know what letter/character to assign to them. So this could be a wild goose chase, but it's worth the exercise. Breaking the tracks into 5-bit words, we get 15 words, five per track. Fortunately, since there are only 15 words, it would be impossible to utilize all 32 combinations, in which case the subset may be further transposed/encrypted to add meaning to those values that we thought might have been meaningless, so let's take a look:
This just processes the information linearly. I figure if they want someone to crack the code who isn't a master hacker/cracker, that it'll probably read left to right. Now let's translate those bit strings into values so that they're easier to look at:
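The chop-and-convert step is mechanical, so here is a small sketch of it. The 25-bit string below is NOT my transcription of the disc; it is simply the values 5, 7, 30, 19 and 17 written out in 5-bit binary so there is something to run, and the function names are my own.

```python
def to_words(bits, size=5):
    """Cut a bit string into fixed-size words, reading left to right."""
    if len(bits) % size:
        raise ValueError("bit count is not a multiple of the word size")
    return [bits[i:i + size] for i in range(0, len(bits), size)]

def to_values(bits, size=5):
    """Convert each word to a number, left-most bit most significant."""
    return [int(word, 2) for word in to_words(bits, size)]

# Placeholder data only: 5, 7, 30, 19, 17 spelled out in 5-bit binary.
placeholder_track = "00101" "00111" "11110" "10011" "10001"

print(2 ** 5)                        # combinations available in a 5-bit word
print(to_words(placeholder_track))
print(to_values(placeholder_track))
```

Reading the left-most bit as most significant is itself a guess; reversing each word before converting would be the obvious variant to try.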
So we can see a couple of repeats in there, namely 19 and 30. When it comes to eyeballing simple ciphers, the most common elements of any given language are the ones you look for to repeat. In the case of the English language (which I presume the decoded message to be in!) these most frequently repeated things include spaces between words and the commonly selected letters in Wheel Of Fortune: RSTLNE AOI. If we decided that 19 was our space character, then we would end up with words that look like the following:
[5,7,30] [17,6,30,14,20,4,3,2] [25,16]
This discourages me because there are not many phrases that end with a two-letter word, but let's check out the alternative where 30 is the space character:
[5,7] [19,17,6] [14,20,4,3,2,19,25,16]
The lack of repeating characters makes me think that this code is not a simple cipher. If it were, then letters that you could figure out (such as those in the two-letter words containing either 25,16 or 5,7, for which there are only so many valid combinations that could apply) would appear in other words and give you a clue as to those words. So, due to the lack of clues here, this would appear to be the wrong approach.
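Both space-character guesses can be replayed quickly. The value list below is pieced back together from the two groupings shown above (they both flatten to the same 15 numbers); the helper names are mine.

```python
from collections import Counter

# The 15 values, reassembled from the bracketed groupings above.
VALUES = [5, 7, 30, 19, 17, 6, 30, 14, 20, 4, 3, 2, 19, 25, 16]

def repeats(values):
    """Values that occur more than once, with their counts."""
    return {v: n for v, n in Counter(values).items() if n > 1}

def split_on(values, separator):
    """Break the value list into 'words' wherever the separator appears."""
    words, current = [], []
    for v in values:
        if v == separator:
            words.append(current)
            current = []
        else:
            current.append(v)
    words.append(current)
    return words

print(repeats(VALUES))
print(split_on(VALUES, 19))
print(split_on(VALUES, 30))
```

With only two values repeating, and each only twice, there is very little for frequency analysis to grab onto.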
The colors are not a part of the code. That was my first inclination as well, but they are actually used to calibrate the camera in the Martian atmosphere and lighting in order to perform color corrections on images received back at Earth. Most camera-bearing spacecraft contain some sort of mechanism for calibration such as this. As it was specifically mentioned on one of the mission pages, I assume that they have nothing to do with the message encryption.
1) the code does indeed cipher/encrypt text characters
2) the SPACE character is included in this encryption, but not punctuation or line breaks
- thus we need 26 letters and a 27th character for SPACE (or some other "word separator")
Let's look at it a little more closely now before diving into Carl Sagan's text to look for a match.
First of all, the sample text is comprised of multiple sentences, and some sentences have multiple parts separated by commas. So we'll start by breaking the sample apart into sections separated by the punctuation, since each section represents a complete string of characters run through the encryption algorithm:
Now the first big clue that I see when I look at these four strings is that sections 2, 3 and 4 all share a common opening data set of [00011000]. Sections 2 and 3 share even more than that, but all three share this. Why is this significant? Because what we know of English print states that the first character to appear following any punctuation will be a SPACE character. Section 1 does not start with this because the first sentence is not preceded by a SPACE. So I think we have found not only the code for our space character, but also our character bit size: 8 bits. Another important revelation is that there is no sort of complicated data compression scheme applied to the bit stream. If data compression were applied we would not see such repeatability in the data. So this is good news for us!
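Spotting the shared opening bits by eye is error-prone, so here is a sketch of doing it mechanically. The three strings are made-up stand-ins that merely begin with 00011000; they are not the real sections, and the function name is mine.

```python
def common_prefix(strings):
    """Longest run of leading characters shared by every string."""
    prefix = []
    for column in zip(*strings):
        if len(set(column)) != 1:  # the strings disagree at this position
            break
        prefix.append(column[0])
    return "".join(prefix)

# Placeholder sections (NOT the real data), each opening with 00011000.
section2 = "00011000" + "111001" + "0"
section3 = "00011000" + "111001" + "1"
section4 = "00011000" + "01"

print(common_prefix([section2, section3, section4]))
print(common_prefix([section2, section3]))
```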
But still we have some mystery remaining because the original code contained 75 bits of data and 75 is not evenly divisible by 8. Thus the bit size is not necessarily fixed for every character. So let's see what else we can spot. Let's locate every occurrence of our 8-bit space character and isolate it to see if we can spot any other repeatability between words, since words are separated by spaces!
Now our next big clue comes from the word lengths themselves. WORD3 appears to be the shortest possible word, containing 7 bits. We all know that there are only two words in the English language that contain a single character and they are "A" and "I". This character may be either, but we know it must be one of the two. But before we get into that, let's look at some more bit lengths:
As you can see, the word lengths share no predictable common divisor. This means that the characters have a varying number of bits. So let's go back and look at our smallest character discovery.
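The split-and-measure routine looks roughly like this. The stream is a made-up example (three arbitrary "words" glued together with the space pattern), not the real sample, and the helper names are mine.

```python
from functools import reduce
from math import gcd

SPACE = "00011000"  # the 8-bit pattern suspected of being the space

def split_words(stream, space=SPACE):
    """Cut the bit stream wherever the space pattern occurs."""
    return stream.split(space)

def common_divisor(lengths):
    """Largest number that divides every length; 1 means nothing in common."""
    return reduce(gcd, lengths)

# Placeholder stream, NOT the real data.
stream = "101101011" + SPACE + "1110010" + SPACE + "11101101111"

words = split_words(stream)
lengths = [len(w) for w in words]

print(divmod(75, 8))  # 75 bits into 8-bit characters leaves a remainder
print(lengths, common_divisor(lengths))
```

One caveat: `str.split` takes the left-most match each time, so a space pattern that happens to straddle two real characters would be cut in the wrong place.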
I have armed myself with a text copy of Carl Sagan's "Pale Blue Dot". Carl, as you may or may not be aware, was a co-founder of the Planetary Society, which is sponsoring this contest, before he passed away in 1996. Anyway, as I mentioned, the character must either be an "a" or an "I". So I performed a comprehensive search of the text for the string " I " - naturally this character string pops up frequently when you're discussing your own thoughts, but not nearly as much as the phrase " a ". As it turns out, there are only two sentence fragments in the book that contain five words where the middle word is "I", but neither ends with a period. For this reason it is more likely that the character is an "a". So let's play around with "a" for a bit: we're going to see what other occurrences of the "a" bitstream are contained within the words:
[SPACE] = 00011000
A = 1110010
SECTION 1:
[011110111100000101101110101010110010001110000]
[ ]
[111100001000011010101 A 001110101101100000111101101001110110101111001]
[ ]
[ A ]
[ ]
[011001100100010101110000101011000000000 A 10111100000111010101110110111000 A 101011000110110011111101]
[ ]
[1110110000110100010100101111001].
SECTION 2:
[ ]
[ A 000011110111100000011101110111110110110111100110001110000101100],
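The "A" substitution itself is a plain search-and-replace. As a sketch, here it is applied to the second word of SECTION 1, rebuilt from the listing above by putting the 7-bit pattern back where the A is shown. The function name is mine.

```python
A_BITS = "1110010"  # the 7-bit pattern suspected of being "A"

def mark(word, pattern=A_BITS, label=" A "):
    """Replace every left-to-right, non-overlapping match with a label."""
    return word.replace(pattern, label)

# Second word of SECTION 1, reassembled from the listing above.
left = "111100001000011010101"
right = "001110101101100000111101101001110110101111001"
word2 = left + A_BITS + right

print(mark(word2))
```

Keep in mind that matching anywhere in the stream ignores character boundaries, so some of these "A"s may be coincidences made from the tail of one character and the head of the next.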
Let me continue by saying I have NO IDEA if I'm on the right track here, I'm just following my nose. But substituting all the "A" strings in the text has yielded a fair number of "A"s! Moreover, one of them is in such a position that it may help searching the text of Pale Blue Dot for ", A" (beginning of SECTION 3) where A begins a three- or four-letter word. The bitstream following the A in word 7 is 18 bits in length, which means there are probably either two or three characters following, assuming a 6 to 9 bit length for each remaining character. The word preceding the comma (WORD6) will also start with the letter "A" and be the first word in the sentence.
Strike out on that approach, no such sentence fragment exists. A further pisser is that the text does not contain a sentence matching the pattern "WORD6, WORD7 WORD8, WORD9 WORD10 WORD11 WORD12." There are six sentences beginning with ". WORD6, WORD7 WORD8, ", but none of them end correctly to match what we're looking for. So I guess we're back to square 1 (...?)
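For anyone wanting to repeat the search, the idea is to reduce each sentence to its "shape" (how many words sit between the commas) and compare that against the shape we expect. This is only a sketch of the approach, not the software I used; the sample text is made up and is not from the book.

```python
import re

def clause_word_counts(sentence):
    """Number of words in each comma-separated part of a sentence."""
    clauses = sentence.strip().rstrip(".").split(",")
    return [len(clause.split()) for clause in clauses]

def find_matching(text, shape):
    """Sentences ending in a period whose clause word counts equal `shape`."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences
            if s.endswith(".") and clause_word_counts(s) == shape]

# "WORD6, WORD7 WORD8, WORD9 WORD10 WORD11 WORD12." has the shape 1, 2, 4.
SHAPE = [1, 2, 4]

# Made-up sample text, purely to exercise the functions.
sample = ("It failed. Still, we tried, and nothing matched there. "
          "Nope, not this one, sorry.")

print(find_matching(sample, SHAPE))
```

A real run would also need to cope with quotes, semicolons and abbreviations, which this simple splitter ignores.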
have you been to the matrix site? they have this whole elaborate thing with encryptions and binary codes. check out the attachment below. i dont know if it can help you or not.
That's a standard hexadecimal to binary conversion. Converts from a base-16 numeric system to a base-2. Useful for all types of computer engineering, but so far I haven't seen an indication that base-16 has any bearing on the code here.
Starting over with the first two clues. What if the notion that the first character in SECTIONs 2-4 is a SPACE is correct, but the space character is much smaller than the 8 bits identified and those three sections just have more in common than simply the space character? I've come to the conclusion that the logic is sound with the SPACE character, but that the character itself has not quite been isolated correctly. This is because a search of the Pale Blue Dot text reveals that there is no sentence ending in "A ____ ____." nor "I ____ ____." This means that the first sentence is NOT necessarily 5 words and that the "A/I" word there doesn't quite make the sense that it seemed to at first.
(Just so you know I'm using software to process the e-text of Pale Blue Dot, not just a feeble, manual search. It has pretty comprehensive pattern-matching capabilities.)
So, in an effort to determine the exact nature of the SPACE character, let's try some smaller bit patterns and see how our words break apart, beginning with the original 4-section break-down declared above. First, 7 bits:
We'll stop right there because SECTION 1 already displays the same pattern: 5 words with a single-character word in the middle which we have found not to exist in the source text. So let's try again with 6 bits:
This certainly seems to have changed things up, but we ended up with a funny double-space in the middle there which would suggest that it is not a correct translation. Furthermore, we end up with a significant number of VERY short bit strings which are unlikely to make things any better for our word options. Still, they could be trying to get tricky, so let's analyze this a bit more before moving on..
Here's some good news: they all have bit lengths that are evenly divisible by three. You might recall that the code we ultimately want to break also has a bit length evenly divisible by three.
This is our best lead yet: 3 is a major player in deciphering the code.
So now we'll break down the remaining three sections according to the newly defined space character:
Our mysterious double-space has appeared a couple more times, and now there is a glaring anomaly that popped up in section 4 with a 1-bit length word. For now we'll consider that it might be a glitch in the data - let's look at everything else we have before we discard this approach.
SECTION 4 WORDS:
4A = 12 bits
4B = 6 bits
4C = 6 bits
4D = 21 bits
4E = 12 bits
4F = 45 bits
4G = ??? Anomaly!
4H = 20 bits <- 1 bit short of being a multiple of 3!
4I = 69 bits
EVERYTHING is miraculously a multiple of 3 except 4G and 4H, so I'm going to assume a mistake in the data and prepend 4G to 4H. The new section 4 looks like this:
MUCH BETTER! It's something of a cheat, but let's see how close we can get with it.
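The cheat is easy to sanity-check. In this sketch the lengths are the ones listed for section 4, with 4G entered as 1 bit (the 1-bit word mentioned earlier); the helper names are mine.

```python
# Bit lengths of the section 4 words; 4G is the 1-bit anomaly.
section4 = {"4A": 12, "4B": 6, "4C": 6, "4D": 21, "4E": 12,
            "4F": 45, "4G": 1, "4H": 20, "4I": 69}

def not_multiple_of(lengths, n=3):
    """Names of the words whose bit length is not a multiple of n."""
    return [name for name, bits in lengths.items() if bits % n]

def merge(lengths, first, second):
    """Fold word `first` into the word `second` that follows it."""
    merged = dict(lengths)
    merged[second] = merged.pop(first) + merged[second]
    return merged

print(not_multiple_of(section4))
fixed = merge(section4, "4G", "4H")
print(fixed["4H"], not_multiple_of(fixed))
```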
So now because everything is in nice, tidy groups of three, I'm tempted to convert everything into a base-8 (octal) notation which will be easier on the eyes for pattern recognition. I'm going to perform the conversion from left to right, 3 bits per octal digit, where the right-most bit is the least significant digit (LSD - compatible with typical engineering values and hence Microsoft's desktop calculator program). Because of the way we had that data error, and because of the peculiar double-spaces that are showing up, and now that we know everything is a nice, comfortable base-8, it is possible that the octal conversion is required BEFORE performing the pattern-match on the space character. This would eliminate funky matches. So I'm going to start with the original data, perform the base-8 conversion, then translate against our space character definition. Here's what the converted data set looks like:
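The conversion is simple enough to sketch: take three bits at a time from the left and read each group as a number from 0 to 7, with the right-most bit of the group counting as 1. The function name is mine, and the sample inputs are just short test patterns.

```python
def bits_to_octal(bits):
    """Convert a bit string to octal digits, 3 bits per digit, left to right."""
    if len(bits) % 3:
        raise ValueError("bit count is not a multiple of 3")
    groups = (bits[i:i + 3] for i in range(0, len(bits), 3))
    # int(group, 2) treats the right-most bit as the least significant one.
    return "".join(str(int(group, 2)) for group in groups)

print(bits_to_octal("000110"))
print(bits_to_octal("111010001"))
```

The length check matters here: any leftover bit would mean the stream does not really sit on 3-bit boundaries.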
LOVELY! Now we apply word breaks based on what we think is the "space" character, which should give us much cleaner results because we can no longer break any character boundaries, which we surely were before. The space character is the 6 bits "000110", which translates in octal to "06". Because the space character is 6 bits, or two characters in octal, it is likely that all other "letters" are two characters as well. We have to make sure not to split a letter up with our space replacement, so we'll separate each letter from the next before doing the replacement:
Strangely, we have a stray nibble at the end of sections 1 and 4. This could be an indicator that our pretzel logic may be leading us astray once again. But, as usual, we'll ignore that for now. The base-8 thing appears to be working for us, and if necessary, we'll abandon the idea that a word is always two nibbles (as would seem to be the case evidenced by the strays). So pretending that didn't happen, here's what goes down when we break our sentences apart with the word separator "06":
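Here is a sketch of the pair-then-split idea, including what happens to a stray digit. The digit string is made up to show the problem case (an "06" that straddles two letters) and is not the real data; the function names are mine.

```python
def pair_letters(octal_digits):
    """Group octal digits into 2-digit letters; return (letters, stray)."""
    usable = len(octal_digits) - len(octal_digits) % 2
    letters = [octal_digits[i:i + 2] for i in range(0, usable, 2)]
    return letters, octal_digits[usable:]

def split_words(letters, space="06"):
    """Break the letter list into words at each whole-letter space."""
    words, current = [], []
    for letter in letters:
        if letter == space:
            words.append(current)
            current = []
        else:
            current.append(letter)
    words.append(current)
    return words

digits = "1060120617235"  # made-up example, NOT the real data

letters, stray = pair_letters(digits)
print(letters, repr(stray))
print(split_words(letters))
print(digits.find("06"))  # a raw text search hits an "06" inside "10","60"
```

The raw search finds "06" at an odd position, inside two neighboring letters, which is exactly the kind of false match that pairing first is meant to avoid.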
Combined with the strays, unless something VERY sneaky is going on in the encoding (like substitution of single codes for complete common words), Section 4 now shows a significant development that decreases the likelihood that we're going down the right path. Not only has our double-space returned to haunt us, but there is a series of two single-letter words in a row separated by spaces, which couldn't possibly make sense in the English language. How could " a I " or " I a " ever make sense in any meaningful context? Furthermore, if either 17 or 23 were the characters "a" or "i", two of only five vowels in our written language, you would see them appear repeatedly elsewhere in the message, yet neither appears ever again!
SO - we're onto something with the base-8 stuff, but there's still something wrong with our definition for the word separator, and/or the way we're applying it, and/or our assumptions about how many 3-bit nibbles make up an actual word.
why do i have this strange feeling that leaving out those pieces of nibble will screw it up. it has to work, why would they just have it so there are extra pieces left... unless they are trying to deceive us.