Assignment 1 Questions In the following, suppose that ct is a ciphertext, or, when there is more than one, ct1 and ct2 are ciphertexts. Let pt, pt1, pt2 be the corresponding plaintexts, or any standard text such as "blackcat.txt" by Edgar Allan Poe. ################################################################################ ################################################################################ Q: How to I read the ciphertext strings into SAGE? Note that you may read the strings in as 'string monoid elements': sage: S = AlphabeticStrings() sage: pt = S.encoding(open("Plaintext/blackcat.txt").read()) sage: ct = S.encoding(open("Ciphertext/cipher06.txt").read()) or as plain vanilla Python strings: sage: pt = strip_encoding(open("Plaintext/blackcat.txt").read()) sage: ct = strip_encoding(open("Ciphertext/cipher06.txt").read()) but remember in the former case to use apply strip_encoding to remove the end-of-line characters '\n'. Also some of the *'s below, for multiplication of 'string monoid elements', must be +'s, for Python string concatenation. If you use the former 'string monoid elements' then either of the following syntax can be used to compute frequency distributions: sage: X1 = frequency_distribution(ct,1) or sage: X1 = ct.frequency_distribution(1) and similarly for coincidence_index. ################################################################################ ################################################################################ Q: (Question 1) How do I break a Vigenere cipher? The first unknown you must determine is the period. For this you can use the coincidence index of the decimations for m in range(1,20): print "%s: %s" % (m, sum([ coincidenceindex(ct[i::m]) for i in range(m) ])/m) A high value, near 0.065, suggests the value of m is the period. X = frequency_distribution(pt) eps = 0.33 matches = [ [] for i in range(m) ] for i in range(1,m): Z = ct[i::m].frequency_distribution() Y = DiscreteRandomVariable(X,Z.function()) for j range(26): corr = X.translation_correlation(Y) if corr > eps: matches[i].append(j) matches ################################################################################ ################################################################################ Q: (Question 2) How do I break a substitution cipher? Single character frequencies should give you the image y of the character E, and guesses for the next most frequent characters. In order to break the substitution, you should apply the map x -> (P(x),P(xy|y),P(yx|y)) where y is the image of a known character, such as E. You can compare these vectors to the data in "digraph_frequencies.pdf" from the course web page. In what follows, suppose we find that y = 'C' is the image of 'E'. # Compute the 1 and 2-character frequency distributions: S = AlphabeticStrings() ct = S.encoding(open("Ciphertext/cipher06.txt").read()) X1 = frequency_distribution(ct,1) X2 = frequency_distribution(ct,2) # Determine the vectors (P(x),P(xy|y),p(yx|y)) in R^3 for ciphertext y = 'E': E = S('C') # 'C' is the ciphertext image of E V = VectorSpace(RealField(),3) AZ = S('ABCDEFGHIJKLMNOPQRSTUVWXYZ') XE_ct = [ V([X1(AZ[i]),X2(AZ[i]*E)/X1(E),X2(E*AZ[i])/X1(E)]) for i in range(26) ] # User digraph_frequencies.pdf: identify some isolated vectors of the form # (P(x),P(xy|y),p(yx|y)) in R^3 for x = 'E' -- set up as a dictionary {} of # key:value pairs which can be indexed by the keys. XE_pt = { S('H'):V([0.05744, 0.18046, 0.02125]), # expected image of 'H' S('R'):V([0.05674, 0.13071, 0.11352]), # expected image of 'R' S('S'):V([0.05669, 0.07553, 0.08050]), # expected image of 'S' S('T'):V([0.09364, 0.06919, 0.06151])} # expected image of 'T' # Define the norm of a vector to be the sqrt of the self-inner product: def norm(v): return v.inner_product(v).sqrt() # Search for vectors near to those for H, R, S, and T: for eps in [0.100,0.050,0.025,0.010]: print 'eps:', eps for x in (S('H'),S('R'),S('S'),S('T')): matches = [] for i in range(26): if norm(XE_ct[i]-XE_pt[x]) < eps: matches.append(AZ[i]) print '%s: %s' % (x, matches) Note that by forming the standard vectors for 'HE', 'RE', 'SE', 'TE' in XE_pt, and elements of XE_ct as vectors in R^3, we can take their differences as vectors, followed by their norm to determine what elements of XE_ct are close to each of the standard vectors. Once several characters are known, one has many choices of y to form the images x -> (P(x),P(xy|y),P(yx|y)), each of which can be used to determine (the preimage of) additional characters x. ################################################################################ ################################################################################ Q: (Question 3) There are some repeat entries in the rows defining the cryptosystem. Doesn't this mean that the cryptosystem is not a bijection? Yes, this is in fact true. This implies that there will be some plaintexts which can not be deciphered. This is not a good feature of a cryptosystem -- it decreases the entropy. However remember that some modes of operation for block ciphers only require an enciphering operation, so it is not strictly necessary that a cipher be injective. The entropy is still well-defined but lower than it should be (for a strong cryptosystem). Q: (Question 3 continued) How do I read in all of that data in a useful why in SAGE? # Everyone should have the same alphabet of 16 elements (and 8 keys): Alpha = ['NTH','AND','EST','ERE','ETH','THA','THE','TER','WIT','HIS','HAT','INT','ING','HER','HES','ITH'] # Read in the 16 probabilities of the 16 plaintext elements: Probs = open("Ciphertext/LastFirstMiddle.5.txt").read().split("\n") Set up a the dictionary of probabilities of these characters, e.g. using a Python dictionary type P = { 'NTH':P['NTH'], etc. }. Note that each line of Probs is of the form 'P(XYZ) = 0.nmpq' so split() of this gives ['P(XYZ)', '=', '0.nmpq'] and we us eval of the entry at position 2 (= '0.nmpq') to interpret it as a real number. P = { } # Python 'dictionary' for i in range(16): P[Alpha[i]] = eval(Probs[i].split()[2]) # To be really fancy, we can create this as a DiscreteProbabilitySpace: MSpace = DiscreteProbabilitySpace(Alpha,P) # Read in the 16 x 8 array of images ciphertexts E(K,M): Imags = open("Ciphertext/LastFirstMiddle.4.txt").read().split("\n") # The conditional entropy should be defined in terms of the Crypt = { } #[ '' for j in range(8) ] for i in range(16) ] for i in range(16): Crypt[Alpha[i]] = Imags[i].split() Now Crypt is the dictionary of ciphertext images for each message, such that Crypt['NTH'][0] determines the ciphertext image 'NTH' with binary key 000 = 0 and Crypt['NTH'][7] is the ciphertext image of 'NTH' with binary key 111 = 7. Now create CSpace as the DiscreteProbabilitySpace for the ciphertext space! Note that you can determine the entropy of a DiscreteProbabilitySpace (with entropy), but check that this agrees with the formula! Using Crypt as the cryptosystem map and MSpace, you should be able to easily calculate conditional probabilties and conditional entropies. ################################################################################ ################################################################################ Q: (Question 4) How do I break a substitution-transposition cipher? Break the transposition cipher first! Then reduce to a substitution ciphertext. Let coincidence_discriminant be defined as in the text, then you need to look for peaks in the coincidence discriminant some sequences of (i,j)-decimations of various periods m. You may fix i = 1 and run through various j (why?). ct = S.encoding(open("Ciphertext/cipher10.txt").read()); eps = 0.0080 N = len(ct) for m in range(2,20): i = 1 for j in range(2,m): ctij = [ ct[i+k*m]*ct[j+k*m] for k in range(N//m) ] dsc = coincidence_discriminant(ctij) if dsc > eps: print "(1,%s) mod %s: %s" % (j, m, dsc) Once you have decided on the correct value of the period m, you can repeat this loop with m fixed and varying (i,j), with i < j (why?): m = 11 # conjectural period for i in range(m): for j in range(i+1,m): ctij = [ ct[i+k*m]*ct[j+k*m] for k in range(N//m) ] dsc = coincidence_discriminant(ctij) if dsc > eps: print "(%s,%s) mod %s: %s" % (i, j, m, dsc)