Arbitrary letter substitution cyphers can be automatically broken by taking advantage of the known letter frequencies of the language being encrypted. The frequencies for the English language, based on a sample of a few hundred thousand letters, are given in the file freq
Your assignment is to write a program that uses this data to crack the code for arbitrary letter substitution cyphers. The input data you will be given will be encrypted so that each letter is replaced with some other letter. Distinctions between upper and lower case letters will be preserved, and punctuation and numerals will not be scrambled.
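To make the input format concrete, here is a minimal Python sketch of the kind of substitution described above: each letter maps to another letter, case is preserved, and everything else passes through untouched. The function name substitute and the use of ROT13 as the sample key are illustrative assumptions, not part of the assignment or of MP3.

```python
import string

def substitute(text, plain_upper, cipher_upper):
    """Replace each letter of plain_upper with the matching letter of
    cipher_upper, preserving case. Characters that are not ASCII letters
    (punctuation, numerals, spaces) are left unchanged."""
    table = str.maketrans(
        plain_upper + plain_upper.lower(),
        cipher_upper + cipher_upper.lower(),
    )
    return text.translate(table)

# Sample key (assumption for illustration): ROT13, a shift of 13 places.
ALPHABET = string.ascii_uppercase
ROT13 = ALPHABET[13:] + ALPHABET[:13]

if __name__ == "__main__":
    print(substitute("Hello, World 42!", ALPHABET, ROT13))
```

Because ROT13 is its own inverse, applying substitute twice with this key returns the original text; an arbitrary key would need the two alphabets swapped to decrypt.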
In fact, the encrypted data will be constructed using MP3 to perform the letter substitutions.
The following basic algorithm will work for this problem:
Your program should put a two-line title on the output data to indicate the substitution it used. These two lines should be the same as the key file that could be used with MP3 to decrypt the data. Assuming that the original input was encrypted using MP2 (ROT13), the title might be:
RGNBVAFUEQYHZJPTSLCOXIWKMDrgnbvafueqyhzjptslcoxiwkmd
ETAOINSHRDLUMWCGFYPBKVJXZQetaoinshrdlumwcgfypbkvjxzq
Note that the second line of this title contains the letters in the frequency order shown above, while the first line shows the letters in the frequency order observed in the input data.
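One way to read the title format is as a pairing of the Nth most frequent letter in the input with the Nth most frequent letter of English. The Python sketch below follows that reading. It is only an illustration under stated assumptions: the names observed_order, title_lines and decrypt are hypothetical, the English ordering is copied from the sample title rather than read from the frequency file, upper and lower case are counted together, and ties (including letters that never appear) are broken alphabetically.

```python
from collections import Counter
import string

# English letters from most to least frequent, copied from the second
# line of the sample title (an assumption; a real solution would derive
# this from the frequency file supplied with the assignment).
ENGLISH_ORDER = "ETAOINSHRDLUMWCGFYPBKVJXZQ"

def observed_order(ciphertext):
    """Return the 26 letters sorted from most to least frequent in
    ciphertext, counting upper and lower case together."""
    counts = Counter(ch.upper() for ch in ciphertext
                     if ch in string.ascii_letters)
    # Sort by descending count; ties and unseen letters fall back to
    # alphabetical order (an arbitrary choice made for this sketch).
    return "".join(sorted(string.ascii_uppercase,
                          key=lambda ch: (-counts[ch], ch)))

def title_lines(ciphertext):
    """Build the two title lines: observed order, then English order."""
    observed = observed_order(ciphertext)
    return (observed + observed.lower(),
            ENGLISH_ORDER + ENGLISH_ORDER.lower())

def decrypt(ciphertext):
    """Map each letter of the first title line to the letter in the same
    position of the second; other characters pass through unchanged."""
    first, second = title_lines(ciphertext)
    return ciphertext.translate(str.maketrans(first, second))

if __name__ == "__main__":
    sample = "RRr gG n!"
    for line in title_lines(sample):
        print(line)
    print(decrypt(sample))
```

On short inputs the observed counts rarely match the English ordering beyond the first few letters, so a guess built this way is a starting point rather than a full decryption.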
Note that, unlike MP4, your solution to MP5 is unlikely to completely decrypt the sample data, but it is likely to get enough right that you can figure out the rest by hand.
This problem is worth 5% of your grade. If you get your assignment to work perfectly, you will earn only half of the credit. The other half will depend on the style of your code and commentary.