Homework 4
CS1210, Fall 2021
Due Tuesday, Oct. 5, 2021, by 8:00pm
6 points

1. (2 points) Write function q1(infoDict, listOfLists). listOfLists is a list of zero or more lists of zero or more numbers. infoDict is a dictionary with numbers as keys. The values associated with keys can be of any type.

A number is considered "red" if the number is a key in the dictionary and has value "red". A number is considered "blue" if it is a key in the dictionary and has value "blue". Other numbers, whether or not they exist as keys in the dictionary, are considered "green".

A sublist of listOfLists is considered "red" if it contains more red items than blue ones and more red items than green ones. A sublist is considered "blue" if it contains more blue items than red ones and more blue items than green ones. Other sublists are considered "green".

q1 returns a list that is the same length as listOfLists and such that item i is
For example:
>>> q1( {1:"purple", 2:"red", 3:"blue", 25:"red"}, [[4], [2,3,3], [1,2,3], [17]] )
[4, 2, 1, 17]
NOTE: you may not use Python's .count() method, nor the min() or max() functions. You may use the .sort() method or sorted() function.
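The color rules for numbers and sublists can be sketched as two small helpers. This is a minimal sketch, not a complete q1: the names colorOfNumber and colorOfSublist are hypothetical, and because the rule for what item i of the result should be is not stated above, the sketch only classifies. It tallies colors with explicit counters rather than .count(), min(), or max(), in keeping with the NOTE.

```python
def colorOfNumber(infoDict, number):
    """Return "red", "blue", or "green" for one number (hypothetical helper)."""
    # .get() returns None for numbers that are not keys, so they fall through to green.
    value = infoDict.get(number)
    if value == "red":
        return "red"
    if value == "blue":
        return "blue"
    return "green"


def colorOfSublist(infoDict, sublist):
    """Return the color of one sublist by tallying its items (hypothetical helper)."""
    # Explicit counters instead of list.count().
    redCount = 0
    blueCount = 0
    greenCount = 0
    for number in sublist:
        color = colorOfNumber(infoDict, number)
        if color == "red":
            redCount += 1
        elif color == "blue":
            blueCount += 1
        else:
            greenCount += 1
    # Red or blue only if that color strictly outnumbers both of the others.
    if redCount > blueCount and redCount > greenCount:
        return "red"
    if blueCount > redCount and blueCount > greenCount:
        return "blue"
    return "green"
```

For the example dictionary above:
>>> d = {1:"purple", 2:"red", 3:"blue", 25:"red"}
>>> [colorOfSublist(d, s) for s in [[4], [2,3,3], [1,2,3], [17]]]
['green', 'blue', 'green', 'green']
Note that an empty sublist is green, since zero red items is not more than zero of anything.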

2. (4 points)
a. Write function q2(filename, minWordLengthToConsider = 1) that analyzes word frequencies in real-world SMS text messages.

Text file SMScollection.txt contains 5574 SMS messages. There is additional information about the contents of the file in the associated "readme" file readmeSMScollection.txt, written by the creators of the dataset. The data was originally from this no-longer-working link: http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/. Some information about the data set and its initial investigators is now here.

Each line of the file represents one SMS/text message. The first item on every line is a label ('ham' or 'spam') indicating whether that line's message is considered spam. The rest of the line contains the text of the message. For example:
spam  Congrats! 1 year special cinema pass for 2 is yours. call 09061209465 now! Call ...
ham	Sorry, I'll call later in meeting.
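One way to read the line format just described is sketched below. It is a minimal sketch, not the required q2: the helper name countWordsByLabel is hypothetical, and it assumes that the label is the first whitespace-separated item, that words are the remaining whitespace-separated items, and that words should be lowercased with leading and trailing punctuation removed. It takes any iterable of lines, so it can be tried on a few strings without the data file.

```python
import string


def countWordsByLabel(lines, minWordLengthToConsider=1):
    """Return {"ham": {word: count}, "spam": {word: count}} (hypothetical helper).

    Assumes each line is a label ('ham' or 'spam'), whitespace, then the message text.
    """
    counts = {"ham": {}, "spam": {}}
    for line in lines:
        tokens = line.split()  # split on any run of spaces or tabs
        if len(tokens) == 0:
            continue  # skip blank lines
        label = tokens[0]
        if label not in counts:
            continue  # skip lines whose first item is not a known label
        for token in tokens[1:]:
            # Assumption: case-insensitive words, outer punctuation removed ("yours." -> "yours").
            word = token.lower().strip(string.punctuation)
            if word != "" and len(word) >= minWordLengthToConsider:
                labelCounts = counts[label]
                labelCounts[word] = labelCounts.get(word, 0) + 1
    return counts
```

Because an open file object is also an iterable of lines, the same helper could be given the file opened from filename. The punctuation stripping is a design choice: without it, "call", "call!" and "Call" would be counted as three different words.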
At the end, your program must print summary information, including at least:
Feel free to compute and print out additional information as well.
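Summary output about word frequencies typically starts from ranking a word-count dictionary. The sketch below shows one illustrative way to do that with sorted(); it does not reproduce the required summary items. The name topWords is hypothetical, and breaking ties alphabetically is an assumption made so that the output order is predictable.

```python
def topWords(wordCounts, n):
    """Return the n most frequent (word, count) pairs (hypothetical helper)."""
    # Sort by descending count, then alphabetically for words with equal counts.
    ranked = sorted(wordCounts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]
```

For instance, given a dictionary spamCounts mapping words to counts, `for word, count in topWords(spamCounts, 10): print(word, count)` would print the ten most common words, most frequent first. Slicing with [:n] simply returns fewer pairs when the dictionary has fewer than n words.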

To accomplish this, your q2 function should:
b. Write a sentence or two saying something about the results. Can you conclude something about spam vs. non-spam? Did you learn something? Put this answer in a comment just before your q2 function. For example:
# 2b. ... your answer here ...
#    ....
#
def q2(filename, minWordLengthToConsider = 1):
 ...


Submit to ICON exactly one Python file. The file must not contain any code (except possibly "import math") that is not part of a function definition.