22C:16 (CS:1210): Project 2, Stage 2 Clarifications

Here are two test files for Project 2, Stage 2: test1.py, test2.py. The grading for Project 2 will be largely automated and these test files will be the primary vehicle for grading your program. It is therefore critical that what your functions return exactly matches what our solution returns. For example, the types of objects should match, the order of elements in lists should match, etc. You should submit your solution in a file called project2.py by 4:59 pm on Friday, May 10th. Here are some corrections and clarifications in response to questions asked in class and on the discussion board.

The handout asks you to implement a function with the header: def evaluateCF(testSet, rLu, userList, occList): This is a typo. The last two parameters, userList and occList, should not be present; the function header should simply be: def evaluateCF(testSet, rLu):
The handout asks you to implement a function with the header: def topKMovies(u, userRatings, k, friends): This header is missing one parameter; the correct header is: def topKMovies(u, userRatings, numMovies, k, friends): The new parameter numMovies allows the function to generate all possible movie IDs (1 through numMovies), obtain a predicted rating for each of these movies, and then pick out and return the top k movies.
Due to floating-point errors, the similarity between a pair of users might be slightly larger than +1 or slightly smaller than -1. This is okay and does not need to be fixed.
It is okay for a predicted rating to be outside the range 1 through 5.
When computing the k nearest neighbors of a user u, the user u should be treated just like the other users and should not be explicitly excluded from inclusion in the nearest neighbors of u.
To perform the final evaluation of the collaborative filtering algorithm, use a set of friends for each user u that consists of the 75 nearest neighbors of u.
For each 80-20 split of the ratings, you should run the random rating algorithm (the one that returns a random integer in the range 1 through 5) 10 times and compute the average over these 10 repetitions. Note that you will still be using ten 80-20 splits to test all 3 algorithms.
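To illustrate the role of numMovies in the topKMovies clarification above, here is a minimal sketch of the "generate every movie ID, predict, keep the best k" step. It is not the reference solution: the helper name top_k_by_predicted_rating, the predict callable, the tie-breaking rule, and the returned format (a plain list of movie IDs) are all assumptions made for illustration. The handout and the test files determine what your topKMovies must actually return.

```python
def top_k_by_predicted_rating(num_movies, k, predict):
    """Return the k movie IDs with the highest predicted rating.

    predict is a hypothetical callable that maps a movie ID to a
    predicted rating; it stands in for your own prediction code.
    """
    # Movie IDs run from 1 through num_movies, inclusive.
    scored = [(predict(m), m) for m in range(1, num_movies + 1)]
    # Highest rating first. Breaking ties by smaller movie ID is an
    # assumption, not something the handout is known to require.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [m for _, m in scored[:k]]


# Toy usage with made-up predicted ratings for 4 movies.
toy_predictions = {1: 3.5, 2: 4.8, 3: 4.8, 4: 2.0}
best_two = top_k_by_predicted_rating(4, 2, lambda m: toy_predictions[m])
```

With the toy predictions shown, best_two is [2, 3]: movies 2 and 3 tie at 4.8, and the assumed tie-breaking rule puts the smaller ID first.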
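The nearest-neighbor clarification above (user u is not excluded from its own neighbors) can be sketched as follows. The function name k_nearest_neighbors, the similarity callable, and the returned format are hypothetical and chosen only for this example; your own function must follow the handout.

```python
def k_nearest_neighbors(u, user_ids, k, similarity):
    """Return the k users most similar to u, with u itself eligible.

    similarity is a hypothetical callable taking two user IDs and
    returning their similarity score.
    """
    # Note that u is deliberately NOT filtered out of user_ids, so it
    # competes for a neighbor slot like every other user.
    ranked = sorted(user_ids, key=lambda v: similarity(u, v), reverse=True)
    return ranked[:k]


# Toy usage: similarity falls off with the distance between user IDs.
def toy_similarity(a, b):
    return 1.0 / (1 + abs(a - b))

neighbors = k_nearest_neighbors(2, [1, 2, 3, 4], 2, toy_similarity)
```

In this toy example user 2 appears first in its own neighbor list, since no other user is more similar to user 2 than user 2 itself.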
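The averaging described above for the random rating algorithm, on a single 80-20 split, might look like the sketch below. The error measure used here (root mean squared error) is an assumption, as are the names rmse and average_random_baseline_error; substitute whatever measure and data structures the handout specifies.

```python
import math
import random


def rmse(predicted, actual):
    """Root mean squared error (assumed here as the error measure)."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def average_random_baseline_error(actual_ratings, repetitions=10, rng=random):
    """Average the random algorithm's error over several repetitions.

    actual_ratings is a hypothetical list holding the true ratings of
    the test set for one 80-20 split.
    """
    errors = []
    for _ in range(repetitions):
        # randint includes both endpoints, so this draws from 1 through 5.
        guesses = [rng.randint(1, 5) for _ in actual_ratings]
        errors.append(rmse(guesses, actual_ratings))
    return sum(errors) / repetitions


# Toy usage on a made-up test set; a seeded generator keeps runs repeatable.
toy_test_ratings = [3, 5, 1, 4, 2, 3]
baseline = average_random_baseline_error(toy_test_ratings, rng=random.Random(0))
```

You would repeat this for each of the ten 80-20 splits, so the random algorithm is run 10 times per split.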

There may be more clarifications and corrections in the next day, so please visit this page and the discussion board regularly. Also, don't hesitate to get in touch with the instructors and TAs in person or by e-mail.

Last updated: Wednesday, May 8th