Homework 5

We are given some data which consists of points of the form (x, f(x)), where x ranges from 1 to N. For x1 < x2, the slope of the line through (x1, f(x1)) and (x2, f(x2)) is (f(x2) - f(x1))/(x2 - x1). Let S be the set of N(N-1)/2 slopes corresponding to all pairs of points. We would like to find the median of S.

If we view the data as a univariate time series, this median is known as the Sen slope, which is a statistic of the rate of change widely used in trend analysis.

For our purposes, it suffices to find an approximate median of S with high probability, as described here. You can assume that the data is given in a file whose first line specifies N. There are N subsequent lines, with the x-th subsequent line specifying f(x), as in this example for N = 5.

You need to implement (a) an algorithm for computing the exact median of S; (b) an algorithm for computing the approximate median of S; (c) describe the running times (in milliseconds) of both implementations for sample data with N = 10000. I will post some sample data on this page shortly. For (a), it is fine to use existing sorting algorithms for sorting S. For (c), you can ignore the time for reading the data from the file into your program.

Please submit into the dropbox folder in ICON named "Homework5". Submit your source files, a file with instructions on how your program can be run, and a file with the running times. The deadline is 11:59 pm on April 22.

Update (04/17): Sample input files points1.txt and points2.txt.