Part One
The task here is to add a method mst() to this
undirected graph class that computes
the minimum spanning tree of the graph and returns the edges in
this spanning tree as a linked list of its edges. You should implement
Kruskal's algorithm to implement this method. Kruskal's algorithm
is described in Section 9.5.2 of the text.
Kruskal's algorithm first sorts the edges by increasing cost. For
this, you should simply invoke a library sort method and not implement
one yourself.
You also need a Union-Find data structure for maintaining the connected
components. For this, you should implement the Union-by-Size scheme
from Section 8.4. All of this was discussed extensively in class on
April 26.
Feel free to add other required fields and methods to the Vertex, Edge,
and Graph classes. Part One is worth 10 points.
Part Two
Here, you will apply the mst method to the document similarity setting
that we looked at in homework 7. We create a graph which has a
vertex for each of the ten documents. Between every pair of vertices,
there is an edge whose cost is the dissimilarity between the
two corresponding documents. The dissimilarity is defined to be
1 minus the similarity. The code for creating this graph is given here , along with a copy of the undirected graph
class. Once you have finished Part One, you should incorporate the
modifications into this undirected graph class.
Run the mst method on the graph constructed here. From the minimum spanning
tree, remove the four costliest edges. This will yield four components.
Output the names of the files in each component.
What we have described here is a well-known way of clustering using the
minimum spanning tree. The choice of four made above is somewhat arbitrary.
Update 1: You can write your own sorting method if that is more convenient.
Update 2: If we remove four edges in the minimum spanning tree, we get 5
components, not four.
Update 3: After obtaining the min-spanning tree for the 10-node graph
on the files, you can do the rest of the stuff manually, just to save
time. So print the edges of the min-spanning tree, and just report
the 5 components by manual inspection. You can submit a text file with
these 5 components.
Update 4: Using a library sort on an array
of objects belonging to a class we define.