Homework III

 

 

1. [25 points] -- #4 continued from last week

Modify the Wren scanner (file scanWr in our class directory) so that it ignores comments that are written to begin with a '#' character and extend to the end of the line. Your submission for this problem should consist of a printout that contains a listing of your code, plus thorough testing of all parts of the code that you have changed or added. In addition, you should submit your code file (only) electronically to the directory Hwk3 for class c185 (follow the link on the class web page for the description of the submit command if you have not used this previously).

 

 

2. [35 points]

This problem is inspired by the field of DNA mapping. DNA is the material that forms the chromosomes of all known life. DNA consists of two long sequences of the molecules adenine (a), thymine (t), cytosine (c), and guanine (g), called bases, organized in a double helix shape. Finding the ordering of the bases in an organism’s DNA strand is called mapping the DNA.  The Human Genome project is a large international effort devoted to mapping the DNA of humans and the mapping has recently been completed.  DNA strands can be very long -- around three billion bases for humans. To obtain a manageable problem, these long strands are broken into small pieces called clones.  Clones from the same section of the strand are called contigs, and an important problem is the reconstruction of an original chromosome from its contigs. Since contigs may overlap, the problem is to find the minimal length sequence that the contigs yield when overlaid. In this way longer and longer segments are gradually identified.

 

For instance the three contigs

   a-a-a-a-t-c-g

   a-t-c-g-g-g-c

   g-c-c-a-t-t

overlay as

   a-a-a-a-t-c-g

         a-t-c-g-g-g-c

                   g-c-c-a-t-t

to yield the overall sequence

   a-a-a-a-t-c-g-g-g-c-c-a-t-t.

Is this the minimal length sequence for these contigs?

 

This homework is to write a Prolog program to solve a simplified version of this problem. There will be two lists of Prolog atoms, and the object is to determine the shortest list in which they each appear as sublists. You should construct a Prolog program contigs(As,Bs,Cs) that succeeds with Cs as a shortest list containing each of the given lists As and Bs.

 

Of course, there is always some list in which each of the two lists occurs  as a sublist -- concatenating the lists together yields such a list. But this is an unacceptable outcome if there is any way to do better -- it is imperative that a shortest containing list (i.e., the one with the greatest overlap) be obtained.

 

Achieving an efficient solution to the DNA mapping problem is known to be a very difficult since there may be a huge number of contigs and they can be very long. Our specialized homework version avoids these difficulties. Also the high level of generality and abstraction of logic programming can make challenging programs substantially easier to construct. In particular, the general list predicates provided in the utilities file, plus the built-in list length predicate will be indispensable aids in attaining a solution for this problem. The key to this problem is to analyze the various ways in which lists may overlap, and find the greatest overlap that can occur in these various configurations.

 

As it runs normally, both SICStus and SWI Prolog truncate the printing of long lists (the lists are computed, just not fully printed). This is unsatisfactory for this problem. To ensure that SICStus Prolog prints out long lists in their entirety, enter the query

?- prolog_flag(toplevel_print_options, _,

               [quoted(true),numbervars(true),portrayed(true)]).

You can accomplish this effect simply by consulting the utilities file, or use the predicate longLists/0 defined there that simply makes this call. This change will be effective throughout the Prolog session in which it is made. For SWI Prolog, the process is simpler – at the output pause of a truncated list, just hit the 'w' key and the full list is printed.

 

Use the submit command to submit your program to the Hwk3 directory. Your program should contain copies of all the non built-in predicates it uses, and the main predicate should be named 'contigs'.

 

Your program should correctly solve the contigs problem for any pair of lists.

The pairs of lists below are some samples on which your program can be

tested. However, depending on how your program is written, these particular examples may not test all your code -- you are responsible for creating test cases that exercise all the code you create, and for providing an analysis that justifies that this complete testing has been accomplished.

  (a) [a,a,b,b,a,b,b], [b,b,b,a,a,b,b]

  (b) [b,c,a,c,c,b,c,b,b,a,a], [a,b,b,c,a,c,c,b,c,b,b,a,a,b,c]

  (c) [a,b,b,c,c,c,a,a,a,a], [b,c,a,a,b,c]