Protein folding assignment

Protein folding is a heavily studied question in biology: Given a chain of amino acids, how will the protein tend to arrange itself physically? The physical arrangement of the protein has important consequences for what the protein does in a biological system.

Computing the physical arrangement is very difficult. Some researchers instead examine a highly simplified model called the HP model, in which each amino acid appears on a grid, with connected amino acids placed at adjacent grid points. (This simplification is completely unrealistic, but the problem is still interesting.) We classify each amino acid into one of two categories — some are hydrophobic (represented with a capital H) and others are polar (capital P). The hydrophobic amino acids are non-polar and so despise being next to polar acids and water molecules. Thus, in a water-based solution, the protein will tend toward arranging the hydrophobic residues next to each other. In the HP model, then, we ask: How can we place the protein chain onto a grid so as to maximize the number of adjacent pairs of hydrophobic amino acids?

As an example, let's consider the protein chain HPPHPPPHPPH. Two possible ways of arranging this protein on a grid are illustrated below.

    
score: 3 score: 2

The arrangement at left has a score of three, since there are three pairs of adjacent hydrophobic amino acids (each diagrammed with a gold-dashed line), whereas the second has a score of two. The first arrangement would therefore be preferred. In fact, there is no arrangement with a score better than three, so the first arrangement is optimal.

Finding the optimal protein arrangement — even using the much-simplified HP model on two-dimensional grid — is NP-complete. But because the problem is important, researchers still work finding fast algorithms.

For this assignment, I have produced a program that uses exhaustive search to compute and display the optimal arrangement under the HP model. This code consists of two files (available in Java or Python).

Protein.java protein.py Defines a class for representing a single protein-folding solution, including code to compute its score and to display it. Also, the class includes the main method. You should not modify this file.
Searcher.java searcher.py All search code is contained in this file.

Your assignment is twofold.