
Protein Folding: The End-Game

Michael Levitt, Mark Gerstein, Enoch Huang, S. Subbiah* & Jerry Tsai

Department of Structural Biology, Stanford University School of Medicine, Stanford, California 94305

* Present address: Wistar Institute, 3601 Spruce St., Philadelphia, PA 19104 and Bioinformatics Center, National University of Singapore, Kent Ridge, Singapore

KEY WORDS: protein folding, packing, side-chains

Send Proofs to: Michael Levitt
Department of Structural Biology
Stanford University School of Medicine
Stanford, CA 94305
Phone: (415) 723-6800
Fax: (415) 723-8464
e-mail: levitt@hyper.stanford.edu

INTRODUCTION

WHAT CAN WE LEARN FROM X-RAY STRUCTURES?

How Is Packing Characterized?

How Tightly Packed Is The Protein Core?

How Tightly Packed Are Other Parts Of Proteins?

WHAT DOES EXPERIMENT HAVE TO SAY?

Proteins Are Well-Packed in Solution

Good Packing Leads to Greater Stability

How Does Good Packing Arise?

Folding Pathways

Equilibrium Experiments- The Molten-Globule State

Kinetic Experiments

Conclusions From Experiment

MODELING THE PACKING OF SIDE-CHAINS

Early Work- Defining The Problem

A Possible Solution?

Recent Refinements- A Classification

Assessing the Accuracy

Why Is This An Easy Problem?

Main-Chain Movement

GETTING TO THE END-GAME

How Close Is Close Enough?

Threading Methods

Ab Initio Folding

Discrete-State Models And Energy Functions

Energy Minimization And Search Strategies

Discriminating Native From Near-Native Conformations

CONCLUSIONS

ABSTRACT

The last stage of protein folding, the "end-game", involves the ordering of amino-acid side-chains into a well-defined and closely packed configuration. We review a number of topics related to this process. We first describe how the observed packing in protein crystal structures is measured. Such measurements show that the protein interior is packed exceptionally tightly, more so than the protein surface or surrounding solvent and even more efficiently than crystals of simple organic molecules. In vitro protein folding experiments also show that the protein is close-packed in solution and that the tight packing and intercalation of side-chains is a final and essential step in the folding pathway. These experimental observations, in turn, suggest that a folded protein structure can be described as a kind of 3D jigsaw puzzle and that predicting side-chain packing is possible in the sense of solving this puzzle. The major difficulty that must be overcome in predicting side-chain packing is a combinatorial "explosion" in the number of possible configurations. There has been much recent progress on overcoming this problem, and we survey a variety of the approaches. These differ principally in whether they use ab initio (physical) or more knowledge-based methods, in how they divide up and search conformational space, and in how they evaluate candidate configurations (using scoring functions). The accuracy of side-chain prediction depends crucially on the (assumed) positioning of the main-chain. Methods for predicting main-chain conformation are, in a sense, not as developed as those for side-chains. We conclude by surveying these methods. As with side-chain prediction, there is a great variety of approaches, which differ in how they divide up and search space and in how they score candidate conformations.

INTRODUCTION

The end-game of protein folding refers to the final stage in the folding process. At this point, it is believed that the overall fold has already been determined and the side-chains are close to their final positions. The previous steps in the folding process, especially those that determine the shape of the overall fold, are thought to be greatly, if not completely, dictated by hydrophobic interactions . However, here we argue that the end-game transition to the native structure is governed by somewhat different interactions, tight close-packed contacts between amino acid side-chains. The creation of these contacts has been compared to crystallization . Clearly, such tight packing is related to the most important characteristic of native protein structures, their unique and precisely determined yet highly complex three-dimensional shapes. The packing process is likely to be energetically difficult as side-chains prefer to be disordered. The process, therefore, will have a high activation barrier, and will be slow.

Packing as a phenomenon is easily visualized and is commonplace in everyday experience. It is dominated by a simple universal energy term, the strong repulsion between atoms that approach too close to each other. Packing is a short-range phenomenon, which allows a more local treatment considering only surrounding neighbors. Richards was one of the first researchers to emphasize the importance of close packing in protein structure , and his point of view is becoming increasingly accepted.

With its focus on packing, this review on protein folding considers both experimental work and theory. In doing so, it is by necessity selective, particularly when considering the large body of theoretical and computational work. Attention is focused on the type of packing observed in proteins and on the prediction of packing for side-chains. We also focus on the more difficult problem of generating main-chain conformations close enough to make side-chain packing predictions possible.

We have deliberately not dealt with certain related topics, including molecular dynamics simulation and homology modeling. Realistic molecular dynamics simulations of protein folding or unfolding in solution are not reviewed, in spite of the recent work in this field. Homology modeling is also not reviewed, in spite of its close connection to side-chain packing using a main-chain "borrowed" from a protein with a homologous sequence. The intent here is to concentrate on basic principles rather than on applications. Furthermore, homology modeling has been recently reviewed.

This review is divided into a number of sections. The first section deals with the observed packing in protein structures as determined by x-ray crystallography and shows that proteins are more tightly packed than almost any other organic matter. The second section extends the review of experimental work to solution studies by relating close-packing to stability and considering how such close-packing arises during the folding process. The third section shows how close-packing has led to the effective solution of the side-chain prediction problem when a sufficiently native-like main-chain conformation is known. The fourth section considers how to generate sufficiently accurate main-chain conformations, primarily by searching large spaces of possible conformations with appropriate energy functions.

WHAT CAN WE LEARN FROM X-RAY STRUCTURES?

The best source of information on packing in protein molecules comes from the hundreds of highly refined high-resolution protein structures that have been determined over the past three decades. These structures show a high degree of order in all the residues, except occasionally those on the surface of the protein.

How Is Packing Characterized?

The packing efficiency of a given atom is defined as the ratio of the volume of its van der Waals (VDW) envelope to the amount of space it actually occupies. This simple definition masks considerable complexity. First of all, how does one determine the volume of the VDW envelope? This obviously requires knowledge of the VDW radii of atoms, a subject on which there is no universal agreement. This is particularly true for water molecules and polar atoms. Second, how does one determine how much space an atom occupies? Or, equivalently, how much additional "cavity" volume should be associated with a particular atom in addition to its envelope volume? These latter questions can be addressed by a variety of geometric constructions, which will be discussed in the following section.

The absolute packing efficiency of an atom is most useful in a comparative sense, e.g. when comparing equivalent atoms in different parts of a protein structure. In calculating the ratio of packing efficiencies, the VDW envelope volume remains the same and cancels. One is left with just the ratio of space that an atom occupies in one environment to that it occupies in another.
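The cancellation described above can be checked with a few lines of Python. This is only an illustrative sketch: the function names and the volumes (in cubic angstroms) are hypothetical and are not taken from any of the studies reviewed here.

```python
def packing_efficiency(vdw_volume: float, occupied_volume: float) -> float:
    """Ratio of an atom's VDW envelope volume to the space it occupies."""
    if vdw_volume <= 0 or occupied_volume <= 0:
        raise ValueError("volumes must be positive")
    return vdw_volume / occupied_volume


def relative_packing(occupied_a: float, occupied_b: float) -> float:
    """Efficiency in environment A divided by efficiency in environment B.

    For the same atom type the VDW envelope volume cancels, so only the
    occupied volumes in the two environments are needed.
    """
    return occupied_b / occupied_a


# Hypothetical volumes for one atom type, chosen only for illustration.
vdw = 17.0          # assumed VDW envelope volume
core = 23.0         # assumed space occupied in the protein core
surface = 27.0      # assumed space occupied at the protein surface

ratio_direct = packing_efficiency(vdw, core) / packing_efficiency(vdw, surface)
ratio_short = relative_packing(core, surface)
```

Both routes give the same number, which is why the disputed choice of VDW radii matters less for comparisons between environments than for absolute efficiencies.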

VORONOI CONSTRUCTION Voronoi volume calculations are geometrically rigorous methods that determine how much space an atom occupies. These calculations were originally developed by Voronoi . They were first applied to molecular systems by Bernal & Finney and to proteins by Richards . Since then they have been used successfully in the calculation of standard volumes of protein residues, in characterizing protein-protein interactions, in understanding protein motions, and in analyzing cavities in protein structure . They have also been used in the analysis of liquids , and the faces of Voronoi polyhedra have been used to characterize protein accessibility and to assess the fit of docked substrates in enzymes .

The Voronoi procedure allocates all space amongst a collection of atoms. Each atom is surrounded by a polyhedron and allocated the space within it. The faces of Voronoi polyhedra are formed by constructing dividing planes perpendicular to the interatomic vectors between atoms, and the edges of the polyhedra result from the intersection of these planes.
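A rough numerical way to see this allocation of space is to assign every point of a fine grid to its nearest atom, which gives the same partition as polyhedra whose faces are planes that perpendicularly bisect the interatomic vectors. The sketch below makes several assumptions that are not in the text: the planes sit at the midpoints between atoms, the atoms are enclosed in an arbitrary cubic box that truncates the outer cells, and the function name and coordinates are hypothetical. It estimates volumes only and does not construct the polyhedra themselves.

```python
def voronoi_volumes(atoms, lo, hi, step):
    """Estimate Voronoi cell volumes inside the cubic box [lo, hi]^3.

    Each voxel is given to the atom nearest its center; summing voxel
    volumes per atom approximates the volume of that atom's polyhedron.
    """
    n = int(round((hi - lo) / step))
    voxel = step ** 3
    volumes = [0.0] * len(atoms)
    centers = [lo + (i + 0.5) * step for i in range(n)]
    for x in centers:
        for y in centers:
            for z in centers:
                nearest = min(
                    range(len(atoms)),
                    key=lambda a: (atoms[a][0] - x) ** 2
                    + (atoms[a][1] - y) ** 2
                    + (atoms[a][2] - z) ** 2,
                )
                volumes[nearest] += voxel
    return volumes


# Two hypothetical atoms placed symmetrically in a 4 x 4 x 4 box.
atoms = [(1.0, 2.0, 2.0), (3.0, 2.0, 2.0)]
vols = voronoi_volumes(atoms, 0.0, 4.0, 0.25)
```

In this symmetric toy case the dividing plane lies at x = 2, so the two atoms share the box equally and the volumes sum to the full box volume, illustrating that all space is allocated.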

The Voronoi procedure requires the location of all neighboring atoms. This is possible in the protein core, but on the protein surface many of the neighbors of a protein atom are water molecules, which are often not well-localized in crystal structures. A variety of approaches have been developed to deal with this. The simplest is to surround the protein with a shell of water molecules generated on a regular grid . It is also possible to use pre-defined boundary shapes (such as the snub cube) to truncate the "open" polyhedra at the protein surface . This sort of truncation can be smoothly and rigorously achieved by using a particular generalization of the Voronoi construction, called the alpha-shape . In molecular dynamics simulations employing periodic boundary conditions, all atoms are completely surrounded by solvent, circumventing this problem .

OTHER CONSTRUCTIONS There are a number of additional methods for measuring volumes and packing that are not based on Voronoi polyhedra. Connolly developed a method for the determination of volumes based on the direct integration of the space inside the molecular surface envelope. Gregoret & Cohen developed a simplified way of evaluating the packing in a structure at the residue, rather than the atomic, level.

All the other approaches have concentrated on the explicit identification and measurement of cavities in protein structures. The advantage of cavity identification algorithms is that the exact location of cavities is often of great interest. However, as the association between a particular cavity and a particular protein atom is somewhat arbitrary, one cannot directly calculate packing efficiencies for individual atoms as with the Voronoi procedure. Another difficulty is that many of these algorithms model cavities in terms of idealized spherical shapes. This does not allow a complete partition of space: after the volumes of the spherical cavities and the atoms’ VDW envelopes are accounted for, there is still leftover space.

How Tightly Packed Is The Protein Core?

Packing calculations on protein structure were done first by Richards more than two decades ago and then soon after by others . These initial calculations revealed some important facts about protein structure. First, in the protein core, atoms and residues of a given type have a roughly constant (or invariant) volume. This is because the atoms inside proteins are packed together tightly, with the interior of the protein better resembling a close-packed solid than a liquid or gas. This high packing efficiency of internal protein atoms is roughly what is expected for the close-packing of hard spheres (0.74).

More recent calculations measuring the packing in proteins have shown that the packing inside of proteins is somewhat tighter than observed initially (~4%) and that the overall packing efficiency of atoms in the protein core is greater than in crystals of organic molecules. When molecules are packed this tightly, small changes in packing efficiency are quite significant. In this regime, the limitation on close-packing is hard-core repulsion, so even a small change is quite substantial energetically. Furthermore, Richards & Lim pointed out that the number of allowable configurations that a collection of atoms can adopt without hard-core overlap drops off very quickly as these atoms approach the close-packed limit.

The exceptionally tight packing in the protein core seems to require a precise, jigsaw-puzzle-like fitting together of the residues inside proteins. This appears to be true for the majority of atoms inside proteins. However, there are exceptions, and some studies have focused on these, showing how the packing inside proteins is punctuated by defects or cavities. If these defects are large enough, they can accommodate buried water molecules.

Researchers using highly simplified two-dimensional lattice models to study protein structure have pointed out that tight packing in the protein core may drive or force the formation of secondary structures . This conjecture has been tested on somewhat more realistic off-lattice models of protein structure . The results have been mixed in the sense that they do observe high packing density driving the formation of secondary structure, but to a much lesser degree than in the lattice models.

How Tightly Packed Are Other Parts Of Proteins?

THE SURFACE Measuring the packing efficiency inside the protein core provides a good standard, and a number of other studies have looked at this in comparison to other parts of the protein. The most obvious thing to compare with the protein inside is the protein outside, or surface. This is particularly interesting from a packing perspective since the protein surface is covered by water, and this substance is known to be packed much less tightly than protein and in a distinctly different fashion (the tetrahedral packing geometry of water molecules gives a packing efficiency of ~0.34, less than half that of hexagonal close-packed solids).
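The two packing efficiencies quoted in this review, about 0.74 for close-packed hard spheres and about 0.34 for tetrahedral packing, can be recovered from simple unit-cell geometry. The sketch below assumes equal hard spheres throughout and, as an idealization that is our assumption rather than a statement from the cited work, models the tetrahedral arrangement as a diamond-like lattice. The close-packed value is computed for the face-centered cubic cell, which has the same packing fraction as hexagonal close packing.

```python
import math


def close_packed_fraction() -> float:
    """Packing fraction of equal spheres in a face-centered cubic cell.

    The cell holds 4 spheres that touch along the face diagonal,
    so the cell edge is a = 2 * sqrt(2) * r.
    """
    r = 1.0
    a = 2.0 * math.sqrt(2.0) * r
    return 4 * (4.0 / 3.0) * math.pi * r ** 3 / a ** 3


def tetrahedral_fraction() -> float:
    """Packing fraction of equal spheres on a diamond-like lattice.

    The cubic cell holds 8 spheres that touch along a quarter of the
    body diagonal, so the cell edge is a = 8 * r / sqrt(3).
    """
    r = 1.0
    a = 8.0 * r / math.sqrt(3.0)
    return 8 * (4.0 / 3.0) * math.pi * r ** 3 / a ** 3


cp = close_packed_fraction()    # equals pi / (3 * sqrt(2))
tet = tetrahedral_fraction()    # equals pi * sqrt(3) / 16
```

Under these assumptions the tetrahedral value comes out at less than half the close-packed value, consistent with the comparison made in the text.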

Calculations based on crystal structures and simulations have shown that the protein surface has an intermediate packing, being packed less tightly than the core, but not as loosely as liquid water. One can understand the looser packing at the surface than in the core in terms of a trade-off between hydrogen bonding and close-packing. In the absence of any interactions other than van der Waals attractions and repulsions, liquids (and solids) tend to close-pack, and the geometry of their interaction can be described in terms of a simple hard-sphere (i.e. billiard-ball) model. However, if there are also highly directional interactions, such as the hydrogen bond in water, the situation is more complicated. Often the close-packing has to be explicitly traded off to maintain hydrogen bonding. This can be visualized in simulations of the packing in simple toy systems.

An important aspect of the looser packing at the protein surface is how this packing is expected to change when the protein surface binds to another molecule, particularly another protein. Calculations measuring the packing in protein-protein interfaces have been done, such as those in antibody-antigen and protease-inhibitor complexes. These have shown that the packing at protein-protein interfaces is roughly comparable to what is found in the protein interior, and is tighter than the packing usually observed at the surface. Thus, the formation of a close-packed interface may be a driving force in docking. Simple shape complementarity (in the sense of a close-packed jigsaw puzzle) is an integral part of many docking programs.

INTERNAL INTERFACES It is also of interest to compare the packing at various internal interfaces inside of proteins, particularly at domain-domain interfaces. Such comparisons are often closely coupled with analysis of protein flexibility.

It has been argued that motion is possible across a close-packed interface such that the close-packing is maintained throughout the motion. To prevent the atoms from bumping into one another, the motion has to be fairly small and parallel to the plane of the interface. There cannot be large torsion angle changes, so side-chains maintain the same rotamer configuration. A large motion is achieved by concatenating many of these small motions at many different interfaces. This sort of small, sliding motion has been dubbed "shear motion". It has been carefully documented in numerous cases (see Gerstein et al. for a list). Moreover, physical studies have shown that a folded protein does not have a single perfectly defined conformation. Rather, it has some intrinsic flexibility and can readily jump between many nearly energetically identical micro-states without significantly changing its packing. This sort of small-scale flexibility is what makes shear motions possible.

Following a somewhat different line of reasoning, it has also been proposed that certain interfaces may be particularly mobile, precisely because they contain defects and are not close-packed. This idea was suggested in the 1970s. Since then a number of workers have noted that there are relatively more cavities at interdomain interfaces than elsewhere in protein interiors. Hubbard & Argos, in particular, claim that these cavities have a functional role in the mechanisms of protein movements.

Packing is also expected to be important in protein motions involving hinges. Numerous studies have emphasized how critical the packing at the base of the hinge is (in the same sense that the "packing" at the base of an everyday door hinge determines how easily the door can close ). Hinge motions often involve creating a new protein-protein interface (e.g. a new domain-domain interface is formed during hinged domain closure). Calculations have shown that these interfaces are close-packed in the same manner as the interfaces involved in protein-protein recognition . They, likewise, suggest that the formation of a new close-packed interface may be a driving force in the motion.

WHAT DOES EXPERIMENT HAVE TO SAY?

Clearly proteins are close-packed in the crystal state. Such close-packing is also seen in protein structures determined in solution by nuclear magnetic resonance, but these proteins are generally rather small (<100 residues) and do not always have a large core region. This section further considers proteins in solution . We first examine whether close-packing stabilizes proteins in solution and then review experimental work on how proteins fold to achieve such close packing.

Proteins Are Well-Packed in Solution

In solution, volumetric studies of both amino acids and whole proteins have been common. The most recent, by Chalikian et al., is quite comprehensive, covering 15 proteins over a temperature range of 18 to 25 °C. The results show that studying whole proteins is more accurate than considering just the amino acids. In doing so, the authors derive some useful relationships based on the molecular weight of a protein without knowledge of the crystallographic data. As rough estimates, the van der Waals volume Vw, the molecular volume Vm, and the accessible surface area Sa can all be related to the molecular weight MW as follows (volumes in Å³, area in Å²): Vw = (1100 ± 300) + (0.77 ± 0.01) MW; Vm = (1200 ± 500) + (1.04 ± 0.02) MW; and Sa = -(1200 ± 200) + (14.5 ± 0.25) MW^(2/3). The authors also show that packing efficiencies are relatively constant between 0.72 and 0.78. This range is very similar to the previously mentioned packing efficiencies computed from protein structures solved by x-ray crystallography.
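The empirical relationships above are easy to evaluate for a protein of a given size. The following sketch uses only the central values of the fitted coefficients and ignores the quoted uncertainties. The function names and the example molecular weight are hypothetical, and taking the ratio Vw/Vm as a packing efficiency is our assumption, made by analogy with the definition given earlier for single atoms.

```python
def vdw_volume(mw: float) -> float:
    """Van der Waals volume Vw (cubic angstroms) from molecular weight."""
    return 1100.0 + 0.77 * mw


def molecular_volume(mw: float) -> float:
    """Molecular volume Vm (cubic angstroms) from molecular weight."""
    return 1200.0 + 1.04 * mw


def accessible_surface_area(mw: float) -> float:
    """Accessible surface area Sa from molecular weight."""
    return -1200.0 + 14.5 * mw ** (2.0 / 3.0)


# Hypothetical small protein; the molecular weight is for illustration only.
mw = 14300.0
vw = vdw_volume(mw)
vm = molecular_volume(mw)
sa = accessible_surface_area(mw)

# Assumed estimate of overall packing efficiency (VDW volume / total volume).
efficiency = vw / vm
```

For a protein of this size the estimated ratio falls inside the 0.72 to 0.78 range reported for packing efficiencies, although the constant terms make the ratio drift somewhat with molecular weight.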

Good Packing Leads to Greater Stability

Improving the packing of the protein interior has recently become a method for increasing stability. Nature uses this principle in the design of thermophilically stable proteins, and several groups have successfully applied it to protein design. Thus far, researchers have been able to create more stable proteins by intentionally increasing the packing efficiency for ribonuclease H1, T4 lysozyme, and λ-repressor. Most recently, Munson et al. have re-engineered the internal packing of the four-helix-bundle protein Rop. Their results further support the idea that increasing the core packing efficiency can increase stability; however, the increased stability was sometimes accompanied by a decrease in function. In a related experiment, Ramachandran and Udgaonkar added significant non-polar volume to the core of the protein barstar by chemically modifying its two free cysteines. They showed that the change caused an increase in protein stability without a decrease in activity or major alteration in structure as measured by circular dichroism. Until the crystal structure of the altered barstar is solved, they reason that this extra stability might be attributed to increased core packing efficiency.

How Does Good Packing Arise?

From an unfolded conformation, proteins must somehow establish their high degree of side-chain packing. Two descriptive models of protein folding, initially proposed in the early seventies, provide insight into this process. The nucleation model argued that protein folding begins with a kernel of residues making specific native-like contacts. Once the protein forms this rate-limiting configuration, the remaining structure quickly folds into place. Alternatively, in the hydrophobic collapse model the protein first aggregates its non-polar groups to form a structure with a loose hydrophobic core. Then secondary structural elements develop around this core, known as a molten globule, which finally folds in a slow step to form the tightly packed native structure. As a slightly different formulation of the hydrophobic collapse model, the framework model has the secondary structure forming first, and then the hydrophobic groups aggregating. Therefore, in the nucleation model, the tight packing forms rapidly with no intermediates, whereas for both collapse models, the tight packing occurs only after the formation of the molten globule.

Folding Pathways

Before proceeding further, we should mention the debate concerning whether the molten globule is an intermediate on or off the folding pathway (for a review see ). Studying the kinetics of intermediate formation can distinguish among the possibilities. Put simply, if the molten globule is part of the folding pathway, its accumulation speeds up the formation of the native conformation (the folding rate is proportional to the fractional concentration of the intermediate). For off-pathway molten globules, formation of these structures inhibits the formation of the native conformation, since the protein must fold back through the unfolded state to reach the native one (the folding rate is proportional to 1 minus the intermediate's fractional concentration). In alternative or parallel pathways, a certain fraction of the unfolded species folds quickly into the native state while the remaining molecules follow a slower on-pathway model. The same group has gone on to show that the molecules on the slower pathway form an intermediate with helical secondary structure that is just slightly more energetically stable than the unfolded state, and this minor increase in stability retards the folding reaction.
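The two proportionalities stated above can be written down directly. This is a deliberately minimal sketch of the on-pathway and off-pathway cases only: the rate constant k and the function names are hypothetical, and no attempt is made to model the parallel-pathway case or any real protein.

```python
def on_pathway_rate(k: float, f_intermediate: float) -> float:
    """Folding rate when the intermediate lies on the pathway.

    The rate is proportional to the fraction of molecules populating
    the intermediate.
    """
    return k * f_intermediate


def off_pathway_rate(k: float, f_intermediate: float) -> float:
    """Folding rate when the intermediate is an off-pathway trap.

    Only molecules that are not in the intermediate can go on to fold,
    so the rate is proportional to 1 minus its fractional concentration.
    """
    return k * (1.0 - f_intermediate)


k = 2.0  # hypothetical rate constant, arbitrary units
fractions = [0.1, 0.5, 0.9]
on_rates = [on_pathway_rate(k, f) for f in fractions]
off_rates = [off_pathway_rate(k, f) for f in fractions]
```

As the intermediate accumulates, the on-pathway rate rises while the off-pathway rate falls, which is the qualitative signature that kinetic experiments look for.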

Furthermore, in almost all of the equilibrium and kinetic studies, the authors assumed a sequential pathway for protein folding. This view assumed that folding proceeds similarly to a chemical reaction. The intermediates along this path help guide the protein to its native state. More recent theoretical developments suggest that folding follows an energy landscape (for a review see ). In this model, the intermediates arise from kinetic traps in which the protein is actually slightly misfolded. To continue, the protein needs to unfold only somewhat. The model is able to explain the behavior of fast-folding proteins, which fold on the order of milliseconds instead of the usual seconds, without distinguishable intermediates. Because these proteins are too small to form stable intermediates, they avoid the kinetic traps and therefore fold directly to the native state. Another way to view the rapid folding of small proteins is that the combinatorial search for correct side-chain packing in a small protein is much simpler and faster than in a large one. Baldwin notes that this model could be thought of as an extension of the jigsaw puzzle folding model. Here, the initial starting state is not fixed, and energetics coupled with a certain amount of randomness determine the folding pathway.

Equilibrium Experiments- The Molten-Globule State

The molten globule has yielded a great deal of experimental information regarding the structure of intermediates during protein folding. This conformational state, an equilibrium folding intermediate induced under mild denaturing conditions, has the following characteristics: (1) it is less compact than the native state, (2) it is more compact than the unfolded state, (3) it contains extensive secondary structure, and (4) it has loose tertiary contacts without tight side-chain packing. Recently, increasing evidence supports the idea that the molten globule may possess defined tertiary contacts (for a review see ). It has been argued that the molten globule state contains water molecules or is "wet", but an experiment by Kiefhaber et al. found that an unfolding intermediate with molten globule attributes is dry. Strong support for either case has yet to be found. Beyond these similarities, the molten globule conformations are very diverse between proteins and even between different molten globules of the same protein. For this reason, we have decided to discuss each molten globule system individually.

CARBONIC ANHYDRASE The low pH form of carbonic anhydrase shows characteristics of a molten globule . Like others, this molten globule resembles a kinetic folding intermediate . Besides the molten globule, carbonic anhydrase provides evidence for an interesting second equilibrium intermediate . Because this state occurs at higher concentrations of denaturant and is less compact than the molten globule, the authors believed that it represents a pre-molten globule. They also show that this intermediate still contains considerable secondary structure and liken it to the burst intermediate seen in kinetic studies .

α-LACTALBUMIN The protein α-lactalbumin can produce two forms of the molten globule under different conditions: at low pH (acid form) and at neutral pH in the absence of calcium (apo form), both of which have been well characterized. Dissecting the protein to study only the alpha helical domain, Peng and Kim showed that at low pH, this domain contained enough of a tertiary fold that native disulfides could be found when they oxidized a reduced species in the molten globule state. On the basis of this result, along with CD and NMR data, the authors believe that the molten globule is an expanded native state with no specific side-chain interactions. Further investigation by the same group showed that the beta sheet domain is largely unstructured in the low pH molten globule. Such a bipartite structure is interesting since small angle solution x-ray scattering showed a unimodal distribution, which implies that the molten globule is roughly spherical in solution. Using Raman optical activity measurements and studying both α-lactalbumin molten globules, Wilson et al. also found that both molten globules are native-like, but that the apo form is less sensitive to temperature denaturation since it is more ordered.

CYTOCHROME C Cytochrome c requires low pH and addition of salt to form a molten globule. The salt screens repulsive electrostatic interactions caused by the acidic conditions and allows the protein to collapse. This state has been characterized as possessing an increased volume and increased compressibility. Jeng et al. have shown that the N- and C-terminal helices are responsible for most of the molten globule's secondary structure. These two helices form during the early stages of folding and contact each other in the native structure. Two groups have shown that packing interactions between these terminal helices are just as important to the stability of the molten globule as they are to the native state. They mutated residues important to the interaction of the N- and C-terminal helices and found destabilization of both the native and molten globule states. This result implies that the molten globule of cytochrome c uses some native packing contacts for stability. As an overall picture, results from small angle x-ray scattering suggest that the molten globule of cytochrome c best fits a structure containing a compact core with random coils extending from it.

MYOGLOBIN Depending on its environment, myoglobin in its apo form can fold into a number of molten globular states. Like cytochrome c, apomyoglobin collapses from a largely unfolded conformation at pH 2 into a molten globular form upon addition of salt. This form of the molten globule is assumed to be similar to the one at pH 4.2 in the absence of salt and has been characterized by Hughson et al. Their NMR analysis showed that the A, G, and H helices arrange themselves in a native-like conformation. These helices also form during the initial stages of apomyoglobin refolding. In the folded state, these three helices pack against each other with large hydrophobic contact areas, while independently they have very little helical content. At pH 2 with sodium trichloroacetate, apomyoglobin forms another stable molten globule with more helical structure. This form is considered to be further along in the folding pathway. Studying both molten globular forms, Nishii et al. found that both undergo cold and heat denaturation, indicating that hydrophobicity contributes to the molten globules' stability. Using small angle x-ray scattering to measure radius of gyration, they also showed that the molten globules were less compact than the native state. Hughson et al. mutated residues important to the packing between the A, G, and H helices of the pH 4.2 molten globule and found no perturbation of stability from acid denaturation. In fact, over-packing the interface caused an increase in stability. Approaching the problem from a different angle, Kiefhaber and Baldwin created mutations that increased the helical structure of the pH 4.2 molten globule. This mutant required higher concentrations of urea to become denatured from a molten globule state, showing that increasing the secondary structure stabilizes the molten globule.

So far, these studies suggest that myoglobin folds according to the hydrophobic collapse model, but work published this past year supports an alternate view. The same lab that performed the mutational studies on the pH 4.2 molten globule repeated these experiments, but this time they used urea, instead of acid, to denature the protein. They found that the mutations at the A, G, and H helical interfaces destabilized the molten globule as well as the native conformation. From their measurements, the investigators computed that packing interactions in the molten globule are about half as strong as in the native state. Kataoka and co-workers presented solution x-ray data that suggest the pH 2, trichloroacetate-stabilized molten globule consists of a single hydrophobic core surrounded by disordered polypeptide chain. The evidence comes from the calculation of a distance distribution function. The trichloroacetate-stabilized molten globule at pH 2 showed a bimodal distribution, which is indicative of two different domains in this molten globule. Since this apomyoglobin contains only a single folding center, the authors attributed the second mode in the distribution to the unfolded portions of the chain. Native holomyoglobin and apomyoglobin, as well as other molten globules (cytochrome c and α-lactalbumin), possess unimodal distance distribution functions characteristic of a globular protein with a generally spherical shape in solution.
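To see why a distance distribution function can distinguish a single compact globule from a two-part structure, one can histogram all pairwise distances in a toy model. The sketch below is purely illustrative and is not the analysis used in the scattering studies: the point sets are hypothetical lattice "blobs", the function names are ours, and the two-blob case stands in only loosely for a compact core accompanied by a second population of distances.

```python
import itertools
import math


def distance_histogram(points, bin_width):
    """Histogram of all pairwise distances, a crude stand-in for P(r)."""
    counts = {}
    for p, q in itertools.combinations(points, 2):
        b = int(math.dist(p, q) // bin_width)
        counts[b] = counts.get(b, 0) + 1
    return [counts.get(b, 0) for b in range(max(counts) + 1)]


def ball(center, radius):
    """Unit-spaced lattice points lying within a sphere (a toy globule)."""
    cx, cy, cz = center
    n = int(radius)
    return [
        (cx + i, cy + j, cz + k)
        for i in range(-n, n + 1)
        for j in range(-n, n + 1)
        for k in range(-n, n + 1)
        if i * i + j * j + k * k <= radius * radius
    ]


# One compact blob versus two well-separated blobs (hypothetical geometry).
one_domain = ball((0.0, 0.0, 0.0), 3.0)
two_domains = one_domain + ball((20.0, 0.0, 0.0), 3.0)

hist_one = distance_histogram(one_domain, 2.0)
hist_two = distance_histogram(two_domains, 2.0)
```

In the single-blob case every bin up to the blob diameter is populated, giving one continuous peak. In the two-blob case a second group of longer distances appears, separated from the first by empty bins, which is the toy analogue of a bimodal distribution.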

Kinetic Experiments

While the previous studies looked at stable, equilibrium intermediates, the following experiments analyzed transient, kinetic intermediates found during refolding or unfolding of the protein. Using methods such as circular dichroism or NMR coupled to stop-flow techniques to monitor the folded state of the protein, these experiments usually find a quick burst phase of folding where they cannot detect intermediates . After this initial burst, there is a slow phase while the molecule searches for its native state.

As described in the section above, early kinetic intermediates of both cytochrome c and apomyoglobin have characteristics similar to those of their related molten globules . Investigators have found the same in other systems. For ribonuclease A, Yamaguchi et al. found a negative change in volume as the protein went from a folded to an unfolded state by measuring the Gibbs free energy difference during pressure denaturation. Refolding of the solvent denatured protein produces two identifiable intermediates. The near-native intermediate requires a conformational change due to a proline isomerization to reach a completely folded conformation . The other intermediate occurs early in refolding and resembles a molten globule state . Studies on the volume change upon refolding and unfolding of ribonuclease A indicated that an intermediate possesses an increased volume akin to a molten globule, while NMR analysis provided evidence that an intermediate has features of a dry molten globule . Further investigation of the early intermediate corroborates results from equilibrium folding studies. Because the authors discovered that the early intermediate is able to bind inhibitor, possesses hydrogen protection factors similar to those of the near-native intermediate, and has a developed β-sheet, they believed that this intermediate also contains significant tertiary structure.

Using staphylococcal nuclease, Vidugiris et al. found that pressure denaturation formed a transition state with a positive activation volume (basically an increase in volume of the protein/water system). The authors liken this swollen intermediate to a molten globule state. In another study looking at apomyoglobin unfolding, Barrick and Baldwin describe an intermediate state with developed helices, no strong tertiary structure, and a Gibbs free energy closer to that of the unfolded state than that of the native state. From these results, they conclude that side-chain packing is responsible for most of the stability of the native state. This apomyoglobin intermediate can be thought of as the initial burst state seen in much of the kinetic work , where the protein is compact and yet contains secondary structure. As discussed earlier, Uversky and Ptitsyn liken the burst intermediate to a pre-molten globule state . Eliezer et al. provided a more general view of the solution structure of apomyoglobin's folding intermediate. Their small angle x-ray scattering showed that the initial folding intermediates at 20 and 100 ms are as compact as the molten globule and almost as compact as the refolded native state. In a quite recent analysis of dihydrofolate reductase refolding, Hoeltzli and Frieden monitored the resolved resonances of 6-¹⁹F-tryptophan and found direct evidence that the search for the correct residue packing causes the slow, rate-limiting step of refolding.

Conclusions From Experiment

Except for fast-folding small proteins, most proteins exhibit intermediates, which rules out the nucleation model. In general, secondary structure formation and hydrophobic collapse occur concomitantly and, in certain instances, depend on each other. Experiments certainly show that collapse happens in the earlier stage of the folding pathway, while the recent work by Hoeltzli and Frieden provides direct evidence that the slow step in folding is the search for close-packing. This point of view is supported by simulations of folding (for a review see ).

Thus, in general and barring kinetically trapped intermediates, the data support the following folding progression. From an open chain, the protein collapses into a non-compact pre-molten globule or burst intermediate. Some secondary structures and tertiary contacts do exist at this stage, but the topology is still not well defined. Development of the general chain topology occurs in the molten globule. Residues most likely make more specific tertiary contacts than previously thought, but overall, the protein's side-chains are loosely packed. As supported by the analysis of x-ray determined protein structures as well as solution measurements , the protein attains its native conformation once the core is tightly packed. This is the rate-limiting step in folding. Examination of hinge motions and results from mutational studies support the idea that packing can drive this last step in folding.

MODELING THE PACKING OF SIDE-CHAINS

Early Work- Defining The Problem

The difficulty of ab initio protein structure prediction originates from the enormous number of three-dimensional conformations that a chain of amino-acids can adopt. A 100 residue protein has approximately 400 degrees of freedom: each residue has two main-chain single-bond torsion angles, ψ and φ, and on average two side-chain single-bond torsion angles, χ1 and χ2 (small side-chains have 1 χ angle, large ones 4). Crudely assuming that a torsion angle accuracy of 10° is sufficient, each residue has 36 × 36 = 1296 independent (φ,ψ) main-chain conformations, giving a main-chain combinatorial complexity of 1296^100 ≈ 10^311. Making the same assumption for the two side-chain torsion angles also gives a complexity of 10^311. Both conformational spaces are thus the same size. However, main-chain torsion angle errors propagate throughout the protein and are sequentially amplified, whereas side-chain angle errors only affect the local conformation and propagate less directly.

In 1987, Ponder & Richards pointed out that using the criterion of 'good packing' against the rigidly fixed, native main-chain rules out the majority of side-chain rotamer conformations for residues in core regions. Side-chain rotamers, which are a tabulation of frequently observed conformations, have been proposed for many years , but Ponder & Richards reduced these to a set of 67 different conformations that could account for most side-chains observed in real proteins (assuming an angle tolerance of ±20°). While enumeration of these was computationally feasible over a few neighboring residues, the task of enumerating all possibilities for each residue in a 200 residue protein was, and still is, computationally intractable. (Specifically, there are on average 3.35 rotamers per amino acid (67/20), and this gives 3.35^100 ≈ 10^53 combinations for a chain of only 100 residues.)
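The size of these search spaces is easy to check numerically. The following is a minimal sketch, assuming a 100-residue chain and working in log10 to avoid overflow; the function name `log10_conformations` is ours and purely illustrative.

```python
import math

def log10_conformations(states_per_residue: float, n_residues: int) -> float:
    """Return log10(states_per_residue ** n_residues) without forming the huge number."""
    return n_residues * math.log10(states_per_residue)

# Main-chain: a 36 x 36 grid of (phi, psi) values at 10-degree resolution.
main_chain_log10 = log10_conformations(36 * 36, 100)

# Side-chains: 67 rotamers spread over 20 amino-acid types (average 3.35 each).
rotamer_log10 = log10_conformations(67 / 20, 100)

print(f"main-chain torsion grid: about 10^{main_chain_log10:.1f} conformations")
print(f"rotamer library:         about 10^{rotamer_log10:.1f} combinations")
```

Even the rotamer library, which shrinks the exponent roughly six-fold, leaves far too many combinations to enumerate directly.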

One of the first attempts to actually predict the side-chain conformation given the correct conformation for the main-chain involved manual modeling . Working with the known x-ray conformation of the main-chain of flavodoxin, this test study yielded a final side-chain prediction error of 2.41 Å RMS. Nevertheless, it was realized that many large aromatic side-chains deep within the core of the protein were very badly predicted. This in turn led to an error propagation cascade throughout, causing satisfactory prediction for only 30 to 40% of the side-chain conformations.

Several investigators have performed local energy minimization of a very few residues in the field of otherwise fixed protein atoms . By restricting interest to situations where only a limited number of side-chains were replaced (e.g. by assuming that conserved residues remain in similar conformations when two sequences have very high sequence similarity), these methods effectively focused their efforts on neighboring residues. Their success suggested that, if the problem could be separated into small sets of residues that interact little with each other, the daunting combinatorics of the side-chain packing problem could be surmounted.

A Possible Solution?

In 1991, four groups working independently of each other each discovered a method that naturally broke the combinatorial problem into manageable pieces . When a protein is stripped of all its side-chains and the native main-chain is used as a rigid constraint to re-pack all the side-chain atoms, these varied methods could achieve an accuracy of 1.8 Å RMS error over all side-chain atoms.

These four methods all rely on the van der Waals energy to eliminate bad side-chain arrangements. They differ very much in how they generate possible side-chain conformations and how they choose between them. The method of Lee & Subbiah utilizes no database information whatsoever, making it the most physically-based method of the four. Side-chains are allowed to explore torsion angles in 10° intervals and simulated annealing is used to optimize the arrangement of neighboring side-chains by minimizing the van der Waals energy. Two of the methods use a set of rotamers taken from known protein conformations and optimize an energy function (which can include hydrogen-bonding and electrostatics ) using Monte Carlo minimization or a genetic algorithm . The fourth method relies more heavily on known protein structures and the surprising finding of Jones & Thirup that almost all segments of main-chain conformation recur in proteins. It uses van der Waals packing energy to select plausible segments of known protein structure, borrowing the side-chain conformation. Rather than optimize the side-chain conformations, it introduces some variability in selecting chain segments, averages atomic coordinates to enhance the signal from the common conformations and then regularizes the stereochemistry with energy refinement.

Since all these methods primarily rely only on extremely simple van der Waals packing in their energy functions, a better assay of accuracy is the predicted error in the well-buried side-chains. Considering only the half of all residues that are less solvent-exposed (< 30% surface area accessible to the solvent ) significantly improves the prediction accuracy. The only ab initio method, using simulated annealing to minimize the van der Waals energies in a finely discretized torsional space (10° for the χ angles), was accurate to 1.25 Å RMS . The genetic algorithm approach that combinatorially mates rotamers selected from a 109-member rotamer database was accurate to 1.54 Å RMS. The Monte Carlo energy minimization over a similar rotamer database was accurate to 1.6 Å RMS . The segment matching method was accurate to 1.37 Å RMS, in spite of its use of only the native Cα positions rather than the entire main-chain. This success by four different methods that all rely on packing to eliminate bad choices proved the foresight of Ponder & Richards to be indeed correct.

Recent Refinements- A Classification

Over the past four years, a flood of new methods, as well as improved versions of the early ones, has been reported. The best of these, like those of Lee and Vásquez , consistently break the 1 Å RMS barrier over a large set of proteins, while a few others hover between 1 and 1.1 Å RMS error. Of the remaining recent methods, all report average errors of less than 1.45 Å RMS over a test set of 10 to 60 proteins .

The four methods discovered independently between 1991 and 1992 employ surprisingly different approaches. Classifying these and the newer methods helps highlight what is necessary for successful prediction. Methods that predict side-chain conformation from a known backbone conformation involve two steps: (1) choosing a set of possible conformations for each side-chain, and (2) choosing the conformations of each side-chain to optimize packing for a given fixed main-chain.

POSSIBLE CONFORMATIONS The set of possible conformations is either knowledge-based (taken from known three-dimensional structures of proteins) or defined by simple geometrical considerations. Most methods are knowledge-based , following the use of rotamer libraries by Ponder and Richards . Variants in both the size and content of these libraries have been attempted . The latter include some studies that use a rotamer set customized to match the local main-chain of the particular side-chain. Others take one or a small number of fragments from known protein structures, using a local fit to the main-chain to choose fragments. A few investigators disregard these database approaches, and instead vary the side-chain single bond torsion angles in 10° increments.

OPTIMIZING PACKING Most methods use some type of search strategy to find the combination of side-chain conformations that optimizes packing. Good packing is generally assumed to correspond to a favorable value of the van der Waals energy, with its strong steric repulsion and weak long range attraction, but more complicated energy terms are sometimes included . Of greater importance than the energy is the search method used to find the best combination of side-chain conformations. Simulated annealing is surprisingly effective at finding the optimal packing corresponding to side-chain arrangements found in native proteins, as is the related Monte Carlo minimization method . Genetic algorithms have also been used . More elaborate search methods have also been used, such as 'Dead End Elimination' or the A* algorithm , and these have been combined with other heuristics . More physically based methods search with molecular dynamics simulations , self-consistent mean fields and Gibbs sampling utilizing heat baths . One method simply pastes together segments found in known proteins, subject to their packing well into the growing structure.
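The two-step scheme described above (a discrete set of candidate conformations per side-chain, then a search for the best-packed combination) can be illustrated with a toy simulated-annealing search. This is only a sketch under loud assumptions: the self and pair energies are random numbers standing in for a van der Waals term, the cooling schedule and all function names are ours, and none of it reproduces any published method. The toy problem is small enough to be checked against exhaustive enumeration.

```python
import itertools
import math
import random

def total_energy(choice, self_e, pair_e):
    """Self energy of each chosen rotamer plus pairwise energy of every residue pair."""
    n = len(choice)
    energy = sum(self_e[i][choice[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy += pair_e[(i, j)][choice[i]][choice[j]]
    return energy

def make_toy_problem(n_res=6, n_rot=3, seed=1):
    """Hypothetical energies: random placeholders, not a real force field."""
    rng = random.Random(seed)
    self_e = [[rng.uniform(0.0, 1.0) for _ in range(n_rot)] for _ in range(n_res)]
    pair_e = {(i, j): [[rng.uniform(-1.0, 3.0) for _ in range(n_rot)]
                       for _ in range(n_rot)]
              for i in range(n_res) for j in range(i + 1, n_res)}
    return self_e, pair_e

def anneal(self_e, pair_e, steps=20000, t_start=5.0, t_end=0.05, seed=0):
    """Metropolis search over rotamer assignments with geometric cooling."""
    rng = random.Random(seed)
    n = len(self_e)
    choice = [rng.randrange(len(self_e[i])) for i in range(n)]
    energy = total_energy(choice, self_e, pair_e)
    best, best_e = list(choice), energy
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / (steps - 1))
        i = rng.randrange(n)                  # pick one residue
        old = choice[i]
        choice[i] = rng.randrange(len(self_e[i]))  # propose a new rotamer
        new_e = total_energy(choice, self_e, pair_e)
        if new_e <= energy or rng.random() < math.exp((energy - new_e) / temp):
            energy = new_e                    # accept the move
            if energy < best_e:
                best, best_e = list(choice), energy
        else:
            choice[i] = old                   # reject and restore
    return best, best_e

self_e, pair_e = make_toy_problem()
best, best_e = anneal(self_e, pair_e)
# Exhaustive check, feasible only because 3^6 = 729 combinations.
exact = min(total_energy(c, self_e, pair_e)
            for c in itertools.product(range(3), repeat=6))
print(f"annealed energy {best_e:.3f}, exhaustive minimum {exact:.3f}")
```

For a real protein the exhaustive check is of course unavailable, which is why the quality of the search strategy matters so much.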

AB INITIO METHODS Only a handful of methods that do not rely on protein-derived knowledge have worked well. One that relies on molecular dynamics 'annealing' of successively added atoms beyond the Cβ atoms enjoyed some success, but has since been reported to be inferior to rotamer-based methods . A related method of annealing 'sprouted' side-chain atoms, again using molecular dynamics, has only been reported to work on small peptides . The most successful ab initio methods , mentioned above, rely on simple van der Waals energy in conjunction with complete sampling of torsion angle space.

Assessing the Accuracy

RANDOM OR WORST RMS The success of these methods must be put into context by considering the RMS expected if all side-chain conformations were (1) randomly predicted or (2) predicted as badly as possible. The random RMS was estimated to be 3.1 Å and the worst RMS to be 4 Å for a 100-member rotamer library . Later work gave similar random RMS values of between 3.3 Å and 3.5 Å, depending on the size of the rotamer library . Many studies have answered the opposite question of how well the best rotamer-based prediction can represent the native structure: RMS values range from about 0.5 Å for the large 624-member rotamer libraries to 1.0 Å for the original 67-member rotamer library .

EXPERIMENTAL ACCURACY The answer to the question 'What error value corresponds to an excellent prediction?' can be found in a rotamer-independent manner. It has long been known that when x-ray structures of the same protein are determined by two different laboratories or in two different crystal forms, the main-chain atoms differ by about 0.5 Å RMS . The side-chains can differ by as much as 1.5 Å RMS , but for the more buried side-chains not involved in crystal contacts, the difference can be up to 1 Å . Judged against a side-chain RMS of 3.1 Å being random and an RMS of 1 Å being the best possible, the fact that automatic methods routinely achieve values as low as 1.25 Å RMS suggests that the side-chain packing problem may be solved.
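For concreteness, the RMS error quoted throughout this section can be computed as follows. This is a minimal sketch that assumes the predicted and native side-chain atoms are listed in the same order and in the same coordinate frame (reasonable when the main-chain is held fixed, so no superposition is needed); the coordinates are invented for illustration.

```python
import math

def rms_deviation(predicted, native):
    """Root-mean-square distance between matched atoms, in the units of the input."""
    if len(predicted) != len(native) or not predicted:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((p - q) ** 2
                  for atom_p, atom_n in zip(predicted, native)
                  for p, q in zip(atom_p, atom_n))
    return math.sqrt(squared / len(predicted))

# Hypothetical coordinates in Angstroms: each predicted atom is displaced by 1 A.
native_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
predicted_atoms = [(0.0, 0.0, 1.0), (1.5, 1.0, 0.0)]
example_rms = rms_deviation(predicted_atoms, native_atoms)
print(f"side-chain RMS error: {example_rms:.2f} A")
```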

TORSION ANGLES Another measure of fit is the percentage of side-chains for which the torsion angles are correctly predicted. For the buried residues, the better side-chain packing algorithms usually predict correctly (within 40°) 90% of the χ1 angles and 80% of (χ1,χ2) angle pairs . When all residues are considered, these figures drop to 80% and 70%, respectively. The percentage correct obviously depends on the match criteria: with stricter criteria (within 20° or 30°), these values are reduced by about 10% . These predicted values must be compared with the best that can be achieved by rotamer libraries. Allowing a deviation of less than 40° from the angle derived from x-ray information, even the smaller rotamer libraries can often correctly capture the native side-chain conformations for some 95% of the χ1 angles and 90% of the (χ1,χ2) pairs . With the stricter criterion of being within 20° of the angle from x-ray structures, these values drop to 85% and 75%, respectively . It is encouraging that for the buried side-chains, the success rate of prediction is only 10% less than the best possible with rotamer libraries.
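The torsion-angle criterion is simple to state in code, provided the periodicity of angles is handled correctly (175° and -170° differ by 15°, not 345°). The sketch below assumes each residue is given as a tuple of χ angles in degrees and counts a residue as correct only if every angle considered lies within the tolerance; the angle values are invented.

```python
def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def fraction_correct(predicted, native, n_angles=1, tol=40.0):
    """Fraction of residues whose first n_angles chi angles all lie within tol."""
    hits = 0
    for pred, nat in zip(predicted, native):
        if all(angle_diff(pred[k], nat[k]) < tol for k in range(n_angles)):
            hits += 1
    return hits / len(native)

# Hypothetical (chi1, chi2) values for three residues.
native_chi = [(-60.0, 180.0), (60.0, -90.0), (175.0, 60.0)]
predicted_chi = [(-65.0, 170.0), (-60.0, -90.0), (-170.0, 110.0)]

chi1_fraction = fraction_correct(predicted_chi, native_chi, n_angles=1)
chi12_fraction = fraction_correct(predicted_chi, native_chi, n_angles=2)
print(f"chi1 correct: {chi1_fraction:.2f}, (chi1,chi2) correct: {chi12_fraction:.2f}")
```

Tightening `tol` from 40° to 20° in such a calculation is what produces the roughly 10% drop in reported success rates.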

PREDICTIVE SUCCESS In terms of claimed accuracy, the ab initio method of Lee and the rotamer-based method of Vásquez are marginally superior to all others. Lee has published predictions prior to experimental x-ray determination that have proved to be accurate. He has reported RMS errors of 0.68 to 0.89 Å in side-chain prediction for T4 lysozyme mutants , 1.11 Å on λ-repressor mutants and 0.97 Å RMS on polymorphic HLA alleles . While some caution should be expressed, since these predictions are only for a few buried residues, the results do suggest that the best side-chain packing methods can be useful.

Why Is This An Easy Problem?

Since it appears that the packing of side-chains can be well-predicted, some investigators have suggested the problem is not really combinatorial in that the allowed side-chain conformations depend on the local main-chain environment . However, methods choosing the side-chain conformation based only on the local main-chain are about 20% less accurate than methods that allow full combinatorial packing . This remaining 20% in accuracy can only be obtained by considering combinatorial packing .

Main-Chain Movement

It is becoming increasingly clear that the assumption of fixed main-chain during combinatorial re-packing is not generally valid. Very recently, there have been attempts to relax this assumption. A clever method, which allows main-chain and side-chain flexibility, has been applied to the special case of repeating coiled-coil structures; it is able to predict the buried side-chains almost as accurately as when the perfect main-chain is available . Koehl and Delarue have applied the mean field approach, so successful for side-chains , to the main-chain with promising results. Wilson et al. have proposed the use of rounds of alternating side-chain packing onto a fixed main-chain and full molecular dynamics minimization. The multiple copy and mean-field approaches also appear to be particularly well-suited to allowing main-chain shifts .

GETTING TO THE END-GAME

How Close Is Close Enough?

In order to get to the end-game, one needs a backbone that is very close to native. How close is close enough? The question of whether packing optimization schemes can model side-chains accurately upon fixed, imperfect backbones has been under intense scrutiny. Many studies have considered the re-packing of correctly aligned target sequences onto fixed homologous template structures, employing the same algorithms used when the ideal backbone was provided . Recently in a more systematic study, which spanned the full range of possible sequence identities within certain protein families, Chung & Subbiah observed a monotonic decrease in buried side-chain prediction accuracy as the sequence identity diverges and backbone deviation increases. They estimate that when the template is more than 2Å RMS error from the native backbone (corresponding to ~25% sequence identity), the side-chain prediction accuracy approaches the random expectation of 3.1-3.3Å RMS .

In the absence of general methods that accommodate movable backbones in side-chain prediction, it appears that backbones within 2Å RMS of the native structure are required for accurate modeling. Backbones as accurate as these are sometimes available if the structure of a close sequence homologue is known and the two sequences have been correctly aligned. In the general case, how is it possible to obtain folded backbones that are sufficiently accurate?

Threading Methods

In one approach, known as threading or fold recognition, a new sequence is aligned upon a known three-dimensional structure, and each sequence-structure alignment is scored via an energy function. Threading has identified compatible folds that are undetectable by conventional sequence alignment methods . However, success in recognizing a related fold does not imply success in building an accurate model using the related fold as a template. The alignment of the new sequence on the known backbone has to be almost perfectly correct to get the required 2Å accuracy (adjacent residues are about 4 Å apart). Results from the threading predictions at a recent meeting ("Meeting on Critical Assessment of Techniques for Protein Structure Prediction", Asilomar, California 1994) illustrated the various shortcomings of available alignment and/or scoring methods . Moreover, even given perfect alignments, backbones generated by threading methods may not be useful if the aligned sequences show less than 30% identity . Threading specializes in finding such folds, so it is unlikely to provide acceptable backbones for standard side-chain prediction methods, even if the alignment were optimal. In any case, at the present time many proteins of interest are new folds for which there is no threading target. Hence, we do not regard threading in its current form as a viable pathway to the end-game of folding.

Ab Initio Folding

In ab initio methods, a fold for the new sequence is generated without directly using the known fold of any other protein. This is accomplished either by (1) a broad and even sampling of conformational space by an energy-independent method, followed by screening of the resulting candidate folds by an energy function or (2) minimizing the conformational energy of a polypeptide as it folds through an approximately continuous conformational space. In either case, the level of detail included in the structural representation must balance the computational tractability and geometric accuracy of the model. Lattice models can be computationally feasible, even to the point of enumerating folds exhaustively , but such folds sacrifice secondary structure features and are generally less accurate than 5 Å RMS. Off-lattice discrete models, such as those possessing 6 states per residue, can reproduce the native backbone to 2 Å RMS , but generating such folds exhaustively is beyond the power of today's computers (a chain of length 100 has 6^100 ≈ 10^78 folds). For minimization methods, lattice and discrete representations make the search for the energy minimum difficult because they make the energy landscape more rugged and increase the number of moves necessary to traverse the conformational space. In spite of these limitations, ab initio folding approaches have made progress, routinely achieving structures with accuracy up to ~4 Å RMS error . Other methods, especially those that utilize experimental secondary structure constraints, fare even better. We will now review approaches to ab initio folding that have produced folds within 2-4 Å RMS of the native structure.

Discrete-State Models And Energy Functions

A simple discrete-state model has been described by Park & Levitt . Their optimized four-state off-lattice representation is able to build backbones within 2 Å RMS from the native backbone. Even with this model, exhaustive enumeration is impossible as 4^100 ≈ 10^60 is intractable. If one enforces the native secondary structure as an external constraint, this model has no more than about 200,000 folds for each protein. This is a manageable number of folds that evenly and broadly samples phase space, while providing candidates that take folding into the "end-game".

This in itself is not useful unless an energy function can successfully distinguish the native-like folds from the entire set of folds generated. This issue raises two questions: (1) can energy functions distinguish between the near-native folds and those which are grossly misfolded and (2) can the energy functions distinguish between the native structure and the near-native folds? The first question depends as much on the quality of the representation as on the effectiveness of the energy function; i.e. energy functions are useful only in the context of a representation capable of generating suitably near-native structures. The second question asks whether or not the energy function can tell a true native fold from the best near-native decoys; it thus assesses the resolution of the function and indicates whether further minimization of the function can in principle drive the conformation towards a more native-like state.

Park & Levitt , using a basis set of six energy functions, report that the native fold can be recognized very effectively if one combines energy functions that stress complementary factors, such as non-specific hydrophobicity (a general compacting force) and residue-specific pairings. Furthermore, the best of the native-like folds usually rank very highly in the energy-sorted list. On average, the best combination of energy functions places the native-like folds in the top 1 percent of the score-sorted list, although there are always many grossly misfolded decoys with energies more favorable than some of the near-native folds. Therefore, if one were to apply an effective energy function as a screen of the entire decoy set (for example, by taking the top half of the energy-sorted list), the concentration of the near-native folds in the high-scoring subset would increase, but the highest-scoring folds in the subset would not all be near-native. In other words, RMS deviation and energies are not highly correlated in the RMS range explored in this study . More encouraging is that the best energy functions typically score the native fold more favorably than all the decoys, including those within 2 Å RMS.

Energy Minimization And Search Strategies

Methods that utilize energy minimization to move through phase space have shown promise in folding to near-native conformations. Recent work by Mumenthaler & Braun describes a self-correcting distance geometry method for predicting the tertiary arrangement of small, globular, helical proteins. This method, like the one by Park & Levitt , assumes that the helical segments are known in advance; only the (φ,ψ) dihedral angles of loop residues are adjustable (though constrained to combinations that are commonly observed in the database for each residue type). First, the method predicts whether each residue is solvent-exposed ("outside") or buried ("inside"), using an algorithm that exploits multiple sequence alignment information . Upper limits for the distances between the three types of residue pairings (inside-inside, outside-outside, and inside-outside) are calculated as a function of the size of the protein. The minimization engine then applies these distance constraints in a clever algorithm that dynamically adjusts constraints over each iteration of the structure generation cycle. Thus, rather than having an energy function per se, the method relies on a "target function" that depends on the predicted constraints. The structures with the fewest constraint violations tend to cluster within 3 Å RMS of the experimentally-determined structure, though only the helical residues were included in the RMS calculation. The final predicted structure, taken as the average structure in the low-violations cluster, can be accurate to 2.3 Å RMS of the native structure. Because the constraints are adjusted to the structures during the procedure, there is no path-independent energy function available for further minimization. Overall, six out of eight test proteins converged to near-native predictions (≤ 3 Å RMS error), but none were within 2 Å. Nevertheless, this method can be a useful tool for taking folding into the end-game, assuming that secondary structure prediction methods continue to improve.

A similar minimization approach was developed by Sun et al. Like the two procedures discussed above, this method also begins with the known secondary structure elements in order to reduce the conformational space to be searched. Their conformational search engine is two-tiered and is powered by a genetic algorithm which operates on a string of paired (φ,ψ) dihedral angles describing the conformation of the protein. First, mutation and crossover operations are performed at randomly chosen rotatable residues (i.e. those not in secondary structure). Mutations are random selections from a set of dihedral angle pairs derived from the structure database. The second step refines the search by perturbing randomly-chosen unconstrained torsion angles slightly in order to probe the local energy landscape for minima. The selection method is an energy function that models the hydrophobic interaction and is an extension of the simple hydrophobic-polar models of Dill and co-workers . The results were encouraging. Out of ten test cases, four of the lowest-energy models were within 4 Å RMS error. In two cases, the best structures that their reduced representation would allow were within 2 Å RMS error; however, none of the minimized structures actually achieved this 2 Å accuracy. Moreover, many of the native structures have energies much worse than the minimized structures, thus limiting the utility of their highly simple energy function in the end-game.
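The first tier of such a genetic search can be sketched in a few lines. The code below is a generic illustration rather than the implementation of Sun et al.: the conformation is a list of (φ,ψ) pairs, the three-entry dihedral library and the helper names are hypothetical, and both parents are assumed to carry identical angles at the fixed secondary-structure positions so that crossover leaves those positions untouched.

```python
import random

# Hypothetical library of (phi, psi) pairs in degrees: roughly helical,
# extended, and left-handed regions. A real library would come from a database.
DIHEDRAL_LIBRARY = [(-60.0, -45.0), (-120.0, 130.0), (60.0, 45.0)]

def mutate(conformation, rotatable, rng):
    """Replace the dihedral pair at one randomly chosen rotatable residue."""
    child = list(conformation)
    child[rng.choice(rotatable)] = rng.choice(DIHEDRAL_LIBRARY)
    return child

def crossover(parent_a, parent_b, rotatable, rng):
    """One-point crossover, with the cut placed at a rotatable residue."""
    cut = rng.choice(rotatable)
    return parent_a[:cut] + parent_b[cut:]

rng = random.Random(0)
helix = (-60.0, -45.0)
rotatable_positions = [3, 4]          # loop residues; the rest are fixed helix
parent_a = [helix, helix, helix, (-120.0, 130.0), (60.0, 45.0), helix, helix]
parent_b = [helix, helix, helix, (60.0, 45.0), (-120.0, 130.0), helix, helix]

mutant = mutate(parent_a, rotatable_positions, rng)
child = crossover(parent_a, parent_b, rotatable_positions, rng)
print(mutant)
print(child)
```

A full search would wrap these operators in a selection loop driven by the energy function and follow them with the small local perturbations of the second tier.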

Let us summarize the strengths and shortcomings of the ab initio methods discussed above. The results of Park & Levitt suggest that an effective energy function (of which there are several) yoked with the proper search strategy can drive near-native folds towards the native fold. However, the same function cannot reliably recognize a near-native fold, even the best near-native folds, from the entire set of decoys. For near-native structure generation, the minimization method of either Mumenthaler & Braun or Sun et al. might be a better alternative. However, these methods are not fail-safe, for they do not always converge near the native structure.

In the ab initio methods discussed above, folds were generated either exhaustively or from random tertiary arrangements . As close as these methods can get to the native fold, their accuracy is hampered by the reduced complexity of the model, the energy functions that drive the folding of the chain, or both. In the next section, we address these concerns. Energy functions are challenged to recognize native folds from all-atom representations very close in conformation to the native fold.

Discriminating Native From Near-Native Conformations

A key requirement of an energy function able to drive the search towards the end-game is that the native conformation have a lower energy than the near-native conformations. Such sets of near-native conformations can be generated by deforming the experimentally determined structures using methods such as Monte Carlo (MC) and molecular dynamics (MD) simulations. Energy functions are then applied to these test sets in order to assay their discrimination power.

The method developed by Wang et al. was the first attempt at recognizing the native fold from large decoy sets of near-native and compact structures. It is based on the atomic solvation potential of Eisenberg & McLachlan , grouping atoms into 17 chemically related "molecular fragment types", each with its associated solvation parameter. These parameters were obtained by a training algorithm that maximizes the solvation energy difference between the native and a large set of compact, non-native structures generated by MC and MD simulations (; and references therein). The solvation parameters were then used to evaluate the native structures of a separate test set against decoy structures generated by MC and MD. The MC-generated structures were selected to be compact (the radius of gyration did not exceed that of the native structure plus 5%) and within a pre-determined RMS deviation from the native structure (up to 5 Å maximum). The MD simulations were carried out at room temperature (300 K) and high temperature (500 K); the average RMS errors for the 300 K and 500 K simulations were 4.1 Å and 8.0 Å, respectively. Over 8,200 non-native MC and MD decoys were furnished for each of eleven test proteins, of which only seven on average were mis-recognized as native (having a more favorable energy score than the experimentally determined structure). The solvation energy roughly correlated with the RMS deviation between the native and decoy structures. All of the mis-recognized decoys, or "false positives", were structures very close to the native (< 1 Å RMS). Wang et al. also demonstrated that their method compared favorably against a battery of standard energy functions: molecular dynamics force fields, statistically-derived contact potentials, three-dimensional profile methods, knowledge-based potentials of mean force, and others .
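The compactness filter applied to the MC decoys can be written down directly. This is a minimal sketch assuming an unweighted radius of gyration (all atoms given equal mass, which is our simplification) and the 5% tolerance quoted above; the cube-shaped test coordinates are invented.

```python
import math

def radius_of_gyration(coords):
    """Unweighted RMS distance of the points from their centroid."""
    n = len(coords)
    centroid = [sum(c[k] for c in coords) / n for k in range(3)]
    squared = sum((c[k] - centroid[k]) ** 2 for c in coords for k in range(3))
    return math.sqrt(squared / n)

def is_compact(decoy, native, tolerance=0.05):
    """True if the decoy's radius of gyration is within (1 + tolerance) of the native's."""
    return radius_of_gyration(decoy) <= (1.0 + tolerance) * radius_of_gyration(native)

def scaled(coords, factor):
    """Uniformly expand a structure about the origin (used here to fake decoys)."""
    return [tuple(factor * x for x in c) for c in coords]

# Hypothetical 'native': the eight corners of a cube centred on the origin.
native_coords = [(x, y, z) for x in (-1.0, 1.0) for y in (-1.0, 1.0) for z in (-1.0, 1.0)]
slightly_swollen = scaled(native_coords, 1.04)   # 4% expansion: passes the filter
too_swollen = scaled(native_coords, 1.10)        # 10% expansion: rejected

print(is_compact(slightly_swollen, native_coords), is_compact(too_swollen, native_coords))
```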

In a related study, Huang et al. explored the ability of a very simple hydrophobic contact function to discriminate native structures from near-native decoys generated by MD simulation in solution at room (298 K) and high (498 K) temperatures. Five small proteins formed the test set. Overall, the average RMS deviations from the native structure were 1.5 Å (at 298 K) and 4.1 Å (at 498 K). As in the earlier studies, native structures were readily identified from the sets of decoy structures: there were only 330 false positives out of 10,000 decoys (combined room- and high-temperature runs for the 5 proteins). Likewise, the discrimination depends strongly on the extent to which the structures are deformed: only one false positive exhibited an RMS deviation > 2 Å from the native structure.
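
To make the idea of a "very simple hydrophobic contact function" concrete, the sketch below scores a structure by the negative number of hydrophobic residue pairs that lie within a distance cutoff. It is not the published function of Huang et al.: the set of residues treated as hydrophobic, the 7 Å cutoff, the one-point-per-residue representation and the minimum sequence separation are all assumptions chosen for illustration.

```python
import math

# Assumed set of hydrophobic residues (one-letter codes); illustrative only.
HYDROPHOBIC = set("AVLIMFWC")


def hydrophobic_contact_score(sequence, coords, cutoff=7.0, min_separation=2):
    """Return minus the number of hydrophobic pairs closer than `cutoff` (in Å).

    `coords` holds one (x, y, z) point per residue. Pairs closer than
    `min_separation` along the chain are skipped because they are always
    near each other in space regardless of the fold.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    idx = [i for i, aa in enumerate(sequence) if aa in HYDROPHOBIC]
    contacts = 0
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            i, j = idx[a], idx[b]
            if j - i < min_separation:
                continue
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts += 1
    return -contacts  # more contacts give a lower (more favorable) score


# Toy example: the same four-residue chain in a compact and an extended form.
seq = "LKVA"
compact = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
extended = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]

print(hydrophobic_contact_score(seq, compact))   # -2
print(hydrophobic_contact_score(seq, extended))  # 0
```

Because such a score rewards any arrangement that brings hydrophobic residues together, it behaves as a non-specific compacting term and responds to small expansions of the structure.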

What is the impact of these two studies on how the end-game is played? Both appear to be successful at identifying native folds from compact, near-native folds, a quality that other functions apparently lack. Huang et al. note that the decoy set used in their study is perhaps a more rigorous test, given the lower RMS deviations produced by MD simulations. Indeed, demonstrating that simple energy functions can discriminate native from near-native structures in this RMS range (0-2 Å) is important. Given that ab initio methods can provide folds that are quite close to the native (around 2 Å), it is important to use methods such as MC and MD simulations to probe the relationship between energy and molecular conformation within 2 Å RMS of the native. However, even more challenging near-native test sets are needed to assess the true discrimination power of existing potentials. High-temperature MD simulations and the MC simulations of Wang et al. compromise the integrity of the secondary structure and loosen the packing of the tertiary structure. Even the 298 K MD simulations in solvent by Huang et al., which depart from the native by an average of only 1.5 Å RMS, undergo a 2-3% increase in the radius of gyration. A function that stresses hydrophobicity (i.e., a non-specific compacting force), such as the one by Huang et al., is sensitive to minute changes of this type. Corroborating evidence is seen in recent work by Levitt and co-workers, who have tested the performance of 18 energy functions on this set of MD structures. This study indicated that other energy functions emphasizing hydrophobicity also excelled at native fold discrimination.

Although RMS deviation serves only imperfectly as a coordinate along the folding trajectory, it is encouraging nonetheless to confirm that it correlates strongly with the values of these energy functions. Although neither study attempted to minimize its energy function using near-native structures as starting points, we challenge future studies to progress along these lines.
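
One simple way to quantify the relationship between RMS deviation and energy over a decoy set is a Pearson correlation coefficient. The sketch below is an illustration only: the choice of the Pearson coefficient is our assumption (the studies discussed do not necessarily use it), and the toy values are invented.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy decoy set: hypothetical RMS deviations (Å) and energies (arbitrary units).
rms = [0.4, 0.9, 1.3, 1.8, 2.5, 3.1]
energy = [-118.0, -112.5, -109.0, -101.2, -97.5, -88.0]

r = pearson_correlation(rms, energy)
print(f"r = {r:.3f}")  # a value near +1 means energy rises as decoys move away from native
```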

CONCLUSIONS

Proteins are close-packed both in the solid state and in solution. In fact, they are probably the most tightly packed form of organic matter. This close packing is related to function in that it provides a rigid core on which to arrange catalytic side-chains in enzymes. Loose packing is often associated with flexible hinges and conformational changes, whereas tight packing correlates with better stability. How such tight packing arises in protein folding is still unclear, although there has been enormous progress in characterizing the packing of partially folded intermediates. However it arises, this close packing limits the number of possible arrangements of the side-chains, which has led to methods capable of predicting side-chain packing on a known, rigid main-chain. These same methods are applicable to homology modeling provided the main-chain "borrowed" from the related structure is close enough (within 2 Å). If no homologous structure is known, other methods can sometimes generate main-chains that are almost close enough (< 3.5 Å RMS). It is crucial to have an energy function that can recognize the folds that are closer to the native structure.

Throughout the review, we have argued that packing forms a strong constraint on protein structure, severely restricting the number of possible structures. However, in the earlier stages of the folding process, particularly those relating to the formation of the overall fold, it is believed that packing is much less important. This has been borne out in experimental studies demonstrating how tolerant a fold is to many random mutations. It has also been substantiated in theoretical studies that show how surprisingly easy it is for a protein of randomly chosen sequence (a "random heteropolymer") to close-pack in an approximate sense.

Looking to the future, there are a number of challenges. Perhaps the greatest is to understand the manner by which a protein close-packs its residues during the latter stages of folding. The early stages are generally considered to be dominated by non-specific hydrophobic interactions. Another challenge is to understand how packing affects function: if loose packing is essential for function, it should be possible to design proteins that are too stable to function as catalysts. In the area of computer simulations, we expect progress on including main-chain flexibility, deriving strongly discriminating energy functions and generating diverse sets of decoy folds. For structure prediction, packing side-chains on a near-native backbone seems almost completely solved. The challenge now is to generate main-chains sufficiently close to the native backbone to allow packing algorithms to be successful. It also seems likely that designing small helical proteins will be easiest, and their detailed structure could be predicted over the next five years!

BIBLIOGRAPHY



1 Kauzmann W. 1959. Adv. Prot. Chem. 14: 1-63
2 Dill KA. 1990. Biochemistry. 29: 7133-7155
3 Kuwajima K. 1996. Faseb J. 10: 102-109
4 Levitt M. 1976. J Mol Biol. 104: 59-107.
5 Richards FM. 1974. J. Mol. Biol. 82: 1-14
6 Richards FM. 1977. Ann. Rev. Biophys. Bioeng. 6: 151-176
7 Daggett V and Levitt M. 1994. Curr. Op. Struct. Bio. 4: 291-295
8 Fersht AR. 1995. Curr. Op. Struct. Bio. 5: 79-84
9 Karplus M and Sali A. 1995. Curr. Op. Struct. Bio. 5: 58-73
10 Koehl P and Delarue M. 1996. Curr. Op. Struct. Bio. 6: 222-226
11 Vásquez M. 1996. Curr. Op. Struct. Bio. 6: 217-221
12 Richards FM. 1985. Meth. Enzym. 115: 440-464
13 Richards FM and Lim WA. 1994. Q. Rev. Biophys. 26: 423-498
14 Petitjean M. 1994. J. Comp. Chem. 15: 507-523
15 Gerstein M and Chothia C. 1996. Proc. Natl. Acad. Sci. USA. 93: 10167-10172
16 Madan B and Lee B. 1994. Biophys. Chem. 51: 279-289
17 Gerstein M, Tsai J and Levitt M. 1995. J. Mol. Bio. 249: 955-966.
18 Voronoi GF. 1908. J. Reine Angew. Math. 134: 198-287
19 Bernal JD and Finney JL. 1967. Disc. Farad. Soc. 43: 62-69
20 Chothia C. 1975. Nature. 254: 304-308
21 Finney JL. 1975. J. Mol. Biol. 96: 721-732
22 Richards FM. 1979. Carls. Res. Commun. 44: 47-63
23 Finney JL, Gellatly BJ, Golton IC and Goodfellow J. 1980. Biophys. J. 32(1): 17-33
24 Janin J and Chothia C. 1990. J. Biol. Chem. 265: 16027-16030
25 Harpaz Y, Gerstein M and Chothia C. 1994. Structure. 2: 641-649
26 Shih JP, Sheu SY and Mou CY. 1994. J. Chem. Phys. 100: 2202-2212
27 Tsai J, Gerstein M and Levitt M. 1996. J. Chem. Phys. 104: 9417-9430
28 Finney JL. 1978. J. Mol. Biol. 119: 415-441
29 David CW. 1988. Biopolymers. 27: 339-344
30 Edelsbrunner H and Mucke E. 1994. ACM Trans. Graph. 13: 43-72
31 Edelsbrunner H, Facello M, Ping F and Jie L. 1995. Proc. 28th Hawaii Int. Conf. Sys. Sci.: 256-264
32 Connolly M. 1983. J. Appl. Cryst. 16: 548-558
33 Connolly M. 1983. Science. 221: 709-713
34 Connolly M. 1986. J. Mol. Graph. 4: 3-6
35 Gregoret LM and Cohen FE. 1990. J. Mol. Biol. 211(4): 959-974
36 Rashin AA, Iofin M and Honig B. 1986. Biochemistry. 25: 3619-25
37 Tilton RF, Jr., Singh UC, Weiner SJ, Connolly ML, Kuntz ID, Jr., Kollman PA, Max N and Case DA. 1986. J. Mol. Biol. 192(2): 443-456
38 Alard P and Wodak S. 1991. J. Comp. Chem. 12: 918-922
39 Hubbard SJ and Argos P. 1994. Prot. Sci. 3(12): 2194-2206
40 Hubbard SJ, Gross KH and Argos P. 1994. Prot. Eng. 7(5): 613-626
41 Kleywegt GJ and Jones TA. 1994. Acta Cryst. D50: 178-185
42 Williams MA, Goodfellow JM and Thornton JM. 1994. Prot. Sci. 3(8): 1224-1235
43 Hubbard SJ and Argos P. 1995. Prot. Eng. 8(10): 1011-1015
44 Sreenivasan U and Axelsen PH. 1992. Biochemistry. 31: 12785-12791
45 Baker EN and Hubbard RE. 1984. Prog. Biophys. Mol. Biol. 44: 97-179
46 Matthews BW, Morton AG and Dahlquist FW. 1995. Science. 270: 1847-1849
47 Chan HS and Dill KA. 1990. Proc. Natl. Acad. Sci. USA. 87: 6388-6392
48 Chan HS and Dill KA. 1991. Annu. Rev. Biophys. & Biophys. Chem. 20: 447-490
49 Gregoret LM and Cohen FE. 1991. J. Mol. Biol. 219(1): 109-122
50 Hunt NG, Gregoret LM and Cohen FE. 1994. J. Mol. Biol. 241: 214-225
51 Franks F. 1983. Water. London: The Royal Society of Chemistry
52 Chandler D, Weeks JD and Andersen HC. 1983. Science. 220: 787-794
53 Zichi DA and Rossky PJ. 1986. J. Chem. Phys. 84: 2814-2822
54 Gerstein M and Lynden-Bell RM. 1993. J. Phys. Chem. 97: 2991-2999
55 Gerstein M and Lynden-Bell RM. 1993. J. Mol. Biol. 230: 641-650
56 Chothia C and Finkelstein AV. 1990. Ann. Rev. Biochem. 59: 1007-39
57 Jones S and Thornton J. 1996. Proc. Natl. Acad. Sci. USA. 93: 13-20
58 Shoichet BK and Kuntz ID. 1991. J. Mol. Biol. 221: 327-346
59 Cherfils J, Duquerroy S and Janin J. 1991. Prot. Struct. Func. Genet. 11: 271-80
60 Walls PH and Sternberg MJ. 1992. J. Mol. Biol. 228: 277-297
61 Cherfils J and Janin J. 1993. Curr. Opin. Struct. Biol. 3: 265-269
62 Ponder JW and Richards FM. 1987. In Evolution of Catalytic Function, Vol. LII, pp. 421-428 Cold Spring Harbor, NY, USA
63 Lesk AM and Chothia C. 1984. J. Mol. Biol. 174: 175-91
64 Gerstein M, Lesk AM and Chothia C. 1994. Biochemistry. 33: 6739-6749
65 Lawson CL, Zhang R, Schevitz RW, Otwinowski Z, Joachimiak A and Sigler PB. 1988. Proteins. 3: 18-31
66 McPhalen CA, Vincent MG, Picot D, Jansonius JN, Lesk AM and Chothia C. 1992. J. Mol. Biol. 227: 197-213
67 Frauenfelder H, Sligar SG and Wolynes PG. 1991. Science. 254: 1598-1603
68 Hubbard SJ and Argos P. 1996. J. Mol. Biol. 261: 289-300
69 Lesk AM and Chothia C. 1988. Nature. 335: 188-190
70 Segawa S and Richards FM. 1988. Biopoly. 27: 23-40
71 Gerstein M and Chothia CH. 1991. J. Mol. Biol. 220: 133-149
72 Gerstein M, Anderson BF, Norris GE, Baker EN, Lesk AM and Chothia C. 1993. J. Mol. Bio. 234: 357-372.
73 Gerstein M, Schulz G and Chothia C. 1993. J. Mol. Biol. 229: 494-501
74 Jolicoeur C, Riedl B, Desrochers D, Lemelin LL, Zamojska R and Enea O. 1986. J. Solu. Chem. 15: 109-128
75 Kharakoz DP. 1989. Biophys. Chem. 34: 5634-5642
76 Kuntz ID and Kauzmann W. 1974. Adv. Prot. Chem. 28: 239-345
77 Lee JC, Gekko K and Timasheff SN. 1979. Methods Enzymol. 61: 26-49
78 Gavish B, Gratton E and Hardy CJ. 1983. Proc. Natl. Acad. Sci. USA. 80: 750-754
79 Gekko K and Hasegawa Y. 1986. Biochemistry. 25: 6563-6571
80 Kharakoz DP and Mkhitaryan AG. 1986. Mol. Bio. 20: 312-321
81 Iqball M and Verrall RE. 1987. J. Phys. Chem. 91: 1935-1941
82 Chalikian TV, Totrov M, Abagyan R and Breslauer KJ. 1996. J. Mol. Bio. 260: 588-603.
83 Baldwin EP and Matthews BW. 1994. Curr. Opin. Biotechnol. 5: 396-402
84 Hubbard SJ and Argos P. 1995. Curr. Opin. Biotechnol. 6: 375-381
85 Russell RJM and Taylor GL. 1995. Curr. Opin. Biotechnol. 6: 370-374
86 Ishikawa K, Nakamura H, Morikawa K and Kanaya S. 1993. Biochemistry. 32: 6171-6178
87 Anderson DE, Hurley JH, Nicholson H, Baase WA and Matthews BW. 1993. Prot. Sci. 2: 1285-1290
88 Lim WA, Hodel A, Sauer RT and Richards FM. 1994. Proc. Natl. Acad. Sci. USA. 91: 421-427
89 Munson M, Balasubramanian S, Fleming KG, Nagi AD, O'Brien R, Sturtevant JM and Regan L. 1996. Prot. Sci. 5: 1584-1593
90 Ramachandran S and Udgaonkar JB. 1996. Biochemistry. 35: 8776-8785.
91 Tsong TY and Baldwin RL. 1972. J. Mol. Bio. 63: 453-475
92 Wetlaufer DB. 1973. Proc. Natl. Acad. Sci. USA. 70: 697-701
93 Ptitsyn O. 1973. Dokl. Akad. Nauk SSSR. 210: 1213-1215
94 Kim PS and Baldwin RL. 1990. Annu. Rev. Biochem. 59: 631-60
95 Baldwin RL. 1996. Folding & Design. 1: R1-R8
96 Kiefhaber T. 1995. Proc. Natl. Acad. Sci. USA. 92: 9029-9033
97 Wildegger G and Kiefhaber T. submitted. 
98 Kim PS and Baldwin RL. 1982. Ann. Rev. Biochem. 51: 459-489
99 Ptitsyn OB. 1987. J. Prot. Chem. 6: 273-293
100 Creighton TE, Darby NJ and Kemmink J. 1996. FASEB J. 10
101 Baldwin RL. 1995. J. Biomol. NMR. 5: 103-109
102 Wolynes PG, Luthey-Schulten Z and Onuchic JN. 1996. Chem. & Bio. 3: 425-432
103 Jackson SE and Fersht AR. 1991. Biochemistry. 30: 10428-10435.
104 Khorasanizadeh S, Peters ID, Butt TR and Roder H. 1993. Biochemistry. 32: 7054-7063.
105 Milla ME and Sauer RT. 1994. Biochemistry. 33: 1125-1133.
106 Kuszewski J, Clore GM and Gronenborn AM. 1994. Prot. Sci. 3: 1945-1952.
107 Sosnick TR, Mayne L, Hiller R and Englander SW. 1994. Nat. Struct. Bio. 1: 149-156.
108 Kragelund BB, Robinson CV, Knudsen J, Dobson CM and Poulsen FM. 1995. Biochemistry. 34: 7217-7224.
109 Schindler T, Herrler M, Marahiel MA and Schmid FX. 1995. Nat. Struct. Bio. 2: 663-673.
110 Huang GS and Oas TG. 1995. Proc. Natl. Acad. Sci. USA. 92: 6878-6882.
111 Harrison SC and Durbin R. 1985. Proc. Natl. Acad. Sci. USA. 82: 4028-4030
112 Ohgushi M and Wada A. 1983. FEBS Lett. 164: 21-24
113 Ptitsyn O. 1996. Nat. Struct. Bio. 3: 488-490.
114 Finkelstein AV and Shakhnovich EI. 1989. Biopoly. 28: 1681-1694.
115 Kiefhaber T, Labhardt AM and Baldwin RL. 1995. Nature. 375: 513-515.
116 Fink AL. 1995. Ann. Rev. Biophys. Biomol. Struct. 24: 495-522
117 Wilson G, Ford SJ, Cooper A, Hecht L, Wen ZQ and Barron LD. 1995. J. Mol. Bio. 254: 747-760.
118 Dolgikh DA, Abaturov LV, Brazhnikov EV, Lebedev YO, Chirgadze YN and Ptitsyn OB. 1983. Dokl. Akad. Nauk SSSR. 272: 1481-1484
119 Dolgikh DA, Kolomiets AP, Bolotina IA and Ptitsyn OB. 1984. FEBS Lett. 165: 88-92
120 Uversky VN and Ptitsyn OB. 1996. J. Mol. Bio. 255: 215-228.
121 Gilmanshin RI and Ptitsyn OB. 1987. FEBS Lett. 223: 327-329
122 Kuwajima K, Yamaya H, Miwa S and Sugai S. 1987. FEBS Lett. 227: 115-118
123 Kuwajima K. 1989. Prot. Struct. Funct. Gen. 6: 87-103
124 Alexandrescu AT, Evans PA, Pitkeathly M, Baum J and Dobson CM. 1993. Biochemistry. 32: 1707-1718
125 Peng Z-Y and Kim PS. 1994. Biochemistry. 33: 2136-2141.
126 Wu LC, Peng Z and Kim PS. 1995. Nat. Struct. Bio. 2: 281-286
127 Dolgikh DA, Abaturov LV, Bolotina IA, Brazhnikov EV, Bychkova VE, Gilmanshin RI, Lebedev YO, Semisotnov GV, Tiktopulo EI and Ptitsyn OB. 1985. Eur. Biophys. J. 13: 109-121
128 Goto Y and Nishikiori S. 1991. J. Mol. Bio. 222: 679-686
129 Foygel K, Spector S, Chatterjee S and Kahn PC. 1995. Prot. Sci. 4: 1426-1429.
130 Chalikian TV, Gindikin VS and Breslauer KJ. 1995. J. Mol. Bio. 250: 291-306.
131 Jeng MF, Englander SW, Elöve GA, Wand AJ and Roder H. 1990. Biochemistry. 29: 10433-10437
132 Roder H, Elöve GA and Englander SW. 1988. Nature. 335: 700-704
133 Ochi H, Hata Y, Tanaka N, Kakudo M, Sakurai T, Aihara S and Morita Y. 1983. J. Mol. Bio. 166: 407-418
134 Marmorino JL and Pielak GJ. 1995. Biochemistry. 34: 3140-3143
135 Colón W, Elöve GA, Wakem LP, Sherman F and Roder H. 1996. Biochemistry. 35: 5538-5549.
136 Doniach S, Bascle J, Garel T and Orland H. 1995. J. Mol. Bio. 254: 960-967.
137 Goto Y and Fink AL. 1990. J. Mol. Bio. 214: 803-805
138 Nishii I, Kataoka M, Tokunaga F and Goto Y. 1994. Biochemistry. 33: 4903-4909.
139 Hughson FM, Wright PE and Baldwin RL. 1990. Science. 249: 1544-1548
140 Jennings PA and Wright PE. 1993. Science. 262: 892-896
141 Richmond TJ and Richards FM. 1978. J. Mol. Bio. 119: 537-555
142 Weaver D. 1992. Biopoly. 32: 477-490
143 Hughson FM, Barrick D and Baldwin RL. 1991. Biochemistry. 30: 4113-4118.
144 Barrick D and Baldwin RL. 1993. Prot. Sci. 2
145 Waltho JP, Feher VA, Merutka G, Dyson HJ and Wright PE. 1993. Biochemistry. 32: 6337-6347
146 Goto Y, Takahashi N and Fink AL. 1990. Biochemistry. 29
147 Kiefhaber T and Baldwin RL. 1996. J. Mol. Bio. 252: 122-132
148 Kay MS and Baldwin RL. 1996. Nat. Struct. Bio. 3: 439-445
149 Kataoka M, Nishii I, Fujisawa T, Ueki T, Tokunaga F and Goto Y. 1995. J. Mol. Bio. 249: 215-228.
150 Yamaguchi T, Yamada H and Akasaka K. 1995. J. Mol. Bio. 250: 689-694.
151 Cook KH, Schmid FX and Baldwin RL. 1978. Proc. Natl. Acad. Sci. USA. 76: 6157-6161
152 Schmid FX and Blaschek H. 1981. Eur. J. Biochemistry. 114: 111-117
153 Schmid FX. 1983. Biochemistry. 22: 4690-4696
154 Udgaonkar JB and Baldwin RL. 1990. Proc. Natl. Acad. Sci. USA. 87: 8197-8201
155 Ybe JA and Kahn PC. 1994. Prot. Sci. 3: 638-649.
156 Tamura Y and Gekko K. 1995. Biochemistry. 34: 1878-1884.
157 Udgaonkar JB and Baldwin RL. 1995. Biochemistry. 34: 4088-4096
158 Vidugiris GJA, Markley JL and Royer CA. 1995. Biochemistry. 34: 4909-4912.
159 Barrick D and Baldwin RL. 1993. Biochemistry. 32: 3790-3796.
160 Eliezer D, Jennings PA, Wright PE, Doniach S, Hodgson KO and Tsuruta H. 1995. Science. 270: 487-488.
161 Hoeltzli SD and Frieden C. submitted. Biochemistry
162 Hinds DA and Levitt M. 1995. Trends Biotech. 13: 23-27.
163 Ponder JW and Richards FM. 1987. J. Mol. Biol. 193: 775-791
164 Janin J, Wodak S, Levitt M and Maigret B. 1978. J. Mol. Biol. 125: 357-386
165 Reid LS and Thornton JM. 1989. Prot. Struct. Funct. Genet. 5: 170-182
166 Snow ME and Amzel LM. 1986. Prot. Struct. Funct. Genet. 1: 267-279
167 Summers NL and Karplus M. 1989. J. Mol. Biol. 210: 785-812
168 Schiffer CA, Caldwell JW, Kollman PA and Stroud RM. 1990. Proteins. 8: 30-43
169 Lee C and Subbiah S. 1991. J. Mol. Biol. 217: 373-388
170 Holm L and Sander C. 1991. J. Mol. Biol. 218: 183-194
171 Tufféry P, Etchebest C, Hazout S and Lavery R. 1991. J. Biomol. Struct. Dyn. 8: 1267-1289
172 Levitt M. 1992. J Mol Biol. 226: 507-533.
173 Jones TA and Thirup S. 1986. EMBO J. 5: 819-822
174 Lee B and Richards FM. 1971. J. Mol. Bio. 55: 379-400
175 Holland JH. 1975. Adaptation in Natural and Artificial Systems: an Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. Ann Arbor, MI.: Univ. of Michigan Press
176 Goldberg DE. 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison-Wesley
177 Lee C. 1994. J. Mol. Biol. 236: 918-939
178 Vásquez M. 1995. Biopolymers. 36: 53-70
179 Wilson C, Gregoret LM and Agard DA. 1993. J. Mol. Biol. 229: 996-1006
180 Eisenmenger F, Argos P and Abagyan R. 1993. J. Mol. Biol. 231: 849-860
181 Laughton CA. 1994. J. Mol. Biol. 235: 1088-1097
182 Holm L and Sander C. 1992. Proteins. 14: 213-223
183 Koehl P and Delarue M. 1994. J. Mol. Biol. 239: 249-275
184 Desmet J, Maeyer MD, Hazes B and Lasters I. 1992. Nature. 356: 539-542
185 Dunbrack RL and Karplus M. 1993. J. Mol. Biol. 230: 543-574
186 Hwang JK and Liao WF. 1995. Prot. Eng. 8: 363-370
187 Tanimura R, Kidera A and Nakamura H. 1994. Prot. Sci. 3: 2358-2365
188 Holm L and Sander C. 1992. J. Mol. Biol. 225: 93-105
189 Dunbrack RL and Karplus M. 1994. Nat. Struct. Biol. 1: 334-340
190 Wendoloski JJ and Salemme FR. 1992. J. Mol. Graph. 10: 124-126
191 Hellinga HW and Richards FM. 1994. Proc. Natl. Acad. Sci. USA. 91: 5803-5807
192 Cregut D, Liautard JP and Chiche L. 1994. Prot. Eng. 7: 1333-1344
193 Goldstein RF. 1994. Biophys. J. 66: 1335-1340
194 Leach AR. 1994. J. Mol. Biol. 235: 345-356
195 Harbury PB, Tidor B and Kim PS. 1995. Proc. Natl. Acad. Sci. USA. 92: 8408-8412
196 Correa PE. 1990. Prot. Struct. Funct. Genet. 7: 366-377.
197 David CW. 1993. J. Comp. Chem. 14: 715-717
198 Flores TP, Orengo CA, Moss DS and Thornton JM. 1993. Prot. Sci. 2: 1811-1826
199 Wlodawer A, Deisenhofer J and Huber R. 1987. J. Mol. Biol. 193: 145-156
200 Schrauber H, Eisenhaber F and Argos P. 1993. J. Mol. Biol. 230: 592-612
201 Lee C. 1996. Fold. & Design. 1: 1-12
202 Lee C and Levitt M. 1997. Pac. Sym. on Biocomp. Hawaii, USA: in press
203 Lee C and McConnell HM. 1995. Proc. Natl. Acad. Sci. USA. 92: 8269-8273
204 Lasters I, De Maeyer M and Desmet J. 1995. Prot. Eng. 8: 815-822
205 Koehl P and Delarue M. 1995. Nat. Struct. Biol. 2: 163-170
206 Roitberg A and Elber R. 1991. J. Chem. Phys. 95: 9277-9287
207 Zheng Q and Kyle DJ. 1994. Proteins. 19: 324-329
208 Chung SY and Subbiah S. 1995. Prot. Sci. 4: 2300-2309
209 Chung SY and Subbiah S. 1996. Pac. Sym. on Biocomp.: 126-141. Hawaii, USA: World Scientific, NJ
210 Bowie JU, Lüthy R and Eisenberg D. 1991. Science. 253: 164-170
211 Jones DT, Taylor WR and Thornton JM. 1992. Nature. 358: 86-89
212 Lemer CM-R, Rooman MJ and Wodak SJ. 1995. Prot. Struct. Funct. Genet. 23: 337-355
213 Hinds DA and Levitt M. 1994. J. Mol. Biol. 243: 668-682
214 Rooman MJ, Kocher JPA and Wodak SJ. 1991. J Mol Biol. 221: 961-980
215 Wilson C and Doniach S. 1989. Prot. Struct. Funct. Genet. 6: 193-209
216 Covell DG. 1992. Prot. Struct. Funct. Genet. 14: 409-420
217 Bowie JU and Eisenberg D. 1994. Proc. Natl. Acad. Sci. USA. 91: 4436-4440
218 Covell DG. 1994. J. Mol. Bio. 235: 1032-1043
219 Dandekar T and Argos P. 1994. J. Mol. Bio. 236: 844-861
220 Kolinski A and Skolnick J. 1994. Prot. Struct. Funct. Genet. 18: 338-352
221 Vieth M, Kolinski A, Brooks CLI and Skolnick J. 1994. J. Mol. Bio. 237: 361-367
222 Wallqvist A and Ullner M. 1994. Prot. Struct. Funct. Genet. 18: 267-280
223 Monge A, Lathrop EJP, Gunn JR, Shenkin PS and Friesner RA. 1995. J. Mol. Bio. 247: 995-1012
224 Srinivasan R and Rose GD. 1995. Prot. Struct. Funct. Genet. 22: 81-99
225 Vieth M, Kolinski A, Brooks CLI and Skolnick J. 1995. J. Mol. Bio. 251: 448-467
226 Rose GD and Srinivasan R. 1996. Biophysical Journal. 70: A378
227 Yue K and Dill KA. 1996. Prot. Sci. 5: 254-261
228 Park BH and Levitt M. 1995. J. Mol. Biol. 249: 493-507
229 Park B and Levitt M. 1996. J. Mol. Biol. 258: 367-392
230 Mumenthaler C and Braun W. 1995. Prot. Sci. 4: 863-871
231 Hänggi G and Braun A. 1994. FEBS Lett. 344: 147-153
232 Sun S, Thomas PD and Dill KA. 1995. Prot. Eng. 8: 769-778
233 Dill KA, Bromberg S, Yue K, Fiebig KM, Yee DP, Thomas PD and Chan HS. 1995. Prot. Sci. 4: 561-602
234 Wang Y, Zhang H, Li W and Scott RA. 1995. Proc. Natl. Acad. Sci. USA. 92: 709-713
235 Wang Y, Zhang H and Scott RA. 1995. Prot. Sci. 4: 1402-1411
236 Eisenberg D and McLachlan AD. 1986. Nature. 319: 199-203
237 Weiner SJ, Kollman PA, Case DA, Singh UC, Ghio C, Alagona G, Profeta S and Weiner P. 1984. J. Am. Chem. Soc. 106: 765-784
238 Miyazawa S and Jernigan RL. 1985. Macromolecules. 18: 534-552
239 Hendlich M, Lackner P, Weitckus S, Floeckner H, Froschauer R, Gottsbacher K, Casari G and Sippl MJ. 1990. J. Mol. Biol. 216: 167-180
240 Godzik A and Skolnick J. 1992. Proc. Natl. Acad. Sci. USA. 89: 98-102
241 Maiorov VN and Crippen GM. 1992. J. Mol. Biol. 227: 876-888
242 Ouzounis C, Sander C, Scharf M and Schneider R. 1993. J. Mol. Biol. 232: 805-825
243 Huang ES, Subbiah S, Tsai J and Levitt M. 1996. J. Mol. Biol. 257: 716-725
244 Huang ES, Subbiah S and Levitt M. 1995. J. Mol. Biol. 252: 709-720
245 Park BH, Huang ES and Levitt M. 1996. J. Mol. Bio. in press
246 Lim WA and Sauer RT. 1989. Nature. 339
247 West MW and Hecht MH. 1995. Prot. Sci. 4: 2032-2039
248 Sosnick TR, Jackson S, Wilk RR, Englander SW and De Grado WF. 1996. Prot. Struct. Funct. Genet. 24: 427-432
249 Finkelstein AV and Ptitsyn OB. 1987. Prog. Biophys. Mol. Biol. 50: 171-190
250 Gerstein M, Sonnhammer ELL and Chothia C. 1994. J. Mol. Biol. 236: 1067-1078
251 Kapp OH, Moens L, Vanfleteren J, Trotman CNA, Suzuki T and Vinogradov SN. 1995. Prot. Sci. 4: 2179-2190