Nucl. Acids. Res. -- Qian et al. 29 (8): 1750

PartsList: a web-based system for dynamically ranking protein folds based on disparate attributes, including whole-genome expression and interaction information

Jiang Qian, Brad Stenger, Cyrus A. Wilson, Jimmy Lin, Ronald Jansen, Sarah A. Teichmann¹, Jong Park², Werner G. Krebs, Haiyuan Yu, Vadim Alexandrov, Nathaniel Echols and Mark Gerstein*

Department of Molecular Biophysics and Biochemistry, Yale University, PO Box 208114, New Haven, CT 06520, USA, ¹Department of Biochemistry and Molecular Biology, University College London, Darwin Building, Gower St, London WC1E 6BT, UK and ²European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

Received November 15, 2000; Revised and Accepted February 27, 2001.

ABSTRACT

As the number of protein folds is quite limited, a mode of analysis that will be increasingly common in the future, especially with the advent of structural genomics, is to survey and re-survey the finite parts list of folds from an expanding number of perspectives. We have developed a new resource, called PartsList, that lets one dynamically perform these comparative fold surveys. It is available on the web at http://bioinfo.mbb.yale.edu/partslist and http://www.partslist.org. The system is based on the existing fold classifications and functions as a form of companion annotation for them, providing ‘global views’ of many already completed fold surveys. The central idea in the system is that of comparison through ranking; PartsList will rank the approximately 420 folds based on more than 180 attributes. These include: (i) occurrence in a number of completely sequenced genomes (e.g. it will show the most common folds in the worm versus yeast); (ii) occurrence in the structure databank (e.g. most common folds in the PDB); (iii) both absolute and relative gene expression information (e.g. most changing folds in expression over the cell cycle); (iv) protein–protein interactions, based on experimental data in yeast and comprehensive PDB surveys (e.g. most interacting fold); (v) sensitivity to inserted transposons; (vi) the number of functions associated with the fold (e.g. most multi-functional folds); (vii) amino acid composition (e.g. most Cys-rich folds); (viii) protein motions (e.g. most mobile folds); and (ix) the level of similarity based on a comprehensive set of structural alignments (e.g. most structurally variable folds). The integration of whole-genome expression and protein–protein interaction data with structural information is a particularly novel feature of our system.
We provide three ways of visualizing the rankings: a profiler emphasizing the progression of high and low ranks across many pre-selected attributes, a dynamic comparer for custom comparisons and a numerical rankings correlator. These allow one to directly compare very different attributes of a fold (e.g. expression level, genome occurrence and maximum motion) in the uniform numerical format of ranks. This uniform framework, in turn, highlights the way that the frequency of many of the attributes falls off with approximate power-law behavior (i.e. according to V^–b, for attribute value V and constant exponent b), with a few folds having large values and most having small values.

INTRODUCTION

Protein folds can be considered the most basic molecular parts. There are a very limited number of them in biology. Currently, about 500 are known, and it is believed that there may be no more than a few thousand in total (1–3). This number is considerably smaller than the number of genes in complex, multicellular organisms (>10 000; 4). Consequently, folds provide a valuable way of simplifying complex genomic information and making it manageable. In addition, folds are useful for studying the relationships between evolutionarily distant organisms since, in making comparisons, structure is more conserved than sequence or function.

In a general sense, how should one approach the analysis of molecular parts? A simple analogy to mechanical parts may be useful in this regard. Given the ‘parts’ from a number of devices (e.g. a car, a bicycle, and a plane) one might like to know which ones are shared by all and which are unique (say, wings for a plane). Furthermore, one might want to know which are common, generic parts and which are more specialized. Finally, one might like to organize the parts by a number of standardized attributes (e.g. the most flexible parts, the parts with the most functions, and the biggest parts). PartsList aims to provide answers to simple questions such as these for the domain of protein folds.

Properties related to protein folds can be divided into those that are ‘intrinsic’ versus ‘extrinsic’. Intrinsic information concerns an individual fold itself, e.g. its sequence, 3D structure and function, while ‘extrinsic’ information relates to a fold in the context of all other folds, e.g. its occurrence in many genomes and expression level in relation to that for other folds. Web-based search tools already provide intrinsic information about protein structures in the form of reports about individual structures. Valuable examples include the PDB Structure Explorer (5), PDBsum (6) and the MMDB (7). However, current resources lack the ability to fully present extrinsic information.

Likewise, while there are many databases storing information related to individual organisms (e.g. SGD, MIPS and FlyBase; 8–10), comparative genomics (PEDANT and COGs; 9,11), gene expression (GEO, the Gene Expression Omnibus at the NCBI, and ExpressDB; 12) and protein–protein interactions (DIP and BIND; 13,14), none of these integrates gene sequences, protein interactions, expression levels and other attributes with structure. (However, it should be mentioned that the Sacc3D module of SGD and PEDANT do tabulate the occurrence of folds in genomes.)

PartsList is arranged somewhat differently from most other biological resources. In a usual database (e.g. GenBank; 15) the number of entries increases as the database develops, while each entry has a fairly fixed number of attributes to describe it. In contrast, PartsList is envisioned to have a relatively stable number of entries, i.e. the finite list of protein folds, while the attributes that describe each entry are expected to increase considerably. In the current version of PartsList the properties for a protein fold include: amino acid composition, alignment information, fold occurrences in various genomes, statistics related to motions, absolute expression levels of yeast in different experiments, relative expression ratios for yeast, worm and Escherichia coli in various conditions, information on protein–protein interactions (based on whole-genome yeast interaction data and databank surveys) and sensitivity of the genes associated with the fold to inserted transposons.

One reason to build the database is to compare protein folds in a rich context and in a unified way. We achieve this through ranking, which allows users to directly compare very different attributes of a fold in a uniform numerical format. The rankings can be visualized in three ways: a profiler emphasizing the progression of high and low ranks across many pre-selected attributes, a rankings comparer for custom comparisons and a numerical rankings correlator. These views can help users gain insight into the functions of protein folds in the context of the whole genome. Our system makes it very easy to answer questions like: ‘What is the most common fold in the worm as compared to E.coli?’, ‘What is the most highly expressed fold in yeast and how does this compare to the fold that changes most in expression level during the cell cycle?’ and ‘Which fold has the most protein–protein interactions in the PDB and is it highly ranked in terms of protein motions?’

One of the strengths of the uniform numerical system of ranks in PartsList is that it puts everything into a common framework so that one can see hidden similarities in the occurrence of parts ordered according to many different attributes. In particular, as we describe below, we found that the frequency of many of the attributes falls off according to a power-law distribution (i.e. according to V^–b, for attribute value V and a constant b), with a few folds having large attribute values and most having small values. For instance, there are only a few folds that occur many times in the yeast genome and most occur only once or twice. Likewise, most folds have only a few functions associated with them, but there are a few ‘Swiss-army-knife’ folds that are associated with many distinct functions. Similar power-law-like expressions have been found to apply in a variety of other situations relating to proteins, for instance, in the occurrence of oligo-peptide words (16–18), in the frequency of transmembrane helices (19) and sequence families with given size (20), and in the structure of biological networks, with a few nodes having many connections and most having only a few (21,22).

PartsList is built on top of the Structural Classification of Proteins (SCOP) (23) fold classification and acts as an accompanying annotation to this system. SCOP is divided into a hierarchy of five levels: class, fold, superfamily, family and protein. The ‘parts’ in our system can be either SCOP folds or superfamilies. However, sometimes for ease of expression we will just refer to ‘folds’ when we really mean ‘folds and/or superfamilies’. We currently use 420 folds and 610 superfamilies in PartsList. Each is represented by a representative domain, which also serves as the key for its entry.

While we chose to use the SCOP classification, we could equally well have based the system on the other existing fold classifications, e.g. CATH (24), FSSP (25) or VAST (26,27). Moreover, for most attributes, we could also have developed our system around non-structural classifications of protein parts, e.g. Pfam (28), Blocks (29) or SMART (30). However, basing it around actual structural folds has the advantage that each part is more precisely and physically defined.

ATTRIBUTES THAT CAN BE RANKED: INFORMATION IN THE SYSTEM

Currently the attributes for each entry (i.e. protein fold) can be separated into several main categories: statistical information from a comprehensive set of structural alignments, amino acid composition information, fold occurrences in various genomes, expression levels in different experiments, protein interactions, macromolecular motion, transposon sensitivity and miscellaneous.

We have developed a formalism for expressing each of the attributes, which is described in Table 1. In the table, the term PART refers to either fold or superfamily, depending on which of these is being ranked. Essentially, we have a database of attributes where each attribute is given a standardized description and associated with a precise reference. In the following, we describe some main categories of attributes.

Table 1. All the attributes ranked by PartsList

Genome occurrence
The data in this category reveal fold occurrences in 20 different genomes, including four archaea, two eukaryotes and 14 bacteria (additional details online).

The data were obtained in the following fashion: once a library of folds has been constructed, representative sequences can be extracted (31). Then one can use these to search genomes by comparing each representative sequence against the genomes using the standard pairwise comparison programs, FASTA (32) and BLAST (33), and well-established thresholds (34).

Alternatively, one can build up profiles by running each representative sequence against PDB with PSI-BLAST and then comparing these profiles against each of the genomes. This latter procedure is more sensitive than pairwise comparison and relatively efficient once the profiles are made up. However, in doing large-scale surveys one has to be conscious of the potential biases introduced due to the profiles being more sensitive for larger families, which often results in the big families getting even bigger.

After the structure assignment, it becomes easy to enumerate how often a fold or structure feature occurs in a given genome or organism. Detailed information can be found in previous reports (H.Hegyi, J.Lin and M.Gerstein, manuscript submitted; 19,35,36). This pools assignments from previous work (37,38).

Alignment
Number of structures. We did a comprehensive set of structural alignments of structures in the PDB structure databank (39–41). The number of structures and aligned pairs used in these comparisons, which are based around Astral (31), give approximate measures of the occurrence of folds in the PDB. Comparison of these values to those for genome occurrence provides a measure of how biased the composition of the PDB is (42).

Sequence diversity. The scores from the alignments indicate the sequence diversity between the related structures within folds or superfamilies, in terms of percentage sequence identity and a sequence-based P value. P values are useful measures of the statistical significance of the similarity calculation. A P value is the probability that one can obtain the same or a better alignment score from a randomly composed alignment. A smaller P value therefore means that the observed similarity is less likely to have arisen by chance. Large P values close to 1.0 indicate that the similarity is characteristically random and thus insignificant.

Structural diversity. We also give analogous measures of the diversity of the structures with a given fold, allowing one to rank folds by their degree of variability. We tabulate untrimmed and trimmed RMS, along with the structural P value. RMS, the root-mean-squared deviation in α carbon positions, has been the traditional statistic that gauges the divergence between two related structures. Smaller RMS scores indicate more closely related structures. However, sometimes a few ill-fitting atoms may significantly increase the RMS of structures known to be similar. To compensate for this we also report a ‘trimmed’ RMS for a conserved core structure, which is based on the better fitting half of the aligned α carbons, and a structural P value, which compensates for other effects such as structure size. For details, see Wilson et al. (39).
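The difference between the untrimmed and the trimmed RMS can be illustrated with a minimal sketch. It assumes that the two structures have already been superimposed and that their α carbons are given as paired coordinates; the function names and the toy coordinates are hypothetical, and the exact trimming procedure of Wilson et al. (39) may differ in detail.

```python
import math

def rms(distances):
    """Root-mean-squared value of a list of per-atom distances (in Å)."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))

def untrimmed_and_trimmed_rms(coords_a, coords_b):
    """Return (untrimmed RMS, trimmed RMS) for paired, pre-superimposed atoms.

    The trimmed value uses only the better-fitting half of the aligned pairs.
    """
    distances = sorted(math.dist(p, q) for p, q in zip(coords_a, coords_b))
    half = max(1, len(distances) // 2)
    return rms(distances), rms(distances[:half])

# Toy data: four aligned pairs; the last pair fits badly and inflates the plain RMS.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 8.0, 0.0)]
full, trimmed = untrimmed_and_trimmed_rms(a, b)
print(full, trimmed)
```

In this toy case a single outlying atom pair dominates the untrimmed value, while the trimmed value reflects the well-fitting core.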

Composition
This allows us to see which folds are most biased in composition of particular amino acids. We use various levels of the Astral clustering of the SCOP sequences to arrive at the composition (31).

Expression
Three techniques are frequently used to obtain genome-wide gene expression data. They are Affymetrix oligonucleotide gene chips, Serial Analysis of Gene Expression (SAGE) and cDNA microarrays (43–45). SAGE and, to some degree, gene chips measure the absolute expression levels (in units of mRNA transcripts per cell), while microarrays are used to obtain the expression level changes of a given open reading frame (ORF) as the ratio to a reference state.

A main motivation for expression experiments is often to study protein function and to characterize the functions of unannotated genes. However, this does not preclude relating other attributes of proteins, such as their structure, to expression data. For instance, it may be that highly expressed protein folds share a number of characteristics, such as a particularly stable architecture or a composition biased in a certain way. Relating expression and structure involved matching the PDB structure database against the genome and then summing the expression levels of all ORFs containing the same fold. However, if one is trying to find genes expressed in a particular metabolic state, PartsList is not the right place to look.

Absolute. The absolute expression level data give a good representation of highly expressed genes. All the experiments currently indexed by PartsList are for yeast. For each experiment, in addition to ranking based on the average expression level for a fold, we also consider the composition in the transcriptome and the enrichment of this value relative to its composition in the genome. Transcriptome composition is the fractional composition of a fold (relative to that for other folds) in the mRNA population. In other words, it is the composition of a fold in the genome weighted by the expression levels of each of the genes. The enrichment is the relative change between the composition of a fold in the genome and the transcriptome. Further details are provided in previous reports (46,47). We report values for experiments from a number of different labs (43,48–50) and a single reference set that merges and scales all the expression sets together.
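The three quantities just described can be sketched as follows. This is only an illustration: it assumes one fold per gene, takes the enrichment to be (transcriptome composition – genome composition) / genome composition, and uses made-up gene names, fold labels and expression levels. The precise definitions are those of the cited reports (46,47).

```python
from collections import defaultdict

def fold_compositions(gene_folds, expression):
    """Map each fold to (genome composition, transcriptome composition, enrichment)."""
    genome = defaultdict(float)         # number of genes per fold
    transcriptome = defaultdict(float)  # summed expression level per fold
    for gene, fold in gene_folds.items():
        genome[fold] += 1.0
        transcriptome[fold] += expression[gene]
    n_genes = sum(genome.values())
    total_mrna = sum(transcriptome.values())
    result = {}
    for fold in genome:
        g = genome[fold] / n_genes
        t = transcriptome[fold] / total_mrna
        # Assumed definition: relative change from genome to transcriptome.
        result[fold] = (g, t, (t - g) / g)
    return result

# Hypothetical data: two folds with two genes each, expressed at different levels.
gene_folds = {"geneA": "fold_1", "geneB": "fold_1", "geneC": "fold_2", "geneD": "fold_2"}
expression = {"geneA": 30.0, "geneB": 30.0, "geneC": 10.0, "geneD": 10.0}
compositions = fold_compositions(gene_folds, expression)
print(compositions)
```

Here both folds make up half of the toy genome, but the first accounts for three quarters of the mRNA population and is therefore enriched in the transcriptome.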

Ratio. The expression ratio data show the most actively changing genes over a period of time (e.g. the cell cycle) or based on a change in states (e.g. healthy versus diseased). The quantity we rank is the fluctuation in expression of a given fold. It is measured as a standard deviation for the fold, calculated as the average of the expression ratio standard deviations of the genes that match the fold structure.
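A minimal sketch of this fold-level fluctuation is given below. It assumes one fold per gene and uses the population standard deviation over the time points; whether the original analysis used the population or the sample form is not stated here, and all names and values are hypothetical.

```python
from statistics import mean, pstdev

def fold_fluctuation(gene_folds, ratio_series):
    """Average, per fold, of the standard deviation of each gene's expression ratios."""
    per_fold = {}
    for gene, fold in gene_folds.items():
        per_fold.setdefault(fold, []).append(pstdev(ratio_series[gene]))
    return {fold: mean(sds) for fold, sds in per_fold.items()}

# Hypothetical expression ratios over four time points.
gene_folds = {"geneA": "fold_1", "geneB": "fold_1", "geneC": "fold_2"}
ratio_series = {
    "geneA": [1.0, 3.0, 1.0, 3.0],  # fluctuating gene
    "geneB": [2.0, 2.0, 2.0, 2.0],  # constant gene
    "geneC": [0.0, 4.0, 0.0, 4.0],  # strongly fluctuating gene
}
fluctuation = fold_fluctuation(gene_folds, ratio_series)
print(fluctuation)
```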

Interactions
Information on protein–protein interactions is derived from surveys of the contacts in the PDB and the experiments in yeast.

PDB. To determine which domains interact with one another in the PDB entries indexed by SCOP (9580 at the time of the analysis), the coordinates of each domain were parsed to check whether there are five or more contacts within 5 Å to another domain, as described by Park et al. (51). The distance of 5 Å was chosen, as this is a conservative threshold for interaction between two atoms, where the atoms are either Cα atoms or atoms in side chains. The five-contact threshold was chosen to make sure the contact between the domains was reasonably extensive. (In fact, the number of domains identified as contacting each other hardly changed for thresholds between one and 10 contacts and 3 and 6 Å distances.)
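The contact criterion can be sketched as a simple distance count. The sketch assumes that each domain is given as a list of atom coordinates in Å and that a ‘contact’ is an atom pair within the distance threshold; the function name, the brute-force pair loop and the toy coordinates are illustrative choices, not the procedure of Park et al. (51).

```python
import math

def domains_interact(atoms_a, atoms_b, max_distance=5.0, min_contacts=5):
    """True if at least `min_contacts` atom pairs lie within `max_distance` Å."""
    contacts = sum(
        1
        for p in atoms_a
        for q in atoms_b
        if math.dist(p, q) <= max_distance
    )
    return contacts >= min_contacts

# Toy domains: five atoms in a row, spaced 3.8 Å apart.
domain_a = [(3.8 * i, 0.0, 0.0) for i in range(5)]
near_domain = [(3.8 * i, 4.0, 0.0) for i in range(5)]   # 4 Å away from domain_a
far_domain = [(3.8 * i, 20.0, 0.0) for i in range(5)]   # 20 Å away from domain_a

print(domains_interact(domain_a, near_domain))
print(domains_interact(domain_a, far_domain))
```

The two keyword arguments make it easy to repeat the count with other thresholds, in the spirit of the robustness check mentioned above.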

Yeast. The interactions between structural domains in the yeast genome were obtained by assigning protein structures to the yeast proteins using PSI-BLAST and PDB-ISL as described by Teichmann et al. (52,53). Assigned structural domains contained within the same ORF that were adjacent within 30 amino acids were assumed to interact. (This is generally true of the domains in the PDB, with a few exceptions, such as domains in transcription factors like adjacent zinc fingers or variable and constant immunoglobulin domains.) To derive intermolecular interactions in the yeast genome we combined three sets of protein–protein interactions: (i) the MIPS web pages on complexes and pairwise interactions (February 2000) (9), (ii) the global yeast two-hybrid experiments by Uetz et al. (54) and (iii) large-scale yeast two-hybrid experiments by Ito et al. (55). Out of all these pairwise interactions known for yeast ORFs, there is a limited set in which both partners are completely covered by one structural domain (to within 100 residues). This set of protein pairs was used to derive a further set of domain contacts in the yeast genome as described by Park et al. (51).

Motions
Information on motions is from the Macromolecular Motions Database (56,57). We consider a set of approximately 4400 motions automatically identified by examining the PDB and a smaller, manually curated set of motions. For each fold we determine the number of entries in the motions database that are associated with it. Then, over this set of motions we either average or take the maximum value of a number of relevant statistics describing the motion, i.e. the maximum Cα displacement in the motion, the overall rotation of the motion and the energy difference between the start and endpoints of structures involved in the motion.

Transposon sensitivity
Ross-Macdonald et al. (58) developed a procedure for randomly inserting transposons throughout the yeast genome. They investigated the phenotypes resulting from each insertion in 20 different growth conditions in comparison to wild-type growth. The experiment for each insertion in each condition was repeated several times. If the observed phenotype of the mutant deviates from the average wild-type phenotype, this could be either because of a real effect of the mutation on the cell or because of the typical variation in the phenotype of wild-type cells. We developed a P value score that measures how likely it is that the observed phenotype results from random variation among wild-type cells. The negative logarithm of this P value rises with the significance of the phenotype measurements and can be understood as the sensitivity of the cell to mutations in a particular gene. We calculated a value for the transposon sensitivity for protein folds by geometrically averaging the P values of the associated genes.
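The fold-level averaging step can be sketched as follows, assuming the per-gene P values are already available. The base-10 logarithm and the example P values are assumptions made for illustration; the text only specifies a geometric average and a negative logarithm.

```python
import math

def fold_transposon_sensitivity(p_values):
    """Return (geometric mean P value, its negative log10) for one fold's genes."""
    # The geometric mean is the exponential of the mean logarithm, so the
    # negative log of the geometric mean is the mean of the per-gene -log10(P).
    mean_log = sum(math.log10(p) for p in p_values) / len(p_values)
    return 10.0 ** mean_log, -mean_log

# Hypothetical P values for two genes assigned to the same fold.
geometric_mean_p, sensitivity = fold_transposon_sensitivity([1e-2, 1e-4])
print(geometric_mean_p, sensitivity)
```

Working in log space avoids underflow when many very small P values are multiplied together.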

Miscellaneous
The miscellaneous section includes any information that does not fit into a major category. It includes: number of pseudogenes in worm associated with a fold (59), total number of functions and number of enzymatic functions associated with a fold (60), the average length of the sequence, and the year the domain structure was originally determined.

Errors
The above data, of course, have systematic and statistical errors. For some attributes we expect considerably smaller errors than others. For instance, we expect the numbers related to the sequence composition of different folds (e.g. the Ala composition) to be particularly accurate, since the only factors affecting these are errors in the underlying sequence of the protein and in the SCOP fold classification itself. In contrast, there is a considerable known rate of false positives associated with the global protein interaction experiments using the two-hybrid method (54,61), and this suggests statistics based on yeast interactions may be somewhat less accurate. Furthermore, the precise values for the rankings in PartsList are also contingent on the evolving contents of various databanks. Thus, over time as more structures are determined, one should expect statistics such as the most common folds in a particular genome to change somewhat. A very detailed discussion of the expected errors in the various quantities in PartsList is available on the web from the help section.

RANKING ALL THE FOLDS BASED ON EXTRINSIC INFORMATION

The PartsList resource facilitates exploring extrinsic information by dynamically ranking protein folds in different contexts, such as genome and expression levels. We provide three tools for visualizing the rankings: Comparer, Correlator and Profiler. The overall structure of PartsList is schematically shown in Figure 1.

Figure 1. The overall structure of PartsList. Three tools (Profiler, Comparer and Correlator) provide an easy way to access and manipulate the display of the dataset. With these tools, users can isolate interesting folds and obtain fold reports about them. Further clicks take one to a PDB report, which gives detailed information about an individual structural domain, including its genome occurrence, alignment information, molecular motions, functional annotation, interactions and core structure.

Comparer
The motivation behind Comparer is to allow one to rank folds according to a given attribute and then see the ranks associated with other attributes. The ranking attribute and the additional attributes are selected by the user. Figure 2A shows an example. The most common folds in E.coli are shown alongside three other attributes: fold occurrence in yeast, fluctuation in expression level during the yeast cell cycle and fluctuation in expression level in E.coli during heat shock. Which displayed attribute is used to rank the folds can be easily changed; in the example in Figure 2A the report can be re-sorted based on the other three attributes by clicking on arrows.
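The idea of sorting on one attribute while showing ranks for the others can be sketched in a few lines. This is not the PartsList implementation: the function names, the tie-breaking by fold name and the attribute values are hypothetical, and only the column labels G(ecol) and G(scer) follow the nomenclature of Table 1.

```python
def rank_column(values):
    """Rank folds by descending value (rank 1 = largest); ties broken by fold name."""
    ordered = sorted(values, key=lambda fold: (-values[fold], fold))
    return {fold: i + 1 for i, fold in enumerate(ordered)}

def comparer_table(attributes, sort_by):
    """Rows of (fold, [rank in each attribute]), ordered by the `sort_by` attribute."""
    ranks = {name: rank_column(column) for name, column in attributes.items()}
    folds = sorted(ranks[sort_by], key=ranks[sort_by].get)
    return [(fold, [ranks[name][fold] for name in attributes]) for fold in folds]

# Made-up occurrence counts for three folds in two genomes.
attributes = {
    "G(ecol)": {"fold_1": 90, "fold_2": 40, "fold_3": 10},
    "G(scer)": {"fold_1": 50, "fold_2": 70, "fold_3": 5},
}
by_ecol = comparer_table(attributes, "G(ecol)")
by_scer = comparer_table(attributes, "G(scer)")  # the same table, re-ranked
print(by_ecol)
print(by_scer)
```

Re-sorting only changes the row order; the rank stored in each cell stays the same, which is what makes columns with very different units directly comparable.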

Figure 2. Sample displays. (A) A sample Comparer display: the four selected attributes are the fold genome occurrence in yeast, the analogous quantity for E.coli, fluctuation of expression level for CDC28 synchronized yeast cells during the cell cycle, and the corresponding values for E.coli in response to heat shock. (Using the nomenclature in Table 1 these quantities are G(scer), G(ecol), F(cdc28) and F(heatec).) The folds are ranked in terms of fold occurrence in E.coli and the most common fold here is the TIM-barrel (represented by the SCOP domain d1aj2__). If one clicks the ‘Display ranks’ button, the values in the cells will be replaced by the ranks in their respective columns. By clicking the ‘re-rank’ arrows, one can also obtain other views by sorting on other attributes. (B) Shows the occurrences of folds in 20 genomes in Profiler. (C) Shows the correlation between the fold occurrences in the A.fulgidus and S.cerevisiae genomes [G(aful) and G(scer)]. Both linear and rank correlation coefficients are calculated. The linear correlation coefficient is defined as: R = [1/(N–1)] X·Y, where X and Y are two vectors with N elements. Each element of the X vector is normalized thus: X_i = (X'_i – <X'>)/σ_x, where <X'> and σ_x are the average and standard deviation of the values of the original data vector X', respectively. Y is normalized in a similar fashion. For two perfectly correlated datasets, R = 1, while for two completely uncorrelated datasets, R = 0. If we replace X_i by its rank among all the other X_i in the sample (i.e. 1, 2, 3 ... N), then we get the rank correlation coefficient. A scatter plot is also shown to help in visualizing this correlation.

Profiler
In principle, Profiler presents the same information as Comparer. However, it shows the progressing pattern for several pre-selected categories and is intended to give people an easy-to-use interface that gives some simple views of the data. Figure 2B shows an example that highlights the phylogenetic pattern of fold occurrence in 20 genomes.

Correlator
Correlator uses linear and rank correlation coefficients to measure the association between two selected attributes. The difference between these two types of correlation coefficients is that the former relates to the actual values while the latter relates to the ranks among the samples. The interpretation of the linear correlation coefficient can be completely meaningless if the joint probability distribution of the variables is too different from a binormal distribution. This is the reason for introducing the rank correlation coefficient. Correlator provides both coefficients for the selected quantities. In most cases, they are close. For example, the linear correlation coefficient and rank correlation coefficient for fold occurrence in genomes Archaeoglobus fulgidus and Methanococcus jannaschii (Aful and Mjan) are 0.88 and 0.77, respectively, while the corresponding coefficients for fold occurrence in A.fulgidus and Saccharomyces cerevisiae (Scer) are 0.52 and 0.48, respectively. This is not surprising, as the first two genomes are both Archaeal, while in the second comparison one genome belongs to Archaea (Aful) and another to Eucarya (Scer). As one would expect, the fold occurrences for the more closely related genomes have a higher correlation.
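The two coefficients can be sketched directly from the definition given in the caption of Figure 2: each vector is normalized to zero mean and unit standard deviation, and R is the dot product divided by N–1. The sketch assumes the sample (N–1) form of the standard deviation, which is the choice that yields R = 1 for perfectly correlated data, and it gives tied values their average rank; both choices, the function names and the example data are assumptions rather than details taken from PartsList.

```python
import math

def normalize(values):
    """Shift to zero mean and scale to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def linear_correlation(x, y):
    """R = [1/(N-1)] X.Y for normalized vectors X and Y."""
    xn, yn = normalize(x), normalize(y)
    return sum(a * b for a, b in zip(xn, yn)) / (len(x) - 1)

def ranks(values):
    """Ranks 1..N by ascending value; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def rank_correlation(x, y):
    """Linear correlation computed on the ranks instead of the raw values."""
    return linear_correlation(ranks(x), ranks(y))

# Made-up attribute values with one extreme fold; the relation is monotonic
# but far from linear.
x = [1.0, 2.0, 3.0, 4.0, 50.0]
y = [2.0, 4.0, 6.0, 8.0, 9.0]
r_linear = linear_correlation(x, y)
r_rank = rank_correlation(x, y)
print(r_linear, r_rank)
```

In this example the single extreme value lowers the linear coefficient while the rank coefficient stays at 1, which illustrates why the rank-based measure is useful when the values are far from binormal.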

In addition to the coefficients, Correlator displays a scatter plot to aid in visualizing the correlation between the selected fold attributes. Figure 2C shows the scatter plot for the second example above: the correlation between occurrences in the A.fulgidus and S.cerevisiae genomes. One can easily observe that some folds appear frequently in Scer but seldom or never in A.fulgidus. By clicking on a point on the plot, one obtains detailed information about the corresponding fold. This kind of plot can reveal interesting folds with certain relationships between attributes even though in some cases the overall correlation coefficients between the two attributes are almost zero (i.e. no correlation).

POWER-LAW BEHAVIOR OF MANY DISPARATE ATTRIBUTES

Going back and forth between Correlator and Comparer allows one to see interesting relationships between disparate attributes of proteins. Figure 3 illustrates a comparison of two attributes, functions and interactions. It shows a ranking of the folds that have the most interactions in the PDB in comparison to those that have the most functions. It is immediately apparent that there are only a few folds with large values of either attribute, i.e. many functions or interactions. Moreover, the most multi-functional folds also have the most distinct interactions with other folds, suggesting that a few folds may function as general-purpose parts.

Figure 3. The relation between the number of functions associated with a protein fold and the number of distinct protein–protein interactions it has (based on a survey of the PDB databank). These are X(func) and I(pdball,none) using the nomenclature in Table 1. This relationship can be displayed both in Comparer (left) and Correlator (right).

In fact, the uniform system of ranks in PartsList shows that ‘only a few folds having large values for an attribute’ is a generally true statement for many of the disparate attributes catalogued by the system. Moreover, the falloff from high to low values for a given attribute often follows a power-law distribution. That is, the normalized frequency F that a number of distinct folds have a particular attribute value V follows a functional form like:

F(V) = aV^–b

where a and b are constants. Note that F(V) is just the number of folds with an attribute value V divided by the total number of folds and that on a log–log plot this function becomes a straight line with slope –b. Often the attribute value V itself reflects the ‘occurrence’ of a fold in a particular context, e.g. V could be the number of times a given fold occurs in a particular genome. Quantities that follow a power-law-like behavior are often said to have a form like that of Zipf’s law, which often occurs in the analysis of word frequency in documents (62).
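Because F(V) = aV^–b becomes a straight line on a log–log plot, the constants can be estimated with an ordinary least-squares line through the points (log V, log F). The sketch below takes that approach as an assumption; how a and b were actually fitted is not stated here, the function names are hypothetical, and the occurrence counts are synthetic values chosen to follow an exact power law.

```python
import math
from collections import Counter

def normalized_frequency(attribute_values):
    """F(V): fraction of folds that have each attribute value V."""
    counts = Counter(attribute_values)
    total = len(attribute_values)
    return {v: c / total for v, c in counts.items()}

def fit_power_law(freq):
    """Least-squares line through (log V, log F); returns (a, b) for F = a * V**-b."""
    xs = [math.log10(v) for v in freq]          # requires V > 0
    ys = [math.log10(f) for f in freq.values()]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return 10.0 ** (my - slope * mx), -slope

# Synthetic occurrences: 64 folds occur once, 16 twice, 4 four times, 1 eight times.
occurrences = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
a, b = fit_power_law(normalized_frequency(occurrences))
print(a, b)
```

For real data the points scatter around the line, and the sparsely populated high-V tail can dominate an unweighted fit, so the fitted constants should be read as approximate.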

Thus far, this general conclusion is described in language sufficiently abstract to accommodate the many different types of attributes in PartsList. A few concrete examples will make the conclusion clearer. For instance, we find that in genomes most folds occur only once while there are only a very few folds that occur many times. An illustration is shown in the upper panel of Figure 5 for E.coli. The x-axis is the number of times a particular fold occurs in the E.coli genome and the y-axis shows the number of distinct folds that have the same occurrence. (This is normalized by dividing by the total number of folds so that the maximum value on the y-axis is 100%.) From the log–log format of the plot, one can immediately see that the falloff obeys a power-law, with a few folds occurring many times and most only once or twice. The middle panel shows other attributes that display similar power-law-like behavior, including expression level in yeast, number of functions associated with a fold, and number of protein–protein interactions found in the PDB. Of course, not all attributes follow a power-law. The lower panel shows two of these less typical attributes: Asp composition in a fold and average number of residues involved in a motion.

Figure 5. Some novel relationships that are highlighted by the PartsList system. (Upper panel) The occurrence of folds in the E.coli genome plotted on a log–log scale, i.e. G(ecol) using the nomenclature in Table 1. The x-axis is the fold occurrence in the genome, while the y-axis is the number of folds with a particular occurrence. The fit of the points to a straight line shows that the falloff obeys a power-law with constants a = 0.35 and b = 1.3 (see text). (Middle panel) Other attributes that also follow power-law behavior: the average expression level according to our merged and scaled set [L(ref) with a = 0.3 and b = 1.2], the number of protein–protein interactions [I(pdball,none) with a = 0.52 and b = 1.6], and the number of functions [X(func) with a = 0.76 and b = 2.5]. (Lower panel) Some attributes that do not follow power-law behavior: the Asp composition of the fold [B(Ala,pdb100)] and the number of mobile residues during a motion [M(nresidue,auto)]. The fold occurrence in E.coli is plotted as a reference.

One of the strengths of the uniform numerical system of ranks in PartsList is that it puts everything into a common framework so that one can see similarities across disparate attributes. We believe it would be difficult to see a common power-law behavior for many aspects of protein structure without PartsList.

TRADITIONAL SINGLE-STRUCTURE REPORTS

In addition to the tools that compare and relate the extrinsic properties of protein folds, we provide traditional reports that are more focused on an individual structure.

Occurrence report. This allows users to see the number of times that a fold corresponding to the queried protein structure occurs in various genomes. This gives a phylogenetic profile of the occurrence of a particular fold in 20 genomes, similar in spirit to the fold patterns discussed earlier (19).

Function report. This summarizes the functional classification of the queried PDB structure. It merges a number of functional classifications, including FlyBase (10), ENZYME (63), GenProtEC (64) and MIPS (9). Our approach to functional classification is described in a number of previous publications (e.g. 39,60). In short, we used pairwise comparison to cross-reference the PDB domains against SWISS-PROT. Depending on whether they had an Enzyme Commission (EC) number, we were able to divide all entries into enzymes and non-enzymes, a division that represents the highest level in our classification. (For the enzyme category, we only transferred EC numbers to those SCOP domains with a one-to-one match to a SWISS-PROT enzyme.) In the absence of an EC-type classification for non-enzymes, we assigned functions to non-enzymatic SCOP domains according to Ashburner’s original classification of Drosophila protein functions. This classification is derived from a controlled vocabulary of fly terms, is available on the web and is loosely connected with the FlyBase database (10). It has recently been superseded by the GO functional classification (65). MIPS and GenProtEC classifications were assigned to SCOP domains based on sequence comparisons to classified yeast and E.coli ORFs, respectively. The SCOP domain most closely matching each ORF classified in MIPS or GenProtEC was assigned the corresponding MIPS or GenProtEC function number. Only matches of ≥80% sequence identity were considered.
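
The final assignment step of the function report (best-matching SCOP domain per classified ORF, subject to the 80% identity cutoff) can be sketched as follows. This is an illustrative Python outline only, not the PartsList implementation: the function name, the tuple layout, the tie-breaking rule (first match seen wins) and all identifiers and function codes in the example are assumptions made for the sketch.

```python
def assign_functions(matches, min_identity=80.0):
    """Transfer function codes from classified ORFs to SCOP domains.

    matches: iterable of (orf_id, domain_id, percent_identity, function_code).
    For each ORF, only the domain with the highest identity is kept, and
    only if that identity is at least min_identity. Ties keep the first
    match encountered (an assumption of this sketch).
    Returns {domain_id: set of function codes}.
    """
    best = {}
    for orf, domain, identity, func in matches:
        if identity < min_identity:
            continue  # below the cutoff: never considered
        if orf not in best or identity > best[orf][1]:
            best[orf] = (domain, identity, func)

    assigned = {}
    for domain, _identity, func in best.values():
        assigned.setdefault(domain, set()).add(func)
    return assigned


# Hypothetical ORFs, domains and function codes, for illustration only.
example_matches = [
    ("YAL001C", "d1abc__", 92.0, "01.05"),
    ("YAL001C", "d2xyz__", 85.0, "01.05"),  # passes cutoff, but not the best match
    ("YBR002W", "d2xyz__", 70.0, "06.07"),  # below the 80% cutoff
    ("YCR003W", "d1abc__", 80.0, "30.01"),  # exactly at the cutoff, accepted
]
assigned = assign_functions(example_matches)
```

In this toy input only d1abc__ receives function codes, because the sole match above the cutoff for d2xyz__ is outranked by a closer domain for the same ORF.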

Alignment report. This gives detailed information on structural alignments available between pairs of protein domains associated with a fold. A pair viewer is provided, which gives many key statistics about the alignment (e.g. RMS, sequence identity, number of fit atoms, etc.), in addition to a listing of the actual aligned residues. Both HTML and parseable text views are available.

Interaction report. This shows all the pairs of protein–protein interactions associated with a fold based on either the PDB survey or yeast genome data.

Rank report. This highlights the top-five and bottom-five ranked attributes associated with a fold. It also shows all attributes ordered by the rank they are given in that fold. Thus, for a particular fold, it highlights the attributes with respect to which the fold most stands out, i.e. the ‘outlier attributes’ that make each fold most distinctive. The rank report could be used, for example, by a protein engineer interested in determining the unique properties of a structure he or she is working on.
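
The idea behind the rank report can be illustrated with a short Python sketch. It assumes a simple in-memory table of attribute values per fold and ranks folds in descending order of value, with ties broken by input order; neither the data layout nor the tie rule is taken from PartsList. The attribute names follow the nomenclature used in the Figure 5 legend, while the fold names and numbers are hypothetical.

```python
def rank_report(table, fold, n=5):
    """Return the n best-ranked and n worst-ranked attributes for one fold.

    table: {attribute: {fold_name: value}}.
    For each attribute, folds are ranked by value, largest first (rank 1).
    Returns (top, bottom), each a list of (attribute, rank) pairs taken from
    the list of attributes sorted by the fold's rank.
    """
    ranks = {}
    for attribute, values in table.items():
        if fold not in values:
            continue  # no value recorded for this fold under this attribute
        ordered = sorted(values, key=values.get, reverse=True)
        ranks[attribute] = ordered.index(fold) + 1

    by_rank = sorted(ranks.items(), key=lambda item: item[1])
    return by_rank[:n], by_rank[-n:]


# Hypothetical values for three folds under three attributes.
table = {
    "G(ecol)": {"TIM": 30, "Rossmann": 25, "Ig": 1},
    "X(func)": {"TIM": 16, "Rossmann": 9, "Ig": 12},
    "I(pdball,none)": {"TIM": 2, "Rossmann": 5, "Ig": 9},
}
top, bottom = rank_report(table, "Ig", n=1)
```

For the hypothetical fold "Ig", the attribute on which it ranks first is reported in top and the one on which it ranks last in bottom, which is the sense in which the report exposes outlier attributes.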

PDB report. This summarizes all the information concerning a domain or a representative PDB structure. It includes: (i) a summary of the occurrence report; (ii) a summary of the alignments available for structures in the same superfamily and fold; (iii) a description of motions and motion-movies associated with the structure in the Macromolecular Motions database (56,57); (iv) a summary of the merged functional classification; (v) a core structure, if available (66); (vi) ranking tables of the queried structure in various datasets; and (vii) a summary of the interactions report. Figure 4 shows a sample PDB report for structure 1AMA.

Figure 4. A sample PDB report for structure 1AMA. The report summarizes the relevant information for this domain, including genome occurrences, alignment, motions, function classification, core structure and rankings. By clicking on the headers, one can get the detailed reports for these quantities.

Fold report. This lists all the SCOP domains associated with the queried fold and provides information (similar to that in the PDB report) that is common to all, i.e. genome occurrence, alignment report and rankings.

DISCUSSION

We developed a web-based system for dynamically ranking protein folds based on disparate attributes, including fold occurrence in various genomes, expression level, alignment statistics, protein–protein interactions, motion statistics and transposon sensitivity. Three ranking tools are provided, Comparer, Profiler and Correlator, which can help users to place one fold in the context of all the others. The uniform system of ranks employed by PartsList provides a good framework for comparing different experiments and gaining a broad perspective on the complexity of genomes.

We anticipate that PartsList will have a relatively stable number of entries (i.e. folds), while for each entry the attributes that describe it will increase over time. In the future, as experiments yield new information, PartsList will include more and more attributes. In particular, we anticipate that much new expression information will be incorporated. We also plan to develop a form to allow automatic submission of new ranking attributes and to encourage people to submit any ranking information.

ACKNOWLEDGEMENTS

We thank the NIH Structural Genomics Program and the Keck Foundation for support.

FOOTNOTES

^* To whom correspondence should be addressed. Tel: +1 203 4326105; Fax: +1 360 838 7861; Email: mark.gerstein@yale.edu

REFERENCES

1 Chothia,C. (1992) Proteins. One thousand families for the molecular biologist. Nature, 357, 543–544.[Medline]

2 Brenner,S.E., Hubbard,T., Murzin,A. and Chothia,C. (1995) Gene duplications in H. influenzae. Nature, 378, 140.[Medline]

3 Wolf,Y.I., Grishin,N.V. and Koonin,E.V. (2000) Estimating the number of protein folds and families from complete genome data. J. Mol. Biol., 299, 897–905.[Medline]

4 The C. elegans Sequencing Consortium (1998) Genome sequence of the nematode C. elegans: a platform for investigating biology. Science, 282, 2012–2018.[Abstract/Full Text]

5 Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.[Abstract/Full Text]

6 Laskowski,R.A., Hutchinson,E.G., Michie,A.D., Wallace,A.C., Jones,M.L. and Thornton,J.M. (1997) PDBsum: a web-based database of summaries and analyses of all PDB structures. Trends Biochem. Sci., 22, 488–490.[Medline]

7 Wang,Y., Addess,K.J., Geer,L., Madej,T., Marchler-Bauer,A., Zimmerman,D. and Bryant,S.H. (2000) MMDB: 3D structure data in Entrez. Nucleic Acids Res., 28, 243–245.[Abstract/Full Text]

8 Ball,C.A., Dolinski,K., Dwight,S.S., Harris,M.A., Issel-Tarver,L., Kasarskis,A., Scafe,C.R., Sherlock,G., Binkley,G., Jin,H., Kaloper,M., Orr,S.D., Schroeder,M., Weng,S., Zhu,Y., Botstein,D. and Cherry,J.M. (2000) Integrating functional genomic information into the Saccharomyces genome database. Nucleic Acids Res., 28, 77–80.[Abstract/Full Text]

9 Frishman,D., Heumann,K., Lesk,A. and Mewes,H.W. (1998) Comprehensive, comprehensible, distributed and intelligent databases: current status. Bioinformatics, 14, 551–561.[Abstract]

10 The FlyBase Consortium. (1999) The FlyBase database of the Drosophila Genome Projects and community literature. Nucleic Acids Res., 27, 85–88.[Abstract/Full Text]

11 Tatusov,R.L., Galperin,M.Y., Natale,D.A. and Koonin,E.V. (2000) The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res., 28, 33–36.[Abstract/Full Text]

12 Aach,J., Rindone,W. and Church,G.M. (2000) Systematic management and analysis of yeast gene expression data. Genome Res., 10, 431–445.[Abstract/Full Text]

13 Bader,G.D. and Hogue,C.W. (2000) BIND—a data specification for storing and describing biomolecular interactions, molecular complexes and pathways. Bioinformatics, 16, 465–477.[Abstract]

14 Xenarios,I., Rice,D.W., Salwinski,L., Baron,M.K., Marcotte,E.M. and Eisenberg,D. (2000) DIP: the database of interacting proteins. Nucleic Acids Res., 28, 289–291.[Abstract/Full Text]

15 Benson,D.A., Karsch-Mizrachi,I., Lipman,D.J., Ostell,J., Rapp,B.A. and Wheeler,D.L. (2000) GenBank. Nucleic Acids Res., 28, 15–18.[Abstract/Full Text]

16 Konopka,A.K. and Martindale,C. (1995) Noncoding DNA, Zipf’s law, and language. Science, 268, 789.[Medline]

17 Flam,F. (1994) Hints of a language in junk DNA. Science, 266, 1320.[Medline]

18 Bornberg-Bauer,E. (1997) How are model protein structures distributed in sequence space? Biophys. J., 73, 2393–2403.[Abstract]

19 Gerstein,M. (1998) Patterns of protein-fold usage in eight microbial genomes: a comprehensive structural census. Proteins, 33, 518–534.[Medline]

20 Gerstein,M. (1997) A structural census of genomes: comparing eukaryotic, bacterial and archaeal genomes in terms of protein structure. J. Mol. Biol., 274, 562–576.[Medline]

21 Jeong,H., Tombor,B., Albert,R., Oltvai,Z.N. and Barabasi,A.L. (2000) The large-scale organization of metabolic networks. Nature, 407, 651–654.[Medline]

22 Amaral,L.A.N., Scala,A., Barthelemy,M. and Stanley,H.E. (2000) Classes of small-world networks. Proc. Natl Acad. Sci. USA, 97, 11149–11152.[Abstract/Full Text]

23 Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 247, 536–540.[Medline]

24 Orengo,C.A., Michie,A.D., Jones,S., Jones,D.T., Swindells,M.B. and Thornton,J.M. (1997) CATH—a hierarchic classification of protein domain structures. Structures, 5, 1093–1108.

25 Holm,L. and Sander,C. (1996) Mapping the protein universe. Science, 273, 595–602.[Abstract/Full Text]

26 Gibrat,J.F., Madej,T. and Bryant,S.H. (1996) Surprising similarities in structure comparison. Curr. Opin. Struct. Biol., 6, 337–385.

27 Madej,T., Gibrat,J.-F. and Bryant,S.H. (1995) Threading a database of protein cores. Proteins, 23, 356–369.[Medline]

28 Bateman,A., Birney,E., Durbin,R., Eddy,S.R., Finn,R.D. and Sonnhammer,E.L.L. (1999) The Pfam protein families database. Nucleic Acids Res., 27, 260–262.[Abstract/Full Text]

29 Henikoff,J.G., Greene,E.A., Pietrokovski,S. and Henikoff,S. (2000) Increased coverage of protein families with the blocks database servers. Nucleic Acids Res., 28, 228–230.[Abstract/Full Text]

30 Schultz,J., Milpetz,F., Bork,P. and Ponting,C.P. (1998) SMART, a simple modular architecture research tool: identification of signaling domains. Proc. Natl Acad. Sci. USA, 95, 5857–5864.[Abstract/Full Text]

31 Brenner,S.E., Koehl,P. and Levitt,M. (2000) The ASTRAL compendium for protein structure and sequence analysis. Nucleic Acids Res., 28, 254–256.[Abstract/Full Text]

32 Lipman,D.J. and Pearson,W.R. (1985) Rapid and sensitive protein similarity searches. Science, 227, 1435–1441.[Medline]

33 Altschul,S.F. and Koonin,E.V. (1998) Iterated profile searches with PSI-BLAST—a tool for discovery in protein databases. Trends Biochem. Sci., 23, 444–447.[Medline]

34 Brenner,S., Chothia,C. and Hubbard,T. (1998) Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. Proc. Natl Acad. Sci. USA, 95, 6073–6078.[Abstract/Full Text]

35 Gerstein,M. and Levitt,M. (1997) A structural census of the current population of protein sequences. Proc. Natl Acad. Sci. USA, 94, 11911–11916.[Abstract/Full Text]

36 Teichmann,S., Chothia,C. and Gerstein,M. (1999) Advances in structural genomics. Curr. Opin. Struct. Biol., 9, 390–399.[Medline]

37 Gerstein,M., Lin,J. and Hegyi,H. (2000) Protein folds in the worm genome. Pac. Symp. Biocomput., 5, 30–42.

38 Lin,J. and Gerstein,M. (2000) Whole-genome trees based on the occurrence of folds and orthologs: implications for comparing genomes on different levels. Genome Res., 10, 808–818.[Abstract/Full Text]

39 Wilson,C.A., Kreychman,J. and Gerstein,M. (2000) Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores. J. Mol. Biol., 297, 233–249.[Medline]

40 Levitt,M. and Gerstein,M. (1998) A unified statistical framework for sequence comparison and structure comparison. Proc. Natl Acad. Sci. USA, 95, 5913–5920.[Abstract/Full Text]

41 Gerstein,M. and Levitt,M. (1998) Comprehensive assessment of automatic structural alignment against a manual standard, the Scop classification of proteins. Protein Sci., 7, 445–456.[Abstract]

42 Gerstein,M. (1998) How representative are the known structures of the proteins in a complete genome? A comprehensive structural census. Fold. Des., 3, 497–512.[Medline]

43 Velculescu,V.E., Zhang,L., Zhou,W., Vogelstein,J., Basrai,M.A., Bassett,D.E.,Jr, Hieter,P., Vogelstein,B. and Kinzler,K.W. (1997) Characterization of the yeast transcriptome. Cell, 88, 243–251.[Medline]

44 Brown,P.O. and Botstein,D. (1999) Exploring the new world of the genome with DNA microarrays. Nat. Genet., 21, 33–37.[Medline]

45 Lipshutz,R.J., Fodor,S.P., Gingeras,T.R. and Lockhart,D.J. (1999) High density synthetic oligonucleotide arrays. Nat. Genet., 21, 20–24.[Medline]

46 Jansen,R. and Gerstein,M. (2000) Analysis of the yeast transcriptome with structural and functional categories: characterizing highly expressed proteins. Nucleic Acids Res., 28, 1481–1488.[Abstract/Full Text]

47 Gerstein,M. and Jansen,R. (2000) The current excitement in bioinformatics-analysis of whole-genome expression data: how does it relate to protein structure and function? Curr. Opin. Struct. Biol., 10, 574–584.[Medline]

48 Jelinsky,S.A. and Samson,L.D. (1999) Global response of Saccharomyces cerevisiae to an alkylating agent. Proc. Natl Acad. Sci. USA, 96, 1486–1491.[Abstract/Full Text]

49 Holstege,F.C., Jennings,E.G., Wyrick,J.J., Lee,T.I., Hengartner,C.J., Green,M.R., Golub,T.R., Lander,E.S. and Young,R.A. (1998) Dissecting the regulatory circuitry of a eukaryotic genome. Cell, 95, 717–728.[Medline]

50 Roth,F.P., Hughes,J.D., Estep,P.W. and Church,G.M. (1998) Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nat. Biotechnol., 16, 939–945.

51 Park,J., Lappe,M. and Teichmann,S.A. (2001) Mapping protein family interactions: intra- and intermolecular interactions repertoires are distinct. J. Mol. Biol., 307, 929–939.[Medline]

52 Teichmann,S., Chothia,C., Church,G. and Park,J. (2000) Fast assignment of protein structures to sequences using the intermediate sequence library PDB-ISL. Bioinformatics, 16, 117–124.[Abstract]

53 Teichmann,S.A., Park,J. and Chothia,C. (1998) Structural assignments to the Mycoplasma genitalium proteins show extensive gene duplications and domain rearrangements. Proc. Natl Acad. Sci. USA, 95, 14658–14663.[Abstract/Full Text]

54 Uetz,P., Giot,L., Cagney,G., Mansfield,T.A., Judson,R.S., Knight,J.R., Lockshon,D., Narayan,V., Srinivasan,M., Pochart,P., Qureshi-Emili,A., Li,Y., Godwin,B., Conover,D., Kalbfleisch,T., Vijayadamodar,G., Yang,M., Johnston,M., Fields,S. and Rothberg,J.M. (2000) A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae. Nature, 403, 623–627.[Medline]

55 Ito,T., Tashiro,K., Muta,S., Ozawa,R., Chiba,T., Nishizawa,M., Yamamoto,K., Kuhara,S. and Sakaki,Y. (2000) Toward a protein–protein interaction map of the budding yeast: a comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins. Proc. Natl Acad. Sci. USA, 97, 1143–1147.[Abstract/Full Text]

56 Gerstein,M. and Krebs,W. (1998) A database of macromolecular motions. Nucleic Acids Res., 26, 4280–4290.[Medline]

57 Krebs,W. and Gerstein,M. (2000) The morph server: a standardized system for analyzing and visualizing macromolecular motions in a database framework. Nucleic Acids Res., 28, 1665–1675.[Abstract/Full Text]

58 Ross-Macdonald,P., Coelho,P.S., Roemer,T., Agarwal,S., Kumar,A., Jansen,R., Cheung,K., Sheehan,A., Symoniatis,D., Umansky,L., Heidtman,M., Nelson,F.K., Iwasaki,H., Hager,K., Gerstein,M., Miller,P., Roeder,G.S. and Snyder,M. (1999) Large-scale analysis of the yeast genome by transposon tagging and gene disruption. Nature, 402, 413–418.[Medline]

59 Harrison,P., Echols,N. and Gerstein,M. (2001) Digging for dead genes: an analysis of the characteristics of the pseudogene population in the C. elegans genome. Nucleic Acids Res., 29, 818–830.[Abstract/Full Text]

60 Hegyi,H. and Gerstein,M. (1999) The relationship between protein structure and function: a comprehensive survey with application to the yeast genome. J. Mol. Biol., 228, 147–164.

61 Schwikowski,B., Uetz,P. and Fields,S. (2000) A network of protein–protein interactions in yeast. Nat. Biotechnol., 18, 1257–1261.[Medline]

62 Knuth,D. (1973) The Art of Computer Programming 3. Addison-Wesley, Reading, MA.

63 Bairoch,A. (1993) The ENZYME data bank. Nucleic Acids Res., 21, 3155–3156.[Medline]

64 Riley,M. and Labedan,B. (1996) E. coli gene products: physiological functions and common ancestries. In Neidhardt,F., Curtiss,R.,III, Lin,E.C.C., Ingraham,J., Low,K.B., Magasanik,B., Reznikoff,W., Riley,M., Schaechter,M. and Umbarger,H.E. (eds), Escherichia coli and Salmonella: Cellular and Molecular Biology. ASM Press, Washington, DC, pp. 2118–2202.

65 Ashburner,M., Ball,C.A., Blake,J.A., Botstein,D., Butler,H., Cherry,J.M., Davis,A.P., Dolinski,K., Dwight,S.S., Eppig,J.T., Harris,M.A., Hill,D.P., Issel-Tarver,L., Kasarskis,A., Lewis,S., Matese,J.C., Richardson,J.E., Ringwald,M., Rubin,G.M. and Sherlock,G. (2000) Gene ontology: tool for the unification of biology. Nat. Genet., 25, 25–29.[Medline]

66 Schmidt,R.B., Gerstein,M. and Altman,R.B. (1997) LPFC: an internet library of protein family core structures. Protein Sci., 6, 246–248.[Abstract]

67 Drawid,A., Jansen,R. and Gerstein,M. (2000) Genome-wide analysis relating expression level with protein subcellular localization. Trends Genet., 16, 426–429.[Medline]

68 Park,J., Karplus,K., Barrett,C., Hughey,R., Haussler,D., Hubbard,T. and Chothia,C. (1998) Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. J. Mol. Biol., 284, 1201–1210.[Medline]

69 Spellman,P.T., Sherlock,G., Zhang,M.Q., Iyer,V.R., Anders,K., Eisen,M.B., Brown,P.O., Botstein,D. and Futcher,B. (1998) Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell, 9, 3273–3297.[Abstract/Full Text]

70 DeRisi,J.L., Iyer,V.R. and Brown,P.O. (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale. Science, 278, 680–686.[Abstract/Full Text]

71 Chu,S., DeRisi,J., Eisen,M., Mulholland,J., Botstein,D., Brown,P.O. and Herskowitz,I. (1998) The transcriptional program of sporulation in budding yeast. Science, 282, 699–705.[Abstract/Full Text]

72 Richmond,C.S., Glasner,J.D., Mau,R., Jin,H. and Blattner,F.R. (1999) Genome-wide expression profiling in Escherichia coli K-12. Nucleic Acids Res., 27, 3821–3835.[Abstract/Full Text]

73 Wixon,J., Blaxter,M., Hope,I., Barstead,R. and Kim,S. (2000) Caenorhabditis elegans. Yeast, 17, 37–42.[Medline]
