Greenbaum et al 
To measure the statistical significance of the amino acid enrichment results, we performed 
a control analysis on a randomized dataset (Figure 3D). We randomly permuted the expression 
values of the ORFs 1000 times and recomputed the enrichments each time. This yielded null 
distributions for the amino acid enrichments and, by integrating these, one-sided p-values 
indicating the significance of the observed enrichments. 
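
The permutation control described above can be illustrated with a minimal sketch. It assumes that the enrichment of a residue is the relative difference between its expression-weighted fraction and its unweighted (genome) fraction, and it estimates the one-sided p-value as the fraction of permutations that reach the observed enrichment, with an add-one correction. That counting estimate stands in for integrating the null distribution. The function names and the toy data are hypothetical and are not taken from the original analysis.

```python
import random


def weighted_fraction(counts, totals, weights):
    """Fraction of a residue across ORFs, each ORF weighted by its expression."""
    numerator = sum(w * c for w, c in zip(weights, counts))
    denominator = sum(w * t for w, t in zip(weights, totals))
    return numerator / denominator


def enrichment(counts, totals, weights):
    """Relative change of the weighted fraction versus the genome fraction."""
    genome_fraction = sum(counts) / sum(totals)
    return (weighted_fraction(counts, totals, weights) - genome_fraction) / genome_fraction


def permutation_p_value(counts, totals, weights, n_perm=1000, seed=0):
    """One-sided p-value for an observed enrichment (assumed estimator).

    Expression values are shuffled across ORFs n_perm times; the p-value is
    the share of shuffles whose enrichment is at least the observed one.
    """
    rng = random.Random(seed)
    observed = enrichment(counts, totals, weights)
    shuffled = list(weights)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if enrichment(counts, totals, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: eight ORFs of 20 residues each; the three most
# highly expressed ORFs are rich in the residue of interest.
toy_counts = [10, 9, 8, 1, 1, 1, 1, 1]
toy_totals = [20] * 8
toy_expression = [100, 90, 80, 1, 1, 1, 1, 1]

observed_enrichment, p_value = permutation_p_value(toy_counts, toy_totals, toy_expression)
```

With uniform weights the enrichment is zero by construction, which is a convenient sanity check before running the permutations.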
Biomass Enrichment 
A corollary to amino acid enrichments is the determination of the average biomass of the 
transcriptome and translatome populations (Figure 3C). We found that the average 
molecular weight of a protein in both populations was lower than in the genome 
population. These preliminary observations suggest a cellular preference for less energetically 
expensive proteins among those that are highly transcribed or translated. However, we also found that 
the average molecular weight per amino acid differed much less between the transcriptome and 
translatome on the one hand and the genome on the other (though it was still slightly lower). 
This finding indicates that the lower molecular weights in the translatome and transcriptome 
populations relative to the genome population are predominantly due to greater expression of 
shorter proteins rather than to the incorporation of smaller amino acids. 
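
The reasoning above separates two contributions to the mean molecular weight: protein length and mass per residue. The sketch below illustrates that decomposition on hypothetical numbers. The molecular weights, lengths, and expression levels are invented for illustration, and the function names are assumptions, not part of the original methodology.

```python
def weighted_mean(values, weights):
    """Mean of values, each weighted by the corresponding weight."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def biomass_summary(mol_weights, lengths, weights):
    """Return (mean molecular weight, mean length, mean weight per residue).

    The per-residue figure is total weighted mass divided by total weighted
    residue count, which equals mean_mw / mean_len.
    """
    mean_mw = weighted_mean(mol_weights, weights)
    mean_len = weighted_mean(lengths, weights)
    return mean_mw, mean_len, mean_mw / mean_len


# Hypothetical proteins: lengths in residues, molecular weights in daltons.
toy_lengths = [100, 200, 400, 800]
toy_mol_weights = [10900, 22000, 44400, 88800]

# Genome population: every ORF counts once.
genome_mw, genome_len, genome_per_res = biomass_summary(
    toy_mol_weights, toy_lengths, [1, 1, 1, 1]
)

# Expression-weighted population: short proteins are expressed more highly.
expr_mw, expr_len, expr_per_res = biomass_summary(
    toy_mol_weights, toy_lengths, [50, 20, 5, 1]
)

mw_ratio = expr_mw / genome_mw                      # large drop
per_residue_ratio = expr_per_res / genome_per_res   # close to 1
```

In this toy case the mean molecular weight falls by more than half while the mass per residue changes by less than one percent, the same qualitative pattern the text attributes to higher expression of shorter proteins.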
 
Secondary Structure Composition 
We also used our methodology to study the enrichment of secondary-structural features. Secondary-
structure annotation was derived from structure prediction applied uniformly to all the ORFs in the 
yeast genome, as described in Table 1. As shown in Figure 4A, all three populations (genome, 
transcriptome, and translatome) had a fairly similar composition of secondary structures -- sheets, 
helices, and coils. The differences between populations were marginal and based only on a small 
subset of genes. They do, though, point to a possible trend of depletion of random coils relative to 
alpha helices and beta sheets in the transcriptome and translatome. 
 
We also found that transmembrane (TM) proteins were significantly depleted in the transcriptome (see 
website). To identify TM proteins, we used the GES hydrophobicity scale as 
described previously (Gerstein 1998; see caption to Table 1). These results are consistent with our 
previous analyses (Jansen & Gerstein 2000). This analysis could not be extended to the 
translatome because the 181 genes in the protein abundance data set (GProt) do not include any 
membrane proteins, which are difficult to detect by gel electrophoresis (Molloy 2000). 
 
Subcellular Localization 
A generalization of the transmembrane protein analysis is subcellular localization. We examined 
the enrichment of proteins associated with the various subcellular compartments (Figure 4C). 
For clarity, we divided the cell into five distinct subcellular compartments, as described 
in Table 1. We found that, in comparison to the genome, both the transcriptome and translatome are 
enriched in cytoplasmic proteins. This is true whether we make our comparisons in relation to the 
relatively large reference mRNA expression set or the smaller reference protein abundance set. As 
Figure 4C shows, the 2D gel experiments are clearly biased towards proteins from the cytoplasm. 
However, within the biased subset GProt, transcription and translation lead to a still higher fraction of 
cytoplasmic proteins in the translatome.
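
As a rough illustration of the localization comparison, the sketch below computes the composition of a categorical annotation (here, compartment labels) in an unweighted genome population and in an expression-weighted population, then reports the relative enrichment of each category. The compartment labels and expression values are hypothetical; the five compartments actually used are those defined in Table 1, and the relative-difference definition of enrichment is an assumption.

```python
from collections import defaultdict


def composition(labels, weights):
    """Fraction of total weight falling into each category."""
    totals = defaultdict(float)
    for label, weight in zip(labels, weights):
        totals[label] += weight
    grand_total = sum(totals.values())
    return {label: total / grand_total for label, total in totals.items()}


def relative_enrichment(labels, weights):
    """Relative change of each category's weighted fraction versus the genome."""
    genome = composition(labels, [1.0] * len(labels))
    weighted = composition(labels, weights)
    return {
        label: (weighted[label] - genome[label]) / genome[label]
        for label in genome
    }


# Hypothetical ORFs with invented compartment labels and expression levels.
toy_compartments = ["cytoplasm", "cytoplasm", "nucleus", "membrane", "mitochondria"]
toy_levels = [40, 30, 5, 2, 3]

toy_enrichment = relative_enrichment(toy_compartments, toy_levels)
```

A positive value means the compartment makes up a larger share of the weighted population than of the genome, and a negative value means a smaller share. The same calculation can be restricted to a subset such as GProt by passing only the ORFs in that subset, so that the reference composition is that of the subset and not of the whole genome.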