Genes & Development -- Kumar et al. 16 (6): 707

RESEARCH PAPER
Subcellular localization of the yeast proteome

Anuj Kumar,¹ Seema Agarwal,¹ John A. Heyman,³,⁵ Sandra Matson,¹ Matthew Heidtman,¹ Stacy Piccirillo,¹ Lara Umansky,¹ Amar Drawid,² Ronald Jansen,² Yang Liu,² Kei-Hoi Cheung,⁴ Perry Miller,⁴ Mark Gerstein,² G. Shirleen Roeder,¹ and Michael Snyder¹,²,⁶

¹ Department of Molecular, Cellular, and Developmental Biology, ² Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA; ³ Invitrogen Corporation, Carlsbad, California 92008, USA; ⁴ Center for Medical Informatics, Department of Anesthesiology, Yale University School of Medicine, New Haven, Connecticut 06510, USA

	ABSTRACT

Protein localization data are a valuable information resource helpful in elucidating eukaryotic protein function. Here, we report the first proteome-scale analysis of protein localization within any eukaryote. Using directed topoisomerase I-mediated cloning strategies and genome-wide transposon mutagenesis, we have epitope-tagged 60% of the Saccharomyces cerevisiae proteome. By high-throughput immunolocalization of tagged gene products, we have determined the subcellular localization of 2744 yeast proteins. Extrapolating these data through a computational algorithm employing Bayesian formalism, we define the yeast localizome (the subcellular distribution of all 6100 yeast proteins). We estimate the yeast proteome to encompass ~5100 soluble proteins and >1000 transmembrane proteins. Our results indicate that 47% of yeast proteins are cytoplasmic, 13% mitochondrial, 13% exocytic (including proteins of the endoplasmic reticulum and secretory vesicles), and 27% nuclear/nucleolar. A subset of nuclear proteins was further analyzed by immunolocalization using surface-spread preparations of meiotic chromosomes. Of these proteins, 38% were found associated with chromosomal DNA. As determined from phenotypic analyses of nuclear proteins, 34% are essential for spore viability, a percentage nearly twice as great as that observed for the proteome as a whole. In total, this study presents experimentally derived localization data for 955 proteins of previously unknown function: nearly half of all functionally uncharacterized proteins in yeast. To facilitate access to these data, we provide a searchable database featuring 2900 fluorescent micrographs at http://ygac.med.yale.edu.

[Key Words: Protein localization; S. cerevisiae; proteomics; machine learning; epitope-tagging]

	Introduction

A global understanding of the molecular mechanisms underpinning cell biology necessitates an understanding not only of an organism's genome but also of the protein complement encoded within this genome (the proteome). In the past, data regarding an organism's proteome have typically been accumulated piecemeal through studies of a single protein or cell pathway. Genomic methodologies have altered this paradigm: a variety of approaches are now in place by which proteins may be directly analyzed on a proteome-wide scale. Chromatography-coupled mass spectrometry (Gygi et al. 1999; Washburn et al. 2001), large-scale two-hybrid screens (Uetz et al. 2000; Ito et al. 2001; Tong et al. 2002), immunoprecipitation/mass spectrometric analysis of protein complexes (Gavin et al. 2002; Ho et al. 2002), and protein microarray technologies (MacBeath and Schreiber 2000; Zhu et al. 2000, 2001) are yielding unprecedented quantities of protein data. Recent genomic techniques combining microarray technologies with either chromatin immunoprecipitation (Ren et al. 2000; Iyer et al. 2001) or targeted DNA methylation (van Steensel et al. 2001) have been used to globally map binding sites of chromosomal proteins in vivo. Initiatives are even underway to automate and industrialize processes by which protein structures may be solved, potentially providing a library of structural data from which homologous proteins may be modeled (Burley 2000; Montelione 2001).

Although these approaches promise a wealth of information, many fundamental proteomic data sets remain uncataloged. Notably, the subcellular distribution of proteins within any single eukaryotic proteome has never been extensively examined, despite the usefulness and importance of these data. Protein localization is assumed to be a strong indicator of gene function. Localization data are also useful as a means of evaluating protein information inferred from genetic data (e.g., supporting or refuting putative protein interactions suggested from two-hybrid analysis; Ito et al. 2001). Furthermore, the subcellular localization of a protein can often reveal its mechanism of action.

To determine the subcellular localization of a protein, its corresponding gene is typically either fused to a reporter or tagged with an epitope. Reporters and epitope tags are fused routinely to either the N or C termini of target genes, a choice that can be critical in obtaining accurate localization data. Organelle-specific targeting signals (e.g., mitochondrial targeting peptides and nuclear localization signals) are often located at the N terminus (Silver 1991); N-terminal reporter fusions may disrupt these sequences, resulting in anomalous protein localizations. In other cases, C-terminal sequences may be important for proper function and regulation, as recently shown from analysis of the yeast γ-tubulin-like protein Tub4p (Vogel et al. 2001). Gene copy number can also have an impact on the accuracy with which a protein is localized; overexpressed protein products may saturate intracellular transport mechanisms, potentially producing an aberrant subcellular protein distribution. In other cases, weakly expressed single-copy genes may not yield sufficient protein to be visualized, particularly by fluorescence microscopy. The effects of copy number and reporter/tag orientation on protein localization, however, have never been studied in a large dataset.

To date, few studies have characterized protein localization on a large scale, primarily because few high-throughput methods exist by which reporter fusions or epitope-tagged proteins can be generated and subsequently localized. Typically, systematic approaches have been used to construct a limited number of chimeric reporter fusions applicable to pilot localization studies. For example, >100 human cDNAs have been cloned as N- and C-terminal gene fusions to spectral variants of green fluorescent protein (GFP) as a means of examining the subcellular localization of these proteins in living cells (Simpson et al. 2000). Thus far, the majority of localization studies have been undertaken in yeast, owing primarily to the fidelity of homologous recombination in Saccharomyces cerevisiae and the concomitant ease with which integrated reporter gene fusions can be generated. As part of a pilot study in S. cerevisiae, Niedenthal et al. (1996) constructed GFP reporter fusions to three unknown open reading frames (ORFs) from yeast Chromosome XIV and subsequently localized these chimeric GFP-fusion proteins by fluorescence microscopy.

In addition to directed cloning methods, strains suitable for localization analysis may be generated through random approaches. Recently, a plasmid-based GFP-fusion library of Schizosaccharomyces pombe DNA was constructed by fusing random fragments of genomic DNA upstream of GFP-coding sequence. Fission yeast cells transformed with this library were subsequently screened for GFP fluorescence, and 250 independent gene products were localized (Ding et al. 2000). In S. cerevisiae, transposon-based methods have been used to generate random lacZ gene fusions (Burns et al. 1994) and epitope-tagged alleles (Ross-Macdonald et al. 1999) for subsequent immunolocalization. Although these transposon-based studies have resulted in the localization of ~300 yeast proteins, the majority of the S. cerevisiae proteome has remained uncharacterized with regard to its subcellular distribution.

To address this deficiency, we have undertaken the largest analysis to date of protein localization in yeast. Employing high-throughput methods of epitope-tagging and immunofluorescence analysis, our study defines the subcellular localization of 2744 proteins. By integrating these localization data with those previously published, we identify the subcellular localization of >3300 yeast proteins, 55% of the proteome. Building on these data, we have applied a Bayesian system to estimate the intracellular distribution of all 6100 yeast proteins and have further characterized a subset of nuclear proteins both by immunolocalization on surface-spread chromosomal preparations and by phenotypic analysis. In total, our findings provide a wealth of insight into protein function, while formally corroborating an expected link between protein function and localization. Furthermore, this study provides experimentally derived localization data for nearly 1000 proteins of previously unknown function, thereby providing, at minimum, a starting point for informed analysis of this previously uncharacterized segment of the proteome.

	Results

Genome-wide epitope-tagging and large-scale immunolocalization

Yeast proteins immunolocalized in this study were epitope-tagged using two approaches: directed cloning of PCR-amplified ORFs into a yeast tagging/expression vector, and random tagging by transposon mutagenesis. By the former approach, 2085 annotated S. cerevisiae ORFs were cloned into the yeast high-copy expression vector pYES2/GS through topoisomerase I-mediated ligation (Fig. 1A). PCR-amplified yeast ORFs were inserted immediately upstream of sequence encoding the V5 epitope (from the P and V proteins of paramyxovirus SV5; Heyman et al. 1999) and downstream of the galactose-inducible GAL1 promoter, such that galactose induction in yeast could be used to drive expression of each gene as a fusion protein carrying the V5 epitope at its C terminus. For purposes of this study, sequence-verified plasmids bearing yeast genes were transformed into an appropriate strain of S. cerevisiae in a 96-well format (see Materials and Methods). Cloned genes were expressed in yeast by galactose induction; the induction period was kept as brief as possible to minimize potential artifacts associated with gene overexpression. Protein products were subsequently localized by indirect immunofluorescence using monoclonal antibodies directed against the V5 epitope. To accommodate higher throughput, yeast cells were prepared for immunofluorescence analysis in a 96-well format as described (Kumar et al. 2000b).

Figure 1. Genome-wide epitope-tagging strategies. (A) Yeast ORFs were amplified by PCR and cloned by topoisomerase I-mediated ligation into the yeast expression vector pYES2/GS. The pYES2/GS vector carries the yeast 2µ origin of replication for maintenance of high copy number. Yeast genes were inserted into pYES2/GS such that they are under transcriptional control of the GAL1 promoter and fused at their 3' ends to sequence encoding the V5-epitope and polyhistidine tag (HIS)₆. By galactose induction in yeast, cloned genes were overexpressed as V5-tagged proteins for subsequent immunolocalization with α-V5 antibodies (in 96-well formats). (B) Modified bacterial transposons were used to randomly tag yeast genes at their native genomic loci with sequence encoding three copies of the viral haemagglutinin epitope (3×HA epitope). The transposon carries a promoterless and 5'-truncated lacZ reporter enabling selection of in-frame insertions by β-galactosidase assay. In-frame insertions were subsequently modified in yeast by Cre-lox recombination, such that the majority of the transposon sequence was excised. The remaining HA-epitope insertion element (HAT tag) encodes no stop codons in the specified reading frame. The indicated 279-bp HAT-tag insertion includes a 5-bp duplication in target site sequence associated with Tn3 transposition. HAT-tagged proteins were immunolocalized with monoclonal α-HA antibodies in a 96-well format.

Yeast genes were also epitope-tagged by means of insertional mutagenesis using a series of bacterial transposons, each modified to carry sequence encoding a reporter gene, bacterial and yeast selectable markers, a pair of internal lox sites, and three copies of the HA epitope (Fig. 1B; Ross-Macdonald et al. 1997). By shuttle mutagenesis (Seifert et al. 1986), transposon-mutagenized fragments of genomic DNA were introduced into a diploid strain of yeast; insertion alleles integrated at their corresponding genomic loci by homologous recombination. Insertions in-frame with gene-coding sequences were selected and subsequently modified in vivo by Cre-lox recombination such that all transposon-encoded reporters, stop codons, and selectable markers were excised. The remaining transposon insertion element encodes 93 amino acids, primarily consisting of the triplicate HA epitope. Proteins carrying this transposon-encoded HA tag (HAT tag) were localized by indirect immunofluorescence with α-HA monoclonal antibodies (see Materials and Methods). By this approach, 11,417 HAT-tagged strains were generated, encompassing 2958 different proteins suitable for subsequent immunolocalization.

In total, we have examined by indirect immunofluorescence >13,000 strains harboring epitope-tagged alleles of 3565 different genes (~60% of the yeast proteome). Of 2085 genes cloned into V5-tagging/expression vectors, 2022 gene products showed a staining pattern above background upon immunofluorescence analysis. Of 2958 HAT-tagged proteins similarly examined, 1083 proteins yielded staining patterns appreciably distinct from background. As these two data sets partially overlap, we define here the subcellular localization of 2744 different proteins. The subcellular compartmentalization of these proteins is indicated in Table 1. Example staining patterns resulting from indirect immunofluorescence analysis of HAT-tagged proteins and V5-tagged proteins are presented in Figure 2.
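The size of the overlap between the two tagged sets is not stated, but it can be recovered from the reported counts by inclusion-exclusion. The following is a minimal sketch, assuming that 3565 and 2744 are the unions of the V5 and HAT sets at the tagging and localization stages, respectively; the function and variable names are illustrative and do not come from the study.

```python
def overlap(size_a: int, size_b: int, size_union: int) -> int:
    """Return |A ∩ B| given |A|, |B| and |A ∪ B| (inclusion-exclusion)."""
    shared = size_a + size_b - size_union
    if shared < 0 or shared > min(size_a, size_b):
        raise ValueError("counts are inconsistent")
    return shared


# Genes examined: 2085 V5-tagged, 2958 HAT-tagged, 3565 distinct genes overall.
genes_tagged_by_both = overlap(2085, 2958, 3565)

# Proteins localized: 2022 via V5, 1083 via HAT, 2744 distinct proteins overall.
# Assumption: 2744 is the union of the two localized sets.
proteins_localized_by_both = overlap(2022, 1083, 2744)

print(genes_tagged_by_both, proteins_localized_by_both)
```

Under these assumptions, the proteins localized by both methods are the ones for which high-copy V5 and native-locus HAT staining patterns can be compared directly.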

Table 1. Summary of localized proteins

Figure 2. Immunolocalization of epitope-tagged proteins. (A-E) Vegetative cells containing HAT-tagged proteins were stained with the DNA-binding dye 4′,6-diamidino-2-phenylindole (DAPI; left image) and monoclonal antibody against HA (center). Per row, the DAPI-stained and α-HA-stained images are shown merged in the rightmost panel. Typical nucleolar staining patterns can be seen in strains containing HAT-tagged alleles of the rRNA-binding proteins Net1p (A) and Sik1p (E). Staining of the cell neck is evident in cells containing HAT-tagged Hsl1p (B). HAT-tagging of the vacuolar ATPase Vma6p is shown in row C. Staining of the cell periphery can be seen upon HAT-tagging of the cell surface glycoprotein Gas1p (D). (F-J) Vegetative cells carrying V5-tagged proteins were stained with monoclonal antibody directed against the V5 epitope (center). Corresponding DAPI-stained images and merged images are shown to the left and right, respectively. Nucleolar staining is apparent in cells carrying V5-tagged Nop13p (F). Note, however, that V5-tagging and mild overexpression of SIK1 (J) results in a nuclear staining pattern, as opposed to the nucleolar pattern evident upon HAT-tagging of this same gene (E). Mitochondrial staining (G) can be seen in cells carrying a tagged allele of YMR293C; overlap between DAPI- and α-V5 staining is shown in the merged image. V5-tagged Gpi12p localizes to the endoplasmic reticulum (H), visible as an area of strong staining around the nuclear rim. A patchy pattern of cytoplasmic staining can be seen in cells carrying V5-tagged Bzz1p (I). Bar, 2 µm.

Subcellular localization of 2744 yeast proteins

Tagged proteins were localized in yeast to a wide variety of organelles and intracellular structures including the nucleus, mitochondria, endoplasmic reticulum, plasma membrane, vacuole, cytoplasm, and cell neck (Fig. 2). The largest share (48%) of proteins tested in this study were found localized throughout the cytoplasm, typically showing a finely punctate pattern of staining. In addition, 68 proteins (2.5% of those tested) localized predominantly in clusters within the cytoplasm, visualized as intense areas of staining or patches occasionally overlaid on a background of general cytoplasmic staining. This patchy staining was often evident in strains carrying tagged alleles of known cytoskeletal or cytoskeleton-associated proteins. For example, immunofluorescence analysis of tagged Hsp42p revealed a patchy pattern of cytoplasmic staining; Hsp42p is a small heat shock protein functioning in reorganization of the actin cytoskeleton following thermal stress (Gu et al. 1997). In total, 18 known cytoskeletal proteins showed this staining pattern upon immunolocalization. Patches of cytoplasmic staining were also observed in cells carrying tagged proteins identified previously as components of the Golgi apparatus or other membrane-bound vesicles of the yeast secretory pathway. Van1p, a mannosyltransferase residing in the early Golgi compartment (Cho et al. 2000), showed this patchy staining pattern upon HAT-tagging and subsequent immunolocalization.

Approximately 1200 of all 2744 localized proteins were compartmentalized to discrete subcellular organelles such as the nucleus, mitochondria, or endoplasmic reticulum. Of these proteins, a significant fraction (25.2%) showed a mixed compartmentalization, localizing predominantly to a single organelle but also showing appreciable cytoplasmic staining upon immunofluorescence analysis. For example, 82 proteins were localized to the endoplasmic reticulum and cytoplasm, including the vesicular transport protein Sec17p. Previous studies have suggested a role for Sec17p in vesicle-mediated endoplasmic reticulum-to-Golgi transport (Waters et al. 1991); the observed cytoplasmic/endoplasmic reticulum staining pattern resulting from immunolocalization of tagged Sec17p is typical of secretory vesicle proteins (see Discussion).

Interestingly, an even greater number of proteins (207) colocalized to the cytoplasm and nucleus. The majority of these proteins (for which functions had been assigned previously) are involved either in processes of transcription or cytoskeletal organization. In our study, many transcription factors were localized, at least in part, to the cytoplasm. For example, we found the transcriptional activator Pho4p localized predominantly to the cytoplasm, and only slightly in the nucleus, under conditions of vegetative growth on standard media. This finding agrees with published work in which Pho4p was found concentrated in the nucleus only under conditions of phosphate starvation (O'Neill et al. 1996). In some cases, however, cytoplasmic staining may be an artifact resulting from a disrupted nuclear localization signal or saturated nuclear transporters.

To estimate the frequency with which such artifacts are present within our data, we have compared all localizations from this study with previously published localization data extracted from the Yeast Protein Database (YPD; Costanzo et al. 2001), the SwissProt Database (Bairoch and Apweiler 2000), and the Munich Information Center for Protein Sequences Comprehensive Yeast Genome Database (MIPS CYGD; Mewes et al. 2000). Comparison of 694 protein localizations indicated >85% agreement with data from existing literature. In particular, our findings are in agreement with previously published results in 93% of cases in which we localize a protein to the mitochondria (134 comparisons total) and 90% of cases in which we localize a protein to the nucleus (230 comparisons total). We do recognize biases in our method, as certain classes of proteins (e.g., spindle pole proteins) are underrepresented in our results. A more detailed analysis of the accuracy and efficiency of our methods is provided in the Discussion.

Mapping protein sorting signals by transposon-tagging

As transposition occurs nearly at random, our genome-wide methods of transposon mutagenesis often generate multiple insertions within a single gene (see Discussion). The availability of these multiple insertion alleles can be advantageous, providing a means by which intragenic sequences important for proper localization and function may be mapped. For example, from immunolocalization of several HAT-tagged variants of the yeast peroxisomal membrane protein Pex22p, we have identified a putative peroxisomal membrane-targeting signal at the N terminus of this protein: HAT-tag insertion at residue 10 of Pex22p disrupts peroxisomal localization, whereas an insertion 55 residues C-terminal of this site does not. Interestingly, a functional homolog of Pex22p in Pichia pastoris contains a known 25-amino-acid membrane-targeting signal at its extreme N terminus (Koller et al. 1999).

Subcellular compartmentalization of the yeast proteome using an integrated Bayesian system

By integrating our results with those publicly available in YPD, SwissProt, and MIPS CYGD, we can definitively assign subcellular localizations to 3343 yeast proteins. In complement, we have employed a hydrophobicity-based predictive algorithm (Krogh et al. 2001) to identify all yeast proteins possessing two or more transmembrane domains, an approach estimated to identify integral membrane proteins with 99% accuracy (Krogh et al. 2001). In total, 1029 integral membrane proteins were identified in the yeast proteome; 387 of these predicted membrane proteins were already assigned a subcellular compartment from our immunolocalization data and/or previously published data. We have estimated the relative subcellular distribution of the remaining 642 previously unstudied membrane proteins by extrapolating from the relative compartmentalization of membrane proteins observed in our experimentally derived localization data set. We, therefore, define a molecular environment (transmembrane or soluble) and subcellular localization for 3985 yeast proteins.

To intelligently predict the subcellular distribution of the remaining 2147 soluble yeast proteins for which no localization data are available, we have used a Bayesian system (Drawid and Gerstein 2000) that extrapolates from our findings and also integrates additional types of data potentially correlative to protein localization (see Materials and Methods). For purposes of this analysis, we have used all available data describing experimentally determined protein localizations in yeast to calculate a default localization probability. This initial probability is sequentially updated for each previously uncharacterized protein using Bayes' rule and a diverse set of 30 features (including motif analysis, surface composition, isoelectric point, and mRNA expression, among others), thereby generating a final localization probability for each protein. Localization probabilities were subsequently summed, providing an estimate as to the overall population of each subcellular compartment. The estimated compartment populations were added to those experimentally determined to arrive at the total subcellular compartmentalization of the yeast proteome (Fig. 3).
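The sequential update and the final summation can be illustrated with a small sketch. It is not the system of Drawid and Gerstein (2000): it assumes the features are conditionally independent given the compartment (a naive Bayes simplification), and every name and number in it is hypothetical.

```python
from typing import Dict, List

Probs = Dict[str, float]  # compartment name -> probability


def bayes_update(prior: Probs, likelihood: Probs) -> Probs:
    """Apply Bayes' rule once: posterior is proportional to prior * P(feature | compartment)."""
    unnormalized = {c: prior[c] * likelihood[c] for c in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("feature has zero likelihood in every compartment")
    return {c: p / total for c, p in unnormalized.items()}


def localize(prior: Probs, feature_likelihoods: List[Probs]) -> Probs:
    """Update the default (prior) probability sequentially, one feature at a time."""
    posterior = dict(prior)
    for likelihood in feature_likelihoods:
        posterior = bayes_update(posterior, likelihood)
    return posterior


def compartment_populations(posteriors: List[Probs]) -> Probs:
    """Sum per-protein probabilities to estimate how many proteins occupy each compartment."""
    totals: Probs = {}
    for posterior in posteriors:
        for compartment, p in posterior.items():
            totals[compartment] = totals.get(compartment, 0.0) + p
    return totals


# Hypothetical numbers for illustration only; none are taken from the study.
prior = {"cytoplasm": 0.5, "nucleus": 0.25, "mitochondria": 0.125, "exocytic": 0.125}
protein_a = localize(prior, [
    {"cytoplasm": 0.2, "nucleus": 0.8, "mitochondria": 0.1, "exocytic": 0.1},  # e.g. a motif feature
    {"cytoplasm": 0.4, "nucleus": 0.6, "mitochondria": 0.3, "exocytic": 0.3},  # e.g. isoelectric point
])
protein_b = localize(prior, [
    {"cytoplasm": 0.7, "nucleus": 0.1, "mitochondria": 0.1, "exocytic": 0.1},
])
populations = compartment_populations([protein_a, protein_b])
```

Summing probabilities rather than counting each protein's most likely compartment keeps uncertain predictions from being rounded into a single compartment, so the estimated populations always add up to the number of proteins scored.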

Figure 3. Subcellular compartmentalization of the yeast proteome. (A) Cellular compartments are as follows: cytoplasmic (Cyt.), nuclear (Nuc.), mitochondrial (Mit.), and exocytic (Exo.). The membrane fraction of each compartment is indicated in stripes. The percentage of the yeast proteome contained within the respective membrane and soluble fractions of each compartment is indicated outside the chart; the total percentage of the proteome contained within each of the four main compartments is indicated inside the chart. Plasma membrane proteins are included in the cytoplasmic compartment for purposes of this analysis. (B) The corresponding protein population of each cellular compartment and membrane/soluble subfraction is indicated.

By this approach, we estimate 47% of all yeast proteins to be cytoplasmic and an additional 27% to be nuclear. Approximately equal fractions of the yeast proteome (13%) compartmentalize to the mitochondria and exocytic network. As expected, we find the majority of yeast integral membrane proteins localized to the endoplasmic reticulum or other secretory vesicles. In the categorization scheme employed here, plasma membrane proteins have been incorporated into the cytoplasmic compartment; therefore, the membrane fraction of cytoplasmic proteins is higher than otherwise would be expected. In total, the yeast proteome consists of 1029 transmembrane proteins and 5103 soluble proteins. Comprehensive results from this Bayesian analysis may be accessed at http://genecensus.org/localize.
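The counts reported in this section can be cross-checked with simple bookkeeping. The sketch below uses only figures given in the text; because the compartment percentages are rounded, the per-compartment counts it derives are approximations and not values reported by the study.

```python
TRANSMEMBRANE = 1029
SOLUBLE = 5103
proteome = TRANSMEMBRANE + SOLUBLE  # total proteins considered

# 3343 proteins with known localization, plus predicted membrane proteins
# that had no prior compartment assignment (1029 predicted, 387 already assigned).
unstudied_membrane = TRANSMEMBRANE - 387
assigned = 3343 + unstudied_membrane

# Soluble proteins left for the Bayesian prediction.
remaining_soluble = proteome - assigned

# Rounded compartment shares (percent) from the text.
shares = {"cytoplasmic": 47, "nuclear": 27, "mitochondrial": 13, "exocytic": 13}

# Approximate counts implied by the rounded percentages (an estimate only).
estimated = {name: round(proteome * pct / 100) for name, pct in shares.items()}

print(proteome, assigned, remaining_soluble, estimated)
```

The totals agree with the 642, 3985, and 2147 proteins quoted above, and the nuclear share works out to roughly 1650 proteins.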

Chromosomal association and phenotypic analysis of nuclear-localized proteins

By our analysis (Fig. 3), the yeast proteome may encompass in excess of 1600 nuclear proteins. The presence of this surprisingly large nuclear complement raises an interesting question: what fraction of these proteins associates with chromosomes? Furthermore, how many of these nuclear proteins are essential for cell viability? To address these questions, we have analyzed the chromosomal localization and disruption phenotypes associated with a subset of yeast nuclear proteins identified in this study. Transposon-tagged strains were chosen for this analysis, as a single transposon insertion can be used to generate both a gene disruption and an epitope-tagged allele (Ross-Macdonald et al. 1997), facilitating phenotypic study and immunofluorescence analysis, respectively. To assess the ability of these proteins to associate with chromosomal DNA, 56 HAT-tagged nuclear proteins were immunolocalized on surface-spread preparations of meiotic chromosomes isolated from late-zygotene-to-pachytene nuclei. A sampling of observed staining patterns is presented in Figure 4; complete results are indicated in Figure 5. In addition, corresponding alleles of each gene carrying full-length transposon insertions were assayed for their effect on spore viability (see Materials and Methods). Results from this phenotypic analysis are also presented in Figure 5.

Figure 4. Immunolocalization of nuclear proteins on surface-spread meiotic chromosomes. Meiotic chromosomes were surface spread and stained with the DNA-binding dye DAPI (left) and monoclonal anti-HA antibodies (center). Corresponding merged images are shown to the right. A general pattern of chromosomal binding can be seen from immunofluorescence analysis of cells containing HAT-tagged alleles of RFC3 (A-C), IOC2 (D-F), and PDR1 (G-I). Nine proteins localized predominantly to the nucleolus; typical nucleolar staining patterns are shown here in cells containing HAT-tagged alleles of YGR090W (J-L), MPP10 (M-O), and YHR196W (P-R). Specific binding to telomeric DNA can be seen upon HAT-tagging and immunolocalization of the origin recognition complex subunit Orc4p (S-U). Bar, 1 µm.

Figure 5. Chromosomal localization and phenotypic analysis of nuclear proteins. Chromosomal localization indicates a general pattern of chromosomal binding, typically with >40 staining foci per nucleus. Strains disrupted for each gene were assayed for spore viability or growth defects; observed disruption mutants are categorized as viable, inviable, or slow-growth, accordingly.

In total, 21 nuclear proteins of 56 tested (38%) were found localized to meiotic chromosomes. Specifically, 16 proteins (including six of previously unknown function) showed staining patterns indicative of general chromosomal binding, typically with 40 or more chromosomal foci per nucleus. Two proteins, Orc4p (Fig. 4S-U) and Rap1p, bound telomeric DNA. Orc4p is a component of the origin recognition complex (ORC) and is involved in transcriptional silencing at telomeres (Bell et al. 1995); Rap1p is a transcription factor also involved in telomeric silencing as well as telomere maintenance (Ray and Runge 1998). Two additional proteins, the DNA replication factor component Rfc3p (Fig. 4A-C; Li and Burgers 1994) and the chromatin remodeling protein Rsc6p (Cairns et al. 1996), bound telomeric sequence while also recognizing more than 20 other chromosomal sites each. As expected, the centromere-binding factor Cbf1p (Baker and Masison 1990) bound centromeric sequence, visualized as a single staining spot in the center of each chromosome. Nine gene products (16% of those tested) localized predominantly, if not exclusively, to the nucleolus, including two previously uncharacterized proteins encoded by YGR090W (Fig. 4J-L) and YHR196W (Fig. 4P-R). The majority of chromosomal and nucleolar proteins, such as these, likely bind DNA, either chromosomal DNA or nucleolar rDNA; however, a significant fraction may only associate with chromosomes through interactions with other chromosomal proteins (e.g., histone modification proteins).

Phenotypic analysis of the 56 nuclear proteins tested here revealed 19 genes (34%) indispensable for spore viability (Fig. 5), a fraction approximately twice as great as that found for the genome as a whole (Winzeler et al. 1999). Interestingly, 6 of these 19 essential genes encode nucleolar proteins, including the aforementioned YGR090W and YHR196W gene products. An additional 13 genes produced a slow-growth phenotype upon disruption. Five of these genes encode chromosomal-associated proteins; three encode nucleolar proteins. In total, all nine nucleolar proteins conferred observable phenotypes (spore inviability or slow-growth) upon disruption; 52% of chromosomal proteins conferred these same phenotypes, underscoring the fundamental importance of the nucleus/nucleolus and its protein complement.

Protein localization correlates strongly with protein function

The studies presented here provide a unique opportunity to examine more rigorously the assumption that protein function can be inferred from protein localization, an assumption best tested by correlating proteome-wide data sets of protein localization with corresponding data sets of protein function. Accordingly, we have tallied all molecular functions (extracted from MIPS CYGD) associated with the 2744 yeast proteins immunolocalized in this study. The most frequently observed functions associated with each of the eight most populous compartments of the yeast proteome are indicated in Figure 6.

Figure 6. Prevalent functions associated with cellular compartments in yeast. Functional categorizations (compiled from published literature) were extracted from the MIPS CYG database for all proteins experimentally localized in this study. In total, functions were available for 1789 proteins; the number of functionally categorized proteins localized to each of the indicated cellular compartments is shown. Mixed localizations are also represented: 165 functionally characterized proteins were colocalized to the cytoplasm and nucleus; similarly, 61 such proteins were colocalized to the cytoplasm and endoplasmic reticulum (ER). Functions were tallied for all proteins within a given cellular compartment. The most frequently occurring functions per compartment are shown boxed. Multiple functions may be associated with a single protein. Therefore, the listed percentage following each function refers to the fraction of proteins within each compartment associated with that particular cellular process, and the sum total of these percentages within a given compartment will not equal 100%.

Within each organelle or compartment, a plurality of proteins participate, at least partially, in maintaining structural integrity. Secondary functions also correlate well with major organelle-specific processes: for example, 34% of all nuclear-localized proteins are involved in the process of transcription, and 26% of all mitochondrial proteins function directly in cellular respiration. Furthermore, specific functions can be correlated with subtly distinct localization patterns. For example, 17% of the proteins that colocalized to the nucleus and cytoplasm are cytoskeletal, whereas cytoskeletal functions are not as strongly associated with proteins that localize only to the nucleus or only to the cytoplasm. Similarly, cytoskeletal proteins and Golgi proteins constitute the bulk of those proteins showing patchy patterns of cytoplasmic staining; however, identical functions are not significantly represented among proteins yielding fine, granular, or punctate cytoplasmic staining patterns. This strong correlation between function and localization suggests that broad functional categories can now be ascribed to the 955 proteins of previously unknown function localized in this study.
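The per-compartment percentages quoted here are multi-label tallies: a protein annotated with several functions counts toward each of them, so the percentages within a compartment need not sum to 100%. A minimal sketch of such a tally follows; the records are invented for illustration and are not data from the study.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def function_frequencies(
    records: Iterable[Tuple[str, List[str]]]
) -> Dict[str, Dict[str, float]]:
    """For each compartment, return the percentage of its proteins annotated with each function."""
    n_proteins: Counter = Counter()
    tallies: Dict[str, Counter] = defaultdict(Counter)
    for compartment, functions in records:
        n_proteins[compartment] += 1
        for function in set(functions):  # count each function once per protein
            tallies[compartment][function] += 1
    return {
        c: {f: 100.0 * k / n_proteins[c] for f, k in tallies[c].items()}
        for c in n_proteins
    }


# Hypothetical (compartment, functions) records, one per localized protein.
records = [
    ("nucleus", ["transcription", "cell cycle"]),
    ("nucleus", ["transcription"]),
    ("nucleus", ["cell cycle"]),
    ("nucleus", ["transcription", "cell cycle"]),
    ("mitochondria", ["respiration"]),
    ("mitochondria", ["respiration", "transport"]),
]
frequencies = function_frequencies(records)
print(frequencies["nucleus"])
```

In this toy input, three of the four nuclear proteins carry each function, so the two nuclear percentages together exceed 100%.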

An on-line database and visual library of protein localization in yeast

To catalog the data presented here, we have developed an on-line database of yeast protein localization accessible from our lab homepage at http://ygac.med.yale.edu (Protein Localization in Yeast link). For this site, we have developed a new user interface specifically accommodating our V5-tagged data set; new HAT-tagged data may now be accessed from our TRIPLES web site (Kumar et al. 2000a). In both cases, we supply search options by which users can access data for any gene of interest. Alternatively, complete data sets for all proteins localizing to a given site may be downloaded as tab-delimited text. Tabular data sets are supplemented with fluorescent micrographs of staining patterns observed upon immunofluorescence analysis of each indicated protein. In total, this new site houses >2893 micrographs, establishing it as the largest visual library of eukaryotic protein localization to date.

	Discussion

Constituting the first proteome-wide analysis of protein localization, this study is uniquely suited to address a number of issues regarding both the methods by which such a project may be undertaken as well as the utility and applications of the end data. Here, we have used two common approaches by which epitope-tagged alleles may be generated on a genomic scale: directed cloning methods and random transposon-based approaches. By comparing the localization data generated from each respective set of tagged alleles, we can rigorously assess the efficiency and accuracy of each approach. In particular, our results may be used to consider the accuracy with which overexpressed proteins can be localized as compared with the localization accuracy associated with endogenously expressed proteins. The resulting localization data sets correlate strongly with protein function, providing further means by which proteins may be ascribed functions on a proteome-wide scale. This analysis also offers specific insight into the relative distribution of functions and phenotypes associated with nuclear proteins, while providing data regarding nearly 1000 proteins of unknown function.

Two genomic epitope-tagging approaches: respective efficiencies

Directed cloning and random transposon-tagging each possess advantages and disadvantages as approaches for genome-wide epitope-tagging. For large-scale directed cloning, a significant investment in labor and reagents is initially required; however, the final collection possesses little or no redundancy in gene representation. Furthermore, for purposes of immunolocalization, directed approaches are efficient: in this study, 93% of genes cloned into a tagging/expression vector subsequently yielded staining patterns above background upon immunofluorescence analysis. Transposon-based methods, in contrast, are economical but inefficient. Only 30% of transposon-tagged proteins showed a staining pattern distinct from background. In addition, owing both to the stochastic nature of transposition and to the insertional biases associated with bacterial transposons (e.g., Tn3), transposon-based methods can prove problematic as a means of saturating a given genome. Small genes are less likely to be mutagenized by transposon mutagenesis than are large genes. Also, insertional collections possess greater redundancy in gene representation than do collections generated by directed methods: the collection of HAT-tagged genes generated in this study shows approximately fourfold redundancy in gene representation on average (11,417 HAT-tagged alleles representing 2958 different genes). As shown, however, this redundancy can be beneficial in mapping domains within a given protein.
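The approximately fourfold redundancy quoted above follows directly from the two counts given in the text. A minimal Python check is sketched below; the variable names are illustrative and not taken from the study.

```python
# Counts quoted in the text for the HAT-tagged (transposon-derived) collection.
hat_tagged_alleles = 11_417
distinct_genes = 2_958

# Average number of independently tagged alleles recovered per gene.
redundancy = hat_tagged_alleles / distinct_genes
print(f"mean HAT-tagged alleles per gene: {redundancy:.2f}")
```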

Respective accuracy of each approach

To estimate the accuracy of data generated by each tagging strategy, we have compared all protein localizations determined experimentally in this study with previously published localization data. Both approaches (i.e., mild overexpression of C-terminal, V5-tagged alleles vs. endogenously expressed random HAT-tagged alleles) yielded data sets of similar accuracy (~85%) when compared with published localization results. This internal comparison, however, is complicated by the fact that different proteins are represented in the V5- and HAT-tagged data sets, respectively. To estimate the relative accuracy of each approach more rigorously, we have limited the comparison to only those 361 proteins common to both data sets. Of these proteins, 295 yielded identical or very similar staining patterns upon immunofluorescence analysis regardless of the tagging approach used; the remaining 66 proteins yielded differing results. For 29 of these proteins, no previously published localization data are available. Of the remaining 37 proteins analyzed, localization data derived from V5-tagged proteins agreed more closely with published results in 20 cases; in 17 cases, HAT-tagged data proved more accurate (e.g., Sik1p shown in Fig. 2E,J). Therefore, both approaches may be used to generate epitope-tagged alleles for subsequent immunolocalization with comparable degrees of accuracy, suggesting, furthermore, that the effects of tag size, placement, and expression may be less severe than generally thought.
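The bookkeeping behind this comparison can be laid out explicitly. The sketch below tallies the counts quoted above and, as an illustration only, applies an exact two-sided sign test to the 20 vs. 17 split; the sign test is our own addition and was not part of the analysis reported here.

```python
from math import comb

# Counts quoted in the text.
common = 361                 # proteins present in both the V5 and HAT data sets
concordant = 295             # identical or very similar staining patterns
discordant = common - concordant
no_published_data = 29       # discordant proteins lacking published localization
scorable = discordant - no_published_data
v5_closer, hat_closer = 20, 17
assert v5_closer + hat_closer == scorable


def two_sided_sign_test(k, n):
    """Exact two-sided binomial test of k successes in n trials against p = 0.5."""
    tail = min(k, n - k)
    one_sided = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * one_sided)


agreement = concordant / common
p_value = two_sided_sign_test(v5_closer, scorable)
print(f"discordant proteins: {discordant}, scorable against literature: {scorable}")
print(f"agreement between tagging approaches: {agreement:.1%}")
print(f"sign-test p-value for {v5_closer} vs. {hat_closer}: {p_value:.2f}")
```

A large p-value here would be consistent with the statement that neither approach is detectably more accurate than the other on the shared proteins.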

The yeast proteome: subcellular distribution and functional implications

Using the tagging/immunofluorescence strategies discussed above, we have determined the subcellular localization of 2744 different yeast proteins (Table 1). As expected, large sets of proteins were localized to the cytoplasm and nucleus (~50% and ~25% of the yeast proteome, respectively); however, a surprisingly large number of proteins showed a mixed staining pattern, localizing to more than one subcellular compartment. In total, mixed localization patterns were evident in 11% of all samples tested. Transcription factors and cytoskeletal proteins were frequently found distributed within both the nucleus and cytoplasm as discussed. Also, vesicular proteins (e.g., Sec17p) often colocalized to the endoplasmic reticulum and cytoplasm. Secretory polypeptides that have been either epitope-tagged or overproduced may be processed less efficiently for export, and therefore accumulate in the endoplasmic reticulum: randomly placed tags may disrupt signal peptide sequence, and overexpressed proteins may saturate mechanisms responsible for membrane protein traffic. Therefore, colocalization of secretory vesicle proteins to the endoplasmic reticulum is expected.

The fact that a given protein may be distributed among more than one cellular compartment is a relevant consideration in developing a computational system by which our data may be extrapolated over the yeast proteome. For this purpose, we have used a Bayesian system by which the relative protein population of each yeast cellular compartment may be estimated without requiring a definitive localization for every constituent protein. The accuracy of this method is dependent largely on the availability of a large and unbiased localization data set from which a default series of localization probabilities can be calculated. In previous applications of this approach (Drawid and Gerstein 2000), an initial data set was constructed from all known yeast protein localizations cataloged in public databases (1342 localizations in total). This data set, however, is biased toward nuclear proteins, as they have been traditionally studied in greater detail than other protein classes. Our study provides a less biased data set: the transposon-based approaches used here are near-random, and the population of yeast genes successfully cloned into pYES2 may reflect only a marginal enrichment for short ORFs that tend to be easily amplified by PCR. By merging our experimentally determined localizations with published yeast localization data and predicted transmembrane classifications (Krogh et al. 2001), we define a subcellular localization for 3985 yeast proteins. Applying our probabilistic system to the remaining yeast protein complement, we arrive at the proteome-wide protein compartmentalization indicated in Figure 3. This distribution agrees well with previous theoretical estimates of protein localization in yeast (Drawid and Gerstein 2000).

Because protein localization and function are tightly correlated (Fig. 5), our global localization analysis provides a means by which gene function in yeast may be inferred on a genome-wide scale. Extrapolating from the functional categorizations maintained in the MIPS CYGD and our localization data, ~45% of all yeast proteins function at least partly in maintaining cytoplasmic and organelle-specific organization and integrity. From our localization analysis, we estimate that the yeast proteome contains nearly 800 mitochondrial proteins, the majority of which function, as expected, in processes of cellular respiration (Fig. 6).

Similar predictions can be made regarding the yeast nuclear protein complement. In this study, we have identified 457 nuclear-localized proteins for which functional data are currently available (including 165 proteins that colocalize to the nucleus and cytoplasm). Of these nuclear proteins, 34.8% function in transcription. Extrapolating this fraction to the total nuclear protein compartment (1683 proteins; Fig. 3), we estimate that nearly 10% of the yeast proteome is dedicated to processes of mRNA transcription. Consistent with this prediction, we have found that ~38% of all nuclear proteins (or 10% of the yeast proteome) are associated with chromosomal DNA as determined by immunofluorescence analysis of tagged proteins on surface-spread meiotic chromosomes (Fig. 5). Although caution must be exercised in extrapolating from a limited population of 56 nuclear proteins, our phenotypic studies suggest that roughly 34% of all nuclear proteins (>570 proteins in total) are essential for spore viability. In contrast, over the genome as a whole, 18% of yeast genes (~1100) are thought to be essential as indicated from systematic analyses of yeast deletion mutants (Winzeler et al. 1999). Therefore, slightly more than half of all essential genes in yeast are likely to be nuclear.
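The extrapolations in this paragraph are simple products of the quoted fractions and compartment sizes. The following Python sketch reproduces the arithmetic; it assumes a proteome of 6100 proteins (the total given in the abstract), and the variable names are ours.

```python
proteome = 6100   # total yeast proteins, as stated in the abstract
nuclear = 1683    # estimated size of the nuclear compartment (Fig. 3)

# Fractions quoted in the text, applied to the nuclear compartment.
transcription = 0.348 * nuclear        # nuclear proteins functioning in transcription
chromosome_bound = 0.38 * nuclear      # associated with chromosomal DNA
essential_nuclear = 0.34 * nuclear     # essential for spore viability

# Genome-wide estimate of essential genes (Winzeler et al. 1999).
essential_all = 0.18 * proteome

print(f"transcription: {transcription:.0f} proteins, "
      f"{transcription / proteome:.1%} of the proteome")
print(f"chromosome-associated: {chromosome_bound:.0f} proteins, "
      f"{chromosome_bound / proteome:.1%} of the proteome")
print(f"essential nuclear proteins: {essential_nuclear:.0f}")
print(f"essential genes overall: {essential_all:.0f}")
print(f"nuclear share of essential genes: {essential_nuclear / essential_all:.0%}")
```

Note that the final ratio treats the 34% spore-viability figure and the 18% deletion-mutant figure as directly comparable, which is the same assumption made in the text.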

Integrating localizome data

Large-scale localization data sets provide a fundamental complement to other existing varieties of proteomic data. For example, our localization data may be used to screen sets of putative protein-protein interactions, enriching for genuine protein associations by virtue of the expectation that two interacting proteins will share a common cellular compartment and show similar localization patterns. At present, large catalogs of protein interactions in yeast have been generated through genome-wide applications of the two-hybrid method (Uetz et al. 2000; Ito et al. 2001) and systematic, high-throughput approaches using mass spectrometric analysis of immunoprecipitated protein complexes (Gavin et al. 2002; Ho et al. 2002). We have correlated our localization results with a sampling of interaction data drawn from each of these studies. Of 155 randomly selected two-hybrid interactions identified either by Uetz et al. (2000) or Ito et al. (2001), only 73 (47%) contain a protein pair localized to the same cellular compartment. In contrast, however, within a set of 105 two-hybrid interactions independently identified by both groups, 87 protein pairs (83%) show a shared localization pattern. Analysis of data generated by Gavin et al. (2002) and Ho et al. (2002) yields similar results. Of 100 sampled protein associations (encompassing 10 different bait proteins), 67 interactions consist of two proteins from the same cellular compartment. Of these 100 protein associations, 23 were identified by both groups: all 23 of these protein pairs show compatible localization patterns. These correlations suggest that confidence can be placed preferentially in protein interactions independently identified within more than one study, while simultaneously demonstrating the usefulness of localization data in identifying protein-protein interactions that are likely to be spurious.
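The screening idea described above, retaining interactions whose partners share a compartment, can be sketched in a few lines of Python. The protein names, localizations, and interaction pairs below are hypothetical placeholders, not data from this study; only the four reported tallies at the end are taken from the text.

```python
# HYPOTHETICAL localization annotations: protein -> set of compartments.
localization = {
    "ProtA": {"nucleus"},
    "ProtB": {"nucleus", "cytoplasm"},   # mixed localization
    "ProtC": {"mitochondria"},
    "ProtD": {"cytoplasm"},
}

# HYPOTHETICAL putative interactions (e.g., from a two-hybrid screen).
interactions = [
    ("ProtA", "ProtB"),
    ("ProtA", "ProtC"),
    ("ProtB", "ProtD"),
    ("ProtC", "ProtE"),   # ProtE has no localization data
]


def shares_compartment(a, b, loc):
    """Return True/False if both proteins are localized, None if either is not."""
    if a not in loc or b not in loc:
        return None
    return bool(loc[a] & loc[b])


verdicts = {pair: shares_compartment(*pair, localization) for pair in interactions}
supported = [pair for pair, v in verdicts.items() if v is True]
suspect = [pair for pair, v in verdicts.items() if v is False]
unscored = [pair for pair, v in verdicts.items() if v is None]

print("supported:", supported)
print("suspect:  ", suspect)
print("unscored: ", unscored)

# Tallies reported in the text: (pairs sharing a compartment, pairs sampled).
reported = {
    "two-hybrid, either screen": (73, 155),
    "two-hybrid, both screens": (87, 105),
    "mass spectrometry, either study": (67, 100),
    "mass spectrometry, both studies": (23, 23),
}
for label, (same, total) in reported.items():
    print(f"{label}: {same}/{total} = {same / total:.0%}")
```

Keeping unlocalized pairs in a separate "unscored" group, rather than discarding them, avoids treating missing localization data as evidence against an interaction.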

As illustrated by these comparisons, results from independent studies may be effectively integrated to provide more accurate and complete genomic findings. The accuracy of our own localization results may be improved through comparison with a set of known and established protein-protein interactions, a corollary of the analysis above. A more comprehensive representation of yeast protein function may be achieved by integrating multiple proteomic data sets, because all such individual data sets are presently incomplete (i.e., encompass <6000 yeast proteins). Collectively, this union of diverse proteomic and genomic approaches will prove mutually complementary and necessary as a means of understanding global processes of eukaryotic cellular function.

	Materials and methods

Epitope-tagging and immunolocalization

Yeast genes were amplified by polymerase chain reaction (PCR) and cloned into the yeast expression vector pYES2/GS by topoisomerase I-mediated ligation as described previously (Heyman et al. 1999). Vector constructs carrying cloned yeast genes were introduced into haploid strain YNN218 [ura3-52 lys2-801 ade2-101 his3-Δ200] by DNA transformation (Ito et al. 1983). To induce gene expression, yeast transformants were first grown to saturation in synthetic medium lacking uracil (SC-Ura) with raffinose as its carbon source; cultures were then washed in sterile water prior to resuspension in SC-Ura with galactose as a carbon source. Transformant cultures were incubated in galactose for 1 h. Multiple incubation periods were tested to determine the optimum time for galactose induction such that artifacts resulting from gene overexpression are minimized. Following galactose induction, cells were prepared for immunofluorescence analysis in a 96-well format as described (Kumar et al. 2000b). V5-tagged proteins were immunolocalized by indirect immunofluorescence using anti-V5 mouse monoclonal IgG2a antibody (Invitrogen) and Cy3-conjugated goat anti-mouse IgG (Jackson Labs).

Yeast genes were HAT-tagged using transposon-based methods presented previously (Ross-Macdonald et al. 1999; Kumar et al. 2000b). Tagged genes were generated in a Y800 background [MATa leu2-Δ98 cry1^R/MATα leu2-Δ98 CRY1 ade2-101 HIS3/ade2-101 his3-Δ200 ura3-52 can1^R/ura3-52 CAN1 lys2-801/lys2-801 CYH2/cyh2^R trp1-1/TRP1 Cir⁰] carrying pGAL-cre (amp, ori, CEN, LEU2) (Burns et al. 1994). Asynchronous cultures of HAT-tagged yeast strains were grown and prepared for immunofluorescence analysis in 96-well microtiter plates (Kumar et al. 2000b). Transposon-tagged proteins were immunolocalized as above, except that mouse monoclonal anti-HA 16B12 (MMS101R, BAbCO) was used as the primary antibody.

Computational methods

For purposes of this analysis, all yeast proteins were divided into four localization categories: cytoplasm (Cyt), nucleus (Nuc), mitochondria (Mit), and exocytic network (Exo; endoplasmic reticulum, Golgi apparatus, vacuoles, vesicles, peroxisome, and extracellular proteins). Separate localization prediction procedures were applied to soluble proteins and to membrane proteins across all categories.

We identified 1029 yeast proteins as integral membrane proteins. These proteins were predicted to possess two or more transmembrane helices using TMHMM (Krogh et al. 2001); many also had a verifying match to a Pfam family of known membrane proteins (L. Yang and M. Gerstein, in prep.). Of these putative membrane proteins, ~380 had been localized to one of the four categories described above by transposon-tagging and subsequent immunolocalization as described here. We believe the distribution of these proteins among the four categories to be random and accurate. Consequently, we applied this distribution to the ~650 membrane proteins of unknown localization.
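Applying an observed distribution to an unlocalized remainder amounts to proportional allocation. The Python sketch below illustrates the step; the per-category counts are HYPOTHETICAL placeholders chosen only to sum to ~380, because the actual breakdown of localized membrane proteins is not given in this section.

```python
total_membrane = 1029

# HYPOTHETICAL per-category counts for the ~380 localized membrane proteins.
localized = {"Cyt": 95, "Nuc": 57, "Mit": 76, "Exo": 152}

n_localized = sum(localized.values())
n_unknown = total_membrane - n_localized   # the ~650 proteins of unknown localization

# Allocate the unlocalized proteins in the same proportions as the localized ones.
predicted_unknown = {
    category: n_unknown * count / n_localized
    for category, count in localized.items()
}

# Estimated membrane-protein population of each compartment.
membrane_totals = {
    category: localized[category] + predicted_unknown[category]
    for category in localized
}

for category, value in membrane_totals.items():
    print(f"{category}: {value:.0f} membrane proteins (estimated)")
```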

The remaining 5101 proteins in the yeast genome were considered to be soluble proteins. In total, experimentally derived localization data are available for ~2950 of these soluble proteins both from this study as well as from data previously deposited in the MIPS, YPD, and SwissProt databases. This served as the training set for our Bayesian method (Drawid and Gerstein 2000). The Bayesian system integrates a large number of different features related to yeast proteins, including sequence patterns, such as the nuclear localization signal or signal sequence, expression information, and many varieties of phenotypic data (e.g., viability of corresponding null mutants). The incorporation of expression information is particularly unique and is derived from the observation that cytoplasmic proteins possess much higher levels of expression than those in other compartments (Drawid et al. 2000).

By transposon-tagging/immunolocalization, we have defined the localization of ~2500 soluble proteins. Because we believe these proteins represent a random sample from the yeast genome, we have used their localization proportions as our priors (Cyt, 52%; Nuc, 27%; Mit, 14%; Exo, 7%). The subcellular localization of the remaining 2150 soluble proteins (for which no localization data are available) was predicted using our Bayesian method and the above prior and training data. We directly integrated the proportions of these 2150 proteins to yield an overall prediction for protein compartmentalization within the yeast proteome. We added together the number of soluble and membrane proteins to obtain the pie chart presented in Figure 3A.
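To make the prior-plus-features logic concrete, the following is a minimal naive-Bayes-style sketch in Python. It is not the published implementation (Drawid and Gerstein 2000): the priors are those quoted above, but the feature names and every likelihood value are HYPOTHETICAL placeholders, and features are assumed to be binary and conditionally independent. Summing posterior probabilities over proteins, rather than assigning each protein to a single compartment, is our reading of how compartment populations can be estimated without a definitive localization for each protein.

```python
# Priors quoted in the text for soluble proteins.
priors = {"Cyt": 0.52, "Nuc": 0.27, "Mit": 0.14, "Exo": 0.07}

# HYPOTHETICAL likelihoods P(feature present | compartment); in practice these
# would be estimated from the training set of localized proteins.
likelihoods = {
    "has_nls":             {"Cyt": 0.05, "Nuc": 0.60, "Mit": 0.05, "Exo": 0.02},
    "has_signal_sequence": {"Cyt": 0.02, "Nuc": 0.02, "Mit": 0.10, "Exo": 0.70},
    "high_expression":     {"Cyt": 0.50, "Nuc": 0.15, "Mit": 0.20, "Exo": 0.20},
}


def posterior(features):
    """Update the priors with each observed binary feature and normalize."""
    scores = dict(priors)
    for name, present in features.items():
        for compartment in scores:
            p = likelihoods[name][compartment]
            scores[compartment] *= p if present else (1.0 - p)
    total = sum(scores.values())
    return {compartment: s / total for compartment, s in scores.items()}


# Three HYPOTHETICAL unlocalized proteins described by their features.
proteins = [
    {"has_nls": True,  "has_signal_sequence": False, "high_expression": False},
    {"has_nls": False, "has_signal_sequence": True,  "high_expression": False},
    {"has_nls": False, "has_signal_sequence": False, "high_expression": True},
]

posteriors = [posterior(features) for features in proteins]

# Expected number of proteins per compartment: sum of posterior probabilities.
expected_counts = {
    compartment: sum(p[compartment] for p in posteriors) for compartment in priors
}

for compartment, count in expected_counts.items():
    print(f"{compartment}: expected {count:.2f} of {len(proteins)} proteins")
```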

Immunocytology and phenotypic analysis of nuclear proteins

Meiotic chromosomes from HAT-tagged strains were surface spread and stained as described using mouse anti-HA (Covance) at 1:400 dilution and DAPI (Agarwal and Roeder 2000). To examine phenotypes of strains (Y800 background) containing disruptive, full-length transposon insertions within genes encoding nuclear proteins, diploids heterozygous for the insertion were sporulated; tetrads were subsequently dissected and assayed for spore viability and transposon-encoded β-galactosidase activity as described previously (Burns et al. 1994).

	Acknowledgments

We thank James R. Chambers, Shannon Hattier, and Jon Rowland of Invitrogen Corporation for strain organization and DNA preparation. This work was supported by NIH Grant R01-CA77808 to M.S. A.K. is supported by a postdoctoral fellowship from the American Cancer Society.

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.

	Footnotes

Received December 18, 2001; revised version accepted February 1, 2002.

⁵ Present address: Active Motif, 104 Avenue Franklin Roosevelt, Box-25, B-1330 Rixensart, Belgium.

⁶ Corresponding author.

E-MAIL michael.snyder@yale.edu; FAX (203) 432-6161.

Article and publication are at http://www.genesdev.org/cgi/doi/10.1101/gad.970902.

	References

Agarwal, S. and Roeder, G.S. 2000. Zip3 provides a link between recombination enzymes and synaptonemal complex proteins. Cell 102: 245-255.
Bairoch, A. and Apweiler, R. 2000. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28: 45-48.
Baker, R.E. and Masison, D.C. 1990. Isolation of the gene encoding the Saccharomyces cerevisiae centromere-binding protein CP1. Mol. Cell. Biol. 10: 2458-2467.
Bell, S.P., Mitchell, J., Leber, J., Kobayashi, R., and Stillman, B. 1995. The multidomain structure of Orc1p reveals similarity to regulators of DNA replication and transcriptional silencing. Cell 83: 563-568.
Burley, S.K. 2000. An overview of structural genomics. Nat. Struct. Biol. Suppl: 932-934.
Burns, N., Grimwade, B., Ross-Macdonald, P.B., Choi, E.-Y., Finberg, K., Roeder, G.S., and Snyder, M. 1994. Large-scale characterization of gene expression, protein localization and gene disruption in Saccharomyces cerevisiae. Genes & Dev. 8: 1087-1105.
Cairns, B.R., Lorch, Y., Li, Y., Zhang, M., Lacomis, L., Erdjument-Bromage, H., Tempst, P., Du, J., Laurent, B., and Kornberg, R.D. 1996. RSC, an essential, abundant chromatin-remodeling complex. Cell 87: 1249-1260.
Cho, J.H., Noda, Y., and Yoda, K. 2000. Proteins in the early Golgi compartment of Saccharomyces cerevisiae immunoisolated by Sed5p. FEBS Lett. 469: 151-154.
Costanzo, M.C., Crawford, M.E., Hirschman, J.E., Kranz, J.E., Olsen, P., Robertson, L.S., Skrzypek, M.S., Braun, B.R., Lennon-Hopkins, K., Kondu, P. et al. 2001. YPD™, PombePD™ and WormPD™: Model organism volumes of the BioKnowledge™ Library, an integrated resource for protein information. Nucleic Acids Res. 29: 75-79.
Ding, D.Q., Tomita, Y., Yamamoto, A., Chikashige, Y., Haraguchi, T., and Hiraoka, Y. 2000. Large-scale screening of intracellular protein localization in living fission yeast cells by the use of a GFP-fusion genomic DNA library. Genes Cells 5: 169-190.
Drawid, A. and Gerstein, M. 2000. A Bayesian system integrating expression data with sequence patterns for localizing proteins: Comprehensive application to the yeast genome. J. Mol. Biol. 301: 1059-1075.
Drawid, A., Jansen, R., and Gerstein, M. 2000. Genome-wide analysis relating expression level with protein subcellular localization. Trends Genet. 16: 426-430.
Gavin, A.-C., Bosche, M., Krause, R., Grandi, P., Marzioch, M., Bauer, A., Schultz, J., Rick, J.M., Michon, A.-M., Cruciat, C.-M. et al. 2002. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415: 141-147.
Gu, J., Emerman, M., and Sandmeyer, S. 1997. Small heat shock protein suppression of Vpr-induced cytoskeletal defects in budding yeast. Mol. Cell. Biol. 17: 4033-4042.
Gygi, S.P., Rist, B., Gerber, S.A., Tureck, F., Gelb, M.H., and Aebersold, R. 1999. Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat. Biotech. 17: 994-999.
Heyman, J.A., Cornthwaite, J., Foncerrada, L., Gilmore, J.R., Gontag, E., Hartman, K.J., Hernandez, C.L., Hood, R., Hull, H.M., Lee, W.-Y. et al. 1999. Genome-scale cloning and expression of individual open reading frames using topoisomerase I-mediated ligation. Genome Res. 9: 383-392.
Ho, Y., Gruhler, A., Heilbut, A., Bader, G., Moore, L., Adams, S.-L., Millar, A., Taylor, P., Bennett, K., Boutilier, K. et al. 2002. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415: 180-183.
Ito, H., Fukuda, Y., Murata, K., and Kimura, A. 1983. Transformation of intact yeast cells treated with alkali cations. J. Bacteriol. 153: 163-168.
Ito, T., Chiba, T., Ozawa, R., Yoshida, M., Hattori, M., and Sakaki, Y. 2001. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl. Acad. Sci. 98: 4569-4574.
Iyer, V.R., Horak, C.A., Scafe, C.S., Botstein, D., Snyder, M., and Brown, P.O. 2001. Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature 409: 533-538.
Koller, A., Snyder, W.B., Faber, K.N., Wenzel, T.J., Rangell, L., Keller, G.A., and Subramani, S. 1999. Pex22p of Pichia pastoris, essential for peroxisomal matrix protein import, anchors the ubiquitin-conjugating enzyme, Pex4p, on the peroxisomal membrane. J. Cell Biol. 146: 99-112.
Krogh, A., Larsson, B., von Heijne, G., and Sonnhammer, E.L. 2001. Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes. J. Mol. Biol. 305: 567-580.
Kumar, A., Cheung, K.-H., Ross-Macdonald, P., Coelho, P.S.R., Miller, P., and Snyder, M. 2000a. TRIPLES: A database of gene function in Saccharomyces cerevisiae. Nucleic Acids Res. 28: 81-84.
Kumar, A., des Etages, S.A., Coelho, P.S.R., Roeder, G.S., and Snyder, M. 2000b. High-throughput methods for the large-scale analysis of gene function by transposon tagging. Methods Enzymol. 328: 550-574.
Li, X. and Burgers, P.M. 1994. Molecular cloning and expression of the Saccharomyces cerevisiae RFC3 gene, an essential component of replication factor C. Proc. Natl. Acad. Sci. 91: 868-872.
MacBeath, G. and Schreiber, S.L. 2000. Printing proteins as microarrays for high-throughput function determination. Science 289: 1760-1763.
Mewes, H.W., Frishman, D., Gruber, C., Geier, B., Haase, D., Kaps, A., Lemcke, K., Mannhaupt, G., Pfeiffer, F., Schuller, C. et al. 2000. MIPS: A database for genomes and protein sequences. Nucleic Acids Res. 28: 37-40.
Montelione, G.T. 2001. Structural genomics: An approach to the protein folding problem. Proc. Natl. Acad. Sci. 98: 13488-13489.
Niedenthal, R.K., Riles, L., Johnston, M., and Hegemann, J.H. 1996. Green fluorescent protein as a marker for gene expression and subcellular localization in budding yeast. Yeast 12: 773-786.
O'Neill, E.M., Kaffman, A., Jolly, E.R., and O'Shea, E.K. 1996. Regulation of PHO4 nuclear localization by the PHO80-PHO85 cyclin-CDK complex. Science 271: 209-212.
Ray, A. and Runge, K.W. 1998. The C terminus of the major yeast telomere binding protein Rap1p enhances telomere formation. Mol. Cell. Biol. 18: 1284-1295.
Ren, B., Robert, F., Wyrick, J.J., Aparicio, O., Jennings, E.G., Simon, I., Zeitlinger, J., Schreiber, J., Hannett, N., Kanin, E. et al. 2000. Genome-wide location and function of DNA binding proteins. Science 290: 2306-2309.
Ross-Macdonald, P., Sheehan, A., Roeder, G.S., and Snyder, M. 1997. A multipurpose transposon system for analyzing protein production, localization, and function in Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. 94: 190-195.
Ross-Macdonald, P., Coelho, P.S.R., Roemer, T., Agarwal, S., Kumar, A., Jansen, R., Cheung, K.-H., Sheehan, A., Symoniatis, D., Umansky, L. et al. 1999. Large-scale analysis of the yeast genome by transposon tagging and gene disruption. Nature 402: 413-418.
Seifert, H.S., Chen, E.Y., So, M., and Heffron, F. 1986. Shuttle mutagenesis: A method of transposon mutagenesis for Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. 83: 735-739.
Silver, P.A. 1991. How proteins enter the nucleus. Cell 64: 489-497.
Simpson, J.C., Wellenreuther, R., Poustka, A., Pepperkok, R., and Wiemann, S. 2000. Systematic subcellular localization of novel proteins identified by large-scale cDNA sequencing. EMBO Reports 1: 287-292.
Tong, A.H.Y., Drees, B., Nardelli, G., Bader, G.D., Brannetti, B., Castagnoli, L., Evangelista, M., Ferracuti, S., Nelson, B., Paoluzi, S. et al. 2002. A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules. Science 295: 321-324.
Uetz, P., Giot, L., Cagney, G., Mansfield, T.A., Judson, R.S., Knight, J.R., Lockshon, D., Narayan, V., Srinivasan, M., Pochart, P. et al. 2000. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 403: 623-627.
van Steensel, B., Delrow, J., and Henikoff, S. 2001. Chromatin profiling using targeted DNA adenine methyltransferase. Nat. Genet. 27: 304-308.
Vogel, J., Drapkin, B., Oomen, J., Beach, D., Bloom, K., and Snyder, M. 2001. Phosphorylation of γ-tubulin regulates microtubule organization in budding yeast. Dev. Cell 1: 621-631.
Washburn, M.P., Wolters, D., and Yates, J.R., III 2001. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat. Biotechnol. 19: 242-247.
Waters, M.G., Griff, I.C., and Rothman, J.E. 1991. Proteins involved in vesicular transport and membrane fusion. Curr. Opin. Cell. Biol. 3: 615-620.
Winzeler, E.A., Shoemaker, D.D., Astromoff, A., Liang, H., Anderson, K., Andre, B., Bangham, R., Benito, R., Boeke, J.D., Bussey, H. et al. 1999. Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 285: 901-906.
Zhu, H., Klemic, J.F., Chang, S., Bertone, P., Casamayor, A., Klemic, K.G., Smith, D., Gerstein, M., Reed, M.A., and Snyder, M. 2000. Analysis of yeast protein kinases using protein chips. Nat. Genet. 26: 283-289.
Zhu, H., Bilgin, M., Bangham, R., Hall, D., Casamayor, A., Bertone, P., Lan, N., Jansen, R., Bidlingmaier, S., Houfek, T. et al. 2001. Global analysis of protein activities using proteome chips. Science 293: 2101-2105.
