25 November 1999
Nature 402, 413-418 (1999) © Macmillan Publishers Ltd.
PETRA ROSS-MACDONALD*, PAULO S. R. COELHO*, TERRY ROEMER*, SEEMA AGARWAL*, ANUJ KUMAR*, RONALD JANSEN, KEI-HOI CHEUNG, AMY SHEEHAN*, DAWN SYMONIATIS*, LARA UMANSKY*, MATTHEW HEIDTMAN*, F. KENNETH NELSON*, HIROSHI IWASAKI*, KARL HAGER§, MARK GERSTEIN, PERRY MILLER, G. SHIRLEEN ROEDER* & MICHAEL SNYDER*
Economical methods by which gene function may be analysed on a genomic scale are relatively scarce. To fill this need, we have developed a transposon-tagging strategy for the genome-wide analysis of disruption phenotypes, gene expression and protein localization, and have applied this method to the large-scale analysis of gene function in the budding yeast Saccharomyces cerevisiae. Here we present the largest collection of defined yeast mutants ever generated within a single genetic background--a collection of over 11,000 strains, each carrying a transposon inserted within a region of the genome expressed during vegetative growth and/or sporulation. These insertions affect nearly 2,000 annotated genes, representing about one-third of the 6,200 predicted genes in the yeast genome1,2. We have used this collection to determine disruption phenotypes for nearly 8,000 strains using 20 different growth conditions; the resulting data sets were clustered to identify groups of functionally related genes. We have also identified over 300 previously non-annotated open reading frames and analysed by indirect immunofluorescence over 1,300 transposon-tagged proteins. In total, our study encompasses over 260,000 data points, constituting the largest functional analysis of the yeast genome ever undertaken.
The ability to sequence entire genomes has resulted in an abundance of raw sequence data3; however, relatively few methods exist to assess gene function on a genomic scale4,5,6,7,8. We have developed a transposon-based method for the large-scale accumulation of expression, phenotypic and protein localization data in yeast without bias towards previously annotated genes. Our approach utilizes a multipurpose minitransposon (mTn) derived from the bacterial transposable element Tn3 (ref. 9). The minitransposon mTn-3xHA/lacZ (Fig. 1) contains a lacZ reporter gene lacking an initiator methionine and upstream promoter sequence. Introduction of this transposon into yeast results in production of β-galactosidase (β-gal) if the mTn is present within a transcribed and translated region of the genome, typically corresponding to an in-frame fusion of lacZ to the yeast protein-coding sequence. Additionally, mTn-3xHA/lacZ contains a lox site near each Tn3 end; adjacent to one lox site is DNA encoding three copies of a haemagglutinin (3xHA) epitope tag. Production of the Cre recombinase in yeast containing this minitransposon results in recombination of the lox sites, reducing the mTn to a 274 base pair (bp) element (the HAT tag)9. As five bases of genomic DNA are duplicated during Tn3 insertion, the net result is a 279-bp insertion encoding a 93-codon open reading frame (ORF) which includes the 3xHA sequence. When the mTn's lacZ reporter has been fused in-frame to a yeast coding region, creation of the HAT tag allows production of a full-length, epitope-tagged protein from that gene.
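The HAT-tag arithmetic above can be checked in a few lines. This is only an illustrative sketch: the sizes are taken from the text, while the function and variable names are hypothetical and not part of the published method.

```python
# Sizes quoted in the text; names are illustrative assumptions.
HAT_ELEMENT_BP = 274         # mTn remnant left after Cre-lox recombination
TARGET_DUPLICATION_BP = 5    # genomic bases duplicated on Tn3 insertion


def net_insertion(element_bp: int, duplication_bp: int) -> tuple[int, int, bool]:
    """Return (total bp, whole codons, reading frame preserved?)."""
    total = element_bp + duplication_bp
    return total, total // 3, total % 3 == 0


total_bp, codons, in_frame = net_insertion(HAT_ELEMENT_BP, TARGET_DUPLICATION_BP)
print(total_bp, codons, in_frame)  # 279 93 True
```

Because 279 is a multiple of three, the insertion adds 93 codons without shifting the downstream reading frame, which is why an in-frame lacZ fusion can be converted into a full-length tagged protein.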
Figure 1 The mTn insertion project.
Transformants carrying the mTn-borne URA3 marker were selected and assayed for β-gal activity after growth on rich medium and sporulation medium10. Strains were also treated to induce the Cre recombinase. Following this Cre-mediated excision event, we used Ura- colonies for immunolocalization of epitope-tagged proteins with anti-HA antibodies. The bacterial clone that gave rise to each yeast strain in the collection was also stored. Plasmid DNA from these bacteria allowed transformation of insertion alleles into a haploid strain for analysis of disruption phenotypes.
Using this procedure, we have carried out 92,544 plasmid preparations and yeast transformations, allowing us to identify 11,232 strains containing lacZ fusions expressed during vegetative growth; the precise site of mTn insertion within the yeast genome has been determined by DNA sequence analysis for 6,358 strains. As expected, most of the strains within our collection contain transposon insertions affecting annotated ORFs. Of these 6,358 insertion alleles, 5,442 lie in or within 200 bp of an annotated ORF. In total, these insertions affect 1,917 ORFs distributed over all 16 chromosomes of the yeast genome (Fig. 2).
Figure 2 Distribution and phenotypic analysis of 1,917 mTn-mutagenized ORFs within the S. cerevisiae genome.
In analysing mTn insertion events, we have identified a large number of previously non-annotated ORFs (NORFs) within the yeast genome. When annotating the S. cerevisiae genome, an arbitrary lower size limit of 100 amino acids was adopted as convention unless additional information suggested otherwise. Also, ORFs of more than 100 codons that fully or partially overlapped even longer ORFs were usually not annotated4,11. Such NORFs are, however, capable of being translated, as indicated in previous studies4,8. Using an arbitrary lower limit of 50 codons, we find that 328 of 6,358 mTn insertions represent in-frame fusions to NORFs. These NORFs range in size up to 247 codons. A detailed analysis of 229 such insertions revealed three classes: 52% are in NORFs that fully or partially overlap an annotated ORF in the antisense direction; 15% fall in NORFs that overlap an annotated ORF in the same orientation but in a different frame; and 33% of these insertions occurred in NORFs previously classified as intergenic regions. Many NORFs were identified by multiple mTn insertion events. For example, a 124-codon NORF in the antisense region of the rDNA has more than 50 insertions. In addition, we have observed mutant phenotypes resulting from transposon insertions within putative NORFs: for example, mTn insertion at base number 198,816 of chromosome XII (identifying a putative intergenic NORF of 86 amino acids) results in hypersensitivity to the microtubule-depolymerization drug benomyl. Furthermore, three NORFs encode proteins exhibiting a distinct pattern of subcellular localization. Transposon insertion at base number 1,236,754 of chromosome IV identifies a NORF of 97 amino acids; using the mTn-encoded epitope tag, we have localized the encoded protein to the nucleus. These findings substantiate the biological importance of NORFs, raising the possibility that they encode previously uncharacterized proteins. 
Complete data sets for all strains carrying mTn insertions identifying putative NORFs may be accessed from our web site at http://ycmi.med.yale.edu/ygac/triples.htm.
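To make the size thresholds concrete, the following sketch scans both strands of a sequence for ATG-to-stop reading frames of at least 50 codons. It is a hypothetical illustration only: the text does not specify how NORFs were delimited, so the ORF definition, the function names and the toy sequence are all assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 50):
    """Yield (strand, frame, start, end, codons) for ATG-to-stop ORFs.

    start and end are 0-based offsets on the scanned strand; codons counts
    the ATG through the last sense codon (the stop codon is excluded).
    """
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    n = (i - start) // 3
                    if n >= min_codons:
                        yield strand, frame, start, i + 3, n
                    start = None                   # look for the next ORF


# Toy sequence: a 60-codon ORF (ATG plus 59 alanine codons) and a stop.
toy = "ATG" + "GCT" * 59 + "TAA"
print(list(find_orfs(toy, min_codons=50)))   # [('+', 0, 0, 183, 60)]
print(list(find_orfs(toy, min_codons=100)))  # []
```

The toy ORF is reported with the 50-codon limit but is missed when the limit is raised to 100 codons, which mirrors how a size cutoff alone can exclude short coding regions from annotation.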
To analyse phenotypes on a genomic scale, we transformed 7,680 mTn-insertion alleles into a haploid strain. Most transformants were viable. In the 14.1% (1,082) that were inviable, the transposon typically affected a gene known to be required for cell viability. Gene functions associated with these essential genes may also be analysed using our multipurpose minitransposon. By cre-lox-mediated reduction of mTn-3xHA/lacZ, we have generated viable mutants containing in-frame insertions of HAT-tags within essential genes. We estimate that 30-45% of small insertion alleles in essential genes retain gene function.
Transformed haploid strains were scored for 20 phenotypes after growth under the test conditions indicated in Table 1 and Fig. 2. Transformants were screened on a large scale using 'phenotypic macroarrays' compatible with 96-well formats. Yeast strains grown in 96-well arrays were transferred to test medium using a 96-pin tool; 576 strains were simultaneously screened for a given phenotype (Fig. 3).
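Since 576 = 6 x 96, one screen corresponds to six 96-well arrays. The sketch below shows one way a strain index could be mapped back to a plate and well; the row-major ordering, the plate numbering and the function name are assumptions for illustration, as the text does not describe the actual bookkeeping.

```python
import string

ROWS = string.ascii_uppercase[:8]   # rows A-H of a 96-well plate
COLUMNS_PER_ROW = 12
WELLS_PER_PLATE = 96
STRAINS_PER_SCREEN = 576            # figure quoted in the text


def plate_and_well(index: int) -> tuple[int, str]:
    """Map a 0-based strain index to (1-based plate number, well label)."""
    plate, offset = divmod(index, WELLS_PER_PLATE)
    row, col = divmod(offset, COLUMNS_PER_ROW)
    return plate + 1, f"{ROWS[row]}{col + 1}"


print(STRAINS_PER_SCREEN // WELLS_PER_PLATE)  # 6
print(plate_and_well(0))                      # (1, 'A1')
print(plate_and_well(575))                    # (6, 'H12')
```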
Figure 3 Phenotypic macroarray analysis.
Interestingly, the exact site of transposon insertion within an ORF determines its disruption phenotype. As a result, our transposon can be used effectively to map functional domains within a given protein. For example, we have identified 17 transformants containing mTn insertions at 11 sites within IMP2', a nuclear gene encoding a transcription factor involved in sugar utilization18. Transposon insertions within the amino terminus and central region of Imp2 (at amino acids 46, 69, 230 and 270) result in mutants unable to grow on glycerol, a non-fermentable carbon source. However, strains carrying an insertion at codon 334 (12 codons upstream of the translational stop codon) still grow normally on YPGlycerol. This same insertion, though, results in a defect in cell-wall synthesis (Fig. 3b) shared by strains containing insertions at residues 46, 230, 263 and 319. Therefore, nearly the entire length of Imp2 appears to be involved in cell-wall biogenesis17, although its carboxy terminus is not required for the control of sugar utilization. These findings illustrate the effectiveness of our collection in assigning specific functions to individual protein domains.
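The Imp2 example can be laid out as a small table of insertion position against phenotype. The sketch below records only what the text states; `None` marks a phenotype the text does not report for that position, and the data structure and helper name are illustrative assumptions.

```python
# position -> (glycerol growth defect, cell-wall defect); None = not reported
IMP2_INSERTIONS = {
    46: (True, True),
    69: (True, None),
    230: (True, True),
    263: (None, True),
    270: (True, None),
    319: (None, True),
    334: (False, True),
}


def positions_with(phenotype_index: int, value: bool) -> list[int]:
    """Return sorted insertion positions whose phenotype equals value."""
    return sorted(
        pos for pos, phenotypes in IMP2_INSERTIONS.items()
        if phenotypes[phenotype_index] is value
    )


glycerol_defect = positions_with(0, True)    # [46, 69, 230, 270]
glycerol_normal = positions_with(0, False)   # [334]
cell_wall_defect = positions_with(1, True)   # [46, 230, 263, 319, 334]

# The last position with a glycerol defect and the first without one
# bracket the region where the requirement for glycerol growth ends.
print(max(glycerol_defect), min(glycerol_normal))  # 270 334
```

Read this way, the data place the carboxy-terminal limit of the region needed for glycerol growth somewhere between residues 270 and 334, while the cell-wall defect extends through residue 334.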
Many insertional mutants exhibit phenotypes under a specific subset of test conditions: these mutants appear to 'cluster' into discrete classes. To consider these clusters more rigorously, we have developed and implemented an algorithm by which haploid transformants may be grouped by common disruption phenotypes (see Methods). Based upon the tested phenotypes listed in Table 1, our collection of viable haploid transformants may be subdivided into 21 clusters: one cluster for each indicated growth condition as well as an additional cluster of mutants exhibiting wild-type phenotypes. Examples of these clusters are shown in Fig. 4. Transformants within a single cluster typically share a common mutant phenotype (Fig. 4) and can be subdivided further into smaller clusters. Interpreted judiciously, these phenotypic clusters provide a means of predicting cellular functions associated with a given ORF. For example, cluster 'YPG' contains transformants characterized most prominently by an inability to grow on YPGlycerol. This cluster is highly enriched in transformants carrying a transposon insertion within genes involved in cellular respiration (for example, IFM1, IMP2', SHY1 and NDI1). A number of strains within this cluster carry transposon insertions in previously uncharacterized genes (for example, YLR294C, YJL131C and YGR101W); from our clustering analysis, we can reasonably infer that these genes probably encode proteins involved in cellular respiration as well. Similar functional inferences may be drawn from each remaining data cluster (accessible in full at http://bioinfo.mbb.yale.edu/genome/phenotypes).
Figure 4 Graphical representation of clustered phenotypic data.
Further indication of cellular function may be inferred from the subcellular localization of a given protein. To determine these localization patterns, we have examined 1,340 diploid strains carrying in-frame HAT tag insertions by indirect immunofluorescence with antibodies directed against the HA-epitope. Localization of the HAT-tagged protein to a discrete cellular site was observed and confirmed for 201 strains; 214 additional strains showed cytoplasmic localization. A variety of localization patterns was detected within these strains: tagged proteins were localized to the nucleus, nucleolus, mitochondria, plasma membrane, cell neck and spindle pole body (Table 2 and Fig. 5). In many cases, strains that show a distinct localization pattern contain a HAT tag in a known protein whose localization had previously been determined (for example, Top1-HAT19 and Rap1-HAT20 localize to the nucleus). In other cases, however, our data constitute information concerning proteins that have not been studied or localized previously (such as the YHR196W gene product--a protein of unknown function--which localizes to the nucleolus).
Figure 5 Immunolocalization of epitope-tagged proteins.
In ref. 21, a DNA microarray representing all annotated ORFs in the yeast genome was screened for changes in gene expression during sporulation. Of the 31 meiotic genes we identified by in-frame lacZ fusions, only 17 were also found to be induced at least two-fold by microarray screening. For the remainder, microarray analysis indicated little or no induction during meiosis. One of the genes detected in our analysis but not by microarray screening is MER1, which has been shown to be strongly induced in meiosis both by northern blot hybridization and analysis of a Mer1-β-gal translational fusion gene22. This observation indicates that the β-gal assay may serve as a more sensitive indicator of meiotic gene expression and/or detect differences in protein levels more readily. In any case, our transposon-based approach is an effective means of identifying novel sporulation-induced genes; one gene identified in our screen is YIL073C, which we found to be important for spore viability and synaptonemal complex formation.
Our transposon insertion strategy allows the rapid generation of reporter gene fusions, epitope-tagging constructs and insertion alleles (both large and small), all from a single mutagenic event. As we do not bias this event towards a population of previously annotated genes, our approach effectively identifies new genes. This random approach, however, presents an obstacle in achieving saturation mutagenesis: small genes are less likely to be mutagenized by random mTn insertion than are large genes. Using our current approach, an additional 30,000 mTn insertions in yeast ORFs would be required to mutagenize 90% of the yeast genome. Once complete, however, we see our collection ushering in the 'new yeast genetics'--a new paradigm by which yeast genetic screens may be performed. Using a collection such as ours, researchers will be able to screen large numbers of mutant yeast strains for a given phenotype (by macroarray analysis or other methods) and instantly know the gene affected in a mutant of interest. Furthermore, the availability of alleles as a plasmid collection will allow introduction into any desired yeast strain background before screening. We have already distributed over 700 insertion alleles to the research community. These alleles will expedite gene analysis.
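The size bias in random mutagenesis can be illustrated with a deliberately simplified Poisson model: if N insertions land uniformly at random in a genome of G bp, a gene of L bp receives at least one with probability 1 - exp(-N L / G). Uniform insertion, the approximate 12-Mb genome size, the example gene lengths and the insertion count below are all assumptions made for illustration; the model ignores the requirement for an expressed in-frame fusion and is not intended to reproduce the estimate of 30,000 additional insertions given above.

```python
import math


def hit_probability(gene_bp: int, insertions: int,
                    genome_bp: int = 12_000_000) -> float:
    """Chance that a gene receives at least one of N uniform insertions."""
    expected_hits = insertions * gene_bp / genome_bp  # Poisson mean
    return 1.0 - math.exp(-expected_hits)


# Hypothetical gene sizes and insertion count, for comparison only.
for gene_bp in (300, 1_500, 6_000):
    p = hit_probability(gene_bp, insertions=10_000)
    print(f"{gene_bp:>5} bp gene: P(at least one insertion) = {p:.2f}")
```

Under these assumptions the probability rises steeply with gene length, so the last genes to be hit in any random screen are disproportionately the short ones.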
Methods
Generation and analysis of yeast transformants We used transposon mutagenesis in E. coli to generate >10^6 independent transformants; individual transformant colonies were selected and stored in 96-well plates10. Plasmids were prepared from these strains by alkaline lysis23, digested with NotI and transformed into diploid strain Y800 (ref. 8). Following transformation, yeast cells were cultured as described10; after incubation under appropriate growth conditions, cells were permeabilized and assayed for β-gal activity10. Transformants displaying activity were arrayed in a 96-well format; each strain has a unique coordinate in this collection. The strains were subsequently re-tested for β-gal activity after vegetative growth and sporulation. The results of this retest are accessible at http://ycmi.med.yale.edu/ygac/triples.htm.
To determine the genomic site of mTn insertion within these clones, plasmid-borne insertion alleles were sequenced using a primer complementary to bases 74-97 of mTn-3xHA/lacZ (GenBank accession number U54828). The resulting sequence data were searched against the S. cerevisiae genome using BLAST24.
Phenotypic macroarray analysis DNA obtained from the insertion allele plasmid collection was digested with NotI and transformed into haploid strain Y2279 [MATa ura3-52 trp1Δ1 ade2-101 lys2-801 leu2Δ98]. Transformants were selected on SC-ura. Because phenotypes are often associated with mutations unlinked to the transposon insertion (M.S., unpublished data), we usually analysed two independent transformants. We scored yeast strains as exhibiting a defect only if both transformants displayed the defect or if two independent insertion alleles in the same gene produced the same defect. In many cases, haploid transformants were not obtained. We presume that in these cases either the affected gene was essential, or there was little transforming DNA. Transformants were grown in SC-ura medium overnight at 30 °C. A few microlitres of cell suspension were then transferred to selective growth medium using a 96-pin device for phenotypic macroarray analysis.
Cluster analysis A combination of k-means clustering25 and a heuristic algorithm was used to cluster the phenotypic macroarray data. The data were arranged into a 7,847 by 20 matrix X, with each row representing one transformant and each column representing one growth condition. The qualitative phenotype observations 'wild-type', 'weak', 'medium' and 'strong' were mapped onto the numerical scores 1, 3, 9 and 27, which reflect the different weights of the observations. The matrix rows were clustered based on the Euclidean distance between them, which for rows i and j is defined as

$$d_{ij} = \sqrt{\sum_{k=1}^{20} (x_{ik} - x_{jk})^2}$$

where x_{ik} is the score of transformant i under growth condition k.
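The scoring scheme and row-wise Euclidean distance described here can be sketched as follows. This is a minimal illustration of plain k-means (Lloyd iteration) on toy data; it does not include the heuristic step mentioned in the text, and the function names, starting centroids and example observations are all assumptions.

```python
import math

# Score mapping quoted in the text.
SCORES = {"wild-type": 1, "weak": 3, "medium": 9, "strong": 27}


def encode(observations):
    """Convert qualitative phenotype calls into numerical scores."""
    return [SCORES[obs] for obs in observations]


def euclidean(a, b):
    """Euclidean distance between two equal-length score vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(rows, centroids, iterations=10):
    """Plain k-means from caller-supplied starting centroids.

    Returns the labels from the final assignment step and the centroids.
    """
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each row joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: euclidean(row, centroids[k]))
            for row in rows
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [row for row, lab in zip(rows, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data: four hypothetical transformants scored under three conditions.
rows = [
    encode(["wild-type", "wild-type", "wild-type"]),
    encode(["strong", "wild-type", "wild-type"]),
    encode(["medium", "wild-type", "weak"]),
    encode(["wild-type", "weak", "wild-type"]),
]
labels, centroids = kmeans(rows, centroids=[rows[0], rows[1]])
print(euclidean(rows[0], rows[1]))  # 26.0
print(labels)                       # [0, 1, 0, 0]
```

The geometric scores mean that a single 'strong' call separates two transformants far more than several 'weak' calls do, so clusters are driven mainly by the most pronounced phenotypes.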
The tree describing the relationship between growth conditions was calculated with a program in the PHYLIP package26 that performs the Fitch-Margoliash least-squares algorithm27 on a 20 by 20 matrix describing the Euclidean distance between the columns of X. The Euclidean distance between columns k and l is defined in the same way as for rows, with the sum taken over the 7,847 transformants:

$$d_{kl} = \sqrt{\sum_{i=1}^{7847} (x_{ik} - x_{il})^2}$$
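A condition-by-condition distance matrix of this kind can be computed by transposing the score matrix and taking pairwise distances between its columns. The sketch below uses a toy 4 by 3 matrix in place of the 7,847 by 20 matrix X; the function name is an assumption, and writing the result in the input format expected by PHYLIP is not shown.

```python
import math


def column_distance_matrix(X):
    """Return the symmetric matrix of Euclidean distances between columns."""
    columns = list(zip(*X))  # transpose: one tuple per growth condition
    n = len(columns)
    return [
        [math.dist(columns[k], columns[l]) for l in range(n)]
        for k in range(n)
    ]


# Toy score matrix: rows are transformants, columns are growth conditions.
X = [
    [1, 1, 1],
    [27, 1, 1],
    [9, 1, 3],
    [1, 3, 1],
]
D = column_distance_matrix(X)
for row in D:
    print(" ".join(f"{d:6.2f}" for d in row))
```

For the full data set the same call would yield the 20 by 20 matrix, which has a zero diagonal and is symmetric, as a tree-building program based on pairwise distances requires.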
Production and immunolocalization of HAT-tagged strains HAT tags were generated in vivo by cre-lox site-specific recombination10. HAT-tagged strains were analysed by indirect immunofluorescence as described8,10, except that the primary antibody was mouse monoclonal anti-HA 16B12 (MMS101R, BAbCo) and the secondary antibody was Cy3-conjugated goat anti-mouse IgG (Jackson Labs).
Received 2 August; accepted 7 October 1999.
Acknowledgements. P.S.R.C. is supported by the Fundacao de Amparo a Pesquisa do Estado de Sao Paulo, Brazil. This work was supported by an NIH grant (to G.S.R. and M.S.).
Correspondence and requests for materials should be addressed to M.S. (e-mail: michael.snyder@yale.edu).