  
            
            A Bayesian Networks Approach for Predicting Protein-Protein 
            Interactions from Genomic Data 

            Ronald Jansen,1* Haiyuan Yu,1 Dov Greenbaum,1 Yuval Kluger,1 
            Nevan J. Krogan,4 Sambath Chung,1,2 Andrew Emili,4 Michael Snyder,2 
            Jack F. Greenblatt,4 Mark Gerstein1,3  
            
             We have developed an approach using Bayesian networks to 
            predict protein-protein interactions genome-wide in 
            yeast. Our method naturally weights and combines into 
            reliable predictions genomic features only weakly 
            associated with interaction (e.g., messenger 
            RNA coexpression, coessentiality, and colocalization). In 
            addition to de novo predictions, it can integrate often 
            noisy, experimental interaction data sets. We observe 
            that at given levels of sensitivity, our predictions are 
            more accurate than the existing high-throughput 
            experimental data sets. We validate our predictions with 
            TAP (tandem affinity purification) tagging experiments. 
            Our analysis, which gives a comprehensive view of yeast 
            interactions, is available at genecensus.org/intint. 
            
             1 Department of Molecular Biophysics and Biochemistry, 
            Yale University, 266 Whitney Avenue, Post Office Box 208114, New 
            Haven, CT 06520, USA. 2 Department of Molecular, 
            Cellular and Developmental Biology, Yale University, 266 Whitney 
            Avenue, Post Office Box 208114, New Haven, CT 06520, 
            USA. 3 Department of Computer Science, Yale 
            University, 266 Whitney Avenue, Post Office Box 208114, New Haven, 
            CT 06520, USA. 4 Banting and Best Department of 
            Medical Research, Department of Molecular and Medical Research, 
            University of Toronto, Toronto, M5G 1L6, Ontario, Canada. 
              
            
 * Present address: 
            Computational Biology Center, Memorial Sloan-Kettering 
            Cancer Center, 307 West 63rd Street, New York, NY 10021, 
            USA.  
               To whom correspondence should be addressed. E-mail: 
            mark.gerstein@yale.edu 
            
 
              
            Many fundamental biological processes involve protein-protein 
            interactions, and comprehensively identifying them is 
            important to systematically defining their cellular role. 
            New experimental and computational methods have vastly 
            increased the number of known or putative interactions, 
            cataloged in databases (1–7). 
            Much genomic information also relates to interactions 
            indirectly: Interacting proteins are often significantly 
            coexpressed (as shown by microarrays) and colocalized to 
            the same subcellular compartment (8, 
            9). 
            
            Unfortunately, interaction data sets are often incomplete 
            and contradictory (10–12). 
            In the context of genome-wide analyses, these 
            inaccuracies are greatly magnified because the protein 
            pairs that do not interact (negatives) far outnumber those 
            that do (positives). For instance, in yeast, the ~6000 proteins allow 
            for ~18 million 
            potential interactions, but the estimated number of 
            actual interactions is <100,000 (10, 
            13, 
            14). 
            Thus, even reliable techniques can generate many false 
            positives when applied genome-wide. This is similar to a 
            diagnostic with a 1% false-positive rate for a rare 
            disease occurring in 0.1% of the population, which would 
            roughly produce one true positive for every 10 false 
            ones. Further information is necessary. 
             Consequently, when evaluating protein-protein interactions, 
            one needs to integrate evidence from many different 
            sources (15–17). 
            Here, we propose a Bayesian approach for integrating 
            interaction information that allows for the probabilistic 
            combination of multiple data sets and demonstrate its 
            application to yeast (18). 
            Our approach can be used for combining noisy interaction 
            data sets and for predicting interactions de novo, from 
            other genomic information. The basic idea is to assess 
            each source of evidence for interactions by comparing it 
            against samples of known positives and negatives 
            ("gold-standards"), yielding a statistical reliability. 
            Then, extrapolating genome-wide, we predict the chance of 
            possible interactions for every protein pair by combining 
            each independent evidence source according to its 
            reliability. We verified our predictions by comparing 
            them against existing experimental interaction data (not 
            in the gold-standard) as well as new TAP (tandem affinity 
            purification) tagging experiments. 
             Among the many possible machine-learning approaches that 
            could be applied to predicting interactions (ranging from 
            simple unions and intersections of data sets to neural 
            networks, decision trees, and support-vector machines), 
            Bayesian networks have several advantages (19): 
            They allow for combining highly dissimilar types of data 
            (i.e., numerical and categorical), converting them to a 
            common probabilistic framework, without unnecessary 
            simplification; they readily accommodate missing data; and 
            they naturally weight each information source according 
            to its reliability. In contrast to "black-box" 
            predictors, Bayesian networks are readily interpretable 
            as they represent conditional probability relationships 
            among information sources. 
             The gold-standard data set on which we train 
            ("parameterize") the Bayesian network should ideally be 
            (i) independent from the data sources serving as 
            evidence, (ii) sufficiently large for reliable 
            statistics, and (iii) free of systematic bias. We used 
            the MIPS (Munich Information Center for Protein Sequences) 
            complexes catalog as the gold-standard for positives (6). 
            This hand-curated list of protein complexes is based on 
            the literature [8250 pairs in our filtered version (19)]. 
            A negatives gold-standard is harder to define, but 
            essential for successful training. Thus, we synthesized 
            negatives from lists of proteins in separate subcellular 
            compartments (9). 
            Our positive and negative gold-standards satisfy the 
            first two criteria and provide a good practical solution 
            for the third. Hence, our goal, precisely defined, was to 
            predict whether two proteins are in the same complex, not 
            whether they necessarily had direct physical contact. 
             As a measure of reliability, the overlap of information 
            sources (i.e., "interaction data sets," which could 
            either be noisy experimental data or sets of genomic 
            features) with the gold-standards can be expressed in 
            terms of a "likelihood ratio." For example, consider a 
            genomic feature f expressed in binary terms (i.e., 
            "present" or "absent"). The likelihood ratio 
            L(f) is then defined as the fraction of 
            gold-standard positives having feature f divided 
            by the fraction of negatives having f. For two features 
            f1 and f2 with 
            uncorrelated evidence, the likelihood ratio of the 
            combined evidence is simply the product 
            L(f1, f2) = 
            L(f1)L(f2). 
            For correlated evidence, L(f1, 
            f2) cannot be factorized in this way. 
            Bayesian networks are a formal representation of such 
            relationships between features. The combined likelihood 
            ratio is proportional to the estimated odds that two 
            proteins are in the same complex, given multiple sources 
            of information. 
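             The definitions above can be sketched in a few lines. All 
            counts below are hypothetical, the function names are our own, 
            and the prior odds of 1/600 are an illustrative assumption 
            rather than a value given in this section. The last step uses 
            Bayes' rule in odds form (posterior odds = prior odds x L). 

```python
from math import prod


def likelihood_ratio(pos_with_f, n_pos, neg_with_f, n_neg):
    """L(f): fraction of gold-standard positives having f,
    divided by the fraction of gold-standard negatives having f."""
    return (pos_with_f / n_pos) / (neg_with_f / n_neg)


def combined_likelihood_ratio(ratios):
    """Product of per-feature ratios; only valid for uncorrelated evidence."""
    return prod(ratios)


def posterior_odds(prior_odds, combined_ratio):
    """Bayes' rule in odds form."""
    return prior_odds * combined_ratio


# Hypothetical gold-standard counts for two binary features.
l1 = likelihood_ratio(pos_with_f=400, n_pos=1000, neg_with_f=200, n_neg=10000)
l2 = likelihood_ratio(pos_with_f=300, n_pos=1000, neg_with_f=1000, n_neg=10000)
combined = combined_likelihood_ratio([l1, l2])

# Illustrative prior odds (assumption, not taken from this section).
odds = posterior_odds(prior_odds=1 / 600, combined_ratio=combined)
print(f"L(f1) = {l1:.1f}, L(f2) = {l2:.1f}, combined = {combined:.1f}")
print(f"posterior odds = {odds:.2f}")
```

            Under this assumed prior, posterior odds exceed 1 only when the 
            combined likelihood ratio exceeds 600. 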
             We predict a protein pair as positive if its combined 
            likelihood ratio exceeds a particular cutoff (L 
            > Lcut) (negative otherwise). To get 
            an overall assessment of how the prediction performs, we 
            segmented the gold-standard into separate training and testing 
            sets (using a sevenfold cross-validation protocol). Then 
            we evaluated the number of true- (TP) and 
            false-positive (FP) predictions in the testing 
            set. Finally, we applied the Bayesian network beyond the 
            testing set, computing likelihood ratios for all possible 
            protein pairs in the genome. 
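             A minimal sketch of the evaluation protocol follows. It shows 
            only the sevenfold split and the counting of true and false 
            positives at a cutoff; the training step that would re-estimate 
            likelihood ratios from each training set is omitted, and the 
            scored pairs are hypothetical. 

```python
import random


def k_fold_splits(items, k=7, seed=0):
    """Yield (train, test) lists; every item lands in exactly one test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


def count_tp_fp(scored_pairs, l_cut):
    """Count gold-standard positives and negatives predicted positive (L > l_cut).

    scored_pairs: iterable of (likelihood_ratio, is_gold_positive).
    """
    tp = sum(1 for l, is_pos in scored_pairs if l > l_cut and is_pos)
    fp = sum(1 for l, is_pos in scored_pairs if l > l_cut and not is_pos)
    return tp, fp


# Hypothetical gold-standard pairs with precomputed likelihood ratios.
gold = [(700.0, True), (650.0, False), (900.0, True), (10.0, False),
        (5.0, True), (1200.0, True), (0.5, False)] * 2

splits = list(k_fold_splits(gold, k=7))
tp, fp = count_tp_fp(gold, l_cut=600)
print(f"folds: {len(splits)}, TP = {tp}, FP = {fp}")
```

            In a full implementation, each test fold would be scored with 
            parameters estimated from the other six folds before counting. 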
             Figure 
            1 schematically shows the information sources and results 
            of our calculations. We term the results "probabilistic 
            interactomes" (PIs), in which each protein pair is 
            associated with a probability measure for being in the 
            same complex (i.e., likelihood ratio L). Our 
            procedure not only allows combining existing experimental 
            interaction data sets (resulting in a PI-experimental or 
            "PIE"), but also the de novo prediction of protein 
            complexes from genomic data sets (when the input data are 
            not interaction data sets per se, resulting in a 
            PI-predicted or "PIP"). 
             
              
            
            
              
              
                  Fig. 1. The information sources 
                  integrated in our analysis and their comparison with each 
                  other. (A) The three different types of data used: (i) 
                  Interaction data from high-throughput experiments. These 
                  comprise large-scale two-hybrid screens (Y2H) (1, 
                  2) 
                  and in vivo pull-down experiments (3, 
                  4). 
                  (ii) Other genomic features. We considered expression data, 
                  biological function of proteins (from Gene Ontology biological 
                  process and the MIPS functional catalog), and data about 
                  whether proteins are essential (6, 
                  19–22). 
                  (iii) Gold-standards of known interactions and noninteracting 
                  protein pairs. (The MIPS functional catalog differs from the 
                  MIPS complexes catalog used for the gold-standard.) (B) 
                  Combination of data sets into probabilistic interactomes. 
                  (C) Comparison of the probabilistic interactomes with 
                  the gold-standards and our new experimental data. Numbers next 
                  to the arrows indicate which figures refer to these various 
                  comparisons. 
             
            
            We combined four interaction data sets from high-throughput 
            experiments into the PIE (1–4) 
            (Fig. 
            1B). The PIE represents a transformation of the 
            individual binary-valued interaction sets into a data set 
            where every protein pair is weighted according to the 
            likelihood that it exists within a complex. 
             We computed the PIP from several genomic data sources: the 
            correlation of mRNA amounts in two expression data sets 
            (one with temporal profiles during the cell cycle, one of 
            expression levels under 300 cellular conditions), two 
            sets of information on biological function, and 
            information about whether proteins are essential for 
            survival (6, 
            20–22). 
            Although none of these information sources are 
            interaction data per se, they contain information weakly 
            associated with interaction: Two subunits of the same 
            protein complex often have coregulated mRNA expression and 
            similar biological functions and are more likely to be 
            both essential or nonessential (8). 
            
             For computing the PIE and the PIP, we used two different 
            types of Bayesian networks: a "naïve" network for the PIP 
            and a fully connected one for the PIE (19). 
            The naïve network is simpler to compute but requires 
            information sources with essentially uncorrelated 
            evidence. In contrast, the fully connected Bayesian 
            network accommodates correlated evidence, which is the 
            case for the four experimental interaction data sets. 
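             The difference between the two network types can be 
            illustrated with a sketch that computes likelihood ratios over 
            joint states of binary data sets (the fully connected case) and 
            compares them with the product of per-data-set ratios (the naive 
            case). The data are hypothetical and use two data sets rather 
            than four; assigning an infinite ratio to states never seen 
            among negatives is a simplification of our own. 

```python
from collections import Counter


def joint_likelihood_ratios(pos_states, neg_states):
    """Likelihood ratio for each observed joint state.

    Each state is a tuple of 0/1 flags, one flag per data set.
    """
    pos_counts, neg_counts = Counter(pos_states), Counter(neg_states)
    n_pos, n_neg = len(pos_states), len(neg_states)
    ratios = {}
    for state in pos_counts.keys() | neg_counts.keys():
        p = pos_counts[state] / n_pos
        n = neg_counts[state] / n_neg
        ratios[state] = p / n if n > 0 else float("inf")
    return ratios


def marginal_ratio(pos_states, neg_states, index):
    """Likelihood ratio of one data set reporting a pair, ignoring the others."""
    p = sum(s[index] for s in pos_states) / len(pos_states)
    n = sum(s[index] for s in neg_states) / len(neg_states)
    return p / n


# Hypothetical, strongly correlated data sets.
pos_states = [(1, 1)] * 6 + [(1, 0)] + [(0, 1)] + [(0, 0)] * 2
neg_states = [(1, 1)] + [(1, 0)] * 4 + [(0, 1)] * 4 + [(0, 0)] * 91

joint = joint_likelihood_ratios(pos_states, neg_states)[(1, 1)]
naive = (marginal_ratio(pos_states, neg_states, 0)
         * marginal_ratio(pos_states, neg_states, 1))
print(f"joint L(1,1) = {joint:.0f}, naive product = {naive:.0f}")
```

            In this toy case the naive product is more than three times the 
            joint ratio, which is the kind of distortion a fully connected 
            network avoids when evidence is correlated. 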
             Finally, we combined the PIP, PIE, and gold-standard into a 
            total PI (PIT), which represents our most comprehensive 
            view of the known and putative protein complexes in yeast 
            (23). 
            Because the PIP and PIE data provide essentially 
            uncorrelated evidence for protein-protein interactions, 
            we chose a naïve network to construct the PIT. 
             Figure 
            1C gives an overview of how we compared the PIP, PIE, 
            gold-standard, and our new experiments. In particular, Fig. 
            2 shows the performance of the integration resulting 
            in the PIP and PIE. When tested against the 
            gold-standard, we observed that the ratio of true to 
            false positives (TP/FP) increases 
            monotonically with Lcut, confirming L 
            as an appropriate measure of the odds of a real 
            interaction. Conservatively estimated, protein pairs with 
            L > 600 have a better than 50% chance of being 
            in the same complex, suggesting Lcut = 600 as a 
            useful threshold (19). 
            Unless otherwise noted, we use this throughout our 
            analysis. It gives 9897 predicted interactions from the 
            PIP and 163 from the PIE. In contrast, likelihood ratios 
            derived from single genomic features (e.g., mRNA 
            coexpression) or from individual interaction experiments 
            (e.g., the Ho data set) did not exceed the cutoff when 
            used alone, with TP/FP values far below 1. 
            This demonstrates that information sources that, taken 
            alone, are only weak predictors of interactions can yield 
            reliable predictions when combined. 
             
              
            
            
              
              
                  Fig. 2. Comparison of PIP and 
                  PIE with each other and with the individual information 
                  sources. (A) The TP/FP ratio as a 
                  function of Lcut for the PIP and the 
                  individual data from which it was computed. The ratio is 
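                  computed as follows:  

                  \[ \frac{TP(L_{cut})}{FP(L_{cut})} = \frac{\sum_{L > L_{cut}} \mathrm{pos}(L)}{\sum_{L > L_{cut}} \mathrm{neg}(L)} \]
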
                  where pos(L) and neg(L) are the number of 
                  positives and negatives in the gold-standard with a given 
                  likelihood ratio L. The vertical line indicates our 
                  standard threshold Lcut = 600. (B) 
                  The same plot as in (A), but for the PIE. (C) 
                  Comparison of TP/FP ratios between the PIP and 
                  PIE. The abscissa represents the sensitivity of the 
                  probabilistic interactomes. The gray area indicates the gain 
                  of sensitivity of the PIP over the PIE for equal 
                  TP/FP ratios. The arrow shows the difference in 
                  sensitivity at TP/FP = 0.3. At this level, the 
                  PIP contains 183,295 protein pairs, of which 6179 are 
                  gold-standard positives (75% sensitivity), whereas the PIE 
                  contains 31,511 protein pairs and 1758 gold-standard positives 
                  among these (21% sensitivity). This difference in sensitivity 
                  between PIE and PIP illustrates the value of the de novo 
                  prediction. It also reflects, to some degree, that the 
                  experiments were done only on subsets of the genome and may 
                  have been measuring different types of interactions than the 
                  complexes' gold-standard, which we used to parameterize the 
                  PIP. The white circles show the performance of a voting 
                  procedure in which each of the four genomic features (from 
                  which we computed the PIP) contributed an additive vote. There 
                  are four possible outcomes in the additive voting procedure, 
                  depending on how many data sets contribute a positive vote (19). 
             
            
            The PIP had a higher sensitivity than the PIE for comparable 
            TP/FP ratios (Fig. 
            2C). ("Sensitivity" measures coverage and is defined 
            as TP/P, where P is the number of gold-standard 
            positives.) Specifically, the sensitivity of the PIP is 
             ~27% at our 
            cutoff. This may seem low, but compares favorably with 
            the PIE, which had a sensitivity of less than 1%. This 
            means that we can predict, at comparable error levels, 
            more complex interactions de novo than are present in the 
            high-throughput experimental interaction data sets. 
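             The sensitivity figures quoted in the caption of Fig. 2 
            (which refer to TP/FP = 0.3, not to the standard cutoff) follow 
            directly from this definition and the 8250 gold-standard 
            positives. A short check, with a function name of our own 
            choosing: 

```python
def sensitivity(true_positives, gold_positives):
    """Sensitivity (coverage) = TP / P."""
    return true_positives / gold_positives


GOLD_POSITIVES = 8250  # filtered MIPS complexes pairs quoted in the text

pip_sens = sensitivity(6179, GOLD_POSITIVES)  # PIP at TP/FP = 0.3
pie_sens = sensitivity(1758, GOLD_POSITIVES)  # PIE at TP/FP = 0.3
print(f"PIP: {pip_sens:.0%}, PIE: {pie_sens:.0%}")  # PIP: 75%, PIE: 21%
```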
            
             One might ask whether simpler voting procedures can match 
            the performance of more complicated machine-learning 
            methods such as Bayesian networks. To test this 
            hypothesis, we compared the PIP with a voting procedure 
            where each of the four genomic features contributes an 
            additive vote toward positive classification. We found 
            that the Bayesian network achieved greater sensitivity 
            for comparable TP/FP ratios (Fig. 
            2C) (19). 
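             For comparison, an additive voting procedure can be sketched 
            as follows. How each genomic feature is turned into a yes/no 
            vote is described in the supporting material and is not 
            reproduced here, so the boolean flags and pair names below are 
            hypothetical. 

```python
def vote_count(feature_flags):
    """Number of features voting for an interaction."""
    return sum(1 for flag in feature_flags if flag)


def classify_by_votes(feature_flags, min_votes):
    """Predict positive when at least min_votes features agree."""
    return vote_count(feature_flags) >= min_votes


# Hypothetical protein pairs, each with four binary feature votes.
pairs = {
    "pairA": (True, True, True, False),
    "pairB": (True, False, False, False),
    "pairC": (False, False, False, False),
}

predicted = sorted(name for name, flags in pairs.items()
                   if classify_by_votes(flags, min_votes=2))
print(predicted)
```

            Every vote counts equally in this scheme, whereas the 
            likelihood-ratio approach weights each feature by its measured 
            reliability. 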
            
             Figure 
            3 shows parts of the PIP and PIE graphs and how these 
            compare with the gold-standard and our new experiments. 
            First, to test whether the thresholded PIP was biased 
            toward certain complexes, we looked at the distribution 
            of predictions among gold-standard positives (Fig. 
            3A); they were roughly equally apportioned among the 
            different complexes, suggesting a lack of bias. 
            
             
              
            
            
              
              
                  Fig. 3. Representations of the 
                  thresholded PIP (de novo prediction) compared with different 
                  data sets. (A) The complete set of gold-standard 
                  positives and their overlap with the PIP. The PIP (green) 
                  covers 27% of the gold-standard positives (yellow). (B) 
                  A graph of the largest complexes in the PIP, i.e., only those 
                  proteins in the thresholded PIP having ≥20 links. (Left) 
                  Overlapping gold-standard positives are shown in green, PIE 
                  links in blue, and overlaps with both the PIE and 
                  gold-standard positives in black. (Right) Overlapping 
                  gold-standard negatives are shown in red. Regions with many 
                  red links indicate potential false-positive predictions. 
                  (C) Three PIP complexes that we partially verified by 
                  TAP-tagging. Each complex contains the proteins linked to a 
                  central protein (gray) after thresholding the PIP at 
                  Lcut = 300. Interactions verified by our 
                  TAP-tagging are shown in dark blue and PIE links in light 
                  blue; gray links indicate where TAP-tagging overlapped with 
                  PIE links. 
             
            
            We have thus far treated all interactions as independent. 
            However, the joint distribution of interactions in the 
            PIs can help identify large complexes: An ideal complex 
            should be a "clique" in an interaction graph (i.e., a 
            subgraph with N(N – 1)/2 links between 
            N proteins). Although this rarely happens in practice, 
            because of incorrect or missing links, large complexes 
            tend to have many interconnections within them, whereas 
            false-positive links to outside proteins tend to occur 
            randomly, without a coherent pattern (Fig. 
            4). 
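             The idea of isolating large complexes by requiring a minimum 
            number of links per protein can be sketched as below. The 
            single-pass filter on the original link counts is an assumption; 
            the text does not say whether links are recounted after 
            filtering. Protein names are placeholders. 

```python
from collections import defaultdict


def clique_link_count(n):
    """Number of links in a clique of n proteins: N(N - 1)/2."""
    return n * (n - 1) // 2


def filter_by_min_links(links, min_links):
    """Keep links whose two proteins each have at least min_links links.

    Link counts are taken once from the unfiltered graph (an assumption).
    """
    degree = defaultdict(int)
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    return [(a, b) for a, b in links
            if degree[a] >= min_links and degree[b] >= min_links]


# A four-protein clique plus one stray link to an outside protein X.
links = [("A", "B"), ("A", "C"), ("A", "D"),
         ("B", "C"), ("B", "D"), ("C", "D"),
         ("A", "X")]

kept = filter_by_min_links(links, min_links=3)
print(len(kept), clique_link_count(4))
```

            The stray link to X is dropped because X has only one link, 
            while all six links of the clique survive. 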
             
              
            
            
              
              
                  Fig. 4. TP/FP for 
                  subsets of the thresholded PIP that only include proteins with 
                  a minimum number of links. Requiring a minimum number of links 
                  isolates large complexes in the thresholded PIP graph (Fig. 
                  3B). Increasing the minimum number of links raises 
                  TP/FP by preserving the interactions among 
                  proteins in large complexes, while filtering out 
                  false-positive interactions with heterogeneous groups of 
                  proteins outside the complexes. 
             
            
            Figure 
            3B shows parts of the thresholded PIP that are restricted 
            to proteins with ≥20 links (23), 
            highlighting large complexes. Some predicted complexes 
            overlap with the gold-standard positives (cytoplasmic 
            ribosome) or the PIE (exosome, RNA polymerase I, 
            26S proteasome). Comparison with the gold-standard 
            negatives showed where the PIP likely produced false 
            complexes. Many protein associations only appear in the 
            PIP and thus potentially represent new interactions and 
            complexes. An interesting example is the mitochondrial 
            ribosome; it has appreciable overlap with both 
            gold-standard positives and the PIE and contains 
            plausible, newly predicted interactions with three 
            proteins (19). 
            
             To further test the predictions in the PIP, we conducted 
            TAP-tagging experiments, in which a protein expressed at 
            its normal intracellular concentration ("bait") is tagged 
            and used to "pull down" endogenous protein complexes. We 
            picked 98 proteins as TAP-tagging baits. These produced 
            424 experimental interactions overlapping with the PIP 
            thresholded at Lcut = 300. (Of these, 185, in 
            turn, overlapped with gold-standard positives, and 16 
            with negatives, highlighting the reliability of our 
            experiments.) 
             Figure 
            3C shows three examples of the overlap between the PIP 
            and TAP-tagging. We predicted that the putative DEAD-box 
            RNA helicase Dbp3 interacts with three other RNA 
            helicases (Hca4, Mak5, and Dbp7), with proteins 
            implicated in ribosomal RNA (rRNA) metabolism (e.g., 
            Nop2, Rrp5, Mak5, and components of RNA polymerase I), 
            and with Nsr1, the yeast homolog of mammalian Nucleolin 
            and a GAR domain–containing protein (24). 
            When Dbp3 was TAP-tagged and purified, we found 
            previously unknown interactions with Nsr1, Hca4, and 
            Nop1, connecting Dbp3 with known rRNA-processing 
            proteins. Further purifications with TAP-tagged versions 
            of Mak5, Rrp5, Dbp7, Dbp3, Nsr1, Hca4, and Nop2 verified 
            the physical association. 
             The nucleosome, a fundamental unit within chromatin, 
            provides a second example of overlap. It is composed of 
            eight histones (two H2A, two H2B, two H3, and two H4), 
            which can block RNA polymerase II progression. This 
            blockage is relieved upon interaction with the FACT 
            complex (also known as SPN or yFACT), which consists of 
            Spt16 and Pob3 in yeast. Mammalian Pob3 has a high mobility 
            group (HMG) domain for interaction with histones; however, 
            yeast Pob3 lacks this domain. Instead, the HMG protein 
            Nhp6 (with two virtually identical isoforms, Nhp6A and 
            Nhp6B) binds histones (25–27). 
            [Nhp6 also binds DNA in competition with the nucleosome 
            (28).] 
            Our thresholded PIP and experimental data document a 
            specific interaction between Nhp6A and Hhf1 (H4), 
            pinpointing the contact between the nucleosome and Nhp6 to 
            the H3-H4 heterodimer (Hhf1 and Hht1). This is plausible; 
            because Nhp6 has been shown not to influence nucleosome 
            reassembly (29), 
            it is unlikely that it binds with the H2A-H2B dimer, which 
            needs to reassociate with the nucleosome after binding 
            FACT. 
             The replication complex, a third experimental validation of 
            the PIP, assembles and disassembles from transiently 
            interacting subcomplexes (e.g., MCM proteins, ORC, and 
            polymerases) throughout the cell cycle (8, 
            30). 
            Our predicted and experimentally verified interactions 
            connect it, probably transiently, to another subcomplex, 
            replication factor A (RFA, composed of Rfa1, Rfa2, and 
            Rfa3). Specifically, we predicted and verified 
            interactions between RFA and two proteins associated with 
            other replication subcomplexes: Rfa2 with Top2 (a 
            component of the nuclear synaptonemal complex) and Rfa1 
            with Pri2 (DNA polymerase α–primase subunit). 
             Finally, we predicted and verified by TAP-tagging that two 
            proteins involved in translation elongation (Tef2 and 
            Eft2) interact. This is plausible given that protein 
            elongation is mediated by three factors in yeast: 
            EF-1α (Tef1, 
            Tef2), EF-2 (Eft1, Eft2), and EF-3 (Hef3, Yef3); most 
            other eukaryotes lack EF-3. Previous experimental data 
            suggest an interaction between yeast EF-1α and EF-3 (31). 
            An interaction between EF-1α and EF-2 had not 
            been demonstrated, although this is reasonable given their 
            similar roles in elongation and their overlapping binding 
            sites on the ribosome (32). 
            
             In summary, we have developed a Bayesian approach for 
            integrating weakly predictive genomic features into 
            reliable predictions of protein-protein interactions. Our 
            de novo prediction of complexes replicated interactions 
            found in the gold-standard positives and PIE. In 
            addition, we confirmed several of our predictions with 
            new experiments. The accuracy of the PIP was comparable 
            to that of the PIE while simultaneously achieving greater 
            coverage. 
             Our procedure lends itself naturally to the addition of more 
            features, possibly further improving results. We 
            anticipate that protein-protein interactions in organisms 
            other than yeast can be explored in similar ways. 
            
            References and Notes
            
 
             
              
              
            1. P. Uetz et al., Nature 403, 623 (2000).
            2. T. Ito et al., Proc. Natl. Acad. Sci. U.S.A. 98, 4569 (2001).
            3. A. C. Gavin et al., Nature 415, 141 (2002).
            4. Y. Ho et al., Nature 415, 180 (2002).
            5. I. Xenarios et al., Nucleic Acids Res. 30, 303 (2002).
            6. H. W. Mewes et al., Nucleic Acids Res. 30, 31 (2002).
            7. G. D. Bader et al., Nucleic Acids Res. 29, 242 (2001).
            8. R. Jansen, D. Greenbaum, M. Gerstein, Genome Res. 12, 37 (2002).
            9. A. Kumar et al., Genes Dev. 16, 707 (2002).
            10. C. von Mering et al., Nature 417, 399 (2002).
            11. C. M. Deane, L. Salwinski, I. Xenarios, D. Eisenberg, Mol. Cell. Proteomics 1, 349 (2002).
            12. A. M. Edwards et al., Trends Genet. 18, 529 (2002).
            13. G. D. Bader, C. W. Hogue, Nature Biotechnol. 20, 991 (2002).
            14. A. Kumar, M. Snyder, Nature 415, 123 (2002).
            15. E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, D. Eisenberg, Nature 402, 83 (1999).
            16. M. Steffen, A. Petti, J. Aach, P. D'Haeseleer, G. Church, BMC Bioinformatics 3, 34 (2002).
            17. R. Jansen, N. Lan, J. Qian, M. Gerstein, J. Struct. Funct. Genomics 2, 71 (2002).
            18. A. Drawid, M. Gerstein, J. Mol. Biol. 301, 1059 (2000).
            19. Materials and methods are available as supporting material on Science Online.
            20. T. R. Hughes et al., Cell 102, 109 (2000).
            21. R. J. Cho et al., Mol. Cell 2, 65 (1998).
            22. M. Ashburner et al., Nature Genet. 25, 25 (2000).
            23. See http://genecensus.org/intint.
            24. J. P. Girard et al., EMBO J. 11, 673 (1992).
            25. N. K. Brewster, G. C. Johnston, R. A. Singer, Mol. Cell. Biol. 21, 3491 (2001).
            26. A. A. Travers, EMBO Rep. 4, 131 (2003).
            27. T. Formosa et al., Genetics 162, 1557 (2002).
            28. Y. Yu, P. Eriksson, L. T. Bhoite, D. J. Stillman, Mol. Cell. Biol. 23, 1910 (2003).
            29. R. C. Bash, J. M. Vargason, S. Cornejo, P. S. Ho, D. Lohr, J. Biol. Chem. 276, 861 (2001).
            30. O. M. Aparicio, D. M. Weinstein, S. P. Bell, Cell 91, 59 (1997).
            31. M. Anand, K. Chakraburtty, M. J. Marton, A. G. Hinnebusch, T. G. Kinzy, J. Biol. Chem. 278, 6985 (2003).
            32. O. Kovalchuke, R. Kambampati, E. Pladies, K. Chakraburtty, Eur. J. Biochem. 258, 986 (1998).
            33. We thank C. Sander and G. Bader for critical discussions.

            Supporting Online Material 
             www.sciencemag.org/cgi/content/full/302/5644/449/DC1 
            
             Materials and Methods 
             Figs. S1 to S3 
             Tables S1 and S2 
             References 
             29 May 2003; accepted 29 August 
            2003 10.1126/science.1087361 Include this information when 
            citing this paper.
             
              Volume 302, Number 5644, Issue of 17 Oct 2003, 
            pp. 449-453.  Copyright © 2003 by The American Association for the 
            Advancement of Science. All rights reserved. 
            