- Research article
- Open Access
M-BISON: Microarray-based integration of data sources using networks
- Bernie J Daigle Jr¹ and
- Russ B Altman¹,²
https://doi.org/10.1186/1471-2105-9-214
© Daigle and Altman; licensee BioMed Central Ltd. 2008
- Received: 06 September 2007
- Accepted: 25 April 2008
- Published: 25 April 2008
Abstract
Background
The accurate detection of differentially expressed (DE) genes has become a central task in microarray analysis. Unfortunately, the noise level and experimental variability of microarrays can be limiting. While a number of existing methods partially overcome these limitations by incorporating biological knowledge in the form of gene groups, these methods sacrifice gene-level resolution. This loss of precision can be inappropriate, especially if the desired output is a ranked list of individual genes. To address this shortcoming, we developed M-BISON (Microarray-Based Integration of data SOurces using Networks), a formal probabilistic model that integrates background biological knowledge with microarray data to predict individual DE genes.
Results
M-BISON improves signal detection on a range of simulated data, particularly when using very noisy microarray data. We also applied the method to the task of predicting heat shock-related differentially expressed genes in S. cerevisiae, using an hsf1 mutant microarray dataset and conserved yeast DNA sequence motifs. Our results demonstrate that M-BISON improves the analysis quality and makes predictions that are easy to interpret in concert with incorporated knowledge. Specifically, M-BISON increases the AUC of DE gene prediction from 0.541 to 0.623 compared with a method using only microarray data, and M-BISON outperforms a related method, GeneRank. Furthermore, by analyzing M-BISON predictions in the context of the background knowledge, we identified YHR124W as a potentially novel player in the yeast heat shock response.
Conclusion
This work provides a solid foundation for the principled integration of imperfect biological knowledge with gene expression data and other high-throughput data sources.
Keywords
- Differentially Expressed
- Differentially Expressed Gene
- Biological Knowledge
- Standard Parameter Estimation
- Silver Standard
Background
By measuring the abundance of tens of thousands of mRNA transcripts at once, DNA microarrays have become an important laboratory tool for analyzing gene expression and its regulation over a wide range of conditions and cell types. Since their introduction in 1995 [1], microarrays have been used to measure expression both in large-scale multi-condition assays and in individual condition queries. Though multi-condition analyses are better known, individual condition queries are common, as most labs are interested in one or a few phenotypes and cannot afford to test all possible perturbations of a given cell type or organism. In particular, microarrays are often used to detect differentially expressed (DE) genes, defined here as genes whose transcripts are expressed at different levels between two conditions. In this work we consider differential expression to be a binary state; to estimate this state, we ask whether each gene's measured expression level differs significantly between the two conditions.
Unfortunately, even though limiting the number of tested conditions simplifies the experimental protocols, it also creates a problem. Typically, only a small number of genes are differentially expressed between two conditions; it therefore becomes difficult to separate the biologically relevant genes from the vast majority of genes that are unchanged or whose changes are artifactual. This stems in large part from the inherent noisiness and often poor reproducibility of the microarray assay, especially with respect to genes expressed at low levels [2, 3]. In addition, though differential expression (as we have defined it) is a binary characteristic, some genes undergo a greater fold change increase/decrease than others. It is relatively easy to detect differentially expressed genes with large changes in expression; the challenge is detecting those genes that exhibit a statistically significant but small fold change. Importantly, there is no guarantee that the most biologically relevant genes are those with the most extreme changes in expression; indeed, "innocent bystander" expression changes – unrelated to the phenotype of interest – may dilute the signal of the assay [4]. The difficulty in sensitively detecting expression changes is in part a result of methods that treat each transcript on the chip as an independent biological and statistical entity [5]. Examples of such methods include the t test, SAM, and the B statistic [6–8]. Yet it is widely accepted that gene products act together in networks to carry out their functions [9]. Although our knowledge of these networks is incomplete, it is attractive to have methods that use this information.
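To make the contrast concrete, here is a minimal Python sketch of the simplest gene-independent approach: an ordinary one-sample t statistic computed for each gene from its own replicate log ratios. The function name `per_gene_t` and the assumption of two-colour log-ratio replicates are ours, for illustration only. SAM and the B statistic refine the variance estimate, but, like this sketch, they never consult relationships between genes.

```python
import math
import statistics


def per_gene_t(log_ratios):
    """Ordinary one-sample t statistic for each gene, computed independently.

    log_ratios: one list of replicate log ratios (condition vs. reference)
    per gene. Each gene's statistic depends only on its own replicates.
    """
    t_stats = []
    for reps in log_ratios:
        n = len(reps)
        # Requires n >= 2 and non-zero spread across replicates.
        std_err = statistics.stdev(reps) / math.sqrt(n)
        t_stats.append(statistics.mean(reps) / std_err)
    return t_stats


# Illustrative replicate values, not data from the paper.
print(per_gene_t([[1.0, 2.0, 3.0], [0.1, -0.2, 0.05]]))
```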
Experimental approaches to these analysis challenges include performing more biological replicates and validating results using more precise assays such as qRT-PCR or protein quantification, but the former solution has a practical limit [10] and both approaches are resource-limited. Informatics solutions therefore offer an opportunity to leverage dependencies and functional relationships between genes to better identify DE genes. Lu et al. use a multivariate statistic (Hotelling's T²) to test for DE genes, though the dependency information here comes only from the microarray data and ignores additional known biology [5]. More commonly, researchers use methods that incorporate prior biological knowledge into microarray analysis to enhance the experimental signal. The knowledge can come from other high-throughput genomic assays, like chromatin immunoprecipitation or protein-protein interaction assays, or, more commonly, from gene functional annotation, like Gene Ontology terms [11] or KEGG pathway information [12]. Nearly all of these methods sacrifice resolution at the level of individual DE genes in favor of aggregated statistics at the presumably more robust DE "gene group" level. For example, several statistical approaches seek enrichment of gene groups in the high- or low-scoring end of a ranked list derived from a single-condition microarray experiment [13–18]. The latter three methods [16–18] have the advantage of not requiring an arbitrary cut-off between significant and non-significant genes, and all of these methods have been used with some success in detecting DE gene groups. However, there are certainly situations where gene-level resolution is more useful than gene group resolution, such as a search for candidate genes or potential biomarkers. While the members of the highest-scoring gene groups can be a source of high-scoring genes, it is not clear how to consistently compare the significance of genes from different groups.
Additionally, certain forms of biological knowledge, such as binary protein-protein interaction data, are not easily amenable to methods requiring discrete gene groupings.
Thus, we need a formal, consistent method to incorporate known biological relationships with gene expression data to identify differential expression at gene-level resolution. Such an approach should yield more accurate and reproducible identification of DE genes than would be possible with any data source alone or with a less sophisticated data integration strategy. One such method, GeneRank [19], uses the same principles as Google's PageRank algorithm [20] to rank differentially expressed genes based on gene expression data and prior biological knowledge. This method relies on a parameter d, which governs the relative weight given to knowledge versus expression data. The authors of GeneRank suggest setting d to 0.5 for general use, but even in simulation studies the optimal value of this parameter varies by dataset, and there is no obvious way to choose it. In addition, GeneRank was validated using simulations and perturbations of existing data, so it is not known how well the approach would work in a traditional unsupervised learning task evaluated against a proper gold standard.
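The PageRank-style idea behind GeneRank can be sketched in a few lines of Python. This is a minimal sketch, assuming the recurrence r = (1 − d)·ex + d·WᵀD⁻¹r, where W is the symmetric knowledge adjacency matrix, D is the diagonal matrix of gene degrees, and ex holds absolute expression changes. GeneRank itself solves the equivalent linear system directly and may normalize ex differently; because r is linear in ex, rescaling ex does not change the ranking. The function name `generank` and the degree-1 convention for isolated genes are our own illustrative choices.

```python
def generank(adjacency, expr_change, d=0.5, tol=1e-12, max_iter=10_000):
    """PageRank-style gene ranking (sketch of the GeneRank recurrence).

    Iterates r = (1 - d) * ex + d * W^T D^{-1} r, where W is a symmetric 0/1
    adjacency matrix (list of lists), D holds gene degrees (isolated genes are
    given degree 1 to avoid division by zero), and ex is the absolute
    expression change per gene. Requires 0 <= d < 1 so the iteration contracts.
    """
    if not 0 <= d < 1:
        raise ValueError("this sketch requires 0 <= d < 1")
    n = len(expr_change)
    ex = [abs(e) for e in expr_change]
    degree = [max(1, sum(row)) for row in adjacency]
    r = ex[:]
    for _ in range(max_iter):
        # (W^T D^{-1} r)_i = sum_j W[j][i] * r[j] / degree[j]
        new = [
            (1 - d) * ex[i]
            + d * sum(adjacency[j][i] * r[j] / degree[j] for j in range(n))
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new, r)) < tol:
            return new
        r = new
    return r


# Two linked genes; only the first shows an expression change.
print(generank([[0, 1], [1, 0]], [1.0, 0.0], d=0.5))  # approximately [2/3, 1/3]
```

With d = 0 the ranking is driven by expression alone; as d grows, a gene with no measured change inherits score from its knowledge neighbours, which is exactly the weighting that d controls.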
In this paper, we develop and implement M-BISON (Microarray-Based Integration of data SOurces using Networks), a formal probabilistic model that integrates biological knowledge with microarray data to better identify DE genes; we then validate it and apply it to an experimental dataset. Our model lets us assess the significance of each gene's difference in measured expression level between the conditions of interest, accounting for experimental noise and for expected correlations in expression with other genes based on biological knowledge. M-BISON relies on two parameters, α(DE) and α(NDE), and we introduce a strategy to choose their values. We use a simulation study to establish the limits of M-BISON's ability to improve results, in terms of both microarray data accuracy and knowledge structure. We then apply the method to a real-world biological dataset, predicting heat shock-related DE genes from a Saccharomyces cerevisiae hsf1 mutant microarray dataset [21] and conserved yeast DNA sequence motifs [22], and evaluating the predictions against a gold standard based on Hsf1p chromatin immunoprecipitation assays [23]. In doing so, we assess the accuracy and interpretability of M-BISON's predictions and compare M-BISON's performance with that of GeneRank.
Results
Simulated data
Parameters used to generate 54 simulated data combinations of 1000 genes each (six microarray datasets × nine knowledge sources)

Simulated microarray data

| Dataset | AUC0 | # DE genes |
|---|---|---|
| A | 0.60 | 100 |
| B | 0.75 | 100 |
| C | 0.91 | 100 |
| D | 0.62 | 200 |
| E | 0.75 | 200 |
| F | 0.91 | 200 |

Simulated biological knowledge

| Knowledge | β | RC (relative connectivity) |
|---|---|---|
| 1 | 0.50 | 0.91 |
| 2 | 1.00 | 0.91 |
| 3 | 1.64 | 0.91 |
| 4 | 0.50 | 2.27 |
| 5 | 1.00 | 2.27 |
| 6 | 1.64 | 2.27 |
| 7 | 0.50 | 4.55 |
| 8 | 1.00 | 4.55 |
| 9 | 1.64 | 4.55 |
Yeast dataset validation
Comparison to related method
Differential expression stories
Discussion
A robust and properly validated method to incorporate known biological relationships with gene expression data to identify gene-wise differential expression would be useful and has not been available. To address this need, we created M-BISON, a probabilistic approach to integrating biological knowledge with microarray data.
One might argue that in microarray expression analyses we should "let the data speak for themselves" rather than bias the analysis of differential expression with potentially inaccurate or irrelevant biological knowledge. We offer two responses. First, it is always possible (and advisable) to run unbiased analyses of the experimental data, but the data may be speaking in a noisy room with echoes, in which case having some idea of what the data may be saying can assist interpretation. Second, our method (as illustrated in the case studies of YOR020C and YBR045C) is an effective way to formally define "surprising results" and thus focus attention on potentially novel findings. If a gene's rank changes markedly between the unbiased analysis and the analysis with biological knowledge (YOR020C moved from rank 2762 in the unbiased analysis to rank 23 with M-BISON), then it deserves study: there is clearly an interesting tension between the data and current knowledge. If the biased and unbiased analyses rank a gene similarly, then the data and knowledge roughly agree and a startling discovery is less likely. Thus, M-BISON allows correct knowledge to assist the analysis while highlighting inconsistencies, so that potentially incorrect knowledge can be identified and reconsidered.
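Flagging "surprising results" of this kind amounts to comparing each gene's rank with and without knowledge. A minimal Python sketch follows; the function names and the sign convention (positive shift means the gene moved up once knowledge was added) are illustrative assumptions. By this measure, YOR020C's shift would be 2762 − 23 = 2739 places.

```python
def ranks(scores):
    """Rank 1 = highest score; ties keep input order (sorted() is stable)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    rank = [0] * len(scores)
    for position, gene_index in enumerate(order, start=1):
        rank[gene_index] = position
    return rank


def largest_rank_changes(genes, data_only_scores, knowledge_scores, top=10):
    """Return (gene, rank without knowledge, rank with knowledge, shift)
    tuples sorted by absolute shift; a positive shift means the gene moved up."""
    before = ranks(data_only_scores)
    after = ranks(knowledge_scores)
    shifts = [(g, b, a, b - a) for g, b, a in zip(genes, before, after)]
    shifts.sort(key=lambda row: -abs(row[3]))
    return shifts[:top]


# Hypothetical scores for three genes.
print(largest_rank_changes(["g1", "g2", "g3"], [3.0, 2.0, 1.0], [1.0, 2.0, 3.0], top=2))
```

Genes at the top of this list are where data and knowledge disagree most, and are therefore the ones worth a closer look.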
Biological knowledge can take many forms, including gene functional annotation, protein-protein interactions, DNA motifs, and microarray data compendia. Each of these sources can readily be represented as a graph whose nodes are genes and whose edges connect genes that share a biological function, code for interacting protein products, contain the same upstream DNA motif, or correlate significantly across array experiments. Using these graphs, we are able to represent knowledge ranging from labelled groups (e.g. Gene Ontology terms) to binary connections (e.g. protein interaction data) in a format amenable to a graphical model. In this manner, M-BISON can incorporate many different kinds of biological knowledge and treat them identically, governing their degree of influence with two parameters, α(NDE) and α(DE). Intuitively, α(NDE) and α(DE) determine how much weight the knowledge receives in decreasing and increasing, respectively, the overall DE score of each gene. With both parameters set to zero, the scores are calculated from the microarray data only; as the parameters' values increase, knowledge plays an increasingly large role.
Using simulated data, we tested M-BISON on a variety of data/knowledge structures and, using an empirical approach for combining results from runs obtained with different parameter values, demonstrated better performance in identifying DE genes in a subset of these structures. We acknowledge that such an empirical approach is unconventional in the context of undirected probabilistic models, but as mentioned in the Methods section, standard parameter estimation strategies were ineffective. Specifically, empirical M-BISON works well when applied to noisy microarray data (low AUC0), conditions resulting in a small number of DE genes, and knowledge sources with a high relative connectivity and/or high β value. While these characteristics might be viewed as limitations, we note they are also precisely the conditions that call for a method like M-BISON. First, improving DE gene identification with expression data is only really necessary when the data are noisy. Second, as the cost of additional experimentation often limits how many DE genes can be pursued in the lab, a method giving higher quality DE predictions is more useful when the overall number of DE genes is expected to be low. If there are many DE genes, experiments can be restricted to the most significant DE gene candidates, which even when identified with a method using only microarray data are likely to be correct. Finally, a knowledge source that is relevant to the conditions of the microarray experiment will likely have a high relative connectivity and β value. We note that while it is impossible to know the exact values of the above features without knowing the right answer, rough estimates of the first two can be obtained from an analysis of the microarray data, while the third can be made favorable by careful choice of the knowledge source.
The above three conditions pertain only to using empirical M-BISON. In our simulation studies, we have shown that using M-BISON with optimal parameter values yields better DE gene identification over a wide range of data and knowledge structures. Though the optimal parameter values vary for differing data-knowledge configurations, they are generally the same for different instances of the same configuration (data not shown). Thus, knowing the (in general, unknown) configuration of the data and biological knowledge would allow parameter values to be chosen intelligently and would expand the range of cases in which M-BISON is useful. We are currently focusing on this task.
We validated M-BISON on a yeast microarray dataset by showing improved performance in heat shock-related DE gene prediction. We used genome-wide conserved motif data as knowledge and Hsf1p transcription factor binding data as an approximation to a DE gene gold standard (a silver standard). Using empirical M-BISON, we observed a substantial increase in AUC and pAUC.2 compared with the B statistic applied to array data alone. Figure 6 illustrates the differences in high-scoring true positives between the B statistic, single parameter M-BISON, and empirical M-BISON. The empirical method gives the longest list, though its overlap with the B statistic list is extensive. The genes not present in the B statistic list are those that became significant due to their associated knowledge; there are nine such genes in the top 250 when using the empirical approach. Most of these genes make good biological sense: the top two genes, YER037W and YLR350W, share a Ste12 motif, which is predicted to be involved in stress response [28], and YBL075C and YPL106C are ATPases associated with the HSP70 family and the Hsp90 complex, respectively. Single parameter M-BISON produces the shortest list, but the one with the most genes unique to it. Nearly all of these true positives became significant due to DNA motif knowledge. As before, many of these unique genes make biological sense: YAL005C and YMR186W are both members of a heat shock protein family, YOR020C inhibits the activity of a heat shock protein, and YDR184C is involved in stress response. Interestingly, the top two genes of this list (YOR020C and YFL039C) share two motifs, Reb1 and ARS, neither of which has a known relation to heat or stress response. Each list has at least a few genes that do not appear in the other two, so all three methods of analysis have some value in identifying true positive genes.
We took a closer look at the behavior of M-BISON with respect to two gene stories: YOR020C and YBR045C. Displaying the effects of knowledge on these genes in the context of their knowledge neighbors showed that the boosting or lowering of their scores was biologically justified. The few genes whose behavior seemed to contradict the majority of their network neighbors were explained by expanding the knowledge neighborhood considered. In addition, viewing M-BISON's effect on the Ndt80 transcription factor (YHR124W) in the context of its associated knowledge suggests that it may be a novel player in heat shock/stress response pathways. Although the assumptions of the hypergeometric test for motif enrichment are not fully met (genes connected by knowledge will not have independent M-BISON scores), the extremely small p-value obtained for Ndt80 lends some support to this hypothesis.
Using physical DNA binding data as a substitute for a DE gene gold standard, while biologically reasonable, is clearly not perfect. To investigate possible shortcomings with this approach, we looked in detail at the top 50 high-scoring genes from the lists generated by the B statistic with microarray data only, single parameter M-BISON, and empirical M-BISON (Figure 7 shows results for the first two methods). For each gene, we searched for background information implicating it in stress or heat response. Four of six such genes uncovered by single parameter M-BISON were not considered truly DE by the Hsf1p physical binding data, while the B statistic found only one such gene. Thus, a perfect gold standard might be expected to reveal even better M-BISON performance, at least in the highest-scoring part of the gene list.
It is worth considering what would constitute a perfect gold standard for validating a DE gene predictor. Ideally, a more accurate quantification of RNA (e.g. qRT-PCR) would be used, but obtaining such a measurement for every gene on a microarray for a given condition is not feasible. Approaches used in the past include evaluating gene lists based on gene annotations or prior knowledge of a condition, but these methods are at best indirect and apply to only a subset of all genes. We believe that using a related high-throughput assay like ChIP-chip as a silver standard is a good compromise, as transcription factors (TFs) are expected to regulate the expression of many of their target genes. By choosing a closely related knowledge source like genome-wide motifs, which should predispose DE genes to be those containing motifs used by TFs, we hope to further minimize the discrepancy between the silver standard and the unknown truth. Though a standard like this will always miss some true positives (e.g. genes regulated only indirectly, as secondary targets), it is still useful for comparing the relative performance of DE prediction methods.
We compared M-BISON performance to that of a competing method, GeneRank, using the heat shock dataset. We note that GeneRank depends on a single parameter, d, whereas the M-BISON model requires specification of three: α(NDE), α(DE), and α(b), the first two of which are non-trivial to estimate. In terms of AUC and pAUC.2, M-BISON outperforms GeneRank, both in performance using optimal parameter choices and, more importantly, in performance when knowledge of the truth (and thus the values of the optimal parameters) is withheld. We believe the reason for such improved performance is two-fold. First, because M-BISON utilizes a probabilistic framework, uncertainty present in gene expression data is automatically incorporated into the model. This is in contrast to GeneRank, which uses log ratios of expression. We would expect this to improve results in much the same way that a statistically sound method of detecting DE genes with microarray data improves over fold-change based methods. We also note that though M-BISON requires a Gibbs sampling procedure for inference, the computational time required to evaluate a parameter combination is approximately the same as is needed by GeneRank, which must solve a large system of linear equations. Second, when considering the good performance of empirical M-BISON, we note that this approach effectively tries many combinations of parameter values for each gene, choosing the combination that yields the most significant p-value. In contrast, GeneRank uses one value of the parameter d, and thus we would expect better M-BISON performance in cases where different DE genes would benefit from different strengths of knowledge influence (i.e., differing values of α(NDE) and α(DE)).
One additional advantage of using a probabilistic model in M-BISON is that any genome-wide experimental modality using scores that can be converted to probabilities is a candidate for M-BISON. Thus, ChIP-chip, protein microarray, and even genome-wide association data might benefit from such a knowledge integration approach.
Several challenges remain in using M-BISON for its intended purpose. As mentioned above, one difficulty is choosing optimal parameter values in data-knowledge configurations that do not benefit from the use of the empirical approach. A second and related challenge is the proper handling of knowledge that is uninformative to the condition queried by the microarray experiment. In this case use of any amount of knowledge would be expected to introduce error into the results. By treating all knowledge as a structured graph, we believe we can solve this issue in the process of solving the first challenge, by predicting the unknown configuration of data and biological knowledge. Lastly, we should accommodate more complicated forms of knowledge. One example of this is continuously valued (rather than binary) connections between genes. This would prove useful if Gene Ontology semantic similarity measures between genes were used as knowledge. Another example is use of multiple forms of knowledge, each separately parameterized. With more than a few sources, a grid search becomes computationally prohibitive, and thus a more sophisticated approach to sampling parameter values becomes necessary.
Conclusion
In this work we have developed, validated, and made available M-BISON, a probabilistic method for integrating biological knowledge with gene expression data to identify DE genes. The method in its current state shows good performance in a useful subset of microarray data-biological knowledge configurations. In addition, our work provides a solid foundation to explore such a method's utility with other high-throughput data sources.
Methods
M-BISON
Our method allows relationships between genes, dictated by biological knowledge, to influence the prediction of DE genes from microarray data. Intuitively, it boosts the DE scores of genes that are connected through knowledge to many other high-scoring genes, and lowers the scores of genes that are connected to many low-scoring genes. This is in contrast to traditional methods of DE gene identification, which assign scores to each gene independently of all others.
where $v_k$ and $d_k$ represent the variance of $\hat{\beta}_k$ and the residual degrees of freedom from estimating $s_k^2$, respectively, and $\chi^2_{d_k}$ refers to a random variable having a chi-square distribution with $d_k$ degrees of freedom. True values for the random variables $\{\beta_k\}$ and $\{\sigma_k^2\}$ are unknown, and a non-zero value for $\beta_k$ implies that gene $k$ is DE. Unlike Smyth, we do not assume that the estimators $\hat{\beta}_k$ and $s_k^2$ from different genes are unconditionally independent; instead, we model them as conditionally independent given the states of hidden variables $\{h_k\}$:

$$\{h_k : h_k = I(\beta_k \neq 0),\ 1 \le k \le N\}$$
where $d_0$ and $v_0$ represent empirical estimates of the prior degrees of freedom and prior variance, respectively, and $t_d$ refers to a random variable having Student's t-distribution with $d$ degrees of freedom. In our model, the $\mu_k \mid h_k$ from different genes are also independent. These distributions lead directly to the two conditional densities of $\mu_k$, namely $p(\mu_k \mid h_k = 0)$ and $p(\mu_k \mid h_k = 1)$.
with $\mathbf{s} = [s_1, \ldots, s_N]^T$ and $Z$ a normalization constant over all possible state configurations.
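The likelihood and sampling steps described in this section can be sketched in Python. The defining equations are not preserved in this copy, so the sketch rests on two loudly labelled assumptions. (1) $\mu_k$ is a moderated t statistic in Smyth's sense, so that $\mu_k \mid h_k = 0 \sim t_{d_0+d_k}$ and $\mu_k \mid h_k = 1 \sim \sqrt{1 + v_0/v_k}\, t_{d_0+d_k}$. (2) A hypothetical pairwise prior in which `alpha_de` rewards edges joining two DE states and `alpha_nde` rewards edges joining two NDE states. This mirrors the intuition that scores rise with DE neighbours and fall with NDE neighbours, and reduce to data-only posteriors when both weights are zero. It is not claimed to be the exact M-BISON energy function, and `bias` is an illustrative term only. Per-gene scores are the fraction of post-burn-in Gibbs sweeps in which $h_k = 1$.

```python
import math
import random


def log_t_pdf(x, df):
    """Log density of Student's t distribution with df degrees of freedom."""
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi)
            - (df + 1) / 2 * math.log1p(x * x / df))


def log_likelihood_ratio(mu, d0, dk, v0, vk):
    """log p(mu | h=1) - log p(mu | h=0) under ASSUMPTION (1):
    mu ~ t_{d0+dk} if h = 0, and mu ~ sqrt(1 + v0/vk) * t_{d0+dk} if h = 1."""
    df = d0 + dk
    scale = math.sqrt(1.0 + v0 / vk)
    return (log_t_pdf(mu / scale, df) - math.log(scale)) - log_t_pdf(mu, df)


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gibbs_de_scores(llr, edges, alpha_de, alpha_nde, bias=0.0,
                    n_sweeps=2000, burn_in=200, seed=0):
    """Posterior DE frequencies under the HYPOTHETICAL prior (assumption 2):
    log prior = bias*sum(h) + alpha_de*sum_edges h_i h_j
                + alpha_nde*sum_edges (1-h_i)(1-h_j) + const.
    llr: per-gene log-likelihood ratios; edges: (i, j) index pairs."""
    n = len(llr)
    neighbours = [[] for _ in range(n)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    rng = random.Random(seed)
    h = [1 if x > 0 else 0 for x in llr]  # start from the data-only guess
    counts = [0] * n
    for sweep in range(n_sweeps):
        for k in range(n):
            n_de = sum(h[j] for j in neighbours[k])
            n_nde = len(neighbours[k]) - n_de
            # Full conditional log-odds of h_k = 1 given neighbours and data.
            log_odds = llr[k] + bias + alpha_de * n_de - alpha_nde * n_nde
            h[k] = 1 if rng.random() < _sigmoid(log_odds) else 0
        if sweep >= burn_in:
            for k in range(n):
                counts[k] += h[k]
    kept = n_sweeps - burn_in
    return [c / kept for c in counts]


# Illustrative values: gene 0 has a large statistic, gene 1 a small one.
llr = [log_likelihood_ratio(mu, d0=4, dk=2, v0=3.0, vk=1.0) for mu in (6.0, 0.5)]
print(gibbs_de_scores(llr, [(0, 1)], alpha_de=0.0, alpha_nde=0.0))
print(gibbs_de_scores(llr, [(0, 1)], alpha_de=3.0, alpha_nde=0.0))
```

With both weights at zero, each gene's score is simply the logistic function of its log-likelihood ratio (plus `bias`), i.e. a data-only analysis; raising `alpha_de` lifts the weakly supported gene because its neighbour is almost always in the DE state.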
Standard parameter estimation methods
M-BISON requires careful choice of values for α(NDE) and α(DE) to yield good performance, and these optimal values are a priori unknown. To learn these values in an unsupervised manner, we first considered using the standard Boltzmann machine learning algorithm [29]. This approach uses gradient ascent to maximize the likelihood of the observed data given the parameters. Because the model has a very large number of possible states, Gibbs sampling is used to generate samples from the distribution over states, and these samples are used to estimate the gradient. Unfortunately, this learning algorithm is known to be extremely slow for models with large numbers of nodes (our yeast dataset has ~6000) [32]. Approximate methods have been devised to address these difficulties, but as our model has only two free parameters, we instead used a grid search to identify parameter values that maximize the data likelihood. Using a cross-validation strategy to avoid overfitting, we discovered that for real datasets, maximum likelihood parameter estimates were not those that yielded the best performance. Faced with this discrepancy, we turned to empirical approaches instead of exploring more elaborate likelihood-based ones.
Empirical method to combine results from multiple runs
The parameterization of M-BISON trades off complexity in model specification for a small number of free parameters. We expect instances where different DE genes would benefit from different strengths of knowledge influence, so we investigated an approach that scores each gene based on a version of the model run with a potentially unique set of parameter values. In essence, this method integrates results from runs using many different combinations of parameter values.
This gives a new vector of scores $P^*(\alpha_0) = [P_1^*(\alpha_0), \ldots, P_N^*(\alpha_0)]^T$, which alone provides a way to eliminate false positive high scores that arise irrespective of the expression data.
We call this composite vector of scores . The above procedure allows the final score for each gene to come from the parameter value combination giving the most statistically significant result. In this way, we hope to take advantage of the strengths of all parameter value combinations in the grid without having to choose any single set. Pseudocode for the overall method is provided [see Additional file 2].
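The per-gene selection step can be sketched as follows. This is a minimal Python sketch, assuming the per-run significance values (the $P^*$ scores, one list per parameter combination) have already been computed; how they are computed is only partly preserved in this copy and is not reproduced here. The helper names `parameter_grid` and `composite_min_p` are hypothetical. Because each gene's value is the minimum over many runs, the composite values are best treated, as in the text, as scores for ranking rather than calibrated p-values.

```python
from itertools import product


def parameter_grid(alpha_nde_values, alpha_de_values):
    """All (alpha_nde, alpha_de) combinations to run the model with."""
    return list(product(alpha_nde_values, alpha_de_values))


def composite_min_p(pvalues_by_params):
    """For each gene, keep the smallest p-value over all parameter combinations.

    pvalues_by_params: dict mapping (alpha_nde, alpha_de) -> per-gene p-values
    from one model run (ASSUMED to be computed elsewhere).
    Returns one (p_value, (alpha_nde, alpha_de)) tuple per gene.
    """
    combos = list(pvalues_by_params)
    n_genes = len(pvalues_by_params[combos[0]])
    composite = []
    for g in range(n_genes):
        best = min(combos, key=lambda c: pvalues_by_params[c][g])
        composite.append((pvalues_by_params[best][g], best))
    return composite


# Hypothetical p-values from two runs for two genes.
runs = {(0.0, 0.0): [0.5, 0.01], (0.0, 1.0): [0.02, 0.3]}
print(composite_min_p(runs))
```

Recording the winning combination alongside each score shows how strongly knowledge had to be weighted for that gene to become significant.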
Data simulation
We simulate microarray data as normally distributed random variables whose parameters are determined from a highly replicated yeast expression dataset [33]. DE genes are drawn from a $N(m_i, 0.2)$ distribution with $m_i \sim N(0, 0.25)$; NDE genes are drawn from a $N(0, 0.2)$ distribution. We achieve differing accuracies of array datasets by sampling different numbers of experimental replicates.
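The data-generating process can be sketched in Python. The text does not say whether the second argument of $N(\cdot,\cdot)$ is a variance or a standard deviation, so this sketch assumes a variance; the function name `simulate_arrays` and the convention that the first `n_de` genes are the DE ones are illustrative choices. Drawing fewer replicates per gene yields noisier per-gene estimates, which is how dataset accuracy is varied.

```python
import random


def simulate_arrays(n_genes, n_de, n_reps, seed=0):
    """Simulate replicate measurements; the first n_de genes are DE.

    ASSUMPTION: the second argument of N(., .) in the text is a variance,
    so standard deviations are sqrt(0.2) and sqrt(0.25).
    """
    rng = random.Random(seed)
    noise_sd = 0.2 ** 0.5
    data, is_de = [], []
    for g in range(n_genes):
        de = g < n_de
        # m_i ~ N(0, 0.25) for DE genes; NDE genes are centred on 0.
        mean = rng.gauss(0.0, 0.25 ** 0.5) if de else 0.0
        data.append([rng.gauss(mean, noise_sd) for _ in range(n_reps)])
        is_de.append(de)
    return data, is_de


# e.g. 1000 genes with 100 DE genes, three replicates each
data, is_de = simulate_arrays(1000, 100, 3)
```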
We simulate a knowledge source by first dropping N points uniformly at random into a unit 3-dimensional cube. We add edges between any two points within Euclidean distance r of each other and then remove all edges from nodes with degree ≤ c. The values of r and c are chosen empirically (and often separately for DE and NDE genes) to yield the desired values of MD, β, MCC, and RC.
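A minimal Python sketch of this random geometric graph construction follows. It assumes a single pruning pass: degrees are computed once, before any edges are removed, since the text does not say whether pruning is repeated. It also uses one pair (r, c) for all genes rather than separate values for DE and NDE genes. The function name `simulate_knowledge` is hypothetical.

```python
import math
import random


def simulate_knowledge(n, r, c, seed=0):
    """Random geometric graph in the unit cube, then drop edges of low-degree nodes.

    Returns a set of (i, j) edges with i < j. Pruning is a single pass
    (ASSUMPTION) based on degrees before any removal.
    """
    rng = random.Random(seed)
    points = [(rng.random(), rng.random(), rng.random()) for _ in range(n)]
    edges = {(i, j) for i in range(n) for j in range(i + 1, n)
             if math.dist(points[i], points[j]) <= r}
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    low = {k for k in range(n) if degree[k] <= c}
    return {(i, j) for i, j in edges if i not in low and j not in low}


print(len(simulate_knowledge(100, 0.2, 1)))
```

Larger r produces denser knowledge; larger c strips out sparsely connected genes, which together shape quantities such as relative connectivity.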
We evaluate M-BISON performance on simulated datasets, given the known DE genes, by calculating the area under the ROC curve up to false positive rate cutoffs of 0.2 (pAUC.2) and 1 (AUC). The R package ROCR [36] was used for this purpose.
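For readers outside R, the same two measures can be sketched in Python. This is a minimal sketch, not ROCR itself: it builds ROC points by sweeping the score threshold from high to low (tied scores form one diagonal step) and integrates with the trapezoid rule up to `fpr_stop`, interpolating the last segment. The partial area is left unnormalized, so a perfect ranking gives 0.2 at `fpr_stop=0.2`. The name `roc_auc` is illustrative.

```python
def roc_points(scores, labels):
    """ROC (fpr, tpr) points, sweeping the threshold from high to low.
    Tied scores are processed as one step (a diagonal segment)."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(1 for _, y in pairs if y)
    n_neg = len(pairs) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(scores, labels, fpr_stop=1.0):
    """Unnormalized area under the ROC curve for fpr in [0, fpr_stop]."""
    pts = roc_points(scores, labels)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= fpr_stop:
            break
        if x1 > fpr_stop:
            # Interpolate the TPR at the cutoff (x1 > x0 here, so no 0-division).
            y1 = y0 + (y1 - y0) * (fpr_stop - x0) / (x1 - x0)
            x1 = fpr_stop
        area += (x1 - x0) * (y0 + y1) / 2
    return area


scores, labels = [0.9, 0.8, 0.7, 0.3, 0.2], [1, 0, 1, 0, 0]
print(roc_auc(scores, labels), roc_auc(scores, labels, fpr_stop=0.2))
```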
Biological validation
To demonstrate the utility of M-BISON on real biological data, we chose a yeast dataset querying the heat shock transcription factor gene (hsf1). We use gene expression data from [21] and biological knowledge from [22]. For the latter, we include any upstream conserved motif with a score of 55 or above, and we connect two genes in the knowledge graph if they share at least one motif. As a silver standard, we use ChIP-chip results from [23], which report 165 unique ORFs bound by Hsf1p. Thus, we evaluate M-BISON's performance, in terms of AUC, in predicting likely heat shock-related DE genes (genes physically bound by Hsf1p) given relevant expression data and biological knowledge. A table listing each gene's expression value, M-BISON score, and upstream motifs is provided [see Additional file 1].
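The motif-to-graph rule can be sketched in a few lines of Python. The input format, a list of (gene, motif, score) tuples, is an assumption about how the motif data might be laid out, and the gene names, motif names, and scores below are hypothetical. Each motif induces a clique among the genes carrying it, which is also how any labelled gene group becomes a set of graph edges.

```python
from itertools import combinations


def motif_knowledge_graph(motif_hits, min_score=55):
    """Connect two genes if they share at least one motif scoring >= min_score.

    motif_hits: iterable of (gene, motif, score) tuples.
    Returns a set of undirected edges as sorted (gene_a, gene_b) pairs.
    """
    genes_by_motif = {}
    for gene, motif, score in motif_hits:
        if score >= min_score:
            genes_by_motif.setdefault(motif, set()).add(gene)
    edges = set()
    for genes in genes_by_motif.values():
        # Every pair of genes carrying the same motif becomes an edge.
        edges.update(combinations(sorted(genes), 2))
    return edges


hits = [("geneA", "motif1", 60), ("geneB", "motif1", 70), ("geneC", "motif1", 40),
        ("geneC", "motif2", 80), ("geneA", "motif2", 55)]
print(motif_knowledge_graph(hits))
```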
Comparison to existing methods
We downloaded the MATLAB code for GeneRank, available as an additional file on the BMC Bioinformatics website. GeneRank relies on one parameter, d, which can take on values from 0 to 1. We ran GeneRank using each value of d from 0 to 1 in 0.01 increments on the hsf1 data and knowledge listed above and compared its performance to M-BISON using AUC.
Differential expression story
We generated graphical representations of all knowledge networks using the tkplot function of the R igraph package. Layouts were created using the Fruchterman-Reingold algorithm with default parameters.
Hypergeometric p-values for motif enrichment were calculated using the phyper function in the R stats package. This function computes the probability due to chance of seeing a given or better enrichment of motifs at the top of a ranked list.
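The upper-tail hypergeometric probability can be written out directly, which makes explicit which counts enter the test. A minimal Python sketch follows; the argument names are our own. For these arguments it computes the same quantity as R's `phyper(k - 1, n_motif, n_total - n_motif, n_top, lower.tail = FALSE)`, i.e. the chance of seeing at least `k` motif-carrying genes among the top `n_top` of `n_total` ranked genes when `n_motif` genes carry the motif.

```python
from math import comb


def enrichment_p(k, n_top, n_motif, n_total):
    """P(X >= k) for X ~ Hypergeometric: n_top genes drawn from n_total,
    of which n_motif carry the motif."""
    upper = min(n_motif, n_top)
    hits = sum(comb(n_motif, i) * comb(n_total - n_motif, n_top - i)
               for i in range(k, upper + 1))
    return hits / comb(n_total, n_top)


# e.g. 5 of the top 20 genes carry a motif found in 50 of 6000 genes
print(enrichment_p(5, 20, 50, 6000))
```

As the Discussion notes, genes linked by knowledge do not receive independent M-BISON scores, so the sampling assumptions behind this probability are only approximately met.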
Availability and requirements
We implemented M-BISON and its associated evaluation functions as a documented R package, available at simtk.org under the project name "m-bison".
Declarations
Acknowledgements
BJD is supported by the Geraldine Jackson Fuhrman Stanford Graduate Fellowship and an HHMI predoctoral fellowship. RBA is supported by NIH grant 5U01GM061374-08 for the PharmGKB. This work is supported by a gift from Microsoft. We thank David Heckerman for useful suggestions and comments on the work.
Authors’ Affiliations
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.