Two Statistical Problems for Inference to Regulatory Structure from Associations of Gene Expression Measurements with Microarrays
Abstract
Of the many proposals for inferring genetic regulatory structure from microarray measurements of mRNA transcript hybridization, several aim to estimate regulatory structure from the associations of gene expression levels measured in repeated samples. The repeated samples may come from a single experimental condition or from several distinct experimental conditions; they may be “equilibrium” measurements or time series; and the associations may be estimated by correlation coefficients, by conditional frequencies (for discretized measurements), or by some other statistic. This paper describes two elementary statistical difficulties for all such procedures, whether they are based on Bayesian updating, conditional independence testing, or other machine learning procedures such as simulated annealing or neural net pruning. One difficulty arises if large numbers of cells are aggregated in a measurement of expression levels from a common population of cells; the other arises if small numbers of cells are aggregated, or if samples are separately aggregated over different populations of cells.
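The Python sketch below illustrates, under hypothetical assumptions, one way each difficulty can arise. It is a toy simulation, not a model of real transcription. For the first difficulty, it assumes a per-cell chain X → Y → Z in which Z depends on X only through Y, and it assumes that cells are independent and identically distributed. Each measurement is taken to be the sum over 100 cells. With many cells per sample, the sums are approximately multivariate normal, and their covariance is the per-cell covariance scaled by the cell count. Conditional independence among the sums is therefore governed by the per-cell *linear* partial correlation, which vanishes for a linear chain but need not vanish when the per-cell relations are nonlinear. For the second difficulty, it shows one familiar mechanism: X and Z are independent within each of two populations, yet correlated once samples from both populations are pooled. The helper names (`pearson`, `partial_corr`, `aggregated_samples`, `population`), the functional forms, the noise levels, and the sample sizes are all illustrative assumptions.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def partial_corr(x, z, y):
    """Linear partial correlation of x and z controlling for y."""
    rxz, rxy, rzy = pearson(x, z), pearson(x, y), pearson(z, y)
    return (rxz - rxy * rzy) / math.sqrt((1 - rxy ** 2) * (1 - rzy ** 2))


def aggregated_samples(f, g, cells_per_sample, n_samples, rng):
    """Hypothetical per-cell chain X -> Y -> Z. Each returned sample is the
    sum of X, Y and Z over `cells_per_sample` independent cells."""
    sx, sy, sz = [], [], []
    for _ in range(n_samples):
        tx = ty = tz = 0.0
        for _ in range(cells_per_sample):
            x = rng.uniform(1.0, 3.0)
            y = f(x) + rng.gauss(0.0, 0.05)  # Y depends on X plus noise
            z = g(y) + rng.gauss(0.0, 0.05)  # Z depends on X only through Y
            tx, ty, tz = tx + x, ty + y, tz + z
        sx.append(tx)
        sy.append(ty)
        sz.append(tz)
    return sx, sy, sz


def population(mean, size, rng):
    """Hypothetical population in which X and Z are independent."""
    xs = [rng.gauss(mean, 1.0) for _ in range(size)]
    zs = [rng.gauss(mean, 1.0) for _ in range(size)]
    return xs, zs


rng = random.Random(0)

# Difficulty 1: each measurement sums 100 cells from a common population.
lin = aggregated_samples(lambda x: 2.0 * x, lambda y: 0.5 * y, 100, 500, rng)
nonlin = aggregated_samples(lambda x: x * x,
                            lambda y: math.sqrt(max(y, 0.0)), 100, 500, rng)
pc_linear = partial_corr(lin[0], lin[2], lin[1])
pc_nonlinear = partial_corr(nonlin[0], nonlin[2], nonlin[1])

# Difficulty 2 (one mechanism): samples come from two different populations.
xa, za = population(0.0, 1000, rng)
xb, zb = population(3.0, 1000, rng)
r_within_a = pearson(xa, za)
r_within_b = pearson(xb, zb)
r_pooled = pearson(xa + xb, za + zb)  # list concatenation pools the samples

print(f"partial corr of sums, linear chain:    {pc_linear:+.2f}")
print(f"partial corr of sums, nonlinear chain: {pc_nonlinear:+.2f}")
print(f"corr within populations: {r_within_a:+.2f}, {r_within_b:+.2f}")
print(f"corr after pooling populations: {r_pooled:+.2f}")
```

In this sketch, the linear chain gives a partial correlation of the summed measurements near zero. The nonlinear chain gives a clearly nonzero one, even though in every individual cell Z depends on X only through Y. A procedure that reads conditional independence off the aggregated data would therefore miss the screening-off relation. Likewise, pooling the two hypothetical populations produces a substantial X–Z correlation that is absent within each population.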