The process for evaluating chemical safety is inefficient, costly, and animal intensive. There is growing consensus that the current process of safety testing needs to be significantly altered to improve efficiency and reduce the number of untested chemicals. In this study, the use of short-term gene expression profiles was evaluated for predicting the increased incidence of mouse lung tumors. Animals were exposed to a total of 26 diverse chemicals with matched vehicle controls over a period of three years. Upon completion, significant batch-related effects were observed. Adjustment for batch effects significantly improved the ability to predict increased lung tumor incidence. For the best statistical model, the estimated predictive accuracy under honest five-fold cross-validation was 79.3% with a sensitivity and specificity of 71.4 and 86.3%, respectively. A learning curve analysis demonstrated that gains in model performance reached a plateau at 25 chemicals, indicating that the size of the current data set was sufficient to provide a robust classifier. The classification results showed a small subset of chemicals contributed disproportionately to the misclassification rate. For these chemicals, the misclassification was more closely associated with genotoxicity status than efficacy in the original bioassay. Statistical models were also used to predict dose-response increases in tumor incidence for methylene chloride and naphthalene. The average posterior probabilities for the top models matched the results from the bioassay for methylene chloride. For naphthalene, the average posterior probabilities for the top models over-predicted the tumor response, but the variability in predictions were significantly higher. The study provides both a set of gene expression biomarkers for predicting chemically-induced mouse lung tumors as well as a broad assessment of important experimental and analysis criteria for developing microarray-based predictors of safety-related endpoints.