Illumina HiSeq 2500 sequencing; GSM2047323: NA19098-r1-A01; Homo sapiens; RNA-Seq

NA19098-r1-A01

Single cell RNA sequencing (scRNA-seq) can be used to characterize variation in gene expression levels at high resolution. However, the sources of experimental noise in scRNA-seq are not yet well understood. We investigated the technical variation associated with sample processing using the single cell Fluidigm C1 platform. To do so, we processed three C1 replicates from three human induced pluripotent stem cell (iPSC) lines. We added unique molecular identifiers (UMIs) to all samples, to account for amplification bias. We found that the major source of variation in the gene expression data was driven by genotype, but we also observed substantial variation between the technical replicates. We observed that the conversion of reads to molecules using the UMIs was impacted by both biological and technical variation, indicating that UMI counts are not an unbiased estimator of gene expression levels. Based on our results, we suggest a framework for effective scRNA-seq studies.

Overall design: We collected single cell RNA-seq (scRNA-seq) data from three YRI iPSC lines using the Fluidigm C1 microfluidic system followed by sequencing. We added ERCC spike-in controls to each sample, and used 5-bp random sequence UMIs to allow for the direct quantification of mRNA molecule numbers. For each of the YRI lines, we performed three independent C1 collections; each replicate was accompanied by processing of a matching bulk sample using the same reagents. This study design allows us to estimate error and variability associated with the technical processing of the samples, independently from the biological variation across single cells of different individuals. We were also able to estimate how well scRNA-seq data can recapitulate the RNA-seq results from population bulk samples. We combined the 96 single cell samples from each C1 chip into their own master mix and sequenced across three lanes of a HiSeq 2500 (3 individuals x 3 replicates x 96 wells x 3 lanes = 2592 files).
We prepared two separate library preparations for each bulk sample, combined them all into one master mix, and sequenced across four lanes (3 individuals x 3 replicates x 2 library preparations x 4 lanes = 72 files).
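
The file counts quoted in the overall design can be checked by enumerating the design factors. The following is a minimal sketch, not part of the submitted record: only NA19098, r1 and A01 appear in this record, so the other two line names, the lane and library-preparation labels, the file-name pattern, and the assumption that the 96 wells are labelled A01 to H12 are all placeholders chosen for illustration.

```python
from itertools import product

# Hypothetical labels: only "NA19098", "r1" and "A01" come from this record.
individuals = ["NA19098", "line2", "line3"]   # "line2"/"line3" are placeholders
replicates = ["r1", "r2", "r3"]
# Assumed plate-style labelling: 8 rows (A-H) x 12 columns (01-12) = 96 wells.
wells = [f"{row}{col:02d}" for row in "ABCDEFGH" for col in range(1, 13)]
single_cell_lanes = ["L1", "L2", "L3"]        # placeholder lane labels

# One file per individual x replicate x well x lane.
single_cell_files = [
    f"{ind}-{rep}-{well}.{lane}"
    for ind, rep, well, lane in product(individuals, replicates, wells, single_cell_lanes)
]

bulk_preps = ["prep1", "prep2"]               # two library preparations per bulk sample
bulk_lanes = ["L1", "L2", "L3", "L4"]         # placeholder lane labels

# One bulk library per individual x replicate x preparation.
bulk_libraries = list(product(individuals, replicates, bulk_preps))
# One file per bulk library x lane.
bulk_files = [
    f"{ind}-{rep}-bulk-{prep}.{lane}"
    for (ind, rep, prep), lane in product(bulk_libraries, bulk_lanes)
]

print(len(wells), len(single_cell_files), len(bulk_libraries), len(bulk_files))
# prints: 96 2592 18 72
```

The totals agree with the 2592 single cell files and 72 bulk files stated above, and with the 18 bulk libraries mentioned in the protocol.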

Batch effects and the effective design of single-cell gene expression studies

Submitted by Gene Expression Omnibus on 09-JUL-2016

Single cell loading and capture were performed following the Fluidigm manual "Using C1 to Generate Single-Cell cDNA Libraries for mRNA Sequencing Protocol" (PN 100-7168). Briefly, 30 ul of C1 Suspension Reagent was added to a 70-ul aliquot of ~17,500 cells. Five ul of this cell mix were loaded onto a 10-17 um C1 Single-Cell Auto Prep IFC microfluidic chip (Fluidigm), and the chip was then processed on a C1 instrument using the cell-loading script according to the manufacturer's instructions. Using the standard staining script, the iPSCs were stained with StainAlive TRA-1-60 Antibody (Stemgent, PN 09-0068). The capture efficiency and TRA-1-60 staining were then inspected using the EVOS FL Cell Imaging System (ThermoFisher) (supplemental Table X). Immediately after imaging, reverse transcription and cDNA amplification were performed in the C1 system using the SMARTer PCR cDNA Synthesis kit (Clontech) and the Advantage 2 PCR kit (Clontech) according to the instructions in the Fluidigm user manual with minor changes to incorporate UMI labeling (Islam et al. 2014). Specifically, the reverse transcription primer and the 1:50,000 Ambion® ERCC Spike-In Mix1 (Life Tech) were added to the lysis buffer, and the template-switching oligos, which contain the UMI (5-bp random sequence), were included in the reverse transcription mix. When the run finished, full-length, amplified, single-cell cDNA libraries were harvested in a total of approximately 13 ul of C1 Harvesting Reagent and quantified using DNA High Sensitivity LabChip (Caliper). A bulk sample, a 40-ul aliquot of ~10,000 cells, was collected in parallel with each C1 chip using the same reaction mixes following the C1 protocol of "Tube Controls with Purified RNA" (PN 100-7168, Appendix A). For sequencing library preparation, fragmentation and isolation of 5' fragments were performed according to the UMI protocol (Islam et al. 2014).
Instead of using commercially available Tn5 transposase, Tn5 protein stock was freshly purified in house using the IMPACT system (pTXB1, NEB) following the protocol previously described (Picelli et al. 2014). The activity of Tn5 was tested and shown to be comparable with the EZ-Tn5-Transposase (Epicentre). Importantly, all the libraries in this study were generated using the same batch of purified Tn5 protein. For each of the bulk samples, two libraries were generated using two different indices in order to get sufficient material. All of the 18 bulk libraries were then pooled and labelled as the "bulk" for sequencing.
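
The UMI labeling described above is what allows reads to be converted to molecule counts: reads that share a gene and a UMI are treated as amplification copies of one original molecule. The following is a simplified sketch of that idea and not the processing pipeline used for this dataset, which this record does not describe. The function name `count_molecules`, the (gene, UMI) input format, and the toy reads are hypothetical. Keying by gene only is a simplifying assumption, and the sketch does not model sequencing errors within the UMI.

```python
from collections import defaultdict

UMI_LENGTH = 5                        # 5-bp random sequence, as stated in the protocol
MAX_DISTINCT_UMIS = 4 ** UMI_LENGTH   # 4 bases ^ 5 positions = 1024 possible UMIs

def count_molecules(reads):
    """Collapse (gene, umi) read records into per-gene (read count, molecule count).

    Hypothetical helper: reads with the same gene and UMI are counted as one molecule.
    """
    umis_by_gene = defaultdict(set)
    reads_by_gene = defaultdict(int)
    for gene, umi in reads:
        # Skip UMIs of the wrong length or containing non-ACGT bases (e.g. N).
        if len(umi) != UMI_LENGTH or set(umi) - set("ACGT"):
            continue
        umis_by_gene[gene].add(umi)
        reads_by_gene[gene] += 1
    return {gene: (reads_by_gene[gene], len(umis)) for gene, umis in umis_by_gene.items()}

# Toy input, invented for illustration only.
toy_reads = [
    ("geneA", "ACGTA"), ("geneA", "ACGTA"), ("geneA", "TTGCA"),
    ("geneB", "GGGCC"), ("geneB", "GGNCC"),
]
counts = count_molecules(toy_reads)
print(counts)
# prints: {'geneA': (3, 2), 'geneB': (1, 1)}
```

With a 5-bp UMI, at most 1024 distinct molecules per gene can be distinguished under this scheme, so the molecule count of a very highly expressed gene can saturate.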