Identifying genes preferentially expressed in undifferentiated embryonic stem cells
© Li and Leder; licensee BioMed Central Ltd. 2007
Received: 09 March 2007
Accepted: 28 August 2007
Published: 28 August 2007
The mechanism involved in the maintenance and differentiation of embryonic stem (ES) cells is incompletely understood.
To address this issue, we have developed a retroviral gene trap vector that can target genes expressed in undifferentiated ES cells. This gene trap vector harbors both GFP and Neo reporter genes. G-418 drug resistance was used to select ES clones in which the vector was integrated into transcriptionally active loci. This was followed by GFP FACS profiling to identify ES clones with reduced GFP fluorescence and, hence, reduced transcriptional activity when ES cells differentiate. For six of the three hundred ES clones in our pilot screen, Northern blot analysis confirmed that the trapped gene was down-regulated during ES cell differentiation. These six ES clones represent four different genes. Among the six integration sites, one was at Zfp-57, whose gene product is known to be enriched in undifferentiated ES cells. Three were located in an intron of a novel isoform of CSL/RBP-Jkappa, which encodes the key transcription factor of the LIN-12/Notch pathway. Another was inside a gene that may encode noncoding RNA transcripts. The last integration event occurred at a locus that may harbor a novel gene.
Taken together, we demonstrate the use of a novel retroviral gene trap vector in identifying genes preferentially expressed in undifferentiated ES cells.
Stem cells offer hope of potential therapies for diseases as disparate as diabetes, Parkinson's and Alzheimer's disease. Naturally, knowledge of the intrinsic properties of stem cells must be gained before the hope of using stem cells for therapeutic purposes becomes a reality. In particular it will be important to know how these cells that retain the ability to continuously self-renew in a multipotential state can be maintained in this undifferentiated state.
There are more than a dozen different kinds of stem cells in mammals, including those that retain the ability of totipotential differentiation, the so-called embryonic stem cells or pluripotential stem cells. Recent data suggest that some adult stem cells are more plastic than originally thought. Not only can certain types of adult stem cells be induced to differentiate along several cell lineages (termed transdifferentiation), it has also been observed that differentiated cells can be reverted to stem cell-like cells (termed dedifferentiation) and redirected to other cell lineages. Although it has been recently shown that cell fusion could provide an alternative explanation for the phenomenon of transdifferentiation [3, 4], it is still consistent with the notion that there are some key regulators in stem cells that keep them in an undifferentiated pluripotent state.
We are interested in identifying these determinants since they are likely to play an important role in the maintenance, proliferation, survival or differentiation of stem cells. ES cells should express the determinants typical of all stem cells. They can be easily maintained in culture and are amenable to both genetic and molecular manipulations. Thus, we have chosen to use mouse ES cells as a model system in which to isolate determinants that maintain stem cells in their undifferentiated state or promote the differentiation of stem cells into a certain lineage. One assumption for such stem cell determinants is that at least some are only expressed in undifferentiated stem cells, but not in differentiated cells. Indeed, the Oct-4 and the Nanog genes, both of which encode a homeodomain transcription factor, are required for the establishment of the undifferentiated state of ES cells and are rapidly down-regulated in differentiated cells [5–10]. The expression of another transcription factor, Rex-1, also correlates with the undifferentiated state of ES cells [11, 12]. To identify additional genes that behave similarly to Oct-4, Nanog and Rex-1, we have used a gene trap approach coupled with GFP FACS analysis to screen for genes that are expressed in undifferentiated ES cells, but dramatically down-regulated in the differentiated cells. Among these genes, presumably some play an essential role as determinants of the undifferentiated state or factors in promoting the differentiation of ES cells into a particular lineage.
Gene trap is a genetic approach that utilizes promoter-less reporters to target endogenous functional genes [13–16]. It could be either plasmid- or retrovirus-based. The advantage of using a retroviral vector is that retroviral infection normally results in a single integration event. We have constructed a novel promoter-less retroviral gene trap vector harboring both GFP and Neo reporter genes that can only be expressed when the vector is integrated into an active endogenous gene. By utilizing this vector, coupled with GFP FACS profiling of 300 ES clones, we have successfully isolated 30 candidate clones that displayed less GFP fluorescence when the ES cells differentiated. We confirmed by RT-PCR or Northern blot analysis that the genes trapped in six of the candidate ES clones were down-regulated during ES cell differentiation.
The ease of using the gene trap approach and GFP FACS profiling, plus the simplicity of identifying the trapped genes, make this vector a suitable choice to target and isolate ES cell-specific genes in mice. Similar to previous studies [17–22], it can be applied to other cell types to identify developmentally regulated genes or genes that are responsive to certain growth factors.
If the endogenous gene is turned off when ES cells differentiate, the expression of GFP and Neo reporters in the trapped locus will not persist and these cells will lose both green fluorescence and drug resistance. We have taken advantage of one of the conspicuous traits of ES cells; they have the tendency to spontaneously differentiate. In the presence of feeder fibroblast cells and leukemia inhibitory factor (LIF), a significant fraction of ES cells remain undifferentiated. By contrast, almost all ES cells become differentiated within four days without feeder fibroblast cells and LIF. Two populations of cells were derived for each ES clone: undifferentiated cells grown in the presence of feeder fibroblast cells and LIF or differentiated cells grown without either feeder fibroblast cells or LIF. These two populations of cells were compared for each ES clone for the level of GFP fluorescence by FACS profiling.
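The comparison of the two populations can be sketched in code. The following is a minimal illustration, not the analysis actually used in this study: the function names, the gating threshold and the `min_drop` cutoff are all hypothetical, and the per-cell intensities are assumed to have been exported from the FACS instrument as plain numbers.

```python
from typing import Sequence


def gfp_high_fraction(intensities: Sequence[float], threshold: float) -> float:
    """Fraction of cells whose GFP intensity exceeds a gating threshold."""
    if not intensities:
        raise ValueError("no cells measured")
    return sum(1 for x in intensities if x > threshold) / len(intensities)


def is_candidate(undiff: Sequence[float], diff: Sequence[float],
                 threshold: float, min_drop: float = 0.5) -> bool:
    """Flag a clone whose GFP-high fraction falls upon differentiation.

    min_drop is a hypothetical cutoff: the relative loss of GFP-high cells
    (0.5 means at least half of them disappear) required to call a clone.
    """
    u = gfp_high_fraction(undiff, threshold)
    d = gfp_high_fraction(diff, threshold)
    if u == 0:
        return False  # clone shows no GFP-positive cells to begin with
    return (u - d) / u >= min_drop
```

A clone whose profile resembles that of an unchanged reporter would give nearly equal fractions in both populations and would not be flagged, whereas a clone that loses most of its GFP-high cells after four days without feeder cells and LIF would be.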
About three hundred ES clones were screened to identify those that displayed significant reduction in GFP fluorescence when differentiated. About 40% of the G-418-resistant ES clones we examined did not show any fluorescence in FACS analysis. We reasoned that enzyme-based drug selection is more sensitive than GFP fluorescence and thus not all G-418-resistant ES clones displayed GFP fluorescence. For the rest of the ES clones, expression of the GFP reporter appeared to remain unchanged in most ES clones when comparing undifferentiated cells and their differentiated counterparts. For example, in the ES clone 5B33 very little difference in GFP fluorescence was detected between the undifferentiated and differentiated populations of cells (Figure 2B, left panel). Nevertheless, about 30 out of 300 ES clones screened exhibited significant down-regulation of GFP expression by FACS analysis. The highly GFP-positive portion of the cells from the undifferentiated sample of the ES clone 5C1 was almost completely missing in the differentiated sample (Figure 2B, right panel).
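The yield of the screen follows from the approximate counts reported in this article (about 300 clones screened, about 40% without detectable fluorescence, about 30 candidates, six confirmed). The short calculation below only restates that arithmetic; the variable names are ours and the inputs are rounded figures, so the results are approximate.

```python
from fractions import Fraction

screened = 300                          # G-418-resistant clones profiled by FACS
no_fluorescence = screened * 40 // 100  # about 40% showed no GFP signal
candidates = 30                         # clones with reduced GFP upon differentiation
confirmed = 6                           # clones confirmed at the RNA level

candidate_rate = Fraction(candidates, screened)      # candidates per screened clone
confirmation_rate = Fraction(confirmed, candidates)  # confirmed per candidate
overall_yield = Fraction(confirmed, screened)        # confirmed per screened clone

print(f"clones without fluorescence: about {no_fluorescence}")
print(f"candidate rate: {float(candidate_rate):.0%}")
print(f"confirmation rate: {float(confirmation_rate):.0%}")
print(f"overall yield: {float(overall_yield):.0%}")
```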
Illustration of the trapped genes proven to be significantly down-regulated in the differentiated ES cells versus undifferentiated ES cells
Trapped ES clone
2G2, 5C25, 6B13
A novel CSL isoform
Eight-cell embryo, Germ cell, ES cell
Embryo, Germ cell, ES cell
Blastocyst, ES cell
Illustration of ES clones that display significant down-regulation in FACS profiling but could not be confirmed by Northern blot
Trapped ES clone
Matched gene (or genomic region)
5B1, 5B5, 5B20, 5A8
2F1 (2 integrations)
UBA52 and LYSMD1
One clone resulted from integration in the gene Zfp-57. This gene was previously isolated in a screen for genes that were expressed in undifferentiated teratocarcinoma F9 cells but down-regulated upon treatment of F9 cells with retinoic acid. In agreement with our studies, Zfp-57 was also found to be expressed in ES cells and down-regulated when differentiated into neurons [28–30]. It encodes a basic protein with multiple zinc fingers and is predominantly localized to the nucleus. Similar to the findings described in a recent paper [30, 31], we also found that ZFP-57 contains a putative KRAB box (Li, Youngson, Zhou, Ito, Ferguson-Smith and Leder, unpublished data).
The gene trapped in the fifth ES clone matches two EST clones that are expressed in the blastocyst. Analysis of cDNAs for this gene suggests the presence of multiple alternatively spliced isoforms with the characteristics of noncoding RNA molecules (see below).
The identity of the gene trapped in the last ES clone, which was confirmed to be down-regulated when ES cells differentiate, remains to be determined.
Sequencing of these fusion RT-PCR products not only confirmed our prediction for these trapped genes, but also provided us with the opportunity to observe the exact splicing patterns of fusion transcripts around the integration site. As shown schematically in Figure 4, there are basically three types of transcripts generated, depending on whether the integration occurred in an intron or in an exon. When the integration site was localized in an intron of the endogenous gene (Figure 4B), as was the case in ES clones 2G2 and 5C25, a completely spliced fusion transcript was observed. Here the nearby splice donor sequence at the junction of exon N and intron N acted as the donor to the strong splice acceptor site in front of the IRES-EGFP reporter. However, when the gene trap vector was inserted into an exon of the endogenous gene, as was observed in the ES clone 5C1 (Figure 4C), two kinds of transcripts were produced: a read-through fusion transcript and a partially spliced fusion transcript. The read-through fusion transcript includes the 5' portion of the interrupted exon (Exon N-5') of the endogenous gene, the 3'LTR region, the SA region and the reporters. The partially spliced fusion transcript was generated when the cryptic splice donor site present in the 3'LTR was utilized to splice the 5' portion of the interrupted exon (Exon N-5'), along with a portion of the 3'LTR, to the splice acceptor site in front of the reporters. No matter whether the gene trap vector is inserted into an intron or an exon, fusion transcripts will be generated and both GFP and Neo reporter genes will be translated as two independent protein products due to the presence of an IRES sequence in front of each reporter gene.
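The relationship between the integration site and the observed fusion transcripts can be summarized as a small lookup. This is only a restatement of the three transcript types described above; the names `IntegrationSite` and `expected_fusion_transcripts` are ours and do not correspond to any existing software.

```python
from enum import Enum
from typing import List


class IntegrationSite(Enum):
    INTRON = "intron"  # vector inserted within an intron (e.g. clones 2G2, 5C25)
    EXON = "exon"      # vector inserted within an exon (e.g. clone 5C1)


def expected_fusion_transcripts(site: IntegrationSite) -> List[str]:
    """Return the fusion transcript types expected for an integration site."""
    if site is IntegrationSite.INTRON:
        # Upstream exon donor splices to the vector's splice acceptor.
        return ["completely spliced"]
    if site is IntegrationSite.EXON:
        # Either no splicing, or splicing from the cryptic donor in the 3'LTR.
        return ["read-through", "partially spliced"]
    raise ValueError(f"unknown integration site: {site!r}")


for site in IntegrationSite:
    print(site.value, "->", ", ".join(expected_fusion_transcripts(site)))
```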
We employed either Northern blot with gene-specific probes or semi-quantitative RT-PCR to test if the expression of the endogenous genes behaves similarly to that of the GFP reporter gene in the correspondingly trapped ES clones.
We utilized a semi-quantitative RT-PCR approach to investigate the novel embryonic isoform of CSL/RBP-Jkappa trapped in ES clones 2G2, 5C25 and 6B13. Four strategies were used: two primers specific to the embryonic isoform (I of Figure 5B), or two primers common to all isoforms (II of Figure 5B), or one primer specific to the embryonic isoform and the other common to all isoforms (III of Figure 5B), or one primer specific to the most common and ubiquitously expressed isoform and the other common to all the isoforms (IV of Figure 5B). In both strategies I and III, in which at least one primer is specific to the embryonic isoform, a relatively bright band of the expected size was seen on the agarose gel for the RT-PCR products amplified from two independent total RNA samples (1U and 2U in Figure 5B) of undifferentiated wild-type ES cells. In contrast, only a minimal amount of product was amplified from the two corresponding samples obtained from differentiated cells (1D and 2D in Figure 5B). As expected, similar amounts of amplified product were obtained from the corresponding samples when either all the isoforms were amplified (II of Figure 5B) or only the most common one was amplified (IV of Figure 5B). Thus, it appears that this novel embryonic isoform of CSL/RBP-Jkappa is preferentially expressed in the undifferentiated ES cells.
Transcriptional down-regulation of the novel gene trapped in the ES clone 5C11 was verified by Northern blot using a gene-specific probe (Figure 5C). Three independent total RNA samples were derived from both the undifferentiated (Lanes 1, 2 and 5 of Figure 5C) as well as the differentiated (Lanes 3, 4 and 6 of Figure 5C) wild-type ES cells. Clearly, many more transcripts were present in all the samples prepared from the undifferentiated ES cells than those from the differentiated cells.
As has previously been shown, it is certainly possible to use a genomic approach to identify ES cell-specific genes at a much larger scale [33, 34]. DNA micro-array and serial analysis of gene expression (SAGE) have been widely used in a variety of model systems. However, the retroviral gene trap approach has the advantage of being mutagenic, allowing us to directly evaluate the function of the genes in vivo. In addition, as demonstrated here and in other studies [20, 35], this strategy can identify both novel genes and novel isoforms of known genes. For example, we have identified a novel embryonic isoform of CSL/RBP-Jkappa. By contrast, DNA micro-array is based on known genes and available EST databases. SAGE analysis can generate previously unknown tags, but it is based on a short stretch of nucleotides at the 3'-end of a transcript and sometimes it is difficult to unequivocally locate the corresponding genes by simply searching the genome databases. The ability to identify novel tags in the SAGE analysis is limited by the abundance of the transcripts of a given gene. In the case of the novel embryonic isoform of CSL/RBP-Jkappa pulled out from our screen, it is unlikely that it would be discovered from the SAGE analysis of ES cells since it differs from other isoforms only at the extreme 5' exon. The DNA micro-array approach may also fail to identify this novel isoform unless oligonucleotides corresponding to this extreme 5' exon were included in the array.
The gene trap approach has its limitations. First, it is not designed as a large-scale approach, as DNA micro-array and SAGE are, and it is limited by the efficiency of the vector to trap the endogenous genes. Second, although retroviral integrations are generally thought to be random, it appears that there are some hotspots for Moloney murine leukemia virus (MMLV) and other retroviruses. It is possible that these drawbacks could be partially overcome by screening more ES clones.
The infection efficiency that was achieved with our current vector was still quite low based on the number of G418-resistant colonies obtained from a 10-cm dish of ES cells infected with this vector. It is true that ES cells are notoriously difficult to infect and normally their infection efficiencies are two to three orders of magnitude lower than those of fibroblast cells. It is possible that we could increase the infection efficiencies by using VSVG pseudotyping, which has been shown to be very effective in lentiviral vector-based infection of ES cells.
We favor our current reporter system expressing GFP and Neo as two independent proteins over a bi-functional reporter system involving the fusion of GFP and Neo reporter genes because we found that expression of the fused GFP and Neo reporter gene that we constructed is toxic to the cells even without G-418 drug selection (Supplemental Figure S3, see Additional file 1). Consistent with this, it was reported that a fusion protein consisting of GFP and Neo reporter genes displayed no visible GFP fluorescence although it had sufficient Neo activity.
We noticed that roughly one-fifth of the ES clones that had significant reduction of GFP fluorescence in FACS profiling were verifiable by Northern blot. One plausible reason for this is that FACS measures GFP fluorescence at the single-cell level whereas Northern blot is based on the approximation of loading equal amounts of total RNA derived from a group of cells. Undifferentiated ES cells proliferate very rapidly, and it is likely that more ribosomal and other RNA transcripts involved in protein synthesis are produced in a given undifferentiated ES cell than in a differentiated cell. Therefore, we may have under-estimated the amount of GFP reporter transcripts in the undifferentiated cells on a per-cell basis by Northern blot. Another possible explanation is that the degree of reduction in GFP fluorescence varied greatly among the candidate ES clones. Indeed, almost all the clones that were verifiable by Northern blot were the ones that displayed dramatic down-regulation in FACS profiling when ES cells became differentiated.
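The effect of loading equal amounts of total RNA can be illustrated with a small calculation. The numbers used here are purely hypothetical and were not measured in this study; they only show how a per-cell difference shrinks when it is expressed per unit of total RNA, under the assumption that an undifferentiated cell contains more total RNA than a differentiated one.

```python
from typing import Tuple


def fold_changes(reporter_undiff: float, reporter_diff: float,
                 total_rna_undiff: float, total_rna_diff: float) -> Tuple[float, float]:
    """Compare per-cell and per-total-RNA fold changes of a reporter transcript.

    reporter_*  : reporter transcripts per cell (arbitrary units)
    total_rna_* : total RNA per cell (arbitrary units)
    Returns (per_cell_fold, per_rna_fold), undifferentiated over differentiated.
    """
    per_cell_fold = reporter_undiff / reporter_diff
    per_rna_fold = (reporter_undiff / total_rna_undiff) / (reporter_diff / total_rna_diff)
    return per_cell_fold, per_rna_fold


# Hypothetical example: a 4-fold per-cell difference, with twice as much
# total RNA in each undifferentiated cell.
per_cell, per_rna = fold_changes(100.0, 25.0, 2.0, 1.0)
print(f"per-cell fold change (FACS-like view): {per_cell}")
print(f"per-total-RNA fold change (Northern-like view): {per_rna}")
```

With these assumed values, a four-fold difference per cell appears as only a two-fold difference per unit of total RNA, which is the direction of the under-estimate discussed above.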
It is also possible to carry out a similar screening in ES cells by using the well-established promoter-trap vectors containing a beta-geo marker gene encoding a β-galactosidase and Neo fusion protein, in combination with a fluorogenic β-galactosidase substrate [13, 20, 21, 35, 40]. However, it is much easier to detect GFP fluorescence by FACS and GFP fluorescence intensity is directly correlated with the amount of GFP protein produced in a given cell. In addition, the gene for β-galactosidase is much bigger than that of GFP. This may cause a big drop in the viral titer, hence the infection efficiency, due to the increased size of the corresponding insert. On the other hand, infection efficiency is important for retrovirus-based screening in ES cells for the reasons discussed above. Therefore, we prefer our current reporter system for gene trap-based screening in ES cells.
The novel embryonic isoform of CSL/RBP-Jkappa is very specific to undifferentiated ES cells. It is almost absent in differentiated cells. Other isoforms of CSL/RBP-Jkappa are generally thought to be ubiquitously expressed except that RBP-2N was highly enriched in the pre-B cell line 38B9. Interestingly, it was reported before that two monoclonal antibodies against CSL/RBP-Jkappa only stained the undifferentiated ES cells but not differentiated cells. It remains to be determined whether the protein that these two monoclonal antibodies recognize corresponds to another isoform (or the same isoform identified in this study) of CSL/RBP-Jkappa expressed only in the undifferentiated ES cells. It is also intriguing that CSL/RBP-Jkappa and LIN-12/Notch signaling are involved in the generation and maintenance of definitive neural stem cells [43, 44]. It is possible that this novel embryonic isoform of CSL/RBP-Jkappa may play a similar role in generation or maintenance of certain stem cells during embryonic development.
ES cells share many characteristics with other somatic stem cells. They all have self-renewal and multipotential capacities. Many aspects of stem cell biology using ES cells will probably be applicable to other stem cells as well. However, unlike some other stem cells, ES cells can be readily maintained in culture and are amenable to genetic and molecular manipulations. The ability to generate germline transmissible mice from ES cells also gives us the unique opportunity to look into the phenotypes in vivo. The usefulness of this gene trap vector is manifested by its application in targeting the functional endogenous locus as well as introducing GFP and Neo reporters at the same locus (Li and Leder, unpublished data). The presence of the GFP reporter allows us to analyze the expression patterns in vivo and isolate the particular subset of cells expressing the candidate genes by using GFP-based cell sorting. Although we focused on ES cell-specific genes that are down-regulated during differentiation in this study, this gene trap system can also be used to identify genes that are up-regulated during ES cell differentiation (Li and Leder, unpublished data).
ES cells can be induced to differentiate into many cell lineages. It is possible that some of the ES cell-specific genes identified in this study may have persistent expression in lineages which were not contained in the differentiation outcomes observed in this study. In this sense, ES cell differentiation can mimic many developmental processes in vivo. It is important to know how the undifferentiated state is maintained in ES cells and what initiates the decision to differentiate. A good starting point to answering these questions is to systematically survey ES cell-specific genes for their roles in maintaining the undifferentiated state as well as in promoting the differentiation of ES cells.
Murine TC1 ES cells were grown on mitomycin C-treated feeder fibroblast cells. The medium used for undifferentiated ES cells is DMEM (Gibco) supplemented with 15% fetal bovine serum (Sigma) and 10³ units/ml Leukemia Inhibitory Factor (LIF) (Chemicon). To obtain differentiated cells, the same growth medium was used and ES cells were grown without LIF and feeder fibroblast cells for four days.
Inverse PCR was used to delete the intervening sequence in the original RET vector, resulting in the fusion of GFP and Neo reporter genes (Figure 1). The IRES sequence derived from EMCV was inserted at the junction in front of the start codon of Neo. The IRES sequence of the cellular NF-kappaB repressing factor (NRF) gene was first amplified from a cDNA library by PCR and then placed in-between the splice acceptor site and GFP reporter.
The gene trap vector was transfected into phoenix viral packaging cells by using the calcium phosphate precipitation method. After culturing for 48 hours at 37°C, the viral supernatant was harvested and filtered through a 0.45 μm filter to remove cell debris. This viral filtrate was applied directly onto adherent ES cells grown in the presence of feeder fibroblast cells. G-418 was added to the medium 24 hours after viral infection at a final concentration of 260 μg/ml. ES cells were subjected to the drug selection for 7 to 10 days until colonies appeared with a diameter of about 1 mm.
G-418-resistant colonies were individually picked, partially digested with trypsin and transferred to 24-well plates seeded with feeder fibroblast cells. After growing for 3 to 5 days, half of the ES cells derived from each clone were frozen as a stock and the rest were split into two wells of 24-well plates. One portion of the cells was grown in the medium with feeder fibroblast cells plus LIF and the other was grown without feeder cells and LIF. After being cultured for 4 days, ES cells from both undifferentiated and differentiated populations were harvested for FACS analysis.
Trizol reagent (Invitrogen) was used to harvest total RNA from ES cells. After purification, total RNA was dissolved in DEPC-treated water. The concentration of RNA was measured with a spectrophotometer and approximately equal amounts of total RNA for differentiated and undifferentiated samples were loaded for Northern blot analysis.
Total RNA was also prepared from the adult mouse organs with Trizol reagent.
Genomic DNA was prepared from each candidate ES clone and subjected to HindIII restriction enzyme digestion. After HindIII was heat-inactivated, the digested genomic DNA was treated with T4 DNA ligase. The ligation mixture was used directly as the DNA template for the inverse PCR reactions. Two rounds of the inverse PCR reactions were performed, with the PCR reaction mixture from the first round inverse PCR as the template for the second round inverse PCR by using two nested primers. The two primers used for the first round inverse PCR reaction are IR(N)-GF1 and U5-R1, with the sequences of 5'-cctgccacagacttagaatcagcc and 5'-ctcttgcagttgcatccgacttgtg, respectively. The two primers used for the nested second round inverse PCR are IR(N)-GF2 and U5-R2, with the sequences of 5'-ctctaaggaccctgattcc and 5'-gtggtctcgctgttccttg, respectively.
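The logic of the inverse PCR step can be sketched on a toy sequence. This is a simplified, single-strand illustration and not the procedure or software used in this study: only the HindIII recognition sequence (AAGCTT, cut after the first A) is a known fact, whereas the function names, the toy sequences and the treatment of both primer sites as top-strand strings are our assumptions.

```python
from typing import List

HINDIII_SITE = "AAGCTT"  # HindIII cuts after the first A: A^AGCTT


def hindiii_fragments(seq: str) -> List[str]:
    """Split a linear top-strand sequence at every HindIII site."""
    fragments, start = [], 0
    pos = seq.find(HINDIII_SITE)
    while pos != -1:
        cut = pos + 1  # cut between the first A and AGCTT
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(HINDIII_SITE, pos + 1)
    fragments.append(seq[start:])
    return fragments


def inverse_pcr_product(fragment: str, rev_site: str, fwd_site: str) -> str:
    """Read around the self-ligated circle from fwd_site to the end of rev_site.

    rev_site and fwd_site are top-strand sequences within the known vector
    region, with rev_site upstream of fwd_site; the primers face outward.
    """
    r = fragment.find(rev_site)
    f = fragment.find(fwd_site)
    if r == -1 or f == -1 or r + len(rev_site) > f:
        raise ValueError("primer sites missing or not in the expected order")
    # Walk from the forward site to the fragment end, cross the ligation
    # junction, then continue from the fragment start to the reverse site.
    return fragment[f:] + fragment[:r + len(rev_site)]
```

In this model the product contains the unknown sequence on both sides of the known region, joined at a re-formed HindIII site, which is what allows the integration site to be read by sequencing from vector-specific primers.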
The primer used for the first-strand cDNA synthesis is Xho-Junc with the sequence of 5'-tcgagccctgagccgta, which is complementary to the sequence at the junction of SA and I(N) in the gene trap vector eGeoN/E+pA. The reverse primer used for the PCR reaction is Bcl-2R2 with the sequence of 5'-ccgtacagttccacaaag, which is complementary to the sequence present in the exon 3 of bcl-2 in the SA region of the gene trap vector. The forward primers used for PCR are gene-specific primers derived from sequences present in the exons of the predicted endogenous genes upstream of the gene trap integration sites in these ES clones. For ES clones 2G2 and 5C25, it is P1 with the sequence of 5'-tgagaagcccaggcttctctg. For ES clones 5C1 and 5C11, they are 5C1-F1 of 5'-agtctcttgccttctcagg and 5C11-F1 of 5'-cctttttggaatggagacagcag, respectively.
First-strand cDNA template was synthesized from 2 μg total RNA of undifferentiated wild-type TC1 ES cells by using Superscript II™ (Invitrogen/Gibco). One-tenth of the first-strand cDNA mixture was used in the RT-PCR reactions. Thirty cycles of amplification were performed with the Advantage GC cDNA polymerase (BD Biosciences Clontech). The primers used are P1 5'-ccttcaagatatccagcaag, P2 5'-cctataggccaacatttgag, P3 5'-caaacgactcactagggaag, P4 5'-cacagggttgagactcttg and P5 5'-gtaatgccctccggttttcctc.
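Basic properties of the five RT-PCR primers listed above can be checked with a few lines of code. The sequences are copied from the text; the helper names are ours, and the melting temperature uses the Wallace rule of thumb (2 °C per A/T plus 4 °C per G/C), which is only a rough estimate for short oligonucleotides and was not part of the original methods.

```python
PRIMERS = {
    "P1": "ccttcaagatatccagcaag",
    "P2": "cctataggccaacatttgag",
    "P3": "caaacgactcactagggaag",
    "P4": "cacagggttgagactcttg",
    "P5": "gtaatgccctccggttttcctc",
}


def gc_count(seq: str) -> int:
    """Number of G and C bases in a primer."""
    return sum(1 for base in seq.lower() if base in "gc")


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a primer."""
    return gc_count(seq) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees C by the Wallace rule."""
    gc = gc_count(seq)
    at = len(seq) - gc
    return 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    print(f"{name}: length {len(seq)} nt, GC {gc_fraction(seq):.0%}, "
          f"Tm about {wallace_tm(seq)} C")
```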
We thank Yasumasa Ishida and Nissim Benvenisty for their advice when this project was initiated. Racheli Eiges and Nissim Benvenisty kindly provided the REX-1::EGFP reporter construct. We are grateful to Montserrat Michelman for her help with ES cells and to Juanita Campos for her help with FACS analysis. We thank Alex Bishop, Nicholas Chester, Holger Babbe and Markus Dettenhofer for their comments on the manuscript. X. Li was a postdoctoral fellow of the Helen Hay Whitney Foundation.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.