A genome-wide map of aberrantly expressed chromosomal islands in colorectal cancer
1 Max Planck Institute for Molecular Genetics, Dept. of Computational Molecular Biology, Berlin, Germany
2 Dept. of General, Vascular and Thoracic Surgery, Charité – Campus Benjamin Franklin, Berlin, Germany
3 Signature Diagnostics AG, Potsdam, Germany
4 HELIOS Hospital Emil von Behring, Institute of Pathology, Berlin, Germany
5 Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach, Germany
6 Dept. of Visceral, Thoracic, and Vascular Surgery, University Hospital Carl Gustav Carus Dresden, Germany
7 Department of Surgery, Augusta-Kranken-Anstalt GmbH, Bochum, Germany
8 metaGen Pharmaceuticals i.L., Berlin, Germany
9 Present address: Novartis Institutes for BioMedical Research, Novartis Pharma AG, Basel, Switzerland
10 Present address: immatics biotechnologies GmbH, Tübingen, Germany
11 Present address: ALTANA Pharma AG, Preclinical Research Bioinformatics, Konstanz, Germany
Molecular Cancer 2006, 5:37 doi:10.1186/1476-4598-5-37
The electronic version of this article is the complete one and can be found online at: http://www.molecular-cancer.com/content/5/1/37
© 2006 Staub et al; licensee BioMed Central Ltd.
Abstract
Background
Cancer development is accompanied by genetic phenomena like deletion and amplification of chromosome parts or alterations of chromatin structure. It is expected that these mechanisms have a strong effect on regional gene expression.
Results
We investigated genome-wide gene expression in colorectal carcinoma (CRC) and normal epithelial tissues from 25 patients using oligonucleotide arrays. This allowed us to identify 81 distinct chromosomal islands with aberrant gene expression. Of these, 38 islands show a gain in expression and 43 a loss of expression. In total, 7,892 genes (25.3% of all human genes) are located in aberrantly expressed islands. Many chromosomal regions that are linked to hereditary colorectal cancer show deregulated expression. Also, many known tumor genes localize to chromosomal islands of misregulated expression in CRC.
Conclusion
An extensive comparison with published CGH data suggests that chromosomal regions known for frequent deletions in colon cancer tend to show reduced expression. In contrast, regions that are often amplified in colorectal tumors exhibit heterogeneous expression patterns: some even show a decrease of mRNA expression. Because chromosomal aberrations have never been observed for several islands of deregulated expression, we speculate that additional mechanisms (like abnormal states of regional chromatin) also have a substantial impact on the formation of co-expression islands in colorectal carcinoma.
Background
DNA microarrays have become a standard tool for the analysis of mRNA expression levels in colorectal cancer cells. Most studies focus on the identification of differentially expressed genes in tissues at different tumor stages or on the identification of new tumor subclasses and their diagnostic gene expression signatures [1-6]. In contrast, much less is known about the influence of chromosomal neighborhood on gene expression in tumors.
In tumors, different genetic mechanisms are known to affect gene expression in wider chromosomal regions. Chromosomal aberrations, like homozygous and heterozygous deletions or amplifications, alter the DNA copy number of large genomic regions or even whole chromosome arms, leading to inactivation of tumor suppressor genes [7,8] or to activation of oncogenes. Another genetic phenomenon that is assumed to have drastic effects on gene expression in cancer cells is the aberrant alteration of chromatin structure. Methylation of genomic DNA, histone acetylation, and histone methylation are assumed to have a large impact on the accessibility of DNA for transcription initiation [9]. Such epigenetic mechanisms can affect large genomic regions by either silencing or activating large arrays of genes. However, the regulatory mechanisms governing chromatin assembly and disassembly are only beginning to emerge. So far, due to methodological limitations, it has not been possible to study the role of such phenomena for gene expression in cancer cells on a genome-wide scale. Nevertheless, evidence from single-gene focused studies suggests that chromatin regulation does play an important role in tumorigenesis [10,11]. Regardless of which mechanism leads to coordinated expression in chromosomal domains, knowledge of such domains alone is of considerable importance. Such knowledge could guide further studies that aim to differentiate between those differentially expressed genes that cause tumorigenesis and are the primary targets of regional genomic aberrations and those that are the outcome rather than the cause of tumor development. The rationale for the existence of such piggy-back genes is the following. The silencing of genes at close distance to a known tumor suppressor gene (TSG) would in many cases just be a side effect of TSG silencing.
A similar reasoning applies to oncogenes that can be activated by increased expression: genes that are co-amplified could also be expressed at higher levels although they do not contribute to tumorigenesis. Typical searches for differentially expressed genes by microarrays usually ignore such piggy-back effects. This may lead to the identification of large numbers of differentially expressed genes (DEGs), of which only a smaller fraction is causative for tumor development. Though some experimental data recently became available linking microarray expression with DNA copy number analyses in some solid tumors [12-16], the knowledge about the existence of genomic islands of coordinated expression in colorectal carcinoma (CRC) is still limited. During the preparation of this manuscript a first assessment of chromosomal expression patterns in CRC in conjunction with genome-wide DNA copy number analyses became available [17]. Tsafrir et al. described a correlation of gene copy number and expression for both deleted and amplified genes. They claimed that the described alterations become more frequent as the tumors progress from benign to metastatic forms, highlighting the need for a more precise characterization of regions of coordinate expression and gene copy number change. In addition to this most recent work, a substantial body of literature on chromosomal aberrations in CRC has accumulated [7,15,18-25] that could help to interpret findings on islands of coordinated chromosomal expression. The need for a more precise definition of chromosomal regions of altered gene expression prompted us to find a new approach to investigate chromosomal co-expression domains in CRC. The focus of our study was the identification of up- or down-regulated gene expression in primary colon carcinoma cells compared to normal colon epithelia of the same patient.
By using laser capture microdissection (LCM) we aimed to investigate transcript abundance in relatively pure cell populations, trying to minimize the influence of contaminating stroma tissue or infiltrating peripheral blood cells on expression measurements. The use of Affymetrix DNA microarray technology allowed us to simultaneously assess mRNA levels of all known human genes using only small amounts of cells obtained by LCM. Finally, we developed a new bioinformatic approach to identify regions of chromosomal deregulation which enabled the most precise survey of chromosomal expression domains in colon cancer available today. In particular, we were interested in the question of whether our data correlated with the data of Tsafrir et al., who performed genome-scale arrayCGH and chip-based expression analyses on a different set of colorectal cancer patients [17]. In contrast to Tsafrir et al., we put more emphasis on the identification of precise boundaries of expression domains and therefore we consider our work as complementary to their pioneering study.
Results
Evaluation of data set quality by tissue-wise hierarchical clustering
Prior to the analysis of chromosomal expression domains, we aimed to check whether the quality of our complete array expression data set (> 44 k genes) allows differences between tumor samples and normal epithelial tissues to be extracted. Purely unsupervised hierarchical clustering of tissue samples based on gene expression vectors can provide such information. The use of the full set of 44 k genes for clustering is not desirable, because of low signal-to-noise ratios and computational considerations. Therefore, we pre-selected potentially informative genes for hierarchical clustering. We selected only genes which had reliable information about genomic localization and for which probe sets exceeded a minimum expression threshold in at least 20% of the experiments.
To enrich informative genes for tissue distinction, we required a minimum standard deviation across all 50 samples. The pre-selection resulted in 514 probe sets. Note that we avoided pre-selecting genes based on differential expression between tumor and normal tissue. We applied three rounds of normalization to genes and arrays. Finally, we applied standard centroid hierarchical clustering (Pearson correlation) to this dataset. Two large clusters were revealed (Figure 1). 18 out of 25 normal tissues formed one single cluster. The remaining 8 normal tissues mainly clustered together with matching tumor samples from the same patients. This suggests that coalescence between tumor and normal samples from the same patients could be due to patient-specific gene expression characteristics. As the majority of normal samples could be clearly separated from tumors, we concluded that our data set is well suited to explore differences in gene expression between normal and tumor cells of colorectal origin.
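The pre-selection step described above can be illustrated with a short Python sketch. It is only an approximation under stated assumptions: the article does not give the expression threshold or the minimum standard deviation, so `min_expr` and `min_sd` are hypothetical parameters, the function name `preselect` is invented for illustration, and the population standard deviation is used because the text does not say which estimator was applied. The requirement of a reliable genomic localization is not modeled here.

```python
from statistics import pstdev

def preselect(expr, min_expr, min_frac=0.2, min_sd=1.0):
    """Return the ids of probe sets that pass both pre-selection filters.

    expr     : dict mapping probe set id -> list of expression values (one per sample)
    min_expr : hypothetical expression threshold (not stated in the article)
    min_frac : fraction of samples that must exceed min_expr (20% in the article)
    min_sd   : hypothetical minimum standard deviation across all samples
    """
    kept = []
    for probe_id, values in expr.items():
        frac_expressed = sum(v > min_expr for v in values) / len(values)
        if frac_expressed >= min_frac and pstdev(values) >= min_sd:
            kept.append(probe_id)
    return kept

# Toy example with three probe sets measured in five samples.
toy = {
    "variable": [1, 1, 1, 1, 10],   # expressed in 1/5 samples, high variance
    "flat":     [5, 5, 5, 5, 5],    # expressed everywhere, zero variance
    "silent":   [0, 0, 0, 0, 0.5],  # never above the threshold
}
selected = preselect(toy, min_expr=4, min_sd=1.0)
```

Note that neither filter looks at the tumor or normal label of a sample, which matches the authors' stated aim of not pre-selecting on differential expression.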
Global search for chromosomal islands with up- or down-regulation
Chromosome-scale analysis of gene expression (see Figures 2, 3, 4, 5) already suggested that there are many regions of misregulated expression in our CRC samples. The detailed analysis of expression along the chromosome in windows of sizes 5, 11, 21, 31, 41, 51 genes resulted in the identification of 251 partially overlapping intervals of up- or down-regulation (see Additional file 1). These intervals were condensed into 81 non-overlapping regions of expression imbalance: 43 regions with loss of expression and 38 regions with gain of expression (see Table 1). We determined the fraction of affected genes on each chromosome (see Table 2). In total, 25.3% of all genes under consideration show expression imbalance. Slightly more genes lie in chromosomal regions that show loss of expression (13.3%) than gain (12%) of expression. The fractions of genes with gain or loss of expression vary strikingly from chromosome to chromosome. Chromosomes 9, 10, 15, 18, and 22 showed only regional expression loss, whereas 8, 13, 20, and X showed only regional increase in expression. There were too few informative genes on chromosome Y to carry out a full analysis using all window sizes, but small window sizes did not reveal significant deregulation.
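The condensation of partially overlapping intervals into non-overlapping regions can be sketched as a standard interval merge. This is an illustrative assumption, not the authors' procedure: the article does not describe how the 251 intervals were condensed, and the tuple layout (chromosome, direction, first gene index, last gene index) and the function name `merge_intervals` are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping intervals that share chromosome and direction.

    intervals : iterable of (chrom, direction, start, end) tuples, where start and
                end are inclusive gene indices along the chromosome (assumed layout).
    Returns a sorted list of non-overlapping (chrom, direction, start, end) tuples.
    """
    merged = []
    for chrom, direction, start, end in sorted(intervals):
        last = merged[-1] if merged else None
        same_track = last is not None and last[0] == chrom and last[1] == direction
        if same_track and start <= last[3]:
            # Overlaps the previous region on the same track: extend it.
            last[3] = max(last[3], end)
        else:
            merged.append([chrom, direction, start, end])
    return [tuple(region) for region in merged]

# Toy example: two overlapping gain intervals on chromosome 8 collapse into one region.
toy_intervals = [
    ("8", "gain", 10, 20),
    ("8", "gain", 15, 30),
    ("8", "loss", 18, 25),
    ("1", "loss", 0, 4),
    ("8", "gain", 40, 45),
]
regions = merge_intervals(toy_intervals)
```

Gain and loss intervals are kept on separate tracks in this sketch, so a gain interval never merges with an overlapping loss interval.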
Additional file 1. Original ChARM output on chromosomal intervals of coordinated up- or down-regulated expression. This file contains the full original output of ChARM analyses (see Methods section). Annotation of probe set IDs with gene symbols and chromosome bands was added subsequently. Format: DOC.
Table 1. Individual chromosomal islands of up- or down-regulation.
Table 2. Statistics on expression imbalances across human chromosomes.
Individual chromosomal islands with gain of expression
8q11.23-q21.13
Gain of expression in region 8q11.23-q21.13 is strongest in a small interval (8q12.1) that spans genes from TCEA1 to PLAG1 (see Figures 6, 7, 8). There have been numerous reports of copy number gains of chromosome 8q in CRC [18,21,23,25], which suggests a possible mechanism leading to over-expression in our patients. The known blood cell oncogene LYN is located in this interval and it is up-regulated in several of our tumor samples. It has been reported before that LYN is expressed in colorectal tumors [26]. The concerted up-regulation of LYN along with other genes in this region suggests a role for LYN in CRC. Another interesting gene in this interval is PLAG1 (pleomorphic adenoma gene 1) for which chromosomal aberrations have been described that lead to over-expression in salivary gland tumors [27,28]. No informative expression measures were obtained for the MOS protein kinase gene, which is located between RPS20 and PLAG1, although this may be due to technical limitations. Genes encoding components of the translation machinery, the mitochondrial ribosomal protein MRPL15 and cytosolic ribosomal proteins RPL7 and RPS20, are located in this region, highlighting the need for enhanced translation in cancer cells.
The concomitant down-regulation of the TOX and ANKTM1 genes in many patients in an environment of transcriptional activation is remarkable, but the functional significance remains unclear. Buffart et al. have reported amplifications of 8q11-q24 in metastasizing CRC [29], highlighting a possible mechanism for gain of expression in this region. In summary, our analysis suggests that chromosomal region 8q12.1 is a candidate target region for genetic alterations that lead to over-expression in CRC.
20q11.22-q11.23
The region 20q11.22-q11.23 was among the most frequently up-regulated regions (see Figures 12, 13, 14). Amplifications of regions on chromosome 20q have been identified independently by several groups in CRCs [19,21,23,24]. The interval comprises the known tumor gene SRC (located between MANBAL and BLCAP in Figures 12, 13, 14) for which no informative expression measures were obtained. We note that it is possible that the SRC gene is the primary target of up-regulation in our CRC patients, the up-regulation of other genes being just piggy-back effects. However, the up-regulation of the CTNN1L1 transcript could also be of functional significance for CRC development. CTNN1L1 shows partial homology to the known colorectal cancer gene beta-catenin in the armadillo repeat region and has a nuclear localization signal, suggesting that it could play an important role in signal transduction to the nucleus in CRC. Up-regulation of the E3 ubiquitin ligase Itchy (ITCH) could also be important, as selective ubiquitin-tagging of signaling proteins for destruction is an emerging mechanism in cancer biology. The need for accelerated protein synthesis in cancer cells is reflected by the up-regulation of the translation initiation factor EIF2S2. Remarkably, we found dramatic down-regulation against the regional trend of C20orf110 alias p53-inducible protein 2 (TP53INP2) whose expression is usually positively controlled by the p53 protein. For unknown reasons p53 seems to be unable to induce TP53INP2 expression in the majority of CRCs studied here.
12q14.2-12q22
We observed increased expression of genes in chromosomal region 12q14.2-12q22 (see Figures 15, 16, 17). The MDM2 gene at 12q15 is a possible target of this misregulation. However, within this large region there is a smaller region at 12q21.1-q21.2 spanning eight genes that exhibit exceptionally high expression in our tumor samples. Among these is LGR5 alias GPR49, a G-protein coupled receptor that has large leucine-rich repeats in its N-terminus. We could confirm the up-regulation of GPR49 in CRC by quantitative PCR and in-situ hybridization (data not shown). This finding and the exceptional suitability of G-protein-coupled receptors as drug targets make the LGR5/GPR49 protein a potential target for future therapeutic approaches. We do not know of any other reports that link this region to CRC.
17q21.33-17q23.2
The chromosomal interval 17q21.33-17q23.2 harbors numerous up-regulated genes (see Figures 18, 19, 20). Chromosomal gains of this region in CRC have been described by two independent studies [21,25]. Up to 18 of 25 patients show up-regulation of expression in this region. The known tumor gene NME1 (non-metastatic 1; encoding the NM23A protein, a nucleoside diphosphate kinase) is among the most frequently up-regulated genes in this region. The paralogous genomic neighbor NME2, which acts in the same pathway, is also strongly up-regulated. These two genes are possibly the primary targets of regional expression up-regulation. However, up-regulation of several other genes is also remarkable. The up-regulation of the mitochondrial ribosomal component MRPS23 is notable as it is in agreement with other observations of up-regulation of genes acting in translation (see above). Additionally, the RING finger gene FLJ20315/RNF124, possibly encoding a novel E3 ubiquitin ligase, and the suppressor of Ty 4 homologue 1 (SUPT4H1), a putative human chromatin regulator that alters transcription, are strongly up-regulated and could contribute to the development of CRC.
Individual chromosomal islands with loss of expression
1p36.13-1p36.11
The most strongly down-regulated region in our study is 1p36.13-1p36.11 (see Figures 9, 10, 11). A larger chromosome region comprising this fragment has recently been reported to be frequently deleted in CRC (see Tables 1 and 4). No tumor suppressor gene has been found yet. Our data suggest multiple genes that could act as class II TSGs. Several have been associated with proliferative processes or even cancer before. PLA2G2A encodes phospholipase A2 group IIA, which has been proposed as a TSG and a marker for metastasis and patient survival in gastric cancer [30]. The E2F2 transcription factor is a known regulator of TSGs and interacts specifically with the RB protein. It plays an important role in the cell cycle. The CDC42 protein is a small Rho-like GTPase. It acts in intracellular signaling and is involved in various processes like control of morphology, migration, endocytosis, and the cell cycle. Therefore, PLA2G2A, E2F2 and CDC42 are the primary candidate tumor suppressors in this region.
Table 4. Expression in Islands frequently deleted in CRC.
4p15.31-4p15.2
The region 4p15.31-4p15.2 is part of a larger region (see Table 1) that showed marked down-regulation of expression in our tumor samples (see Figures 21, 22, 23). Full or partial losses of chromosome 4 are well known phenomena in the development of CRC [18,19,23,24]. One of the strongly down-regulated genes in this region is the SLIT2 gene at 4p15.31 that encodes a membrane protein regulating cellular migration. It has recently been described as a new tumor suppressor gene in CRC, gliomas, lung and breast tumors and seems to be transcriptionally inactivated by epigenetic silencing [31-33]. In addition, several other genes of this region could serve as candidate class II tumor suppressor genes. The GPR125 gene encodes an orphan G-protein coupled receptor that has a large extracellular N-terminus with an immunoglobulin domain and leucine-rich repeats, similar to GPR49 described above. The PCDH7 gene belongs to the protocadherin gene family. It encodes a transmembrane protein that has seven extracellular cadherin repeats, suggesting that it is involved in cellular adhesion and adhesion-dependent intracellular signaling. The functions of genes in this region suggest that this regional expression loss influences adhesion and migration properties of cancer cells. Both epigenetic silencing and chromosomal aberrations are potential mechanisms leading to expression loss in this region.
18q21.2-18q23
There are several reports of loss of chromosome 18q in CRC (see Tables 1 and 4). We found a smaller region of expression down-regulation at 18q21.2-18q23 (see Figures 24, 25, 26). There is a hot spot for down-regulation in direct vicinity of the BCL2 gene. Its special role in cancer qualified the anti-apoptotic BCL2 protein as a therapeutic target molecule [34,35]. Here we observed down-regulation of BCL2 and its neighbors, which contradicts its known anti-apoptotic cancer-promoting function. Distal to BCL2 at 18q21.1-18q21.2 there is a region of less pronounced down-regulation between ME2 and MBD2. SMAD4 (Hs.298320) is only weakly down-regulated and the biological significance is questionable. DCC (deleted in colorectal carcinoma), proximal to MBD2, is the largest gene in this region, but no statements about its expression can be made because of a lack of informative expression measures. The SMAD2 and SMAD7 genes are in close vicinity to this region. In summary, we do not have direct evidence for down-regulation of tumor suppressor genes in this region. Instead, we observed down-regulation of the cancer-promoting BCL2 gene. Therefore, the biological significance of this domain of expression loss remains elusive. Possibly, the down-regulation of the BCL2 region is just a by-stander effect of deletions targeted at DCC disruption. Alternatively, BCL2 down-regulation could be an unsuccessful attempt of the tumor cells' genetic program to shift the cellular homeostasis towards cell death.
5q22.2-5q23.1
Not unexpectedly, we found loss of expression in region 5q22.2-5q23.1 (see Figures 27, 28, 29). This interval harbors two known TSGs in colon cancer, the adenomatous polyposis coli (APC) gene and the mutated in colorectal cancer (MCC) gene. We were not able to obtain expression values for APC. APC is located at the border of a region at 5q22.2-5q22.3 that harbors several drastically down-regulated genes. Central in this region is the MCC gene. The distal border is the CDO1 gene. We assume that deletion or epigenetic silencing of this region is a frequent mechanism contributing to colorectal tumorigenesis. It is possible that APC or MCC also show reduced expression, that genes in this region other than APC and MCC are piggy-back genes, and that their misregulation is not of functional significance for tumorigenesis.
14q24.3
The chromosomal region 14q24.3 has been implicated in colorectal cancer several times (see Table 1). We found coordinated down-regulation of expression of genes in 14q24.1-14q24.3 (see Figures 30, 31, 32). The region comprises the MLH3 gene that is linked to hereditary non-polyposis colorectal cancer type 7 (HNPCC7). We note that the FOS gene at 14q24.3, encoding one half of the bZIP dimer activator protein (AP-1), is also strongly down-regulated. FOS is known as an oncogene and its down-regulation is therefore unexpected. However, deletions of 14q24.3 have been linked to metastatic CRC [36]. In combination, these results suggest that there is a class II tumor metastasis suppressor in this region. This class II TSG is probably not MLH3, as its protein function is hardly related to cellular functions promoting metastasis. The functions of several other strongly misregulated proteins, however, make them better candidates for metastasis suppressors. KIAA0317 codes for a predicted transmembrane ubiquitin ligase. Ubiquitin ligases can help to tag misfolded transmembrane proteins in the ER for destruction via the proteasome system [37]. Absence of such a function could result in misexpressed proteins at the cell surface which could promote metastasis. Other potential candidates for metastasis suppressor genes in this region code for the transmembrane Alzheimer protein PSEN1, the GTPase activating protein KIAA0440/SIPA1L1, the PDZ-domain synaptojanin 2-binding protein SYNJ2BP and the developmental regulator and Notch interaction partner NUMB.
Discussion
Global analysis of chromosomal regions with expression gain or loss
We found that 25% of the genes lie in regions that are affected by expression imbalance in colon cancer. This does not mean that 25% of the genes are misregulated, as many genes that fall into these regions are not expressed at all in tumors and in normal epithelium of the colon. Additionally, we note that these numbers are probably an upper limit because the sliding window approach probably included several genes in close proximity to the boundaries of misexpressed regions. Nevertheless, the number of regions of imbalanced expression is remarkable and suggests that there is extensive regulation in CRC at the genomic level. Recently, Nakao et al. estimated from genome-wide array CGH data that ~17% of the human genome is affected by DNA copy number changes in CRC [23]. Prior to a more detailed analysis of individual regions in this study, this suggested that not all regional expression changes in CRC will be explainable by DNA copy number aberrations. There are only slightly more genes with expression loss than genes with expression gain. One can argue that a tumor ought to show a higher frequency of expression loss than expression gain. Reasons are that there should be a tendency to lose tumor suppressor genes selectively and to lose non-essential genes (genomic ballast) as a side effect. If transcription were a process that is predominantly driven by positive regulation of transcriptional activators, one would assume that any partial genome loss results in a slowdown of transcription. In the light of these considerations, an equally high number of regions with expression gain can be interpreted in two ways. Either positive selection drives expression gain of some regions in cancer cells, or a default phenotype of transcription suppression dominates in normal cells which is relaxed during tumor cell development.
Gene expression in chromosomal regions with frequent DNA copy number changes in CRC
Most studies reported frequent gains of chromosome 7, 8q, 13q, 20q and losses of 4 and 18q in CRC [18,19,21-25]. These broadly-defined alterations are in perfect agreement with chromosome-specific trends in our expression data, especially the exclusive presence of domains of expression gain on 8, 13 and 20 and the exclusive presence of domains of expression loss on chromosome 4 and 18 (see Table 2 and Figures 21, 22, 23, 24, 25, 26). There is a single discrepancy for chromosome 7: region 7q11-7q12 has been reported as amplified in CRC, but its expression is significantly down-regulated in our tumor samples. For a more detailed survey of congruence between gene expression and chromosomal aberrations in CRC, we compared our results to six previous studies reporting chromosomal gains or losses in distinct chromosomal regions [18,19,21-25] (see Tables 3, 4). We considered only those chromosomal regions that were reported by different researchers or were found to be aberrant in > 20% of tumor samples. In summary, we found that the majority of deletion regions show a reduction in expression. This suggests that regional transcriptional silencing in CRC is mainly achieved by loss of genomic DNA. In contrast, amplified regions show more heterogeneous expression changes. We found regions of expression gain in the most frequently reported regions of chromosome gain on 7, 8q, 13, 20q. These regions support a positive correlation of DNA copy number and transcript abundance, although a direct causal relationship is not shown in this study.
Table 3. Expression in Islands frequently amplified in CRC.
However, there are also many regions of frequent amplifications that did not show alterations in expression or that were even down-regulated (7q11.2-7q12, 9q34, 12p13.1-13.2, 15q22-15q23, 16p12-16p11, 22q11; compare Tables 3 and 4).
One possible explanation is that these down-regulated regions are not amplified in our tumor samples. An alternative explanation is that the influence of chromosomal amplification on transcription levels can be either positive or negative. It is possible that amplification of a particular genomic region disrupts transcription of amplified genes by a yet unknown mechanism, e.g. by induction of chromatin-based silencing, or by separation of essential enhancer regions from transcription starts. Platzer et al. found amplifications in 7p, 8q, 13q, 20q in 26%–43% of their CRC patients and revealed by microarray-based expression analysis that only 81 of 2146 genes in amplified regions show over-expression (3.8%) whereas 164 of 2146 genes show under-expression (7.7%). Using a different approach (microdissection, oligo arrays, analysis aimed at the identification of single chromosomal expression domains and not at the location of all differentially expressed genes in chromosomes) we found several smaller up-regulated regions and no regions of down-regulation in the same chromosomal regions. Therefore, our data partly contradict the findings of Platzer et al., which state that gene expression in these frequently amplified regions tends to be down-regulated. However, other misregulated expression domains (see above) of our study confirmed the general notion by Platzer et al. that frequently amplified regions in CRC can also exhibit down-regulation of transcript levels.
Aberrantly expressed chromosomal islands linked to hereditary cancer
Roughly 5% of all colorectal carcinomas are hereditary non-polyposis colorectal cancers (HNPCCs). In HNPCC, histologically verified colorectal carcinoma is found in at least three relatives from two or more successive generations. In at least one patient, the age of onset should be less than 50 years. Seven chromosomal regions have been linked to HNPCC. More than half of these HNPCC regions show misregulated expression in our patients.
Three regions show down-regulation (3p21.3, 2q31-q33 comprising PMS1, 14q24.3 comprising MLH3), one region shows up-regulation (7p22 comprising PMS2), and three regions do not show significant changes in expression (2p22-p21 comprising MSH2, 2p16 comprising MSH6, 3p22 comprising TGFBR2). Eleven further chromosomal regions are linked to hereditary colorectal carcinoma under a common entry in OMIM (114500). More than 50% of these regions show significant expression changes in our data. Five regions show down-regulation (1p35, 14q24.3, 17p11.2, 17p13.1, 22q13), one region shows up-regulation (2p25), and five regions do not show significant expression changes in our data (3q26.3, 8p22-p21.3, 11p11.2, 15q15, 17q24). In combination, these findings strongly suggest that expression changes in regions linked to hereditary CRC play a role in CRC development.
Congruence of our study with the genome-wide copy number and expression analysis of Tsafrir et al
A particular focus of our study was on the congruence of our data with that of Tsafrir et al. [17]. These authors described 11 alterations of whole chromosomes or chromosome arms. Using our approach based solely on expression data we found precisely defined regions of coordinated up-regulation in all four regions of gene expression and gDNA copy number gain that they reported (+7, +8q, +13q, +20q). For six of seven aberrations (-1p, -4, -5q, -14q, -15q, -18) we discovered smaller expression islands of coordinated down-regulation. We were not able to reproduce the finding of expression loss on 8p. In summary, this large congruence of our results with those of Tsafrir et al. can be regarded as an external validation of our results. The comparison illustrates the power of our data analysis approach, which allows expression islands to be defined at single-gene resolution.
Most importantly, it confirms our confidence in the use of the chip platform (Affymetrix U133A) that was used in both studies and apparently can lead to largely congruent results in different patient cohorts and laboratories.
Conclusion
Roughly a quarter of all human genes are located in islands of misregulated gene expression in colorectal cancer. There are only slightly more down-regulated than up-regulated genes. Chromosomal regions that are linked to hereditary colorectal cancer often exhibit deregulated expression, suggesting that they are implicated in spontaneous CRC not only through collection of mutations. Thus, genes in these chromosomal hotspots may be systematically tested in patients with sporadic CRC for molecular lesions and for transcriptional silencing. Chromosomal regions that are frequently deleted in CRC very often comprise islands in which we found reduced expression. Although many regions that are known to be amplified in colorectal tumors show a gain of expression, there are also a considerable number of amplified islands that show no alterations or even down-regulation. Comparison of published CGH studies with our expression data suggests that amplified or deleted chromosomal regions are responsible for many islands with aberrant expression. However, we suggest that it is necessary to invoke other mechanisms, such as epigenetic regulation of chromatin or disruption of enhancer actions, to explain the remaining expression imbalances.
Methods
Patients
25 colorectal cancer patients undergoing elective standard oncological resection at the department of surgery, Charité, Campus Benjamin Franklin, Berlin, Germany were prospectively recruited for this study. The study was approved by the local ethical committee and informed consent was obtained from all patients. Rectal cancer patients receiving neo-adjuvant radiochemotherapy were excluded from this study.

Tissue samples and UV-laser microdissection

Transmural cancer specimens were snap frozen (liquid nitrogen) within 20 minutes following excision and stored at -80°C. All tissue samples were evaluated by a pathologist before and during laser microdissection to ensure an enrichment of vital tumor cells. Six-micron serial frozen sections were cut on a standard cryostat and mounted on RNase-free foil (2.5 μm) coated on glass slides, followed by immediate fixation (70% ethanol for 30 s), H&E staining, and ethanol dehydration (70%, 95% and finally 100% ethanol). After vacuum drying, the membranes carrying the sections were manually turned and coated onto new RNase-free glass slides. Optically transparent CapSure LCM caps (ARCTURUS, CA) were placed on the foil over a selected field of cells. Vital colorectal epithelial carcinoma cells (> 90% proportion) from the invasion front were isolated using UV-LCM systems from PALM (Microlaser Technologie, Germany) and SL (Microtest GmbH, Germany). After visual control of the completeness of dissection, the captured cells were immersed in denaturation buffer (GTC Extraction Buffer, 2% beta-mercaptoethanol, Promega, WI) and stored at -80°C.

mRNA extraction, cRNA preparation and amplification

Poly(A)+ RNAs were isolated using the PolyATtract 1000 kit (Promega, Heidelberg, Germany) according to the manufacturer's recommendations. For each sample the cDNA synthesis and repetitive in vitro transcription were performed three times, as described previously [38-40]. In brief, the total amount of prepared mRNA from one sample was used. First strand cDNA synthesis was initiated using the Affymetrix T7-oligo-dT promoter-primer combination. The second strand cDNA was synthesized by internal priming. In vitro transcription was performed using Ambion's Megascript kit (Ambion, Huntingdon, UK) as recommended by the manufacturer. From the generated cRNA a new first strand synthesis was initiated using 0.025 mM of a random hexamer as primer.
After completion, the second strand synthesis was primed using the Affymetrix T7-oligo-dT promoter-primer combination at a concentration of 0.1 mM. A second in vitro transcription was performed and then the procedure was repeated one additional time. During the third in vitro transcription biotin-labeled nucleotides were incorporated into the cRNA as recommended by the Affymetrix protocol.

Microarray hybridization

BIO+cRNAs were hybridized on Affymetrix Human Genome U133A and U133B GeneChips, which together comprise 44,928 probe sets (Affymetrix, Santa Clara, CA). Fragmentation, preparation of hybridization cocktails, hybridization, washing, staining and scanning of the Affymetrix GeneChips were performed according to the manufacturer's protocols.

Preprocessing of expression data

We used our own algorithm to condense the probe-level data provided in the Affymetrix CEL files of each chip experiment. Background intensity was computed as the mean of the 2% darkest feature intensities. This background value was subtracted from each feature value. Subsequently, each feature value was divided by the median of all feature values. As a representative expression value (PMQ) for each probe set, the third quartile (75%) of the intensities of all its perfect match oligonucleotides was used. Furthermore, to distinguish real expression signals from noise, the Wilcoxon signed rank test was applied to each probe set. A probe set was called detectable if the Wilcoxon signed rank test applied to its 11 probe pairs (perfect match versus mismatch oligonucleotide) had a significance level of p < 0.1 and its relative expression value (PMQ) was > 4.0. We chose these thresholds for deciding whether a gene is expressed on the basis of validation of several gene expression patterns by quantitative RT-PCR and/or Northern blot analysis in our lab (data not shown).
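
The condensation steps above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the median is assumed to be taken after background subtraction, the quartile uses Python's "inclusive" interpolation, and the Wilcoxon p-value is taken as an input rather than computed.

```python
import statistics

def normalize_features(features):
    """Background-subtract and median-scale the raw feature intensities of one chip."""
    ordered = sorted(features)
    n_dark = max(1, len(ordered) * 2 // 100)        # the 2% darkest features
    background = statistics.mean(ordered[:n_dark])
    corrected = [f - background for f in features]
    # Assumption: the scaling median is computed on background-subtracted values
    # and is positive for real chip data.
    scale = statistics.median(corrected)
    return [c / scale for c in corrected]

def pmq(perfect_match_values):
    """Third quartile (75%) of the perfect-match intensities of one probe set."""
    # Assumption: "inclusive" quantile interpolation; the text does not name a method.
    return statistics.quantiles(perfect_match_values, n=4, method="inclusive")[2]

def is_detectable(pmq_value, wilcoxon_p, p_cut=0.1, pmq_cut=4.0):
    """Detection call: signed rank p-value below p_cut and PMQ above pmq_cut."""
    return wilcoxon_p < p_cut and pmq_value > pmq_cut
```

For example, `is_detectable(pmq(pm_values), p)` would give the detection call for one probe set, where `pm_values` holds its normalized perfect-match intensities and `p` is the signed rank p-value obtained elsewhere.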
For each patient and probe set an expression ratio was calculated according to the following rules: If expression was detectable in both the normal and the tumor sample (Wilcoxon test p <= 0.10 and relative expression value PMQ >= 4), the ratio PMQ(T)/PMQ(N) was used as the expression ratio (hereafter called T/N). If expression was undetectable in only one of the two samples, the expression ratio was set to T/N = 2 (normal absent) or to T/N = 0.5 (tumor absent). If expression was undetectable in both the normal and the tumor sample, no expression ratio was calculated and we call the probe set not informative. For each probe set we counted the number of cases showing up-regulation (T/N >= 2), down-regulation (T/N <= 0.5), or unchanged transcription levels (0.5 < T/N < 2). We filtered out those probe sets that were not informative in any patient, reducing the number of probe sets to 19,404. To eliminate redundancy of probe sets with respect to genes, we kept only the most informative probe set of each gene, i.e. the probe set that is informative in the highest number of matched sample pairs. Additionally, only probe sets that could unambiguously be linked to a particular genomic locus were considered (chromosome band and position; see Affymetrix U133A/B annotation files). Finally, the preprocessing resulted in a total of 10,935 probe sets, which were the basis of all further analyses.

Analysis of expression along chromosomes

In each graph of Figures 2 to 5, we plotted, for all genes ordered by their position on the chromosome, the number of patient samples with tumor up- or down-regulation (as a percentage of informative cases). In these plots, the curve is smoothed by averaging over 50 consecutive genes. A significant deviation from average expression in a particular chromosomal region is not sufficient to infer coordinated deregulation.
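
The T/N rules and the smoothing just described can be sketched in a few lines. All names here are hypothetical, the detection calls are passed in as booleans, and the treatment of chromosome ends in the smoothing (full windows only) is an assumption, since the text does not specify it.

```python
def tn_ratio(pmq_tumor, pmq_normal, tumor_detected, normal_detected):
    """Return T/N for one probe set in one patient, or None if not informative."""
    if tumor_detected and normal_detected:
        return pmq_tumor / pmq_normal
    if tumor_detected:          # normal absent
        return 2.0
    if normal_detected:         # tumor absent
        return 0.5
    return None                 # undetectable in both samples

def classify(ratio):
    """Map a T/N ratio to the call used for counting cases."""
    if ratio is None:
        return "not informative"
    if ratio >= 2:
        return "up"
    if ratio <= 0.5:
        return "down"
    return "unchanged"

def smooth(values, window=50):
    """Average over `window` consecutive genes (full windows only: an assumption)."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
```

Fixing absent calls at exactly 2 or 0.5 means such cases always land on the up or down side of the thresholds, which is why the classification uses inclusive comparisons.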
A deviation in the averaged profile does not reveal whether all genes of a region are actually deregulated in the same subset of patients; they could also be deregulated in different patients. Consider three genes G1, G2, G3 and their expression in patients A, B, C, D. Each gene is up-regulated in 50% of patients. If the genes are up-regulated in different patients (G1 is up-regulated in A/B, G2 in B/C, G3 in C/D), then one cannot assume that there is a regional up-regulation in all patients. However, if the genes are up-regulated in the same patients (G1, G2 and G3 are all up-regulated in A and B), then it is fair to assume that they have undergone coordinated regional up-regulation. Chance effects are more likely to create non-coordinated up-regulation. To capture such a gene-versus-gene correlation structure, we proceeded as follows for a given chromosome region: For each pair of genes in the region we counted, over the set of patients, the number of coordinated (simultaneous) up-regulations (based on the fold changes computed above) and, separately, the number of coordinated down-regulations. These values can be represented in gray-scale plots: one plot for coordinated up-regulation and a similar one for coordinated down-regulation. Both the horizontal and the vertical axis list the genes of the chromosome region in chromosomal order (see Figures 6 to 32). The darkness of each square represents the number of coordinated up- or down-regulations, respectively. Coordinately up-regulated regions show up as squares with high "correlation" measures along the diagonal.
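
A minimal sketch of this pair counting, using the G1/G2/G3 example from the text. The function name and the data layout (one list of per-patient calls per gene, in chromosomal order) are assumptions made for illustration.

```python
def coordination_matrix(calls, state):
    """Count, for every gene pair, the patients in which both genes show `state`.

    calls: one list per gene (in chromosomal order) holding that gene's
           per-patient call, e.g. "up", "down" or "unchanged".
    """
    n = len(calls)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            matrix[i][j] = sum(
                1 for a, b in zip(calls[i], calls[j]) if a == state and b == state
            )
    return matrix

U, N = "up", "unchanged"
# Patients A, B, C, D; each gene is up-regulated in 50% of patients.
scattered = [[U, U, N, N],    # G1 up in A/B
             [N, U, U, N],    # G2 up in B/C
             [N, N, U, U]]    # G3 up in C/D
coordinated = [[U, U, N, N]] * 3   # G1, G2, G3 all up in A and B

scattered_up = coordination_matrix(scattered, "up")
coordinated_up = coordination_matrix(coordinated, "up")
```

In the scattered case the off-diagonal counts are 1, 0 and 1, whereas in the coordinated case every entry is 2, which is the dark square along the diagonal that the plots are meant to reveal.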
The resulting cross-comparison matrices can be visualized interactively for any chromosomal region on our supplementary website [41], along with heat maps of expression intensities, and are used in Figures 6 to 32. Alternatively, we applied "correlation" measures such as Pearson correlation coefficients on fold changes, mutual information, and set-theoretic coefficients such as the Dice and Jaccard coefficients on binary patterns of up-regulation and down-regulation (only available on our website [41]). Although this analysis is already instructive for the visual identification of general up- or down-regulation of a particular region, it does not allow the precise boundaries of deregulated regions to be inferred. Several software packages for the analysis of array CGH data have been reported to be suited for the analysis of expression data as well [42-44]. In the following, we used the ChARM software package [44]. ChARM can be used to infer intervals of variable size with significant positive or negative signal amplitudes in ordered data, such as log(intensity) values in array CGH data and mRNA expression data. We applied the ChARM algorithm to different data sets that carry information about the numbers of patients with coordinated up- and down-regulation of expression for all genes on the human autosomes and the X chromosome. For each chromosome six separate data sets were prepared, corresponding to scanning window sizes of 5, 11, 21, 31, 41 and 51 genes. Within each window all possible gene pairs (excluding self-comparisons) were considered. For each gene pair, the number of coordinated up-regulations (each counted as +1) and coordinated down-regulations (each counted as -1) was determined. For each window, the sum of these gene pair-specific values divided by the total number of pairs gave the cumulative misregulation score (CMS). In a sliding window approach, each gene was associated with a CMS value.
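
The CMS computation can be sketched as below. This is an illustration, not the authors' code: the function names are hypothetical, the window is assumed to be centered on each gene and simply truncated at chromosome ends, and a window without any gene pair is given a score of 0.

```python
from itertools import combinations

def pair_score(calls_a, calls_b):
    """Coordinated up-regulations count +1, coordinated down-regulations -1."""
    score = 0
    for a, b in zip(calls_a, calls_b):
        if a == b == "up":
            score += 1
        elif a == b == "down":
            score -= 1
    return score

def cms_profile(calls, window=5):
    """Return one cumulative misregulation score (CMS) per gene.

    calls: one list of per-patient calls per gene, in chromosomal order.
    """
    half = window // 2
    profile = []
    for g in range(len(calls)):
        # Assumption: centered window, truncated at the chromosome ends.
        lo, hi = max(0, g - half), min(len(calls), g + half + 1)
        pairs = list(combinations(range(lo, hi), 2))   # no self-comparisons
        if not pairs:
            profile.append(0.0)
            continue
        total = sum(pair_score(calls[i], calls[j]) for i, j in pairs)
        profile.append(total / len(pairs))
    return profile
```

With three genes that are all up-regulated in the same two patients, `cms_profile(calls, window=3)` gives 2.0 for every gene; if the same genes are up-regulated in different patients, the scores drop although each gene's own up-regulation frequency is unchanged.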
CMS values for genes at the edges of chromosomes were calculated with reduced window sizes. The main theoretical advantage of CMS scores over raw up-regulation counts or averaged expression ratios is that they capture only information from co-regulated neighboring gene pairs: Noise signals fluctuate across genes and may more often lead to artificial assignment of high expression ratios between two genes. In contrast, real signals of regional up- or down-regulation lead to consistent changes in the same patients for two genes. For each window size, the CMS data sets of each chromosome were subjected to ChARM analysis. ChARM determines the borders of regions with high signal amplitudes in ordered data, here regions of expression imbalance along a chromosome, by an expectation-maximization approach. In addition, ChARM provides different statistical estimates to judge the significance of expression deregulation in a particular chromosomal region [44]. The identified deregulated regions were further evaluated manually using heat maps and the above-mentioned gene-versus-gene "correlation" plots (see Figures 6 to 32 and the accompanying website).

Availability and requirements

Project name: Colorectal carcinoma comparative chromosomal gene expression analysis (CC-CCGEA) [41]
Project home page: http://ccgea.molgen.mpg.de/cgi-bin/ccgea/ccgea.pl
Operating system(s): all
Programming language: Perl-CGI
Licence: GNU GPL
Restrictions to use by non-academics: none

Competing interests

The author(s) declare that they have no competing interests.

Authors' contributions

ES guided and performed data analysis and drafted the manuscript; JG performed RNA sample preparation, microdissection and hybridization of chips; DM hybridized DNA chips and contributed to data analysis; SR was involved in data preprocessing and analysis; IK and ECV performed laser capture microdissection; TB supervised chip hybridization and data preprocessing; BM and HJB were responsible for the clinical part of the study, including sample acquisition and patients' informed consent; CP and BW supervised chip hybridization, quality control and data preprocessing; AR conceived the study and revised the manuscript.

Acknowledgements

We especially acknowledge the valuable contributions of Klaus Hermann, who died of stomach cancer during this study. He was a wonderful colleague to us and put much effort into the implementation of the data preprocessing pipeline that was used in this study.

References