Skip to main content

Comprehensive analysis of H3K27me3 LOCKs under different DNA methylation contexts reveal epigenetic redistribution in tumorigenesis

Abstract

Background

Histone modification H3K27me3 plays a critical role in normal development and is associated with various diseases, including cancer. This modification forms large chromatin domains, known as Large Organized Chromatin Lysine Domains (LOCKs), which span several hundred kilobases.

Result

In this study, we identify and categorize H3K27me3 LOCKs in 109 normal human samples, distinguishing between long and short LOCKs. Our findings reveal that long LOCKs are predominantly associated with developmental processes, while short LOCKs are enriched in poised promoters and are most associated with low gene expression. Further analysis of LOCKs in different DNA methylation contexts shows that long LOCKs are primarily located in partially methylated domains (PMDs), particularly in short-PMDs, where they are most likely responsible for the low expressions of oncogenes. We observe that in cancer cell lines, including those from esophageal and breast cancer, long LOCKs shift from short-PMDs to intermediate-PMDs and long-PMDs. Notably, a significant subset of tumor-associated long LOCKs in intermediate- and long-PMDs exhibit reduced H3K9me3 levels, suggesting that H3K27me3 compensates for the loss of H3K9me3 in tumors. Additionally, we find that genes upregulated in tumors following the loss of short LOCKs are typically poised promoter genes in normal cells, and their transcription is regulated by the ETS1 transcription factor.

Conclusion

These results provide new insights into the role of H3K27me3 LOCKs in cancer and underscore their potential impact on epigenetic regulation and disease mechanisms.

Introduction

Gene expression is regulated by epigenetic modifications. The tri-methylation of lysine 27 on histone H3 (H3K27me3), catalyzed by the Polycomb Repressive Complex 2 (PRC2), is a well-studied histone modification associated with transcriptional repression [1]. In embryonic stem cells, over 85% of H3K27me3-marked promoters are poised promoters, containing both H3K4me3 and H3K27me3 marks [2]. These promoters are enriched for developmental factors and contribute to a bivalent chromatin state [3, 4]. During cellular differentiation, the distribution of H3K27me3 becomes more diffuse, leading to the loss of H3K27me3 at CGI promoters in a subset of genes that undergo either transcriptional activation in certain lineage-specific developmental context or long-term repression in other cell types [5, 6]. Increasingly, studies show that H3K27me3 tends to cluster, forming extensive blocks that span hundreds of kilobases [7,8,9,10,11]. These H3K27me3 blocks exhibit stronger gene expression repression and denser genomic interactions [8, 10] compared to individual peaks. Moreover, H3K27me3 blocks are strongly associated with developmental functions [10] and predict primitive cell identity [11]. Several studies have employed the CREAM R package to identify these regions, referring to them as Large Organized Chromatin Lysine Domains (LOCKs) [9, 11, 12], which were originally described for the H3K9me2 modification and were subsequently extended to other H3K modifications [13]. However, the features and functions of LOCKs remain incompletely understood. In addition, although studies have suggested that the length of these clusters may impact functionality [7, 10], the small-clustered H3K27me3 peaks have not received equal attention.

DNA methylation is another extensively researched layer of epigenetic modification, where DNA methyltransferases add a methyl group to cytosine at CpG sites, forming 5-methylcytosine [14]. DNA methylation's effect on gene expression depends on its location: methylation in the promoter is related to gene silencing, while methylation in the gene body is linked to activation in dividing cells [15]. Additionally, DNA methylation maintains the silencing of transposable elements [16, 17]. In normal cells, DNA methylation tends to remain stable over generations. However, with increased cell division, as seen in aging and tumor cells, significant methylation loss can occur [18,19,20]. Previous studies employing whole-genome methylation sequencing (WGBS) have found that tissues with low methylation levels often exhibit widespread regions of methylation loss called Partially Methylated Domains (PMDs), contrasting sharply with the surrounding Highly Methylated Domains (HMDs) [21, 22]. PMDs range in size of hundreds of kilo-base pairs (kb) and are primarily found in gene-poor domains [21, 23]. They strongly overlap with repressive chromatin domains [23,24,25], nuclear lamina-associated regions and late-replicating regions [22], with their boundaries being enriched in CTCF binding sites [22, 26]. Functionally, PMDs contribute to gene repression and increased genomic instability [26, 27]. A previous study revealed that the majority of PMDs were shared across human normal tissues, tumor tissues, and mouse tissues, leading to the definition of common PMDs [28]. This concept expands PMD research beyond the WGBS method and has been effectively applied to other studies [29, 30]. A recent study also classified common PMDs into three subsets (short (S)-PMDs, intermediate (I)-PMDs, and long (L)-PMDs) based on replication timing and methylation variability, revealing substantial differences in domain size among them [29]. Moreover, genes in the S-PMDs were found to be enriched in immune-related pathways. However, specific features and functions of S/I/L-PMDs remain unclear.

DNA methylation and histone modifications show discernible patterns in intensity and distribution [31, 32]. Indeed, there is an antagonistic relationship between H3K27me3 peaks and DNA methylation, with a negative correlation in intensity and limited genomic overlap [21, 33, 34]. Considering that H3K27me3 forms LOCKs and DNA hypomethylated regions extend into PMDs, we reasoned that investigation of H3K27me3 LOCKs and DNA methylation on a scale of hundreds of kilobases may reveal new epigenetic insights. Interestingly, in taxane-resistant triple-negative breast cancer cells, extensive DNA methylation loss is replaced by H3K27me3 LOCKs, compensating for transposon suppression [9]. Furthermore, in breast cancer, PMDs are occupied by H3K27me3 and H3K9me3, with H3K27me3 flanking H3K9me3 [24]. Similarly, in the D425 medulloblastoma cell line, PMDs covered by H3K27me3 are shorter than those covered by H3K9me3, implying a spatial correlation between PMDs and H3K27me3 or H3K9me3 blocks [25]. While previous studies have suggested a negative correlation between DNA methylation and H3K27me3, as well as a colocalization of PMDs and H3K27me3 LOCKs, these observations require further investigation due to the limited diversity of tissue types and sample sizes.

To delve deeper into the functionality of H3K27me3 LOCKs, here we utilized multi-omics data from 109 normal samples sourced from the Roadmap project and normal-tumor paired cell lines of esophageal squamous cell carcinoma (ESCC) and breast cancer (BRCA) retrieved from the GEO database (Fig. 1A). We employed the CREAM R package to identify long (greater than 100 kb) and short LOCKs (up to 100 kb) across all samples, subsequently providing a comprehensive analysis to investigate the genomic features and biological functions associated with different types of LOCKs across different DNA methylation backgrounds. We find that H3K27me3 long LOCKs are primarily localized within S-PMDs, where the difference in oncogene expression between those inside long LOCKs and those outside LOCKs is most pronounced. However, in tumors, this pattern changes: long LOCKs no longer preferentially localize to S-PMDs. On the other hand, peaks in short LOCKs, which exhibit the lowest nearest gene expression, are enriched in common HMDs. When we divided tumor-loss short LOCKs into upregulated and hypermethylated groups, we found that the upregulated group contained a higher proportion of poised promoters. These results provide a novel insight into the role of H3K27me3 LOCKs across different epigenomic contexts in differentiated cells and their involvement in cancer.

Fig. 1
figure 1

Genomic characteristics of three groups of H3K27me3 peaks. A Schematic of the sample types, multi-omics data, the LOCK identification method, and study design. B Comparative analysis of H3K27me3 intensity (n = 109), peak size (n = 109), DNA methylation (n = 20), and gene expression (n = 49) across different peak types. In each sample, the median value for each observation associated with each peak type was calculated, and the box plots aggregate the median values from all samples. The boxes indicate the 25th percentile, the median, and the 75th percentile. C A heatmap showing the top 20 significantly enriched GO terms for the genes associated with each group of peaks. Each column represents data from one peak type in a sample, with colors indicating the significance of enrichment for each GO term in the input gene list. D The left stacked bar chart shows the genomic distribution across 109 samples. On the x-axis are all samples, and on the y-axis are the percentages of various genomic elements relative to the total peak region size. On the right, a box plot represents the percentage for intergenic and promoter-TSS elements among peaks. UTR, untranslated region; TSS, transcription start site; TTS, transcription termination site. E Enrichment levels of three types of peaks within common PMDs or HMDs. Group comparisons in B, C, and E were performed using two-tailed paired t-tests, with statistical significance indicated by asterisks (***p < 0.001, **p < 0.01, *p < 0.05, NS. = not significant). F The heatmap shows the significance of peak enrichment in DMV regions or transposon regions, with colors representing the -log(q-value) generated by the LOLA R package. ME mesendoderm, TBL trophoblast-like cells, NPC neural progenitor cells, MSC mesenchymal stem cells, DMV DNA methylation valleys, LINEs long interspersed nuclear elements, LTRs long terminal repeats, SINEs short interspersed nuclear elements

Results

Distinct genomic characteristics of three groups of H3K27me3 peaks

Recent studies have demonstrated that H3K27me3 peaks often form extensive clusters, termed LOCKs, spanning several hundred kilobases, previously identified using the CREAM software [9, 11]. To investigate the characteristics and functions of H3K27me3 LOCKs, we utilized H3K27me3 ChIP-seq data from 109 normal samples generated by the Roadmap project (excluding embryonic stem cells (ESCs) and induced pluripotent stem cells), and applied the CREAM R package for LOCK detection. Based on the analytic results from CREAM, we classified the LOCKs into long (greater than 100 kb) and short LOCKs (up to 100 kb). Peaks not identified as part of any LOCKs by CREAM were designated as typical peaks (Fig. 1A, Fig. S1A-B).

After identifying the typical peaks, peaks in short LOCKs, and peaks in long LOCKs across 109 Roadmap samples, we analyzed the characteristics of these three groups of peaks. We found that, compared to typical peaks, peaks within long or short LOCKs exhibited higher peak intensity, larger size, lower DNA methylation levels, and reduced expression of nearest genes (Fig. 1B). The number of peaks in each group is an important characteristic for understanding short and long LOCKs. Short LOCKs exhibit a significantly higher LOCK number compared to long LOCKs (Fig. S1C). However, because of their shorter lengths, each short LOCK contains fewer peaks, leading to a total peak count in long LOCKs that is approximately five times greater than in short LOCKs or typical peaks (Fig. S1D).

To investigate the functional processes regulated by the three groups of peaks, we performed Metascape analysis on the nearest genes of the three groups of peaks in each sample [35, 36]. The top 20 most enriched GO terms revealed that as the domain size increased (i.e., from typical peaks to short LOCKs and then to long LOCKs), there was a gradual increase in the enrichment of developmental processes, such as “epithelial cell differentiation”, “embryonic organ development” and “gland development” (Fig. 1C). Furthermore, the genome distribution annotation revealed that peaks in short LOCKs were more frequently distributed in promoter-TSS regions compared to the other two groups of peaks, which might explain why peaks in short LOCKs were more strongly associated with low gene expression (Fig. 1B, D).

Considering the propensity of H3K27me3 to be distributed in regions with low DNA methylation [24], we analyzed the genomic relationship between LOCKs and common PMD/HMD regions as previously annotated [28]. We computed the enrichment scores of H3K27me3 peaks in common PMDs and common HMDs, adjusted for the total lengths of these regions. Our analysis revealed that long LOCK peaks and typical peaks exhibited higher enrichment in common PMD regions, whereas peaks in short LOCKs showed a greater enrichment in common HMD regions (Fig. 1E). In addition to H3K27me3, H3K9me3 is also a histone modification associated with gene suppression. To provide a comparison, we utilized H3K9me3 ChIP-seq data from Roadmap samples to identify H3K9me3 LOCKs using the same methodology. At the peak level, peaks within H3K9me3 long LOCKs also exhibited significantly higher enrichment in common PMD regions. Notably, there was no significant difference between peaks in H3K9me3 short LOCKs and typical H3K9me3 peaks in the enrichment of common PMDs and common HMDs (Fig. S1E). This suggests that although both H3K27me3 and H3K9me3 are inhibitive histone modifications and are generally enriched in DNA hypomethylated regions [25], there is a significant difference in their genomic characteristics and distribution patterns.

Both DNA methylation valleys (DMVs) and transposons (such as interspersed nuclear elements (LINEs), short interspersed nuclear elements (SINEs) and long terminal repeats (LTRs)) are closely associated with H3K27me3 and DNA methylation. Research has indicated that early developmental regulatory genes are often situated within DMVs, characterized by CGI promoters, and are frequently silenced by H3K27me3 in differentiated cells [37]. On the other hand, transposons are primarily regulated by DNA methylation and can be compensatorily suppressed by H3K27me3 LOCKs when DNA methylation is depleted [9]. Therefore, we analyzed the enrichment of DMVs [37] and transposons [38] within the peak regions. The results indicated significant enrichment of DMVs in peaks within short LOCKs, while SINEs and LINEs were enriched in the peaks within long LOCKs (Fig. 1F). Additionally, LTRs and LINEs were found to be enriched in typical peaks. In summary, peaks within long and short LOCKs exhibit distinct H3K27me3 features and expression levels of nearest genes compared to typical peaks. Additionally, the three groups of peaks exhibit distinct preferences for distribution across various genomic elements, including common PMDs/HMDs, promoters/intergenic regions, and DMVs/transposons.

H3K27me3 long LOCKs are enriched in S-PMDs and associated with low oncogene expression

In addition to conducting analyses at the peak level, we also performed analyses at the LOCK level to examine the overall characteristics of LOCKs. Our first focus was on long LOCKs which contained ~ 70% of all H3K27me3 peaks. Given that (i) common PMDs and common HMDs exhibit distinct DNA methylation profiles, (ii) the significant impact of DNA methylation on the distribution of H3K27me3, we conducted independent analyses on long LOCKs within these regions. Specifically, we assessed the DNA methylation levels and gene expression patterns within the LOCK regions (Fig. 2A, B). The genes are required to have an overlap size with the long LOCKs that exceeds half of their length and must be the nearest gene of at least one peak in the long LOCKs. In common PMD regions, characterized by low methylation levels and the suppression of gene expression, long LOCK regions exhibited a lower median methylation level and expression level compared to those in common HMDs. However, these differences were not statistically significant compared to the overall common PMDs. Conversely, in common HMD regions, the long LOCK regions displayed significantly lower levels of DNA methylation and gene expression than the background.

Fig. 2
figure 2

Features of H3K27me3 long LOCKs. AC DNA methylation (A), gene expression (B), and gene density (C) in long LOCKs and the background within common PMDs or HMDs. D Relative loop density in the long LOCKs and the background within the whole genome, common PMDs, or common HMDs. Line colors represent the relative loop density within long LOCKs compared to the background in each sample. The Winsorize method was used to remove outliers, excluding samples with values beyond mean ± 3 SDs for long LOCKs or the background. E The enrichment of long LOCKs in S/I/L-PMDs and common HMDs. F Gene density in long LOCKs and in the background within S/I/L-PMDs. G The top 20 pathways most enriched in genes within long LOCKs in S/I/L-PMDs. Each column represents a Roadmap sample. H The beeswarm plot shows the proportion of OGs or TSGs in long LOCKs among all genes in long LOCKs, separately for common PMDs/HMDs and S/I/L-PMDs. The purple dots represent the proportion of OGs or TSGs among all genes in common PMDs/HMDs or S/I/L-PMDs. OGs oncogenes, TSGs tumor suppressor genes. I The density plot shows the proportion of OGs among all genes in long LOCKs, separately for S/I/L-PMDs. J Comparison of the expression of OGs or TSGs within long LOCKs to those outside long or short LOCKs, separately for common PMDs or HMDs. K Consistent with the content depicted in J, but for OGs in S/I/L-PMDs. The two-tailed paired t-test was used (***p < 0.001, **p < 0.01, *p < 0.05, NS = not significant) in (B-D, H, J and K). L H3K27me3 modification near the IGF2BP1 gene in different Roadmap samples. IGF2BP1 is an OG, with more than half of its length located in S-PMDs

It has been shown that genes are relatively scarcer within PMDs compared to HMDs [27]. We thus aimed to investigate the gene density within LOCKs. Our results revealed that in all samples, the gene density of long LOCKs within common PMDs was significantly higher than that of the overall common PMDs. Conversely, the median gene density of long LOCKs within common HMDs was comparable to the background (Fig. 2C). These findings suggest that long LOCKs within common PMD regions are preferentially distributed in genomic regions with higher gene density. Since gene-dense regions typically exhibit more complex regulation than gene-poor domains, we obtained Hi-C data from the ENCODE database that is corresponding to the Roadmap samples and computed the relative number of loops in long LOCK regions as well as randomly selected regions (Fig. S2A). As anticipated, we observed higher density of chromatin loops in common HMDs compared to common PMDs, supportive of the increased gene regulation activity in common HMDs (Fig. 2D). Notably, the number of loops in the long LOCK regions was significantly higher than that in the randomly selected regions, both across the entire genome and within the common PMDs. In common HMDs, however, there was no significant difference between the two groups. These findings indicate long LOCKs within common PMD regions are characterized by higher gene density and exhibit enhanced chromatin interactions. This propensity may facilitate their regulation on gene transcription.

As at peak level, we calculated the enrichment score of the whole LOCKs in common PMDs and common HMDs (Fig. 1E). Similar to the peak level, short LOCKs were still significantly enriched in common HMDs, but there was no significant difference in the enrichment fraction of long LOCKs in common PMDs or HMDs (Fig. S2B). In contrast, both long LOCKs and short LOCKs of H3K9me3 were more significantly enriched in common PMDs (Fig. S2C). To explore the association between PMD length and long LOCKs, we categorized common PMDs into three subsets (S/I/L-PMDs) based on a published method (Fig. S2D-E) [29]. The results revealed that H3K27me3 long LOCKs were more significantly enriched in S-PMDs (Fig. 2E), while H3K9me3 long LOCKs were more significantly enriched in L-PMDs (Fig. S2F). As shown in Fig. 2F, there were differences in overall gene density in the S/I/L-PMDs, with S-PMDs having the highest and L-PMDs having the lowest. Across all three regions, the gene density of long LOCKs was higher than the overall level, with the most pronounced difference observed in L-PMDs. Regarding potential functional implications, we examined genes covered by long LOCKs over half of their size. Metascape analysis revealed that long LOCKs in S-PMDs mainly cover genes related to cell skeleton and epithelial cell differentiation, while those in I-PMDs mainly cover genes related to stimulus and sensation, and those in L-PMDs mainly cover genes related to development and cell fate determination (Fig. 2G). These results indicate that long LOCKs in S/I/L-PMDs exhibit distinct characteristics and distribution preferences, in line with the finding that S/I/L-PMD regions possess different genomic characteristics [29].

A previous study has reported that broad H3K27me3 regions are enriched for oncogenes (OGs), but the influence of different DNA methylation backgrounds was not considered [7]. We calculated the proportion of 1,500 OGs and 1,500 tumor suppressor genes (TSGs) [39] in common HMDs/PMDs as well as S/I/L-PMDs (purple dots in Fig. 2H), along with the proportion of these genes within long LOCKs in each methylation domain (orange dots in Fig. 2H). The results showed significant differences in the distribution of OGs and TSGs under different methylation backgrounds: there were more OGs in common PMDs, and more TSGs in common HMDs, consistent with previous findings that DNA methylation loss was associated with oncogene suppression [40]. Interestingly, the I- and L-PMDs had a higher proportion of OGs than S-PMDs. Moreover, the proportions of OGs or TSGs in long LOCKs in each methylation domain were mostly similar to the background. We further calculated the proportion of OGs in long LOCKs within S/I/L-PMDs. Surprisingly, the proportion in S- was higher than in I- and L-PMDs, indicating that although S- had a lower proportion of OGs compared to I- and L-PMDs, the OGs within S-PMDs were more often to be associated with long LOCKs (Fig. 2I). Additionally, we analyzed the expression levels of OGs and TSGs in common PMDs and common HMDs. Compared to OGs outside of LOCKs, those within long LOCKs had significantly lower expression only in common PMDs (Fig. 2J). Further analysis within S/I/L-PMDs revealed that long LOCKs in S-PMDs are most associated with the low expression of OGs (Fig. 2K). Figure 2L shows IGF2BP1 as an example of an OG within S-PMDs, which is located within long LOCKs across multiple tissue samples. Therefore, these results demonstrate that long LOCKs within S/I/L-PMDs exhibit distinct characteristics and functions, with long LOCKs being enriched in S-PMDs. Moreover, while the PMD domain is generally associated with oncogene suppression, our data suggests that in S-PMD, this inhibitory function is possibly through long LOCK-mediated transcription silencing.

Redistribution of H3K27me3 long LOCKs in tumor cell lines

To explore the alterations of long LOCKs in tumors, we analyzed epigenomic data for ESCC and BRCA cells, alongside their corresponding normal cells, from the GEO database. By applying the same analytical methods and criteria utilized in the Roadmap data analysis, we identified both long and short LOCKs across these cell lines. Echoing findings from the Roadmap data, long LOCKs showed significant enrichment within common PMDs both in tumor and normal cell lines (Fig. 3A). Intriguingly, when comparing tumor cell lines against normal counterparts, an increased enrichment in common PMDs was observed, accompanied with a decreased enrichment in common HMDs. This pattern suggests a shift in the H3K27me3 landscape of tumor cell lines, and indicates a tighter association between long LOCKs and common PMDs. As a part of the control analysis, we also analyzed H3K9me3 data from BRCA samples and their matched normal cell lines to assess the distribution of H3K9me3 long LOCKs. Despite all cell lines showing a preference for the enrichment of H3K9me3 long LOCKs within common PMDs, a notable distinction was observed for tumors. In tumor samples, the presence of H3K9me3 long LOCKs in common PMDs was reduced compared to normal samples, with an opposite trend noted in common HMDs (Fig. S3A).

Fig. 3
figure 3

Alteration of long LOCKs in normal and tumor cell lines. A Changes in the enrichment of H3K27me3 long LOCKs in common PMDs or HMDs between normal and tumor cell lines. B Tumor-loss, shared, and gain long LOCKs based on tumor and normal cell line pairs were identified. The stacked bar charts show the proportions of common PMDs, common HMDs, and non-common PMD/HMD regions covered by these three groups of LOCKs. C Distribution of tumor-loss, shared, and gain LOCKs in the NE2-KYSE450 pair across different genomic regions. D Average H3K27me3 intensity in the three groups of long LOCK regions and their surrounding regions for the NE2-KYSE450 pair. E Heatmap shows the average DNA methylation levels of the three groups of long LOCK regions in NE2 vs. KYSE450, across the whole genome and within common PMDs and HMDs. F Normalized gene expression levels within the three groups of LOCKs in common PMDs, for NE2 and KYSE450. G Consistent with F, but for oncogenes. The two-tailed t-test was used (***p < 0.001, **p < 0.01, *p < 0.05, NS = not significant) in (A, F and G)

To delve deeper into the dynamics of long LOCKs in the context of cancer, we paired tumor cells with corresponding normal cells to establish three categories of long LOCKs: those that were lost in tumor compared to normal (tumor-loss LOCKs), shared LOCKs, and those that were gained in tumor compared to normal (tumor-gain LOCKs). Subsequent analysis revealed a predominance loss of long LOCKs in common HMDs and non-common PMD/HMD domains (referred to as “others”), while common PMDs featured a balanced mix of both tumor-loss and tumor-gain LOCKs (Fig. 3B). This pattern highlights a significant reorganization of long LOCKs within the tumor epigenome, predominantly characterized by their retreat from non-common PMD regions and relocalization into common PMDs. We conducted statistical analyses on the three groups of LOCKs across various normal-tumor pairs. While the majority of LOCKs in each group resided in common PMDs, tumor-loss LOCKs only comprised 41.3% to 51.8% of the common PMD regions, whereas common PMDs constituted 88.1% to 95.7% of tumor-gain LOCKs (Fig. 3C; Fig. S3B). This elucidates the increased enrichment of long LOCKs in common PMDs within tumor cell lines (Fig. 3A).

We next assessed the H3K27me3 intensity and DNA methylation level of the three groups of long LOCKs. As anticipated, tumor-loss LOCKs showed decreased H3K27me3 intensity and increased DNA methylation in tumor cell lines, while tumor-gain LOCKs exhibited the opposite pattern. Shared LOCKs displayed a slight increase in H3K27me3 signal, coupled with a slight decrease in DNA methylation signal (Fig. 3D, E; Fig. S3C-D). One exception was noted in Normal vs TN-Basal, where HCC1937 displayed an overall higher DNA methylation signal; however, even in this case, we observed that the difference between HCC1937 and MCF10A in tumor-gain LOCKs was smaller than in tumor-loss LOCKs (Fig. S3D). Since long LOCKs in tumor and normal cell lines are predominantly located within common PMDs (Fig. 3A), with the proportions of the three groups of LOCKs in common PMDs being relatively comparable (Fig. 3B), we assessed the expression levels of the genes (Fig. 3F; Fig. S4A) and OGs (Fig. 3G; Fig. S4B) within these LOCKs in common PMDs. Our analysis revealed an upregulation pattern in both genes and OGs within tumor-loss LOCKs in tumor cell lines, while a downregulation trend was observed in genes and OGs within tumor-gain LOCKs in tumors. This suggests that long LOCKs relate to the inhibition of the expression of genes and OGs within their vicinity. We further investigated gene expression levels within long LOCKs compared to genes outside of any LOCKs (Fig. S4C). Similar to the patterns observed in Roadmap samples, both normal and tumor cell lines showed a lower gene expression in long LOCKs within S-PMDs, although this did not reach statistical significance in the normal cell lines. Consequently, we discerned that long LOCKs are more prevalently enriched in common PMDs within tumors and exhibit repositioning within these regions, thereby exerting a pivotal influence on the regulation of gene and OG expression levels.

A subset of H3K27me3 long LOCKs in tumors are converted from H3K9me3 domains in normal cells

Due to the enrichment of long LOCKs in S-PMDs observed in normal tissues of the Roadmap samples, we wondered whether this characteristic persisted in cell lines. Interestingly, in normal cell lines, long LOCKs continued to show enrichment in S-PMDs. However, in tumor cell lines, there was no significant difference in the enrichment of long LOCKs across S/I/L-PMDs (Fig. 4A). Conversely, H3K9me3 long LOCKs were significantly enriched in L-PMDs in both tumor and normal cell lines (Fig. S5A), consistent with the Roadmap samples. Upon analyzing the proportions of the three groups of LOCKs in S/I/L-PMDs, we found that tumor-loss LOCKs dominated the S- and I-PMDs. In contrast, in L-PMDs, tumor-gain LOCKs represented the largest proportion (except for NE2 vs KYSE450; Fig. 4B). The three groups of LOCKs in S/I/L-PMDs also exhibited the expected DNA methylation levels. Indeed, the DNA methylation level of tumor-loss LOCKs increased in tumors, while that of tumor-gain LOCKs showed the opposite pattern (Fig. 4C). We observed that in tumor cell lines, DNA methylation levels followed the pattern L-PMDs < I-PMDs < S-PMDs (Fig. 4C), in line with the finding that L-PMDs is at the latest stage of DNA replication, followed by I- and then S-PMDs (Fig. S2D). As a corollary, these data imply that in tumors, due to the most significant decrease in DNA methylation levels in the L-PMDs, more inhibitive markers are supplemented, leading to the redistribution of H3K27me3 long LOCKs from S- to I- and L-PMDs.

Fig. 4
figure 4

Changes in long LOCKs within S/I/L-PMDs in normal and tumor cells. A The enrichment of H3K27me3 long LOCKs in S/I/L-PMDs. The two-tailed t-test was used (***p < 0.001, **p < 0.01, *p < 0.05, NS. = not significant). B The proportions of the three groups of long LOCKs covering S/I/L-PMDs. C The heatmap shows average DNA methylation levels of the three groups of long LOCKs within S/I/L-PMDs in normal breast cell lines and Her2 + BRCA cell lines. D The heatmap shows H3K27me3 and H3K9me3 signals of three groups of long LOCKs within S/I/L-PMDs in normal breast cell lines and Her2 + BRCA cell lines. Each row represents a long LOCK region, with bars on the right indicating its PMD classification (S/I/L), the corresponding LOCK group, along with the change state of H3K27me3 and H3K9me3 in tumors compared to normal samples, with a minimum change of 0.1. The text on the far right shows the proportion of tumor-gain LOCKs in I- or L-PMDs with decreased, unchanged, or increased H3K9me3 signals. T, tumor; N, normal. E As an example, a snapshot of Integrated Genome Viewer (IGV) software is shown. It reflects the loss of H3K9me3 in a tumor-gain LOCK region within an L-PMD

While observing the expected changes in DNA methylation levels in the three groups of LOCKs in tumor cell lines, we also noticed that the tumor-gain LOCKs (i.e., non-LOCK regions in normal cell lines) exhibited low DNA methylation levels, even lower than those of tumor-loss LOCKs and shared LOCKs in the same cell line, especially in common PMDs (Fig. 3E; Fig. S3D) and in L-PMDs (Fig. 4C; Fig. S5B). Given that in normal samples, L-PMDs are where H3K9me3 tends to distribute (Fig. S2F; Fig. S5A), we speculated that these locations might represent regions predominantly covered by H3K9me3 in normal cell lines. Therefore, we computed the H3K9me3 intensity of tumor-gain LOCK regions within common PMDs in normal cell lines, revealing a robust modification level, particularly noticeable in L-PMDs (Fig. 4D). Additionally, according to our predefined threshold (see Methods), a substantial proportion of tumor-gain H3K27me3 LOCKs within I- (23–40%) and L-PMDs (28–61%) exhibited enrichment of H3K9me3 modifications in normal cell lines. In contrast, only a minimal percentage of tumor-gain H3K27me3 LOCKs in S-PMDs, ranging from 0–29%, displayed such enrichment (Fig. 4D, Fig. S5C). Figure 4E displays a snapshot of an exemplary region. Further, through expression level analyses of genes associated with tumor-gain LOCKs, we observed that most genes in these regions, with the exception of a few outliers, show very low expression in both normal and tumor cell lines, indicating that H3K27me3 long LOCKs compensate for the gene-suppressing effect of H3K9me3 in normal cells within these domains (Fig. S5D). Hence, our results elucidate that tumor cell lines predominantly acquire H3K27me3 long LOCKs within L-PMDs, with a subset originating from regions previously marked by H3K9me3 modification in normal cells.

Upregulated CGI genes associated with tumor-loss short LOCKs contain more poised genes

Besides long LOCKs, we were also interested in the dynamics and characteristics of short LOCKs. In addition to observing that peaks within short LOCKs displayed more robust H3K27me3 signals and a lower nearest gene expression (Fig. 1B), we also noted a significantly higher density of peaks in short LOCKs (Fig. 5A), indicating that these are compact but highly effective regulatory domains. Moreover, our earlier analyses have revealed that short LOCKs harbor a greater fraction of promoter-TSS sites (Fig. 1D). Further region enrichment analysis with LOLA R package revealed that, apart from being enriched in promoters, CGIs, and CGI promoters, short LOCKs also exhibited enrichment in ESC poised promoters and associated markers such as H3K4me3, H3K27me3 and EZH2 (Fig. 5B). We thus analyzed H3K4me3 ChIP-seq data from 109 Roadmap samples to identify CGI promoters in each sample that exhibit concurrent H3K4me3 and H3K27me3 modifications. Subsequently, we computed the proportion of poised promoters out of CGI promoters with H3K27me3 peak binding and observed that this proportion was significantly higher in short LOCKs compared to the genome-wide level (Fig. 5C upper panel). We also calculated the proportion of poised promoters overlapped with ESC poised promoters [41] over all poised promoters for each sample. Similarly, within short LOCKs, this proportion was notably higher compared to the genome-wide level (Fig. 5C below panel). These findings indicate that short LOCKs are not only enriched with poised promoters but also harbor a greater proportion of these promoters that are vestiges from the embryonic period than genome-wide.

Fig. 5
figure 5

Short LOCKs are related to poised promoters. A Peak density within short or long LOCKs, indicating the percentage of the total peak length relative to the total LOCK length, in 109 Roadmap samples. B The heatmap shows the top 20 LOLA terms most significantly enriched in peaks within short LOCKs. Each column represents a peak type from a Roadmap sample, with color intensity showing the level of significance. C Comparing short LOCKs to the whole genome based on Roadmap samples. The top part shows the proportion of poised promoters among CGI promoters with H3K27me3 peak binding, while the bottom part shows the proportion of poised promoters overlapping with ESC poised promoters among all poised promoters. D The number of CGI promoters within tumor-loss, shared, and gain short LOCKs identified across different normal-tumor pairs. E A schematic illustrating the identification process of upregulated and hypermethylated CGI-promoter genes. F Number of CGI promoters within an upregulated or hypermethylated group. G In the NE2-KYSE450 pair, the H3K27me3 intensity, DNA methylation, gene expression, H3K27ac intensity, and H3K4me3 intensity of upregulated or hypermethylated CGI promoters. T, tumor; N, normal. The two-tailed t-test was used (***p < 0.001, **p < 0.01, *p < 0.05, NS. = not significant) in (A and G). H Percentage of poised promoters among upregulated or hypermethylated CGI promoters

We further conducted investigations on short LOCKs using paired tumor-normal cell lines. Due to heterogeneity among cell lines and the small overall coverage of short LOCKs, there were few regions of short LOCKs shared between different cell lines of the same type. To broaden regions available for the study, we compared one tumor cell line with one normal cell line for each subtype of BRCA, allowing us to identify tumor-loss short LOCKs, shared short LOCKs, and tumor-gain short LOCKs. Upon statistical analysis, it became evident that tumor-loss short LOCKs were the predominant change (Fig. 5D). Our attention was thus primarily directed towards tumor-loss short LOCKs. Given their propensity to reside at CGI promoters, we investigated genes with CGI promoters and situated within short LOCKs to explore the regulatory influence of short LOCKs on gene expression. As tumor-loss short LOCKs correspond to short LOCKs in normal cell lines, we reasoned that genes regulated by this group might have comparatively low expression levels in normal cell lines. Therefore, we screened the genes based on an FPKM < 2 threshold in normal cell lines. Following methodologies outlined in our previous study [41], we identified two groups of genes based on their expression levels and promoter DNA methylation status: upregulated CGI-promoter genes and hypermethylated CGI-promoter genes (Fig. 5E, F).

We computed the H3K27me3 signal, DNA methylation level, and gene expression level for both groups, showing expected patterns (Fig. 5G; Fig. S6). Furthermore, the H3K27ac and H3K4me3 signals of both groups aligned with the gene expression levels. Following this, we determined the poised promoters in normal cell lines by integrating data from H3K4me3 ChIP-seq for each cell pair, and subsequently calculated the proportion of poised promoters in both groups. Although the number of promoters differed between the upregulated and hypermethylated groups in ESCC and BRCA cell lines (Fig. 5F), the upregulated group contained a higher proportion of poised promoters consistently in both cell lines (Fig. 5H), indicating that poised promoters harboring both repressive and active markers are more likely to be activated when there is a loss of H3K27me3 short LOCKs. According to our results, short LOCKs are associated with poised promoters. This infers that genes associated with the loss of short LOCKs in tumor cell lines may either remain suppressed due to promoter hypermethylation or become activated through regulation by active histone modifications.

Identification of ETS1 as a key TF activating genes in tumor-loss short LOCKs

To pinpoint the transcription factors (TFs) driving the upregulated genes in tumor-loss short LOCKs, we designated the promoters from this group as the foreground set and those from the hypermethylated group as the background set, applying the HOMER software for motif discovery (Fig. 6A). Following statistical analysis of significantly enriched motif subfamilies, we observed that the ETS subfamily ranked the highest (Fig. 6B). ETV4 and ETS1 were top ranked TFs in this subfamily (Fig. 6A).

Fig. 6
figure 6

Identification of ETS1 as a key TF upregulated in tumor-loss short LOCKs. A Bubble plot shows motifs identified using Homer software with upregulated CGI-promoters as foreground and hypermethylated CGI-promoters as background. Motifs are ranked by their occurrence across multiple normal-tumor pairs. No significant motifs were found in the NE2 vs KYSE510 group. Bubble size reflects TF expression in the tumor cell line of each pair and color indicates FDR. B Counting motif subfamily occurrences in all normal-tumor pairs. C Normalized expression level of the ETS1 gene in different ESCC cell lines from the DepMap portal. TPM: transcripts per million. D IGV snapshots display the genomic environment of representative ETS1 target genes which are upregulated in the KYSE450 cell line compared to NE2. E, F Relative expression of ETS1 target genes detected by qRT-PCR after ETS1 knockdown in KYSE150 (E) and overexpression in T.T cells (F). Two-tailed t-test was used (***p < 0.001, **p < 0.01, *p < 0.05, NS. = not significant)

While there is no publicly available ChIP-seq data of ETV4 in either ESCC or BRCA, we did find ChIP-seq data for ETS1 in head and neck squamous cell carcinoma (HNSCC), as well as RNA-seq data from HNSCC with ETS1 knockdown. HNSCC shares a high similarity with ESCC in terms of cellular origin and epigenomic features [42]. Notably, ETS1 directly occupied all six genes that have ETS1 motifs at promoters and belong to the upregulated group in NE2 vs KYSE450 (Fig. S7A). RNA-seq data revealed that upon ETS1 knockdown, these genes were downregulated (Fig. S7B). Encouraged by these results obtained from public data, we extended our investigation into ETS1 within ESCC cell lines. We retrieved expression data from the DepMap portal for these cell lines, finding that KYSE150 displayed the highest ETS1 expression, whereas T.T cell line had the lowest (Fig. 6C). Thus, in KYSE150, we performed ETS1 ChIP-seq and obtained H3K27ac ChIP-seq data from our prior study [43], identifying six genes with ETS1 motifs whose promoters exhibited peaks of both ETS1 and H3K27ac (Fig. 6D; Fig. S7C). Upon ETS1 knockdown in KYSE150 cells and ETS1 overexpression in T.T cells, we conducted quantitative reverse transcription PCR (qRT-PCR) to assess the expression of ETS1 and these target genes. The results revealed a significant downregulation of these genes after ETS1 knockdown (Fig. 6E) and a significant upregulation after ETS1 overexpression (Fig. 6F), validating the regulation of these genes by ETS1 in ESCC. Together, these findings suggest that a subset of genes upregulated in tumors due to loss of short LOCKs at their promoters are under the regulation of the ETS subfamily, particularly ETS1.

Discussion

In this study, we utilized the CREAM R package to categorize H3K27me3 modification regions from 109 normal samples into long LOCKs, short LOCKs and typical peaks, providing a detailed description of their characterization and function. Peaks in long LOCKs, the group having the largest number of peaks among the three categories, show the most significant enrichment in developmental pathways. Similarly, in a study performed in esophageal epithelial cell line NE2, researchers discovered that the broadest 5% of the H3K27me3 domains were preferentially associated with cell fate determination [10].

While the association between DNA methylation and H3K27me3 has been extensively investigated [44,45,46], research at larger genomic scales, such as the relationship between LOCKs and PMDs, remains relatively sparse. We observe that peaks in long LOCKs are enriched in common PMDs. Although the enrichment scores of these large domains in common PMDs and common HMDs are not significantly different, they are more prominently enriched in S-PMDs, both at the peak level and domain level. In contrast, long H3K9me3 LOCKs are enriched in L-PMDs. While both H3K27me3 and H3K9me3 are repressive histone modifications, they exhibit distinct features. For instance, H3K9me3 is often associated with heterochromatin and is typically found along the nuclear periphery [47, 48]. H3K27me3 tends to localize at CGIs and is thought to regulate plasticity [41]. Our findings unveil novel characteristics of their distribution in common PMD regions. Previous reports also support our conclusion that H3K9me3 tends to localize in L-PMDs, while H3K27me3 is predominantly enriched in S-PMDs [23,24,25]. Moreover, H3K27me3 and H3K9me3 demonstrate a mutually exclusive relationship [24] (Fig. S8). Besides DNA methylation, the positioning of H3K27me3 is influenced by a range of histone modifications, including H3K4me3 and H3K36me2 [49]. The intricacies of these histone modification patterns require further exploration.

Our results indicate an increased enrichment of long LOCKs within common PMDs in tumors. Additionally, the gain of long LOCKs in I- and L-PMDs has led to the disappearance of the enrichment phenomenon of long LOCKs within S-PMDs in tumor cells. While the observations in the liver cancer cell line HepG2 do not support our conclusion [23], they do not contradict it either, as some cell lines in our study exhibit similar behavior. Some of the gained LOCKs in tumors exhibit H3K9me3 modification in normal cells, suggesting a redistribution from H3K9me3 to H3K27me3 at these loci, thereby compensating for the suppression of the associated genes. The underlying mechanism for this transition remains unclear. Additionally, we observed an increased enrichment of H3K9me3 within common HMDs in tumors, potentially resulting from a significant reduction of H3K9me3 in PMDs within the tumor samples. However, as the sample size used to draw this conclusion was limited, more comprehensive studies with larger sample sizes are needed to validate and to elucidate the functions.

Our results suggest that long LOCKs may be associated with inhibitory effects on the genes within them. Peaks in long LOCKs predominantly localized in intergenic regions and introns (Fig. 1D). Existing research also supports the notion that peaks located outside of promoters can regulate gene expression. For example, Broad Gene Regulatory Domains (BGRDs) span gene bodies, extending hundreds of kilobases from the transcription start sites. H3K27me3 in the gene bodies of BGRDs may indicate repression of transcription elongation through polymerase II pausing [7]. Moreover, non-promoter peaks can influence gene expression via loops created by three-dimensional genome folding [8]. Our findings highlight that long LOCKs contain more loops compared to random regions, especially in common PMDs. Supporting this, in many cell lines, such as normal lung fibroblast cell line IMR90, fibroblast-derived cell line GM23248 and lymphoblastoid cell line GM12878, broad H3K27me3 peaks align with topologically associating domains (TADs) [10]. The mechanisms by which long LOCKs suppress internal gene activity warrant future exploration.

Oncogenes are reported to be enriched in BGRDs [7]. To validate and further investigate this intriguing phenomenon, we partitioned the genome into common PMDs/HMDs and further stratified common PMDs into S/I/L-PMDs. It is found that common PMDs, especially the L-PMDs, contain a higher proportion of oncogenes, leading to a hypothesis that reduced DNA methylation levels in tumors may be associated with tumor-suppressive expression programs [40]. However, we found that within each methylation domain, the proportion of oncogenes to all genes in long LOCKs was similar to the background. Since H3K27me3 tends to localize in areas of low DNA methylation, our findings suggest that studies on the content of oncogenes in BGRDs can be improved after taking account of their genomic localization. Besides, we observed that in S-PMDs, a larger proportion of oncogenes was located within long LOCKs. Moreover, long LOCKs in S-PMDs were associated with lower expression levels of oncogenes compared to other genes. This phenomenon requires broader validation and further research to understand its biological significance.

Short LOCKs, as we have defined them here, are LOCKs that are less than 100 kb in length. They exhibit the highest density of peaks, strongest intensity of peaks, lowest levels of DNA methylation, and the lowest expression levels in the nearest genes. Prior research indicates that the mutual exclusivity of H3K27me3 and DNA methylation primarily occurs in CGI regions [44]. This accounts for the low DNA methylation observed in peaks within short LOCKs (Fig. 1B), which are notably enriched in CGI regions (Fig. 1D; Fig. 5B). Unlike peaks in long LOCKs and typical peaks, peaks in short LOCKs are significantly more enriched in common HMDs in normal samples, indicating that short LOCKs are likely distributed in the short hypomethylated areas within common HMDs.

DMVs are known to be enriched in poised promoters [33, 50], and peaks in short LOCKs show enrichment in both poised promoters and DMVs. Originally identified in ESCs, poised promoters are thought to be readily convertible to either active or repressive states, playing a key role in sustaining stem cell pluripotency, whereas differentiated cells lose poised promoters [51,52,53]. Interestingly, we observed that the majority of poised promoters in differentiated cells retain the bivalent state found in ESCs, especially those within short LOCKs (Fig. 5C). Poised promoters contribute to maintaining the differentiated state in normal tissues, whereas in tumors, poised promoters are disrupted, leading to either increased DNA methylation or transcriptionally activated [41, 54]. We found that there are more lost short LOCKs than gained ones in tumor cell lines. Upon the loss of short LOCKs in CGI promoters, the group acquiring active markers and showing activated expression contains a higher proportion of poised promoters. This suggests that the status of CGI promoters associated with tumor-loss short LOCKs may be linked to their status as poised promoters in normal cells.

Typical peaks are relatively isolated peaks, predominantly enriched in intergenic regions. Despite their lower signal intensity and weakest capacity to suppress gene expression, our analysis suggests a potential association with LTR and LINE suppression, which requires further investigation [9].

Taken together, our findings offer a new perspective on the role of H3K27me3 LOCKs in various epigenomic contexts within differentiated cells and their involvement in cancer, uncovering the underlying epigenetic mechanisms (Fig. 7).

Fig. 7
figure 7

A schematic diagram summarizes the features and roles of long LOCKs and short LOCKs in differentiated cells and during tumorigenesis

Methods

Human cell lines

Human esophageal cancer cells lines KYSE150 were grown in RPMI-1640 medium (Corning, #10040CV) and T.T cells were cultured in DMEM Dulbecco’s modified Eagle medium/ Hams F-12 50/50 Mix (#10090CMR). Media for both cell lines were supplemented with 10% FBS (Omega Scientific, #FB-02) and 1% penicillin–streptomycin sulfate (Gibco, #10378016). Cell cultures were maintained in a 37 °C incubator containing 5% CO2. Cell lines were routinely tested for mycoplasma and authenticated using short tandem repeat analysis.

Transfection and lentiviral production

Cloning into the Lentiviral vector pLKO.1-TRC (Addgene, #10878) was performed for shRNA expression. Double-stranded oligonucleotide sequences for shRNAs against human ETS1 were ligated into the AgeI/EcoRI restriction sites of the digested pLKO.1-TRC lentiviral vector. These shRNA target sequences were listed in Table S1. Custom overexpression lentiviral vectors were purchased (Vector Builder, USA), containing either ORF_Stuffer or the full-length human ETS1 sequence. Transfection of second-generation lentiviral vectors required 2 μg of lentiviral vector (PLKO.1 TRC), 0.5 μg pMD2.G (Addgene, #12259) and 1.5 μg pPAX2 (Addgene, #12260), with the addition of 100 μL of serum-free DMEM media (Corning, #45000-304) and requiring 6 μL of BioT lipofectamine (Bioland Scientific, #B01-00). Transfection of third-generation overexpression lentiviral vectors required 2 μg of vector (ORF-Stuffer or Human ETS1), 1 μg pRSV-Rev (Addgene, #12253), 1.5 μg pMDLg/pRRE (Addgene, #12251), and 0.5 μg pMD2.G (Addgene, #12259) using 100 μL of serum-free DMEM media (Corning, #45000-304) and 6 μL of BioT lipofectamine (Bioland Scientific, #B01-00). Viral particle production occurred in co-transfected HEK293T cells. Cellular supernatant was filtered through a 0.45 μM filter 48 h post transfection. Target cells were infected by the virus in combination with 5 μg/ml Polybrene (Santa Cruz Biotechnology, #134220). All ETS1 sequences were validated by Sanger sequencing (Laragen, USA).

Quantitative real-time PCR

Total cellular RNA was extracted by the RNeasy Mini kit (QIAGEN, 70,106) and cDNA was made from the total RNA using Maxima H Minus cDNA Synthesis Master Mix with dsDNase (Thermo Scientific, M1682). qRT-PCR was conducted by the PowerUp SYBR Green Master Mix (Thermo Scientific, A25918). GAPDH was used for housekeeping gene normalization. Primers used in this study were listed in Table S1.

ChIP-seq

ChIP assay was performed as previously described [55]. Briefly, KYSE150 cells were first fixed in 10 mL of fresh media with 1% paraformaldehyde for 10 min at room temperature. Cells were then quenched with 1 mL of 1.25 M glycine for < 5 min, followed by three washes with ice cold 1X PBS. Cells were transferred to 15 mL conical tubes, spun down at 1,500 × g for 5 min and lysed twice with 1 mL lysis buffer (150 mM NaCl, 5 mM EDTA pH 7.5, 50 mM Tris pH 7.5, 0.5% NP-40) containing protease inhibitors (Roche, #04693124001). Cells were lysed mechanically by pipetting in a microcentrifuge tube and passing through a 23G/1 mL syringe. Cells were then centrifuged at 9,400 × g for 5 min at 4 °C and the supernatant was discarded. 1 mL of lysis buffer was added to the tube and the lysis process was repeated. Next, cell pellets were resuspended in 1 mL shearing buffer (formula: 1% SDS, 10 mM EDTA pH 8.0, 50 mM Tris pH 8.0) and sonicated using a Diagenode Bioruptor Pico sonicator. Subsequently, samples were centrifuged at 20,000 × g for 10 min at 4 °C and the supernatants were collected. The supernatant was diluted 1:5 in dilution buffer (formula: 1.1% Triton X-100, 0.1% SDS, 1.2 mM EDTA pH 8.0, 167 mM NaCl, 16.7 mM Tris pH 8.0). 2 μg ETS1 antibody (Cell Signaling Technology, 14069S) was then added and incubated at 4 °C overnight on a rotating platform. 30uL of Dynabeads Protein G beads (Thermo Scientific, 10004D) were added next day to the samples and incubated at 4 °C for 4 h on a rotating platform. Dynabeads were separated by a magnet stand and washed X8 with ice cold lysis buffer and twice with ice cold TE buffer (formula: 10 mM Tris pH 8.0, 1 mM EDTA pH 7.5, adjusted to a final pH of 7.6). DNA samples were retrieved by 100uL of reverse crosslinking buffer (formula: 136 mM NaHCO3, 0.96% SDS) twice for 15 min while rocking at room temperature. The supernatant was combined with 4uL of 5 M NaCl and incubated at 65 °C for 14–16 h. 1uL of RNase (Thermos Scientific, #EN0531) was added to samples and incubated at 37 °C for 30 min, followed by 4uL 0.5 M EDTA pH 8.0 with 8uL of 1 M Tris pH 7.0, and 1uL of Proteinase K (10 mg/mL) (Invitrogen, #100005393) at 45 °C for 1 h. Samples were purified by 500uL of Buffer PB (Qiagen, 19066) and vortexing, transferred to a spin column and incubated for 2 min at room temperature. Samples were washed by Buffer PE (Qiagen, 19065) three times, and DNA was eluted by 55uL of sterile water and incubated for 5 min before being spun at 17,900 g for 2 min. The products were subjected to DNA library preparation and deep sequencing using Illumina HiSeq platform.

ChIP-seq data analysis

Raw reads were mapped to the human genome sequences using bwa [56]. The SAMtools software was employed to convert SAM files into BAM files and to sort them [57]. Then, the sambamba software was utilized to remove PCR duplicates from the BAM files [58]. BEDtools was used for excluding regions within the blacklist [59, 60]. MACS2 was utilized to subtract input and identify narrow peaks with “-q 0.01 –keep-dup 1” options [61]. For histone ChIP-seq, the parameter “-nomodel –extsize 146” was utilized, while default parameters were applied to TF ChIP-seq. The bamCompare program from deepTools was employed to generate BigWig files with the CPM normalization method [62].

For the ChIP-seq data from GSE85158, certain biological replicates exhibited substantial differences in peak numbers. To address concerns regarding data quality and to maximize peak availability, we restricted our analysis to ChIP-seq data with peak numbers > 9000. Furthermore, when dealing with replicate pairs, the replicate with a higher peak count was included in our analysis. The BigWig and narrowPeak files for Roadmap samples were directly downloaded from their database website (https://egg2.wustl.edu/roadmap/web_portal/index.html). In the RoadMap BigWig files, the values represented -log10(Poisson P-value calculated by MACS2) of the signals. All calculations of ChIP-seq intensity for specific regions were performed using computeMatrix software, which processed the BigWig files. Notably, in this study, all analyses involving Roadmap samples were conducted using the hg19 genome coordinates, whereas analyses pertaining to tumor or normal cell lines were performed using the hg38 version.

LOCK identification

LOCK identification was based on chromosomes 1–22. We used the narrowPeak file as input for the CREAM R package, defining the identified regions as long LOCKs (≥ 100 kb) and short LOCKs (< 100 kb) respectively. To determine the optimal WScutoff parameter for CREAM, we tested values ranging from −1.0 to 1.0, in increments of 0.1, across the 109 samples. We monitored the number of peaks in LOCKs and typical peaks, finding that a WScutoff of approximately −0.7 yielded a stable count of both types of peaks (Fig. S1A-B). Consequently, we selected −0.7 as the WScutoff for our analysis.

Peak annotation and functional analysis

The annotatePeaks.pl function in the HOMER software was utilized to assign the nearest gene to each peak and annotate the genomic elements where the peak was located, including exon, intron, promoter-TSS, and others [35]. The Gene Ontology (GO) annotation for the genes associated with each group of peaks was conducted using Metascape [36]. This process was executed individually for genes associated with peaks in long LOCKs, peaks in short LOCKs, and typical peaks within each sample. For the nearest genes of LOCK peaks, we also required that over half of the gene's length is within the LOCK region, which is commonly used to associate genes with large genomic domains [63]. When the number of nearest genes for a peak type in a sample exceeds 3000, we randomly select 3000 genes from them for input, considering the limit of 3000 genes in the input data for Metascape software. We aggregated the -log10(q-value) for each GO term across all samples and all three peak types, presenting the top 20 terms with the highest sum (Fig. 1C).

Analysis of enrichment levels across different regions

In computing the enrichment level of LOCKs or peaks across various genomic regions, taking the enrichment score of long LOCKs within common PMDs as an example, we divided the total size of the overlap between long LOCKs and common PMDs by the total size of long LOCKs, and then further divided by the total size of common PMDs.

We also employed the LOLA R package to analyze the enrichment of peaks in the built-in datasets of LOLA, utilizing all peaks as the background [64]. The top 20 LOLA terms that exhibited the most significant enrichment for peaks in short LOCKs were displayed (Fig. 5B). Furthermore, LOLA provides the flexibility to incorporate custom datasets. Hence, we downloaded TE regions from the UCSC database [38] and obtained DMV regions from a previous study [37]. The enrichment levels of the three types of peaks within these regions were analyzed with the LOLA R package as well (Fig. 1F).

RNA-seq data analysis

Raw RNA-seq reads were aligned to the human genome using HISAT2 [65]. htseq-count was employed to assign hisat2-aligned reads to genes, utilizing human gene annotation files from GENCODE [66, 67]. The FPKM was computed. Moreover, Z-score Median Absolute Deviation (ZMAD) was computed for tumor or normal cell line samples. In detail, the expression value of each gene was subtracted from the median expression value of all genes, and then divided by the median absolute deviation.

When considering the gene expression associated with LOCKs, using the example of long LOCKs in common PMDs group, if over half of a gene resides within long LOCKs and over half within common PMDs, the expression of that gene is counted in this group.

WGBS, MethylC-seq and EPIC data analysis

The analysis of WGBS and MethylC-seq data was carried out in the Bismark pipeline [68]. The EPIC data was analyzed with R scripts and sesame package [69]. Probes that overlapping with repeats or single nucleotide polymorphisms were masked (https://zwdzwd.github.io/InfiniumAnnotation#reference). The bedGraphToBigWig tool was used to convert a BedGraph file into a BigWig file [70]. When calculating DNA methylation levels, CpG sites located within CGI regions were not considered, except for calculating that of CGI promoters (Fig. 5G and Fig. S6).

Identification of S/I/L-PMDs

In files from previous studies that identified common PMDs/HMDs, each region spans 100 kb[28]. We merged contiguous common PMD regions. Afterward, we utilized computeMatrix to determine the replication timing of each merged common PMDs with Repli-Seq data of five normal cell lines (HUVEC, IMR90, NHEK, BJ_1 and BJ_2) from UCSC database and the average signals of the quantile normalization results were obtained. After assigning the average methylation variability for each region, the hierarchical clustering was performed (Fig. S2D). The lengths of the three clusters of common PMDs align with the findings of Hyunchul Jung, et al. [29] (Fig. S2E). liftOver was employed to convert between different genome versions [71].

Hi-C data analysis

We obtained Hi-C data from the ENCODE database that could be matched with Roadmap samples [72]. In cases where a Roadmap sample matched multiple Hi-C datasets, we randomly selected one. Using the BEDtools shuffle program, we identified regions equivalent in size and quantity to long LOCKs within each sample for 50 times, ensuring these regions did not overlap with long or short LOCKs. Following this, we tallied the number of loops with both ends situated within long LOCKs or random regions, respectively. These counts were then divided by the number of long LOCKs or random regions to yield the relative loop density (Fig. S2A). The average of 50 repeated samplings is regarded as the outcome for regions outside of LOCKs. The Winsorize method removed outliers, excluding samples with values beyond mean ± 3 SDs for long LOCKs or the background.

Custom regions for LOLA analysis

Following the LOLA R package manual, we added custom target regions and conducted LOLA analysis. These regions include DMVs [37], transposons [38], promoters, CGI promoters and ESC poised promoters. The promoter region extends 500 bp upstream and 500 bp downstream from the transcription start sites. CGI promoters refer to promoters that overlap with CGI regions, which were merged from CGI regions identified in three different studies [73,74,75]. The ESC poised promoters were obtained from our previous study [41].

Identification of tumor-loss/shared/tumor-gain LOCKs

We utilized the long LOCKs and short LOCKs from both normal and tumor cell lines as input, fragmenting them using the BEDtools multiinter program. Using the comparison between Her2 + BRCA cell lines and normal cell lines as an example, if a fragmented region was classified as a long LOCK in over 2/3 of normal cell lines and neither a long nor a short LOCK in more than 2/3 of tumor cell lines, it's designated as a tumor-loss LOCK. Conversely, in the opposite scenario, it's labeled as a tumor-gain LOCK. However, if a region is identified as a long LOCK in over 2/3 of both normal and tumor cell lines, it's termed a shared LOCK. The identification method for tumor-loss/shared/tumor-gain short LOCKs is similar to the aforementioned.

Availability of data and materials

The datasets generated and analysed during the current study are available in the GEO repository [GSE270715].

References

  1. Bieluszewski T, Xiao J, Yang Y, Wagner D. PRC2 activity, recruitment, and silencing: a comparative perspective. Trends Plant Sci. 2021;26(11):1186–98.

    Article  CAS  PubMed  Google Scholar 

  2. Mantsoki A, Devailly G, Joshi A. CpG island erosion, polycomb occupancy and sequence motif enrichment at bivalent promoters in mammalian embryonic stem cells. Sci Rep. 2015;5:16791.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  3. Jeon AJ, Tucker-Kellogg G. Bivalent genes that undergo transcriptional switching identify networks of key regulators of embryonic stem cell differentiation. BMC Genomics. 2020;21:614.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Kumar D, Cinghu S, Oldfield AJ, Yang P, Jothi R. Decoding the function of bivalent chromatin in development and cancer. Genome Res. 2021;31(12):2170–84.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Petruk S, Cai J, Sussman R, Sun G, Kovermann SK, Mariani SA, Calabretta B, McMahon SB, Brock HW, Iacovitti L, et al. Delayed accumulation of H3K27me3 on nascent DNA is essential for recruitment of transcription factors at early stages of stem cell differentiation. Mol Cell. 2017;66(2):247–57.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Sara A, Miller MD, Kingston RE. H3K27me3 is dispensable for early differentiation but required to maintain differentiated cell identity. bioRXiv. 2020.

    Google Scholar 

  7. Zhao D, Zhang L, Zhang M, Xia B, Lv J, Gao X, Wang G, Meng Q, Yi Y, Zhu S, et al. Broad genic repression domains signify enhanced silencing of oncogenes. Nat Commun. 2020;11(1):5560.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Cai Y, Zhang Y, Loh YP, Tng JQ, Lim MC, Cao Z, Raju A, Lieberman Aiden E, Li S, Manikandan L, et al. H3K27me3-rich genomic regions can function as silencers to repress gene expression via chromatin interactions. Nat Commun. 2021;12(1):719.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  9. Deblois G, Tonekaboni SAM, Grillo G, Martinez C, Kao YI, Tai F, Ettayebi I, Fortier AM, Savage P, Fedor AN, et al. Epigenetic switch-induced viral mimicry evasion in chemotherapy-resistant breast cancer. Cancer Discov. 2020;10(9):1312–29.

    Article  CAS  PubMed  Google Scholar 

  10. Yuan J, Jiang Q, Gong T, Fan D, Zhang J, Chen F, Zhu X, Wang X, Qiao Y, Chen H, et al. Loss of grand histone H3 lysine 27 trimethylation domains mediated transcriptional activation in esophageal squamous cell carcinoma. NPJ Genom Med. 2021;6(1):65.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  11. Madani Tonekaboni SA, Haibe-Kains B, Lupien M. Large organized chromatin lysine domains help distinguish primitive from differentiated cell populations. Nat Commun. 2021;12(1):499.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Madani Tonekaboni SA, Mazrooei P, Kofia V, Haibe-Kains B, Lupien M. Identifying clusters of cis-regulatory elements underpinning TAD structures and lineage-specific regulatory networks. Genome Res. 2019;29(10):1733–43.

    Article  PubMed  PubMed Central  Google Scholar 

  13. Wen B, Wu H, Shinkai Y, Irizarry RA, Feinberg AP. Large histone H3 lysine 9 dimethylated chromatin blocks distinguish differentiated from embryonic stem cells. Nat Genet. 2009;41(2):246–50.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  14. Sergeeva A, Davydova K, Perenkov A, Vedunova M. Mechanisms of human DNA methylation, alteration of methylation patterns in physiological processes and oncology. Gene. 2023;875:147487.

    Article  CAS  PubMed  Google Scholar 

  15. Yang X, Han H, De Carvalho DD, Lay FD, Jones PA, Liang G. Gene body methylation can alter gene expression and is a therapeutic target in cancer. Cancer Cell. 2014;26(4):577–90.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  16. Jonsson ME, Ludvik Brattas P, Gustafsson C, Petri R, Yudovich D, Pircs K, Verschuere S, Madsen S, Hansson J, Larsson J, et al. Activation of neuronal genes via LINE-1 elements upon global DNA demethylation in human neural progenitors. Nat Commun. 2019;10(1):3182.

    Article  PubMed  PubMed Central  Google Scholar 

  17. Hur K, Cejas P, Feliu J, Moreno-Rubio J, Burgos E, Boland CR, Goel A. Hypomethylation of long interspersed nuclear element-1 (LINE-1) leads to activation of proto-oncogenes in human colorectal cancer metastasis. Gut. 2014;63(4):635–46.

    Article  CAS  PubMed  Google Scholar 

  18. Johnstone SE, Gladyshev VN, Aryee MJ, Bernstein BE. Epigenetic clocks, aging, and cancer. Science. 2022;378(6626):1276–7.

    Article  CAS  PubMed  Google Scholar 

  19. Feinberg AP, Vogelstein B. Hypomethylation distinguishes genes of some human cancers from their normal counterparts. Nature. 1983;301(5895):89–92.

    Article  CAS  PubMed  Google Scholar 

  20. Gama-Sosa MA, Slagel VA, Trewyn RW, Oxenhandler R, Kuo KC, Gehrke CW, Ehrlich M. The 5-methylcytosine content of DNA from human tumors. Nucleic Acids Res. 1983;11(19):6883–94.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  21. Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, Tonti-Filippini J, Nery JR, Lee L, Ye Z, Ngo QM, et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature. 2009;462(7271):315–22.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  22. Berman BP, Weisenberger DJ, Aman JF, Hinoue T, Ramjan Z, Liu Y, Noushmehr H, Lange CP, van Dijk CM, Tollenaar RA, et al. Regions of focal DNA hypermethylation and long-range hypomethylation in colorectal cancer coincide with nuclear lamina-associated domains. Nat Genet. 2011;44(1):40–6.

    Article  PubMed  PubMed Central  Google Scholar 

  23. Salhab A, Nordstrom K, Gasparoni G, Kattler K, Ebert P, Ramirez F, Arrigoni L, Muller F, Polansky JK, Cadenas C, et al. A comprehensive analysis of 195 DNA methylomes reveals shared and cell-specific features of partially methylated domains. Genome Biol. 2018;19(1):150.

    Article  PubMed  PubMed Central  Google Scholar 

  24. Hon GC, Hawkins RD, Caballero OL, Lo C, Lister R, Pelizzola M, Valsesia A, Ye Z, Kuan S, Edsall LE, et al. Global DNA hypomethylation coupled to repressive chromatin domain formation and gene silencing in breast cancer. Genome Res. 2012;22(2):246–58.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Hovestadt V, Jones DT, Picelli S, Wang W, Kool M, Northcott PA, Sultan M, Stachurski K, Ryzhova M, Warnatz HJ, et al. Decoding the regulatory landscape of medulloblastoma using DNA methylation sequencing. Nature. 2014;510(7506):537–41.

    Article  CAS  PubMed  Google Scholar 

  26. Decato BE, Qu J, Ji X, Wagenblast E, Knott SRV, Hannon GJ, Smith AD. Characterization of universal features of partially methylated domains across tissues and species. Epigenetics Chromatin. 2020;13(1):39.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  27. Brinkman AB, Nik-Zainal S, Simmer F, Rodriguez-Gonzalez FG, Smid M, Alexandrov LB, Butler A, Martin S, Davies H, Glodzik D, et al. Partially methylated domains are hypervariable in breast cancer and fuel widespread CpG island hypermethylation. Nat Commun. 2019;10(1):1749.

    Article  PubMed  PubMed Central  Google Scholar 

  28. Zhou W, Dinh HQ, Ramjan Z, Weisenberger DJ, Nicolet CM, Shen H, Laird PW, Berman BP. DNA methylation loss in late-replicating domains is linked to mitotic cell division. Nat Genet. 2018;50(4):591–602.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  29. Jung H, Kim HS, Kim JY, Sun JM, Ahn JS, Ahn MJ, Park K, Esteller M, Lee SH, Choi JK. DNA methylation loss promotes immune evasion of tumours with high mutation and copy number load. Nat Commun. 2019;10(1):4278.

    Article  PubMed  PubMed Central  Google Scholar 

  30. Endicott JL, Nolte PA, Shen H, Laird PW. Cell division drives DNA methylation loss in late-replicating domains in primary human cells. Nat Commun. 2022;13(1):6659.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  31. Li Y, Chen X, Lu C. The interplay between DNA and histone methylation: molecular mechanisms and disease implications. EMBO Rep. 2021;22(5):e51803.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  32. Wu MY, Tsai TF, Beaudet AL. Deficiency of Rbbp1/Arid4a and Rbbp1l1/Arid4b alters epigenetic modifications and suppresses an imprinting defect in the PWS/AS domain. Genes Dev. 2006;20(20):2859–70.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. Li Y, Zheng H, Wang Q, Zhou C, Wei L, Liu X, Zhang W, Zhang Y, Du Z, Wang X, et al. Genome-wide analyses reveal a role of Polycomb in promoting hypomethylation of DNA methylation valleys. Genome Biol. 2018;19(1):18.

    Article  PubMed  PubMed Central  Google Scholar 

  34. Fu K, Bonora G, Pellegrini M. Interactions between core histone marks and DNA methyltransferases predict DNA methylation patterns observed in human cells and tissues. Epigenetics. 2020;15(3):272–82.

    Article  PubMed  Google Scholar 

  35. Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, Singh H, Glass CK. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell. 2010;38(4):576–89.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  36. Zhou Y, Zhou B, Pache L, Chang M, Khodabakhshi AH, Tanaseichuk O, Benner C, Chanda SK. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat Commun. 2019;10(1):1523.

    Article  PubMed  PubMed Central  Google Scholar 

  37. Xie W, Schultz MD, Lister R, Hou Z, Rajagopal N, Ray P, Whitaker JW, Tian S, Hawkins RD, Leung D, et al. Epigenomic analysis of multilineage differentiation of human embryonic stem cells. Cell. 2013;153(5):1134–48.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  38. Fernandes JD, Zamudio-Hurtado A, Clawson H, Kent WJ, Haussler D, Salama SR, Haeussler M. The UCSC repeat browser allows discovery and visualization of evolutionary conflict across repeat families. Mob DNA. 2020;11:13.

    Article  PubMed  PubMed Central  Google Scholar 

  39. Davoli T, Xu AW, Mengwasser KE, Sack LM, Yoon JC, Park PJ, Elledge SJ. Cumulative haploinsufficiency and triplosensitivity drive aneuploidy patterns and shape the cancer genome. Cell. 2013;155(4):948–62.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  40. Johnstone SE, Reyes A, Qi Y, Adriaens C, Hegazi E, Pelka K, Chen JH, Zou LS, Drier Y, Hecht V, et al. Large-scale topological changes restrain malignant progression in colorectal cancer. Cell. 2020;182(6):1474–89.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  41. Zheng Y, Huang G, Silva TC, Yang Q, Jiang YY, Koeffler HP, Lin DC, Berman BP. A pan-cancer analysis of CpG Island gene regulation reveals extensive plasticity within Polycomb target genes. Nat Commun. 2021;12(1):2485.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  42. Sanchez-Danes A, Blanpain C. Deciphering the cells of origin of squamous cell carcinomas. Nat Rev Cancer. 2018;18(9):549–61.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  43. Jiang YY, Jiang Y, Li CQ, Zhang Y, Dakle P, Kaur H, Deng JW, Lin RY, Han L, Xie JJ, et al. TP63, SOX2, and KLF5 establish a core regulatory circuitry that controls epigenetic and transcription patterns in esophageal squamous cell carcinoma cell lines. Gastroenterology. 2020;159(4):1311–27.

    Article  CAS  PubMed  Google Scholar 

  44. Brinkman AB, Gu H, Bartels SJ, Zhang Y, Matarese F, Simmer F, Marks H, Bock C, Gnirke A, Meissner A, et al. Sequential ChIP-bisulfite sequencing enables direct genome-scale investigation of chromatin and DNA methylation cross-talk. Genome Res. 2012;22(6):1128–38.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  45. Reddington JP, Perricone SM, Nestor CE, Reichmann J, Youngson NA, Suzuki M, Reinhardt D, Dunican DS, Prendergast JG, Mjoseng H, et al. Redistribution of H3K27me3 upon DNA hypomethylation results in de-repression of Polycomb target genes. Genome Biol. 2013;14(3):R25.

    Article  PubMed  PubMed Central  Google Scholar 

  46. Zhang H, Liu Y, Xie Y, Zhu Y, Liu J, Lu F. H3K27me3 shapes DNA methylome by inhibiting UHRF1-mediated H3 ubiquitination. Sci China Life Sci. 2022;65(9):1685–700.

    Article  CAS  PubMed  Google Scholar 

  47. Peters AH, Kubicek S, Mechtler K, O’Sullivan RJ, Derijck AA, Perez-Burgos L, Kohlmaier A, Opravil S, Tachibana M, Shinkai Y, et al. Partitioning and plasticity of repressive histone methylation states in mammalian chromatin. Mol Cell. 2003;12(6):1577–89.

    Article  CAS  PubMed  Google Scholar 

  48. Ugarte F, Sousae R, Cinquin B, Martin EW, Krietsch J, Sanchez G, Inman M, Tsang H, Warr M, Passegue E, et al. Progressive chromatin condensation and H3K9 methylation regulate the differentiation of embryonic and hematopoietic stem cells. Stem Cell Reports. 2015;5(5):728–40.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  49. Lorzadeh A, Romero-Wolf M, Goel A, Jadhav U. Epigenetic regulation of intestinal stem cells and disease: a balancing act of DNA and histone methylation. Gastroenterology. 2021;160(7):2267–82.

    Article  CAS  PubMed  Google Scholar 

  50. Karlow JA, Devarakonda S, Xing X, Jang HS, Govindan R, Watson M, Wang T. Developmental pathways are epigenetically reprogrammed during lung cancer brain metastasis. Cancer Res. 2022;82(15):2692–703.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  51. Bernstein BE, Mikkelsen TS, Xie X, Kamal M, Huebert DJ, Cuff J, Fry B, Meissner A, Wernig M, Plath K, et al. A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell. 2006;125(2):315–26.

    Article  CAS  PubMed  Google Scholar 

  52. Mikkelsen TS, Ku M, Jaffe DB, Issac B, Lieberman E, Giannoukos G, Alvarez P, Brockman W, Kim TK, Koche RP, et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature. 2007;448(7153):553–60.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  53. Vastenhouw NL, Zhang Y, Woods IG, Imam F, Regev A, Liu XS, Rinn J, Schier AF. Chromatin signature of embryonic pluripotency is established during genome activation. Nature. 2010;464(7290):922–6.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  54. Hahn MA, Li AX, Wu X, Yang R, Drew DA, Rosenberg DW, Pfeifer GP. Loss of the polycomb mark from bivalent promoters leads to activation of cancer-promoting genes in colorectal tumors. Cancer Res. 2014;74(13):3617–29.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  55. Ziman B, Yang Q, Zheng Y, Sheth M, Nam C, Zhao H, Zhang L, Hu B, Bhowmick NA, Sinha UK, et al. Epigenomic analyses identify FOXM1 as a key regulator of anti-tumor immune response in esophageal adenocarcinoma. Cell Death Dis. 2024;15(2):152.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  56. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–60.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  57. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, Genome Project Data Processing S. The sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–9.

    Article  PubMed  PubMed Central  Google Scholar 

  58. Tarasov A, Vilella AJ, Cuppen E, Nijman IJ, Prins P. Sambamba: fast processing of NGS alignment formats. Bioinformatics. 2015;31(12):2032–4.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  59. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–2.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  60. Amemiya HM, Kundaje A, Boyle AP. The ENCODE blacklist: identification of problematic regions of the genome. Sci Rep. 2019;9(1):9354.

    Article  PubMed  PubMed Central  Google Scholar 

  61. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9(9):R137.

    Article  PubMed  PubMed Central  Google Scholar 

  62. Ramirez F, Ryan DP, Gruning B, Bhardwaj V, Kilpert F, Richter AS, Heyne S, Dundar F, Manke T. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 2016;44(W1):W160-165.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  63. Pauler FM, Sloane MA, Huang R, Regha K, Koerner MV, Tamir I, Sommer A, Aszodi A, Jenuwein T, Barlow DP. H3K27me3 forms BLOCs over silent genes and intergenic regions and specifies a histone banding pattern on a mouse autosomal chromosome. Genome Res. 2009;19(2):221–33.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  64. Sheffield NC, Bock C. LOLA: enrichment analysis for genomic region sets and regulatory elements in R and bioconductor. Bioinformatics. 2016;32(4):587–9.

    Article  CAS  PubMed  Google Scholar 

  65. Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 2019;37(8):907–15.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  66. Anders S, Pyl PT, Huber W. HTSeq–a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31(2):166–9.

    Article  CAS  PubMed  Google Scholar 

  67. Frankish A, Diekhans M, Jungreis I, Lagarde J, Loveland JE, Mudge JM, Sisu C, Wright JC, Armstrong J, Barnes I, et al. Gencode 2021. Nucleic Acids Res. 2021;49(D1):D916–23.

    Article  CAS  PubMed  Google Scholar 

  68. Krueger F, Andrews SR. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics. 2011;27(11):1571–2.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  69. Zhou W, Triche TJ Jr, Laird PW, Shen H. SeSAMe: reducing artifactual detection of DNA methylation by Infinium BeadChips in genomic deletions. Nucleic Acids Res. 2018;46(20): e123.

    PubMed  PubMed Central  Google Scholar 

  70. Kent WJ, Zweig AS, Barber G, Hinrichs AS, Karolchik D. BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics. 2010;26(17):2204–7.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  71. Gao B, Huang Q, Baudis M. segment_liftover: a Python tool to convert segments between genome assemblies. F1000Res. 2018;7:319.

    Article  PubMed  PubMed Central  Google Scholar 

  72. Luo Y, Hitz BC, Gabdank I, Hilton JA, Kagda MS, Lam B, Myers Z, Sud P, Jou J, Lin K, et al. New developments on the Encyclopedia of DNA Elements (ENCODE) data portal. Nucleic Acids Res. 2020;48(D1):D882–9.

    Article  CAS  PubMed  Google Scholar 

  73. Irizarry RA, Ladd-Acosta C, Wen B, Wu Z, Montano C, Onyango P, Cui H, Gabo K, Rongione M, Webster M, et al. The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat Genet. 2009;41(2):178–86.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  74. Takai D, Jones PA. Comprehensive analysis of CpG islands in human chromosomes 21 and 22. Proc Natl Acad Sci U S A. 2002;99(6):3740–5.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  75. Nassar LR, Barber GP, Benet-Pages A, Casper J, Clawson H, Diekhans M, Fischer C, Gonzalez JN, Hinrichs AS, Lee BT, et al. The UCSC Genome Browser database: 2023 update. Nucleic Acids Res. 2023;51(D1):D1188–95.

    Article  CAS  PubMed  Google Scholar 

Download references

Acknowledgements

Not applicable.

Funding

YY. Zheng was supported by Shenzhen Science and Technology Program (JCYJ20230807110309019 and JCYJ20220530144815036), National Natural Science Foundation of China (32200538), Guangdong Basic and Applied Basic Research Foundation (2024A1515010703), Shenzhen Key Laboratory of Bone Tissue Repair and Translational Research (NO. ZDSYS20230626091402006), and the Research Start-up Fund of the Seventh Affiliated Hospital, Sun Yat-sen University (ZSQYBRJH0025). MN. Liu was supported by Shenzhen Science and technology Program (No. RCBS20221008093310022).

Author information

Authors and Affiliations

Authors

Contributions

YY. Zheng conceived and devised the study; YY. Zheng, D-C. Lin and Y. Liang designed experiments and analyses; Y. Liang, MN. Liu and BY. Liu performed bioinformatics and statistical analysis; B. Ziman performed experiments; Y. Liang, YY. Zheng and D-C. Lin analyzed the data; YY. Zheng and D-C. Lin supervised the research; Y. Liang wrote the manuscript with assistance from YY. Zheng and D-C. Lin; GJ. Peng, Q. Mao, XJ. Liu and LZ. Jiang edited the manuscript; All authors reviewed the manuscript.

Corresponding author

Correspondence to Yueyuan Zheng.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Liang, Y., Liu, M., Liu, B. et al. Comprehensive analysis of H3K27me3 LOCKs under different DNA methylation contexts reveal epigenetic redistribution in tumorigenesis. Epigenetics & Chromatin 18, 6 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13072-025-00570-0

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13072-025-00570-0

Keywords