Chromatin interactome mapping identifies target genes at breast cancer risk signals

Sivakumaran H, Beesley J, Marjaneh MM, Chenevix-Trench G, French JD and Edwards SL

Cancer Division, QIMR Berghofer Medical Research Institute, Brisbane, Australia.

Genome-wide association studies (GWAS) for breast cancer have identified 196 independent signals associated with increased risk. The majority of risk-associated variants within these signals fall in regulatory sequences, such as enhancers, that control gene expression. We perform in situ Capture Hi-C using a high-resolution Variant Capture array (VCHi-C), which includes probes covering all fine-mapped candidate causal variants. We apply VCHi-C and Promoter Capture Hi-C (PCHi-C) to link risk variants to their target genes in six human mammary epithelial and breast cancer cell lines. We use the CHiCAGO pipeline to assign confidence scores, apply a strict threshold, and identify between 10,000 and 27,000 interactions per cell type. Hierarchical clustering of interaction scores stratifies cell lines by estrogen receptor status. Global analysis of promoter-interacting regions (PIRs) shows strong enrichment for cell-type-specific accessible chromatin, histone marks for active enhancers and transcription factor binding, supporting the regulatory potential of many PIRs. In total, reciprocally validated CHiCAGO-identified interactions yield 647 candidate target genes. To further prioritise the CHi-C-derived chromatin interactions, we use a recently developed Bayesian framework to fine-map the direct contacts. Importantly, the combined PCHi-C and VCHi-C contact fine-mapping enables us to prioritise 1832 of 7375 highly correlated risk variants and lowers the total number of target genes to 393. The utility of this dual approach is evident at the 1p22 risk region, where contact fine-mapping decreases the number of risk variants from 34 to 8 and the number of candidate target genes from 14 to 2. Our results demonstrate the power of combining genetics, computational genomics and molecular studies to streamline the identification of key variants and target genes at GWAS-identified risk regions.
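
The reciprocal validation step can be illustrated with a minimal sketch. It assumes that "reciprocally validated" means a gene-variant contact passes the score threshold in both the promoter-baited (PCHi-C) and the variant-baited (VCHi-C) data; the abstract does not spell out this definition. The threshold value, the `Contact` type, the function names and the toy gene and variant identifiers are all hypothetical and are not taken from the study or from the CHiCAGO package.

```python
from typing import Iterable, NamedTuple


class Contact(NamedTuple):
    gene: str      # gene whose promoter sits at one end of the contact
    variant: str   # risk variant at the other end
    score: float   # CHiCAGO-style confidence score


# Assumed cutoff for illustration only; the abstract says "strict threshold"
# without giving a number.
SCORE_THRESHOLD = 5.0


def significant_pairs(contacts: Iterable[Contact],
                      threshold: float = SCORE_THRESHOLD) -> set:
    """Return the (gene, variant) pairs whose score meets the threshold."""
    return {(c.gene, c.variant) for c in contacts if c.score >= threshold}


def reciprocally_validated(pchic: Iterable[Contact],
                           vchic: Iterable[Contact],
                           threshold: float = SCORE_THRESHOLD) -> set:
    """Keep pairs that pass the threshold in both capture designs."""
    return significant_pairs(pchic, threshold) & significant_pairs(vchic, threshold)


def candidate_target_genes(pairs: set) -> list:
    """Collapse validated pairs to a sorted list of unique genes."""
    return sorted({gene for gene, _ in pairs})


# Toy data with placeholder identifiers (not real genes or variants).
pchic = [Contact("GENE_A", "var1", 7.2),
         Contact("GENE_B", "var2", 9.0),
         Contact("GENE_C", "var3", 3.1)]
vchic = [Contact("GENE_A", "var1", 6.0),
         Contact("GENE_B", "var2", 4.0),
         Contact("GENE_C", "var3", 8.5)]

validated = reciprocally_validated(pchic, vchic)
print(candidate_target_genes(validated))
```

In this toy input only the GENE_A contact passes in both directions, so it is the only candidate retained; contacts supported by a single capture design are dropped.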