Multiscale and integrative single-cell Hi-C analysis with Higashi
Nature Biotechnology, in press, 2021
Ruochi Zhang, Tianming Zhou, Jian Ma.

Single-cell Hi-C (scHi-C) can reflect cell-to-cell variability of 3D chromatin organization, but the sparseness of measured interactions poses an analysis challenge. Here, we report Higashi, an algorithm based on hypergraph representation learning that can incorporate the latent correlations between single cells to enhance overall imputation of contact maps. Higashi outperforms existing methods for embedding and imputation of scHi-C data and is able to identify multiscale 3D genome features in single cells, such as compartmentalization and TAD-like domain boundaries, allowing refined delineation of their cell-to-cell variability. Moreover, Higashi can incorporate epigenomic signals that are jointly profiled in the same cell into the hypergraph representation learning framework, rather than analyzing the two modalities separately, leading to improved embeddings for single-nucleus methyl-3C data. In a scHi-C dataset from human prefrontal cortex, Higashi identifies connections between 3D genome features and cell type-specific gene regulation. Higashi can also potentially be extended to analyze single-cell multiway chromatin interactions and other multimodal single-cell omics data.

The key algorithmic design of Higashi is to transform the scHi-C data into a hypergraph. This transformation preserves the single-cell resolution and the 3D genome features from the scHi-C contact maps. Specifically, embedding the scHi-C data becomes equivalent to learning node embeddings of the hypergraph, while imputing the scHi-C contact maps becomes predicting missing hyperedges within the hypergraph.
In Higashi, we use our recently developed Hyper-SAGNN architecture, which is a generic hypergraph representation learning framework, with substantial new developments specifically for scHi-C analysis.
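
To make the hypergraph view concrete, the following is a minimal sketch and not the Higashi implementation. It assumes each cell's contact map is given as a dictionary from a bin pair to a contact count. The function names, the node-numbering scheme (cell nodes first, then bin nodes) and the toy data are all hypothetical choices made for illustration.

```python
from itertools import combinations


def build_hyperedges(contact_maps):
    """Turn per-cell sparse contact maps into (cell, bin, bin, count) hyperedges.

    contact_maps: one dict per cell, mapping (bin_i, bin_j) with bin_i < bin_j
    to a contact count. This input layout is an assumption of the sketch.
    """
    n_cells = len(contact_maps)
    hyperedges = []
    for cell_id, contacts in enumerate(contact_maps):
        for (bin_i, bin_j), count in sorted(contacts.items()):
            if count > 0:
                # Bin node IDs are offset by n_cells so that cell nodes and
                # bin nodes never share an ID.
                hyperedges.append((cell_id, n_cells + bin_i, n_cells + bin_j, count))
    return hyperedges


def candidate_missing_hyperedges(contact_maps, cell_id, n_bins):
    """List the (cell, bin, bin) triplets with no observed contact in one cell.

    In the hypergraph view, imputation amounts to scoring these candidates.
    """
    n_cells = len(contact_maps)
    observed = contact_maps[cell_id]
    return [
        (cell_id, n_cells + i, n_cells + j)
        for i, j in combinations(range(n_bins), 2)
        if observed.get((i, j), 0) == 0
    ]


# Toy data: two cells, four genomic bins.
toy_maps = [
    {(0, 1): 2, (1, 3): 1},
    {(0, 2): 1},
]
edges = build_hyperedges(toy_maps)
missing = candidate_missing_hyperedges(toy_maps, cell_id=1, n_bins=4)
```

Here every observed contact becomes one hyperedge joining a cell node and two bin nodes, and the unobserved triplets returned by `candidate_missing_hyperedges` are the entries that a trained model would be asked to score during imputation.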

Higashi has five main components.

  1. We represent the scHi-C dataset as a hypergraph, where each cell is represented as a cell node and each genomic bin as a genomic bin node. Each non-zero entry in the single-cell contact map is modeled as a hyperedge connecting the corresponding cell and the two genomic loci of that particular chromatin interaction.
    This formalism integrates embedding and data imputation for scHi-C.
  2. We train a hypergraph neural network based on the constructed hypergraph.
  3. We extract the embedding vectors of cell nodes from the trained hypergraph neural network for downstream analysis.
  4. We use the trained hypergraph neural network to impute single-cell Hi-C contact maps with the flexibility to incorporate the latent correlations between cells to enhance overall imputation, enabling more detailed and reliable characterization of 3D genome features.
  5. With a number of new computational strategies, we reliably compare A/B compartment scores and TAD-like domain boundaries across individual cells to facilitate the analysis of cell-to-cell variability of these large-scale 3D genome features and its implications for gene transcription.
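
The idea in step 4 of borrowing information from related cells can be illustrated with a deliberately naive stand-in. The sketch below is not Higashi's hypergraph neural network: it simply averages a cell's contact counts with those of its nearest cells in embedding space. It assumes cell embeddings have already been extracted (step 3) and that contact maps are dictionaries from a bin pair to a count. All function names and toy values are hypothetical.

```python
import math


def nearest_cells(embeddings, cell_id, k):
    """Return the indices of the k cells closest to cell_id (Euclidean distance)."""
    target = embeddings[cell_id]
    distances = [
        (math.dist(target, other), idx)
        for idx, other in enumerate(embeddings)
        if idx != cell_id
    ]
    return [idx for _, idx in sorted(distances)[:k]]


def pool_with_neighbors(contact_maps, embeddings, cell_id, k):
    """Average one cell's contact counts with those of its k nearest cells."""
    group = [cell_id] + nearest_cells(embeddings, cell_id, k)
    pooled = {}
    for idx in group:
        for pair, count in contact_maps[idx].items():
            # Each cell in the group contributes an equal share.
            pooled[pair] = pooled.get(pair, 0.0) + count / len(group)
    return pooled


# Toy data: cells 0 and 1 are close in embedding space, cell 2 is far away.
toy_embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
toy_maps = [
    {(0, 1): 2},
    {(0, 1): 1, (1, 2): 1},
    {(2, 3): 4},
]
neighbors = nearest_cells(toy_embeddings, cell_id=0, k=1)
pooled = pool_with_neighbors(toy_maps, toy_embeddings, cell_id=0, k=1)
```

In this toy case cell 0 gains a nonzero value for the bin pair (1, 2) from its neighbor, while the distant cell 2 contributes nothing. A learned model can weight such information far more flexibly than a plain average.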

In addition, we have developed a visualization tool to allow interactive navigation of the embedding vectors and the imputed contact maps from Higashi to facilitate discovery.

Citation

@article{Zhang2020multiscale,
  author = {Zhang, Ruochi and Zhou, Tianming and Ma, Jian},
  title = {Multiscale and integrative single-cell Hi-C analysis with Higashi},
  year = {2020},
  doi = {10.1101/2020.12.13.422537},
  publisher = {Cold Spring Harbor Laboratory},
  journal = {bioRxiv}
}