This set of tracks represents the top peaks of the clusters produced by our pipeline, RECLU, from the cap analysis of gene expression (CAGE) datasets covering 156 human primary cells and the HeLa and THP-1 cell lines. We proposed the pipeline to identify reproducible transcription start sites (TSSs) at multiple scales. We extracted both the lowest peaks (termed "bottom") and the highest peaks (termed "top") to capture clusters at different levels of the hierarchy. For details about the pipeline, refer to our paper (Ohmiya et al. 2014). This work is part of the FANTOM5 project.
You can get these datasets from here.
The position of each cluster is shown at its left in the following format:
chromosome:start position..end position,strand
e.g., chr1:56520..56522,+
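If you need the position labels as separate fields, for example when filtering downloaded clusters by chromosome or strand, the format above can be split with a regular expression. The following is a minimal sketch that assumes only what the format line states: chromosome names contain no ':' and the strand is '+' or '-'. The names ClusterPosition, parse_cluster_position and format_cluster_position are hypothetical and are not part of RECLU. The sketch does not assume whether start is 0- or 1-based, so check the downloaded files before converting coordinates to another convention.

```python
import re
from typing import NamedTuple

# Matches labels written as "chromosome:start..end,strand".
# Assumption: chromosome names contain no ':' and strand is '+' or '-'.
_POSITION_RE = re.compile(
    r"^(?P<chrom>[^:]+):(?P<start>\d+)\.\.(?P<end>\d+),(?P<strand>[+-])$"
)


class ClusterPosition(NamedTuple):
    """Fields of a cluster label (hypothetical helper type)."""
    chrom: str
    start: int
    end: int
    strand: str


def parse_cluster_position(label: str) -> ClusterPosition:
    """Split a label such as 'chr1:56520..56522,+' into its fields."""
    match = _POSITION_RE.match(label.strip())
    if match is None:
        raise ValueError(f"not a cluster position: {label!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"start exceeds end in {label!r}")
    return ClusterPosition(match["chrom"], start, end, match["strand"])


def format_cluster_position(pos: ClusterPosition) -> str:
    """Rebuild the label in the same 'chromosome:start..end,strand' format."""
    return f"{pos.chrom}:{pos.start}..{pos.end},{pos.strand}"


if __name__ == "__main__":
    pos = parse_cluster_position("chr1:56520..56522,+")
    print(pos)
    print(format_cluster_position(pos))
```

Running the script prints ClusterPosition(chrom='chr1', start=56520, end=56522, strand='+') followed by the original label, so parsing and formatting round-trip.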
We used two CAGE datasets. The first was the FANTOM5 human CAGE dataset for 156 primary cells with replicates, sequenced on a HeliScope sequencer and mapped to the hg19 genome assembly. All primary cell data and ethics application numbers are described in the FANTOM5 main paper1). In brief, the majority of primary cell samples were purchased from commercial suppliers, while the remainder were obtained through collaborating institutes from patients who provided informed consent. The second was the triplicate human CAGE dataset for the HeLa and THP-1 samples, sequenced on a HeliScope sequencer and mapped to the hg18 genome assembly by Kanamori-Katayama et al.2)
1) Forrest et al. A promoter-level mammalian expression atlas. Nature 507(7493), 462-470. 2014.
2) Kanamori-Katayama et al. Unamplified cap analysis of gene expression on a single molecule sequencer. Genome Res 21, 1150-1159. 2011.
Ohmiya et al. RECLU: a pipeline to discover reproducible transcriptional start sites and their alternative regulation using capped analysis of gene expression (CAGE). BMC Genomics 15, 269. 2014.