Re-processing of the data generated by the FANTOM5 project (hg38 v1)
===

All the data produced by the FANTOM5 project were originally processed on hg19 and mm9 for human and mouse, respectively. Following the recent update of the genome assembly and related information, we have reprocessed the FANTOM5 data here.

- target genome: hg38 (the raw set of hg38 sequences, except for alt, random, Un.)
- inquiries: fantom-help@gsc.riken.jp
- original data: http://fantom.gsc.riken.jp/5/datafiles/phase2.0

Special note on "analysis set"
---

Note that we are separately working on the hg38 "analysis set" (ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh38/seqs_for_alignment_pipelines/), in which several duplicated regions are hard masked and decoy sequences are included. For special analyses requiring such data, we advise waiting a few months for the reprocessing based on the "analysis set".

Updates
---

* Sep 18, 2015 initial release

Data types
---

- CAGE read alignment: the raw HeliScope reads are aligned by delve (http://fantom.gsc.riken.jp/software/). The resulting alignments are formatted as BAM ( *.bam ) and indexed ( *.bai ).
- CTSS (CAGE tag starting site): the 5'-ends of the CAGE read alignments with mapping quality above 20 and percent identity 85% are counted at 1 bp resolution. Genomic coordinates are formatted as BED, and the counts are given in the score column.
- experimental metadata: *sdrf.txt is a tab-delimited flat file describing the experimental details for each sample.

Directory and file names
---

Data files are located under the directory names as ..

- Technology is either hCAGE (CAGE sequencing on the HeliScope single molecule sequencer) or LQhCAGE (Low Quantity hCAGE). For details on the protocols used, please see [http://fantom.gsc.riken.jp/sstar/Protocols].
- The biological category is one of primary_cell, cell line, timecourse, fractionation or tissue.
- Part of each file name represents the sample name.
  The sample name is percent-encoded and concatenated with , , , , and the data types described above.

Reference
---

- FANTOM5 main papers
  * Forrest ARR, et al. A promoter-level mammalian expression atlas. Nature 507: 462–470 (2014)
  * Andersson R, et al. An atlas of active enhancers across human cell types and tissues. Nature 507: 455–461 (2014)
  * Arner E, et al. Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells. Science 347: 1010–1014 (2015) http://www.sciencemag.org/cgi/doi/10.1126/science.1259418
- FANTOM5 databases / data resource:
  * Lizio M, et al. Gateways to the FANTOM5 promoter level mammalian expression atlas. Genome Biol 16: 22 (2015)
- HeliScopeCAGE:
  * Kanamori-Katayama M, et al. Unamplified cap analysis of gene expression on a single-molecule sequencer. Genome Res 21: 1150–1159 (2011)
  * Itoh M, et al. Automated workflow for preparation of cDNA for cap analysis of gene expression on a single molecule sequencer. PLoS One 7: e30809 (2012)
- BAM: https://samtools.github.io/hts-specs/SAMv1.pdf
- BED: https://genome.ucsc.edu/FAQ/FAQformat.html#format1
- SDRF: http://isatab.sourceforge.net/format.html
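
As an illustration of the file conventions described under "Data types" and "Directory and file names", the following Python sketch decodes a percent-encoded sample name and reads the count from the score column of a CTSS BED line. It is a minimal sketch, not part of the FANTOM5 tooling: the function names, the sample name string, and the BED line are hypothetical, and the parser assumes a six-column, tab-separated BED layout (chrom, start, end, name, score, strand).

```python
from urllib.parse import unquote


def decode_sample_name(encoded):
    """Undo percent encoding, e.g. '%20' -> ' ' and '%2c' -> ','."""
    return unquote(encoded)


def parse_ctss_line(line):
    """Parse one CTSS BED line into (chrom, start, strand, count).

    Assumption: six tab-separated BED columns in the order
    chrom, start, end, name, score, strand. 'start' is returned
    as given in the file (BED start coordinates are 0-based).
    """
    fields = line.rstrip("\n").split("\t")
    chrom = fields[0]
    start = int(fields[1])
    count = int(fields[4])  # the CTSS count is stored in the score column
    strand = fields[5]
    return chrom, start, strand, count


if __name__ == "__main__":
    # Hypothetical sample name and BED line, for illustration only.
    print(decode_sample_name("example%20sample%2c%20donor1"))
    print(parse_ctss_line("chr1\t100\t101\texample\t7\t+"))
```

Only the sample name part of a file name needs decoding; the other parts concatenated into the file name can be handled as plain strings.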