Protocols:Motif Activity Response Analysis(MARA): Difference between revisions

Revision as of 13:58, 31 August 2012

Author : Michiel De Hoon

Last updated: 2012.03.29

OSC Table input file requirements

Your OSC Table file should follow the general OSC Table file requirements. In particular, note that the first column of the OSC Table file should be the cluster identifier. The name of this column should be "id".

In addition,

1. The header section of the OSC Table file should specify the genome assembly that was used;

2. The data section should contain the normalized (tags-per-million) expression data (columns containing raw data will be ignored);

3. The data section should contain a column labeled 'pos' that shows the representative position within each promoter. Usually this representative position is defined as the most highly expressed position within a promoter, but you may choose a different criterion. The position should be a zero-based coordinate.
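The requirements above can be checked before running the pipeline. The following is a minimal sketch, not part of the pipeline itself. It assumes that header lines start with "##", that the genome assembly is declared as `##ParameterValue[genome_assembly] = hg19`, and that the first non-comment line holds the tab-separated column names. The function name `check_osc_table` and the column name `norm.sample1` are hypothetical.

```python
import io

def check_osc_table(handle):
    """Return (assembly, column names) after checking the basic requirements."""
    assembly = None
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Header line, e.g. ##ParameterValue[genome_assembly] = hg19
            key, _, value = line[2:].partition("=")
            if key.strip() == "ParameterValue[genome_assembly]":
                assembly = value.strip()
            continue
        if line.startswith("#") or not line.strip():
            continue  # skip other comment lines and blank lines
        columns = line.split("\t")  # assumed: first data-section line names the columns
        break
    else:
        raise ValueError("no column header line found")
    if assembly is None:
        raise ValueError("genome assembly not declared in the header section")
    if columns[0] != "id":
        raise ValueError('the first column should be named "id"')
    if "pos" not in columns:
        raise ValueError('no column labeled "pos" found')
    return assembly, columns

# Hypothetical three-line OSC Table used only for illustration
example = io.StringIO(
    "##ParameterValue[genome_assembly] = hg19\n"
    "id\tpos\tnorm.sample1\n"
    "p1\t1000\t12.5\n"
)
print(check_osc_table(example))
```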

If you have microarray data instead of CAGE data, you can convert them into the appropriate OSC Table file format.

Step 1: Calculate the binding profile of TFs with respect to the TSS

For each transcription factor (TF), calculate the binding profile with respect to the transcription start site (TSS). If you are lazy, instead of calculating this yourself you can also use the binding profile as calculated using the FANTOM5 data.

As noted in the input file requirements, the data section should show the representative position for each promoter in a column labeled "pos" (without the quotes). In most cases, the representative position of a promoter is defined as its most highly expressed position, though other definitions can in principle be used. In the current version of the motif activity pipeline, there are no restrictions on the name of the promoter (as shown in the first column in the data section of the OSC Table file).

This step makes use of the precalculated TFBSs stored in a separate file.
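To illustrate what a binding profile is, here is a minimal sketch that counts predicted binding sites at each offset from the representative promoter position. It is an assumption about the general idea, not the pipeline's implementation: it ignores strand and chromosome, and the function name `binding_profile` and the `window` parameter are hypothetical.

```python
from collections import Counter

def binding_profile(promoter_positions, site_positions, window=500):
    """Count predicted sites at each offset (site - TSS) within +/- window bases."""
    profile = Counter()
    for tss in promoter_positions:
        for site in site_positions:
            offset = site - tss
            if -window <= offset <= window:
                profile[offset] += 1
    return profile

# Two promoters and three predicted sites of one TF (made-up coordinates)
profile = binding_profile([1000, 2000], [990, 1990, 2500], window=100)
print(dict(profile))
```

A real profile would be computed per TF and would typically take the promoter strand into account, so that upstream and downstream offsets are not mixed.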

Step 2: Associate TFBSs with promoters

Associate predicted TFBSs with CAGE promoters. This step also makes use of the precalculated TFBSs stored in a separate file. It creates a single file containing the associations of predicted TFBSs to promoters.
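The association can be pictured as counting, for each promoter, the predicted sites of each motif that fall near its representative position. The sketch below assumes a plain distance window on the same chromosome. The actual pipeline may weight sites differently (for example with the binding profile from Step 1), and the names `associate_sites` and `window` are hypothetical.

```python
def associate_sites(promoters, sites, window=300):
    """Map promoter id -> {motif: number of sites within window bases of 'pos'}.

    promoters: list of (promoter_id, chromosome, pos)
    sites:     list of (motif, chromosome, position)
    """
    associations = {}
    for promoter_id, promoter_chrom, pos in promoters:
        counts = {}
        for motif, site_chrom, site_pos in sites:
            if site_chrom == promoter_chrom and abs(site_pos - pos) <= window:
                counts[motif] = counts.get(motif, 0) + 1
        associations[promoter_id] = counts
    return associations

# Made-up promoters and sites for illustration
promoters = [("p1", "chr1", 1000), ("p2", "chr2", 1000)]
sites = [("A", "chr1", 1100), ("A", "chr1", 900), ("B", "chr2", 5000)]
print(associate_sites(promoters, sites))
```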

Step 3: Calculate the motif activities

Use the file you created in Step 2 to calculate the motif activities. The threshold on the number of predicted TFBSs for each motif defaults to 150; motifs with fewer than 150 predicted binding sites are discarded. This step generates a single file containing the motif activities in each condition, their standard deviations, and their overall Z-scores.
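The effect of the threshold on the number of predicted TFBSs can be sketched as follows. This shows only the filtering, not the calculation of the activities themselves. The input layout (promoter id mapped to per-motif site counts) and the name `filter_motifs` are assumptions made for illustration.

```python
def filter_motifs(associations, min_sites=150):
    """Return the set of motifs with at least min_sites predicted sites in total.

    associations: {promoter_id: {motif: number of predicted sites}}
    """
    totals = {}
    for counts in associations.values():
        for motif, n in counts.items():
            totals[motif] = totals.get(motif, 0) + n
    # Motifs with fewer than min_sites predicted sites are discarded
    return {motif for motif, n in totals.items() if n >= min_sites}

# Tiny made-up example with a lowered threshold
associations = {"p1": {"A": 2, "B": 1}, "p2": {"A": 1}}
print(filter_motifs(associations, min_sites=2))
```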

Step 4: Calculate the network as predicted from the motif activities

Calculate the MARA network using the file containing the motif activities calculated in Step 3 and the file containing the predicted TFBSs for each promoter calculated in Step 2. A Z-score threshold for the network edges can be specified; it defaults to 1.5. This step creates a single file containing the motif-to-promoter network as calculated by MARA.
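Applying the Z-score threshold to the network edges might look like the sketch below. How the edge Z-scores are computed is not described here, so they are taken as given. Treating the threshold as inclusive is an assumption, and the name `filter_edges` is hypothetical.

```python
def filter_edges(edges, threshold=1.5):
    """Keep motif-to-promoter edges whose Z-score reaches the threshold.

    edges: iterable of (motif, promoter_id, zscore)
    """
    return [edge for edge in edges if edge[2] >= threshold]

# Made-up edges for illustration
edges = [("A", "p1", 2.0), ("A", "p2", 1.0), ("B", "p1", 1.5)]
print(filter_edges(edges))
```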

Step 5: Convert the MARA network to a Cytoscape-loadable file

To view the network in Cytoscape, the MARA network file needs to be converted to an input file in the proper format for Cytoscape. The resulting Cytoscape input file corresponds to a motif-to-promoter network as extracted from the full MARA network.
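As an illustration of such a conversion, the sketch below writes motif-to-promoter edges in Cytoscape's simple interaction format (SIF), in which each line holds a source node, an interaction type, and a target node. Whether the pipeline produces SIF or another Cytoscape format is not stated here, so the format choice, the interaction label "regulates", and the name `to_sif` are assumptions.

```python
def to_sif(edges, interaction="regulates"):
    """Format (motif, promoter_id, zscore) edges as tab-delimited SIF text."""
    return "".join(
        f"{motif}\t{interaction}\t{promoter_id}\n"
        for motif, promoter_id, _zscore in edges
    )

# Made-up edges for illustration
print(to_sif([("A", "p1", 2.0), ("B", "p1", 1.5)]), end="")
```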

Step 6: Find the top motifs in each experimental condition

Calculate the Z-score for each motif in each experimental condition, and sort the motifs based on their Z-score.
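A minimal sketch of this ranking is shown below. It assumes that the Z-score of a motif in a condition is its activity divided by its standard deviation in that condition, and that motifs are ranked from the highest Z-score down. Both are assumptions, and the names `top_motifs` and `n` are hypothetical.

```python
def top_motifs(activities, errors, n=10):
    """Return {condition: [(motif, zscore), ...]} sorted by decreasing Z-score.

    activities, errors: {motif: {condition: value}}
    """
    ranked = {}
    for motif, by_condition in activities.items():
        for condition, activity in by_condition.items():
            zscore = activity / errors[motif][condition]  # assumed definition
            ranked.setdefault(condition, []).append((motif, zscore))
    for condition in ranked:
        ranked[condition].sort(key=lambda item: item[1], reverse=True)
        del ranked[condition][n:]  # keep only the top n motifs
    return ranked

# Made-up activities and standard deviations for two motifs and two conditions
activities = {"A": {"c1": 1.0, "c2": -0.5}, "B": {"c1": 0.5, "c2": 1.5}}
errors = {"A": {"c1": 0.5, "c2": 0.5}, "B": {"c1": 0.5, "c2": 0.5}}
print(top_motifs(activities, errors, n=1))
```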

Step 7: Create subnetworks for each experimental condition

From the full MARA network extract subnetworks for each experimental condition, showing only the top motifs in each.
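Extracting the subnetworks amounts to keeping, for each condition, only the edges whose motif is among the top motifs of that condition. The sketch below assumes edges are given as (motif, promoter) pairs and the top motifs as a list per condition. The name `extract_subnetworks` is hypothetical.

```python
def extract_subnetworks(edges, top_by_condition):
    """Return {condition: edges whose motif is a top motif in that condition}.

    edges:            list of (motif, promoter_id)
    top_by_condition: {condition: [motif, ...]}
    """
    subnetworks = {}
    for condition, motifs in top_by_condition.items():
        wanted = set(motifs)
        subnetworks[condition] = [edge for edge in edges if edge[0] in wanted]
    return subnetworks

# Made-up network and top motifs for illustration
edges = [("A", "p1"), ("B", "p1"), ("B", "p2")]
print(extract_subnetworks(edges, {"c1": ["A"], "c2": ["B"]}))
```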

@@ Line 1: / Line 1: @@
-Calculating motif activities in 7 easy steps
+Author : Michiel De Hoon
-============================================
 Last updated: 2012.03.29
@@ Line 24: / Line 23: @@
 If you have microarray data instead of CAGE data, you can convert them
-into the appropriate OSC Table file format using the script
+into the appropriate OSC Table file format.
-convert_microarray.py. Note that this script is still in development and may not
-work flawlessly. In theory it should work as follows:
-  python convert_microarray.py <assembly> <input_filename>
-where <assembly> is the genome assembly name, and <input_filename> is the name
-of the file containing the microarray data. This script will write out a single
-file level2.osc in the OSC Table file format.
 Step 1: Calculate the binding profile of TFs with respect to the TSS
 --------------------------------------------------------------------
-Use the script make_profile.py to calculate for each TF the binding profile
+Calculate for each TF the binding profile with respect to the transcription start site. If you are lazy, instead of calculating this yourself you can also use the binding profile as calculated using the FANTOM5 data.
-with respect to the transcription start site. If you are lazy, instead of
-calculating this yourself you can also use the binding profile as calculated
-using the FANTOM5 data. To calculate the binding profiles yourself, use
-  python make_profile.py [-t <tfbs>] [-o <output>] [-p] level2.osc[.gz|.bz2]
-where level2.osc[.gz|.bz2] is the OSC Table file containing your CAGE or
-microarray data. This OSC Table file should follow the generic OSC Table file
-format definition (see http://fantom.gsc.riken.jp/4/download/Tables/doc/). It
-should also declare the genome assembly in the header section, as in this
-example:
-##ParameterValue[genome_assembly] = hg19
 Also, in the data section, the representative position for each promoter should
@@ Line 60: / Line 38: @@
 Table file).
-The script makes use of the precalculated TFBSs stored in the file
+This step makes use of the precalculated TFBSs stored in a separate file.
-<assembly>.sites.bed located in /osc-fs_home/scratch/mdehoon/Data/TFBS; you can
-use the -t option to specify a different directory in which to look for this
-file. This script creates a single file containing the binding profiles for
-all transcription factors in a tab-delimited format. Use the -p option if you
-want to create figures for each of the binding profiles.
 Step 2: Associate TFBSs with promoters
 --------------------------------------
-Use the script associate_tfbs.py to associate predicted TFBSs to CAGE
+Associate predicted TFBSs to CAGE promoters.
-promoters:
+This script also makes use of the precalculated TFBSs stored. This step creates a single file containing the associations of predicted TFBSs to promoters.
-  python associate_tfbs.py [-t <tfbs>] [-o <output>] [-p <profile_filename>]
-                         level2.osc[.gz|.bz2]
-where level2.osc[.gz|.bz2] is the OSC file containing your CAGE or microarray
-data, and <profile_filename> is the name of the file you created in Step 1.
-This script also makes use of the precalculated TFBSs stored in the file
-<assembly>.sites.bed located in /osc-fs_home/scratch/mdehoon/Data/TFBS; you can
-use the -t option to specify a different directory in which to look for this
-file. This script creates a single file containing the associations of
-predicted TFBSs to promoters.
 Step 3: Calculate the motif activities
 --------------------------------------
-We are now ready to calculate the motif activities:
+Use the file you created in Step 2 to calculate the motif activities. The threshold on the number of predicted TFBSs for each motif defaults to 150. This means that motifs with less than 150 predicted binding sites are discarded. This step will generate a single file containing the motif activities in each condition, their standard deviations,
-  python mara.py -t <tfbs> [-o <output>] [-n <ntfbs>] level2.osc[.gz|.bz2]
-where level2.osc[.gz|.bz2] is the OSC file containing your CAGE or microarray
-data, and <tfbs> is the name of the file you created in Step 2. With the -n
-option you can specify the threshold on the number of predicted TFBSs for each
-motif, which defaults to 150. This means that motifs with less than 150
-predicted binding sites are discarded. This script will generate a single file
-containing the motif activities in each condition, their standard deviations,
 and their overall Z-scores.
@@ Line 102: / Line 55: @@
 Step 4: Calculate the network as predicted from the motif activities
 --------------------------------------------------------------------
-Calculate the MARA network using the script make_network.py:
+Calculate the MARA network using the file containing the motif activities calculated in Step 3, and the file containing the predicted TFBSs for each promoter calculated in Step 2. A threshold on the Z-score on the network edges can be specified; this threshold defaults to 1.5. This step creates a single file containing the motif-to-promoter network
-  python make_network.py -a <activity> -t <tfbs> [-z <threshold>]
-                       [-o <output>] level2.osc[.gz|.bz2]
-where level2.osc[.gz|.bz2] is the OSC file containing your CAGE or microarray
-data, <activity> is the name of the file containing the motif activities (which
-you calculated in Step 3), and <tfbs> is the name of the file containing the
-predicted TFBSs for each promoter (which you calculated in Step 2). You can specify a threshold on the Z-score on the network edges; this threshold defaults to
-.5. This script creates a single file containing the motif-to-promoter network
 as calculated by MARA.
@@ Line 118: / Line 62: @@
 -------------------------------------------------------------
 To view the network in Cytoscape, the MARA network file needs to be converted
-to an input file in the proper format for Cytoscape. For this purpose you can
+to an input file in the proper format for Cytoscape. The resulting Cytoscape input file corresponds to a motif-to-promoter network as extracted from the full MARA network.
-use the convert_cytoscape.py script:
-  python convert_cytoscape.py [-o <output>] level2.osc[.gz|.bz2]
-                              level2.network
-where level2.osc[.gz|.bz2] is the OSC file containing your CAGE or microarray
-data, and level2.network is the name of the file you created in Step 4. In this
-step, the OSC file is only used to determine the genome assembly. The resulting
-Cytoscape input file corresponds to a motif-to-promoter network as extracted
-from the full MARA network.
 Step 6: Find the top motifs in each experimental condition
 ----------------------------------------------------------
-For this purpose, we calculate the Z-score for each motif in each experimental
+Calculate the Z-score for each motif in each experimental condition, and sort the motifs based on their Z-score.
-condition, and we sort the motifs based on their Z-score:
-  python  sort_motifs.py [-o <output>] <input_filename>
-where <input_filename> is the name of the file containing the motif activities,
-which you calculated in Step 3.
 Step 7: Create subnetworks for each experimental condition
 ----------------------------------------------------------
-From the full MARA network we now extract subnetworks for each experimental
+From the full MARA network extract subnetworks for each experimental
-condition, showing only the top motifs in each:
+condition, showing only the top motifs in each.
-  python write_top_networks.py [-o <output_directory>]
-                             <cytoscape_filename> <sortedtfs_filename>
-where <cytoscape_filename> is the name of the Cytoscape input file generated in
-Step 4, and <sortedtfs_filename> is the file containing the top motifs in each
-experimental condition generated in Step 6. This script will write out one
-Cytoscape input file for each experimental condition in <sortedtfs_filename>.

From FANTOM5_SSTAR
