Quick Usage

python MuMRescueLite.py <input file> <output file> <window>

What Does This Script Do?

Sequence tags that map to multiple genomic loci (multi-mapping tags, or MuMs) are routinely omitted from further analysis, which introduces experimental bias and reduces coverage. MuMRescueLite probabilistically reincorporates multi-mapping tags into mapped short-read data with acceptable computational requirements. See the reference articles for more details.

More Detailed Usage

This program requires the following arguments:

  1. input file: the data file to be processed. Its format is described under "Input File Format".
  2. output file: the file to which the rescue result is written. Its format is described under "Output File Format".
  3. window: the total number of bases around each MuM location in which to search for single-mapped tags; MuMRescueLite searches window/2 bases upstream and window/2 bases downstream of a given MuM location.
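
As an illustration of the window argument, the search interval for one MuM location could be computed as in the minimal Python sketch below. The function name search_interval is hypothetical, and anchoring the window at the start and end coordinates (rather than, say, the midpoint of the tag) is an assumption, not something taken from MuMRescueLite.py itself.

```python
def search_interval(start, end, window):
    """Return the (low, high) interval searched around one MuM location.

    Assumption: window/2 bases are added upstream of `start` and
    window/2 bases downstream of `end`. The real script may anchor
    the window differently.
    """
    half = window // 2            # integer half of the window size
    low = max(0, start - half)    # clamp so the interval never goes negative
    high = end + half
    return low, high
```

With a window of 200, a tag mapped at 1000-1024 would be searched over 900-1124 under these assumptions.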

Input File Format

MuMRescueLite.py accepts a tab-delimited ASCII text file with 1 header line as input. The columns of this file must be, in order:

  1. identifier of the tag: a unique ID or the unique sequence.
  2. total number of mapped locations of this tag.
  3. mapped chromosome or name of assembly.
  4. mapped strand.
  5. start of the mapped genomic position; must be the same as or smaller than the end.
  6. end of the mapped genomic position.
  7. number of times (a count) the sequence was observed in this experimental condition.

The start of an example input file:

#ID	locations	chromosome	strand	start	end	count
s3.25mer.txt-1	1	chr12	+	105579297	105579321	1
s3.25mer.txt-4	1	chr8	+	95642182	95642206	1
s3.25mer.txt-7	6	chr13	+	66975161	66975185	1
s3.25mer.txt-7	6	chr13	-	72592620	72592644	1
s3.25mer.txt-7	6	chr14	-	46332831	46332855	1
s3.25mer.txt-7	6	chr19	-	32540873	32540897	1
s3.25mer.txt-7	6	chr1	-	113777719	113777743	1
s3.25mer.txt-7	6	chr2	+	70297183	70297207	1
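
A file in this format can be read with Python's standard csv module. The following is a minimal sketch, not the parser used by MuMRescueLite.py; the names Mapping and read_mappings are hypothetical, and it assumes the count column holds integers as in the example above.

```python
import csv
from collections import namedtuple

# Hypothetical record type: one field per input column, in file order.
Mapping = namedtuple(
    "Mapping", "tag_id locations chrom strand start end count"
)


def read_mappings(lines):
    """Yield one Mapping per data line of a tab-delimited input file.

    `lines` is any iterable of text lines, e.g. an open file object.
    The first line is treated as the header and skipped.
    """
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # skip the single header line
    for row in reader:
        if not row:     # ignore blank lines
            continue
        tag_id, locations, chrom, strand, start, end, count = row[:7]
        yield Mapping(tag_id, int(locations), chrom, strand,
                      int(start), int(end), int(count))
```

Passing an open file handle (for example, one opened on a hypothetical input.txt) to read_mappings yields one record per mapped location, so a tag with 6 mapped locations produces 6 records that share the same tag_id.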

Output File Format

MuMRescueLite.py writes its results as a tab-delimited ASCII text file with 1 header line, appending a "weight" column to each input line. The columns are:

  1. identifier of the tag: a unique ID or the unique sequence.
  2. total number of mapped locations of this tag.
  3. mapped chromosome or name of assembly.
  4. mapped strand.
  5. start of the mapped genomic position; must be the same as or smaller than the end.
  6. end of the mapped genomic position.
  7. number of times (a count) the sequence was observed in this experimental condition.
  8. weight: the probability assigned to this sequence at this mapped position; 1.0 for single-mapped tags, between 0.0 and 1.0 for multi-mapped tags.
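
One way to use the weight column downstream is to scale each tag's count by its weight and sum the result per mapped position. The sketch below assumes this interpretation, which is not prescribed by MuMRescueLite.py; the function name weighted_counts is hypothetical.

```python
def weighted_counts(rows):
    """Sum count * weight per mapped position.

    `rows` is an iterable of 8-column output rows (lists of strings),
    header already removed. Returns a dict keyed by
    (chromosome, strand, start, end).

    Assumption: multiplying the count (column 7) by the weight
    (column 8) is a reasonable way to apportion a tag's count
    across its mapped locations.
    """
    totals = {}
    for row in rows:
        key = (row[2], row[3], int(row[4]), int(row[5]))
        totals[key] = totals.get(key, 0.0) + float(row[6]) * float(row[7])
    return totals
```

Under this scheme a single-mapped tag contributes its full count to its one location, while a multi-mapped tag contributes only the fraction given by its weight at each location.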

System Requirements

Reference

License

MIT license; see LICENCE.txt

Contact

g.faulkner@expressiongenomics.org

README Authors

Takehiro Hashimoto, Michiel J. L. de Hoon, Geoffrey J. Faulkner