INSP3CT

Interpolation and Statistical Proximity of 3C Tables

This project is maintained by shayben

Version 1.0


Provided as supplementary material for:

"Spatial localization of co-regulated genes exceeds genomic gene clustering in the S. cerevisiae genome" by Shay Ben-Elazar, Zohar Yakhini, and Itai Yanai. Accepted for publication in Nucleic Acids Research (NAR), December 2012.


Description

This program takes a sparse contact dataset and interpolates it into a matrix sampled at an arbitrary resolution. Any set of locus annotations can then be inspected on the resulting matrix for co-localization, both in 3D (spatial proximity) and in 1D (proximity along the genome). The result can optionally be displayed in a figure similar to Figure 3 of the above paper.
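A minimal sketch of one way to interpolate sparse contacts onto a regular matrix, written in Python with NumPy rather than MATLAB. It assumes contacts arrive as (position_i, position_j) coordinates on a single concatenated genome axis with one frequency each, and it uses inverse distance weighting (IDW) as a stand-in technique; INSP3CT's own interpolation scheme may differ. The function name `idw_interpolate` and its parameters are hypothetical.

```python
import numpy as np


def idw_interpolate(points, values, grid_size, genome_length, power=2.0, eps=1e-12):
    """Interpolate sparse contact values onto a symmetric grid_size x grid_size matrix.

    points: (K, 2) genome coordinates (position_i, position_j) of observed contacts.
    values: (K,) contact frequencies observed at those coordinates.
    Each cell is estimated at its bin centre as an IDW average of all observations.
    """
    points = np.asarray(points, dtype=float)
    values = np.asarray(values, dtype=float)
    # Contacts are symmetric, so add the mirrored (j, i) observations.
    points = np.vstack([points, points[:, ::-1]])
    values = np.concatenate([values, values])

    # Bin centres at the requested resolution.
    bin_width = genome_length / grid_size
    centres = (np.arange(grid_size) + 0.5) * bin_width
    gi, gj = np.meshgrid(centres, centres, indexing="ij")
    grid = np.column_stack([gi.ravel(), gj.ravel()])  # shape (G, 2)

    # Distance from every grid point to every observation: shape (G, 2K).
    d = np.linalg.norm(grid[:, None, :] - points[None, :, :], axis=2)
    # eps avoids division by zero when a bin centre coincides with an observation.
    w = 1.0 / np.maximum(d, eps) ** power
    estimate = (w @ values) / w.sum(axis=1)
    return estimate.reshape(grid_size, grid_size)


if __name__ == "__main__":
    # Two observed contacts on a toy 1,000 bp genome, sampled into 5 x 5 bins.
    M = idw_interpolate([[100, 900], [500, 500]], [4.0, 10.0], grid_size=5, genome_length=1000)
    print(np.round(M, 2))
```

Each cell is a weighted average of all observations with weights 1/distance^power, so nearby contacts dominate and the result can be sampled at any `grid_size`. This dense version builds a (cells × observations) distance array and only suits small examples; at genome scale one would restrict each cell to its nearest observations, for example with a k-d tree such as `scipy.spatial.cKDTree`.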

Requirements

We recommend running this on a machine with at least 8 GB of RAM; larger datasets require more. Multiple cores can and should be used to speed up the analysis. Our setup (for this particular dataset) was a machine with 192 GB of RAM and 12 dedicated cores. Basic MATLAB know-how is a prerequisite.
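Memory needs grow quickly because a dense n × n matrix scales quadratically with the number of bins: halving the bin size quadruples the memory. The sketch below estimates the size of one whole-genome matrix, assuming it is stored densely in double precision (8 bytes per entry, MATLAB's default numeric type). The genome length of about 12 Mb for S. cerevisiae is a rounded figure, the helper name `dense_matrix_bytes` is hypothetical, and the actual pipeline may hold several such matrices at once.

```python
import math


def dense_matrix_bytes(genome_length_bp, resolution_bp, bytes_per_value=8):
    """Bytes needed for one dense, whole-genome n x n matrix.

    Assumes every entry is stored explicitly as an 8-byte double.
    """
    n_bins = math.ceil(genome_length_bp / resolution_bp)
    return n_bins * n_bins * bytes_per_value


if __name__ == "__main__":
    # 12_000_000 bp is a rounded assumption for the S. cerevisiae genome length.
    for res in (10_000, 5_000, 1_000):
        gib = dense_matrix_bytes(12_000_000, res) / 2**30
        print(f"{res:>6} bp bins: {gib:.2f} GiB per matrix")
```

At 1 kb resolution a single matrix already takes 12,000² × 8 = 1,152,000,000 bytes (about 1.07 GiB), before any intermediate copies.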

Input and parameters

Edit the main pipeline file, INSP3CT.m, and make sure to set the following parameters.

Required input files:

An example of a complete input file set can be found on GitHub (YeastDatasetExample.zip). The parameter values corresponding to this input are included, commented out, in the code.

Additional parameters:

You also need to manually create the following variables before running any co-localization analysis:

The code includes a commented-out example that randomly generates loci and annotations for testing.
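The commented-out example lives in the MATLAB code; the following is an independent Python sketch of comparable random test data, together with a simple randomization test for co-localization. All function names, the representation of loci as (chromosome, position) pairs and of an annotation as a set of locus indices, and the choice of 1D and 3D scores are assumptions for illustration; this is not the statistic INSP3CT itself computes.

```python
import random


def random_test_set(n_loci, n_chromosomes, chrom_length, n_annotated, seed=0):
    """Hypothetical random test data: loci as (chromosome, position) plus one annotation set."""
    rng = random.Random(seed)
    loci = [(rng.randrange(n_chromosomes), rng.randrange(chrom_length)) for _ in range(n_loci)]
    annotated = set(rng.sample(range(n_loci), n_annotated))
    return loci, annotated


def member_pairs(members):
    """All unordered pairs of distinct member indices."""
    ordered = sorted(members)
    return [(a, b) for i, a in enumerate(ordered) for b in ordered[i + 1:]]


def close_pairs_1d(loci, members, max_dist):
    """1D score: member pairs on the same chromosome at most max_dist bp apart."""
    return sum(
        1
        for a, b in member_pairs(members)
        if loci[a][0] == loci[b][0] and abs(loci[a][1] - loci[b][1]) <= max_dist
    )


def mean_contact_3d(contact, members):
    """3D score: mean contact value over member pairs (needs at least two members).

    contact is an n x n nested list indexed by locus.
    """
    pairs = member_pairs(members)
    return sum(contact[a][b] for a, b in pairs) / len(pairs)


def permutation_p_value(score_fn, n_loci, members, n_perm=1000, seed=1):
    """One-sided empirical p-value: how often a random locus set of equal size scores >= observed."""
    rng = random.Random(seed)
    observed = score_fn(members)
    hits = sum(
        score_fn(set(rng.sample(range(n_loci), len(members)))) >= observed
        for _ in range(n_perm)
    )
    # The +1 terms count the observed set itself, so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    loci, annotated = random_test_set(200, 16, 1_000_000, 20)
    score, p = permutation_p_value(lambda s: close_pairs_1d(loci, s, 50_000), len(loci), annotated)
    print(f"close pairs: {score}, empirical p = {p:.3f}")
```

The same `permutation_p_value` helper accepts either score, e.g. `lambda s: mean_contact_3d(contact, s)` for a contact matrix indexed by locus, so 1D and 3D co-localization are judged against the same null of random locus sets of equal size. With purely random annotations, as here, the p-value should usually not be small.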

Usage

Main file is INSP3CT.m.


Output

Disclaimer

These scripts are provided as is, under the GPL. Much of the code is NOT optimized: it has gone through many iterations and was not written specifically for the purposes of this program. Please read the instructions and the paper thoroughly before contacting the authors. Importantly, the analysis in the paper was performed on a haploid genome. Switching to a diploid genome has repercussions for how the input data should be interpreted, and these were not addressed in this code or in the paper.

Copyright

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Contact

For questions regarding this program, e-mail shayben@cs.technion.ac.il. Further inquiries can be directed to Itai Yanai at yanai@technion.ac.il.


Copyright 2012 Shay Ben Elazar ©