NEXT-peak
NEXT-peak is implementing the Normal-EXponential Two-peak (NEXT-peak) model. To specify the model, parameters beta and sigma have to be estimated. Parameter estimation can be done with ChIP-seq data and a list of the corresponding motif sites. NEXT-peak can use optional mappability information. The mappability information should be saved in a directory with the file names of chr1.map, chr2.map .... The files should contain values of zeros or ones, where 1 represents mappable, 0, unmappable. Using the PeakSeq suite, these files can be generated by modifying CountMap.py of the PeakSeq suite.


Usage

$ nextpeak <OPTIONS>

                    -<option code> <option value>

Options

-l    To provide a list of the genomic sequence lengths. A possible format is a single column of values.
    If this file is not provided, a user can use -g option to provide a genomic sequence file.
    If neither -l nor -g options was used, sequence lengths of hg18 (Homo Sapiens Build 36.1) is used.


-g    When -l option is not used, a user can provide a genomic sequence file, containing all genomic
    sequences in FASTA format. Possible headers are

    ">" immediately followed by a chromosome number, e.g., >1, >2, etc.

    headers of UCSC human genome.  ">chr" followed by a number or "X", "Y",
        e.g., >chr1, >chr2, ..., >chrX, >chrY.

    headers of NCBI Build 36.1.
        e.g. >gi|89161185|ref|NC_000001.9|NC_000001 Homo sapiens chromosome 1, reference assembly, complete sequence


-c    To provide mapped ChIP-seq data. The format should be "+"/"-" for strand information, followed by
    a genome number, followed by a genomic coordinate. e.g.,

          -       5       25494792
          +       18      31889166
          -       23      118034178
          ...
          (more lines)
    When Bowtie is used for mapping, this format can be generated with --suppress 1,5,6,7,8 option with
    headers containing only numbers.

-d    To provide a directory name that contains files for optional mappability information. The directory
    should contain "/" at the last position of the text, e.g., /home/mappability/
    Under the directory, it should contain files with names "chr1.map", "chr2.map",..., "chr24.map",etc.
    Note that "chrX.map" and "chrY.map" are not allowed; use "chr23.map" and "chr24.map" instead.
    Each file should contain a sequence of zeros and ones, "0" for unmappable location, "1" for mappable.

-r    To generate optional output for the candidate regions for peaks. With this output, a user can
    make a plot of any particular region of interest. If mappability information was used, unmappable
    locations have "Non" instead of usually a pair of counts, one for the left strand, the other for
    the right strand, separated by "-", e.g., 2-4 means 2 tags were mapped in the left (forward) strand,
    4 tags were mapped in the right (backward) strand for this particular genomic location.

-m    To estimate NEXT-peak model parameters beta and sigma, a user can provide a motif site location file.
    The file should have two columns, the first is for the genome number, the second is for the left end
    location of the site. When using -m option, make sure to provide the tag length (-h option) and
    the motif length (-i option) as well.

-h    To provide the length of tags.

-i    To provide the length of the motif.

-a    To provide the name of the file that contains observed frequencies around the motif sites.

-b    To provide the value of parameter beta. If -m option is used, beta and sigma will be estimated.

-s    To provide the value of parameter sigma.

-w    To provide a window length for identifying regions of interest. (Default: 150)

-t    To provide a minimum count in a window for identifying regions of interest. (Default: 15)

-v    To provide the vector length for NEXT-peak density tabulation. (Default: 1000)


Output format

NEXT-peak creates two parts of the output: a global summary and regions statistics.

A global summary appears ahead of region statistics. It shows a ChIP-seq file name, parameter estimates for the model, and suggested cutoff  values for spurious region. Note that the area under the curve (AUC) is not the usual AUC from a ROC curve. This is a precision curve of Figure 3 in the article. It considers top 10,000 peaks.

NEXT-peak generates several statistics for each candidate region where each line holds statistics for each region. Total 20 columns will be reported. The information in the columns are as follows:


Output example

ChIP-seq file: /STAT1/STAT1mapped.txt
Mappability information is used
Motif location file: /CPP/FindKnownMotif/STAT1_w15_5m6.txt
Parameter estimation using anchored data: beta=68.3992 sigma=50.5182 mu=500 nu=180350 rho=182.072 Total count=724845
cutoff for spurious region: (region length = 400, p-value for GOF = 1e-6) AUC = 0.307992 Distance to motif cutoff=250
1 554323 558411 4089 3759 0.919296 791.791 12.5462 3488 0.462538 7541 208.413 28.4792 0.976817 0.00378813 0.0261623 0 19.8336 12 0.0703005
1 558564 560144 1581 1341 0.848197 905.581 8.03588 1257 0.483834 2598 251.906 20.3322 0.883445 0.00758099 0.0879905 0 37.5651 12 0.000180821
1 703751 703951 201 170 0.845771 177.155 7.92663 16 0.727273 22 19.1956 8.83313 0.0406596 0.0259798 0.371624 6.39239e-08 16.5858 12 0.165855
1 865987 866342 356 356 1 174.524 6.68542 30 0.517241 58 53.0984 6.03224 0.0142044 0.00847225 0.825629 2.92872e-13 14.6734 12 0.259783
1 926070 926219 150 150 1 85.3262 6.54363 7 0.466667 15 18.3815 4.48855 0.0150987 0.0149618 0.698026 0.000978739 13.2669 12 0.349941
1 938384 938767 384 384 1 205.205 5.9193 31 0.596154 52 44.2749 3.86602 0.0147415 0.00503388 0.78228 1.23131e-11 20.1754 12 0.0638401
1 1004685 1004834 150 150 1 130.048 5.75204 13 0.866667 15 27.2357 3.47877 0.00553989 0.0115959 0.889202 0.00261428 17.7941 12 0.122086
1 1041429 1041681 253 253 1 108.664 5.64222 14 0.451613 31 20.2571 3.08478 0.0298033 0.0060964 0.513534 6.61504e-09 11.8285 12 0.459548
1 1060382 1061082 701 661 0.942939 400.4 3.65425 140 0.454545 308 302.083 3.04245 0.0289262 0.0023014 0.875843 4.61065e-99 64.2658 12 3.72534e-09
...
(more lines)


Usage examples

                $ nextpeak -c ChIPmapped.txt -b 65.2 -s 45.1

                $ nextpeak -c ChIPmapped.txt -b 65.2 -s 45.1 -l hg19SeqLen.txt

                $ nextpeak -c ChIPmapped.txt -b 65.2 -s 45.1 -g hg19all.fa

                $ nextpeak -c ChIPmapped.txt -m motifSites.txt -h 27 -i 15
                $ nextpeak -c ChIPmapped.txt -b 65.2 -s 45.1 -d /home/Mappability/

                $ nextpeak -c ChIPmapped.txt -m motifSites.txt -h 27 -i 15 -a anchoredFreq.txt

                $ nextpeak -c ChIPmapped.txt -m motifSites.txt -h 27 -i 15 -r regions.txt

                $ nextpeak -c ChIPmapped.txt -m motifSites.txt -h 27 -i 15 -a anchoredFreq.txt -r regions.txt -d /home/Mappability/


Files and Installation

The files in the downloaded directory include: 
  1. nextpeak1.1_linux.zip: Linux executable
  2. nextpeak1.1_cpp_files.zip: C++ source files

Reference

Kim NK, Jayatillake RV, Spouge JL,
"NEXT-peak: a normal-exponential two-peak model for peak-calling in ChIP-seq data"
(2013), BMC Genomics, 14:349