SweeD

We developed SweeD, a parallel and checkpointable tool that implements a composite likelihood ratio test for detecting selective sweeps.
SweeD is based on the SweepFinder algorithm (Nielsen et al. 2005).

SweeD can also calculate the theoretical site frequency spectrum (SFS) of a given demographic model (stepwise population size changes, or an exponential growth phase combined with stepwise changes) using the method of Živković and Stephan (2011).
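For readers unfamiliar with the SFS, the following minimal Python sketch shows how an empirical SFS, and its folded counterpart, can be tallied from a small 0/1 alignment. It only illustrates the quantity itself; it is not SweeD's code and does not implement the Živković and Stephan method. The function names and the input layout (one string of 0/1 characters per sequence, with 1 marking the derived allele) are assumptions made for this example.

```python
def site_frequency_spectrum(haplotypes):
    """Return the unfolded SFS of a 0/1 alignment.

    haplotypes: list of equal-length strings, one per sequence,
    where '1' marks the derived allele (assumed input layout).
    sfs[i] is the number of sites at which exactly i sequences
    carry the derived allele, for i = 0..n.
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    sfs = [0] * (n + 1)
    for site in range(n_sites):
        derived = sum(1 for hap in haplotypes if hap[site] == "1")
        sfs[derived] += 1
    return sfs


def fold(sfs):
    """Fold an unfolded SFS: classes i and n - i are merged.

    This is what one works with when the ancestral state is unknown,
    so only the minor allele count at each site is informative.
    """
    n = len(sfs) - 1
    folded = [0] * (n // 2 + 1)
    for i, count in enumerate(sfs):
        folded[min(i, n - i)] += count
    return folded


# Four sequences, five sites.
example = ["10110", "10100", "11100", "00101"]
unfolded = site_frequency_spectrum(example)
print(unfolded)        # [0, 3, 0, 1, 1]
print(fold(unfolded))  # [1, 4, 0]
```

In the example, three sites are singletons, one site has three derived copies, and one site is fixed for the derived allele; folding merges the class with three derived copies into the singleton class.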

SweeD is numerically more stable than SweepFinder (in terms of floating-point arithmetic operations and in particular for folded data), and is faster than SweepFinder when the number of sequences is large.
SweeD has been tested on simulated datasets with up to 10,000 sequences and 1,000,000 SNPs.

The sequential version of SweeD is up to 21 times faster than SweepFinder, depending on the number of SNPs and the number of sequences.
The speedup over SweepFinder grows with the number of sequences.
For a small number of sequences, SweeD is as fast as SweepFinder.

SweeD has also been used to analyze chromosome 1 from the 1000 Genomes Project.
The dataset comprises more than 2,000 sequences and about 2,896,000 SNPs. The analysis required 8 hours and 15 minutes.

You can download the source code of version 3.3.2 (January 2017) here.

The most up-to-date version is always available in the GitHub repository
(git clone https://github.com/alachins/sweed.git)

Version and bug history:

  • 3.3.2: Fixes two bugs: (1) the outgroup is now correctly assigned in every replicate; (2) if prob > 1, SweeD continues with the next grid point instead of aborting all calculations.
  • 3.2.12 (26 November 2014) here. Fixes a bug that occurred when freeing memory.
  • 3.2.11 here. Allows parsing a subset of the samples in a VCF file.
  • 3.2.10 here. Skips empty alignments in ms files (not tested for SF, MaCS, or VCF input) and alignments that end up with no polymorphic sites.
  • 3.2.9 here. Allows the step size between grid points to be less than one. This may be useful when files with multiple alignments are processed.
  • 3.2.8 here. Allows the INFO and FILTER fields to contain only the “.” character. Also fixes some bugs in file type recognition; the ungetc function was replaced with fseek.
  • 3.2.7 here. Should be able to handle ^M characters (carriage returns). However, we recommend not using input files that contain the ^M line terminator.
  • 3.2.6 here. Prints only the integer position for the location of the sweep.
  • 3.2.5 here. SweeD can now read lowercase letters (a, c, g, t) for DNA.
  • 3.2.4: SweeD can now read the msABC format. msABC output is like ms output, but it contains some additional information on the ‘//’ line.
  • 3.2.3: The -noSeparator flag now suppresses the header line for all but the first simulation, which also facilitates import into R.
  • 3.2.2: Adds the -noSeparator flag to suppress printing of the // separator between datasets.
  • 3.1: Fixed a bug in the VCF file format parser that was associated with handling missing
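
Several of the entries above concern ms-style input: extra information on the ‘//’ line (msABC), ^M line endings, and empty alignments. The Python sketch below shows one way such input could be split into replicates while tolerating those three cases. It is an illustration only and does not reproduce SweeD's parser; the function name, the returned dictionary layout, and the sample text on the ‘//’ line are all assumptions made for this example.

```python
def split_ms_replicates(text):
    """Split ms-style text into a list of replicates.

    Each replicate starts at a line beginning with '//'. Anything that
    follows '//' on that line (as in msABC output) is kept as a string
    under 'info'. Replicates without haplotype lines are dropped.
    """
    replicates = []
    current = None
    # str.splitlines() treats \n, \r\n and \r as line breaks, so
    # Windows-style ^M line endings do not end up in the data.
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("//"):
            current = {"info": line[2:].strip(), "positions": [], "haplotypes": []}
            replicates.append(current)
        elif current is None or not line:
            # Header lines before the first '//' and blank lines.
            continue
        elif line.startswith("positions:"):
            current["positions"] = [float(x) for x in line.split()[1:]]
        elif set(line) <= {"0", "1"}:
            # A haplotype row: one character per segregating site.
            current["haplotypes"].append(line)
        # Other lines (e.g. 'segsites: N') are ignored in this sketch.
    # Skip empty alignments, i.e. replicates with no haplotype rows.
    return [rep for rep in replicates if rep["haplotypes"]]


sample = (
    "ms 3 2 -s 2\r\n"
    "1 2 3\r\n"
    "\r\n"
    "//\t0.5 1.2\r\n"
    "segsites: 2\r\n"
    "positions: 0.1 0.7\r\n"
    "01\r\n10\r\n11\r\n"
    "\r\n"
    "//\r\n"
    "segsites: 0\r\n"
)
for rep in split_ms_replicates(sample):
    print(rep["info"], rep["positions"], rep["haplotypes"])
```

The second replicate in the sample has no segregating sites and is therefore skipped, so only the first one is printed.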

The manual is available here
