RAiSD: Selective Sweep Detection based on Multiple Signatures

RAiSD (Raised Accuracy in Sweep Detection) is a stand-alone software implementation of the μ statistic for selective sweep detection. Unlike existing implementations, including our previously released tools (SweeD and OmegaPlus), RAiSD scans whole-genome SNP data based on a composite evaluation scheme that captures multiple sweep signatures at once.
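The composite idea can be illustrated with a sliding window over SNPs that multiplies one score per sweep signature. The Python sketch below is only an illustration under stated assumptions. The three factors are simplified stand-ins chosen for this example: window span as a diversity proxy, the fraction of singleton and high-frequency derived SNPs as an SFS proxy, and the fraction of SNP patterns exclusive to one half of the window as an LD proxy. They are not the exact μ definitions from the RAiSD paper, and the function name `composite_scores` is hypothetical.

```python
from typing import List, Sequence, Tuple


def composite_scores(
    positions: Sequence[int],
    snps: Sequence[Sequence[int]],
    region_length: int,
    window: int,
) -> List[Tuple[float, float]]:
    """Return (window_center, score) for every run of `window` consecutive SNPs.

    Hypothetical illustration: each SNP is a 0/1 vector over the same n haplotypes
    (1 = derived allele). The three factors are simplified stand-ins, not RAiSD's formulas.
    """
    n = len(snps[0])
    half = window // 2
    results = []
    for start in range(len(snps) - window + 1):
        win = snps[start:start + window]
        pos = positions[start:start + window]
        # Diversity proxy: a fixed number of SNPs spread over many bases means low SNP density.
        var_factor = (pos[-1] - pos[0]) / region_length
        # SFS proxy: fraction of SNPs whose derived allele is a singleton or is carried by n-1 haplotypes.
        edge = sum(1 for s in win if sum(s) in (1, n - 1))
        sfs_factor = edge / window
        # LD proxy: fraction of distinct SNP patterns that occur in only one half of the window.
        left = {tuple(s) for s in win[:half]}
        right = {tuple(s) for s in win[half:]}
        ld_factor = len(left ^ right) / len(left | right)
        center = (pos[0] + pos[-1]) / 2
        results.append((center, var_factor * sfs_factor * ld_factor))
    return results
```

Called on four SNPs at positions 100, 200, 500 and 900 in a 1,000-bp region, with haplotype vectors [1,0,0,0], [1,1,1,0], [1,1,0,0] and [0,1,1,1] and a window of 4, it returns a single window centered at 500.0 with a score of about 0.6 (0.8 × 0.75 × 1.0). In a genome scan, the window with the highest score marks the candidate sweep location.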

The main article describing RAiSD and the μ statistic is published in Communications Biology:

  1. RAiSD detects positive selection based on multiple signatures of a selective sweep and SNP vectors

Other related publications:

  1. Accelerated Inference of Positive Selection on Whole Genomes

The source code of RAiSD is available on GitHub:

SweeD: Selective Sweep Detection based on SFS

SweeD implements a composite likelihood ratio test to detect selective sweeps in whole-genome data.
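The core arithmetic of a composite likelihood ratio (CLR) test fits in a few lines of Python. In this simplified sketch, the null model is the genome-wide empirical site frequency spectrum (SFS), SNPs are treated as independent so that their log-likelihoods simply add (the "composite" part), and the sweep model is an SFS supplied by the caller. SweeD derives its alternative from a model of how a sweep distorts the SFS near the selected site, which is not reproduced here. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def empirical_sfs(derived_counts: Sequence[int]) -> Dict[int, float]:
    """Genome-wide SFS: proportion of SNPs in each derived-allele count class."""
    tally = Counter(derived_counts)
    total = sum(tally.values())
    return {k: c / total for k, c in tally.items()}


def composite_log_likelihood(window_counts: Sequence[int], sfs: Dict[int, float]) -> float:
    """Sum of per-SNP log probabilities, treating SNPs as independent.

    Assumes every count class in the window has a nonzero probability in `sfs`.
    """
    return sum(math.log(sfs[k]) for k in window_counts)


def clr(window_counts: Sequence[int], null_sfs: Dict[int, float],
        sweep_sfs: Dict[int, float]) -> float:
    """Composite likelihood ratio statistic: 2 * (logL_sweep - logL_null)."""
    return 2.0 * (composite_log_likelihood(window_counts, sweep_sfs)
                  - composite_log_likelihood(window_counts, null_sfs))
```

For example, with genome-wide derived-allele counts [1, 2, 2, 2, 3, 1], a window [1, 1, 3, 1] and an alternative SFS {1: 0.6, 2: 0.1, 3: 0.3}, the CLR is roughly 4.70. Positive values mean that the alternative explains the window better than the genome-wide background.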

The main article describing SweeD is published in Molecular Biology and Evolution:

SweeD: Likelihood-Based Detection of Selective Sweeps in Thousands of Genomes
Pavlos Pavlidis, Daniel Živković, Alexandros Stamatakis, Nikolaos Alachiotis
Molecular Biology and Evolution, Volume 30, Issue 9, September 2013, Pages 2224–2234.

The GitHub page of SweeD is here:

OmegaPlus: Selective Sweep Detection based on LD

OmegaPlus detects selective sweeps using the omega-statistic. The omega-statistic (first described by Kim and Nielsen, 2004) takes high values when two neighboring regions of an alignment each show high levels of linkage disequilibrium (LD), but LD between the two regions is low.
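This definition can be written down directly: ω is the mean r² among SNP pairs on the same side of a split, divided by the mean r² among pairs that straddle it. The minimal Python sketch below assumes haplotypes coded as 0/1 vectors and r² as the LD measure, and computes ω for a single split of a window of S SNPs into a left part of l SNPs and a right part of S − l SNPs. OmegaPlus scans many window positions and split points, which this sketch leaves out. The function names are our own.

```python
import math
from itertools import combinations
from typing import Sequence


def r_squared(a: Sequence[int], b: Sequence[int]) -> float:
    """r^2 between two biallelic SNPs given as 0/1 vectors over the same haplotypes."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(x & y for x, y in zip(a, b)) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:  # monomorphic SNP: r^2 is undefined, treat it as 0
        return 0.0
    d = pab - pa * pb
    return d * d / denom


def omega(snps: Sequence[Sequence[int]], l: int) -> float:
    """Kim & Nielsen (2004) omega for a split after the first l SNPs."""
    s = len(snps)
    if not 2 <= l <= s - 2:
        raise ValueError("need at least two SNPs on each side of the split")
    left, right = snps[:l], snps[l:]
    within = (sum(r_squared(a, b) for a, b in combinations(left, 2))
              + sum(r_squared(a, b) for a, b in combinations(right, 2)))
    between = sum(r_squared(a, b) for a in left for b in right)
    within_mean = within / (math.comb(l, 2) + math.comb(s - l, 2))
    between_mean = between / (l * (s - l))
    if between_mean == 0:
        return math.inf  # no LD across the split at all
    return within_mean / between_mean
```

For SNPs [1,1,0,0], [1,1,0,0], [1,0,1,0] and [1,1,1,0] split after the second SNP, the mean within-side r² is 2/3 and the mean cross-side r² is 1/6, so `omega(snps, 2)` returns 4.0 up to floating-point rounding.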

The main publication is here:
Bioinformatics. 2012 Sep 1;28(17):2274-5. doi: 10.1093/bioinformatics/bts419. Epub 2012 Jul 3.

The GitHub page of OmegaPlus is here:

Demography Inference (ABC)

Demography inference is an important topic in population genetics because it helps us understand the history of species. It is also a vital part of selection inference: because demography and selection act on populations simultaneously, often on similar time scales, it is difficult to disentangle one from the other. We implemented msABC to facilitate inference of the demographic model.
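Approximate Bayesian computation (ABC) works by drawing parameters from a prior, simulating data, and keeping the draws whose summary statistics are close to the observed ones. The self-contained Python sketch below shows plain rejection ABC for a single parameter, the population mutation rate θ = 4Nμ. It uses the number of segregating sites as the only summary statistic and a tiny coalescent simulator written for this example. msABC itself runs Hudson's ms and computes many summary statistics across multiple loci. The function names, the flat prior and the tolerance rule here are illustrative assumptions.

```python
import random
from typing import List


def simulate_segregating_sites(theta: float, n: int, rng: random.Random) -> int:
    """Standard neutral coalescent: draw the total tree length, then the number of mutations.

    Time is in units of 2N generations, so k lineages coalesce at rate k(k-1)/2,
    and mutations fall on the tree at rate theta/2 per unit of branch length.
    """
    total_length = 0.0
    for k in range(n, 1, -1):
        total_length += k * rng.expovariate(k * (k - 1) / 2)
    # Poisson(theta/2 * total_length), drawn by counting unit-rate exponential arrivals.
    lam = theta / 2 * total_length
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def rejection_abc(s_obs: int, n: int, n_sims: int, tolerance: int,
                  theta_max: float, seed: int = 0) -> List[float]:
    """Keep prior draws of theta whose simulated S lies within `tolerance` of s_obs."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(0.0, theta_max)  # flat prior on theta
        if abs(simulate_segregating_sites(theta, n, rng) - s_obs) <= tolerance:
            accepted.append(theta)
    return accepted
```

Calling `rejection_abc(s_obs=30, n=10, n_sims=20000, tolerance=2, theta_max=25.0)` returns the accepted θ values, whose mean approximates the posterior mean. Tightening `tolerance` improves the approximation at the cost of fewer accepted draws.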

Mol Ecol Resour. 2010 Jul;10(4):723-7. doi: 10.1111/j.1755-0998.2010.02832.x. Epub 2010 Feb 2.
msABC: a modification of Hudson’s ms to facilitate multi-locus ABC analysis.
Pavlidis P, Laurent S, Stephan W.

GitHub page:

SPS: Forward-Backward Spatial Simulator for Genetic Data

SPS is a forward-backward simulator for genetic data in a spatial framework. Using SPS, we can explore the properties and patterns that arise in genomes because of “just living in space and not in an ideal Wright-Fisher model”.

FEG: Forward Evolutionary Game Simulator

FEG is a forward spatial simulator featuring a predator-prey model. Using FEG, we can see the footprints that selection on behavior leaves on the genome.

EVONET: Evolution of Gene Regulatory Networks

Evonet simulates the evolution of gene regulatory networks (GRNs) under genetic drift and selection. Building on Dr. Andreas Wagner’s work on Boolean GRNs, Evonet extends those ideas by implementing two regulatory regions on each network gene. Evonet is written in C for broad compatibility and faster execution. GitHub + Readme:

Endogenous Virus Evolution

During a retrovirus infection, a DNA copy of the viral RNA genome is permanently integrated into the nuclear DNA of the host cell as a provirus. Endogenous retroviruses (ERVs) account for more than 8% of the human genome.
This project’s goal is to locate the DNA copies of endogenous retroviruses in the host genome, determine their lineage, and then assess whether genes close to these insertions carry mutations and what the consequences are.

Mitochondrial-Nuclear genes Co-Evolution

Hi-C Data Analysis: Evolution of DNA 3D Structure

In this project, we compare the binding sites of transcription factors (TFs) from the ENCODE Consortium against random points, and we use statistical hypothesis tests to assess where TFs and histone modifiers lie relative to the boundaries of lamina-associated domains (LADs) and topologically associating domains (TADs). Furthermore, we examine the distribution of transcription start sites (TSSs) from the GENCODE project inside and outside TAD and LAD boundaries, and we measure the distance of each TSS from these boundaries as well as from the center of each 3D DNA conformation. At the same time, we record the type of transcript that arises from each TSS, focusing on protein-coding genes. In addition, we identify the recent evolutionary forces that shape patterns of polymorphism relative to LAD and TAD boundaries in human and mouse populations. This provides insight into whether recent, within-population evolution acts on the 3D structure of DNA in the nucleus. Finally, using dN/dS measurements, we examine whether older selective forces have acted on genes located inside or outside TADs and LADs.
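The comparison of TF binding sites against random points can be framed as a permutation test on the distance to the nearest domain boundary. The minimal Python sketch below makes several assumptions. Positions lie on a single chromosome, boundary coordinates are given as a sorted, non-empty list, and random points are placed uniformly along the chromosome (real analyses often restrict them to mappable or otherwise matched regions). Function names and parameters are hypothetical.

```python
import bisect
import random
from typing import List, Sequence, Tuple


def distance_to_nearest(points: Sequence[int], boundaries: Sequence[int]) -> List[int]:
    """Distance from each point to the closest boundary; `boundaries` must be sorted."""
    out = []
    for p in points:
        i = bisect.bisect_left(boundaries, p)
        candidates = []
        if i < len(boundaries):
            candidates.append(boundaries[i] - p)
        if i > 0:
            candidates.append(p - boundaries[i - 1])
        out.append(min(candidates))
    return out


def boundary_enrichment_test(sites: Sequence[int], boundaries: Sequence[int],
                             chrom_length: int, n_perm: int = 1000,
                             seed: int = 0) -> Tuple[float, float]:
    """One-sided permutation test: are sites closer to boundaries than random points?

    Returns (observed mean distance, empirical p-value).
    """
    rng = random.Random(seed)
    observed = sum(distance_to_nearest(sites, boundaries)) / len(sites)
    as_extreme = 0
    for _ in range(n_perm):
        random_points = [rng.randrange(chrom_length) for _ in sites]
        null_mean = sum(distance_to_nearest(random_points, boundaries)) / len(sites)
        if null_mean <= observed:
            as_extreme += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (as_extreme + 1) / (n_perm + 1)
```

The mean distance keeps the statistic simple. The median, or the fraction of sites within a fixed distance of a boundary, would be equally reasonable choices, and the same function can be reused for TSSs by passing their coordinates as `sites`.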

Population Metagenomics

Evolutionary models of amino acid substitutions based on their tertiary-structure neighborhood

We have used the Protein Interaction Statistics (PrInS) algorithm to statistically describe interactions between amino acids in protein structures. PrInS produces a scoring matrix that describes the frequency of amino acid interactions in the protein structures. This matrix was then converted into several different types of matrices so that they could be compared with model substitution matrices, such as BLOSUM62 and PAM120, in predicting protein evolution.
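A common way to turn pair-frequency data into a substitution-style scoring matrix is the BLOSUM log-odds construction. It compares each observed pair frequency with the frequency expected from the background residue frequencies. The Python sketch below applies that construction to symmetric interaction counts. It is an assumption that this resembles one of the conversions used here: the exact PrInS transformations are not reproduced, and the function name is hypothetical. Published BLOSUM matrices additionally scale scores to half-bit units and round them to integers.

```python
import math
from typing import Dict, Tuple


def log_odds_matrix(pair_counts: Dict[Tuple[str, str], int]) -> Dict[Tuple[str, str], float]:
    """BLOSUM-style log-odds scores (in bits) from symmetric pair counts.

    pair_counts holds one entry per unordered pair, keyed (a, b) with a <= b.
    Pairs with zero counts are skipped, since their log-odds would be minus infinity.
    """
    total = sum(pair_counts.values())
    # Observed frequency of each unordered pair.
    q = {pair: c / total for pair, c in pair_counts.items() if c > 0}
    # Background frequency of each residue: half the weight of every pair it appears in.
    p: Dict[str, float] = {}
    for (a, b), f in q.items():
        p[a] = p.get(a, 0.0) + f / 2
        p[b] = p.get(b, 0.0) + f / 2
    scores = {}
    for (a, b), f in q.items():
        # Expected frequency under independence; unordered a != b pairs can arise two ways.
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = math.log2(f / expected)
    return scores
```

For counts {('A','A'): 40, ('A','B'): 20, ('B','B'): 40}, both background frequencies are 0.5. The A/A score is therefore log2(0.4/0.25) ≈ 0.68 bits and the A/B score is log2(0.2/0.5) ≈ −1.32 bits: positive scores mark pairs that interact more often than chance.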

Analysis of RNA-Seq and ChIP-Seq Data