CoMuS: Coalescent of Multiple Species and CoMuStats

Introduction

Here you can download the source code of CoMuS and CoMuStats, as well as R scripts that can be used for different analyses. The R scripts implement specific examples that were used in the CoMuS manuscript or that I consider interesting in general. You can download them and modify them according to your needs.

You can download the most recent version from GitHub:
https://github.com/idaios/comus

or type in a terminal: git clone https://github.com/idaios/comus.git

Instructions:

  1. Download the code from the link above.
  2. tar xvfz comus.tar.gz
  3. cd comus
  4. To compile: make -f Makefile.gcc (you may need to remove the *.o files first,
    i.e. rm *.o and then make -f Makefile.gcc)
  5. You should now have the comus executable.
  6. IMPORTANT: there is a pre-compiled executable in comus.tar.gz. However, depending on your system, you may not be able to execute it. Instead, you will see the error message: comus: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by comus). In that case, first clean: make clean -f Makefile.gcc
    and then recompile the code: make -f Makefile.gcc

Scripts and demonstration

In the CoMuS manuscript we used several scripts, simulations and inference examples to demonstrate CoMuS usage. These scripts are provided here for demonstration or testing purposes.

  • ancestral sampling

    CoMuS allows the simultaneous simulation of both modern and ancestral samples. To make this possible, the CoMuS implementation starts at the present day with the whole dataset (modern and ancestral). However, all events that involve ancestral samples or their population are forbidden until their sampling time is reached (time proceeds backwards). After sampling (backwards in time), evolutionary processes (e.g. recombination, mutation, coalescence, migration) take place as usual according to the model parameters.

    Here, we demonstrate the use of CoMuS to infer potential ancestral gene flow between an extant population and an extinct sample (fossil). The simulation scenario is as follows: we assume a sample of 10 sequences from species A sampled at present, and a sample of 10 sequences from an extinct species B sampled at time 0.2 (phylogenetic time units). The time of the MRCA has been set to 0.5 (phylogenetic time units). We assume no gene flow between species A and B after speciation. We illustrate the dendrogram for this scenario in Figure 1 (below). Assuming that the above scenario represents the true evolutionary history of extant species A and extinct species B, our goal is to infer: (i) whether gene flow between A and B is absent or present, and (ii) the time of sampling for species B. The time of the MRCA (= 0.5 phylogenetic time units) and the value of θ (= 100) are assumed to be known. The length of the simulated region is 1 kb, and we assume a mutation model with equal mutation rates between each pair of bases.

    Download the script here

Figure 1: an example of a coalescent tree that comprises both present-day and ancestral samples
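
The ancestral-sampling rule described above (lineages sampled in the past cannot take part in any event until their sampling time is reached, going backwards in time) can be illustrated with a minimal Python sketch. This is not the CoMuS implementation: it assumes a single population of constant size, only coalescent events, and time measured in coalescent units. The function name simulate_tmrca and its arguments are hypothetical and chosen for illustration only.

```python
import random


def simulate_tmrca(sample_times, rng):
    """Return the time of the MRCA for dated samples from ONE population.

    sample_times: one sampling time per lineage (0.0 = present day),
    measured backwards in time. Toy model, not the CoMuS algorithm.
    """
    pending = sorted(sample_times)  # lineages whose sampling time is not yet reached
    t = pending[0]                  # start at the most recent sample
    active = 0                      # lineages that are allowed to coalesce
    while True:
        # Activate every lineage whose sampling time has been reached.
        while pending and pending[0] <= t:
            pending.pop(0)
            active += 1
        if active == 1 and not pending:
            return t                # a single lineage is left: the MRCA
        if active < 2:
            t = pending[0]          # nothing can coalesce yet; jump to next sampling
            continue
        rate = active * (active - 1) / 2.0
        wait = rng.expovariate(rate)
        if pending and t + wait > pending[0]:
            # The event would fall after older samples join the process.
            # Exponential waiting times are memoryless, so discard the draw
            # and restart from the next sampling time.
            t = pending[0]
            continue
        t += wait
        active -= 1                 # one coalescent event merges two lineages


rng = random.Random(1)
times = [0.0] * 10 + [0.2] * 10     # 10 modern and 10 ancestral lineages
tmrca = simulate_tmrca(times, rng)
print(tmrca)
```

Because the ancestral lineages only become active at time 0.2, the MRCA of the whole sample can never be more recent than 0.2 in this sketch.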

Testing species delimitation software

  • testing species delimitation with gene-flow

An obvious use of CoMuS is to test species delimitation software. More specifically, various parameters such as

  • changes of population size
  • population subdivision
  • ghost populations
  • etc

can be examined, and the performance of species delimitation software can be assessed. Here, we test the PTP software developed by Jiajie Zhang et al. (including myself) in the group of Alexandros Stamatakis in Heidelberg. The manuscript is available here.

The scripts used for this demonstration can be found here. (NOTE: to run the full set of commands in the scripts you need to install RAxML; see raxml-github.)

  • Testing species delimitation with various birth rates

Scripts can be found here. The ideas are similar to those presented above.

Inferring parameter values

CoMuS can be used to infer parameters within the ABC (Approximate Bayesian Computation) framework. We have used two scenarios: (i) 2 species, with 10 sequences sampled from each, and inference of the birth rate b; (ii) 10 species, with 10 sequences sampled from each, and inference of the birth rate b and the time of the most recent common ancestor, i.e. the time at which the two populations find a common ancestor.

  • Scripts for scenario (i) can be found here
  • Scripts for scenario (ii) can be found here
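
For readers unfamiliar with ABC, the basic rejection scheme can be sketched in a few lines of Python. This is a generic illustration and not the procedure in the scripts above: toy_simulator is a hypothetical placeholder for a CoMuS run followed by a summary-statistic calculation, and the uniform prior, the tolerance eps and the observed value 0.5 are all assumptions made for the example.

```python
import random


def toy_simulator(b, n, rng):
    """Hypothetical stand-in for a simulator run.

    Returns one summary statistic: the mean of n exponential
    waiting times with rate b (expected value 1 / b).
    """
    return sum(rng.expovariate(b) for _ in range(n)) / n


def abc_rejection(observed, n_draws, eps, rng, prior=(0.1, 5.0), n=50):
    """Keep the prior draws of b whose simulated summary is within eps of the data."""
    accepted = []
    for _ in range(n_draws):
        b = rng.uniform(*prior)                  # 1. draw a parameter from the prior
        simulated = toy_simulator(b, n, rng)     # 2. simulate data under that value
        if abs(simulated - observed) <= eps:     # 3. accept if close to the observation
            accepted.append(b)
    return accepted


rng = random.Random(42)
accepted = abc_rejection(observed=0.5, n_draws=5000, eps=0.05, rng=rng)
posterior_mean = sum(accepted) / len(accepted)
print(len(accepted), round(posterior_mean, 2))
```

The accepted values approximate the posterior distribution of b. With an observed mean waiting time of 0.5, they concentrate around b = 2, as expected for this toy model.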

Examples and Manual

  • Inside the comus directory you will find a directory called ‘manual’, which contains the manual, i.e. manual.pdf
  • There is also a directory called examples. There is a run.sh file that contains commands as well as useful notes that explain most of the results. Please consult them first.

CoMuStats

This software can be used to calculate summary statistics from single- or multi-alignment FASTA files. In a multi-alignment FASTA file, consecutive alignments should be separated by a line containing //.

For example:

>seq1
ACGTG
>seq2
ACGGG
//
>seq1
ACCTC
>seq2
ACCCC
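
To make the layout concrete, here is a minimal Python sketch that splits such a file into its alignments and computes one simple summary statistic per alignment. It is not part of CoMuStats: the function names parse_alignments and segregating_sites are hypothetical, and the number of segregating sites is chosen only as an example of a summary statistic.

```python
def parse_alignments(text):
    """Split multi-alignment FASTA text on '//' lines.

    Returns a list of alignments; each alignment is a list of
    (name, sequence) tuples in file order.
    """
    alignments, current, name, chunks = [], [], None, []

    def flush_record():
        # Store the sequence record collected so far, if any.
        nonlocal name, chunks
        if name is not None:
            current.append((name, "".join(chunks)))
        name, chunks = None, []

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "//":                 # separator between alignments
            flush_record()
            if current:
                alignments.append(current)
            current = []
        elif line.startswith(">"):       # header line of a new sequence
            flush_record()
            name = line[1:].strip()
        else:                            # sequence line (may span several lines)
            chunks.append(line)
    flush_record()
    if current:
        alignments.append(current)
    return alignments


def segregating_sites(alignment):
    """Count alignment columns that contain more than one distinct character."""
    seqs = [seq for _, seq in alignment]
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


example = """>seq1
ACGTG
>seq2
ACGGG
//
>seq1
ACCTC
>seq2
ACCCC
"""
alignments = parse_alignments(example)
print([segregating_sites(a) for a in alignments])  # prints [1, 1]
```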

Bugs and Previous versions

  1. CoMuS v1.0: -oFormat unrecognized
  2. CoMuS v2.0, March 2016 (CoMuStats was not running the sliding-window code properly).

Log Report

  1. CoMuStats 1.0.1 accepts an outgroup. Run CoMuStats without arguments for further details.