
Presentation Detail


Informatics

Ishikawa, Sohta [1], Inagaki, Yuji [2], Hashimoto, Tetsuo [2], Sato, Mitsuhisa [3].

Efficient parallelization of the maximum-likelihood phylogenetic inference with the non-homogeneous substitution model.

Recent advances in DNA sequencing techniques enable us to phylogenetically analyze large data matrices that include sequences from diverse species (and genes). At present, the vast majority of maximum-likelihood (ML) phylogenetic programs implement only ‘homogeneous’ substitution models, which impose a uniform evolutionary process on the entire tree. Because nucleotide and amino acid sequences in distantly related species inevitably evolve under different evolutionary processes (non-homogeneous evolution), the assumption of homogeneous models is often violated, and such model violation may result in various forms of phylogenetic artifacts. For accurate phylogenetic inference from real-world sequence data, non-homogeneous (NH) substitution models, which allow model parameters to vary across the tree, are more realistic than homogeneous models. However, analyses with NH models, in which an enormous number of model parameters must be optimized, can be computationally intensive. Efficient parallelization of phylogenetic programs is therefore critical for analyzing real-world data under an NH model within a reasonable computational time.

In this study, we parallelized a phylogenetic program that implements an NH model allowing the adenine + thymine (A + T) content to vary across the tree (NHML; Galtier and Gouy 1999). We applied two parallel-computing approaches, OpenMP and MPI, to the tree-search algorithm of NHML, and evaluated the performance of this HYBRID (OpenMP + MPI) version of NHML by analyzing simulated nucleotide datasets. All analyses were conducted on the T2K-Tsukuba supercomputer cluster, in which each computational node is composed of four quad-core CPUs (AMD Opteron 8356, 2.30 GHz). The performance of the HYBRID version of NHML improved as the number of nodes used on the cluster increased: with 16 nodes (256 cores), the ML tree search was ~47 times faster than the control.
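The likelihood calculation that makes NH analyses expensive can be sketched as follows. This is a minimal, hypothetical Python illustration, not NHML's code. It assumes a T92-style rate matrix (HKY with equal A/T and equal C/G frequencies) whose equilibrium A + T content is set separately on every branch, a rooted tree with its own root composition (the process is non-stationary, so the root matters), and Felsenstein's pruning algorithm. Because alignment sites are independent, the per-site log-likelihoods can be summed in separate chunks; a thread pool stands in here for OpenMP/MPI workers (Python threads give no real speedup because of the GIL). The abstract does not say whether NHML divides work by sites, by candidate trees, or both. All names and parameters (`kappa`, `at_content`, `root_at`, `workers`) are illustrative assumptions.

```python
# Hypothetical sketch of a non-homogeneous (per-branch A+T content) likelihood.
# Not NHML's implementation; names and parameters are illustrative only.
import math
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

BASES = "ACGT"
TRANSITIONS = {(0, 2), (2, 0), (1, 3), (3, 1)}  # A<->G and C<->T


def rate_matrix(at_content, kappa):
    """T92-style rate matrix (HKY with A = T, C = G) for a given A+T content."""
    pi = [at_content / 2, (1 - at_content) / 2, (1 - at_content) / 2, at_content / 2]
    q = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            if i != j:
                q[i][j] = pi[j] * (kappa if (i, j) in TRANSITIONS else 1.0)
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 here, so this sums off-diagonals
    # Rescale so a branch length of 1 means one expected substitution per site.
    mu = -sum(pi[i] * q[i][i] for i in range(4))
    return [[x / mu for x in row] for row in q]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def expm(q, t):
    """exp(Q t) by scaling and squaring with a truncated Taylor series."""
    norm = max(sum(abs(x) for x in row) for row in q) * t
    s = max(0, math.ceil(math.log2(norm / 0.5))) if norm > 0 else 0
    a = [[x * t / 2 ** s for x in row] for row in q]
    result = [[float(i == j) for j in range(4)] for i in range(4)]
    term = [row[:] for row in result]
    for k in range(1, 20):
        term = [[x / k for x in row] for row in matmul(term, a)]
        result = [[r + x for r, x in zip(rrow, trow)] for rrow, trow in zip(result, term)]
    for _ in range(s):
        result = matmul(result, result)
    return result


@lru_cache(maxsize=None)
def transition_matrix(t, at_content, kappa):
    """P(t) for one branch; each branch may carry its own A+T content."""
    return tuple(tuple(row) for row in expm(rate_matrix(at_content, kappa), t))


def partials(node, column, kappa):
    """Felsenstein pruning. A leaf is a taxon name (unambiguous A/C/G/T only);
    an internal node is a list of (child, branch_length, branch_at_content)."""
    if isinstance(node, str):
        return [1.0 if b == column[node] else 0.0 for b in BASES]
    out = [1.0] * 4
    for child, t, at in node:
        p = transition_matrix(t, at, kappa)
        lc = partials(child, column, kappa)
        for i in range(4):
            out[i] *= sum(p[i][j] * lc[j] for j in range(4))
    return out


def site_loglik(tree, root_at, column, kappa):
    """Log-likelihood of one alignment column, weighting the root by its own composition."""
    root_pi = [root_at / 2, (1 - root_at) / 2, (1 - root_at) / 2, root_at / 2]
    lk = partials(tree, column, kappa)
    return math.log(sum(p * l for p, l in zip(root_pi, lk)))


def log_likelihood(tree, root_at, alignment, kappa=2.0, workers=4):
    """Sum of per-site log-likelihoods; each worker takes an interleaved subset of sites."""
    taxa = list(alignment)
    n_sites = len(alignment[taxa[0]])
    columns = [{x: alignment[x][i] for x in taxa} for i in range(n_sites)]
    chunks = [columns[w::workers] for w in range(workers)]

    def chunk_sum(chunk):
        return sum(site_loglik(tree, root_at, c, kappa) for c in chunk)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(chunk_sum, chunks))
```

For example, `log_likelihood(tree, 0.5, aln)` with `tree = [([("t1", 0.1, 0.6), ("t2", 0.1, 0.6)], 0.05, 0.6), ("t3", 0.3, 0.3)]` scores a hypothetical three-taxon tree in which the `t3` lineage drifts toward a GC-rich composition (A + T content 0.3) while the others are AT-rich. In an ML search such a function is called many times as branch lengths, per-branch compositions, and topologies are optimized, which is why both the site loop and candidate-tree evaluations are plausible targets for parallel work.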
We also assessed the accuracy of NHML by analyzing large simulated sequence datasets with compositional heterogeneity, which were generated on model trees of various topologies.
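Sequence data with compositional heterogeneity can be generated by letting each branch evolve toward its own A + T content. The sketch below is a hypothetical illustration, not the simulation procedure used in this study: for brevity it swaps in the closed-form F81 (equal-input) model, with branch-specific base frequencies, instead of a model with a transition/transversion bias. The tree encoding, function names, branch lengths, and seed are all illustrative assumptions.

```python
# Hypothetical sketch: simulate a compositionally heterogeneous alignment.
# Uses F81 with per-branch base frequencies; not the study's actual simulator.
import math
import random

BASES = "ACGT"


def composition(at_content):
    """Base frequencies (A, C, G, T) with the given A+T content; A = T and C = G."""
    return [at_content / 2, (1 - at_content) / 2, (1 - at_content) / 2, at_content / 2]


def f81_probs(from_base, t, at_content):
    """F81 transition probabilities from one base over a branch of length t,
    pulled toward this branch's own composition."""
    pi = composition(at_content)
    beta = 1.0 / (1.0 - sum(p * p for p in pi))  # t = expected substitutions at equilibrium
    stay = math.exp(-beta * t)
    i = BASES.index(from_base)
    return [(stay if j == i else 0.0) + (1 - stay) * pi[j] for j in range(4)]


def evolve(seq, t, at_content, rng):
    """Draw a descendant sequence site by site along one branch."""
    return "".join(rng.choices(BASES, weights=f81_probs(b, t, at_content))[0] for b in seq)


def simulate(node, seq, rng, out):
    """Leaves are taxon names; internal nodes are lists of (child, t, at_content)."""
    if isinstance(node, str):
        out[node] = seq
        return out
    for child, t, at in node:
        simulate(child, evolve(seq, t, at, rng), rng, out)
    return out


def simulate_alignment(tree, root_at, n_sites, seed=0):
    rng = random.Random(seed)
    root = "".join(rng.choices(BASES, weights=composition(root_at), k=n_sites))
    return simulate(tree, root, rng, {})


def at_content(seq):
    return sum(b in "AT" for b in seq) / len(seq)
```

For instance, `simulate_alignment([("atrich", 2.0, 0.8), ("gcrich", 2.0, 0.2)], 0.5, 5000)` should yield two sequences whose A + T contents lie close to 0.8 and 0.2, respectively, even though both descend from a root with A + T content 0.5. Data of this kind violate the assumptions of a homogeneous model while remaining consistent with an NH one.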



1 - University of Tsukuba, Graduate School of Life and Environmental Sciences, Tenno-dai 1-1-1, Tsukuba-shi, Ibaraki-ken, 305-0006, Japan
2 - University of Tsukuba, Institute of Biological Sciences, Tenno-dai 1-1-1, Tsukuba-shi, Ibaraki-ken, 305-0006, Japan
3 - University of Tsukuba, Graduate School of Systems and Information Engineering, Tenno-dai 1-1-1, Tsukuba-shi, Ibaraki-ken, 305-0006, Japan

Keywords:
maximum likelihood
phylogenetic inference
non-homogeneous models
parallel computing
simulation

Presentation Type: Regular Oral Presentation
Session: 101
Location: Alpine A and B/Snowbird Center
Date: Monday, June 24th, 2013
Time: 9:15 AM
Number: 101004
Abstract ID: 640
Candidate for Awards: Ernst Mayr Award


Copyright © 2000-2013, Botanical Society of America. All rights reserved