Below are links to, and brief descriptions of, software applications and tools developed by members of the group.
A computational tool for identifying novel Antimicrobial Peptides (AMPs).
AMPLY is a bioinformatics pipeline designed to take any form of digital biological data and retrieve novel antimicrobial peptides (AMPs) for synthesis and screening against multi-drug resistant (MDR) strains of bacteria and fungi. Developed by Ben Thomas, this work was funded by Life Science Research Network Wales.
AMPLY is now part of a spin-out company, Amply Discovery, and all enquiries about its use should be directed there: https://amplydiscovery.com/
Links: Amply Website
Tool to implement a J48 decision tree from WEKA
Developed by Chris Creevey, Lucy Dillon and Nick Dimonaco, this tool is designed to allow easy application of J48 decision trees from WEKA to novel data.
Automated Quality improvement for multiple sequence alignments
Chris Creevey co-developed this protocol and software with Jean Muller while working at the Bork Group in EMBL. The protocol carries out the automatic identification of the most reliable multiple sequence alignment for a given protein family. The implementation relies on two alignment programs (MUSCLE and MAFFT), one refinement program (RASCAL) and one assessment program (NORMD), but other programs could be incorporated at any of the three steps.
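The selection step of such a protocol can be sketched as follows. This is a minimal illustration only: the candidate alignments are hard-coded rather than produced by MUSCLE or MAFFT, and the scoring function `gap_free_fraction` is a hypothetical stand-in for a real assessment program such as NORMD, not a reimplementation of it.

```python
def gap_free_fraction(alignment):
    """Stand-in quality score: fraction of columns containing no gap character."""
    rows = list(alignment.values())
    columns = list(zip(*rows))
    if not columns:
        return 0.0
    return sum("-" not in column for column in columns) / len(columns)


def most_reliable(candidates, score=gap_free_fraction):
    """Return (name, alignment) for the highest-scoring candidate alignment."""
    return max(candidates.items(), key=lambda item: score(item[1]))


# Hypothetical outputs of two alignment programs for the same protein family.
candidates = {
    "aligner_one": {"s1": "ACG-T", "s2": "AC-GT", "s3": "ACGGT"},
    "aligner_two": {"s1": "ACGT-", "s2": "ACGT-", "s3": "ACGGT"},
}
best_name, best_alignment = most_reliable(candidates)
```

Because the score is passed in as a function, a different assessment method could be swapped in without changing the selection logic.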
A tool for concatenating multiple FASTA alignments for supermatrix phylogenetic analyses
Developed by Chris Creevey as part of a general phylogenomics software suite.
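The idea of building a supermatrix can be illustrated with a short sketch. It assumes each gene alignment is a FASTA file whose sequences all have the same length, and that taxa missing from a gene are padded with gap characters. The function names `parse_fasta` and `concatenate_alignments` are illustrative and are not taken from the tool itself.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict of {taxon name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def concatenate_alignments(alignments, missing="-"):
    """Join a list of alignments into one supermatrix.

    Taxa absent from an alignment are padded with the `missing` character.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in an alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            supermatrix[taxon] += aln.get(taxon, missing * width)
    return supermatrix


gene1 = parse_fasta(">A\nACGT\n>B\nAC-T\n")
gene2 = parse_fasta(">A\nGG\n>C\nGA\n")
matrix = concatenate_alignments([gene1, gene2])
```

Here taxon B is absent from the second gene and taxon C from the first, so both receive gap padding in the concatenated matrix.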
Construction of Supertrees and exploration of phylogenomic information from partially overlapping datasets.
The software, developed by Chris Creevey, implements methods of determining the optimal phylogenetic supertrees, given a set of input source trees. The methods implemented all allow the investigation of data in a phylogenomic context.
Check trees for compatibility with defined groupings in unrooted trees - “The incontestable clan test”
Clan_Check analyses single-copy phylogenetic trees to assess if they violate clans defined by the user. This is designed for large-scale phylogenomic analyses where the user may have thousands of phylogenetic trees. This tool can help enrich the data for orthologs, by identifying where paralogy has caused violation of “well known” clans in outgroups.
Published in: Siu-Ting, Karen, et al. “Inadvertent paralog inclusion drives artefactual topologies and timetree estimates in phylogenomics.” Molecular Biology and Evolution (2019).
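What it means for a tree to violate a clan can be sketched in a few lines. In an unrooted tree, a group of taxa forms a clan if some single branch separates exactly those taxa from all the others. The sketch below assumes trees are given as nested tuples of taxon names (a hypothetical representation chosen to avoid a Newick parser) and is not the Clan_Check implementation.

```python
def leaf_sets(tree):
    """Return (leaves of this subtree, leaf sets of every subtree within it)."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
        return leaves, [leaves]
    leaves = frozenset()
    collected = []
    for child in tree:
        child_leaves, child_sets = leaf_sets(child)
        leaves |= child_leaves
        collected.extend(child_sets)
    collected.append(leaves)
    return leaves, collected


def violates_clan(tree, clan):
    """True if no branch of the unrooted tree separates the clan from the rest.

    Clan members absent from the tree are ignored.
    """
    all_leaves, subtree_sets = leaf_sets(tree)
    present = frozenset(clan) & all_leaves
    if len(present) < 2:
        return False  # nothing to test with fewer than two members present
    for leaves in subtree_sets:
        # Each subtree corresponds to a branch; the clan may lie on either side.
        if present == leaves or present == all_leaves - leaves:
            return False
    return True


tree = (("A", "B"), ("C", ("D", "E")))
```

Because the tree is treated as unrooted, the group A, B, C counts as a clan here even though it is not a subtree in the tuple representation: the branch leading to (D, E) separates it from the remaining taxa.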
Identify unstable taxa in phylogenies
The method and implementation were developed by Chris Creevey in collaboration with Karen Siu-Ting, Mark Wilkinson and Davide Pisani. The method is a heuristic extension of Safe Taxonomic Reduction for identifying unstable taxa in phylogenies; it uses a compatibility approach to test whether taxa can be equivalent in their character information. The program also uses Cytoscape to visualise taxonomic equivalents in a network.
The method used is detailed here.
If you use Concatabominations, cite:
Siu-Ting, Karen, et al. “Concatabominations: identifying unstable taxa in morphological phylogenetics using a heuristic extension to safe taxonomic reduction.” Systematic biology 64.1 (2014): 137-143.
A rumen microbiome focussed version of the PICRUSt functional inference software
Using 16S rDNA profiles from the Global Rumen Census and almost 500 fully sequenced microbial genomes from the Hungate 1000 project, CowPI is a rumen-focused version of the PICRUSt tool for functional inference from 16S metataxonomic data.
Fast heuristic methods of detecting adaptive evolution in protein-coding genes.
CRANN is a software program written in the C programming language that can be used to investigate adaptive evolution in a number of ways. The program implements some of the most popular methods of measuring synonymous and non-synonymous distances between a pair of sequences (Li et al., 1985; Li, 1993). Its most powerful feature is the ability to detect adaptive evolution along evolutionary lineages. Two methods are implemented in the software: the method described by Messier and Stewart (1997) and the method of Creevey and McInerney (2002).
If you use CRANN, cite:
Creevey, C. and J. O. McInerney (2003). CRANN: Detecting adaptive evolution in protein-coding DNA sequences. Bioinformatics 19: 1726.
Creevey, C. and J. O. McInerney (2002). An algorithm for detecting directional and non-directional positive selection, neutrality and negative selection in protein coding DNA sequences. Gene 300: 43-51.
Identifying cryptic haplotypes from metagenomic datasets
These are twin software tools written by Sam Nichols as part of his PhD. Hansel implements a graph-inspired data structure for determining likely chains of sequences from breadcrumbs of evidence, and Gretel implements an algorithm that uses Hansel to recover haplotypes from metagenomes.
Automated likelihood decay indices and maximum likelihood phylogeny construction with PAUP*
Written by Chris Creevey, Machete takes as input a NEXUS-formatted alignment of DNA or amino acid sequences and uses PAUP* to automatically calculate maximum likelihood trees (and/or carry out bootstrap analyses) while optimising the models. It has been designed to allow calculation of likelihood decay support for each internal branch of the resulting tree.
Machete controls and interacts with PAUP* using a pipe, and not using a predefined script. This allows dataset-specific optimisations to be carried out (as a user would).
Metagenomic Framework for the Study of Microbial Communities
Written by Francesco Rubino.
While metagenomics has been used extensively to study microbial communities from a taxonomic and functional perspective, little has been done to address how the species in a microbiome are adapted to and maintain specific roles in dynamic environments like the rumen.
To address this issue, we have developed a framework for the robust analysis of metagenomic data that includes fully automated analysis from next-generation sequencing (NGS) reads to assembly, gene prediction and taxonomic identification. Furthermore, we implement approaches to estimate SNP diversity in metagenomic samples and carry out statistical tests to identify genes where sequence diversity exists.
The framework allows easy customisation of any metagenomic workflow by providing the necessary functions and scripts to manipulate data from NGS pipelines, and it provides bespoke analyses of the data. MGKit does not enforce a specific pipeline on the user, but leverages analysis patterns and common file formats to make it easier to experiment with different types of analyses.
MGKit is implemented in Python and uses libraries common in the scientific Python community, such as NumPy, SciPy, Matplotlib and pandas, along with packages used in NGS data analysis, such as HTSeq and pysam.
Software for annotation of both novel and known single nucleotide polymorphisms (SNPs)
Developed by Anthony Doran and Chris Creevey. It is specifically designed for use with organisms that are either not supported by other tools or have only a small number of annotated SNPs available; however, it can also be used to analyse datasets from organisms that are densely sampled for SNPs.
An iterative approach for large metagenome assemblies.
Developed by Tom Hitch as part of his PhD, Spherical is an iterative approach to assembling metagenomic datasets. It has been designed to produce a more complete assembly from deeply sequenced metagenomic data. Using multiple iterations of assembly allows regions that would otherwise be missed to be assembled without a reduction in contig accuracy. Spherical can also produce metagenomic assemblies from a subset of the initial input file, allowing a metagenome to be assembled with a fraction of the RAM that would otherwise be required.
StORF-Reporter, a toolkit that takes as input an annotated genome and returns missed CDS genes from the Unannotated Regions (URs).
Developed by Nick Dimonaco as part of his PhD, this tool extracts Unannotated Regions from PROKKA genome annotations, finds Stop - Open Reading Frames and reports them in a new PROKKA-formatted GFF file in the PROKKA output directory.
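The notion of a reading frame bounded by stop codons can be sketched as below. This is a simplified illustration that assumes the standard stop codons, scans only the forward strand of a single unannotated region, and reports coordinates that include both bounding stop codons. The function name and the `min_length` parameter are hypothetical and do not reflect StORF-Reporter's actual interface or filtering rules.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_to_stop_orfs(sequence, min_length=30):
    """Return (start, end, frame) tuples for regions between in-frame stop codons.

    `start` is the 0-based index of the opening stop codon and `end` is the
    exclusive index just past the closing stop codon.
    """
    sequence = sequence.upper()
    results = []
    for frame in range(3):
        previous_stop = None
        # Step through complete codons in this frame.
        for pos in range(frame, len(sequence) - 2, 3):
            if sequence[pos:pos + 3] in STOP_CODONS:
                if previous_stop is not None:
                    start, end = previous_stop, pos + 3
                    if end - start >= min_length:
                        results.append((start, end, frame))
                previous_stop = pos
    return results


region = "TAA" + "ATG" + "GCC" * 3 + "TAG"
orfs = stop_to_stop_orfs(region, min_length=6)
```

A fuller version would also scan the reverse complement and map the coordinates back onto the genome before writing GFF records.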
Rapid calculation of pathlength distances between taxa on phylogenetic trees
Developed by Chris Creevey, this tool allows rapid calculation of pathlength distances between taxa on phylogenetic trees. It comprises two programs: Treedist_pair, which returns the distance between two branches of a tree (including internal branches), and Treedist_all, which returns the pathlength distances between all terminal branches of a tree.
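A pathlength distance is the sum of branch lengths along the path connecting two nodes. The sketch below illustrates the calculation under the assumption that the tree is stored as a dictionary mapping each node to its parent and branch length; this representation and the function names are hypothetical and are not the input format or code of Treedist_pair or Treedist_all.

```python
from itertools import combinations


def path_to_root(tree, node):
    """Return {ancestor: cumulative distance from node}, including node itself."""
    distances = {node: 0.0}
    total = 0.0
    while node in tree:
        parent, length = tree[node]
        total += length
        distances[parent] = total
        node = parent
    return distances


def pathlength(tree, a, b):
    """Sum of branch lengths on the path between nodes a and b."""
    from_a = path_to_root(tree, a)
    total = 0.0
    node = b
    # Climb from b until reaching a node that is also on a's path to the root.
    while node not in from_a:
        parent, length = tree[node]
        total += length
        node = parent
    return total + from_a[node]


def all_pairs(tree, leaves):
    """Pathlength distances between every pair of the given terminal nodes."""
    return {(a, b): pathlength(tree, a, b) for a, b in combinations(sorted(leaves), 2)}


# node -> (parent, branch length); "root" has no entry.
tree = {"A": ("n1", 1.0), "B": ("n1", 2.0), "n1": ("root", 0.5), "C": ("root", 3.0)}
distances = all_pairs(tree, ["A", "B", "C"])
```

The pairwise function also accepts internal nodes, for example `pathlength(tree, "n1", "C")`, which mirrors the distinction between a pairwise query and an all-terminals table.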