Binning trees by topology

I recently stumbled across a 2013 paper by Ryan and Irene Newton describing PhyBin, a tool for binning phylogenetic trees, i.e. clustering them by similarity into groups (“bins”). They use the Robinson-Foulds metric as the distance between trees: roughly, the number of splits (bipartitions of the taxa) that are found in one tree but not in the other.
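To make the idea concrete, here is a minimal sketch of the Robinson-Foulds distance and of binning by identical topology. This is not PhyBin's own code: the nested-tuple tree representation and the function names (clades, splits, rf_distance, bin_by_topology) are my own assumptions for illustration, and it only handles trees that share the same set of taxa.

```python
from collections import defaultdict


def clades(tree):
    """Return (leaf set, list of leaf sets below each internal node).

    A tree is a nested tuple of taxon names, e.g. (("A", "B"), "C").
    """
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found.extend(child_found)
    found.append(leaves)
    return leaves, found


def splits(tree):
    """Non-trivial bipartitions of the tree, treated as unrooted.

    Each split is stored as the side that does NOT contain a fixed
    reference taxon, so the same split always looks the same.
    """
    all_leaves, found = clades(tree)
    ref = min(all_leaves)
    result = set()
    for clade in found:
        side = all_leaves - clade if ref in clade else clade
        # Keep only splits with at least two taxa on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            result.add(side)
    return result


def rf_distance(a, b):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(a) ^ splits(b))


def bin_by_topology(trees):
    """Group gene names whose trees have identical split sets (RF = 0)."""
    bins = defaultdict(list)
    for name, tree in trees.items():
        bins[frozenset(splits(tree))].append(name)
    return sorted(bins.values(), key=len, reverse=True)


# Hypothetical gene trees for five taxa A-E.
gene_trees = {
    "gene1": ((("A", "B"), "C"), ("D", "E")),
    "gene2": (("D", "E"), ("C", ("B", "A"))),  # same topology, drawn differently
    "gene3": ((("A", "C"), "B"), ("D", "E")),  # A groups with C instead of B
}
print(rf_distance(gene_trees["gene1"], gene_trees["gene3"]))
print(bin_by_topology(gene_trees))
```

Genes that land in a small bin away from the majority topology would be the candidates worth a closer look.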

The motivation is to compare the phylogenies of individual gene ortholog clusters across a set of genomes, and to find the genes whose phylogeny differs from the rest. This might be useful e.g. to detect genes that have undergone horizontal gene transfer. The example they used in their paper was the insect symbiont Wolbachia.

It seems like a nice way to screen a set of genomes for genes that might be interesting. I had wanted to try to do something like this, but with a concordance-factor approach instead. Some other thoughts:

  • Each gene is represented by a single tree, so uncertainty in that tree is not taken into account, unlike with concordance factors as implemented in BUCKy, for example
  • If there are horizontally transferred genes, they would probably have a patchy distribution and not be present in every species. But genes that are present in only some genomes would be excluded from the analysis up front, as they would be in a concordance analysis. In the PhyBin paper the authors mention the case of the Wolbachia prophage, which has precisely this limitation.
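  • Collapsing short branches is a good idea, since poorly supported short branches would otherwise produce topological differences that are just noise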
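
On the last point, here is a rough sketch of what collapsing short branches could look like before comparing topologies. It is an illustration under my own assumptions, not how PhyBin does it: the (content, length) tree representation and the name collapse_short are made up, and adding the collapsed length onto the children (to keep root-to-tip distances unchanged) is just one possible choice.

```python
def collapse_short(node, min_length):
    """Collapse internal branches shorter than min_length into polytomies.

    A leaf is (name, branch_length); an internal node is
    (list_of_child_nodes, branch_length). The root is never collapsed.
    """
    content, length = node
    if isinstance(content, str):
        return node  # leaves are always kept
    new_children = []
    for child in content:
        child = collapse_short(child, min_length)
        c_content, c_length = child
        if not isinstance(c_content, str) and c_length < min_length:
            # Remove this internal branch: attach its children directly to
            # the current node, carrying the removed length along.
            new_children.extend((g, g_len + c_length) for g, g_len in c_content)
        else:
            new_children.append(child)
    return (new_children, length)


# Hypothetical tree: the branch leading to (A, B) is very short.
tree = ([([("A", 0.5), ("B", 0.25)], 0.125), ("C", 1.0), ("D", 1.0)], 0.0)
print(collapse_short(tree, 0.2))
```

After collapsing, two gene trees that differ only in how they resolve a very short branch end up with the same set of splits and so fall into the same bin.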