select-star-species
Usage:
select-star-species [--dist-matrix] <subset size> <tree file>
Description:
Given a phylogentic tree, this program attempts to find the subset of species whose subtree is "closest" to a star topology. By "closest" we mean that the pairwise distances between distinct species are nearly equal, but are also as near as possible to the average pairwise distance for the complete tree.Algorithm:
Inputs:m: size of desired subsetS ← List of species in T
T: phylogentic tree
n ← size(S)
if (n choose m) > 1e5 then exit
M ← pairwise distance matrix for T
μM ← mean pairwise distance for M
C ← List of all subsets of S of size m
best_score ← max_float
best_subset ← ø
for each s in Cscore ← score(M, μM, s)print best_subset
if score < best_score thenbest_score ← score
best_subset ← s
score(M, μM, s) ← Var(sM) / μM
Var(sM) ← ∑ a, b ∈ s, a ≠ b (M[a, b] - μs)2
μs ← ∑a, b ∈ s, a ≠ bM[a,b] / ((len(s)2 - len(s)) / 2)Input:
A file containing a phylogentic tree in Newick format.Output:
The list of species in the subset, one species per line, is sent to the standard output.Options:
--dist-matrix
- This option causes the matrix of interspecies distances for the full tree to be written to the standard output.