Tuesday, March 18, 2014

Neighbour-joining trees based on factor input data

Here is a simple way to generate a neighbour-joining tree from factor data in R. This might be useful for, e.g., a cladistic analysis, or even for analysing indels or length mutations. In this example, the factors are unordered and pairwise dissimilarities are calculated as the proportion of factors whose states differ between a pair of species, i.e. if 20% of the factors have shared states, the dissimilarity is 0.8; a value of 1 indicates that none are shared. You can also specify ordered factors. The tree might be an end in itself, or a starting point for further analysis/optimisation. Note that while you can tell the 'daisy' function to use 'Gower' dissimilarity (which handles factors), there is no need to do so here: 'daisy' switches to it automatically whenever some of the input columns are not numeric.



library(cluster) #provides daisy()

DATA <- data.frame(f1=c(1,1,2,1),f2=c(3,1,2,2), f3=c(4,3,2,2), f4=c(5,5,4,1), f5=c(2,5,5,5), row.names=c("species1", "species2", "species3", "species4")) #dummy data

DATA
         f1 f2 f3 f4 f5
species1  1  3  4  5  2
species2  1  1  3  5  5
species3  2  2  2  4  5
species4  1  2  2  1  5


DATA2 <- DATA

DATA2[] <- lapply(DATA, factor) #convert each column from numeric to factor; apply() would return a character matrix, which no longer becomes factors under R >= 4.0

DAISY <- daisy(DATA2) #generate dissimilarity matrix

DAISY
Dissimilarities :
         species1 species2 species3
species2      0.6                  
species3      1.0      0.8         
species4      0.8      0.6      0.4

Metric :  mixed ;  Types = N, N, N, N, N 
Number of objects : 4
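
If you want to convince yourself of where these numbers come from, here is a minimal sketch in base R that recomputes them by hand as the proportion of factors whose states differ. The names 'mismatch' and 'MANUAL' are made up for this illustration and are not part of any package; the dummy data are rebuilt so the block runs on its own.

```r
# Same dummy data as above, rebuilt so this block is self-contained
DATA <- data.frame(f1 = c(1, 1, 2, 1), f2 = c(3, 1, 2, 2), f3 = c(4, 3, 2, 2),
                   f4 = c(5, 5, 4, 1), f5 = c(2, 5, 5, 5),
                   row.names = c("species1", "species2", "species3", "species4"))

# Hypothetical helper: proportion of factors whose states differ between rows i and j
mismatch <- function(i, j) mean(unlist(DATA[i, ]) != unlist(DATA[j, ]))

# Fill a square matrix with all pairwise values
n <- nrow(DATA)
MANUAL <- matrix(0, n, n, dimnames = list(rownames(DATA), rownames(DATA)))
for (i in seq_len(n)) for (j in seq_len(n)) MANUAL[i, j] <- mismatch(i, j)

as.dist(MANUAL) # lower triangle, same layout as the 'daisy' output
```

For unordered factors this should give the same values as the 'daisy' output above (0.6 for species1 vs species2, 1.0 for species1 vs species3, and so on).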


library(ape) #provides nj()

plot(nj(DAISY)) #plot tree
Neighbour-joining tree based on dummy factor data using the function 'daisy' with Gower dissimilarity
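
As mentioned at the top, the factors can also be ordered. Below is a rough sketch of how that might look, assuming the numeric order of the state codes is the order you intend (that is an assumption of this example, not something the dummy data imply). As far as I understand the 'daisy' documentation, ordered factors are replaced by their integer codes and their differences are scaled by the range of each variable, so a pair of species that differ by one step is treated as closer than a pair at opposite ends. 'NOMINAL' and 'ORDINAL' are just names chosen for this illustration.

```r
library(cluster)

# Same dummy data as above, rebuilt so this block is self-contained
DATA <- data.frame(f1 = c(1, 1, 2, 1), f2 = c(3, 1, 2, 2), f3 = c(4, 3, 2, 2),
                   f4 = c(5, 5, 4, 1), f5 = c(2, 5, 5, 5),
                   row.names = c("species1", "species2", "species3", "species4"))

# Unordered: every column is a nominal factor (as in the post)
NOMINAL <- DATA
NOMINAL[] <- lapply(DATA, factor)

# Ordered: same states, but ranked (assumption: numeric order is the intended order)
ORDINAL <- DATA
ORDINAL[] <- lapply(DATA, function(x) factor(x, ordered = TRUE))

D_NOMINAL <- daisy(NOMINAL)
D_ORDINAL <- daisy(ORDINAL)

D_NOMINAL
D_ORDINAL
```

Either object can be passed to nj() in the same way as 'DAISY' above.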



