R tools and scripts for vegetation science and ecology: Neighbour-joining trees based on factor input data

Tuesday, March 18, 2014

Neighbour-joining trees based on factor input data

Here is a simple way to generate a neighbour-joining tree from factor data in R. This might be useful for, e.g., a cladistic analysis, or perhaps for analysing indels or length mutations. In this example the factors are unordered, and the pairwise dissimilarity is the proportion of factors whose states differ between two species: if a pair of species shares states for 20% of the factors, their dissimilarity is 0.8, and a dissimilarity of 1 means that no states are shared. You can also specify ordered factors. The tree might be an end in itself, or a starting point for further analysis/optimisation. Note that while you can tell the 'daisy' function to use Gower dissimilarity (metric = "gower"), which handles factors, this is not needed here because 'daisy' uses it automatically whenever some of the input columns are not numeric.
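As a quick illustration of the dissimilarity described above, the following sketch computes it by hand for two hypothetical species scored for five characters. The vectors `a` and `b` and the helper `mismatch` are made up for illustration and are not part of the 'cluster' package; for unordered factors without missing values, the result should agree with what 'daisy' reports.

```r
# Character states of two hypothetical species (five characters each)
a <- c(1, 3, 4, 5, 2)
b <- c(1, 2, 2, 1, 5)

# Hypothetical helper: proportion of characters whose states differ
mismatch <- function(x, y) mean(x != y)

d_ab <- mismatch(a, b)
d_ab # only the first character is shared (20%), so the dissimilarity is 0.8
```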

require(cluster)

DATA <- data.frame(f1=c(1,1,2,1),f2=c(3,1,2,2), f3=c(4,3,2,2), f4=c(5,5,4,1), f5=c(2,5,5,5), row.names=c("species1", "species2", "species3", "species4")) #dummy data

DATA

         f1 f2 f3 f4 f5
species1  1  3  4  5  2
species2  1  1  3  5  5
species3  2  2  2  4  5
species4  1  2  2  1  5

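DATA2 <- DATA

DATA2[] <- lapply(DATA, as.factor) #convert each column from numeric to factor, keeping row names (apply() would return a character matrix, which as.data.frame() no longer turns into factors in R >= 4.0.0)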

DAISY <- daisy(DATA2) #generate dissimilarity matrix

DAISY

Dissimilarities :
         species1 species2 species3
species2      0.6
species3      1.0      0.8
species4      0.8      0.6      0.4

Metric : mixed ; Types = N, N, N, N, N
Number of objects : 4
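If the character states have a natural order, the columns can be converted to ordered factors instead. The sketch below rebuilds the dummy data so that it runs on its own; the names `DATA_ORD` and `DAISY_ORD` are hypothetical. According to the 'daisy' documentation, ordered factors are treated as ordinal variables, so the resulting values will generally differ from the unordered case shown above.

```r
library(cluster)

# Dummy data from the post, rebuilt so this block runs on its own
DATA <- data.frame(f1 = c(1, 1, 2, 1), f2 = c(3, 1, 2, 2), f3 = c(4, 3, 2, 2),
                   f4 = c(5, 5, 4, 1), f5 = c(2, 5, 5, 5),
                   row.names = c("species1", "species2", "species3", "species4"))

# Convert every column to an ordered factor (levels are sorted numerically)
DATA_ORD <- DATA
DATA_ORD[] <- lapply(DATA, function(x) factor(x, ordered = TRUE))

DAISY_ORD <- daisy(DATA_ORD) # ordered factors are treated as ordinal by daisy
DAISY_ORD
```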

require(ape)

plot(nj(DAISY)) #plot tree

Neighbour-joining tree based on dummy factor data using the function 'daisy' with Gower dissimilarity
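If the tree is a starting point for further analysis, it helps to store it as an object rather than plotting it directly. The sketch below enters the dissimilarities printed above by hand so that it runs without 'cluster'; the names `D`, `TREE` and `NEWICK` are hypothetical. `nj` returns an object of class 'phylo', which `write.tree` (also from 'ape') can turn into a Newick string for use in other software.

```r
library(ape)

# Dissimilarities from the daisy output, entered by hand as a symmetric matrix
sp <- paste0("species", 1:4)
D <- matrix(c(0.0, 0.6, 1.0, 0.8,
              0.6, 0.0, 0.8, 0.6,
              1.0, 0.8, 0.0, 0.4,
              0.8, 0.6, 0.4, 0.0),
            nrow = 4, dimnames = list(sp, sp))

TREE <- nj(as.dist(D))     # keep the tree as a 'phylo' object
NEWICK <- write.tree(TREE) # Newick representation as a character string
NEWICK
```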
