Accounting for phylogenetic relatedness in statistical models – Marco Plebani

A special case of data non-independence arises from the phylogenetic relationships among species. Say we want to study the correlation between dietary habits and mean body weight in mammals. The body weight of a species may be affected by its dietary habits, but phylogeny may also matter: closely related species may have similar mean body weights simply because of their shared evolutionary history. Any analysis of this correlation therefore has to account for phylogenetic signal. One quick-and-dirty solution is to use mixed-effects models with taxonomic levels specified as nested random effects:

# "mammals" is an imaginary data set. For each data point it would have to contain
# body.weight and dietary.pref, plus the family, genus, and species it belongs to.
# Note: the species-level random effect needs several data points per species.
MEM.model <- lme4::lmer(body.weight ~ dietary.pref
            + (1 + dietary.pref | family/genus/species),
            data = mammals)

A more direct alternative is to integrate phylogenetic information into the analysis. This is something I have only dabbled in, so consider the following notes a starting point (current as of 2021). Phylogenetic generalized least squares (PGLS) uses an existing phylogenetic tree to inform the model about the autocorrelation among taxonomic units. Here is an example that looks at the relationship between wing length and tarsus length among Geospiza finch species. This approach allows only one data point per leaf of the phylogenetic tree. The phyr package can fit generalized mixed-effects models that account for phylogenetic relatedness and can handle data sets in which every taxonomic unit is represented by multiple data points (see here and here).
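To make the idea behind PGLS more concrete, here is a minimal sketch in base R of what the method does under the hood. Everything in it is an assumption made for illustration: the five species, their wing and tarsus values, and the covariance matrix V are invented, and V assumes a Brownian-motion model of trait evolution on a small ultrametric tree. It is not the analysis from the example mentioned above.

```r
# Hypothetical toy data: five species, one value per species
# (PGLS allows one data point per leaf of the tree).
species <- c("sp_A", "sp_B", "sp_C", "sp_D", "sp_E")
wing    <- c(4.2, 4.4, 4.0, 3.6, 3.5)
tarsus  <- c(2.9, 3.0, 2.8, 2.5, 2.6)

# Assumed phylogenetic covariance matrix under Brownian motion.
# Diagonal: root-to-tip distance (1 for every species here).
# Off-diagonal: branch length two species share since the root.
# Invented tree: ((sp_A, sp_B), sp_C) and (sp_D, sp_E) split at the root.
V <- diag(5)
dimnames(V) <- list(species, species)
V["sp_A", "sp_B"] <- V["sp_B", "sp_A"] <- 0.8
V["sp_A", "sp_C"] <- V["sp_C", "sp_A"] <- 0.5
V["sp_B", "sp_C"] <- V["sp_C", "sp_B"] <- 0.5
V["sp_D", "sp_E"] <- V["sp_E", "sp_D"] <- 0.7

# Generalized least squares estimator:
# beta = (X' V^-1 X)^-1 X' V^-1 y
pgls_coef <- function(y, X, V) {
  Vinv <- solve(V)
  drop(solve(t(X) %*% Vinv %*% X, t(X) %*% Vinv %*% y))
}

# Design matrix with an intercept and one predictor.
X <- cbind(intercept = 1, tarsus = tarsus)

beta_pgls <- pgls_coef(wing, X, V)    # accounts for relatedness
beta_ols  <- coef(lm(wing ~ tarsus))  # treats species as independent

print(beta_pgls)
print(beta_ols)
```

If V is replaced by an identity matrix (no shared history), the estimator reduces to ordinary least squares, which is a handy sanity check. In a real analysis V would be derived from a phylogenetic tree rather than typed by hand, for instance with ape::vcv(), and the model would typically be fitted with a dedicated function such as nlme::gls() combined with a correlation structure like ape::corBrownian().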

