AC3: Analysis of phylogenetic independent Contrasts for Continuous and Categorical variables

François Gao, Jonathan Corcy, Michel Laurin

Summer 2007

This package of modules and library classes was built on the model of PDAP. AC3 works in the same way as PDAP.


Overview

A central question in current comparative biology concerns the correlation between characters and the corresponding ability to infer a character value (or state) from observations of another character. Phylogenetically Independent Contrasts (FIC) analysis was proposed by Felsenstein (1985). It has become one of the most widely used (perhaps the most widely used) comparative methods for performing regressions between variables while handling the lack of independence between observations that results from the shared history of species, as shown by the large number of citations of this paper (2382 citations as of 8 August 2007, according to the ISI; all numbers of citations reported below are from the same date and source). The method is widely used for continuous variables, but its application when the independent variable is discrete is more problematic, because the methods proposed so far are either not user-friendly or yield results that have been criticized.

The AC3 (Analysis of phylogenetic independent Contrasts for Continuous and Categorical variables) module, an adaptation of the PDAP module (from Garland et al.'s Phenotypic Diversity Analysis Program), allows the user to perform a regression between standardized contrasts of continuous characters and contrasts of a categorical or discrete character.

The method used to calculate the contrasts of continuous characters is taken from the PDAP module for Mesquite (Midford et al., 2003). To calculate the categorical contrasts, AC3 uses unstandardized contrasts of the most parsimonious discrete states, which are reconstructed by Mesquite (Maddison and Maddison, 2006). The contrast value at a node is the difference between the states of its immediate descendants. Unlike CAIC (Purvis and Rambaut, 1995), AC3 uses all contrasts of known value, even those that equal zero for the discrete or categorical variable. This procedure increases the number of contrasts and should give more accurate results, because the variance in the continuous character within taxa that share a discrete state is relevant to determining the statistical significance of the regression coefficient.
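The two kinds of contrasts described above can be illustrated with a minimal Python sketch. This is not the module's Java code: the function names and the example values are hypothetical, and the direction of the subtraction (left minus right) is an assumption; it simply has to be the same for both characters. The standardization follows Felsenstein (1985), in which the difference is divided by the square root of the summed branch lengths.

```python
import math

def standardized_contrast(x_left, x_right, v_left, v_right):
    """Felsenstein (1985) standardized contrast between two sister nodes."""
    return (x_left - x_right) / math.sqrt(v_left + v_right)

def categorical_contrast(state_left, state_right):
    """Unstandardized contrast: plain difference between the two states."""
    return state_left - state_right

# Hypothetical sister pairs:
# (x_left, x_right, v_left, v_right, state_left, state_right)
pairs = [
    (2.0, 1.0, 2.0, 2.0, 1, 0),  # the discrete states differ
    (3.0, 2.5, 0.5, 0.5, 1, 1),  # same discrete state: zero contrast, still kept
]

for x_l, x_r, v_l, v_r, s_l, s_r in pairs:
    print(standardized_contrast(x_l, x_r, v_l, v_r),
          categorical_contrast(s_l, s_r))
```

The second pair yields a categorical contrast of zero; under the approach described above it is retained rather than discarded.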

First, AC3 computes contrasts for all nodes for which a single most parsimonious state of the discrete or categorical variable can be attributed to both immediate descendants. This calculation is performed as in the PDAP module for Mesquite, except that contrasts for the discrete variable are not standardized (those for the continuous variable are standardized in the usual way). Ambiguity in the optimization of the discrete character can exclude several potential contrasts. To use as much information as possible, AC3 then tries to compute contrasts between any terminal taxa or internal nodes that were not previously used; to distinguish them from the first set of contrasts, these are called the recalculated contrasts below.

However, to maintain the phylogenetic independence of the contrasts, AC3 will not compute such recalculated contrasts if their path goes through a node that has already been contrasted. Recalculated contrasts of a continuous character will generally differ from contrasts computed by PDAP because, in these cases, the contrasted taxa are not immediate descendants of the basal node at which the contrast is computed. If the contrast is between two terminal taxa, the recalculated contrast value of the continuous character at the node is the difference in character value between the contrasted taxa divided by the square root of the sum of the raw (uncorrected) branch lengths. If the contrast involves one or two internal nodes, the recalculated length (vk' in the terminology of Felsenstein, 1985) is used for the branch immediately below these nodes; for all other branches, the original length (vk) is used. This is because branches need to be lengthened only when an estimated (rather than an observed) character value is used in a contrast. You can see an example of the calculations below.
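The arithmetic of a recalculated contrast can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the module's code: the function names and all numeric values are hypothetical. The branch lengthening uses Felsenstein's (1985) formula, in which the branch below an internal node is extended by v1*v2/(v1+v2), where v1 and v2 are the lengths of the branches leading to its two descendants.

```python
import math

def lengthened_branch(v_k, v_left, v_right):
    """Felsenstein's (1985) vk': the branch below an internal node is
    lengthened because the value at that node is an estimate."""
    return v_k + (v_left * v_right) / (v_left + v_right)

def recalculated_contrast(x_a, x_b, path_lengths):
    """Difference in value divided by the square root of the summed
    branch lengths along the path linking the two contrasted taxa."""
    return (x_a - x_b) / math.sqrt(sum(path_lengths))

# Case 1: two terminal taxa linked by three branches of raw length 1, 2 and 1.
print(recalculated_contrast(5.0, 3.0, [1.0, 2.0, 1.0]))

# Case 2: an internal node (estimated value 4.0) contrasted with a terminal
# taxon (value 1.0). Only the branch immediately below the internal node is
# lengthened; the other branches on the path keep their original length.
v_prime = lengthened_branch(1.0, 2.0, 2.0)
print(recalculated_contrast(4.0, 1.0, [v_prime, 3.0, 4.0]))
```

In the second case the lengthened branch is 1 + (2*2)/(2+2) = 2, so the path length is 2 + 3 + 4 = 9 and the contrast is 3 divided by the square root of 9.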

Contrasts are taken between the immediate daughter-nodes of each node, except when more than one most parsimonious state can be attributed to at least one of the immediate daughter-nodes. In such cases, contrasts are not taken (grey), unless some daughter-nodes were not used in contrasts and the path linking them has not been used either. This is the case for nodes 2 and 30 (squares), at which contrasts can be taken between more distant descendants (yellow). In this case, AC3 uses 17 out of a maximum of 22 contrasts (22 contrasts would be available if there were no ambiguity in the optimization of the discrete character), including 8 non-zero contrasts for the discrete character (contrasts for continuous characters are usually different from zero at all nodes). CAIC would use only the 8 non-zero contrasts (red).


AC3 Module

You need a tree, at least one continuous matrix and at least one categorical matrix to run the AC3 module.

When you have selected the tree window, you can find the « Analysis » option in the menu bar.

There, click on « New Chart for Tree » to find the new submenu « AC3 Diagnostic Chart ». Click on it to launch the module.

You will then be asked to choose a source of continuous characters and a source of categorical characters. You can choose between different sources by selecting the appropriate option in the following windows:

 

          

If you have several stored continuous or categorical matrices, you may choose the data matrix you want to use for the calculations by selecting it in the following windows. (Note: if you did not assign all branch lengths, a warning will appear and, by default, all branches of unassigned length will be given a length of 1 in the calculations.)

 

        

Then you should have an AC3 Chart window.

 

In the menu bar, you can see a new menu item « AC3.Chart » in which you can choose between different kinds of screens.

The AC3 module includes those options:

and the new option:

Please note that the reconstruction method this module uses to obtain ancestral states is parsimony with ordered states. Also, if your tree contains polytomies, the module resolves them by creating 0-length branches, and the polytomy assumption of the tree used in calculations is soft (this is the same method as in the PDAP module). Thus, the tree you used to start the module may not be the same as the tree used in calculations unless you set the parsimony reconstruction model to ordered, set the polytomy assumption to soft (by default it is set to hard in Mesquite, so this must be changed every time you open the file or switch to a new tree), and resolve polytomies by adding 0-length branches. However, even if you do not alter the tree or set the correct parsimony model, this has no effect on the computed results. The reconstruction method and the polytomy assumption cannot be changed in this module.
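Resolving a polytomy with 0-length branches can be sketched as follows. This is a hypothetical illustration in Python, not the code used by AC3 or PDAP: the dictionary-based tree representation, the function names and the order in which daughters are grouped are all assumptions. The order is arbitrary here because the added branches have zero length.

```python
def resolve_polytomies(node):
    """Return a copy of the tree in which every node has at most two
    children; extra children are grouped under new 0-length branches."""
    children = [resolve_polytomies(c) for c in node.get("children", [])]
    while len(children) > 2:
        # Group the last two children under a new zero-length internal branch.
        right = children.pop()
        left = children.pop()
        children.append({"name": None, "length": 0.0,
                         "children": [left, right]})
    resolved = {"name": node.get("name"), "length": node.get("length", 0.0)}
    if children:
        resolved["children"] = children
    return resolved

def count_leaves(node):
    """Count terminal taxa, to check that none was lost or duplicated."""
    kids = node.get("children", [])
    return 1 if not kids else sum(count_leaves(k) for k in kids)

# Hypothetical tree: a root with four daughters (a polytomy).
tree = {"name": "root", "length": 0.0, "children": [
    {"name": "A", "length": 1.0},
    {"name": "B", "length": 1.0},
    {"name": "C", "length": 2.0},
    {"name": "D", "length": 2.0},
]}
resolved = resolve_polytomies(tree)
print(len(resolved["children"]), count_leaves(resolved))
```

The resolved tree is fully bifurcating and keeps the same four terminal taxa with their original branch lengths; only the two new internal branches have length 0.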

As mentioned earlier, some continuous characters have a recalculated contrast value because the contrast of the corresponding categorical character was also recalculated. You can see which nodes have been recalculated in the text tab of screens 9A, 9B and 14; this information is available only in these windows. Consequently, if you compare the results of PDAP and AC3, you should get the same values for all continuous contrasts except the recalculated ones (in rare cases, a recalculated contrast may have the same value).

You may notice that some of the points listed in the text tab of the chart window are not displayed in the graphics window. This is because only the points used in calculations (including recalculated contrasts) are shown; this is also why the window reports the total number of points, the number of points used and the number of points not used. The total number of points is usually the total number of nodes in the reference tree; the number of points used is the number of points that have both X and Y coordinates; the number of points not used is simply the difference between the first and the second. A valid data point is therefore a point for which both a categorical contrast AND a continuous contrast can be computed (sometimes only the categorical contrast is impossible to compute). This is also why there are two scrollboxes in every chart window: each time you choose a different categorical character, a new set of contrasts is computed, which may change the number of contrasts used.
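The bookkeeping described above can be expressed as a short Python sketch. The function name and the example values are hypothetical, and using None to mark a contrast that could not be computed is an assumption made for this illustration only.

```python
def summarize_points(categorical, continuous):
    """Return (total, used, not_used) for paired lists of contrasts.
    None marks a contrast that could not be computed at a node."""
    total = len(categorical)
    used = sum(
        1
        for cat, cont in zip(categorical, continuous)
        if cat is not None and cont is not None
    )
    return total, used, total - used

# Hypothetical contrasts at five nodes; the categorical contrast is
# unavailable at two of them because of ambiguous optimization.
categorical = [1, 0, None, -1, None]
continuous = [0.5, 0.2, 0.7, -0.3, 0.1]
print(summarize_points(categorical, continuous))
```

Choosing another categorical character would replace the first list and could therefore change the number of points used, even though the continuous contrasts at the unaffected nodes stay the same.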


Download

To download the AC3 package, click on this link.


Installation

The AC3 module is written in Java, like all modules for Mesquite, and thus works on most computers (PCs with Windows, Macintosh with Mac OS X, etc.), as long as Java is installed (http://www.java.com/fr/).

To install the AC3 package, you need to have the Mesquite program (http://mesquiteproject.org/mesquite/mesquite.html), and you must add the "AC3" directory to the mesquite folder within Mesquite_Folder. Then launch the Mesquite application.

You can get an example file which illustrates how AC3 works here (Right-click then Save target as...).


Citation

Gao F., Corcy J., and Laurin M. 2007. AC3 : Analysis of Continuous vs. Categorical Contrasts, package for Mesquite, version 1.00. http://mesquiteproject.org/ac3_mesquite/

 


Contact

 

For questions regarding this module, please contact Michel Laurin at laurin@ccr.jussieu.fr and put AC3 in the subject line.

 


Acknowledgements

This program is based in large part upon the code of the PDAP:PDTREE package for Mesquite (Midford et al., 2003).


References

Felsenstein, J. 1985. Phylogenies and the comparative method. The American Naturalist 125: 1-15.

Garland, T., Jr., Dickerman, A. W., Janis, C. M. and Jones, J. A. 1993. Phylogenetic analysis of covariance by computer simulation. Systematic Biology 42: 265-292.

Maddison, W. P. and Maddison, D. R. 2006. Mesquite: a modular system for evolutionary analysis. Version 1.1. Available at http://mesquiteproject.org.

Midford, P. E., Garland, T., Jr. and Maddison, W. 2003. PDAP:PDTREE package for Mesquite, version 1.00. Available at http://mesquiteproject.org/pdap_mesquite/

Purvis, A. and Rambaut, A. 1995. Comparative analysis by independent contrasts (CAIC): an Apple Macintosh application for analysing comparative data. Computer Applications in the Biosciences 11: 247-251.