Overdispersed Poisson regression

Manuel Spínola from the Instituto Internacional en Conservación y Manejo de Vida Silvestre at the Universidad Nacional in Heredia, Costa Rica, writes,

I have a question regarding the difference between ANOVA and Multilevel Models. When do you shift from ANOVA to a Multilevel Model? For example, I have a data structure as follows (data set incomplete):

Number of species Habitat type
12 A
24 A
12 A
32 A
22 A
21 A
21 A
12 B
32 B
32 B
23 B
21 B
22 B
12 B
32 B
12 C
34 C
43 C
34 C
23 C
22 C

The data for each habitat type were taken from different fragments or patches with different areas (in hectares). That is, for the first 2 rows, 12 species were detected in a 5 ha patch of habitat A, and 24 species were detected in a 30 ha patch of habitat A.
Habitat A has 7 replicates, habitat B has 8 replicates, and habitat C has 6 replicates.
I want to know if the number of species differs by habitats.
Is the data structure appropriate for a Multilevel Model, or do I need to do the analysis with a GLM using a log link and a Poisson distribution? Do I need to consider the size of the patch (maybe as an offset variable)?

Also, is there a difference between random effect (like in ANOVA) and random term (like in Multilevel Model)?

First off, Anova and multilevel modeling are closely connected: both are ways of using a linear model to structure data, partitioning effects into batches. That is, each row of an Anova table corresponds to a batch of linear predictors which, in the corresponding multilevel model, would be modeled exchangeably. See here and here.

Second, what’s up with the number of species? It’s bizarre that these are all formed from 1’s, 2’s, 3’s, and 4’s. Why are there never, for example, 18 species in a patch?

Third, a natural model to fit would be an overdispersed Poisson regression with log link, using log(patch area) as an offset, and using a multilevel model with habitat type as the grouping. But, with only 3 groups, you’ll actually get similar results by just setting group A as the baseline and including indicators for B and C in your overdispersed Poisson regression model.
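As a rough sketch of the non-multilevel version of this model (habitat indicators plus a log(patch area) offset), here is a small pure-Python example. The data are hypothetical: only the first two rows (12 species in 5 ha, 24 species in 30 ha) come from the question, and every other area is made up for illustration. With nothing but indicators and an offset, the Poisson maximum-likelihood fit has a closed form (total count divided by total area within each habitat), so no iterative fitting is needed. The overdispersion is estimated the usual quasi-Poisson way, as the Pearson chi-square divided by the residual degrees of freedom.

```python
import math

# Hypothetical illustration data: (species count, patch area in ha, habitat).
# Only the first two rows match the question; all other areas are invented.
data = [
    (12, 5.0, "A"), (24, 30.0, "A"), (12, 8.0, "A"), (32, 40.0, "A"),
    (12, 6.0, "B"), (32, 25.0, "B"), (23, 15.0, "B"), (21, 20.0, "B"),
    (12, 4.0, "C"), (34, 20.0, "C"), (43, 35.0, "C"), (23, 10.0, "C"),
]

def fit_offset_model(rows):
    """Poisson fit with log link, log(area) offset, and habitat indicators.

    Returns (rates, totals): the fitted species-per-hectare rate and the
    total species count for each habitat. The closed form
    rate = sum(count) / sum(area) solves the Poisson score equations.
    """
    count_sum, area_sum = {}, {}
    for y, area, hab in rows:
        count_sum[hab] = count_sum.get(hab, 0.0) + y
        area_sum[hab] = area_sum.get(hab, 0.0) + area
    rates = {hab: count_sum[hab] / area_sum[hab] for hab in count_sum}
    return rates, count_sum

def pearson_dispersion(rows, rates):
    """Overdispersion estimate: Pearson chi-square / residual df."""
    chi2 = sum((y - rates[hab] * area) ** 2 / (rates[hab] * area)
               for y, area, hab in rows)
    return chi2 / (len(rows) - len(rates))

rates, totals = fit_offset_model(data)
phi = pearson_dispersion(data, rates)

# Coefficients on the log scale, with habitat A as the baseline.
baseline = "A"
coefs = {hab: math.log(rates[hab] / rates[baseline]) for hab in rates}

# Quasi-Poisson standard errors for the B and C indicators:
# Poisson variance of the log rate ratio, inflated by phi.
ses = {hab: math.sqrt(phi * (1.0 / totals[hab] + 1.0 / totals[baseline]))
       for hab in rates if hab != baseline}

print("dispersion estimate:", round(phi, 2))
for hab in sorted(ses):
    print(hab, "vs", baseline, ":", round(coefs[hab], 3), "+/-", round(ses[hab], 3))
```

A dispersion estimate well above 1 is the signal that plain Poisson standard errors would be too narrow. In practice one would fit this with a GLM routine rather than by hand; the closed form is used here only to keep the sketch self-contained.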

P.S. I’ll answer any question from Costa Rica because we spent our honeymoon there and had delicious platanos and arroz con pollo (most memorably in a lunch place where we noticed a tarantula crawling along on the floor).

3 thoughts on “Overdispersed Poisson regression”

  1. I am not sure the Poisson model with log(patch area) as offset is appropriate here. A hidden assumption of this model is the number of species to be found for a patch is on average proportional to the patch area. This assumption may fail in some situations, especially when the patches are big enough to contain a moderate to large fraction of the species in the region. I think it is prudent to examine (1) if the distributions of patch areas are comparable across the three habitat types, (2) if there is large variation of patch areas within each habitat type, and (3) if there is a general linear relationship between the number of identified species and the patch area within each habitat type.
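To spell out the assumption this comment refers to, the offset model can be written as follows. The notation (y_i for the species count in patch i, A_i for its area, h[i] for its habitat type, alpha for the habitat effects) is introduced here for illustration and does not appear in the original post.

```latex
\begin{align*}
y_i &\sim \text{overdispersed Poisson}(\mu_i), \\
\log \mu_i &= \log A_i + \alpha_{h[i]}
\quad\Longrightarrow\quad
\mu_i = A_i \, e^{\alpha_{h[i]}} .
\end{align*}
```

Because log A_i enters with its coefficient fixed at 1, doubling the patch area doubles the expected number of species within a habitat type, which is the proportionality being questioned.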

  2. Using log(Area) as an offset doesn't make sense ecologically: there's a large literature on species-area relationships. Using log(Area) in the model and estimating its regression coefficient would make more sense.

    It might also be worth checking how the trapping effort was distributed, as that will affect the number of species found (albeit in a complicated and unidentifiable way).

    Bob
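A minimal sketch of Bob's suggestion follows: keep the habitat indicators, but estimate the coefficient beta on log(Area) instead of fixing it at 1, so that the expected count is exp(alpha_habitat) * Area^beta. This has the same form as the power-law species-area curve S = c * A^z, with beta playing the role of z. The data are hypothetical (only the first two rows echo the question), and the function names are made up for this example. For a fixed beta the habitat intercepts have a closed form, so the sketch profiles them out and finds beta by bisection on the remaining score equation.

```python
import math

# Hypothetical illustration data: (species count, patch area in ha, habitat).
# Only the first two rows match the question; all other areas are invented.
data = [
    (12, 5.0, "A"), (24, 30.0, "A"), (12, 8.0, "A"), (32, 40.0, "A"),
    (12, 6.0, "B"), (32, 25.0, "B"), (23, 15.0, "B"), (21, 20.0, "B"),
    (12, 4.0, "C"), (34, 20.0, "C"), (43, 35.0, "C"), (23, 10.0, "C"),
]

def group_rows(rows):
    """Collect (count, area) pairs by habitat."""
    groups = {}
    for y, area, hab in rows:
        groups.setdefault(hab, []).append((y, area))
    return groups

def fitted_means(groups, beta):
    """Fitted means for a fixed slope beta.

    Given beta, the Poisson MLE of each habitat intercept satisfies
    exp(alpha) = sum(count) / sum(area ** beta) within that habitat.
    """
    mu = {}
    for hab, obs in groups.items():
        total_y = sum(y for y, _ in obs)
        total_w = sum(a ** beta for _, a in obs)
        mu[hab] = [total_y * a ** beta / total_w for _, a in obs]
    return mu

def slope_score(groups, beta):
    """Poisson score for beta with intercepts profiled out."""
    mu = fitted_means(groups, beta)
    return sum((y - m) * math.log(a)
               for hab, obs in groups.items()
               for (y, a), m in zip(obs, mu[hab]))

def fit_slope(groups, lo=-5.0, hi=5.0, tol=1e-10):
    """Bisection: the profile score decreases as beta increases."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if slope_score(groups, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

groups = group_rows(data)
beta_hat = fit_slope(groups)
print("estimated coefficient on log(area):", round(beta_hat, 3))
```

An estimate clearly below 1 would indicate that species counts grow more slowly than patch area, in which case the fixed offset is a poor description of the data. This sketch does not address the survey-effort issue raised in the same comment.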

  3. Andrew, Chun Li, and Bob, thank you very much for your postings. Your comments are extremely useful. The data set was made up to illustrate the data structure; that is the reason why there are only 1s, 2s, 3s, and 4s. Sorry about that. Also, the real data set has 6 groupings (habitat types). Andrew, is your suggestion of setting group A as the baseline still valid? Does it mean using dummy variables?
    The survey effort was distributed in a very bad way and I didn't have any control over it. It was done some years ago and now I am trying to do the appropriate analysis (if it is possible) for the data.
    Andrew, can you model any ANOVA in a linear model framework so you can use AIC for model selection instead of looking at p-values for statistical significance as in an ANOVA table?
    Thank you again.
    Best,

    Manuel
