4 Modelling
4.1 Guiding Questions
- What decisions did you make when creating your iSSAs and why?
- What is the biological or statistical justification for these decisions?
- How might your decisions impact your inferences?
Practitioners often need to combine data from multiple individuals. If population-level inference is the only goal, include the same number of clusters (strata of used and available steps) from each individual in a single model. Equal sampling intensity helps reduce the bias that arises when individuals with more data dominate the estimates.
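One way to equalize sampling intensity is to randomly subsample whole strata, where a stratum is an observed step together with its available steps, so that each individual contributes the same number. The sketch below assumes amt-style column names (`id`, `step_id_`, `case_`) and uses simulated data. `equalize_strata()` is a hypothetical helper, not a function from amt.

```r
# Simulated data with unequal sampling effort (hypothetical):
# each stratum holds 1 used step and 10 available steps.
set.seed(42)
n_strata <- c(A = 50, B = 120, C = 80)
DT <- do.call(rbind, lapply(names(n_strata), function(i) {
  data.frame(
    id = i,
    step_id_ = rep(seq_len(n_strata[[i]]), each = 11),
    case_ = rep(c(TRUE, rep(FALSE, 10)), times = n_strata[[i]])
  )
}))

# Randomly keep `n` whole strata per individual. By default, n is the
# smallest number of strata available for any individual.
equalize_strata <- function(data, n = NULL) {
  counts <- tapply(data$step_id_, data$id, function(s) length(unique(s)))
  if (is.null(n)) n <- min(counts)
  kept <- lapply(split(data, data$id), function(d) {
    strata <- unique(d$step_id_)
    chosen <- strata[sample.int(length(strata), n)]
    d[d$step_id_ %in% chosen, ]
  })
  out <- do.call(rbind, kept)
  rownames(out) <- NULL
  out
}

DT_eq <- equalize_strata(DT)
table(DT_eq$id[DT_eq$case_])  # 50 used steps for each of A, B and C
```

Subsampling whole strata, rather than individual rows, keeps each used step together with its matched available steps.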
4.2 Model Building
Models or model sets require justification. We refer readers to Fieberg and Johnson (2015) and Northrup et al. (2021) for a detailed discussion of model building. We advocate for global models or a few distinct competing candidates, each representing an ecological process. We do not recommend dredging or large candidate model sets, because they often lead to interpreting spurious results.
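The sketch below shows a small set of distinct a priori candidates, as opposed to a dredged set. It fits a movement-only model, two single-process models and a global model to one simulated individual with `survival::clogit()`, then compares them with AIC. The covariates, the simulated selection for water and the choice of AIC are assumptions made for illustration, not a prescription.

```r
library(survival)
set.seed(7)

# Simulated used/available steps for one individual (1 used + 10 available per stratum)
n_strata <- 300
n <- n_strata * 11
DT <- data.frame(
  step_id_ = rep(seq_len(n_strata), each = 11),
  sl_ = rgamma(n, shape = 2, scale = 50),
  dist_to_water = runif(n, 0, 2000),
  forest = rbinom(n, 1, 0.4),
  case_ = FALSE
)
# Simulated behaviour: steps ending closer to water are more likely to be used
for (s in seq_len(n_strata)) {
  rows <- which(DT$step_id_ == s)
  w <- exp(-0.5 * log(DT$dist_to_water[rows] + 1))
  DT$case_[rows[sample.int(length(rows), 1, prob = w)]] <- TRUE
}

# A few a priori candidates, each representing a distinct hypothesis
candidates <- list(
  movement = case_ ~ I(log(sl_)) + strata(step_id_),
  water    = case_ ~ I(log(sl_)) + I(log(dist_to_water + 1)) + strata(step_id_),
  forest   = case_ ~ I(log(sl_)) + forest + strata(step_id_),
  global   = case_ ~ I(log(sl_)) + I(log(dist_to_water + 1)) + forest + strata(step_id_)
)
fits <- lapply(candidates, function(f) clogit(f, data = DT))
aic <- sapply(fits, AIC)
sort(aic - min(aic))  # delta AIC, best-supported model first
```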
4.3 Two-step approach
4.3.1 Step 1
When the goal is to describe ecological processes, fit either a global model or a set of models representing alternative hypotheses to each individual.
The global or alternative models can be composed of core variables or variables of interest.
A core model captures key features of animal movement that are important but may not be the covariates of interest for a particular study or hypothesis (Prokopenko et al. 2017).
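Here is a minimal sketch of step 1, assuming one conditional logistic regression is fitted per individual. The data are simulated and the covariate names mirror this chapter's example. In practice, the used and available steps would come from amt (e.g. `random_steps()` and `extract_covariates()`), whose clogit fitting is built on `survival::clogit()`.

```r
library(survival)
set.seed(1)

# Simulate one individual's used and available steps (1 used + 10 available per stratum)
sim_individual <- function(id, n_strata = 150, n_avail = 10) {
  n <- n_strata * (n_avail + 1)
  d <- data.frame(
    id = id,
    step_id_ = rep(seq_len(n_strata), each = n_avail + 1),
    sl_ = rgamma(n, shape = 2, scale = 50),
    dist_to_water = runif(n, 0, 2000),
    case_ = FALSE
  )
  beta_water <- rnorm(1, mean = -0.5, sd = 0.2)  # individual-specific selection
  for (s in seq_len(n_strata)) {
    rows <- which(d$step_id_ == s)
    w <- exp(beta_water * log(d$dist_to_water[rows] + 1))
    d$case_[rows[sample.int(length(rows), 1, prob = w)]] <- TRUE
  }
  d
}
DT <- do.call(rbind, lapply(paste0("id", 1:6), sim_individual))

# Step 1: the same global model, fitted to each individual separately
fits <- lapply(split(DT, DT$id), function(d) {
  clogit(case_ ~ I(log(sl_)) + I(log(dist_to_water + 1)) + strata(step_id_), data = d)
})

# One row per individual: estimates and their standard errors, ready for step 2
est <- t(sapply(fits, coef))
se <- t(sapply(fits, function(f) sqrt(diag(vcov(f)))))
colnames(se) <- paste0("se_", colnames(se))
indiv_coefs <- cbind(est, se)
round(indiv_coefs, 3)
```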
4.3.2 Step 2
Bootstrap the individual models to get a population mean and confidence intervals (Prokopenko et al. 2017, Scrafford et al. 2018).
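One simple version of this bootstrap resamples individuals with replacement, averages their coefficients in each replicate, and takes percentile intervals. In the sketch below, `coefs` is a simulated stand-in for the step-1 estimates, with one row per individual. This is not necessarily the exact procedure used in the cited studies.

```r
# Simulated stand-in for step-1 output: one row per individual
set.seed(2)
coefs <- cbind(
  log_sl = rnorm(12, mean = -0.1, sd = 0.05),
  log_dist_water = rnorm(12, mean = -0.4, sd = 0.2)
)

boot_population <- function(coefs, n_boot = 2000, level = 0.95) {
  n <- nrow(coefs)
  # Each replicate: mean coefficient across a resample of individuals.
  # Result is a matrix with covariates in rows and replicates in columns.
  boot_means <- replicate(
    n_boot,
    colMeans(coefs[sample.int(n, n, replace = TRUE), , drop = FALSE])
  )
  alpha <- (1 - level) / 2
  data.frame(
    term = colnames(coefs),
    mean = colMeans(coefs),
    lower = apply(boot_means, 1, quantile, probs = alpha),
    upper = apply(boot_means, 1, quantile, probs = 1 - alpha),
    row.names = NULL
  )
}

boot_population(coefs)
```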
A population-level average can also be calculated by modelling each variable's coefficient as a function of anything that interacted with that variable and of its availability as an explanatory variable, weighted by inverse variance (Dickie et al. 2020; see Supplementary Information).
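The sketch below illustrates the weighted-regression idea under our own simplifying assumptions: a single covariate, availability summarized as each individual's mean distance to water at available steps, and no interacting variables. All values are simulated, and the column names (`beta_water`, `se_water`, `avail_water`) are hypothetical.

```r
# Simulated stand-ins for step-1 output for 10 individuals
set.seed(3)
n_ind <- 10
ind <- data.frame(
  id = paste0("id", seq_len(n_ind)),
  avail_water = runif(n_ind, 100, 1500),  # mean distance to water at available steps
  se_water = runif(n_ind, 0.05, 0.3)      # SE of each individual's coefficient
)
ind$beta_water <- -0.2 - 0.0002 * ind$avail_water + rnorm(n_ind, 0, ind$se_water)

# Inverse-variance weights give precisely estimated individuals more influence
pop_fit <- lm(beta_water ~ avail_water, data = ind, weights = 1 / se_water^2)
summary(pop_fit)$coefficients

# Population-level coefficient at the average availability across individuals
predict(pop_fit, newdata = data.frame(avail_water = mean(ind$avail_water)),
        se.fit = TRUE)[c("fit", "se.fit")]
```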
4.4 Mixed Model Approach
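Fit all individuals in one Poisson mixed model, with a stratum-specific random intercept and individual random slopes (Muff et al. 2020).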
For a discussion of nesting random effects, see https://stats.stackexchange.com/questions/228800/crossed-vs-nested-random-effects-how-do-they-differ-and-how-are-they-specified. Nested random effects are notated as:
(1 | group1 / group2)
which expands to (1 | group1) + (1 | group1:group2).
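Here is a minimal sketch of the Muff et al. (2020) Poisson formulation in glmmTMB. It uses a stratum-specific random intercept whose variance is fixed at a large value (10^6, matching `indiv_step_id` in the example output in 4.5.2), plus an individual random slope. The variance is fixed here with the `map` and `start` arguments of `glmmTMB()`. The data, the single covariate `x` and the true coefficients are simulated.

```r
library(glmmTMB)
set.seed(11)

# Simulated data: 6 individuals x 100 strata x (1 used + 10 available) steps
DT <- expand.grid(pt = 1:11, step = 1:100, id = factor(1:6))
# Stratum IDs must be unique across individuals
DT$indiv_step_id <- factor(paste(DT$id, DT$step, sep = "_"))
DT$x <- rnorm(nrow(DT))
beta_id <- rnorm(6, mean = 1, sd = 0.5)  # simulated individual coefficients

# In each stratum, mark one step as used with probability proportional to exp(beta * x)
DT$case_ <- 0
for (rows in split(seq_len(nrow(DT)), DT$indiv_step_id)) {
  w <- exp(beta_id[as.integer(DT$id[rows[1]])] * DT$x[rows])
  DT$case_[rows[sample.int(length(rows), 1, prob = w)]] <- 1
}

fit <- glmmTMB(
  case_ ~ -1 + x + (1 | indiv_step_id) + (0 + x | id),
  family = poisson, data = DT,
  # theta holds the random-effect log-SDs: fix the stratum intercept's at
  # log(1e3) (variance 1e6) and estimate the individual slope's freely
  map = list(theta = factor(c(NA, 1))),
  start = list(theta = c(log(1e3), 0))
)
summary(fit)
```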
4.5 Output
This section describes what you will see in your model output.
4.5.1 Coefficients
The estimates are the selection or movement coefficients, either for individuals or the population depending on your input data and model structure.
For a mixed model, the random effects are deviations from the fixed effect, so each individual's selection coefficient is the fixed effect plus that individual's random effect.
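The sum can be computed from a fitted glmmTMB model with `fixef()` and `ranef()`. This sketch refits a small simulated version of the Muff et al. (2020) model so that it runs on its own. The covariate `x` is hypothetical, and `beta_id` (the simulated true coefficients) is included only for comparison.

```r
library(glmmTMB)
set.seed(11)

# Simulated data: 6 individuals x 100 strata x (1 used + 10 available) steps
DT <- expand.grid(pt = 1:11, step = 1:100, id = factor(1:6))
DT$indiv_step_id <- factor(paste(DT$id, DT$step, sep = "_"))
DT$x <- rnorm(nrow(DT))
beta_id <- rnorm(6, mean = 1, sd = 0.5)
DT$case_ <- 0
for (rows in split(seq_len(nrow(DT)), DT$indiv_step_id)) {
  w <- exp(beta_id[as.integer(DT$id[rows[1]])] * DT$x[rows])
  DT$case_[rows[sample.int(length(rows), 1, prob = w)]] <- 1
}
fit <- glmmTMB(
  case_ ~ -1 + x + (1 | indiv_step_id) + (0 + x | id),
  family = poisson, data = DT,
  map = list(theta = factor(c(NA, 1))),
  start = list(theta = c(log(1e3), 0))
)

# Individual coefficient = fixed effect + the individual's random deviation
fe <- fixef(fit)$cond[["x"]]
re <- ranef(fit)$cond$id  # one row per individual, one column per random slope
indiv <- data.frame(
  id = rownames(re),
  fixed = fe,
  random = re[["x"]],
  individual = fe + re[["x"]],
  simulated_truth = beta_id
)
indiv
```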
4.5.2 Std. Error/CIs
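Check whether the fixed- and random-effect standard errors are very large or NA.
For example, note the NaN standard errors in the example model using land cover, at the bottom of the summary under “Conditional model”. The random-effect variances near zero (e.g. 1.87e-90) and correlations of exactly ±1 in the same summary are further signs of a problem.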
summary(tar_read(model_lc))
## Family: poisson ( log )
## Formula:
## case_ ~ -1 + I(log(sl_)) + I(log(sl_)):lc_adj + lc_adj + I(log(dist_to_water +
## 1)) + I(log(dist_to_water + 1)):I(log(sl_)) + (1 | indiv_step_id) +
## (0 + I(log(sl_)) | id) + (0 + I(log(sl_)):lc_adj | id) +
## (0 + lc_adj | id) + (0 + I(log(dist_to_water + 1)) | id) +
## (0 + I(log(dist_to_water + 1)):I(log(sl_)) | id)
## Data: DT
##
## AIC BIC logLik deviance df.resid
## NA NA NA NA 26488
##
## Random effects:
##
## Conditional model:
##  Groups        Name                                  Variance Std.Dev. Corr
##  indiv_step_id (Intercept)                           1.00e+06 1.00e+03
##  id            I(log(sl_))                           1.87e-90 1.37e-45
##  id.1          I(log(sl_)):lc_adjdisturbed           1.08e-02 1.04e-01
##                I(log(sl_)):lc_adjforest              2.05e-03 4.52e-02 -1.00
##                I(log(sl_)):lc_adjopen                1.08e-01 3.28e-01  1.00 -1.00
##                I(log(sl_)):lc_adjwetlands            2.05e-03 4.53e-02 -1.00  1.00 -1.00
##  id.2          lc_adjdisturbed                       4.42e-01 6.64e-01
##                lc_adjforest                          3.32e-02 1.82e-01 -1.00
##                lc_adjopen                            4.42e+00 2.10e+00  1.00 -1.00
##                lc_adjwetlands                        2.38e-26 1.54e-13 -0.27  0.30 -0.28
##  id.3          I(log(dist_to_water + 1))             1.34e-02 1.16e-01
##  id.4          I(log(dist_to_water + 1)):I(log(sl_)) 5.26e-90 2.29e-45
## Number of obs: 26521, groups: indiv_step_id, 2411; id, 6
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## I(log(sl_)) -0.13107 NaN NaN NaN
## lc_adjdisturbed -3.74565 NaN NaN NaN
## lc_adjforest -3.42646 NaN NaN NaN
## lc_adjopen -5.94910 NaN NaN NaN
## lc_adjwetlands -3.67823 NaN NaN NaN
## I(log(dist_to_water + 1)) -0.03581 NaN NaN NaN
## I(log(sl_)):lc_adjforest 0.23726 NaN NaN NaN
## I(log(sl_)):lc_adjopen 0.29305 NaN NaN NaN
## I(log(sl_)):lc_adjwetlands 0.28385 NaN NaN NaN
## I(log(sl_)):I(log(dist_to_water + 1)) 0.00072 NaN NaN NaN
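These checks can also be run programmatically. The hypothetical helper `check_fit()` below reports three things: whether the Hessian is positive definite (`fit$sdr$pdHess`), which fixed effects have NA/NaN standard errors, and which random-effect standard deviations are near zero. The 1e-4 threshold is an arbitrary assumption. The helper is demonstrated on a simulated, well-behaved model.

```r
library(glmmTMB)
set.seed(11)

# Simulated data and Muff et al. (2020)-style fit (see the mixed model approach)
DT <- expand.grid(pt = 1:11, step = 1:100, id = factor(1:6))
DT$indiv_step_id <- factor(paste(DT$id, DT$step, sep = "_"))
DT$x <- rnorm(nrow(DT))
beta_id <- rnorm(6, mean = 1, sd = 0.5)
DT$case_ <- 0
for (rows in split(seq_len(nrow(DT)), DT$indiv_step_id)) {
  w <- exp(beta_id[as.integer(DT$id[rows[1]])] * DT$x[rows])
  DT$case_[rows[sample.int(length(rows), 1, prob = w)]] <- 1
}
fit <- glmmTMB(
  case_ ~ -1 + x + (1 | indiv_step_id) + (0 + x | id),
  family = poisson, data = DT,
  map = list(theta = factor(c(NA, 1))),
  start = list(theta = c(log(1e3), 0))
)

# Flag the warning signs discussed above (hypothetical helper)
check_fit <- function(fit, sd_tol = 1e-4) {
  coef_tab <- summary(fit)$coefficients$cond
  se <- coef_tab[, "Std. Error"]
  vc <- VarCorr(fit)$cond
  sds <- unlist(lapply(vc, function(v) attr(v, "stddev")))
  list(
    pd_hessian = fit$sdr$pdHess,                # FALSE: non-positive-definite Hessian
    na_se = rownames(coef_tab)[!is.finite(se)], # fixed effects with NA/NaN SEs
    tiny_sd = names(sds)[sds < sd_tol]          # random effects estimated near zero
  )
}

check_fit(fit)
```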
4.5.3 Troubleshooting
We have had success troubleshooting by searching for the error message online and looking for it among the GitHub issues for glmmTMB or lme4, since the two packages are built in much the same way. Ben Bolker has left many tips and tricks in those issues and is also very responsive. The glmmTMB troubleshooting vignette is another good resource:
https://cran.r-project.org/web/packages/glmmTMB/vignettes/troubleshooting.html
Use set.seed() to get reproducible model output, since the available steps are sampled at random. Then check that the output does not vary greatly with different seeds, or when no seed is set.
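Here is a minimal sketch of that check: regenerate the available steps under several seeds and refit. It uses `survival::clogit()` for speed. The covariate surface `cover_at()`, the helper `random_ends()` (a stand-in for `amt::random_steps()`) and all data are hypothetical.

```r
library(survival)

# Hypothetical continuous covariate surface (e.g. a forage index)
cover_at <- function(x, y) sin(x / 200) + cos(y / 200)

# Random endpoints around the start of step i (stand-in for amt::random_steps())
random_ends <- function(i, n) {
  sl <- rgamma(n, shape = 2, scale = 100)
  ta <- runif(n, -pi, pi)
  data.frame(step_id_ = i, sl_ = sl,
             x = start[i, 1] + sl * cos(ta), y = start[i, 2] + sl * sin(ta))
}

# "Observed" steps, simulated once with a fixed seed: each endpoint is chosen
# among 20 candidates, favouring higher cover
set.seed(100)
n_steps <- 300
start <- matrix(runif(n_steps * 2, 0, 5000), ncol = 2)
used <- do.call(rbind, lapply(seq_len(n_steps), function(i) {
  cand <- random_ends(i, 20)
  cand[sample.int(20, 1, prob = exp(1.5 * cover_at(cand$x, cand$y))), ]
}))
used$case_ <- TRUE

# Draw the available steps with a given seed, then fit the same model
fit_with_seed <- function(seed, n_avail = 10) {
  set.seed(seed)
  avail <- do.call(rbind, lapply(seq_len(n_steps), random_ends, n = n_avail))
  avail$case_ <- FALSE
  d <- rbind(used, avail)
  d$cover <- cover_at(d$x, d$y)
  coef(clogit(case_ ~ I(log(sl_)) + cover + strata(step_id_), data = d))
}

seed_runs <- t(sapply(1:5, fit_with_seed))  # one row per seed
round(seed_runs, 3)
apply(seed_runs, 2, sd)  # spread of each coefficient across seeds
```

If the spread across seeds is large relative to the standard errors, consider drawing more available steps per used step.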
Be conservative in “trusting” the model: don’t accept models with any NAs in the output.
Unlike clogit models fitted in amt, glmmTMB models do not always converge better when simplified, but adding covariates with informative variation will improve model performance and convergence.
Through trial and error, we have found that cos(TA), the cosine of the turning angle, can make or break the model. These Poisson models seem to like a lot of data and a fair number of variables, but the optimizer is cranky: if you have too few variables and they are correlated (high VIF), you will get NAs.
Use the performance package's check_model() or model_performance() functions to check the model.
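`performance::check_collinearity()` reports VIFs for a fitted model. The sketch below instead screens candidate covariates before fitting, computing VIF_j = 1 / (1 - R^2_j) by regressing each covariate on the others. The covariates are simulated, and `dist_to_road` is a hypothetical covariate built to be correlated with `dist_to_water`.

```r
# Simulated covariates at used and available steps (hypothetical)
set.seed(5)
n <- 2000
covs <- data.frame(
  log_sl = log(rgamma(n, shape = 2, scale = 50)),
  cos_ta = cos(runif(n, -pi, pi)),
  dist_to_water = runif(n, 0, 2000)
)
covs$dist_to_road <- 0.9 * covs$dist_to_water + rnorm(n, 0, 150)

# VIF for each column: regress it on all the other columns
vif <- function(df) {
  sapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    r2 <- summary(lm(reformulate(others, response = v), data = df))$r.squared
    1 / (1 - r2)
  })
}

round(vif(covs), 2)  # dist_to_water and dist_to_road stand out
```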
glmmTMB gives the “Model convergence problem; non-positive-definite Hessian matrix” warning very liberally. Generally, you don’t have to worry about it unless it comes with other errors.
EXERCISE: Note any individuals or variables that are not converging or whose estimates are at the extremes. Do they have different availability, fewer points, or more NAs? Plot each coefficient against sample size: is there a relationship?
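The sketch below supports the exercise using simulated individuals that differ only in the number of used steps; the true coefficient is 1 for all of them. With real data, replace `res` with one row per individual holding its number of used steps, estimate and standard error.

```r
library(survival)
set.seed(9)

# Hypothetical numbers of used steps (strata) for 10 individuals
n_used <- c(30, 45, 60, 90, 120, 150, 200, 250, 300, 400)

# Simulate one individual with true coefficient `beta` for covariate x, then fit it
sim_fit <- function(n_strata, beta = 1, n_avail = 10) {
  d <- data.frame(
    step_id_ = rep(seq_len(n_strata), each = n_avail + 1),
    x = rnorm(n_strata * (n_avail + 1)),
    case_ = FALSE
  )
  for (s in seq_len(n_strata)) {
    rows <- which(d$step_id_ == s)
    d$case_[rows[sample.int(length(rows), 1, prob = exp(beta * d$x[rows]))]] <- TRUE
  }
  fit <- clogit(case_ ~ x + strata(step_id_), data = d)
  c(n = n_strata, estimate = coef(fit)[["x"]], se = sqrt(vcov(fit)[1, 1]))
}
res <- as.data.frame(t(sapply(n_used, sim_fit)))

# Coefficient against sample size (log scale), with the simulated true value
plot(estimate ~ n, data = res, log = "x", pch = 19,
     xlab = "Used steps per individual", ylab = "Coefficient for x")
abline(h = 1, lty = 2)
# Rank correlations of sample size with the estimate and with its standard error
cor.test(res$n, res$estimate, method = "spearman")
cor.test(res$n, res$se, method = "spearman")
```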