The University Library Archives
Model Comparison for Cholera over Multiple Data Sets
In collections
Student Research Conference
Details
Title
Model Comparison for Cholera over Multiple Data Sets
Usage & Reproduction Rights
http://rightsstatements.org/vocab/InC/1.0/
Type
Video recordings
URI / Handle
http://hdl.handle.net/1961/muislandora:2975
Created
2015-01-01T00:00:00Z
Abstract
Question/Focus: Is the AIC approach to model comparison a viable algorithm for time-series data over multiple data sets? If so, which variation of AIC provides the most stable results? The purpose of this work is to understand model-selection variability due to the use of different algorithms, or of different variables within an algorithm, an issue that has not been explored in mathematical modeling. Some papers have been published; however, there is a lack of clarity on the application of AIC and its components. Our work focuses on clarifying the components necessary for AIC to be effective, and it illustrates certain pitfalls that researchers may encounter when comparing models over multiple data sets.

Researchers have considered a wide variety of models for cholera in recent years; however, the simplest of these do not account for various known disease-transmission pathways. This research examines several more complex, mechanistic models for cholera. Here, though, the concern of "overfitting" the data arises: a model may fit the data well simply because of its increased number of parameters, and may therefore be inappropriate for predicting the behavior of future cholera outbreaks. To address this issue, there are various criteria for model selection, the most widely used of which is the Akaike information criterion (AIC). This method measures the relative quality of a statistical model by trading off goodness of fit against the number of parameters (i.e., the more parameters a model has, the greater the penalty). In the literature, the variation of AIC used for time-series data is the AIC for the least-squares case. While the application of the AIC criterion to a single data set as a tool for model comparison is clear, it is less clear how one might compare model performance over multiple data sets. This project is an in-depth analysis of applying AIC techniques to time-series data.

Although a few papers have been published over the years concerning this approach, they have not considered the variation in parameter selection that appears to be integral to the outcome of the algorithm.
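The least-squares form of AIC mentioned above is commonly written as AIC = n·ln(RSS/n) + 2k, where RSS is the residual sum of squares, n the number of observations, and k the number of fitted parameters, with a small-sample correction (AICc) adding 2k(k+1)/(n − k − 1). As a minimal sketch of the goodness-of-fit/complexity trade-off, the synthetic time series, seed, and polynomial candidate models below are illustrative assumptions, not the models or data from this project:

```python
import numpy as np

def aic_least_squares(rss, n, k):
    # AIC for the least-squares case: n * ln(RSS / n) + 2k.
    return n * np.log(rss / n) + 2 * k

def aicc(rss, n, k):
    # Small-sample corrected AIC: AIC + 2k(k + 1) / (n - k - 1).
    return aic_least_squares(rss, n, k) + 2 * k * (k + 1) / (n - k - 1)

# Synthetic "outbreak-like" time series (illustrative only).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 50)
y = 2.0 * t + 1.0 + rng.normal(0.0, 1.0, t.size)

# Candidate models of increasing complexity: polynomials of degree 1, 2, 5.
scores = {}
for degree in (1, 2, 5):
    coeffs = np.polyfit(t, y, degree)
    rss = float(np.sum((y - np.polyval(coeffs, t)) ** 2))
    k = degree + 1                     # number of fitted coefficients
    scores[degree] = aicc(rss, t.size, k)

# The lowest score wins: extra parameters must "pay for" their RSS reduction.
best = min(scores, key=scores.get)
```

Higher-degree polynomials always achieve a smaller RSS, but the 2k penalty means they are only selected when the fit improvement outweighs the added complexity, which is the overfitting guard described in the abstract.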