THIS VERSION: 6.12.2024
In this document I try to fit the UK strikes data and the rig count data that we propose in the JSS paper to be analysed with the \(coconots\) package with the \(tscount\) package.
I start with the UK strikes data.
Then I present an analysis of the rig counts in the time span 2014 to 2020. The tscount package does a remarkably good job in capturing the dependence structure, but struggles with the slight underdispersion in the data. Values of the scoring rules are considerably larger (=worse) as those from the GPAR2 model.
Finally, I then present results for rig count in the time span 2017 to 2024. Here I find no specification of the models in tscount, that can capture the dynamics in the data satisfactorily.
data_uks <- read_excel("data_strikes.xlsx")
I start with the UK strikes data (public sector). In \(coconots\), we propose a GPAR2 model with harmonics and a linear trend (using a log-link).
I generate a holdout sample of 3, and prepare the regressors first:
hos <- 3
X <- data_uks$public
n <- length(X) - hos
sin_x <- sin(2*pi*1:n/12)
cos_x <- cos(2*pi*1:n/12)
trend <- (1:n - n/2) / n
const <- rep(1, n)
xregtsc <- cbind(const, trend, cos_x, sin_x)
Then we fit an INGARCH(1,1) type of model with a log link and negative binomial conditional distributional (Poisson led to a slightly u-shaped PIT histogram) assumption:
uks_tsc_fit <- tsglm(data_uks$public[1:n], model = list(past_obs = c(1),past_mean=1), link = "log", distr = "nbinom", xreg = xregtsc)
The standard regression output:
summary(uks_tsc_fit)
##
## Call:
## tsglm(ts = data_uks$public[1:n], model = list(past_obs = c(1),
## past_mean = 1), xreg = xregtsc, link = "log", distr = "nbinom")
##
## Coefficients:
## Estimate Std.Error CI(lower) CI(upper)
## (Intercept) 1.2547 0.8288 -0.3697 2.879134
## beta_1 0.3990 0.0670 0.2677 0.530197
## alpha_1 0.2153 0.1382 -0.0556 0.486154
## const -0.4992 0.6884 -1.8485 0.850020
## trend -0.3439 0.1196 -0.5783 -0.109619
## cos_x -0.0609 0.0309 -0.1214 -0.000343
## sin_x 0.0165 0.0306 -0.0434 0.076477
## sigmasq 0.0547 NA NA NA
## Standard errors and confidence intervals (level = 95 %) obtained
## by normal approximation.
##
## Link function: log
## Distribution family: nbinom (with overdispersion coefficient 'sigmasq')
## Number of coefficients: 8
## Log-likelihood: -729.3632
## AIC: 1474.726
## BIC: 1503.974
## QIC: 1498.696
The overdispersion parameter is quite small and very close to zero, the limiting case of a Poisson distribution. Its standard error is not available in analytic form, but requires bootstrapping.
I have stopped this, as even with moderate bootstrap sample sizes (500), the computation time is prohobitivly long.
Diagnostics:
acf(residuals(uks_tsc_fit), main = "ACF of response residuals")
pit(uks_tsc_fit, ylim = c(0, 1.5), main = "PIT Histogram")
cat("Mean of Pearson residuals:", mean(residuals(uks_tsc_fit,type="pearson")))
## Mean of Pearson residuals: -0.0006482209
cat("Variance of Pearson residuals: ", var(residuals(uks_tsc_fit,type = "pearson")))
## Variance of Pearson residuals: 0.9789472
scoring(uks_tsc_fit)
## logarithmic quadratic spherical rankprob dawseb normsq
## 2.55022104 -0.09559065 -0.30694330 1.82121573 3.35794571 0.97552471
## sqerror
## 11.54727482
One-step ahead prediction:
sin_xf <- sin(2*pi*(n+1)/12)
cos_xf <- cos(2*pi*(n+1)/12)
trendf <- ((n+1):(n+1) - n/2) / n #adjusted to a vector
constf <- 1
xcastk <- cbind(constf, trendf, cos_xf, sin_xf)
tsglm_poi_f <- predict(uks_tsc_fit,n.ahead = 1, newxreg = xcastk)
tsglm_poi_f$pred
## [1] 4.648801
tsglm_poi_f$interval
## lower upper
## [1,] 1 10
There is no obvious way for me to improve the models fit.
I tried to fit an INGARCH(1,0) as the alpha-parameter is only borderline significant, but that leads to issues with the variance-covariance matrix of the parameter vector and worse values for the scoring rules.
I infer from this that the version of the INGARCH model with a log-link and neg. binomial conditional distriburtion can fit the UK strikes data comparably with to the GPAR2 model used in \(coconots\), but based on the scroring rules, the latter would be preferred.
I first use the dataset 2014 to 2020 (March to March), which I propose to use in the JSS paper.
data2 <- read_excel("rigcountsAlaskaL2014-20-march2march.xlsx")
Preparation of the regressor. I start with the quarter 1 - dummy only.
Xrc2 <- data2$AlaskaL
nrc2 <- length(Xrc2)
constrc2 <- rep(1, nrc2)
q1rc2 <- data2$Q1
xregrc2 <- cbind(constrc2, q1rc2)
Then I fit a INGARCH(1,1) model with a linear link first:
fit_rc2_reg1 <- tsglm(Xrc2, model = list(past_obs = c(1),past_mean=1), xreg = xregrc2)
summary(fit_rc2_reg1)
##
## Call:
## tsglm(ts = Xrc2, model = list(past_obs = c(1), past_mean = 1),
## xreg = xregrc2)
##
## Coefficients:
## Estimate Std.Error CI(lower) CI(upper)
## (Intercept) 0.629584 0.719 -0.779 2.038
## beta_1 0.552083 0.108 0.341 0.764
## alpha_1 0.362488 0.126 0.115 0.610
## constrc2 0.000452 0.626 -1.226 1.227
## q1rc2 0.299250 0.271 -0.232 0.830
## Standard errors and confidence intervals (level = 95 %) obtained
## by normal approximation.
##
## Link function: identity
## Distribution family: poisson
## Number of coefficients: 5
## Log-likelihood: -641.2468
## AIC: 1292.494
## BIC: 1311.225
## QIC: 1293.475
According to the model fit, there is no need for the Q1-dummy.
Model diagnostics:
acf(residuals(fit_rc2_reg1), main = "ACF of response residuals")
cat("Mean of Pearson residuals: ", mean(residuals(fit_rc2_reg1,type="pearson")))
## Mean of Pearson residuals: -0.01929286
cat("Variance of Pearson residuals: ", var(residuals(fit_rc2_reg1,type = "pearson")))
## Variance of Pearson residuals: 0.2370493
pit(fit_rc2_reg1, ylim = c(0, 1.5), main = "PIT Histogram")
scoring(fit_rc2_reg1)
## logarithmic quadratic spherical rankprob dawseb normsq
## 2.0487119 -0.1595609 -0.4074842 0.8768263 2.2563953 0.2366642
## sqerror
## 1.7873130
The model captures the serial correlation in the data surprisingly well!
One-step ahead prediction:
constf <- 1
q1f <- 0
xcastf <- cbind(constf, q1f)
tsglm_rc2_f <- predict(fit_rc2_reg1 ,n.ahead = 1, newxreg = xcastf)
tsglm_rc2_f$pred
## [1] 9.271769
tsglm_rc2_f$interval
## lower upper
## [1,] 4 16
In an effort to improve the model, I now fit an INGARCH(1,14;,1) model without any regressors. The lag 14 dependent variable, is trying to capture potential stochastic seasonality:
fit_rc2_reg2 <- tsglm(Xrc2, model = list(past_obs = c(1,14),past_mean=1))
summary(fit_rc2_reg2)
##
## Call:
## tsglm(ts = Xrc2, model = list(past_obs = c(1, 14), past_mean = 1))
##
## Coefficients:
## Estimate Std.Error CI(lower) CI(upper)
## (Intercept) 0.3878 0.4038 -0.4036 1.179
## beta_1 0.6139 0.1107 0.3969 0.831
## beta_14 0.0354 0.0524 -0.0672 0.138
## alpha_1 0.3081 0.1297 0.0538 0.562
## Standard errors and confidence intervals (level = 95 %) obtained
## by normal approximation.
##
## Link function: identity
## Distribution family: poisson
## Number of coefficients: 4
## Log-likelihood: -641.2983
## AIC: 1290.597
## BIC: 1305.581
## QIC: 1290.692
The parameter related to the stochastic seasonality is not statistically significant.
Model diagnostics:
acf(residuals(fit_rc2_reg2), main = "ACF of response residuals")
cat("Mean of Pearson residuals: ", mean(residuals(fit_rc2_reg2,type="pearson")))
## Mean of Pearson residuals: -0.02644024
cat("Variance of Pearson residuals: ", var(residuals(fit_rc2_reg2,type = "pearson")))
## Variance of Pearson residuals: 0.2384399
pit(fit_rc2_reg2, ylim = c(0, 1.5), main = "PIT Histogram")
scoring(fit_rc2_reg2)
## logarithmic quadratic spherical rankprob dawseb normsq
## 2.0488764 -0.1595560 -0.4076225 0.8765324 2.2593817 0.2383772
## sqerror
## 1.7810585
The residuals seem to look a bit ‘better’ in some sense. But I fail to see any real improvement here.
I see no obvious way to improve the model.
I also include the analysis based on the original time span for the rig counts, including the Covid19 period here.
datarc <- read_excel("rigcountsAlaskaL2017-24.xlsx")
mean(datarc$AlaskaL)
## [1] 6.394521
var(datarc$AlaskaL)
## [1] 5.794475
plot(datarc$AlaskaL, type="l")
forecast::Acf(datarc$AlaskaL)
forecast::Pacf(datarc$AlaskaL)
I fit a model with a Q1-dummy and a linear trend as we use it in the coconots package (note log-link is required here):
Xrc <- datarc$AlaskaL
nrc <- length(Xrc)
trendrc <- (1:nrc - nrc/2) / nrc
constrc <- rep(1, nrc)
q1rc <- datarc$Q1
xregrc <- cbind(constrc, trendrc, q1rc)
fit_rc_reg1 <- tsglm(Xrc, model = list(past_obs = c(1),past_mean=1), link="log", distr = "poisson", xreg = xregrc)
summary(fit_rc_reg1)
##
## Call:
## tsglm(ts = Xrc, model = list(past_obs = c(1), past_mean = 1),
## xreg = xregrc, link = "log", distr = "poisson")
##
## Coefficients:
## Estimate Std.Error CI(lower) CI(upper)
## (Intercept) 1.2061 0.9444 -0.64496 3.057
## beta_1 0.7641 0.1285 0.51234 1.016
## alpha_1 -0.0900 0.1689 -0.42098 0.241
## constrc -0.7297 1.0240 -2.73669 1.277
## trendrc 0.1570 0.0838 -0.00734 0.321
## q1rc 0.0745 0.0532 -0.02981 0.179
## Standard errors and confidence intervals (level = 95 %) obtained
## by normal approximation.
##
## Link function: log
## Distribution family: poisson
## Number of coefficients: 6
## Log-likelihood: -713.1914
## AIC: 1438.383
## BIC: 1461.782
## QIC: 1438.979
Hm, the alpha-parameter related to the lagged conditional mean seems not to be significant; the trend term wants to be positive (!) and the parameter associated with the first quarter is not significant.
acf(residuals(fit_rc_reg1), main = "ACF of response residuals")
pit(fit_rc_reg1, ylim = c(0, 1.5), main = "PIT Histogram")
mean(residuals(fit_rc_reg1,type="pearson"))
## [1] -0.02232376
var(residuals(fit_rc_reg1,type = "pearson"))
## [1] 0.2603702
scoring(fit_rc_reg1)
## logarithmic quadratic spherical rankprob dawseb normsq
## 1.9539492 -0.1781824 -0.4303744 0.8066894 2.0772362 0.2601552
## sqerror
## 1.7539446
Finally, I try to include an intervention term, a dummy taking on the value 1 from the second quarter or 2020 onwards to check, if the Covid19 period want to be treated different from the previous time period.
intervention <- interv_covariate(nrc , tau = c(159), delta = c(1))
fit_rc_regi <- tsglm(Xrc, model = list(past_obs = c(1),past_mean=1), link= "log", distr = "poisson", xreg = intervention)
summary(fit_rc_regi)
##
## Call:
## tsglm(ts = Xrc, model = list(past_obs = c(1), past_mean = 1),
## xreg = intervention, link = "log", distr = "poisson")
##
## Coefficients:
## Estimate Std.Error CI(lower) CI(upper)
## (Intercept) 0.0920 0.1235 -0.1501 0.3341
## beta_1 0.7842 0.1368 0.5160 1.0524
## alpha_1 0.1222 0.1523 -0.1762 0.4206
## interv_1 -0.0268 0.0364 -0.0982 0.0446
## Standard errors and confidence intervals (level = 95 %) obtained
## by normal approximation.
##
## Link function: log
## Distribution family: poisson
## Number of coefficients: 4
## Log-likelihood: -703.6384
## AIC: 1415.277
## BIC: 1430.876
## QIC: 1414.709
The intervention dummy is nowhere near a sensible significance level.
acf(residuals(fit_rc_regi), main = "ACF of response residuals")
pit(fit_rc_regi, ylim = c(0, 1.5), main = "PIT Histogram")
mean(residuals(fit_rc_regi,type="pearson"))
## [1] -0.04130746
var(residuals(fit_rc_regi,type = "pearson"))
## [1] 0.2137347
scoring(fit_rc_regi)
## logarithmic quadratic spherical rankprob dawseb normsq
## 1.9277765 -0.1843596 -0.4387367 0.7609530 2.0325084 0.2148555
## sqerror
## 1.3042387