THIS VERSION: 6.12.2024

In this document I try to fit the UK strikes data and the rig count data that we propose in the JSS paper to be analysed with the \(coconots\) package with the \(tscount\) package.

I start with the UK strikes data.

Then I present an analysis of the rig counts in the time span 2014 to 2020. The tscount package does a remarkably good job in capturing the dependence structure, but struggles with the slight underdispersion in the data. Values of the scoring rules are considerably larger (=worse) as those from the GPAR2 model.

Finally, I then present results for rig count in the time span 2017 to 2024. Here I find no specification of the models in tscount, that can capture the dynamics in the data satisfactorily.

UK Strikes Data

data_uks <- read_excel("data_strikes.xlsx")

I start with the UK strikes data (public sector). In \(coconots\), we propose a GPAR2 model with harmonics and a linear trend (using a log-link).

I generate a holdout sample of 3, and prepare the regressors first:

hos <- 3

X <- data_uks$public
n <- length(X) - hos

sin_x <- sin(2*pi*1:n/12)
cos_x <- cos(2*pi*1:n/12)
trend <- (1:n - n/2) / n
const <- rep(1, n)
xregtsc <- cbind(const, trend, cos_x, sin_x)

Then we fit an INGARCH(1,1) type of model with a log link and negative binomial conditional distributional (Poisson led to a slightly u-shaped PIT histogram) assumption:

uks_tsc_fit <- tsglm(data_uks$public[1:n], model = list(past_obs = c(1),past_mean=1), link = "log", distr = "nbinom", xreg = xregtsc)

The standard regression output:

summary(uks_tsc_fit)
## 
## Call:
## tsglm(ts = data_uks$public[1:n], model = list(past_obs = c(1), 
##     past_mean = 1), xreg = xregtsc, link = "log", distr = "nbinom")
## 
## Coefficients:
##              Estimate  Std.Error  CI(lower)  CI(upper)
## (Intercept)    1.2547     0.8288    -0.3697   2.879134
## beta_1         0.3990     0.0670     0.2677   0.530197
## alpha_1        0.2153     0.1382    -0.0556   0.486154
## const         -0.4992     0.6884    -1.8485   0.850020
## trend         -0.3439     0.1196    -0.5783  -0.109619
## cos_x         -0.0609     0.0309    -0.1214  -0.000343
## sin_x          0.0165     0.0306    -0.0434   0.076477
## sigmasq        0.0547         NA         NA         NA
## Standard errors and confidence intervals (level =  95 %) obtained
## by normal approximation.
## 
## Link function: log 
## Distribution family: nbinom (with overdispersion coefficient 'sigmasq') 
## Number of coefficients: 8 
## Log-likelihood: -729.3632 
## AIC: 1474.726 
## BIC: 1503.974 
## QIC: 1498.696

The overdispersion parameter is quite small and very close to zero, the limiting case of a Poisson distribution. Its standard error is not available in analytic form, but requires bootstrapping.

I have stopped this, as even with moderate bootstrap sample sizes (500), the computation time is prohobitivly long.

Diagnostics:

acf(residuals(uks_tsc_fit), main = "ACF of response residuals")

pit(uks_tsc_fit, ylim = c(0, 1.5), main = "PIT Histogram")

cat("Mean of Pearson residuals:", mean(residuals(uks_tsc_fit,type="pearson")))
## Mean of Pearson residuals: -0.0006482209
cat("Variance of Pearson residuals: ", var(residuals(uks_tsc_fit,type = "pearson")))
## Variance of Pearson residuals:  0.9789472
scoring(uks_tsc_fit)
## logarithmic   quadratic   spherical    rankprob      dawseb      normsq 
##  2.55022104 -0.09559065 -0.30694330  1.82121573  3.35794571  0.97552471 
##     sqerror 
## 11.54727482

One-step ahead prediction:

sin_xf <- sin(2*pi*(n+1)/12)
cos_xf <- cos(2*pi*(n+1)/12)

trendf <- ((n+1):(n+1) - n/2) / n #adjusted to a vector

constf <- 1

xcastk <- cbind(constf, trendf, cos_xf, sin_xf) 

tsglm_poi_f <- predict(uks_tsc_fit,n.ahead = 1, newxreg = xcastk)

tsglm_poi_f$pred
## [1] 4.648801
tsglm_poi_f$interval
##      lower upper
## [1,]     1    10

There is no obvious way for me to improve the models fit.

I tried to fit an INGARCH(1,0) as the alpha-parameter is only borderline significant, but that leads to issues with the variance-covariance matrix of the parameter vector and worse values for the scoring rules.

I infer from this that the version of the INGARCH model with a log-link and neg. binomial conditional distriburtion can fit the UK strikes data comparably with to the GPAR2 model used in \(coconots\), but based on the scroring rules, the latter would be preferred.

Rig Count Data

Data span 2014 to 2020.

I first use the dataset 2014 to 2020 (March to March), which I propose to use in the JSS paper.

data2 <- read_excel("rigcountsAlaskaL2014-20-march2march.xlsx")

Preparation of the regressor. I start with the quarter 1 - dummy only.

Xrc2 <- data2$AlaskaL
nrc2 <- length(Xrc2)

constrc2 <- rep(1, nrc2)

q1rc2 <- data2$Q1

xregrc2 <- cbind(constrc2, q1rc2)

Then I fit a INGARCH(1,1) model with a linear link first:

fit_rc2_reg1 <- tsglm(Xrc2, model = list(past_obs = c(1),past_mean=1), xreg = xregrc2)
summary(fit_rc2_reg1)
## 
## Call:
## tsglm(ts = Xrc2, model = list(past_obs = c(1), past_mean = 1), 
##     xreg = xregrc2)
## 
## Coefficients:
##              Estimate  Std.Error  CI(lower)  CI(upper)
## (Intercept)  0.629584      0.719     -0.779      2.038
## beta_1       0.552083      0.108      0.341      0.764
## alpha_1      0.362488      0.126      0.115      0.610
## constrc2     0.000452      0.626     -1.226      1.227
## q1rc2        0.299250      0.271     -0.232      0.830
## Standard errors and confidence intervals (level =  95 %) obtained
## by normal approximation.
## 
## Link function: identity 
## Distribution family: poisson 
## Number of coefficients: 5 
## Log-likelihood: -641.2468 
## AIC: 1292.494 
## BIC: 1311.225 
## QIC: 1293.475

According to the model fit, there is no need for the Q1-dummy.

Model diagnostics:

acf(residuals(fit_rc2_reg1), main = "ACF of response residuals")

cat("Mean of Pearson residuals: ", mean(residuals(fit_rc2_reg1,type="pearson")))
## Mean of Pearson residuals:  -0.01929286
cat("Variance of Pearson residuals: ", var(residuals(fit_rc2_reg1,type = "pearson")))
## Variance of Pearson residuals:  0.2370493
pit(fit_rc2_reg1, ylim = c(0, 1.5), main = "PIT Histogram")

scoring(fit_rc2_reg1)
## logarithmic   quadratic   spherical    rankprob      dawseb      normsq 
##   2.0487119  -0.1595609  -0.4074842   0.8768263   2.2563953   0.2366642 
##     sqerror 
##   1.7873130

The model captures the serial correlation in the data surprisingly well!

One-step ahead prediction:

constf <- 1
q1f <- 0 

xcastf <- cbind(constf, q1f) 

tsglm_rc2_f <- predict(fit_rc2_reg1 ,n.ahead = 1, newxreg = xcastf)

tsglm_rc2_f$pred
## [1] 9.271769
tsglm_rc2_f$interval
##      lower upper
## [1,]     4    16

In an effort to improve the model, I now fit an INGARCH(1,14;,1) model without any regressors. The lag 14 dependent variable, is trying to capture potential stochastic seasonality:

fit_rc2_reg2 <- tsglm(Xrc2, model = list(past_obs = c(1,14),past_mean=1))
summary(fit_rc2_reg2)
## 
## Call:
## tsglm(ts = Xrc2, model = list(past_obs = c(1, 14), past_mean = 1))
## 
## Coefficients:
##              Estimate  Std.Error  CI(lower)  CI(upper)
## (Intercept)    0.3878     0.4038    -0.4036      1.179
## beta_1         0.6139     0.1107     0.3969      0.831
## beta_14        0.0354     0.0524    -0.0672      0.138
## alpha_1        0.3081     0.1297     0.0538      0.562
## Standard errors and confidence intervals (level =  95 %) obtained
## by normal approximation.
## 
## Link function: identity 
## Distribution family: poisson 
## Number of coefficients: 4 
## Log-likelihood: -641.2983 
## AIC: 1290.597 
## BIC: 1305.581 
## QIC: 1290.692

The parameter related to the stochastic seasonality is not statistically significant.

Model diagnostics:

acf(residuals(fit_rc2_reg2), main = "ACF of response residuals")

cat("Mean of Pearson residuals: ", mean(residuals(fit_rc2_reg2,type="pearson")))
## Mean of Pearson residuals:  -0.02644024
cat("Variance of Pearson residuals: ", var(residuals(fit_rc2_reg2,type = "pearson")))
## Variance of Pearson residuals:  0.2384399
pit(fit_rc2_reg2, ylim = c(0, 1.5), main = "PIT Histogram")

scoring(fit_rc2_reg2)
## logarithmic   quadratic   spherical    rankprob      dawseb      normsq 
##   2.0488764  -0.1595560  -0.4076225   0.8765324   2.2593817   0.2383772 
##     sqerror 
##   1.7810585

The residuals seem to look a bit ‘better’ in some sense. But I fail to see any real improvement here.

I see no obvious way to improve the model.

Time span 2017 to 2024

I also include the analysis based on the original time span for the rig counts, including the Covid19 period here.

datarc <- read_excel("rigcountsAlaskaL2017-24.xlsx")

mean(datarc$AlaskaL)
## [1] 6.394521
var(datarc$AlaskaL)
## [1] 5.794475
plot(datarc$AlaskaL, type="l")

forecast::Acf(datarc$AlaskaL)

forecast::Pacf(datarc$AlaskaL)

I fit a model with a Q1-dummy and a linear trend as we use it in the coconots package (note log-link is required here):

Xrc <- datarc$AlaskaL
nrc <- length(Xrc)

trendrc <- (1:nrc - nrc/2) / nrc
constrc <- rep(1, nrc)

q1rc <- datarc$Q1

xregrc <- cbind(constrc, trendrc, q1rc)


fit_rc_reg1 <- tsglm(Xrc, model = list(past_obs = c(1),past_mean=1), link="log", distr = "poisson", xreg = xregrc)


summary(fit_rc_reg1)
## 
## Call:
## tsglm(ts = Xrc, model = list(past_obs = c(1), past_mean = 1), 
##     xreg = xregrc, link = "log", distr = "poisson")
## 
## Coefficients:
##              Estimate  Std.Error  CI(lower)  CI(upper)
## (Intercept)    1.2061     0.9444   -0.64496      3.057
## beta_1         0.7641     0.1285    0.51234      1.016
## alpha_1       -0.0900     0.1689   -0.42098      0.241
## constrc       -0.7297     1.0240   -2.73669      1.277
## trendrc        0.1570     0.0838   -0.00734      0.321
## q1rc           0.0745     0.0532   -0.02981      0.179
## Standard errors and confidence intervals (level =  95 %) obtained
## by normal approximation.
## 
## Link function: log 
## Distribution family: poisson 
## Number of coefficients: 6 
## Log-likelihood: -713.1914 
## AIC: 1438.383 
## BIC: 1461.782 
## QIC: 1438.979

Hm, the alpha-parameter related to the lagged conditional mean seems not to be significant; the trend term wants to be positive (!) and the parameter associated with the first quarter is not significant.

acf(residuals(fit_rc_reg1), main = "ACF of response residuals")

pit(fit_rc_reg1, ylim = c(0, 1.5), main = "PIT Histogram")

mean(residuals(fit_rc_reg1,type="pearson"))
## [1] -0.02232376
var(residuals(fit_rc_reg1,type = "pearson"))
## [1] 0.2603702
scoring(fit_rc_reg1)
## logarithmic   quadratic   spherical    rankprob      dawseb      normsq 
##   1.9539492  -0.1781824  -0.4303744   0.8066894   2.0772362   0.2601552 
##     sqerror 
##   1.7539446

Finally, I try to include an intervention term, a dummy taking on the value 1 from the second quarter or 2020 onwards to check, if the Covid19 period want to be treated different from the previous time period.

intervention <- interv_covariate(nrc , tau = c(159), delta = c(1))
fit_rc_regi <- tsglm(Xrc, model = list(past_obs = c(1),past_mean=1), link= "log", distr = "poisson", xreg = intervention)

summary(fit_rc_regi)
## 
## Call:
## tsglm(ts = Xrc, model = list(past_obs = c(1), past_mean = 1), 
##     xreg = intervention, link = "log", distr = "poisson")
## 
## Coefficients:
##              Estimate  Std.Error  CI(lower)  CI(upper)
## (Intercept)    0.0920     0.1235    -0.1501     0.3341
## beta_1         0.7842     0.1368     0.5160     1.0524
## alpha_1        0.1222     0.1523    -0.1762     0.4206
## interv_1      -0.0268     0.0364    -0.0982     0.0446
## Standard errors and confidence intervals (level =  95 %) obtained
## by normal approximation.
## 
## Link function: log 
## Distribution family: poisson 
## Number of coefficients: 4 
## Log-likelihood: -703.6384 
## AIC: 1415.277 
## BIC: 1430.876 
## QIC: 1414.709

The intervention dummy is nowhere near a sensible significance level.

acf(residuals(fit_rc_regi), main = "ACF of response residuals")

pit(fit_rc_regi, ylim = c(0, 1.5), main = "PIT Histogram")

mean(residuals(fit_rc_regi,type="pearson"))
## [1] -0.04130746
var(residuals(fit_rc_regi,type = "pearson"))
## [1] 0.2137347
scoring(fit_rc_regi)
## logarithmic   quadratic   spherical    rankprob      dawseb      normsq 
##   1.9277765  -0.1843596  -0.4387367   0.7609530   2.0325084   0.2148555 
##     sqerror 
##   1.3042387