diff --git a/Rnotebook-tscounts.html b/Rnotebook-tscounts.html new file mode 100644 index 0000000000000000000000000000000000000000..4b530978da3be9ad11ef3db93dd44b43a7d575db --- /dev/null +++ b/Rnotebook-tscounts.html @@ -0,0 +1,2025 @@ + + + + +
+ + + + + + + + +THIS VERSION: 6.12.2024
+ +In this document I try to fit the UK strikes data and the rig count +data that we propose in the JSS paper to be analysed with the \(coconots\) package with the \(tscount\) package.
+I start with the UK strikes data.
+Then I present an analysis of the rig counts in the time span 2014 to +2020. The tscount package does a remarkably good job in capturing the +dependence structure, but struggles with the slight underdispersion in +the data. Values of the scoring rules are considerably larger (=worse) +as those from the GPAR2 model.
+Finally, I then present results for rig count in the time span 2017 +to 2024. Here I find no specification of the models in tscount, that can +capture the dynamics in the data satisfactorily.
+data_uks <- read_excel("data_strikes.xlsx")
+I start with the UK strikes data (public sector). In \(coconots\), we propose a GPAR2 model with +harmonics and a linear trend (using a log-link).
+I generate a holdout sample of 3, and prepare the regressors +first:
+hos <- 3
+
+X <- data_uks$public
+n <- length(X) - hos
+
+sin_x <- sin(2*pi*1:n/12)
+cos_x <- cos(2*pi*1:n/12)
+trend <- (1:n - n/2) / n
+const <- rep(1, n)
+xregtsc <- cbind(const, trend, cos_x, sin_x)
+Then we fit an INGARCH(1,1) type of model with a log link and +negative binomial conditional distributional (Poisson led to a slightly +u-shaped PIT histogram) assumption:
+uks_tsc_fit <- tsglm(data_uks$public[1:n], model = list(past_obs = c(1),past_mean=1), link = "log", distr = "nbinom", xreg = xregtsc)
+The standard regression output:
+summary(uks_tsc_fit)
+##
+## Call:
+## tsglm(ts = data_uks$public[1:n], model = list(past_obs = c(1),
+## past_mean = 1), xreg = xregtsc, link = "log", distr = "nbinom")
+##
+## Coefficients:
+## Estimate Std.Error CI(lower) CI(upper)
+## (Intercept) 1.2547 0.8288 -0.3697 2.879134
+## beta_1 0.3990 0.0670 0.2677 0.530197
+## alpha_1 0.2153 0.1382 -0.0556 0.486154
+## const -0.4992 0.6884 -1.8485 0.850020
+## trend -0.3439 0.1196 -0.5783 -0.109619
+## cos_x -0.0609 0.0309 -0.1214 -0.000343
+## sin_x 0.0165 0.0306 -0.0434 0.076477
+## sigmasq 0.0547 NA NA NA
+## Standard errors and confidence intervals (level = 95 %) obtained
+## by normal approximation.
+##
+## Link function: log
+## Distribution family: nbinom (with overdispersion coefficient 'sigmasq')
+## Number of coefficients: 8
+## Log-likelihood: -729.3632
+## AIC: 1474.726
+## BIC: 1503.974
+## QIC: 1498.696
+The overdispersion parameter is quite small and very close to zero, +the limiting case of a Poisson distribution. Its standard error is not +available in analytic form, but requires bootstrapping.
+I have stopped this, as even with moderate bootstrap sample sizes +(500), the computation time is prohobitivly long.
+Diagnostics:
+acf(residuals(uks_tsc_fit), main = "ACF of response residuals")
+
+pit(uks_tsc_fit, ylim = c(0, 1.5), main = "PIT Histogram")
+
+cat("Mean of Pearson residuals:", mean(residuals(uks_tsc_fit,type="pearson")))
+## Mean of Pearson residuals: -0.0006482209
+cat("Variance of Pearson residuals: ", var(residuals(uks_tsc_fit,type = "pearson")))
+## Variance of Pearson residuals: 0.9789472
+scoring(uks_tsc_fit)
+## logarithmic quadratic spherical rankprob dawseb normsq
+## 2.55022104 -0.09559065 -0.30694330 1.82121573 3.35794571 0.97552471
+## sqerror
+## 11.54727482
+One-step ahead prediction:
+sin_xf <- sin(2*pi*(n+1)/12)
+cos_xf <- cos(2*pi*(n+1)/12)
+
+trendf <- ((n+1):(n+1) - n/2) / n #adjusted to a vector
+
+constf <- 1
+
+xcastk <- cbind(constf, trendf, cos_xf, sin_xf)
+
+tsglm_poi_f <- predict(uks_tsc_fit,n.ahead = 1, newxreg = xcastk)
+
+tsglm_poi_f$pred
+## [1] 4.648801
+tsglm_poi_f$interval
+## lower upper
+## [1,] 1 10
+There is no obvious way for me to improve the models fit.
+I tried to fit an INGARCH(1,0) as the alpha-parameter is only +borderline significant, but that leads to issues with the +variance-covariance matrix of the parameter vector and worse values for +the scoring rules.
+I infer from this that the version of the INGARCH model with a +log-link and neg. binomial conditional distriburtion can fit the UK +strikes data comparably with to the GPAR2 model used in \(coconots\), but based on the scroring +rules, the latter would be preferred.
+I first use the dataset 2014 to 2020 (March to March), which I +propose to use in the JSS paper.
+data2 <- read_excel("rigcountsAlaskaL2014-20-march2march.xlsx")
+Preparation of the regressor. I start with the quarter 1 - dummy +only.
+Xrc2 <- data2$AlaskaL
+nrc2 <- length(Xrc2)
+
+constrc2 <- rep(1, nrc2)
+
+q1rc2 <- data2$Q1
+
+xregrc2 <- cbind(constrc2, q1rc2)
+Then I fit a INGARCH(1,1) model with a linear link first:
+fit_rc2_reg1 <- tsglm(Xrc2, model = list(past_obs = c(1),past_mean=1), xreg = xregrc2)
+summary(fit_rc2_reg1)
+##
+## Call:
+## tsglm(ts = Xrc2, model = list(past_obs = c(1), past_mean = 1),
+## xreg = xregrc2)
+##
+## Coefficients:
+## Estimate Std.Error CI(lower) CI(upper)
+## (Intercept) 0.629584 0.719 -0.779 2.038
+## beta_1 0.552083 0.108 0.341 0.764
+## alpha_1 0.362488 0.126 0.115 0.610
+## constrc2 0.000452 0.626 -1.226 1.227
+## q1rc2 0.299250 0.271 -0.232 0.830
+## Standard errors and confidence intervals (level = 95 %) obtained
+## by normal approximation.
+##
+## Link function: identity
+## Distribution family: poisson
+## Number of coefficients: 5
+## Log-likelihood: -641.2468
+## AIC: 1292.494
+## BIC: 1311.225
+## QIC: 1293.475
+According to the model fit, there is no need for the Q1-dummy.
+Model diagnostics:
+acf(residuals(fit_rc2_reg1), main = "ACF of response residuals")
+
+cat("Mean of Pearson residuals: ", mean(residuals(fit_rc2_reg1,type="pearson")))
+## Mean of Pearson residuals: -0.01929286
+cat("Variance of Pearson residuals: ", var(residuals(fit_rc2_reg1,type = "pearson")))
+## Variance of Pearson residuals: 0.2370493
+pit(fit_rc2_reg1, ylim = c(0, 1.5), main = "PIT Histogram")
+
+scoring(fit_rc2_reg1)
+## logarithmic quadratic spherical rankprob dawseb normsq
+## 2.0487119 -0.1595609 -0.4074842 0.8768263 2.2563953 0.2366642
+## sqerror
+## 1.7873130
+The model captures the serial correlation in the data surprisingly +well!
+One-step ahead prediction:
+constf <- 1
+q1f <- 0
+
+xcastf <- cbind(constf, q1f)
+
+tsglm_rc2_f <- predict(fit_rc2_reg1 ,n.ahead = 1, newxreg = xcastf)
+
+tsglm_rc2_f$pred
+## [1] 9.271769
+tsglm_rc2_f$interval
+## lower upper
+## [1,] 4 16
+In an effort to improve the model, I now fit an INGARCH(1,14;,1) +model without any regressors. The lag 14 dependent variable, is trying +to capture potential stochastic seasonality:
+fit_rc2_reg2 <- tsglm(Xrc2, model = list(past_obs = c(1,14),past_mean=1))
+summary(fit_rc2_reg2)
+##
+## Call:
+## tsglm(ts = Xrc2, model = list(past_obs = c(1, 14), past_mean = 1))
+##
+## Coefficients:
+## Estimate Std.Error CI(lower) CI(upper)
+## (Intercept) 0.3878 0.4038 -0.4036 1.179
+## beta_1 0.6139 0.1107 0.3969 0.831
+## beta_14 0.0354 0.0524 -0.0672 0.138
+## alpha_1 0.3081 0.1297 0.0538 0.562
+## Standard errors and confidence intervals (level = 95 %) obtained
+## by normal approximation.
+##
+## Link function: identity
+## Distribution family: poisson
+## Number of coefficients: 4
+## Log-likelihood: -641.2983
+## AIC: 1290.597
+## BIC: 1305.581
+## QIC: 1290.692
+The parameter related to the stochastic seasonality is not +statistically significant.
+Model diagnostics:
+acf(residuals(fit_rc2_reg2), main = "ACF of response residuals")
+
+cat("Mean of Pearson residuals: ", mean(residuals(fit_rc2_reg2,type="pearson")))
+## Mean of Pearson residuals: -0.02644024
+cat("Variance of Pearson residuals: ", var(residuals(fit_rc2_reg2,type = "pearson")))
+## Variance of Pearson residuals: 0.2384399
+pit(fit_rc2_reg2, ylim = c(0, 1.5), main = "PIT Histogram")
+
+scoring(fit_rc2_reg2)
+## logarithmic quadratic spherical rankprob dawseb normsq
+## 2.0488764 -0.1595560 -0.4076225 0.8765324 2.2593817 0.2383772
+## sqerror
+## 1.7810585
+The residuals seem to look a bit ‘better’ in some sense. But I fail +to see any real improvement here.
+I see no obvious way to improve the model.
+I also include the analysis based on the original time span for the +rig counts, including the Covid19 period here.
+datarc <- read_excel("rigcountsAlaskaL2017-24.xlsx")
+
+mean(datarc$AlaskaL)
+## [1] 6.394521
+var(datarc$AlaskaL)
+## [1] 5.794475
+plot(datarc$AlaskaL, type="l")
+
+forecast::Acf(datarc$AlaskaL)
+
+forecast::Pacf(datarc$AlaskaL)
+
+I fit a model with a Q1-dummy and a linear trend as we use it in the +coconots package (note log-link is required here):
+Xrc <- datarc$AlaskaL
+nrc <- length(Xrc)
+
+trendrc <- (1:nrc - nrc/2) / nrc
+constrc <- rep(1, nrc)
+
+q1rc <- datarc$Q1
+
+xregrc <- cbind(constrc, trendrc, q1rc)
+
+
+fit_rc_reg1 <- tsglm(Xrc, model = list(past_obs = c(1),past_mean=1), link="log", distr = "poisson", xreg = xregrc)
+
+
+summary(fit_rc_reg1)
+##
+## Call:
+## tsglm(ts = Xrc, model = list(past_obs = c(1), past_mean = 1),
+## xreg = xregrc, link = "log", distr = "poisson")
+##
+## Coefficients:
+## Estimate Std.Error CI(lower) CI(upper)
+## (Intercept) 1.2061 0.9444 -0.64496 3.057
+## beta_1 0.7641 0.1285 0.51234 1.016
+## alpha_1 -0.0900 0.1689 -0.42098 0.241
+## constrc -0.7297 1.0240 -2.73669 1.277
+## trendrc 0.1570 0.0838 -0.00734 0.321
+## q1rc 0.0745 0.0532 -0.02981 0.179
+## Standard errors and confidence intervals (level = 95 %) obtained
+## by normal approximation.
+##
+## Link function: log
+## Distribution family: poisson
+## Number of coefficients: 6
+## Log-likelihood: -713.1914
+## AIC: 1438.383
+## BIC: 1461.782
+## QIC: 1438.979
+Hm, the alpha-parameter related to the lagged conditional mean seems +not to be significant; the trend term wants to be positive (!) and the +parameter associated with the first quarter is not significant.
+acf(residuals(fit_rc_reg1), main = "ACF of response residuals")
+
+pit(fit_rc_reg1, ylim = c(0, 1.5), main = "PIT Histogram")
+
+mean(residuals(fit_rc_reg1,type="pearson"))
+## [1] -0.02232376
+var(residuals(fit_rc_reg1,type = "pearson"))
+## [1] 0.2603702
+scoring(fit_rc_reg1)
+## logarithmic quadratic spherical rankprob dawseb normsq
+## 1.9539492 -0.1781824 -0.4303744 0.8066894 2.0772362 0.2601552
+## sqerror
+## 1.7539446
+Finally, I try to include an intervention term, a dummy taking on the +value 1 from the second quarter or 2020 onwards to check, if the Covid19 +period want to be treated different from the previous time period.
+intervention <- interv_covariate(nrc , tau = c(159), delta = c(1))
+fit_rc_regi <- tsglm(Xrc, model = list(past_obs = c(1),past_mean=1), link= "log", distr = "poisson", xreg = intervention)
+
+summary(fit_rc_regi)
+##
+## Call:
+## tsglm(ts = Xrc, model = list(past_obs = c(1), past_mean = 1),
+## xreg = intervention, link = "log", distr = "poisson")
+##
+## Coefficients:
+## Estimate Std.Error CI(lower) CI(upper)
+## (Intercept) 0.0920 0.1235 -0.1501 0.3341
+## beta_1 0.7842 0.1368 0.5160 1.0524
+## alpha_1 0.1222 0.1523 -0.1762 0.4206
+## interv_1 -0.0268 0.0364 -0.0982 0.0446
+## Standard errors and confidence intervals (level = 95 %) obtained
+## by normal approximation.
+##
+## Link function: log
+## Distribution family: poisson
+## Number of coefficients: 4
+## Log-likelihood: -703.6384
+## AIC: 1415.277
+## BIC: 1430.876
+## QIC: 1414.709
+The intervention dummy is nowhere near a sensible significance +level.
+acf(residuals(fit_rc_regi), main = "ACF of response residuals")
+
+pit(fit_rc_regi, ylim = c(0, 1.5), main = "PIT Histogram")
+
+mean(residuals(fit_rc_regi,type="pearson"))
+## [1] -0.04130746
+var(residuals(fit_rc_regi,type = "pearson"))
+## [1] 0.2137347
+scoring(fit_rc_regi)
+## logarithmic quadratic spherical rankprob dawseb normsq
+## 1.9277765 -0.1843596 -0.4387367 0.7609530 2.0325084 0.2148555
+## sqerror
+## 1.3042387
+