Merge branch 'master' into 'main' (1f2e73fe) · Commits · Robert Jung / coconots

Rnotebook-tscounts.html

0 → 100644

+2025 −0

File added.

Preview size limit exceeded, changes collapsed.

Rnotebook-tscounts.md

0 → 100644

+460 −0

Original line number	Diff line number	Diff line
		THIS VERSION: 6.12.2024

		<!--
		When using the local coconots version:
		install.packages("devtools")
		library(usethis)
		library(devtools) -->

		In this document I try to fit the UK strikes data and the rig count data
		that we propose in the JSS paper to be analysed with the
		coconots package with the tscount package.

		I start with the UK strikes data.

		Then I present an analysis of the rig counts in the time span 2014 to
		2020. The tscount package does a remarkably good job in capturing the
		dependence structure, but struggles with the slight underdispersion in
		the data. Values of the scoring rules are considerably larger (=worse)
		as those from the GPAR2 model.

		Finally, I then present results for rig count in the time span 2017 to
		2024. Here I find no specification of the models in tscount, that can
		capture the dynamics in the data satisfactorily.

		# UK Strikes Data

		data_uks <- read_excel("data_strikes.xlsx")

		I start with the UK strikes data (public sector). In
		coconots, we propose a GPAR2 model with harmonics and a
		linear trend (using a log-link).

		I generate a holdout sample of 3, and prepare the regressors first:

		hos <- 3

		X <- data_uks$public
		n <- length(X) - hos

		sin_x <- sin(2pi1:n/12)
		cos_x <- cos(2pi1:n/12)
		trend <- (1:n - n/2) / n
		const <- rep(1, n)
		xregtsc <- cbind(const, trend, cos_x, sin_x)

		Then we fit an INGARCH(1,1) type of model with a log link and negative
		binomial conditional distributional (Poisson led to a slightly u-shaped
		PIT histogram) assumption:

		uks_tsc_fit <- tsglm(data_uks$public[1:n], model = list(past_obs = c(1),past_mean=1), link = "log", distr = "nbinom", xreg = xregtsc)

		The standard regression output:

		summary(uks_tsc_fit)

		##
		## Call:
		## tsglm(ts = data_uks$public[1:n], model = list(past_obs = c(1),
		## past_mean = 1), xreg = xregtsc, link = "log", distr = "nbinom")
		##
		## Coefficients:
		## Estimate Std.Error CI(lower) CI(upper)
		## (Intercept) 1.2547 0.8288 -0.3697 2.879134
		## beta_1 0.3990 0.0670 0.2677 0.530197
		## alpha_1 0.2153 0.1382 -0.0556 0.486154
		## const -0.4992 0.6884 -1.8485 0.850020
		## trend -0.3439 0.1196 -0.5783 -0.109619
		## cos_x -0.0609 0.0309 -0.1214 -0.000343
		## sin_x 0.0165 0.0306 -0.0434 0.076477
		## sigmasq 0.0547 NA NA NA
		## Standard errors and confidence intervals (level = 95 %) obtained
		## by normal approximation.
		##
		## Link function: log
		## Distribution family: nbinom (with overdispersion coefficient 'sigmasq')
		## Number of coefficients: 8
		## Log-likelihood: -729.3632
		## AIC: 1474.726
		## BIC: 1503.974
		## QIC: 1498.696

		The overdispersion parameter is quite small and very close to zero, the
		limiting case of a Poisson distribution. Its standard error is not
		available in analytic form, but requires bootstrapping.

		I have stopped this, as even with moderate bootstrap sample sizes (500),
		the computation time is prohobitivly long.

		Diagnostics:

		acf(residuals(uks_tsc_fit), main = "ACF of response residuals")

		![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-6-1.png)

		pit(uks_tsc_fit, ylim = c(0, 1.5), main = "PIT Histogram")

		![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-6-2.png)

		cat("Mean of Pearson residuals:", mean(residuals(uks_tsc_fit,type="pearson")))

		## Mean of Pearson residuals: -0.0006482209

		cat("Variance of Pearson residuals: ", var(residuals(uks_tsc_fit,type = "pearson")))

		## Variance of Pearson residuals: 0.9789472

		scoring(uks_tsc_fit)

		## logarithmic quadratic spherical rankprob dawseb normsq
		## 2.55022104 -0.09559065 -0.30694330 1.82121573 3.35794571 0.97552471
		## sqerror
		## 11.54727482

		One-step ahead prediction:

		sin_xf <- sin(2pi(n+1)/12)
		cos_xf <- cos(2pi(n+1)/12)

		trendf <- ((n+1):(n+1) - n/2) / n #adjusted to a vector

		constf <- 1

		xcastk <- cbind(constf, trendf, cos_xf, sin_xf)

		tsglm_poi_f <- predict(uks_tsc_fit,n.ahead = 1, newxreg = xcastk)

		tsglm_poi_f$pred

		## [1] 4.648801

		tsglm_poi_f$interval

		## lower upper
		## [1,] 1 10

		There is no obvious way for me to improve the models fit.

		I tried to fit an INGARCH(1,0) as the alpha-parameter is only borderline
		significant, but that leads to issues with the variance-covariance
		matrix of the parameter vector and worse values for the scoring rules.

		I infer from this that the version of the INGARCH model with a log-link
		and neg. binomial conditional distriburtion can fit the UK strikes data
		comparably with to the GPAR2 model used in coconots, but
		based on the scroring rules, the latter would be preferred.

		# Rig Count Data

		## Data span 2014 to 2020.

		I first use the dataset 2014 to 2020 (March to March), which I propose
		to use in the JSS paper.

		data2 <- read_excel("rigcountsAlaskaL2014-20-march2march.xlsx")

		Preparation of the regressor. I start with the quarter 1 - dummy only.

		Xrc2 <- data2$AlaskaL
		nrc2 <- length(Xrc2)

		constrc2 <- rep(1, nrc2)

		q1rc2 <- data2$Q1

		xregrc2 <- cbind(constrc2, q1rc2)

		Then I fit a INGARCH(1,1) model with a linear link first:

		fit_rc2_reg1 <- tsglm(Xrc2, model = list(past_obs = c(1),past_mean=1), xreg = xregrc2)

		summary(fit_rc2_reg1)

		##
		## Call:
		## tsglm(ts = Xrc2, model = list(past_obs = c(1), past_mean = 1),
		## xreg = xregrc2)
		##
		## Coefficients:
		## Estimate Std.Error CI(lower) CI(upper)
		## (Intercept) 0.629584 0.719 -0.779 2.038
		## beta_1 0.552083 0.108 0.341 0.764
		## alpha_1 0.362488 0.126 0.115 0.610
		## constrc2 0.000452 0.626 -1.226 1.227
		## q1rc2 0.299250 0.271 -0.232 0.830
		## Standard errors and confidence intervals (level = 95 %) obtained
		## by normal approximation.
		##
		## Link function: identity
		## Distribution family: poisson
		## Number of coefficients: 5
		## Log-likelihood: -641.2468
		## AIC: 1292.494
		## BIC: 1311.225
		## QIC: 1293.475

		According to the model fit, there is no need for the Q1-dummy.

		Model diagnostics:

		acf(residuals(fit_rc2_reg1), main = "ACF of response residuals")

		![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-12-1.png)

		cat("Mean of Pearson residuals: ", mean(residuals(fit_rc2_reg1,type="pearson")))

		## Mean of Pearson residuals: -0.01929286

		cat("Variance of Pearson residuals: ", var(residuals(fit_rc2_reg1,type = "pearson")))

		## Variance of Pearson residuals: 0.2370493

		pit(fit_rc2_reg1, ylim = c(0, 1.5), main = "PIT Histogram")

		![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-12-2.png)

		scoring(fit_rc2_reg1)

		## logarithmic quadratic spherical rankprob dawseb normsq
		## 2.0487119 -0.1595609 -0.4074842 0.8768263 2.2563953 0.2366642
		## sqerror
		## 1.7873130

		The model captures the serial correlation in the data surprisingly well!

		One-step ahead prediction:

		constf <- 1
		q1f <- 0

		xcastf <- cbind(constf, q1f)

		tsglm_rc2_f <- predict(fit_rc2_reg1 ,n.ahead = 1, newxreg = xcastf)

		tsglm_rc2_f$pred

		## [1] 9.271769

		tsglm_rc2_f$interval

		## lower upper
		## [1,] 4 16

		In an effort to improve the model, I now fit an INGARCH(1,14;,1) model
		without any regressors. The lag 14 dependent variable, is trying to
		capture potential stochastic seasonality:

		fit_rc2_reg2 <- tsglm(Xrc2, model = list(past_obs = c(1,14),past_mean=1))

		summary(fit_rc2_reg2)

		##
		## Call:
		## tsglm(ts = Xrc2, model = list(past_obs = c(1, 14), past_mean = 1))
		##
		## Coefficients:
		## Estimate Std.Error CI(lower) CI(upper)
		## (Intercept) 0.3878 0.4038 -0.4036 1.179
		## beta_1 0.6139 0.1107 0.3969 0.831
		## beta_14 0.0354 0.0524 -0.0672 0.138
		## alpha_1 0.3081 0.1297 0.0538 0.562
		## Standard errors and confidence intervals (level = 95 %) obtained
		## by normal approximation.
		##
		## Link function: identity
		## Distribution family: poisson
		## Number of coefficients: 4
		## Log-likelihood: -641.2983
		## AIC: 1290.597
		## BIC: 1305.581
		## QIC: 1290.692

		The parameter related to the stochastic seasonality is not statistically
		significant.

		Model diagnostics:

		acf(residuals(fit_rc2_reg2), main = "ACF of response residuals")

		![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-16-1.png)

		cat("Mean of Pearson residuals: ", mean(residuals(fit_rc2_reg2,type="pearson")))

		## Mean of Pearson residuals: -0.02644024

		cat("Variance of Pearson residuals: ", var(residuals(fit_rc2_reg2,type = "pearson")))

		## Variance of Pearson residuals: 0.2384399

		pit(fit_rc2_reg2, ylim = c(0, 1.5), main = "PIT Histogram")

		![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-16-2.png)

		scoring(fit_rc2_reg2)

		## logarithmic quadratic spherical rankprob dawseb normsq
		## 2.0488764 -0.1595560 -0.4076225 0.8765324 2.2593817 0.2383772
		## sqerror
		## 1.7810585

		The residuals seem to look a bit ‘better’ in some sense. But I fail to
		see any real improvement here.

		I see no obvious way to improve the model.

		#

		## Time span 2017 to 2024

		I also include the analysis based on the original time span for the rig
		counts, including the Covid19 period here.

		datarc <- read_excel("rigcountsAlaskaL2017-24.xlsx")

		mean(datarc$AlaskaL)

		## [1] 6.394521

		var(datarc$AlaskaL)

		## [1] 5.794475

		plot(datarc$AlaskaL, type="l")

		![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-17-1.png)

		forecast::Acf(datarc$AlaskaL)

		![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-17-2.png)

		forecast::Pacf(datarc$AlaskaL)

		![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-17-3.png)

		I fit a model with a Q1-dummy and a linear trend as we use it in the
		coconots package (note log-link is required here):

		Xrc <- datarc$AlaskaL
		nrc <- length(Xrc)

		trendrc <- (1:nrc - nrc/2) / nrc
		constrc <- rep(1, nrc)

		q1rc <- datarc$Q1

		xregrc <- cbind(constrc, trendrc, q1rc)


		fit_rc_reg1 <- tsglm(Xrc, model = list(past_obs = c(1),past_mean=1), link="log", distr = "poisson", xreg = xregrc)


		summary(fit_rc_reg1)

		##
		## Call:
		## tsglm(ts = Xrc, model = list(past_obs = c(1), past_mean = 1),
		## xreg = xregrc, link = "log", distr = "poisson")
		##
		## Coefficients:
		## Estimate Std.Error CI(lower) CI(upper)
		## (Intercept) 1.2061 0.9444 -0.64496 3.057
		## beta_1 0.7641 0.1285 0.51234 1.016
		## alpha_1 -0.0900 0.1689 -0.42098 0.241
		## constrc -0.7297 1.0240 -2.73669 1.277
		## trendrc 0.1570 0.0838 -0.00734 0.321
		## q1rc 0.0745 0.0532 -0.02981 0.179
		## Standard errors and confidence intervals (level = 95 %) obtained
		## by normal approximation.
		##
		## Link function: log
		## Distribution family: poisson
		## Number of coefficients: 6
		## Log-likelihood: -713.1914
		## AIC: 1438.383
		## BIC: 1461.782
		## QIC: 1438.979

		Hm, the alpha-parameter related to the lagged conditional mean seems not
		to be significant; the trend term wants to be positive (!) and the
		parameter associated with the first quarter is not significant.

		acf(residuals(fit_rc_reg1), main = "ACF of response residuals")

		![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-19-1.png)

		pit(fit_rc_reg1, ylim = c(0, 1.5), main = "PIT Histogram")

		![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-19-2.png)

		mean(residuals(fit_rc_reg1,type="pearson"))

		## [1] -0.02232376

		var(residuals(fit_rc_reg1,type = "pearson"))

		## [1] 0.2603702

		scoring(fit_rc_reg1)

		## logarithmic quadratic spherical rankprob dawseb normsq
		## 1.9539492 -0.1781824 -0.4303744 0.8066894 2.0772362 0.2601552
		## sqerror
		## 1.7539446

		#

		Finally, I try to include an intervention term, a dummy taking on the
		value 1 from the second quarter or 2020 onwards to check, if the Covid19
		period want to be treated different from the previous time period.

		intervention <- interv_covariate(nrc , tau = c(159), delta = c(1))
		fit_rc_regi <- tsglm(Xrc, model = list(past_obs = c(1),past_mean=1), link= "log", distr = "poisson", xreg = intervention)

		summary(fit_rc_regi)

		##
		## Call:
		## tsglm(ts = Xrc, model = list(past_obs = c(1), past_mean = 1),
		## xreg = intervention, link = "log", distr = "poisson")
		##
		## Coefficients:
		## Estimate Std.Error CI(lower) CI(upper)
		## (Intercept) 0.0920 0.1235 -0.1501 0.3341
		## beta_1 0.7842 0.1368 0.5160 1.0524
		## alpha_1 0.1222 0.1523 -0.1762 0.4206
		## interv_1 -0.0268 0.0364 -0.0982 0.0446
		## Standard errors and confidence intervals (level = 95 %) obtained
		## by normal approximation.
		##
		## Link function: log
		## Distribution family: poisson
		## Number of coefficients: 4
		## Log-likelihood: -703.6384
		## AIC: 1415.277
		## BIC: 1430.876
		## QIC: 1414.709

		The intervention dummy is nowhere near a sensible significance level.

		acf(residuals(fit_rc_regi), main = "ACF of response residuals")

		![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-21-1.png)

		pit(fit_rc_regi, ylim = c(0, 1.5), main = "PIT Histogram")

		![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-21-2.png)

		mean(residuals(fit_rc_regi,type="pearson"))

		## [1] -0.04130746

		var(residuals(fit_rc_regi,type = "pearson"))

		## [1] 0.2137347

		scoring(fit_rc_regi)

		## logarithmic quadratic spherical rankprob dawseb normsq
		## 1.9277765 -0.1843596 -0.4387367 0.7609530 2.0325084 0.2148555
		## sqerror
		## 1.3042387

Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-12-1.png

0 → 100644

+4.3 KiB

Loading image diff...

Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-12-2.png

0 → 100644

+5.98 KiB

Loading image diff...

Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-16-1.png

0 → 100644

+4.23 KiB

Loading image diff...