diff --git a/Rnotebook-tscounts.html b/Rnotebook-tscounts.html new file mode 100644 index 0000000000000000000000000000000000000000..4b530978da3be9ad11ef3db93dd44b43a7d575db --- /dev/null +++ b/Rnotebook-tscounts.html @@ -0,0 +1,2025 @@ + + + + + + + + + + + + + +R Notebook tscounts + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + +

THIS VERSION: 6.12.2024

+ +

In this document I try to fit the UK strikes data and the rig count +data that we propose in the JSS paper to be analysed with the \(coconots\) package with the \(tscount\) package.

+

I start with the UK strikes data.

+

Then I present an analysis of the rig counts in the time span 2014 to +2020. The tscount package does a remarkably good job in capturing the +dependence structure, but struggles with the slight underdispersion in +the data. Values of the scoring rules are considerably larger (=worse) +as those from the GPAR2 model.

+

Finally, I then present results for rig count in the time span 2017 +to 2024. Here I find no specification of the models in tscount, that can +capture the dynamics in the data satisfactorily.

+
+

UK Strikes Data

+
data_uks <- read_excel("data_strikes.xlsx")
+

I start with the UK strikes data (public sector). In \(coconots\), we propose a GPAR2 model with +harmonics and a linear trend (using a log-link).

+

I generate a holdout sample of 3, and prepare the regressors +first:

+
hos <- 3
+
+X <- data_uks$public
+n <- length(X) - hos
+
+sin_x <- sin(2*pi*1:n/12)
+cos_x <- cos(2*pi*1:n/12)
+trend <- (1:n - n/2) / n
+const <- rep(1, n)
+xregtsc <- cbind(const, trend, cos_x, sin_x)
+

Then we fit an INGARCH(1,1) type of model with a log link and +negative binomial conditional distributional (Poisson led to a slightly +u-shaped PIT histogram) assumption:

+
uks_tsc_fit <- tsglm(data_uks$public[1:n], model = list(past_obs = c(1),past_mean=1), link = "log", distr = "nbinom", xreg = xregtsc)
+

The standard regression output:

+
summary(uks_tsc_fit)
+
## 
+## Call:
+## tsglm(ts = data_uks$public[1:n], model = list(past_obs = c(1), 
+##     past_mean = 1), xreg = xregtsc, link = "log", distr = "nbinom")
+## 
+## Coefficients:
+##              Estimate  Std.Error  CI(lower)  CI(upper)
+## (Intercept)    1.2547     0.8288    -0.3697   2.879134
+## beta_1         0.3990     0.0670     0.2677   0.530197
+## alpha_1        0.2153     0.1382    -0.0556   0.486154
+## const         -0.4992     0.6884    -1.8485   0.850020
+## trend         -0.3439     0.1196    -0.5783  -0.109619
+## cos_x         -0.0609     0.0309    -0.1214  -0.000343
+## sin_x          0.0165     0.0306    -0.0434   0.076477
+## sigmasq        0.0547         NA         NA         NA
+## Standard errors and confidence intervals (level =  95 %) obtained
+## by normal approximation.
+## 
+## Link function: log 
+## Distribution family: nbinom (with overdispersion coefficient 'sigmasq') 
+## Number of coefficients: 8 
+## Log-likelihood: -729.3632 
+## AIC: 1474.726 
+## BIC: 1503.974 
+## QIC: 1498.696
+

The overdispersion parameter is quite small and very close to zero, +the limiting case of a Poisson distribution. Its standard error is not +available in analytic form, but requires bootstrapping.

+

I have stopped this, as even with moderate bootstrap sample sizes +(500), the computation time is prohobitivly long.

+

Diagnostics:

+
acf(residuals(uks_tsc_fit), main = "ACF of response residuals")
+

+
pit(uks_tsc_fit, ylim = c(0, 1.5), main = "PIT Histogram")
+

+
cat("Mean of Pearson residuals:", mean(residuals(uks_tsc_fit,type="pearson")))
+
## Mean of Pearson residuals: -0.0006482209
+
cat("Variance of Pearson residuals: ", var(residuals(uks_tsc_fit,type = "pearson")))
+
## Variance of Pearson residuals:  0.9789472
+
scoring(uks_tsc_fit)
+
## logarithmic   quadratic   spherical    rankprob      dawseb      normsq 
+##  2.55022104 -0.09559065 -0.30694330  1.82121573  3.35794571  0.97552471 
+##     sqerror 
+## 11.54727482
+

One-step ahead prediction:

+
sin_xf <- sin(2*pi*(n+1)/12)
+cos_xf <- cos(2*pi*(n+1)/12)
+
+trendf <- ((n+1):(n+1) - n/2) / n #adjusted to a vector
+
+constf <- 1
+
+xcastk <- cbind(constf, trendf, cos_xf, sin_xf) 
+
+tsglm_poi_f <- predict(uks_tsc_fit,n.ahead = 1, newxreg = xcastk)
+
+tsglm_poi_f$pred
+
## [1] 4.648801
+
tsglm_poi_f$interval
+
##      lower upper
+## [1,]     1    10
+

There is no obvious way for me to improve the models fit.

+

I tried to fit an INGARCH(1,0) as the alpha-parameter is only +borderline significant, but that leads to issues with the +variance-covariance matrix of the parameter vector and worse values for +the scoring rules.

+

I infer from this that the version of the INGARCH model with a +log-link and neg. binomial conditional distriburtion can fit the UK +strikes data comparably with to the GPAR2 model used in \(coconots\), but based on the scroring +rules, the latter would be preferred.

+
+
+

Rig Count Data

+
+

Data span 2014 to 2020.

+

I first use the dataset 2014 to 2020 (March to March), which I +propose to use in the JSS paper.

+
data2 <- read_excel("rigcountsAlaskaL2014-20-march2march.xlsx")
+

Preparation of the regressor. I start with the quarter 1 - dummy +only.

+
Xrc2 <- data2$AlaskaL
+nrc2 <- length(Xrc2)
+
+constrc2 <- rep(1, nrc2)
+
+q1rc2 <- data2$Q1
+
+xregrc2 <- cbind(constrc2, q1rc2)
+

Then I fit a INGARCH(1,1) model with a linear link first:

+
fit_rc2_reg1 <- tsglm(Xrc2, model = list(past_obs = c(1),past_mean=1), xreg = xregrc2)
+
summary(fit_rc2_reg1)
+
## 
+## Call:
+## tsglm(ts = Xrc2, model = list(past_obs = c(1), past_mean = 1), 
+##     xreg = xregrc2)
+## 
+## Coefficients:
+##              Estimate  Std.Error  CI(lower)  CI(upper)
+## (Intercept)  0.629584      0.719     -0.779      2.038
+## beta_1       0.552083      0.108      0.341      0.764
+## alpha_1      0.362488      0.126      0.115      0.610
+## constrc2     0.000452      0.626     -1.226      1.227
+## q1rc2        0.299250      0.271     -0.232      0.830
+## Standard errors and confidence intervals (level =  95 %) obtained
+## by normal approximation.
+## 
+## Link function: identity 
+## Distribution family: poisson 
+## Number of coefficients: 5 
+## Log-likelihood: -641.2468 
+## AIC: 1292.494 
+## BIC: 1311.225 
+## QIC: 1293.475
+

According to the model fit, there is no need for the Q1-dummy.

+

Model diagnostics:

+
acf(residuals(fit_rc2_reg1), main = "ACF of response residuals")
+

+
cat("Mean of Pearson residuals: ", mean(residuals(fit_rc2_reg1,type="pearson")))
+
## Mean of Pearson residuals:  -0.01929286
+
cat("Variance of Pearson residuals: ", var(residuals(fit_rc2_reg1,type = "pearson")))
+
## Variance of Pearson residuals:  0.2370493
+
pit(fit_rc2_reg1, ylim = c(0, 1.5), main = "PIT Histogram")
+

+
scoring(fit_rc2_reg1)
+
## logarithmic   quadratic   spherical    rankprob      dawseb      normsq 
+##   2.0487119  -0.1595609  -0.4074842   0.8768263   2.2563953   0.2366642 
+##     sqerror 
+##   1.7873130
+

The model captures the serial correlation in the data surprisingly +well!

+

One-step ahead prediction:

+
constf <- 1
+q1f <- 0 
+
+xcastf <- cbind(constf, q1f) 
+
+tsglm_rc2_f <- predict(fit_rc2_reg1 ,n.ahead = 1, newxreg = xcastf)
+
+tsglm_rc2_f$pred
+
## [1] 9.271769
+
tsglm_rc2_f$interval
+
##      lower upper
+## [1,]     4    16
+

In an effort to improve the model, I now fit an INGARCH(1,14;,1) +model without any regressors. The lag 14 dependent variable, is trying +to capture potential stochastic seasonality:

+
fit_rc2_reg2 <- tsglm(Xrc2, model = list(past_obs = c(1,14),past_mean=1))
+
summary(fit_rc2_reg2)
+
## 
+## Call:
+## tsglm(ts = Xrc2, model = list(past_obs = c(1, 14), past_mean = 1))
+## 
+## Coefficients:
+##              Estimate  Std.Error  CI(lower)  CI(upper)
+## (Intercept)    0.3878     0.4038    -0.4036      1.179
+## beta_1         0.6139     0.1107     0.3969      0.831
+## beta_14        0.0354     0.0524    -0.0672      0.138
+## alpha_1        0.3081     0.1297     0.0538      0.562
+## Standard errors and confidence intervals (level =  95 %) obtained
+## by normal approximation.
+## 
+## Link function: identity 
+## Distribution family: poisson 
+## Number of coefficients: 4 
+## Log-likelihood: -641.2983 
+## AIC: 1290.597 
+## BIC: 1305.581 
+## QIC: 1290.692
+

The parameter related to the stochastic seasonality is not +statistically significant.

+

Model diagnostics:

+
acf(residuals(fit_rc2_reg2), main = "ACF of response residuals")
+

+
cat("Mean of Pearson residuals: ", mean(residuals(fit_rc2_reg2,type="pearson")))
+
## Mean of Pearson residuals:  -0.02644024
+
cat("Variance of Pearson residuals: ", var(residuals(fit_rc2_reg2,type = "pearson")))
+
## Variance of Pearson residuals:  0.2384399
+
pit(fit_rc2_reg2, ylim = c(0, 1.5), main = "PIT Histogram")
+

+
scoring(fit_rc2_reg2)
+
## logarithmic   quadratic   spherical    rankprob      dawseb      normsq 
+##   2.0488764  -0.1595560  -0.4076225   0.8765324   2.2593817   0.2383772 
+##     sqerror 
+##   1.7810585
+

The residuals seem to look a bit ‘better’ in some sense. But I fail +to see any real improvement here.

+

I see no obvious way to improve the model.

+
+
+
+

+
+

Time span 2017 to 2024

+

I also include the analysis based on the original time span for the +rig counts, including the Covid19 period here.

+
datarc <- read_excel("rigcountsAlaskaL2017-24.xlsx")
+
+mean(datarc$AlaskaL)
+
## [1] 6.394521
+
var(datarc$AlaskaL)
+
## [1] 5.794475
+
plot(datarc$AlaskaL, type="l")
+

+
forecast::Acf(datarc$AlaskaL)
+

+
forecast::Pacf(datarc$AlaskaL)
+

+

I fit a model with a Q1-dummy and a linear trend as we use it in the +coconots package (note log-link is required here):

+
Xrc <- datarc$AlaskaL
+nrc <- length(Xrc)
+
+trendrc <- (1:nrc - nrc/2) / nrc
+constrc <- rep(1, nrc)
+
+q1rc <- datarc$Q1
+
+xregrc <- cbind(constrc, trendrc, q1rc)
+
+
+fit_rc_reg1 <- tsglm(Xrc, model = list(past_obs = c(1),past_mean=1), link="log", distr = "poisson", xreg = xregrc)
+
+
+summary(fit_rc_reg1)
+
## 
+## Call:
+## tsglm(ts = Xrc, model = list(past_obs = c(1), past_mean = 1), 
+##     xreg = xregrc, link = "log", distr = "poisson")
+## 
+## Coefficients:
+##              Estimate  Std.Error  CI(lower)  CI(upper)
+## (Intercept)    1.2061     0.9444   -0.64496      3.057
+## beta_1         0.7641     0.1285    0.51234      1.016
+## alpha_1       -0.0900     0.1689   -0.42098      0.241
+## constrc       -0.7297     1.0240   -2.73669      1.277
+## trendrc        0.1570     0.0838   -0.00734      0.321
+## q1rc           0.0745     0.0532   -0.02981      0.179
+## Standard errors and confidence intervals (level =  95 %) obtained
+## by normal approximation.
+## 
+## Link function: log 
+## Distribution family: poisson 
+## Number of coefficients: 6 
+## Log-likelihood: -713.1914 
+## AIC: 1438.383 
+## BIC: 1461.782 
+## QIC: 1438.979
+

Hm, the alpha-parameter related to the lagged conditional mean seems +not to be significant; the trend term wants to be positive (!) and the +parameter associated with the first quarter is not significant.

+
acf(residuals(fit_rc_reg1), main = "ACF of response residuals")
+

+
pit(fit_rc_reg1, ylim = c(0, 1.5), main = "PIT Histogram")
+

+
mean(residuals(fit_rc_reg1,type="pearson"))
+
## [1] -0.02232376
+
var(residuals(fit_rc_reg1,type = "pearson"))
+
## [1] 0.2603702
+
scoring(fit_rc_reg1)
+
## logarithmic   quadratic   spherical    rankprob      dawseb      normsq 
+##   1.9539492  -0.1781824  -0.4303744   0.8066894   2.0772362   0.2601552 
+##     sqerror 
+##   1.7539446
+
+
+
+

+

Finally, I try to include an intervention term, a dummy taking on the +value 1 from the second quarter or 2020 onwards to check, if the Covid19 +period want to be treated different from the previous time period.

+
intervention <- interv_covariate(nrc , tau = c(159), delta = c(1))
+fit_rc_regi <- tsglm(Xrc, model = list(past_obs = c(1),past_mean=1), link= "log", distr = "poisson", xreg = intervention)
+
+summary(fit_rc_regi)
+
## 
+## Call:
+## tsglm(ts = Xrc, model = list(past_obs = c(1), past_mean = 1), 
+##     xreg = intervention, link = "log", distr = "poisson")
+## 
+## Coefficients:
+##              Estimate  Std.Error  CI(lower)  CI(upper)
+## (Intercept)    0.0920     0.1235    -0.1501     0.3341
+## beta_1         0.7842     0.1368     0.5160     1.0524
+## alpha_1        0.1222     0.1523    -0.1762     0.4206
+## interv_1      -0.0268     0.0364    -0.0982     0.0446
+## Standard errors and confidence intervals (level =  95 %) obtained
+## by normal approximation.
+## 
+## Link function: log 
+## Distribution family: poisson 
+## Number of coefficients: 4 
+## Log-likelihood: -703.6384 
+## AIC: 1415.277 
+## BIC: 1430.876 
+## QIC: 1414.709
+

The intervention dummy is nowhere near a sensible significance +level.

+
acf(residuals(fit_rc_regi), main = "ACF of response residuals")
+

+
pit(fit_rc_regi, ylim = c(0, 1.5), main = "PIT Histogram")
+

+
mean(residuals(fit_rc_regi,type="pearson"))
+
## [1] -0.04130746
+
var(residuals(fit_rc_regi,type = "pearson"))
+
## [1] 0.2137347
+
scoring(fit_rc_regi)
+
## logarithmic   quadratic   spherical    rankprob      dawseb      normsq 
+##   1.9277765  -0.1843596  -0.4387367   0.7609530   2.0325084   0.2148555 
+##     sqerror 
+##   1.3042387
+
+ + + + +
+ + + + + + + + + + + + + + + diff --git a/Rnotebook-tscounts.md b/Rnotebook-tscounts.md new file mode 100644 index 0000000000000000000000000000000000000000..d2d057b3f9718c7cb4f233f614a9a9b7755bac71 --- /dev/null +++ b/Rnotebook-tscounts.md @@ -0,0 +1,460 @@ +THIS VERSION: 6.12.2024 + + + +In this document I try to fit the UK strikes data and the rig count data +that we propose in the JSS paper to be analysed with the +*c**o**c**o**n**o**t**s* package with the *t**s**c**o**u**n**t* package. + +I start with the UK strikes data. + +Then I present an analysis of the rig counts in the time span 2014 to +2020. The tscount package does a remarkably good job in capturing the +dependence structure, but struggles with the slight underdispersion in +the data. Values of the scoring rules are considerably larger (=worse) +as those from the GPAR2 model. + +Finally, I then present results for rig count in the time span 2017 to +2024. Here I find no specification of the models in tscount, that can +capture the dynamics in the data satisfactorily. + +# UK Strikes Data + + data_uks <- read_excel("data_strikes.xlsx") + +I start with the UK strikes data (public sector). In +*c**o**c**o**n**o**t**s*, we propose a GPAR2 model with harmonics and a +linear trend (using a log-link). + +I generate a holdout sample of 3, and prepare the regressors first: + + hos <- 3 + + X <- data_uks$public + n <- length(X) - hos + + sin_x <- sin(2*pi*1:n/12) + cos_x <- cos(2*pi*1:n/12) + trend <- (1:n - n/2) / n + const <- rep(1, n) + xregtsc <- cbind(const, trend, cos_x, sin_x) + +Then we fit an INGARCH(1,1) type of model with a log link and negative +binomial conditional distributional (Poisson led to a slightly u-shaped +PIT histogram) assumption: + + uks_tsc_fit <- tsglm(data_uks$public[1:n], model = list(past_obs = c(1),past_mean=1), link = "log", distr = "nbinom", xreg = xregtsc) + +The standard regression output: + + summary(uks_tsc_fit) + + ## + ## Call: + ## tsglm(ts = data_uks$public[1:n], model = list(past_obs = c(1), + ## past_mean = 1), xreg = xregtsc, link = "log", distr = "nbinom") + ## + ## Coefficients: + ## Estimate Std.Error CI(lower) CI(upper) + ## (Intercept) 1.2547 0.8288 -0.3697 2.879134 + ## beta_1 0.3990 0.0670 0.2677 0.530197 + ## alpha_1 0.2153 0.1382 -0.0556 0.486154 + ## const -0.4992 0.6884 -1.8485 0.850020 + ## trend -0.3439 0.1196 -0.5783 -0.109619 + ## cos_x -0.0609 0.0309 -0.1214 -0.000343 + ## sin_x 0.0165 0.0306 -0.0434 0.076477 + ## sigmasq 0.0547 NA NA NA + ## Standard errors and confidence intervals (level = 95 %) obtained + ## by normal approximation. + ## + ## Link function: log + ## Distribution family: nbinom (with overdispersion coefficient 'sigmasq') + ## Number of coefficients: 8 + ## Log-likelihood: -729.3632 + ## AIC: 1474.726 + ## BIC: 1503.974 + ## QIC: 1498.696 + +The overdispersion parameter is quite small and very close to zero, the +limiting case of a Poisson distribution. Its standard error is not +available in analytic form, but requires bootstrapping. + +I have stopped this, as even with moderate bootstrap sample sizes (500), +the computation time is prohobitivly long. + +Diagnostics: + + acf(residuals(uks_tsc_fit), main = "ACF of response residuals") + +![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-6-1.png) + + pit(uks_tsc_fit, ylim = c(0, 1.5), main = "PIT Histogram") + +![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-6-2.png) + + cat("Mean of Pearson residuals:", mean(residuals(uks_tsc_fit,type="pearson"))) + + ## Mean of Pearson residuals: -0.0006482209 + + cat("Variance of Pearson residuals: ", var(residuals(uks_tsc_fit,type = "pearson"))) + + ## Variance of Pearson residuals: 0.9789472 + + scoring(uks_tsc_fit) + + ## logarithmic quadratic spherical rankprob dawseb normsq + ## 2.55022104 -0.09559065 -0.30694330 1.82121573 3.35794571 0.97552471 + ## sqerror + ## 11.54727482 + +One-step ahead prediction: + + sin_xf <- sin(2*pi*(n+1)/12) + cos_xf <- cos(2*pi*(n+1)/12) + + trendf <- ((n+1):(n+1) - n/2) / n #adjusted to a vector + + constf <- 1 + + xcastk <- cbind(constf, trendf, cos_xf, sin_xf) + + tsglm_poi_f <- predict(uks_tsc_fit,n.ahead = 1, newxreg = xcastk) + + tsglm_poi_f$pred + + ## [1] 4.648801 + + tsglm_poi_f$interval + + ## lower upper + ## [1,] 1 10 + +There is no obvious way for me to improve the models fit. + +I tried to fit an INGARCH(1,0) as the alpha-parameter is only borderline +significant, but that leads to issues with the variance-covariance +matrix of the parameter vector and worse values for the scoring rules. + +I infer from this that the version of the INGARCH model with a log-link +and neg. binomial conditional distriburtion can fit the UK strikes data +comparably with to the GPAR2 model used in *c**o**c**o**n**o**t**s*, but +based on the scroring rules, the latter would be preferred. + +# Rig Count Data + +## Data span 2014 to 2020. + +I first use the dataset 2014 to 2020 (March to March), which I propose +to use in the JSS paper. + + data2 <- read_excel("rigcountsAlaskaL2014-20-march2march.xlsx") + +Preparation of the regressor. I start with the quarter 1 - dummy only. + + Xrc2 <- data2$AlaskaL + nrc2 <- length(Xrc2) + + constrc2 <- rep(1, nrc2) + + q1rc2 <- data2$Q1 + + xregrc2 <- cbind(constrc2, q1rc2) + +Then I fit a INGARCH(1,1) model with a linear link first: + + fit_rc2_reg1 <- tsglm(Xrc2, model = list(past_obs = c(1),past_mean=1), xreg = xregrc2) + + summary(fit_rc2_reg1) + + ## + ## Call: + ## tsglm(ts = Xrc2, model = list(past_obs = c(1), past_mean = 1), + ## xreg = xregrc2) + ## + ## Coefficients: + ## Estimate Std.Error CI(lower) CI(upper) + ## (Intercept) 0.629584 0.719 -0.779 2.038 + ## beta_1 0.552083 0.108 0.341 0.764 + ## alpha_1 0.362488 0.126 0.115 0.610 + ## constrc2 0.000452 0.626 -1.226 1.227 + ## q1rc2 0.299250 0.271 -0.232 0.830 + ## Standard errors and confidence intervals (level = 95 %) obtained + ## by normal approximation. + ## + ## Link function: identity + ## Distribution family: poisson + ## Number of coefficients: 5 + ## Log-likelihood: -641.2468 + ## AIC: 1292.494 + ## BIC: 1311.225 + ## QIC: 1293.475 + +According to the model fit, there is no need for the Q1-dummy. + +Model diagnostics: + + acf(residuals(fit_rc2_reg1), main = "ACF of response residuals") + +![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-12-1.png) + + cat("Mean of Pearson residuals: ", mean(residuals(fit_rc2_reg1,type="pearson"))) + + ## Mean of Pearson residuals: -0.01929286 + + cat("Variance of Pearson residuals: ", var(residuals(fit_rc2_reg1,type = "pearson"))) + + ## Variance of Pearson residuals: 0.2370493 + + pit(fit_rc2_reg1, ylim = c(0, 1.5), main = "PIT Histogram") + +![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-12-2.png) + + scoring(fit_rc2_reg1) + + ## logarithmic quadratic spherical rankprob dawseb normsq + ## 2.0487119 -0.1595609 -0.4074842 0.8768263 2.2563953 0.2366642 + ## sqerror + ## 1.7873130 + +The model captures the serial correlation in the data surprisingly well! + +One-step ahead prediction: + + constf <- 1 + q1f <- 0 + + xcastf <- cbind(constf, q1f) + + tsglm_rc2_f <- predict(fit_rc2_reg1 ,n.ahead = 1, newxreg = xcastf) + + tsglm_rc2_f$pred + + ## [1] 9.271769 + + tsglm_rc2_f$interval + + ## lower upper + ## [1,] 4 16 + +In an effort to improve the model, I now fit an INGARCH(1,14;,1) model +without any regressors. The lag 14 dependent variable, is trying to +capture potential stochastic seasonality: + + fit_rc2_reg2 <- tsglm(Xrc2, model = list(past_obs = c(1,14),past_mean=1)) + + summary(fit_rc2_reg2) + + ## + ## Call: + ## tsglm(ts = Xrc2, model = list(past_obs = c(1, 14), past_mean = 1)) + ## + ## Coefficients: + ## Estimate Std.Error CI(lower) CI(upper) + ## (Intercept) 0.3878 0.4038 -0.4036 1.179 + ## beta_1 0.6139 0.1107 0.3969 0.831 + ## beta_14 0.0354 0.0524 -0.0672 0.138 + ## alpha_1 0.3081 0.1297 0.0538 0.562 + ## Standard errors and confidence intervals (level = 95 %) obtained + ## by normal approximation. + ## + ## Link function: identity + ## Distribution family: poisson + ## Number of coefficients: 4 + ## Log-likelihood: -641.2983 + ## AIC: 1290.597 + ## BIC: 1305.581 + ## QIC: 1290.692 + +The parameter related to the stochastic seasonality is not statistically +significant. + +Model diagnostics: + + acf(residuals(fit_rc2_reg2), main = "ACF of response residuals") + +![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-16-1.png) + + cat("Mean of Pearson residuals: ", mean(residuals(fit_rc2_reg2,type="pearson"))) + + ## Mean of Pearson residuals: -0.02644024 + + cat("Variance of Pearson residuals: ", var(residuals(fit_rc2_reg2,type = "pearson"))) + + ## Variance of Pearson residuals: 0.2384399 + + pit(fit_rc2_reg2, ylim = c(0, 1.5), main = "PIT Histogram") + +![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-16-2.png) + + scoring(fit_rc2_reg2) + + ## logarithmic quadratic spherical rankprob dawseb normsq + ## 2.0488764 -0.1595560 -0.4076225 0.8765324 2.2593817 0.2383772 + ## sqerror + ## 1.7810585 + +The residuals seem to look a bit ‘better’ in some sense. But I fail to +see any real improvement here. + +I see no obvious way to improve the model. + +# + +## Time span 2017 to 2024 + +I also include the analysis based on the original time span for the rig +counts, including the Covid19 period here. + + datarc <- read_excel("rigcountsAlaskaL2017-24.xlsx") + + mean(datarc$AlaskaL) + + ## [1] 6.394521 + + var(datarc$AlaskaL) + + ## [1] 5.794475 + + plot(datarc$AlaskaL, type="l") + +![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-17-1.png) + + forecast::Acf(datarc$AlaskaL) + +![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-17-2.png) + + forecast::Pacf(datarc$AlaskaL) + +![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-17-3.png) + +I fit a model with a Q1-dummy and a linear trend as we use it in the +coconots package (note log-link is required here): + + Xrc <- datarc$AlaskaL + nrc <- length(Xrc) + + trendrc <- (1:nrc - nrc/2) / nrc + constrc <- rep(1, nrc) + + q1rc <- datarc$Q1 + + xregrc <- cbind(constrc, trendrc, q1rc) + + + fit_rc_reg1 <- tsglm(Xrc, model = list(past_obs = c(1),past_mean=1), link="log", distr = "poisson", xreg = xregrc) + + + summary(fit_rc_reg1) + + ## + ## Call: + ## tsglm(ts = Xrc, model = list(past_obs = c(1), past_mean = 1), + ## xreg = xregrc, link = "log", distr = "poisson") + ## + ## Coefficients: + ## Estimate Std.Error CI(lower) CI(upper) + ## (Intercept) 1.2061 0.9444 -0.64496 3.057 + ## beta_1 0.7641 0.1285 0.51234 1.016 + ## alpha_1 -0.0900 0.1689 -0.42098 0.241 + ## constrc -0.7297 1.0240 -2.73669 1.277 + ## trendrc 0.1570 0.0838 -0.00734 0.321 + ## q1rc 0.0745 0.0532 -0.02981 0.179 + ## Standard errors and confidence intervals (level = 95 %) obtained + ## by normal approximation. + ## + ## Link function: log + ## Distribution family: poisson + ## Number of coefficients: 6 + ## Log-likelihood: -713.1914 + ## AIC: 1438.383 + ## BIC: 1461.782 + ## QIC: 1438.979 + +Hm, the alpha-parameter related to the lagged conditional mean seems not +to be significant; the trend term wants to be positive (!) and the +parameter associated with the first quarter is not significant. + + acf(residuals(fit_rc_reg1), main = "ACF of response residuals") + +![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-19-1.png) + + pit(fit_rc_reg1, ylim = c(0, 1.5), main = "PIT Histogram") + +![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-19-2.png) + + mean(residuals(fit_rc_reg1,type="pearson")) + + ## [1] -0.02232376 + + var(residuals(fit_rc_reg1,type = "pearson")) + + ## [1] 0.2603702 + + scoring(fit_rc_reg1) + + ## logarithmic quadratic spherical rankprob dawseb normsq + ## 1.9539492 -0.1781824 -0.4303744 0.8066894 2.0772362 0.2601552 + ## sqerror + ## 1.7539446 + +# + +Finally, I try to include an intervention term, a dummy taking on the +value 1 from the second quarter or 2020 onwards to check, if the Covid19 +period want to be treated different from the previous time period. + + intervention <- interv_covariate(nrc , tau = c(159), delta = c(1)) + fit_rc_regi <- tsglm(Xrc, model = list(past_obs = c(1),past_mean=1), link= "log", distr = "poisson", xreg = intervention) + + summary(fit_rc_regi) + + ## + ## Call: + ## tsglm(ts = Xrc, model = list(past_obs = c(1), past_mean = 1), + ## xreg = intervention, link = "log", distr = "poisson") + ## + ## Coefficients: + ## Estimate Std.Error CI(lower) CI(upper) + ## (Intercept) 0.0920 0.1235 -0.1501 0.3341 + ## beta_1 0.7842 0.1368 0.5160 1.0524 + ## alpha_1 0.1222 0.1523 -0.1762 0.4206 + ## interv_1 -0.0268 0.0364 -0.0982 0.0446 + ## Standard errors and confidence intervals (level = 95 %) obtained + ## by normal approximation. + ## + ## Link function: log + ## Distribution family: poisson + ## Number of coefficients: 4 + ## Log-likelihood: -703.6384 + ## AIC: 1415.277 + ## BIC: 1430.876 + ## QIC: 1414.709 + +The intervention dummy is nowhere near a sensible significance level. + + acf(residuals(fit_rc_regi), main = "ACF of response residuals") + +![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-21-1.png) + + pit(fit_rc_regi, ylim = c(0, 1.5), main = "PIT Histogram") + +![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-21-2.png) + + mean(residuals(fit_rc_regi,type="pearson")) + + ## [1] -0.04130746 + + var(residuals(fit_rc_regi,type = "pearson")) + + ## [1] 0.2137347 + + scoring(fit_rc_regi) + + ## logarithmic quadratic spherical rankprob dawseb normsq + ## 1.9277765 -0.1843596 -0.4387367 0.7609530 2.0325084 0.2148555 + ## sqerror + ## 1.3042387 diff --git a/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-12-1.png b/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-12-1.png new file mode 100644 index 0000000000000000000000000000000000000000..5e8dea5783eb0c5ab6968b6fd1af9bd62f468b5c Binary files /dev/null and b/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-12-1.png differ diff --git a/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-12-2.png b/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-12-2.png new file mode 100644 index 0000000000000000000000000000000000000000..3bc2d5a2bfbddb3f665c1f8b3dc425798975a855 Binary files /dev/null and b/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-12-2.png differ diff --git a/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-16-1.png b/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-16-1.png new file mode 100644 index 0000000000000000000000000000000000000000..a31ead5ca82ae9b47d9918c4c1a50a7a7c25dfa6 Binary files /dev/null and b/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-16-1.png differ diff --git a/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-16-2.png b/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-16-2.png new file mode 100644 index 0000000000000000000000000000000000000000..c1a344974a36e4e1215dc1a32b043b1c76b0fd78 Binary files /dev/null and b/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-16-2.png differ diff --git a/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-17-1.png b/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-17-1.png new file mode 100644 index 0000000000000000000000000000000000000000..29e262d621a1a0a9f1af7cae768c3754064043d3 Binary files /dev/null and b/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-17-1.png differ diff --git a/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-17-2.png b/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-17-2.png new file mode 100644 index 0000000000000000000000000000000000000000..0cde719d2591daa3adaf615c716bf382ec0b66a4 Binary files /dev/null and b/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-17-2.png differ diff --git a/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-17-3.png b/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-17-3.png new file mode 100644 index 0000000000000000000000000000000000000000..893c8f2171a44b31615c0a02b33bafcac4354557 Binary files /dev/null and b/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-17-3.png differ diff --git a/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-19-1.png b/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-19-1.png new file mode 100644 index 0000000000000000000000000000000000000000..4a2f2cc44ebc390102077438c62e187f818621ef Binary files /dev/null and b/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-19-1.png differ diff --git a/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-19-2.png b/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-19-2.png new file mode 100644 index 0000000000000000000000000000000000000000..568acb2f03df70006027d19ffcb25c2678f788bb Binary files /dev/null and b/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-19-2.png differ diff --git a/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-21-1.png b/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-21-1.png new file mode 100644 index 0000000000000000000000000000000000000000..512813a29037a7162f003bd30e6f9adfd4bf652d Binary files /dev/null and b/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-21-1.png differ diff --git a/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-21-2.png b/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-21-2.png new file mode 100644 index 0000000000000000000000000000000000000000..704b1bab9270410101efefaee402b332c3d78c00 Binary files /dev/null and b/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-21-2.png differ diff --git a/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-6-1.png b/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-6-1.png new file mode 100644 index 0000000000000000000000000000000000000000..5d62474f3b9075a2b5c48b8ecb761e09a82a81ea Binary files /dev/null and b/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-6-1.png differ diff --git a/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-6-2.png b/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-6-2.png new file mode 100644 index 0000000000000000000000000000000000000000..bc43e5847d497f0b01165a50fd18d250a35bd34b Binary files /dev/null and b/Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-6-2.png differ diff --git a/gitlab-readme.txt b/gitlab-readme.txt new file mode 100644 index 0000000000000000000000000000000000000000..d3601a41ce0a9e11954053d39bca3b15c8284910 --- /dev/null +++ b/gitlab-readme.txt @@ -0,0 +1,13 @@ +gitlab commandos zum hochladen der md files + + + + git remote add origin ssh://git@aidaho-edu.uni-hohenheim.de:30022/jungrk/coconots.git + +lokale Dateien in Verzeichnis + + + +git add . +git commit -m "Add Markdown document and figures" +git push \ No newline at end of file