Commit 1f2e73fe authored by Robert Jung's avatar Robert Jung
Browse files

Merge branch 'master' into 'main'

tscount 1

See merge request !4
parents 031f12ec 24f43049
Loading
Loading
Loading
Loading
+2025 −0

File added.

Preview size limit exceeded, changes collapsed.

Rnotebook-tscounts.md

0 → 100644
+460 −0
Original line number Diff line number Diff line
THIS VERSION: 6.12.2024

<!-- 
When using the local coconots version:
install.packages("devtools")
library(usethis)
library(devtools) -->

In this document I try to fit the UK strikes data and the rig count data
that we propose in the JSS paper to be analysed with the
*c**o**c**o**n**o**t**s* package with the *t**s**c**o**u**n**t* package.

I start with the UK strikes data.

Then I present an analysis of the rig counts in the time span 2014 to
2020. The tscount package does a remarkably good job in capturing the
dependence structure, but struggles with the slight underdispersion in
the data. Values of the scoring rules are considerably larger (=worse)
as those from the GPAR2 model.

Finally, I then present results for rig count in the time span 2017 to
2024. Here I find no specification of the models in tscount, that can
capture the dynamics in the data satisfactorily.

# UK Strikes Data

    data_uks <- read_excel("data_strikes.xlsx")

I start with the UK strikes data (public sector). In
*c**o**c**o**n**o**t**s*, we propose a GPAR2 model with harmonics and a
linear trend (using a log-link).

I generate a holdout sample of 3, and prepare the regressors first:

    hos <- 3

    X <- data_uks$public
    n <- length(X) - hos

    sin_x <- sin(2*pi*1:n/12)
    cos_x <- cos(2*pi*1:n/12)
    trend <- (1:n - n/2) / n
    const <- rep(1, n)
    xregtsc <- cbind(const, trend, cos_x, sin_x)

Then we fit an INGARCH(1,1) type of model with a log link and negative
binomial conditional distributional (Poisson led to a slightly u-shaped
PIT histogram) assumption:

    uks_tsc_fit <- tsglm(data_uks$public[1:n], model = list(past_obs = c(1),past_mean=1), link = "log", distr = "nbinom", xreg = xregtsc)

The standard regression output:

    summary(uks_tsc_fit)

    ## 
    ## Call:
    ## tsglm(ts = data_uks$public[1:n], model = list(past_obs = c(1), 
    ##     past_mean = 1), xreg = xregtsc, link = "log", distr = "nbinom")
    ## 
    ## Coefficients:
    ##              Estimate  Std.Error  CI(lower)  CI(upper)
    ## (Intercept)    1.2547     0.8288    -0.3697   2.879134
    ## beta_1         0.3990     0.0670     0.2677   0.530197
    ## alpha_1        0.2153     0.1382    -0.0556   0.486154
    ## const         -0.4992     0.6884    -1.8485   0.850020
    ## trend         -0.3439     0.1196    -0.5783  -0.109619
    ## cos_x         -0.0609     0.0309    -0.1214  -0.000343
    ## sin_x          0.0165     0.0306    -0.0434   0.076477
    ## sigmasq        0.0547         NA         NA         NA
    ## Standard errors and confidence intervals (level =  95 %) obtained
    ## by normal approximation.
    ## 
    ## Link function: log 
    ## Distribution family: nbinom (with overdispersion coefficient 'sigmasq') 
    ## Number of coefficients: 8 
    ## Log-likelihood: -729.3632 
    ## AIC: 1474.726 
    ## BIC: 1503.974 
    ## QIC: 1498.696

The overdispersion parameter is quite small and very close to zero, the
limiting case of a Poisson distribution. Its standard error is not
available in analytic form, but requires bootstrapping.

I have stopped this, as even with moderate bootstrap sample sizes (500),
the computation time is prohobitivly long.

Diagnostics:

    acf(residuals(uks_tsc_fit), main = "ACF of response residuals")

![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-6-1.png)

    pit(uks_tsc_fit, ylim = c(0, 1.5), main = "PIT Histogram")

![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-6-2.png)

    cat("Mean of Pearson residuals:", mean(residuals(uks_tsc_fit,type="pearson")))

    ## Mean of Pearson residuals: -0.0006482209

    cat("Variance of Pearson residuals: ", var(residuals(uks_tsc_fit,type = "pearson")))

    ## Variance of Pearson residuals:  0.9789472

    scoring(uks_tsc_fit)

    ## logarithmic   quadratic   spherical    rankprob      dawseb      normsq 
    ##  2.55022104 -0.09559065 -0.30694330  1.82121573  3.35794571  0.97552471 
    ##     sqerror 
    ## 11.54727482

One-step ahead prediction:

    sin_xf <- sin(2*pi*(n+1)/12)
    cos_xf <- cos(2*pi*(n+1)/12)

    trendf <- ((n+1):(n+1) - n/2) / n #adjusted to a vector

    constf <- 1

    xcastk <- cbind(constf, trendf, cos_xf, sin_xf) 

    tsglm_poi_f <- predict(uks_tsc_fit,n.ahead = 1, newxreg = xcastk)

    tsglm_poi_f$pred

    ## [1] 4.648801

    tsglm_poi_f$interval

    ##      lower upper
    ## [1,]     1    10

There is no obvious way for me to improve the models fit.

I tried to fit an INGARCH(1,0) as the alpha-parameter is only borderline
significant, but that leads to issues with the variance-covariance
matrix of the parameter vector and worse values for the scoring rules.

I infer from this that the version of the INGARCH model with a log-link
and neg. binomial conditional distriburtion can fit the UK strikes data
comparably with to the GPAR2 model used in *c**o**c**o**n**o**t**s*, but
based on the scroring rules, the latter would be preferred.

# Rig Count Data

## Data span 2014 to 2020.

I first use the dataset 2014 to 2020 (March to March), which I propose
to use in the JSS paper.

    data2 <- read_excel("rigcountsAlaskaL2014-20-march2march.xlsx")

Preparation of the regressor. I start with the quarter 1 - dummy only.

    Xrc2 <- data2$AlaskaL
    nrc2 <- length(Xrc2)

    constrc2 <- rep(1, nrc2)

    q1rc2 <- data2$Q1

    xregrc2 <- cbind(constrc2, q1rc2)

Then I fit a INGARCH(1,1) model with a linear link first:

    fit_rc2_reg1 <- tsglm(Xrc2, model = list(past_obs = c(1),past_mean=1), xreg = xregrc2)

    summary(fit_rc2_reg1)

    ## 
    ## Call:
    ## tsglm(ts = Xrc2, model = list(past_obs = c(1), past_mean = 1), 
    ##     xreg = xregrc2)
    ## 
    ## Coefficients:
    ##              Estimate  Std.Error  CI(lower)  CI(upper)
    ## (Intercept)  0.629584      0.719     -0.779      2.038
    ## beta_1       0.552083      0.108      0.341      0.764
    ## alpha_1      0.362488      0.126      0.115      0.610
    ## constrc2     0.000452      0.626     -1.226      1.227
    ## q1rc2        0.299250      0.271     -0.232      0.830
    ## Standard errors and confidence intervals (level =  95 %) obtained
    ## by normal approximation.
    ## 
    ## Link function: identity 
    ## Distribution family: poisson 
    ## Number of coefficients: 5 
    ## Log-likelihood: -641.2468 
    ## AIC: 1292.494 
    ## BIC: 1311.225 
    ## QIC: 1293.475

According to the model fit, there is no need for the Q1-dummy.

Model diagnostics:

    acf(residuals(fit_rc2_reg1), main = "ACF of response residuals")

![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-12-1.png)

    cat("Mean of Pearson residuals: ", mean(residuals(fit_rc2_reg1,type="pearson")))

    ## Mean of Pearson residuals:  -0.01929286

    cat("Variance of Pearson residuals: ", var(residuals(fit_rc2_reg1,type = "pearson")))

    ## Variance of Pearson residuals:  0.2370493

    pit(fit_rc2_reg1, ylim = c(0, 1.5), main = "PIT Histogram")

![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-12-2.png)

    scoring(fit_rc2_reg1)

    ## logarithmic   quadratic   spherical    rankprob      dawseb      normsq 
    ##   2.0487119  -0.1595609  -0.4074842   0.8768263   2.2563953   0.2366642 
    ##     sqerror 
    ##   1.7873130

The model captures the serial correlation in the data surprisingly well!

One-step ahead prediction:

    constf <- 1
    q1f <- 0 

    xcastf <- cbind(constf, q1f) 

    tsglm_rc2_f <- predict(fit_rc2_reg1 ,n.ahead = 1, newxreg = xcastf)

    tsglm_rc2_f$pred

    ## [1] 9.271769

    tsglm_rc2_f$interval

    ##      lower upper
    ## [1,]     4    16

In an effort to improve the model, I now fit an INGARCH(1,14;,1) model
without any regressors. The lag 14 dependent variable, is trying to
capture potential stochastic seasonality:

    fit_rc2_reg2 <- tsglm(Xrc2, model = list(past_obs = c(1,14),past_mean=1))

    summary(fit_rc2_reg2)

    ## 
    ## Call:
    ## tsglm(ts = Xrc2, model = list(past_obs = c(1, 14), past_mean = 1))
    ## 
    ## Coefficients:
    ##              Estimate  Std.Error  CI(lower)  CI(upper)
    ## (Intercept)    0.3878     0.4038    -0.4036      1.179
    ## beta_1         0.6139     0.1107     0.3969      0.831
    ## beta_14        0.0354     0.0524    -0.0672      0.138
    ## alpha_1        0.3081     0.1297     0.0538      0.562
    ## Standard errors and confidence intervals (level =  95 %) obtained
    ## by normal approximation.
    ## 
    ## Link function: identity 
    ## Distribution family: poisson 
    ## Number of coefficients: 4 
    ## Log-likelihood: -641.2983 
    ## AIC: 1290.597 
    ## BIC: 1305.581 
    ## QIC: 1290.692

The parameter related to the stochastic seasonality is not statistically
significant.

Model diagnostics:

    acf(residuals(fit_rc2_reg2), main = "ACF of response residuals")

![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-16-1.png)

    cat("Mean of Pearson residuals: ", mean(residuals(fit_rc2_reg2,type="pearson")))

    ## Mean of Pearson residuals:  -0.02644024

    cat("Variance of Pearson residuals: ", var(residuals(fit_rc2_reg2,type = "pearson")))

    ## Variance of Pearson residuals:  0.2384399

    pit(fit_rc2_reg2, ylim = c(0, 1.5), main = "PIT Histogram")

![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-16-2.png)

    scoring(fit_rc2_reg2)

    ## logarithmic   quadratic   spherical    rankprob      dawseb      normsq 
    ##   2.0488764  -0.1595560  -0.4076225   0.8765324   2.2593817   0.2383772 
    ##     sqerror 
    ##   1.7810585

The residuals seem to look a bit ‘better’ in some sense. But I fail to
see any real improvement here.

I see no obvious way to improve the model.

# 

## Time span 2017 to 2024

I also include the analysis based on the original time span for the rig
counts, including the Covid19 period here.

    datarc <- read_excel("rigcountsAlaskaL2017-24.xlsx")

    mean(datarc$AlaskaL)

    ## [1] 6.394521

    var(datarc$AlaskaL)

    ## [1] 5.794475

    plot(datarc$AlaskaL, type="l")

![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-17-1.png)

    forecast::Acf(datarc$AlaskaL)

![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-17-2.png)

    forecast::Pacf(datarc$AlaskaL)

![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-17-3.png)

I fit a model with a Q1-dummy and a linear trend as we use it in the
coconots package (note log-link is required here):

    Xrc <- datarc$AlaskaL
    nrc <- length(Xrc)

    trendrc <- (1:nrc - nrc/2) / nrc
    constrc <- rep(1, nrc)

    q1rc <- datarc$Q1

    xregrc <- cbind(constrc, trendrc, q1rc)


    fit_rc_reg1 <- tsglm(Xrc, model = list(past_obs = c(1),past_mean=1), link="log", distr = "poisson", xreg = xregrc)


    summary(fit_rc_reg1)

    ## 
    ## Call:
    ## tsglm(ts = Xrc, model = list(past_obs = c(1), past_mean = 1), 
    ##     xreg = xregrc, link = "log", distr = "poisson")
    ## 
    ## Coefficients:
    ##              Estimate  Std.Error  CI(lower)  CI(upper)
    ## (Intercept)    1.2061     0.9444   -0.64496      3.057
    ## beta_1         0.7641     0.1285    0.51234      1.016
    ## alpha_1       -0.0900     0.1689   -0.42098      0.241
    ## constrc       -0.7297     1.0240   -2.73669      1.277
    ## trendrc        0.1570     0.0838   -0.00734      0.321
    ## q1rc           0.0745     0.0532   -0.02981      0.179
    ## Standard errors and confidence intervals (level =  95 %) obtained
    ## by normal approximation.
    ## 
    ## Link function: log 
    ## Distribution family: poisson 
    ## Number of coefficients: 6 
    ## Log-likelihood: -713.1914 
    ## AIC: 1438.383 
    ## BIC: 1461.782 
    ## QIC: 1438.979

Hm, the alpha-parameter related to the lagged conditional mean seems not
to be significant; the trend term wants to be positive (!) and the
parameter associated with the first quarter is not significant.

    acf(residuals(fit_rc_reg1), main = "ACF of response residuals")

![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-19-1.png)

    pit(fit_rc_reg1, ylim = c(0, 1.5), main = "PIT Histogram")

![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-19-2.png)

    mean(residuals(fit_rc_reg1,type="pearson"))

    ## [1] -0.02232376

    var(residuals(fit_rc_reg1,type = "pearson"))

    ## [1] 0.2603702

    scoring(fit_rc_reg1)

    ## logarithmic   quadratic   spherical    rankprob      dawseb      normsq 
    ##   1.9539492  -0.1781824  -0.4303744   0.8066894   2.0772362   0.2601552 
    ##     sqerror 
    ##   1.7539446

# 

Finally, I try to include an intervention term, a dummy taking on the
value 1 from the second quarter or 2020 onwards to check, if the Covid19
period want to be treated different from the previous time period.

    intervention <- interv_covariate(nrc , tau = c(159), delta = c(1))
    fit_rc_regi <- tsglm(Xrc, model = list(past_obs = c(1),past_mean=1), link= "log", distr = "poisson", xreg = intervention)

    summary(fit_rc_regi)

    ## 
    ## Call:
    ## tsglm(ts = Xrc, model = list(past_obs = c(1), past_mean = 1), 
    ##     xreg = intervention, link = "log", distr = "poisson")
    ## 
    ## Coefficients:
    ##              Estimate  Std.Error  CI(lower)  CI(upper)
    ## (Intercept)    0.0920     0.1235    -0.1501     0.3341
    ## beta_1         0.7842     0.1368     0.5160     1.0524
    ## alpha_1        0.1222     0.1523    -0.1762     0.4206
    ## interv_1      -0.0268     0.0364    -0.0982     0.0446
    ## Standard errors and confidence intervals (level =  95 %) obtained
    ## by normal approximation.
    ## 
    ## Link function: log 
    ## Distribution family: poisson 
    ## Number of coefficients: 4 
    ## Log-likelihood: -703.6384 
    ## AIC: 1415.277 
    ## BIC: 1430.876 
    ## QIC: 1414.709

The intervention dummy is nowhere near a sensible significance level.

    acf(residuals(fit_rc_regi), main = "ACF of response residuals")

![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-21-1.png)

    pit(fit_rc_regi, ylim = c(0, 1.5), main = "PIT Histogram")

![](Rnotebook-tscounts_files/figure-markdown_strict/unnamed-chunk-21-2.png)

    mean(residuals(fit_rc_regi,type="pearson"))

    ## [1] -0.04130746

    var(residuals(fit_rc_regi,type = "pearson"))

    ## [1] 0.2137347

    scoring(fit_rc_regi)

    ## logarithmic   quadratic   spherical    rankprob      dawseb      normsq 
    ##   1.9277765  -0.1843596  -0.4387367   0.7609530   2.0325084   0.2148555 
    ##     sqerror 
    ##   1.3042387
+4.3 KiB
Loading image diff...
+5.98 KiB
Loading image diff...
+4.23 KiB
Loading image diff...
Loading