Data Exploration

glimpse & skimr output:

MrOs <- readxl::read_excel("MrOS_Baseline_Falls_Project.xlsx")
glimpse(MrOs)
## Rows: 5,994
## Columns: 20
## $ id       <chr> "BI0001", "BI0002", "BI0003", "BI0004", "BI0005", "BI0006", "~
## $ site     <chr> "BI", "BI", "BI", "BI", "BI", "BI", "BI", "BI", "BI", "BI", "~
## $ giage1   <dbl> 67, 67, 72, 65, 78, 78, 69, 65, 65, 72, 70, 67, 66, 73, 68, 7~
## $ mhdiab   <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ mhstrk   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ mhpark   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ mhcopd   <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ mharth   <dbl> 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0~
## $ mhcancer <dbl> 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0~
## $ pascore  <dbl> 183.25000, 179.57143, 300.00000, 107.00000, 166.71429, 41.000~
## $ qlhealth <dbl> 3, 3, 2, 1, 2, 2, 2, 1, 3, 1, 3, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1~
## $ hwbmi    <dbl> 29.42107, 31.87168, 26.09564, 28.56672, 29.11166, 28.45522, 1~
## $ gsgrpavg <dbl> 37.50, 38.00, 48.50, 43.00, 30.00, 26.00, 39.50, 41.00, 43.50~
## $ nfwlkspd <dbl> 1.2244898, 0.8882309, 1.3157895, 1.1964108, 0.8069939, 0.7736~
## $ b1fnd    <dbl> 0.8674233, 0.7402350, 0.6918427, 0.8783067, 0.9194940, 0.8862~
## $ b1thd    <dbl> 1.0464465, 0.9771559, 0.9386070, 1.0172246, 1.1047372, 1.0976~
## $ b1tbfkg  <dbl> 24.855927, 32.004398, 16.985812, 29.454647, 31.051432, 22.422~
## $ b1tblkg  <dbl> 57.92468, 62.85568, 61.85451, 63.01733, 46.56033, 49.63203, 4~
## $ mhfallv2 <dbl> 1, 1, 0, 1, NA, 0, 1, 0, 0, 0, NA, 0, 0, 0, 0, 0, 0, 0, 0, 1,~
## $ mhfalln2 <dbl> 3, 3, 0, 2, NA, 0, 6, 0, 0, 0, NA, 0, 0, 0, 0, 0, 0, 0, 0, 1,~
skimr::skim(MrOs)
Data summary
Name MrOs
Number of rows 5994
Number of columns 20
_______________________
Column type frequency:
character 2
numeric 18
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
id 0 1 6 6 0 5994 0
site 0 1 2 2 0 6 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
giage1 0 1.00 73.66 5.87 64.00 69.00 73.00 78.00 100.00 ▇▇▃▁▁
mhdiab 0 1.00 0.11 0.31 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▁
mhstrk 0 1.00 0.06 0.23 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▁
mhpark 0 1.00 0.01 0.09 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▁
mhcopd 0 1.00 0.11 0.31 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▁
mharth 0 1.00 0.47 0.50 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▇
mhcancer 0 1.00 0.29 0.45 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▃
pascore 3 1.00 146.49 68.29 0.00 100.00 142.00 186.14 486.43 ▃▇▃▁▁
qlhealth 2 1.00 1.83 0.71 1.00 1.00 2.00 2.00 5.00 ▅▇▂▁▁
hwbmi 7 1.00 27.38 3.83 17.21 24.77 26.90 29.49 50.67 ▂▇▂▁▁
gsgrpavg 114 0.98 38.56 8.16 7.00 33.00 38.50 44.00 65.50 ▁▂▇▅▁
nfwlkspd 12 1.00 1.20 0.23 0.19 1.06 1.21 1.35 2.13 ▁▂▇▃▁
b1fnd 1 1.00 0.78 0.13 0.27 0.70 0.77 0.86 1.60 ▁▇▅▁▁
b1thd 1 1.00 0.96 0.14 0.31 0.86 0.95 1.05 1.76 ▁▅▇▁▁
b1tbfkg 39 0.99 21.75 7.10 4.32 16.82 20.90 25.82 61.60 ▂▇▂▁▁
b1tblkg 39 0.99 56.90 7.25 34.32 51.80 56.39 61.42 86.52 ▁▇▇▂▁
mhfallv2 770 0.87 0.30 0.46 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▃
mhfalln2 770 0.87 0.62 1.27 0.00 0.00 0.00 1.00 12.00 ▇▁▁▁▁
obj_classes <- c()
for(i in seq(1:length(MrOs))) {
  obj_classes = c(obj_classes, class(MrOs[[1,i]]))
}
obj_classes
##  [1] "character" "character" "numeric"   "numeric"   "numeric"   "numeric"  
##  [7] "numeric"   "numeric"   "numeric"   "numeric"   "numeric"   "numeric"  
## [13] "numeric"   "numeric"   "numeric"   "numeric"   "numeric"   "numeric"  
## [19] "numeric"   "numeric"

Creation of site & qlhealth dummy variables and outcome variable

We’ll consider every variable in the data frame other than patient ID as a candidate variable for the model. We’ll use Portland as the referent group for when site is considered as a categorical variable. The outcome of interest is having more than one fall in a given year.

unique(MrOs$site)
## [1] "BI" "MN" "PA" "PI" "PO" "SD"
MrOs <- MrOs %>% 
  mutate(st_1 = case_when(
    site == "PO" ~ 0,
    site == "BI" ~ 1,
    site == "MN" ~ 0,
    site == "PA" ~ 0,
    site == "PI" ~ 0,
    site == "SD" ~ 0
  )) %>% 
  mutate(st_2 = case_when(
    site == "PO" ~ 0,
    site == "BI" ~ 0,
    site == "MN" ~ 1,
    site == "PA" ~ 0,
    site == "PI" ~ 0,
    site == "SD" ~ 0
  )) %>%   
  mutate(st_3 = case_when(
    site == "PO" ~ 0,
    site == "BI" ~ 0,
    site == "MN" ~ 0,
    site == "PA" ~ 1,
    site == "PI" ~ 0,
    site == "SD" ~ 0
  )) %>% 
  mutate(st_4 = case_when(
    site == "PO" ~ 0,
    site == "BI" ~ 0,
    site == "MN" ~ 0,
    site == "PA" ~ 0,
    site == "PI" ~ 1,
    site == "SD" ~ 0
  ))%>% 
  mutate(st_5 = case_when(
    site == "PO" ~ 0,
    site == "BI" ~ 0,
    site == "MN" ~ 0,
    site == "PA" ~ 0,
    site == "PI" ~ 0,
    site == "SD" ~ 1
  )) %>% 
  mutate(falls = case_when( # outcome coding
    mhfalln2 > 1 ~ 1,
    mhfalln2 <= 1 ~ 0
  ))

Creating the qlhealth dummy variables; ‘Excellent’ subjective health rating is used as the referent category:

MrOs$qlh2 <- ifelse(MrOs$qlhealth == 2,1,0)
MrOs$qlh3 <- ifelse(MrOs$qlhealth == 3,1,0)
MrOs$qlh4 <- ifelse(MrOs$qlhealth == 4,1,0)
MrOs$qlh5 <- ifelse(MrOs$qlhealth == 5,1,0)

Check Groups sizes for each unique value of each categorical variable

MrOs_group_check = MrOs %>%  # why did this break?
    dplyr::select(site, mhdiab, mhstrk, mhpark, mhcopd, mharth, mhcancer, 
    st_1, st_2, st_3, st_4, st_5,
    qlh2, qlh3, qlh4, qlh5)

counts_list <- map(.x = as.list(MrOs_group_check), .f = function(x) {
  vector = unique(x)
  q = c()
  for (i in vector) {
    v = x[x == i]
    obs = length(v)
    q = c(q, obs)
  }
  return(q)
})
counts_list
## $site
## [1]  969 1005  995 1005 1007 1013
## 
## $mhdiab
## [1] 5341  653
## 
## $mhstrk
## [1] 5650  344
## 
## $mhpark
## [1] 5942   52
## 
## $mhcopd
## [1] 5354  640
## 
## $mharth
## [1] 3147 2847
## 
## $mhcancer
## [1] 4249 1745
## 
## $st_1
## [1]  969 5025
## 
## $st_2
## [1] 4989 1005
## 
## $st_3
## [1] 4999  995
## 
## $st_4
## [1] 4989 1005
## 
## $st_5
## [1] 4981 1013
## 
## $qlh2
## [1] 2876 3120 5994
## 
## $qlh3
## [1]  762 5234 5994
## 
## $qlh4
## [1] 5912   84 5994
## 
## $qlh5
## [1] 5979   17 5994

Note that when the length of the unique level is 5994 (the number of total rows in the dataset) it’s due to the way R handles NA in boolean operations. The presence of this number in the above output is a signifier of missingness.

This also serves as a sanity check for the above dummy variable generation. We see that the correct number of observations exist in each dummy variable by comparing the lower count in the dummy variables to the counts in site.

There are only 52 subjects with a history of Parkinson’s. That may be an issue later.

Variable Seleciton

Step 1: Univariate Analysis

We employ a type 1 error rate of \(\alpha = 0.20\) for univariate Wald Tests and construct 95% confidence intervals for each candidate variable’s slope coefficient and one-unit odds ratio.

The null hypothesis for the Wald test, which is repeatedly used throughout this report, is that the beta coefficient in question \(\hat \beta_i\) is equal to zero. The test statistic for the Wald test is \(W = \frac{\hat \beta_1}{\widehat{SE} (\hat \beta_1)} \sim N(0,1)\). In output tables from R, the Wald statistic \(W\) is represented by z.value. The criteria to reject is \(P(|z| > W) < \alpha\), where alpha is the type one error rate specified for the particular test. Here, as has been already stated, we’ll use a p-value of 0.20 as our cutoff for statistical significance, but other values for \(\alpha\) will be specified throughout.

models = list(
  glm(falls ~ st_1 + st_2 + 
        st_3 + st_4 + st_5, data = MrOs, family = binomial()),
  glm(falls ~ qlh2 + qlh3 + qlh4 + qlh5
      , data = MrOs, family = binomial()),
  glm(falls ~ qlhealth, data = MrOs, family = binomial()),
  glm(falls ~ giage1, data = MrOs, family = binomial()),
  glm(falls ~ mhdiab, data = MrOs, family = binomial()),
  glm(falls ~ mhstrk, data = MrOs, family = binomial()),
  glm(falls ~ mhcopd, data = MrOs, family = binomial()),
  glm(falls ~ mhpark, data = MrOs, family = binomial()),
  glm(falls ~ mharth, data = MrOs, family = binomial()),
  glm(falls ~ mhcancer, data = MrOs, family = binomial()),
  glm(falls ~ pascore, data = MrOs, family = binomial()),
  glm(falls ~ qlhealth, data = MrOs, family = binomial()),
  glm(falls ~ hwbmi, data = MrOs, family = binomial()),
  glm(falls ~ b1tbfkg, data = MrOs, family = binomial()),
  glm(falls ~ b1tblkg, data = MrOs, family = binomial()),
  glm(falls ~ gsgrpavg, data = MrOs, family = binomial()),
  glm(falls ~ nfwlkspd, data = MrOs, family = binomial()),
  glm(falls ~ b1fnd, data = MrOs, family = binomial()),
  glm(falls ~ b1thd, data = MrOs, family = binomial()))

# kable looks better in pdf
# map(.x = 
#       map(.x = 
#             models,
#           .f =
#             function(x)
#               {data.frame(summary(x)$coefficients)}),
#     .f =
#       function(x) {kable(x)})

map(.x = 
            models,
          .f =
            function(x)
              {data.frame(summary(x)$coefficients)})
## [[1]]
##                 Estimate Std..Error      z.value     Pr...z..
## (Intercept) -1.844440689  0.1007913 -18.29960674 8.334832e-75
## st_1         0.281261900  0.1358047   2.07107635 3.835166e-02
## st_2         0.245584886  0.1344158   1.82705329 6.769176e-02
## st_3         0.169545560  0.1374381   1.23361392 2.173468e-01
## st_4        -0.009249738  0.1407042  -0.06573887 9.475857e-01
## st_5        -0.039027207  0.1411456  -0.27650310 7.821617e-01
## 
## [[2]]
##               Estimate Std..Error    z.value      Pr...z..
## (Intercept) -2.0329777 0.07303767 -27.834648 1.652517e-170
## qlh2         0.3104687 0.09037541   3.435322  5.918496e-04
## qlh3         0.8237946 0.12257383   6.720803  1.807257e-11
## qlh4         1.4220686 0.29417256   4.834131  1.337287e-06
## qlh5         1.8506561 0.60991898   3.034265  2.411221e-03
## 
## [[3]]
##               Estimate Std..Error    z.value      Pr...z..
## (Intercept) -2.5153390 0.11075597 -22.710642 3.516704e-114
## qlhealth     0.4219546 0.05376494   7.848137  4.222642e-15
## 
## [[4]]
##                Estimate  Std..Error   z.value     Pr...z..
## (Intercept) -4.95719888 0.502912673 -9.856977 6.394537e-23
## giage1       0.04381403 0.006765748  6.475859 9.427433e-11
## 
## [[5]]
##               Estimate Std..Error    z.value     Pr...z..
## (Intercept) -1.7742562 0.04141219 -42.843813 0.0000000000
## mhdiab       0.3855609 0.11716838   3.290657 0.0009995381
## 
## [[6]]
##              Estimate Std..Error    z.value    Pr...z..
## (Intercept) -1.763352 0.04016956 -43.897728 0.000000000
## mhstrk       0.539577 0.15225699   3.543857 0.000394319
## 
## [[7]]
##               Estimate Std..Error    z.value     Pr...z..
## (Intercept) -1.8029745  0.0418520 -43.079767 0.000000e+00
## mhcopd       0.5998285  0.1117919   5.365582 8.068873e-08
## 
## [[8]]
##              Estimate Std..Error    z.value     Pr...z..
## (Intercept) -1.747731  0.0390499 -44.756347 0.000000e+00
## mhpark       1.687106  0.3504974   4.813462 1.483375e-06
## 
## [[9]]
##              Estimate Std..Error    z.value      Pr...z..
## (Intercept) -2.038375 0.05942976 -34.298897 8.150162e-258
## mharth       0.591204 0.07862960   7.518847  5.526141e-14
## 
## [[10]]
##               Estimate Std..Error    z.value   Pr...z..
## (Intercept) -1.8042362 0.04683182 -38.525859 0.00000000
## mhcancer     0.2454726 0.08330589   2.946642 0.00321245
## 
## [[11]]
##                 Estimate   Std..Error    z.value     Pr...z..
## (Intercept) -1.515613225 0.0932491883 -16.253366 2.114290e-59
## pascore     -0.001456787 0.0005846094  -2.491899 1.270623e-02
## 
## [[12]]
##               Estimate Std..Error    z.value      Pr...z..
## (Intercept) -2.5153390 0.11075597 -22.710642 3.516704e-114
## qlhealth     0.4219546 0.05376494   7.848137  4.222642e-15
## 
## [[13]]
##                Estimate  Std..Error   z.value     Pr...z..
## (Intercept) -2.69687390 0.275876498 -9.775657 1.432256e-22
## hwbmi        0.03498152 0.009827731  3.559470 3.716035e-04
## 
## [[14]]
##                Estimate  Std..Error    z.value     Pr...z..
## (Intercept) -2.29343068 0.126722662 -18.098031 3.302639e-73
## b1tbfkg      0.02490161 0.005313164   4.686777 2.775415e-06
## 
## [[15]]
##                 Estimate  Std..Error    z.value     Pr...z..
## (Intercept) -1.670798121 0.313072432 -5.3367782 9.461268e-08
## b1tblkg     -0.001198906 0.005439674 -0.2204003 8.255594e-01
## 
## [[16]]
##                Estimate  Std..Error    z.value     Pr...z..
## (Intercept)  0.08840508 0.193110532  0.4577952 6.470996e-01
## gsgrpavg    -0.04789594 0.005091431 -9.4071670 5.096892e-21
## 
## [[17]]
##                Estimate Std..Error    z.value     Pr...z..
## (Intercept)  0.04826681  0.2131746  0.2264192 8.208754e-01
## nfwlkspd    -1.49582496  0.1792923 -8.3429400 7.246597e-17
## 
## [[18]]
##               Estimate Std..Error   z.value     Pr...z..
## (Intercept) -2.1797433  0.2416397 -9.020635 1.870026e-19
## b1fnd        0.5681193  0.3005950  1.889982 5.876034e-02
## 
## [[19]]
##               Estimate Std..Error   z.value     Pr...z..
## (Intercept) -2.0311866  0.2720271 -7.466855 8.213445e-14
## b1thd        0.3117375  0.2788094  1.118103 2.635232e-01

Slope coefficients pass our univariate Wald test (\(p < 0.2 = \alpha\)) for study site, subjective health assessment, age at enrollment, history of diabetes, history of stroke, history of COPD, history of Parkinson’s, history of arthritis, history of cancer, PASE score, body mass index (BMI), total body fat mass, average grip strength, walk speed, corrected femoral neck bone minderal density (BMD). The only variables excluded here are total body lean mass and corrected total hip BMD.

Computing Wald p-value for each selected variable

models = list(
  glm(falls ~ giage1, data = MrOs, family = binomial()),
  glm(falls ~ mhdiab, data = MrOs, family = binomial()),
  glm(falls ~ mhstrk, data = MrOs, family = binomial()),
  glm(falls ~ mhcopd, data = MrOs, family = binomial()),
  glm(falls ~ mhpark, data = MrOs, family = binomial()),
  glm(falls ~ mharth, data = MrOs, family = binomial()),
  glm(falls ~ mhcancer, data = MrOs, family = binomial()),
  glm(falls ~ pascore, data = MrOs, family = binomial()),
  glm(falls ~ hwbmi, data = MrOs, family = binomial()),
  glm(falls ~ b1tbfkg, data = MrOs, family = binomial()),
  glm(falls ~ gsgrpavg, data = MrOs, family = binomial()),
  glm(falls ~ nfwlkspd, data = MrOs, family = binomial()),
  glm(falls ~ b1fnd, data = MrOs, family = binomial()))

map(.x = models, .f = function(x) {
  coeffs = summary(x)$coefficients # table coefficients...
  p  = coeffs[2,4] # beta_1 p-val...
  b  = coeffs[,1][2] # beta_1...
  se = coeffs[2,2] # beta_1 se...
  beta_CI = regression_CI(pe = b, se = se, alpha = 0.05)
  OR_CI   = exp(beta_CI)
  table   = rbind(beta_CI, OR_CI)
  table$p = c(p, NA)
  rownames(table) = c(rownames(table)[1], "OR")
  return((table)) # put kable() back
})
## [[1]]
##        point_estimate lower_bound upper_bound            p
## giage1     0.04381403  0.03055341  0.05707465 9.427433e-11
## OR         1.04478804  1.03102495  1.05873484           NA
## 
## [[2]]
##        point_estimate lower_bound upper_bound            p
## mhdiab      0.3855609   0.1559151   0.6152067 0.0009995381
## OR          1.4704389   1.1687270   1.8500390           NA
## 
## [[3]]
##        point_estimate lower_bound upper_bound           p
## mhstrk       0.539577   0.2411588   0.8379952 0.000394319
## OR           1.715281   1.2727231   2.3117278          NA
## 
## [[4]]
##        point_estimate lower_bound upper_bound            p
## mhcopd      0.5998285   0.3807204   0.8189366 8.068873e-08
## OR          1.8218064   1.4633385   2.2680867           NA
## 
## [[5]]
##        point_estimate lower_bound upper_bound            p
## mhpark       1.687106    1.000144    2.374068 1.483375e-06
## OR           5.403820    2.718673   10.741001           NA
## 
## [[6]]
##        point_estimate lower_bound upper_bound            p
## mharth       0.591204   0.4370928   0.7453152 5.526141e-14
## OR           1.806162   1.5481997   2.1071054           NA
## 
## [[7]]
##          point_estimate lower_bound upper_bound          p
## mhcancer      0.2454726  0.08219608   0.4087492 0.00321245
## OR            1.2782253  1.08566866   1.5049342         NA
## 
## [[8]]
##         point_estimate lower_bound   upper_bound          p
## pascore   -0.001456787  -0.0026026 -0.0003109739 0.01270623
## OR         0.998544273   0.9974008  0.9996890744         NA
## 
## [[9]]
##       point_estimate lower_bound upper_bound            p
## hwbmi     0.03498152  0.01571952  0.05424352 0.0003716035
## OR        1.03560057  1.01584372  1.05574166           NA
## 
## [[10]]
##         point_estimate lower_bound upper_bound            p
## b1tbfkg     0.02490161    0.014488  0.03531522 2.775415e-06
## OR          1.02521425    1.014593  1.03594621           NA
## 
## [[11]]
##          point_estimate lower_bound upper_bound            p
## gsgrpavg    -0.04789594 -0.05787496 -0.03791692 5.096892e-21
## OR           0.95323298  0.94376795  0.96279293           NA
## 
## [[12]]
##          point_estimate lower_bound upper_bound            p
## nfwlkspd     -1.4958250  -1.8472314   -1.144418 7.246597e-17
## OR            0.2240637   0.1576731    0.318409           NA
## 
## [[13]]
##       point_estimate lower_bound upper_bound          p
## b1fnd      0.5681193 -0.02103617    1.157275 0.05876034
## OR         1.7649445  0.97918355    3.181252         NA

For each of the above tables, the beta coefficient CI is labeled with the name of the variable being assessed in the table. The odds ratio is on the second row. The p-value for the Wald test is in the first entry of the last column Note: NA simply signifies the cell as empty.

Since site & health quality are multilevel and categorical, they are computed separately.

site_model <- glm(falls ~ st_1 + st_2 + 
        st_3 + st_4 + st_5, data = MrOs, family = binomial())

# slope coefficients 
betas <- site_model$coef[2:6]
# standard errors 
ses <- as.data.frame(summary(site_model)$coef)$`Std. Error`[2:6]
# confidence interval beta & CI 
beta_upper_bounds <- betas + qnorm(1-0.05/2)*ses
beta_lower_bounds <- betas - qnorm(1-0.05/2)*ses
OR_upper_bounds <- exp(betas + qnorm(1-0.05/2)*ses)
OR_lower_bounds <- exp(betas - qnorm(1-0.05/2)*ses)
OR <- exp(betas)
# formatting 
tbl <- rbind(betas, 
      beta_upper_bounds,
      beta_lower_bounds, 
      OR,
      OR_upper_bounds, 
      OR_lower_bounds)
# output 
as.data.frame(t(tbl))
##             betas beta_upper_bounds beta_lower_bounds        OR OR_upper_bounds
## st_1  0.281261900         0.5474342        0.01508958 1.3248005        1.728812
## st_2  0.245584886         0.5090351       -0.01786530 1.2783688        1.663685
## st_3  0.169545560         0.4389193       -0.09982818 1.1847663        1.551030
## st_4 -0.009249738         0.2665255       -0.28502499 0.9907929        1.305421
## st_5 -0.039027207         0.2376132       -0.31566758 0.9617245        1.268219
##      OR_lower_bounds
## st_1       1.0152040
## st_2       0.9822933
## st_3       0.9049929
## st_4       0.7519955
## st_5       0.7293018

Subjective health status:

qlhealth_model <- glm(falls ~ qlh2 + qlh3 + qlh4 + qlh5,
                  data = MrOs,
                  family = binomial())

# slope coefficients 
betas <- qlhealth_model$coef[2:6]
# standard errors 
ses <- as.data.frame(summary(qlhealth_model)$coef)$`Std. Error`[2:6]
# confidence interval beta & CI 
beta_upper_bounds <- betas + qnorm(1-0.05/2)*ses
beta_lower_bounds <- betas - qnorm(1-0.05/2)*ses
OR_upper_bounds <- exp(betas + qnorm(1-0.05/2)*ses)
OR_lower_bounds <- exp(betas - qnorm(1-0.05/2)*ses)
OR <- exp(betas)
# formatting 
tbl <- rbind(betas, 
      beta_upper_bounds,
      beta_lower_bounds, 
      OR,
      OR_upper_bounds, 
      OR_lower_bounds)
# output 
as.data.frame(t(tbl))
##          betas beta_upper_bounds beta_lower_bounds       OR OR_upper_bounds
## qlh2 0.3104687         0.4876012         0.1333361 1.364064        1.628405
## qlh3 0.8237946         1.0640349         0.5835543 2.279132        2.898041
## qlh4 1.4220686         1.9986362         0.8455010 4.145687        7.378986
## qlh5 1.8506561         3.0460754         0.6552369 6.363994       21.032637
## NA.         NA                NA                NA       NA              NA
##      OR_lower_bounds
## qlh2        1.142634
## qlh3        1.792398
## qlh4        2.329144
## qlh5        1.925599
## NA.               NA

Drawing up the odds ratios and confidence intervals hasn’t brought us to any separate conclusion than the univariate Wald tests did as to which variables are retained at the end of this step.

Note: if we use an alternative alpha of 0.05 for our univariate Wald tests, we would exclude b1fnd. Nothing else would change.

initial gtsummary output with ORs & p-values

We can use the ‘tbl_uvregression’ function from the ‘gtsummary’ package to create a publishable table with the above information:

MrOs %>% 
  dplyr::select(
    giage1, mhdiab, mhstrk, mhpark, mhcopd, mharth, mhcancer, pascore, hwbmi, gsgrpavg, nfwlkspd, b1fnd, b1thd, b1tbfkg, b1tblkg,
    falls, 
    st_1,st_2,st_3,st_4,st_5,
    qlh2, qlh3, qlh4, qlh5) %>% 
  tbl_uvregression(
    method = glm,
    y = falls,
    hide_n = TRUE,
    exponentiate = TRUE,
    method.args = list(family = 'binomial'),
    label = list(
                  giage1 ~ "Age"
                 ,mhdiab ~ "Diabetes"
                 ,mhstrk ~ "Stroke"
                 ,mhpark ~ "Parkinsons"
                 ,mhcopd ~ "COPD"
                 ,mharth ~ "Arthritis or Gout"
                 ,mhcancer ~ "Cancer"
                 ,pascore ~ "PASE Score"
                 ,hwbmi ~ "Body Mass Index"
                 ,b1tbfkg ~ "Total Body Fat"
                 ,b1tblkg ~ "Lean Body Mass"
                 ,gsgrpavg ~ "Average Grip Strength"
                 ,nfwlkspd ~ "Walking Speed"
                 ,b1fnd ~ "Corrected Femoral Neck BMD"
                 ,b1thd ~ "Corrected Total Hip BMD"
                 ,st_1 ~ "Birmingham Site (vs PO)"
                 ,st_2 ~ "Minneapolis Site (vs PO)"
                 ,st_3 ~ "Palo Alto Site (vs PO)"
                 ,st_4 ~ "Pittsburg Site (vs PO)"
                 ,st_5 ~ "San Diego Site (vs PO)"
                 ,qlh2 ~ "Good Preceived Health (vs Excellent)"
                 ,qlh3 ~ "Fair Preceived Health (vs Excellent)"
                 ,qlh4 ~ "Poor Preceived Health (vs Excellent)"
                 ,qlh5 ~ "Very Poor Preceived Health (vs Excellent)"
    ),
  )%>%
  modify_caption("**Initial Univariate Analysis**") %>%
  bold_labels()
Initial Univariate Analysis
Characteristic OR1 95% CI1 p-value
Age 1.04 1.03, 1.06 <0.001
Diabetes 1.47 1.16, 1.84 <0.001
Stroke 1.72 1.26, 2.30 <0.001
Parkinsons 5.40 2.69, 10.8 <0.001
COPD 1.82 1.46, 2.26 <0.001
Arthritis or Gout 1.81 1.55, 2.11 <0.001
Cancer 1.28 1.08, 1.50 0.003
PASE Score 1.00 1.00, 1.00 0.013
Body Mass Index 1.04 1.02, 1.06 <0.001
Average Grip Strength 0.95 0.94, 0.96 <0.001
Walking Speed 0.22 0.16, 0.32 <0.001
Corrected Femoral Neck BMD 1.76 0.98, 3.17 0.059
Corrected Total Hip BMD 1.37 0.79, 2.36 0.3
Total Body Fat 1.03 1.01, 1.04 <0.001
Lean Body Mass 1.00 0.99, 1.01 0.8
Birmingham Site (vs PO) 1.22 1.00, 1.49 0.044
Minneapolis Site (vs PO) 1.18 0.97, 1.42 0.10
Palo Alto Site (vs PO) 1.07 0.87, 1.30 0.5
Pittsburg Site (vs PO) 0.86 0.70, 1.06 0.2
San Diego Site (vs PO) 0.83 0.67, 1.03 0.090
Good Preceived Health (vs Excellent) 1.02 0.87, 1.19 0.8
Fair Preceived Health (vs Excellent) 1.83 1.48, 2.25 <0.001
Poor Preceived Health (vs Excellent) 3.12 1.74, 5.41 <0.001
Very Poor Preceived Health (vs Excellent) 4.73 1.36, 15.7 0.010

1 OR = Odds Ratio, CI = Confidence Interval

Collapsing ‘qlhealth’

Considering that ‘good’ subjective overall health is essentially the same as ‘excellent’ overall health (p=0.8), we can collapse the subjective overall health status variables into a single binary variable with value 0 for ‘excellent’-‘good health’ (‘qlhealth’ = 1-2), and value 1 for ‘fair’ to ‘very poor’ health (‘qlhealth’ = 3-5):

MrOs$qlh <- ifelse(MrOs$qlhealth == 1 | MrOs$qlhealth == 2, 0, 1)

gtsummary output for paper with ORs & p-values

We can use the ‘tbl_uvregression’ function from the ‘gtsummary’ package to create a table for the current set of variables (i.e. collapsed subjective health status, but not the ordinal subjective health status variable):

MrOs %>% 
  dplyr::select(giage1, mhdiab, mhstrk, mhpark, mhcopd, mharth, mhcancer, pascore, hwbmi, gsgrpavg, nfwlkspd, b1fnd, b1thd, b1tbfkg, b1tblkg, falls, st_1,st_2,st_3,st_4,st_5, qlh) %>% 
  tbl_uvregression(
    method = glm,
    y = falls,
    exponentiate = TRUE,
    hide_n = TRUE,
    label = list(giage1 ~ "Age"
                 ,mhdiab ~ "Diabetes"
                 ,mhstrk ~ "Stroke"
                 ,mhpark ~ "Parkinsons"
                 ,mhcopd ~ "COPD"
                 ,mharth ~ "Arthritis or Gout"
                 ,mhcancer ~ "Cancer"
                 ,pascore ~ "PASE Score"
                 ,qlh ~ "Subjective Health Rating"
                 ,hwbmi ~ "Body Mass Index"
                 ,b1tbfkg ~ "Total Body Fat"
                 ,b1tblkg ~ "Lean Body Mass"
                 ,gsgrpavg ~ "Average Grip Strength"
                 ,nfwlkspd ~ "Walking Speed"
                 ,b1fnd ~ "Corrected Femoral Neck BMD"
                 ,b1thd ~ "Corrected Total Hip BMD"
                 ,st_1 ~ "Birmingham Site (vs PO)"
                 ,st_2 ~ "Minneapolis Site (vs PO)"
                 ,st_3 ~ "Palo Alto Site (vs PO)"
                 ,st_4 ~ "Pittsburg Site (vs PO)"
                 ,st_5 ~ "San Diego Site (vs PO)"
    ),
    method.args = list(family = 'binomial'),
  )%>%
  modify_caption("**Univariate Analysis**") %>%
  bold_labels()
Univariate Analysis
Characteristic OR1 95% CI1 p-value
Age 1.04 1.03, 1.06 <0.001
Diabetes 1.47 1.16, 1.84 <0.001
Stroke 1.72 1.26, 2.30 <0.001
Parkinsons 5.40 2.69, 10.8 <0.001
COPD 1.82 1.46, 2.26 <0.001
Arthritis or Gout 1.81 1.55, 2.11 <0.001
Cancer 1.28 1.08, 1.50 0.003
PASE Score 1.00 1.00, 1.00 0.013
Body Mass Index 1.04 1.02, 1.06 <0.001
Average Grip Strength 0.95 0.94, 0.96 <0.001
Walking Speed 0.22 0.16, 0.32 <0.001
Corrected Femoral Neck BMD 1.76 0.98, 3.17 0.059
Corrected Total Hip BMD 1.37 0.79, 2.36 0.3
Total Body Fat 1.03 1.01, 1.04 <0.001
Lean Body Mass 1.00 0.99, 1.01 0.8
Birmingham Site (vs PO) 1.22 1.00, 1.49 0.044
Minneapolis Site (vs PO) 1.18 0.97, 1.42 0.10
Palo Alto Site (vs PO) 1.07 0.87, 1.30 0.5
Pittsburg Site (vs PO) 0.86 0.70, 1.06 0.2
San Diego Site (vs PO) 0.83 0.67, 1.03 0.090
Subjective Health Rating 2.03 1.66, 2.47 <0.001

1 OR = Odds Ratio, CI = Confidence Interval

Step 2: First Multivariable Model

Creating a dataset with NA observations removed

Throughout our handling of multivariate models, we’ll use the following structure to subset the data to only the variables we’re considering before censoring observations with incomplete fields.

step2_narm <- MrOs %>% 
  dplyr::select(-b1tblkg,-b1thd,-mhfallv2) %>%
  drop_na()

Creating the full model

We’ll add all the variables identified as important in step one to form our full model. Then we’ll form a reduced model with every variable associated with a Wald test p-value less than 0.05 and compare the full and reduced model with the likelihood ratio test. The null hypothesis for the likelihood ratio test is \(H_0: \beta_{inf} = 0\), \(H_1: \beta_{inf} \neq 0\), where the criteria to reject is a p-value less than 0.05.

full_model <- 
  glm(falls ~  giage1 + mhdiab + mhstrk + mhpark + mhcopd + mharth + mhcancer + pascore + qlh + hwbmi + gsgrpavg + nfwlkspd + b1fnd + b1tbfkg + st_1 + st_2 + st_3 + st_4 + st_5,
      family = "binomial",
      data = step2_narm)

kable(summary(full_model)$coef)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.3167998 0.9650490 -2.4007069 0.0163634
giage1 0.0220516 0.0083583 2.6382981 0.0083323
mhdiab 0.2008088 0.1269315 1.5820247 0.1136439
mhstrk 0.3058103 0.1631642 1.8742490 0.0608961
mhpark 1.6602638 0.3716284 4.4675379 0.0000079
mhcopd 0.4480101 0.1206134 3.7144295 0.0002037
mharth 0.4003747 0.0845162 4.7372532 0.0000022
mhcancer 0.1197176 0.0888581 1.3472894 0.1778871
pascore 0.0012108 0.0006323 1.9147608 0.0555230
qlh 0.2846465 0.1165178 2.4429434 0.0145680
hwbmi -0.0457038 0.0217355 -2.1027261 0.0354897
gsgrpavg -0.0333263 0.0058385 -5.7080633 0.0000000
nfwlkspd -0.6721857 0.2131998 -3.1528444 0.0016169
b1fnd 1.0038739 0.3368353 2.9803107 0.0028796
b1tbfkg 0.0397967 0.0111726 3.5619778 0.0003681
st_1 0.0441299 0.1462171 0.3018111 0.7627961
st_2 0.1287550 0.1430024 0.9003696 0.3679236
st_3 0.1563824 0.1450550 1.0780900 0.2809936
st_4 -0.1935640 0.1510001 -1.2818803 0.1998846
st_5 0.0007772 0.1478650 0.0052560 0.9958063
summary(full_model)$coef[,4] < 0.05 
## (Intercept)      giage1      mhdiab      mhstrk      mhpark      mhcopd 
##        TRUE        TRUE       FALSE       FALSE        TRUE        TRUE 
##      mharth    mhcancer     pascore         qlh       hwbmi    gsgrpavg 
##        TRUE       FALSE       FALSE        TRUE        TRUE        TRUE 
##    nfwlkspd       b1fnd     b1tbfkg        st_1        st_2        st_3 
##        TRUE        TRUE        TRUE       FALSE       FALSE       FALSE 
##        st_4        st_5 
##       FALSE       FALSE

Full model table for presentation

tbl_regression(full_model, 
    label = list(
                  giage1 ~ "Age"
                 ,mhdiab ~ "Diabetes"
                 ,mhstrk ~ "Stroke"
                 ,mhpark ~ "Parkinsons"
                 ,mhcopd ~ "COPD"
                 ,mharth ~ "Arthritis or Gout"
                 ,mhcancer ~ "Cancer"
                 ,pascore ~ "PASE Score"
                 ,hwbmi ~ "Body Mass Index"
                 ,b1tbfkg ~ "Total Body Fat"
#                 ,b1tblkg ~ "Lean Body Mass"
                 ,gsgrpavg ~ "Average Grip Strength"
                 ,nfwlkspd ~ "Walking Speed"
                 ,b1fnd ~ "Corrected Femoral Neck BMD"
#                 ,b1thd ~ "Corrected Total Hip BMD"
                 ,st_1 ~ "Birmingham Site (vs PO)"
                 ,st_2 ~ "Minneapolis Site (vs PO)"
                 ,st_3 ~ "Palo Alto Site (vs PO)"
                 ,st_4 ~ "Pittsburg Site (vs PO)"
                 ,st_5 ~ "San Diego Site (vs PO)"
                 ,qlh ~ "Subjective Health Rating"
    ),
               exponentiate = FALSE)
Characteristic log(OR)1 95% CI1 p-value
Age 0.02 0.01, 0.04 0.008
Diabetes 0.20 -0.05, 0.45 0.11
Stroke 0.31 -0.02, 0.62 0.061
Parkinsons 1.7 0.93, 2.4 <0.001
COPD 0.45 0.21, 0.68 <0.001
Arthritis or Gout 0.40 0.23, 0.57 <0.001
Cancer 0.12 -0.06, 0.29 0.2
PASE Score 0.00 0.00, 0.00 0.056
Subjective Health Rating 0.28 0.05, 0.51 0.015
Body Mass Index -0.05 -0.09, 0.00 0.035
Average Grip Strength -0.03 -0.04, -0.02 <0.001
Walking Speed -0.67 -1.1, -0.25 0.002
Corrected Femoral Neck BMD 1.0 0.34, 1.7 0.003
Total Body Fat 0.04 0.02, 0.06 <0.001
Birmingham Site (vs PO) 0.04 -0.24, 0.33 0.8
Minneapolis Site (vs PO) 0.13 -0.15, 0.41 0.4
Palo Alto Site (vs PO) 0.16 -0.13, 0.44 0.3
Pittsburg Site (vs PO) -0.19 -0.49, 0.10 0.2
San Diego Site (vs PO) 0.00 -0.29, 0.29 >0.9

1 OR = Odds Ratio, CI = Confidence Interval

Creating the reduced model

Based on the above output we will exclude site, hx. diabetes, hx stroke, hx cancer, PACE score from the reduced model, and perform the liklihood ratio test between the full and reduced models. We will utilize the ‘lrtest’ function from the ‘lmtest’ package to perform this computation:

reduced_model <-  
  glm(falls ~  giage1 + mhpark +
        mhcopd + mharth + qlh + 
        hwbmi + gsgrpavg + nfwlkspd + b1fnd + b1tbfkg,
      family = "binomial",
      data = step2_narm)

kable(summary(reduced_model)$coef)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.0489779 0.9367460 -2.187336 0.0287180
giage1 0.0219667 0.0080631 2.724354 0.0064427
mhpark 1.6046143 0.3702184 4.334237 0.0000146
mhcopd 0.4405674 0.1201254 3.667562 0.0002449
mharth 0.3899696 0.0839951 4.642765 0.0000034
qlh 0.3082162 0.1137323 2.710014 0.0067280
hwbmi -0.0485790 0.0213106 -2.279572 0.0226331
gsgrpavg -0.0339663 0.0057179 -5.940390 0.0000000
nfwlkspd -0.6160713 0.2083269 -2.957233 0.0031041
b1fnd 1.0747685 0.3329516 3.228002 0.0012466
b1tbfkg 0.0400358 0.0110026 3.638768 0.0002739
lrtest(full_model, reduced_model) # can't throw them all away 
## Likelihood ratio test
## 
## Model 1: falls ~ giage1 + mhdiab + mhstrk + mhpark + mhcopd + mharth + 
##     mhcancer + pascore + qlh + hwbmi + gsgrpavg + nfwlkspd + 
##     b1fnd + b1tbfkg + st_1 + st_2 + st_3 + st_4 + st_5
## Model 2: falls ~ giage1 + mhpark + mhcopd + mharth + qlh + hwbmi + gsgrpavg + 
##     nfwlkspd + b1fnd + b1tbfkg
##   #Df LogLik Df  Chisq Pr(>Chisq)  
## 1  20  -2015                       
## 2  11  -2024 -9 18.111    0.03391 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The reduced model does not provide a better fit than the full model. Therefore, we’ll use the same full model, but reduce the model by only one variable before taking the likelihood ratio.

Removing one variable at a time:

site

# exclude site, 
reduced_model <-  
  glm(falls ~  giage1 + mhdiab + mhstrk + mhpark + mhcopd + mharth + mhcancer + pascore + qlh + hwbmi + gsgrpavg + nfwlkspd + b1fnd + b1tbfkg,  
      family = "binomial",
      data = step2_narm)

kable(summary(reduced_model)$coef)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.2998013 0.9471969 -2.428008 0.0151820
giage1 0.0226951 0.0082123 2.763562 0.0057174
mhdiab 0.1947331 0.1266762 1.537251 0.1242319
mhstrk 0.2966300 0.1627883 1.822182 0.0684273
mhpark 1.6679141 0.3700030 4.507839 0.0000065
mhcopd 0.4392186 0.1203981 3.648053 0.0002642
mharth 0.3928057 0.0841817 4.666167 0.0000031
mhcancer 0.1306247 0.0883052 1.479241 0.1390760
pascore 0.0011804 0.0006314 1.869660 0.0615311
qlh 0.2872734 0.1162074 2.472075 0.0134331
hwbmi -0.0520680 0.0215418 -2.417069 0.0156461
gsgrpavg -0.0335993 0.0057734 -5.819668 0.0000000
nfwlkspd -0.6195062 0.2104158 -2.944200 0.0032379
b1fnd 1.0487044 0.3351470 3.129088 0.0017535
b1tbfkg 0.0423281 0.0111156 3.808004 0.0001401
lrtest(full_model, reduced_model) # exclude site 
## Likelihood ratio test
## 
## Model 1: falls ~ giage1 + mhdiab + mhstrk + mhpark + mhcopd + mharth + 
##     mhcancer + pascore + qlh + hwbmi + gsgrpavg + nfwlkspd + 
##     b1fnd + b1tbfkg + st_1 + st_2 + st_3 + st_4 + st_5
## Model 2: falls ~ giage1 + mhdiab + mhstrk + mhpark + mhcopd + mharth + 
##     mhcancer + pascore + qlh + hwbmi + gsgrpavg + nfwlkspd + 
##     b1fnd + b1tbfkg
##   #Df  LogLik Df Chisq Pr(>Chisq)
## 1  20 -2015.0                    
## 2  15 -2018.8 -5 7.626     0.1781

Exclude site.

DM2

# exclude hx. diabetes,
reduced_model <-  
  glm(falls ~  giage1 + mhstrk + mhpark + mhcopd + mharth + mhcancer + pascore + qlh + hwbmi + gsgrpavg + nfwlkspd + b1fnd + b1tbfkg + st_1 + st_2 + st_3 + st_4 + st_5,  
      family = "binomial",
      data = step2_narm)

kable(summary(reduced_model)$coef)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.3007835 0.9645015 -2.3854639 0.0170576
giage1 0.0214192 0.0083451 2.5666626 0.0102682
mhstrk 0.3099792 0.1631600 1.8998482 0.0574530
mhpark 1.6403437 0.3717948 4.4119600 0.0000102
mhcopd 0.4464551 0.1206180 3.7013976 0.0002144
mharth 0.3950127 0.0844378 4.6781494 0.0000029
mhcancer 0.1179072 0.0888248 1.3274127 0.1843722
pascore 0.0011765 0.0006316 1.8628692 0.0624807
qlh 0.3082914 0.1154913 2.6693900 0.0075989
hwbmi -0.0422946 0.0216008 -1.9580133 0.0502285
gsgrpavg -0.0342003 0.0058113 -5.8851472 0.0000000
nfwlkspd -0.6828114 0.2130471 -3.2049795 0.0013507
b1fnd 1.0457348 0.3356378 3.1156650 0.0018353
b1tbfkg 0.0388261 0.0111471 3.4830707 0.0004957
st_1 0.0415644 0.1462047 0.2842889 0.7761890
st_2 0.1253725 0.1429888 0.8768000 0.3805953
st_3 0.1577518 0.1450138 1.0878398 0.2766658
st_4 -0.1908312 0.1509465 -1.2642307 0.2061472
st_5 -0.0034095 0.1478064 -0.0230675 0.9815964
lrtest(full_model, reduced_model) # exclude hx. diabetes 
## Likelihood ratio test
## 
## Model 1: falls ~ giage1 + mhdiab + mhstrk + mhpark + mhcopd + mharth + 
##     mhcancer + pascore + qlh + hwbmi + gsgrpavg + nfwlkspd + 
##     b1fnd + b1tbfkg + st_1 + st_2 + st_3 + st_4 + st_5
## Model 2: falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + mhcancer + 
##     pascore + qlh + hwbmi + gsgrpavg + nfwlkspd + b1fnd + b1tbfkg + 
##     st_1 + st_2 + st_3 + st_4 + st_5
##   #Df  LogLik Df  Chisq Pr(>Chisq)
## 1  20 -2015.0                     
## 2  19 -2016.2 -1 2.4395     0.1183

Exclude hx dm2.

Stroke

# exclude hx stroke
reduced_model <-  
  glm(falls ~  giage1 + mhdiab + mhpark + mhcopd + mharth + mhcancer + pascore + qlh + hwbmi + gsgrpavg + nfwlkspd + b1fnd + b1tbfkg + st_1 + st_2 + st_3 + st_4 + st_5,  
      family = "binomial",
      data = step2_narm)

kable(summary(reduced_model)$coef)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.2481485 0.9644628 -2.3309851 0.0197541
giage1 0.0225768 0.0083484 2.7043301 0.0068442
mhdiab 0.2045196 0.1269161 1.6114545 0.1070807
mhpark 1.6451328 0.3716774 4.4262383 0.0000096
mhcopd 0.4495113 0.1205175 3.7298415 0.0001916
mharth 0.4004430 0.0845064 4.7386099 0.0000022
mhcancer 0.1169454 0.0888097 1.3168090 0.1879027
pascore 0.0011571 0.0006317 1.8315774 0.0670144
qlh 0.3026581 0.1160743 2.6074503 0.0091219
hwbmi -0.0467415 0.0217229 -2.1517161 0.0314197
gsgrpavg -0.0334915 0.0058359 -5.7388338 0.0000000
nfwlkspd -0.6968197 0.2126971 -3.2761133 0.0010525
b1fnd 0.9818889 0.3365007 2.9179404 0.0035235
b1tbfkg 0.0397265 0.0111691 3.5568111 0.0003754
st_1 0.0428483 0.1461945 0.2930912 0.7694525
st_2 0.1350566 0.1428618 0.9453654 0.3444724
st_3 0.1492962 0.1449258 1.0301560 0.3029368
st_4 -0.1896934 0.1508900 -1.2571633 0.2086945
st_5 -0.0009755 0.1477686 -0.0066017 0.9947327
lrtest(full_model, reduced_model) # exclude hx stroke 
## Likelihood ratio test
## 
## Model 1: falls ~ giage1 + mhdiab + mhstrk + mhpark + mhcopd + mharth + 
##     mhcancer + pascore + qlh + hwbmi + gsgrpavg + nfwlkspd + 
##     b1fnd + b1tbfkg + st_1 + st_2 + st_3 + st_4 + st_5
## Model 2: falls ~ giage1 + mhdiab + mhpark + mhcopd + mharth + mhcancer + 
##     pascore + qlh + hwbmi + gsgrpavg + nfwlkspd + b1fnd + b1tbfkg + 
##     st_1 + st_2 + st_3 + st_4 + st_5
##   #Df  LogLik Df  Chisq Pr(>Chisq)  
## 1  20 -2015.0                       
## 2  19 -2016.7 -1 3.3623    0.06671 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Exclude stroke hx.

Cancer

# exclude hx cancer
reduced_model <-  
  glm(falls ~  giage1 + mhdiab + mhstrk + mhpark + mhcopd + mharth + pascore + qlh + hwbmi + gsgrpavg + nfwlkspd + b1fnd + b1tbfkg + st_1 + st_2 + st_3 + st_4 + st_5,  
      family = "binomial",
      data = step2_narm)

kable(summary(reduced_model)$coef)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.3513806 0.9639752 -2.4392542 0.0147176
giage1 0.0232660 0.0083040 2.8017771 0.0050822
mhdiab 0.1985085 0.1268647 1.5647254 0.1176473
mhstrk 0.3020351 0.1631448 1.8513314 0.0641219
mhpark 1.6515718 0.3718381 4.4416421 0.0000089
mhcopd 0.4525767 0.1205321 3.7548234 0.0001735
mharth 0.4043216 0.0844530 4.7875338 0.0000017
pascore 0.0011983 0.0006320 1.8960852 0.0579488
qlh 0.2870121 0.1164437 2.4648150 0.0137084
hwbmi -0.0465792 0.0217262 -2.1439201 0.0320393
gsgrpavg -0.0334304 0.0058351 -5.7292027 0.0000000
nfwlkspd -0.6674874 0.2128704 -3.1356511 0.0017147
b1fnd 0.9970051 0.3366772 2.9613086 0.0030633
b1tbfkg 0.0404056 0.0111635 3.6194463 0.0002952
st_1 0.0428975 0.1461595 0.2934980 0.7691415
st_2 0.1189554 0.1427619 0.8332434 0.4047075
st_3 0.1525788 0.1449376 1.0527204 0.2924691
st_4 -0.2089466 0.1505619 -1.3877791 0.1652043
st_5 0.0001605 0.1478023 0.0010858 0.9991337
lrtest(full_model, reduced_model) # exclude hx cancer 
## Likelihood ratio test
## 
## Model 1: falls ~ giage1 + mhdiab + mhstrk + mhpark + mhcopd + mharth + 
##     mhcancer + pascore + qlh + hwbmi + gsgrpavg + nfwlkspd + 
##     b1fnd + b1tbfkg + st_1 + st_2 + st_3 + st_4 + st_5
## Model 2: falls ~ giage1 + mhdiab + mhstrk + mhpark + mhcopd + mharth + 
##     pascore + qlh + hwbmi + gsgrpavg + nfwlkspd + b1fnd + b1tbfkg + 
##     st_1 + st_2 + st_3 + st_4 + st_5
##   #Df  LogLik Df  Chisq Pr(>Chisq)
## 1  20 -2015.0                     
## 2  19 -2015.9 -1 1.7988     0.1799

Exclude cancer hx.

PACE score

# exclude PACE score. 
reduced_model <-  
  glm(falls ~  giage1 + mhpark +
        mhcopd + mharth + qlh + 
        hwbmi + gsgrpavg + nfwlkspd + b1fnd + b1tbfkg,
      family = "binomial",
      data = step2_narm)

kable(summary(reduced_model)$coef)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.0489779 0.9367460 -2.187336 0.0287180
giage1 0.0219667 0.0080631 2.724354 0.0064427
mhpark 1.6046143 0.3702184 4.334237 0.0000146
mhcopd 0.4405674 0.1201254 3.667562 0.0002449
mharth 0.3899696 0.0839951 4.642765 0.0000034
qlh 0.3082162 0.1137323 2.710014 0.0067280
hwbmi -0.0485790 0.0213106 -2.279572 0.0226331
gsgrpavg -0.0339663 0.0057179 -5.940390 0.0000000
nfwlkspd -0.6160713 0.2083269 -2.957233 0.0031041
b1fnd 1.0747685 0.3329516 3.228002 0.0012466
b1tbfkg 0.0400358 0.0110026 3.638768 0.0002739
lrtest(full_model, reduced_model) # don't exclude PACE score 
## Likelihood ratio test
## 
## Model 1: falls ~ giage1 + mhdiab + mhstrk + mhpark + mhcopd + mharth + 
##     mhcancer + pascore + qlh + hwbmi + gsgrpavg + nfwlkspd + 
##     b1fnd + b1tbfkg + st_1 + st_2 + st_3 + st_4 + st_5
## Model 2: falls ~ giage1 + mhpark + mhcopd + mharth + qlh + hwbmi + gsgrpavg + 
##     nfwlkspd + b1fnd + b1tbfkg
##   #Df LogLik Df  Chisq Pr(>Chisq)  
## 1  20  -2015                       
## 2  11  -2024 -9 18.111    0.03391 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Include PACE score.

Adding back in PACE score

# exclude site, hx. diabetes, hx stroke, hx cancer, PACE score. 
full_model$call
## glm(formula = falls ~ giage1 + mhdiab + mhstrk + mhpark + mhcopd + 
##     mharth + mhcancer + pascore + qlh + hwbmi + gsgrpavg + nfwlkspd + 
##     b1fnd + b1tbfkg + st_1 + st_2 + st_3 + st_4 + st_5, family = "binomial", 
##     data = step2_narm)
# step2_narm$
reduced_model <-  
   glm(formula = falls ~ giage1 + mhpark + mhcopd + 
    mharth + pascore + qlh + hwbmi + gsgrpavg + 
    nfwlkspd + b1fnd + b1tbfkg,
      family = "binomial",
    data = step2_narm)

kable(summary(reduced_model)$coef)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.2585222 0.9449119 -2.390193 0.0168395
giage1 0.0238429 0.0081389 2.929489 0.0033952
mhpark 1.6237326 0.3705595 4.381841 0.0000118
mhcopd 0.4440094 0.1202219 3.693250 0.0002214
mharth 0.3916039 0.0840306 4.660252 0.0000032
pascore 0.0010793 0.0006297 1.714101 0.0865101
qlh 0.3305595 0.1146242 2.883855 0.0039284
hwbmi -0.0506954 0.0213719 -2.372059 0.0176893
gsgrpavg -0.0347554 0.0057405 -6.054417 0.0000000
nfwlkspd -0.6472821 0.2093076 -3.092493 0.0019848
b1fnd 1.0618377 0.3334290 3.184599 0.0014495
b1tbfkg 0.0419401 0.0110707 3.788377 0.0001516
lrtest(full_model, reduced_model) 
## Likelihood ratio test
## 
## Model 1: falls ~ giage1 + mhdiab + mhstrk + mhpark + mhcopd + mharth + 
##     mhcancer + pascore + qlh + hwbmi + gsgrpavg + nfwlkspd + 
##     b1fnd + b1tbfkg + st_1 + st_2 + st_3 + st_4 + st_5
## Model 2: falls ~ giage1 + mhpark + mhcopd + mharth + pascore + qlh + hwbmi + 
##     gsgrpavg + nfwlkspd + b1fnd + b1tbfkg
##   #Df  LogLik Df  Chisq Pr(>Chisq)  
## 1  20 -2015.0                       
## 2  12 -2022.6 -8 15.193     0.0555 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We’ve added back PACE score to achieve the most parsimonious model for which the full model does not provide significant advantage in performance by the likelihood ratio test.

Reduced model table for presentation

tbl_regression(reduced_model, 
    label = list(
                  giage1 ~ "Age"
#                 ,mhdiab ~ "Diabetes"
#                 ,mhstrk ~ "Stroke"
                 ,mhpark ~ "Parkinsons"
                 ,mhcopd ~ "COPD"
                 ,mharth ~ "Arthritis or Gout"
#                 ,mhcancer ~ "Cancer"
                 ,pascore ~ "PASE Score"
                 ,hwbmi ~ "Body Mass Index"
                 ,b1tbfkg ~ "Total Body Fat"
#                 ,b1tblkg ~ "Lean Body Mass"
                 ,gsgrpavg ~ "Average Grip Strength"
                 ,nfwlkspd ~ "Walking Speed"
                 ,b1fnd ~ "Corrected Femoral Neck BMD"
#                 ,b1thd ~ "Corrected Total Hip BMD"
                 # ,st_1 ~ "Birmingham Site (vs PO)"
                 # ,st_2 ~ "Minneapolis Site (vs PO)"
                 # ,st_3 ~ "Palo Alto Site (vs PO)"
                 # ,st_4 ~ "Pittsburg Site (vs PO)"
                 # ,st_5 ~ "San Diego Site (vs PO)"
                 ,qlh ~ "Subjective Health Rating"
    ),
               exponentiate = FALSE)
Characteristic log(OR)1 95% CI1 p-value
Age 0.02 0.01, 0.04 0.003
Parkinsons 1.6 0.89, 2.4 <0.001
COPD 0.44 0.20, 0.68 <0.001
Arthritis or Gout 0.39 0.23, 0.56 <0.001
PASE Score 0.00 0.00, 0.00 0.087
Subjective Health Rating 0.33 0.10, 0.55 0.004
Body Mass Index -0.05 -0.09, -0.01 0.018
Average Grip Strength -0.03 -0.05, -0.02 <0.001
Walking Speed -0.65 -1.1, -0.24 0.002
Corrected Femoral Neck BMD 1.1 0.41, 1.7 0.001
Total Body Fat 0.04 0.02, 0.06 <0.001

1 OR = Odds Ratio, CI = Confidence Interval

Step 3: Check Removed Covariate(s) Using the Change in \(\beta_i\) Method

We’ll solve for a percent change in beta coefficients not including the intercept common in the full and reduced using a simple formula:

\[ \Delta = \left(\frac{final-initial}{final}\right )\times 100\% \]

And we’ll say that a \(\Delta > 20\%\) is significant. If an individual coefficient is altered by more than 20%, at least one of the excluded coefficients may be an important confounder of the association between the outcome and a variable whose slope coefficient was altered.

# reorder $call... 
full_model <- 
  glm(formula = falls ~ # make sure the $call is ordered correctly 
        giage1  + mhpark + mhcopd + mharth + qlh + hwbmi + 
        gsgrpavg + nfwlkspd + b1fnd + b1tbfkg + mhstrk + pascore + 
        mhdiab + mhcancer + st_1 + st_2 + st_3 + st_4 + st_5, 
    family = "binomial",
    data = step2_narm)

reduced_model <- 
  glm(formula = falls ~ # make sure the $call is ordered correctly 
        giage1  + mhpark + mhcopd + mharth + qlh + hwbmi + 
        gsgrpavg + nfwlkspd + b1fnd + b1tbfkg + mhstrk + pascore, 
    family = "binomial",
    data = step2_narm)

# the change in beta 
(reduced_model$coef[2:13]-full_model$coef[2:13])/reduced_model$coefficients[2:13]
##      giage1      mhpark      mhcopd      mharth         qlh       hwbmi 
##  0.05860291 -0.01376951 -0.01187351 -0.02210046  0.08866286  0.08615468 
##    gsgrpavg    nfwlkspd       b1fnd     b1tbfkg      mhstrk     pascore 
##  0.03436498 -0.07754304  0.07167337  0.05459226 -0.03473779 -0.06975277
# bool if change in beta exceeds 20%
abs((reduced_model$coef[2:13]-full_model$coef[2:13])/reduced_model$coefficients[2:13]) > 0.2
##   giage1   mhpark   mhcopd   mharth      qlh    hwbmi gsgrpavg nfwlkspd 
##    FALSE    FALSE    FALSE    FALSE    FALSE    FALSE    FALSE    FALSE 
##    b1fnd  b1tbfkg   mhstrk  pascore 
##    FALSE    FALSE    FALSE    FALSE

Step 3 model

It appears that none of the variables excluded in step two are confounders in this model. We’ll retain the reduced model from step two at this stage.

step3_model <- reduced_model
summary(step3_model)$coef[,4] < 0.05
## (Intercept)      giage1      mhpark      mhcopd      mharth         qlh 
##        TRUE        TRUE        TRUE        TRUE        TRUE        TRUE 
##       hwbmi    gsgrpavg    nfwlkspd       b1fnd     b1tbfkg      mhstrk 
##        TRUE        TRUE        TRUE        TRUE        TRUE       FALSE 
##     pascore 
##       FALSE
kable(summary(step3_model)$coef)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.3256488 0.9458596 -2.458767 0.0139415
giage1 0.0234243 0.0081483 2.874769 0.0040432
mhpark 1.6377133 0.3704380 4.421018 0.0000098
mhcopd 0.4427531 0.1203176 3.679869 0.0002334
mharth 0.3917176 0.0840421 4.660969 0.0000031
qlh 0.3123394 0.1151421 2.712642 0.0066749
hwbmi -0.0500126 0.0213813 -2.339081 0.0193312
gsgrpavg -0.0345123 0.0057439 -6.008541 0.0000000
nfwlkspd -0.6238133 0.2098472 -2.972703 0.0029519
b1fnd 1.0813800 0.3336991 3.240584 0.0011929
b1tbfkg 0.0420948 0.0110750 3.800883 0.0001442
mhstrk 0.2955438 0.1627388 1.816063 0.0693608
pascore 0.0011318 0.0006304 1.795557 0.0725650

Step 4: Checking Removed Covariates & Regrouping of Cagetorical Covariates

We’ll add back the subjects’ lean body mass and cancer status to determine if their joint significance by the Wald test is sufficient to include them in the model. We’ll again use \(\alpha = 0.05\) for the type one error rate of our Wald Test. We’ll add the variables individually, then together.

‘b1tblkg’, lean body mass

step3_narm1 <- MrOs %>% 
  dplyr::select(-b1thd,-mhfallv2) %>%
  drop_na()

# add back b1tblkg 
check_again_lean <- glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + nfwlkspd + b1fnd + 
    b1tbfkg + b1tblkg, 
    family = "binomial",
    data = step3_narm1)

summary(check_again_lean)$coef
##                 Estimate   Std. Error   z value     Pr(>|z|)
## (Intercept) -2.603372408 0.9563516254 -2.722192 6.485052e-03
## giage1       0.025140758 0.0081895108  3.069873 2.141498e-03
## mhstrk       0.301647522 0.1628266514  1.852568 6.394423e-02
## mhpark       1.618930925 0.3697138706  4.378875 1.192934e-05
## mhcopd       0.451069377 0.1204304576  3.745476 1.800521e-04
## mharth       0.384990347 0.0841367768  4.575768 4.744768e-06
## pascore      0.001143792 0.0006300872  1.815291 6.947915e-02
## qlh          0.322746343 0.1153725017  2.797429 5.151114e-03
## hwbmi       -0.073355050 0.0242189021 -3.028835 2.454990e-03
## gsgrpavg    -0.039507594 0.0062236393 -6.347989 2.181476e-10
## nfwlkspd    -0.636315533 0.2103364029 -3.025228 2.484459e-03
## b1fnd        0.984569397 0.3373304990  2.918709 3.514847e-03
## b1tbfkg      0.043484761 0.0110869386  3.922161 8.775821e-05
## b1tblkg      0.018296076 0.0088195514  2.074491 3.803378e-02
summary(check_again_lean)$coef[14,4] < 0.05 # lean body mass now significant 
## [1] TRUE

lean body mass now significant

‘b1thd’, corrected hip bone mineral

step3_narm2 <- MrOs %>% 
  dplyr::select(-b1tblkg,-mhfallv2) %>%
  drop_na()

# add back b1tblkg 
check_again_hipBMD <- glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + nfwlkspd + b1fnd + 
    b1tbfkg + b1thd, 
    family = "binomial",
    data = step3_narm2)

summary(check_again_hipBMD)$coef
##                 Estimate   Std. Error    z value     Pr(>|z|)
## (Intercept) -2.316317936 0.9463701382 -2.4475814 1.438186e-02
## giage1       0.023495119 0.0081519942  2.8821314 3.949949e-03
## mhstrk       0.295571658 0.1627344959  1.8162815 6.932720e-02
## mhpark       1.633935914 0.3707090883  4.4075961 1.045243e-05
## mhcopd       0.442035604 0.1203489957  3.6729480 2.397683e-04
## mharth       0.391760718 0.0840442660  4.6613616 3.141242e-06
## pascore      0.001133974 0.0006303834  1.7988645 7.204011e-02
## qlh          0.312186685 0.1151349529  2.7114849 6.698259e-03
## hwbmi       -0.049186486 0.0215786273 -2.2794075 2.264285e-02
## gsgrpavg    -0.034409928 0.0057551538 -5.9789762 2.245443e-09
## nfwlkspd    -0.620648881 0.2101387351 -2.9535196 3.141726e-03
## b1fnd        1.234302948 0.6354465682  1.9424182 5.208650e-02
## b1tbfkg      0.042008028 0.0110801402  3.7912903 1.498667e-04
## b1thd       -0.170308786 0.6025939955 -0.2826261 7.774635e-01
summary(check_again_hipBMD)$coef[14,4] < 0.05 # corrected hip bone mineral now significant 
## [1] FALSE

Corrected hip bone mineral not significant.

‘b1thd + b1tblkg’, total lean body mass & hip BMD

step3_narm3 <- MrOs %>% 
  dplyr::select(-mhfallv2) %>%
  drop_na()

# add back b1tblkg 
check_again_both <- glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + nfwlkspd + b1fnd + 
    b1tbfkg + b1thd + b1tblkg,  
    family = "binomial",
    data = step3_narm3)

summary(check_again_both)$coef
##                 Estimate   Std. Error   z value     Pr(>|z|)
## (Intercept) -2.593559845 0.9565397072 -2.711398 6.700015e-03
## giage1       0.025281541 0.0081955523  3.084788 2.036973e-03
## mhstrk       0.301695854 0.1628248093  1.852886 6.389866e-02
## mhpark       1.612643674 0.3700187349  4.358276 1.310912e-05
## mhcopd       0.450057793 0.1204628214  3.736072 1.869171e-04
## mharth       0.384910073 0.0841414787  4.574558 4.772265e-06
## pascore      0.001147099 0.0006301137  1.820464 6.868841e-02
## qlh          0.322649288 0.1153629417  2.796819 5.160841e-03
## hwbmi       -0.072434428 0.0243045554 -2.980282 2.879832e-03
## gsgrpavg    -0.039428670 0.0062257845 -6.333125 2.402450e-10
## nfwlkspd    -0.631531380 0.2106087250 -2.998600 2.712229e-03
## b1fnd        1.224202935 0.6360497045  1.924697 5.426728e-02
## b1tbfkg      0.043369918 0.0110912615  3.910278 9.218989e-05
## b1thd       -0.268568724 0.6048858502 -0.443999 6.570433e-01
## b1tblkg      0.018601860 0.0088469732  2.102624 3.549863e-02
summary(check_again_both)$coef[14,4] < 0.05 # corrected hip bone mineral density not significant even when lean body mass considered 
## [1] FALSE

Only total LBM, ‘b1tblkg’, is significant, even when both are considered.

step 4 model

We’ll add lean body mass to the model from the last step to end step four. Addition of these variables does not make the variables added back at the previous stage pass a univaraite Wald test where \(\alpha = 0.05\), but that’s inconsequential.

step4_model <- check_again_lean
df_1 <- as.data.frame(summary(step4_model)$coef)
df_1$pass <- df_1[,4] < 0.05
df_1
##                 Estimate   Std. Error   z value     Pr(>|z|)  pass
## (Intercept) -2.603372408 0.9563516254 -2.722192 6.485052e-03  TRUE
## giage1       0.025140758 0.0081895108  3.069873 2.141498e-03  TRUE
## mhstrk       0.301647522 0.1628266514  1.852568 6.394423e-02 FALSE
## mhpark       1.618930925 0.3697138706  4.378875 1.192934e-05  TRUE
## mhcopd       0.451069377 0.1204304576  3.745476 1.800521e-04  TRUE
## mharth       0.384990347 0.0841367768  4.575768 4.744768e-06  TRUE
## pascore      0.001143792 0.0006300872  1.815291 6.947915e-02 FALSE
## qlh          0.322746343 0.1153725017  2.797429 5.151114e-03  TRUE
## hwbmi       -0.073355050 0.0242189021 -3.028835 2.454990e-03  TRUE
## gsgrpavg    -0.039507594 0.0062236393 -6.347989 2.181476e-10  TRUE
## nfwlkspd    -0.636315533 0.2103364029 -3.025228 2.484459e-03  TRUE
## b1fnd        0.984569397 0.3373304990  2.918709 3.514847e-03  TRUE
## b1tbfkg      0.043484761 0.0110869386  3.922161 8.775821e-05  TRUE
## b1tblkg      0.018296076 0.0088195514  2.074491 3.803378e-02  TRUE

Step 5: Check Linearity Assumption for Concinuous Variables

Identifying continuous variables

glimpse(step3_narm1)
## Rows: 5,092
## Columns: 29
## $ id       <chr> "BI0001", "BI0002", "BI0003", "BI0004", "BI0006", "BI0007", "~
## $ site     <chr> "BI", "BI", "BI", "BI", "BI", "BI", "BI", "BI", "BI", "BI", "~
## $ giage1   <dbl> 67, 67, 72, 65, 78, 69, 65, 65, 72, 67, 66, 73, 68, 72, 65, 7~
## $ mhdiab   <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ mhstrk   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ mhpark   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ mhcopd   <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ mharth   <dbl> 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0~
## $ mhcancer <dbl> 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1~
## $ pascore  <dbl> 183.25000, 179.57143, 300.00000, 107.00000, 41.00000, 256.714~
## $ qlhealth <dbl> 3, 3, 2, 1, 2, 2, 1, 3, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 1~
## $ hwbmi    <dbl> 29.42107, 31.87168, 26.09564, 28.56672, 28.45522, 19.33972, 2~
## $ gsgrpavg <dbl> 37.50, 38.00, 48.50, 43.00, 26.00, 39.50, 41.00, 43.50, 39.75~
## $ nfwlkspd <dbl> 1.2244898, 0.8882309, 1.3157895, 1.1964108, 0.7736944, 1.2875~
## $ b1fnd    <dbl> 0.8674233, 0.7402350, 0.6918427, 0.8783067, 0.8862731, 0.6301~
## $ b1tbfkg  <dbl> 24.855927, 32.004398, 16.985812, 29.454647, 22.422401, 9.9091~
## $ b1tblkg  <dbl> 57.92468, 62.85568, 61.85451, 63.01733, 49.63203, 49.53283, 6~
## $ mhfalln2 <dbl> 3, 3, 0, 2, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ st_1     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1~
## $ st_2     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ st_3     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ st_4     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ st_5     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ falls    <dbl> 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ qlh2     <dbl> 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0~
## $ qlh3     <dbl> 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ qlh4     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ qlh5     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ qlh      <dbl> 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~

Graphical approach

We’ll employ the loess approach to screen for linearity in the log odds across the range of each continuous independent variable. For variables that are obviously nonlinear by loess, we’ll use the fractional polynomial approach to identify the best transformation.

# loess subject age 
step3_narm1 = step3_narm1 %>% # # 
  arrange(desc(giage1)) %>% 
  map_df(rev)
gg1 <- 
  ggplot(data = step3_narm1) +
  stat_smooth(mapping = aes(x = giage1, y = falls), method = "loess") + 
  stat_smooth(mapping = aes(x = giage1, y = falls), method = "lm", color = "red") + 
  theme_minimal()

# loess PACE score 
step3_narm1 = step3_narm1 %>% 
  arrange(desc(pascore)) %>% 
  map_df(rev)
gg2 <- 
  ggplot(data = step3_narm1) +
  stat_smooth(mapping = aes(x = pascore, y = falls), method = "loess") + 
  stat_smooth(mapping = aes(x = pascore, y = falls), method = "lm", color = "red") + 
  theme_minimal()

# loess BMI 
step3_narm1 = step3_narm1 %>% 
  arrange(desc(hwbmi)) %>% 
  map_df(rev)
gg3 <- 
  ggplot(data = step3_narm1) +
  stat_smooth(mapping = aes(x = hwbmi, y = falls), method = "loess") + 
  stat_smooth(mapping = aes(x = hwbmi, y = falls), method = "lm", color = "red") + 
  theme_minimal()

# loess grip strength 
step3_narm1 = step3_narm1 %>% 
  arrange(desc(gsgrpavg)) %>% 
  map_df(rev)
gg4 <- 
  ggplot(data = step3_narm1) +
  stat_smooth(mapping = aes(x = gsgrpavg, y = falls), method = "loess") + 
  stat_smooth(mapping = aes(x = gsgrpavg, y = falls), method = "lm", color = "red") + 
  theme_minimal()

# loess walk speed 
step3_narm1 = step3_narm1 %>% 
  arrange(desc(nfwlkspd)) %>% 
  map_df(rev)
gg5 <- 
  ggplot(data = step3_narm1) +
  stat_smooth(mapping = aes(x = nfwlkspd, y = falls), method = "loess") + 
  stat_smooth(mapping = aes(x = nfwlkspd, y = falls), method = "lm", color = "red") + 
  theme_minimal()

# loess corrected neck BMD 
step3_narm1 = step3_narm1 %>% 
  arrange(desc(b1fnd)) %>% 
  map_df(rev)
gg6 <- 
  ggplot(data = step3_narm1) +
  stat_smooth(mapping = aes(x = b1fnd, y = falls), method = "loess") + 
  stat_smooth(mapping = aes(x = b1fnd, y = falls), method = "lm", color = "red") + 
  theme_minimal()

# loess body fat mass 
step3_narm1 = step3_narm1 %>% 
  arrange(desc(b1tbfkg)) %>% 
  map_df(rev)
gg7 <- 
  ggplot(data = step3_narm1) +
  stat_smooth(mapping = aes(x = b1tbfkg, y = falls), method = "loess") + 
  stat_smooth(mapping = aes(x = b1tbfkg, y = falls), method = "lm", color = "red") + 
  theme_minimal()

# loess body lean mass 
step3_narm1 = step3_narm1 %>% 
  arrange(desc(b1tblkg)) %>% 
  map_df(rev)
gg8 <- 
  ggplot(data = step3_narm1) +
  stat_smooth(mapping = aes(x = b1tblkg, y = falls), method = "loess") + 
  stat_smooth(mapping = aes(x = b1tblkg, y = falls), method = "lm", color = "red") + 
  theme_minimal()

grid.arrange(gg1, 
             gg2,
             gg3,
             gg4,
             gg5,
             gg6,
             gg7,
             gg8)
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

Checking fractional polynomials

We’ll check the fractional polynomials for walk speed, corrected femoral neck bone mineral density, average left/right grip strength, BMI, and subject age.

step4_model$call
## glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
##     pascore + qlh + hwbmi + gsgrpavg + nfwlkspd + b1fnd + b1tbfkg + 
##     b1tblkg, family = "binomial", data = step3_narm1)
fracpoly_age <- mfp(
  falls ~ fp(giage1, df = 4) +
    mhstrk + mhpark + mhcopd + mharth +
    pascore + qlh + hwbmi + gsgrpavg + nfwlkspd + b1fnd +
    b1tbfkg + b1tblkg, data = step3_narm1,
  family = binomial(),
  alpha = 0.1,
  verbose = T # use ident for age
)
## 
##  Variable    Deviance    Power(s)
## ------------------------------------------------
## Cycle 1
##  gsgrpavg             
##              4078.379     
##              4037.74     1
##                              
##                              
## 
##  mharth               
##              4058.791     
##              4037.74     1
##                              
##                              
## 
##  mhpark               
##              4055.444     
##              4037.74     1
##                              
##                              
## 
##  b1tbfkg              
##              4053.122     
##              4037.74     1
##                              
##                              
## 
##  mhcopd               
##              4051.007     
##              4037.74     1
##                              
##                              
## 
##  giage1               
##              4047.132     
##              4037.74     1
##              4036.424    3
##              4032.985    -2 2
## 
##  hwbmi                
##              4047.003     
##              4037.74     1
##                              
##                              
## 
##  nfwlkspd             
##              4046.925     
##              4037.74     1
##                              
##                              
## 
##  b1fnd                
##              4046.154     
##              4037.74     1
##                              
##                              
## 
##  qlh              
##              4045.323     
##              4037.74     1
##                              
##                              
## 
##  b1tblkg              
##              4042.026     
##              4037.74     1
##                              
##                              
## 
##  mhstrk               
##              4041.026     
##              4037.74     1
##                              
##                              
## 
##  pascore              
##              4041.012     
##              4037.74     1
##                              
##                              
## 
## 
## Tansformation
##          shift scale
## gsgrpavg     0     1
## mharth       0     1
## mhpark       0     1
## b1tbfkg      0     1
## mhcopd       0     1
## giage1       0   100
## hwbmi        0     1
## nfwlkspd     0     1
## b1fnd        0     1
## qlh          0     1
## b1tblkg      0     1
## mhstrk       0     1
## pascore      0     1
## 
## Fractional polynomials
##          df.initial select alpha df.final power1 power2
## gsgrpavg          1      1   0.1        1      1      .
## mharth            1      1   0.1        1      1      .
## mhpark            1      1   0.1        1      1      .
## b1tbfkg           1      1   0.1        1      1      .
## mhcopd            1      1   0.1        1      1      .
## giage1            4      1   0.1        1      1      .
## hwbmi             1      1   0.1        1      1      .
## nfwlkspd          1      1   0.1        1      1      .
## b1fnd             1      1   0.1        1      1      .
## qlh               1      1   0.1        1      1      .
## b1tblkg           1      1   0.1        1      1      .
## mhstrk            1      1   0.1        1      1      .
## pascore           1      1   0.1        1      1      .
## 
## 
## Transformations of covariates:
##                    formula
## giage1   I((giage1/100)^1)
## mhstrk              mhstrk
## mhpark              mhpark
## mhcopd              mhcopd
## mharth              mharth
## pascore            pascore
## qlh                    qlh
## hwbmi                hwbmi
## gsgrpavg          gsgrpavg
## nfwlkspd          nfwlkspd
## b1fnd                b1fnd
## b1tbfkg            b1tbfkg
## b1tblkg            b1tblkg
## 
## 
## Deviance table:
##           Resid. Dev
## Null model    4274.221
## Linear model  4037.74
## Final model   4037.74
# frac_PACEscore <- mfp(
#   falls ~ fp(pascore, df = 4) +
#     giage1 + mhstrk + mhpark + mhcopd + mharth +
#     qlh + hwbmi + gsgrpavg + nfwlkspd + b1fnd +
#     b1tbfkg + b1tblkg, data = step3_narm1,
#   family = binomial(),
#   alpha = 0.1,
#   verbose = T # use ident for PACE score
# )

frac_BMI <- mfp(
  falls ~ fp(hwbmi, df = 4) +
    giage1 + mhstrk + pascore + mhpark + mhcopd + mharth +
    qlh + gsgrpavg + nfwlkspd + b1fnd +
    b1tbfkg + b1tblkg, data = step3_narm1,
  family = binomial(),
  alpha = 0.1,
  verbose = T # use ident for hwbmi
)
## 
##  Variable    Deviance    Power(s)
## ------------------------------------------------
## Cycle 1
##  gsgrpavg             
##              4078.379     
##              4037.74     1
##                              
##                              
## 
##  mharth               
##              4058.791     
##              4037.74     1
##                              
##                              
## 
##  mhpark               
##              4055.444     
##              4037.74     1
##                              
##                              
## 
##  b1tbfkg              
##              4053.122     
##              4037.74     1
##                              
##                              
## 
##  mhcopd               
##              4051.007     
##              4037.74     1
##                              
##                              
## 
##  giage1               
##              4047.132     
##              4037.74     1
##                              
##                              
## 
##  hwbmi                
##              4047.003     
##              4037.74     1
##              4037.74     1
##              4036.969    -2 3
## 
##  nfwlkspd             
##              4046.925     
##              4037.74     1
##                              
##                              
## 
##  b1fnd                
##              4046.154     
##              4037.74     1
##                              
##                              
## 
##  qlh              
##              4045.323     
##              4037.74     1
##                              
##                              
## 
##  b1tblkg              
##              4042.026     
##              4037.74     1
##                              
##                              
## 
##  mhstrk               
##              4041.026     
##              4037.74     1
##                              
##                              
## 
##  pascore              
##              4041.012     
##              4037.74     1
##                              
##                              
## 
## 
## Tansformation
##          shift scale
## gsgrpavg     0     1
## mharth       0     1
## mhpark       0     1
## b1tbfkg      0     1
## mhcopd       0     1
## giage1       0     1
## hwbmi        0    10
## nfwlkspd     0     1
## b1fnd        0     1
## qlh          0     1
## b1tblkg      0     1
## mhstrk       0     1
## pascore      0     1
## 
## Fractional polynomials
##          df.initial select alpha df.final power1 power2
## gsgrpavg          1      1   0.1        1      1      .
## mharth            1      1   0.1        1      1      .
## mhpark            1      1   0.1        1      1      .
## b1tbfkg           1      1   0.1        1      1      .
## mhcopd            1      1   0.1        1      1      .
## giage1            1      1   0.1        1      1      .
## hwbmi             4      1   0.1        1      1      .
## nfwlkspd          1      1   0.1        1      1      .
## b1fnd             1      1   0.1        1      1      .
## qlh               1      1   0.1        1      1      .
## b1tblkg           1      1   0.1        1      1      .
## mhstrk            1      1   0.1        1      1      .
## pascore           1      1   0.1        1      1      .
## 
## 
## Transformations of covariates:
##                  formula
## hwbmi    I((hwbmi/10)^1)
## giage1            giage1
## mhstrk            mhstrk
## pascore          pascore
## mhpark            mhpark
## mhcopd            mhcopd
## mharth            mharth
## qlh                  qlh
## gsgrpavg        gsgrpavg
## nfwlkspd        nfwlkspd
## b1fnd              b1fnd
## b1tbfkg          b1tbfkg
## b1tblkg          b1tblkg
## 
## 
## Deviance table:
##           Resid. Dev
## Null model    4274.221
## Linear model  4037.74
## Final model   4037.74
frac_grip <- mfp(
  falls ~ fp(gsgrpavg, df = 4) +
    giage1 + mhstrk + pascore + mhpark + mhcopd + mharth +
    qlh + hwbmi + nfwlkspd + b1fnd +
    b1tbfkg + b1tblkg, data = step3_narm1,
  family = binomial(),
  alpha = 0.1,
  verbose = T # I((gsgrpavg/100)^-2)+I((gsgrpavg/100)^-2*log((gsgrpavg/100)))
)
## 
##  Variable    Deviance    Power(s)
## ------------------------------------------------
## Cycle 1
##  gsgrpavg             
##              4078.379     
##              4037.74     1
##              4037.156    0.5
##              4031.277    -2 -2
## 
##  mharth               
##              4052.325     
##              4031.277    1
##                              
##                              
## 
##  mhpark               
##              4048.67      
##              4031.277    1
##                              
##                              
## 
##  b1tbfkg              
##              4047.745     
##              4031.277    1
##                              
##                              
## 
##  mhcopd               
##              4044.216     
##              4031.277    1
##                              
##                              
## 
##  giage1               
##              4040.189     
##              4031.277    1
##                              
##                              
## 
##  hwbmi                
##              4041.056     
##              4031.277    1
##                              
##                              
## 
##  nfwlkspd             
##              4040.498     
##              4031.277    1
##                              
##                              
## 
##  b1fnd                
##              4039.415     
##              4031.277    1
##                              
##                              
## 
##  qlh              
##              4038.927     
##              4031.277    1
##                              
##                              
## 
##  b1tblkg              
##              4035.509     
##              4031.277    1
##                              
##                              
## 
##  mhstrk               
##              4034.481     
##              4031.277    1
##                              
##                              
## 
##  pascore              
##              4034.817     
##              4031.277    1
##                              
##                              
## 
## 
## Tansformation
##          shift scale
## gsgrpavg     0   100
## mharth       0     1
## mhpark       0     1
## b1tbfkg      0     1
## mhcopd       0     1
## giage1       0     1
## hwbmi        0     1
## nfwlkspd     0     1
## b1fnd        0     1
## qlh          0     1
## b1tblkg      0     1
## mhstrk       0     1
## pascore      0     1
## 
## Fractional polynomials
##          df.initial select alpha df.final power1 power2
## gsgrpavg          4      1   0.1        4     -2     -2
## mharth            1      1   0.1        1      1      .
## mhpark            1      1   0.1        1      1      .
## b1tbfkg           1      1   0.1        1      1      .
## mhcopd            1      1   0.1        1      1      .
## giage1            1      1   0.1        1      1      .
## hwbmi             1      1   0.1        1      1      .
## nfwlkspd          1      1   0.1        1      1      .
## b1fnd             1      1   0.1        1      1      .
## qlh               1      1   0.1        1      1      .
## b1tblkg           1      1   0.1        1      1      .
## mhstrk            1      1   0.1        1      1      .
## pascore           1      1   0.1        1      1      .
## 
## 
## Transformations of covariates:
##                                                                formula
## gsgrpavg I((gsgrpavg/100)^-2)+I((gsgrpavg/100)^-2*log((gsgrpavg/100)))
## giage1                                                          giage1
## mhstrk                                                          mhstrk
## pascore                                                        pascore
## mhpark                                                          mhpark
## mhcopd                                                          mhcopd
## mharth                                                          mharth
## qlh                                                                qlh
## hwbmi                                                            hwbmi
## nfwlkspd                                                      nfwlkspd
## b1fnd                                                            b1fnd
## b1tbfkg                                                        b1tbfkg
## b1tblkg                                                        b1tblkg
## 
## 
## Deviance table:
##           Resid. Dev
## Null model    4274.221
## Linear model  4037.74
## Final model   4031.277
frac_walkspeed <- mfp(
  falls ~ fp(nfwlkspd, df = 4) +
    giage1 + mhstrk + pascore + mhpark + mhcopd + mharth +
    qlh + hwbmi + gsgrpavg + b1fnd +
    b1tbfkg + b1tblkg, data = step3_narm1,
  family = binomial(),
  alpha = 0.1,
  verbose = T # inverse square for nfwlkspd
)
## 
##  Variable    Deviance    Power(s)
## ------------------------------------------------
## Cycle 1
##  gsgrpavg             
##              4078.379     
##              4037.74     1
##                              
##                              
## 
##  mharth               
##              4058.791     
##              4037.74     1
##                              
##                              
## 
##  mhpark               
##              4055.444     
##              4037.74     1
##                              
##                              
## 
##  b1tbfkg              
##              4053.122     
##              4037.74     1
##                              
##                              
## 
##  mhcopd               
##              4051.007     
##              4037.74     1
##                              
##                              
## 
##  giage1               
##              4047.132     
##              4037.74     1
##                              
##                              
## 
##  hwbmi                
##              4047.003     
##              4037.74     1
##                              
##                              
## 
##  nfwlkspd             
##              4046.925     
##              4037.74     1
##              4029.636    -2
##              4029.601    -2 -2
## 
##  b1fnd                
##              4038.14      
##              4029.636    1
##                              
##                              
## 
##  qlh              
##              4036.574     
##              4029.636    1
##                              
##                              
## 
##  b1tblkg              
##              4034.12      
##              4029.636    1
##                              
##                              
## 
##  mhstrk               
##              4033.206     
##              4029.636    1
##                              
##                              
## 
##  pascore              
##              4033.178     
##              4029.636    1
##                              
##                              
## 
## Cycle 2
##  gsgrpavg             
##              4070.855     
##              4029.636    1
##                              
##                              
## 
##  mharth               
##              4050.737     
##              4029.636    1
##                              
##                              
## 
##  mhpark               
##              4046.675     
##              4029.636    1
##                              
##                              
## 
##  b1tbfkg              
##              4046.669     
##              4029.636    1
##                              
##                              
## 
##  mhcopd               
##              4043.337     
##              4029.636    1
##                              
##                              
## 
##  giage1               
##              4039.871     
##              4029.636    1
##                              
##                              
## 
##  hwbmi                
##              4039.875     
##              4029.636    1
##                              
##                              
## 
## 
## Tansformation
##          shift scale
## gsgrpavg     0     1
## mharth       0     1
## mhpark       0     1
## b1tbfkg      0     1
## mhcopd       0     1
## giage1       0     1
## hwbmi        0     1
## nfwlkspd     0     1
## b1fnd        0     1
## qlh          0     1
## b1tblkg      0     1
## mhstrk       0     1
## pascore      0     1
## 
## Fractional polynomials
##          df.initial select alpha df.final power1 power2
## gsgrpavg          1      1   0.1        1      1      .
## mharth            1      1   0.1        1      1      .
## mhpark            1      1   0.1        1      1      .
## b1tbfkg           1      1   0.1        1      1      .
## mhcopd            1      1   0.1        1      1      .
## giage1            1      1   0.1        1      1      .
## hwbmi             1      1   0.1        1      1      .
## nfwlkspd          4      1   0.1        2     -2      .
## b1fnd             1      1   0.1        1      1      .
## qlh               1      1   0.1        1      1      .
## b1tblkg           1      1   0.1        1      1      .
## mhstrk            1      1   0.1        1      1      .
## pascore           1      1   0.1        1      1      .
## 
## 
## Transformations of covariates:
##                 formula
## nfwlkspd I(nfwlkspd^-2)
## giage1           giage1
## mhstrk           mhstrk
## pascore         pascore
## mhpark           mhpark
## mhcopd           mhcopd
## mharth           mharth
## qlh                 qlh
## hwbmi             hwbmi
## gsgrpavg       gsgrpavg
## b1fnd             b1fnd
## b1tbfkg         b1tbfkg
## b1tblkg         b1tblkg
## 
## 
## Deviance table:
##           Resid. Dev
## Null model    4274.221
## Linear model  4037.74
## Final model   4029.636
frac_neckBMD <- mfp(
  falls ~ fp(b1fnd, df = 4) +
    giage1 + mhstrk + pascore + mhpark + mhcopd + mharth +
    qlh + hwbmi + gsgrpavg + nfwlkspd +
    b1tbfkg + b1tblkg, data = step3_narm1,
  family = binomial(),
  alpha = 0.1,
  verbose = T # idnet for b1fnd
)
## 
##  Variable    Deviance    Power(s)
## ------------------------------------------------
## Cycle 1
##  gsgrpavg             
##              4078.379     
##              4037.74     1
##                              
##                              
## 
##  mharth               
##              4058.791     
##              4037.74     1
##                              
##                              
## 
##  mhpark               
##              4055.444     
##              4037.74     1
##                              
##                              
## 
##  b1tbfkg              
##              4053.122     
##              4037.74     1
##                              
##                              
## 
##  mhcopd               
##              4051.007     
##              4037.74     1
##                              
##                              
## 
##  giage1               
##              4047.132     
##              4037.74     1
##                              
##                              
## 
##  hwbmi                
##              4047.003     
##              4037.74     1
##                              
##                              
## 
##  nfwlkspd             
##              4046.925     
##              4037.74     1
##                              
##                              
## 
##  b1fnd                
##              4046.154     
##              4037.74     1
##              4035.913    -2
##              4035.85     -2 -2
## 
##  qlh              
##              4045.323     
##              4037.74     1
##                              
##                              
## 
##  b1tblkg              
##              4042.026     
##              4037.74     1
##                              
##                              
## 
##  mhstrk               
##              4041.026     
##              4037.74     1
##                              
##                              
## 
##  pascore              
##              4041.012     
##              4037.74     1
##                              
##                              
## 
## 
## Tansformation
##          shift scale
## gsgrpavg     0     1
## mharth       0     1
## mhpark       0     1
## b1tbfkg      0     1
## mhcopd       0     1
## giage1       0     1
## hwbmi        0     1
## nfwlkspd     0     1
## b1fnd        0     1
## qlh          0     1
## b1tblkg      0     1
## mhstrk       0     1
## pascore      0     1
## 
## Fractional polynomials
##          df.initial select alpha df.final power1 power2
## gsgrpavg          1      1   0.1        1      1      .
## mharth            1      1   0.1        1      1      .
## mhpark            1      1   0.1        1      1      .
## b1tbfkg           1      1   0.1        1      1      .
## mhcopd            1      1   0.1        1      1      .
## giage1            1      1   0.1        1      1      .
## hwbmi             1      1   0.1        1      1      .
## nfwlkspd          1      1   0.1        1      1      .
## b1fnd             4      1   0.1        1      1      .
## qlh               1      1   0.1        1      1      .
## b1tblkg           1      1   0.1        1      1      .
## mhstrk            1      1   0.1        1      1      .
## pascore           1      1   0.1        1      1      .
## 
## 
## Transformations of covariates:
##             formula
## b1fnd    I(b1fnd^1)
## giage1       giage1
## mhstrk       mhstrk
## pascore     pascore
## mhpark       mhpark
## mhcopd       mhcopd
## mharth       mharth
## qlh             qlh
## hwbmi         hwbmi
## gsgrpavg   gsgrpavg
## nfwlkspd   nfwlkspd
## b1tbfkg     b1tbfkg
## b1tblkg     b1tblkg
## 
## 
## Deviance table:
##           Resid. Dev
## Null model    4274.221
## Linear model  4037.74
## Final model   4037.74
frac_fatMass <- mfp(
  falls ~ fp(b1tbfkg, df = 4) +
    giage1 + mhstrk + pascore + mhpark + mhcopd + mharth +
    qlh + hwbmi + gsgrpavg + nfwlkspd +
    b1fnd + b1tblkg, data = step3_narm1,
  family = binomial(),
  alpha = 0.1,
  verbose = T # idnet for b1tbfkg
)
## 
##  Variable    Deviance    Power(s)
## ------------------------------------------------
## Cycle 1
##  gsgrpavg             
##              4078.379     
##              4037.74     1
##                              
##                              
## 
##  mharth               
##              4058.791     
##              4037.74     1
##                              
##                              
## 
##  mhpark               
##              4055.444     
##              4037.74     1
##                              
##                              
## 
##  b1tbfkg              
##              4053.122     
##              4037.74     1
##              4037.74     1
##              4037.07     2 3
## 
##  mhcopd               
##              4051.007     
##              4037.74     1
##                              
##                              
## 
##  giage1               
##              4047.132     
##              4037.74     1
##                              
##                              
## 
##  hwbmi                
##              4047.003     
##              4037.74     1
##                              
##                              
## 
##  nfwlkspd             
##              4046.925     
##              4037.74     1
##                              
##                              
## 
##  b1fnd                
##              4046.154     
##              4037.74     1
##                              
##                              
## 
##  qlh              
##              4045.323     
##              4037.74     1
##                              
##                              
## 
##  b1tblkg              
##              4042.026     
##              4037.74     1
##                              
##                              
## 
##  mhstrk               
##              4041.026     
##              4037.74     1
##                              
##                              
## 
##  pascore              
##              4041.012     
##              4037.74     1
##                              
##                              
## 
## 
## Tansformation
##          shift scale
## gsgrpavg     0     1
## mharth       0     1
## mhpark       0     1
## b1tbfkg      0    10
## mhcopd       0     1
## giage1       0     1
## hwbmi        0     1
## nfwlkspd     0     1
## b1fnd        0     1
## qlh          0     1
## b1tblkg      0     1
## mhstrk       0     1
## pascore      0     1
## 
## Fractional polynomials
##          df.initial select alpha df.final power1 power2
## gsgrpavg          1      1   0.1        1      1      .
## mharth            1      1   0.1        1      1      .
## mhpark            1      1   0.1        1      1      .
## b1tbfkg           4      1   0.1        1      1      .
## mhcopd            1      1   0.1        1      1      .
## giage1            1      1   0.1        1      1      .
## hwbmi             1      1   0.1        1      1      .
## nfwlkspd          1      1   0.1        1      1      .
## b1fnd             1      1   0.1        1      1      .
## qlh               1      1   0.1        1      1      .
## b1tblkg           1      1   0.1        1      1      .
## mhstrk            1      1   0.1        1      1      .
## pascore           1      1   0.1        1      1      .
## 
## 
## Transformations of covariates:
##                    formula
## b1tbfkg  I((b1tbfkg/10)^1)
## giage1              giage1
## mhstrk              mhstrk
## pascore            pascore
## mhpark              mhpark
## mhcopd              mhcopd
## mharth              mharth
## qlh                    qlh
## hwbmi                hwbmi
## gsgrpavg          gsgrpavg
## nfwlkspd          nfwlkspd
## b1fnd                b1fnd
## b1tblkg            b1tblkg
## 
## 
## Deviance table:
##           Resid. Dev
## Null model    4274.221
## Linear model  4037.74
## Final model   4037.74
frac_fatMass <- mfp(
  falls ~ fp(b1tblkg, df = 4) +
    giage1 + mhstrk + pascore + mhpark + mhcopd + mharth +
    qlh + hwbmi + gsgrpavg + nfwlkspd +
    b1fnd + b1tbfkg, data = step3_narm1,
  family = binomial(),
  alpha = 0.1,
  verbose = T # idnet for b1tblkg
)
## 
##  Variable    Deviance    Power(s)
## ------------------------------------------------
## Cycle 1
##  gsgrpavg             
##              4078.379     
##              4037.74     1
##                              
##                              
## 
##  mharth               
##              4058.791     
##              4037.74     1
##                              
##                              
## 
##  mhpark               
##              4055.444     
##              4037.74     1
##                              
##                              
## 
##  b1tbfkg              
##              4053.122     
##              4037.74     1
##                              
##                              
## 
##  mhcopd               
##              4051.007     
##              4037.74     1
##                              
##                              
## 
##  giage1               
##              4047.132     
##              4037.74     1
##                              
##                              
## 
##  hwbmi                
##              4047.003     
##              4037.74     1
##                              
##                              
## 
##  nfwlkspd             
##              4046.925     
##              4037.74     1
##                              
##                              
## 
##  b1fnd                
##              4046.154     
##              4037.74     1
##                              
##                              
## 
##  qlh              
##              4045.323     
##              4037.74     1
##                              
##                              
## 
##  b1tblkg              
##              4042.026     
##              4037.74     1
##              4036.991    -2
##              4036.727    3 3
## 
##  mhstrk               
##              4041.026     
##              4037.74     1
##                              
##                              
## 
##  pascore              
##              4041.012     
##              4037.74     1
##                              
##                              
## 
## 
## Tansformation
##          shift scale
## gsgrpavg     0     1
## mharth       0     1
## mhpark       0     1
## b1tbfkg      0     1
## mhcopd       0     1
## giage1       0     1
## hwbmi        0     1
## nfwlkspd     0     1
## b1fnd        0     1
## qlh          0     1
## b1tblkg      0   100
## mhstrk       0     1
## pascore      0     1
## 
## Fractional polynomials
##          df.initial select alpha df.final power1 power2
## gsgrpavg          1      1   0.1        1      1      .
## mharth            1      1   0.1        1      1      .
## mhpark            1      1   0.1        1      1      .
## b1tbfkg           1      1   0.1        1      1      .
## mhcopd            1      1   0.1        1      1      .
## giage1            1      1   0.1        1      1      .
## hwbmi             1      1   0.1        1      1      .
## nfwlkspd          1      1   0.1        1      1      .
## b1fnd             1      1   0.1        1      1      .
## qlh               1      1   0.1        1      1      .
## b1tblkg           4      1   0.1        1      1      .
## mhstrk            1      1   0.1        1      1      .
## pascore           1      1   0.1        1      1      .
## 
## 
## Transformations of covariates:
##                     formula
## b1tblkg  I((b1tblkg/100)^1)
## giage1               giage1
## mhstrk               mhstrk
## pascore             pascore
## mhpark               mhpark
## mhcopd               mhcopd
## mharth               mharth
## qlh                     qlh
## hwbmi                 hwbmi
## gsgrpavg           gsgrpavg
## nfwlkspd           nfwlkspd
## b1fnd                 b1fnd
## b1tbfkg             b1tbfkg
## 
## 
## Deviance table:
##           Resid. Dev
## Null model    4274.221
## Linear model  4037.74
## Final model   4037.74

Creating transformed variables

step4_narm <- step3_narm1 %>% 
  mutate(inv_sq_walk = nfwlkspd^-2) %>% 
  mutate(
    grip_trform = 
      (gsgrpavg/100)^-2 + ((gsgrpavg/100)^-2 * log(gsgrpavg/100)))

Visualizing transformed variables

plot(step4_narm$nfwlkspd, step4_narm$inv_sq_walk)

plot(step4_narm$nfwlkspd, (step4_narm$nfwlkspd+1)^-2)

plot(step4_narm$gsgrpavg,step4_narm$grip_trform)

From the above plots, we can formulate an offseted transformation of inverse squared walking speed, which decreases the disproportionate values given to small absolute values of walking speed:

step4_narm <- step4_narm %>% 
  mutate(offsetinv_sq_walk = (nfwlkspd+1)^-2)
step4_narm = step4_narm %>%
  arrange(desc(inv_sq_walk)) %>%
  map_df(rev)
gg9 <-
  ggplot(data = step4_narm) +
  stat_smooth(mapping = aes(x = inv_sq_walk, y = falls), method = "loess") +
  stat_smooth(mapping = aes(x = inv_sq_walk, y = falls), method = "lm", color = "red") +
  theme_minimal()


step4_narm = step4_narm %>%
  arrange(desc(grip_trform)) %>%
  map_df(rev)
gg10 <-
  ggplot(data = step4_narm) +
  stat_smooth(mapping = aes(x = grip_trform, y = falls), method = "loess") +
  stat_smooth(mapping = aes(x = grip_trform, y = falls), method = "lm", color = "red") +
  theme_minimal()

grid.arrange(gg9, gg5, gg10, gg4)
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

step4_model$call
## glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
##     pascore + qlh + hwbmi + gsgrpavg + nfwlkspd + b1fnd + b1tbfkg + 
##     b1tblkg, family = "binomial", data = step3_narm1)
summary(step4_model)$coef # without transformations
##                 Estimate   Std. Error   z value     Pr(>|z|)
## (Intercept) -2.603372408 0.9563516254 -2.722192 6.485052e-03
## giage1       0.025140758 0.0081895108  3.069873 2.141498e-03
## mhstrk       0.301647522 0.1628266514  1.852568 6.394423e-02
## mhpark       1.618930925 0.3697138706  4.378875 1.192934e-05
## mhcopd       0.451069377 0.1204304576  3.745476 1.800521e-04
## mharth       0.384990347 0.0841367768  4.575768 4.744768e-06
## pascore      0.001143792 0.0006300872  1.815291 6.947915e-02
## qlh          0.322746343 0.1153725017  2.797429 5.151114e-03
## hwbmi       -0.073355050 0.0242189021 -3.028835 2.454990e-03
## gsgrpavg    -0.039507594 0.0062236393 -6.347989 2.181476e-10
## nfwlkspd    -0.636315533 0.2103364029 -3.025228 2.484459e-03
## b1fnd        0.984569397 0.3373304990  2.918709 3.514847e-03
## b1tbfkg      0.043484761 0.0110869386  3.922161 8.775821e-05
## b1tblkg      0.018296076 0.0088195514  2.074491 3.803378e-02
step5_model1 <- glm(
  formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth +
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd +
    b1tbfkg + b1tblkg, 
  family = "binomial",
  data = step4_narm)

step5_model2 <- glm(
  formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth +
    pascore + qlh + hwbmi + grip_trform + inv_sq_walk + b1fnd +
    b1tbfkg + b1tblkg, 
  family = "binomial",
  data = step4_narm)

step5_model3 <- glm(
  formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth +
    pascore + qlh + hwbmi + gsgrpavg + nfwlkspd + b1fnd +
    b1tbfkg + b1tblkg, 
  family = "binomial",
  data = step4_narm)

step5_model4 <- glm(
  formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth +
    pascore + qlh + hwbmi + gsgrpavg + offsetinv_sq_walk + b1fnd +
    b1tbfkg + b1tblkg, 
  family = "binomial",
  data = step4_narm)

summary(step5_model1)$coef
##                 Estimate   Std. Error   z value     Pr(>|z|)
## (Intercept) -3.655476545 0.8622491548 -4.239467 2.240515e-05
## giage1       0.025910809 0.0080776930  3.207699 1.338014e-03
## mhstrk       0.314877369 0.1628795256  1.933192 5.321257e-02
## mhpark       1.608457365 0.3738094170  4.302881 1.685916e-05
## mhcopd       0.458829234 0.1204820610  3.808278 1.399377e-04
## mharth       0.385600207 0.0841637031  4.581550 4.615424e-06
## pascore      0.001190304 0.0006300573  1.889199 5.886516e-02
## qlh          0.310048930 0.1158407517  2.676510 7.439335e-03
## hwbmi       -0.077480518 0.0243399888 -3.183260 1.456267e-03
## gsgrpavg    -0.039688221 0.0062068220 -6.394290 1.612948e-10
## inv_sq_walk  0.329280347 0.0874846580  3.763864 1.673080e-04
## b1fnd        0.991873642 0.3380155264  2.934403 3.341904e-03
## b1tbfkg      0.045946167 0.0111285174  4.128687 3.648404e-05
## b1tblkg      0.018749315 0.0088357703  2.121979 3.383954e-02
summary(step5_model2)$coef
##                  Estimate   Std. Error    z value     Pr(>|z|)
## (Intercept) -5.4122348264 0.8216797017 -6.5867939 4.494255e-11
## giage1       0.0373242258 0.0078957211  4.7271459 2.276976e-06
## mhstrk       0.3385558123 0.1624902667  2.0835452 3.720156e-02
## mhpark       1.7018466026 0.3756791262  4.5300537 5.896871e-06
## mhcopd       0.4468274522 0.1197035101  3.7327849 1.893743e-04
## mharth       0.4312030948 0.0835952970  5.1582219 2.493060e-07
## pascore      0.0008894403 0.0006278547  1.4166340 1.565900e-01
## qlh          0.3209934172 0.1154609104  2.7801047 5.434138e-03
## hwbmi       -0.0541584439 0.0239776520 -2.2587051 2.390174e-02
## grip_trform -0.0059828929 0.0055649152 -1.0751095 2.823257e-01
## inv_sq_walk  0.3822300898 0.0901284942  4.2409461 2.225795e-05
## b1fnd        0.9509848192 0.3368614931  2.8230737 4.756564e-03
## b1tbfkg      0.0457854757 0.0111054038  4.1228105 3.742775e-05
## b1tblkg     -0.0028217370 0.0081707661 -0.3453455 7.298347e-01
summary(step5_model3)$coef
##                 Estimate   Std. Error   z value     Pr(>|z|)
## (Intercept) -2.603372408 0.9563516254 -2.722192 6.485052e-03
## giage1       0.025140758 0.0081895108  3.069873 2.141498e-03
## mhstrk       0.301647522 0.1628266514  1.852568 6.394423e-02
## mhpark       1.618930925 0.3697138706  4.378875 1.192934e-05
## mhcopd       0.451069377 0.1204304576  3.745476 1.800521e-04
## mharth       0.384990347 0.0841367768  4.575768 4.744768e-06
## pascore      0.001143792 0.0006300872  1.815291 6.947915e-02
## qlh          0.322746343 0.1153725017  2.797429 5.151114e-03
## hwbmi       -0.073355050 0.0242189021 -3.028835 2.454990e-03
## gsgrpavg    -0.039507594 0.0062236393 -6.347989 2.181476e-10
## nfwlkspd    -0.636315533 0.2103364029 -3.025228 2.484459e-03
## b1fnd        0.984569397 0.3373304990  2.918709 3.514847e-03
## b1tbfkg      0.043484761 0.0110869386  3.922161 8.775821e-05
## b1tblkg      0.018296076 0.0088195514  2.074491 3.803378e-02
summary(step5_model4)$coef
##                       Estimate   Std. Error   z value     Pr(>|z|)
## (Intercept)       -4.014656409 0.8599357322 -4.668554 3.033270e-06
## giage1             0.024233835 0.0081657444  2.967743 2.999946e-03
## mhstrk             0.299398859 0.1629957140  1.836851 6.623188e-02
## mhpark             1.604294299 0.3714403301  4.319117 1.566546e-05
## mhcopd             0.452475024 0.1205081163  3.754727 1.735308e-04
## mharth             0.382449378 0.0841963251  4.542352 5.562995e-06
## pascore            0.001197172 0.0006302949  1.899384 5.751394e-02
## qlh                0.305943309 0.1158710717  2.640377 8.281386e-03
## hwbmi             -0.076204826 0.0242717658 -3.139649 1.691503e-03
## gsgrpavg          -0.039050897 0.0062202641 -6.278013 3.429280e-10
## offsetinv_sq_walk  3.431448884 0.9186412938  3.735352 1.874526e-04
## b1fnd              0.988708073 0.3375463018  2.929104 3.399411e-03
## b1tbfkg            0.044360126 0.0110904169  3.999861 6.337970e-05
## b1tblkg            0.018584759 0.0088285333  2.105079 3.528444e-02
summary(step4_model)$deviance
## [1] 4037.74
summary(step5_model1)$deviance
## [1] 4029.636
summary(step5_model2)$deviance
## [1] 4069.507
summary(step5_model3)$deviance
## [1] 4037.74
summary(step5_model4)$deviance
## [1] 4033.275

Step 5 model

Note from the above output that the second model, including the inverse square walking speed parameter but not grip strength transform, has the least deviance, therefore that will be the model chosen for this step.

The transforms identified by the fractional polynomial track seem to produce less linear log odds with respect to the outcome. We’ll proceed with two preliminary final models: one containing the transforms of average grip strength and walk speed, and one containing the original encoding. Even though the inverse square of walk speed has a more statistically significant Wald statistic, we can see from the above plot that the linear fit for the log odds is estimated above one for a significant portion of the range. At any rate, there isn’t a significant change in deviance between the model generated in step 4 and the models containing the transformations.

step5_model <- step5_model1
summary(step5_model)$coef
##                 Estimate   Std. Error   z value     Pr(>|z|)
## (Intercept) -3.655476545 0.8622491548 -4.239467 2.240515e-05
## giage1       0.025910809 0.0080776930  3.207699 1.338014e-03
## mhstrk       0.314877369 0.1628795256  1.933192 5.321257e-02
## mhpark       1.608457365 0.3738094170  4.302881 1.685916e-05
## mhcopd       0.458829234 0.1204820610  3.808278 1.399377e-04
## mharth       0.385600207 0.0841637031  4.581550 4.615424e-06
## pascore      0.001190304 0.0006300573  1.889199 5.886516e-02
## qlh          0.310048930 0.1158407517  2.676510 7.439335e-03
## hwbmi       -0.077480518 0.0243399888 -3.183260 1.456267e-03
## gsgrpavg    -0.039688221 0.0062068220 -6.394290 1.612948e-10
## inv_sq_walk  0.329280347 0.0874846580  3.763864 1.673080e-04
## b1fnd        0.991873642 0.3380155264  2.934403 3.341904e-03
## b1tbfkg      0.045946167 0.0111285174  4.128687 3.648404e-05
## b1tblkg      0.018749315 0.0088357703  2.121979 3.383954e-02

Step 6: Exploring Interactions

Creating the interaction models

Since there are thirteen variables in the model, there are ${13} $ \(= 78\) total interaction models to screen.

EMM1 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + giage1*mhstrk,    family = "binomial",   data = step4_narm)
EMM2 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + giage1*mhpark,    family = "binomial",   data = step4_narm)
EMM3 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + giage1*mhcopd,    family = "binomial",   data = step4_narm)
EMM4 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + giage1*mharth,    family = "binomial",   data = step4_narm)
EMM5 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + giage1*pascore,    family = "binomial",   data = step4_narm)
EMM6 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + giage1*qlh,    family = "binomial",   data = step4_narm)
EMM7 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + giage1*hwbmi,    family = "binomial",   data = step4_narm)
EMM8 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + giage1*gsgrpavg,    family = "binomial",   data = step4_narm)
EMM9 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + giage1*inv_sq_walk,    family = "binomial",   data = step4_narm)
EMM10 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + giage1*b1fnd,    family = "binomial",   data = step4_narm)
EMM11 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + giage1*b1tbfkg,    family = "binomial",   data = step4_narm)
EMM12 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + giage1*b1tblkg,    family = "binomial",   data = step4_narm)
EMM13 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhstrk*mhpark,    family = "binomial",   data = step4_narm)
EMM14 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhstrk*mhcopd,    family = "binomial",   data = step4_narm)
EMM15 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhstrk*mharth,    family = "binomial",   data = step4_narm)
EMM16 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhstrk*pascore,    family = "binomial",   data = step4_narm)
EMM17 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhstrk*qlh,    family = "binomial",   data = step4_narm)
EMM18 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhstrk*hwbmi,    family = "binomial",   data = step4_narm)
EMM19 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhstrk*gsgrpavg,    family = "binomial",   data = step4_narm)
EMM20 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhstrk*inv_sq_walk,    family = "binomial",   data = step4_narm)
EMM21 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhstrk*b1fnd,    family = "binomial",   data = step4_narm)
EMM22 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhstrk*b1tbfkg,    family = "binomial",   data = step4_narm)
EMM23 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhstrk*b1tblkg,    family = "binomial",   data = step4_narm)
EMM24 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhpark*mhcopd,    family = "binomial",   data = step4_narm)
EMM25 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhpark*mharth,    family = "binomial",   data = step4_narm)
EMM26 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhpark*pascore,    family = "binomial",   data = step4_narm)
EMM27 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhpark*qlh,    family = "binomial",   data = step4_narm)
EMM28 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhpark*hwbmi,    family = "binomial",   data = step4_narm)
EMM29 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhpark*gsgrpavg,    family = "binomial",   data = step4_narm)
EMM30 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhpark*inv_sq_walk,    family = "binomial",   data = step4_narm)
EMM31 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhpark*b1fnd,    family = "binomial",   data = step4_narm)
EMM32 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhpark*b1tbfkg,    family = "binomial",   data = step4_narm)
EMM33 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhpark*b1tblkg,    family = "binomial",   data = step4_narm)
EMM34 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhcopd*mharth,    family = "binomial",   data = step4_narm)
EMM35 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhcopd*pascore,    family = "binomial",   data = step4_narm)
EMM36 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhcopd*qlh,    family = "binomial",   data = step4_narm)
EMM37 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhcopd*hwbmi,    family = "binomial",   data = step4_narm)
EMM38 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhcopd*gsgrpavg,    family = "binomial",   data = step4_narm)
EMM39 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhcopd*inv_sq_walk,    family = "binomial",   data = step4_narm)
EMM40 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhcopd*b1fnd,    family = "binomial",   data = step4_narm)
EMM41 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhcopd*b1tbfkg,    family = "binomial",   data = step4_narm)
EMM42 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mhcopd*b1tblkg,    family = "binomial",   data = step4_narm)
EMM43 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mharth*pascore,    family = "binomial",   data = step4_narm)
EMM44 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mharth*qlh,    family = "binomial",   data = step4_narm)
EMM45 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mharth*hwbmi,    family = "binomial",   data = step4_narm)
EMM46 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mharth*gsgrpavg,    family = "binomial",   data = step4_narm)
EMM47 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mharth*inv_sq_walk,    family = "binomial",   data = step4_narm)
EMM48 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mharth*b1fnd,    family = "binomial",   data = step4_narm)
EMM49 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mharth*b1tbfkg,    family = "binomial",   data = step4_narm)
EMM50 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + mharth*b1tblkg,    family = "binomial",   data = step4_narm)
EMM51 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + pascore*qlh,    family = "binomial",   data = step4_narm)
EMM52 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + pascore*hwbmi,    family = "binomial",   data = step4_narm)
EMM53 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + pascore*gsgrpavg,    family = "binomial",   data = step4_narm)
EMM54 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + pascore*inv_sq_walk,    family = "binomial",   data = step4_narm)
EMM55 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + pascore*b1fnd,    family = "binomial",   data = step4_narm)
EMM56 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + pascore*b1tbfkg,    family = "binomial",   data = step4_narm)
EMM57 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + pascore*b1tblkg,    family = "binomial",   data = step4_narm)
EMM58 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + qlh*hwbmi,    family = "binomial",   data = step4_narm)
EMM59 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + qlh*gsgrpavg,    family = "binomial",   data = step4_narm)
EMM60 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + qlh*inv_sq_walk,    family = "binomial",   data = step4_narm)
EMM61 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + qlh*b1fnd,    family = "binomial",   data = step4_narm)
EMM62 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + qlh*b1tbfkg,    family = "binomial",   data = step4_narm)
EMM63 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + qlh*b1tblkg,    family = "binomial",   data = step4_narm)
EMM64 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + hwbmi*gsgrpavg,    family = "binomial",   data = step4_narm)
EMM65 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + hwbmi*inv_sq_walk,    family = "binomial",   data = step4_narm)
EMM66 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + hwbmi*b1fnd,    family = "binomial",   data = step4_narm)
EMM67 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + hwbmi*b1tbfkg,    family = "binomial",   data = step4_narm)
EMM68 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + hwbmi*b1tblkg,    family = "binomial",   data = step4_narm)
EMM69 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + gsgrpavg*inv_sq_walk,    family = "binomial",   data = step4_narm)
EMM70 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + gsgrpavg*b1fnd,    family = "binomial",   data = step4_narm)
EMM71 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + gsgrpavg*b1tbfkg,    family = "binomial",   data = step4_narm)
EMM72 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + gsgrpavg*b1tblkg,    family = "binomial",   data = step4_narm)
EMM73 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + inv_sq_walk*b1fnd,    family = "binomial",   data = step4_narm)
EMM75 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + inv_sq_walk*b1tbfkg,    family = "binomial",   data = step4_narm)
EMM76 <-
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth +
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd +
    b1tbfkg + b1tblkg + inv_sq_walk*b1tblkg,    family = "binomial",   data = step4_narm)
EMM77 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + b1fnd*b1tbfkg,    family = "binomial",   data = step4_narm)
EMM74 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + b1fnd*b1tblkg,    family = "binomial",   data = step4_narm)
EMM78 <- 
  glm(formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + 
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + 
    b1tbfkg + b1tblkg + b1tbfkg*b1tblkg,    family = "binomial",   data = step4_narm)

EMM_list <- list(
  EMM1,
  EMM2,
  EMM3,
  EMM4,
  EMM5,
  EMM6,
  EMM7,
  EMM8,
  EMM9,
  EMM10,
  EMM11,
  EMM12,
  EMM13,
  EMM14,
  EMM15,
  EMM16,
  EMM17,
  EMM18,
  EMM19,
  EMM20,
  EMM21,
  EMM22,
  EMM23,
  EMM24,
  EMM25,
  EMM26,
  EMM27,
  EMM28,
  EMM29,
  EMM30,
  EMM31,
  EMM32,
  EMM33,
  EMM34,
  EMM35,
  EMM36,
  EMM37,
  EMM38,
  EMM39,
  EMM40,
  EMM41,
  EMM42,
  EMM43,
  EMM44,
  EMM45,
  EMM46,
  EMM47,
  EMM48,
  EMM49,
  EMM50,
  EMM51,
  EMM52,
  EMM53,
  EMM54,
  EMM55,
  EMM56,
  EMM57,
  EMM58,
  EMM59,
  EMM60,
  EMM61,
  EMM62,
  EMM63,
  EMM64,
  EMM65,
  EMM66,
  EMM67,
  EMM68,
  EMM69,
  EMM70,
  EMM71,
  EMM72,
  EMM73,
  EMM74,
  EMM75,
  EMM76,
  EMM77,
  EMM78) # 400 lines of nonsense 

Finding statistically significant EMM

which(unlist(map(.x = EMM_list, 
    .f = function(x){
      summary(x)$coef[15,1] > 0.1
    })))
## [1] 17 20 24 25 30 39 48 61 73
v_pval <- unlist(map(.x = EMM_list, 
    .f = function(x){
      summary(x)$coef[15,1]
    }))

v_test <- unlist(map(.x = EMM_list, 
    .f = function(x){
      summary(x)$coef[15,1] > 0.1
    }))

v_terms <- unlist(map(.x = EMM_list,
    .f = function(x){
      rownames(summary(x)$coef)[15]
    }))

df_EMM <- data.frame(
  v_pval,
  v_test
)
rownames(df_EMM) <- v_terms
df_EMM
##                             v_pval v_test
## giage1:mhstrk         3.077075e-04  FALSE
## giage1:mhpark         4.441974e-02  FALSE
## giage1:mhcopd         2.647677e-02  FALSE
## giage1:mharth        -1.774848e-02  FALSE
## giage1:pascore       -1.288882e-05  FALSE
## giage1:qlh           -3.723410e-02  FALSE
## giage1:hwbmi         -5.796533e-04  FALSE
## giage1:gsgrpavg      -2.340870e-03  FALSE
## giage1:inv_sq_walk    1.627530e-02  FALSE
## giage1:b1fnd          7.805004e-04  FALSE
## giage1:b1tbfkg       -3.444632e-04  FALSE
## giage1:b1tblkg       -4.993912e-04  FALSE
## mhstrk:mhpark        -6.046888e-02  FALSE
## mhstrk:mhcopd        -3.843598e-03  FALSE
## mhstrk:mharth        -5.162830e-01  FALSE
## mhstrk:pascore       -3.291577e-03  FALSE
## mhstrk:qlh            5.342856e-01   TRUE
## mhstrk:hwbmi         -3.825790e-02  FALSE
## mhstrk:gsgrpavg      -1.093581e-02  FALSE
## mhstrk:inv_sq_walk    3.915672e-01   TRUE
## mhstrk:b1fnd         -7.033287e-01  FALSE
## mhstrk:b1tbfkg       -1.737786e-02  FALSE
## mhstrk:b1tblkg       -1.976545e-02  FALSE
## mhpark:mhcopd         1.241152e+00   TRUE
## mhpark:mharth         5.965834e-01   TRUE
## mhpark:pascore       -1.769477e-02  FALSE
## mhpark:qlh           -1.785505e-01  FALSE
## mhpark:hwbmi         -9.419156e-02  FALSE
## mhpark:gsgrpavg       2.687024e-02  FALSE
## mhpark:inv_sq_walk    1.525909e+00   TRUE
## mhpark:b1fnd         -2.273769e+00  FALSE
## mhpark:b1tbfkg        1.875983e-02  FALSE
## mhpark:b1tblkg       -4.476684e-02  FALSE
## mhcopd:mharth        -7.149877e-02  FALSE
## mhcopd:pascore       -2.224044e-03  FALSE
## mhcopd:qlh            3.154201e-02  FALSE
## mhcopd:hwbmi          1.698593e-02  FALSE
## mhcopd:gsgrpavg      -2.121380e-02  FALSE
## mhcopd:inv_sq_walk    2.974181e-01   TRUE
## mhcopd:b1fnd         -2.915117e-01  FALSE
## mhcopd:b1tbfkg        1.942758e-02  FALSE
## mhcopd:b1tblkg        2.267697e-03  FALSE
## mharth:pascore        1.293269e-03  FALSE
## mharth:qlh           -2.570944e-01  FALSE
## mharth:hwbmi          1.768440e-02  FALSE
## mharth:gsgrpavg      -1.008291e-02  FALSE
## mharth:inv_sq_walk   -3.465847e-02  FALSE
## mharth:b1fnd          1.081070e+00   TRUE
## mharth:b1tbfkg        7.353564e-03  FALSE
## mharth:b1tblkg       -5.877040e-03  FALSE
## pascore:qlh          -7.420557e-04  FALSE
## pascore:hwbmi         2.482677e-05  FALSE
## pascore:gsgrpavg      4.866156e-05  FALSE
## pascore:inv_sq_walk  -1.639146e-03  FALSE
## pascore:b1fnd        -5.645277e-03  FALSE
## pascore:b1tbfkg       3.406564e-05  FALSE
## pascore:b1tblkg       8.782536e-05  FALSE
## qlh:hwbmi             3.693628e-02  FALSE
## qlh:gsgrpavg          1.754117e-02  FALSE
## qlh:inv_sq_walk      -1.947243e-01  FALSE
## qlh:b1fnd             4.267914e-01   TRUE
## qlh:b1tbfkg           2.888314e-02  FALSE
## qlh:b1tblkg           9.227418e-03  FALSE
## hwbmi:gsgrpavg        6.357874e-04  FALSE
## hwbmi:inv_sq_walk    -2.731919e-03  FALSE
## hwbmi:b1fnd          -2.715472e-02  FALSE
## hwbmi:b1tbfkg         1.642859e-04  FALSE
## hwbmi:b1tblkg        -8.193786e-04  FALSE
## gsgrpavg:inv_sq_walk -1.808709e-02  FALSE
## gsgrpavg:b1fnd       -2.128056e-02  FALSE
## gsgrpavg:b1tbfkg      4.134400e-04  FALSE
## gsgrpavg:b1tblkg      1.077405e-03  FALSE
## inv_sq_walk:b1fnd     2.920345e-01   TRUE
## b1fnd:b1tblkg        -4.212955e-02  FALSE
## inv_sq_walk:b1tbfkg   2.012397e-03  FALSE
## inv_sq_walk:b1tblkg  -9.821502e-03  FALSE
## b1fnd:b1tbfkg         3.516991e-03  FALSE
## b1tbfkg:b1tblkg      -4.093231e-04  FALSE
length(rownames(df_EMM))
## [1] 78
length(unique(rownames(df_EMM)))
## [1] 78

Several models show statistically significant interaction: 20, 24,25,30,39,48,61,73.

The three statistically significant interaction terms: mhpark:mhcopd, mhpark:mharth, and mharth:b1fnd.

Examining number of subject in each interacting group

length(step4_narm[step4_narm$mhcopd == 1 & step4_narm$mhpark == 1,]$id)
## [1] 4
step4_narm[step4_narm$mhcopd == 1 & step4_narm$mhpark == 1,]
## # A tibble: 4 x 32
##   id     site  giage1 mhdiab mhstrk mhpark mhcopd mharth mhcancer pascore
##   <chr>  <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>    <dbl>   <dbl>
## 1 BI0249 BI        71      0      0      1      1      1        0   108. 
## 2 SD8239 SD        72      0      0      1      1      1        0    69.6
## 3 PA3345 PA        76      0      0      1      1      0        1   105. 
## 4 SD8183 SD        76      0      0      1      1      1        1    82.8
## # ... with 22 more variables: qlhealth <dbl>, hwbmi <dbl>, gsgrpavg <dbl>,
## #   nfwlkspd <dbl>, b1fnd <dbl>, b1tbfkg <dbl>, b1tblkg <dbl>, mhfalln2 <dbl>,
## #   st_1 <dbl>, st_2 <dbl>, st_3 <dbl>, st_4 <dbl>, st_5 <dbl>, falls <dbl>,
## #   qlh2 <dbl>, qlh3 <dbl>, qlh4 <dbl>, qlh5 <dbl>, qlh <dbl>,
## #   inv_sq_walk <dbl>, grip_trform <dbl>, offsetinv_sq_walk <dbl>
length(step4_narm[step4_narm$mharth == 1 & step4_narm$mhpark == 1,]$id)
## [1] 14
step4_narm[step4_narm$mharth == 1 & step4_narm$mhpark == 1,]
## # A tibble: 14 x 32
##    id     site  giage1 mhdiab mhstrk mhpark mhcopd mharth mhcancer pascore
##    <chr>  <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>    <dbl>   <dbl>
##  1 MN1504 MN        82      0      1      1      0      1        0    75.7
##  2 MN1592 MN        75      0      0      1      0      1        0   141. 
##  3 PI5075 PI        74      0      0      1      0      1        1   120. 
##  4 BI0249 BI        71      0      0      1      1      1        0   108. 
##  5 BI0179 BI        69      0      0      1      0      1        0   146  
##  6 MN2133 MN        75      0      0      1      0      1        0   186  
##  7 PO7306 PO        79      0      0      1      0      1        1    67.7
##  8 MN1795 MN        70      0      0      1      0      1        0   173. 
##  9 MN1637 MN        74      0      0      1      0      1        0   180. 
## 10 MN2239 MN        73      0      0      1      0      1        0    16.4
## 11 SD8239 SD        72      0      0      1      1      1        0    69.6
## 12 SD8183 SD        76      0      0      1      1      1        1    82.8
## 13 PI4987 PI        72      0      0      1      0      1        0   128. 
## 14 MN1911 MN        70      0      0      1      0      1        1   181. 
## # ... with 22 more variables: qlhealth <dbl>, hwbmi <dbl>, gsgrpavg <dbl>,
## #   nfwlkspd <dbl>, b1fnd <dbl>, b1tbfkg <dbl>, b1tblkg <dbl>, mhfalln2 <dbl>,
## #   st_1 <dbl>, st_2 <dbl>, st_3 <dbl>, st_4 <dbl>, st_5 <dbl>, falls <dbl>,
## #   qlh2 <dbl>, qlh3 <dbl>, qlh4 <dbl>, qlh5 <dbl>, qlh <dbl>,
## #   inv_sq_walk <dbl>, grip_trform <dbl>, offsetinv_sq_walk <dbl>

There are 4 and 14 subjects (in our reduced dataset) that have both Parkinson’s and a history of COPD and arthritis, respectively. Clearly, that is too few subjects for this interaction term to make sense. We can assess these interactions using the liklihood ratio test to compare them to the model produced by step 5:

EMM17

lrtest(EMM17,step5_model)
## Likelihood ratio test
## 
## Model 1: falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + pascore + 
##     qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + b1tbfkg + 
##     b1tblkg + mhstrk * qlh
## Model 2: falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + pascore + 
##     qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + b1tbfkg + 
##     b1tblkg
##   #Df  LogLik Df  Chisq Pr(>Chisq)
## 1  15 -2013.7                     
## 2  14 -2014.8 -1 2.3385     0.1262
summary(EMM17)$coefficients[15,4]
## [1] 0.1250471

This doesn’t meet our criterion, \(\alpha = 0.05\).

EMM20

lrtest(EMM20,step5_model)
## Likelihood ratio test
## 
## Model 1: falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + pascore + 
##     qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + b1tbfkg + 
##     b1tblkg + mhstrk * inv_sq_walk
## Model 2: falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + pascore + 
##     qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + b1tbfkg + 
##     b1tblkg
##   #Df  LogLik Df  Chisq Pr(>Chisq)
## 1  15 -2014.1                     
## 2  14 -2014.8 -1 1.5001     0.2207
summary(EMM20)$coefficients[15,4]
## [1] 0.2261003

This doesn’t meet our criterion, \(\alpha = 0.05\).

mhpark:mhcopd

lrtest(EMM24,step5_model)
## Likelihood ratio test
## 
## Model 1: falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + pascore + 
##     qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + b1tbfkg + 
##     b1tblkg + mhpark * mhcopd
## Model 2: falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + pascore + 
##     qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + b1tbfkg + 
##     b1tblkg
##   #Df  LogLik Df  Chisq Pr(>Chisq)
## 1  15 -2014.3                     
## 2  14 -2014.8 -1 1.0851     0.2976
summary(EMM24)$coefficients[15,4]
## [1] 0.3241368

This doesn’t meet our criterion, \(\alpha = 0.05\).

mhpark:mharth

lrtest(EMM25,step5_model)
## Likelihood ratio test
## 
## Model 1: falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + pascore + 
##     qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + b1tbfkg + 
##     b1tblkg + mhpark * mharth
## Model 2: falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + pascore + 
##     qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + b1tbfkg + 
##     b1tblkg
##   #Df  LogLik Df  Chisq Pr(>Chisq)
## 1  15 -2014.5                     
## 2  14 -2014.8 -1 0.6162     0.4324
summary(EMM25)$coefficients[15,4]
## [1] 0.4375495

This doesn’t meet our criterion, \(\alpha = 0.05\).

EMM30

lrtest(EMM30,step5_model)
## Likelihood ratio test
## 
## Model 1: falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + pascore + 
##     qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + b1tbfkg + 
##     b1tblkg + mhpark * inv_sq_walk
## Model 2: falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + pascore + 
##     qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + b1tbfkg + 
##     b1tblkg
##   #Df  LogLik Df  Chisq Pr(>Chisq)
## 1  15 -2013.7                     
## 2  14 -2014.8 -1 2.2973     0.1296
summary(EMM30)$coefficients[15,4]
## [1] 0.2117856

This doesn’t meet our criterion, \(\alpha = 0.05\).

EMM39

lrtest(EMM39,step5_model)
## Likelihood ratio test
## 
## Model 1: falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + pascore + 
##     qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + b1tbfkg + 
##     b1tblkg + mhcopd * inv_sq_walk
## Model 2: falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + pascore + 
##     qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + b1tbfkg + 
##     b1tblkg
##   #Df  LogLik Df  Chisq Pr(>Chisq)
## 1  15 -2014.1                     
## 2  14 -2014.8 -1 1.4193     0.2335
summary(EMM39)$coefficients[15,4]
## [1] 0.2378462

This doesn’t meet our criterion, \(\alpha = 0.05\).

mharth:b1fnd

lrtest(EMM48,step5_model)
## Likelihood ratio test
## 
## Model 1: falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + pascore + 
##     qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + b1tbfkg + 
##     b1tblkg + mharth * b1fnd
## Model 2: falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + pascore + 
##     qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + b1tbfkg + 
##     b1tblkg
##   #Df  LogLik Df  Chisq Pr(>Chisq)  
## 1  15 -2013.4                       
## 2  14 -2014.8 -1 2.9054    0.08828 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(EMM48)$coefficients[15,4]
## [1] 0.08908384

This doesn’t meet our criterion, \(\alpha = 0.05\).

EMM61

lrtest(EMM61,step5_model)
## Likelihood ratio test
## 
## Model 1: falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + pascore + 
##     qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + b1tbfkg + 
##     b1tblkg + qlh * b1fnd
## Model 2: falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + pascore + 
##     qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + b1tbfkg + 
##     b1tblkg
##   #Df  LogLik Df  Chisq Pr(>Chisq)
## 1  15 -2014.7                     
## 2  14 -2014.8 -1 0.2877     0.5917
summary(EMM61)$coefficients[15,4]
## [1] 0.5914099

This doesn’t meet our criterion, \(\alpha = 0.05\).

EMM73

lrtest(EMM73,step5_model)
## Likelihood ratio test
## 
## Model 1: falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + pascore + 
##     qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + b1tbfkg + 
##     b1tblkg + inv_sq_walk * b1fnd
## Model 2: falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth + pascore + 
##     qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd + b1tbfkg + 
##     b1tblkg
##   #Df  LogLik Df  Chisq Pr(>Chisq)
## 1  15 -2014.7                     
## 2  14 -2014.8 -1 0.2328     0.6294
summary(EMM73)$coefficients[15,4]
## [1] 0.6323184

This doesn’t meet our criterion, \(\alpha = 0.05\).

Preliminary final model

None of our EMM models passed the liklihood ratio test; therefore the preliminary final model is the one identified from step 5:

preliminary_final_model <- step5_model
as.data.frame(summary(preliminary_final_model)$coef)
##                 Estimate   Std. Error   z value     Pr(>|z|)
## (Intercept) -3.655476545 0.8622491548 -4.239467 2.240515e-05
## giage1       0.025910809 0.0080776930  3.207699 1.338014e-03
## mhstrk       0.314877369 0.1628795256  1.933192 5.321257e-02
## mhpark       1.608457365 0.3738094170  4.302881 1.685916e-05
## mhcopd       0.458829234 0.1204820610  3.808278 1.399377e-04
## mharth       0.385600207 0.0841637031  4.581550 4.615424e-06
## pascore      0.001190304 0.0006300573  1.889199 5.886516e-02
## qlh          0.310048930 0.1158407517  2.676510 7.439335e-03
## hwbmi       -0.077480518 0.0243399888 -3.183260 1.456267e-03
## gsgrpavg    -0.039688221 0.0062068220 -6.394290 1.612948e-10
## inv_sq_walk  0.329280347 0.0874846580  3.763864 1.673080e-04
## b1fnd        0.991873642 0.3380155264  2.934403 3.341904e-03
## b1tbfkg      0.045946167 0.0111285174  4.128687 3.648404e-05
## b1tblkg      0.018749315 0.0088357703  2.121979 3.383954e-02
# pass Wald test for alpha = 0.05  
(summary(preliminary_final_model)$coef[,4]) < 0.05
## (Intercept)      giage1      mhstrk      mhpark      mhcopd      mharth 
##        TRUE        TRUE       FALSE        TRUE        TRUE        TRUE 
##     pascore         qlh       hwbmi    gsgrpavg inv_sq_walk       b1fnd 
##       FALSE        TRUE        TRUE        TRUE        TRUE        TRUE 
##     b1tbfkg     b1tblkg 
##        TRUE        TRUE
# pass Wald test for alpha = 0.10   
(summary(preliminary_final_model)$coef[,4]) < 0.1
## (Intercept)      giage1      mhstrk      mhpark      mhcopd      mharth 
##        TRUE        TRUE        TRUE        TRUE        TRUE        TRUE 
##     pascore         qlh       hwbmi    gsgrpavg inv_sq_walk       b1fnd 
##        TRUE        TRUE        TRUE        TRUE        TRUE        TRUE 
##     b1tbfkg     b1tblkg 
##        TRUE        TRUE

Checking the Fit of the Preliminary Final Model

The Hosmer-Lemeshow test

For the above model, there are several continuous variables; therefore \(J\approx n\), and therefor the assumptions of the Pearson and Deviance Residuals tests (i.e. that the test statistic follows a \(\chi^2_{J-(p+1)}\) distribution) do not hold. Therefore we will use the Hosmer-Lemeshow approach, as implemented in the ResourceSelection package:

n <- length(step4_narm$falls)
n1 <- sum(step4_narm$falls)
g <- max(10,min(n1/2, (n-n1)/2, 2+8^(n/1000)^2))
hoslem.test(step4_narm$falls, fitted(preliminary_final_model), g=g)
## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  step4_narm$falls, fitted(preliminary_final_model)
## X-squared = 349.32, df = 375.5, p-value = 0.83

The moderate p-value reported above suggests that the model fits fairly well. Note that the g value was increased due to large sample size.

Assessing discriminative ability

To assess the discriminative ability of the preliminary final model we can compute the AUROC:

gof(x = preliminary_final_model, lpotROC = TRUE)$auc
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases

##          auc lower 95% CI upper 95% CI 
##     66.27690     64.11134     68.44246 
## attr(,"interpret")
## [1] "auc = 0.5       --> useless"   "0.7 < auc < 0.8 --> good"     
## [3] "0.8 < auc < 0.9 --> excellent"

As reported above, AUROC of the preliminary final model is 66.3% (64.1%, 68.4%). The reported valyes being less than 70% is somewhat concerning, but the model may be adequate for some purposes.

Logistic Regression diagnostics

R2

We can compute the pseudo-\(R^2\) statistic (the McFadden \(R^2\) statistic from the DescTools package):

PseudoR2(preliminary_final_model)
##   McFadden 
## 0.05722328

Considering that this \(R^2\) statistic is often quite small compared to those reported in linear regression, the value reported above does not necessarily indicate adequate fit. The model does not reach the threashold of 0.2, which has been described as a ‘moderately strong model’.

AIC & BIC

AIC(preliminary_final_model)
## [1] 4057.636
BIC(preliminary_final_model)
## [1] 4149.132

These values can be used to compare to another candidate model using the same data. If such a model existed, a difference of three or more in AIC would be meaningful.

Graphical Assessment

We can use the ‘plpt’ command from the LogisticDx package, including the change in chi-square residuals vs pi-hat, change in deviance vs pi-hat, change in beta vs pi-hat,change in chi-square residuals vs pi-hat:

plot(preliminary_final_model, devNew = FALSE)

This suggests one main result:

  • The consistent presence of four outliers and one rather severe outlier in several of the plots suggests that their removal may improve the model.

Numerical Assessment

The LogisticDx package provides the dx function for computing diagnostics of our logistic regression model. We will use this to look for influencial outliers; however our influencial outliers have a change in Pearson chi-sq closer to 15:

diag <- dx(preliminary_final_model)
outlier <- diag[which(diag$h>0.05)]
outlier
##    (Intercept) giage1 mhstrk mhpark mhcopd mharth   pascore qlh    hwbmi
## 1:           1     75      0      0      0      1  40.21429   1 27.84372
## 2:           1     81      0      0      0      1 127.78571   0 37.87878
## 3:           1     73      0      0      0      1 136.00000   0 31.07845
## 4:           1     77      0      0      0      1 104.57143   1 33.23531
##    gsgrpavg inv_sq_walk     b1fnd  b1tbfkg  b1tblkg y         P n      yhat
## 1:     36.5   16.301406 0.8374081 20.40021 50.52593 1 0.9709247 1 0.9709247
## 2:     25.0    7.830669 1.0266128 27.23861 63.06787 1 0.7473776 1 0.7473776
## 3:     33.0    9.828225 0.6522009 28.16036 57.16411 1 0.7882817 1 0.7882817
## 4:     31.0    8.781344 0.7813811 18.13544 64.52979 0 0.7434641 1 0.7434641
##            Pr         dr          h        sPr        sdr     dChisq       dDev
## 1:  0.1730492  0.2429255 0.05036835  0.1775792  0.2492847 0.03153436 0.06214284
## 2:  0.5813878  0.7631314 0.06965246  0.6027585  0.7911826 0.36331776 0.62596988
## 3:  0.5182490  0.6897822 0.10319766  0.5472554  0.7283894 0.29948852 0.53055113
## 4: -1.7023775 -1.6495372 0.08919896 -1.7837916 -1.7284243 3.18191259 2.98745050
##          dBhat
## 1: 0.001672579
## 2: 0.027200561
## 3: 0.034463016
## 4: 0.311619436

Four influencial outliers are identified. Consultation with a subject matter expert would allow us to make an informed decision on whether to include or exclude these subjects. However, we can note that these four outliers represent the extreme values of the inverse square of the walking speed. We could consider censoring PO7440, PI5413, BI0650, & PA3753; and repeat the analysis. We will first consider creating a model with the walking speed offset by 1, so as to avoid creating extremely large values of the inverse square of the walking speed for extremely small values of walking speed. ‘step5_model4’ is such a model, and we can compare it using the diagnostics above:

hoslem.test(step4_narm$falls, fitted(step5_model4), g=10)
## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  step4_narm$falls, fitted(step5_model4)
## X-squared = 13.908, df = 8, p-value = 0.0842
PseudoR2(step5_model4)
##  McFadden 
## 0.0563718
AIC(step5_model4)
## [1] 4061.275
BIC(step5_model4)
## [1] 4152.771

Although the AIC & BIC of this model are slightly better, the Hosmer-Lemeshow and pesudo-\(R^2\) values are worse. We can proceed by completing the proposed censoring of observations above:

Creating a new model with censored outliers

Censoring the outliers:

step7_narm <- subset(step4_narm, id != 'PA3753')
step7_narm <- subset(step7_narm, id != 'PI5413')
step7_narm <- subset(step7_narm, id != 'PO7441')
step7_narm <- subset(step7_narm, id != 'PI5436')

Creating the censored model:

step7_model1 <- glm(
  formula = falls ~ giage1 + mhstrk + mhpark + mhcopd + mharth +
    pascore + qlh + hwbmi + gsgrpavg + inv_sq_walk + b1fnd +
    b1tbfkg + b1tblkg, 
  family = "binomial",
  data = step7_narm)

summary(step7_model1)$coef
##                 Estimate   Std. Error   z value     Pr(>|z|)
## (Intercept) -3.670468171 0.8630827296 -4.252742 2.111688e-05
## giage1       0.025699153 0.0081243470  3.163227 1.560307e-03
## mhstrk       0.311450873 0.1630566525  1.910078 5.612321e-02
## mhpark       1.600554604 0.3741526404  4.277812 1.887394e-05
## mhcopd       0.457077427 0.1205354113  3.792059 1.494032e-04
## mharth       0.384421000 0.0841813663  4.566581 4.957443e-06
## pascore      0.001193856 0.0006305183  1.893452 5.829782e-02
## qlh          0.315580723 0.1161938933  2.715984 6.607917e-03
## hwbmi       -0.076700285 0.0243744118 -3.146754 1.650936e-03
## gsgrpavg    -0.039625837 0.0062150523 -6.375785 1.820282e-10
## inv_sq_walk  0.346976688 0.1043442014  3.325309 8.832067e-04
## b1fnd        0.984188232 0.3382128419  2.909967 3.614668e-03
## b1tbfkg      0.045086242 0.0111614751  4.039452 5.357625e-05
## b1tblkg      0.019065542 0.0088366672  2.157549 3.096292e-02

Checking the fit

For the above model, there are several continuous variables; therefore \(J\approx n\), and therefor the assumptions of the Pearson and Deviance Residuals tests (i.e. that the test statistic follows a \(\chi^2_{J-(p+1)}\) distribution) do not hold. Therefore we will use the Hosmer-Lemeshow approach, as implemented in the ResourceSelection package:

n <- length(step7_narm$falls)
n1 <- sum(step7_narm$falls)
g <- max(10,min(n1/2, (n-n1)/2, 2+8^(n/1000)^2))
hoslem.test(step7_narm$falls, fitted(step7_model1), g=g)
## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  step7_narm$falls, fitted(step7_model1)
## X-squared = 403.09, df = 374, p-value = 0.1442

The moderate p-value reported above is adequately large, but only slightly better than that of the preliminary final. model. To assess the discriminative ability of the preliminary final model we can compute the AUROC:

gof(x = step7_model1, lpotROC = TRUE, g=g+1)$auc
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases

##          auc lower 95% CI upper 95% CI 
##     66.15480     63.98589     68.32371 
## attr(,"interpret")
## [1] "auc = 0.5       --> useless"   "0.7 < auc < 0.8 --> good"     
## [3] "0.8 < auc < 0.9 --> excellent"

As reported above, AUROC of the preliminary final model is 66.3% (64.1%, 68.4%). The reported valyes being less than 70% is somewhat concerning, but the model may be adequate for some purposes.

We can compute the pseudo-\(R^2\) statistic (the McFadden \(R^2\) statistic from the DescTools package):

PseudoR2(step7_model1)
##   McFadden 
## 0.05552766

Considering that this \(R^2\) statistic is often quite small compared to those reported in linear regression, the value reported above does not necessarily indicate adequate fit. The model does not reach the threashold of 0.2, which has been described as a ‘moderately strong model’. It is slightly worse than the \(R^2\) for the previous model.

AIC & BIC

AIC(step7_model1)
## [1] 4053.755
BIC(step7_model1)
## [1] 4145.24

The AIC is smaller than the previously reported value, but the BIC is larger.

Graphical Assessment

We can use the ‘plot’ command from the LogisticDx package, including the change in chi-square residuals vs pi-hat, change in deviance vs pi-hat, change in beta vs pi-hat,change in chi-square residuals vs pi-hat:

plot(step7_model1, devNew = FALSE)

This suggests one main result:

  • Outliers still exist in the dataset, but the ones with the strongest influence on the model have been removed.

Numerical Assessment

The LogisticDx package provides the dx function for computing diagnostics of our logistic regression model. We will use this to look for influencial outliers; however our influencial outliers have a change in Pearson chi-sq closer to 15:

diag <- dx(step7_model1)
outlier <- diag[which(diag$h>0.05)]
outlier
## Empty data.table (0 rows and 26 cols): (Intercept),giage1,mhstrk,mhpark,mhcopd,mharth...

This shows that no new outliers have arisen after changes to the model after removing influencial outliers.

Final Model

The model with outliers removed does not perform substantially better than the previous model. Therefore we can use the preliminary final model as the final model:

final_model <- preliminary_final_model

Results

Betas for final model

tbl_regression(final_model, 
               label = list(giage1 ~ "Age"
#                            ,mhdiab ~ "Diabetes"
                            ,mhstrk ~ "Stroke"
                            ,mhpark ~ "Parkinsons"
                            ,mhcopd ~ "COPD"
                            ,mharth ~ "Arthritis or Gout"
#                            ,mhcancer ~ "Cancer"
                            ,pascore ~ "PASE Score"
                            ,qlh ~ "Subjective Health Rating"
                            ,hwbmi ~ "Body Mass Index"
                            ,b1tbfkg ~ "Total Body Fat"
                            ,b1tblkg ~ "Lean Body Mass"
                            ,gsgrpavg ~ "Average Grip Strength"
                            ,inv_sq_walk ~ "Inverse Square Walking Speed"
                            ,b1fnd ~ "Corrected Femoral Neck BMD"
                            ),
               exponentiate = FALSE)
Characteristic log(OR)1 95% CI1 p-value
Age 0.03 0.01, 0.04 0.001
Stroke 0.31 -0.01, 0.63 0.053
Parkinsons 1.6 0.87, 2.3 <0.001
COPD 0.46 0.22, 0.69 <0.001
Arthritis or Gout 0.39 0.22, 0.55 <0.001
PASE Score 0.00 0.00, 0.00 0.059
Subjective Health Rating 0.31 0.08, 0.53 0.007
Body Mass Index -0.08 -0.13, -0.03 0.001
Average Grip Strength -0.04 -0.05, -0.03 <0.001
Inverse Square Walking Speed 0.33 0.17, 0.51 <0.001
Corrected Femoral Neck BMD 1.0 0.33, 1.7 0.003
Total Body Fat 0.05 0.02, 0.07 <0.001
Lean Body Mass 0.02 0.00, 0.04 0.034

1 OR = Odds Ratio, CI = Confidence Interval

OR Calculate all of them

tbl_regression(final_model, 
               label = list(giage1 ~ "Age"
#                            ,mhdiab ~ "Diabetes"
                            ,mhstrk ~ "Stroke"
                            ,mhpark ~ "Parkinsons"
                            ,mhcopd ~ "COPD"
                            ,mharth ~ "Arthritis or Gout"
#                            ,mhcancer ~ "Cancer"
                            ,pascore ~ "PASE Score"
                            ,qlh ~ "Subjective Health Rating"
                            ,hwbmi ~ "Body Mass Index"
                            ,b1tbfkg ~ "Total Body Fat"
                            ,b1tblkg ~ "Lean Body Mass"
                            ,gsgrpavg ~ "Average Grip Strength"
                            ,inv_sq_walk ~ "Inverse Square Walking Speed"
                            ,b1fnd ~ "Corrected Femoral Neck BMD"
                            ),
               exponentiate = TRUE)
Characteristic OR1 95% CI1 p-value
Age 1.03 1.01, 1.04 0.001
Stroke 1.37 0.99, 1.87 0.053
Parkinsons 5.00 2.38, 10.4 <0.001
COPD 1.58 1.25, 2.00 <0.001
Arthritis or Gout 1.47 1.25, 1.73 <0.001
PASE Score 1.00 1.00, 1.00 0.059
Subjective Health Rating 1.36 1.08, 1.71 0.007
Body Mass Index 0.93 0.88, 0.97 0.001
Average Grip Strength 0.96 0.95, 0.97 <0.001
Inverse Square Walking Speed 1.39 1.18, 1.66 <0.001
Corrected Femoral Neck BMD 2.70 1.39, 5.22 0.003
Total Body Fat 1.05 1.02, 1.07 <0.001
Lean Body Mass 1.02 1.00, 1.04 0.034

1 OR = Odds Ratio, CI = Confidence Interval

Table 1 patient characteristics

MrOs_check <- MrOs %>%  # why did this break?
    dplyr::select(falls, giage1, mhstrk, mhpark, mhcopd, mharth, pascore, qlh, hwbmi, gsgrpavg, nfwlkspd, b1fnd, b1tbfkg, b1tblkg)
MrOs_check$censored <- complete.cases(MrOs_check)
MrOs_check$censored2 <- ifelse(MrOs_check$censored == FALSE, "Excluded", "Included")
#MrOs$censored <- ifelse(MrOs$mhfalln2 == NA,1,0)
#sum(MrOs$censored)
#tbl_summary(step7_narm)
MrOs_check %>%  # why did this break?
    dplyr::select(falls, giage1, mhstrk, mhpark, mhcopd, mharth, pascore, qlh, hwbmi, gsgrpavg, nfwlkspd, b1fnd, b1tbfkg, b1tblkg, censored2)%>% 
    tbl_summary(by = censored2, missing = "no",
    statistic = list(all_continuous() ~ "{mean}",
                     all_categorical() ~ "{n} / {N} ({p}%)"),
    label = list(falls ~ "More than one fall"
                 ,giage1 ~ "Age"
                 ,mhstrk ~ "Stroke"
                 ,mhpark ~ "Parkinsons"
                 ,mhcopd ~ "COPD"
                 ,mharth ~ "Arthritis or Gout"
                 ,pascore ~ "PASE Score"
                 ,qlh ~ "Subjective Health Rating"
                 ,hwbmi ~ "Body Mass Index"
                 ,gsgrpavg ~ "Average Grip Strength"
                 ,nfwlkspd ~ "Walking Speed"
                 ,b1fnd ~ "Corrected Femoral Neck BMD"
                 ,b1tbfkg ~ "Total Body Fat"
                 ,b1tblkg ~ "Lean Body Mass"
    ),
    ) %>% add_p() %>% add_overall() %>%
  modify_caption("**Table 1. Patient Characteristics**") %>%
  bold_labels()
Table 1. Patient Characteristics
Characteristic Overall, N = 5,9941 Excluded, N = 9021 Included, N = 5,0921 p-value2
More than one fall 786 / 5,224 (15%) 31 / 132 (23%) 755 / 5,092 (15%) 0.006
Age 73.7 76.6 73.1 <0.001
Stroke 344 / 5,994 (5.7%) 88 / 902 (9.8%) 256 / 5,092 (5.0%) <0.001
Parkinsons 52 / 5,994 (0.9%) 20 / 902 (2.2%) 32 / 5,092 (0.6%) <0.001
COPD 640 / 5,994 (11%) 142 / 902 (16%) 498 / 5,092 (9.8%) <0.001
Arthritis or Gout 2,847 / 5,994 (47%) 494 / 902 (55%) 2,353 / 5,092 (46%) <0.001
PASE Score 146 126 150 <0.001
Subjective Health Rating 857 / 5,992 (14%) 245 / 900 (27%) 612 / 5,092 (12%) <0.001
Body Mass Index 27.4 27.1 27.4 <0.001
Average Grip Strength 39 34 39 <0.001
Walking Speed 1.20 1.09 1.22 <0.001
Corrected Femoral Neck BMD 0.78 0.76 0.79 <0.001
Total Body Fat 22 21 22 <0.001
Lean Body Mass 57 55 57 <0.001

1 n / N (%); Mean

2 Pearson's Chi-squared test; Wilcoxon rank sum test

Calculate p-values for differences between outcomes in each exposure group

References

  1. Hothorn T, Zeileis A, Farebrother (pan.f) RW, Cummins (pan.f) C, Millo G, Mitchell D. Lmtest: Testing Linear Regression Models.; 2020. Accessed April 30, 2021. https://CRAN.R-project.org/package=lmtest
  2. Hu B, Shao J, Palta M. PSEUDO-R2 IN LOGISTIC REGRESSION MODEL. :14.
  3. Signorell A. Tools for Descriptive Statistics [R Package DescTools Version 0.99.41]. Comprehensive R Archive Network (CRAN); 2021. Accessed June 6, 2021. https://CRAN.R-project.org/package=DescTools
  4. Sjoberg D. Gtsummary: Presentation-Ready Data Summary and Analytic Result Tables [R Package Gtsummary Version 1.4.1]. Comprehensive R Archive Network (CRAN); 2021. Accessed May 20, 2021. https://CRAN.R-project.org/package=gtsummary
  5. Solymos P, Keim J, Lele S. Resource Selection (Probability) Functions for Use-Availability Data [R Package ResourceSelection Version 0.3-5]. Comprehensive R Archive Network (CRAN); 2019. Accessed June 6, 2021. https://CRAN.R-project.org/package=ResourceSelection

  1. OHSU-PSU School of Public Health↩︎

  2. OHSU-PSU School of Public Health↩︎

  3. OHSU-PSU School of Public Health↩︎

  4. OHSU-PSU School of Public Health↩︎

  5. OHSU-PSU School of Public Health↩︎