Examining Missing Alcohol

Creating the missing alcohol variable

use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

recode AL_cat 0=0 1=0 2=0 3=0 4=0 .=1, gen(AL_miss)

label define AL_miss 0 "Not Missing" 1 "Missing"
label values AL_miss AL_miss
label variable AL_miss "Alcohol Use Missing"

ta AL_cat AL_miss,m
      pweight: wtmec12yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>

(49916 differences between AL_cat and AL_miss)





   Alcohol |
       Use |  Alcohol Use Missing
  Category | Not Missi    Missing |     Total
-----------+----------------------+----------
None-Light |    20,274          0 |    20,274 
  Moderate |     7,686          0 |     7,686 
     Heavy |     4,232          0 |     4,232 
         . |         0     37,998 |    37,998 
-----------+----------------------+----------
     Total |    32,192     37,998 |    70,190 
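
The same indicator can be built in one step with `missing()`. The sketch below uses a five-row toy dataset (made-up values, not NHANES) and a hypothetical variable name, `AL_miss2`, to confirm that the two approaches agree. One difference to keep in mind: `missing()` also flags extended missing values (`.a` to `.z`), whereas the `.=1` rule in `recode` matches only the system missing value.

```stata
* Toy data for illustration only (not NHANES values)
clear
input AL_cat
0
1
2
.
.
end

* Approach used above: collapse all observed categories to 0, missing to 1
recode AL_cat 0=0 1=0 2=0 3=0 4=0 .=1, gen(AL_miss)

* One-step alternative; AL_miss2 is a hypothetical name
generate byte AL_miss2 = missing(AL_cat)

* Stops with an error if the two indicators ever disagree
assert AL_miss2 == AL_miss
tabulate AL_miss AL_miss2, missing
```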

AL_miss vs MJ

use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

recode AL_cat 0=0 1=0 2=0 3=0 4=0 .=1, gen(AL_miss)

label define AL_miss 0 "Not Missing" 1 "Missing"
label values AL_miss AL_miss
label variable AL_miss "Alcohol Use Missing"

svy, subpop(if include==1): ta MJ AL_miss,col percent
      pweight: wtmec12yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>

(49916 differences between AL_cat and AL_miss)




(running tabulate on estimation sample)

Number of strata   =       105                Number of obs     =       70,190
Number of PSUs     =       214                Population size   =  306,545,053
                                              Subpop. no. obs   =       21,020
                                              Subpop. size      =  141,524,461
                                              Design df         =          109

----------------------------------------
Cannabis  |     Alcohol Use Missing     
Use       | Not Miss   Missing     Total
----------+-----------------------------
 Never Us |    40.52     33.02     40.03
 Past Use |    44.64     58.72     45.56
 1-10 tim |    8.411      3.37     8.082
 11-20 ti |    1.896     1.424     1.865
 21-30 ti |    4.536     3.463     4.466
          | 
    Total |      100       100       100
----------------------------------------
  Key:  column percentage

  Pearson:
    Uncorrected   chi2(4)         =  396.8634
    Design-based  F(3.83, 417.25) =   16.5128     P = 0.0000

AL_miss vs BP_cat

use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

recode AL_cat 0=0 1=0 2=0 3=0 4=0 .=1, gen(AL_miss)

label define AL_miss 0 "Not Missing" 1 "Missing"
label values AL_miss AL_miss
label variable AL_miss "Alcohol Use Missing"

svy, subpop(if include==1): ta BP_cat AL_miss,col percent
      pweight: wtmec12yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>

(49916 differences between AL_cat and AL_miss)




(running tabulate on estimation sample)

Number of strata   =       105                Number of obs     =       70,190
Number of PSUs     =       214                Population size   =  306,545,053
                                              Subpop. no. obs   =       21,020
                                              Subpop. size      =  141,524,461
                                              Design df         =          109

----------------------------------------
BP        |     Alcohol Use Missing     
Category  | Not Miss   Missing     Total
----------+-----------------------------
   Normal |    52.31     46.61     51.94
 Elevated |    16.25     16.52     16.27
  Stage 1 |    20.64     23.42     20.82
  Stage 2 |    10.79     13.45     10.97
          | 
    Total |      100       100       100
----------------------------------------
  Key:  column percentage

  Pearson:
    Uncorrected   chi2(3)         =   70.4881
    Design-based  F(2.98, 324.76) =    4.6614     P = 0.0034

AL_miss vs gndr

use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

recode AL_cat 0=0 1=0 2=0 3=0 4=0 .=1, gen(AL_miss)

label define AL_miss 0 "Not Missing" 1 "Missing"
label values AL_miss AL_miss
label variable AL_miss "Alcohol Use Missing"

svy, subpop(if include==1): ta gndr AL_miss,col percent
      pweight: wtmec12yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>

(49916 differences between AL_cat and AL_miss)




(running tabulate on estimation sample)

Number of strata   =       105                Number of obs     =       70,190
Number of PSUs     =       214                Population size   =  306,545,053
                                              Subpop. no. obs   =       21,020
                                              Subpop. size      =  141,524,461
                                              Design df         =          109

----------------------------------------
          |     Alcohol Use Missing     
   Gender | Not Miss   Missing     Total
----------+-----------------------------
   Female |    50.22     42.12     49.69
     Male |    49.78     57.88     50.31
          | 
    Total |      100       100       100
----------------------------------------
  Key:  column percentage

  Pearson:
    Uncorrected   chi2(1)         =  112.3311
    Design-based  F(1, 109)       =   26.1362     P = 0.0000

AL_miss vs race_eth

use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

recode AL_cat 0=0 1=0 2=0 3=0 4=0 .=1, gen(AL_miss)

label define AL_miss 0 "Not Missing" 1 "Missing"
label values AL_miss AL_miss
label variable AL_miss "Alcohol Use Missing"

svy, subpop(if include==1): ta race_eth AL_miss,col percent
      pweight: wtmec12yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>

(49916 differences between AL_cat and AL_miss)




(running tabulate on estimation sample)

Number of strata   =       105                Number of obs     =       70,190
Number of PSUs     =       214                Population size   =  306,545,053
                                              Subpop. no. obs   =       21,020
                                              Subpop. size      =  141,524,461
                                              Design df         =          109

----------------------------------------
Recoded   |
Race &    |     Alcohol Use Missing     
Ethnicity | Not Miss   Missing     Total
----------+-----------------------------
 White-NH |    64.39     69.23     64.71
 Black_NH |    11.81     8.897     11.62
   Mex Am |    9.783     10.86     9.854
      Oth |    14.01     11.01     13.82
          | 
    Total |      100       100       100
----------------------------------------
  Key:  column percentage

  Pearson:
    Uncorrected   chi2(3)         =   79.6670
    Design-based  F(2.75, 299.53) =    5.4876     P = 0.0016

AL_miss vs SMK_cat

use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

recode AL_cat 0=0 1=0 2=0 3=0 4=0 .=1, gen(AL_miss)

label define AL_miss 0 "Not Missing" 1 "Missing"
label values AL_miss AL_miss
label variable AL_miss "Alcohol Use Missing"

svy, subpop(if include==1): ta SMK_cat AL_miss,col percent
      pweight: wtmec12yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>

(49916 differences between AL_cat and AL_miss)




(running tabulate on estimation sample)

Number of strata   =       105                Number of obs     =       70,139
Number of PSUs     =       214                Population size   =  306,306,164
                                              Subpop. no. obs   =       20,969
                                              Subpop. size      =  141,285,572
                                              Design df         =          109

----------------------------------------
Smoking   |     Alcohol Use Missing     
Category  | Not Miss   Missing     Total
----------+-----------------------------
    Never |    57.88     38.06     56.59
 Past Smo |    19.29     30.87     20.05
 Light, 1 |    12.98     12.49     12.95
 Moderate |    7.671     12.05     7.957
 Heavy, 2 |    2.173     6.532     2.458
          | 
    Total |      100       100       100
----------------------------------------
  Key:  column percentage

  Pearson:
    Uncorrected   chi2(4)         = 1018.5098
    Design-based  F(3.62, 394.89) =   52.2495     P = 0.0000

AL_miss vs EDUC_cat

use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

recode AL_cat 0=0 1=0 2=0 3=0 4=0 .=1, gen(AL_miss)

label define AL_miss 0 "Not Missing" 1 "Missing"
label values AL_miss AL_miss
label variable AL_miss "Alcohol Use Missing"

svy, subpop(if include==1): ta EDUC_cat AL_miss,col percent
      pweight: wtmec12yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>

(49916 differences between AL_cat and AL_miss)




(running tabulate on estimation sample)

Number of strata   =       105                Number of obs     =       70,182
Number of PSUs     =       214                Population size   =  306,497,065
                                              Subpop. no. obs   =       21,012
                                              Subpop. size      =  141,476,473
                                              Design df         =          109

----------------------------------------
Recode    |
Education |     Alcohol Use Missing     
Level     | Not Miss   Missing     Total
----------+-----------------------------
 Less tha |    3.685     6.953     3.898
 Less tha |    9.775     16.78     10.23
 High Sch |    22.09     27.19     22.42
 Some Col |    33.35     29.32     33.09
  College |     31.1     19.75     30.36
          | 
    Total |      100       100       100
----------------------------------------
  Key:  column percentage

  Pearson:
    Uncorrected   chi2(4)         =  574.9431
    Design-based  F(3.65, 398.15) =   25.0275     P = 0.0000

AL_miss vs sddsrvyr

use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

recode AL_cat 0=0 1=0 2=0 3=0 4=0 .=1, gen(AL_miss)

label define AL_miss 0 "Not Missing" 1 "Missing"
label values AL_miss AL_miss
label variable AL_miss "Alcohol Use Missing"

svy, subpop(if include==1): ta sddsrvyr AL_miss,col percent
svy, subpop(if include==1): ta sddsrvyr AL_miss,row percent
      pweight: wtmec12yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>

(49916 differences between AL_cat and AL_miss)




(running tabulate on estimation sample)

Number of strata   =       105                Number of obs     =       70,190
Number of PSUs     =       214                Population size   =  306,545,053
                                              Subpop. no. obs   =       21,020
                                              Subpop. size      =  141,524,461
                                              Design df         =          109

----------------------------------------
Survey    |     Alcohol Use Missing     
Year      | Not Miss   Missing     Total
----------+-----------------------------
 2005-200 |    12.77     14.48     12.89
 2007-200 |    13.63     18.05     13.92
 2009-201 |    13.76     16.96     13.97
 2011-201 |     13.9     19.18     14.25
 2013-201 |    14.85     15.49      14.9
 2015-201 |    14.96     15.33     14.98
 2017-201 |    16.12     .5154      15.1
          | 
    Total |      100       100       100
----------------------------------------
  Key:  column percentage

  Pearson:
    Uncorrected   chi2(6)         =  875.7773
    Design-based  F(4.71, 513.03) =   18.9376     P = 0.0000

(running tabulate on estimation sample)

Number of strata   =       105                Number of obs     =       70,190
Number of PSUs     =       214                Population size   =  306,545,053
                                              Subpop. no. obs   =       21,020
                                              Subpop. size      =  141,524,461
                                              Design df         =          109

----------------------------------------
Survey    |     Alcohol Use Missing     
Year      | Not Miss   Missing     Total
----------+-----------------------------
 2005-200 |    92.67     7.325       100
 2007-200 |    91.54     8.456       100
 2009-201 |    92.08     7.917       100
 2011-201 |    91.22     8.777       100
 2013-201 |    93.22      6.78       100
 2015-201 |    93.33     6.672       100
 2017-201 |    99.78     .2226       100
          | 
    Total |    93.48     6.521       100
----------------------------------------
  Key:  row percentage

  Pearson:
    Uncorrected   chi2(6)         =  875.7773
    Design-based  F(4.71, 513.03) =   18.9376     P = 0.0000
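
The column and row tables above describe the same cells, so one can be recovered from the other. As a rough check using only the printed values, a cycle's share of the missing group equals its row percentage missing, times the cycle's share of all included participants, divided by the overall percentage missing. For 2017-2018 this comes to about 0.515, which agrees with the .5154 in the column table: that cycle contributes only about half a percent of the missing group. The scalar names are hypothetical.

```stata
* Values copied from the two tables above (2017-2018 row)
* row_miss:  percent of 2017-2018 participants missing alcohol data
* cyc_share: percent of all included participants who are in 2017-2018
* tot_miss:  percent of all included participants missing alcohol data
scalar row_miss  = 0.2226
scalar cyc_share = 15.1
scalar tot_miss  = 6.521

* Share of the missing group that comes from 2017-2018, in percent
display row_miss * cyc_share / tot_miss
```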

AL_miss vs HEI2015

use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

recode AL_cat 0=0 1=0 2=0 3=0 4=0 .=1, gen(AL_miss)

label define AL_miss 0 "Not Missing" 1 "Missing"
label values AL_miss AL_miss
label variable AL_miss "Alcohol Use Missing"

svy, subpop(if include==1): mean hei2015, over(AL_miss)
      pweight: wtmec12yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>

(49916 differences between AL_cat and AL_miss)




(running mean on estimation sample)

Survey: Mean estimation

Number of strata =     105           Number of obs   =       68,641
Number of PSUs   =     214           Population size =  297,338,173
                                     Subpop. no. obs =       19,471
                                     Subpop. size    =  132,317,580
                                     Design df       =          109

-------------------------------------------------------------------
                  |             Linearized
                  |       Mean   Std. Err.     [95% Conf. Interval]
------------------+------------------------------------------------
c.hei2015@AL_miss |
     Not Missing  |   52.30551   .2258467      51.85789    52.75313
         Missing  |   50.05454   .5488218      48.96679    51.14229
-------------------------------------------------------------------
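
The table gives a confidence interval for each group but no test of the difference between them. One way to get a design-based test is a survey regression on the indicator, sketched here under the assumption that the dataset is loaded, `svyset` has been declared, and `AL_miss` exists as created above. The coefficient on `1.AL_miss` is the difference in weighted means (missing minus not missing), which should be about -2.25 given the estimates above.

```stata
* Assumes: dataset loaded, svyset declared, AL_miss created as in the cell above
svy, subpop(if include==1): regress hei2015 i.AL_miss

* The difference in means with its standard error, t test, and 95% CI
lincom 1.AL_miss
```

The same pattern applies to the other continuous variables (`bmxbmi`, `ridageyr`, `indfmpir`).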

AL_miss vs BMI

use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

recode AL_cat 0=0 1=0 2=0 3=0 4=0 .=1, gen(AL_miss)

label define AL_miss 0 "Not Missing" 1 "Missing"
label values AL_miss AL_miss
label variable AL_miss "Alcohol Use Missing"

svy, subpop(if include==1): mean bmxbmi, over(AL_miss)
      pweight: wtmec12yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>

(49916 differences between AL_cat and AL_miss)




(running mean on estimation sample)

Survey: Mean estimation

Number of strata =     105          Number of obs   =       70,080
Number of PSUs   =     214          Population size =  305,959,182
                                    Subpop. no. obs =       20,910
                                    Subpop. size    =  140,938,589
                                    Design df       =          109

------------------------------------------------------------------
                 |             Linearized
                 |       Mean   Std. Err.     [95% Conf. Interval]
-----------------+------------------------------------------------
c.bmxbmi@AL_miss |
    Not Missing  |   28.88579   .1014983      28.68462    29.08696
        Missing  |   30.07532    .280443      29.51949    30.63115
------------------------------------------------------------------

AL_miss vs Age

use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

recode AL_cat 0=0 1=0 2=0 3=0 4=0 .=1, gen(AL_miss)

label define AL_miss 0 "Not Missing" 1 "Missing"
label values AL_miss AL_miss
label variable AL_miss "Alcohol Use Missing"

svy, subpop(if include==1): mean ridageyr, over(AL_miss)
      pweight: wtmec12yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>

(49916 differences between AL_cat and AL_miss)




(running mean on estimation sample)

Survey: Mean estimation

Number of strata =     105            Number of obs   =       70,190
Number of PSUs   =     214            Population size =  306,545,053
                                      Subpop. no. obs =       21,020
                                      Subpop. size    =  141,524,461
                                      Design df       =          109

--------------------------------------------------------------------
                   |             Linearized
                   |       Mean   Std. Err.     [95% Conf. Interval]
-------------------+------------------------------------------------
c.ridageyr@AL_miss |
      Not Missing  |   39.21059   .1593037      38.89485    39.52632
          Missing  |   44.38201   .3748651      43.63904    45.12498
--------------------------------------------------------------------

AL_miss vs Income

use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

recode AL_cat 0=0 1=0 2=0 3=0 4=0 .=1, gen(AL_miss)

label define AL_miss 0 "Not Missing" 1 "Missing"
label values AL_miss AL_miss
label variable AL_miss "Alcohol Use Missing"

svy, subpop(if include==1): mean indfmpir, over(AL_miss)
      pweight: wtmec12yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>

(49916 differences between AL_cat and AL_miss)




(running mean on estimation sample)

Survey: Mean estimation

Number of strata =     105            Number of obs   =       68,616
Number of PSUs   =     214            Population size =  298,070,653
                                      Subpop. no. obs =       19,446
                                      Subpop. size    =  133,050,061
                                      Design df       =          109

--------------------------------------------------------------------
                   |             Linearized
                   |       Mean   Std. Err.     [95% Conf. Interval]
-------------------+------------------------------------------------
c.indfmpir@AL_miss |
      Not Missing  |   3.063389   .0341046      2.995795    3.130983
          Missing  |   2.622841   .0726115      2.478927    2.766755
--------------------------------------------------------------------
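
Each comparison above reloads the data and rebuilds `AL_miss` before running a single command. A more compact version, assuming the same dataset path and variable names as the cells above, loads the data once and loops over the covariates. It should reproduce the same tables in the same order, except that `sddsrvyr` is tabulated with column percentages only.

```stata
* Assumes the dataset path and variable names used in the cells above
use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

recode AL_cat 0=0 1=0 2=0 3=0 4=0 .=1, gen(AL_miss)
label define AL_miss 0 "Not Missing" 1 "Missing"
label values AL_miss AL_miss
label variable AL_miss "Alcohol Use Missing"

* Categorical covariates: column percentages with design-based tests
foreach v of varlist MJ BP_cat gndr race_eth SMK_cat EDUC_cat sddsrvyr {
    svy, subpop(if include==1): tabulate `v' AL_miss, col percent
}

* Continuous covariates: weighted means by missingness
foreach v of varlist hei2015 bmxbmi ridageyr indfmpir {
    svy, subpop(if include==1): mean `v', over(AL_miss)
}
```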

Results

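Compared with those who have alcohol use data, those with missing alcohol use data show:

* Less light MJ use (1-10 times: 3.4% vs 8.4%) and more past use (58.7% vs 44.6%)
* Slightly more stage 2 HTN (13.5% vs 10.8%)
* More males (57.9% vs 49.8%)
* Broadly similar race/ethnicity, although the difference is statistically significant (P = 0.0016)
* More moderate-heavy smokers (18.6% vs 9.8%)
* Slightly less education (college: 19.8% vs 31.1%)
* Very few cases in the 2017-2018 cycle (0.2% missing, vs 6.7-8.8% in earlier cycles)
* Slightly lower HEI2015 (50.1 vs 52.3)
* Slightly higher BMI (30.1 vs 28.9)
* Older age on average (44.4 vs 39.2 years)
* Slightly lower income (indfmpir 2.62 vs 3.06)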

Most importantly, the key results are very similar with or without participants who are missing alcohol data.

New key results

T1 alcohol use re-tabulated

use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

recode AL_cat .=3, gen(ALC_cat)

label define ALC_cat 0 "None-Light" 1 "Moderate" 2 "Heavy" 3 "Missing"
label values ALC_cat ALC_cat
label variable ALC_cat "Alcohol Use Category"

svy, subpop(if include==1): ta ALC_cat MJ_cat, col percent
ta ALC_cat MJ_cat if include==1,m
      pweight: wtmec12yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>

(37998 differences between AL_cat and ALC_cat)




(running tabulate on estimation sample)

Number of strata   =       105                Number of obs     =       70,190
Number of PSUs     =       214                Population size   =  306,545,053
                                              Subpop. no. obs   =       21,020
                                              Subpop. size      =  141,524,461
                                              Design df         =          109

--------------------------------------------------
Alcohol   |
Use       |         Cannabis Use Category         
Category  | Never Us  Past Use   Current     Total
----------+---------------------------------------
 None-Lig |    64.94     40.41     28.29     48.48
 Moderate |    20.86      33.5     37.47     29.01
    Heavy |    8.823     17.69     30.51     15.99
  Missing |     5.38     8.405     3.735     6.521
          | 
    Total |      100       100       100       100
--------------------------------------------------
  Key:  column percentage

  Pearson:
    Uncorrected   chi2(6)         = 6610.8899
    Design-based  F(5.31, 578.40) =  185.3983     P = 0.0000


   Alcohol |
       Use |      Cannabis Use Category
  Category | Never Use   Past Use  Current U |     Total
-----------+---------------------------------+----------
None-Light |     6,237      3,305        913 |    10,455 
  Moderate |     1,899      2,668      1,091 |     5,658 
     Heavy |       969      1,557        901 |     3,427 
   Missing |       598        774        108 |     1,480 
-----------+---------------------------------+----------
     Total |     9,703      8,304      3,013 |    21,020 
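
Two quick checks on the unweighted counts, using only numbers from the table above. The first gives the unweighted share of included participants missing alcohol data (about 7.04%, against the weighted 6.521%). The second gives the number left once they are dropped (19,540).

```stata
* Unweighted percent missing among included participants (1,480 of 21,020)
display 100 * 1480 / 21020

* Included participants with observed alcohol data
display 21020 - 1480
```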

T3 Model 2 result

use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

recode AL_cat 0=0 1=0 2=0 3=0 4=0 .=1, gen(AL_miss)

label define AL_miss 0 "Not Missing" 1 "Missing"
label values AL_miss AL_miss
label variable AL_miss "Alcohol Use Missing"

xi: svy,subpop(if include==1 & AL_miss==0): logit BP_abn i.MJ2 gndr ridageyr, or
      pweight: wtmec12yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>

(49916 differences between AL_cat and AL_miss)




i.MJ2             _IMJ2_0-2           (naturally coded; _IMJ2_0 omitted)
(running logit on estimation sample)

Survey: Logistic regression

Number of strata   =       105                Number of obs     =       70,190
Number of PSUs     =       214                Population size   =  306,545,053
                                              Subpop. no. obs   =       19,540
                                              Subpop. size      =  132,295,780
                                              Design df         =          109
                                              F(   4,    106)   =       296.50
                                              Prob > F          =       0.0000

------------------------------------------------------------------------------
             |             Linearized
      BP_abn | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     _IMJ2_1 |   1.149771    .068934     2.33   0.022     1.020952    1.294845
     _IMJ2_2 |   1.142925   .1125914     1.36   0.178     .9402058    1.389352
        gndr |   2.466686   .0995826    22.36   0.000     2.277006    2.672166
    ridageyr |   1.048802   .0021503    23.24   0.000     1.044549    1.053072
       _cons |   .0870363   .0072116   -29.47   0.000     .0738552    .1025699
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
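
To see how much the restriction to participants with observed alcohol data changes the estimates, the two fits can be stored and printed side by side. This is a sketch only, assuming the dataset is loaded, `svyset` has been declared, and `AL_miss` exists as created above. `all_incl` and `alc_obs` are hypothetical names, and the sketch uses factor-variable notation rather than `xi:`, so the cannabis terms appear as `1.MJ2` and `2.MJ2` instead of `_IMJ2_1` and `_IMJ2_2`.

```stata
* Assumes: dataset loaded, svyset declared, AL_miss created as in the cell above
* all_incl and alc_obs are hypothetical names for the stored estimates

* Model 2 on all included participants
svy, subpop(if include==1): logit BP_abn i.MJ2 gndr ridageyr
estimates store all_incl

* Same model restricted to participants with observed alcohol data
svy, subpop(if include==1 & AL_miss==0): logit BP_abn i.MJ2 gndr ridageyr
estimates store alc_obs

* Odds ratios from both fits side by side
estimates table all_incl alc_obs, eform b(%9.3f)
```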

T4 Model 2 result

use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

recode AL_cat 0=0 1=0 2=0 3=0 4=0 .=1, gen(AL_miss)

label define AL_miss 0 "Not Missing" 1 "Missing"
label values AL_miss AL_miss
label variable AL_miss "Alcohol Use Missing"

xi: svy,subpop(if include==1 & AL_miss==0): mlogit BP_cat i.MJ2 gndr ridageyr, rrr
      pweight: wtmec12yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>

(49916 differences between AL_cat and AL_miss)




i.MJ2             _IMJ2_0-2           (naturally coded; _IMJ2_0 omitted)
(running mlogit on estimation sample)

Survey: Multinomial logistic regression

Number of strata   =       105                Number of obs     =       70,190
Number of PSUs     =       214                Population size   =  306,545,053
                                              Subpop. no. obs   =       19,540
                                              Subpop. size      =  132,295,780
                                              Design df         =          109
                                              F(  12,     98)   =       137.58
                                              Prob > F          =       0.0000

------------------------------------------------------------------------------
             |             Linearized
      BP_cat |        RRR   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Normal       |  (base outcome)
-------------+----------------------------------------------------------------
Elevated     |
     _IMJ2_1 |   1.161144   .0924113     1.88   0.063      .991703    1.359536
     _IMJ2_2 |   1.250721   .1614201     1.73   0.086     .9684329    1.615293
        gndr |   2.323464   .1169758    16.75   0.000     2.102814    2.567268
    ridageyr |   1.027802   .0029303     9.62   0.000      1.02201    1.033626
       _cons |    .069961   .0079452   -23.42   0.000     .0558603    .0876211
-------------+----------------------------------------------------------------
Stage_1_HTN  |
     _IMJ2_1 |   1.112923   .0990483     1.20   0.232     .9329513    1.327611
     _IMJ2_2 |   1.020423   .1230006     0.17   0.867      .803573    1.295791
        gndr |   2.450932   .1272142    17.27   0.000     2.211333    2.716492
    ridageyr |   1.049218   .0026578    18.97   0.000     1.043963    1.054499
       _cons |   .0375769   .0039426   -31.27   0.000     .0305217    .0462629
-------------+----------------------------------------------------------------
Stage_2_HTN  |
     _IMJ2_1 |   1.185814   .1255485     1.61   0.110     .9613548    1.462681
     _IMJ2_2 |   1.183829    .211061     0.95   0.346     .8314333    1.685585
        gndr |   2.787245   .1999721    14.29   0.000     2.417797    3.213147
    ridageyr |    1.08709   .0028555    31.79   0.000     1.081446    1.092765
       _cons |   .0037377   .0004722   -44.24   0.000     .0029098    .0048012
------------------------------------------------------------------------------
Note: _cons estimates baseline relative risk for each outcome.

Examining Excluded in Age 20-59

Age

Exclusions vs age criteria

Most observations (43,921 of 70,190) were outside the 20-59 age range.

use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

ta AGE_exclude include,m
      pweight: wtmec12yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>


   Meets Age |     Meets Overall
   Inclusion |  Inclusion Criteria
    Criteria |  Excluded   Included |     Total
-------------+----------------------+----------
   Age 20-59 |     5,249     21,020 |    26,269 
Age Excluded |    43,921          0 |    43,921 
-------------+----------------------+----------
       Total |    49,170     21,020 |    70,190 
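
The proportions behind this table can be worked out directly from the printed counts: about 62.6% of all observations fall outside the age range, and about 20.0% of the age-eligible observations are excluded for other reasons.

```stata
* Percent of all observations outside the 20-59 age range
display 100 * 43921 / 70190

* Percent of age-eligible observations excluded for other reasons
display 100 * 5249 / 26269
```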

Mean

Mean ages are quite similar (38.9 years for excluded vs 39.5 for included).

use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

svy, subpop(if AGE_exclude==0): mean ridageyr, over(include)
      pweight: wtmec12yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>

(running mean on estimation sample)

Survey: Mean estimation

Number of strata =     105            Number of obs   =       70,190
Number of PSUs   =     214            Population size =  306,545,053
                                      Subpop. no. obs =       25,348
                                      Subpop. size    =  166,131,356
                                      Design df       =          109

--------------------------------------------------------------------
                   |             Linearized
                   |       Mean   Std. Err.     [95% Conf. Interval]
-------------------+------------------------------------------------
c.ridageyr@include |
         Excluded  |   38.94718   .2541685      38.44342    39.45093
         Included  |   39.54781   .1563763      39.23788    39.85774
--------------------------------------------------------------------

Gender

use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

svy, subpop(if AGE_exclude==0): ta gndr include, col percent
ta gndr include if AGE_exclude==0,m
      pweight: wtmec12yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>

(running tabulate on estimation sample)

Number of strata   =       105                Number of obs     =       70,190
Number of PSUs     =       214                Population size   =  306,545,053
                                              Subpop. no. obs   =       25,348
                                              Subpop. size      =  166,131,356
                                              Design df         =          109

----------------------------------------
          |   Meets Overall Inclusion   
          |           Criteria          
   Gender | Excluded  Included     Total
----------+-----------------------------
   Female |    56.95     49.69     50.77
     Male |    43.05     50.31     49.23
          | 
    Total |      100       100       100
----------------------------------------
  Key:  column percentage

  Pearson:
    Uncorrected   chi2(1)         =  186.7592
    Design-based  F(1, 109)       =   41.1675     P = 0.0000

           |     Meets Overall
           |  Inclusion Criteria
    Gender |  Excluded   Included |     Total
-----------+----------------------+----------
    Female |     2,986     10,679 |    13,665 
      Male |     2,263     10,341 |    12,604 
-----------+----------------------+----------
     Total |     5,249     21,020 |    26,269 

Race

Included participants were more often non-Hispanic White than excluded participants (64.7% vs 52.3%).

use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

svy, subpop(if AGE_exclude==0): ta race_eth include, col percent
ta race_eth include if AGE_exclude==0,m
      pweight: wtmec12yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>

(running tabulate on estimation sample)

Number of strata   =       105                Number of obs     =       70,190
Number of PSUs     =       214                Population size   =  306,545,053
                                              Subpop. no. obs   =       25,348
                                              Subpop. size      =  166,131,356
                                              Design df         =          109

----------------------------------------
Recoded   |   Meets Overall Inclusion   
Race &    |           Criteria          
Ethnicity | Excluded  Included     Total
----------+-----------------------------
 White-NH |    52.28     64.71     62.87
 Black_NH |    15.98     11.62     12.27
   Mex Am |       11     9.854     10.02
      Oth |    20.74     13.82     14.84
          | 
    Total |      100       100       100
----------------------------------------
  Key:  column percentage

  Pearson:
    Uncorrected   chi2(3)         =  652.4977
    Design-based  F(2.50, 272.00) =   38.9452     P = 0.0000


   Recoded |     Meets Overall
    Race & |  Inclusion Criteria
 Ethnicity |  Excluded   Included |     Total
-----------+----------------------+----------
  White-NH |     1,585      8,395 |     9,980 
  Black_NH |     1,242      4,494 |     5,736 
    Mex Am |       923      3,609 |     4,532 
       Oth |     1,499      4,522 |     6,021 
-----------+----------------------+----------
     Total |     5,249     21,020 |    26,269 
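
The weighted column percentages above are point estimates only. As a minimal sketch, assuming the same dataset and design settings used throughout this section, the tabulation can be rerun with design-based 95% confidence intervals so the White-NH gap can be read against its sampling error. The one-decimal display format is only an illustrative choice.

```stata
use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

* Weighted column percentages with design-based 95% confidence intervals
* (format(%5.1f) is a display choice, not part of the original analysis)
svy, subpop(if AGE_exclude==0): tabulate race_eth include, column percent ci format(%5.1f)
```

Only one statistic (here the column percentage) can be requested when ci is specified, so this complements the original table rather than replacing it.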

Education

Included participants were slightly more educated; note the difference in the share with less than a 9th grade education (weighted 3.9% of included vs. 9.1% of excluded).

use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

svy, subpop(if AGE_exclude==0): ta EDUC_cat include, col percent
ta EDUC_cat include if AGE_exclude==0,m
      pweight: wtmec12yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>

(running tabulate on estimation sample)

Number of strata   =       105                Number of obs     =       70,169
Number of PSUs     =       214                Population size   =  306,448,136
                                              Subpop. no. obs   =       25,332
                                              Subpop. size      =  166,034,439
                                              Design df         =          109

----------------------------------------
Recode    |   Meets Overall Inclusion   
Education |           Criteria          
Level     | Excluded  Included     Total
----------+-----------------------------
 Less tha |    9.056     3.898     4.661
 Less tha |    12.51     10.23     10.57
 High Sch |    23.54     22.42     22.59
 Some Col |    28.75     33.09     32.45
  College |    26.14     30.36     29.73
          | 
    Total |      100       100       100
----------------------------------------
  Key:  column percentage

  Pearson:
    Uncorrected   chi2(4)         =  657.2417
    Design-based  F(3.40, 370.45) =   40.4108     P = 0.0000

                      |     Meets Overall
     Recode Education |  Inclusion Criteria
                Level |  Excluded   Included |     Total
----------------------+----------------------+----------
  Less than 9th Grade |       659      1,442 |     2,101 
Less than High School |       812      2,910 |     3,722 
      High School/GED |     1,180      4,738 |     5,918 
         Some College |     1,452      6,754 |     8,206 
     College Graduate |     1,133      5,168 |     6,301 
                    . |        13          8 |        21 
----------------------+----------------------+----------
                Total |     5,249     21,020 |    26,269 

Tobacco Use

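Broadly similar, although included participants were less often never smokers (weighted 56.6% vs. 62.5%) and more often past smokers (20.1% vs. 14.8%).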

use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

svy, subpop(if AGE_exclude==0): ta SMK_cat include, col percent
ta SMK_cat include if AGE_exclude==0,m
      pweight: wtmec12yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>

(running tabulate on estimation sample)

Number of strata   =       105                Number of obs     =       70,115
Number of PSUs     =       214                Population size   =  306,228,409
                                              Subpop. no. obs   =       25,278
                                              Subpop. size      =  165,814,712
                                              Design df         =          109

----------------------------------------
          |   Meets Overall Inclusion   
Smoking   |           Criteria          
Category  | Excluded  Included     Total
----------+-----------------------------
    Never |    62.49     56.59     57.46
 Past Smo |    14.78     20.05     19.27
 Light, 1 |    13.98     12.95      13.1
 Moderate |    6.463     7.957     7.736
 Heavy, 2 |    2.285     2.458     2.432
          | 
    Total |      100       100       100
----------------------------------------
  Key:  column percentage

  Pearson:
    Uncorrected   chi2(4)         =  214.9924
    Design-based  F(3.85, 419.51) =   10.5232     P = 0.0000

                      |     Meets Overall
                      |  Inclusion Criteria
     Smoking Category |  Excluded   Included |     Total
----------------------+----------------------+----------
                Never |     3,313     12,159 |    15,472 
          Past Smoker |       713      3,665 |     4,378 
  Light, 1-10 cig/day |       791      3,121 |     3,912 
Moderate, 11-20 cig/d |       317      1,566 |     1,883 
   Heavy, 21+ cig/day |        91        458 |       549 
                    . |        24         51 |        75 
----------------------+----------------------+----------
                Total |     5,249     21,020 |    26,269 

BMI

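Similar: mean BMI was 28.7 among excluded and 29.0 among included participants, with overlapping 95% confidence intervals.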

use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

svy, subpop(if AGE_exclude==0): mean bmxbmi, over(include)
      pweight: wtmec12yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>

(running mean on estimation sample)

Survey: Mean estimation

Number of strata =     105          Number of obs   =       69,915
Number of PSUs   =     214          Population size =  304,934,507
                                    Subpop. no. obs =       25,073
                                    Subpop. size    =  164,520,810
                                    Design df       =          109

------------------------------------------------------------------
                 |             Linearized
                 |       Mean   Std. Err.     [95% Conf. Interval]
-----------------+------------------------------------------------
c.bmxbmi@include |
       Excluded  |   28.70402   .1640714      28.37884    29.02921
       Included  |   28.96312   .1011648      28.76261    29.16362
------------------------------------------------------------------

Income

Excluded participants had a slightly lower mean family income-to-poverty ratio (2.68 vs. 3.03).

use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

svy, subpop(if AGE_exclude==0): mean indfmpir, over(include)
      pweight: wtmec12yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>

(running mean on estimation sample)

Survey: Mean estimation

Number of strata =     105            Number of obs   =       68,059
Number of PSUs   =     214            Population size =  295,209,716
                                      Subpop. no. obs =       23,217
                                      Subpop. size    =  154,796,019
                                      Design df       =          109

--------------------------------------------------------------------
                   |             Linearized
                   |       Mean   Std. Err.     [95% Conf. Interval]
-------------------+------------------------------------------------
c.indfmpir@include |
         Excluded  |   2.678598   .0526744      2.574199    2.782996
         Included  |    3.03458   .0350647      2.965083    3.104077
--------------------------------------------------------------------

Alcohol

use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

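* Give missing alcohol use its own category (3) so it appears as a table row
recode AL_cat .=3, gen(ALC_cat)

label define ALC_cat 0 "None-Light" 1 "Moderate" 2 "Heavy" 3 "Missing"
label values ALC_cat ALC_cat
label variable ALC_cat "Alcohol Use Category"

* Weighted column percentages among those with non-missing alcohol data
svy, subpop(if AGE_exclude==0): ta AL_cat include, col percent
* Same table with "Missing" kept as its own category
svy, subpop(if AGE_exclude==0): ta ALC_cat include, col percent
* Unweighted counts
ta ALC_cat include if AGE_exclude==0,m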
      pweight: wtmec12yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>

(37998 differences between AL_cat and ALC_cat)




(running tabulate on estimation sample)

Number of strata   =       105                Number of obs     =       65,011
Number of PSUs     =       214                Population size   =  281,890,666
                                              Subpop. no. obs   =       21,090
                                              Subpop. size      =  141,476,970
                                              Design df         =          109

----------------------------------------
Alcohol   |   Meets Overall Inclusion   
Use       |           Criteria          
Category  | Excluded  Included     Total
----------+-----------------------------
 None-Lig |    61.53     51.86     52.49
 Moderate |     23.4     31.04     30.54
    Heavy |    15.07      17.1     16.97
          | 
    Total |      100       100       100
----------------------------------------
  Key:  column percentage

  Pearson:
    Uncorrected   chi2(2)         =  155.1160
    Design-based  F(1.93, 210.58) =   19.5090     P = 0.0000

(running tabulate on estimation sample)

Number of strata   =       105                Number of obs     =       70,190
Number of PSUs     =       214                Population size   =  306,545,053
                                              Subpop. no. obs   =       25,348
                                              Subpop. size      =  166,131,356
                                              Design df         =          109

----------------------------------------
Alcohol   |   Meets Overall Inclusion   
Use       |           Criteria          
Category  | Excluded  Included     Total
----------+-----------------------------
 None-Lig |    22.96     48.48      44.7
 Moderate |    8.731     29.01     26.01
    Heavy |    5.623     15.99     14.45
  Missing |    62.69     6.521     14.84
          | 
    Total |      100       100       100
----------------------------------------
  Key:  column percentage

  Pearson:
    Uncorrected   chi2(3)         =  2.22e+04
    Design-based  F(2.21, 241.06) = 1195.6734     P = 0.0000


   Alcohol |     Meets Overall
       Use |  Inclusion Criteria
  Category |  Excluded   Included |     Total
-----------+----------------------+----------
None-Light |       980     10,455 |    11,435 
  Moderate |       345      5,658 |     6,003 
     Heavy |       225      3,427 |     3,652 
   Missing |     3,699      1,480 |     5,179 
-----------+----------------------+----------
     Total |     5,249     21,020 |    26,269 
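
The recode into ALC_cat is one way to show missing alcohol use as its own row. As a hedged alternative sketch, svy: tabulate also accepts a missing option that treats missing values like any other category, which should give a comparable table without creating a new variable. The missing row would then be labelled "." rather than "Missing".

```stata
use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

* Keep observations with missing AL_cat and show them as a separate row,
* instead of dropping them from the estimation sample
svy, subpop(if AGE_exclude==0): tabulate AL_cat include, column percent missing
```

The explicit recode used above has the advantage of a readable row label, so this variant is mainly useful as a quick check.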

Diet Quality

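Similar: mean HEI-2015 score was 52.4 among excluded and 52.2 among included participants, with overlapping 95% confidence intervals.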

use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

svy, subpop(if AGE_exclude==0): mean hei2015, over(include)
      pweight: wtmec12yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>

(running mean on estimation sample)

Survey: Mean estimation

Number of strata =     105           Number of obs   =       67,220
Number of PSUs   =     214           Population size =  289,323,664
                                     Subpop. no. obs =       22,378
                                     Subpop. size    =  148,909,968
                                     Design df       =          109

-------------------------------------------------------------------
                  |             Linearized
                  |       Mean   Std. Err.     [95% Conf. Interval]
------------------+------------------------------------------------
c.hei2015@include |
        Excluded  |   52.44492   .3712613      51.70909    53.18074
        Included  |   52.15834   .2244701      51.71344    52.60323
-------------------------------------------------------------------
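
The continuous comparisons in this section (BMI, income-to-poverty ratio, HEI-2015) report group means and confidence intervals but no direct test of the difference. A minimal sketch of one way to obtain that is a survey-weighted linear regression on the inclusion indicator, where the coefficient on 1.include is the Included minus Excluded difference in means with a design-based standard error. This assumes include is coded 0 = Excluded and 1 = Included, which is not shown in the output above.

```stata
use "data\NHANES0518_new.dta", clear
svyset sdmvpsu [pw=wtmec12yr], strata(sdmvstra)

* For each continuous covariate, estimate the Included - Excluded difference
* in means within the age-eligible subpopulation.
* Assumption: include is coded 0 = Excluded, 1 = Included.
foreach v of varlist bmxbmi indfmpir hei2015 {
    svy, subpop(if AGE_exclude==0): regress `v' i.include
}
```

Using subpop() rather than an if restriction keeps the full design in the variance calculation, matching the approach of the mean commands above.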