1: Age Quintiles

Create age quintiles (weighted, in entire subpop2 sample). Report the range and median (midpoint) of each quintile (also weighted).

use "NHANES0708_all.dta"

svyset sdmvpsu [pw=wtmec2yr], strata(sdmvstra)

xtile age_q5=ridageyr [aweight=wtmec2yr] if subpop2==1,nq(5)

bysort age_q5: tabstat ridageyr [aweight=wtmec2yr],stat(n min max p50)
      pweight: wtmec2yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>



-------------------------------------------------------------------------------
-> age_q5 = 1

    variable |         N       min       max       p50
-------------+----------------------------------------
    ridageyr |       342         4         6         5
------------------------------------------------------

-------------------------------------------------------------------------------
-> age_q5 = 2

    variable |         N       min       max       p50
-------------+----------------------------------------
    ridageyr |       337         7         9         8
------------------------------------------------------

-------------------------------------------------------------------------------
-> age_q5 = 3

    variable |         N       min       max       p50
-------------+----------------------------------------
    ridageyr |       213        10        11        11
------------------------------------------------------

-------------------------------------------------------------------------------
-> age_q5 = 4

    variable |         N       min       max       p50
-------------+----------------------------------------
    ridageyr |       237        12        14        13
------------------------------------------------------

-------------------------------------------------------------------------------
-> age_q5 = 5

    variable |         N       min       max       p50
-------------+----------------------------------------
    ridageyr |       192        15        17        16
------------------------------------------------------

-------------------------------------------------------------------------------
-> age_q5 = .

    variable |         N       min       max       p50
-------------+----------------------------------------
    ridageyr |      8441         0        80        39
------------------------------------------------------

2 Run the crude model with the categorical quintile variable.

use "NHANES0708_all.dta"

svyset sdmvpsu [pw=wtmec2yr], strata(sdmvstra)
xtile age_q5=ridageyr [aweight=wtmec2yr] if subpop2==1,nq(5)

xi: svy,subpop(if subpop2==1 & male==1 & foodinsec==0): logit wccata i.age_q5
      pweight: wtmec2yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>


i.age_q5          _Iage_q5_1-5        (naturally coded; _Iage_q5_1 omitted)
(running logit on estimation sample)

Survey: Logistic regression

Number of strata   =        16                Number of obs     =       10,149
Number of PSUs     =        32                Population size   =  297,136,095
                                              Subpop. no. obs   =          439
                                              Subpop. size      = 8,131,107.05
                                              Design df         =           16
                                              F(   4,     13)   =         2.24
                                              Prob > F          =       0.1215

------------------------------------------------------------------------------
             |             Linearized
      wccata |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _Iage_q5_2 |   .1842817   .3964377     0.46   0.648    -.6561288    1.024692
  _Iage_q5_3 |  -.1072657   .3823348    -0.28   0.783    -.9177793     .703248
  _Iage_q5_4 |   1.236589    .467235     2.65   0.018     .2460951    2.227083
  _Iage_q5_5 |   .2704047   .5807907     0.47   0.648    -.9608166    1.501626
       _cons |   -1.92872   .3308428    -5.83   0.000    -2.630076   -1.227365
------------------------------------------------------------------------------

3 Run the crude model with the linear age term.

use "NHANES0708_all.dta"

svyset sdmvpsu [pw=wtmec2yr], strata(sdmvstra)
xtile age_q5=ridageyr [aweight=wtmec2yr] if subpop2==1,nq(5)

xi: svy,subpop(if subpop2==1 & male==1 & foodinsec==0): logit wccata ridageyr
      pweight: wtmec2yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>


(running logit on estimation sample)

Survey: Logistic regression

Number of strata   =        16                Number of obs     =       10,149
Number of PSUs     =        32                Population size   =  297,136,095
                                              Subpop. no. obs   =          439
                                              Subpop. size      = 8,131,107.05
                                              Design df         =           16
                                              F(   1,     16)   =         4.81
                                              Prob > F          =       0.0434

------------------------------------------------------------------------------
             |             Linearized
      wccata |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    ridageyr |   .0816693   .0372445     2.19   0.043     .0027145    .1606241
       _cons |  -2.323536   .3852478    -6.03   0.000    -3.140225   -1.506847
------------------------------------------------------------------------------

4 How closely does the linear relationship align with the quintiles?

5 Test higher order terms: quadratic and cubic models

Quadratic

This first creates a centered age variable, and creates higher-order terms based on that variable.

use "C:\Users\Matt\Documents\EPI536\Assignments\NHANES0708_all.dta"
svyset sdmvpsu [pw=wtmec2yr], strata(sdmvstra)

# Calculates weighted mean for creating the centered age variable
svy,subpop(subpop2): mean ridageyr
# Creates weighted age variable
gen agec=ridageyr-10.2

# Creates higher-order age variables
gen agec_sq=agec^2
gen agec_cb=agec^3

# Verifying new variables
su ridageyr agec agec_sq agec_cb if subpop2==1

# Run Quadratic
xi: svy,subpop(if subpop2==1 & male==1 & foodinsec==0): logit wccata agec agec_sq
      pweight: wtmec2yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>

Unknown #command
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      16      Number of obs   =       10,149
Number of PSUs   =      32      Population size =  297,136,095
                                Subpop. no. obs =        1,321
                                Subpop. size    = 23,276,001.8
                                Design df       =           16

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
    ridageyr |   10.16534   .2237876      9.690935    10.63975
--------------------------------------------------------------

Unknown #command

Unknown #command


Unknown #command

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    ridageyr |      1,321    9.663134    3.846791          4         17
        agec |      1,321    -.536866    3.846791       -6.2        6.8
     agec_sq |      1,321    15.07482    14.03692        .04      46.24
     agec_cb |      1,321   -10.41995    118.9342   -238.328    314.432

Unknown #command
(running logit on estimation sample)

Survey: Logistic regression

Number of strata   =        16                Number of obs     =       10,149
Number of PSUs     =        32                Population size   =  297,136,095
                                              Subpop. no. obs   =          439
                                              Subpop. size      = 8,131,107.05
                                              Design df         =           16
                                              F(   2,     15)   =         3.00
                                              Prob > F          =       0.0800

------------------------------------------------------------------------------
             |             Linearized
      wccata |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        agec |   .0877135   .0455836     1.92   0.072    -.0089194    .1843464
     agec_sq |  -.0195448   .0089637    -2.18   0.045    -.0385469   -.0005427
       _cons |  -1.210634   .1982293    -6.11   0.000    -1.630861   -.7904065
------------------------------------------------------------------------------

Cubic

This first creates a centered age variable, and creates higher-order terms based on that variable.

use "C:\Users\Matt\Documents\EPI536\Assignments\NHANES0708_all.dta"
svyset sdmvpsu [pw=wtmec2yr], strata(sdmvstra)

# Calculates weighted mean for creating the centered age variable
svy,subpop(subpop2): mean ridageyr
# Creates weighted age variable
gen agec=ridageyr-10.2

# Creates higher-order age variables
gen agec_sq=agec^2
gen agec_cb=agec^3

# Run Cubic
xi: svy,subpop(if subpop2==1 & male==1 & foodinsec==0): logit wccata agec agec_sq agec_cb
      pweight: wtmec2yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>

Unknown #command
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      16      Number of obs   =       10,149
Number of PSUs   =      32      Population size =  297,136,095
                                Subpop. no. obs =        1,321
                                Subpop. size    = 23,276,001.8
                                Design df       =           16

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
    ridageyr |   10.16534   .2237876      9.690935    10.63975
--------------------------------------------------------------

Unknown #command

Unknown #command


Unknown #command
(running logit on estimation sample)

Survey: Logistic regression

Number of strata   =        16                Number of obs     =       10,149
Number of PSUs     =        32                Population size   =  297,136,095
                                              Subpop. no. obs   =          439
                                              Subpop. size      = 8,131,107.05
                                              Design df         =           16
                                              F(   3,     14)   =         2.28
                                              Prob > F          =       0.1245

------------------------------------------------------------------------------
             |             Linearized
      wccata |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        agec |    .166679   .0862877     1.93   0.071    -.0162427    .3496008
     agec_sq |  -.0173134   .0077224    -2.24   0.039    -.0336841   -.0009427
     agec_cb |  -.0029188   .0030952    -0.94   0.360    -.0094803    .0036427
       _cons |  -1.244106   .1796734    -6.92   0.000    -1.624996   -.8632152
------------------------------------------------------------------------------

Plot:

use "C:\Users\Matt\Documents\EPI536\Assignments\NHANES0708_all.dta"
svyset sdmvpsu [pw=wtmec2yr], strata(sdmvstra)

# generate quintiles
xtile age_q5=ridageyr [aweight=wtmec2yr] if subpop2==1,nq(5)

# Calculates weighted mean for creating the centered age variable
svy,subpop(subpop2): mean ridageyr
# Creates weighted age variable
gen agec=ridageyr-10.2

# Creates higher-order age variables
gen agec_sq=agec^2
gen agec_cb=agec^3

# Run Cubic
xi: svy,subpop(if subpop2==1 & male==1 & foodinsec==0): logit wccata agec agec_sq agec_cb
# Run model with age quintiles
xi: svy,subpop(if subpop2==1 & male==1 & foodinsec==0): logit wccata i.age_q5
predict pred_wc2_q5,xb
predict pred_wc2_se_q5,stdp
gen pred_wc2_ub_q5=pred_wc2_q5 + 1.96*pred_wc2_se_q5
gen pred_wc2_lb_q5=pred_wc2_q5 - 1.96*pred_wc2_se_q5

*Run model with linear age
xi: svy,subpop(if subpop2==1 & male==1 & foodinsec==0): logit wccata ridageyr
predict pred_wc2_x,xb
*Run model with quadratic age
xi: svy,subpop(if subpop2==1 & male==1 & foodinsec==0): logit wccata agec agec_sq
predict pred_wc2_sq,xb

twoway scatter pred_wc2_q5 ridageyr if subpop2==1 & male==1 & foodinsec==0, mcolor(blue) ///
    || scatter pred_wc2_lb_q5 ridageyr if subpop2==1 & male==1 & foodinsec==0, msymbol(smx) mcolor(blue) ///
    || scatter pred_wc2_ub_q5 ridageyr if subpop2==1 & male==1 & foodinsec==0, msymbol(smx) mcolor(blue) ///
    || scatter pred_wc2_x ridageyr if subpop2==1 & male==1 & foodinsec==0 ///
    || scatter pred_wc2_sq ridageyr if subpop2==1 & male==1 & foodinsec==0

graph export age_plot.svg, replace
      pweight: wtmec2yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>

Unknown #command

Unknown #command
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      16      Number of obs   =       10,149
Number of PSUs   =      32      Population size =  297,136,095
                                Subpop. no. obs =        1,321
                                Subpop. size    = 23,276,001.8
                                Design df       =           16

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
    ridageyr |   10.16534   .2237876      9.690935    10.63975
--------------------------------------------------------------

Unknown #command

Unknown #command


Unknown #command
(running logit on estimation sample)

Survey: Logistic regression

Number of strata   =        16                Number of obs     =       10,149
Number of PSUs     =        32                Population size   =  297,136,095
                                              Subpop. no. obs   =          439
                                              Subpop. size      = 8,131,107.05
                                              Design df         =           16
                                              F(   3,     14)   =         2.28
                                              Prob > F          =       0.1245

------------------------------------------------------------------------------
             |             Linearized
      wccata |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        agec |    .166679   .0862877     1.93   0.071    -.0162427    .3496008
     agec_sq |  -.0173134   .0077224    -2.24   0.039    -.0336841   -.0009427
     agec_cb |  -.0029188   .0030952    -0.94   0.360    -.0094803    .0036427
       _cons |  -1.244106   .1796734    -6.92   0.000    -1.624996   -.8632152
------------------------------------------------------------------------------

Unknown #command
i.age_q5          _Iage_q5_1-5        (naturally coded; _Iage_q5_1 omitted)
(running logit on estimation sample)

Survey: Logistic regression

Number of strata   =        16                Number of obs     =       10,149
Number of PSUs     =        32                Population size   =  297,136,095
                                              Subpop. no. obs   =          439
                                              Subpop. size      = 8,131,107.05
                                              Design df         =           16
                                              F(   4,     13)   =         2.24
                                              Prob > F          =       0.1215

------------------------------------------------------------------------------
             |             Linearized
      wccata |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _Iage_q5_2 |   .1842817   .3964377     0.46   0.648    -.6561288    1.024692
  _Iage_q5_3 |  -.1072657   .3823348    -0.28   0.783    -.9177793     .703248
  _Iage_q5_4 |   1.236589    .467235     2.65   0.018     .2460951    2.227083
  _Iage_q5_5 |   .2704047   .5807907     0.47   0.648    -.9608166    1.501626
       _cons |   -1.92872   .3308428    -5.83   0.000    -2.630076   -1.227365
------------------------------------------------------------------------------

(8828 missing values generated)

(8828 missing values generated)

(8,828 missing values generated)

(8,828 missing values generated)

(running logit on estimation sample)

Survey: Logistic regression

Number of strata   =        16                Number of obs     =       10,149
Number of PSUs     =        32                Population size   =  297,136,095
                                              Subpop. no. obs   =          439
                                              Subpop. size      = 8,131,107.05
                                              Design df         =           16
                                              F(   1,     16)   =         4.81
                                              Prob > F          =       0.0434

------------------------------------------------------------------------------
             |             Linearized
      wccata |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    ridageyr |   .0816693   .0372445     2.19   0.043     .0027145    .1606241
       _cons |  -2.323536   .3852478    -6.03   0.000    -3.140225   -1.506847
------------------------------------------------------------------------------


(running logit on estimation sample)

Survey: Logistic regression

Number of strata   =        16                Number of obs     =       10,149
Number of PSUs     =        32                Population size   =  297,136,095
                                              Subpop. no. obs   =          439
                                              Subpop. size      = 8,131,107.05
                                              Design df         =           16
                                              F(   2,     15)   =         3.00
                                              Prob > F          =       0.0800

------------------------------------------------------------------------------
             |             Linearized
      wccata |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        agec |   .0877135   .0455836     1.92   0.072    -.0089194    .1843464
     agec_sq |  -.0195448   .0089637    -2.18   0.045    -.0385469   -.0005427
       _cons |  -1.210634   .1982293    -6.11   0.000    -1.630861   -.7904065
------------------------------------------------------------------------------



(file age_plot.svg written in SVG format)

Bonus

Lowess

use "C:\Users\Matt\Documents\EPI536\Assignments\NHANES0708_all.dta"

lowess wccata ridageyr if subpop2==1 & male==1 & foodinsec==0,logit

graph export lowess.svg, replace
(file lowess.svg written in SVG format)

nlcheck

use "NHANES0708_all.dta"

xi: logit wccata ridageyr if subpop2==1 & male==1 & foodinsec==0
nlcheck ridageyr,graph
graph export nlcheck.svg, replace
Iteration 0:   log likelihood = -202.29128  
Iteration 1:   log likelihood = -199.98977  
Iteration 2:   log likelihood = -199.97022  
Iteration 3:   log likelihood = -199.97022  

Logistic regression                             Number of obs     =        439
                                                LR chi2(1)        =       4.64
                                                Prob > chi2       =     0.0312
Log likelihood = -199.97022                     Pseudo R2         =     0.0115

------------------------------------------------------------------------------
      wccata |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    ridageyr |   .0709313   .0328997     2.16   0.031     .0064491    .1354135
       _cons |  -2.237634   .3474705    -6.44   0.000    -2.918663   -1.556604
------------------------------------------------------------------------------


Nonlinearity test:

           chi2(  9) =   11.24
         Prob > chi2 =    0.2597

(file nlcheck.svg written in SVG format)

Stata Code Accompanying the assignment

*Stata-generated plot

*Create quintile variable xtile prage_q5=dmdhrage [aweight=wtmec2yr] if subpop2==1,nq(5)

Create centered variable svy,subpop(subpop2): mean dmdhrage mean PR age: 39.0 gen pragec=dmdhrage-39 *Create higher order terms gen pragec_sq=pragec^2 gen pragec_cb=pragec^3

Run model with age quintiles xi: svy,subpop(if subpop2==1 & male==1 & foodinsec==0): logit wccata i.prage_q5 predict pred_wc2_q5,xb predict pred_wc2_se_q5,stdp gen pred_wc2_ub_q5=pred_wc2_q5 + 1.96pred_wc2_se_q5 gen pred_wc2_lb_q5=pred_wc2_q5 - 1.96*pred_wc2_se_q5

Run model with linear age xi: svy,subpop(if subpop2==1 & male==1 & foodinsec==0): logit wccata dmdhrage predict pred_wc2_x,xb Run model with quadratic age xi: svy,subpop(if subpop2==1 & male==1 & foodinsec==0): logit wccata pragec pragec_sq predict pred_wc2_sq,xb

twoway scatter pred_wc2_q5 dmdhrage if subpop2==1 & male==1 & foodinsec==0, mcolor(blue) /// || scatter pred_wc2_lb_q5 dmdhrage if subpop2==1 & male==1 & foodinsec==0, msymbol(smx) mcolor(blue) /// || scatter pred_wc2_ub_q5 dmdhrage if subpop2==1 & male==1 & foodinsec==0, msymbol(smx) mcolor(blue) /// || scatter pred_wc2_x dmdhrage if subpop2==1 & male==1 & foodinsec==0 /// || scatter pred_wc2_sq dmdhrage if subpop2==1 & male==1 & foodinsec==0