The basic unit-level model (Battese, Harter, and Fuller 1988; Rao and Molina 2015) is given by \[ y_j = \beta' x_j + v_{i[j]} + \epsilon_j\,,\\ v_i \stackrel{\mathrm{iid}}{\sim} {\cal N}(0, \sigma_v^2) \qquad \epsilon_j \stackrel{\mathrm{iid}}{\sim} {\cal N}(0, \sigma^2) \] where \(j\) runs from 1 to \(n\), the number of unit-level observations, \(\beta\) is a vector of regression coefficients for given covariates \(x_j\), and \(v_i\) are random area intercepts.
We use the api
dataset included in packages survey.
library(survey)
data(api)
apipop$cname <- as.factor(apipop$cname)
apisrs$cname <- factor(apisrs$cname, levels=levels(apipop$cname))
The apipop
data frame contains the complete population
whereas apisrs
is a simple random sample from it. The
variable cname
is the county name, and we will be
interested in estimation at the county level. Not all counties in the
population are sampled. In order to be able to make predictions for
out-of-sample areas we make sure that the levels of the sample’s
cname
variable match those of its population
counterpart.
The basic unit-level model with county random area effects is fit as follows
library(mcmcsae)
mod <- api00 ~
reg(~ ell + meals + stype + hsg + col.grad + grad.sch, name="beta") +
gen(factor = ~ iid(cname), name="v")
sampler <- create_sampler(mod, data=apisrs)
sim <- MCMCsim(sampler, store.all=TRUE, verbose=FALSE)
(summary(sim))
## llh_ :
## Mean SD t-value MCSE q0.05 q0.5 q0.95 n_eff R_hat
## llh_ -1090 4.49 -243 0.12 -1097 -1089 -1083 1394 1
##
## sigma_ :
## Mean SD t-value MCSE q0.05 q0.5 q0.95 n_eff R_hat
## sigma_ 56.2 3.11 18.1 0.067 51.4 56.1 61.4 2149 1
##
## beta :
## Mean SD t-value MCSE q0.05 q0.5 q0.95 n_eff R_hat
## (Intercept) 802.755 25.179 31.88 0.49801 761.691 802.847 843.8644 2556 0.999
## ell -1.960 0.306 -6.40 0.00584 -2.481 -1.962 -1.4580 2751 1.001
## meals -1.966 0.284 -6.92 0.00542 -2.435 -1.966 -1.5030 2748 1.000
## stypeH -106.948 13.367 -8.00 0.25196 -128.710 -106.845 -84.3888 2814 1.000
## stypeM -60.240 11.604 -5.19 0.21186 -79.362 -60.367 -40.9081 3000 1.000
## hsg -0.618 0.413 -1.50 0.00754 -1.292 -0.613 0.0543 3000 1.000
## col.grad 0.580 0.507 1.14 0.01005 -0.234 0.573 1.4234 2541 0.999
## grad.sch 2.121 0.480 4.42 0.00936 1.338 2.106 2.9163 2629 1.000
##
## v_sigma :
## Mean SD t-value MCSE q0.05 q0.5 q0.95 n_eff R_hat
## v_sigma 23.3 6.26 3.72 0.202 13.7 22.9 34.1 962 1
##
## v :
## Mean SD t-value MCSE q0.05 q0.5 q0.95 n_eff R_hat
## Alameda -25.745 15.2 -1.69449 0.327 -51.6 -25.3497 -1.11 2160 1.000
## Amador 0.162 24.2 0.00669 0.449 -39.8 0.0835 39.76 2915 1.000
## Butte -0.452 23.7 -0.01906 0.456 -39.1 -0.2165 38.43 2699 1.000
## Calaveras 8.082 22.0 0.36734 0.431 -27.0 7.4774 44.89 2607 1.001
## Colusa 0.521 24.4 0.02130 0.453 -40.4 0.8411 40.15 2916 1.000
## Contra Costa -10.131 15.2 -0.66481 0.278 -35.3 -9.5220 13.53 3000 1.000
## Del Norte 1.001 23.9 0.04186 0.437 -38.3 0.3058 41.06 3000 0.999
## El Dorado 0.449 24.5 0.01833 0.477 -38.7 0.7864 39.66 2633 1.000
## Fresno 0.152 15.5 0.00982 0.283 -25.9 0.2895 25.52 3000 1.000
## Glenn -0.614 23.8 -0.02579 0.434 -40.1 0.1035 37.19 3000 1.000
## ... 47 elements suppressed ...
We wish to estimate the area population means \[
\theta_i = \frac{1}{N_i}\sum_{j \in U_i} y_j\,,
\] where \(U_i\) is the set of
units in area \(i\) of size \(N_i\). The MCMC output in variable
sim
can be used to obtain draws from the posterior
distribution for \(\theta_i\). The
\(r\)th draw can be expressed as \[
\theta_{i;r} = \frac{1}{N_i} \left(n_i \bar{y}_i + \beta_r'(t_{x;i}
- n_i \bar{x}_i) + (N_i - n_i)v_{i;r} + \sum_{j \in U_i\setminus s_i}
\epsilon_{j;r} \right)\,,
\] where \(\bar{y}_i\) is the
sample mean of \(y\) in area \(i\) and \(t_{x;i}\) is a vector of population totals
for area \(i\).
N <- table(apipop$cname)
samplesums <- tapply(apisrs$api00, apisrs$cname, sum)
samplesums[is.na(samplesums)] <- 0 # substitute 0 for out-of-sample areas
m <- match(apisrs$cds, apipop$cds) # population units in the sample
res <- predict(sim, newdata=apipop, labels=names(N),
fun=function(x, p) (samplesums + tapply(x[-m], apipop$cname[-m], sum ))/N,
show.progress=FALSE)
(summ <- summary(res))
## Mean SD t-value MCSE q0.05 q0.5 q0.95 n_eff R_hat
## Alameda 678 14.45 46.9 0.310 654 679 702 2170 1.000
## Amador 738 31.03 23.8 0.573 689 738 790 2938 1.000
## Butte 679 26.40 25.7 0.503 635 679 723 2750 1.000
## Calaveras 744 27.34 27.2 0.499 700 743 789 3000 1.000
## Colusa 552 31.90 17.3 0.595 499 553 605 2870 1.000
## Contra Costa 727 14.52 50.0 0.268 703 727 750 2928 1.000
## Del Norte 680 31.90 21.3 0.582 629 679 732 3000 0.999
## El Dorado 754 26.92 28.0 0.509 710 754 798 2798 1.000
## Fresno 595 15.08 39.4 0.275 570 595 619 3000 1.000
## Glenn 624 31.41 19.9 0.573 573 625 674 3000 1.000
## Humboldt 701 27.54 25.5 0.504 656 702 748 2984 1.001
## Imperial 558 25.21 22.1 0.517 518 556 600 2375 0.999
## Inyo 707 32.91 21.5 0.601 653 707 760 3000 1.000
## Kern 596 14.99 39.8 0.310 571 596 620 2332 0.999
## Kings 594 22.90 25.9 0.500 555 595 630 2094 1.001
## Lake 663 25.54 26.0 0.477 621 662 706 2863 1.000
## Lassen 706 26.44 26.7 0.495 662 707 748 2854 1.001
## Los Angeles 636 8.18 77.7 0.149 622 636 649 2995 1.000
## Madera 615 20.25 30.4 0.370 583 615 650 3000 1.001
## Marin 817 20.41 40.0 0.373 783 817 850 3000 1.000
## Mariposa 724 35.74 20.2 0.674 665 724 783 2816 1.000
## Mendocino 663 27.04 24.5 0.494 618 662 707 3000 1.002
## Merced 579 23.76 24.4 0.434 541 578 619 3000 1.001
## Modoc 672 30.30 22.2 0.598 623 672 722 2570 0.999
## Mono 703 41.74 16.8 0.762 634 703 772 3000 0.999
## Monterey 629 17.96 35.0 0.332 598 629 658 2919 1.000
## Napa 702 21.75 32.3 0.398 667 702 739 2984 0.999
## Nevada 799 29.66 26.9 0.541 749 798 847 3000 1.000
## Orange 693 15.23 45.5 0.294 669 693 718 2686 1.000
## Placer 772 24.02 32.1 0.461 734 772 812 2714 0.999
## Plumas 690 32.28 21.4 0.622 637 690 744 2696 1.000
## Riverside 636 14.26 44.6 0.273 613 636 658 2724 1.000
## Sacramento 669 15.40 43.4 0.281 644 668 694 3000 1.000
## San Benito 709 30.33 23.4 0.567 659 709 759 2862 0.999
## San Bernardino 647 13.11 49.3 0.239 625 647 669 3000 1.000
## San Diego 705 15.09 46.7 0.323 680 705 730 2185 1.000
## San Francisco 638 19.89 32.1 0.363 605 638 671 3000 0.999
## San Joaquin 630 17.66 35.7 0.368 600 630 658 2298 1.000
## San Luis Obispo 753 23.66 31.8 0.432 715 753 792 3000 1.000
## San Mateo 735 21.62 34.0 0.408 700 735 770 2812 1.001
## Santa Barbara 679 19.25 35.3 0.351 647 679 710 3000 1.001
## Santa Clara 733 15.77 46.5 0.295 707 733 758 2865 1.000
## Santa Cruz 680 20.01 34.0 0.400 646 680 712 2497 1.001
## Shasta 696 22.55 30.9 0.434 660 696 734 2705 1.001
## Sierra 708 40.49 17.5 0.777 641 708 775 2716 1.001
## Siskiyou 697 25.23 27.6 0.488 656 697 739 2677 1.000
## Solano 712 20.11 35.4 0.370 678 712 743 2954 1.000
## Sonoma 725 22.51 32.2 0.433 688 725 761 2703 1.000
## Stanislaus 661 19.69 33.6 0.382 629 661 693 2656 1.001
## Sutter 652 25.26 25.8 0.473 610 652 692 2852 1.000
## Tehama 659 28.67 23.0 0.538 612 659 707 2845 1.000
## Trinity 650 38.84 16.7 0.709 586 651 713 3000 1.000
## Tulare 583 21.60 27.0 0.415 546 583 618 2712 1.001
## Tuolumne 720 29.90 24.1 0.546 671 720 769 3000 1.000
## Ventura 710 17.42 40.7 0.332 681 709 739 2758 1.000
## Yolo 687 23.99 28.6 0.454 647 687 726 2791 1.000
## Yuba 627 28.80 21.8 0.533 580 626 675 2925 1.000
theta <- c(tapply(apipop$api00, apipop$cname, mean)) # true population quantities
plot_coef(summ, list(est=theta), n.se=2, est.names=c("mcmcsae", "true"), maxrows=30)
A model with binomial likelihood can also be fit. We now model the
target variable sch.wide
, a binary variable indicating
whether a school-wide growth target has been met. We use the same mean
model structure as above for the linear model, but now using a logistic
link function, \[
y_j \stackrel{\mathrm{iid}}{\sim} {\cal Be}(p_j)\,,\\
\mathrm{logit}(p_j) = \beta' x_j + v_{i[j]}\,,\\
v_i \stackrel{\mathrm{iid}}{\sim} {\cal N}(0, \sigma_v^2)
\]
apisrs$target.met <- as.numeric(apisrs$sch.wide == "Yes")
sampler <- create_sampler(update(mod, target.met ~ .), family="binomial", data=apisrs)
sim <- MCMCsim(sampler, store.all=TRUE, verbose=FALSE)
summary(sim)
## llh_ :
## Mean SD t-value MCSE q0.05 q0.5 q0.95 n_eff R_hat
## llh_ -83.3 2.5 -33.3 0.0616 -87.7 -83.2 -79.6 1651 1
##
## beta :
## Mean SD t-value MCSE q0.05 q0.5 q0.95 n_eff R_hat
## (Intercept) 4.56516 1.4705 3.1044 0.053839 2.2341 4.50600 7.11086 746 1.002
## ell -0.03653 0.0148 -2.4738 0.000397 -0.0615 -0.03621 -0.01266 1383 1.000
## meals 0.00123 0.0139 0.0886 0.000360 -0.0217 0.00131 0.02443 1496 1.001
## stypeH -2.83251 0.6289 -4.5039 0.018450 -3.8958 -2.81534 -1.80446 1162 1.001
## stypeM -1.66460 0.5866 -2.8378 0.018269 -2.6488 -1.66899 -0.69783 1031 1.000
## hsg -0.02637 0.0222 -1.1892 0.000655 -0.0630 -0.02606 0.01058 1146 1.002
## col.grad -0.03600 0.0267 -1.3468 0.000852 -0.0783 -0.03624 0.00844 984 1.001
## grad.sch 0.01267 0.0327 0.3871 0.001279 -0.0397 0.01183 0.06877 655 0.999
##
## v_sigma :
## Mean SD t-value MCSE q0.05 q0.5 q0.95 n_eff R_hat
## v_sigma 0.372 0.267 1.39 0.0117 0.0335 0.325 0.869 525 1.01
##
## v :
## Mean SD t-value MCSE q0.05 q0.5 q0.95 n_eff R_hat
## Alameda -0.19342 0.429 -0.45131 0.01244 -1.041 -0.082722 0.335 1186 1.001
## Amador 0.01100 0.456 0.02412 0.00835 -0.704 0.002089 0.749 2986 1.000
## Butte 0.00190 0.451 0.00421 0.00824 -0.697 0.004309 0.692 2998 1.000
## Calaveras 0.04634 0.454 0.10213 0.00978 -0.613 0.004646 0.818 2154 1.001
## Colusa -0.00633 0.455 -0.01389 0.00832 -0.729 -0.004004 0.711 3000 0.999
## Contra Costa -0.00822 0.396 -0.02074 0.00875 -0.666 0.000858 0.618 2051 1.000
## Del Norte -0.00822 0.471 -0.01745 0.00883 -0.761 -0.002536 0.712 2843 1.000
## El Dorado -0.00807 0.454 -0.01777 0.00933 -0.750 0.000655 0.707 2366 1.000
## Fresno -0.04698 0.365 -0.12865 0.00715 -0.704 -0.015413 0.517 2608 1.000
## Glenn 0.01047 0.464 0.02254 0.00848 -0.687 0.002597 0.745 3000 1.000
## ... 47 elements suppressed ...
To predict the population fractions of schools that meet the growth target by county,
samplesums <- tapply(apisrs$target.met, apisrs$cname, sum)
samplesums[is.na(samplesums)] <- 0 # substitute 0 for out-of-sample areas
res <- predict(sim, newdata=apipop, labels=names(N),
fun=function(x, p) (samplesums + tapply(x[-m], apipop$cname[-m], sum ))/N,
show.progress=FALSE)
(summ <- summary(res))
## Mean SD t-value MCSE q0.05 q0.5 q0.95 n_eff R_hat
## Alameda 0.785 0.0601 13.05 0.001644 0.670 0.796 0.867 1338 1.002
## Amador 0.839 0.1185 7.08 0.002226 0.600 0.900 1.000 2834 1.000
## Butte 0.836 0.0744 11.23 0.001447 0.708 0.833 0.938 2645 1.000
## Calaveras 0.872 0.0960 9.08 0.001921 0.700 0.900 1.000 2499 1.000
## Colusa 0.650 0.1605 4.05 0.003092 0.333 0.667 0.889 2693 0.999
## Contra Costa 0.816 0.0572 14.27 0.001306 0.715 0.821 0.899 1918 1.001
## Del Norte 0.880 0.1090 8.08 0.002101 0.750 0.875 1.000 2693 0.999
## El Dorado 0.849 0.0752 11.29 0.001521 0.725 0.850 0.950 2444 1.000
## Fresno 0.782 0.0583 13.40 0.001205 0.683 0.785 0.866 2344 0.999
## Glenn 0.794 0.1398 5.68 0.002684 0.556 0.778 1.000 2713 0.999
## Humboldt 0.853 0.0734 11.62 0.001436 0.725 0.850 0.950 2609 1.000
## Imperial 0.698 0.1047 6.67 0.002090 0.525 0.700 0.850 2510 1.000
## Inyo 0.784 0.1504 5.21 0.002810 0.571 0.857 1.000 2867 1.000
## Kern 0.805 0.0516 15.59 0.001181 0.711 0.811 0.878 1911 1.000
## Kings 0.851 0.0798 10.66 0.001649 0.720 0.840 0.960 2345 1.000
## Lake 0.826 0.0948 8.71 0.001822 0.682 0.818 0.955 2706 1.001
## Lassen 0.837 0.1070 7.82 0.002028 0.636 0.818 1.000 2785 0.999
## Los Angeles 0.799 0.0427 18.73 0.001128 0.731 0.799 0.872 1431 1.003
## Madera 0.817 0.0738 11.07 0.001497 0.677 0.806 0.935 2430 1.001
## Marin 0.839 0.0675 12.43 0.001566 0.720 0.840 0.940 1856 1.000
## Mariposa 0.862 0.1454 5.93 0.002731 0.600 0.800 1.000 2834 1.000
## Mendocino 0.816 0.0905 9.01 0.001723 0.640 0.840 0.960 2760 1.000
## Merced 0.790 0.0807 9.80 0.001614 0.651 0.794 0.905 2500 1.000
## Modoc 0.835 0.1509 5.53 0.002993 0.600 0.800 1.000 2544 1.001
## Mono 0.760 0.2446 3.11 0.004505 0.333 0.667 1.000 2948 1.002
## Monterey 0.802 0.0685 11.71 0.001537 0.687 0.807 0.904 1988 1.000
## Napa 0.848 0.0808 10.50 0.001664 0.704 0.852 0.963 2357 1.001
## Nevada 0.870 0.0933 9.32 0.001769 0.714 0.857 1.000 2782 1.001
## Orange 0.774 0.0603 12.84 0.001327 0.672 0.778 0.871 2066 1.000
## Placer 0.815 0.0743 10.97 0.001705 0.682 0.818 0.924 1900 1.003
## Plumas 0.743 0.1524 4.87 0.002987 0.444 0.778 1.000 2602 1.000
## Riverside 0.825 0.0529 15.60 0.001238 0.729 0.831 0.895 1824 0.999
## Sacramento 0.847 0.0471 18.00 0.001229 0.767 0.847 0.924 1466 1.003
## San Benito 0.824 0.1232 6.69 0.002517 0.636 0.818 1.000 2396 1.000
## San Bernardino 0.861 0.0420 20.49 0.001134 0.790 0.862 0.928 1374 1.000
## San Diego 0.849 0.0420 20.22 0.000935 0.778 0.852 0.913 2016 0.999
## San Francisco 0.760 0.0756 10.06 0.001505 0.630 0.760 0.870 2521 1.000
## San Joaquin 0.829 0.0575 14.42 0.001274 0.726 0.839 0.911 2034 1.000
## San Luis Obispo 0.857 0.0681 12.57 0.001400 0.750 0.875 0.950 2368 1.001
## San Mateo 0.800 0.0732 10.92 0.001707 0.671 0.811 0.895 1840 1.003
## Santa Barbara 0.833 0.0640 13.01 0.001366 0.728 0.840 0.926 2196 1.001
## Santa Clara 0.784 0.0687 11.41 0.001917 0.649 0.796 0.875 1283 1.002
## Santa Cruz 0.786 0.0745 10.56 0.001571 0.653 0.796 0.898 2245 1.002
## Shasta 0.817 0.0764 10.69 0.001533 0.675 0.825 0.925 2486 0.999
## Sierra 0.730 0.2254 3.24 0.004407 0.333 0.667 1.000 2615 1.000
## Siskiyou 0.843 0.0992 8.51 0.002000 0.667 0.867 1.000 2458 1.001
## Solano 0.866 0.0621 13.94 0.001465 0.750 0.867 0.950 1799 1.001
## Sonoma 0.845 0.0630 13.41 0.001332 0.741 0.852 0.935 2235 1.002
## Stanislaus 0.812 0.0714 11.38 0.001809 0.692 0.824 0.901 1557 1.000
## Sutter 0.834 0.0907 9.20 0.001703 0.650 0.850 0.950 2835 1.000
## Tehama 0.841 0.0991 8.49 0.001876 0.647 0.824 1.000 2794 1.000
## Trinity 0.756 0.1999 3.78 0.003735 0.500 0.750 1.000 2864 1.000
## Tulare 0.831 0.0631 13.17 0.001325 0.718 0.836 0.918 2270 0.999
## Tuolumne 0.882 0.0920 9.59 0.001827 0.750 0.917 1.000 2535 1.000
## Ventura 0.837 0.0509 16.46 0.001085 0.752 0.839 0.913 2197 1.000
## Yolo 0.849 0.0737 11.52 0.001444 0.714 0.857 0.943 2608 1.002
## Yuba 0.798 0.0996 8.01 0.001933 0.632 0.789 0.947 2652 0.999