
Bootstrap replicate weights for sequential Poisson sampling
Source: R/sps_repweights.R
Produce bootstrap replicate weights that are appropriate for Poisson sampling, and therefore approximately correct for sequential Poisson sampling.
Arguments
- w
A numeric vector of design (inverse probability) weights for a (sequential) Poisson sample.
- replicates
A positive integer that gives the number of bootstrap replicates (1,000 by default). Non-integers are truncated towards 0.
- tau
A number greater than or equal to 1 that gives the rescale factor for the bootstrap weights. Setting to 1 does not rescale the weights. This can also be a function that takes a vector of bootstrap adjustments and returns a number larger than 1. The default automatically picks the smallest feasible rescale factor (up to a small tolerance).
- dist
A function that produces random deviates with mean 0 and standard deviation 1, such as rnorm(). The default uses the pseudo-population method from section 4.1 of Beaumont and Patak (2012); see details.
- tol
A non-negative number, strictly less than 1, that gives the tolerance for determining the minimum feasible value of tau.
Value
sps_repweights() returns a matrix of bootstrap replicate weights with replicates columns (one for each replicate) and length(w) rows (one for each unit in the sample), with the value of tau as an attribute.
min_tau() returns a function that takes a vector of bootstrap adjustments and returns the smallest value for \(\tau\) such that the rescaled adjustments are greater than or equal to tol.
Details
Replicate weights are constructed using the generalized bootstrap method by Beaumont and Patak (2012). Their method takes a vector of design weights \(w\), finds a vector of adjustments \(a\) for each bootstrap replicate, and calculates the replicate weights as \(a w\).
There are two ways to calculate the adjustments \(a\). The default pseudo-population method randomly rounds \(w\) for each replicate to produce a collection of integer weights \(w'\) that are used to generate a random vector \(b\) from the binomial distribution. The vector of adjustments is then \(a = 1 + b - w' / w\). Specifying a deviates-generating function for dist uses this function to produce a random vector \(d\) that is then used to make an adjustment \(a = 1 + d \sqrt{1 - 1 / w}\).
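The two ways of building the adjustments can be sketched in base R for a single replicate. This is a minimal illustration rather than the package's implementation: the function names are hypothetical, and the binomial parameters (size \(w'\), probability \(1 / w\)) are an assumption chosen so that \(b\) has mean \(w' / w\) and the adjustment has mean 1.

```r
# Hypothetical helpers; not part of the sps package.

# Pseudo-population method: randomly round w to integer weights w_int,
# draw b ~ Binomial(w_int, 1 / w) (assumed parameters), and return
# a = 1 + b - w_int / w.
pseudo_population_adjust <- function(w) {
  n <- length(w)
  w_floor <- floor(w)
  # Round up with probability equal to the fractional part of w
  w_int <- w_floor + (runif(n) < w - w_floor)
  b <- rbinom(n, size = w_int, prob = 1 / w)
  1 + b - w_int / w
}

# Deviates method: a = 1 + d * sqrt(1 - 1 / w), where d comes from a
# generator with mean 0 and standard deviation 1.
deviates_adjust <- function(w, dist = rnorm) {
  d <- dist(length(w))
  1 + d * sqrt(1 - 1 / w)
}

w <- c(5.5, 2.75, 1.6, 1)  # the last unit has weight 1 (take-all)
a_pp <- pseudo_population_adjust(w)
a_dev <- deviates_adjust(w)

# Replicate weights are the design weights times the adjustments
rep_w_pp <- a_pp * w
rep_w_dev <- a_dev * w
```

In both cases a unit with design weight 1 gets an adjustment of exactly 1, so its replicate weight never changes.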
The adjustments can be rescaled by a value \(\tau \geq 1\) to prevent negative replicate weights. With this rescaling, the adjustment becomes \((a + \tau - 1) / \tau\). If \(\tau > 1\) then the resulting bootstrap variance estimator should be multiplied by \(\tau^2\).
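As a rough sketch of the rescaling step, the requirement \((a + \tau - 1) / \tau \geq tol\) can be solved for \(\tau\), giving \(\tau \geq (1 - a) / (1 - tol)\) for each adjustment. The helpers below are hypothetical names used only for illustration; they follow the documented behaviour of tau and min_tau() but are not the package's code.

```r
# Hypothetical helpers; not part of the sps package.

# Rescale adjustments a by tau >= 1
rescale_adjust <- function(a, tau) {
  (a + tau - 1) / tau
}

# Smallest tau >= 1 such that every rescaled adjustment is at least tol.
# Requires 0 <= tol < 1.
smallest_tau <- function(a, tol = 0) {
  max(1, (1 - a) / (1 - tol))
}

a <- c(1.8, -0.5, 0.2, 1)
tau <- smallest_tau(a)
a_rescaled <- rescale_adjust(a, tau)
```

Here the most negative adjustment, -0.5, determines the rescale factor, and after rescaling it sits exactly at the tolerance of 0. Adjustments that are already at least tol need no rescaling, so the factor stays at 1.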
Note
As an alternative to the bootstrap, Ohlsson (1998, equation 2.13) proposes an analytic estimator for the variance of the total \(\hat Y = \sum wy\) (for the take-some units) under sequential Poisson sampling: $$V(\hat Y) = \frac{n}{n - 1} \sum \left(1 - \frac{1}{w}\right) \left(wy - \frac{\hat Y}{n}\right)^2.$$ See Rosén (1997, equation 3.11) for a more general version of this estimator that can be applied to other order sampling schemes. Replacing the left-most correction by \(n / (m - 1)\), where \(m\) is the number of units in the sample, gives a similar estimator for the total under ordinary Poisson sampling, \(\hat Y = n / m \sum wy\).
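The analytic estimator for sequential Poisson sampling translates directly into a few lines of base R. The function name ohlsson_var() is hypothetical, and the sketch assumes that w and y hold only the take-some units and that there are at least two of them.

```r
# Hypothetical helper; not part of the sps package.
# w: design weights for the take-some units
# y: variable of interest for the same units
ohlsson_var <- function(w, y) {
  n <- length(w)
  wy <- w * y
  y_hat <- sum(wy)  # estimated total for the take-some units
  n / (n - 1) * sum((1 - 1 / w) * (wy - y_hat / n)^2)
}

w <- c(2, 4, 4)
y <- c(1, 2, 3)
v <- ohlsson_var(w, y)
```

For these values the weighted observations are 2, 8, and 12, the estimated total is 22, and the variance estimate works out to 139 / 3.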
References
Beaumont, J.-F. and Patak, Z. (2012). On the Generalized Bootstrap for Sample Surveys with Special Attention to Poisson Sampling. International Statistical Review, 80(1): 127-148.
Ohlsson, E. (1998). Sequential Poisson Sampling. Journal of Official Statistics, 14(2): 149-162.
Rosén, B. (1997). On sampling with probability proportional to size. Journal of Statistical Planning and Inference, 62(2): 159-191.
See also
sps() for drawing a sequential Poisson sample.
bootstrapFP() (with method = "wGeneralised") in the bootstrapFP package for calculating the variance of Horvitz-Thompson estimators using the generalized bootstrap and make_gen_boot_factors() in the svrep package.
Examples
# Load the package that provides sps() and sps_repweights()
library(sps)
# Make a population with units of different size
x <- c(1:10, 100)
# Draw a sequential Poisson sample
(samp <- sps(x, 5))
#> [1] 4 6 8 9 11
# Make some bootstrap replicates
dist <- list(
pseudo_population = NULL,
standard_normal = rnorm,
exponential = \(x) rexp(x) - 1,
uniform = \(x) runif(x, -sqrt(3), sqrt(3))
)
lapply(dist, sps_repweights, w = weights(samp), replicates = 5, tau = 2)
#> $pseudo_population
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 4.8750000 5.375000 1.937500 5.375000 3.656250
#> [2,] 2.4375000 1.291667 1.291667 3.583333 1.291667
#> [3,] 2.0781250 1.578125 2.437500 1.578125 2.078125
#> [4,] 0.5277778 1.291667 1.291667 1.791667 1.027778
#> [5,] 1.0000000 1.000000 1.000000 1.000000 1.000000
#> attr(,"tau")
#> [1] 2
#>
#> $standard_normal
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1.308878 4.195764 6.222635 5.04479605 4.051434
#> [2,] 2.536105 2.813617 3.408599 0.04442534 3.206192
#> [3,] 2.461942 1.657655 2.134877 1.63222601 2.301553
#> [4,] 1.634049 1.605084 1.777511 1.72258501 1.510670
#> [5,] 1.000000 1.000000 1.000000 1.00000000 1.000000
#> attr(,"tau")
#> [1] 2
#>
#> $exponential
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 2.313069 6.386048 2.740352 5.424610 2.707660
#> [2,] 1.491443 1.577468 2.009637 1.699790 2.856261
#> [3,] 1.310694 1.633221 1.455132 1.276392 2.967271
#> [4,] 1.088787 1.170839 1.690410 1.219775 1.183783
#> [5,] 1.000000 1.000000 1.000000 1.000000 1.000000
#> attr(,"tau")
#> [1] 2
#>
#> $uniform
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 2.970496 4.307107 4.721201 4.575343 1.1823646
#> [2,] 3.359473 1.908055 2.419320 2.701965 2.5187027
#> [3,] 1.752788 2.523467 1.216850 1.519328 1.1756872
#> [4,] 2.273226 1.804588 1.541610 2.242415 0.9458692
#> [5,] 1.000000 1.000000 1.000000 1.000000 1.0000000
#> attr(,"tau")
#> [1] 2
#>