Bootstrap replicate weights for sequential Poisson sampling
Source:R/sps_repweights.R
sps_repweights.Rd
Produce bootstrap replicate weights that are appropriate for Poisson sampling, and therefore approximately correct for sequential Poisson sampling.
Arguments
- w
A numeric vector of design (inverse probability) weights for a (sequential) Poisson sample.
- replicates
A positive integer that gives the number of bootstrap replicates (1,000 by default). Non-integers are truncated towards 0.
- tau
A number greater than or equal to 1 that gives the rescale factor for the bootstrap weights. Setting to 1 (the default) does not rescale the weights.
- dist
A function that produces random deviates with mean 0 and standard deviation 1, such as
rnorm()
. The default uses the pseudo-population method from section 4.1 of Beaumont and Patak (2012); see details.
Value
A matrix of bootstrap replicate weights with replicates
columns (one for
each replicate) and length(w)
rows (one for each unit in the sample), with
the value of tau
as an attribute.
Details
Replicate weights are constructed using the generalized bootstrap method by Beaumont and Patak (2012). Their method takes a vector of design weights \(w\), finds a vector of adjustments \(a\) for each bootstrap replicate, and calculates the replicate weights as \(a w\).
There are two ways to calculate the adjustments \(a\). The default
pseudo-population method randomly rounds \(w\) for each replicate to
produce a collection of integer weights \(w'\) that are used to generate a
random vector \(b\) from the binomial distribution. The vector of
adjustments is then \(a = 1 + b - w' / w\). Specifying a
deviates-generating function for dist
uses this function to produce a
random vector \(d\) that is then used to make an adjustment \(a = 1 + d
\sqrt{1 - 1 / w}\).
The adjustments can be rescaled by a value \(\tau \geq 1\) to prevent negative replicate weights. With this rescaling, the adjustment becomes \((a + \tau - 1) / \tau\). If \(\tau > 1\) then the resulting bootstrap variance estimator should be multiplied by \(\tau^2\).
Note
As an alternative to the bootstrap, Ohlsson (1998, equations 2.13) proposes an analytic estimator for the variance of the total \(\hat Y = \sum wy\) (for the take-some units) under sequential Poisson sampling: $$V(\hat Y) = \frac{n}{n - 1} \sum \left(1 - \frac{1}{w}\right) \left(wy - \frac{\hat Y}{n}\right)^2.$$ See Rosén (1997, equation 3.11) for a more general version of this estimator that can be applied to other order sampling schemes. Replacing the left-most correction by \(n / (m - 1)\), where \(m\) is the number of units in the sample, gives a similar estimator for the total under ordinary Poisson sampling, \(\hat Y = n / m \sum wy\).
References
Beaumont, J.-F. and Patak, Z. (2012). On the Generalized Bootstrap for Sample Surveys with Special Attention to Poisson Sampling. International Statistical Review, 80(1): 127-148.
Ohlsson, E. (1998). Sequential Poisson Sampling. Journal of Official Statistics, 14(2): 149-162.
Rosén, B. (1997). On sampling with probability proportional to size. Journal of Statistical Planning and Inference, 62(2): 159-191.
See also
sps()
for drawing a sequential Poisson sample.
bootstrapFP()
(with method = "wGeneralised"
) in the bootstrapFP
package for calculating the variance of Horvitz-Thompson estimators using
the generalized bootstrap.
Examples
# Make a population with units of different size
x <- c(1:10, 100)
# Draw a sequential Poisson sample
(samp <- sps(x, 5))
#> [1] 4 6 8 9 11
# Make some bootstrap replicates
dist <- list(
pseudo_population = NULL,
standard_normal = rnorm,
exponential = \(x) rexp(x) - 1,
uniform = \(x) runif(x, -sqrt(3), sqrt(3))
)
lapply(dist, sps_repweights, w = weights(samp), replicates = 5, tau = 2)
#> $pseudo_population
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 4.8750000 5.375000 1.937500 5.375000 3.656250
#> [2,] 2.4375000 1.291667 1.291667 3.583333 1.291667
#> [3,] 2.0781250 1.578125 2.437500 1.578125 2.078125
#> [4,] 0.5277778 1.291667 1.291667 1.791667 1.027778
#> [5,] 1.0000000 1.000000 1.000000 1.000000 1.000000
#> attr(,"tau")
#> [1] 2
#>
#> $standard_normal
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1.308878 4.195764 6.222635 5.04479605 4.051434
#> [2,] 2.536105 2.813617 3.408599 0.04442534 3.206192
#> [3,] 2.461942 1.657655 2.134877 1.63222601 2.301553
#> [4,] 1.634049 1.605084 1.777511 1.72258501 1.510670
#> [5,] 1.000000 1.000000 1.000000 1.00000000 1.000000
#> attr(,"tau")
#> [1] 2
#>
#> $exponential
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 2.313069 6.386048 2.740352 5.424610 2.707660
#> [2,] 1.491443 1.577468 2.009637 1.699790 2.856261
#> [3,] 1.310694 1.633221 1.455132 1.276392 2.967271
#> [4,] 1.088787 1.170839 1.690410 1.219775 1.183783
#> [5,] 1.000000 1.000000 1.000000 1.000000 1.000000
#> attr(,"tau")
#> [1] 2
#>
#> $uniform
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 2.970496 4.307107 4.721201 4.575343 1.1823646
#> [2,] 3.359473 1.908055 2.419320 2.701965 2.5187027
#> [3,] 1.752788 2.523467 1.216850 1.519328 1.1756872
#> [4,] 2.273226 1.804588 1.541610 2.242415 0.9458692
#> [5,] 1.000000 1.000000 1.000000 1.000000 1.0000000
#> attr(,"tau")
#> [1] 2
#>