Bootstrap replicate weights for sequential Poisson sampling

Produce bootstrap replicate weights that are appropriate for Poisson sampling, and therefore approximately correct for sequential Poisson sampling.

Usage

sps_repweights(w, replicates = 1000L, tau = min_tau(1e-04), dist = NULL)

min_tau(tol)

Arguments

w: A numeric vector of design (inverse probability) weights for a (sequential) Poisson sample.
replicates: A positive integer that gives the number of bootstrap replicates (1,000 by default). Non-integers are truncated towards 0.
tau: A number greater than or equal to 1 that gives the rescale factor for the bootstrap weights. Setting tau to 1 does not rescale the weights. This can also be a function that takes a vector of bootstrap adjustments and returns a number greater than or equal to 1. The default automatically picks the smallest feasible rescale factor (up to a small tolerance).
dist: A function that produces random deviates with mean 0 and standard deviation 1, such as rnorm(). The default uses the pseudo-population method from section 4.1 of Beaumont and Patak (2012); see Details.
tol: A non-negative number, strictly less than 1, that gives the tolerance for determining the minimum feasible value of tau.

Value

sps_repweights() returns a matrix of bootstrap replicate weights with replicates columns (one for each replicate) and length(w) rows (one for each unit in the sample), with the value of tau as an attribute.

min_tau() returns a function that takes a vector of bootstrap adjustments and returns the smallest value for $\tau$ such that the rescaled adjustments are greater than or equal to tol.

Details

Replicate weights are constructed using the generalized bootstrap method of Beaumont and Patak (2012). Their method takes a vector of design weights $w$, finds a vector of adjustments $a$ for each bootstrap replicate, and calculates the replicate weights as the elementwise product $a w$.

There are two ways to calculate the adjustments $a$. The default pseudo-population method randomly rounds $w$ for each replicate to produce a collection of integer weights $w'$. These are used to generate a random vector $b$ from the binomial distribution with $w'$ trials and success probability $1 / w$, and the vector of adjustments is then $a = 1 + b - w' / w$. If dist is a function, it is instead used to generate a random vector of deviates $d$, and the adjustment is $a = 1 + d \sqrt{1 - 1 / w}$.
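The two constructions can be sketched in base R as follows. This is an illustration only, not the package's internal code: the helper names adjust_pseudo_population() and adjust_deviates() are hypothetical, and the sketch assumes that random rounding rounds $w$ up with probability equal to its fractional part and that $b$ is binomial with $w'$ trials and success probability $1 / w$, which gives adjustments with mean 1.

```r
# Hypothetical helpers for illustration; not part of the package.

# Pseudo-population adjustments for one replicate
adjust_pseudo_population <- function(w) {
  w_floor <- floor(w)
  # Assumed rounding rule: round up with probability equal to the
  # fractional part of w, so the integer weight equals w on average
  w_int <- w_floor + (runif(length(w)) < w - w_floor)
  # Assumed binomial parameters: w_int trials, success probability 1 / w
  b <- rbinom(length(w), size = w_int, prob = 1 / w)
  1 + b - w_int / w
}

# Adjustments from a deviates-generating function for one replicate
adjust_deviates <- function(w, dist = rnorm) {
  d <- dist(length(w))
  1 + d * sqrt(1 - 1 / w)
}

set.seed(123)
w <- c(2.75, 1.375, 1)
a <- adjust_pseudo_population(w)
a * w  # one column of replicate weights, before any rescaling
```

Under both constructions a unit with $w = 1$ always gets an adjustment of exactly 1, so its replicate weight never varies.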

The adjustments can be rescaled by a value $\tau \geq 1$ to prevent negative replicate weights. With this rescaling, the adjustment becomes $(a + \tau - 1) / \tau$. If $\tau > 1$ then the resulting bootstrap variance estimator should be multiplied by $\tau^2$.
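The rescaling can be sketched in a few lines of R. The helper names rescale_adjustments() and smallest_tau() are hypothetical, and whether min_tau() uses exactly this formula is an assumption; the formula simply comes from solving $(a + \tau - 1) / \tau \geq tol$ for $\tau$, which gives $\tau \geq (1 - \min a) / (1 - tol)$. The sketch also shows why the variance needs the $\tau^2$ correction: the rescaled adjustment minus 1 equals $(a - 1) / \tau$, so deviations from the full-sample estimate shrink by a factor of $\tau$.

```r
# Hypothetical helpers for illustration; not part of the package.

# Rescale adjustments by tau >= 1
rescale_adjustments <- function(a, tau) {
  stopifnot(tau >= 1)
  (a + tau - 1) / tau
}

# Smallest tau >= 1 such that every rescaled adjustment is at least tol
# (assumes 0 <= tol < 1)
smallest_tau <- function(a, tol = 1e-4) {
  max((1 - min(a)) / (1 - tol), 1)
}

a <- c(-0.5, 0.2, 1, 1.8)
tau <- smallest_tau(a)
a_rescaled <- rescale_adjustments(a, tau)

# Deviations from 1 shrink by exactly 1 / tau
all.equal((a_rescaled - 1) * tau, a - 1)
```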

Note

As an alternative to the bootstrap, Ohlsson (1998, equation 2.13) proposes an analytic estimator for the variance of the total $\hat Y = \sum wy$ (for the take-some units) under sequential Poisson sampling: $$V(\hat Y) = \frac{n}{n - 1} \sum \left(1 - \frac{1}{w}\right) \left(wy - \frac{\hat Y}{n}\right)^2.$$ See Rosén (1997, equation 3.11) for a more general version of this estimator that can be applied to other order sampling schemes. Replacing the left-most correction by $n / (m - 1)$, where $m$ is the number of units in the sample, gives a similar estimator for the total under ordinary Poisson sampling, $\hat Y = n / m \sum wy$.
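The analytic estimator for sequential Poisson sampling is simple to compute directly. The following is a minimal sketch, assuming that w and y hold the design weights and observed values for the take-some units only and that $n$ is the number of those units; the function name ohlsson_var() and the data are made up for illustration.

```r
# Hypothetical helper for illustration; not part of the package.
# w: design weights for the take-some units in the sample
# y: observed values for the same units
ohlsson_var <- function(w, y) {
  n <- length(w)
  total <- sum(w * y)
  n / (n - 1) * sum((1 - 1 / w) * (w * y - total / n)^2)
}

# Made-up sizes and weights inversely proportional to size
x <- c(5, 6, 9, 10)
w <- 55 / (4 * x)

# When y is proportional to size, w * y is constant and the
# estimated variance is zero (up to floating point error)
ohlsson_var(w, x)

# Otherwise the estimated variance is positive
y <- c(4, 7, 8, 12)
ohlsson_var(w, y)
```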

References

Beaumont, J.-F. and Patak, Z. (2012). On the Generalized Bootstrap for Sample Surveys with Special Attention to Poisson Sampling. International Statistical Review, 80(1): 127-148.

Ohlsson, E. (1998). Sequential Poisson Sampling. Journal of Official Statistics, 14(2): 149-162.

Rosén, B. (1997). On sampling with probability proportional to size. Journal of Statistical Planning and Inference, 62(2): 159-191.

Examples

# Make a population with units of different size
x <- c(1:10, 100)

# Draw a sequential Poisson sample
(samp <- sps(x, 5))
#> [1]  5  6  9 10 11

# Make some bootstrap replicates
dist <- list(
  pseudo_population = NULL,
  standard_normal = rnorm,
  exponential = \(x) rexp(x) - 1,
  uniform = \(x) runif(x, -sqrt(3), sqrt(3))
)

lapply(dist, sps_repweights, w = weights(samp), replicates = 5, tau = 2)
#> $pseudo_population
#>          [,1]     [,2]     [,3]      [,4]     [,5]
#> [1,] 1.250000 2.625000 1.250000 3.1250000 2.625000
#> [2,] 2.437500 2.437500 3.583333 1.2916667 1.291667
#> [3,] 1.027778 1.291667 1.791667 0.5277778 1.027778
#> [4,] 0.875000 1.562500 0.875000 1.5625000 1.062500
#> [5,] 1.000000 1.000000 1.000000 1.0000000 1.000000
#> attr(,"tau")
#> [1] 2
#> 
#> $standard_normal
#>          [,1]     [,2]       [,3]     [,4]     [,5]
#> [1,] 3.324661 4.860756 3.96811302 3.215279 4.584936
#> [2,] 2.813617 3.408599 0.04442534 3.206192 1.986830
#> [3,] 1.478419 1.863969 1.45787460 1.998627 1.952668
#> [4,] 1.436819 1.574704 1.53078169 1.361320 1.847787
#> [5,] 1.000000 1.000000 1.00000000 1.000000 1.000000
#> attr(,"tau")
#> [1] 2
#> 
#> $exponential
#>          [,1]     [,2]     [,3]     [,4]     [,5]
#> [1,] 2.849095 1.839346 2.390392 1.995315 3.469898
#> [2,] 1.450560 2.159272 1.883601 1.606920 4.224309
#> [3,] 1.774187 1.170839 1.690410 1.219775 1.183783
#> [4,] 1.068304 1.080475 1.551248 1.378740 1.086677
#> [5,] 1.000000 1.000000 1.000000 1.000000 1.000000
#> attr(,"tau")
#> [1] 2
#> 
#> $uniform
#>           [,1]     [,2]     [,3]     [,4]      [,5]
#> [1,] 4.1115291 2.260868 2.912767 3.273160 3.0394871
#> [2,] 2.3443560 3.537325 1.514754 1.982972 1.4510350
#> [3,] 2.2732264 1.804588 1.541610 2.242415 0.9458692
#> [4,] 0.7742958 1.580406 1.272099 1.124584 1.9199455
#> [5,] 1.0000000 1.000000 1.000000 1.000000 1.0000000
#> attr(,"tau")
#> [1] 2
#>
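
A sketch of how replicate weights like those above might be used to estimate the variance of a total. To stay self-contained it builds the replicate weights in base R with standard normal deviates rather than calling sps_repweights(); the weights and values are made up, and the estimator (the mean squared deviation of the replicate totals from the full-sample total, multiplied by $\tau^2$) is an assumed reading of the generalized bootstrap. A matrix returned by sps_repweights() could be used in place of repw, with tau taken from attr(repw, "tau").

```r
set.seed(42)

# Made-up design weights and observed values; the last unit is take-all
w <- c(2.75, 2.29, 1.53, 1.375, 1)
y <- c(4, 7, 8, 12, 90)
replicates <- 1000
tau <- 2

# Adjustments a = 1 + d * sqrt(1 - 1 / w), one column per replicate
# (w recycles down the columns of the matrix)
a <- matrix(
  1 + rnorm(length(w) * replicates) * sqrt(1 - 1 / w),
  nrow = length(w)
)

# Rescaled replicate weights
repw <- w * (a + tau - 1) / tau

# Replicate totals and the full-sample total
totals <- colSums(repw * y)
total <- sum(w * y)

# Bootstrap variance estimate, corrected for the rescaling
v_boot <- tau^2 * mean((totals - total)^2)

# For comparison, the usual variance estimator for a total under
# Poisson sampling, which v_boot should approximate
v_poisson <- sum(w * (w - 1) * y^2)
c(bootstrap = v_boot, poisson = v_poisson)
```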