Calculate a weighted generalized mean.
Usage
generalized_mean(r)
arithmetic_mean(x, w = NULL, na.rm = FALSE)
geometric_mean(x, w = NULL, na.rm = FALSE)
harmonic_mean(x, w = NULL, na.rm = FALSE)
Arguments
- r
A finite number giving the order of the generalized mean.
- x
A strictly positive numeric vector.
- w
A strictly positive numeric vector of weights, the same length as x. The default is to equally weight each element of x.
- na.rm
Should missing values in x and w be removed? By default missing values in x or w return a missing value.
Value
generalized_mean() returns a function:
function(x, w = NULL, na.rm = FALSE){...}
This computes the generalized mean of order r of x with weights w.
arithmetic_mean(), geometric_mean(), and harmonic_mean() each return a numeric value for the generalized means of order 1, 0, and -1.
Details
The function generalized_mean() returns a function to compute the
generalized mean of x with weights w and exponent r
(i.e., \(\prod_{i = 1}^{n} x_{i}^{w_{i}}\) when \(r = 0\) and
\(\left(\sum_{i = 1}^{n} w_{i} x_{i}^{r}\right)^{1 / r}\)
otherwise). This is also called the power mean, Hölder mean, or \(l_p\)
mean. See Bullen (2003, p. 175) for a definition, or
https://en.wikipedia.org/wiki/Generalized_mean. The generalized mean
is the solution to the optimal prediction problem: choose \(m\) to
minimize \(\sum_{i = 1}^{n} w_{i} \left[\log(x_{i}) - \log(m)
\right]^2\) when \(r = 0\), \(\sum_{i =
1}^{n} w_{i} \left[x_{i}^r - m^r \right]^2\) otherwise.
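The definition above can be sketched in base R as a function that returns a function. This is a minimal illustration under stated assumptions, not the package's implementation: the name gen_mean_sketch() is hypothetical, and the weights are assumed to be scaled to sum to 1 before the formula is applied. The last lines check the prediction property numerically for r = 2 with optimize().

```r
# Hypothetical base-R sketch of the definition; not the package's own code.
gen_mean_sketch <- function(r) {
  stopifnot(is.numeric(r), length(r) == 1L, is.finite(r))
  function(x, w = NULL, na.rm = FALSE) {
    if (is.null(w)) w <- rep(1, length(x))  # equal weights by default
    stopifnot(length(x) == length(w))
    if (na.rm) {
      keep <- !(is.na(x) | is.na(w))  # drop a pair if either value is missing
      x <- x[keep]
      w <- w[keep]
    }
    w <- w / sum(w)  # scale the weights to sum to 1
    if (r == 0) {
      exp(sum(w * log(x)))  # equals prod(x^w)
    } else {
      sum(w * x^r)^(1 / r)
    }
  }
}

x <- 1:3
w <- c(0.25, 0.25, 0.5)
gen_mean_sketch(1)(x, w)   # arithmetic mean
gen_mean_sketch(0)(x, w)   # geometric mean
gen_mean_sketch(-1)(x, w)  # harmonic mean

# Prediction property for r = 2: the m that minimizes the weighted squared
# error in x^r should agree with the generalized mean of order 2, up to the
# tolerance of the optimizer
loss <- function(m) sum(w * (x^2 - m^2)^2)
opt <- optimize(loss, interval = range(x))
c(opt$minimum, gen_mean_sketch(2)(x, w))
```

Returning a function keeps the order r fixed once, so the same mean can be applied to many vectors without repeating r.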
The functions arithmetic_mean(), geometric_mean(), and harmonic_mean()
compute the arithmetic, geometric, and harmonic (or subcontrary) means,
also known as the Pythagorean means. These are the most useful means for
making price indexes, and correspond to setting r = 1, r = 0, and r = -1
in generalized_mean().
Both x and w should be strictly positive (and finite),
especially for the purpose of making a price index. This is not enforced,
but the results may not make sense if the generalized mean is not defined.
There are two exceptions to this.
- The convention in Hardy et al. (1952, p. 13) is used in cases where x has zeros: the generalized mean is 0 whenever w is strictly positive and r < 0. (The analogous convention holds whenever at least one element of x is Inf: the generalized mean is Inf whenever w is strictly positive and r > 0.)
- Some authors let w be non-negative and sum to 1 (e.g., Sydsaeter et al., 2005, p. 47). If w has zeros, then the corresponding element of x has no impact on the mean whenever x is strictly positive. Unlike weighted.mean(), however, zeros in w are not strong zeros, so infinite values in x will propagate even if the corresponding elements of w are zero.
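The two exceptions above can be illustrated with plain floating-point arithmetic in base R. This is only a sketch: pow_mean() is a hypothetical helper for r != 0, not a package function, and the exact value the package returns for an infinite x with a zero weight is not stated here, so only the behavior of the plain formula and of weighted.mean() is shown.

```r
# Hypothetical helper: generalized mean of order r != 0 with scaled weights
pow_mean <- function(x, w, r) sum(w / sum(w) * x^r)^(1 / r)

w <- c(0.5, 0.5)

# A zero in x with r < 0: 0^(-1) is Inf in R, and Inf^(-1) is 0
pow_mean(c(0, 2), w, r = -1)

# An Inf in x with r > 0: the infinite value carries through to the mean
pow_mean(c(Inf, 2), w, r = 1)

# weighted.mean() treats zero weights as strong zeros and omits those
# elements of x from the sum
weighted.mean(c(1, Inf), c(1, 0))

# The plain formula does not: 0 * Inf is NaN, so the infinite value is
# not silently dropped
pow_mean(c(1, Inf), c(1, 0), r = 1)
```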
The weights are scaled to sum to 1 to satisfy the definition of a
generalized mean. There are certain price indexes where the weights should
not be scaled (e.g., the Vartia-I index); use sum() for these cases.
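A short base-R sketch of the difference between scaled and unscaled weights follows. The price relatives and weights are made-up values for illustration, and the unscaled calculation is just a weighted sum of logs computed with sum(); it does not claim to reproduce any particular index.

```r
x <- c(1.1, 0.9, 1.2)  # hypothetical price relatives
w <- c(2, 1, 1)        # hypothetical weights that do not sum to 1

# Scaled weights: multiplying w by a positive constant leaves the mean unchanged
geo_scaled <- exp(sum(w / sum(w) * log(x)))
geo_rescaled <- exp(sum(10 * w / sum(10 * w) * log(x)))
all.equal(geo_scaled, geo_rescaled)

# Unscaled weights: call sum() directly so the weights are used as given
v <- c(0.3, 0.2, 0.4)  # hypothetical weights that sum to 0.9
geo_unscaled <- exp(sum(v * log(x)))

# The scaled version of the same weights gives a different value
geo_v_scaled <- exp(sum(v / sum(v) * log(x)))
c(geo_unscaled, geo_v_scaled)
```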
The underlying calculation returned by generalized_mean() is mostly
identical to weighted.mean(), with one important exception: missing values
in the weights are not treated differently than missing values in x.
Setting na.rm = TRUE drops missing values in both x and w, not just x. This
ensures that certain useful identities are satisfied with missing values in
x. In most cases arithmetic_mean() is a drop-in replacement for
weighted.mean().
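The difference in missing-value handling can be seen with base R alone. The sketch below uses weighted.mean() for the stock behavior and a hypothetical keep index to mimic dropping an observation when either its value or its weight is missing; it is an illustration of the rule described above, not the package's code.

```r
x <- c(1, 2, 3)
w <- c(0.25, NA, 0.5)

# weighted.mean() only removes missing values in x, so the NA in w
# still produces a missing result
weighted.mean(x, w, na.rm = TRUE)

# Hypothetical sketch of the alternative rule: drop the observation when
# either x or w is missing, then average what is left
keep <- !(is.na(x) | is.na(w))
weighted.mean(x[keep], w[keep])
```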
Note
generalized_mean() can be defined on the extended real line, so that
r = -Inf returns min() and r = Inf returns max(), to agree with the
definition in, e.g., Bullen (2003). This is not implemented, and r
must be finite.
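The limiting behavior can be approximated numerically with a large finite order. The sketch below uses a hypothetical unweighted helper, pow_mean(), in base R; the order 200 is an arbitrary choice that stays within double-precision range for this small example.

```r
# Hypothetical helper: unweighted generalized mean of order r != 0
pow_mean <- function(x, r) mean(x^r)^(1 / r)

x <- c(1, 2, 3)

# A large positive order gives a value close to max(x)
pow_mean(x, 200)

# A large negative order gives a value close to min(x)
pow_mean(x, -200)
```

For much larger orders or values of x, x^r can overflow to Inf or underflow to 0, so this approximation is only a rough check on the limit.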
There are a number of existing functions for calculating unweighted
geometric and harmonic means, namely the geometric.mean() and
harmonic.mean() functions in the psych package, the geomean() function in
the FSA package, the GMean() and HMean() functions in the DescTools
package, and the geoMean() function in the EnvStats package. Similarly, the
ci_generalized_mean() function in the Compind package calculates an
unweighted generalized mean.
References
Bullen, P. S. (2003). Handbook of Means and Their Inequalities. Springer Science+Business Media.
Fisher, I. (1922). The Making of Index Numbers. Houghton Mifflin Company.
Hardy, G., Littlewood, J. E., and Polya, G. (1952). Inequalities (2nd edition). Cambridge University Press.
IMF, ILO, Eurostat, UNECE, OECD, and World Bank. (2020). Consumer Price Index Manual: Concepts and Methods. International Monetary Fund.
Lord, N. (2002). Does Smaller Spread Always Mean Larger Product? The Mathematical Gazette, 86(506): 273-274.
Sydsaeter, K., Strom, A., and Berck, P. (2005). Economists' Mathematical Manual (4th edition). Springer.
See also
transmute_weights() transforms the weights to turn a generalized
mean of order \(r\) into a generalized mean of order \(s\).
factor_weights() calculates the weights to factor a mean of
products into a product of means.
price_indexes and quantity_index() for simple wrappers that use
generalized_mean() to calculate common indexes.
back_period()/base_period() for a simple utility function to turn prices
in a table into price relatives.
Other means: extended_mean(), lehmer_mean(), nested_mean()
Examples
x <- 1:3
w <- c(0.25, 0.25, 0.5)
#---- Common generalized means ----
# Arithmetic mean
arithmetic_mean(x, w) # same as weighted.mean(x, w)
#> [1] 2.25
# Geometric mean
geometric_mean(x, w) # same as prod(x^w)
#> [1] 2.059767
# Harmonic mean
harmonic_mean(x, w) # same as 1 / weighted.mean(1 / x, w)
#> [1] 1.846154
# Quadratic mean / root mean square
generalized_mean(2)(x, w)
#> [1] 2.397916
# Cubic mean
# Notice that this is larger than the other means so far because
# the generalized mean is increasing in r
generalized_mean(3)(x, w)
#> [1] 2.506649
#---- Comparing the Pythagorean means ----
# The dispersion between the arithmetic, geometric, and harmonic
# mean usually increases as the variance of 'x' increases
x <- c(1, 3, 5)
y <- c(2, 3, 4)
var(x) > var(y)
#> [1] TRUE
arithmetic_mean(x) - geometric_mean(x)
#> [1] 0.5337879
arithmetic_mean(y) - geometric_mean(y)
#> [1] 0.1155009
geometric_mean(x) - harmonic_mean(x)
#> [1] 0.5096903
geometric_mean(y) - harmonic_mean(y)
#> [1] 0.1152684
# But the dispersion between these means is only bounded by the
# variance (Bullen, 2003, p. 156)
arithmetic_mean(x) - geometric_mean(x) >= 2 / 3 * var(x) / (2 * max(x))
#> [1] TRUE
arithmetic_mean(x) - geometric_mean(x) <= 2 / 3 * var(x) / (2 * min(x))
#> [1] TRUE
# Example by Lord (2002) where the dispersion decreases as the variance
# increases, counter to the claims by Fisher (1922, p. 108) and the
# CPI manual (par. 1.14)
x <- (5 + c(sqrt(5), -sqrt(5), -3)) / 4
y <- (16 + c(7 * sqrt(2), -7 * sqrt(2), 0)) / 16
var(x) > var(y)
#> [1] TRUE
arithmetic_mean(x) - geometric_mean(x)
#> [1] 0.145012
arithmetic_mean(y) - geometric_mean(y)
#> [1] 0.1485894
geometric_mean(x) - harmonic_mean(x)
#> [1] 0.104988
geometric_mean(y) - harmonic_mean(y)
#> [1] 0.1439479
# The "bias" in the arithmetic and harmonic indexes is also smaller in
# this case, counter to the claim by Fisher (1922, p. 108)
arithmetic_mean(x) * arithmetic_mean(1 / x) - 1
#> [1] 0.3333333
arithmetic_mean(y) * arithmetic_mean(1 / y) - 1
#> [1] 0.4135021
harmonic_mean(x) * harmonic_mean(1 / x) - 1
#> [1] -0.25
harmonic_mean(y) * harmonic_mean(1 / y) - 1
#> [1] -0.2925373
#---- Missing values ----
w[2] <- NA
arithmetic_mean(x, w, na.rm = TRUE) # drop the second observation
#> [1] 0.936339
weighted.mean(x, w, na.rm = TRUE) # still returns NA
#> [1] NA