Published in J. Environ. Qual. 33:882-890 (2004).
© ASA, CSSA, SSSA
677 S. Segoe Rd., Madison, WI 53711 USA
TECHNICAL REPORTS
Heavy Metals in the Environment
Uncertainty and Sensitivity Analysis of Spatial Predictions of Heavy Metals in Wheat
D. J. Brus*,a and
M. J. W. Jansenb
a Alterra, Green World Research, P.O. Box 47, 6700 Wageningen, the Netherlands
b Biometris, Bornsesteeg 47, 6708 PD Wageningen, the Netherlands
* Corresponding author (dick.brus{at}wur.nl).
Received for publication March 25, 2003.
 |
ABSTRACT
|
|---|
Heavy metals seriously threaten the health of human beings when they enter the food chain. Therefore, policymakers require precise predictions of heavy metal concentrations in agricultural crops. In this paper we quantify the uncertainty of regression predictions of Cd and Pb in wheat (Triticum aestivum L.) and the contributions to the uncertainties in these predictions associated with inputs to the regression model. For each node of the 500- x 500-m grid covering the arable soils in the Netherlands, a latin hypercube sample size of 1000 is constructed from the uncertainty distributions of the explanatory variables (pH, soil organic matter [SOM], and heavy metal concentration in soil), the regression coefficients, and the random term of the regression model. This sample is used as input for the regression model to obtain 1000 values from the uncertainty distributions of the log(Cd) and log(Pb) concentration in wheat. There were no nodes where the recent EU quality standards for Cd and Pb (0.2 mg kg1 fresh wt.) in wheat were almost certain to be exceeded. For most nodes with clay soils, the quality standard for Cd in wheat almost certainly will not be exceeded; for Pb this is much less certain. The uncertainty in the Cd concentration in soil contributes most to the uncertainty in the predicted Cd concentrations in wheat (36% on the average), followed by the random term of the regression model (23%). For Pb the contribution of the random term is by far the largest (52%).
Abbreviations: SOM, soil organic matter
 |
INTRODUCTION
|
|---|
WORLDWIDE, HEAVY METALS such as Cd, Pb, Cu, and Zn are dispersed into the rural environment through atmospheric deposition and the use of agrochemicals and manure. In some locations, the heavy metal concentrations have been raised to levels that may threaten the health of man and animals feeding on the crops. For the Netherlands, Wiersma et al. (1986) reported high concentrations of Cd and Pb in cereals, sometimes exceeding the proposed maximum allowable concentrations for human consumption by the Netherlands at that time.
In a previous study regression models were derived for the log concentration of Cd in grass, maize, wheat, and sugar beet (Brus et al., 2002b). Explanatory variables (regressors) in these models were the log concentration of Cd in the soil and basic soil properties (pH, SOM content, and clay content) that govern the transfer of the heavy metal in soil to the crop. These regression models were used inversely to derive Cd concentration thresholds in soil from food quality standards for these crops. These critical thresholds were compared with the estimated Cd concentrations in soil, and for many arable soils non-negligible probabilities (e.g., >0.05) that the Cd concentration in soil exceeded the critical threshold concentration in soil were found when wheat was taken as a reference crop. For Pb similar results have been found (Brus et al., 2002a). This raised questions as to what Cd and Pb concentrations should be expected in wheat, the certainty of the predicted concentrations in wheat, and how the uncertainty can be reduced.
The study of the contributions of the uncertainties in the model inputs to the uncertainty in the model output is called sensitivity analysis (e.g., Saltelli et al., 2000). Sensitivity analysis gives insight into the weakest link in the chain, and the results may help to efficiently allocate resources for future research. At the highest level, we may distinguish between uncertainty in the regressors and uncertainty in the model. The uncertainty in the regressors is due to measurement error and interpolation error. In spatial studies, the interpolation error will be important, and the uncertainty due to this error can be reduced by measuring the regressors in areas where there are no such measurements. We separated the regressors into two groups: (i) basic soil properties (pH, SOM, and clay content) and (ii) heavy metal concentration in the soil. The reason for this distinction is that measurements of pH, SOM, and clay content are part of a standard package of soil analysis for agricultural purposes, whereas measurements of soil heavy metal concentrations are not.
The contribution of model uncertainty to the prediction uncertainty can be separated into the uncertainty in the regression coefficients and the variance of the random term. The uncertainty in the regression coefficients can be decreased by collecting more regression data, but the variance of the random term will hardly be affected by this. To reduce the variance of the random term we must look for another type of model, possibly with extra regressors.
The present study has the following objectives:
- Predict the Cd and Pb concentration in wheat grown on the present arable soils in the Netherlands (approximately 45000 ha), quantify the uncertainties in these predictions, and calculate the probability of exceeding the food quality standard for Cd and Pb in wheat.
- Quantify the contributions to the uncertainties in these predictions associated with inputs to the regression model, namely the values for the regressors, the regression coefficients, and the random term of the regression model.
- Account for errors in the measurements of the response variable (Cd and Pb concentrations in wheat) to estimate the variance of the random term.
 |
MATERIALS AND METHODS
|
|---|
Regression Models for Cadmium and Lead in Wheat
The regression model for the Cd and Pb concentration in wheat is based on the Freundlich equation:
 | [1] |
where Qplant and Qsoil are the heavy metal concentrations (mg kg1 dry wt.) in the crop and the soil, respectively, A is the Freundlich constant, and B is a nonlinearity coefficient. We assumed that log(A) is a linear function of pH, log(SOM), and log(clay) (Brus et al., 2002b):
 | [2] |
where pH is the soil pH (pHKCl), SOM is the soil organic matter content (%), and clay is the soil clay content (%). The logarithms in this expression are base 10.
Combining Eq. [1] and [2] and accounting for the effects of error in the measurement of the response variable Qplant and possible inadequacy of the model leads to the following model for the regression data:
 | [3] |
where Mplant denotes a measurement of the Cd or Pb concentration in wheat,
denotes a random term due to errors in these measurements, and
denotes a random term due to model inadequacy (i.e., variation in measurements of the response variable not explained by the model, nor by errors in measurements of the response variable). Model inadequacy becomes apparent when the residual variance is greater than that which can be explained by measurement errors. Such inadequacy is caused by the absence in the model of factors that influence Qplant and by imperfections in the modeling of the effect that the factors present.
Model inadequacy and measurement error enter separately into Eq. [3] because it appears that measurement errors are non-negligible in the analysis of the regression data, but play no role in the prediction of Qplant given the soil properties and the regression coefficients. We do not want to predict the outcome of a measurement of the Cd and Pb concentration in wheat, but the true concentration (i.e., the average of an infinite number of unbiased measurements). Thus, the prediction model for log(Qplant), at some new location with given predictions of pH, SOM, clay, and Qsoil, is given by:
 | [3a] |
The variance of the model inadequacy term
is assumed to have a constant value
2. The variance of measurement error
is assumed to depend on the level of Qplant:
 | [4] |
The appendix discusses the estimation of
and ß from repeated measurements of Cd and Pb concentrations in two control samples with known concentrations. We are aware that the empirical basis for the estimation of
and ß is small, but more data were not available. The resulting estimates (0.0004885 for
and 0.1272 for ß) were used in the regression analysis according to Eq. [3]. The regression was performed iteratively, with changing weights, since the variance of
depends on the level of Qplant. For details see the appendix. The regressors that were included in the model were selected beforehand by Ordinary-Least-Squares fitting (no weighting) of all possible models with log(Qsoil) as a forced regressor. The best model was selected on the basis of Mallows Cp, which appeared to have the largest value for Radj2. This was done with the convenient procedure RSEARCH of GenStat (GenStat Committee, 2000). Two data sets were used for fitting the models, a database consisting of 84 points evenly spread over the arable soils in the Netherlands (Wiersma et al., 1986), and a database of samples collected in the recent floodplains of the Meuse and tributaries (18 measurements) (Projectgroep zware metalen in oevergronden van Maas en zijrivieren, 1987).
In Fig. 1 and 2
the fitted concentrations in wheat have been plotted against the measured concentrations. Table 1 shows the estimated regression coefficients and their standard errors, the percentage of variance accounted for (adjusted R2), and the residual variance due to model inadequacy (
2) of the selected models. Note that the rather low values for adjusted R2 are somewhat misleading due to the measurement errors. Tables 2 and 3 show the correlation matrices of the regression coefficients. The regression results have not been validated by an independent set of data because we do not have such a data set. Moreover, if we would have had an independent set of data representative for the area for which we want predictions, then we would have preferred to use it for fitting the model. This implies that we assumed that the two data sets together are "representative" for the study area for which we need predictions of Cd and Pb in wheat.
Monte Carlo Simulation of Cadmium and Lead in Wheat
To simulate values for Cd and Pb in wheat, we used the uncertainty distributions of pH, SOM content, clay content, and Cd and Pb in soil as estimated by Brus et al. (2002a)(2002b) at the nodes of a 500- x 500-m grid. In this study we used only the uncertainty distributions at the nodes with arable land according to LGN3, the land use database of the Netherlands. In total there were 18609 nodes. The uncertainty distributions were estimated by simple indicator kriging with local prior means (Goovaerts, 1997). For a detailed description of the implementation of this method, refer to Brus et al. (2002b). The Cd and Pb concentrations in the soil are related to the SOM and clay contents in the soil and, therefore, the uncertainty distributions of these concentrations in soil were estimated conditional on the SOM and clay content. Conditional distributions were estimated for four non-overlapping categories defined in terms of their SOM and clay content.
The Cd and Pb concentrations in soil were simulated in two steps. First, SOM and clay values were drawn independently from the uncertainty distributions of SOM and clay. Then for each pair the SOMclay category and its associated conditional distributions of Cd and Pb were determined, and one value was drawn.
Finally, we assumed Gaussian uncertainty distributions for the regression coefficients A0 and A2 and for the random term
. For the coefficient B we assumed a gamma distribution so that all simulated values for B are positive. For A1 we also simulated values from a gamma distribution. The simulated values were multiplied by 1, so that the simulated values for A1 were all negative. The estimated variances of the regression coefficients and the random term are presented in Table 1.
We constructed a sample of 1000 draws from the uncertainty distributions of the quantities on the right-hand side of Eq. [3], namely pH, SOM, clay, Qsoil, A0... A3, B, and
, excluding the measurement error term
, which plays no role in prediction. Instead of an ordinary random sample from the uncertainty distributions of these quantities we constructed a latin hypercube sample (LHS) of size 1000 of the type that closely reproduces the rank-correlations between these quantities (Tables 2 and 3). This way of sampling is often used in uncertainty and sensitivity analysis to obtain efficient estimates of uncertainties and sensitivities (van Meirvenne and Goovaerts, 2001; Viscarra Rossel et al., 2001). In a LHS of size N, each sampled variable assumes precisely one random value in each of the intervals with cumulative probability between (i 1)/N and i/N for i = 1...N. The rank-correlations of the sample are forced as closely as possible to the theoretical rank-correlations of the uncertainty distribution of the quantities in the right-hand side of Eq. [3] (McKay et al., 1979; Iman and Conover, 1982; Saltelli et al., 2000). For the resulting sample the 1000 corresponding log(Qplant) concentrations were calculated with Eq. [3a].
Uncertainty and Sensitivity Analysis
The uncertainty in log(Qplant) has been characterized by the variance of its distribution, which is estimated by the variance of the 1000 simulated values. Moreover, we estimated the sensitivities of log(Qplant) to four groups of inputs, namely (i) the regression coefficients; (ii) the random term
; (iii) the basic soil properties pH, SOM, and clay; and (iv) the heavy metal concentration in the soil. These four groups of inputs are mutually independent, except for dependence between the heavy metal concentration, Qsoil, and the basic soil properties, which is due to the fact that the distribution of Qsoil depends on the values of clay and SOM. We expressed the sensitivities to these four groups of inputs by means of the percentage of variance of log(Qplant) accounted for by these inputs. This measure of sensitivity is also known under several other names: first-order sensitivity index, top marginal variance, and correlation ratio (e.g., Saltelli et al., 2000). The sensitivities were approximated by means of spline regression of the sampled values of log(Qplant) on the sampled values of the inputs: that is to say, log(Qplant) was approximated by a sum of splines of the individual inputs. We used cubic smoothing splines with two effective degrees of freedom (Hastie and Tibshirani, 1990). The quality of this approximation was gauged by adjusted R2, that is, the percentage of variance accounted for by the full spline model, which appeared to be gratifyingly close to 100% in most of our analyses (see next section). The difference between adjusted R2 and 100% is due to interactions that are not taken into account by this type of regression model, and to a lesser degree by the fact that splines with merely two degrees of freedom may have too little flexibility to fit the model response. The sensitivities with respect to a particular group were approximated by the percentage of variance accounted for when only the variables of that group entered the regression. This procedure is a refinement of the more common regression-based sensitivity analysis, where merely linear regression is applied and one estimates sensitivities of individual inputs rather than of grouped inputs (e.g., Saltelli et al., 2000). The sensitivity analysis and the construction of the sample were performed with USAGE 2.0, a collection of GenStat procedures for uncertainty and sensitivity analysis (Goedhart and Thissen, 2002; GenStat Committee, 2000).
 |
RESULTS AND DISCUSSION
|
|---|
Predicted Cadmium and Lead in Wheat
Figures 3A, 3B, and 3C
show the spatial pattern of the 5th percentile, 50th percentile (median), and 95th percentile, respectively, of the 1000 simulated Cd concentrations in wheat per node. Figures 4A, 4B, and 4C show the same for the Pb concentrations in wheat. The 50th percentile can be taken as the predicted concentration. The 95th percentile is a very pessimistic (conservative) prediction, and it is very likely that the true concentration is lower. In areas where the 95th percentile is below the food quality standard we are reasonably sure that the food quality standard in wheat is not exceeded. In contrast, the 5th percentile can be seen as a very optimistic prediction, and if the 5th percentile is above the food quality standard, we are reasonably sure that the food quality standard will be exceeded. The food quality standard for Cd and Pb in wheat is 0.2 mg kg1 fresh weight (Commission of the European Communities, 2001). We transformed these quality standards to dry weight by dividing them by 0.85, resulting in 0.235 mg kg1.
There are no nodes where the 5th percentile of the simulated Cd concentrations exceeds the quality standard, so there are no nodes where the quality standard of Cd in wheat was almost certain to be exceeded. However, for 72% of the nodes the 95th percentile of the Cd concentration is below the quality standard. These nodes are depicted in Fig. 3D. For these nodes we are reasonably sure that the food quality standard for Cd will not be exceeded. These nodes are concentrated on the marine and fluviatile clay soils.
Similar to Cd, there are no nodes where the 5th percentile of the simulated Pb concentration exceeds the quality standard for Pb in wheat, so for no nodes this quality standard was almost certainly exceeded (Fig. 4A). Contrary to Cd, there are also no nodes where the 95th percentile is below the quality standard. This implies that on the basis of the present data for Pb no definite conclusions can be drawn. For all nodes we are neither certain that the quality standard for Pb in wheat is not exceeded, nor certain that this quality standard is exceeded.
Figure 5
shows the cumulative frequency distributions over the 18609 nodes of the estimated probabilities of excess Cd or Pb in wheat. These probabilities were estimated by the number of simulated values larger than the quality standard divided by the total number of simulated values per node (1000). For Cd the curve is much steeper near the origin than for Pb (i.e., for Cd there are many more nodes with small probabilities of excess heavy metal in wheat). It follows from the previous paragraph that 72% of the nodes have a probability of <5% of excess Cd, and 0% of the nodes have a probability of <5% of excess Pb in wheat. Both for Cd and Pb, there are no nodes with probabilities larger than 95% that the quality standard is exceeded.

View larger version (20K):
[in this window]
[in a new window]
|
Fig. 5. Cumulative frequency distribution (over nodes) of probabilities that Cd and Pb in wheat exceed their quality standards.
|
|
Uncertainty and Sensitivities
Table 4 summarizes the uncertainties of the predicted Cd and Pb concentrations in wheat, expressed as the standard error and the coefficient of variation of the simulated concentrations per node. The standard deviation and coefficient of variation differ between the nodes, and the table shows the minimum, mean, and maximum of the standard deviations and coefficients of variation. The numbers in the table show that we are less certain about the Pb concentrations in wheat than about the Cd concentrations, also when related to the mean levels.
View this table:
[in this window]
[in a new window]
|
Table 4. Minimum, mean, and maximum of standard deviation and coefficient of variation of Cd and Pb predictions in wheat.
|
|
Table 5 shows the results of the sensitivity analysis. Again, the contributions of the separate input groups differ between the nodes. The minimum, mean, and maximum of the contributions are given. The validity of our regression-based sensitivity analysis appeared to be reasonable, as shown by the good quality of the fit of the full spline model (mean adjusted R2 = 89% for Cd and 93% for Pb). We may conclude that the interaction effect is small, and that splines with two degrees of freedom are sufficiently flexible for the current sensitivity analysis.
View this table:
[in this window]
[in a new window]
|
Table 5. Minimum, mean, and maximum of percentage of variance accounted for by the spline models for log(Cd) and log(Pb) predictions in wheat.
|
|
For Cd the uncertainty in the Cd concentration in the soil contributes most to the uncertainty in the predicted Cd concentration in wheat: the contribution varies between the nodes from 6 to 77%, with a mean contribution of 36%. The contributions of the random term of the regression model (mean = 23%) and of the uncertainty in the basic soil properties (mean = 20%) are more or less equal. The contribution of the uncertainty in the regression coefficients is relatively small (mean = 7%).
As opposed to Cd, the contribution of the random term of the regression model for Pb is by far the largest (mean = 52%). Similar to Cd, the contribution of the uncertainty in the regression coefficients is relatively small (mean = 7%). The contributions of the uncertainty in the Pb concentration in soil (mean = 19%) and the uncertainty in the basic soil properties (mean = 14%) are rather close.
These results make clear that for Cd the largest gain can be achieved by reducing the uncertainty in spatial interpolations of Cd concentrations in soil, for instance by collecting additional data in areas where there are no such measurements. The second best thing we can do is to improve the model by reducing its residual variance, for instance by including extra regressors or fitting different model types. In contrast, for Pb the order of the contributions of the model residual and the heavy metal concentration in soil is reversed. Consequently, improving the model is by far the best we can do to reduce the uncertainty in the Pb concentration in wheat, followed by reducing the uncertainty in the Pb concentration in soil.
Both for Cd and Pb the contribution of the uncertainty in the regression coefficients is small, and, therefore, if we stick to the present models, then not much gain can be expected from collecting more data to fit the models. On the other hand, it appeared that for approximately one-third of the pixels the regression model must be extrapolated to obtain predictions everywhere. For these pixels we have extra model uncertainty not accounted for by the uncertainty in the regression coefficients. Therefore, collecting additional data in the periphery of the property space spanned by the regressors can be worth consideration to eliminate extrapolation uncertainty.
Discrepancies in Support
An important aspect of studies like this is the support of the samples, that is, the geometry (size and shape) of the volumes of soil sampled. With respect to the sample data we must distinguish between the data used for fitting the regression model (hereafter referred to as the calibration data), and the data used as input for the regression model when predicting the Cd and Pb concentrations in wheat. The support of the 84 calibration data in the national data set (Wiersma et al., 1986) are squares of 400 m2. From each square one composite soil sample of 40 cores is taken to a depth of 20 cm and one composite sample of wheat consisting of 400 ears. The support of the 18 calibration data in the Meuse data set are whole fields. We decided to use both data sets for calibration, despite the difference in support.
In an ideal situation the support of the input data of the regression model equals the support of the calibration data. In this study, for the basic soil properties (pH, SOM, and clay), the measurements have point support. Samples were taken from the whole A soil horizon, so the sampling depth is variable (i.e, 540 cm). These measurements are interpolated by point-kriging to the nodes of the 500- x 500-m grid, leaving the support unchanged (i.e., the interpolated values have the same support as the measurements). Actually, we do not have a single interpolated value per node, but a complete probability distribution from which many values are drawn, reflecting our uncertainty about the true value at a given node. From the above it follows that these drawn values are possible values at point support and not possible mean values of blocks of 400 m2. These distributions therefore may reflect slightly too much uncertainty about the true means of these soil properties for blocks of 400 m2. However, we think the effect on the results is negligible. For Cd and Pb in soil the support of the measurements varies from 20 x 20 m (approximately 1000 out of 3571 measurements), to whole farms (122 measurements) (see Brus et al., 2002b). The vast majority of the samples has a support smaller than 1 ha. The sampling depth varies from 5 cm (grassland) to 20 cm (arable land). The measurements are interpolated by point-kriging, leading to a support of the interpolated values equal to that of the measurements (i.e., <1 ha). From the above we conclude that there are discrepancies in the support of the calibration and input data. In studies using existing data this is more the rule than the exception. In general, when the calibration support is larger than the measurement support, this difference can be removed by block-kriging. However, in our situation the problem is complicated due to the nonconstant calibration and measurement support, discrepancies in sampling depth, and the need for complete probability distributions. Therefore, we refrained from tuning the support of the calibration and the input data. Consequently, the uncertainties and sensitivities presented above must be interpreted as rough estimates of the uncertainties and sensitivities when predicting the Cd and Pb concentrations in wheat at the 20- x 20-m support.
Another problem arises when the geometry of the areas, for which we want predictions of the mean Cd and Pb concentrations in wheat, is not equal to the calibration support. Bierkens et al. (2000) refer to the former as the policy scale (where scale is used as a synonym for support), the support needed for making decisions. A rational policy support is the whole field. For a linear model the mean wheat concentration at a policy support that is larger than the calibration support can be predicted simply by predicting the means of the regressors at the policy support by block-kriging, and then inserting these means in the linear regression model. An interesting problem is how to perform an uncertainty and sensitivity analysis at this policy support. It is beyond the scope of this paper to deal with this problem in detail. An important point is that for such an analysis we need the variogram of the regression residuals at the calibration support. This is because the residuals at different sites of calibration scale within a field may cancel out each other, and as a result the variance of the field mean of the residuals is smaller than the variance of the residuals at individual sites. The reduction of the variance depends on the size (and geometry) of the field and on the variogram. The variance reduction is the largest for pure nugget variograms, and will be small for variograms showing strong spatial correlation.
 |
CONCLUSIONS
|
|---|
For most nodes with arable soils on clay the food quality standard for Cd in wheat almost certainly will not be exceeded. For the arable soils on sand we are less certain. On the other hand, there are no nodes where the food quality standards for Cd in wheat are almost certainly exceeded. No definite conclusions for Pb can be drawn on the basis of the present data. For all arable soils we are neither certain that the quality standard for Pb in wheat is not exceeded, nor certain that this quality standard is exceeded. So, especially for Pb, we must reduce the uncertainty in the predicted heavy metal concentrations in wheat to extend the area where we can draw conclusions on the compliance of food quality standards for heavy metal concentrations in wheat.
The strongest reduction in uncertainty for Cd can be achieved by reducing the uncertainty in the spatial predictions of the Cd concentration in soil, for instance by collecting additional soil samples and analyzing Cd concentrations in areas where such measurements are sparse. If our goal is to delineate safe areas, then these additional soil samples should be selected at sites with relatively low predicted Cd concentrations in wheat. On the other hand, to delineate areas suspected of having elevated metal concentrations in wheat, sites with large predicted concentrations should be selected. The contribution of the random error term of the model for Pb is by far the largest. Therefore, it is most important to search for other models for Pb that can explain the spatial variation of Pb in wheat much better. An important question in this context is what the contribution is of atmospheric deposition on the surface of ears of wheat in the accumulation of Pb. If this is considerable, then not much progress can be made via modeling the uptake of Pb in soil.
For many arable fields in the Netherlands the soil has been tested for pH, SOM, and clay. These results have not been used in this study. By using these site-specific measurements of pH, SOM, and clay, the precision of Cd and Pb predictions in wheat can be improved considerably (mean uncertainty contribution = 20 and 14%, respectively), so that for more fields conclusions can be drawn on the compliance of food quality standards for Cd and Pb in wheat.
 |
APPENDIX
|
|---|
Variance of Measurement Error
Repeated measurements of the Cd and Pb concentrations in two external control samples (with known concentrations) were used to model the variance due to measurement errors. The data consisted of the known concentrations (Qplant) the number of replications of the measurement (n), and the observed variance (
2) of the measurements (i.e., the variance of the n replicates).
MetalQplant n
2 Cd 0.03238 0.00061
Cd 0.2033 0.0074
Pb12.437 18
Pb10.436 13
The variance of measurements of Cd and Pb concentrations was modeled as:
 | [A1] |
where Qplant is the known concentration in the sample, with positive
and ß. The term
represents errors that do not depend on the level, whereas the term ßQ2plant represents errors proportional to the level, such that for high levels of Qplant the coefficient of variation is constant. A similar model is used in ISO 5725-2 (International Organization for Standardization, 1994). The coefficients
and ß, which were assumed to be the same for both metal concentrations, were estimated by weighted regression of the observed variances, with weight according to the assumption that individual measurements are independently normally distributed. Thus, up to a proportionality constant, the observed variance between measurements,
2, is distributed as
2n1. The observed variance has expectation
2, and variance 2
4/(n 1). Accordingly, the parameters
and ß were fitted using Eq. [A1] with (n 1)/(2
4) as the weight. This led to the estimates 0.0004885 for
and 0.1272 for ß.
Iteratively Reweighted Regression of Observations of Crop Heavy Metal Concentration on Basic Soil Properties
The model of the variance of measurements (Eq. [A1]) is for untransformed Cd and Pb measurements, whereas the regression model (Eq. [3]) is for log-transformed Cd and Pb measurements. The variance of these log-transformed measurements is approximated by first- order Taylor expansion of a function of a variable x:
 | [A2] |
 | [A3] |
where µ is the expectation of x. Furthermore, note that:
 | [A4] |
The first derivative of ln x = 1/x, so from this it follows that:
 | [A5] |
Inserting
+ ß(Qplant)2 for var(x) (Eq. [A1]) and Qplant for µ gives the following approximation for the measurement errors on the log scale:
 | [A6] |
Thus, in Eq. [3] the variance of
is a function of log(Qplant), whereas the variance of
is assumed to be an unknown constant
2. The true values for log(Qplant) are unknown. We can only use the fitted values as predictions of log(Qplant) to obtain estimates of the variance of
. Therefore, the regression was done iteratively with weight 1/[g(fi) +
2], where fi denotes the fitted value of the i-th measurement yi of log(Qplant). To obtain g(fi), we first calculated the variance with Eq. [A1] with Qplant equal to 10fitted, and then approximated the variance of
with [A6]. The variance
2 was tuned until the function h(
2), defined by:
 | [A7] |
satisfies h(
2) =
, with
the residual degrees of freedom. Since h(
2) decreases with increasing
2, there exists a unique solution if the sum takes a value greater than
for
2 = 0; the latter condition was satisfied in our analyses of Cd and Pb concentrations in plants. The solution of h(
2) =
was found with the well-known root-finding method of Newton, using the derivative of h(
2) with respect to
2:
 | [A8] |
 |
REFERENCES
|
|---|
- Bierkens, M.F.P., P.A. Finke, and P. de Willigen. 2000. Upscaling and downscaling methods for environmental research. Kluwer Academic Publ., Dordrecht, the Netherlands.
- Brus, D.J., J.J. de Gruijter, D.J.J. Walvoort, F. de Vries, J.J.B. Bronswijk, P.F.A.M. Römkens, and W. de Vries. 2002a. Landelijke kaarten van de kans op overschrijding van kritieke zwaremetaalgehaltes in de bodem van Nederland. Alterra Rep. 124. Alterra, Wageningen, the Netherlands.
- Brus, D.J., J.J. de Gruijter, D.J.J. Walvoort, F. de Vries, J.J.B. Bronswijk, P.F.A.M. Römkens, and W. de Vries. 2002b. Mapping the probability of exceeding critical thresholds for cadmium concentrations in soils in the Netherlands. J. Environ. Qual. 31:18751884.[Abstract/Free Full Text]
- Commission of the European Communities. 2001. Commission Regulation (EC) no. 466/2001 of 8 March 2001 setting maximum levels for certain contaminants in foodstuffs. Off. J. Eur. Communities Legis. 77:113.
- GenStat Committee. 2000. The guide to GenStat. VSN Int., Oxford.
- Goedhart, P.W., and J.T.N.M. Thissen (ed.) 2002. Biometris GenStat procedure library manual. 6th ed. [Online]. Available at http://www.biometris.nl/Software/GENSTAT/Biometris_Genstat_Procedures.htm (verified 18 Dec. 2003). Biometris, Wageningen, the Netherlands.
- Goovaerts, P. 1997. Geostatistics for natural resources evaluation. Oxford Univ. Press, New York.
- Hastie, T.J., and R.J. Tibshirani. 1990. Generalized additive models. Chapman & Hall, London.
- Iman, R.L., and W.J. Conover. 1982. A distribution-free approach to introducing rank correlation among input variables. Commun. Stat. B Simulation Computation 11:311344.
- International Organization for Standardization. 1994. ISO 5725-2: Accuracy (trueness and precision) of measurement methods and results. Part 2: Basic method for the determination of repeatability and reproducibility of a standard measurement method. ISO, Geneva.
- McKay, M.D., R.J. Beckman, and W.J. Conover. 1979. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21:239245.
- Projectgroep zware metalen in oevergronden van Maas en zijrivieren. 1987. Zware metalen in oevergronden en daarop verbouwde gewassen in het stroomgebied van Maas, Geul en Roer in de provincie Limburg. Instituut voor Bodemvruchtbaarheid. Rapport 1 t/m 3. Projectgroep zware metalen in oevergronden van Maas en zijrivieren, Haren, the Netherlands.
- Saltelli, A., K. Chan, and E.M. Scott (ed.) 2000. Sensitivity analysis. John Wiley & Sons, Chichester, UK.
- Van Meirvenne, M., and P. Goovaerts. 2001. Evaluating the probability of exceeding a site-specific soil cadmium contamination threshold. Geoderma 102:75100.
- Viscarra Rossel, R.A., P. Goovaerts, and A.B. McBratney. 2001. Assessment of the production and economic risks of site-specific liming using geostatistical uncertainty modelling. Environmetrics 12:699711.
- Wiersma, D., B.J. Van Goor, and N.G. Van der Veen. 1986. Cadmium, lead, mercury, and arsenic concentrations in crops and corresponding soils in the Netherlands. J. Agric. Food Chem. 34:10671074.
Related articles in JEQ:
- This Issue in Journal of Environmental Quality
JEQ 2004 33: 799-804.
[Full Text]