Journal of Environmental Quality 31:1576-1588 (2002)
© 2002 American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America
TECHNICAL REPORTS
Heavy Metals in the Environment
Kriging Method Evaluation for Assessing the Spatial Distribution of Urban Soil Lead Contamination
Julie A. Cattle*,
Alex. B. McBratney and
Budiman Minasny
Department of Agricultural Chemistry and Soil Science, The Univ. of Sydney, NSW 2006 Australia
* Corresponding author (cattlej{at}epa.nsw.gov.au)
Received for publication July 25, 2001.
 |
ABSTRACT
|
|---|
Describing contaminant spatial distribution is an integral component of risk assessment. Application of geostatistical techniques for this purpose has been demonstrated previously. These techniques may provide both an estimate of the concentration at a given unsampled location, as well as the probability that the concentration at that location will exceed a critical threshold concentration. This research is a comparative study between multiple indicator kriging and kriging with the cumulative distribution function of order statistics, with both local and global variograms. The aim was to determine which of the four methods is best able to delineate between "contaminated" and "clean" soil. The four methods were validated with a subset of data values that were not used in the prediction. Method performance was assessed by calculating the root mean square error (RMSE), analysis of variance, the proportion of sites misclassified by each method as either "clean" when they were actually "contaminated" or vice versa, and the expected loss for each misclassification type. The data used for the comparison were 807 topsoil Pb concentrations from the inner-Sydney suburbs of Glebe and Camperdown, Australia. While there was very little difference between the four methods, multiple indicator kriging was found to produce the most accurate predictions for delineating "clean" from "contaminated" soil.
Abbreviations: CCDF, conditional cumulative distribution function CDF, cumulative distribution function EIL, environmental investigation limit RMSE, root mean square error
 |
INTRODUCTION
|
|---|
INCREASED COMMUNITY AWARENESS of environmental issues and stringent environmental legislation has made it necessary to identify and assess contaminated sites. Evaluating the environmental impact of a contaminant such as Pb must start with a determination of its spatial distribution. This is especially important in an urban area considering the complex, heterogeneous nature of soil. The use of geostatistical techniques to describe the spatial distribution of heavy metal contaminants has been demonstrated previously (Atteia et al., 1994; von Steiger et al., 1996; Bierkens, 1997; Goovaerts et al., 1997; Barabás et al., 2001; Carlon et al., 2001; Van Meirvenne and Goovaerts, 2001). These techniques provide a means to estimate either the value of a soil attribute at locations between samples, or the probability that the attribute value will exceed a given threshold at a particular location. Such information is essential for mapping potential risks to the environment or human health.
A survey of total Pb, Zn, Cu, and Cd concentrations in topsoil sampled from 227 sites in the inner-Sydney suburb of Glebe was conducted in 1993 (Markus, 1993). Total Pb concentrations in 50% of the samples exceeded the environmental investigation limit (EIL) of 300 mg kg-1, which is used in Australia to identify Pb-contaminated sites requiring further investigation (Australian and New Zealand Environment and Conservation Council and National Health and Medical Research Council, 1992). A similar survey was conducted in 1994 in the adjoining suburb of Camperdown with topsoil sampled from 221 locations (Davis, 1995). Lead concentrations reported in this study also exceeded regulatory guidelines. A geostatistical analysis of each individual data set was conducted to spatially describe metal concentrations. While providing useful information about the contamination pattern, the techniques used may also have overestimated the area classified as contaminated. Further analysis on the combined dataset was deemed necessary to determine whether an improved prediction could be obtained with other kriging techniques. The aim of this work is to determine which of the techniques investigated is able to produce the best prediction of exceeding a contamination threshold and therefore the best map to delineate contaminated from noncontaminated soil.
 |
DATA AND ANALYSIS
|
|---|
Sampling
Soil was sampled from the inner Sydney suburb of Glebe with a stratified random sampling design to select site locations within an area covering 2.3 km2. The area was divided into 100- by 100-m cells with one site in each cell selected at random. There were a total of 227 sites, eight of which were not sampled because of an absence of soil. These have been recorded as "no data" points. To determine the extent of spatial variation, two samples 1 m apart were taken at each site and analyzed separately. A total of 438 topsoil samples (surface 10 cm) were collected for analysis. Full details of the Glebe survey may be found in Markus and McBratney (1996).
A greater sampling density was employed in the survey of Camperdown. The 1-km2 survey area was divided into 400 cells of 50 by 50 m. Soil was sampled according to a stratified random sampling design with one sample collected from each cell. Spatial variation was also considered in this survey by collecting two samples 1 m apart, although this was done at only 95 sites. There were 24 sites not sampled, either because access to the site was denied or no soil was present, resulting in a total of 376 sites and 471 individual soil samples. Lead analysis was not conducted on all samples and so data is only available for 221 different sites amounting to 369 Pb concentrations, of which 95 were duplicate Pb analyses on the same soil sample and 53 of the 221 sites had two samples collected 1 m apart. Other than the difference in sampling design, all other aspects of the sampling protocol were identical to those described for the Glebe survey.
The survey areas intersected along the southern boundary of the Glebe survey with an overlap of 100 m. The original Camperdown area extends from an easting of 331104 mE to 333104 mE and a northing of 6248390 mN to 6248890 mN in the map grid of Australia coordinate system (MGA '94). The corresponding longitudes and latitudes are 151°10'24.7'' E and 33°53'28.8'' S to 151°11'42.9'' E and 33°53'13.8'' S. The Glebe survey was delineated by eastings of 331704 mE to 333104 mE and northings of 6248790 mN to 6250590 mN. The corresponding longitudes and latitudes for this area are 151°10'48.3'' E and 33°53'16.2'' S to 151°11'44'' E and 33°52'18.6'' S. Data from each survey were combined to produce a data set of 807 total Pb concentrations from 430 urban locations in Glebe and Camperdown. A map of the combined survey area with the sampling grid and sites is shown in Fig. 1
. A pair of overlapping crosses represent sites where two samples were collected 1 m apart for the Glebe survey and single crosses represent the Camperdown sites.
Total Soil Lead Concentrations
Soil samples were air-dried and ground with an agate mortar and pestle to pass through a 2-mm stainless steel sieve. For the Glebe survey, samples were digested in a heating block with aqua regia. Samples from the Camperdown survey were digested with microwave radiation (Milestone [Leutkirch, Germany] MLS-1200 Mega digestion system). The microwave procedure had previously been compared with the block digestion method and the results were found to be very similar (Davis, 1995). Lead analyses were conducted with flame atomic absorption spectrophotometry (FAAS) on a Varian (Melbourne, Australia) SpectrAA-20 with background correction.
Descriptive statistics of both individual survey data and the combined data set are presented in Table 1. The distribution of Pb concentrations in the combined data set is skewed (Fig. 2) and so the data were log10transformed producing a symmetric curve, shown in Fig. 2. The data span a wide range of Pb concentrations and almost half are larger than the Australian EIL of 300 mg kg-1. Extremely large Pb concentrations were found at some sites, with the maximum value approximately equivalent to 2% Pb. Many of the large Pb concentrations were located near major roads, although outlier concentrations are more likely to be derived from point sources of contamination.

View larger version (16K):
[in this window]
[in a new window]
|
Fig. 2. Frequency distributions of raw and log-transformed Pb concentrations in Glebe and Camperdown topsoil.
|
|
Geostatistical Data Analysis
Several kinds of kriging were conducted to spatially describe the distribution of Pb concentrations in Glebe and Camperdown topsoil. Goovaerts (1997) states that rather than deriving an optimal estimate and associated error variance for a particular attribute, a more informative way to assess any risk is to model the uncertainty about the unknown. This may be done by determining the conditional probability distribution of an attribute at a particular location, with, for example, indicator-based algorithms. This approach is valuable for estimating the probability of wrongly classifying a location as safe, when in fact it is "contaminated" (false negative), or alternatively unnecessarily declaring a "clean" location to be "contaminated" (false positive).
Multiple indicator kriging (Journel, 1983) and kriging with the cumulative distribution function (CDF) of order statistics (Juang et al., 1998) were compared for their ability to predict the probability that Pb concentrations in urban topsoil are greater than a threshold concentration. The data used in indicator kriging have a value of either 1 or 0 depending on whether the attribute is greater or less than one or more cutoffs, but it does not show the magnitude of deviation from each cutoff. Kriging with the CDF of order statistics includes information about the relative magnitude of each element of the random sample and its deviation from the cutoff (Juang et al., 1998). It has been employed recently to delineate areas of soil contaminated with Zn within a rice paddy field in Taiwan (Juang et al., 1998). The threshold Pb concentration of particular interest in this study is 300 mg kg-1, which is the EIL for Pb in Australian soil (Australian and New Zealand Environment and Conservation Council and National Health and Medical Research Council, 1992). Both methods were conducted with global and local variograms and the algorithms for each type of kriging are described below. All computations were performed with an in-house computer program developed for this purpose. This program is written in Fortran and runs under a Windows interface. It is unpublished and was developed by Alex McBratney and Budiman Minasny in the Department of Agricultural Chemistry and Soil Science at the University of Sydney.
Multiple Indicator Kriging
The n data points are sorted in order of increasing magnitude and discretized into 100 equally probable classes by cutoff values zk, where k = 1,...,K and where K is the number of cutoffs. For every cutoff the data are then transformed into values of 1 and 0 according to whether they are greater or less than the cutoff concentration. The indicator for data z(xi), where i = 1,...,n at cutoff value zk, is defined as:
 | [1] |
Global Variogram
The semi-variogram measures the average dissimilarity between data points separated by a vector h (Goovaerts, 1997). The global indicator semi-variogram is computed as shown in Eq. [2], with all the measured data points (n). The frequency with which two z values separated by h are on opposite sides of the cutoff value zk is given by the value 2
I(h;zk) (Goovaerts, 1997). For each cutoff value of zk, the semi-variogram of the indicator values is calculated and the exponential model (shown in Eq. [3]) is fitted with weighted nonlinear least squares to estimate the variogram parameters. These parameters may vary significantly between different cutoffs and so a scatterplot of each variogram parameter (sill, nugget, and range) versus the cutoff was drawn and a generalized cross-validation spline (Silverman, 1985) was fitted to smooth extreme values:
 | [2] |
where N(h) is the number of pairs of data points separated by vector h; and
I(h;zk) is the semi-variance for lag h:
 | [3] |
where C0 = nugget, C0 + C1 = sill, and r = range.
This results in a different variogram for each of the cutoff values. Predictions were then made at the nodes of a 10-m square grid. The probability estimate at each unknown location is found by ordinary indicator kriging with a neighborhood m of 100 data points and a search radius of 1500 m, and is written as:
 | [4] |
where
i is the weight assigned to the indicator at xi and the solution of the system of m + 1 linear equations under the condition that the weights sum to 1 (Eq. [6]). The term µ is the Lagrange multiplier:
 | [5] |
 | [6] |
Therefore, the probability of exceeding each cutoff is obtained at every estimated site. A monotonic decreasing regression (Dole, 1999) was fitted to the estimates to obtain the conditional cumulative distribution function (CCDF):
 | [7] |
where ßj
0 and 1
j
K - 1 and where
in which
is the CCDF at each location (x) and ß are the parameters estimated with nonnegative least squares.
Local Variogram
The local semi-variogram is calculated at each unknown location with a neighborhood of 200 data points and the exponential model is fitted to estimate the variogram parameters. As for the global variogram, the computation is repeated for each cutoff zk, generating 100 variograms for each unknown location. For every cutoff, at each location, a different variogram is used to krige the indicator data. Kriging is conducted with a neighborhood of only 100 data points to reduce the computation time and the CCDF was obtained as described in the global example. For kriging with global variograms it takes an average of 3.7 s to interpolate 100 cutoffs at each location. For kriging with local variograms it takes an average of 8.5 s to calculate the variogram, fit the model, and perform interpolation at each location. Hence, it takes approximately 2.3 times longer to krige with local variograms. Calculations were done with a PC with a Pentium III 600 MHz processor and 390 Mbytes of RAM.
Kriging with the Cumulative Distribution Function of Order Statistics
The order statistics of a random sample are the individual elements of that sample arranged in order of increasing magnitude. They may be denoted by z(1)
z(2)
z(3)
...
z(r)
...
z(n) where z(r) is the rth order statistic (Juang et al., 1998). For this study, they are the set of Pb concentrations arranged from the smallest concentration z(1) to the largest concentration z(n). The first stage in CDF kriging is the same as indicator kriging. The data are ordered by increasing magnitude, discretized into 100 classes by cutoff values zk, and then transformed into values of 0 or 1 with the indicator function given in Eq. [1]. For the CDF technique it is necessary to then calculate the stationary mean of all indicator transformed data (Eq. [8]) and this is used to further transform the data into the CDF of order statistics:
 | [8] |
If at least r of the n data points are greater than zk, then the rth order statistic is greater than zk. The probability of this occurring is given by the CDF of order statistics:
 | [9] |
where F(zk;x) is the CDF of the rth order statistic at the location x and (n) is the set of order statistics of the n data points.
The global and local semi-variograms are calculated and kriging performed in the same way as described for the indicator transformed data, except the indicator function I(x;zk) is replaced by the CDF of order statistics F(zk;x) in Eq. [2] and [4]. The semi-variance of the CDF of order statistics
F replaces
I in Eq. [5].
Method Validation with Prediction and Test Sets
To determine which of the kriging methods used here is best able to predict the probability that the Pb concentration at a given site is greater or less than an environmental threshold, the data were randomly grouped into eight test and prediction sets for validation. This enables comparison of the predictions with actual data from the same locations that have been excluded from the test kriging. The data set of 807 Pb concentrations was randomized, whereupon division was made every 100 points, so that on each occasion 12.5% of the whole data set was omitted to form a test set. Kriging was performed with a prediction set of 707 data points, to predict onto a test grid of the locations of the remaining 100 points, with both local and global, CDF and multiple indicator kriging. The predictions generated by each method were compared with the actual data at those sites to ascertain which of the four methods gave the best estimate of Pb concentrations, and the corresponding probability of being greater than a particular threshold in urban topsoil. This allows a true comparison of the performance of each different geostatistical method used in this study.
Test and prediction sets were selected at random, so the statistical properties of each test set should not be appreciably different from the whole data set. To confirm this and ensure that the test sets are actually representative of the entire data set, their descriptive statistics were determined and are presented in Table 2. Upon comparison with the entire data set (Table 1) it may be concluded that the test sets are representative. The median values compare well as do the majority of the mean values. Mean values of Test Sets 3 and 5 are notably larger although these test sets contain large outlier Pb concentrations. All other differences between the summary statistics of the test sets and the entire data set result from the distribution of the outlier concentrations between the test sets.
Analysis of Validation Data
Lead Concentration Estimates
Data from each of the eight test sets were pooled and analyzed together. Multiple indicator kriging and kriging with the CDF of order statistics predict at each unknown location the conditional probability (CCDF value) for every cutoff value. The predicted Pb concentration is computed as the mean of the CCDF and the probability of occurrence is generated for each location. The predicted mean Pb concentration for each location was used to calculate the root mean square error (RMSE), defined in Eq. [10]. This measures the prediction accuracy of each of the four methods. An analysis of variance was conducted with the 32 mean RMSEs (eight from each test set for each of the four methods) to investigate whether there were any differences between the eight test sets and also between the four methods:
 | [10] |
where SEmethod = (actual Pb concentration - predicted meanmethod)2.
Lead Probability Estimates (Site Misclassification)
Using a spatial model to predict contaminant concentrations at unsampled locations enables informed and reliable decision making in contaminated site assessment. There is, however, a degree of uncertainty in the kriged estimates. This may lead to false positive errors resulting in unnecessary expenditure on site remediation or more problematic false negative errors leading to a potential decline in human health. Either scenario may also have legal ramifications. It is therefore desirable to determine the uncertainty associated with predictions. The CCDF is a model of the local uncertainty in the kriged estimates. In addition to estimates of the Pb concentration at unknown sites, the probability of exceeding a given threshold concentration is also obtained from the CCDF. The difficulty lies in assigning a probability threshold to indicate whether or not a site is "contaminated." For example, if the probability of exceeding the 300 mg kg-1 critical threshold is 0.85, then it is reasonable to assume that the site is contaminated. If the probability of exceeding 300 mg kg-1 is only 0.4, then there is far more doubt associated with classifying the site as contaminated. Making a decision on the basis of this information is very difficult. So at what probability should a location be considered "contaminated"? The answer is completely subjective unless some optimality criterion is applied to the data. To determine the optimal probability threshold for declaring a location "contaminated" or "clean," the percentage of test locations that were wrongly classified as clean or contaminated with respect to a 300 mg kg-1 threshold was computed for the Glebe and Camperdown test sets. This process was conducted for each method individually, for a range of probability thresholds progressing from 0 to 1 in increments of 0.05, with the aim of selecting the probability corresponding to the smallest percentage of misclassification as the probability threshold indicative of contamination. At this probability threshold the chance of an incorrect prediction is minimized. This probability value may then be used for making the most accurate Pb contamination map in the Glebe and Camperdown area. Knowledge of the likelihood of misclassification is important from a decision-making perspective. In the context of this study it is also a gauge of method performance for local and global, CDF and multiple indicator kriging.
No distinction is made between the two kinds of error (i.e., false positive or false negative), only that an error exists. Sites misclassified by multiple indicator kriging may differ from those misclassified by CDF kriging and incorrect predictions may occur more frequently for large Pb concentrations than small concentrations. These issues were investigated to identify any patterns in site misclassification.
Lead Probability Estimates (Expected Loss)
Misclassification of land as "contaminated" or "clean" has significant implications, as poor predictions are likely to incur a loss. The loss may be a decline in human health resulting from a false negative error. Alternatively, the loss may be the additional cost of unnecessary remediation when the error is false positive. The relative impact of each kind of loss is obviously not the same. Loss causing adverse effects to human health is far greater than the purely financial cost of remediating "clean" land. If misclassification is negligible then the loss is also likely to be small. To determine the potential impact of site misclassification, the expected loss was computed for the Glebe data. The computational method is based on that described by Goovaerts (1997). In this scenario the importance is placed on the cost associated with a particular type of incorrect prediction rather than just the occurrence of either type of misclassification.
The loss associated with declaring a location "clean," L1, is shown in Eq. [11]. Conversely, the loss resulting from classification of soil as "contaminated," L2, is given in Eq. [12]. The parameter
1 represents the cost to human health in the event of a false negative error, and
2 is the cost of unnecessary remediation following a false positive error. The expected loss was calculated for
1 values of 2.5 and 10. These are arbitrary values and were chosen to illustrate the change this parameter may effect on the final expected loss. The term
2 was a constant assigned the value of 1, although this is also arbitrary. The decision of a value to assign to either
1 or
2 is subjective and beyond the scope of this research. A value of 2.5 for
1 was suggested by Goovaerts (1997) and 10 was selected here as a comparison. The term L1 accounts for the magnitude of the misclassification, that is, as soil becomes more "contaminated" the larger L1 becomes because the potential for adverse effects on human health depends on the Pb concentration in soil. The units of
1 are therefore money/(mg kg-1) and the units of loss are money. The relative size of the error is not considered in L2, because the decision to remediate is not dependent on the final Pb concentration, but simply on whether the critical threshold is exceeded. In this case the units of
2 are money only. This assumes that the remediation method is simply soil removal, hence the magnitude of error is not important. There are few options for remediation of soil contaminated with heavy metals, but in the event of using a remediation technology that is dependent on the concentration of metal, the loss described by L3 may be important (Eq. [13]). The term L3 is a modified version of L2, accounting for the magnitude of error in the same fashion as for L1. The expected loss was calculated on the E-type estimates (i.e., mean of the CCDF) of test set data for each of the four methods:
 | [11] |
 | [12] |
 | [13] |
The criteria for classifying soil as "contaminated" or "clean" may differ according to individual situations. The probability threshold determined earlier is the optimal criterion for classifying soil as "clean" or "contaminated" to minimize misclassification. This does not distinguish between the type of error. It may be more important to minimize the likelihood of a particular type of misclassification. Expected loss was calculated for probability thresholds progressing from 0 to 1 in increments of 0.01 (i.e., in the same manner as misclassification). The probability threshold corresponding to the minimum value of loss may assist in producing a map of Pb contamination that accounts for the specific type of misclassification error. The terms L1 and L2 and the parameters
1 and
2 have already been defined, although here they have been expressed in terms of probability (Eq. [14] and [15]). The term L1 was computed for
1 values of 2.5 and 10 and
2 was again kept constant at 1. Each type of loss is first calculated individually and then summed to give the mean total expected loss for each probability threshold, as shown in Eq. [16]:
 | [14] |
 | [15] |
 | [16] |
 |
RESULTS AND DISCUSSION
|
|---|
Lead Concentration Estimates
The RMSE for each method is provided in Table 3. These values are all quite large compared with the distribution of measured Pb concentrations. The RMSE is of similar magnitude to the upper decile of actual data, indicating poor accuracy of predictions. To determine what effect the heavily contaminated sites had on the RMSE, the locations with the 12 largest RMSEs were excluded from the calculation. These locations also happened to have the 12 of the 15 largest Pb concentrations. Table 3 illustrates the dramatic effect on the RMSE of removing the 12 outlier concentrations. While the RMSE was reduced to almost one-third of its original value, it is still large with respect to the distribution of measured Pb concentrations (greater than 75% of the data). These values, however, are still useful for comparative purposes. Such a large decrease in the RMSE with omission of sites containing large Pb concentrations suggests that the methods have limited predictive ability for heavily contaminated sites.
Two general observations can be made about these data. The first being that local variograms performed better than their global counterparts, and the second that multiple indicator kriging was more accurate than CDF kriging (i.e., local multiple indicator kriging had the smallest RMSE and therefore the most accurate predictions). Although this method yielded the smallest RMSE it should be noted that the difference between the four methods is small.
The analysis of variance found no significant difference between global and local CDF and multiple indicator kriging; however, the difference between the test sets was found to be significant at the 0.05 level. These results concur with the findings reported for the RMSE determination, that the methods are very similar to each other. There is such little difference between multiple indicator kriging and CDF kriging that both the RMSEs and the analysis of variance indicate either method performs better with a local variogram than with a global variogram.
Lead Probability Estimates
Site Misclassification
Figure 3
is a plot of the percentage of locations misclassified versus the probability threshold. The percentage of misclassification is large for probability thresholds close to 0 or 1 and decreases at intermediate probability thresholds. If the probability threshold is small (e.g., 0.1), then a location having a probability of 0.2 of exceeding 300 mg kg-1 would be considered "contaminated." A probability of 0.2 is hardly cause for concern and such a site would probably be "clean." Nevertheless, an incorrect classification has resulted. The opposite problem occurs when threshold probabilities take on large values. Goovaerts (1997) defines a marginal probability of contamination as the proportion of actual data that exceed the 300 mg kg-1 critical threshold. He states that the probability threshold at which misclassification is minimal is usually close to this marginal probability of contamination (Goovaerts, 1997). The marginal probability of contamination for the Glebe and Camperdown data set was 0.41. The probability value at which misclassification is minimal is either 0.45 (local CDF and multiple indicator kriging) or 0.5 (global CDF and multiple indicator kriging), both values close to 0.41.

View larger version (23K):
[in this window]
[in a new window]
|
Fig. 3. The percentage of sites misclassified as either "clean" or "contaminated" by each of the kriging methods.
|
|
Figure 3 further supports the suggestion that local multiple indicator kriging is the most accurate method of those investigated in this study. When all methods are compared, this method exhibits the smallest misclassification error over virtually the entire range of probability thresholds. These results differ from the previous finding that either method with a local variogram performs better than the global methods. Global multiple indicator kriging clearly demonstrated less misclassification than local CDF kriging. Multiple indicator kriging misclassified fewer sites than CDF kriging. Misclassification was minimized at a probability threshold of 0.45 for local multiple indicator kriging (14.75%) and 0.5 for global multiple indicator kriging (15.25%).
While all four methods demonstrate almost identical misclassification, the locations being misclassified are not identified. Incorrect predictions made by multiple indicator kriging and CDF kriging occur almost entirely at the same locations, only approximately 2.5% were found to be at different sites. This suggests that the methods are unable to predict with accuracy at sites with similar characteristics. Those sites with Pb concentrations greater than 300 mg kg-1 were incorrectly predicted in 9% of cases, whereas samples with Pb concentrations smaller than the EIL were misclassified only 3.5% of the time. It is reasonable to assume that the extra sites with large Pb concentrations being misclassified are probably outlier concentrations. This theory was discounted, however, upon further investigation. Of sites with Pb concentrations larger than 1000 mg kg-1, only 1% were incorrectly predicted. These results confirm the similarity between multiple indicator and CDF kriging and illustrate the absence of any obvious pattern among locations subject to misclassification errors.
Expected Loss
Discussion of expected loss will focus on the results for local multiple indicator kriging as all methods were very similar. Maps of L1, L2, L3, and the total loss obtained by summing L1 and L3 are illustrated in Fig. 4 to 7
. All maps displayed here (except Fig. 7) were produced with the loss calculated for local multiple indicator kriged mean estimates on the entire data set (i.e., at 31990 locations) with an
1 value of 2.5 and an
2 value of 1. Figure 7 illustrates the difference in magnitude of total loss for an
1 value of 10. Results of expected loss calculated with an
1 value of 10 were much larger than for an
1 of 2.5; however, patterns in all maps were identical. As expected, L1 (Fig. 4) is larger in known "contaminated" areas because the loss associated with misclassifying a heavily contaminated site is by definition greater than the loss incurred by misclassifying a site only slightly larger than the critical threshold. Obviously, there is no L1type loss in areas having small Pb concentrations. Figure 5 shows L2 to exhibit opposite behavior to L1, that is, L2 is larger in areas known to be "clean" and smaller in "contaminated" areas. The difference between L2 and L3 (Fig. 5 and 6) illustrates the effect of accounting for the magnitude of error. The loss associated with L2 is not as large because of the absence of a Pb concentration term, although both L2 and L3 exhibit similar trends. The term L3 contains more information than L2 and as such is a better measure of the loss incurred when a false positive error is made. Some regions may have Pb concentrations fractionally within the regulatory guideline. The relative cost of cleaning these regions is smaller than regions well within the guideline concentration. A map of the total expected loss across the survey area is shown in Fig. 7 (L1 + L3). The map of L1 + L2 has not been included because the same trends were observed as for the L1 + L3 loss. Similarly, the map of total loss (L1 + L3) calculated with an
1 value of 2.5 has been omitted, as it was found to be identical to that of L1 shown in Fig. 4. The total loss is greatest in high valued regions of the survey area and is concentrated in the central and northern parts of the area. Metal contaminants typically have highly skewed distributions and the very large concentrations give weight to L1. The term L3 is sensitive to the magnitude of error but the error will never be as large as for L1 because the maximum concentration difference can only be 300 mg kg-1.

View larger version (25K):
[in this window]
[in a new window]
|
Fig. 5. Map showing the loss incurred in the event of an incorrect classification of "contaminated" when soil is actually "clean" (L2; 2 = 1); the magnitude of error is not considered.
|
|
Figure 8
illustrates the change in L1 and L2 with increasing probability thresholds. Note that L2 does not change with
1. The loss is obviously much larger if
1 is equal to 10. By definition both losses, L1 and L2, cannot occur at the same time; the data in Fig. 8 are mean values for each probability threshold.

View larger version (27K):
[in this window]
[in a new window]
|
Fig. 8. Plot of probability threshold versus the loss incurred as a result of underestimating the Pb concentration (L1) and overestimating the Pb concentration (L2) for each of the four methods.
|
|
Figure 9
presents the total expected loss plotted against the probability threshold for
1 values of 2.5 and 10. The total expected loss is at a minimum at probability thresholds of 0.65 and 0.8 for
1 values of 2.5 and 10, respectively. The term L1 is minimized at a probability threshold of 0.95 for
1 values of 2.5 and 10. The term L2 is at a minimum at a probability threshold of 0. In other words, if virtually all sites are classified as "contaminated" (p = 0.95), then the likelihood of incurring a loss from a misclassification of "clean" is negligible, and vice versa. It is evident from Fig. 8 and 9 that virtually no difference in expected loss occurs as a function of the particular kriging method. The value of L1 will be greater in regions of the survey area with large Pb concentrations because the prediction at an unknown location will have a higher probability of exceeding 300 mg kg-1. Similarly, the value of L2 will be greater in regions where the Pb concentrations are small, as the probability that an unknown location will exceed 300 mg kg-1 is probably small.

View larger version (26K):
[in this window]
[in a new window]
|
Fig. 9. Plot of probability threshold versus the total expected loss incurred if soil is misclassified as either "clean" or "contaminated."
|
|
There is not necessarily a best estimate of the Pb concentration at a particular location for all situations. For example, it may not be necessary to remediate soil surrounding industrial sites to concentrations regarded as acceptable in soil of residential areas. Different situations require different optimality criteria. As a result of the analyses of misclassification and expected loss, several criteria have been derived for delineating "contaminated" from "clean" soil, namely, the probability thresholds at which minimal misclassification (p = 0.45) and minimal expected loss (p = 0.65 or 0.8 depending on the value of
1) occur. These criteria have been applied to the entire Glebe and Camperdown data set to produce maps delineating "clean" soil from "contaminated" soil. Figures 10 and 11
were produced with local multiple indicator kriged data at 31 990 locations as described previously. The dark shaded regions in Fig. 10 are those areas declared "contaminated" if the criteria for minimizing site misclassification are used (i.e., p = 0.45). If the probability of a location exceeding 300 mg kg-1 is greater than 0.45, then the location is considered to be "contaminated," otherwise it is "clean." If the criteria for minimizing expected loss is applied to the same data the resulting map is identical to Fig. 10 for a probability threshold of 0.65 (
1 = 2.5). The probability threshold at which expected loss is minimized for
1 = 10 is 0.85. Figure 11 shows the change this effects on the locations declared to be "clean" or "contaminated." A much smaller region is classified as "contaminated" because if the probability threshold is high then most sites will be considered "clean." Expected loss is minimized because by classifying locations that may only just exceed 300 mg kg-1 as "clean," the loss associated with remediation is avoided. A larger number of locations are likely to be misclassified in this scenario although the resulting loss will be smaller.

View larger version (26K):
[in this window]
[in a new window]
|
Fig. 10. Delineation of "clean" and "contaminated" soil with criteria for minimal misclassification; p = 0.45.
|
|

View larger version (25K):
[in this window]
[in a new window]
|
Fig. 11. Delineation of "clean" and "contaminated" soil with criteria for minimal expected loss; p = 0.85.
|
|
 |
DISCUSSION
|
|---|
The indicator kriging employed in the Glebe study differs from previous applications of this technique in that the number of cutoff values is far greater here and variograms have been computed locally. The choice of 100 cutoff values was made on the basis that this would produce a better CCDF. Order relation problems are dealt with by using the monotonic regression. Hendricks Franssen et al. (1997) conducted indicator kriging of Cu, Pb, and Zn concentrations from an urban area in the Netherlands also with only one cutoff value. In a comprehensive analysis of Cd contamination in the Swiss Jura, Goovaerts (1997) recommended using between 5 and 15 cutoffs. This is both to avoid intensive computation and to reduce the possibility of order relation problems (Goovaerts, 1997). Goovaerts et al. (1997) demonstrate how this indicator approach may be applied in determining the probability of contamination by Cd, Cu, and Pb in the Swiss Jura. Mohammadi (1997) conducted multiple indicator kriging on Cd data from Belgium with 12 cutoffs, computing a global variogram for each cutoff. Smith et al. (1993) also used indicator kriging with one cutoff value, although they simultaneously considered three variables.
In contrast to the results of Juang et al. (1998), kriging with the CDF of order statistics was not found to have greater accuracy than multiple indicator kriging. This was evidenced by the RMSE for each method and the greater proportion of sites misclassified by CDF kriging than by indicator kriging. At best, kriging with the CDF of order statistics produced estimates of similar or inferior accuracy to those of multiple indicator kriging. There are several important distinctions between the techniques used in this study and those of Juang et al. (1998). As previously described, the indicator kriging conducted in the Glebe study estimated the CCDF for each site at 100 cutoff values, hence the term multiple indicator kriging. The Taiwan study used only one cutoff value, the regulatory guideline for Zn. In addition, they used only global variograms while here both local and global variograms were computed. Other differences were the search neighborhood used for kriging and the size of the data set. Both global and local CDF and multiple indicator kriging of the Glebe data set were conducted with the nearest 100 data points to the unknown location. Juang et al. (1998) used a maximum of only eight data points for this purpose. The size of each of the survey areas and number of samples within them are also substantially different. The Taiwan study was conducted with only 81 observations spread over an area of 2000 ha. Samples were approximately 500 m apart. Sampling was far more intense in the Glebe survey and the kriging was conducted with almost 10 times the number of observations.
Most applications of kriging to heavy metal soil surveys have used ordinary kriging with global variograms to estimate the concentration of various heavy metals in soil (e.g., Atteia et al., 1994; Tao, 1995a,b). Von Steiger et al. (1996) used disjunctive kriging to determine the probability that the metal concentration will exceed an environmental threshold concentration and Bierkens (1997) used stratified residual kriging to obtain Pb maps in the city of Utrecht, the Netherlands. The use of indicator kriging is perhaps more useful as it offers the desirable criteria of estimating the local uncertainty through estimating the CCDF at each site. In addition to finding the optimal concentration estimate at a particular location given some optimality criterion, the CCDF provides additional information in the form of the probability of exceeding a critical threshold value at each location. Such information is particularly valuable in contaminated site assessment, where the decision to remediate may depend on the likelihood of exceeding regulatory guidelines.
 |
CONCLUSIONS
|
|---|
- Lead concentrations in Glebe and Camperdown topsoil are large with 41% of observations exceeding the environmental investigation limit of 300 mg kg-1. A number of sites contain Pb concentrations up to 66 times greater than this regulatory guideline.
- Kriging with the CDF of order statistics and multiple indicator kriging with local and global variograms were compared for their ability to describe the spatial distribution of Pb in urban topsoil. Method validation identified multiple indicator kriging conducted with local variograms as producing the most accurate predictions for delineating "clean" soil from "contaminated." There was very little difference between each of the methods.
 |
ACKNOWLEDGMENTS
|
|---|
This project has been assisted by the New South Wales Government through its Environmental Trust. We would like to thank Ms. Jean Davis for carrying out the survey of Camperdown heavy metal concentrations.
 |
REFERENCES
|
|---|
- Atteia, O., J.-P. Dubois, and R. Webster. 1994. Geostatistical analysis of soil contamination in the Swiss Jura. Environ. Pollut. 86:315327.[Medline]
- Australian and New Zealand Environment and Conservation Council and National Health and Medical Research Council. 1992. Australian and New Zealand guidelines for the assessment and management of contaminated sites. Australian and New Zealand Environment and Conservation Council and National Health and Medical Research Council, Canberra, Australia.
- Barabás, N., P. Goovaerts, and P. Adriaens. 2001. Geostatistical assessment and validation of uncertainty for three-dimensional dioxin data from sediments in an estuarine river. Environ. Sci. Technol. 35:32943301.[Medline]
- Bierkens, M.F.P. 1997. Using stratification and residual kriging to map soil pollution in urban areas. p. 9961007. In E.Y. Baafi and N.A. Schofield (ed.) Geostatistics Wollongong '96, Vol. 2. Kluwer Academic Publ., Dordrecht, the Netherlands.
- Carlon, C., A. Critto, A. Marcomini, and P. Nathanail. 2001. Risk based characterisation of contaminated industrial site using multivariate and geostatistical tools. Environ. Pollut. 111:417427.[Medline]
- Davis, J.M. 1995. Aspects of heavy metal concentrations in Camperdown topsoil. B.Sc. Honours thesis. Dep. of Agric. Chem. and Soil Sci., Univ. of Sydney, Australia.
- Dole, D. 1999. Scatterplot smoothing subject to monotonicity and convexity. Report. Dep. of Agric. and Resour. Econ., Univ. of Western Australia, Nedlands, Australia.
- Goovaerts, P. 1997. Geostatistics for natural resources evaluation. Oxford Univ. Press, New York.
- Goovaerts, P., R. Webster, and J.-P. Dubois. 1997. Assessing the risk of soil contamination in the Swiss Jura using indicator geostatistics. Environ. Ecol. Stat. 4:3148.
- Hendricks Franssen, H.J.W.M., A.C. van Eijnsbergen, and A. Stein. 1997. Use of spatial prediction techniques and fuzzy classification for mapping soil pollutants. Geoderma 77:243262.
- Journel, A.G. 1983. Nonparametric estimation of spatial distributions. Math. Geol. 15:445468.
- Juang, K.-W., D.-Y. Lee, and C.K. Hsiao. 1998. Kriging with cumulative distribution function of order statistics for delineation of heavy-metal contaminated soils. Soil Sci. 163:797804.
- Markus, J.A. 1993. A survey of heavy metals in the topsoil of Glebe. B.Sc. Honours thesis. Dep. of Agric. Chem. and Soil Sci., Univ. of Sydney, Australia.
- Markus, J.A., and A.B. McBratney. 1996. An urban soil study: Heavy metals in Glebe, Australia. Aust. J. Soil Res. 34:453456.
- Mohammadi, J. 1997. Geostatistical mapping of environmental soil hazards. Ph.D. diss. University of Ghent, Belgium.
- Silverman, B.W. 1985. Some aspects of the spline smoothing approach to non-parametric regression curve fitting (with Discussion). J. R. Stat. Soc. B47:152.
- Smith, J.L., J.J. Halvorson, and R.I. Papendick. 1993. Using multiple-variable indicator kriging for evaluating soil quality. Soil Sci. Soc. Am. J. 57:743749.[Abstract/Free Full Text]
- Tao, S. 1995a. Spatial structures of copper, lead and mercury contents in surface soil in the Shenzhen area. Water Air Soil Pollut. 82:583591.
- Tao, S. 1995b. Kriging and mapping of copper, lead and mercury contents in surface soil in Shenzhen area. Water Air Soil Pollut. 83:161172.
- Van Meirvenne, M., and P. Goovaerts. 2001. Evaluating the probability of exceeding a site-specific soil cadmium contamination threshold. Geoderma 102:75100.
- Von Steiger, B., R. Webster, R. Schulin, and R. Lehmann. 1996. Mapping heavy metals in polluted soil by disjunctive kriging. Environ. Pollut. 94:205215.[Medline]