Journal of Environmental Quality 31:1538-1549 (2002)
© 2002 American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America
TECHNICAL REPORTS
Ground Water Quality
Application of Classification-Tree Methods to Identify Nitrate Sources in Ground Water
Timothy B. Spruill (a, *), William J. Showers (b), and Stephen S. Howe (a)
(a) United States Geological Survey, 3916 Sunset Ridge Rd., Raleigh, NC 27607
(b) Dep. of Marine Earth and Atmospheric Sciences, North Carolina State University, Raleigh, NC 27695-8208
* Corresponding author (tspruill{at}usgs.gov)
Received for publication August 17, 2001.
ABSTRACT
A study was conducted to determine if nitrate sources in ground water (fertilizer on crops, fertilizer on golf courses, irrigation spray from hog (Sus scrofa) wastes, and leachate from poultry litter and septic systems) could be classified with 80% or greater success. Two statistical classification-tree models were devised from 48 water samples containing nitrate from five source categories. Model 1 was constructed by evaluating 32 variables and selecting four primary predictor variables (δ15N, nitrate to ammonia ratio, sodium to potassium ratio, and zinc) to identify nitrate sources. A δ15N value of nitrate plus potassium >18.2 indicated animal sources; a value <18.2 indicated inorganic or soil organic N. A nitrate to ammonia ratio >575 indicated inorganic fertilizer on agricultural crops; a ratio <575 indicated nitrate from golf courses. A sodium to potassium ratio >3.2 indicated septic-system wastes; a ratio <3.2 indicated spray or poultry wastes. A value for zinc >2.8 indicated spray wastes from hog lagoons; a value <2.8 indicated poultry wastes. Model 2 was devised by using all variables except δ15N. This model also included four variables (sodium plus potassium, nitrate to ammonia ratio, calcium to magnesium ratio, and sodium to potassium ratio) to distinguish categories. Both models were able to distinguish all five source categories with better than 80% overall success and with 71 to 100% success in individual categories using the learning samples. Seventeen water samples that were not used in model development were tested using Model 2 for three categories, and all were correctly classified. Classification-tree models show great potential in identifying sources of contamination and variables important in the source-identification process.
Abbreviations: CART, classification and regression tree; USGS, United States Geological Survey
INTRODUCTION
Nitrate in ground water has been known to be a potential human health problem for more than 50 yr, since Comly (1945) reported that concentrations of nitrate in drinking water could cause methemoglobinemia in infants. A nitrate drinking water standard of 45 mg/L for nitrate (10 mg/L of nitrate, as nitrogen) for United States public water supplies was established in 1962 (United States Department of Health, Education, and Welfare, 1962). This standard has remained in force since 1962 and is the current maximum contaminant level (MCL) for public drinking water supplies (USEPA, 2001).
Some areas of the United States are more likely than others to have high nitrate concentrations in ground water. Susceptibility to nitrate contamination typically is highest in areas with sandy soils (Nolan et al., 1997). Within the Albemarle-Pamlico Drainage Basin of North Carolina and Virginia, the highest nitrate concentrations occurred in areas having sandy soils with relatively low organic carbon content (Spruill et al., 1997; Spruill et al., 1998). Such areas primarily are located in the inner Coastal Plain where dissolved carbon concentrations are less than 3 mg/L. Nitrate concentrations exceeded the 10 mg/L maximum contaminant level in about 5% of the ground water samples from these areas.
To control nitrate contamination in ground water, the nitrate sources must be identified before appropriate and effective management actions can be taken. Ground water can have many nitrate sources, both natural and anthropogenic (Madison and Brunett, 1985; Hallberg and Keeney, 1993; Spalding and Exner, 1993). Rain, forests, grasslands, agricultural lands, organic wastes (e.g., farm manures, sewage sludges, food-processing wastes, and crop residues), row crops, vegetable crops, and livestock production are all potential nitrate sources in ground water.
Nitrogen sources have increased over the last several decades (Smil, 1997; Vitousek et al., 1997). Nationally, nitrogen applications to agricultural lands have increased 20-fold over the last 50 yr, and the most dramatic increases have occurred over the last 30 yr (Puckett et al., 1999). On an annual basis, fertilizer is the largest input of nitrogen to most agricultural systems (Hallberg and Keeney, 1993). In North Carolina, confined feeding operations, particularly for hog production, have expanded rapidly: the number of hogs increased from 2.2 million in 1990 to more than 10 million in 1999, primarily in the Coastal Plain, making North Carolina the second largest producer of hogs in the United States (Mallin, 2000). In addition, human populations have increased as much as 40% since 1990 in some counties included in this study (United States Census Bureau, 2001). Because of increased nitrogen sources, the many potential regional or local nitrate sources to ground water, and increasing numbers of people in close proximity to these sources, identifying the predominant nitrate sources in ground water may not be easy. Reliable methods are needed that can be used by natural resources scientists and managers to identify sources of nitrate-contaminated ground water.
PREVIOUS STUDIES
Several studies have been conducted over the last 30 yr to identify nitrate sources in ground water (Kreitler, 1975; Kreitler and Jones, 1975; Gormly and Spalding, 1979; Fogg et al., 1998) and surface water (Showers et al., 1990). Gormly and Spalding (1979) used isotopes of nitrogen and found that the primary nitrate sources in ground water in Nebraska and their corresponding δ15N ranges were +5 to +9‰ (per mil) for soil nitrogen, −2 to +7‰ for commercial fertilizer, and +10 to +23‰ for livestock. Komor and Anderson (1993) used δ15N to distinguish nitrate sources in ground water beneath five land-use settings in Minnesota and found that water from wells in livestock feedlots had an average δ15N value of 21.3‰; in cultivated irrigated fields, 7.4‰; in residential areas with septic systems, 6‰; in nonirrigated cropland, 3.4‰; and in natural undeveloped areas, 3.1‰. Several isotope chemists reported that δ15N values of 10‰ or greater (Kreitler, 1975; Gormly and Spalding, 1979; Aravena et al., 1993; Fogg et al., 1998; Kendall and McDonnell, 1998) indicate that nitrogen from animals is present. In general, δ15N has been demonstrated to be an effective discriminator between plant- or commercial fertilizer-derived nitrate and animal-derived nitrate, but divisions between multiple animal sources and humans are less well defined (Fogg et al., 1998; Kendall and McDonnell, 1998). However, Fogg et al. (1998) indicated that separations between septic and dairy or feedlot sources were possible; based on their data, septic wastes had a δ15N signature ranging from 7.3 to 10.3‰, whereas the δ15N signature of the animal sites ranged from 10 to 14‰.
Thus, although δ15N of nitrate can be used to distinguish animal-derived nitrate from nitrate derived from soil organic N or inorganic fertilizer, it has not been successfully used alone to distinguish between subcategories of animal-derived nitrate in ground water. Even coupling δ15N with other isotopes, such as δ18O, has not been particularly successful for determining differences between animal sources. Nitrate δ15N data in combination with other water quality variables, such as ions or ionic ratios, however, may be effective in distinguishing animal sources. For example, halogen ratios have been used to identify specific oil-field brines or salt contamination of freshwater aquifers (Whittemore and Pollock, 1979) or to discriminate among precipitation, natural ground water, domestic wastes, and saltwater contamination from evaporites (Davis et al., 1998). By including more variables in the source-identification process, the probability of successfully discriminating among animal sources should increase. Karr et al. (2001) recently coupled the information from both major ion and stable isotope chemistry of ground and surface water to identify sources of nitrate contamination.
MULTIVARIATE STATISTICAL METHODS
Multivariate techniques, both computational and graphical, have been applied to determine the natural phenomena that control ground water quality. Waters associated with specific sources, such as aquifers or petroleum reservoirs, often can be distinguished by using trilinear and pattern diagrams, such as those devised by Piper (1944) and Stiff (1951). Hem (1985) presents several examples of the use of Piper diagrams for distinguishing water composition derived from specific aquifers. These techniques generally work because the ions used for source identification either are derived from minerals dissolved by water moving through the rock matrix of the natural reservoir or come from connate waters that provide a unique signature of the source. However, the same property that makes these diagrams (which use only seven or eight ions) effective at discerning a few natural sources makes it considerably more difficult to discern anthropogenic sources with so few ions, because many different natural and anthropogenic sources produce similar concentrations of the same few ions. More sophisticated multivariate techniques, which can incorporate information from many more ions, isotopes, and associated properties to detect unique combinations of variables that identify each source, therefore become necessary.
Multivariate statistical methods, capable of distinguishing complex relations among many variables, can be useful for source-identification problems. Alley (1993) presented an excellent overview of multivariate statistical techniques that have been applied to examine phenomena associated with water quality and to understand behavior and spatial patterns of water quality constituents. These techniques include cluster analysis, principal components analysis (PCA), and factor analysis. Steinhorst and Williams (1985) applied multivariate analysis, including analysis of variance, canonical analysis, and discriminant analysis to segregate ground water sources and to differentiate water quality associated with particular aquifers in basalt flows and interbeds in south-central Washington. Multivariate procedures, however, have not been used extensively to determine contamination sources from human activities.
A primary assumption behind this study is that the variability in one or more chemical constituents caused by anthropogenic sources is greater than that caused by other possible natural sources, such as minerals in rocks and soils of the region; therefore, certain constituents can be related to waste-specific sources. The waste-specific sources that often contribute to nitrate contamination are septic-system wastes; fertilizers applied to lawns, row crops, and golf courses; hog wastes leaking from lagoons or sprayed on crops as fertilizer; and chicken wastes applied to crops as fertilizer (Madison and Brunett, 1985; Hallberg and Keeney, 1993).
When the objective of an analysis is to determine into which predefined category a particular observation belongs, discriminant analysis (Davis, 1985) and classification or regression trees (Wilkinson, 2000) are appropriate techniques. Discriminant analysis is a multivariate technique, related to multiple regression, whereby linear equations are found that best discriminate the observations into two or more groups (Wilkinson, 2000). Although either discriminant analysis or classification-tree models are appropriate for the problem of classifying observations into predefined groups, classification-tree techniques have several advantages over discriminant analysis. The primary advantage of classification trees is that they are graphical and the output is more easily interpreted than strictly numerical methods, such as discriminant analysis (Breiman et al., 1984; StatSoft, 2001). As an example, classification-tree model output is hierarchical (StatSoft, 2001) and produces a visual representation of a dichotomous key, familiar to biologists, that visually and sequentially guides the user through a series of simple if-then statements from the beginning of the tree through a series of subgroups to the final group classification. Other advantages of classification trees over discriminant analysis procedures are that they are nonparametric (Breiman et al., 1984) and can incorporate categorical data, thus making classification-tree methods more versatile with respect to variables that can be included in model development.
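To make the dichotomous-key reading concrete, the Python sketch below encodes the four Model 1 splits summarized in the abstract as nested if-then statements. It is an illustration, not the authors' software: the `WaterSample` record, its field names, and the treatment of each ratio as a simple quotient of the reported concentrations are assumptions. The default split values are those quoted in the abstract and are exposed as parameters so other values can be tried.

```python
from dataclasses import dataclass


@dataclass
class WaterSample:
    """Hypothetical record holding the Model 1 predictor variables."""
    d15n_permil: float     # delta-15N of nitrate, per mil
    potassium_mg_l: float  # dissolved potassium, mg/L
    nitrate_mg_l: float    # nitrate, mg/L (assumed same basis as ammonia)
    ammonia_mg_l: float    # ammonia, mg/L
    sodium_mg_l: float     # dissolved sodium, mg/L
    zinc_ug_l: float       # dissolved zinc, ug/L


def classify_model1(s: WaterSample, kno315_split: float = 18.2,
                    no3nh4_split: float = 575.0, nak_split: float = 3.2,
                    zn_split: float = 2.8) -> str:
    """Walk the Model 1 key from the root node to a terminal node.

    Illustrative sketch only; defaults are the split values in the abstract.
    """
    kno315 = s.d15n_permil + s.potassium_mg_l  # composite root-node variable
    if kno315 > kno315_split:
        # Animal-derived N: septic wastes are sodium-rich relative to potassium.
        if s.potassium_mg_l <= 0:
            raise ValueError("potassium must be positive to form Na/K")
        if s.sodium_mg_l / s.potassium_mg_l > nak_split:
            return "Septic"
        # Hog spray vs. poultry litter is separated on dissolved zinc.
        return "Spray" if s.zinc_ug_l > zn_split else "Poultry"
    # Inorganic fertilizer or soil organic N: split on nitrate to ammonia ratio.
    if s.ammonia_mg_l <= 0:
        raise ValueError("ammonia must be positive to form NO3/NH4")
    if s.nitrate_mg_l / s.ammonia_mg_l > no3nh4_split:
        return "Crop"
    return "Golf"
```

For example, with made-up values, `classify_model1(WaterSample(15, 5, 10, 0.05, 40, 1))` follows the animal branch (15 + 5 = 20 > 18.2) and then the septic branch (40/5 = 8 > 3.2), returning `"Septic"`. How below-detection ammonia or potassium values should be handled is not addressed here and would need a project-specific rule.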
After reviewing statistical procedures in available software, classification trees were selected as a versatile tool that can be applied and understood effectively by those who may not have extensive statistical training. Even though many statisticians are not familiar with classification-tree techniques (Wilkinson, 2000), tree models and their development began in the 1960s in the field of social sciences and have, for about the last 20 yr, been extensively used in medicine, marketing, and information management. Regression-tree models (similar to classification-tree models) have only recently been applied to water quality problems. Qian and Anderson (1999) used regression trees to identify factors that affect pesticide concentrations in the Willamette River basin in Oregon. Robertson et al. (2001) used regression trees to identify important environmental variables that affect nutrient concentrations in watersheds in the upper Midwest.
The purpose of this study was to apply tree-based classification methods to (i) determine which water quality variables, both with and without δ15N, could be used to identify the source of nitrate contamination with 80% or better success using selected chemical characteristics of water samples from five known source categories, and (ii) determine if the chemical characteristics of water samples collected from wells in the North Carolina Coastal Plain and contaminated with nitrate can be used to identify the nitrate source. Ultimately, the intent of this study is to develop and demonstrate the potential of a simple predictive classification procedure that could be used and further developed by environmental scientists and regulators to identify principal nitrate sources present in ground water in a specific geographic area and perhaps apply these procedures to similar environmental problems. Throughout the remainder of this paper, the δ15N of nitrate will simply be referred to as δ15N.
METHODS
Five common nitrate sources were selected for the analysis: hog wastes sprayed on cultivated fields (Spray), poultry wastes applied as litter (Poultry), septic-system wastes (Septic), inorganic fertilizer applied on golf courses (Golf), and inorganic fertilizer applied on row crops (Crop). Permission was obtained to sample ground water from 4 to 15 locations per category in the Coastal Plain of North Carolina (Fig. 1). Ground water samples were collected directly beneath each source area or, in the case of septic wastes, in the septic field or beneath fields sprayed with septic wastes. Forty-eight ground water samples from 48 wells were included for development of the model.
Wells included in the study were screened to intercept at least the upper 1.5 m of the saturated zone near the water table and were intended to intercept recent (<2 yr old) vertical recharge. The water table of the shallow aquifer usually is located within 3 m of the land surface in the North Carolina Coastal Plain, and depth to water ranges between 1 and 3 m below land surface. United States Geological Survey (USGS) wells in the study area intercepted the upper 0.3 to 0.6 m of the saturated zone. In general, areas having sandy soils were selected for sampling to maximize the probability of nitrate contamination and to ensure that enough oxygen was present to maintain nitrate. Although only water samples having NO3-N concentrations greater than 3 mg/L were to be collected (concentrations were estimated by using nitrate test strips), laboratory results showed that a few samples had lower concentrations. Four samples had concentrations too low (<0.5 mg/L) for δ15N analysis and were not used. Twenty-six wells were installed and/or used by the North Carolina Department of Environment and Natural Resources (NCDENR) as monitoring wells for a study of pesticides and nitrate in North Carolina ground water (Wade et al., 1997), for studies of onsite waste disposal, or for other studies. Wells installed by the NCDENR typically were constructed of polyvinylchloride (PVC) with 1.5- to 3-m screens located in the saturated zone of the aquifer beneath the contaminant sources. The USGS installed temporary wells using a minipiezometer assembly (Winter et al., 1988) at 16 of the sites. The minipiezometer was hammered to the desired depth, the 2.5-cm screen extended, and the water sample collected through polytetrafluoroethylene (PTFE) or nylon tubing using a peristaltic pump. North Carolina State University installed six shallow PVC wells that were used in this study.
Each water sample was analyzed for 32 water quality variables that were included in model development (Table 1). Selected water quality data collected from the 48 wells are presented in Table 2. Water samples from 17 additional wells, most with 0.5- to 1.5-m screens, were used to test the resulting models; these samples were collected as part of other USGS and NCDENR/North Carolina Department of Agriculture (NCDA) studies conducted in the study area (Table 3). All water samples collected between August 1996 and February 2000 were filtered through a 0.45-µm capsule filter by using either a peristaltic or submersible pump fitted with either PTFE or nylon tubing. The USGS National Water-Quality Laboratory in Denver, Colorado, analyzed major inorganic ions and nutrient species according to methods in Fishman (1993). Either the Stable Isotope Laboratory at North Carolina State University or the USGS Stable Isotope Research Laboratory in Menlo Park, California, analyzed samples for δ15N of nitrate. Determinations of δ15N were done according to methods presented in Chang et al. (1999) and Silva et al. (2000). Either the USGS National Water-Quality Laboratory or the NCDENR Division of Water Quality Laboratory analyzed the additional 17 well-water samples, which were collected as part of the Albemarle-Pamlico study of the USGS National Water-Quality Assessment (NAWQA) Program (Spruill et al., 1998) or for the North Carolina Interagency Pesticide Study (Wade et al., 1997).
Two classification-tree models were devised by using the classification and regression tree (CART) procedure (Breiman et al., 1984) on the original 48-sample data set. Model 1 included nitrate δ15N because it is known to be highly valuable in discriminating animal and fertilizer nitrate. However, δ15N may not be available because of its cost or because it is not a standard analyte in most ground water monitoring networks. Therefore, all variables except δ15N were used in devising Model 2.
The basic idea behind classification-tree models is to create a hierarchical tree of key variables and values based on a sample of objects of known classes (termed the learning sample); the resulting tree is then used to predict classes for another, independently obtained sample having the same variables but unknown classes (termed the test sample). Classification-tree procedures employed by many statistical programs begin by separating the initial group composed of all observations (termed the root node, which is also a parent node or split node) into two more homogeneous groups (termed child nodes) (Fig. 2). The program does this by examining all possible variables and then selecting the best variable (termed the split variable) and split value to divide the group into two more homogeneous groups (the nodes that have the fewest misclassifications, or lowest "impurity", and the greatest reduction in error from the previous node). The two resulting child groups then become new parent nodes, and the program splits each of them into two more child nodes. This process continues until all of the objects or observations are classified. The groups formed at the end of the tree, which cannot be split any more, form the terminal nodes of the tree (Fig. 2).
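The splitting procedure just described can be sketched as greedy recursive binary partitioning. The Python below is a minimal illustration in the spirit of CART, not the CART or RPART implementations used in this study. It assumes numeric predictors, candidate thresholds at midpoints between adjacent observed values, node error measured as node size times the GINI index discussed below, and a hypothetical minimum node size `min_node`. Pruning, surrogate splits, priors, and misclassification costs are omitted.

```python
from collections import Counter


def gini(labels):
    """GINI impurity i(t) = 1 - sum_j p(j|t)^2 of the class labels in a node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Try every variable and every midpoint between adjacent observed values;
    return (error decrease, variable index, threshold) for the best split."""
    parent_error = len(labels) * gini(labels)  # node error weighted by node size
    best = None
    for var in range(len(rows[0])):
        values = sorted({r[var] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[var] <= t]
            right = [y for r, y in zip(rows, labels) if r[var] > t]
            child_error = len(left) * gini(left) + len(right) * gini(right)
            decrease = parent_error - child_error
            if decrease > 1e-12 and (best is None or decrease > best[0]):
                best = (decrease, var, t)
    return best


def grow(rows, labels, min_node=2):
    """Greedy recursive binary partitioning; stops at pure or small nodes."""
    leaf = {"class": Counter(labels).most_common(1)[0][0], "n": len(labels)}
    if len(labels) < min_node or gini(labels) == 0.0:
        return leaf
    split = best_split(rows, labels)
    if split is None:  # no split reduces the error: terminal node
        return leaf
    _, var, t = split
    go_left = [r[var] <= t for r in rows]
    return {
        "var": var,
        "split": t,
        "left": grow([r for r, g in zip(rows, go_left) if g],
                     [y for y, g in zip(labels, go_left) if g], min_node),
        "right": grow([r for r, g in zip(rows, go_left) if not g],
                      [y for y, g in zip(labels, go_left) if not g], min_node),
    }


def predict(node, row):
    """Follow the if-then path from the root node to a terminal node."""
    while "class" not in node:
        node = node["left"] if row[node["var"]] <= node["split"] else node["right"]
    return node["class"]
```

Because splitting continues until nodes are pure, this sketch grows the large, possibly overfitted trees whose sizing problem is discussed later in this section. A real analysis would prune or cross-validate the result.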

Fig. 2. Diagram of hypothetical classification tree showing node types, split variables, and associated split values.
A variety of tree models, including THAID (Morgan and Messenger, 1973), CART (Breiman et al., 1984), FACT (Loh and Vanichsetakul, 1988), and QUEST (Loh and Shih, 1997), are available through several statistical software programs, and different tree models may generate different trees according to the classification algorithms they employ (StatSoft, 2001). Specific splitting algorithms for many of these programs are discussed in Loh and Shih (1997). The CART procedure (Breiman et al., 1984) and a variation, RPART (Therneau and Atkinson, 1997), both used in this analysis, evaluate all variables to determine which variable can make the best split (i.e., the variable that splits the parent group into the two purest child groups) using the GINI index of impurity, i(t) (Breiman et al., 1984). The GINI index is a measure of the total error in any node (also known as deviance, Di, for classification trees) and is computed by:

$$ i(t) = 1 - \sum_{j} p(j \mid t)^{2} $$

where the sum is taken over the j classes in node t and p(j|t) is the proportion of class j at the node (Loh and Shih, 1997). Thus, if the first, or root, node contains four classes in equal proportion, then the GINI index is 1 − [(1/4)² + (1/4)² + (1/4)² + (1/4)²] = 1 − 1/4 = 0.75. A node with only one class (all observations perfectly classified) would have a GINI index value of 1 − (1)² = 0. The error after the split is the sum of the errors of the two resulting child nodes, Di(child) = Di(left child) + Di(right child). The variable selected is the one that most reduces the error between the parent and the sum of the errors of the two new child nodes:

$$ \Delta D_i = D_i(\text{parent}) - \left[ D_i(\text{left child}) + D_i(\text{right child}) \right] $$
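The GINI arithmetic above can be checked with exact fractions. This short Python sketch assumes that the node error Di is the GINI index multiplied by the number of observations in the node, which is one common convention. The class counts in the split example are hypothetical.

```python
from fractions import Fraction


def gini_index(counts):
    """GINI impurity i(t) = 1 - sum_j p(j|t)^2 from the class counts at node t."""
    n = sum(counts)
    return 1 - sum(Fraction(c, n) ** 2 for c in counts)


def node_error(counts):
    """Node error D = n * i(t), i.e. the GINI index weighted by node size (assumption)."""
    return sum(counts) * gini_index(counts)


def error_decrease(parent, left, right):
    """Delta D = D(parent) - [D(left child) + D(right child)]."""
    return node_error(parent) - (node_error(left) + node_error(right))


# Worked example from the text: four classes in equal proportion.
print(gini_index([5, 5, 5, 5]))   # 3/4
print(gini_index([12]))           # 0 (pure node)
# Hypothetical split of an 8-observation root node with 2 observations per class.
print(error_decrease([2, 2, 2, 2], [2, 2, 0, 0], [0, 0, 2, 2]))  # 2
```

The split algorithm would evaluate `error_decrease` for every candidate variable and split value and keep the largest.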
A succinct description of the GINI index is presented in StatSoft (2001) and Qian and Anderson (1999). It should be noted that the models developed in this paper are not necessarily unique, and it is possible that the model algorithm could select more than one competing variable or split value, particularly with small sample sizes. However, both CART (Breiman et al., 1984) and RPART (Therneau and Atkinson, 1997) were used in the model development process and resulted in very similar models.
An important consideration in devising tree models is the construction of the "right-sized" classification tree (StatSoft, 2001). In essence, how large should the tree be to give the needed predictive accuracy without becoming too complex? For example, it may be possible to construct (or "grow") a tree that perfectly classifies all objects, but the resulting tree could be very long and complex, possibly ending with each observation in its own terminal node. A tree that is too short (having too few split nodes) will often have a higher predictive error (or cost) than a more complex tree with more splits and nodes. The issue of when to stop building the tree is a major topic in the classification-tree literature, and good discussions of the principal methods available (including test sample cross-validation, V-fold cross-validation, and global cross-validation) are presented in Breiman et al. (1984) and StatSoft (2001). However, because the intent of this study was largely exploratory and the sample size of 48 observations in five separate groups was very small, the rigorous development of a final, fully cross-validated tree model was not the focus of this paper.
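V-fold cross-validation, one of the methods cited above, can be sketched as follows. Because this study did not fully cross-validate its trees, the sketch only illustrates the idea. To keep it short and self-contained, a one-split "stump" on a single variable (a hypothetical stand-in) takes the place of a full classification tree. The fold count `v` and random `seed` are arbitrary choices.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Stand-in classifier (hypothetical): a single split on one numeric
    variable chosen to minimize misclassifications in the training data."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = Counter(y for x, y in zip(xs, ys) if x <= t)
        right = Counter(y for x, y in zip(xs, ys) if x > t)
        errors = (sum(left.values()) - max(left.values())
                  + sum(right.values()) - max(right.values()))
        if best is None or errors < best[0]:
            best = (errors, t, left.most_common(1)[0][0], right.most_common(1)[0][0])
    if best is None:  # only one distinct value: always predict the majority class
        majority = Counter(ys).most_common(1)[0][0]
        return lambda x: majority
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label


def v_fold_error(xs, ys, v=5, seed=0):
    """V-fold cross-validation estimate of the misclassification rate."""
    if not 2 <= v <= len(xs):
        raise ValueError("need 2 <= v <= number of observations")
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::v] for k in range(v)]  # V disjoint, nearly equal folds
    wrong = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_stump([xs[i] for i in train], [ys[i] for i in train])
        wrong += sum(model(xs[i]) != ys[i] for i in fold)
    return wrong / len(xs)
```

Each observation is predicted exactly once by a model that never saw it. With groups as small as those in this study, the estimate would vary noticeably with the choice of `seed`.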
In addition to the standard analysis of tree models, the classification success of the terminal nodes of both models (evaluated simply as the percentage of correct classifications of each group) was used to estimate the predictive classification potential of each model, similar to classification matrices produced by discriminant analysis procedures in several commercially available statistical programs. The 48 water analyses shown in Table 2 compose the learning sample by which both classification-tree models were constructed. These are the original observations (i.e., water samples with variables selected by the program for construction of Model 1 or Model 2) that form the basis for each model. If the performance were good (80% classification success or better on the learning sample was considered to be acceptable), there would be a basis for adopting the model for practical use or further development to test the model's predictive power and reliability.
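The per-category success percentages described above are the diagonal counts of a classification matrix divided by the row (actual-category) totals. Here is a minimal Python sketch that uses made-up labels rather than this study's data. The 80% check mirrors the acceptance criterion stated in the text.

```python
from collections import Counter


def classification_matrix(actual, predicted):
    """Counts of (actual category, predicted category) pairs."""
    return Counter(zip(actual, predicted))


def success_rates(actual, predicted):
    """Percent of each actual category classified correctly, and overall percent."""
    matrix = classification_matrix(actual, predicted)
    totals = Counter(actual)
    per_category = {c: 100.0 * matrix[(c, c)] / n for c, n in totals.items()}
    overall = 100.0 * sum(matrix[(c, c)] for c in totals) / len(actual)
    return per_category, overall


# Hypothetical labels for illustration only (not data from this study).
actual = ["Crop", "Crop", "Crop", "Crop", "Golf", "Golf"]
predicted = ["Crop", "Crop", "Crop", "Golf", "Golf", "Golf"]
per_category, overall = success_rates(actual, predicted)
print(per_category)        # {'Crop': 75.0, 'Golf': 100.0}
print(round(overall, 1))   # 83.3
print(overall >= 80.0)     # True: meets the 80% criterion
```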
Testing on an independent sample and comparing classification success for each category between the learning sample and the test sample can demonstrate the practical predictive performance of the model (model validation). However, Model 1 could not be validated with an independent sample, because its primary split variable included δ15N, which was not available for water samples from most wells where the nitrate source was known. All variables identified by Model 2 were available, and the predictive success of Model 2 was validated by using water analyses from an independently obtained test sample of 17 wells not used in constructing Model 2 (Table 3). A Kruskal-Wallis test (Conover, 1980) was used to evaluate differences in the distributions of model variables among the five sources.
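The Kruskal-Wallis comparison can be sketched in a few lines of Python. This is a generic textbook form of the test, not the authors' software. It ranks all observations together, computes H with the usual tie correction, and compares H with a chi-square distribution that has k − 1 degrees of freedom. The closed-form chi-square tail used here covers only even degrees of freedom, which includes the five-source case (k − 1 = 4). The chi-square approximation is also known to be rough for very small groups, such as the four Golf samples.

```python
import math


def average_ranks(values):
    """Ranks 1..N; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 through j+1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H for k groups, with the usual correction for ties."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        total += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_sizes = {}
    for x in pooled:
        tie_sizes[x] = tie_sizes.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are tied")
    return h / correction


def chi2_upper_tail_even_df(x, df):
    """P(chi-square > x); closed form valid only for even degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
```

For five source categories, pass one list of values (for example, sodium to potassium ratios) per category to `kruskal_wallis_h`. Then pass the result to `chi2_upper_tail_even_df(h, 4)` to obtain an approximate p-value.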
RESULTS AND DISCUSSION
A classification-tree model (Model 1, Fig. 3) was devised by using all 32 variables (including δ15N). The classification tree consists of four splits and five terminal nodes. Only 46 of the original 48 samples were used because zinc data were missing for two of the water samples. The most important variables in this classification tree were potassium plus δ15N of nitrate (KNO315), nitrate to ammonia ratio (NO3NH4), sodium to potassium ratio (NAK), and zinc (ZN). The resulting classification matrix for evaluating Model 1 performance on the learning sample is shown in Table 4. Source classification of contamination by inorganic fertilizer in both the Crop and Golf categories resulted in 100% correct placement. The Septic category nitrate sources were classified with 75% success. Water samples from the Poultry category were placed with 71% success. Overall correct classification performance of Model 1 was approximately 88% for all five categories. Because all observations with δ15N of nitrate were used to develop the model, no independently collected observations (water samples) were available to test model performance.

Fig. 3. Classification tree for Model 1 using the predictor variables potassium plus δ15N of nitrate (KNO315), nitrate to ammonia ratio (NO3NH4), sodium to potassium ratio (NAK), and dissolved zinc (ZN), in micrograms per liter.
Model 2 was formulated without δ15N data. All 48 samples were used in model development. The resulting model included the sum of sodium plus potassium (NAKSUM), nitrate to ammonia ratio, calcium to magnesium ratio (CMR), and sodium to potassium ratio (Fig. 4). Classification success ranged from 100% for ground water from beneath fertilized golf courses to 71% for water collected from beneath fields fertilized with poultry litter (Table 5a). Overall classification success for the model on the learning sample was about 85%, similar to Model 1. Seventeen samples collected from other areas in the Coastal Plain for three of the five categories were used to validate Model 2. Classification success for the Crop, Spray, and Septic categories was 100% (Table 5b).

Fig. 4. Classification tree for Model 2 using the predictor variables sum of sodium plus potassium (NAKSUM), nitrate to ammonia ratio (NO3NH4), calcium to magnesium ratio (CMR), and sodium to potassium ratio (NAK).
Application of classification trees to ground water quality data from eastern North Carolina appears to be very useful in identifying nitrate sources. Model 1 identified four variables important in discriminating among the five groups: potassium plus δ15N of nitrate, nitrate to ammonia ratio, sodium to potassium ratio, and zinc. Consistent with previous work, much of it summarized in Kendall and McDonnell (1998), δ15N of nitrate is very useful in distinguishing animal sources of N from the other two major environmental sources of N, soil organic N and fertilizer N. For discussion purposes, another model (not shown) was constructed by using only δ15N; it had a model-derived split value (SV) of about 8.5‰ and correctly classified most soil organic and/or inorganic fertilizer sources and animal-based N sources. Based on the learning sample, the model using δ15N alone correctly classified 17 of 18 fertilizer- or organic N-derived nitrate samples and 29 of 30 animal-source samples. Adding potassium (in milligrams per liter) to the δ15N values (in per mil), however, better separated the animal-derived from the inorganic- and/or plant-derived nitrate nitrogen (i.e., caused less overlap of the distributions) than δ15N alone, as shown in Fig. 5, and this sum was selected by CART for this data set as the best first split. The primary improvement appears to result from a better ability to separate poultry sources from the inorganic- and/or soil organic N-derived nitrogen sources, and the Golf category from the animal-derived N sources.
In Model 1, the best discriminator of Golf from Crop samples for the model run shown was the nitrate to ammonia ratio (split value = 575). In general, the Golf water samples had much lower nitrate nitrogen concentrations (median = 2.9 mg/L) than the Crop samples (median = 14.5 mg/L). However, some model runs used nitrate concentrations (model not shown) or other nitrate-related ratios (nitrate to potassium ratio for Model 2) to separate these two groups. The sample size for the Golf category (N = 4), however, was so small that Crop and Golf samples may not be distinguishable in general unless ground water nitrate concentrations are consistently lower beneath golf courses than beneath cultivated fields. Thus, although ground water beneath golf courses appears to have lower nitrate concentrations than ground water beneath row crops, many more randomly selected water samples, stratified by source, would need to be collected to reach such a conclusion.
The best discriminator of septic waste from other animal-derived N sources was the sodium to potassium ratio. Based on information shown in Fig. 6, sodium concentrations in ground water contaminated by septic wastes were higher than those in ground water contaminated by other animal-derived wastes, and the sodium to potassium ratios of septic wastes were significantly higher (median of approximately 14, p < 0.05) than those of the other categories investigated (median of all other categories < 3). Wilhelm et al. (1994) used sodium concentrations to identify septic-system contamination at a site in Canada. The concentrations were approximately 10 times the background sodium concentration of the ground water (Wilhelm et al., 1994), and the ratio of sodium to potassium in these septic wastes was about 8. Data for swine lagoon wastes (Zublena et al., 1993b), stockpiled broiler or layer litter (Zublena et al., 1993a), and common fertilizers (Zublena et al., 1991) indicate sodium to potassium ratios that are all less than 0.5, much lower than the ratio (approximately 7.5 to 8) indicated by data from Wilhelm et al. (1994) for septic wastes. The sodium to potassium ratios for septic wastes in the North Carolina Coastal Plain (Fig. 6) had a median of about 14, with 75% of the samples exceeding 8, which is comparable with the ratio reported by Wilhelm et al. (1994). The data from our study suggest that sodium relative to potassium is much higher in septic wastes than in either of the other animal-derived wastes, which may be due to the preponderance of sodium in the typical human diet and the use of salt in water softeners in rural areas. In any case, the sodium to potassium ratio appears to be a good identifier of septic-system wastes within the study area.

Fig. 6. Distributions of (A) NA (sodium, in milligrams per liter) and (B) NAK (sodium to potassium ratio, unitless) in five source categories, showing the greater separation between the septic and the other two animal-source categories when NAK is used.
After segregating the septic wastes from the poultry and hog-spray wastes (sodium to potassium ratio <3.2, Fig. 3), zinc was useful for further separating the hog and poultry wastes. In the model, a zinc concentration greater than 2.8 µg per liter (µg/L) indicated hog wastes, whereas concentrations less than 2.8 µg/L indicated poultry wastes. Zinc is added to hog feed as a growth enhancer (National Research Council, 1998), which may explain the higher concentrations observed in ground water samples collected beneath crops fertilized with hog spray.
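Taken together, the four splits of Model 1 form a small decision tree. The Python sketch below encodes that tree using the split values stated in the abstract (δ15N of nitrate plus potassium 18.2, nitrate to ammonia ratio 575, sodium to potassium ratio 3.2, zinc 2.8). Several details are assumptions, not taken from the paper: the first variable is treated as the simple sum of δ15N (per mil) and potassium (mg/L), zinc is in µg/L, and values exactly equal to a split value are sent to the lower-value branch because the abstract states only ">" and "<" rules. The class and function names are hypothetical; this is a reading aid, not the authors' software.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    d15n: float           # delta 15N of nitrate, per mil
    k_mg_l: float         # potassium, mg/L (unit assumed)
    no3_nh3_ratio: float  # nitrate to ammonia ratio, unitless
    na_k_ratio: float     # sodium to potassium ratio, unitless
    zn_ug_l: float        # zinc, ug/L


def classify_model1(s: Sample) -> str:
    """Model 1 split rules as stated in the abstract.

    Values equal to a split value go to the lower-value branch (an assumption).
    """
    if s.d15n + s.k_mg_l > 18.2:
        # Animal-derived N: septic vs. (hog spray or poultry)
        if s.na_k_ratio > 3.2:
            return "Septic"
        return "Spray" if s.zn_ug_l > 2.8 else "Poultry"
    # Inorganic fertilizer or soil organic N: crops vs. golf courses
    return "Crop" if s.no3_nh3_ratio > 575 else "Golf"


# Hypothetical sample: 12.0 + 9.0 = 21.0 > 18.2, Na/K 1.2 <= 3.2, Zn 5.0 > 2.8
print(classify_model1(Sample(12.0, 9.0, 200.0, 1.2, 5.0)))  # Spray
```

Reading the tree as nested conditionals makes the order of decisions explicit: the nitrate to ammonia ratio is consulted only for samples already assigned to the inorganic or soil organic N branch, and zinc only for animal-derived samples that failed the septic test.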
From the performance data shown in Table 4 for the learning sample, Model 1 appears to be an excellent discriminator of nitrate from inorganic fertilizer on crops, from golf courses, and from sprayed hog wastes (100, 100, and 92% success, respectively). Model 1 did not do as well in discriminating poultry and septic sources, as indicated by the lower classification-success rates (71 and 75%, respectively; Table 4). This may be because, as previous researchers have shown, δ15N values of septic sources span a wide range (7.3 to 10‰) that grades into values in both the Crop and Golf categories (Fig. 5), making discrimination difficult. Adding potassium to δ15N did not reduce this overlap (Fig. 5); the lower tail of the Septic distribution still overlaps the Crop and Golf categories.
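The per-category rates cited from Table 4 are the fraction of learning samples in each known source category that the tree assigns to that category, and the overall rate is the fraction of all samples classified correctly. A minimal Python sketch of that bookkeeping follows; the function name and the (known, predicted) pairs are hypothetical and are not the Table 4 results.

```python
from collections import defaultdict


def success_rates(pairs):
    """pairs: iterable of (known_category, predicted_category).

    Returns (per_category_percent, overall_percent); each per-category rate is
    correct / total for samples whose known category is that key.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for known, predicted in pairs:
        totals[known] += 1
        if known == predicted:
            correct[known] += 1
    per_cat = {cat: 100.0 * correct[cat] / n for cat, n in totals.items()}
    overall = 100.0 * sum(correct.values()) / sum(totals.values())
    return per_cat, overall


# Hypothetical results for eight samples; not the Table 4 results.
pairs = [
    ("Crop", "Crop"), ("Crop", "Crop"),
    ("Septic", "Septic"), ("Septic", "Septic"), ("Septic", "Septic"), ("Septic", "Crop"),
    ("Poultry", "Poultry"), ("Poultry", "Spray"),
]
print(success_rates(pairs))
```

Applied to an independent test set rather than the learning sample, the same calculation gives a validation rate such as the one reported for the 17 Model 2 test samples. With only a handful of samples in a category, one misclassification moves its rate by tens of percentage points.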
Thus, although δ15N by itself is not particularly successful in separating specific animal sources (Kendall and McDonnell, 1998) and shows no difference between animal categories in the area studied in the Coastal Plain of North Carolina (Fig. 5), using it in combination with other isotopes (such as δ18O, as suggested by Kendall and McDonnell, 1998) or ions, as demonstrated by the results in this paper, can potentially separate samples by animal-source category. An advantage of using major ions rather than additional isotopes is the generally lower cost of major-ion analyses. Although major ions alone can be used effectively in eastern North Carolina and probably in most areas where the specific conductance of shallow ground water is 350 µS/cm or less, specific models probably will need to be devised for areas where specific conductance is typically greater than this. Such areas include coastal areas and parts of the western and midwestern United States where evaporite deposits or saltwater intrusion occurs. In these areas, δ15N is probably the best indicator of nitrate sources, and further separation of nitrate sources by using major ions may be difficult or may require trace elements or other isotopes.
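The δ15N values discussed here use standard delta notation: the 15N/14N ratio of the sample (R) relative to that of a reference standard, expressed in parts per thousand (per mil, ‰), δ15N = (R_sample / R_standard - 1) × 1000, with atmospheric N2 as the usual nitrogen reference. A minimal Python sketch of the conversion follows. The reference ratio 0.0036765 is a commonly cited value for air N2 and is an assumption here, since the paper does not state it; the constant and function names are hypothetical.

```python
# Commonly cited 15N/14N ratio of atmospheric N2 (assumed reference value).
AIR_N2_15N_14N = 0.0036765


def delta15n_permil(r_sample: float, r_standard: float = AIR_N2_15N_14N) -> float:
    """delta 15N in per mil: (R_sample / R_standard - 1) * 1000, where R = 15N/14N."""
    return (r_sample / r_standard - 1.0) * 1000.0


def ratio_from_delta(delta_permil: float, r_standard: float = AIR_N2_15N_14N) -> float:
    """Inverse conversion: the 15N/14N ratio corresponding to a delta value."""
    return r_standard * (1.0 + delta_permil / 1000.0)


# A sample whose 15N/14N ratio is 1% higher than that of air N2 has delta 15N of +10 per mil.
print(round(delta15n_permil(AIR_N2_15N_14N * 1.01), 6))
```

The per mil scale shows why the differences among sources are subtle in absolute terms: the whole septic range cited above (7.3 to 10‰) corresponds to a change of less than 0.3% in the underlying isotope ratio.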
Nevertheless, in North Carolina and perhaps other areas of the East Coast where shallow ground water has relatively low dissolved solids, major ions can be used effectively to identify sources, as indicated by the results for Model 2 (Fig. 4, Table 5a). In this model, sodium plus potassium (in mg/L) was an excellent discriminator between inorganic and/or soil organic N and animal-derived nitrate sources: only one crop-fertilizer-derived water sample was misclassified as septic-derived N, and only one septic-derived sample was classified as nitrate from an inorganic fertilizer source (Table 5a). The overall classification success rate for Model 2 on the learning sample was 85%. The primary distinguishing characteristic of water samples from golf courses was their low nitrate concentration, although the statistical limitations of using it for this purpose have already been noted. As in Model 1, Model 2 used the nitrate to ammonia ratio to best distinguish the Golf and Crop categories, although the split value (454) was lower in this model. The calcium to magnesium ratio (split value = 2.9) best distinguished poultry from hog spray, and the sodium to potassium ratio best distinguished septic from hog spray. The calcium to magnesium ratio performed identically to zinc in Model 1 in identifying poultry sources (71% success, Table 4). Calcium and magnesium may be easily leached in the North Carolina Coastal Plain, where the cation exchange capacity (CEC) is typically low (<2 cmolc/kg). Cation mobility may therefore be greatly enhanced in much of the Coastal Plain, which may allow cations to be used for source identification in this and other areas having low CEC.
Although additional samples would be desirable in formulating a more precise model, both Model 1 and Model 2 appear to be effective in identifying nitrate from specific sources, at least for the inorganic fertilizer-derived (Crop, Golf) and animal-derived (Spray and Septic) nitrate categories. Model 2 was tested using 17 water samples that were not used in model formulation, yielding a 100% classification success rate for the three categories (Crop, Septic, and Spray) for which data were available. The reliability of the model is further substantiated by one well in the test data set that was identified as an inorganic fertilizer source when sampled in 1995 (GR-851995; Table 3) and as a hog-waste spray source when sampled in 1999 (GR-851999; Table 3). Hog spray was indeed used after 1995 for fertilizing crops grown in this field, and the model correctly identified the nitrate source for each time period. Water samples from L2 in 1995 (L21995; Table 3) and 2000 (L22000; Table 3) both indicated inorganic fertilizer and/or soil organic nitrogen as the source; this area is not affected by spray and is upgradient from fields that received spray. In addition, water from two drainage ditches (MS4D1 and MS4D2; Table 3), which drain fields fertilized with inorganic fertilizer and hog spray, respectively, was classified correctly by the model.
An important finding of this study was that, with the exception of nitrate, no anion was identified as an important classification variable. These results suggest that although anions generally are more mobile in water, their concentrations do not differ significantly among source categories in shallow ground water of the North Carolina Coastal Plain. Even nitrate was important only in distinguishing nitrate from crop fertilizer and from golf course fertilizer, and all four golf course samples used had lower nitrate, which may or may not be representative of golf courses generally. No significant differences were found among categories for sulfate (p > 0.05). Chloride in the Septic category was significantly higher (p < 0.05) than in the Crop, Golf, and Poultry categories, but not the Spray category (p > 0.10); because chloride did not separate Septic from Spray samples, the model selected sodium rather than chloride.
CONCLUSIONS
There are many possible applications of the classification-tree models presented in this paper. These include determining nitrate sources in wells that appear unusual (e.g., determining the source of high nitrate concentrations in the vicinity of other wells that have much lower concentrations); determining the principal source of high nitrate where multiple sources may be contributing (e.g., septic tank vs. nearby chicken or crop-farming operations); and evaluating the effectiveness of management actions (e.g., eliminating a source of contamination, such as a leaking sewer or spray application).
The classification-tree models developed in this study proved useful in identifying variables that are important in the source-identification process, and they indicate that δ15N, dissolved calcium, magnesium, sodium, potassium, nitrate, ammonia, and zinc are potentially useful in identifying dominant nitrate sources in ground water in sandy recharge areas of the Coastal Plain. Anions in general were not identified in the modeling process as important in discriminating nitrate sources in the study area, although further work and larger sample sizes will be needed to verify this. Although the classification-tree models may be applied as presented here, they are neither unique nor the only models possible, and additional ground water samples collected throughout the North Carolina Coastal Plain will be needed to better identify particular nitrate sources and improve the models, particularly for septic and poultry sources. Although this process may lead to more complicated tree models, it could also result in more precise classifications.
Although the simple models presented in this paper may be suitable for shallow aquifers in the North Carolina Coastal Plain and much of the middle Atlantic Coastal Plain, applications that include other sources or contaminants (e.g., gas stations, landfills) or other areas would require data from additional ground water sites, with samples collected from known sources, as was done in this study. Classification-tree models are available in many statistical computer packages, are relatively easy to implement and interpret, and appear to classify sources at a level of reliability that is practically useful.
The nitrate-source identification techniques used here appear to be generally useful in the Coastal Plain of North Carolina and possibly other areas having shallow ground water and low specific conductance, although further research is necessary to address questions about mixtures of sources, the influence of oxidation-reduction conditions in the aquifer, degradation or sorption of particular chemical indicators along flow paths, and interference from high background concentrations of ions that are used as indicators. As has been noted already, δ15N appears to be a reliable indicator under conditions where other chemical indicators would not be as effective. Thus, inclusion of δ15N in analyses is almost always advantageous for identifying sources and for establishing model plausibility. Data presented in this paper also demonstrate that routine inclusion of major ions in water quality studies, even those not specifically directed at understanding geochemistry, can yield information that is highly useful, if not necessary, for meaningful data interpretation.
ACKNOWLEDGMENTS
This project is a cooperative effort between the United States Geological Survey (USGS), the North Carolina Department of Environment and Natural Resources (NCDENR), and the United States Environmental Protection Agency (USEPA). Thanks to the many landowners, farmers, golf course managers, and others in eastern North Carolina who allowed access to their property. Special thanks to the USGS National Water-Quality Assessment Program; Song Qian, The Cadmus Group, Durham, NC; Diana Rashash, North Carolina State University Cooperative Extension, Onslow County; Wendell Gilliam, North Carolina State University; and Ray Milosh, Carl Bailey, Elizabeth Morey, Ted Mew, and Paul Dahlen, NCDENR. Finally, thanks to all of the reviewers of this paper who made many helpful comments and suggestions.
REFERENCES
- Alley, W.M. 1993. Regional ground-water quality. Van Nostrand Reinhold, New York.
- Aravena, R., M.L. Evans, and J.A. Cherry. 1993. Stable isotopes of oxygen and nitrogen in source identification of nitrate from septic systems. Ground Water 31:180-186.
- Breiman, L.J., J.H. Friedman, R.A. Olshen, and C.J. Stone. 1984. Classification and regression trees. Chapman and Hall/CRC, New York.
- Chang, C.C., J. Langstron, M. Riggs, M.H. Campbell, S.R. Silva, and C. Kendall. 1999. A method for nitrate collection for δ15N and δ18O analysis from waters with low nitrate concentrations. Can. J. Fish. Aquat. Sci. 56:1856-1864.
- Comly, H.H. 1945. Cyanosis in infants caused by nitrates in well water. JAMA 129:112-116.
- Conover, W.J. 1980. Practical nonparametric statistics. John Wiley & Sons, New York.
- Davis, J.C. 1985. Statistics and data analysis in geology. John Wiley & Sons, New York.
- Davis, S.N., D.O. Whittemore, and J. Fabryka-Martin. 1998. Uses of chloride/bromide ratios in studies of potable water. Ground Water 36:338-350.
- Fishman, M.J. (ed.) 1993. Methods of analysis by the U.S. Geological Survey National Water Quality Laboratory: Determination of inorganic and organic constituents in water and fluvial sediments. Open-File Rep. 93-125. United States Geol. Survey, Reston, VA.
- Fogg, G.E., D.E. Rolston, D.L. Decker, D.T. Louie, and M.E. Grismer. 1998. Spatial variation in nitrogen isotope values beneath nitrate contamination sources. Ground Water 36:418-426.
- Gormly, J.R., and R.F. Spalding. 1979. Sources and concentrations of nitrate-nitrogen in ground water of the Central Platte Region, Nebraska. Ground Water 17:291-301.
- Hallberg, G.R., and D.R. Keeney. 1993. Nitrate. p. 297-322. In W.M. Alley (ed.) Regional ground-water quality. Van Nostrand Reinhold, New York.
- Hem, J.D. 1985. Study and interpretation of the chemical characteristics of natural water. Water-Supply Paper 2254. United States Geol. Survey, Reston, VA.
- Karr, J.D., W.J. Showers, J.W. Gilliam, and A.S. Andres. 2001. Tracing nitrate transport and environmental impact from intensive swine farming using delta nitrogen-15. J. Environ. Qual. 30:1163-1175.
- Kendall, C.A., and J.J. McDonnell. 1998. Isotope tracers in catchment hydrology. Elsevier, Amsterdam.
- Komor, S.C., and H.W. Anderson, Jr. 1993. Nitrogen isotopes as indicators of nitrate sources in Minnesota sand-plain aquifers. Ground Water 31:260-270.
- Kreitler, C.W. 1975. Determining the source of nitrate in ground water by nitrogen isotope studies. Rep. of Investigations 83. Bureau of Econ. Geol., Univ. of Texas, Austin.
- Kreitler, C.W., and D.C. Jones. 1975. Natural soil nitrate: The cause of nitrate contamination of ground water in Runnels County, Texas. Ground Water 13:53-61.
- Loh, W.-Y., and Y.-S. Shih. 1997. Split selection methods for classification trees. Stat. Sinica 7:825-840.
- Loh, W.-Y., and N. Vanichsetakul. 1988. Tree-structured classification via generalized discriminant analysis (with discussion). J. Am. Stat. Assoc. 83:715-728.
- Madison, R.J., and J.O. Brunett. 1985. Overview of the occurrence of nitrate in ground water of the United States. In National Water Summary 1984: Hydrologic events, selected water-quality trends, and ground-water resources. Water-Supply Paper 2275. United States Geol. Survey, Reston, VA.
- Mallin, M.A. 2000. Impacts of animal production on rivers and estuaries. Am. Sci. 88(1):26-37.
- Morgan, J.N., and R.C. Messenger. 1973. A sequential analysis program for the analysis of nominal scale dependent variables. Survey Res. Center, Inst. for Social Res., Univ. of Michigan, Ann Arbor.
- National Research Council. 1998. Nutrient requirements of swine. Natl. Academy Press, Washington, DC.
- Nolan, B.T., B.C. Ruddy, K.J. Hitt, and D.R. Helsel. 1997. Risk of nitrate in groundwaters of the United States: A national perspective. Environ. Sci. Technol. 31:2229-2236.
- Piper, A.M. 1944. A graphic procedure in the geochemical interpretation of water-analyses. Am. Geophys. Union Trans. 25:914-923.
- Puckett, L.J., T.K. Cowdery, D.L. Lorenz, and J.D. Stoner. 1999. Estimation of nitrate contamination of an agro-ecosystem outwash aquifer using a nitrogen mass-balance approach. J. Environ. Qual. 28:2015-2025.
- Qian, S.S., and C.W. Anderson. 1999. Exploring factors controlling the variability of pesticide concentrations in the Willamette River Basin using tree-based models. Environ. Sci. Technol. 33:3332-3340.
- Robertson, D.M., D.A. Saad, and A.M. Wieben. 2001. An alternative regionalization scheme for defining nutrient criteria for rivers and streams. Water-Resour. Inventory Rep. 01-4073. United States Geol. Survey, Reston, VA.
- Showers, W.J., D.M. Eisenstein, H. Paerl, and J. Rudek. 1990. Stable isotope tracers of nitrogen sources to the Neuse River, North Carolina. Rep. 253. Water Resour. Res. Inst. of the Univ. of North Carolina, Chapel Hill.
- Silva, S.R., C. Kendall, D.H. Wilkinson, C.C. Chang, and R.J. Avanzino. 2000. A new method for collection of nitrate from fresh water and analysis for its nitrogen and oxygen isotopic ratios. J. Hydrol. (Amsterdam) 28:22-36.
- Smil, V. 1997. Global population and the nitrogen cycle. Sci. Am. 277:76-81.
- Spalding, R.F., and M.E. Exner. 1993. Occurrence of nitrate in groundwater: A review. J. Environ. Qual. 22:392-402.
- Spruill, T.B., J.L. Eimers, and A.E. Morey. 1997. Nitrate-nitrogen concentrations in shallow ground water of the Coastal Plain of the Albemarle-Pamlico Drainage Study Unit, North Carolina and Virginia. Factsheet 241-96. United States Geol. Survey, Reston, VA.
- Spruill, T.B., D.A. Harned, P.A. Ruhl, J.L. Eimers, D.R. Galeone, G. McMahon, K.E. Smith, and M.D. Woodside. 1998. Water quality in the Albemarle-Pamlico Drainage Basin, North Carolina and Virginia, 1992-95. Circ. 1157. United States Geol. Survey, Reston, VA.
- StatSoft. 2001. Electronic statistics textbook. Available online at http://statsoftinc.com/textbook/stathome.html (verified 16 May 2001). StatSoft, Tulsa, OK.
- Steinhorst, R.K., and R.E. Williams. 1985. Discrimination of groundwater sources using cluster analysis, MANOVA, canonical analysis and discriminant analysis. Water Resour. Res. 21:1149-1156.
- Stiff, H.A., Jr. 1951. The interpretation of chemical water analysis by means of patterns. J. Petrol. Technol. 3:15-16.
- Therneau, T.M., and E.J. Atkinson. 1997. An introduction to recursive partitioning using the RPART routines. Technical Report. Mayo Foundation, Rochester, MN.
- United States Census Bureau. 2001. County population estimates for July 1, 1999, and population change from April 1, 1990 to July 1, 1999. Available online at http://www.census.gov/population/estimates/county/co-99-2/99C2_37.txt (verified 16 May 2002). United States Census Bureau, Washington, DC.
- United States Department of Health, Education, and Welfare. 1962. Drinking water standards. Revised. Public Health Serv. Publ. 956. United States Department of Health, Education, and Welfare, Washington, DC.
- USEPA. 2001. National primary drinking water standards. EPA 816-F-01-007. USEPA, Washington, DC.
- Vitousek, P.M., J.D. Aber, R.W. Howarth, G.E. Likens, P.A. Matson, D.W. Schindler, W.H. Schlesinger, and D.G. Tilman. 1997. Human alteration of the global nitrogen cycle: Sources and consequences. Ecol. Applic. 7:737-750.
- Wade, H., C. Bailey, J. Padmore, K. Rudo, B. Williams, and A. York. 1997. The interagency pesticide study of the impact of pesticide use on ground water in North Carolina. North Carolina Dep. of Agric., Raleigh.
- Whittemore, D.O., and L.M. Pollock. 1979. Determination of salinity sources in water resources of Kansas by minor alkali and halide chemistry. Kansas Water Resour. Res. Inst., Lawrence.
- Wilhelm, S.R., S.L. Schiff, and W.D. Robertson. 1994. Chemical fate and transport in a domestic septic system: Unsaturated and saturated zone geochemistry. Environ. Toxicol. Chem. 13:193-203.
- Wilkinson, L. 2000. Classification and regression trees in SYSTAT 10. Vol. I. SPSS, Chicago.
- Winter, T.C., J.W. LaBaugh, and D.O. Rosenberry. 1988. The design and use of a hydraulic potentiomanometer for direct measurement of differences in hydraulic head between ground water and surface water. Limnol. Oceanogr. 33:1209-1214.
- Zublena, J.P., J.V. Baird, and J.P. Lilly. 1991. Soilfacts. Nutrient content of fertilizer and organic materials. Publ. AG-439-18. North Carolina Coop. Ext. Serv., Raleigh.
- Zublena, J.P., J.C. Barker, and T.A. Carter. 1993a. Soilfacts. Poultry manure as a fertilizer source. Publ. AG-439-5. North Carolina Coop. Ext. Serv., Raleigh.
- Zublena, J.P., J.C. Barker, J.W. Parker, and C.M. Stanislaw. 1993b. Soilfacts. Swine manure as a fertilizer source. Publ. AG-439-4. North Carolina Coop. Ext. Serv., Raleigh.