This data set contains daily, quarterly, and annual measures of 8 selected analytes (arsenic, nitrate, uranium, radium, TCE, PCE, atrazine, and DEHP) in public water supplies. Data are derived from the California Office of Drinking Water (ODW) Water Quality Monitoring Database (WQMD, also known as Water Quality Inventory or WQI) and the Permits, Inspection, Compliance, Monitoring, and Enforcement (PICME) database. Daily measures include analytical results and detection limits for each sample, sampling station, and Community Water System (CWS). Annual summary data comprise average and maximum concentrations for each analyte and CWS. Quarterly summary data comprise average concentrations for each analyte and CWS. Quarterly and annual summary data also include frequency of sampling, the number of sampling stations, the number of non-detects, and the date of the last sample.
This data set contributes to the Environmental Public Health Tracking Network. The EPHT cooperative agreement states that all grantees must track and make available core environmental health tracking measures on the State and National EPHT Network, including data/information on key water contaminants, as defined through the Content workgroup process. The Content Workgroup Water Team identified initial contaminants of concern for the national EPHT program, identified nationally consistent data sources, and developed nationally consistent indicators and measures. This data set can be used to roughly estimate the annual and quarterly levels of the 8 selected analytes for an entire CWS. Using the Public Water System ID Number, it can be joined to the inventory dataset.
For detailed data quality information (Process Descriptions, Logical Consistency Report, and Completeness Report), please visit http://www.ehib.org/page.jsp?page_key=150
Summary measures of water contaminants provided in this dataset that are attributed to an entire water system are not necessarily indicative of the quality of water served to the customer population of that water system. Water quality can change after sampling, and there can be great spatial heterogeneity within a single water system's distribution area. To complicate matters, individual- and population-level behaviors with respect to drinking water consumption can influence how much contact there is with drinking water contaminants. Therefore, these data should not be linked to health data without great care and attention to potential confounders.
850 Marina Bay Parkway, Bldg P, 3rd Floor
The EPHTN Drinking Water Content Workgroup concluded that analytical results should be gathered from active community water systems at USEPA-defined compliance sampling points from 1999 to the most current year of available data. WQMD and PICME are coded logically to successfully ascertain active CWS. However, it is unclear whether the sampling points and subsequent analytical results meet the USEPA definition for "compliance" or whether the databases are structured to perform this function with consistent results for every water system captured in the database. The compliance sampling point for the 8 tracked analytes is at the entry point to the distribution system. Since there is no WQMD/PICME coding scheme for systematically determining which sample points are entry points in PICME, the following logic was used: all sample sites, locally or from exporting systems one level up, having analyte-specific sampling results during the period 1999-[most currently available year] at stations coded as Active Treated (AT), Active Untreated (AU, implying no further treatment), Inactive Treated (IT), Inactive Untreated (IU, implying no further treatment), Purchased Treated (PT), Purchased Untreated (PU), Combination Treated (CT), and Combination Untreated (CU) were considered compliance sample points and included in the sample results table. Active Raw (AR, implying downstream treatment but not necessarily for the 8 analytes included in this dataset), Inactive Raw (IR, implying downstream treatment but not necessarily for the 8 analytes included in this dataset), Purchased Raw (PR), and Combination Raw (CR) sample sites were included only if the downstream after-treatment sample site had no analyte-specific sampling during the reporting period.
Using sample sites before treatment in this manner was recommended by ODW staff because, given the absence of downstream sampling, it was highly unlikely that the downstream treatment site was also treating for the analyte in question. Using inactive sampling sites in this manner was also recommended by ODW staff, since it is highly likely that samples were taken only when the associated source was actively providing flow to consumers. For non-detects, 1/2 the detection limit was used for mean measures, except in cases when the maximum system-wide detected finding during the reporting period was less than half the detection limit. In these cases, 1/2 the maximum system-wide detection during the reporting period was used.
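The non-detect substitution rule above can be illustrated with a minimal Python sketch. It is not the SQL used to build this dataset; the function names and the (result, detection_limit, is_detect) tuple layout are assumptions made only for this example.

```python
from statistics import mean


def substitute_non_detect(detection_limit, max_system_detection=None):
    """Value used in place of a non-detect when computing a mean."""
    half_dl = detection_limit / 2.0
    # Exception described in the text: if the largest detected finding for the
    # system in the reporting period is below half the detection limit,
    # use half of that maximum detection instead.
    if max_system_detection is not None and max_system_detection < half_dl:
        return max_system_detection / 2.0
    return half_dl


def system_mean(samples):
    """Mean for one CWS, analyte, and reporting period.

    samples: list of (result, detection_limit, is_detect) tuples
    (hypothetical layout; result is ignored for non-detects).
    """
    detections = [r for r, _dl, det in samples if det]
    max_det = max(detections) if detections else None
    values = [
        r if det else substitute_non_detect(dl, max_det)
        for r, dl, det in samples
    ]
    return mean(values)
```

For example, with one detection of 4.0 and one non-detect at a detection limit of 2.0, the non-detect is counted as 1.0. If the only detection were 0.4, the non-detect would be counted as 0.2.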
This dataset is limited by legal monitoring practices and scheduling as defined by the California Water Code and the US Safe Drinking Water Act. The most notable aspect of incompleteness relative to environmental health tracking is that any summary measure of water contamination that is attributed to an entire water system is not necessarily indicative of the quality of water served to the customer population of that water system. Water quality can change after sampling, and there is often great heterogeneity within a single water system's distribution area. Sampling results for Disinfection By-Products (DBP) within distribution systems, collected according to the Surface Water Treatment Rule and required as an indicator in this dataset, are missing because regional districts and labs for the most part are not currently reporting distribution system sampling information to PICME and WQMD. In previous drinking water submissions, DBP data were included for those systems where data were available, because only statewide indicators were provided. DBP data are not included in this submission because of the increased spatial and temporal resolution in the measures. As decided by the Content Workgroup Drinking Water Team, maximum summary measures are calculated only on an annual basis; maximums are not calculated on a quarterly basis. Moreover, to capture seasonal variations that have been documented in nitrate and atrazine occurrence, the Drinking Water Team agreed to calculate quarterly means only for nitrate and atrazine.
CEHTP received the PICME database in comma-separated (CSV) format from ODW (dated 10/18/2011) and imported the SOUPATH (N=17567), SOURCE (N=47196 minus 10 non-CWS sources with referential integrity errors), and SYSNUM1 (N=14241) tables into SQL Server 2005.
CEHTP received 4 WQMD tables (all dated 2/28/2012) CHEMICAL (N=1592333), CHEMARCH (N=7429400), CHEMXARC (N=8642976), CHEMHIST (N=6962426) in dbf format from ODW. Imported all 4 tables into SQL Server, appended into single table named EPHTN_FINDINGS (N=24624308, after deleting 2827 records that had no store_num entered), normalized on unique samples, established unique and search-optimized indexes, and named new samples table EPHTN_SAMPLES (N=1601704). A unique sample was based on sampling station, sample date/time, analyte, and lab.
Developed SQL stored procedure for generically extracting sampling results for arsenic (store_num='01002'), atrazine (store_num='39033'), DEHP (store_num='39100'), radium (store_num='11503'), PCE (store_num='34475'), and TCE (store_num='39180'). Logic holds that CHEMICAL.X_MOD<>F (no 'F'alse positive samples), CHEMICAL.X_MOD<>I (no 'I'nvalid samples), CHEMICAL.X_MOD<>Q (no 'Q'uestionable samples), CHEMICAL.STORE_NUM=????? (STORET number for analyte), and PICME_SOURCE.ENTITY_INFO in (AR,AT,AU,IR,IT,IU,CR,CT,CU,PR,PT,PU) (include active, inactive, combination, and purchased sources before treatment, after treatment, and untreated), according to the logic rules described above. For nitrate and uranium, the procedure used analyte-specific logic to extract, convert, and prioritize 6 possible analytes into a single sample-specific result for each. The 6 possible analytes were: 00618--Nitrate Nitrogen (ug/L), 71850--Nitrate ion (mg/L), A-029--Nitrate+Nitrite Nitrogen (ug/L), 28012--Uranium mass (ug/L), 28011--Uranium activity (pCi/L), and 01501--Gross Alpha activity (pCi/L). For unique nitrate/uranium samples across these 6 analytes having 2 or 3 analytical results (in nitrate or uranium), detections were prioritized over non-detects. If 2 or 3 analytical results were still available per sample (in nitrate or uranium), then the following sequential priority was used for selecting nitrate: 1. 00618, 2. 71850, 3. A-029; and uranium: 1. 28012, 2. 28011, 3. 01501. The following conversion factors were used for converting the database units to the units expected in the measure, nitrate as nitrogen (mg/L): 1. [00618]/1000, 2. [71850]/4.43, 3. [A-029]/1000; uranium mass (ug/L): 1. [28012]/1, 2. [28011]/0.67, 3. [01501]/0.67. This procedure also included a methodology for iteratively removing analyte-specific outliers. For each analyte, the top 1% of detected concentrations were sorted and compared to the corresponding PWS-specific mean, frequency of detections, and standard deviation.
By visual inspection, high concentrations with low detection frequencies and low SDs were identified and further investigated for violations on EPA's SDWIS site.
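The nitrate and uranium selection rules described in this step (detections before non-detects, then the stated analyte priority, then unit conversion) can be sketched in Python as follows. This is an illustrative sketch only, not the stored procedure itself; the dictionary keys store_num, value, and is_detect are hypothetical names chosen for the example, while the STORET codes, priorities, and divisors are taken from the text.

```python
# Priority order and divisors as stated in the process description.
NITRATE_PRIORITY = ["00618", "71850", "A-029"]   # target: nitrate as nitrogen, mg/L
URANIUM_PRIORITY = ["28012", "28011", "01501"]   # target: uranium mass, ug/L
DIVISORS = {
    "00618": 1000.0, "71850": 4.43, "A-029": 1000.0,
    "28012": 1.0, "28011": 0.67, "01501": 0.67,
}


def select_result(results, priority):
    """Pick one converted result for a unique sample.

    results: list of dicts with hypothetical keys
             'store_num', 'value', 'is_detect'.
    Returns (store_num, converted_value), or None if nothing applies.
    """
    candidates = [r for r in results if r["store_num"] in priority]
    if not candidates:
        return None
    # Detections are preferred over non-detects.
    detects = [r for r in candidates if r["is_detect"]]
    pool = detects or candidates
    # Among what remains, take the analyte that comes first in the priority list.
    chosen = min(pool, key=lambda r: priority.index(r["store_num"]))
    return chosen["store_num"], chosen["value"] / DIVISORS[chosen["store_num"]]
```

For instance, a sample with a detected nitrate ion result (71850) and a non-detect nitrate nitrogen result (00618) would be represented by the 71850 value divided by 4.43.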
Developed SQL statement for de-duplicating (i.e., averaging) analyte-specific results taken on the same day (at different times) and/or by different labs.
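A minimal Python sketch of this de-duplication step is shown below. It assumes that results are grouped by sampling station, analyte, and calendar day; that grouping key and the tuple layout are assumptions for illustration, since the original SQL is not reproduced here.

```python
from collections import defaultdict
from statistics import mean


def average_same_day(results):
    """Collapse multiple results per station, analyte, and day into one mean.

    results: iterable of (station_id, analyte, sampled_at, value) tuples,
             where sampled_at is a datetime (hypothetical layout).
    Returns {(station_id, analyte, date): mean_value}.
    """
    grouped = defaultdict(list)
    for station_id, analyte, sampled_at, value in results:
        # Dropping the time of day and ignoring the lab merges same-day repeats.
        grouped[(station_id, analyte, sampled_at.date())].append(value)
    return {key: mean(values) for key, values in grouped.items()}
```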
Developed SQL statement for estimating non-detects with no detection limit provided by borrowing detection limits from other samples during the same year. For each sample lacking a detection limit, detection limits were substituted using the following priority/methodology: 1. median detection limit from non-detect samples analyzed by the same lab in the same year, 2. median detection limit from non-detect samples analyzed for the same PWS in the same year, 3. median detection limit from non-detect samples in the same year.
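The three-tier fallback above can be sketched in Python as follows. The sketch assumes the pool of non-detects has already been restricted to a single analyte, and the dictionary keys lab, pws_id, year, and detection_limit are hypothetical names used only for this example.

```python
from statistics import median


def borrow_detection_limit(sample, non_detects):
    """Borrow a detection limit for a non-detect that lacks one.

    sample:      dict with hypothetical keys 'lab', 'pws_id', 'year'.
    non_detects: list of dicts with 'lab', 'pws_id', 'year',
                 'detection_limit' for the same analyte.
    Returns the borrowed limit, or None if no same-year limit exists.
    """
    same_year = [
        nd for nd in non_detects
        if nd["year"] == sample["year"] and nd["detection_limit"] is not None
    ]
    tiers = [
        [nd for nd in same_year if nd["lab"] == sample["lab"]],        # 1. same lab
        [nd for nd in same_year if nd["pws_id"] == sample["pws_id"]],  # 2. same PWS
        same_year,                                                     # 3. any sample
    ]
    # Use the first tier that has at least one usable detection limit.
    for tier in tiers:
        if tier:
            return median(nd["detection_limit"] for nd in tier)
    return None
```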
Developed SQL statement to compute analyte-specific means and maximums by CWS and year.
Developed SQL statement to compute analyte-specific means by quarter for atrazine and nitrate.
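The annual and quarterly summary steps can be illustrated with a short Python sketch. It assumes calendar quarters (January to March as quarter 1, and so on) and a (pws_id, analyte, sample_date, value) row layout; both are assumptions for illustration rather than details confirmed by the SQL used for this dataset.

```python
from collections import defaultdict
from statistics import mean

# Per the process description, quarterly means are computed only for these analytes.
QUARTERLY_ANALYTES = {"nitrate", "atrazine"}


def annual_summary(rows):
    """Return {(pws_id, analyte, year): (mean, maximum)}.

    rows: iterable of (pws_id, analyte, sample_date, value),
          where sample_date is a datetime.date (hypothetical layout).
    """
    groups = defaultdict(list)
    for pws_id, analyte, sample_date, value in rows:
        groups[(pws_id, analyte, sample_date.year)].append(value)
    return {key: (mean(vals), max(vals)) for key, vals in groups.items()}


def quarterly_means(rows):
    """Return {(pws_id, analyte, year, quarter): mean} for nitrate and atrazine."""
    groups = defaultdict(list)
    for pws_id, analyte, sample_date, value in rows:
        if analyte in QUARTERLY_ANALYTES:
            quarter = (sample_date.month - 1) // 3 + 1  # assumed calendar quarters
            groups[(pws_id, analyte, sample_date.year, quarter)].append(value)
    return {key: mean(vals) for key, vals in groups.items()}
```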
This data set contains daily, annual, and quarterly measures of concentrations for 8 selected analytes by Community Water System (CWS).
Data dictionary is available from the National Tracking Program at http://www.cdc.gov/nceh/tracking.