California Drinking Water Quality Sampling Results Dataset, 1999-2012

Metadata:


Identification_Information:
Citation:
Citation_Information:
Originator: CEHTP Science Team
Publication_Date: 20130331
Title:
California Drinking Water Quality Sampling Results Dataset, 1999-2012
Online_Linkage: http://cehtp.org/data/water/ncdms_1999_2012.zip
Description:
Abstract:
This data set contains daily, quarterly, and annual measures of 8 selected analytes (arsenic, nitrate, uranium, radium, TCE, PCE, atrazine and DEHP) in public water supplies. Data are derived from the California Office of Drinking Water (ODW) Water Quality Monitoring Database (WQMD, also known as Water Quality Inventory or WQI) and the Permits, Inspection, Compliance, Monitoring, and Enforcement (PICME) database. Daily measures include analytical results and detection limits for each sample, sampling station, and Community Water System (CWS). Annual summary data comprise average and maximum concentrations for each analyte and CWS. Quarterly summary data comprise average concentrations for each analyte and CWS. Quarterly and annual summary data also include the frequency of sampling, the number of sampling stations, the number of non-detects, and the date of the last sample.
Purpose:
This data set contributes to the Environmental Public Health Tracking Network. The EPHT cooperative agreement states that all grantees must track and make available core environmental health tracking measures on the State and National EPHT Network, including data/information on key water contaminants, as defined through the Content workgroup process. The Content Workgroup Water Team identified initial contaminants of concern for the national EPHT program, identified nationally consistent data sources, and developed nationally consistent indicators and measures. This data set can be used to roughly estimate the annual and quarterly levels of the 8 selected analytes for an entire CWS. Using the Public Water System ID Number and Year Associated To fields, it can be joined to the Inventory dataset.
Supplemental_Information:

Time_Period_of_Content:
Time_Period_Information:
Range_of_Dates/Times:
Beginning_Date: 19990101
Beginning_Time:
Ending_Date: 20121231
Ending_Time:
Currentness_Reference:
Publication Date
Status:
Progress: Complete
Maintenance_and_Update_Frequency: Once per year
Spatial_Domain:
Bounding_Coordinates:
West_Bounding_Coordinate: -124.409721
East_Bounding_Coordinate: -114.131208
North_Bounding_Coordinate: 42.009522
South_Bounding_Coordinate: 32.53416
Keywords:
Theme:
Theme_Keyword_Thesaurus: NONE
Theme_Keyword: Arsenic; 1005; Nitrate; 1040; Total trihalomethanes; 2950; Haloacetic acids (five); 2456; Atrazine; 2050; Di(2-ethylhexyl) phthalate; 2039; Radium; 4010; Tetrachloroethylene; 2987; Trichloroethylene; 2984; Uranium; 4006; Disinfection by-products; DBP; Environmental hazard; Environment; Water quality; Public water system; PWS; Community water systems; CWS; ground water; State drinking water data set; National drinking water data set; Safe drinking water act; SDWA; safe drinking water information system; SDWIS; MCL; MCL violations; Maximum Contaminant Level
Place:
Place_Keyword_Thesaurus:
Place_Keyword: California,CA,06
Access_Constraints: none
Use_Constraints:
Summary measures of water contaminants provided in this dataset and that are attributed to an entire water system are not necessarily indicative of the quality of water served to the customer population of that water system. Water quality can change after sampling and there can exist great spatial heterogeneities within a single water system's distribution area. To complicate matters, individual- and population-level behaviors with respect to drinking water consumption can influence how much contact there is with drinking water contaminants. Therefore these data should not be used in linking to health data without great care and attention to potential confounders.
Point_of_Contact:
Contact_Information:
Contact_Person_Primary:
Contact_Person: CEHTP Science Team
Contact_Organization: CA Department of Public Health, CA Environmental Health Tracking Program
Contact_Position:
Contact_Address:
Address_Type: Mailing
Address:
850 Marina Bay Parkway, Bldg P, 3rd Floor
City: Richmond
State_or_Province: CA
Postal_Code: 94804
Country: United States Of America
Contact_Voice_Telephone: 5106203620
Contact_TDD/TTY_Telephone:
Contact_Facsimile_Telephone: 5106203720
Contact_Electronic_Mail_Address: data@cehtp.org
Hours_of_Service:
Contact_Instructions:

Security_Information:
Security_Classification_System: none
Security_Classification: Unclassified
Security_Handling_Description: none
Native_Data_Set_Environment:
Relational Database Management System: SQL Server 2008
Data_Quality_Information:
Logical_Consistency_Report:
The EPHTN Drinking Water Content Workgroup concluded that analytical results should be gathered from active community water systems at USEPA-defined compliance sampling points from 1999 to the most current year of available data. WQMD and PICME are coded logically to successfully ascertain active CWS. However, it is unclear whether the sampling points and subsequent analytical results meet the USEPA definition for "compliance", or whether the databases are structured to perform this function with consistent results for every water system captured in the database. The compliance sampling point for the 8 tracked analytes is at the entry point to the distribution system.
Since there is no WQMD/PICME coding scheme for systematically determining which sample points are entry points in PICME, the following logic was used: all sample sites, locally or from exporting systems one level up, having analyte-specific sampling results during the period 1999-[most currently available year] at stations coded as Active Treated (AT), Active Untreated (AU, implying no further treatment), Inactive Treated (IT), Inactive Untreated (IU, implying no further treatment), Purchased Treated (PT), Purchased Untreated (PU), Combination Treated (CT), and Combination Untreated (CU) were considered compliance sample points and included in the sample results table. Active Raw (AR, implying downstream treatment but not necessarily for the 8 analytes included in this dataset), Inactive Raw (IR, implying downstream treatment but not necessarily for the 8 analytes included in this dataset), Purchased Raw (PR), and Combination Raw (CR) sample sites were included if the downstream after-treatment sample site had no analyte-specific sampling during the reporting period.
Using sample sites before treatment in this manner was recommended by ODW staff, as it was highly unlikely, given the absence of downstream sampling, that the downstream treatment site was also treating for the analyte in question. Using inactive sampling sites in this manner was recommended by ODW staff, since it is highly likely that samples were taken only when the associated source was actively providing flow to consumers.
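The station inclusion rule described above can be illustrated with a minimal Python sketch. This is not the SQL used to build the dataset; the function name and the `downstream_sampled` flag are hypothetical, and the sketch assumes that each station's two-letter code and the presence of downstream after-treatment sampling for the analyte are already known.

```python
# Station codes taken from the text above.
ALWAYS_INCLUDED = {"AT", "AU", "IT", "IU", "PT", "PU", "CT", "CU"}
RAW_CODES = {"AR", "IR", "PR", "CR"}


def is_compliance_point(station_code, downstream_sampled):
    """Decide whether a station counts as a compliance sample point for one analyte.

    station_code: two-letter station code (hypothetical input).
    downstream_sampled: True if the downstream after-treatment site has any
        analyte-specific results in the reporting period (hypothetical input).
    """
    if station_code in ALWAYS_INCLUDED:
        return True
    if station_code in RAW_CODES:
        # Raw sites are used only when nothing was sampled downstream.
        return not downstream_sampled
    return False
```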

For non-detects, 1/2 the detection limit was used for mean measures, except in cases when the maximum system-wide detected finding during the reporting period was less than half the detection limit. In these cases, 1/2 the maximum system-wide detection during the reporting period was used.
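As an illustration only, the non-detect substitution rule above might be sketched in Python as follows. The function and parameter names are assumptions made for this example and do not come from the CEHTP procedures.

```python
def nondetect_substitute(detection_limit, max_system_detection=None):
    """Value substituted for a non-detect when computing a mean.

    detection_limit: detection limit reported (or imputed) for the sample.
    max_system_detection: maximum system-wide detected concentration in the
        reporting period, or None if the system had no detections
        (hypothetical input).
    """
    half_dl = detection_limit / 2.0
    if max_system_detection is not None and max_system_detection < half_dl:
        # Exception case: the highest detection is below half the detection limit.
        return max_system_detection / 2.0
    return half_dl
```

For example, with a detection limit of 2.0 and a maximum system-wide detection of 0.8, the substituted value is 0.4 rather than 1.0.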
For this year's data compilation (2013), we used the deactivation date element of the SDWIS Inventory Facilities dataset to incorporate sampling results from sampling stations that are currently inactive but were active during the reporting period. Sampling results from a currently inactive sample station were used if they were collected before the deactivation date.

Completeness_Report:
This dataset is limited by legal monitoring practices and scheduling as defined by the California Water Code and the US Safe Drinking Water Act. The most notable aspect of incompleteness relative to environmental health tracking is that any summary measure of water contamination that is attributed to an entire water system is not necessarily indicative of the quality of water served to the customer population of that water system. Water quality can change after sampling and there is often great heterogeneity within a single water system's distribution area. Sampling results for Disinfection By-Products (DBP) within distribution systems, which are collected under the Surface Water Treatment Rule and are a required indicator for this dataset, are missing because regional districts and labs for the most part do not currently report distribution system sampling information to PICME and WQMD. In previous drinking water submissions, DBP data were included for those systems where data were available, because only statewide indicators were provided. DBP data are not included in this submission because of the increased spatial and temporal resolution of the measures.

As decided by the Content Workgroup Drinking Water Team, maximum summary measures are calculated only on an annual basis; maximums are not calculated on a quarterly basis. Moreover, to capture the seasonal variations that have been documented in nitrate and atrazine occurrence, the Drinking Water Team agreed to calculate quarterly means only for nitrate and atrazine.

Lineage:
Process_Step:
Process_Description:
CEHTP received comma-separated (CSV) PICME database from ODW (dated 9/2/2012) and imported SOUPATH (N=17565), SOURCE (N=50560, minus 8 non-CWS sources with referential integrity errors --> N=50582), and SYSNUM1 (N=14398) tables into SQL Server 2008.
Process_Date: 20130315
Process_Step:
Process_Description:
CEHTP received SDWIS XML Inventory dataset from ODW (dated 2/22/2013) and imported Facilities (N=49485) table into SQL Server 2008.
Process_Date: 20130315
Process_Step:
Process_Description:
CEHTP downloaded 4 WQMD tables (all dated 3/15/2013) CHEMICAL (N=3309604), CHEMARCH (N=7435614), CHEMXARC (N=8637112), CHEMHIST (N=6970088) in dbf format from ODW website at http://www.cdph.ca.gov/certlic/drinkingwater/Pages/EDTlibrary.aspx. Imported all 4 tables into SQL Server 2008, appended into single table named EPHTN_FINDINGS (N=26352418-127 with null store_num=26352291), normalized on unique samples, established unique and search-optimized indexes, and named new samples table EPHTN_SAMPLES (N=1748969). A unique sample was based on sampling station, sample date/time, analyte, and lab.
Process_Date: 20130315
Process_Step:
Process_Description:
Developed SQL stored procedure for generically extracting sampling results for arsenic (store_num='01002'), atrazine (store_num='39033'), DEHP (store_num='39100'), radium (store_num='11503'), PCE (store_num='34475'), and TCE (store_num='39180'). Logic holds that CHEMICAL.X_MOD<>F (no 'F'alse positive samples), CHEMICAL.X_MOD<>I (no 'I'nvalid samples), CHEMICAL.X_MOD<>Q (no 'Q'uestionable samples), CHEMICAL.STORE_NUM=????? (STORET number for analyte), and PICME_SOURCE.ENTITY_INFO in (AR,AT,AU,IR,IT,IU,CR,CT,CU,PR,PT,PU) (include active, inactive, combination, and purchased sources before treatment, after treatment, and untreated), according to the logic rules described above.

For nitrate and uranium, the procedure used analyte-specific logic to extract, convert, and prioritize 6 possible analytes into a single sample-specific result for each of nitrate and uranium. The 6 possible analytes were: 00618--Nitrate Nitrogen (ug/L), 71850--Nitrate ion (mg/L), A-029--Nitrate+Nitrite Nitrogen (ug/L), 28012--Uranium mass (ug/L), 28011--Uranium activity (pCi/L), and 01501--Gross Alpha activity (pCi/L). For unique nitrate/uranium samples across these 6 analytes having 2 or 3 analytical results (in nitrate or uranium), detections were prioritized over non-detects. If 2 or 3 analytical results were still available per sample (in nitrate or uranium), then the following sequential priority was used for selecting nitrate: 1. 71850, 2. 00618, 3. A-029; and uranium: 1. 28012, 2. 28011, 3. 01501. The following conversion factors were used to convert the database units to the units expected in the measure. Nitrate as nitrogen (mg/L): 1. [71850]/4.43, 2. [00618]/1000, 3. [A-029]/1000. Uranium mass (ug/L): 1. [28012]/1, 2. [28011]/0.67, 3. [01501]/0.67.
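The prioritization and unit conversion described above can be sketched roughly as follows. This Python example is an illustration, not the stored procedure itself; the tuple layout of `results` and the function name are assumptions, while the STORET codes, priority order, and divisors are those listed in the text.

```python
# STORET code -> (priority rank, divisor), as listed in the text above.
NITRATE = {"71850": (1, 4.43), "00618": (2, 1000.0), "A-029": (3, 1000.0)}
URANIUM = {"28012": (1, 1.0), "28011": (2, 0.67), "01501": (3, 0.67)}


def select_result(results, table):
    """Pick one result per sample and convert it to the measure's units.

    results: list of (store_num, value, detected) tuples for a single sample
        (hypothetical layout).
    table: NITRATE or URANIUM.
    Returns (store_num, converted_value), or None if no candidate applies.
    """
    candidates = [r for r in results if r[0] in table]
    if not candidates:
        return None
    # Detections sort ahead of non-detects; ties are broken by priority rank.
    store_num, value, _ = min(
        candidates, key=lambda r: (not r[2], table[r[0]][0])
    )
    return store_num, value / table[store_num][1]
```

For instance, a sample with a detected 00618 result of 5000 ug/L and a non-detect 71850 result would yield the 00618 result, converted to 5.0 mg/L as nitrogen, because detections take precedence over the code priority.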

This procedure also included a methodology for iteratively removing analyte-specific outliers. For each analyte, the top 1% of detected concentrations were sorted and compared to the corresponding PWS-specific mean, frequency of detections, and standard deviation (SD). On visual inspection, high concentrations with low frequencies of detection and low SDs were further investigated for violations in PICME and on EPA's SDWIS site.
Process_Date: 20130315
Process_Step:
Process_Description:
Developed SQL statement for de-duplicating (i.e., averaging) analyte-specific results taken on the same day (at different times) and/or by different labs.
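A minimal Python sketch of this averaging step is shown below. The grouping key (station, sample date, analyte) and the row layout are assumptions made for illustration; the actual SQL statement is not reproduced in this record.

```python
from collections import defaultdict
from statistics import mean


def average_same_day(rows):
    """Collapse same-day results into one averaged value per group.

    rows: iterable of (station_id, sample_date, analyte, value) tuples
        (hypothetical layout). Results from different times of day or
        different labs fall into the same group and are averaged.
    """
    groups = defaultdict(list)
    for station, day, analyte, value in rows:
        groups[(station, day, analyte)].append(value)
    return {key: mean(values) for key, values in groups.items()}
```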
Process_Date: 20130315
Process_Step:
Process_Description:
Developed SQL statement for estimating non-detects with no detection limit provided by borrowing detection limits from other samples during the same year. For each sample lacking a detection limit, detection limits were substituted using the following priority/methodology: 1. median detection limit from non-detect samples analyzed by the same lab in the same year, 2. median detection limit from non-detect samples analyzed for the same PWS in the same year, 3. median detection limit from non-detect samples in the same year.
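The three-tier fallback above might look like the following in Python. This is a sketch under stated assumptions: the dictionary keys (`lab`, `pws`, `year`, `dl`) are hypothetical, and the pool of non-detects is assumed to be already restricted to the analyte in question.

```python
from statistics import median


def impute_detection_limit(sample, nondetects):
    """Borrow a detection limit for a non-detect that lacks one.

    sample: dict with 'lab', 'pws', and 'year' (hypothetical keys).
    nondetects: list of dicts with 'lab', 'pws', 'year', and a known 'dl',
        assumed to cover a single analyte.
    Returns the imputed detection limit, or None if no same-year donor exists.
    """
    same_year = [n for n in nondetects if n["year"] == sample["year"]]
    tiers = (
        [n["dl"] for n in same_year if n["lab"] == sample["lab"]],  # 1. same lab
        [n["dl"] for n in same_year if n["pws"] == sample["pws"]],  # 2. same PWS
        [n["dl"] for n in same_year],                               # 3. same year
    )
    for limits in tiers:
        if limits:
            return median(limits)
    return None
```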
Process_Date: 20130315
Process_Step:
Process_Description:
Developed SQL statement to compute analyte-specific means and maximums by CWS and year.
Process_Date: 20130315
Process_Step:
Process_Description:
Developed SQL statement to compute analyte-specific means by quarter for atrazine and nitrate.
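The annual and quarterly summary steps can be sketched together in Python as below. The row layout and names are assumptions for illustration, and the sketch assumes calendar quarters, which this record does not state explicitly.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Quarterly means are computed only for these analytes, per the text above.
QUARTERLY_ANALYTES = {"nitrate", "atrazine"}


def summarize(rows):
    """Compute annual (mean, max) and quarterly means by CWS and analyte.

    rows: iterable of (pws_id, analyte, sample_date, value) tuples, where
        sample_date is a datetime.date (hypothetical layout).
    """
    annual = defaultdict(list)
    quarterly = defaultdict(list)
    for pws, analyte, day, value in rows:
        annual[(pws, analyte, day.year)].append(value)
        if analyte in QUARTERLY_ANALYTES:
            quarter = (day.month - 1) // 3 + 1  # assumes calendar quarters
            quarterly[(pws, analyte, day.year, quarter)].append(value)
    annual_stats = {k: (mean(v), max(v)) for k, v in annual.items()}
    quarterly_means = {k: mean(v) for k, v in quarterly.items()}
    return annual_stats, quarterly_means


# Made-up example rows, for illustration only.
example_rows = [
    ("CA0000001", "nitrate", date(2010, 2, 1), 2.0),
    ("CA0000001", "nitrate", date(2010, 8, 1), 6.0),
    ("CA0000001", "arsenic", date(2010, 8, 1), 3.0),
]
annual_stats, quarterly_means = summarize(example_rows)
```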
Process_Date: 20130315
Entity_and_Attribute_Information:
Overview_Description:
Entity_and_Attribute_Overview:
This data set contains daily, annual, and quarterly measures of concentrations of 8 selected analytes by Community Water System (CWS).
Entity_and_Attribute_Detail_Citation:
Data dictionary is available from the National Tracking Program at http://www.cdc.gov/nceh/tracking.
Distribution_Information:
Distributor:
Contact_Information:
Contact_Person_Primary:
Contact_Person: CEHTP Science Team
Contact_Organization: CA Department of Public Health, CA Environmental Health Tracking Program
Contact_Position:
Contact_Address:
Address_Type: Mailing
Address:
850 Marina Bay Parkway, Bldg P, 3rd Floor
City: Richmond
State_or_Province: CA
Postal_Code: 94804
Country: United States Of America
Contact_Voice_Telephone: 5106203620
Contact_TDD/TTY_Telephone:
Contact_Facsimile_Telephone: 5106203720
Contact_Electronic_Mail_Address: data@cehtp.org
Hours_of_Service:
Contact_Instructions:

Resource_Description:
Distribution_Liability:

Custom_Order_Process:

Metadata_Reference_Information:
Metadata_Date: 20120315
Metadata_Contact:
Contact_Information:
Contact_Person_Primary:
Contact_Person: CEHTP Science Team
Contact_Organization: CA Department of Public Health, CA Environmental Health Tracking Program
Contact_Position:
Contact_Address:
Address_Type: Mailing
Address:
850 Marina Bay Parkway, Bldg P, 3rd Floor
City: Richmond
State_or_Province: CA
Postal_Code: 94804
Country: United States Of America
Contact_Voice_Telephone: 5106203620
Contact_TDD/TTY_Telephone:
Contact_Facsimile_Telephone: 5106203720
Contact_Electronic_Mail_Address: data@cehtp.org
Hours_of_Service:
Contact_Instructions:

Metadata_Standard_Name: EPHTN Tracking Network Profile Version 1.2
Metadata_Access_Constraints: none
Metadata_Use_Constraints:
none