California Community Water Systems Inventory Dataset, 1999-2012

Metadata:


Identification_Information:
Citation:
Citation_Information:
Originator: CEHTP Science Team
Publication_Date: 20130331
Title:
California Community Water Systems Inventory Dataset, 1999-2012
Online_Linkage: http://cehtp.org/data/water/ncdms_1999_2012.zip
Description:
Abstract:
This data set contains information about all Community Water Systems in California. Data are derived from the California Office of Drinking Water (ODW) Water Quality Monitoring Database (WQMD; also known as Water Quality Inventory or WQI), the Safe Drinking Water Information System (SDWIS), and the Permits, Inspection, Compliance, Monitoring, and Enforcement (PICME) database. The data set contains one record for each year and Community Water System (CWS) that was active for all or part of the reporting period. It includes additional detail about how many retail connections each CWS has, how many people are served by each CWS, and the approximate location of each CWS.
Purpose:
This data set contributes to the Environmental Public Health Tracking Network. The EPHT cooperative agreement states that all grantees must track and make available core environmental health tracking measures on the State and National EPHT Network, including data/information on key drinking water contaminants for regulated public water supplies, as defined through the Content workgroup process. The Content Workgroup Water Team identified contaminants of concern for the national EPHT program, identified nationally consistent data sources, and developed nationally consistent indicators and measures. This data set can be used to enumerate all Community Water Systems in California. Using the Public Water System ID Number, it can be joined with the sampling results dataset.
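The join described above can be sketched as follows. This is a minimal illustration, not the program's own code: the column names (`PWSID`, `Year`, `Population`, `Analyte`, `Result`) and the example rows are hypothetical. Because the inventory holds one record per year and CWS, a real join would probably also need to match on year; the `keys` parameter is there for that purpose.

```python
from collections import defaultdict

def join_inventory_with_samples(inventory_rows, sample_rows, keys=("PWSID",)):
    """Pair every sampling result with the inventory record(s) sharing its key.

    `keys` lists the column names used for matching (hypothetical names).
    """
    samples_by_key = defaultdict(list)
    for sample in sample_rows:
        samples_by_key[tuple(sample[k] for k in keys)].append(sample)

    joined = []
    for inv in inventory_rows:
        for sample in samples_by_key.get(tuple(inv[k] for k in keys), []):
            # Inventory columns first, sample columns second.
            joined.append({**inv, **sample})
    return joined

# Made-up rows for illustration only.
inventory = [
    {"PWSID": "A", "Year": 2012, "Population": 100},
    {"PWSID": "B", "Year": 2012, "Population": 50},
]
samples = [{"PWSID": "A", "Year": 2012, "Analyte": "ARSENIC", "Result": 2.0}]
joined_rows = join_inventory_with_samples(inventory, samples, keys=("PWSID", "Year"))
```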
Time_Period_of_Content:
Time_Period_Information:
Range_of_Dates/Times:
Beginning_Date: 19990101
Beginning_Time:
Ending_Date: 20121231
Ending_Time:
Currentness_Reference:
Publication Date
Status:
Progress: Complete
Maintenance_and_Update_Frequency: Once per year
Spatial_Domain:
Bounding_Coordinates:
West_Bounding_Coordinate: -124.409721
East_Bounding_Coordinate: -114.131208
North_Bounding_Coordinate: 42.009521999999997
South_Bounding_Coordinate: 32.53416
Keywords:
Theme:
Theme_Keyword_Thesaurus: NONE
Theme_Keyword: hazard
Place:
Place_Keyword_Thesaurus:
Place_Keyword: California,CA,06
Access_Constraints: none
Use_Constraints:
The geographic locations provided in this dataset were not verified for accuracy by State Primacy Agency officers or water system staff. The locations are provided and intended for diagrammatic and visualization purposes, such that the approximate location of the CWS service area can be described on a small scale map. These locations are not designed for large scale analyses and should not be used in linking to health data.
Point_of_Contact:
Contact_Information:
Contact_Person_Primary:
Contact_Person: CEHTP Science Team
Contact_Organization: CA Department of Public Health, CA Environmental Health Tracking Program
Contact_Position:
Contact_Address:
Address_Type: Mailing
Address:
850 Marina Bay Parkway, Bldg P, 3rd Floor
City: Richmond
State_or_Province: CA
Postal_Code: 94804
Country: United States Of America
Contact_Voice_Telephone: 5106203620
Contact_TDD/TTY_Telephone:
Contact_Facsimile_Telephone: 5106203720
Contact_Electronic_Mail_Address: data@cehtp.org
Hours_of_Service:
Contact Instructions:

Security_Information:
Security_Classification_System: none
Security_Classification: Unclassified
Security_Handling_Description: none
Native_Data_Set_Environment:
Relational Database Management System: SQL Server 2008 Filename: NCDM_inventory.xml
Data_Quality_Information:
Logical_Consistency_Report:
The EPHTN Drinking Water Content Workgroup concluded that an inventory of active community water systems for the years 1999 through the most current year must be compiled. WQMD, SDWIS, and PICME are coded logically enough to reasonably ascertain which CWS were active during the reporting period; however, a small minority of active CWS only export drinking water to other CWS with a non-zero retail population. For this reason, the legal definition of a CWS (i.e., serving 15 retail connections or 25 people year-round) was also used in populating the CWS inventory. A total of 3,363 CWS were enumerated by this logic for the reporting period.

The SDWIS inventory deactivation date of each system that is currently inactive allows for the inclusion of water systems that were formerly active during earlier years of the reporting period. In previous years' inventory datasets, only the inventory of the most recent year was extracted. Listed below are the annual frequencies of active systems included in this dataset.

Year -- Num Systems
1999 -- 3,059
2000 -- 3,080
2001 -- 3,071
2002 -- 3,063
2003 -- 3,055
2004 -- 3,044
2005 -- 3,039
2006 -- 3,035
2007 -- 3,017
2008 -- 3,030
2009 -- 3,036
2010 -- 3,016
2011 -- 3,001
2012 -- 2,987

CWS-specific population-served estimates are often inaccurate, because individual water systems are given 3 options for performing this calculation: 1. using the US Census and/or CA Department of Finance, 2. multiplying the number of service connections by 3.3, and/or 3. estimating the population served for single connections that provide service to multiple dwelling establishments, such as mobile home parks, apartments, prisons, and other institutional facilities with permanent residents. As of September 2, 2012, according to PICME and reported through this dataset, the total 2012 active CWS population served was 41,012,626, whereas the US Census reports the CA population in 2012 as 38,041,430. Moreover, estimates of public drinking water use by USGS and from the 1990 Census found that only 85-90% of the CA population is served by CWS. Statewide, this means that the population served estimate as reported by this dataset may be 20-25% higher than the true value.
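The arithmetic behind the statewide overestimate can be reproduced roughly as follows. This is only a sketch using the figures quoted above; the function names are illustrative. With these inputs the overestimate comes out at roughly 20 to 27 percent, depending on whether 90% or 85% of the Census population is assumed to be served by CWS, which is in line with the range stated above.

```python
# Figures quoted in the Logical_Consistency_Report.
REPORTED_POPULATION_SERVED_2012 = 41_012_626  # PICME, as of September 2, 2012
CENSUS_POPULATION_2012 = 38_041_430           # US Census, California
PERSONS_PER_CONNECTION = 3.3                  # multiplier from option 2

def population_from_connections(connections):
    """Option 2: estimate population served from service connections."""
    return connections * PERSONS_PER_CONNECTION

def relative_overestimate(share_served_by_cws):
    """Fraction by which the reported total exceeds the expected total."""
    expected = CENSUS_POPULATION_2012 * share_served_by_cws
    return REPORTED_POPULATION_SERVED_2012 / expected - 1.0

low = relative_overestimate(0.90)   # if 90% of residents are served by CWS
high = relative_overestimate(0.85)  # if 85% of residents are served by CWS
print(f"Overestimate: {low:.1%} to {high:.1%}")
```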

Latitude and longitude coordinates to describe a representative location for CWS were found using 5 methods in the following priority: 1. centroid of service area polygon (LocationDerivationCode=SA), 2. centroid of facility coordinate locations expected to be near the service area (LocationDerivationCode=MFL), 3. centroid of principal city served coordinates and geocoded system headquarters address point (LocationDerivationCode=0), 4. principal city served coordinates (LocationDerivationCode=PCS), 5.  geocoded system headquarters address point (LocationDerivationCode=GSH).  The logic used for subsetting sampling station locations that are expected to be near retail populations was as follows: any active or combination groundwater sampling location (raw, treated, or untreated) was considered to be near the retail population served. Typically, groundwater systems lie in or very near the consuming population. For active, mixed and surface water sampling locations, only treatment plant sampling stations and distribution system sampling locations were used.  Annual frequencies for distribution of how locations were found are listed below:

1999 -- SA=1,312; MFL=1,556; O=130; PCS=48; GSH=8; -999=5
2000 -- SA=1,319; MFL=1,565; O=136; PCS=48; GSH=8; -999=4
2001 -- SA=1,329; MFL=1,554; O=123; PCS=51; GSH=10; -999=4
2002 -- SA=1,348; MFL=1,544; O=116; PCS=47; GSH=5; -999=3
2003 -- SA=1,350; MFL=1,530; O=116; PCS=50; GSH=6; -999=3
2004 -- SA=1,357; MFL=1,510; O=117; PCS=50; GSH=7; -999=3
2005 -- SA=1,366; MFL=1,493; O=119; PCS=51; GSH=7; -999=3
2006 -- SA=1,371; MFL=1,482; O=119; PCS=53; GSH=7; -999=3
2007 -- SA=1,377; MFL=1,468; O=116; PCS=46; GSH=7; -999=3
2008 -- SA=1,388; MFL=1,458; O=125; PCS=48; GSH=6; -999=5
2009 -- SA=1,398; MFL=1,445; O=134; PCS=49; GSH=6; -999=4
2010 -- SA=1,398; MFL=1,430; O=132; PCS=46; GSH=6; -999=4
2011 -- SA=1,394; MFL=1,416; O=134; PCS=47; GSH=6; -999=4
2012 -- SA=1,387; MFL=1,409; O=134; PCS=47; GSH=6; -999=4
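The priority order described above can be sketched as a simple first-match lookup. This is an illustrative sketch, not the original implementation. The text writes the third code as 0 while the annual tallies write it as O; the letter O is assumed here. The input structure (a mapping from derivation code to a coordinate pair) is hypothetical.

```python
# Derivation codes in priority order. "O" follows the annual tallies;
# the descriptive text writes this code as 0 (assumption).
PRIORITY = ["SA", "MFL", "O", "PCS", "GSH"]
MISSING = -999

def pick_location(candidates):
    """Return (LocationDerivationCode, (lat, lon)) for the best available method.

    `candidates` maps a derivation code to a (lat, lon) tuple, or to None
    when that method produced no coordinates.
    """
    for code in PRIORITY:
        coords = candidates.get(code)
        if coords is not None:
            return code, coords
    return MISSING, None

# Made-up coordinates for illustration only.
example = pick_location({"PCS": (36.7, -119.8), "MFL": (36.75, -119.77)})
```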
Completeness_Report:
By definition, this dataset does not include unregulated drinking water providers or systems not defined as Community Water Systems according to the Federal Safe Drinking Water Act, namely private drinking water wells or very small systems ("State Small") or mutuals in which there are fewer than 15 retail connections and fewer than 25 year-round residents.

Latitude and longitude values could not be found for 7 CWS over the entire reporting period; these CWS were coded as missing (-999).
Lineage:
Process_Step:
Process_Description:
CEHTP received comma-separated (CSV) PICME database from ODW (dated 9/2/2012) and imported SYSNUM1 (N=14398) table into SQL Server 2008. Principal county served was inferred from the first 2 digits of the PWSID.
Process_Date: 20130315
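Inferring the principal county from the PWSID, as described in the process step above, might look like the sketch below. It assumes the identifier may carry a leading "CA" state prefix that should be skipped; that prefix handling and the example identifiers are assumptions, not taken from this record.

```python
def county_code_from_pwsid(pwsid):
    """Return the first two digits of a PWSID as a two-character county code."""
    digits = pwsid.strip().upper()
    if digits.startswith("CA"):  # assumed optional state prefix
        digits = digits[2:]
    if len(digits) < 2 or not digits[:2].isdigit():
        raise ValueError(f"cannot read a county code from {pwsid!r}")
    return digits[:2]

# Made-up identifiers for illustration only.
code_with_prefix = county_code_from_pwsid("CA1910067")
code_without_prefix = county_code_from_pwsid("1910067")
```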
Process_Step:
Process_Description:
CEHTP received SDWIS XML Inventory dataset from ODW (dated 2/22/2013) and imported WaterSystems (N=11846) table into SQL Server 2008. The PICME SYSNUM1.UPDT (deactivation date) was updated to reflect the deactivation dates specified in the SDWIS WaterSystems table.
Process_Date: 20130315
Process_Step:
Process_Description:
CEHTP downloaded 4 WQMD tables (all dated 3/15/2013) CHEMICAL (N=3309604), CHEMARCH (N=7435614), CHEMXARC (N=8637112), CHEMHIST (N=6970088) in dbf format from ODW website at http://www.cdph.ca.gov/certlic/drinkingwater/Pages/EDTlibrary.aspx. Imported all 4 tables into SQL Server 2008, appended into single table named EPHTN_FINDINGS (N=26352418-127 with null store_num=26352291), normalized on unique samples, established unique and search-optimized indexes, and named new samples table EPHTN_SAMPLES (N=1748969). A unique sample was based on sampling station, sample date/time, analyte, and lab.  For each unique water system, determined earliest sample date, and added this as column MIN_SAMPLE_DT to PICME SYSNUM1 table.
Process_Date: 20130315
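The normalization in the process step above (one record per sampling station, sample date/time, analyte, and lab, followed by the earliest sample date per water system) can be sketched as follows. The dictionary keys and example rows are hypothetical; the original work was done in SQL Server, so this is only a plain-Python illustration of the same idea.

```python
from datetime import datetime

def unique_samples(findings):
    """Keep the first finding seen for each (station, date/time, analyte, lab)."""
    seen = {}
    for row in findings:
        key = (row["station"], row["sampled_at"], row["analyte"], row["lab"])
        seen.setdefault(key, row)
    return list(seen.values())

def earliest_sample_by_system(samples):
    """Map each water system to its earliest sample date/time."""
    earliest = {}
    for row in samples:
        system, when = row["system"], row["sampled_at"]
        if system not in earliest or when < earliest[system]:
            earliest[system] = when
    return earliest

# Made-up findings for illustration only; the first two rows are duplicates.
findings = [
    {"system": "A", "station": "A-001", "sampled_at": datetime(2001, 5, 1, 9, 0),
     "analyte": "NITRATE", "lab": "L1"},
    {"system": "A", "station": "A-001", "sampled_at": datetime(2001, 5, 1, 9, 0),
     "analyte": "NITRATE", "lab": "L1"},
    {"system": "A", "station": "A-002", "sampled_at": datetime(1999, 3, 2, 8, 30),
     "analyte": "ARSENIC", "lab": "L2"},
]
samples = unique_samples(findings)
min_sample_dt = earliest_sample_by_system(samples)
```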
Process_Step:
Process_Description:
Estimated earliest water system activity in PICME SYSNUM1 by taking earliest date of following 4 date fields:  SYSNUM1.REVISEDATE, SYSNUM1.INVEN_DATE, SYSNUM1.SYS_UPDATE, SYSNUM1.MIN_SAMPLE_DT
Process_Date: 20130315
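Taking the earliest of the four date fields named in the process step above can be sketched like this. Skipping null dates is an assumption; the record does not say how missing values were handled. The example record is made up.

```python
from datetime import date

DATE_FIELDS = ("REVISEDATE", "INVEN_DATE", "SYS_UPDATE", "MIN_SAMPLE_DT")

def earliest_activity(record, fields=DATE_FIELDS):
    """Return the earliest non-null date among `fields`, or None if all are null."""
    dates = [record[f] for f in fields if record.get(f) is not None]
    return min(dates) if dates else None

# Made-up record for illustration only.
example_record = {
    "REVISEDATE": date(2003, 1, 15),
    "INVEN_DATE": None,
    "SYS_UPDATE": date(1998, 7, 4),
    "MIN_SAMPLE_DT": date(2000, 2, 29),
}
example_min_dt = earliest_activity(example_record)
```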
Process_Step:
Process_Description:
Developed SQL statement for creating year-specific CWS Inventory table.  Primary logic holds PICME_SYSNUM1.PWS_CLASS='C' AND (Year >= YEAR(PICME_SYSNUM1.MIN_DT) AND (PICME_SYSNUM1.UPDT IS NULL OR YEAR(PICME_SYSNUM1.UPDT) >= Year)) AND ((PICME_SYSNUM1.CONNECTIONS>=15 AND PICME_SYSNUM1.POPULATION>0) OR PICME_SYSNUM1.POPULATION>=25) to select only Community Water Systems that are active in a reporting year and that have either 15+ connections or 25+ population served, respectively.  The logic for determining the optional Primary Source Code departs from EPA Surface Water Treatment Rule logic: if all the active sources contained in a water system are of one source type (i.e., GW, SWP, SW, GWP, GU, or GUP), then the system is attributed with that source type; otherwise it is attributed as missing. 3,044 system inventory records returned by this logic.
Process_Date: 20130315
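The selection logic in the process step above can be restated in Python as a sketch. The field names follow the PICME_SYSNUM1 columns named in the description; treating a null MIN_DT as "not active" mirrors how a SQL comparison with NULL would behave and is an assumption, as are the example records.

```python
from datetime import date

def is_active_cws(rec, year):
    """True if `rec` is a Community Water System active in `year`."""
    if rec["PWS_CLASS"] != "C":
        return False
    started = rec["MIN_DT"] is not None and year >= rec["MIN_DT"].year
    not_deactivated = rec["UPDT"] is None or rec["UPDT"].year >= year
    large_enough = (
        (rec["CONNECTIONS"] >= 15 and rec["POPULATION"] > 0)
        or rec["POPULATION"] >= 25
    )
    return started and not_deactivated and large_enough

def primary_source_code(active_source_types):
    """Return the single source type shared by all active sources, else None."""
    kinds = set(active_source_types)
    return kinds.pop() if len(kinds) == 1 else None

# Made-up record for illustration only: deactivated during 2005.
example_system = {
    "PWS_CLASS": "C",
    "MIN_DT": date(1995, 6, 1),
    "UPDT": date(2005, 3, 1),
    "CONNECTIONS": 10,
    "POPULATION": 40,
}
active_years = [y for y in range(1999, 2013) if is_active_cws(example_system, y)]
```

Note that a system deactivated partway through a year still counts as active for that year, which matches the "active for all or part of the reporting period" wording in the abstract.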
Process_Step:
Process_Description:
Downloaded Geographic Names Information System (GNIS) feature types and codes from geonames.usgs.gov and extracted feature ID based on joining the county FIPS code and the city location name in the PICME_SYSNUM1 table. Merged the feature ID field into the inventory dataset. A code of -999 was reported for CWS that could not be attributed by this method (N=240).
Process_Date: 20130315
Process_Step:
Process_Description:
Exported current database state of PWS service areas (N=2,063 PWS) from CEHTP Drinking Water Systems Geographic Reporting Tool (http://ehib.org/water). Extracted centroid of CWS (N=1,404) and merged the corresponding latitude/longitude value into the inventory dataset using a LocationDerivationCode of 'SA'.
Process_Date: 20130315
Process_Step:
Process_Description:
For CWS not having service area centroids (N=1959), developed SQL statement for subsetting, by PWSID, individual sampling stations near the service area using the PICME_SOURCE table. Primary logic is the union of the following 2 WHERE clauses: 1. WATER_TYPE=G and ENTITY_INFO in (AR,AT,AU,IR,IT,IU,CR,CT,CU) and 2. WATER_TYPE in (S,M) and ENTITY_INFO in (AT,IT,CT,DR,DT). 1,648 CWS were found with latitude/longitude coordinates through this method. To ensure security, the reported centroids were rounded to the nearest hundredth of a decimal degree (i.e., accurate to ~1 km).
Process_Date: 20130315
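The station filter and rounding in the process step above can be sketched as follows. Computing the centroid as the arithmetic mean of station coordinates is an assumption, since the record does not say how the centroid was derived; the `lat` and `lon` keys and the example stations are hypothetical.

```python
GROUNDWATER_ENTITY_INFO = {"AR", "AT", "AU", "IR", "IT", "IU", "CR", "CT", "CU"}
SURFACE_OR_MIXED_ENTITY_INFO = {"AT", "IT", "CT", "DR", "DT"}

def near_service_area(station):
    """Apply the two WHERE clauses: groundwater, or surface/mixed water."""
    water_type, info = station["WATER_TYPE"], station["ENTITY_INFO"]
    if water_type == "G":
        return info in GROUNDWATER_ENTITY_INFO
    if water_type in {"S", "M"}:
        return info in SURFACE_OR_MIXED_ENTITY_INFO
    return False

def rounded_centroid(stations):
    """Mean location of qualifying stations, rounded to 0.01 decimal degree."""
    points = [(s["lat"], s["lon"]) for s in stations if near_service_area(s)]
    if not points:
        return None
    lat = sum(p[0] for p in points) / len(points)
    lon = sum(p[1] for p in points) / len(points)
    return round(lat, 2), round(lon, 2)

# Made-up stations for illustration only; the surface raw-water station is excluded.
stations = [
    {"WATER_TYPE": "G", "ENTITY_INFO": "AR", "lat": 34.1234, "lon": -118.5678},
    {"WATER_TYPE": "S", "ENTITY_INFO": "DT", "lat": 34.3234, "lon": -118.7678},
    {"WATER_TYPE": "S", "ENTITY_INFO": "AR", "lat": 40.0, "lon": -120.0},
]
centroid = rounded_centroid(stations)
```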
Process_Step:
Process_Description:
For the remaining CWS lacking latitude/longitude values (N=311), the water system headquarters address was geocoded using the CEHTP Centralized Geocoding Service.  These coordinates were averaged with corresponding CWS that had a principal city location (N=212). For the remaining CWS (N=87) that didn't have both a principal city and geocoded system headquarters available, the principal city alone was used (N=80), then lacking that, the geocoded system headquarters was used alone (N=12).
Process_Date: 20130315
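The fallback in the process step above (average the two points when both exist, otherwise use whichever is available) can be sketched as below. The function name and inputs are hypothetical, and the code "O" for the averaged location follows the annual tallies in the Logical_Consistency_Report, where the descriptive text writes 0.

```python
def fallback_location(principal_city, geocoded_hq):
    """Return (LocationDerivationCode, (lat, lon)) from the two fallback points.

    Each argument is a (lat, lon) tuple or None when unavailable.
    """
    if principal_city is not None and geocoded_hq is not None:
        lat = (principal_city[0] + geocoded_hq[0]) / 2
        lon = (principal_city[1] + geocoded_hq[1]) / 2
        return "O", (lat, lon)  # assumed spelling of the averaged-location code
    if principal_city is not None:
        return "PCS", principal_city
    if geocoded_hq is not None:
        return "GSH", geocoded_hq
    return -999, None

# Made-up coordinates for illustration only.
averaged = fallback_location((34.0, -118.0), (36.0, -120.0))
```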
Entity_and_Attribute_Information:
Overview_Description:
Entity_and_Attribute_Overview:
The data set contains fields describing Community Water Systems (CWS), such as location of service area, number of connections, and approximate number of people served.
Entity_and_Attribute_Detail_Citation:
Data dictionary is available from the National Tracking Program at http://www.cdc.gov/nceh/tracking.
Distribution_Information:
Distributor:
Contact_Information:
Contact_Person_Primary:
Contact_Person: CEHTP Science Team
Contact_Organization: CA Department of Public Health, CA Environmental Health Tracking Program
Contact_Position:
Contact_Address:
Address_Type: Mailing
Address:
850 Marina Bay Parkway, Bldg P, 3rd Floor
City: Richmond
State_or_Province: CA
Postal_Code: 94804
Country: United States Of America
Contact_Voice_Telephone: 5106203620
Contact_TDD/TTY_Telephone:
Contact_Facsimile_Telephone: 5106203720
Contact_Electronic_Mail_Address: data@cehtp.org
Hours_of_Service:
Contact Instructions:

Resource_Description:
Distribution_Liability:

Custom_Order_Process:

Metadata_Reference_Information:
Metadata_Date: 20130315
Metadata_Contact:
Contact_Information:
Contact_Person_Primary:
Contact_Person: CEHTP Science Team
Contact_Organization: CA Department of Public Health, CA Environmental Health Tracking Program
Contact_Position:
Contact_Address:
Address_Type: Mailing
Address:
850 Marina Bay Parkway, Bldg P, 3rd Floor
City: Richmond
State_or_Province: CA
Postal_Code: 94804
Country: United States Of America
Contact_Voice_Telephone: 5106203620
Contact_TDD/TTY_Telephone:
Contact_Facsimile_Telephone: 5106203720
Contact_Electronic_Mail_Address: data@cehtp.org
Hours_of_Service:
Contact Instructions:

Metadata_Standard_Name: EPHTN Tracking Network Profile Version 1.2
Metadata_Access_Constraints: none
Metadata_Use_Constraints:
none