Gera Environmental Pressure Score — Methodology
A transparent, reproducible formula. Every value in this dataset traces directly to the real EPA ECHO Exporter bulk CSV (last-modified 2026-06-13), downloadable without a key from echo.epa.gov.
The formula
density_raw = log₁₀(active_facilities + 1)
violation_rate = recent_violators / active_facilities
snc_rate = serious_violators / active_facilities
pressure_score = 40·norm(density_raw) + 40·norm(violation_rate) + 20·norm(snc_rate)
GEPS = 100 − pressure_score
where norm(x) = (x − min_x) / (max_x − min_x) across 3,177 included counties
Facility density and violation rate each carry 40% weight as the two primary compliance signals. Significant Non-Compliance share carries 20% weight because SNC designations vary by programme type. Higher GEPS = lower environmental pressure = better compliance context.
Normalisation ranges (2026-06-20)
| Metric | Weight | Min (lowest pressure) | Max (highest pressure) |
|---|---|---|---|
| Facility density (log₁₀ scale) | 40% | 1.041 (log₁₀(10 + 1)) | 5.090 (log₁₀(123,062 + 1)) |
| Recent violation rate | 40% | 0.000 (0%) | 0.765 (76.5%) |
| SNC rate | 20% | 0.000 (0%) | 0.320 (32.0%) |
Reproduce it yourself — step by step
- 1
Download the EPA ECHO Exporter bulk ZIP (no API key or registration required)
Fetch the file from https://echo.epa.gov/files/echodownloads/echo_exporter.zip. This is a single compressed CSV (ECHO_EXPORTER.csv) containing all EPA-regulated facilities across Clean Air Act, Clean Water Act (NPDES), RCRA hazardous waste, and Safe Drinking Water Act programmes. The file is ~445 MB compressed and ~2.1 GB uncompressed. It is a U.S. federal government work in the public domain — no key, registration, or fee required. Last-modified: Sat, 13 Jun 2026.
- 2
Stream-decompress and extract only the needed columns
Do NOT load the full 2.1 GB CSV into memory. Use a streaming deflate decompressor (e.g. Python zlib.decompressobj(wbits=-15), fed the member's raw deflate data that follows the ZIP local file header) and extract these six columns per row: FAC_STATE (col 4), FAC_COUNTY (col 6), FAC_FIPS_CODE (col 7), FAC_ACTIVE_FLAG (col 30), FAC_QTRS_WITH_NC (col 43), FAC_PROGRAMS_WITH_SNC (col 44). Column indices are 0-based and may shift between ECHO releases; verify them against the header row before processing.
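A minimal sketch of this step, under stated assumptions. It swaps the hand-rolled zlib.decompressobj loop for Python's standard-library zipfile module, whose ZipFile.open also inflates the member incrementally, so the CSV is never held in memory as a whole. Columns are resolved by header name rather than fixed index. The function name iter_facility_rows, the UTF-8 encoding, and the idea of a locally saved ZIP are assumptions for illustration, not part of the published method.

```python
import csv
import io
import zipfile

NEEDED = (
    "FAC_STATE", "FAC_COUNTY", "FAC_FIPS_CODE",
    "FAC_ACTIVE_FLAG", "FAC_QTRS_WITH_NC", "FAC_PROGRAMS_WITH_SNC",
)

def iter_facility_rows(zip_source, member="ECHO_EXPORTER.csv"):
    """Yield one dict per facility holding only the six needed columns.

    zip_source may be a filesystem path or a seekable binary file object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # ZipFile.open returns a file-like object that decompresses on demand.
        with archive.open(member) as raw:
            # Assumption: the CSV is UTF-8; undecodable bytes are replaced.
            text = io.TextIOWrapper(raw, encoding="utf-8", errors="replace", newline="")
            reader = csv.reader(text)
            header = next(reader)
            # header.index raises ValueError if a column is missing, which
            # doubles as the "verify against the header row" check.
            index = {name: header.index(name) for name in NEEDED}
            last_needed = max(index.values())
            for row in reader:
                if len(row) <= last_needed:
                    continue  # skip truncated rows
                yield {name: row[i] for name, i in index.items()}
```

Because the function is a generator, a caller can aggregate while iterating and discard each row immediately.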
- 3
Aggregate to county level using 5-digit FIPS code
For each facility row, extract the 5-digit county FIPS code (FAC_FIPS_CODE). Per county accumulate: (a) total_facilities = row count; (b) active_facilities = count where FAC_ACTIVE_FLAG = 'Y'; (c) recent_violators = count where FAC_ACTIVE_FLAG = 'Y' AND FAC_QTRS_WITH_NC > 0 (any non-compliance in the last 12 quarters); (d) serious_violators = count where FAC_ACTIVE_FLAG = 'Y' AND FAC_PROGRAMS_WITH_SNC > 0 (at least one programme in Significant Non-Compliance). Rows with missing or invalid FIPS codes are skipped.
- 4
Exclude counties below the quality threshold
Counties with fewer than 10 active regulated facilities are excluded entirely — they have insufficient regulated-facility density for a statistically reliable violation rate or SNC rate. These are rendered as "insufficient data" in the interface. Never impute or estimate. This threshold produces 3,177 valid county rows.
- 5
Compute three raw metrics per county
For each included county: (1) density_raw = log10(active_facilities + 1) — the log scale compresses the extreme right tail (Los Angeles County has 123,062 active facilities vs. rural counties with 10–20), making the metric comparable across county sizes without external land-area data; (2) violation_rate = recent_violators / active_facilities (range 0–1); (3) snc_rate = serious_violators / active_facilities (range 0–1).
- 6
Min-max normalise each metric across all included counties
For each metric m, compute: norm_m = (value_m − min_m) / (max_m − min_m). This rescales each metric to [0, 1], where 1 = highest pressure / worst value. Normalisation ranges from the 2026-06-20 EPA snapshot (3,177 counties) are shown in the Normalisation ranges table above.
- 7
Compute the weighted pressure score and GEPS
pressure_score = 40 × norm(density_raw) + 40 × norm(violation_rate) + 20 × norm(snc_rate). GEPS = 100 − pressure_score. Facility density (40%) and violation rate (40%) share equal weight as the two primary signals of environmental compliance load. SNC share (20%) carries lower weight because SNC designations are programme-specific and not all programmes apply equally to all facility types. The result is in [0, 100]; higher = lower pressure = better. Round to one decimal place.
- 8
Assign national ranks
Sort all 3,177 counties by GEPS descending. Rank 1 = lowest environmental pressure; rank 3,177 = highest pressure. Ties broken alphabetically by state then county name.
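Steps 3 to 8 can be sketched roughly as follows. This is an illustrative sketch, not Gera's production code. It assumes each input row is a dict keyed by ECHO column name with string values, that FAC_QTRS_WITH_NC and FAC_PROGRAMS_WITH_SNC parse as numbers, and that a metric with zero spread contributes no pressure. The function names are hypothetical, and total_facilities is omitted because it does not enter the formula.

```python
import math
from collections import defaultdict

MIN_ACTIVE = 10
WEIGHTS = {"density_raw": 40.0, "violation_rate": 40.0, "snc_rate": 20.0}

def _positive(value):
    """True when a CSV field parses as a number greater than zero."""
    try:
        return float(value) > 0
    except (TypeError, ValueError):
        return False

def aggregate(rows):
    """Steps 3-4: count per county, then drop counties under the threshold."""
    counties = defaultdict(
        lambda: {"state": "", "county": "", "active": 0, "recent": 0, "serious": 0}
    )
    for row in rows:
        fips = (row.get("FAC_FIPS_CODE") or "").strip()
        if len(fips) != 5 or not fips.isdigit():
            continue  # missing or invalid FIPS code
        if row.get("FAC_ACTIVE_FLAG") != "Y":
            continue  # only active facilities feed the three metrics
        c = counties[fips]
        c["state"], c["county"] = row.get("FAC_STATE", ""), row.get("FAC_COUNTY", "")
        c["active"] += 1
        c["recent"] += _positive(row.get("FAC_QTRS_WITH_NC"))
        c["serious"] += _positive(row.get("FAC_PROGRAMS_WITH_SNC"))
    return {f: c for f, c in counties.items() if c["active"] >= MIN_ACTIVE}

def score(counties):
    """Steps 5-7: raw metrics, min-max normalisation, weighted GEPS."""
    if not counties:
        return {}
    metrics = {
        fips: {
            "density_raw": math.log10(c["active"] + 1),
            "violation_rate": c["recent"] / c["active"],
            "snc_rate": c["serious"] / c["active"],
        }
        for fips, c in counties.items()
    }
    geps = {fips: 100.0 for fips in metrics}
    for name, weight in WEIGHTS.items():
        values = [m[name] for m in metrics.values()]
        low, span = min(values), max(values) - min(values)
        for fips, m in metrics.items():
            # Assumption: if every county shares one value, treat norm as 0.
            norm = (m[name] - low) / span if span else 0.0
            geps[fips] -= weight * norm
    return {fips: round(value, 1) for fips, value in geps.items()}

def rank(counties, geps):
    """Step 8: GEPS descending, ties broken by state then county name."""
    order = sorted(
        geps, key=lambda f: (-geps[f], counties[f]["state"], counties[f]["county"])
    )
    return {fips: position for position, fips in enumerate(order, start=1)}
```

Normalisation bounds are derived from the included counties only, so the exclusion in step 4 must run before scoring.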
Validation examples (2026-06-20)
| County | State | Active fac. | Viol. rate | SNC rate | GEPS |
|---|---|---|---|---|---|
| ARROYO | PR | 10 | 0.0% | 0.0% | 100.0 / 100 |
| POQUOSON CITY | VA | 14 | 0.0% | 0.0% | 98.7 / 100 |
| BORDEN COUNTY | TX | 16 | 0.0% | 0.0% | 98.1 / 100 |
| MINGO | WV | 265 | 59.6% | 30.2% | 36.3 / 100 |
| OROCOVIS | PR | 25 | 76.0% | 32.0% | 36.6 / 100 |
| MCDOWELL | WV | 298 | 58.7% | 23.2% | 40.6 / 100 |
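The rows above can be spot-checked against the published normalisation ranges. The sketch below is a hedged illustration: it takes the density bounds as log₁₀(10 + 1) and log₁₀(123,062 + 1), the violation-rate and SNC-rate maxima as 0.765 and 0.320 from the ranges table, and the rounded percentages from the validation table as inputs. The function name is hypothetical. Because the inputs are rounded, agreement to one decimal place is expected for these rows but is not guaranteed for every county.

```python
import math

# Bounds from the normalisation ranges table (2026-06-20 snapshot).
DENSITY_MIN = math.log10(10 + 1)
DENSITY_MAX = math.log10(123_062 + 1)
VIOLATION_MAX = 0.765  # minimum is 0.000
SNC_MAX = 0.320        # minimum is 0.000

def geps_from_published_ranges(active, violation_rate, snc_rate):
    """Recompute GEPS for one county from its three published inputs."""
    density = (math.log10(active + 1) - DENSITY_MIN) / (DENSITY_MAX - DENSITY_MIN)
    pressure = (
        40 * density
        + 40 * violation_rate / VIOLATION_MAX
        + 20 * snc_rate / SNC_MAX
    )
    return round(100 - pressure, 1)

# (county, active facilities, violation rate, SNC rate) from the table above.
EXAMPLES = [
    ("ARROYO", 10, 0.000, 0.000),
    ("POQUOSON CITY", 14, 0.000, 0.000),
    ("BORDEN COUNTY", 16, 0.000, 0.000),
    ("MINGO", 265, 0.596, 0.302),
    ("OROCOVIS", 25, 0.760, 0.320),
    ("MCDOWELL", 298, 0.587, 0.232),
]

for name, active, viol, snc in EXAMPLES:
    print(f"{name:<14} {geps_from_published_ranges(active, viol, snc):5.1f}")
```

For MINGO, for instance, the three weighted terms come to roughly 13.7 (density), 31.2 (violation rate), and 18.9 (SNC rate), giving a pressure score near 63.7 and a GEPS of 36.3.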
Data source and licence
All underlying data is published by the U.S. Environmental Protection Agency (EPA) through the ECHO (Enforcement and Compliance History Online) Exporter (last-modified 2026-06-13). This is a U.S. federal government work in the public domain under 17 U.S.C. § 105. No API key, registration, or fee is required to download it. Gera does not claim any copyright over the derived index values; the formula and data are published here so any reader can reproduce them.
The GEPS is a regulatory compliance context indicator, not a direct pollutant-concentration or health-outcome measure. For ambient air quality concentrations, see the Gera US Air Health Index (GUSAHI), which uses EPA AQS actual PM2.5 and ozone data.
Frequently asked questions
- Why use log10 for facility density instead of raw facility count?
- The raw active-facility count ranges from 10 (our minimum threshold) to 123,062 (Los Angeles County). Without transformation, the largest counties would dominate the density metric, and min-max normalisation would collapse most counties into a very narrow band near 0. The log10 transformation compresses this extreme right tail while preserving the ordering: a county with 10 facilities scores ~1.0, one with 100 scores ~2.0, one with 1,000 scores ~3.0, and one with 123,000 scores ~5.1. The ratio between the largest and smallest values becomes roughly 5:1 instead of roughly 12,000:1, making the density component comparable across county sizes without needing external land-area data.
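The effect described in this answer can be illustrated with a few lines of Python. This is a sketch only: it uses the published extremes (10 and 123,062 active facilities) as the normalisation bounds, and the helper names norm_raw and norm_log are hypothetical.

```python
import math

LOW, HIGH = 10, 123_062  # published range of active facilities per county

def norm_raw(n):
    """Min-max normalise the untransformed facility count."""
    return (n - LOW) / (HIGH - LOW)

def norm_log(n):
    """Min-max normalise log10(n + 1), as the GEPS density metric does."""
    low, high = math.log10(LOW + 1), math.log10(HIGH + 1)
    return (math.log10(n + 1) - low) / (high - low)

for n in (10, 100, 1_000, 123_062):
    print(f"{n:>7,}  raw={norm_raw(n):.4f}  log={norm_log(n):.4f}")
```

On the raw scale a county with 1,000 facilities normalises to less than 0.01, so nearly every county sits indistinguishably close to 0. On the log scale the same county lands a little under the midpoint of the range.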
- Why not use land area for facility density?
- Land area data would require joining a separate Census TIGER file, adding a second data source, a second licence, and a second join operation. The log-scaled facility count already captures the key signal (relative regulatory load) and is entirely self-contained within the EPA ECHO Exporter. Gera discloses this choice explicitly in the methodology — the GEPS is a regulatory compliance pressure index, not a population-exposure measure (which is captured separately in the Gera US Air Health Index using EPA AQS data).
- What does FAC_QTRS_WITH_NC measure?
- FAC_QTRS_WITH_NC is the number of quarters (out of the last 12 quarters = 3 years) in which the facility had documented non-compliance across any of its regulated EPA programmes. A value of 0 means no non-compliance in the past 3 years. A value of 12 means non-compliance in every quarter. The GEPS counts a facility as a "recent violator" if FAC_QTRS_WITH_NC > 0 — i.e., any non-compliance in the 3-year window. This is a conservative (inclusive) definition.
- How often is the GEPS updated?
- EPA ECHO publishes updated ECHO Exporters on a rolling basis (approximately monthly). Gera re-computes the GEPS each time a new ECHO Exporter is available. The current version is based on the ECHO Exporter last-modified 2026-06-13, downloaded by Gera on 2026-06-20.
- Can I reproduce the GEPS myself?
- Yes. Download the EPA ECHO Exporter ZIP from echo.epa.gov/files/echodownloads/ (no key required). Stream-decompress ECHO_EXPORTER.csv. Extract FAC_STATE, FAC_COUNTY, FAC_FIPS_CODE, FAC_ACTIVE_FLAG, FAC_QTRS_WITH_NC, FAC_PROGRAMS_WITH_SNC. Aggregate to county (5-digit FIPS). Exclude counties with < 10 active facilities. Compute density_raw = log10(active + 1), violation_rate = recent_viol / active, snc_rate = serious / active. Min-max normalise. Apply weights 40/40/20. Subtract from 100. The formula is deterministic — you will arrive at the same GEPS values from the same EPA snapshot.
Contains public sector information published by the U.S. Environmental Protection Agency (EPA), a U.S. federal government work in the public domain under 17 U.S.C. § 105. Source: EPA ECHO Exporter, Full Facility Register with Compliance Data (EPA last-modified 2026-06-13; downloaded by Gera 2026-06-20).
Informational/educational only — not a substitute for professional medical advice; a clinician interprets results.