Regional Water Use Efficiency in California Vineyards

Author: William Mullins ・ Date: December 5, 2025

Water Resources · Agriculture · Statistical Modeling
R · GLM · Environmental Analysis

Introduction

The California Water Challenge

Does hydrologic region affect vineyard water use in California, after controlling for evapotranspiration, precipitation, and irrigated crop area?

California's agriculture is the state's dominant water consumer, accounting for approximately 80% of its developed water supply. The efficiency of this water use is paramount for economic resilience and environmental stewardship, particularly as climate change exacerbates drought frequency and intensifies competition for limited resources.

Vineyards represent a significant and geographically diverse component of the state's agricultural portfolio, spanning disparate climate zones from the cool coastal valleys to the hot interior regions. While climatic factors are known drivers of water demand, other differences among California's 10 major hydrologic regions, such as water management practices, regulatory structures, soil characteristics, and crop varieties, may exert an independent and measurable influence on observed water use efficiency.

The aim of this analysis is to quantify the magnitude and direction of the independent effect of hydrologic region on vineyard water use efficiency across California.

California Hydrologic Regions

The state is divided into 10 hydrologic regions, each with distinct climate characteristics that influence agricultural water demand.

California Hydrologic Regions map

About the Data

Data was extracted from the California Department of Water Resources Statewide Agricultural Water Use Data (2016-2020) Excel Application Tool. This dataset provides annual estimates for water use variables across 20 crop types and multiple geographic scales.

Key variables used in this analysis:

  • Regional Applied Water Volume (AW): Total irrigation water applied to vineyards in acre-feet
  • Regional Irrigated Crop Area (ICA): Total vineyard acreage receiving irrigation
  • Regional Crop Evapotranspiration ($ET_c$): Total water lost to the atmosphere from soil and plants (acre-feet); represents crop water demand
  • Regional Effective Precipitation ($E_p$): Rainfall that effectively contributes to crop water needs (acre-feet)
  • Hydrologic Region (HR): California's 10 major hydrologic regions, defined by watershed boundaries and climate characteristics

Data availability: California DWR Water Use Data.

Directed Acyclic Graph (DAG)

The following DAG illustrates the conceptual model of how the variables relate:

  1. Hydrologic Region (HR) → Climate Driven Variables: HR defines the local climate conditions. Different regions have different temperatures, humidity, and weather patterns. This drives both $ET_c$ (water lost) and $E_p$ (water gained naturally).
  2. Climate Driven Variables → Water Use: After accounting for natural water supply ($E_p$), the deficit between demand ($ET_c$) and supply must be met through irrigation.
  3. Vineyard Size (ICA) → Total Volume: Larger vineyards require proportionally more total water (a scaling relationship).

After controlling for climate and size, the regional coefficients capture the remaining differences in water use between regions, which may reflect differences in management efficiency. The reference region (Central Coast) serves as the baseline.

Directed Acyclic Graph (DAG)

Import Packages

The analysis requires the following packages:

Data Processing Pipeline

The data is stored in an Excel application in which each variable is kept on its own sheet. The following code reads the data from each sheet and combines it into a single standard data frame.
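A minimal sketch of the combining step in base R, assuming each sheet has already been read into a long table (for example with `readxl::read_excel(path, sheet = ...)`) and that every sheet shares hypothetical key columns `HR`, `Crop`, and `Year` plus a single `value` column. The toy data frames and all column names are assumptions for illustration, not the DWR tool's actual layout.

```r
# Hypothetical stand-ins for three per-variable sheets. The key columns
# (HR, Crop, Year) and the "value" column are assumed, not the DWR layout.
regions <- c("Central Coast", "Tulare Lake")
sheets <- list(
  AW  = data.frame(HR = regions, Crop = "Vineyard", Year = 2016, value = c(1500, 5600)),
  ICA = data.frame(HR = regions, Crop = "Vineyard", Year = 2016, value = c(700, 2600)),
  ETc = data.frame(HR = regions, Crop = "Vineyard", Year = 2016, value = c(1400, 6000))
)
keys <- c("HR", "Crop", "Year")

# Rename each sheet's "value" column to the variable the sheet holds.
renamed <- Map(function(df, var) {
  names(df)[names(df) == "value"] <- var
  df
}, sheets, names(sheets))

# Join all sheets on the shared keys into one wide data frame.
water <- Reduce(function(x, y) merge(x, y, by = keys), renamed)
print(water)
```

Joining on explicit keys, rather than binding columns side by side, keeps rows aligned even if sheets list regions or years in different orders.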

Data Preparation

Now that the data has been read in, we can select data collected from vineyards which applied water to their crops.
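A minimal sketch of this selection, assuming the combined data frame has a hypothetical `Crop` column and an applied-water column named `AW`; the toy rows are made up. Requiring `AW > 0` also guarantees the strictly positive response that the Gamma model requires.

```r
# Hypothetical combined data frame; the Crop and AW column names are assumed.
water <- data.frame(
  HR   = c("Central Coast", "Central Coast", "Tulare Lake", "North Coast"),
  Crop = c("Vineyard", "Alfalfa", "Vineyard", "Vineyard"),
  AW   = c(1500, 3000, 0, 450)
)

# Keep vineyard rows with strictly positive applied water; the Gamma model
# requires a positive response, so zero-use rows are dropped.
vineyards <- subset(water, Crop == "Vineyard" & AW > 0)
print(vineyards)
```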

Data Exploration

Regional Water Use Summary

| Hydrologic Region | n | Mean AW | Median AW | SD | SE |
|---|---:|---:|---:|---:|---:|
| South Lahontan | 23 | 128.78 | 39.43 | 295.28 | 61.57 |
| South Coast | 46 | 1,287.72 | 296.49 | 2,120.16 | 312.60 |
| Sacramento River | 206 | 3,631.83 | 134.91 | 8,249.17 | 574.75 |
| North Coast | 121 | 5,936.57 | 443.85 | 10,137.34 | 921.58 |
| Colorado River | 25 | 6,748.80 | 386.15 | 13,446.13 | 2,689.23 |
| Central Coast | 163 | 9,917.94 | 1,476.95 | 22,978.53 | 1,799.82 |
| San Francisco Bay | 80 | 10,135.80 | 138.55 | 25,803.39 | 2,884.91 |
| San Joaquin River | 182 | 17,764.87 | 3,262.99 | 41,878.19 | 3,104.22 |
| Tulare Lake | 166 | 38,072.24 | 5,663.51 | 58,246.36 | 4,520.79 |

Applied Water (AW) summary statistics by Hydrologic Region

Key observations:

  • Large variation between regions: Mean water use ranges from 129 acre-feet (South Lahontan) to 38,072 acre-feet (Tulare Lake)—a nearly 300-fold difference
  • High within-region variability: Standard deviations exceed means in all regions, indicating right-skewed distributions
  • Median < Mean everywhere: Consistent with right skew caused by a few very large vineyard operations
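A base R sketch of how per-region summaries like those in the table could be computed, with SE taken as SD / sqrt(n); this matches the reported values (e.g. 295.28 / sqrt(23) ≈ 61.57 for South Lahontan). The toy data frame and its column names are hypothetical.

```r
# Toy data standing in for the filtered vineyard observations.
vineyards <- data.frame(
  HR = c("A", "A", "A", "B", "B"),
  AW = c(10, 20, 90, 100, 300)
)

# n, mean, median, SD, and SE (= SD / sqrt(n)) of applied water per region.
summarise_aw <- function(x) {
  c(n = length(x), mean = mean(x), median = median(x),
    sd = sd(x), se = sd(x) / sqrt(length(x)))
}
stats <- t(sapply(split(vineyards$AW, vineyards$HR), summarise_aw))
stats <- stats[order(stats[, "mean"]), , drop = FALSE]  # sort by mean, as in the table
print(round(stats, 2))
```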

Distribution of Water Use

This log-scale boxplot shows that while most regions have modest median use (roughly 100-1,000 acre-feet), there are many high-volume outliers in regions such as the Central Coast, San Joaquin River, and Tulare Lake.

Box Plot of Applied Water Volume by Hydrologic Region

Statistical Model

Model Specification

Why Gamma?

The Gamma distribution is appropriate here because:

  1. Positive support: Water use cannot be zero or negative in irrigated areas
  2. Right-skewed: The distribution has a long right tail (high outliers)
  3. Mean-variance relationship: Variance increases with the mean (heteroscedasticity)
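Point 3 can be checked numerically. The sketch below is a simulation, not an analysis of the vineyard data: it draws from Gamma distributions with shape $1/\phi$ and scale $\phi\mu$ at three arbitrary means and shows that the coefficient of variation stays near $\sqrt{\phi}$, so the standard deviation grows in proportion to the mean. The value of $\phi$ is the dispersion reported for the fitted model; the means are illustrative.

```r
set.seed(1)
phi <- 0.0234                 # dispersion reported for the fitted model
mu  <- c(100, 1000, 10000)    # illustrative means (acre-feet), chosen arbitrarily

# Shape 1/phi and scale phi * mu give mean mu and variance phi * mu^2.
draws <- lapply(mu, function(m) rgamma(1e5, shape = 1 / phi, scale = phi * m))

means <- sapply(draws, mean)
cv    <- sapply(draws, function(y) sd(y) / mean(y))  # SD relative to the mean
print(round(rbind(mean = means, cv = cv), 3))
# The CV is roughly constant (about sqrt(phi) = 0.153), so SD scales with the mean.
```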

Model Statistical Notation

$$Y_i \sim \text{Gamma}(\text{shape} = k, \text{scale} = \theta_i)$$

Where:

  • $Y_i$ = Regional Applied Water Volume for observation $i$
  • $k = 1/\phi$ is the shape parameter (constant across observations)
  • $\phi$ is the dispersion parameter
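  • $\theta_i = \phi\,\mu_i$ is the scale parameter (varies by observation), so that $E[Y_i] = k\theta_i = \mu_i$ and $\text{Var}(Y_i) = k\theta_i^2 = \phi\,\mu_i^2$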

Gamma Regression Equation:

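$$\log(\mu_i) = \beta_0 + \beta_1 ET_{c,i} + \beta_2 E_{p,i} + \beta_3 \log(ICA_i) + \sum_{j=4}^{11} \beta_j HR_{ij}$$

where $\mu_i = E[Y_i]$ and $HR_{ij}$ is an indicator equal to 1 if observation $i$ lies in the $j$-th non-reference region (eight regions, $j = 4, \dots, 11$).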

Coefficient Interpretation:

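  • $\exp(\beta_1), \exp(\beta_2)$: Multiplicative change in expected water use per one acre-foot increase in $ET_c$ or $E_p$
  • $\beta_3$: Elasticity with respect to vineyard area; a 1% increase in ICA changes expected water use by approximately $\beta_3$%
  • $\exp(\beta_4), \dots, \exp(\beta_{11})$: Regional multipliers on expected water use relative to the Central Coast (reference region)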
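As a worked illustration, plugging the estimates from the Model Fitting table into these interpretations gives the effect of doubling vineyard area and of a 10,000 acre-foot increase in $ET_c$ or $E_p$, holding the other terms fixed (the 10,000 acre-foot step is an arbitrary choice):

$$\frac{\mu_i(2\,ICA_i)}{\mu_i(ICA_i)} = 2^{\hat\beta_3} = 2^{1.010} \approx 2.014$$

$$\exp(10^4\,\hat\beta_1) = \exp(0.00976) \approx 1.010, \qquad \exp(10^4\,\hat\beta_2) = \exp(-0.0729) \approx 0.930$$

So doubling the irrigated area roughly doubles expected applied water, a 10,000 acre-foot increase in $ET_c$ raises it by about 1%, and the same increase in $E_p$ lowers it by about 7%.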

Model Fitting

| Parameter | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---:|---:|---:|---|
| (Intercept) | 7.473e-01 | 1.753e-02 | 42.624 | < 2e-16 *** |
| Regional_ETc_Vol | 9.758e-07 | 3.035e-07 | 3.215 | 0.00135 ** |
| Regional_Ep_Vol | -7.294e-06 | 2.672e-06 | -2.730 | 0.00645 ** |
| log(Regional_ICA) | 1.010e+00 | 2.213e-03 | 456.617 | < 2e-16 *** |
| HRColorado River | 6.531e-01 | 3.313e-02 | 19.712 | < 2e-16 *** |
| HRNorth Coast | -1.257e-02 | 1.857e-02 | -0.677 | 0.49850 |
| HRSacramento River | 2.045e-01 | 1.645e-02 | 12.432 | < 2e-16 *** |
| HRSan Francisco Bay | 5.207e-02 | 2.117e-02 | 2.460 | 0.01406 * |
| HRSan Joaquin River | 2.495e-01 | 1.660e-02 | 15.029 | < 2e-16 *** |
| HRSouth Coast | 3.332e-01 | 2.572e-02 | 12.956 | < 2e-16 *** |
| HRSouth Lahontan | 5.557e-01 | 3.475e-02 | 15.993 | < 2e-16 *** |
| HRTulare Lake | 5.399e-01 | 1.755e-02 | 30.769 | < 2e-16 *** |

Gamma GLM Coefficient Estimates (Dispersion parameter: 0.0234)

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
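A model of this form can be fit in R with `glm()` and `Gamma(link = "log")`. The sketch below runs on synthetic data whose column names mirror the coefficient table; `AW` is a hypothetical name for the response, and the regions, sample size, and true coefficients are illustrative assumptions. `relevel()` makes Central Coast the reference level so the `HR` coefficients are contrasts against it.

```r
set.seed(42)
n   <- 600
phi <- 0.0234

# Synthetic stand-in data; AW is a hypothetical name for the response.
dat <- data.frame(
  HR = factor(sample(c("Central Coast", "North Coast", "Tulare Lake"), n, replace = TRUE)),
  Regional_ICA = exp(runif(n, 2, 9))
)
dat$Regional_ETc_Vol <- 2.5 * dat$Regional_ICA * runif(n, 0.8, 1.2)
dat$Regional_Ep_Vol  <- 0.3 * dat$Regional_ICA * runif(n, 0.5, 1.5)

# Central Coast as the reference level, so HR coefficients are contrasts with it.
dat$HR <- relevel(dat$HR, ref = "Central Coast")

# Simulate AW from a log-link Gamma model (ETc and Ep effects set to zero here).
region_effect <- c("Central Coast" = 0, "North Coast" = -0.01, "Tulare Lake" = 0.54)
mu     <- exp(0.75 + log(dat$Regional_ICA) + region_effect[as.character(dat$HR)])
dat$AW <- rgamma(n, shape = 1 / phi, scale = phi * mu)

fit <- glm(AW ~ Regional_ETc_Vol + Regional_Ep_Vol + log(Regional_ICA) + HR,
           family = Gamma(link = "log"), data = dat)
print(round(coef(fit), 3))
print(summary(fit)$dispersion)   # Pearson-based estimate of phi
```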

Model Validation Through Simulation

To check that the model can reliably estimate its parameters, a simulation study is used. This process involves:

  1. Using the coefficients from the fitted model as the "true" values
  2. Generating 500 synthetic datasets from those parameters
  3. Refitting the model to each dataset
  4. Checking whether the refitted models recover the true parameters
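A minimal sketch of the four steps, assuming a parametric simulation from the fitted Gamma model (responses drawn with mean $\hat\mu_i$ and dispersion $\hat\phi$). It uses a small synthetic dataset with a single stand-in covariate `x` rather than the vineyard data, and reports the 2.5% and 97.5% quantiles of the recovered estimates as one possible reading of the 95% interval; these choices are assumptions, not the original procedure.

```r
set.seed(7)
n    <- 300
phi0 <- 0.0234

# Small synthetic dataset standing in for the vineyard data.
dat <- data.frame(
  HR = factor(sample(c("Central Coast", "North Coast", "Tulare Lake"), n, replace = TRUE)),
  x  = runif(n, 2, 9)              # stand-in covariate (e.g., log of vineyard area)
)
eff    <- c("Central Coast" = 0, "North Coast" = -0.01, "Tulare Lake" = 0.54)
dat$AW <- rgamma(n, shape = 1 / phi0,
                 scale = phi0 * exp(0.75 + dat$x + eff[as.character(dat$HR)]))

fit <- glm(AW ~ x + HR, family = Gamma(link = "log"), data = dat)

# Step 1: treat the fitted coefficients, means, and dispersion as the truth.
beta_true <- coef(fit)
mu_hat    <- fitted(fit)
phi_hat   <- summary(fit)$dispersion

# Steps 2-3: simulate 500 responses from the fitted model and refit each one.
n_sim <- 500
recovered <- t(replicate(n_sim, {
  y_sim <- rgamma(n, shape = 1 / phi_hat, scale = phi_hat * mu_hat)
  coef(glm(y_sim ~ x + HR, family = Gamma(link = "log"), data = dat))
}))

# Step 4: compare the recovered estimates with the true values.
recovery <- data.frame(
  true      = beta_true,
  recovered = colMeans(recovered),
  lower     = apply(recovered, 2, quantile, probs = 0.025),
  upper     = apply(recovered, 2, quantile, probs = 0.975)
)
recovery$pct_error <- 100 * abs(recovery$true - recovery$recovered) / abs(recovery$true)
recovery$covered   <- recovery$true >= recovery$lower & recovery$true <= recovery$upper
print(recovery)
```

Because the synthetic responses come from the fitted model itself, this checks the estimation procedure rather than whether the model form matches the real data.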

Simulation Procedure

Simulation Results

| Parameter | True Value | Recovered | Difference (True - Recovered) | % Error |
|---|---:|---:|---:|---:|
| (Intercept) | 7.473e-01 | 7.427e-01 | 4.60e-03 | 0.62 |
| Regional_ETc_Vol | 9.758e-07 | 8.633e-07 | 1.12e-07 | 11.52 |
| Regional_Ep_Vol | -7.294e-06 | -8.424e-06 | 1.13e-06 | 15.49 |
| log(Regional_ICA) | 1.010e+00 | 1.010e+00 | 3.76e-04 | 0.04 |
| HRColorado River | 6.531e-01 | 6.622e-01 | -9.14e-03 | 1.40 |
| HRNorth Coast | -1.257e-02 | -5.219e-03 | -7.35e-03 | 58.49 |
| HRSacramento River | 2.045e-01 | 1.988e-01 | 5.74e-03 | 2.80 |
| HRSan Francisco Bay | 5.207e-02 | 3.966e-02 | 1.24e-02 | 23.84 |
| HRSan Joaquin River | 2.495e-01 | 2.610e-01 | -1.15e-02 | 4.62 |
| HRSouth Coast | 3.332e-01 | 3.452e-01 | -1.20e-02 | 3.59 |
| HRSouth Lahontan | 5.557e-01 | 5.573e-01 | -1.54e-03 | 0.28 |
| HRTulare Lake | 5.399e-01 | 5.557e-01 | -1.58e-02 | 2.93 |

Parameter Recovery Summary (Mean absolute % error: 10.47, Max: 58.49)

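Interpretation: For every parameter, the true value lay within the 95% confidence interval of the recovered estimate, which suggests that the model can reliably recover the parameter values. The largest percent errors (North Coast, 58.49%; San Francisco Bay, 23.84%) occur for coefficients close to zero, where small absolute differences produce large relative errors.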

Statistical Inference

Hypothesis Testing

Overall Test: Do Regions Differ?

Null Hypothesis ($H_0$):

$$\beta_4 = \beta_5 = \cdots = \beta_{11} = 0$$

All HR coefficients are zero, meaning there are NO differences in water use between hydrologic regions after controlling for climate factors and vineyard size.

Alternative Hypothesis ($H_1$):

$$\text{At least one } \beta_j \neq 0, \quad j \in \{4, \dots, 11\}$$

At least one region differs in water use, indicating that regional factors beyond climate (e.g., efficiency, technology, regulations) affect water use.

Likelihood Ratio Test

$H_0$ is tested by comparing the full Gamma model (with HR) to a reduced Gamma model (without HR):

| Model | Resid. Df | Resid. Dev | Df | Deviance | Pr(>Chi) |
|---|---:|---:|---:|---:|---|
| Null (without HR) | 1008 | 63.545 | - | - | - |
| Full (with HR) | 1000 | 23.979 | 8 | 39.566 | < 2.2e-16 *** |

Analysis of Deviance Table (Likelihood Ratio Test)

Results:

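  • Test statistic: deviance difference = 39.57 on 8 df; scaled by the estimated dispersion (0.0234), this gives $\chi^2 \approx 39.57 / 0.0234 \approx 1{,}690$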
  • P-value: < 2.2e-16
  • Conclusion: We reject the null hypothesis. There is evidence that the hydrologic regions affect the water applied by vineyards even after controlling for climate demand, precipitation, and vineyard size.
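A sketch of this comparison with `anova()` on synthetic data (the covariate `x`, the regions, and the coefficients are illustrative assumptions). With `test = "Chisq"`, R reports the unscaled deviance difference in the `Deviance` column but computes the p-value from that difference divided by the full model's dispersion estimate, which the last lines reproduce by hand.

```r
set.seed(3)
n   <- 400
phi <- 0.0234

# Synthetic data; x stands in for the climate and vineyard-area covariates.
dat <- data.frame(
  HR = factor(sample(c("Central Coast", "North Coast", "Tulare Lake"), n, replace = TRUE)),
  x  = runif(n, 2, 9)
)
eff    <- c("Central Coast" = 0, "North Coast" = -0.01, "Tulare Lake" = 0.54)
dat$AW <- rgamma(n, shape = 1 / phi,
                 scale = phi * exp(0.75 + dat$x + eff[as.character(dat$HR)]))

full    <- glm(AW ~ x + HR, family = Gamma(link = "log"), data = dat)
reduced <- glm(AW ~ x,      family = Gamma(link = "log"), data = dat)

# Likelihood ratio (analysis of deviance) test of the HR terms.
lrt <- anova(reduced, full, test = "Chisq")
print(lrt)

# The p-value uses the deviance difference divided by the full model's dispersion.
scaled_chisq <- lrt$Deviance[2] / summary(full)$dispersion
p_manual     <- pchisq(scaled_chisq, df = lrt$Df[2], lower.tail = FALSE)
```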

Individual Region Comparisons

Percent Change Calculation

$$\text{Percent Change} = (\exp(\beta_j) - 1) \times 100\%$$
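The formula can be applied directly to the regional estimates in the coefficient table. The sketch below also adds approximate 95% Wald intervals, computed as $\hat\beta_j \pm 1.96\,SE$ on the log scale and then transformed; these intervals are an illustrative addition and are not reported in the analysis above.

```r
# Regional coefficients and standard errors from the coefficient table
# (Central Coast is the reference level).
est <- c("Colorado River" = 0.6531, "North Coast" = -0.01257,
         "Sacramento River" = 0.2045, "San Francisco Bay" = 0.05207,
         "San Joaquin River" = 0.2495, "South Coast" = 0.3332,
         "South Lahontan" = 0.5557, "Tulare Lake" = 0.5399)
se  <- c(0.03313, 0.01857, 0.01645, 0.02117, 0.01660, 0.02572, 0.03475, 0.01755)

# Percent change = (exp(beta) - 1) * 100.
pct <- function(b) (exp(b) - 1) * 100

# Approximate 95% Wald intervals on the log scale, then transformed.
z <- qnorm(0.975)
regional <- data.frame(
  pct_change = pct(est),
  lower      = pct(est - z * se),
  upper      = pct(est + z * se)
)
print(round(regional[order(-regional$pct_change), ], 1))
```

Transforming the interval endpoints, rather than building a symmetric interval on the percent scale, keeps the intervals consistent with the log link.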

Visualizing Regional Coefficients

Regional Coefficient Estimates Relative to Central Coast

Of the eight regions compared with the Central Coast reference, only the North Coast did not differ significantly (p = 0.50). San Francisco Bay differed at the 0.05 level (p = 0.014), and the remaining six regions differed at p < 0.001.

Model Fit and Diagnostics

Predicted vs Observed Values

Model Validation Metric

$$R^2 = \text{cor}\left(y_i, \hat{y}_i\right)^2$$
Actual vs Predicted

The squared correlation between observed and predicted water use is 0.986, indicating that the model's predictions track observed values closely.
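A minimal sketch of this metric on synthetic data (one stand-in covariate `x`; all values are illustrative). Because the correlation is computed on the original acre-foot scale, it is driven largely by the largest observations, so a high value mainly reflects how well the biggest volumes are predicted.

```r
set.seed(11)
n   <- 200
phi <- 0.0234
x   <- runif(n, 2, 9)                       # stand-in for log(vineyard area)
y   <- rgamma(n, shape = 1 / phi, scale = phi * exp(0.75 + x))
fit <- glm(y ~ x, family = Gamma(link = "log"))

# R^2 = squared correlation of observed and predicted values (acre-foot scale).
r2 <- cor(y, fitted(fit))^2
print(round(r2, 3))
```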

Discussion

Key Findings

This analysis provides evidence that hydrologic regions in California differ substantially in vineyard water use efficiency even after accounting for climate-driven water demand, effective precipitation, and vineyard size.

Main Results

  1. Regional effects are statistically significant and large
    • Likelihood ratio test: deviance difference = 39.6 on 8 df (dispersion-scaled $\chi^2 \approx 1{,}690$), p < 2.2e-16
    • Estimated regional effects ranged from -1% (North Coast) to +92% (Colorado River) relative to the Central Coast
  2. Three regions show substantially higher expected water use than the Central Coast
    • Colorado River: 92% higher
    • South Lahontan: 74% higher
    • Tulare Lake: 72% higher
  3. Model validation supports reliability
    • Simulation showed that the true parameters were recovered
    • High $R^2$ (0.986): the model accounts for most of the variation in observed water use

Future Directions

Further investigation should identify what drives the variation in water use between regions. The results imply that factors beyond crop water need influence water use in some regions. Future work should examine whether this variation reflects management decisions or environmental factors not considered in this study.