Statistical analyses
Our first analysis calculated the average difference between the summer temperatures experienced by each ethnoracial group and the county average. To estimate the mean difference for each ethnoracial group, we used fixed-effects regressions. To account for spatial variation in temperature and ethnoracial composition, we used three strategies. First, regressions were stratified by census region; these regions closely align with NOAA climate regions. Second, we included fixed effects for county. Third, we used Conley variance–covariance calculations, based on the population-weighted centroids of tracts, to construct standard errors that account for spatial autocorrelation in the data. We also included a fixed effect for year and weighted our regression models by the total population of the tract. These statistical procedures were conducted using the fixest package in R (30). Regression models with categorical variables such as ethnoracial group often use traditional dummy coding with a single referent group, typically white people. This is problematic because it makes the referent group's result invisible and treats one group's experience as the standard, norm, or aspiration, depending on context (31). To avoid this, we implemented weighted effect coding, which functionally weights each category so that its coefficient represents a deviation from the sample mean, in this case the county-averaged temperature (32).
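As an illustration of weighted effect coding only, the following minimal Python sketch builds the coded columns for a toy dataset and fits a population-weighted regression by solving the normal equations directly. It is not the implementation used in this study, which relied on fixest in R with county and year fixed effects and Conley standard errors. The group labels, temperatures, weights, and the function names `weighted_effect_columns` and `weighted_least_squares` are hypothetical and chosen for the example.

```python
from collections import defaultdict

def weighted_effect_columns(groups, weights, reference):
    """Build weighted effect-coded columns, one per non-reference level.

    An observation in level k gets 1 in column k; an observation in the
    reference level gets -(weight total of k) / (weight total of reference);
    every other observation gets 0.
    """
    totals = defaultdict(float)
    for g, w in zip(groups, weights):
        totals[g] += w
    levels = [g for g in sorted(totals) if g != reference]
    rows = []
    for g in groups:
        row = []
        for lev in levels:
            if g == lev:
                row.append(1.0)
            elif g == reference:
                row.append(-totals[lev] / totals[reference])
            else:
                row.append(0.0)
        rows.append(row)
    return levels, rows

def weighted_least_squares(X, y, w):
    """Solve the normal equations (X'WX) b = X'Wy by Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X'WX | X'Wy]
    A = [[sum(wi * xi[r] * xi[c] for xi, wi in zip(X, w)) for c in range(p)]
         + [sum(wi * xi[r] * yi for xi, yi, wi in zip(X, y, w))]
         for r in range(p)]
    for col in range(p):
        # Partial pivoting for numerical stability
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[r][p] / A[r][r] for r in range(p)]

# Hypothetical toy data: group label, temperature (deg C), population weight.
groups = ["A", "A", "B", "B", "C", "C"]
temps = [30.0, 32.0, 33.0, 35.0, 28.0, 30.0]
weights = [100.0, 300.0, 50.0, 50.0, 200.0, 300.0]

levels, coded = weighted_effect_columns(groups, weights, reference="C")
X = [[1.0] + row for row in coded]  # prepend an intercept column
beta = weighted_least_squares(X, temps, weights)

# Weight totals per group, needed to recover the omitted level's deviation.
group_totals = {g: sum(w for gg, w in zip(groups, weights) if gg == g)
                for g in set(groups)}
deviations = dict(zip(levels, beta[1:]))
# The reference level's deviation follows from the constraint that the
# weighted deviations sum to zero, so no group's result is left invisible.
deviations["C"] = -sum(group_totals[g] * deviations[g] for g in levels) / group_totals["C"]

print("weighted grand mean:", beta[0])
print("deviations from the weighted grand mean:", deviations)
```

In this toy setting the intercept is the population-weighted grand mean and each coefficient is a group's deviation from it, which is the role the county-averaged temperature plays in the models described above.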
Our second analysis associated our measure of residential segregation with local air temperatures. Associating tract-level temperature with the segregation index required a flexible regression framework to accommodate nonlinearities, so we used generalized additive models with smoothing splines. The segregation measure was modeled with a natural cubic spline with three knots. We included fixed effects for county and year and accounted for spatiotemporal dependence by modeling a tensor product smooth of the geocoordinates of population-weighted centroids by year. These regressions were implemented with the mgcv package in R, specifically using the bam function for computational efficiency in large datasets (33).
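One way to write down the structure described in this paragraph is sketched below. The notation is an assumption introduced here for illustration and is not taken from the study: $T_{ict}$ denotes the temperature of tract $i$ in county $c$ and year $t$, $S_{it}$ the segregation index, $f$ the natural cubic spline with three knots, $\alpha_c$ and $\gamma_t$ the county and year fixed effects, and $g$ the tensor product smooth over the longitude and latitude of the population-weighted centroid and year.

```latex
T_{ict} = f(S_{it}) + \alpha_c + \gamma_t
          + g(\mathrm{lon}_i, \mathrm{lat}_i, t) + \varepsilon_{ict}
```

Whether year enters $g$ as a third margin of the smooth or as a by-variable that yields a separate spatial surface per year is not specified in the text, so the form of $g$ should be read as a placeholder.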