f. Validation and bias correction of the datasets
Hourly WRF output driven by ERA5 for 2011 to 2020 (hereafter
ERA5-driven) and hourly WRF output driven by CMIP6 under the SSP2-4.5
scenario for the same period (hereafter CMIP6-driven) were used to
correct the systematic biases in the 2 m temperature. The bias
correction was based on SSP2-4.5 because it was assumed to be the
pathway closest to reality for the 2010s.
Reanalysis data can reflect past weather and climate information and
fill in the gaps associated with observation records and satellite data.
Because they are spatially complete and consistent in time with
observation records, reanalysis data can be effectively used for bias
correction. The ERA5-driven and CMIP6-driven datasets were produced with
the same WRF configuration and physics parameterization for consistency.
The correction value and the corrected dataset were determined using the
following two equations, respectively. Fig. S2 (b) shows the spatial map
of the mean correction value: the CMIP6-driven dataset tends to
overpredict the 2 m temperature in inland areas and to underpredict it
in coastal areas.
\begin{equation}
\text{correction value} = \text{ERA5-driven dataset} - \text{CMIP6-driven dataset} \nonumber
\end{equation}
\begin{equation}
\text{corrected CMIP6-driven dataset} = \text{CMIP6-driven dataset} + \text{correction value} \nonumber
\end{equation}
The performance of the bias correction was evaluated using 2 m
temperature observation data collected in the PRD between 2016 and 2020.
This time frame was chosen because the station records were more
complete than in other periods, which provided a higher-quality
temperature dataset. All three datasets were organized at hourly
frequency to achieve the best bias correction and evaluation
performance. We selected 118 observation stations, each with less than
10% missing values. See Fig. S3 in Supporting Information for the
spatial distribution of these 118 observation stations.
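The station screening described above can be illustrated with a minimal sketch. It assumes that each station record is a plain list of hourly values in which missing hours are stored as `None` or NaN; the function names, the station IDs, and the toy values are hypothetical and are not taken from the study.

```python
import math


def missing_fraction(series):
    """Fraction of hourly values that are missing (None or NaN)."""
    if not series:
        return 1.0  # an empty record is treated as fully missing
    missing = sum(
        1 for v in series
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(series)


def select_stations(records, max_missing=0.10):
    """Keep station IDs whose missing fraction is strictly below max_missing."""
    return [
        sid for sid, series in records.items()
        if missing_fraction(series) < max_missing
    ]


# Hypothetical toy records: station ID -> hourly 2 m temperature (deg C).
records = {
    "A": [25.1, 25.4, None, 26.0, 26.3, 26.1, 25.8, 25.2, 24.9, 24.7, 24.5, 24.4],
    "B": [24.0, None, None, 25.1, 25.0, 24.8, 24.6, 24.3, 24.1, 24.0, 23.9, 23.8],
    "C": [26.2, 26.5, 26.9, 27.3, 27.1, 26.8, 26.4, 26.0, 25.7, 25.5, 25.3, 25.2],
}
selected = select_stations(records)
```

In this toy case station B has 2 of 12 hours missing and is dropped, whereas stations A and C are retained.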
The following statistical metrics were calculated to evaluate the bias
correction performance: mean bias (MB), mean absolute error (MAE), root
mean square error (RMSE), and index of agreement (IOA), which reflect
the bias and absolute error between the corrected CMIP6 dataset and
observations, standard deviation of the residuals, and accuracy of the
corrected CMIP6 dataset, respectively. The metrics were calculated using
the following equations.
\begin{equation}
\mathrm{MAE} = \frac{1}{\mathrm{n}}\sum_{1}^{\mathrm{n}}\left|\mathrm{C}_{\mathrm{m}} - \mathrm{C}_{\mathrm{o}}\right| \nonumber
\end{equation}
\begin{equation}
\mathrm{MB} = \frac{1}{\mathrm{n}}\sum_{1}^{\mathrm{n}}\left(\mathrm{C}_{\mathrm{m}} - \mathrm{C}_{\mathrm{o}}\right) \nonumber
\end{equation}
\begin{equation}
\mathrm{RMSE} = \sqrt{\frac{1}{\mathrm{n}}\sum_{1}^{\mathrm{n}}\left(\mathrm{C}_{\mathrm{m}} - \mathrm{C}_{\mathrm{o}}\right)^{2}} \nonumber
\end{equation}
\begin{equation}
\mathrm{IOA} = 1 - \frac{\sum_{1}^{\mathrm{n}}\left(\mathrm{C}_{\mathrm{m}} - \mathrm{C}_{\mathrm{o}}\right)^{2}}{\sum_{1}^{\mathrm{n}}\left(\left|\mathrm{C}_{\mathrm{m}} - \overline{\mathrm{C}_{\mathrm{o}}}\right| + \left|\mathrm{C}_{\mathrm{o}} - \overline{\mathrm{C}_{\mathrm{o}}}\right|\right)^{2}} \nonumber
\end{equation}
where \(\mathrm{C}_{\mathrm{m}}\) is the simulated value in the grid
cell closest to the station, \(\mathrm{C}_{\mathrm{o}}\) is the value
observed at the station, \(\mathrm{n}\) is the number of observation
stations, and \(\overline{\mathrm{C}_{\mathrm{o}}}\) is the mean value of
the temperature at observation stations.
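As an illustration of the four metrics defined above, the following minimal sketch computes MB, MAE, RMSE, and IOA for paired simulated and observed values. The function name `evaluate` and the toy temperatures are hypothetical; the pairing of each station with its nearest grid cell is assumed to have been done beforehand.

```python
import math


def evaluate(sim, obs):
    """Return MB, MAE, RMSE and IOA for paired simulated/observed values."""
    if len(sim) != len(obs) or not sim:
        raise ValueError("sim and obs must be non-empty and of equal length")
    n = len(sim)
    diffs = [m - o for m, o in zip(sim, obs)]  # C_m - C_o for each pair
    obs_mean = sum(obs) / n                    # mean observed value
    sq_err = sum(d * d for d in diffs)

    mb = sum(diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sq_err / n)
    # IOA denominator: sum of (|C_m - mean(C_o)| + |C_o - mean(C_o)|)^2
    denom = sum(
        (abs(m - obs_mean) + abs(o - obs_mean)) ** 2
        for m, o in zip(sim, obs)
    )
    ioa = (1.0 - sq_err / denom) if denom > 0 else float("nan")
    return {"MB": mb, "MAE": mae, "RMSE": rmse, "IOA": ioa}


# Hypothetical toy values (deg C) at four stations.
sim = [26.0, 27.5, 25.0, 28.0]
obs = [25.0, 27.0, 25.5, 27.5]
scores = evaluate(sim, obs)
```

A positive MB indicates overestimation by the simulation, and an IOA of 1 indicates perfect agreement with the observations.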
The validation results indicate that the ERA5-driven simulation tended to
overestimate the 2 m temperature by 0.6 °C compared with the
observations. The CMIP6-driven dataset overestimated the corresponding
values more than the ERA5-driven dataset. Compared with the ERA5-driven
dataset, the CMIP6-driven dataset exhibited a larger RMSE and a smaller
IOA, indicating inferior performance (see Table S2 in Supporting
Information). However, the bias was effectively reduced after
correction, with the MB and RMSE decreasing and the IOA increasing. To
precisely compare the simulations and observations, the difference
associated with each observation station in the two datasets was
evaluated. The CMIP6-driven simulations overpredicted the 2 m
temperature at most observation stations, and this bias was effectively
reduced after bias correction. The more accurate results obtained after
bias correction provided a more solid basis for further analysis (see
Figs. S5–S11 in Supporting Information).
Because the validation period overlaps the correction period, we
also used the correction value from 2011 to 2015 to correct the original
CMIP6-driven dataset from 2016 to 2020 and compared the performance of
this five-year correction value with that of the ten-year correction
value. The results (Figs. S2–S11) indicate that the difference between
the two sets of correction values is marginal, with slightly better
performance observed for the ten-year correction value. Therefore, to
achieve better performance for current and future projections, the
results that follow use the ten-year correction value obtained from the
2011–2020 period.
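The split-sample check described above can be sketched as follows. The sketch assumes that the correction value is the time mean of the ERA5-driven minus CMIP6-driven difference in each grid cell, which is one reading of the "mean correction value" mapped in Fig. S2 (b) and is not confirmed by the text. The function names, the two-cell grid, and the synthetic biases are hypothetical.

```python
def mean_correction(era5, cmip6, years):
    """Per-grid-cell time mean of (ERA5-driven - CMIP6-driven) over the given years."""
    n_cells = len(era5[years[0]][0])
    totals = [0.0] * n_cells
    count = 0
    for year in years:
        for e_step, c_step in zip(era5[year], cmip6[year]):
            for k in range(n_cells):
                totals[k] += e_step[k] - c_step[k]
            count += 1
    return [t / count for t in totals]


def apply_correction(cmip6, years, correction):
    """Add the per-grid-cell correction value to every CMIP6-driven time step."""
    return {
        y: [[v + c for v, c in zip(step, correction)] for step in cmip6[y]]
        for y in years
    }


# Synthetic data: year -> list of time steps, each a list of two grid cells.
years_all = list(range(2011, 2021))
era5, cmip6 = {}, {}
for y in years_all:
    t = y - 2011
    era5[y] = [[25.0 + 0.1 * t, 24.0], [26.0 + 0.1 * t, 25.0]]
    # Assumed bias: warm in the inland cell (0), cold in the coastal cell (1).
    cmip6[y] = [[v0 + 1.0 + 0.02 * t, v1 - 0.5] for v0, v1 in era5[y]]

corr_5yr = mean_correction(era5, cmip6, list(range(2011, 2016)))
corr_10yr = mean_correction(era5, cmip6, years_all)

# Correct the 2016-2020 period with each correction value for comparison.
target = list(range(2016, 2021))
corrected_5yr = apply_correction(cmip6, target, corr_5yr)
corrected_10yr = apply_correction(cmip6, target, corr_10yr)
```

In this synthetic case the two correction values differ only slightly in the inland cell, because the assumed bias drifts slowly with time, and the two corrected datasets could then be scored against observations to compare the five-year and ten-year correction values.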