4.3 Selection of Inversion Parameters to Prevent Overfitting/Underfitting of Data
Data should generally be weighted based on their measurement (reciprocal or stacking) uncertainty (Sections 3.2 and 3.3) to prevent overfitting or underfitting of the data. In other words, the inversion should generally match the data within the uncertainties (sometimes called errors) quantified by the reciprocal or stacking errors. However, there are often systematic sources of error, in the data or in the model, that are not manifested in reciprocal or stacking errors. For example, systematic data errors can arise from electrode position errors or from temporal variation in subsurface electrical conductivity over the course of the survey. Systematic model errors include coarse-grid error (i.e., the inability of the model to simulate the effects of sub-grid heterogeneity) and violation of the 2-D heterogeneity assumption in 2-D inversions, as noted above. We usually cannot quantify these errors and therefore cannot account for them via data weighting; left unaccounted for, they can produce inversion artifacts. Because their sources and magnitudes are often unknown, the user must make a subjective judgment about how to approach error weighting.
One common approach to estimating data error is given by Equation 11.
[latex]\displaystyle s_{i}=a_{i}\left ( \mathrm{abs}(R_{i}) \right )+b[/latex] | (11) |
where:
si | = | standard deviation of resistance measurement i |
ai | = | unitless scaling factor giving the error in terms of the magnitude of the measured resistance Ri |
b | = | small resistance indicating the precision of the voltmeter |
As written, Equation 11 allows for a different value ai for each resistance measurement i, where the a values initially could be based on the reciprocal or stacking error expressed as a decimal fraction. These values can be increased until the inversion converges to a reasonable conductivity structure, at which point additional sources of error would presumably be accounted for. This approach assumes that stacking or reciprocal errors are proportional to the total error. Alternatively, a single a is often used to establish representative weights for all resistance measurements in cases where individual error weights are too low. In this case, it is common to first calculate average reciprocal errors for a series of bins that divide the reciprocals in terms of increasing resistance, and then fit a line to these binned averages (Figure 11). The reciprocal error for any resistance value is then predicted from the equation of that line.
One way to test whether reciprocal and stacking errors adequately represent the true data error is to first invert the data under the hypothesis that reciprocal/stacking error adequately quantifies all sources of error. If the inversion (1) does not converge (i.e., cannot fit the data) in a reasonable number of iterations, (2) results in an overly heterogeneous electrical conductivity structure, or (3) produces unrealistic electrical conductivity values, then it is likely that the reciprocal and stacking errors do not adequately capture the true error. In this case, the user must increase the assumed error to produce smoother or more uniform tomograms. Use of the correct errors will result in convergence within a reasonable number of iterations, typically 3 to 10, although this depends strongly on the inversion algorithm.
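For illustration, the following Python sketch shows one way to fit the error model of Equation 11 by binning reciprocal errors against resistance magnitude and fitting a line to the binned averages. It assumes arrays of measured resistances and corresponding reciprocal errors are available; the function name, binning scheme, and bin count are illustrative rather than prescribed by any particular inversion package.

```python
import numpy as np

def fit_error_model(resistance, reciprocal_error, n_bins=20):
    """Fit the error model of Equation 11, s = a*|R| + b, by binning
    reciprocal errors against resistance magnitude (cf. Figure 11)."""
    r = np.abs(np.asarray(resistance))
    err = np.abs(np.asarray(reciprocal_error))
    # Sort measurements by |R| and split them into bins of roughly
    # equal count, then average |R| and the error within each bin.
    order = np.argsort(r)
    bins = np.array_split(order, n_bins)
    bin_r = np.array([r[idx].mean() for idx in bins])
    bin_err = np.array([err[idx].mean() for idx in bins])
    # Least-squares line through the binned points: slope a, intercept b.
    a, b = np.polyfit(bin_r, bin_err, deg=1)
    return a, b

# Usage (illustrative): the fitted line predicts a standard deviation
# for every measurement, which then supplies inverse-variance weights.
# a, b = fit_error_model(R, recip_err)
# s = a * np.abs(R) + b      # Equation 11
# weights = 1.0 / s**2       # diagonal of the inverse data covariance
```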
There are several approaches commonly used to set the relative weighting between the model roughness and data misfit terms in the inversion (Equation 9a and b), i.e., the value of ε. This issue is not trivial, as the tradeoff between the terms of the objective function controls the variability of estimated electrical conductivity and, ultimately, whether the information in the data is optimally utilized. If too much weight is ascribed to the model roughness term, underfitting occurs: the inversion does not capitalize on all the information provided by the data, resulting in an overly smooth tomogram. On the other hand, if too much weight is given to the data misfit term, overfitting results: the data are fit so well that the inversion reproduces noise, resulting in an overly complex tomogram with spurious structure and possibly unrealistic electrical conductivity values. While such models are mathematically viable and may even fit the data better than other models, they are geologically unrealistic; this is where the art of inversion and prior knowledge of the system become important.
The tradeoff parameter, ε, is analogous to a contrast knob: if set incorrectly, the resulting image is washed out at one extreme or noisy at the other. The simplest approach to identifying ε is subjective selection by the user, such that the resulting tomogram is qualitatively consistent with existing knowledge of the range of subsurface electrical conductivity and geologic structure. More objective approaches include Occam’s inversion (Constable et al., 1987), the L-curve (Hansen and O’Leary, 1993), and generalized cross-validation (GCV) (e.g., Haber and Oldenburg, 2000; Farquharson and Oldenburg, 2004). In general, the three techniques produce similar results for most datasets. These three approaches are:
- In Occam’s inversion, ε is determined as part of the optimization to achieve the smoothest model that matches the data to the desired misfit.
- In the L-curve approach, the inversion is repeated for a number of values of ε, and a plot of model complexity versus data misfit (i.e., the second term of Equation 9a and b versus the first) is constructed. The optimal ε is taken as the value at the elbow of the resulting L-shaped curve (see the sketch after this list).
- In GCV, the tradeoff parameter is identified based on a procedure analogous to inverting the dataset repeatedly, leaving one measurement out at a time, and finding the value for ε that minimizes the average prediction error for the data eliminated.
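As a concrete illustration of the L-curve approach, the Python sketch below assumes a user-supplied run_inversion(eps) function (hypothetical; it stands in for whatever inversion code is in use) that returns the data misfit and model roughness terms of Equation 9 for a given ε. The elbow is located with a common maximum-curvature heuristic; other elbow-finding rules are equally valid.

```python
import numpy as np

def l_curve_select(run_inversion, epsilons):
    """Select the tradeoff parameter epsilon with the L-curve approach."""
    misfit, roughness = [], []
    for eps in epsilons:
        # run_inversion is assumed to return the two objective-function
        # terms of Equation 9 for a given tradeoff parameter.
        phi_d, phi_m = run_inversion(eps)
        misfit.append(phi_d)
        roughness.append(phi_m)
    # Work on log axes, where the characteristic L shape appears.
    x, y = np.log10(misfit), np.log10(roughness)
    # Approximate the curvature along the curve; the elbow is taken
    # as the point of maximum curvature (one common heuristic).
    dx, dy = np.gradient(x), np.gradient(y)
    d2x, d2y = np.gradient(dx), np.gradient(dy)
    curvature = np.abs(dx * d2y - dy * d2x) / (dx**2 + dy**2) ** 1.5
    return epsilons[int(np.argmax(curvature))]
```

The candidate values of ε are usually spaced logarithmically (e.g., np.logspace(-3, 2, 20)), since the elbow can span orders of magnitude.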
An inversion converges when the data misfit is reduced to some set value. Criteria for computing the data misfit are commonly based on the absolute weighted error (AWE), sometimes called the normalized error, or on the percent error (PE), which for measurement i are given by Equations 12a and 12b, respectively.
[latex]\displaystyle \textit{AWE}_{i}=\frac{\left ( d_{obs,i}-d_{sim,i} \right )}{\sqrt{C_{d,i}}}[/latex] | (12a) |
[latex]\displaystyle \textit{PE}_{i}=\frac{100\textrm{%}\times \left ( d_{obs,i}-d_{sim,i} \right )}{d_{obs,i}}[/latex] | (12b) |
where:
dobs,i and dsim,i | = | observed and simulated values of datum i |
Cd,i | = | variance of datum i as determined by the data noise estimate |
The AWE and PE values are used to compute the normalized chi-squared (χ2) value and the root-mean-squared error (RMS) value defined by Equations 13a and 13b, respectively.
[latex]\displaystyle \chi ^{2}=\frac{1}{N}\sum_{i=1}^{N}\mathit{AWE}{_{i}}^{2}[/latex] | (13a) |
[latex]\displaystyle RMS=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\mathit{PE}{_{i}}^{2}}[/latex] | (13b) |
where:
N | = | number of data
The normalized χ2 value is a linear scaling of the first term of the objective function (Equation 9a and b). It is a useful measure of data misfit because it gives a direct indication of what the inversion is trying to minimize (in addition to the regularization term) and includes the covariance of the data (the error weights) directly. When the data misfit in the numerator of the AWE is consistent with the data error estimate in the denominator of the AWE, then χ2 = 1. Assuming our data are appropriately weighted, χ2 = 1 is the target value at convergence; that is, we would ideally fit our observed data with our simulated data in a manner consistent with the uncertainty in the measurements. The RMS value defined in Equation 13b is equivalent to the standard deviation of the PE distribution and therefore provides an intuitive measure of the total data misfit in terms of percent error, with no covariance term. Also, in contrast to the χ2 value, the RMS value is independent of data weighting. Consequently, it is possible to have a χ2 close to 1 but a very large RMS error if the covariances are large.
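The statistics in Equations 12 and 13 are straightforward to compute. The short Python sketch below does so given observed data, simulated data, and the estimated standard deviations s (so that the data variance Cd = s²); the variable names are illustrative.

```python
import numpy as np

def misfit_statistics(d_obs, d_sim, s):
    """Convergence statistics of Equations 12 and 13, given observed data
    d_obs, simulated data d_sim, and standard deviations s (C_d = s**2)."""
    awe = (d_obs - d_sim) / s                  # Equation 12a
    pe = 100.0 * (d_obs - d_sim) / d_obs       # Equation 12b
    chi2 = np.mean(awe**2)                     # Equation 13a: normalized chi-squared
    rms = np.sqrt(np.mean(pe**2))              # Equation 13b: RMS percent error
    return chi2, rms
```

With appropriate weighting, χ2 should approach 1 at convergence, while the RMS reports the same misfit as a percentage, independent of the weights.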
Some software packages assume the inversion has converged when the data misfits are within the limits specified by the data error, as outlined above. Other packages support Occam’s inversion or use of the L-curve approach. Yet other packages leave it to the user to decide when the inversion has converged, placing the burden of balancing the tradeoff between the model and data misfit on the scientist’s subjective judgment. In any case, all selections of inversion parameters should be recorded and reported. Some examples of under- and overfitting are provided in the case studies in Section 5.