Here is an overview of the way I look at the statistical analysis problem.
Material in the slides from the presentations "Introduction to Statistical Methods for Understanding Prediction Uncertainty in Simulation Models" and "Sensitivity Analysis When Model Outputs Are Functions" is drawn from the papers listed in the methodology section below.
Input values are a source of uncertainty for model predictions. When input uncertainty is characterized by a probability distribution, prediction uncertainty is characterized by the induced prediction distribution. Comparison of a model predictor based on a subset of model inputs to the full model predictor leads to a natural decomposition of the prediction variance and the correlation ratio as a measure of importance. Because the variance decomposition does not depend on assumptions about the form of the relation between inputs and output, the analysis can be called nonparametric. Variance components can be estimated through designed computer experiments.
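The decomposition referred to above is the law of total variance, Var(Y) = Var(E[Y | X_s]) + E[Var(Y | X_s)], and the correlation ratio for a subset X_s of inputs is Var(E[Y | X_s]) / Var(Y). The following is a minimal sketch of how that ratio could be estimated by a plain double-loop Monte Carlo. The model toy_model, the uniform input distributions, and the sample sizes are all illustrative assumptions and not taken from the papers; a designed experiment would normally replace the double loop.

```python
import math
import random


def toy_model(x):
    """Hypothetical model: strongly nonlinear in x[0], weakly linear in x[1]."""
    return math.sin(2.0 * math.pi * x[0]) + 0.5 * x[1]


def variance(values):
    """Population variance (divide by n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def correlation_ratio(model, index, n_inputs, n_outer=200, n_inner=200, seed=0):
    """Estimate Var(E[Y | X_index]) / Var(Y), inputs assumed independent Uniform(0, 1)."""
    rng = random.Random(seed)
    conditional_means = []
    all_outputs = []
    for _ in range(n_outer):
        fixed = rng.random()                  # one value of the input of interest
        outputs = []
        for _ in range(n_inner):
            x = [rng.random() for _ in range(n_inputs)]
            x[index] = fixed                  # hold it fixed, vary the other inputs
            outputs.append(model(x))
        conditional_means.append(sum(outputs) / n_inner)
        all_outputs.extend(outputs)
    # Between-value variance of the conditional means over the total variance.
    return variance(conditional_means) / variance(all_outputs)


eta_x0 = correlation_ratio(toy_model, 0, 2)
eta_x1 = correlation_ratio(toy_model, 1, 2)
print(f"correlation ratio for x0: {eta_x0:.2f}, for x1: {eta_x1:.2f}")
```

No linear form is assumed anywhere in the estimate, which is the sense in which the analysis is nonparametric. The price is the number of model runs: n_outer times n_inner for each input examined.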
When outputs of computational models are time series or functions of other continuous variables such as distance or angle, primary interest may lie in the general pattern or structure of the curve. In these cases, model sensitivity and uncertainty analysis focuses on the effect of model input choices and uncertainties on the overall shapes of such curves. We explore methods for characterizing a set of functions generated by a series of model runs for the purpose of exploring relationships between these functions and the model inputs.
Technical research report: "Evaluating Prediction Uncertainty" by M.D. McKay in (1995) Technical Report NUREG/CR-6311, US Nuclear Regulatory Commission and Los Alamos National Laboratory.
The probability distribution of a model prediction is presented as a proper basis for evaluating the uncertainty in a model prediction that arises from uncertainty in input values. Determination of important model inputs and subsets of inputs is made through comparison of the prediction distribution with conditional prediction probability distributions. Replicated Latin hypercube sampling and variance ratios are used in estimation of the distributions and in construction of importance indicators. The assumption of a linear relation between model output and inputs is not necessary for the indicators to be effective. A sequential methodology which includes an independent validation step is applied in two analysis applications to select subsets of input variables which are the dominant causes of uncertainty in the model predictions. Comparison with results from methods which assume linearity shows how those methods may fail. Finally, suggestions for treating structural uncertainty for submodels are presented.
This paper examines the feasibility and value of using nonparametric variance-based methods to supplement parametric regression methods for uncertainty analysis of computer models. It shows from theoretical considerations how the usual linear regression methods are a particular case within the general framework of variance-based methods. Strengths and weaknesses of the methods are demonstrated analytically and numerically in an example. The paper shows that relaxation of linearity assumptions in nonparametric variance-based methods comes at the cost of additional computer runs.
Sample sizes affect identification of important inputs for computer models. For illustrative purposes, a partial differential equations model with 84 input variables is used to investigate the behavior of R² as an importance indicator for various sample sizes and designs.
Latin hypercube sampling (LHS): "A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code" by M.D. McKay, W.J. Conover and R.J. Beckman in (1979) Technometrics, 21, 239-245 and again in (2000) Technometrics, 42, 55-61.
Two types of sampling plans are examined as alternatives to simple random sampling in Monte Carlo studies. These plans are shown to be improvements over simple random sampling with respect to variance for a class of estimators which includes the sample mean and the empirical distribution function.
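As an illustration of one of these plans, here is a minimal sketch of Latin hypercube sampling on the unit hypercube, together with a small comparison against simple random sampling for the sample mean. The function names, the two-input toy_model, and the sample sizes are assumptions made for the example; inputs are taken to be independent Uniform(0, 1), and other marginals would be obtained by applying inverse distribution functions to each column.

```python
import random


def latin_hypercube(n_runs, n_inputs, rng):
    """Return n_runs points in [0, 1)^n_inputs with one point per stratum in every input."""
    columns = []
    for _ in range(n_inputs):
        # One uniform draw inside each of n_runs equal-probability intervals.
        column = [(i + rng.random()) / n_runs for i in range(n_runs)]
        rng.shuffle(column)              # random pairing of strata across inputs
        columns.append(column)
    return [list(run) for run in zip(*columns)]


def simple_random(n_runs, n_inputs, rng):
    """Simple random sampling, for comparison."""
    return [[rng.random() for _ in range(n_inputs)] for _ in range(n_runs)]


def toy_model(x):
    """Hypothetical output, monotone in each input."""
    return x[0] + x[1] ** 2


def variance_of_sample_mean(plan, n_runs=16, n_repeats=300, seed=1):
    """Empirical variance of the sample-mean estimator over repeated designs."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_repeats):
        design = plan(n_runs, 2, rng)
        means.append(sum(toy_model(x) for x in design) / n_runs)
    centre = sum(means) / n_repeats
    return sum((m - centre) ** 2 for m in means) / n_repeats


print("LHS:", variance_of_sample_mean(latin_hypercube))
print("SRS:", variance_of_sample_mean(simple_random))
```

Each input is stratified into n_runs intervals and every interval is used exactly once, so the full range of every input is covered regardless of how many inputs there are.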
Effects of assumed distributions: "Monte Carlo Estimation Under Different Distributions Using the Same Sample" by R.J. Beckman and M.D. McKay in (1987) Technometrics, 29, 153-160.
Two methods are presented for reducing the computer time necessary to investigate changes to the probability distribution of random inputs to large simulation computer codes. The first method produces unbiased estimators of functions of the output variable under the new distribution of inputs. The second method generates a subset of the original outputs that has a distribution corresponding to the new distribution of inputs. Efficiencies of the two methods are examined.
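The general idea of reusing one set of model runs under a changed input distribution can be sketched as follows. This is an illustration based on the description above, using density-ratio weights for the first method and acceptance-rejection for the second; whether these match the paper's exact constructions is an assumption. The model, the old density (uniform), and the new density (2x on the unit interval) are hypothetical choices.

```python
import random


def model(x):
    """Hypothetical stand-in for an expensive simulation code."""
    return x * x


def old_density(x):
    """Inputs were originally sampled from Uniform(0, 1)."""
    return 1.0


def new_density(x):
    """Assumed new input distribution: density 2x on (0, 1)."""
    return 2.0 * x


rng = random.Random(0)
inputs = [rng.random() for _ in range(20000)]
outputs = [model(x) for x in inputs]          # the only model runs performed

# Method 1 (reweighting): weight each old run by the ratio of new to old density.
weights = [new_density(x) / old_density(x) for x in inputs]
weighted_mean = sum(w * y for w, y in zip(weights, outputs)) / len(outputs)

# Method 2 (subsetting): keep each run with probability weight / max_weight,
# so the retained runs behave like a sample from the new input distribution.
max_weight = 2.0                              # bound on new_density / old_density
kept = [y for w, y in zip(weights, outputs) if rng.random() < w / max_weight]
subset_mean = sum(kept) / len(kept)

print(f"reweighted mean: {weighted_mean:.3f}, subset mean: {subset_mean:.3f}")
print(f"runs retained by subsetting: {len(kept)} of {len(outputs)}")
```

For this toy case the mean of the output under the new distribution is 1/2, so both estimates can be checked by hand. Neither method requires rerunning the model, but both lose efficiency as the new distribution moves away from the one used for sampling.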
The importance of individual inputs of a computer model is sometimes assessed using indices that reflect the amount of output variation that can be attributed to random variation in each input. We review two such indices and consider input sampling plans that support estimation of one of them, the Variance of Conditional Expectation or VCE (McKay 1995). Sampling plans suggested by Sobol’, Saltelli, and McKay are examined and compared to a new sampling plan based on Balanced Incomplete Block Designs. The new design offers better sampling efficiency for the VCE than those of Sobol’ and Saltelli, and supports unbiased estimation of the index associated with each input.
We consider a class of input sampling plans, called permuted column sampling plans, that are popular in sensitivity analysis of computer models. Permuted column plans, including replicated Latin hypercube sampling (McKay 1995), support estimation of first-order sensitivity coefficients (e.g., Saltelli et al. 2000), but these estimates are biased when the usual practice of random column permutation is used to construct the sampling arrays. Deterministic column permutations may be used to eliminate this estimation bias. We prove that any permuted column sampling plan that eliminates estimation bias, using the smallest possible number of runs in each array, and containing the largest possible number of arrays, can be characterized by an orthogonal array of strength 2. Approximate standard errors of the first-order sensitivity indices are derived for this sampling plan. Two examples are given demonstrating the sampling plan, behavior of the estimates and standard errors, and comparative results based on other approaches.
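To make the structure of a permuted column plan concrete, here is a minimal sketch of replicated Latin hypercube sampling with random column permutations, which is the usual practice described above and not the orthogonal array construction. Every array reuses the same set of values for each input, and only the pairing across inputs changes. The raw ratio computed here applies no bias correction, and the names, toy_model, and sample sizes are assumptions for illustration.

```python
import math
import random


def toy_model(x):
    """Hypothetical model: strongly nonlinear in x[0], weakly linear in x[1]."""
    return math.sin(2.0 * math.pi * x[0]) + 0.5 * x[1]


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def replicated_lhs(n_values, n_inputs, n_arrays, rng):
    """Return (levels, arrays); arrays[a][i][k] is the level index of input k in run i."""
    # One fixed set of stratified values per input, shared by all arrays.
    levels = [[(i + rng.random()) / n_values for i in range(n_values)]
              for _ in range(n_inputs)]
    arrays = []
    for _ in range(n_arrays):
        # Each column is an independent random permutation of the level indices.
        columns = [rng.sample(range(n_values), n_values) for _ in range(n_inputs)]
        arrays.append([list(run) for run in zip(*columns)])
    return levels, arrays


def first_order_ratio(model, levels, arrays, j):
    """Variance of per-level output means for input j, over the total output variance."""
    by_level = [[] for _ in levels[j]]
    for array in arrays:
        for run in array:
            x = [levels[k][index] for k, index in enumerate(run)]
            by_level[run[j]].append(model(x))   # group outputs by the level of input j
    level_means = [sum(ys) / len(ys) for ys in by_level]
    all_outputs = [y for ys in by_level for y in ys]
    return variance(level_means) / variance(all_outputs)


levels, arrays = replicated_lhs(50, 2, 20, random.Random(0))
ratio_x0 = first_order_ratio(toy_model, levels, arrays, 0)
ratio_x1 = first_order_ratio(toy_model, levels, arrays, 1)
print(f"first-order ratio for x0: {ratio_x0:.2f}, for x1: {ratio_x1:.2f}")
```

Because each level of an input appears once in every array, the outputs can be grouped by level for every input from the same set of runs, which is what makes these plans economical.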
A unified framework for quantifying calibrated-prediction uncertainty: "Combining Experimental Data and Computer Simulations, with an Application to Flyer Plate Experiments" by B. Williams, D. Higdon, J. Gattiker, L. Moore, M. McKay and S. Keller-McNulty in (2006) Bayesian Analysis, 1, 765-792.
A flyer plate experiment involves forcing a plane shock wave through stationary test samples of material and measuring the free surface velocity of the target as a function of time. These experiments are conducted to learn about the behavior of materials subjected to high strain rate environments. Computer simulations of flyer plate experiments are conducted with (two-dimensional) hydrodynamics codes, which incorporate physical models that contain parameters having uncertain values. The objectives of the analyses presented in this paper are to assess the sensitivity of the predicted free surface velocity to variations in the uncertain inputs, to constrain the values of these inputs to be consistent with experiment, and to predict free surface velocity based on the constrained inputs. The Bayesian approach taken combines detailed physics simulations with experimental data for the desired statistical inference. The approach allows for:
The resulting analysis accomplishes the objectives within a unified framework.
Please send comments, suggestions and questions about this website to Michael D. McKay.
Last modified 12 December 2010