Melding statistics with engineering?

Dan Lakeland writes:

I recently enrolled as a PhD student in a civil engineering program. My interest could be described as the application of data and risk analysis to engineering modelling, design methods, and decision making.

The field is pretty ripe, and infrastructure risk analysis is a common topic these days, but the simulations and statistical approaches taken so far have been a bit unsatisfactory. For example, people studying the impact of bridge failures during earthquakes on the local economy might assume a constant cost per person-hour of delay throughout the rebuild period; or people might build statistical models of the probability of building collapse, but I would call those pretty much prior distributions: not really based on much data, or else based on a finite element computer model of the physics of a single model building.

I think the application of data to engineering is bizarrely a rather new field, or at least in a renaissance. Back in the 50s or earlier they used to do lots of tests and generate graphical nomographs of the results (like the Moody chart for fluid flow friction factors), but these days the emphasis is on detailed finite element analyses, which tell you exactly how some model will perform but don't deal at all with the difference between your model assumptions and reality.

I’m attaching an article that I’m reading for an earthquake soil mechanics class, which shows pretty much the state of the art of applications of (Bayesian) statistics to engineering. A CPT (cone penetration test) is a test where they push a cone on the end of a long rod into the ground and measure the pressure being applied to the cone as a function of depth. Another paper I’ve read uses artificial neural networks to predict the shear capacity of reinforced concrete beams. Engineers typically don’t like ANN-type approaches because they’re data-oriented and don’t have explanatory power in terms of physics. On the other hand, the ANN model, because it’s based on data, is a much better fit to real performance than the existing physics-based models.

I wonder if you might comment in your blog on melding statistics with engineering, especially how we can use data together with deterministic models and build better engineering decision rules, both for everyday engineering and for social investment decisions such as building code requirements for extreme events like earthquakes, hurricanes, and so forth.

What decision theory books or articles do you know of that might be useful and relevant to this field?

My reply

I’ve long thought of statistics as a branch of engineering rather than a science. To me, statistics is all about building tools to solve problems. On the other hand, departments of Operations Research and Industrial Engineering tend to focus on probability theory rather than applied statistics, so I think we need our own departments.

Getting to your specific question: yes, I know what you’re talking about. Back in high school and college I spent a few summers working in a lab programming finite element methods. Ultimately this was all statistical, but I didn’t see that at the time. I imagine there’s been a huge amount of work in this area in the past 25 years, with iterative methods for refining grid boxes and so forth. It would be a fun area to work in. But I suspect it would be an effort to translate it into statistical language.

It seems to me that engineers and physicists work very hard at solving particular problems, which are often big and difficult. Statisticians develop general tools for easy problems (e.g., logistic regression), which is a different sort of challenge. I think there’s great potential for putting these perspectives together but I’m not quite clear where to start. I’ve seen some articles in statistics journals addressing your concerns but I haven’t been so impressed by what I’ve seen there. Probably a better strategy is to start with the engineering literature and add uncertainty to that.

8 thoughts on “Melding statistics with engineering?”

  1. The biggest difference I see between engineering and statistics is that engineering always assumes a model, and usually a known one at that. "Model-free engineering" is almost an oxymoron. No RKHS stuff here!

    The second difference is that the mathematical models engineers build (in general) assume known parameters. If you have a solution to a problem in, e.g., IE/OR, the solution assumes known parameters, almost always.

    The desire for the closed form solution is strong in IE/OR. Computational statistics just doesn't fit. Parameters are real things, representing real characteristics of real systems. They are to be estimated, then plugged in as if known. Loss functions? If you don't know the parameter values, go ask the statisticians to find them out for you!

    It's a mindset thing, at the root of it.

  2. I know little to nothing about the reality of structural engineering, but James Beck at Caltech does a pretty good job in the field, I think, at least as far as stochastic modeling goes (full disclosure: I took such a course from him this spring), developing methods for 'Bayesian civil engineering', or whatever you want to call it. He studies dynamics (including the use of FE methods) under prior distributions on system parameters, and then ramps up Metropolis-Hastings/Gibbs (tempered) algorithms to study posterior distributions of the metrics of interest. Googling him brings up his publications. It's certainly an interesting field…!
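
    A bare-bones sketch of that kind of workflow (not Beck's actual methods; the dynamics model, numbers, prior, and noise level here are all invented for illustration) is a random-walk Metropolis sampler for the posterior of a stiffness parameter given noisy natural-frequency measurements:

    ```python
    # Toy Bayesian system identification: posterior on stiffness k (known mass m)
    # from noisy measurements of the natural frequency omega = sqrt(k/m).
    import numpy as np

    rng = np.random.default_rng(3)
    m = 10.0                                                   # known mass [kg]
    omega_obs = np.sqrt(4000.0 / m) + rng.normal(0, 0.2, 5)    # "measured" [rad/s]

    def log_post(k, sigma=0.2):
        if not (1000.0 < k < 10000.0):                         # flat prior on a plausible range
            return -np.inf
        return -0.5 * np.sum((omega_obs - np.sqrt(k / m))**2) / sigma**2

    samples, k = [], 3000.0
    for _ in range(20000):
        k_prop = k + rng.normal(0, 100.0)                      # random-walk proposal
        if np.log(rng.uniform()) < log_post(k_prop) - log_post(k):
            k = k_prop
        samples.append(k)

    samples = np.array(samples[5000:])                         # discard burn-in
    print(f"posterior mean k = {samples.mean():.0f}, sd = {samples.std():.0f}")
    ```

    (In a real problem the forward model would be a finite element run rather than a one-line formula, which is where tempering and emulation start to matter.)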

  3. There is a whole field of engineering called Inverse Problems that is concerned with estimating parameters given data and a model. I am most familiar with inverse heat transfer problems, where for example you are given a temperature history and have to determine thermal conductivity or some other parameter. Bayesian techniques are sometimes used.

    One of the pioneers of this field is James Beck (not the one mentioned above) now a professor emeritus at Michigan State.
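
    A toy version of the kind of inverse problem described above, assuming (for illustration only) a semi-infinite solid with a sudden surface temperature step, made-up numbers, and a flat prior, can be written as a simple grid posterior over the thermal diffusivity:

    ```python
    # Toy Bayesian inversion: recover thermal diffusivity alpha from a noisy
    # temperature history at one depth after a surface temperature step.
    import numpy as np
    from scipy.special import erf

    def temperature(t, alpha, x=0.05, T_i=20.0, T_s=100.0):
        # standard solution for a semi-infinite solid with a surface temperature step
        return T_s + (T_i - T_s) * erf(x / (2.0 * np.sqrt(alpha * t)))

    # "measured" history: true alpha = 1e-6 m^2/s plus measurement noise
    rng = np.random.default_rng(1)
    t_obs = np.linspace(60, 3600, 40)                       # seconds
    T_obs = temperature(t_obs, 1e-6) + rng.normal(0, 0.5, t_obs.size)

    # grid posterior: flat prior on alpha, Gaussian measurement error (sigma = 0.5 C)
    alpha_grid = np.linspace(2e-7, 3e-6, 500)
    log_post = np.array([-0.5 * np.sum((T_obs - temperature(t_obs, a))**2) / 0.5**2
                         for a in alpha_grid])
    post = np.exp(log_post - log_post.max())
    d_alpha = alpha_grid[1] - alpha_grid[0]
    post /= post.sum() * d_alpha                            # normalize the density

    mean = np.sum(alpha_grid * post) * d_alpha
    sd = np.sqrt(np.sum((alpha_grid - mean)**2 * post) * d_alpha)
    print(f"posterior mean alpha = {mean:.2e} m^2/s, sd = {sd:.1e}")
    ```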

  4. Engineers have a different approach to handling uncertainty. Often they don't need estimates so much as upper bounds, and so they're often very conservative. Some load is known from experience to be between 2 and 4 pounds, so let's call it 10 to be safe. It's a commendable approach, when you can get away with it, and often you can.

  5. Dan,

    You describe two approaches to these sorts of problems: use a complex deterministic model, or use some highly nonparametric regression to data. The former inadequately treats uncertainty and model error; the latter is susceptible to overfitting and can be opaque.

    One compromise approach comes out of the Gaussian process statistical emulation community. There, they base prediction on deterministic models whose input parameters are calibrated to data. They use a Gaussian process to interpolate the model output, assuming you're working with a small ensemble of runs from an expensive simulator. But in addition, they use a second "discrepancy" Gaussian process to try to estimate the form of the model's structural errors at its putative "best input" settings.

    This approach isn't purely deterministic and it's not purely statistical; it allows the deterministic model to inform prediction, but it treats the structural errors, such as model biases, as stochastic unknowns to be estimated from model misfit.

    The classic reference on this approach is Kennedy and O'Hagan. (Be sure to read the supplementary material for details. There is a more introductory piece just on the interpolation part here.) Some engineering examples may be found in David Higdon's work, e.g. here. See Bayarri et al. and Bastos and O'Hagan on validation of complex models. A paper which focuses just on estimating model discrepancy is Goldstein, House, and Rougier, in this case for climate models.

    (Much work in the emulation community has been in climate science, which shares with some engineering problems the existence of complex finite-difference simulations that have significant structural errors, a limited ability to conduct controlled experimental tests, which necessitates relying on such models, and a need to apply the output in decision-relevant ways.)
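
    To make the emulator-plus-discrepancy idea above concrete, here is a rough sketch only: the full Kennedy-O'Hagan approach infers the calibration parameters, emulator, and discrepancy jointly, but a simplified two-stage ("modular") version, with a made-up toy simulator and scikit-learn GPs standing in for a proper emulator, conveys the structure.

    ```python
    # Simplified emulator-plus-discrepancy sketch (not the full joint Bayesian
    # calibration).  The simulator, "field" data, and all settings are invented.
    import numpy as np
    from scipy.optimize import minimize_scalar
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C

    def simulator(x, theta):
        # stand-in for an expensive deterministic model eta(x, theta)
        return np.sin(theta * x)

    # 1. small ensemble of simulator runs over a design in (x, theta)
    rng = np.random.default_rng(0)
    n = 40
    X_design = np.column_stack([rng.uniform(0.0, 1.0, n),      # x
                                rng.uniform(0.5, 4.0, n)])     # theta
    y_design = simulator(X_design[:, 0], X_design[:, 1])
    emulator = GaussianProcessRegressor(kernel=C(1.0) * RBF([0.2, 0.5]),
                                        normalize_y=True).fit(X_design, y_design)

    # 2. field observations z(x): "reality" differs from the model in form
    x_field = np.linspace(0, 1, 10)
    z_field = np.sin(2.0 * x_field) + 0.1 * x_field**2 + rng.normal(0, 0.01, 10)

    # 3. plug-in estimate of the "best input" theta* by matching emulator to data
    def misfit(theta):
        X = np.column_stack([x_field, np.full_like(x_field, theta)])
        return np.sum((z_field - emulator.predict(X))**2)
    theta_star = minimize_scalar(misfit, bounds=(0.5, 4.0), method="bounded").x

    # 4. second GP on the residuals: the estimated model discrepancy delta(x)
    resid = z_field - emulator.predict(
        np.column_stack([x_field, np.full_like(x_field, theta_star)]))
    discrepancy = GaussianProcessRegressor(kernel=C(0.1) * RBF(0.3),
                                           alpha=1e-4).fit(x_field[:, None], resid)

    # prediction at new x combines calibrated simulator output and discrepancy
    x_new = np.array([[0.25], [0.75]])
    eta_new = emulator.predict(np.column_stack([x_new.ravel(),
                                                np.full(len(x_new), theta_star)]))
    delta_new, delta_sd = discrepancy.predict(x_new, return_std=True)
    print(theta_star, eta_new + delta_new, delta_sd)
    ```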

  6. Having a bachelor's in chemical engineering and currently in a PhD program in statistics, I was also interested in how to meld statistics and engineering. I work with engineers who are interested in inverse problems. Typically they run a black box (think nonlinear least squares) and get estimates for their parameters, but they never seem to take the next step and ask what the uncertainty is in those estimates. So I would agree with Andrew that a reasonable approach might be to delve into the engineering literature and add an understanding of uncertainty.
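
    For what it's worth, that next step is often nearly free: scipy's curve_fit, for example, returns an approximate covariance matrix for the fitted parameters along with the point estimates. (The exponential-decay model and the data below are invented for illustration.)

    ```python
    # Nonlinear least squares with the uncertainty that usually gets ignored
    import numpy as np
    from scipy.optimize import curve_fit

    def model(t, a, k):
        return a * np.exp(-k * t)

    rng = np.random.default_rng(4)
    t = np.linspace(0, 5, 30)
    y = model(t, 2.0, 0.7) + rng.normal(0, 0.05, t.size)   # toy "measurements"

    popt, pcov = curve_fit(model, t, y, p0=[1.0, 1.0])
    perr = np.sqrt(np.diag(pcov))                          # approximate standard errors
    for name, est, se in zip(["a", "k"], popt, perr):
        print(f"{name} = {est:.3f} +/- {se:.3f}")
    ```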

  7. The approach I have taken when I want to use Bayesian statistics on an engineering problem (which is not that many times, since I'm just a Chem E undergrad) is to find a deterministic model, assume Gaussian error, and do inference on that. The other approach I have seen is, like Jarad said, nonlinear least squares (which corresponds to assuming Gaussian error and finding the maximum likelihood). This is popular because it's super easy to do in Excel (and all engineers use Excel).
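
    A tiny numerical check of that equivalence (with a made-up model, data set, and noise level): for a fixed Gaussian noise sigma, the negative log-likelihood is 0.5*SSE/sigma^2 plus a constant, so least squares and maximum likelihood pick out the same parameters.

    ```python
    # Least squares vs. Gaussian maximum likelihood on the same toy problem
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(2)
    x = np.linspace(0, 5, 50)
    y = 2.0 * np.exp(-0.7 * x) + rng.normal(0, 0.05, x.size)   # toy "measurements"

    def model(x, a, k):
        return a * np.exp(-k * x)

    def sse(p):
        return np.sum((y - model(x, *p))**2)

    def neg_log_lik(p, sigma=0.05):
        r = y - model(x, *p)
        return 0.5 * np.sum(r**2) / sigma**2 + x.size * np.log(sigma * np.sqrt(2 * np.pi))

    p_ls = minimize(sse, x0=[1.0, 1.0]).x
    p_ml = minimize(neg_log_lik, x0=[1.0, 1.0]).x
    print(p_ls, p_ml)   # essentially identical
    ```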

    Does anyone know of any Bayesian statistics books that adopt an engineering perspective? My search turned up zero.

  8. jsalvati,

    I don't know of any engineering oriented Bayesian texts either. I think the closest you will come are the ones oriented toward physical scientists, such as Sivia or d'Agostini.
