By enabling engineers to reproduce complex physical processes in detail, computer simulation is transforming how industrial products are designed, analyzed, and manufactured. Despite this significant success, one persistent question bothers both analysts and decision-makers:

*how good are these simulations exactly?*

**Uncertainty quantification**, which stands at the confluence of probability, statistics, computational mathematics, and the disciplinary sciences, provides a promising framework to answer that question and has gained tremendous momentum in recent years. In this article, we will discuss the following aspects of uncertainty quantification:

- the motivation: where do the uncertainties come from, and why do they matter?
- the solution: V&V and IPAC…

For complex systems such as airplanes, railways, and power plants, maintenance is a critical issue, as it ensures the systems’ reliability and safety throughout their life cycles.

By tapping the power of advanced sensors, IoT technology, and data analytics, maintenance in the era of Industry 4.0 has experienced a rapid shift from “reactive” to “proactive”: instead of performing maintenance only after a failure has occurred, the state-of-the-art strategy is to actively anticipate system degradation and schedule maintenance “just in time.” This new type of maintenance is known as **predictive maintenance** (PdM).

In practice, PdM is typically achieved by first using sensors…

**Gaussian Process** (GP) is a powerful supervised machine learning method that is widely used in regression settings. This method is desirable in practice since:

- it performs quite well in the small-data regime;
- it is highly interpretable;
- it automatically estimates the prediction uncertainty.

This last point is what sets GP apart from many other machine learning techniques: for a GP model, its prediction *f*(*x*) at a location *x* is not a deterministic value, but rather a random variable following a normal distribution, i.e., *f*(*x*) ~ *N*(*μ*(*x*), *σ*²(*x*)). …
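To make this concrete, here is a minimal from-scratch sketch of the GP posterior, computing exactly the *μ*(*x*) and *σ*(*x*) described above (NumPy only; the toy training data, kernel hyperparameters, and noise level are all assumed for illustration):

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel k(a, b) = v * exp(-(a - b)^2 / (2 l^2))."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length_scale**2)

# Toy 1D training data (assumed for illustration)
rng = np.random.default_rng(0)
X_train = np.sort(rng.uniform(0, 10, size=8))
y_train = np.sin(X_train)

# Posterior: mu(x) = k*^T (K + s^2 I)^-1 y,  sigma^2(x) = k(x,x) - k*^T (K + s^2 I)^-1 k*
noise = 1e-6
K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
L = np.linalg.cholesky(K)

X_test = np.linspace(0, 10, 100)
K_s = rbf_kernel(X_train, X_test)                      # cross-covariances, shape (8, 100)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
mu = K_s.T @ alpha                                     # posterior mean mu(x)
v = np.linalg.solve(L, K_s)
sigma2 = np.clip(1.0 - np.sum(v**2, axis=0), 0, None)  # prior variance is 1.0 here
sigma = np.sqrt(sigma2)                                # posterior std sigma(x)

# A 95% credible interval at each x follows directly from mu and sigma
lower, upper = mu - 1.96 * sigma, mu + 1.96 * sigma
```

Note how the uncertainty `sigma` comes out of the same linear algebra as the mean prediction — no extra machinery is needed.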

Making reliable model-based predictions is not always an easy task.

In generic form, we have a model *f*(·) with some model parameter **θ**. This model is meant to simulate some real-life process. Then, given input

In regression analysis, labeling the training samples is usually time-consuming and takes a large portion of the computational budget.

In our engineering team, we face this kind of challenge all the time.

Our task is to design aero-engine components. We constantly need to train regression models to predict product performance, given the design parameters. For us, labeling a training sample requires conducting high-fidelity physics simulations, which could easily take up to days, even weeks, to run on a cluster.

Obviously, if the model needs many samples to reach satisfactory accuracy, the resulting computational burden would be a nightmare.

Luckily, we…

Animations are great data visualization tools for conveying complicated insights engagingly. In this article, we will walk through the steps of creating an animation with `Matplotlib` and `Celluloid`.

This is what we will make: it simulates various projectile motion trajectories and updates the associated histogram of the projectile range.

This tutorial goes as follows. After introducing the animation packages and the employed data, we dive straight into creating the animation. We go from simple to complex: to start, we animate a single trajectory, followed by animating multiple trajectories. …
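As a preview of the mechanics, here is a minimal sketch of the single-trajectory frames using `Matplotlib`’s `ArtistAnimation`, the class that `Celluloid` builds on (the launch speed, angles, and frame timing are assumptions for illustration):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib.animation import ArtistAnimation

g = 9.81                       # gravitational acceleration, m/s^2
t = np.linspace(0, 2, 50)      # time grid for each trajectory

fig, ax = plt.subplots()
frames = []
# One frame per launch angle; Celluloid's Camera.snap() assembles the same
# kind of frame list behind the scenes before handing it to ArtistAnimation.
for angle in np.radians([30, 45, 60, 75]):
    v0 = 10.0                  # launch speed in m/s (assumed)
    x = v0 * np.cos(angle) * t
    y = np.clip(v0 * np.sin(angle) * t - 0.5 * g * t**2, 0, None)
    artists = ax.plot(x, y, color="tab:blue")
    frames.append(artists)

anim = ArtistAnimation(fig, frames, interval=300, blit=True)
# anim.save("trajectories.gif", writer="pillow")  # requires Pillow installed
```

With `Celluloid`, the explicit `frames` bookkeeping is replaced by calling `camera.snap()` inside the loop and `camera.animate()` at the end.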

For data scientists, effectively communicating the uncertainty of our analysis to stakeholders is crucial for reliable decision-making.

I’ve been working on uncertainty quantification analysis for some time. From the numerous presentations I have given to various audiences, I’ve learned that although the usual uncertainty visualization techniques, such as box plots, violin plots, and confidence bands, are compact and precise in displaying uncertainty, they may only resonate with trained statisticians.

For broader audiences, including stakeholders, domain experts, etc., …
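For reference, a confidence band of the kind mentioned above takes only a few lines to produce; a minimal `Matplotlib` sketch, with made-up prediction mean and spread:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
mean = np.sin(x)               # illustrative prediction
std = 0.2 + 0.1 * x / 10       # illustrative, growing uncertainty

fig, ax = plt.subplots()
ax.plot(x, mean, label="prediction")
ax.fill_between(x, mean - 1.96 * std, mean + 1.96 * std,
                alpha=0.3, label="95% confidence band")
ax.legend()
```

Compact and precise — yet, as argued above, a shaded band like this often fails to land with non-statisticians.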

Want to deliver faster optimization for time-consuming objective functions? Consider surrogate optimization.

In engineering, a product optimization analysis involves finding the combination of design parameters such that the *objective function*, e.g., product performance, manufacturing costs, etc., is maximized/minimized globally across the design parameter space.

The optimization routine requires many iterations to locate the global optimum. Within each iteration, high-fidelity, time-consuming computer simulations are usually conducted to evaluate the current design's objective function. Consequently, the whole optimization process could easily take up to days or even weeks if the objective function is complex.
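Surrogate optimization sidesteps this cost by replacing most of those expensive evaluations with a cheap approximate model. Below is a minimal sketch of the loop; a toy analytic function stands in for the time-consuming simulation, and a simple polynomial surrogate stands in for the GP-type models typically used:

```python
import numpy as np

def expensive_objective(x):
    """Stand-in for a time-consuming simulation (assumed for illustration)."""
    return (x - 2.0) ** 2 + 0.5 * np.sin(5 * x)

# Initial design: a handful of expensive evaluations
X = np.array([0.0, 1.5, 4.0])
y = expensive_objective(X)

grid = np.linspace(0, 4, 400)  # candidate points in the design space
for _ in range(10):
    # 1. Fit a cheap surrogate to the samples gathered so far
    #    (a low-degree polynomial here; practical codes use a GP or RBF model)
    coeffs = np.polyfit(X, y, deg=min(3, len(X) - 1))
    surrogate = np.polyval(coeffs, grid)
    # 2. Optimize the surrogate instead of the expensive function
    x_next = grid[np.argmin(surrogate)]
    # 3. Run the expensive simulation only at the promising point, then refit
    X = np.append(X, x_next)
    y = np.append(y, expensive_objective(x_next))

x_best = X[np.argmin(y)]
```

The expensive function is called only once per iteration; all the heavy search happens on the cheap surrogate.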

In our engineering team, we have implemented an optimization…

In this post, we will review some of the most useful probability concepts applied to **multiple random variables**. In particular, I will show you how those concepts are logically connected.

**An intuition for each concept is built first, before discussing the math.** That’s why I’ve created many illustrations to explain abstract concepts visually. I’ve also included links to related topics this post doesn’t cover. **A linkable table of contents** is provided below, in case you want to jump to specific topics directly. I hope these efforts offer you a better reading experience.
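As a small taste of working with multiple random variables, the sketch below samples a bivariate normal and checks the empirical covariance and correlation against their analytic counterparts (the covariance matrix and sample size are assumed for illustration):

```python
import numpy as np

# Sample two jointly normal random variables with a known covariance matrix
rng = np.random.default_rng(42)
true_cov = np.array([[1.0, 0.6],
                     [0.6, 2.0]])
samples = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=100_000)

# Empirical covariance and correlation between the two random variables
emp_cov = np.cov(samples, rowvar=False)
emp_corr = np.corrcoef(samples, rowvar=False)

# Correlation is covariance rescaled by the standard deviations:
#   rho = Cov(X, Y) / (sigma_X * sigma_Y)
rho = true_cov[0, 1] / np.sqrt(true_cov[0, 0] * true_cov[1, 1])
```

With 100,000 samples, `emp_cov[0, 1]` and `emp_corr[0, 1]` should sit close to 0.6 and `rho`, respectively — a quick numerical check of how the two concepts connect.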

If you want to refresh your…

In this post, we will review some of the most useful **probability concepts** and important tools for **exploratory data analysis**. The table of contents is given in the mind map above. Throughout the post, I have created many illustrations to explain abstract concepts visually. Meanwhile, I’ve used a dataset to show how the probability concepts are applied in practice. I hope these efforts offer you a better reading experience.

This post is all about understanding the underlying probability theories. …