Can we trust the data-driven VFMs?

A data-driven VFM presents an estimate of the flow rate, but how certain is the model really about this value? In a study on data-driven VFM modeling, we spent some time digging into calibration plots to assess the quality of the models.

This is the sixth article in a series about data-driven VFMs. As discussed in a previous post, calibration plots may be used to analyze probabilistic models. In particular, estimated accuracies may be visually compared to actual accuracies for a given set of predictions.

Recall that we were conducting a study on data-driven VFMs. For each of the 60 wells included, we had built four separate rate models estimating how much the well is producing: one using maximum a posteriori (MAP) estimation and three using variational inference (VI). One advantage of variational inference is the ability to quantify the prediction uncertainty. This post is a deep-dive into the VI models, which differ only in how measurement noise is treated: fixed homoscedastic noise, learned homoscedastic noise, and learned heteroscedastic noise.
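To make the distinction between the three noise treatments concrete, here is a minimal sketch of the Gaussian negative log-likelihood under each assumption. All numbers are illustrative placeholders, not values from the study; in practice the learned sigmas would be trainable parameters of the network.

```python
import math

def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of one observation under N(mu, sigma^2)."""
    return 0.5 * math.log(2 * math.pi * sigma**2) + (y - mu) ** 2 / (2 * sigma**2)

# Toy observations and model means (illustrative values only).
y  = [1.0, 1.2, 0.8]
mu = [1.1, 1.0, 0.9]

# 1) Fixed homoscedastic noise: sigma is a constant chosen a priori.
nll_fixed = [gaussian_nll(yi, mi, 0.1) for yi, mi in zip(y, mu)]

# 2) Learned homoscedastic noise: a single scalar sigma fitted jointly
#    with the network weights (here just a placeholder value).
sigma_hat = 0.15
nll_homo = [gaussian_nll(yi, mi, sigma_hat) for yi, mi in zip(y, mu)]

# 3) Learned heteroscedastic noise: the network predicts sigma(x) for
#    each input, so the noise level varies per data point.
sigma_x = [0.05, 0.2, 0.1]
nll_hetero = [gaussian_nll(yi, mi, si) for yi, mi, si in zip(y, mu, sigma_x)]
```

The heteroscedastic variant is the most flexible of the three, since it lets the model report more uncertainty in operating regions where the data are noisier.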

Why does the prediction uncertainty matter? If we want to make a decision based on some insight or knowledge, whether it is based on sensor data or predictive models, it is useful to have some idea of the degree to which the insight may be trusted.

Say you are going for a holiday at some place where you know the weather forecast to be highly uncertain. Even if the forecast indicates that the weather will be sunny, you may be advised to have a back-up plan in case the forecast turns out to be wrong. On the other hand, if you know that the forecast is usually very accurate, you may be inclined to fully trust it. Your decision as to whether to pack for unexpected rainy days depends on the accuracy you expect of the forecast.

Here is another use-case example. Consider a field where one of the wells runs a high risk of gas hydrates forming if it produces at too low a gas rate. A gas hydrate is an ice-like structure that forms under certain pressure and temperature conditions when water and gas are present. Hydrates may build up on the inside of the pipeline and eventually block the pipe. Removing such plugs can be difficult, since the line may only be depressurized on one side of the blockage. Say you are worried that hydrates may form if you produce at a rate below 1 million Sm3/d. If your model estimates that you are above this level, but with a 20% probability of being below the threshold, it helps to know how much to trust those 20%. Does the model underestimate, overestimate, or accurately estimate its own uncertainty? If the model consistently overestimates the uncertainty, the true probability of being below the threshold is lower than 20%. If it instead underestimates, the risk is greater than 20%. Knowing how well-calibrated your model's uncertainty estimates are enables better flow assurance decisions in your production system.
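Assuming the model returns a Gaussian predictive distribution, the threshold risk in the example above can be read straight off the predictive CDF. The numbers below are hypothetical, chosen only to illustrate the calculation:

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu, sigma = 1.15, 0.18   # hypothetical predictive mean/std, million Sm3/d
threshold = 1.0          # hydrate-risk threshold from the example above

# Probability that the true rate lies below the threshold (~20% here).
p_below = normal_cdf(threshold, mu, sigma)

# If calibration plots show the model underestimates its uncertainty,
# the effective sigma is larger and so is the risk; a crude correction
# is to inflate sigma by an empirically estimated factor.
p_below_inflated = normal_cdf(threshold, mu, 1.5 * sigma)
```

This is exactly why calibration matters: the same predictive mean can imply very different threshold risks depending on whether the reported sigma can be trusted.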

Due to the non-stationary behaviour of the reservoir, data-driven well flow rate models may have large prediction errors, especially on future data. It is therefore desirable that a model can assess its own performance. To build understanding of, and confidence in, the uncertainty estimates, we generated calibration plots for each of the three noise models included in this study.

How well a model is calibrated may vary across wells. To visualize this variation, we have plotted the (point-wise) 25th and 75th percentiles of the calibration curves obtained across wells. Figure 1 holds calibration plots for all three models: the first row represents the model with fixed homoscedastic noise, the second row the model with learned homoscedastic noise, and the last row the model with learned heteroscedastic noise.
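For readers who want to reproduce such curves, here is a minimal sketch of how a calibration curve can be computed for Gaussian predictions: for each nominal confidence level, count how often the observation falls inside the central predictive interval. This is a generic construction, not the exact code used in the study.

```python
from statistics import NormalDist
import random

def calibration_curve(y, mu, sigma, levels):
    """Empirical coverage of central predictive intervals.

    For each confidence level p, count how often the observation falls
    inside the central p-interval of the prediction N(mu, sigma^2).
    A well-calibrated model gives coverage approximately equal to p.
    """
    coverages = []
    for p in levels:
        hits = 0
        for yi, mi, si in zip(y, mu, sigma):
            d = NormalDist(mi, si)
            lo, hi = d.inv_cdf(0.5 - p / 2), d.inv_cdf(0.5 + p / 2)
            hits += lo <= yi <= hi
        coverages.append(hits / len(y))
    return coverages

# Toy example: synthetic data whose uncertainty is correctly specified,
# so the curve should lie close to the diagonal.
random.seed(0)
mu = [10.0] * 1000
sigma = [1.0] * 1000
y = [random.gauss(m, s) for m, s in zip(mu, sigma)]
levels = [0.1, 0.5, 0.9]
cov = calibration_curve(y, mu, sigma, levels)
```

Plotting `cov` against `levels` gives one calibration curve; curves above the diagonal indicate overestimated uncertainty, curves below indicate overconfidence.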

From Figure 1 we can see that on historical data, the models trained on test separator measurements seem to be best calibrated in terms of uncertainty. The models trained on MPFM measurements overestimate the uncertainty in their predictions. On future data, the results are reversed. The models trained on MPFM measurements are better calibrated and the models trained on test separator measurements all underestimate the prediction uncertainty. In other words, these last models are too cocky about their own performance.

Another observation from the plots in Figure 1 is that learning the noise model seems to improve calibration. The plots give us some confidence in the uncertainty estimates for the learned noise models, especially in the historical cases. The calibration curves for models trained on MPFM data generally lie above the curves for models trained on test separator data, for both historical and future predictions. This means that the models trained on MPFM measurements are less confident in their predictions, even though they are trained on more data and achieve a lower prediction error, as seen in a previous post. We expected models trained on MPFM data to reflect the increased uncertainty present in these measurements, but this is difficult to observe in the results. Note also that the MPFM models are tested on MPFM data, so any systematic errors in the MPFM measurements themselves will go undetected.

Neither the homoscedastic nor the heteroscedastic noise model can capture complex noise profiles that depend on flow conditions. Since most MPFMs are specialized to accurately measure flow rates only for certain compositions and flow regimes, this is a potential drawback of the models.

The Bayesian neural network (BNN) approach is promising due to its ability to provide uncertainty estimates. However, well-calibrated models are challenging to obtain, both because it is difficult to set meaningful priors on neural network weights and because priors play a significant role in small-data regimes. The uncertainty estimates provided by the BNNs should therefore be used with caution.

Finally, don’t become too disappointed by the results in this article if you are dreaming of exploring data-driven VFMs for your wells. As seen throughout this article series, modeling well flow rates in real time with sufficiently low errors is a challenging task, but we have some suggestions for boosting the performance of the models! Could a hybrid approach combining machine learning techniques, mechanistic models and some human insight improve the models? How about increasing the data volume by modeling the dynamic behaviour of a well while learning from other wells with a multi-task learning model? Stay tuned for coming updates and some very promising results we have on data-driven VFM.


  • Al-Qutami, T. A., Ibrahim, R., Ismail, I., 2018. Virtual multiphase flow metering using diverse neural network ensemble and adaptive simulated annealing. Expert Systems With Applications 93, 72–85. doi:10.1016/j.eswa.2017.10.014
  • Grimstad, B., Hotvedt, M., Sandnes, A.T., Kolbjørnsen, O., Imsland, L.S., 2021. Bayesian neural networks for virtual flow metering: An empirical study. arXiv:2102.01391

Read our previous articles on data-driven VFMs here.