On the challenges of modeling data-driven virtual flow meters

A cornerstone of ML applications is available data. Yet, in industrial applications the data may fail to suitably represent the underlying process behavior. We discuss some data challenges faced by data-driven VFMs.

Numerous companies are hunting for data-driven solutions that will revolutionize their industry. In the petroleum industry, two decades of steady improvement in instrumentation of petroleum assets and committed investments in digitalization projects to improve data collection systems have set the stage for machine learning. One of the hottest candidate applications is data-driven virtual flow meters (VFMs), where machine learning promises to reduce maintenance costs without sacrificing accuracy.

This is the second article in a series about data-driven VFMs. Read the introductory post here.

We have several years of experience working with real production data from petroleum assets and in modeling data-driven VFMs. Solution Seeker is the first company that offers the market a commercial data-driven VFM. Our goal is less than 5% error, with 90% less effort. In our R&D work towards this goal, we have had to contend with a lot of data trouble. We have identified four prevalent challenges that degrade the performance of data-driven VFMs:

1. Low data volume

2. Low data variety

3. Poor data quality

4. Non-stationary process

In the following, we take a deep dive into these four challenges. Hopefully, this will shed some more light on why we don’t see more data-driven VFMs in operation, at least not yet.

As many researchers working in the domain of AI and ML know, data-driven solutions, especially those based on high-capacity models like neural networks, are data-hungry (Mishra and Datta-Gupta, 2018). They require a substantial volume of data. However, some petroleum assets do not have continuous sensor measurements of the multiphase flow rate, an important measurement for the development of VFMs. Instead, new flow rate measurements are obtained during well testing, at most 1-2 times per month (Monteiro et al., 2020). These assets must accumulate years of production before a sufficient volume of data is acquired. This is unfortunate: VFMs are most useful for assets where flow rate data is sparse, yet a data-driven VFM needs a sufficient data volume before it is accurate enough to be put to use.
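To put the well-test rate in perspective, here is a back-of-the-envelope sketch. The test frequency of 1-2 per month comes from the text above; the target of 100 labelled samples per well is a purely hypothetical number chosen for illustration, not a requirement we have established.

```python
def years_to_collect(target_samples: int, tests_per_month: float) -> float:
    """Years of production needed to gather `target_samples` flow rate measurements."""
    if tests_per_month <= 0:
        raise ValueError("tests_per_month must be positive")
    return target_samples / (tests_per_month * 12)


# Hypothetical target of 100 flow rate samples per well (illustrative only).
for rate in (1, 2):
    print(f"{rate} test(s) per month: {years_to_collect(100, rate):.1f} years")
```

Even at the upper end of the testing frequency, a well would need several years of production to reach a three-digit sample count.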

Even for the assets that do have an appropriate data volume, the data variety is often inadequate. It is well known that many data-driven solutions extrapolate poorly, so they struggle to make meaningful and acceptable predictions in previously unseen operating conditions. For a petroleum asset, data variety is largely decided by the operational practices of the operator. Operators are often concerned with maintaining stable production rates, and may not be aware that perturbing the system is beneficial to model learning. Take Figure 1 as an example. It shows the choke openings observed so far for three of the wells Solution Seeker is working with. The X-axis shows the choke opening, from fully closed (0%) to fully open (100%). Notice that Well Z has data samples across almost the entire operating region, whereas Wells X and Y lack data above 40% choke opening. For a data-driven VFM that learns its behaviour only from patterns in data, how can we expect it to make good predictions in the unseen operating domain above 40% opening?
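One simple way to quantify this kind of data variety is to bin the observed choke openings and check which bins contain data. The sketch below is a minimal illustration under our own assumptions: the function names, the 10% bin width, and the sample values are hypothetical and are not taken from the wells in Figure 1.

```python
def _bin_index(opening: float, bin_width: float) -> int:
    """Map a choke opening in [0, 100] % to a bin index; 100 % falls in the last bin."""
    n_bins = int(100 / bin_width)
    return min(int(opening // bin_width), n_bins - 1)


def choke_coverage(openings, bin_width: float = 10.0):
    """Return (set of occupied bins, fraction of the 0-100 % range that has data)."""
    n_bins = int(100 / bin_width)
    occupied = set()
    for x in openings:
        if 0.0 <= x <= 100.0:  # skip physically impossible readings
            occupied.add(_bin_index(x, bin_width))
    return occupied, len(occupied) / n_bins


def in_seen_region(opening: float, occupied, bin_width: float = 10.0) -> bool:
    """True if the opening falls in a bin that contained training data."""
    if not 0.0 <= opening <= 100.0:
        return False
    return _bin_index(opening, bin_width) in occupied


# Hypothetical well with no samples above 40 % choke opening.
well_openings = [5, 12, 18, 25, 33, 38]
occupied, coverage = choke_coverage(well_openings)
print(f"coverage: {coverage:.0%}")                          # fraction of bins with data
print("70 % opening seen before:", in_seen_region(70, occupied))
```

A prediction requested at an opening outside the occupied bins is an extrapolation, and should be treated with corresponding caution.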

Improved well testing procedures can partially address the first two challenges. By performing well tests more frequently and prioritizing wells with uncertain predictions, the data volume can be increased for wells where it counts the most. The way testing is performed can also affect data variety. For example, ensuring sufficient test time and performing multi-rate well tests can increase the amount of information gathered from well testing. Solution Seeker offers applications for scheduling and optimizing well tests to improve data volume and variety.

The third challenge is related to the measurements that are already available. Sadly, it is not uncommon for these to be noisy and biased, and they may even drop out for long periods of time. Figure 2 illustrates the latter issue and shows the huge number of missing values in some of the data streams we work with at Solution Seeker. On the Y-axis we have 100 data streams, and the black markings indicate the samples where the measurement value is missing. Luckily, there are ways to handle poor-quality data to a certain extent, such as preprocessing and data reconciliation. Here at Solution Seeker, we have a patented and proprietary data squashing algorithm (Grimstad et al., 2016) that automatically handles some of these issues as data arrives in real time.
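Before any repair is attempted, it helps to measure how bad the dropouts are. The following minimal sketch computes the fraction of missing samples and the longest consecutive gap for a single data stream. It is a generic diagnostic written for illustration, not the data squashing algorithm mentioned above; the function name and the convention that missing values appear as `None` or NaN are assumptions.

```python
import math


def missing_stats(stream):
    """Return (fraction of missing samples, longest run of consecutive missing samples)."""
    def is_missing(value) -> bool:
        # Assumption: dropouts are encoded as None or as a floating-point NaN.
        return value is None or (isinstance(value, float) and math.isnan(value))

    n_missing = 0
    longest = current = 0
    for value in stream:
        if is_missing(value):
            n_missing += 1
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    fraction = n_missing / len(stream) if stream else 0.0
    return fraction, longest


# Hypothetical pressure stream with one two-sample gap and one isolated NaN.
pressure = [81.2, None, None, 80.7, float("nan"), 80.1]
fraction, longest_gap = missing_stats(pressure)
print(f"missing: {fraction:.0%}, longest gap: {longest_gap} samples")
```

Short gaps can often be bridged by interpolation, whereas a long gap usually means the stream should be excluded for that period.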

Now, imagine that you have a dataset that does have an appropriate data volume and variety, and that you have done everything in your power to reduce poor-quality data. You will then get a slap in the face, because there is no way of escaping challenge four. Unfortunately, the available production data is generated by a non-stationary process, because the reservoir is being emptied. The process conditions may be stationary over short periods of time. However, as the reservoir is drained, the characteristics of the process slowly change. In practice, this means that the models will always have to extrapolate to previously unseen process conditions. This is illustrated in Figure 3, which shows the choke opening (%) and the wellhead pressure (bar). The wellhead pressure decreases while the choke opening increases, pushing the process into previously unseen process conditions. Therefore, VFMs will have to be recalibrated over time. How often will naturally depend on the process.
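One way to make the recalibration question concrete is to track how often recent samples fall outside the range of conditions seen during training. The sketch below uses a per-variable min/max envelope, which is a coarse proxy for "previously unseen process conditions". The variable names, the sample values, and the 20% threshold are all hypothetical; this is an illustration, not the procedure we use in production.

```python
def training_envelope(samples):
    """Per-variable (min, max) over training samples, each a dict of name -> value."""
    names = samples[0].keys()
    return {n: (min(s[n] for s in samples), max(s[n] for s in samples)) for n in names}


def outside_envelope(sample, envelope) -> bool:
    """True if any variable lies outside the range seen during training."""
    return any(not (lo <= sample[n] <= hi) for n, (lo, hi) in envelope.items())


def needs_recalibration(recent, envelope, max_fraction: float = 0.2) -> bool:
    """Flag recalibration when too many recent samples are outside the envelope."""
    n_out = sum(outside_envelope(s, envelope) for s in recent)
    return n_out / len(recent) > max_fraction


# Hypothetical data: choke opening (%) and wellhead pressure (bar).
train = [
    {"choke": 20, "whp": 80},
    {"choke": 30, "whp": 75},
    {"choke": 35, "whp": 70},
]
# Later in the well's life: the choke opens further while the pressure declines.
recent = [
    {"choke": 34, "whp": 71},
    {"choke": 40, "whp": 66},
    {"choke": 45, "whp": 62},
]
envelope = training_envelope(train)
print("recalibrate:", needs_recalibration(recent, envelope))
```

A box-shaped envelope ignores correlations between variables, so a combination of values can be novel even when each value on its own is within range.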

In a recent article (Grimstad et al., 2021) by Solution Seeker’s research lab, we developed data-driven VFMs for 60 petroleum wells across 5 assets. We did this with the traditional or “vanilla” approach. The results were diverse. For some of the well models we achieved excellent performance, with an average test error as low as 4%, whereas for others the error was well above 20%. What this underlines is that robustness is a key issue, and that cherry-picking and showcasing results from the best models can be misleading.
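To avoid cherry-picking, it helps to report the error for every well and look at the spread rather than at the best case. The sketch below assumes a mean absolute percentage error, which may differ from the metric used in the cited study, and the well names and flow rates are made up for illustration.

```python
def mape(y_true, y_pred) -> float:
    """Mean absolute percentage error (%), skipping samples with a zero reference rate."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    if not pairs:
        raise ValueError("no samples with a non-zero reference value")
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in pairs) / len(pairs)


# Hypothetical test sets: (measured flow rates, VFM predictions) per well.
wells = {
    "well_a": ([100.0, 200.0], [96.0, 208.0]),
    "well_b": ([100.0, 200.0], [75.0, 250.0]),
}
errors = {name: mape(y, y_hat) for name, (y, y_hat) in wells.items()}
for name, err in sorted(errors.items(), key=lambda kv: kv[1]):
    print(f"{name}: {err:.1f} %")
print(f"best: {min(errors.values()):.1f} %, worst: {max(errors.values()):.1f} %")
```

Reporting the worst case alongside the best makes it clear when a method works well on some wells and poorly on others.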

We believe that the vanilla approach to data-driven VFMs is not robust enough to be rolled out and scaled in operations. Our findings motivate approaches that utilize all available knowledge to a greater extent, for instance probabilistic modeling, gray-box modeling, and learning across wells. We believe that this is a necessity to provide high-accuracy, easily maintained virtual flow meters. Keep an eye out for the remaining articles in this series, where we will dive deeper into these results and discuss the above-mentioned alternative approaches to VFM modeling.


  • B. Grimstad, V. Gunnerud, A. Sandnes, S. Shamlou, I. S. Skrondal, V. Uglane, S. Ursin-Holm, B. Foss, A Simple Data-Driven Approach to Production Estimation and Optimization, in: SPE Intelligent Energy International Conference and Exhibition, Society of Petroleum Engineers, 2016.
  • B. Grimstad, M. Hotvedt, A. T. Sandnes, O. Kolbjørnsen, L. S. Imsland, Bayesian neural networks for virtual flow metering: An empirical study, arXiv:2102.01391, 2021.
  • S. Mishra, A. Datta-Gupta, Applied Statistical Modeling and Data Analytics - A Practical Guide for the Petroleum Geosciences, Elsevier, 2018.
  • D. D. Monteiro, M. M. Duque, G. S. Chaves, V. M. F. Filho, J. S. Baioco, Using data analytics to quantify the impact of production test uncertainty on oil flow rate forecast, IFP Energies Nouvelles 75, 2020.