Data-driven at core, now sprinkled with physics

Neural networks, multi-task learning, generative models, 7 million (!) data points. Safe to say, we're fully committed to the data-driven philosophy. But we're also maturing how physics can fit into this. Why ignore hundreds of years of wisdom?

Author: Solution Seeker
Publish date: Mar 12, 2025 · 4 min read

In all seriousness though, our excellent research team has worked hard to develop an in-house physics-based simulator called ManyWells. Being able to artificially generate an abundance of physics-compliant wells and scenarios is a great leap for us. It opens a lot of doors, letting us use synthetic data on top of the true production data our clients have given us access to.

The physics of a well

A well's typical behaviour has been studied by physicists for a long time. In addition to established first-principles equations, several empirical ones govern our expectations of how a well should behave. The well's properties, such as depth and fluid densities, are obviously important. However, this doesn't mean that a well is going to do exactly what these equations predict. Even the most complex equation sets and simulators rest on many simplifying and idealized assumptions. The true behaviour of a well is extremely difficult to capture, often due to the multiphase flow (oil, gas and water) streaming through the pipes.

Yet, with time, the governing equations have grown to cover a great many scenarios, for example different flow regimes. A simulator is perhaps never going to capture all physical phenomena perfectly, but in very many cases its simulation results are better proxies than blind assumptions, especially when the historical production data isn't representative of the current phenomena.

But we are not here to gatekeep! True to our strong commitment to research and innovation, we're sharing our simulator and synthetic data with the community. With the exception of the 3W dataset by Petrobras and the MRST reservoir simulator by SINTEF, there are almost no public resources in this space. We're proud that we can help change that!

ManyWells' repositories

GitHub repo: For the simulator code
Link to GitHub repo
HuggingFace repo: For the simulated datasets
Link to HuggingFace repo

Now, if we take a sneak peek through those famous open doors, there are quite a few ways ManyWells can guide us in the right direction. The main intended uses are listed below.

Hybrid machine learning

Hybrid AI has been the talk of the town for a while. It simply means combining physics and a data-driven methodology so that each compensates for the other's weaknesses and builds on the other's strengths. It's no secret that the main pillar of machine learning is that what you've seen (or trained on) before should be similar to what you experience now. In general, a well system lacks this property due to the non-stationarity of the underlying reservoir. As the reservoir changes unpredictably, the well changes accordingly. Machine learning models have the inherent flaw that their predictive performance often drops significantly when they are applied to data that doesn't resemble the past. Previously, we've explained how multi-task learning can drastically improve the predictive performance for virtual flow metering. But it doesn't have to stop there, does it?

For example, we can now generate synthetic data and practice a hybrid training routine where our data-driven virtual flow meter (VFM) trains on both simulated and true production data. Our VFM could then see far more scenarios than before, ensuring a more physics-compliant and robust behaviour.
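To make the idea of a hybrid training routine concrete, here is a minimal sketch in Python. It is not the ManyWells simulator or our VFM: `toy_simulator`, the one-parameter model `q = theta * choke * sqrt(dp)`, the 10% mismatch in the "real" data and the `real_weight` knob are all assumptions made up for illustration. The point is only to show simulated and measured samples entering one weighted least-squares fit.

```python
import math
import random

random.seed(0)

def toy_simulator(choke, dp):
    """Hypothetical stand-in for a physics-based simulator (NOT ManyWells)."""
    return 10.0 * choke * math.sqrt(dp)

def feature(choke, dp):
    """Single physics-inspired feature used by the toy flow model."""
    return choke * math.sqrt(dp)

# Plentiful synthetic samples covering a wide operating range.
sim = []
for _ in range(500):
    c, dp = random.uniform(0.1, 1.0), random.uniform(5.0, 50.0)
    sim.append((feature(c, dp), toy_simulator(c, dp)))

# Scarce "real" samples: narrow operating range, an assumed 10% gain
# mismatch relative to the simulator, plus measurement noise.
real = []
for _ in range(20):
    c, dp = random.uniform(0.6, 0.8), random.uniform(20.0, 25.0)
    q = 1.1 * toy_simulator(c, dp) + random.gauss(0.0, 0.5)
    real.append((feature(c, dp), q))

def fit_hybrid(sim, real, real_weight):
    """Closed-form weighted least squares for the model q = theta * x."""
    num = sum(x * y for x, y in sim) + real_weight * sum(x * y for x, y in real)
    den = sum(x * x for x, y in sim) + real_weight * sum(x * x for x, y in real)
    return num / den

theta_sim_only = fit_hybrid(sim, real, real_weight=0.0)  # physics only
theta_hybrid = fit_hybrid(sim, real, real_weight=10.0)   # physics + data

print(f"simulated data only: theta = {theta_sim_only:.3f}")
print(f"hybrid fit:          theta = {theta_hybrid:.3f}")
```

In this toy setup the hybrid estimate lands between what the simulator alone suggests and what the few real samples suggest. How strongly to weight the real data is a design choice that would have to be tuned per well.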

Model evaluation & algorithmic quality assurance

When developing an algorithm or model, especially in the oil & gas industry, you're not always blessed with a whole lot of representative data to actually test it on. The evaluation phase can then become a tedious and long-lasting trial-and-error period without structure or clear results, often with the client's gut feeling as the target. Although their gut feeling can be quite correct in many cases, nobody is immune to biases, and one can hardly say that this qualifies as an objective, quantitative performance report.

With a simulator, we can create test data to quality assure an algorithm before we roll it out. We can use it to understand what "should" happen in certain scenarios to 1) enhance our understanding of our client's true production data, 2) evaluate how well our models or algorithms respond to the data and 3) adjust accordingly.
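As a rough illustration of that kind of quality assurance, the sketch below scores a candidate model against simulated ground truth, scenario by scenario. Everything in it is hypothetical: `toy_simulator` stands in for a real simulator, `candidate_model` is a deliberately limited linearization, and the scenario names and the 5% tolerance are arbitrary choices for the example.

```python
import math

def toy_simulator(choke, dp):
    """Hypothetical ground-truth generator (NOT ManyWells)."""
    return 10.0 * choke * math.sqrt(dp)

def candidate_model(choke, dp):
    """Hypothetical model under test: sqrt(dp) linearized around dp = 25."""
    return 10.0 * choke * (5.0 + (dp - 25.0) / 10.0)

# Named test scenarios as lists of (choke opening, pressure drop) points.
scenarios = {
    "normal operation": [(c / 10, 25.0) for c in range(5, 10)],
    "low pressure drop": [(c / 10, 6.0) for c in range(5, 10)],
}

def mape(model, truth, points):
    """Mean absolute percentage error of `model` against `truth`."""
    errors = [abs(model(c, dp) - truth(c, dp)) / abs(truth(c, dp))
              for c, dp in points]
    return 100.0 * sum(errors) / len(errors)

TOLERANCE = 5.0  # assumed acceptance threshold, in percent

report = {name: mape(candidate_model, toy_simulator, pts)
          for name, pts in scenarios.items()}
passed = {name: err <= TOLERANCE for name, err in report.items()}

for name, err in report.items():
    status = "PASS" if passed[name] else "FAIL"
    print(f"{name:>18}: MAPE = {err:5.1f}%  {status}")
```

The candidate looks perfect where it was linearized and fails clearly in the low-pressure scenario, which is exactly the kind of weakness that is hard to spot when the available production data covers only normal operation.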

Demofyable applications

Due to the confidential nature of our clients' data, it's seldom straightforward for us to give a demo of our products. Certain features, even whole applications, depend heavily on the specific traits of a client's data and on the correlations in that data being realistic. Hence, you can't just scramble the data to anonymize it. In a demo, where the point is to demonstrate the application's value, it's underwhelming and undermining if you can't tell the story with the data backing it.

The simulator is then a crucial step towards being able to demonstrate an application with realistic data, without our clients knocking on our door with a subpoena.
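The claim that scrambling breaks the data can be shown with a small Python sketch. The pressure and rate series here are generated from a made-up relation (`rate = 10 * sqrt(dp)` plus noise), so the numbers are illustrative assumptions, not client or ManyWells data. Shuffling one column hides which rate belonged to which pressure, but it also wipes out the correlation a demo would rely on.

```python
import math
import random

random.seed(1)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical measurements: flow rate rises with pressure drop.
dp = [random.uniform(5.0, 50.0) for _ in range(1000)]
rate = [10.0 * math.sqrt(p) + random.gauss(0.0, 1.0) for p in dp]

# "Anonymize" by shuffling the rate column independently of the pressures.
scrambled_rate = rate[:]
random.shuffle(scrambled_rate)

corr_original = pearson(dp, rate)
corr_scrambled = pearson(dp, scrambled_rate)

print(f"correlation in original data:  {corr_original:.2f}")
print(f"correlation after scrambling:  {corr_scrambled:.2f}")
```

Simulated data sidesteps the trade-off: the physical relationships stay intact, and no client measurement is exposed.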

Publications

ManyWells: Simulation of Multiphase Flow in Thousands of Wells: For the journal paper
Link to ManyWells' paper

People

Bjarne Grimstad

Kristoffer Nesland

Erlend Lundby

Kristian Løvland