Two years after its launch, the EAGE A.I. community is going strong, led by a multidisciplinary team of geoscientists who have found their way to data science. Alongside a brand-new advice segment that the Committee plans to release periodically throughout 2022 and a new dedicated session at the EAGE Annual in June, they have also sat down (virtually) with a group of experts to reflect on the recent changes brought about by artificial intelligence and the role of geoscientists in the digital transformation.
In the first part of this interview, published last month, we explored AI and machine learning in general. In this second instalment we go deeper into how AI/ML can become part of your workflow, what their long-term impact will be, and whether AI is a realm for data scientists exclusively.
To find all resources and opportunities offered by EAGE in the field of digitalization, visit our new Digitalization Hub.
How and where do you use applications of AI in your work?
Felix Herrmann: My group is mainly involved in the development of algorithms for computational seismic imaging and, now, medical imaging as well. This includes algorithms for seismic data processing, wave-based inversion and, more recently, seismic monitoring of Carbon Capture and Sequestration. These new areas in particular call for a scalable framework that allows for a systematic assessment of uncertainty. The fact that neural networks can be trained to generate samples from unknown high-dimensional statistical distributions opens up completely new and more feasible ways to meet the challenging task of “putting error bars on what we do”.
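For readers who want to see the core idea in code, here is a minimal, self-contained toy sketch – our illustration, not Herrmann's framework – of training a small generative network (a 1-D GAN) so that a network learns to draw samples from a distribution known only through data. Once trained, sampling is cheap, so empirical "error bars" come from the sample statistics:

```python
# Toy illustration (not Herrmann's framework): a tiny 1-D GAN learns to
# sample from an "unknown" distribution observed only through data.
import torch
import torch.nn as nn

torch.manual_seed(0)

def real_samples(n):
    """The 'unknown' distribution, observed only via samples (a Gaussian mixture)."""
    comp = torch.randint(0, 2, (n, 1)).float()
    return comp * torch.normal(-2.0, 0.5, size=(n, 1)) + \
           (1 - comp) * torch.normal(2.0, 0.5, size=(n, 1))

G = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))  # generator
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))  # discriminator (logits)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(5000):
    x_real, x_fake = real_samples(128), G(torch.randn(128, 1))
    # Discriminator learns to separate real from generated samples.
    loss_d = bce(D(x_real), torch.ones(128, 1)) + bce(D(x_fake.detach()), torch.zeros(128, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator learns to fool the discriminator.
    loss_g = bce(D(x_fake), torch.ones(128, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

samples = G(torch.randn(10000, 1)).detach()  # cheap sampling after training
print(f"sample mean {samples.mean():.2f}, sample std {samples.std():.2f}")
```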
Cedric John: I use it in pretty much every new PhD and postdoc project now. My group specializes in the application of data science, machine learning and deep learning to geological problems. We are currently working on core images, thin-section images, FMS logs, conventional logs, seismic data, and satellite imagery. We use machine learning algorithms (random forest, SVM) and deep learning networks (CNNs, RNNs, and GANs) to classify geological material, infer values through regression, or generate pseudo-core images.
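As an illustration of the classical machine learning side of such work, the following sketch (ours, not the group's pipeline; the feature columns and class names are hypothetical stand-ins) trains a scikit-learn random forest to classify rock samples into two facies from tabular features:

```python
# Illustrative sketch only: a random forest classifying rock samples into
# two facies from tabular features. All names are hypothetical stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(42)
n = 600
X = rng.normal(size=(n, 4))                    # stand-ins for e.g. GR, porosity, density, texture
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic labels: 0 = mudstone, 1 = grainstone

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test), target_names=["mudstone", "grainstone"]))
```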
Ashley Russell: As a leader I am not doing much application myself, but I certainly help to shape the technical data science portfolio in the subsurface at Equinor. Many are aware that we at Equinor release a lot of open-source Python packages that reflect the work we do. We have active projects in image classification, imputation of missing data, sentiment analysis on unstructured data, anomaly detection, cluster analysis, automated interpretation, and non-simulation-based production prediction. We also have large projects dedicated specifically to cloud data engineering and to facilitating the more descriptive analytics that can be done ad hoc out in the individual assets, so we also see a lot of dashboarding.
Ali Karagul: I am still at early stages of learning.
Oleg Ovcharenko: Up to this point my PhD studies have focused primarily on developing deep learning methods for low-frequency extrapolation and for building initial models to initialize FWI. Aside from that, at a shallower level I have explored applications for denoising and interpolation of seismic data, style transfer from geological priors, surface-related multiple elimination, and moment tensor inversion. Our most recent efforts have been dedicated to developing domain adaptation techniques for seismic data.
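To make the low-frequency extrapolation idea concrete, here is a heavily simplified toy sketch (our illustration, not the thesis code): a small 1-D CNN is trained, on purely synthetic traces, to regress the missing low-frequency band from the recorded high-frequency band:

```python
# Heavily simplified toy: low-frequency extrapolation framed as supervised
# regression from the high-frequency band of a trace to its low band.
import torch
import torch.nn as nn

torch.manual_seed(0)
fs, nt = 250.0, 256                 # sampling rate (Hz) and samples per trace
t = torch.arange(nt) / fs

def make_traces(n):
    """Sums of sinusoids; low band (1-4 Hz) and high band (5x) share phase."""
    low, high = torch.zeros(n, nt), torch.zeros(n, nt)
    for i in range(n):
        for _ in range(4):
            f = 1.0 + 3.0 * torch.rand(1)
            p = 2 * torch.pi * torch.rand(1)
            low[i] += torch.sin(2 * torch.pi * f * t + p)
            high[i] += torch.sin(2 * torch.pi * 5 * f * t + p)
    return high.unsqueeze(1), low.unsqueeze(1)   # shapes (n, 1, nt)

net = nn.Sequential(
    nn.Conv1d(1, 16, 9, padding=4), nn.ReLU(),
    nn.Conv1d(16, 16, 9, padding=4), nn.ReLU(),
    nn.Conv1d(16, 1, 9, padding=4),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x, y = make_traces(256)
for epoch in range(200):                        # full-batch training on the toy set
    loss = nn.functional.mse_loss(net(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
print(f"final training MSE: {loss.item():.4f}")
```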
Ruslan Miftakhov: My work involves collaborating with software developers, architects, data scientists, etc., reviewing numerous AI publications, and conducting business, technical, and brainstorming meetings. There are not many AI tools available for this kind of work.
Where is AI adding measurable value in your area?
Ali Karagul: Analysing big volumes of data.
Oleg Ovcharenko: For academia, the number of AI-related publications is one obvious measurable quantity. It feels like a breath of fresh air, as the number of publications on the topic has grown exponentially over the last five years. These days almost every research group conducts research at the intersection of AI and the geosciences.
What will the impact of machine learning be on our industry?
Nathanial Jones: I think machine learning will take its place alongside other technologies that have come along over the years. When my dad started as a geologist, they still correlated with colored pencils, but now all of that is done with software, and it definitely speeds up the process of making maps and measuring volumes in place. Machine learning will continue to make geoscientists more productive and make it easier to quickly generate new or modified interpretations and maps of assets. I think machine learning techniques like NLP will also help speed up searches through large amounts of text and information in the future. Geoscientists' work will focus less on tasks that machine learning can speed up, like describing cores or correlating wells, and more on the high-level modelling and development of assets.
Ali Karagul: Optimisation of our manual tasks.
What criteria should we use to ascertain the level of success in AI solutions?
Olawale Ibrahim: Speed and accuracy. AI solutions should also be evaluated from a geoscientific standpoint when ascertaining their level of success; their results should closely match those of experienced geologists.
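The "compare against experienced geologists" criterion is straightforward to operationalize once expert labels exist. Here is a minimal sketch (all labels hypothetical) that scores model facies predictions against expert picks, including a chance-corrected agreement measure:

```python
# Hypothetical labels throughout: score model facies predictions against
# expert picks using standard agreement metrics.
from sklearn.metrics import accuracy_score, f1_score, cohen_kappa_score

expert_labels = ["sand", "shale", "shale", "sand", "carbonate", "shale"]
model_preds   = ["sand", "shale", "sand",  "sand", "carbonate", "shale"]

print("accuracy:", accuracy_score(expert_labels, model_preds))
print("macro F1:", f1_score(expert_labels, model_preds, average="macro"))
# Cohen's kappa corrects for chance agreement, useful with imbalanced facies.
print("kappa:   ", cohen_kappa_score(expert_labels, model_preds))
```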
Ali Karagul: Their help in advancing our understanding of the interconnectivity among different fields.
What particular challenges do you see for machine learning techniques in geoscience?
Olawale Ibrahim: The availability of large labelled datasets for supervised learning.
Nathanial Jones: Subsurface datasets are generally small and expensive. Cutting-edge techniques often need huge amounts of data to train. Because of this, geoscience data scientists need to think less about regression and more about probability and causation in their models. Sometimes how the data is represented, and its level of certainty, is more important than the algorithm or technique you choose to regress with. Remember that bias and uncertainty are the main pieces of information you need to find out about an investment in the subsurface, not a single number like reserves or initial oil production.
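One cheap way to put this "probability, not a single number" point into practice on a small, expensive dataset is to bootstrap an interval instead of reporting one point estimate. A minimal sketch with illustrative porosity values:

```python
# Toy numbers: bootstrap a small, expensive dataset to report an interval
# rather than a single point estimate.
import numpy as np

rng = np.random.default_rng(1)
porosity = np.array([0.08, 0.12, 0.15, 0.10, 0.22, 0.18, 0.09, 0.14])  # illustrative well data

boot_means = [rng.choice(porosity, size=porosity.size, replace=True).mean()
              for _ in range(10_000)]
low, high = np.percentile(boot_means, [5, 95])
print(f"mean porosity {porosity.mean():.3f}, 90% bootstrap interval [{low:.3f}, {high:.3f}]")
```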
Felix Herrmann: Aside from access to training data, I consider the need for extensive, time- and resource-consuming parameter tuning a major challenge, especially when scaling to 3D problems. Large machine learning models, such as the recently introduced Transformers, require resources comparable to those called for by industry-scale 3D full-waveform inversion and reverse-time migration. This demand for data and resources makes it very difficult for academia to participate in this field, and it calls for engagement in public-private partnerships involving traditional industrial partners, cloud providers and, in the case of carbon sequestration, regulators. It is nice to see that at least lots of seismic datasets, including shot data, are being put online by national regulators. So, overall, I am optimistic that important industrial-scale problems can and will be solved by machine learning, including solutions that were unthinkable in the past, such as highly optimized hands-off seismic workflows that allow for a systematic handling of uncertainty.
Ashley Russell: Within the geosciences there is a lot of unlearning that has to happen for ML to flourish – and it isn't just on the technical level, but also in management, where the expectations of geoscience work are set and the quality checking happens. We need to move away from deterministic workflows towards probability- and uncertainty-based ones, where data traceability and lineage come to the forefront.
Ruslan Miftakhov: Everything in AI hinges on available data and computational resources. Open data is a real problem within oil and gas; we don't have enough labeled data to experiment with and build better solutions. There are also barriers to collaboration between data providers (oil and gas companies) and software companies.
Where do you see machine learning in the geosciences right now – at the peak of inflated expectations, trough of disillusionment, slope of enlightenment, or the plateau of productivity?
Felix Herrmann: As with many fundamental breakthroughs, it takes a while until the original innovation makes it into daily practice. I have seen this firsthand with the field of Compressive Sensing, a new sampling paradigm. It took more than 10 years to convince the industry of the merits of this technology, which according to reports has resulted in 10X increases in productivity in seismic data acquisition. Over the same period, I have seen an increase in technical expertise in industry, which explains the more rapid and successful adoption of Machine Learning in applied areas, including the geosciences. However, translating successes from one application area to another takes time and a good understanding of the domain. Herein lies a major challenge. To loosely quote an ex-colleague of mine, Eldad Haber: “it takes little time for a baby to recognize his/her parents. Geologists, on the other hand, can argue for decades over the interpretation of a seismic section”. So we are absolutely not at a peak. Sure, there will be disappointments, but in the end Machine Learning will play an essential role in our field. To get there, however, we will need to rely on more collaboration, openness, and public-private partnerships designed to foster rapid innovation.
Ashley Russell: I firmly believe we are in the trough of disillusionment, perhaps heading toward the slope of enlightenment! I believe we as a geoscience community know that AI has to be used, otherwise we will cease to exist in the modern world. But we are quickly realizing that the challenges of, and investment needed for, surviving in an AI-based world are massive for our workflows, software, and data in the current geoscience industries. There is almost a feeling, I would say, of wanting to “wipe the slate clean” because the challenges are mounting so quickly – especially as we move toward carbon capture and sequestration and the shallow subsurface for renewable installations.
Ali Karagul: Slope of enlightenment.
What is your vision of geoscience work in 2050? Will machine learning and AI play a central role? If yes, how?
Ali Karagul: Smooth and powerful integration of decision making steps.
Is AI just for data scientists?
Olawale Ibrahim: AI is not just for data scientists. AI is applicable in virtually all fields, and it can also be applied in fields that do not involve large datasets.
Nathanial Jones: No, the tools that are being developed are going to be used in all aspects of our lives, and it is important to our careers, and even to being informed citizens, to be aware of them and how they can be used effectively (or abused!). Data science is being used to recommend movies, train baseball players, determine prison sentences for people convicted of crimes, run social credit schemes, and more. We need more people who are aware of how these systems work and how they can be used unethically.
Felix Herrmann: Absolutely not, domain knowledge is essential if one wants to be successful in applying Machine Learning to problems in the geosciences. I see this with my students, who are asked to be “jacks of all trades”. It takes an understanding of mathematical, statistical, and computational techniques, in combination with a fundamental understanding of problems in the geosciences, to be successful. Given the wide variety of topics this is no sinecure, but it is a lot of fun, since working this way allows you to tackle real problems that matter.
Cedric John: Yes and no. Yes, because I believe that to train models well and use them well you need an understanding of data science, statistics, and coding. Anyone can become a data scientist, but you have to invest the time. And no, because if you are willing to trust the data scientist who trained your network, then applying it does not require data science skills.
Ashley Russell: No, most of us know that AI pretty much runs our modern life and is slowly creeping into our geoscience and engineering jobs. Not all of us are going to be building models or putting MLOps in place on the cloud, but we all need to be aware of how “the algorithm” works – what are its limitations, what is the training data, and how does that affect the output? How do you, as a person who creates data and interpretations, contribute to that training data? I think anyone who touches or creates data is going to have to learn how both the data engineering side and the AI side of data science interact with it – and change the way they work to support these automation processes.
Ali Karagul: Not really. Anyone interested in scientific analysis can make use of it.
Oleg Ovcharenko: I think that anyone with a science background is capable of understanding the high-level concepts of data science. “AI” is an umbrella term for a variety of methods applicable to pretty much any discipline. It is an indivisible part of the data-driven trend in which all industries adopt AI methods for their needs.
Ruslan Miftakhov: That's not true, because at a high-level implementation layer you can put your own thinking into the process and get the intended results without expert knowledge of the internals. To be a good car driver, it is not necessary to know how an internal combustion engine works; understanding how to operate the car at a high level is all you need.
What challenges are being encountered in the adoption of AI in your work?
Olawale Ibrahim: A major challenge is the limited availability of labelled data, especially for training supervised learning models and evaluating unsupervised workflows. Open data and open-source solutions are also scarcer in the geoscience industry than in other industries, and openness fosters contributions and, consequently, more innovation and discoveries.
Nathanial Jones: Adoption challenges abound in this space. All too often the emphasis is on fancy new modelling techniques (CNNs, NLP, etc.), tools, consulting firms, and more, but in reality most of the value is in the data and how it is collected, stored, and cleaned for use. It's boring, it's hard, and it can be expensive if you have lots of legacy data that hasn't been worked on and cleaned up. The reality is that most of the work and value is in working the data until it is easy to find, easy to use, consistent, and trustworthy enough. That type of work isn't sexy, it doesn't present well to high-level executives, and there are few flashy start-ups going around talking about the wonders of data quality work. However, it is the foundation of everything in this space and the foundation of doing data science at scale in an organization. Instead we want to ask some smart speaker to optimize our field or describe a core for us, but if the data isn't ready for that, it's garbage in, garbage out; no fancy new CNN model can change that fundamental reality. A strong focus on the data lifts everything, both traditional data analysis and modelling techniques and newer ones. The adoption challenge is that the traditional techniques in an organization often grew and became entrenched in chaotic data environments, but newer innovations and tools just don't survive in those environments.
Felix Herrmann: While the initial results of applying Machine Learning to seismic data processing and imaging were encouraging, many challenges remain. These include lack of access to training data – we do not know what the Earth's subsurface looks like, after all – and access to compute for training. Even relatively small “postage stamp” problems call for considerable computational resources, to which we have limited access in academia. Not unlike full-waveform inversion and reverse-time migration in 3D, training neural nets requires extensive parameter tweaking, which quickly becomes a major challenge when dealing with 3D problems.
Cedric John: Taking my field to mean academic geosciences, I think there are two challenges impacting us. The first is training: it is hard to recruit a PhD student or a postdoc who has both sound geological understanding and good coding skills with a data science mindset. The second is simply the availability of data. It is not always easy to obtain good-quality data, and it is almost never the case that one can access labelled data.
Ashley Russell: There are three major challenges. 1. Getting the necessary funding to work on getting data into the right cloud-native structures with aligned metadata (which means new ways of metadata governance!). There is no inherent value in simply re-engineering subsurface data, but it takes a ton of work thanks to decades of subsurface data in formats not meant for anyone but oil and gas software and humans. We can of course use the data in a variety of cases with good value propositions, but for much of management it is a paradox why the investment in the data work is so high. OSDU doesn't build itself. 2. Understanding the right cloud technology to engineer data. Traditionally, IT has been someone you send a ticket to so they can fix something on your computer. Now we see IT and software development as a scientific discipline in its own right, but we have a lot to learn about using the cloud right alongside us subsurface people. 3. Fear of automation and change, and the resulting disengagement from learning or using analytics and AI – or the belief that analytics and AI are only for researchers. Combating misconceptions is perhaps a greater challenge than the technical ones!
Ali Karagul: Understanding different levels of complexity in data conditioning.
Oleg Ovcharenko: Enabling knowledge transfer between training and application domains is challenging. Some tasks, such as frequency-bandwidth extrapolation by deep learning, target real-world low-frequency data that are rare or mostly unavailable. This limits supervised learning approaches to training on synthetic data, which exist in the refined world of computer simulations. The challenge then becomes bridging the gap between training on synthetic data and application to field data. To enable this transfer, one should either put effort into generating very realistic synthetics using available a priori knowledge, or look towards domain adaptation. The former might be applied as a data pre-processing step or built into the training workflow and network architecture, as sketched below.
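Here is a minimal sketch of that first route – making synthetics more field-like as a pre-processing step. All parameters are illustrative assumptions: clean synthetic traces are corrupted with band-limited noise and amplitude jitter before being fed to training:

```python
# Illustrative parameters throughout: corrupt clean synthetics with
# band-limited noise and amplitude jitter so they resemble field data.
import numpy as np

rng = np.random.default_rng(0)

def field_like(trace, fs=250.0, noise_std=0.1, cutoff_hz=40.0):
    """Augment a clean synthetic trace with field-like perturbations."""
    n = trace.size
    noise = rng.normal(0.0, noise_std, n)
    spec = np.fft.rfft(noise)
    spec[np.fft.rfftfreq(n, d=1.0 / fs) > cutoff_hz] = 0.0  # band-limit the noise
    noise = np.fft.irfft(spec, n)
    gain = 1.0 + rng.normal(0.0, 0.05)                      # random amplitude jitter
    return gain * trace + noise

clean = np.sin(2 * np.pi * 10.0 * np.arange(256) / 250.0)   # toy 10 Hz synthetic trace
augmented = field_like(clean)
print(augmented.std())
```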
Ruslan Miftakhov: I am speaking from the perspective of a person who develops AI solutions and tries to get them adopted by professionals in their workflows. The most prominent questions and problems arise from exaggerated expectations and from mixed feelings about the black-box (or, better said, data-driven) nature of these solutions. Black box does not mean a wrong solution; it means that we need to experiment and find other ways to establish validity. Otherwise we will reject any development from this field, which would hinder innovation and growth. Moreover, some professionals don't fully understand what AI can and cannot do. Proper education, in this case, is key.