Two years after its emergence, the EAGE A.I. community is going strong, led by a multidisciplinary team of geoscientists who have found their way to data science. Alongside a brand new advice segment that the Committee plans to release periodically throughout 2022 and a new dedicated session at the EAGE Annual in June, the Committee has also sat down (virtually) with a group of experts to reflect on the recent changes brought about by artificial intelligence and the role of geoscientists in the digital transformation.
In which applications of A.I. have they seen successful outcomes? What lessons have they learnt since introducing A.I. in their work? How do they advise preparing geoscientists for the adoption of A.I.?
The responses are worth sharing over three instalments. Here is Part 1 of the interview.
To find all resources and opportunities offered by EAGE in the field of digitalization, visit our new Digitalization Hub.
Why did you get into the field of A.I.?
Was there a compelling event, or reason, to make the commitment?
Olawale Ibrahim: I got into A.I. looking for a career change. This happened in 2019 during my undergraduate internship days. I couldn’t secure an internship based on merit due to the nepotism and “connection network” you must have to get an internship placement in the oil and gas industry in my country. It seemed like I might even face a bigger ordeal of unemployment upon graduation.
Nathanial Jones: I really dislike calling what most data scientists do A.I. Most data scientists, myself included, are not in the business of teaching machines to think. That aside, I’ve been interested in modelling and programming since college and I first got into data science for the same reason that I got into programming in general: to reduce the amount of tedious work I needed to do. Coming up with programs that can describe cores and other subsurface data sets freed me to focus and experiment on the fun analysis and modelling parts of geoscience that are the product of a lot of sometimes tedious work on data prep, data clean-up, and data interpretation.
Felix Herrmann: Back in 2016 I decided to start working on Machine Learning to solve problems in exploration seismology. At the Seismic Laboratory for Imaging and Modelling (at the University of British Columbia at the time), we were always attuned to the latest developments in applied mathematics, computer science and to some extent data science, so it was quite natural to include machine learning in our research program. This pivot towards Machine Learning was accelerated by my move to the Georgia Institute of Technology – a powerhouse in Engineering and Machine Learning. Not long after my arrival I started working with Professor Ghassan AlRegib to start ML4Seismic – Georgia Tech’s Center for Machine Learning for Seismic Industry Partners Program. We designed this center to foster research partnerships aimed at driving innovations in artificial-intelligence-assisted seismic imaging, interpretation, analysis, and monitoring in the cloud.
Cedric John: As far back as I can remember, I have been coding. I did this ‘for fun’ in the 80s and 90s, and so it came naturally as part of my work when I became a faculty member at Imperial College London. In the 90s I was aware of neural networks and the so-called ‘expert systems’. Even back then there was talk of these systems replacing complex tasks such as geological interpretation. I was intrigued by them, but at the time there was no easy framework for applying machine learning, the datasets were not available, and computing power was low. Fast forward 30 years, and we now have a very different landscape. I became involved in deep learning around 2017, when I conceived and supervised a project using transfer learning on convolutional neural networks to automatically identify carbonate facies. The project was a success, and this led me to completely change the focus of my research efforts towards data science and machine learning.
Ashley Russell: I think for me it was coming into a data management team as a geologist in 2015. Looking back today, it was an excellent pairing, as I knew what the data meant and could help to ensure quality and proper access to that data. But I was shocked at the processes being used – data on CDs, spending entire days renaming files – this was such a clash with the modern world outside of work. So I took it upon myself to learn Python and Spotfire – and that was a huge breakthrough for me. From there the snowball kept rolling: digging into statistics, correlation matrices, unstructured data extraction, and into the machine learning world – and automating a lot of manual processes to ensure the standardization and consistency we need to make analytics happen!
Ali Karagul: Knowing that we are dealing with really big volumes of data, and understanding that the transition into new energy fields will further increase the volume of data.
Oleg Ovcharenko: I have always been into tech, so when starting my PhD studies in 2016 I felt that putting all my eggs into the one basket of pure geophysics might not be a good idea from a career point of view. So I decided to explore adjacent fields to learn something valuable for the long term. At that time, deep learning had already shown its capability in image processing, so I decided to try applying it to seismic problems. The next step was taking a basic course by Andrew Ng on Coursera and a few other online courses that gave me an introduction to the topic. The flexibility of data-driven methods was appealing and I realized that I would not burn out quickly by working in that direction.
Ruslan Miftakhov: I am always on the lookout for solutions that might improve designs or workflows, but I was not a data-driven man at all. I had an idea that everything could be solved or approximated with maths, physics, and computer science. However, in 2017 my team was trying to solve the problem of seismic fault delineation. We cycled through a lot of different analytical prototypes. Still, the resulting solution was not generally applicable to a wide range of seismic projects. Eventually, we tried using AI/ML for it. Outstanding initial results on this task made me open to exploring AI/ML in depth.
In which applications of AI have you seen successful outcomes?
Olawale Ibrahim: In geoscience: AI has seen successful outcomes in supervised workflows such as seismic facies classification, lithofacies classification from well logs, and predicting missing well log sections or entire logs. It has also succeeded in unsupervised workflows such as clustering seismic attributes for lithofacies identification.
Other industries: Medicine, financial sector, e-commerce.
Nathanial Jones: Keeping in mind my critique above of how we are using the term AI here, the most spectacular successes above traditional programming/automation have been with image recognition, natural language processing, and signal processing. I’ve personally witnessed the power of image recognition techniques like convolutional neural nets in my work and personal life. CNNs are very good at automating some types of core descriptions and at analyzing very dense core datasets like 3D MRI scans of core or very high-resolution photographs. Above all, they can allow you to spread a small amount of interpretation quickly over a large core dataset. They work well on seismic data and field data (facilities, pipelines, etc.) as well. I’ve also seen how natural language processing combined with OCR can help you sift through large volumes of scanned legacy data. NLP can help break down language barriers with machine translation, and it can help make sense of and find important information within large unstructured datasets like papers, comments fields, and reports. These developments have been exciting to witness and promise to put more and better data into geomodels and decision-makers’ hands.
Felix Herrmann: To gain experience in Machine Learning, and the application of deep Convolutional Neural Networks in particular, my PhD student Ali Siahkoohi and I started working on applying Generative Adversarial Networks to solve wavefield reconstruction problems, followed by the development of practical transfer-training-based workflows for surface-related multiple elimination, deghosting, and numerical dispersion removal. More recently, we have been concentrating our efforts on distribution learning, where neural nets are trained to capture uncertainty and how this uncertainty affects certain tasks, e.g. the task of automatic horizon tracking on migrated images.
Cedric John: Limiting the term ‘AI’ to mean machine learning and deep learning, I think it is fair to say that there is a tremendous number of successful applications. From our day-to-day experience, we have all seen how natural language processing (NLP) transforms our world with things like speech recognition and text translation, and how computer vision is driving recognition in autonomous vehicles, among others. In terms of application to geology and geophysics, there has been some very successful use of computer vision to interpret thin-section and core facies and seismic volumes. Conventional machine learning algorithms such as random forest and support vector machine have been used with some degree of success in interpreting logging data and reconstructing lithology, and in predicting the prospectivity of unconventional reservoirs. Unsupervised learning is also an attractive approach as it requires no labelling, and we have seen successful applications of clustering algorithms to seismic data. Overall, machine learning and deep learning are already impacting our field by improving automation and reproducibility.
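To make the idea of label-free clustering concrete, here is a minimal sketch of k-means on a single attribute, written in plain Python. It is not taken from any of the interviewees' work: the function name `kmeans_1d`, the choice of k-means itself, and the amplitude values are all illustrative assumptions.

```python
def kmeans_1d(values, k, iterations=20):
    """Cluster scalar attribute values into k groups with plain k-means."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centres across the sorted values.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each sample joins its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centres[c] = sum(members) / len(members)
    return labels, centres


# Made-up attribute values (hypothetical, for illustration only).
amplitudes = [0.10, 0.12, 0.09, 0.85, 0.90, 0.88, 0.11, 0.87]
labels, centres = kmeans_1d(amplitudes, k=2)
print(labels)  # low and high values end up in separate groups
```

No labels are supplied at any point; the grouping comes only from the values themselves, which is what makes the approach attractive when interpretation is scarce.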
Ashley Russell: I’ll take the same perspective of AI as Cedric above – but I also need to say that the descriptive side of analytics, being able to visualize the “flavor” of the data you work with, is extremely valuable. For example, if we have the capability to visualize and numerically describe ranges of possible formation pressures, both from nearby wells and perhaps from other global analogues, we limit the possibility of “being surprised” during drilling, preventing what could be a safety incident or a well control challenge. Coming to the AI side, image recognition, unsupervised clustering and supervised machine learning are all quite powerful in terms of automating manual processes, collecting and pointing out funny-looking things in datasets (both well and seismic), and adding data where it is missing. Essentially, we are working toward having data, whether machine- or human-made, to make better decisions and quantify the subsurface, including in places where we are very data poor, or combining it with descriptive analytics to work better with a lot of data where we are data rich.
Ali Karagul: Modelling and log analysis.
Oleg Ovcharenko: Deep learning has been undoubtedly successful in solving classification and segmentation problems as well as detecting objects and anomalies. In geophysics such applications might be found in characterization of lithological facies, delineation of salt bodies on migrated images and others. Unsupervised solutions for denoising and interpolation of seismic data are also promising. Regression tasks, however, appear to be more challenging to address since the solution space is typically much wider and extra constraints have to be posed to incorporate physics and domain-specific knowledge.
Ruslan Miftakhov: Countless… Broadly we can break down AI applications into three categories and measure their relative success:
- Use AI in areas where we have extensive manual work and no automation. Potentially it could improve the quality and efficiency of manual manipulations. Examples of successful application include horizon, fault, and facies picking in structural interpretation.
- Use AI in areas where we already have some analytical solutions. For instance, we have tomography and FWI for velocity model building, but it is challenging to use them because of nonuniqueness and stability. In this case, AI can reduce computational time, improve quality, and propose an additional way to fuse data.
- Use AI for problems where we don’t have a solution yet. An example of a successful application is DeepMind solving the 50-year-old “grand challenge” of protein folding with the AlphaFold network. The third category is the most impactful.
What things about AI surprise you most?
Felix Herrmann: Compared to the applied geosciences, the field of Machine Learning innovates at a much faster rate. The availability of open-source platforms, such as PyTorch and TensorFlow, has played and continues to play a pivotal role in this. While we had an initial surge in productivity back in the eighties with CWP’s open-source Seismic Unix, the rate of innovation slowed when academic codes became more proprietary (with the exception of Madagascar-RSF). Perhaps not surprisingly, the Machine Learning community has in the meantime been able to make enormous progress. What astonishes me is that our industry is only slowly reaching the conclusion that innovations actually happen more quickly when results and code are shared openly.
Cedric John: There is a feeling of openness in data science and machine learning that is unrivaled in our field. Code and ideas are exchanged before papers are published, something that would never happen in geochemistry or traditional Earth sciences. Also, the field of data science is so new that we are still living in an era where, no matter what your background is, you can become a fully fledged data scientist. I am not sure how long this will last, or when a ‘data science’ degree will be required to be a data scientist.
Ashley Russell: I have been surprised by the degree of connectivity and complexity that we have to handle when working with AI. What the AI-forward companies do – Tesla, Apple, etc. – may seem like magic, but I have been shocked to learn how much time, technology, people and skills are required to make that magic work.
Ali Karagul: Willingness of people to help each other.
Oleg Ovcharenko: That “AI” doesn’t work out of the box. It requires creative thinking to design a proper objective, data pipeline and training scenario, followed by stacking architectural bricks, tweaking hyperparameters and making it all work with the available dataset. Although it is obvious in hindsight, I was surprised that for supervised learning the key challenge is often the creation of a representative training dataset rather than building a network architecture. Another eye-opening insight from my experience was that the industry cares about quality and robustness rather than how fast the inference is.
Ruslan Miftakhov: The fascinating thing about AI is that it’s the code you don’t write. Conventional software development is based on predefined scenarios with a chain of conditional statements (if, else, then, etc.). Scenarios can always be explained, and results can always be reproduced. In contrast, AI/ML development is based on statistical methods that learn directly from data. Scenarios can hardly be understood, and results are probabilistic.
Which areas of machine learning do you find most fascinating?
Olawale Ibrahim: Reinforcement learning. Reinforcement learning is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward. Its application in geoscience involves the modelling of fluid flow paths in reservoirs and complex petroleum systems.
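As a rough illustration of what “cumulative reward” can mean in practice, the sketch below computes a discounted return for one episode. The discount factor `gamma`, the function name `discounted_return` and the reward values are assumptions made for this example; with `gamma=1.0` the result is simply the sum of the rewards.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum rewards over an episode, weighting later rewards by powers of gamma."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future total.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical rewards collected by an agent over three steps.
episode_rewards = [1.0, 0.0, 2.0]
print(discounted_return(episode_rewards, gamma=0.5))  # 1.0 + 0.5*0.0 + 0.25*2.0 = 1.5
```

An agent is trained to choose actions that make this quantity as large as possible on average, rather than to maximize any single immediate reward.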
Felix Herrmann: For a long time, I was aware of Bayesian inversion techniques, including Uncertainty Quantification. These were after all pioneered in the seminal work of Albert Tarantola, who was way ahead of his time. What prevented me from entering this field was that I felt that practitioners of Bayesian (Variational) Inference needed to make too many simplifying assumptions (e.g., smoothness and Gaussian distributions) to be practical. Moreover, Monte-Carlo sampling techniques were limited to small problems with cheap forward modelling limiting their applicability. Deep neural networks are able to capture, i.e. learn from examples, complex high-dimensional probability distributions. This ability will completely change our ability to solve problems in the Earth Sciences as they were originally envisioned by Tarantola. It is too bad that Tarantola did not live long enough to witness this.
Cedric John: For me, the topic of transformers in computer vision is really fascinating. I am also really curious to know more about graph networks and deep learning physical simulations: these new techniques have shown great promise in the simulation of relatively simple systems such as small-scale fluid dynamics. Could we apply this approach to problems related to sediment transport and deposition, or flow in porous media? In short, new technologies and new ways to apply existing algorithms are in my opinion among the most exciting aspects of my job.
Oleg Ovcharenko: Unsupervised and self-supervised learning fascinate me the most. The capability of supervised learning is usually defined by properties of the training dataset which in seismic is often limited by conventional methods or assumptions in synthetic simulations. Finding the use cases and designing unsupervised solutions is more challenging compared to supervised methods but it should also be rewarding. There are tremendous amounts of diverse geophysical data accumulated over decades of oil exploration and I believe it might bring new value by being re-processed in an unsupervised and scalable way.
Ruslan Miftakhov: Deep Reinforcement Learning. Supervised and unsupervised methods are good at implementing pieces of the puzzle (e.g., fault detection and facies identification) but not at assembling a complete solution for an automated geoscience workflow. An automated workflow would require a decision-making engine, which Reinforcement Learning can train.
What criteria should we use to ascertain the level of success in AI solutions?
Olawale Ibrahim: Speed and accuracy. Success should also be judged from a geoscientific standpoint: the results of AI solutions should closely match those of experienced geologists.
Ali Karagul: Their help in advancing our understanding of interconnectivity among different fields.
Where is AI adding measurable value in your area?
Ali Karagul: Analysing big volumes of data.
Oleg Ovcharenko: For academia, the number of AI-related publications is one obvious measurable quantity. It feels like a breath of fresh air since the number of publications on the topic has grown exponentially over the last 5 years. These days almost every research group conducts research on the edge of AI and geosciences.
What do you see as the biggest challenges for delivering AI solutions at scale?
Felix Herrmann: I already touched upon this. Machine Learning approaches need massive datasets and compute. Research on the fundamentals can bring some of these costs down, e.g. we can expect improvements when techniques from scientific and high-performance computing make it to the Machine Learning community. However, fundamental challenges remain that call for the development of new and improved algorithms. In exploration seismology, we went through a similar exercise, and we are now able to image 3D wavefields over thousands of timesteps. This is astounding because it is roughly the equivalent of training a Convolutional Neural Net with a billion features over thousands of network layers. So, in that sense, the Machine Learning community may be able to learn from us, not only computationally but also in how we handle problems such as non-uniqueness.
Cedric John: I think that this is a challenge for the whole data science community, not just the subset of data geoscience! One problem, beyond data availability and how well our models generalize to unseen data, is the fact that fundamentally we design data science programs for local application. Python, the preferred language for data science, is not well suited to large code bases because of the lack of compile-time checks on types. This means that subtle bugs go undetected and can remain so for a long time. The frameworks tend to be hard to deploy at scale too. Of course, some solutions exist. You can use third-party solutions to deploy containerized code or run code in parallel to improve efficiency. Or you can move away from Python altogether and use a statically typed language like Scala with Spark, but the reality is that data scientists are not keen to do so. This creates a gap between data science development and the data engineering that implements a new version of the code for deployment. In the future, I predict that the two roles will be merged and more streamlined solutions will exist – this is already becoming apparent today.
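The point about missing compile-time type checks can be illustrated with a small, hypothetical example. The function name `double_thickness` and the values are invented for this sketch; the behaviour shown (annotations are not enforced when the code runs) is standard Python.

```python
def double_thickness(thickness: float) -> float:
    """Return twice the layer thickness."""
    # The annotation says float, but Python does not enforce it at run time.
    return thickness * 2


good = double_thickness(10.0)
# A value read from a text file is often still a string. No error is raised:
# the string is repeated instead of the number being doubled.
bad = double_thickness("10")

print(good)  # 20.0
print(bad)   # 1010
```

The second call runs without complaint and returns the wrong result, which is the kind of subtle bug that can travel a long way through a pipeline. An optional static checker such as mypy would typically flag the string argument before the code runs, but only if a team chooses to add annotations and run the checker.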
Ashley Russell: At scale, to me, means that several components need to be assembled as proper data science operations (MLOps, DataOps... choose a name!) and that these components run harmoniously together with little human intervention. This means a system that scales in multiple dimensions – across global regions (different geology, different facilities, different data!), across changes in data ingestion (veracity changes, volume changes, format changes), across computing needs based on data density, and across changes to ML training datasets, models, and the automatic scoring that comes along with improvements. Tech companies do this today – check out MLOps at Google – but we are 8 years behind in setting up this infrastructure and competence in both our geoscience experts and IT departments. Not to mention the challenge that we in the energy industry are not familiar with or used to working in this way at all – we are defined by the software we work with today. This is a massive change in how we use data to solve our problems and complete the work we need to do.
Ali Karagul: Underestimating the unknowns.