Two years after its emergence, the EAGE A.I. community is going strong, led by a multidisciplinary team of geoscientists who have found their way to data science. Alongside a brand new advice segment that the Committee plans on releasing periodically throughout 2022 and a new dedicated session at the EAGE Annual in June, they have also sat down (virtually) with a group of experts to reflect on the recent changes brought about by artificial intelligence and the role of geoscientists in the digital transformation.
In the first part of this interview, published last month, we explored AI and machine learning in general. In the second instalment, we looked into how AI/ML can become part of your workflow, what its impact will be in the long term and whether AI is a realm exclusively for data scientists. In this third and final part, we will focus on the needs and trends of AI/ML for training and development.
To find all resources and opportunities offered by EAGE in the field of digitalization, visit our new Digitalization Hub.
Do we need to prepare our geoscientists for the adoption of AI?
Olawale Ibrahim: Yes.
Nathanial Jones: I think that we should prepare geoscientists in data science the same way we have geoscientists take statistics, chemistry, etc., by providing a basic grounding and knowledge. I think geoscientists who are interested in diving deeper should be afforded the chance to do so in university geoscience programs. Statistics and data science are at their most powerful when combined with subject matter expertise that allows you to understand the data you are working with. This is especially true in subsurface geoscience data sets with sparse data, where geological models and knowledge are needed to make useful predictions and inferences.
Felix Herrmann: Absolutely 100%. Mathematics, statistics, and computational sciences in combination with good domain knowledge are essential to be successful in bringing Machine Learning to bear in the geosciences. To accomplish this, I am a big proponent of team teaching, where faculty from the main discipline of Machine Learning, typically somebody from Computer Science, team up with a practitioner, e.g. somebody in the Earth Sciences. In that way, instructors can focus on their strengths. Unfortunately, this rarely happens, which explains the lack of courses with depth and content tailored to specific domains of interest, including the geosciences.
Cedric John: Well, yes! See my thoughts about the challenges we are currently facing in data science. I can add that many geoscientists apply machine learning algorithms from easy-to-use packages but without any fundamental understanding of data science principles. This of course leads to erroneous results, but the authors are often unable to see this because of a lack of training. So yes, yes, yes! We need to prepare our geoscientists for the adoption and correct application of AI.
Ashley Russell: Only one word needed here: Yes.
Ali Karagul: Definitely. I see the general usage of it like using Excel nowadays.
Oleg Ovcharenko: Absolutely. The future is bright for AI in geosciences, so getting familiar with essential concepts should be in the basic curriculum alongside the linear algebra course. It takes a couple of days or even hours to draft a simple neural network by example from GitHub, so any geophysicist can try it out. However, it takes much more effort to become proficient in the topic.
Ruslan Miftakhov: Yes, because current software development requires geoscientists to understand basic AI concepts like training and validation sets, metrics to control the training, under/overfitting, and biases in data. It does not mean every geoscientist should learn how to code a neural network in some programming language, but they should understand Data Science on the application level with some working intuition.
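To make the concepts named above concrete, here is a minimal sketch (not taken from the interview) of a training/validation split, a metric, and under/overfitting. It uses only the Python standard library and a synthetic noisy sine curve; the data, the k-nearest-neighbour model and all function names are assumptions chosen purely for illustration.

```python
import math
import random


def make_data(n, seed=0):
    """Synthetic data: y = sin(x) plus Gaussian noise (illustrative only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 6.0) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0.0, 0.3)) for x in xs]


def knn_predict(train, x, k):
    """Predict y at x as the mean y of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mse(train, points, k):
    """Metric: mean squared error of the k-NN model on the given points."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in points) / len(points)


data = make_data(200)
# The samples are already in random order, so a simple slice is a fair split.
train, val = data[:150], data[150:]

# k=1 memorises the training set (overfitting); k=len(train) predicts one
# global average everywhere (underfitting); k=10 sits in between.
results = {k: (mse(train, train, k), mse(train, val, k)) for k in (1, 10, len(train))}
for k, (train_mse, val_mse) in results.items():
    print(f"k={k:3d}  train MSE={train_mse:.3f}  validation MSE={val_mse:.3f}")
```

With k=1 the training error is exactly zero because every training point is its own nearest neighbour, which is why a metric computed on training data alone cannot reveal overfitting; the validation error is the number to watch.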
How can we best prepare them?
Olawale Ibrahim: I think the best way to start preparing geoscientists for the adoption of AI is in academia, while they are students. AI can be incorporated into course work to aid early adoption among young geoscientists. Career geoscientists could be prepared through a series of structured trainings on AI and its applications in geoscience.
Nathanial Jones: I think a basic understanding of the approach and mentality, pros, cons, and techniques of data science would be valuable. A little knowledge would go a long way and help geoscientists to find out if they are interested in learning more data science (and help them work with data scientists at work in the future).
Felix Herrmann: I am afraid that it will be somewhat of a challenge to make up for years of deemphasizing the importance of mathematics, statistics, and computer science in the Earth Sciences. Having said that, I am very happy and encouraged to see a keen interest in more quantitative approaches amongst the new generation of students in the geosciences. As long as we manage to convince students that it is way more fun to study Machine Learning on some cool geophysical dataset than to work on, say, “YouTube” videos, I am optimistic we will be able to train the next generation of quantitatively savvy geoscientists ready to meet the modern-day challenges that come with the energy transition and fighting climate change. To make this happen, we would need to team up with computer scientists and develop hands-on “open-source courses” to which the community is encouraged to contribute by adding tangible geophysical examples.
Cedric John: The key is training. University training of course, but also training outside of university for the current working professional. For this, I think that having consultant teachers from academia who understand the problems faced by geoscientists is key.
Ashley Russell: I may have a different viewpoint on this – I do not worry so much about our incoming graduates into subsurface in Equinor; digital literacy and knowledge of “the algorithm” are so prevalent in the current generation. Whatever you guys are doing in academia – keep doing it! Time is extremely precious for our organization, and training is, in my opinion, often seen as a burden for which there is no time and which has no use afterwards. Felix hit it right on the head: we are two decades behind. So my focus is on how we prepare those who do not see data science as anything more than something “on the side” or for specialists. This means really focusing on training that is coupled with development in the geoscience and earth science specialities – redesigning them in a way that is more data and statistics centric. That goes for software training too – we do not have the time to learn how to push buttons only; we also need to train on data flow, algorithms, data structures, and data integration within certain applications.
Ali Karagul: By showing case studies with a good story line.
Oleg Ovcharenko: In my opinion, an efficient way would be to gain “AI experience” directly from the computer science community, since they have already travelled the road that we still have to go down. Specifically, it takes time to work out efficient research practices, what tools and environments to use for code development, and how to run nice demos and share results. A good combination then would be, as Felix mentioned earlier, to have students from computer science and geoscience majors collaborating on shared projects. Hiring seniors from tech companies to advise or co-lead geoscience projects might work well in industry.
Ruslan Miftakhov: For the younger audience in universities, it would be better to add a course “Intro to Data Science” to learn about the basic AI concepts and a course about AI/ML applications in geoscience to learn about the limitations and current/future trends. For professionals, there are a lot of different online courses about AI. The one I recommend is called “AI for Everyone” by Andrew Ng. However, I haven’t seen an online course on AI/ML applications in geoscience; that is why I created my YouTube channel and started posting AI/ML content in Geoscience on LinkedIn.
What is in your view the easiest way to get started with AI?
Nathanial Jones: Kaggle courses, tutorials, and datasets.
Felix Herrmann: Make sure you brush up on statistics, linear algebra, and if possible, topics such as optimization before participating in online courses on Machine Learning. Check out the web; there are plenty of blogs explaining methods and code. Read conference papers, especially papers with code. Install the code, play with it, understand it, and adapt it to solve a problem in the Earth Sciences. Finally, and perhaps most importantly, see whether you can participate, as a student or otherwise, in summer workshops on the fundamentals of Data Science and Machine Learning.
Cedric John: There really are plenty of excellent tutorials out there that will help you get started. But at some point you will need to switch from the generic problems of determining who would survive the Titanic disaster to the real issue of improving things in the geoscience industry. For that, you need a real project and a good mentor.
Ashley Russell: As someone who has gone through this, I can say that you can’t just jump into ML overnight! It is going to be a journey – but start with the basics – what data is, what forms it comes in, what is a 0 versus empty versus null, how to access data, and then naturally that evolves into the visualization and statistics of that data. Try a dashboarding tool, like PowerBI or Spotfire – use data you know and try to solve a problem you may already solve in a different way today. From there ML will come easier – but do not bypass the base. And find a data science mentor (if you are already capable of some data science skills, go find a mentee!). For a touch of reality, I have been mentoring data science for over a year now. We have only just started on unsupervised machine learning!
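The distinction between a 0, an empty value and a null can be made tangible with a few lines of code. The following is a minimal sketch using only the Python standard library; the tiny porosity table, the "NULL" marker and the `parse` helper are hypothetical and exist only for this illustration.

```python
import csv
import io

# Hypothetical table: well B has a measured value of 0, well C has an empty
# field, and well D carries an explicit NULL marker.
RAW = """well,porosity
A,0.12
B,0
C,
D,NULL
E,0.20
"""


def parse(value):
    """Return a float for real measurements and None for missing ones."""
    text = value.strip()
    if text == "" or text.upper() == "NULL":
        return None  # empty and null both mean "no measurement"
    return float(text)  # "0" is a genuine measurement and stays 0.0


rows = list(csv.DictReader(io.StringIO(RAW)))
values = [parse(row["porosity"]) for row in rows]

# Test with "is not None": a plain truthiness check would also drop the 0.0.
known = [v for v in values if v is not None]
mean_known = sum(known) / len(known)

# Treating missing values as zeros silently drags the average down.
mean_zero_filled = sum(v if v is not None else 0.0 for v in values) / len(values)

print("parsed values:", values)
print("mean of known values:", round(mean_known, 4))
print("mean with missing treated as 0:", round(mean_zero_filled, 4))
```

The two averages differ only because of how missing values are handled, which is the kind of basic data question worth settling before any machine learning is attempted.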
Ali Karagul: Finding a case story that interests the user, and approaching the methods gradually.
Oleg Ovcharenko: I usually learn new technologies by drafting a toy application. Select a problem, create an empty Jupyter notebook in a clean virtual environment, set up version control, dive into the web and Stack Overflow, draft the solution plan in your head and make a list of what knowledge you are missing. Then fill the gaps one by one until you solve the problem. At this point you will already have a general idea of how it all works.
Ruslan Miftakhov: It depends on the end goal. If it is learning how to build a neural network at a low level with machine learning frameworks, then I would take the excellent courses on Deep Learning and GANs on Coursera and the Deep Reinforcement Learning nanodegree program from Udacity. Alongside those courses, it is important to work with concrete AI applications in Geoscience on GitHub.
What lessons are you learning in the process?
Olawale Ibrahim: The more I learn, the more I know there’s always more to learn. I have also learned that there is still a long way to go in the application of A.I. techniques in the geoscience sector.
Nathanial Jones: Keep it simple; don’t go with the most complicated solution first. Focus on the data first: understand it, improve it where you can, then worry about the use cases. Plan for scale: set up your proof of concept to be able to expand. Begin with the end in mind: make sure you are working on something that has strong interest and engagement within your org. Sometimes the small, simple quality-of-life improvements that you make to the data will get you the most impact and adoption down the road.
Felix Herrmann: With any new technology it is always a challenge to find the right application. It is one thing to “just” throw Machine Learning at problems; it is quite another to do things you were not able to do before. The latter takes more time, effort, and thought. I believe that neural networks, and Normalizing Flows (these are networks that transform complex statistical distributions into the canonical Gaussian distribution) especially, have the potential to finally “put error bars on seismic” in a systematic way. In my group, we are working on how to do this.
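For readers new to the idea, the sketch below shows the principle behind a normalizing flow in its simplest possible form. It is not the method used in Herrmann's group, and it is not a neural network: it is a single hand-written invertible map, z = (log x - mu) / sigma, that sends a skewed positive quantity to a standard Gaussian. Real normalizing flows stack many learned invertible layers, but they rely on the same change-of-variables formula, log p(x) = log N(z; 0, 1) + log |dz/dx|. The synthetic data and all names here are assumptions made for illustration.

```python
import math
import random


def fit_flow(samples):
    """Maximum-likelihood fit of mu and sigma for z = (log(x) - mu) / sigma."""
    logs = [math.log(x) for x in samples]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma


def forward(x, mu, sigma):
    """Map a data value x > 0 to the Gaussian latent variable z."""
    return (math.log(x) - mu) / sigma


def inverse(z, mu, sigma):
    """Map a latent value z back to data space."""
    return math.exp(mu + sigma * z)


def log_density(x, mu, sigma):
    """Change of variables: log p(x) = log N(z; 0, 1) + log|dz/dx|."""
    z = forward(x, mu, sigma)
    log_gauss = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
    log_det = -math.log(sigma) - math.log(x)  # dz/dx = 1 / (sigma * x)
    return log_gauss + log_det


# Synthetic skewed data standing in for an uncertain physical quantity.
rng = random.Random(0)
samples = [math.exp(rng.gauss(1.0, 0.5)) for _ in range(5000)]
mu, sigma = fit_flow(samples)

# "Error bars": push the central 95% Gaussian interval back to data space.
low, high = inverse(-1.96, mu, sigma), inverse(1.96, mu, sigma)
coverage = sum(low <= x <= high for x in samples) / len(samples)
print(f"95% interval in data space: [{low:.2f}, {high:.2f}]")
print(f"fraction of samples inside: {coverage:.3f}")
```

Because the map is invertible, uncertainty statements made in the simple Gaussian space (such as a 95% interval) translate directly into asymmetric intervals in data space, which is the sense in which a flow can provide error bars.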
Cedric John: We have learned our lesson from the lack of properly trained geoscientists by reshuffling our teaching. I now teach data science and machine learning as an elective to our undergraduate students, and I hope other universities will follow soon. Collectively, we have also decided to change our master programs. The energy-related masters are now full-fledged numerical masters, where my colleagues and I teach data science, machine learning, deep learning and big data to the future experts in geoenergies. Data availability remains a problem, but we are working with our industry partners to overcome it. As a field, geoscience should probably think about long-term repositories for subsurface data, whether it is of an academic or an industrial nature.
Ashley Russell: In tackling the challenges above, the lessons certainly come out. I personally think that we in the big energy companies have neglected our IT competence and people. I have learned that those sitting in IT and software development are more than capable of working with subsurface data and algorithms and are able to capture knowledge from experts in subsurface fields in the right data ways. This collaboration has led those of us doing AI or data engineering to completely rethink how we work – cheaper and faster. I challenge those reading this: the next time you talk to an IT person, ask them how they would tackle a problem you might be having! The other major lesson is that developing an entire organization to work in a data-centric way, where we have the possibility to turn to AI for any process, is no easy task. So this is an important lesson I have learned – do not underestimate the feelings or negativity toward AI that exist. We have to be ethical and we have to be user-centric and we have to be kind.
Ali Karagul: New vocabulary and the relevant techniques to use.
Oleg Ovcharenko: My personal takeaways: Invest enough time to be confident in the data and then start playing with model architectures. Iterate fast at the prototyping stage. Don’t write “one code to rule them all” – isolate experiments and make them easily accessible, reproducible and modifiable (e.g. one notebook for one idea, not all ideas in one notebook). Small meaningful changes in the data or model often strongly impact the result. Be sufficiently fluent with one deep learning framework. Find a rationale for using a new architecture or idea from the web prior to any coding. Assume a low-speed, unstable internet connection and run heavy computations somewhere machine-independent, e.g. in the cloud or on a cluster.
Ruslan Miftakhov: I have an extensive R&D team that works on tens of different hypotheses every month within geoscience. There are several lessons off the top of my head that I can share: 1 – it is better to spend time analyzing, cleaning, and improving the dataset than to spend time in pursuit of the latest and greatest neural network architecture; 2 – establish good practice by using MLOps tools and documentation, since after a while, and several hundred experiments, it would be difficult to go back to the original solution and pivot the idea; 3 – create a curated, endless bucket of different AI hypotheses/ideas; 4 – waste no time: require a quick and dirty prototype, but have a final idea in mind. It is essential to have clear acceptance criteria with established metrics for the solution; 5 – if a team has made three different attempts but failed to prove the AI hypothesis, discard it or leave it for later. There are many more lessons that I can share.
What are your sources for learning about AI?
Olawale Ibrahim: My first sources for learning about AI as a starter were online courses from Microsoft and IBM on e-learning platforms like edX and Coursera. Other sources were popular AI blogs like Towards Data Science, and others for reading up on articles of interest. Community learning was also very influential in my beginner journey, as I had a community of AI enthusiasts on campus where AI was introduced and learning materials were shared. For continuous development, I now use GitHub for learning by working on projects and learning from other open-source approaches and code.
Nathanial Jones: I definitely try to learn from a mixture of books, going to hackathons, and doing online courses (Kaggle, Coursera, etc). The very best way to learn is to try things and experiment and we are lucky to live in a time of abundant data, competitions, and more to try out techniques and experiment.
Felix Herrmann: My students have been fortunate enough to take excellent courses in Machine Learning and Data Science. I have been “less fortunate”. I basically learned the field by reading publications that are made available on arxiv.org – another example where open non-proprietary sharing of information can really speed up innovation. Having said this, I was, as a professor, fortunate enough to have built up a solid background in mathematics, statistics, and computational sciences. This allowed me to learn this field from scratch.
Cedric John: I do watch YouTube videos and read MediumDigest but I am old school: I still enjoy a good textbook about data science and new machine learning algorithms, and of course, academic papers. I truly ‘learn’ machine learning when I apply it, so I try as much as possible in my role to still actively code and solve problems. I find this helps me support my students better.
Ashley Russell: I am a big hackathon fan – I always learn something new. But for me personally it has been a mixture of learning from others, and from some online coursework. I always go back to some of the online resources from O’Reilly – they have such a rich set of papers, books, and examples on everything in the data science ecosystem.
Ali Karagul: Books and online learning.
Oleg Ovcharenko: I started with online courses and then took a university course on machine learning. The rest was crawling the web and looking at open-source code on GitHub. I often check paperswithcode.com and watch YouTube (relevant suggestions work like magic). A few times a year I also go through the technical programs of computer science conferences, which sometimes have energy-related topics.
Ruslan Miftakhov: The most exciting source of learning about the latest AI developments would be reading a lot of publications (for example, on arXiv). Then, use YouTube to learn basic or relatively old AI concepts and, finally, read books for a more comprehensive explanation.
For more resources on continuous professional development, visit EAGE’s LearningGeoscience platform, which hosts multiple online courses on machine learning, hands-on coding and digital tools for you to choose from.