Big Data – Small Patterns: Applying Geoscience Sentiment Analysis to Unstruct...


The Science and Technology for Exploration and Production Solutions (STEPS) outreach program is now in its second year. This year, the associated STEPS Distinguished Lecture series has followed the theme of Big Data in Exploration and Production (E&P).


Big data contains a great deal of untapped potential, which, if managed correctly, can provide a wealth of insight and knowledge. For this to be realized efficiently and effectively, advanced tools like natural language processing and machine learning are key to identifying patterns in big data sets that can then be leveraged by the oil and gas industry.



The fifth lecture in the “Big Data in E&P” Distinguished Lecture series will be delivered by Paul Cleverly, associate lecturer at Robert Gordon University, on May 10, 2018. His talk is titled “Big Data – Small Patterns: Applying Geoscience Sentiment Analysis to Unstructured Text.”


Abstract: Text analytics and data mining are techniques used to analyze text and data for patterns, trends and other useful information. This can be combined with natural language processing (NLP) and machine learning to ‘mimic’ human thought processes to create actionable insights. Big data is, after all, about surfacing small patterns – and exploiting those patterns for humanitarian and commercial goals – potentially showing us what we don't know and challenging what we do know.

The majority of the published literature on text analytics in the geosciences focuses on rules-based extraction: identifying and counting occurrences of specific geoscience concepts and their associations within text. The surrounding context of each mention is largely ignored, although there has been some recent work mining social media for early earthquake warnings and geohazards. Sentiment analysis algorithms have been used in marketing, finance and communications to infer intentions, opinions and emotions towards institutions, brands and topics. To date, no known study has examined geoscience sentiment as it applies to petroleum systems texts.


Generic, out-of-the-box sentiment algorithms can perform poorly without customization. For example, generic sentiment tools tend to assign negative polarities to the terms ‘fault,’ ‘buried,’ ‘thick,’ ‘old’ and ‘expelled,’ which is not useful for geoscience: the phrase “…an older source rock” does not carry negative sentiment. One research question is how well geoscience-trained sentiment algorithms compare with generic ones, and how useful the resulting sentiment data may be for geoscientists.
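The polarity problem described above can be illustrated with a minimal Python sketch. The lexicon values and the override table are hypothetical, chosen only to show the effect. They are not taken from any real sentiment tool or from the research described in this abstract.

```python
import re

# Hypothetical generic lexicon: illustrative polarity values only,
# not copied from any real sentiment tool.
GENERIC_LEXICON = {
    "fault": -1.0, "buried": -1.0, "thick": -0.5,
    "old": -0.5, "older": -0.5, "expelled": -1.0,
    "good": 1.0, "poor": -1.0,
}

# Hypothetical geoscience overrides: terms that are purely descriptive
# in a geological text are reset to neutral (0.0).
GEOSCIENCE_OVERRIDES = {
    "fault": 0.0, "buried": 0.0, "thick": 0.0,
    "old": 0.0, "older": 0.0, "expelled": 0.0,
}


def polarity(text, lexicon):
    """Sum the polarity of every known word in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(lexicon.get(token, 0.0) for token in tokens)


# Later entries win, so the overrides replace the generic values.
geo_lexicon = {**GENERIC_LEXICON, **GEOSCIENCE_OVERRIDES}

phrase = "an older source rock"
generic_score = polarity(phrase, GENERIC_LEXICON)  # negative: "older" is penalised
geo_score = polarity(phrase, geo_lexicon)          # neutral after the override
```

A hand-built override table like this only neutralises misleading terms. Learning which geoscience phrases are actually favourable or unfavourable needs labelled training data.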

Supervised machine learning using Naive Bayes and skip-grams was combined with NLP techniques and knowledge engineering, implemented in Python. The resulting algorithm was called Geoscience Aware sentiment analyZER (GAZER). The GAZER algorithm improved on out-of-the-box sentiment algorithms from well-known global tech corporations by over 30 percent. Research is ongoing, but there are early signs of the potential usefulness of the resultant sentiment data and visualizations showing patterns in time and space, which present opportunities for further research.
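To give a feel for how Naive Bayes and skip-grams can work together, here is a small self-contained Python sketch. It is not the GAZER implementation, whose details are not given in this abstract. The training sentences, labels, class name and the `max_skip` parameter are all invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def skip_bigrams(tokens, max_skip=2):
    """Ordered token pairs with at most max_skip tokens between them."""
    pairs = []
    for i, left in enumerate(tokens):
        for j in range(i + 1, min(i + 2 + max_skip, len(tokens))):
            pairs.append(f"{left}_{tokens[j]}")
    return pairs


def features(text):
    tokens = tokenize(text)
    return tokens + skip_bigrams(tokens)


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for feat in features(text):
                self.feature_counts[label][feat] += 1
                self.vocab.add(feat)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            total = sum(self.feature_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for feat in features(text):
                if feat in self.vocab:  # ignore features never seen in training
                    count = self.feature_counts[label][feat]
                    score += math.log((count + self.alpha) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Invented toy training set; a real study would need many labelled examples.
train_texts = [
    "thick mature source rock expelled hydrocarbons",
    "good reservoir quality with high porosity",
    "effective seal above the reservoir",
    "poor reservoir quality with low porosity",
    "source rock is immature and thin",
    "seal breached and hydrocarbons leaked",
]
train_labels = ["positive"] * 3 + ["negative"] * 3

model = NaiveBayes().fit(train_texts, train_labels)
prediction_a = model.predict("thick source rock with good porosity")
prediction_b = model.predict("poor reservoir with low porosity")
```

In this toy setting, ‘thick’ and ‘expelled’ count towards the positive class because they appear in favourable training sentences, which a generic lexicon would not capture. The skip-bigrams let the classifier use word pairs such as ‘thick … rock’ even when other words sit between them.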


Biography: Paul is a Geoscientist turned Information Scientist. He holds a BSc in Geology, an MSc in Computing in Earth Sciences and a PhD in Enterprise Search and Discovery. He has worked in the oil and gas industry for over 25 years, with some of the world's biggest tech companies, as well as technology start-ups. He is a corporate advisor and an industry researcher based in Oxford, and holds the role of Associate Lecturer at Robert Gordon University in Aberdeen, UK.



The lecture will be hosted at ENI Laboratories, San Donato Milanese, Milan, on May 10 at 3 p.m. local time (2 p.m. BST). To attend the lecture in person, or to listen to the live broadcast, please email


If you are interested in the STEPS initiative, please register your interest at and join our growing iEnergy Community.