INTRODUCTION
In the previous blog post, pattern recognition in aviation safety reports was demonstrated using clustering (an unsupervised machine learning method)
and Pearson’s correlation coefficients calculated between selected report metadata attributes.
This case study shows how to detect recurring incidents by using Natural Language Processing (NLP) methods and machine learning (ML) algorithms to
calculate a similarity score between the occurrence narrative of any safety report (or free-form text) and those of all other safety reports in the database.
The goal was to provide a production-ready, highly responsive, fully automatic solution that can be easily used by safety professionals without any manual intervention.
METHOD
Database
The solution is compatible with any safety report based on the ECCAIRS format and the ICAO ADREP taxonomy.
For this demonstration, the existing airline GALIOT SMS report database is used to showcase the solution in action.
Text preprocessing
Linguistic processing of the safety narratives is divided into two stages: aviation domain-specific processing and standard NLP methods.
In the first stage, a custom-developed aviation dictionary is used to group commonly used aviation acronyms and replace each group with a single term, which prevents semantic confusion. In the second stage, standard text processing methods
such as stemming, stop-word removal, lemmatization, and punctuation removal are applied to prepare the text for vectorization.
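The two stages can be illustrated with a minimal Python sketch. The acronym map and stop-word list below are hypothetical stand-ins, since the actual aviation dictionary is not published here, and stemming and lemmatization are left out because they normally rely on an external NLP library.

```python
import re

# Hypothetical acronym map: every variant is replaced by one canonical term.
# The real aviation dictionary is not described in this article.
ACRONYM_MAP = {
    "a/c": "aircraft",
    "acft": "aircraft",
    "rwy": "runway",
    "twr": "tower",
    "ga": "go_around",
    "g/a": "go_around",
}

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "on", "to", "of", "and", "was", "by", "at", "in"}


def preprocess(narrative: str) -> list[str]:
    """Return the cleaned tokens of one occurrence narrative."""
    tokens = []
    for raw in narrative.lower().split():
        # Strip surrounding punctuation but keep inner slashes (e.g. "a/c").
        word = raw.strip(".,;:!?()[]\"'")
        # Stage 1: map domain acronyms to a single canonical term.
        word = ACRONYM_MAP.get(word, word)
        # Stage 2: drop remaining punctuation, then drop stop words.
        word = re.sub(r"[^a-z0-9_]", "", word)
        if word and word not in STOP_WORDS:
            tokens.append(word)
    return tokens


print(preprocess("The A/C performed a G/A on RWY 27."))
```

Mapping the acronyms before removing punctuation matters in this sketch: otherwise "a/c" would collapse to "ac" and never match the dictionary.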
Feature extraction and similarity calculation
Report vectorization (or feature extraction) transforms the occurrence narratives into a two-dimensional numerical array
in which rows are reports and columns are text features. In this case, vectorization is performed using TF-IDF (term frequency-inverse document frequency),
a statistical measure of how relevant a word is to one report within the collection of reports in the database.
After the transformation, each report is represented as an n-dimensional vector, and the similarity between reports is calculated as
the cosine similarity of their vectors.
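As a rough illustration of these two steps, the sketch below computes TF-IDF weights and cosine similarity in plain Python. It is not the production implementation: the smoothed IDF formula is one common variant chosen here as an assumption, the function names are made up for the example, and each report is stored as a sparse term-to-weight dictionary rather than as a row of a dense array.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Turn tokenised reports into sparse TF-IDF vectors (term -> weight)."""
    n_docs = len(docs)
    # Document frequency: in how many reports does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # Assumed smoothed IDF variant: log((1 + N) / (1 + df)) + 1
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up, already preprocessed narratives, used only for illustration.
reports = [
    ["bird", "strike", "during", "takeoff"],
    ["bird", "strike", "on", "final", "approach"],
    ["hydraulic", "leak", "found", "after", "landing"],
]
vectors = tfidf_vectors(reports)
# Compare the first report against every report in the collection.
scores = [cosine_similarity(vectors[0], v) for v in vectors]
```

Here the first report scores highest against itself, lower against the second report (which shares "bird" and "strike"), and zero against the third, which shares no terms with it.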
IN ACTION
Similarity and filter criteria
The only task required of the safety officer is to specify:
a) Similarity criteria
(the text to match, either by selecting one of the reports from the database or by manually entering free-form text)
b) Filter criteria
(aircraft from the fleet, aircraft make/model, last departure point, planned destination, and minimum similarity)
through the easy-to-use interface below.
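The filter criteria listed above could be applied to already scored reports roughly as follows. This is only a sketch: the `ScoredReport` record, its field names, and the `apply_filters` function are hypothetical and are not taken from the GALIOT interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScoredReport:
    """Hypothetical record: one report plus its similarity to the query text."""
    report_id: str
    registration: str      # aircraft from the fleet
    aircraft_model: str    # aircraft make/model
    departure: str         # last departure point
    destination: str       # planned destination
    similarity: float      # similarity score, 0.0 to 1.0


def apply_filters(
    reports: list[ScoredReport],
    registration: Optional[str] = None,
    aircraft_model: Optional[str] = None,
    departure: Optional[str] = None,
    destination: Optional[str] = None,
    min_similarity: float = 0.0,
) -> list[ScoredReport]:
    """Keep reports that match every criterion that was set (None = no filter)."""
    wanted = {
        "registration": registration,
        "aircraft_model": aircraft_model,
        "departure": departure,
        "destination": destination,
    }
    return [
        r for r in reports
        if r.similarity >= min_similarity
        and all(value is None or getattr(r, name) == value
                for name, value in wanted.items())
    ]
```

Leaving a criterion unset means it is ignored, so a call such as `apply_filters(reports, aircraft_model="A320", min_similarity=0.5)` restricts the result by model and score only.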
Similarity result
Based on the specified criteria, the safety report similarity is calculated and the results are presented in a time plot,
a scatter plot with similar safety reports distributed chronologically along the x-axis and their calculated similarity on the y-axis.
A built-in tooltip gives a quick overview of each report shown in the scatter plot.
In addition, the 10 most similar safety reports overall are listed in a separate table for subsequent drill-down analysis.
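The data behind the two result views can be prepared with a few lines of Python. The sketch below assumes each scored report is a `(report_id, occurrence_date, similarity)` tuple; that layout, the function names, and the sample values are illustrative assumptions, and the plotting itself is left out.

```python
import heapq
from datetime import date


def top_similar(scored, n=10):
    """Return the n highest-scoring reports, most similar first."""
    return heapq.nlargest(n, scored, key=lambda item: item[2])


def time_plot_points(scored, min_similarity=0.0):
    """Return (date, similarity) pairs in chronological order for the scatter plot."""
    return sorted((day, score) for _, day, score in scored if score >= min_similarity)


# Made-up scored reports, used only for illustration.
scored = [
    ("R-001", date(2020, 3, 14), 0.82),
    ("R-002", date(2020, 11, 2), 0.35),
    ("R-003", date(2019, 7, 30), 0.64),
]
table_rows = top_similar(scored, n=2)
points = time_plot_points(scored, min_similarity=0.5)
```

The table is ordered by score, while the plot points are ordered by date, which is why the two views are built separately from the same scored list.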
CONCLUSION
This case study demonstrates how the simple and effective GALIOT AI report similarity score calculation can be used to
detect recurring safety incidents in an aviation safety report database.
Marino Tudor
Founder & CEO
Galiot Aero Ltd
April, 2021