Ikeda, Chie (2022) A new feature engineering framework for financial cyber fraud detection using machine learning and deep learning. Doctoral thesis, London Metropolitan University.
As online payment systems advance, total losses via online banking in the United Kingdom have increased, because fraud techniques have also progressed and now use advanced technology. Traditional fraud detection models that rely only on raw transaction data cannot cope with new and innovative schemes for deceiving financial institutions. Many studies published by both academic and commercial organisations introduce new fraud detection models using various machine learning algorithms; however, financial fraud losses via online banking are still increasing. This thesis takes a holistic view of feature engineering for classification and of machine learning (ML) and deep learning (DL) algorithms for fraud detection, in order to understand their capabilities and how each algorithm deals with input data. It then proposes a new feature engineering framework that can produce the most effective feature set for any ML or DL algorithm by combining feature engineering and feature selection methods. The framework consists of two main components: feature creation and feature selection. The purpose of the feature creation component is to create many effective candidate features through feature aggregation and transformation based on customer behaviour. The purpose of the feature selection component is to evaluate all features and to drop irrelevant features and very highly correlated features from the dataset. In the experiment, I demonstrated the effect of the new framework by using real-life banking transaction data provided by a private European bank and evaluating the performance of the resulting fraud detection models in an appropriate way. ML and DL models perform at their best when built on the feature set created by the new framework, achieving higher scores in all evaluation metrics than models built with the original dataset.
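The two components described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation. The field names (`customer`, `amount`), the derived feature names, the 0.95 correlation threshold, and the use of zero variance as a stand-in for "irrelevant" are all assumptions made for illustration only.

```python
from collections import defaultdict
from math import sqrt


def add_customer_aggregates(transactions):
    """Feature creation (illustrative): derive per-customer history features.

    `transactions` is a time-ordered list of dicts with the hypothetical
    keys 'customer' and 'amount'. Each output row only uses transactions
    that came before it, so no future information leaks in.
    """
    history = defaultdict(list)
    rows = []
    for tx in transactions:
        past = history[tx["customer"]]
        prior_mean = sum(past) / len(past) if past else 0.0
        rows.append({
            "amount": tx["amount"],
            "prior_count": len(past),
            "prior_mean_amount": prior_mean,
            "amount_minus_prior_mean": tx["amount"] - prior_mean,
        })
        past.append(tx["amount"])
    return rows


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_features(rows, threshold=0.95):
    """Feature selection (illustrative): return the names of kept features.

    Drops constant columns (an assumed proxy for irrelevant features) and
    any column whose absolute correlation with an already kept column
    exceeds `threshold` (an assumed value).
    """
    kept = []
    for name in rows[0]:
        column = [row[name] for row in rows]
        if len(set(column)) <= 1:
            continue  # constant column carries no information
        if any(abs(pearson(column, [row[k] for row in rows])) > threshold
               for k in kept):
            continue  # nearly duplicates a feature that is already kept
        kept.append(name)
    return kept
```

In this sketch, `select_features(add_customer_aggregates(transactions))` would return the subset of feature names to pass to a downstream ML or DL model. A real pipeline would also score features against the fraud label, which is omitted here.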