An intelligent analysis of policing data for identity resolution

Nawaz, Asif (2024) An intelligent analysis of policing data for identity resolution. Doctoral thesis, London Metropolitan University.

Abstract

Identity refers to the unique characteristics or attributes that distinguish an individual. Identity crimes, such as identity theft and identity fraud, occur when someone unlawfully acquires another person's personal information and uses it fraudulently. Identity resolution, the process of identifying and merging duplicate or similar records, is critical for law enforcement agencies globally. However, matching identities in big data is challenging because records contain inconsistencies, including typographical errors, naming variations, and abbreviations. Traditional record and identity matching techniques aim to consolidate or eliminate redundant data entries to preserve accuracy and integrity, but manual identity matching is infeasible in big data environments. Machine learning techniques offer a solution by automating pattern extraction and reducing reliance on manually coded rules.
This research proposes a fuzzy approach for identity resolution that combines unsupervised learning with fuzzy string similarity metrics to improve identity matching. The model incorporates an iterative search process that combines the Soundex and Jaro-Winkler algorithms to compute an aggregate score for names. The Soundex method has been enhanced to generate a six-digit numerical code, overcoming limitations of the traditional encoding. Additionally, the edit-distance algorithm, applied through the FuzzyWuzzy Python library, is used to match attributes such as “address” and “ethnicity description.” The Mean-Shift clustering technique generates clusters dynamically from the final dataset, so the number of clusters does not need to be specified in advance.
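
To make the idea of an aggregate name score concrete, the following is a minimal sketch in Python and not the thesis's implementation. It assumes the classic four-character Soundex code (the enhanced six-digit variant is not specified in this abstract), the standard Jaro-Winkler formula with a prefix scale of 0.1, and a hypothetical equal weighting between the two. The function names `soundex`, `jaro`, `jaro_winkler`, and `name_score`, and the parameter `soundex_weight`, are illustrative assumptions.

```python
def soundex(name: str) -> str:
    """Classic Soundex: first letter plus three digits (not the thesis's six-digit variant)."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit
    letters_only = [c for c in name.upper() if c.isalpha()]
    if not letters_only:
        return ""
    out = letters_only[0]
    prev = codes.get(letters_only[0], "")
    for ch in letters_only[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:
            out += digit
        if ch not in "HW":  # H and W do not separate repeated codes
            prev = digit
    return (out + "000")[:4]


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted by the length of the common prefix (up to 4 characters)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)


def name_score(a: str, b: str, soundex_weight: float = 0.5) -> float:
    """Hypothetical aggregate: weighted mix of a Soundex agreement flag and Jaro-Winkler."""
    a, b = a.strip().upper(), b.strip().upper()
    code_a = soundex(a)
    phonetic = 1.0 if code_a and code_a == soundex(b) else 0.0
    return soundex_weight * phonetic + (1 - soundex_weight) * jaro_winkler(a, b)
```

With these assumptions, `soundex("Robert")` and `soundex("Rupert")` both return `"R163"`, so phonetically similar spellings receive the phonetic component of the score even when their character-level similarity is lower. The weighting is a design choice made for this sketch; the abstract does not state how the thesis combines the two measures.
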
The three name variations of the iterative search process allow the categorisation of records into Match, Related or Close Match, and Possible Match while excluding duplicates. By grouping entities based on similarity scores and applying graph analysis, the framework accurately identifies target identities, even when links span different addresses. The results demonstrate the framework’s ability to enhance the speed and accuracy of identity resolution, offering a more efficient method than existing solutions.
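
The grouping step can be pictured as finding connected components in a graph whose nodes are records and whose edges are high-scoring pairs. The sketch below is an illustration under stated assumptions, not the framework's actual graph analysis: the function name `link_records`, the input format of `(i, j, score)` triples, and the `threshold` value of 0.85 are all hypothetical.

```python
from collections import defaultdict


def link_records(n_records, scored_pairs, threshold=0.85):
    """Group record indices into connected components of sufficiently similar pairs.

    scored_pairs is an iterable of (i, j, score) triples; the threshold is an
    assumed cut-off chosen for illustration only.
    """
    parent = list(range(n_records))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, score in scored_pairs:
        if score >= threshold:
            parent[find(i)] = find(j)

    groups = defaultdict(list)
    for record in range(n_records):
        groups[find(record)].append(record)
    return sorted(groups.values())


# Records 0 and 2 are never compared directly, but both link to record 1,
# so all three fall into one group; record 3 stays on its own.
pairs = [(0, 1, 0.95), (1, 2, 0.90), (2, 3, 0.40)]
groups = link_records(4, pairs)
```

Because membership is transitive, two records held under different addresses can end up in the same group when a chain of similar records connects them, which is one plausible reading of how links can span addresses.
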
This research significantly contributes to identity resolution techniques, improving investigative processes with minimal information and offering valuable applications for law enforcement and other sectors, such as fraud detection in the financial industry.
