Cloud Spark Cluster to analyse English prescription big data for NHS intelligence

Fernando, Sandra; Sowinski-Mydlarz, Viktor; Virdee, Bal Singh

Fernando, Sandra, Sowinski-Mydlarz, Viktor and Virdee, Bal Singh (2024) Cloud Spark Cluster to analyse English prescription big data for NHS intelligence. In: ICDAM2023, 23-24 June 2023, London Metropolitan University - London, UK.

Abstract

Spark is a large-scale data processing engine reported to run up to a hundred times faster than the Hadoop big data processing engine. Spark is a fully in-memory framework and offers fewer big data platform facilities than Hadoop, yet the Spark analytics engine combined with the Hadoop Distributed File System (HDFS) gives better throughput than Hadoop alone. The main contribution of this paper is insight into the behaviour of an HDFS-based Azure cloud Spark cluster, with a discussion and evaluation of its strengths and limitations using a large NHS prescription dataset. The NHS prescription data, covering 2015 to April 2022, exceeds 500 GB of records. A public dashboard for individual BNF code analysis and studies of NHS costs exist, but no analysis of NHS prescription data over this date range and at this volume has been conducted, in particular with newer big data processing engines such as Spark. This study also contributes descriptive statistics and machine learning models of prescription data trends, built with the cloud Spark engine and PySpark, technology that has not been used in this context before. It characterises regions and GP practices in terms of reimbursement cost, drug consumption level, drug type, and disease type; shows how demand for dispensed chemical substances has varied over the years; and shows which diseases have increased or decreased over the years, together with the total cost and its trends.
