Data ingestion pipeline and data marts to empower UK researchers, academics, and business and economic decision makers

Yu, Qicheng, Healy, Stephanie, Ravi, Indrajitrakuraj and Patel, Preeti (2021) Data ingestion pipeline and data marts to empower UK researchers, academics, and business and economic decision makers. In: Future of Information and Communication Conference (FICC) 2022, 3-4 March 2022, Onsite & Virtual // San Francisco, United States. (Submitted)

FICC 2022 DIP research paper.pdf - Accepted Version

Abstract / Description

Voluminous data generated by different sources in disparate formats, coupled with a large number of diverse requirements placed on that data, pose a data integration problem: it has become vital to reconcile the data into a single model, identify the relationships within it, and enable data analytics processes. In light of the unabated growth of data volume and the need for data sharing across various stakeholders, there is a requirement for the design and implementation of a data ingestion pipeline with a set of data marts. In this paper, we present a data ingestion pipeline which empowers hitherto impeded data users to easily access shared big data sources. We aim to improve the effectiveness and efficiency of open-source data sharing capability so that researchers, academics, policy makers, businesses and government departments can all benefit from the use of these sophisticated data management techniques. In this work, we propose a novel data ingestion pipeline and data marts approach to utilise data generated from big data systems and effectively integrate them into a unified form, ready for use. Currently, the data ingestion pipeline focuses on UK data, as our primary aim is to support the City of London and the various communities within it. An additional benefit is the potential for developing collaboration across disciplines to tackle the economic and social challenges faced by cities in innovative ways.

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: big data; data pipeline; data marts; data ingestion; extracting loading and transforming (ETL); star schema
Subjects: 000 Computer science, information & general works
Department: School of Computing and Digital Media
Depositing User: Qicheng Yu
Date Deposited: 10 Sep 2021 09:29
Last Modified: 04 Mar 2022 01:58

