Yu, Qicheng (2012) An agent-based adaptive join algorithm for building data warehouses. Doctoral thesis, London Metropolitan University.
Making better business decisions efficiently is the key to succeeding in today's competitive world. Organisations seeking to improve their decision-making process can be overwhelmed by the sheer volume and complexity of data available from their various operational information systems. Many organisations have responded to this challenge by employing data warehousing technologies to make full use of the information in their systems and address real-world business problems. As organisations move their operations to the Internet to take advantage of new technologies, their data warehouse environments become more distributed and dynamic. Meanwhile, applications of a data warehouse have evolved from reporting and decision support systems to mission-critical decision-making systems, which require data warehouses to combine both historical and current data from operational systems. This presents both challenges and opportunities in designing and developing new data warehouse systems for supporting decision-making processes which can deliver the right information, to the right people, at the right time, interactively and securely. In typical distributed data warehouse architectures, both the logical layer and the physical layer of the data warehouse are used to map physical tables in distributed data marts. The physical layer contains historical data materialised over a longer time period, while the most recent data is only available from the logical layer. Extracting knowledge from this data is often expensive, as it usually requires complex queries involving a series of joins and aggregations. Many commercial data warehouse systems place limits on such operations at runtime or sacrifice precision by using approximate replication. The join operation is one of the most expensive operations in query processing as it combines, compares and merges potentially large data sets.
Joining large tables can consume a significant amount of system resources, including CPU, disk, buffer and network bandwidth. Consequently, join performance has a considerable impact on overall system performance, especially in a distributed data warehouse environment. The traditional 'optimise-then-execute' query processing paradigm is inadequate in this case. This thesis investigates the evolution of data warehouses to identify an architecture suitable for highly distributed data warehouses, and studies the feasibility and effectiveness of utilising software agent technology for distributed information systems. A novel agent-based adaptive join algorithm called AJoin is proposed for effective and efficient online join operations in distributed data warehouses. It seamlessly integrates a dynamic integration approach with traditional data warehousing technologies to address the issues arising from distributed and dynamic data warehouse environments. Taking into consideration data warehouse features, AJoin utilises intelligent agents for dynamic optimisation and coordination of join processing at run time. Key aspects of the AJoin algorithm have been implemented and evaluated against other modern adaptive join algorithms. The experimental evaluation results demonstrate that AJoin consistently outperforms the other adaptive join algorithms under the various distributed and dynamic data warehouse environments considered in this study. The outcome of this research has been very encouraging. The average performance of AJoin in matching the first 50 tuples improved by as much as 67%, and overall join performance improved by more than 35%, compared with other join algorithms in a distributed and dynamic data warehouse environment.