Efficient Query Optimization in Co-Processor-accelerated
Project Members: Sebastian Breß, Jens Teubner, Gunter Saake
For a decade, the database community has explored graphics processing units and other co-processors to accelerate query processing. While the developed algorithms often outperform their CPU counterparts, accelerating a complete database engine with co-processors remains challenging: the DBMS needs to handle multiple heterogeneous processors in a uniform way, and the query optimizer needs to place database operators efficiently on the available processors while keeping the communication overhead between processors low.
In this project, we address these challenges with a hardware-oblivious query optimizer. Its key feature is that it places operators on (co-)processors without detailed information about the hardware. To this end, we use a learning-based run-time estimator that continuously observes the run-time of database operators on different processors and adapts its performance models and operator placement decisions accordingly. As part of the project, we developed an open-source library called HyPE (Hybrid Query Processing Engine) that implements these techniques. HyPE is used as the physical query optimizer for operator placement in our database management system CoGaDB and was also integrated into the hardware-oblivious database engine Ocelot.
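The idea of a learning-based run-time estimator can be illustrated with a minimal Python sketch. It is not HyPE's API: the class `RuntimeEstimator`, the function `place_operator`, and the choice of a per-(operator, processor) linear model over input size are all assumptions made for illustration, and data transfer costs are ignored.

```python
from collections import defaultdict


class RuntimeEstimator:
    """Hypothetical estimator: one linear model (runtime ~ a * size + b)
    per (operator, processor) pair, refitted from observed run-times."""

    def __init__(self):
        # (operator, processor) -> list of (input_size, measured_runtime)
        self.observations = defaultdict(list)

    def observe(self, operator, processor, input_size, runtime):
        """Record one measured execution."""
        self.observations[(operator, processor)].append((input_size, runtime))

    def estimate(self, operator, processor, input_size):
        """Least-squares estimate, or None if fewer than two samples exist."""
        samples = self.observations[(operator, processor)]
        if len(samples) < 2:
            return None
        n = len(samples)
        mean_x = sum(x for x, _ in samples) / n
        mean_y = sum(y for _, y in samples) / n
        var_x = sum((x - mean_x) ** 2 for x, _ in samples)
        if var_x == 0:
            return mean_y  # all samples share one size: fall back to the mean
        slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / var_x
        return mean_y + slope * (input_size - mean_x)


def place_operator(estimator, operator, processors, input_size):
    """Pick the processor with the lowest estimated run-time.

    A processor without enough samples is chosen immediately so that
    the estimator can collect a measurement for it (exploration)."""
    best, best_cost = None, float("inf")
    for processor in processors:
        cost = estimator.estimate(operator, processor, input_size)
        if cost is None:
            return processor
        if cost < best_cost:
            best, best_cost = processor, cost
    return best
```

In use, the engine would call `place_operator` before running an operator and `observe` afterwards with the measured run-time, so the models keep adapting without any hardware description being supplied up front.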
- Sebastian Breß. Efficient Query Processing in Co-Processor-accelerated Databases. PhD thesis, University of Magdeburg, Germany, October 2015.
- Sebastian Breß, Max Heimel, Michael Saecker, Bastian Köcher, Volker Markl, and Gunter Saake. Ocelot/HyPE: Optimized Data Processing on Heterogeneous Hardware. PVLDB, 7(13), 2014.
- Sebastian Breß, Norbert Siegmund, Max Heimel, Michael Saecker, Tobias Lauer, Ladjel Bellatreche, and Gunter Saake. Load-Aware Inter-Co-Processor Parallelism in Database Query Processing. Data & Knowledge Engineering, 2014. doi: 10.1016/j.datak.2014.07.003.
- Sebastian Breß. Why it is Time for a HyPE: A Hybrid Query Processing Engine for Efficient GPU Coprocessing in DBMS. In The VLDB PhD workshop. VLDB Endowment, 2013.
- Sebastian Breß, Felix Beier, Hannes Rauhe, Kai-Uwe Sattler, Eike Schallehn, and Gunter Saake. Efficient Co-Processor Utilization in Database Query Processing. Information Systems, 38(8):1084–1096, 2013. http://dx.doi.org/10.1016/j.is.2013.05.004.
Hardware-Sensitive Database Operations on Heterogeneous Processors
Project Members: David Broneske, Sebastian Breß, Gunter Saake
Due to the power wall, modern processors are becoming more specialized, and database systems therefore have to cope with increasing heterogeneity in the hardware landscape. This project targets a new programming paradigm for heterogeneous hardware. We use code optimizations that can be applied automatically to an algorithm to create hardware-sensitive variants, each of which exploits specific capabilities of a processor. Since the optimal variant depends on the given processor and workload (e.g., selectivity, data size), we have to find the set of code optimizations that yields the best performance for the actual use case. As the number of possible variants grows exponentially with the number of optimizations, we want the database system to explore suitable variants during query processing by trying different combinations of code optimizations. In this way, the DBMS rewrites its operators until they perform optimally for the given use case. This involves the following points:
- Survey promising code optimizations for database operations and examine their performance benefits for specific workloads
- Explore suitable techniques to automatically apply a set of code optimizations (e.g., using a domain-specific language)
- Use a learning-based cost model to approximate execution behavior of different variants for different machines and workloads
- Examine the relation between different code optimizations and the machine / workload
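The variant search described above can be sketched in a few lines of Python. The optimization names, the `VariantTuner` class, and the "try each variant once, then keep the fastest" policy are illustrative assumptions, not the project's implementation. Because the sketch measures every variant, it only works for a handful of optimizations; a learning-based cost model would be needed to avoid testing all of them.

```python
from itertools import combinations

# Illustrative optimization names (assumed for this sketch).
OPTIMIZATIONS = ("branch_free", "loop_unrolling", "simd")


def all_variants(optimizations):
    """Each subset of optimizations is one operator variant: 2**n in total."""
    return [
        frozenset(subset)
        for r in range(len(optimizations) + 1)
        for subset in combinations(optimizations, r)
    ]


class VariantTuner:
    """Hypothetical tuner that learns the mean run-time of each variant
    per workload key (e.g. a processor name plus a selectivity bucket)."""

    def __init__(self, variants):
        self.variants = list(variants)
        self.stats = {}  # (variant, workload) -> (sample count, mean run-time)

    def choose(self, workload):
        """Return an unmeasured variant if one exists, else the fastest so far."""
        for variant in self.variants:
            if (variant, workload) not in self.stats:
                return variant
        return min(self.variants, key=lambda v: self.stats[(v, workload)][1])

    def record(self, variant, workload, runtime):
        """Fold one measured run-time into the running mean."""
        count, mean = self.stats.get((variant, workload), (0, 0.0))
        count += 1
        self.stats[(variant, workload)] = (count, mean + (runtime - mean) / count)
```

During query processing, the operator would call `choose` before each execution and `record` afterwards. Keeping the statistics per workload key reflects the point that the best variant changes with processor and workload.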
- David Broneske. Adaptive Reprogramming for Databases on Heterogeneous Processors. In SIGMOD/PODS Ph.D. Symposium, pages 51–55. ACM, 2015.
- David Broneske, Sebastian Breß, Max Heimel, and Gunter Saake. Toward Hardware-Sensitive Database Operations. In Proceedings of the 17th International Conference on Extending Database Technology (EDBT), pages 229–234. OpenProceedings.org, 2014.
- David Broneske, Sebastian Breß, and Gunter Saake. Database Scan Variants on Modern CPUs: A Performance Study. In Proceedings of the 2nd International Workshop on In-Memory Data Management and Analytics (IMDM), Lecture Notes in Computer Science, pages 97–111. Springer, 2014.
Modern Data Management Technologies for Genome Data Management and Analysis
Project Members: Sebastian Dorok, Sebastian Breß, Jens Teubner, Horstfried Läpple, Gunter Saake
Genome analysis is an important method to detect diseases and improve disease treatment. Due to the use of next-generation DNA sequencing techniques that sequence genomes in less time and at reasonable cost, genome analysis will become a central part of future medicine. As data volume increases rapidly, new solutions must be developed for efficient and effective management and analysis of genome data.
In this project, we investigate the capabilities of modern database systems, such as relational column-oriented main-memory database systems, to store and query genome data efficiently while enabling analysis applications to access the data flexibly. We focus especially on (1) identifying the requirements that such a genome data management solution should fulfill, (2) developing data management concepts for genome analysis using modern database technology with regard to chosen use cases and data management aspects such as data integration, data integrity, data provenance, and data security, and (3) evaluating efficient data structures for querying and processing genome data in database systems.
- Sebastian Dorok. The relational way to dam the flood of genome data. In SIGMOD/PODS Ph.D. Symposium, pages 9–13. ACM, 2015.
- Sebastian Dorok, Sebastian Breß, Jens Teubner, and Gunter Saake. Flexible Analysis of Plant Genomes in a Database Management System. In International Conference on Extending Database Technology (EDBT), pages 509–512. OpenProceedings.org, 2015.
- Sebastian Dorok, Sebastian Breß, and Gunter Saake. Toward Efficient Variant Calling inside Main-Memory Database Systems. In International Workshop on Biological Knowledge Discovery and Data Mining (BIOKDD-DEXA), pages 41–45. IEEE, 2014.
- Sebastian Dorok, Sebastian Breß, Horstfried Läpple, and Gunter Saake. Toward Efficient and Reliable Genome Analysis using Main Memory Database Systems. In International Conference on Scientific and Statistical Database Management (SSDBM), pages 34:1–34:4. ACM, 2014.
GPU-accelerated Join-Order Optimization
Project Members: Andreas Meister, Sebastian Breß, Gunter Saake
Different join orders can cause execution times to vary by several orders of magnitude, which makes join-order optimization one of the most critical optimizations within a DBMS. At the same time, join-order optimization is an NP-hard problem, which makes computing an optimal join order highly compute-intensive. Because current hardware architectures rely on highly specialized and parallel processors, the sequential join-order optimization algorithms proposed in the past cannot fully utilize their computational power. Although existing approaches such as dynamic programming benefit from parallel execution, there are no approaches for join-order optimization on highly parallel co-processors such as GPUs.
In this project, we are building a GPU-accelerated join-order optimizer by adapting existing join-order optimization approaches. We are interested in the effects of GPUs on join-order optimization itself as well as on query processing. For GPU-accelerated DBMSs such as CoGaDB, which already use GPUs for query processing, we need to identify efficient scheduling strategies for query processing and query optimization tasks so that GPU-accelerated optimization does not slow down query processing on the GPU.
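For readers unfamiliar with dynamic programming for join ordering, the following sequential Python sketch shows the basic scheme on the CPU. It is not the project's GPU optimizer: the function name, the input format, and the cost model (sum of estimated intermediate result sizes, with independent selectivities and cross products allowed) are assumptions chosen to keep the example short.

```python
from itertools import combinations


def optimal_join_order(cardinalities, selectivities):
    """Return (cost, plan) for the cheapest bushy join tree.

    cardinalities: {relation_name: row_count}
    selectivities: {frozenset({a, b}): join selectivity}; missing pairs
                   are treated as cross products (selectivity 1.0).
    """
    relations = sorted(cardinalities)

    def result_size(subset):
        # Estimated rows after joining all relations in `subset`.
        size = 1.0
        for rel in subset:
            size *= cardinalities[rel]
        for a, b in combinations(sorted(subset), 2):
            size *= selectivities.get(frozenset((a, b)), 1.0)
        return size

    # best[subset] = (cost, plan); scanning a base relation costs nothing here.
    best = {frozenset([rel]): (0.0, rel) for rel in relations}

    # Build plans for subsets of growing size k from smaller solved subsets.
    for k in range(2, len(relations) + 1):
        for combo in combinations(relations, k):
            subset = frozenset(combo)
            out = result_size(subset)
            candidates = []
            for i in range(1, k):
                for left_combo in combinations(combo, i):
                    left = frozenset(left_combo)
                    right = subset - left
                    left_cost, left_plan = best[left]
                    right_cost, right_plan = best[right]
                    candidates.append(
                        (left_cost + right_cost + out, (left_plan, right_plan))
                    )
            best[subset] = min(candidates, key=lambda c: c[0])

    return best[frozenset(relations)]
```

All subsets of the same size `k` depend only on smaller subsets, so they can in principle be evaluated independently of one another. This is one natural source of parallelism in such algorithms; how the project maps the work to a GPU is not described here.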