Current Projects

Efficient Query Optimization in Co-Processor-accelerated
Databases

Project Members: Sebastian Breß, Jens Teubner, Gunter Saake

Description

For a decade, the database community has explored graphics processing units (GPUs) and other co-processors to accelerate query processing. While the resulting algorithms often outperform their CPU counterparts, accelerating a full database engine with co-processors is challenging: the DBMS needs to handle multiple heterogeneous processors in a uniform way, and the query optimizer needs to place database operators efficiently on the available processors while keeping the communication overhead between processors low.

In this project, we address these challenges with a hardware-oblivious query optimizer. Its key feature is that it places operators on (co-)processors without detailed information about the hardware. To this end, we use a learning-based run-time estimator that continuously observes the run-times of database operators on different processors and adapts its performance models and operator-placement decisions accordingly. As part of the project, we developed HyPE (Hybrid Query Processing Engine), an open-source library that implements these techniques. HyPE serves as the physical query optimizer for operator placement in our database management system CoGaDB and was also integrated into the hardware-oblivious database engine Ocelot.
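The learning-based placement idea can be sketched as follows; the class, method names, and the simple linear runtime model are illustrative assumptions for this sketch, not HyPE's actual API:

```python
# Sketch of learning-based operator placement: a per-processor least-squares
# model predicts operator runtime from input size; each observed execution
# refines the model. All names are illustrative, not HyPE's real interface.

class RuntimeEstimator:
    def __init__(self, processors):
        # running sums (sum_x, sum_y, sum_xx, sum_xy, n) per processor
        # for an incremental linear least-squares fit
        self.stats = {p: [0.0, 0.0, 0.0, 0.0, 0] for p in processors}

    def observe(self, processor, input_size, runtime):
        # feed back the measured runtime of one operator execution
        s = self.stats[processor]
        s[0] += input_size
        s[1] += runtime
        s[2] += input_size * input_size
        s[3] += input_size * runtime
        s[4] += 1

    def predict(self, processor, input_size):
        sx, sy, sxx, sxy, n = self.stats[processor]
        if n < 2:                       # no model yet: force exploration
            return 0.0
        denom = n * sxx - sx * sx
        slope = (n * sxy - sx * sy) / denom if denom else 0.0
        intercept = (sy - slope * sx) / n
        return intercept + slope * input_size

    def place(self, input_size):
        # place the operator on the processor with the lowest predicted runtime
        return min(self.stats, key=lambda p: self.predict(p, input_size))
```

With observations where the GPU has a fixed transfer overhead but a flatter slope, the estimator learns to keep small inputs on the CPU and move large inputs to the GPU.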

Selected Publications

Hardware-Sensitive Database Operations on Heterogeneous Processors

Project Members: David Broneske, Sebastian Breß, Gunter Saake

Description

Due to the power wall, modern processors are becoming increasingly specialized, so database systems have to cope with a growing heterogeneity in the hardware landscape. This project targets a new programming paradigm for heterogeneous hardware. We use code optimizations that can be applied automatically to an algorithm to create different hardware-sensitive variants, each exploiting specific programming capabilities of a processor. Since the optimal variant changes with the processor and the workload (e.g., selectivity, data size), we have to find the set of code optimizations that yields the best performance for the actual use case. Because the number of possible variants grows exponentially, we want the database system to explore suitable variants during query processing by trying different combinations of code optimizations. In this way, the DBMS rewrites its operators until they perform optimally for the given use case. This involves the following points:

  • Survey promising code optimizations for database operations and examine their performance benefits for specific workloads
  • Explore suitable techniques to automatically apply a set of code optimizations (e.g., using a domain-specific language)
  • Use a learning-based cost model to approximate the execution behavior of different variants for different machines and workloads
  • Examine the relation between different code optimizations and the machine and workload
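The variant-selection idea above can be illustrated with a toy example. The operator, the variant names, and the micro-benchmarking strategy are hypothetical, and only a single optimization dimension (branching vs. predication-style selection) is shown for brevity:

```python
import timeit

# Two illustrative variants of a selection (filter) operator. In a real
# engine these would be generated C/SIMD variants; here they only stand in
# for "different combinations of code optimizations".

def select_branching(data, threshold):
    # straightforward variant with a data-dependent branch
    out = []
    for v in data:
        if v < threshold:
            out.append(v)
    return out

def select_branchfree(data, threshold):
    # predication-style variant: always write, advance the cursor conditionally
    out = [0] * len(data)
    i = 0
    for v in data:
        out[i] = v
        i += (v < threshold)
    return out[:i]

VARIANTS = {"branching": select_branching, "branchfree": select_branchfree}

def pick_best_variant(data, threshold, repeat=3):
    """Micro-benchmark every variant on the given workload and return the
    fastest one -- the 'optimal' variant for this processor and workload."""
    timings = {
        name: min(timeit.repeat(lambda f=f: f(data, threshold),
                                repeat=repeat, number=5))
        for name, f in VARIANTS.items()
    }
    return min(timings, key=timings.get)
```

Which variant wins depends on the machine and the selectivity of the workload, which is exactly why the project measures instead of deciding statically.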

Selected Publications

Modern Data Management Technologies for Genome Data Management and Analysis

Project Members: Sebastian Dorok, Sebastian Breß, Jens Teubner, Horstfried Läpple, Gunter Saake

Description

Genome analysis is an important method to detect diseases and improve disease treatment. With next-generation DNA sequencing techniques that sequence genomes in less time and at reasonable cost, genome analysis will become a central part of future medicine. As data volumes increase rapidly, new solutions must be developed for the efficient and effective management and analysis of genome data.

In this project, we investigate the capabilities of modern database systems, such as relational column-oriented main-memory database systems, to store and query genome data efficiently while enabling analysis applications to access the data flexibly. We especially focus on (1) identifying the requirements that such a genome data management solution must meet, (2) developing data management concepts for genome analysis with modern database technology, considering chosen use cases and data management aspects such as data integration, data integrity, data provenance, and data security, and (3) evaluating efficient data structures for querying and processing genome data in database systems.
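As a toy illustration of storing and querying genome data relationally, the following sketch keeps reference bases and sampled bases in two tables and detects mismatches declaratively. The schema and the variant query are hypothetical simplifications, not the project's actual design:

```python
import sqlite3

# Hypothetical, minimal relational schema for base-level genome data:
# the reference genome and sampled bases from aligned reads.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE reference (pos INTEGER PRIMARY KEY, base TEXT);
    CREATE TABLE sample_base (read_id INTEGER, pos INTEGER, base TEXT);
""")
con.executemany("INSERT INTO reference VALUES (?, ?)",
                enumerate("ACGTAC"))          # reference sequence, one row per base
con.executemany("INSERT INTO sample_base VALUES (?, ?, ?)",
                [(1, 0, "A"), (1, 1, "C"), (1, 2, "T"),   # read 1: mismatch at pos 2
                 (2, 2, "T"), (2, 3, "T"), (2, 4, "A")])  # read 2: mismatch at pos 2

# Declarative "variant detection": positions where a sampled base
# disagrees with the reference base.
variants = con.execute("""
    SELECT DISTINCT s.pos, r.base AS ref, s.base AS alt
    FROM sample_base s JOIN reference r ON s.pos = r.pos
    WHERE s.base <> r.base
    ORDER BY s.pos
""").fetchall()
```

In a column-oriented main-memory DBMS, the per-column layout of such base-level tables compresses well and keeps scans like the join above fast.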

Selected Publications

GPU-accelerated Join-Order Optimization

Project Members: Andreas Meister, Sebastian Breß, Gunter Saake

Description

Different join orders can lead to execution times that differ by several orders of magnitude, which makes join-order optimization one of the most critical optimizations within a DBMS. At the same time, join-order optimization is an NP-hard problem, so computing an optimal join order is highly compute-intensive. Because current hardware architectures use highly specialized and parallel processors, the sequential join-order optimization algorithms proposed in the past cannot fully utilize the computational power of current hardware. Although existing approaches such as dynamic programming can benefit from parallel execution, there are no approaches for join-order optimization on highly parallel co-processors such as GPUs.

In this project, we are building a GPU-accelerated join-order optimizer by adapting existing join-order optimization approaches. We are interested in the effects of GPUs on join-order optimization itself as well as the consequences for query processing. For GPU-accelerated DBMSs such as CoGaDB, which use GPUs for query processing, we need to identify efficient scheduling strategies for query-processing and query-optimization tasks so that GPU-accelerated optimization does not slow down query processing on the GPU.
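A simplified sketch of the size-driven dynamic programming that such an optimizer starts from is shown below. The cost model (sum of intermediate cardinalities, one uniform join selectivity) and the unpruned enumeration are illustrative assumptions; a real optimizer prunes by the join graph, and a GPU version would evaluate the independent splits of each subset in parallel:

```python
from itertools import combinations

# Size-driven dynamic programming for join ordering (DPsize-style sketch).
# Simplified: cost = sum of intermediate result cardinalities, any relation
# pair may join, one global join selectivity. The loop over splits of a
# subset is independent work per split -- the parallelism a GPU kernel
# could exploit.

def best_join_order(cardinalities, selectivity=0.1):
    rels = list(cardinalities)
    # best[subset] = (cost, cardinality, plan); single relations cost nothing
    best = {frozenset([r]): (0.0, cardinalities[r], r) for r in rels}
    for size in range(2, len(rels) + 1):
        for subset in combinations(rels, size):
            s = frozenset(subset)
            candidate = None
            for k in range(1, size):
                # each split is independent: a GPU would score them in parallel
                for left in combinations(subset, k):
                    l = frozenset(left)
                    r = s - l
                    lcost, lcard, lplan = best[l]
                    rcost, rcard, rplan = best[r]
                    card = lcard * rcard * selectivity
                    cost = lcost + rcost + card
                    if candidate is None or cost < candidate[0]:
                        candidate = (cost, card, (lplan, rplan))
            best[s] = candidate
    return best[frozenset(rels)]
```

On three relations with cardinalities 10, 1000, and 100, the cheapest plan joins the two small relations first, deferring the large one, which is exactly the behavior the cost model is meant to reward.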

Selected Publications