Download Redesigning Database Systems in Light of CPU Cache Prefetching PDF

TitleRedesigning Database Systems in Light of CPU Cache Prefetching
File Size1.5 MB
Total Pages227
Table of Contents
List of Figures
List of Tables
1 Introduction
	1.1 Can We Simply Adapt Memory-to-Disk Techniques?
	1.2 Cache Optimization Strategies
		1.2.1 Reducing the Number of Cache Misses
		1.2.2 Reducing the Impact of Cache Misses
	1.3 Our Approach: Redesigning Database Systems in Light of CPU Cache Prefetching
	1.4 Related Work
		1.4.1 Related Work on B+-Trees
		1.4.2 Related Work on Hash Joins
	1.5 Contributions
	1.6 Thesis Organization
2 Exploiting Cache Prefetching for Main Memory B+-Trees
	2.1 Introduction
		2.1.1 Previous Work on Improving the Cache Performance of Indices
		2.1.2 Our Approach: Prefetching B+-Trees
	2.2 Index Searches: Using Prefetching to Create Wider Nodes
		2.2.1 Modifications to the B+-Tree Algorithm
		2.2.2 Qualitative Analysis
	2.3 Index Scans: Prefetching Ahead Using Jump-Pointer Arrays
		2.3.1 Solving the Pointer-Chasing Problem
		2.3.2 Implementing Jump-Pointer Arrays to Support Efficient Updates
		2.3.3 Prefetching Algorithm
		2.3.4 Qualitative Analysis
		2.3.5 Internal Jump-Pointer Arrays
	2.4 Experimental Results
		2.4.1 Itanium 2 Machine Configuration
		2.4.2 Simulation Machine Model
		2.4.3 B+-Trees Studied and Implementation Details
		2.4.4 A Simple Cache Prefetching Experiment: Measuring Memory Bandwidth on the Itanium 2 Machine
		2.4.5 Search Performance
		2.4.6 Range Scan Performance
		2.4.7 Update Performance
		2.4.8 Operations on Mature Trees
		2.4.9 Sensitivity Analysis
		2.4.10 Cache Performance Breakdowns
		2.4.11 Impact of Larger Memory Latency
	2.5 Discussion and Related Work
	2.6 Chapter Summary
3 Optimizing Both Cache and Disk Performance for B+-Trees
	3.1 Introduction
		3.1.1 Our Approach: Fractal Prefetching B+-Trees
	3.2 Optimizing I/O Performance
		3.2.1 Searches: Prefetching and Node Sizes
		3.2.2 Range Scans: Prefetching via Jump-Pointer Arrays
	3.3 Optimizing CPU Cache Performance
		3.3.1 Why Traditional B+-Trees Suffer from Poor Cache Performance?
		3.3.2 Previous Approach to Improving B+-Tree Cache Performance
		3.3.3 Disk-First fpB+-Trees
		3.3.4 Cache-First fpB+-Trees
		3.3.5 Improving Range Scan Cache Performance
	3.4 Cache Performance through Simulations
		3.4.1 Experimental Setup
		3.4.2 Search Cache Performance through Simulations
		3.4.3 Insertion Cache Performance through Simulations
		3.4.4 Deletion Cache Performance through Simulations
		3.4.5 Range Scan Cache Performance through Simulations
		3.4.6 Mature Tree Cache Performance
		3.4.7 Results with Larger Key Size
	3.5 Cache Performance on an Itanium 2 Machine
		3.5.1 Experimental Setup
		3.5.2 Search Performance on Itanium 2
		3.5.3 Insertion Performance on Itanium 2
		3.5.4 Deletion Performance on Itanium 2
		3.5.5 Range Scan Performance on Itanium 2
		3.5.6 Operations on Mature Trees
		3.5.7 Comparing Simulation and Itanium 2 Results
	3.6 I/O Performance and Space Overhead
		3.6.1 Space Overhead
		3.6.2 Search Disk Performance
		3.6.3 Range Scan Disk Performance
		3.6.4 Range Scan Disk Performance on a Commercial DBMS
	3.7 Related Work
	3.8 Discussion
	3.9 Chapter Summary
4 Improving Hash Join Performance through Prefetching
	4.1 Introduction
		4.1.1 Hash Joins Suffer from CPU Cache Stalls
		4.1.2 Our Approach: Cache Prefetching
	4.2 Related Work
	4.3 Dependencies in the Join Phase
	4.4 Group Prefetching
		4.4.1 Group Prefetching for a Simplified Probing Algorithm
		4.4.2 Understanding Group Prefetching
		4.4.3 Critical Path Analysis for Group Prefetching
		4.4.4 Dealing with Complexities
	4.5 Software-Pipelined Prefetching
		4.5.1 Understanding Software-pipelined Prefetching
		4.5.2 Critical Path Analysis for Software-pipelined Prefetching
		4.5.3 Dealing with Complexities
		4.5.4 Group vs. Software-pipelined Prefetching
	4.6 Prefetching for the Partition Phase
	4.7 Experimental Results
		4.7.1 Experimental Setup
		4.7.2 Is Hash Join I/O-Bound or CPU-Bound?
		4.7.3 Join Phase Performance through Simulations
		4.7.4 Partition Phase Performance through Simulations
		4.7.5 Comparison with Cache Partitioning
		4.7.6 User Mode CPU Cache Performance on an Itanium 2 Machine
		4.7.7 Execution Times on the Itanium 2 Machine with Disk I/Os
	4.8 Chapter Summary
5 Inspector Joins
	5.1 Introduction
		5.1.1 Previous Cache-Friendly Approaches
		5.1.2 The Inspector Join Approach
	5.2 Related Work
	5.3 Inspector Joins: Overview
		5.3.1 Inspecting the Data: Multi-Filters
		5.3.2 Improving Locality for Stationary Tuples
		5.3.3 Exploiting Cache Prefetching
		5.3.4 Choosing the Best Join Phase Algorithm
	5.4 I/O Partition and Inspection Phase
		5.4.1 Bloom Filters: Background
		5.4.2 Memory Space Requirement
		5.4.3 Minimizing the Number of Cache Misses
		5.4.4 Partition and Inspection Phase Algorithm
	5.5 Cache-Stationary Join Phase
		5.5.1 Counting Sort
		5.5.2 Exploiting Prefetching in the Join Step
	5.6 Experimental Results
		5.6.1 Experimental Setup
		5.6.2 Varying the Number of CPUs
		5.6.3 Varying Other Parameters
		5.6.4 Robustness of the Algorithms
		5.6.5 Choosing the Best Join Phase Algorithm
	5.7 Chapter Summary
6 Conclusions

Similer Documents