At 9 a.m. on August 12, 2017, our research group hosted Dr. Song Jiang, who gave a talk on hardware storage.
Professor Li with Dr. Jiang
Group photo
Talk abstract:
Data management systems in large-scale data centers are designed for high performance, scalability, and reliability. They play important roles in supporting Internet-wide data-centric computing. An important design principle critical to their success is to design according to workload characteristics: the general-purpose, one-size-fits-all approach once used in small-scale systems is no longer cost-effective. Examples of modern, carefully engineered systems include the Google File System (GFS), Facebook’s Haystack photo storage, and Baidu’s Atlas cloud storage system.
In this talk we will describe how rigorous workload characterization is used to design and implement a key-value (KV) system for large-scale data centers. In collaboration with Facebook, our team collected week-long KV access traces from Facebook’s production Memcached system and systematically characterized the relevant workload characteristics. This study revealed several distinct access patterns with significant implications for KV system design, for example: (1) very small KV items are widespread; (2) accesses in the KV cache are highly skewed toward a small set of hot keys; and (3) access traffic can be highly dynamic, with request volume varying by a factor of two.
Using our understanding of real-world workloads, we designed and implemented the high-performance, resource-efficient zExpander KV cache and the LSM-trie KV store. We will detail how the designs of the two systems were motivated by an understanding of their targeted workloads. Evaluation results show substantially, and sometimes dramatically, improved performance over other state-of-the-art systems. As one example, LSM-trie can improve the read and write throughput of Google’s LevelDB by up to 10 and 20 times, respectively.
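The three workload properties listed in the abstract (small items, skew toward hot keys, and traffic dynamics) can each be measured from an access trace. The following is a minimal Python sketch, assuming a hypothetical trace format of (timestamp, key, value size) records. The thresholds (a 100-byte cutoff for "small" items, the top 1% of distinct keys as "hot", and 60-second windows) are illustrative choices, not values taken from the Facebook study.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class Access(NamedTuple):
    # Hypothetical trace record; real trace formats will differ.
    timestamp: float  # seconds since trace start
    key: str
    value_size: int   # bytes


def characterize(
    trace: Iterable[Access],
    small_threshold: int = 100,   # illustrative "small item" cutoff in bytes
    hot_fraction: float = 0.01,   # illustrative share of keys treated as "hot"
    window: float = 60.0,         # illustrative window length in seconds
) -> Tuple[float, float, float]:
    trace = list(trace)
    if not trace:
        raise ValueError("empty trace")
    n = len(trace)

    # (1) Share of requests that touch small items.
    small_share = sum(1 for a in trace if a.value_size < small_threshold) / n

    # (2) Share of requests that go to the hottest hot_fraction of distinct keys.
    counts = Counter(a.key for a in trace)
    n_hot = max(1, int(len(counts) * hot_fraction))
    hot_share = sum(c for _, c in counts.most_common(n_hot)) / n

    # (3) Peak-to-trough ratio of request counts per time window.
    # Only windows that contain at least one request are counted.
    per_window = Counter(int(a.timestamp // window) for a in trace)
    peak_to_trough = max(per_window.values()) / min(per_window.values())

    return small_share, hot_share, peak_to_trough
```

For example, a tiny trace with three small-item requests in the first minute and one in the second gives `characterize(...) == (0.75, 0.75, 3.0)`. A real study would use far larger traces and would likely examine full distributions (item-size CDFs, key-popularity curves, traffic over time) rather than single summary numbers.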
About Dr. Jiang:
Dr. Song Jiang is currently an associate professor in the CSE department at the University of Texas at Arlington. His research interests include system infrastructure for big data processing, such as file and storage systems and data management systems, as well as I/O systems for high-performance computing. He received a 2009 US National Science Foundation (NSF) CAREER award, and his research has been continuously supported by the NSF. He has served on many conference program committees and proposal review panels. He has collaborated on projects at Facebook and Baidu aimed at providing high-quality, Internet-wide services based on big data, resulting in many significant publications at top-tier conferences. Dr. Jiang’s research has had substantial impact in industry: several of his proposed algorithms for memory and storage management have been officially adopted into mainstream systems, including the Linux kernel, the NetBSD kernel, and the storage engine of MySQL.