Publications

You can also find my articles on my ORCiD and Google Scholar profiles.

IOCost: Block IO Control for Containers in Datacenters

Published in Architectural Support for Programming Languages and Operating Systems (ASPLOS '22), 2022

Resource isolation is a fundamental requirement in datacenter environments. However, our production experience in Meta’s large-scale datacenters shows that existing IO control mechanisms for block storage are inadequate in containerized environments. IO control needs to provide proportional resources to containers while taking into account the hardware heterogeneity of storage devices and the idiosyncrasies of the workloads deployed in datacenters. The speed of modern SSDs requires IO control to execute with low overhead. Furthermore, IO control should strive for work conservation, take into account the interactions with the memory management subsystem, and avoid priority inversions that lead to isolation failures. To address these challenges, this paper presents IOCost, an IO control solution that is designed for containerized environments and provides scalable, work-conserving, and low-overhead IO control for heterogeneous storage devices and diverse workloads in datacenters. IOCost performs offline profiling to build a device model and uses it to estimate the device occupancy of each IO request. To minimize runtime overhead, it separates IO control into a fast per-IO issue path and a slower periodic planning path. A novel work-conserving budget donation algorithm enables containers to dynamically share unused budget. We have deployed IOCost across the entirety of Meta’s datacenters, comprising millions of machines, upstreamed IOCost to the Linux kernel, and open-sourced our device-profiling tools. IOCost has been running in production for two years, providing IO control for Meta’s fleet. We describe the design of IOCost and share our experience deploying it at scale.
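
The two-path split is the heart of the design: the per-IO fast path only debits a precomputed budget, while the heavier proportional accounting runs periodically. Below is a minimal Python sketch of that idea, assuming a simple linear device model; the real IOCost is C code in the Linux kernel, and names like DeviceModel, issue, and plan are illustrative, not IOCost's actual interfaces.

```python
# Illustrative sketch only -- not the Linux kernel implementation.
# DeviceModel, IO, Container, issue, and plan are hypothetical names.
from dataclasses import dataclass

@dataclass(frozen=True)
class IO:
    nbytes: int
    random: bool  # random vs. sequential access pattern

class DeviceModel:
    """Linear cost model built by offline profiling of one device."""
    def __init__(self, seq_cost_per_byte: float, rand_cost_per_io: float):
        self.seq_cost_per_byte = seq_cost_per_byte
        self.rand_cost_per_io = rand_cost_per_io

    def cost(self, io: IO) -> float:
        # Estimated fraction of device time ("occupancy") the request consumes.
        base = self.rand_cost_per_io if io.random else 0.0
        return base + io.nbytes * self.seq_cost_per_byte

class Container:
    def __init__(self, weight: float):
        self.weight = weight  # proportional share of the device
        self.budget = 0.0     # device-time budget for the current period
        self.used = 0.0

def issue(container: Container, io: IO, model: DeviceModel, delay_queue: list) -> bool:
    """Fast path: runs on every IO, so it only debits the budget."""
    c = model.cost(io)
    if container.used + c <= container.budget:
        container.used += c
        return True                          # dispatch immediately
    delay_queue.append((container, io))      # throttle until replanning
    return False

def plan(containers: list, period_capacity: float = 1.0) -> None:
    """Slow path: periodically assigns weight-proportional budgets and
    donates budget that idle containers left unused (a crude stand-in
    for IOCost's donation algorithm)."""
    total_w = sum(c.weight for c in containers)
    base = {c: period_capacity * c.weight / total_w for c in containers}
    donors = [c for c in containers if c.used < 0.5 * base[c]]
    takers = [c for c in containers if c not in donors]
    spare = sum(base[c] - c.used for c in donors)
    for c in containers:
        c.budget = base[c]
    if takers and spare > 0:
        tw = sum(c.weight for c in takers)
        for c in takers:
            c.budget += spare * c.weight / tw  # work conservation
    for c in containers:
        c.used = 0.0
```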

Recommended citation: Tejun Heo, Dan Schatzberg, Andrew Newell, Song Liu, Saravanan Dhakshinamurthy, Iyswarya Narayanan, Josef Bacik, Chris Mason, Chunqiang Tang, and Dimitrios Skarlatos. 2022. IOCost: Block IO Control for Containers in Datacenters. In Proceedings of ACM ASPLOS '22, 595–608. https://doi.org/10.1145/3503222.3507727

CASH: A Credit Aware Scheduling for Public Cloud Platforms

Published in 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2021

Distributed data processing frameworks such as Hadoop, Tez, Spark, and Flink are extensively used by public cloud tenants for executing large-scale data analytics applications in domains including content management, finance, and healthcare. These frameworks slice a job into a number of smaller tasks, which are then executed by a job scheduler on a multi-node compute cluster. When making scheduling decisions, the state-of-the-art schedulers employed in these frameworks assume that hardware resources such as CPU, disk I/O, and network I/O offer a fixed service rate. However, in a public cloud environment, many of these resources have burstable service rates. More specifically, the resources offer a guaranteed baseline service rate with an option to burst above that baseline by expending accumulated burst credits. Unaware of this underlying hardware burstability, schedulers tend to make sub-optimal task placement decisions, adversely affecting job completion times and leading to higher deployment costs. In this paper, we propose CASH, a burst-credit-aware scheduler that is cognizant of the burst credits associated with the individual hardware resources in a public cloud cluster. Through coarse-grained task annotations depicting the burst-credit demand of individual tasks and by dynamically monitoring the credits of the underlying resources, CASH makes optimal task placement decisions. We prototype CASH on YARN, Hadoop, and Tez, and extensively evaluate it using both batch and streaming workloads. Our experimental results show that CPU-credit-based instances, like AWS T3, are a viable, cost-effective alternative to self-managed offerings like Amazon EMR for running large-scale batch workloads. Furthermore, we demonstrate that CASH can accelerate streaming SQL queries on a large Hive database by up to 39.4%, leading to public cloud cost savings of up to 22%.
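
As a rough illustration of the placement idea, here is a hedged Python sketch: a task carries a coarse-grained annotation of its burst-credit demand, and the scheduler matches it against each node's monitored credit balance. CASH itself is implemented inside YARN's scheduler; the Task, Node, and pick_node names below are hypothetical.

```python
# Hypothetical sketch of credit-aware placement in the spirit of CASH;
# the real system lives inside YARN's scheduler.
from dataclasses import dataclass

@dataclass
class Task:
    cpu_credit_demand: float  # coarse-grained annotation of burst demand

@dataclass
class Node:
    name: str
    cpu_credits: float        # monitored burst-credit balance (e.g. on AWS T3)
    baseline_rate: float      # guaranteed baseline service rate

def pick_node(task: Task, nodes: list) -> Node:
    """Place the task on a node whose accumulated credits cover its
    annotated demand; among those, pick the tightest fit so large
    credit reserves stay available for more demanding tasks. If no
    node qualifies, fall back to the highest baseline rate."""
    eligible = [n for n in nodes if n.cpu_credits >= task.cpu_credit_demand]
    if eligible:
        return min(eligible, key=lambda n: n.cpu_credits)
    return max(nodes, key=lambda n: n.baseline_rate)
```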

Recommended citation: Sharma, A., Dhakshinamurthy, S., Kesidis, G., and Das, C. R. 2021. CASH: A Credit Aware Scheduling for Public Cloud Platforms. In 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 227–236. IEEE. https://doi.org/10.1109/CCGrid51090.2021.00032

MochiDB: A Byzantine Fault Tolerant Datastore

Published in CS244b: Distributed Systems, Autumn 2017, Stanford University, 2017

In this paper we present MochiDB, a consistent, high-volume, distributed datastore that is Byzantine fault tolerant. MochiDB supports native transactions and uses a BFT quorum protocol with random write seeds that requires only two round trips for writes and one for reads, giving it low latency in WAN deployments. The paper focuses on engineering solutions that minimize the cost of contention resolution, sharding, dynamic configuration changes, garbage collection, and more.
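
To illustrate why a one-round-trip read can be safe in this setting, here is a small Python sketch of a standard BFT quorum read; it follows the textbook f + 1 matching-replies argument rather than MochiDB's exact protocol, and the replica interface is hypothetical.

```python
# Sketch of a one-round-trip BFT quorum read; not MochiDB's actual code.
# With n = 3f + 1 replicas, a value reported identically by f + 1 of
# them cannot have been fabricated by the f faulty replicas alone.
from collections import Counter

def quorum_read(replicas: list, key: str, f: int):
    """Ask every replica once and accept the (value, timestamp) pair
    reported by at least f + 1 of them. Responses are assumed to be
    authenticated (e.g. signed) by each replica."""
    replies = [r.read(key) for r in replicas]   # single round trip
    (value, _ts), votes = Counter(replies).most_common(1)[0]
    if votes >= f + 1:
        return value
    raise RuntimeError("no value with f + 1 matching replies; fall back to repair")
```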

Recommended citation: Tsaturyan, T., and Dhakshinamurthy, S. 2017. MochiDB: A Byzantine Fault Tolerant Datastore. https://www.scs.stanford.edu/17au-cs244b/labs/projects/tsaturyan_dhakshinamurthy.pdf