CASH: A Credit Aware Scheduling for Public Cloud Platforms

Published in 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2021

Distributed data processing frameworks such as Hadoop, Tez, Spark, and Flink are exclusively used by public cloud tenants for executing large scale data analytics applications in various domains including but not limited to content management, financial sector, healthcare etc. These frameworks slice a job into a number of smaller tasks, which are then executed by a job scheduler on a multi-node compute cluster. While making scheduling decisions, the State-of-art schedulers employed in these frameworks assume hardware resources such as CPU, disk I/O and network I/O to offer a fixed service rate. However, in a public cloud environment, many of these resources are associated with burstable service rates. More specifically, the resources offer a guaranteed baseline service rate with an option to burst above their baseline rate by expending accumulated burst credits. Being unaware about this underlying hardware burstability, schedulers tend to make sub-optimal task placement decisions, thereby adversely affecting the job completion times, leading to higher deployment costs.In this paper, we propose CASH, a burst credit aware scheduler, which is cognizant about the burst credits associated with the individual hardware resources in the public cloud cluster. Through coarse grained task annotations depicting the burst credit demand of individual tasks and dynamically monitoring the credits for the underlying resources, CASH performs optimal task placement decisions. We prototype CASH on YARN, Hadoop, and Tez, and extensively evaluate it using both batch and streaming workloads. Our experimental results with CASH show CPU-credit based instances, like AWS T3, are a viable cost effective alternative when compared to self-managed offerings like Amazon EMR, for running large scale batch workloads. Furthermore, we demonstrate that CASH can accelerate streaming SQL queries on a large Hive database by up to 39.4% , leading to public cloud cost savings by up to 22%.

Recommended citation: Sharma, A., Dhakshinamurthy, S., Kesidis, G., Das, C.R.: CASH: a credit aware scheduling for public cloud platforms. In: 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pp. 227–236. IEEE (2021) https://doi.org/10.1109/CCGrid51090.2021.00032