NIMBLE Task Scheduling for Serverless Analytics

Name: NIMBLE Task Scheduling for Serverless Analytics
Start: 2021-03-19T13:00:00Z
End: 2021-03-19T15:00:00Z
Location: Online

Abstract

Serverless platforms facilitate transparent resource elasticity and fine-grained billing, making them an attractive choice for data analytics. We find that while server-centric analytics frameworks typically optimize for job completion time (JCT), resource utilization, and isolation via inter-job scheduling policies, serverless analytics instead requires optimizing for JCT and cost of execution, which introduces a new scheduling problem. We present Caerus, a task scheduler for serverless analytics frameworks that employs a fine-grained NIMBLE scheduling algorithm to solve this problem. NIMBLE efficiently pipelines task executions within a job, minimizing execution cost while being Pareto-optimal between cost and JCT for arbitrary analytics jobs. To this end, NIMBLE models a wide range of execution parameters (pipelineable and non-pipelineable data dependencies; data generation, consumption, and processing rates; etc.) to determine the ideal task launch times. Our evaluation results show that in practice, Caerus is able to achieve both optimal cost and JCT for queries across a wide range of analytics workloads.
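
To make the idea of an "ideal task launch time" concrete, here is a minimal sketch for a single producer/consumer pair of tasks. It is an illustrative toy model, not the NIMBLE algorithm: the function name, its parameters, and the assumption of constant data rates are all hypothetical. Under those assumptions, the consumer should be launched as late as possible without delaying its own finish time, so that it is never billed for time spent waiting on input.

```python
def consumer_launch_time(upstream_start: float,
                         data_size: float,
                         produce_rate: float,
                         consume_rate: float,
                         pipelineable: bool) -> float:
    """Return a launch time (seconds) for a downstream task.

    Toy model (assumption): the upstream task emits `data_size` units at a
    constant `produce_rate`, and the downstream task can process input at a
    constant `consume_rate`. Both rates must be positive.
    """
    upstream_end = upstream_start + data_size / produce_rate

    # Non-pipelineable dependency: the consumer needs the full output,
    # so launching before the producer finishes only adds billed idle time.
    if not pipelineable:
        return upstream_end

    # Consumer is the bottleneck: any delay directly delays job completion,
    # and the consumer never waits for data, so launch immediately.
    if consume_rate <= produce_rate:
        return upstream_start

    # Consumer is faster than the producer: it cannot finish before
    # `upstream_end`, so start it just early enough to finish at that moment.
    return upstream_end - data_size / consume_rate


if __name__ == "__main__":
    # Producer: 100 units at 10 units/s, starting at t = 0.
    print(consumer_launch_time(0.0, 100.0, 10.0, 50.0, pipelineable=True))
    print(consumer_launch_time(0.0, 100.0, 10.0, 5.0, pipelineable=True))
    print(consumer_launch_time(0.0, 100.0, 10.0, 50.0, pipelineable=False))
```

In the first case of this toy model, launching the fast consumer at t = 8 instead of t = 0 leaves the finish time unchanged at t = 10 while cutting its billed duration from 10 seconds to 2, which is the kind of cost/JCT trade-off the abstract describes.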

Date

Mar 19, 2021 1:00 PM — 3:00 PM

Event

Yale CSL group meeting

Location

Online

Yupeng Tang

Research Scientist @ Meta