ClavisMO: a Multi-Objective Virtualized Scheduling Framework for Distributed Computing Clusters

ClavisMO addresses the problem of dynamic multi-objective scheduling in distributed computing clusters. The framework automatically detects and mitigates contention effects on cluster compute nodes, which shortens job completion times while minimizing interconnect traffic and cluster energy consumption.

We address the problem both at the node level, via local OS scheduling, and at the cluster level, by finding a globally optimal solution and then enforcing it across the nodes through process migration. ClavisMO is built on top of a modern cluster management system, uses standard open source software, and is released under the Academic Free License (AFL).
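To make the cluster-level idea more concrete, here is a toy sketch of choosing a job placement under a weighted combination of the three objectives named above (contention, interconnect traffic, energy). It is not ClavisMO's solver: the job table, node names, cost terms, weights, and the functions `cost` and `best_placement` are all hypothetical, and exhaustive search stands in for a real optimization method.

```python
from itertools import product

# Hypothetical toy inputs: names and values are illustrative only,
# not taken from ClavisMO.
JOBS = {
    "a": {"memory_intensive": True, "talks_to": "b"},
    "b": {"memory_intensive": True, "talks_to": "a"},
    "c": {"memory_intensive": False, "talks_to": None},
}
NODES = ["n0", "n1"]


def cost(placement, weights):
    """Weighted sum of three toy objectives for one job-to-node placement."""
    names = list(placement)
    # Contention: pairs of memory-intensive jobs that share a node.
    contention = sum(
        1
        for i, x in enumerate(names)
        for y in names[i + 1:]
        if placement[x] == placement[y]
        and JOBS[x]["memory_intensive"]
        and JOBS[y]["memory_intensive"]
    )
    # Traffic: communicating job pairs split across nodes (each pair counted once).
    traffic = sum(
        1
        for x in names
        if JOBS[x]["talks_to"] is not None
        and x < JOBS[x]["talks_to"]
        and placement[x] != placement[JOBS[x]["talks_to"]]
    )
    # Energy: number of nodes that must stay powered on.
    energy = len(set(placement.values()))
    w_contention, w_traffic, w_energy = weights
    return w_contention * contention + w_traffic * traffic + w_energy * energy


def best_placement(weights):
    """Exhaustively search every placement and return the cheapest one."""
    candidates = (
        dict(zip(JOBS, combo)) for combo in product(NODES, repeat=len(JOBS))
    )
    return min(candidates, key=lambda p: cost(p, weights))


if __name__ == "__main__":
    print(best_placement((0, 0, 1)))   # energy-only weighting
    print(best_placement((10, 1, 1)))  # contention-dominated weighting
```

Changing the weight tuple shifts the trade-off: an energy-only weighting packs all jobs onto one node, while a contention-heavy weighting separates the two memory-intensive jobs at the cost of extra traffic and one more active node.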

Here are some of the results showing how ClavisMO works. Our flexible approach lets you seamlessly enforce arbitrary trade-offs based on your preferences:

for a 600-node Hadoop cluster of Facebook (1 day, 6637 Hadoop jobs):

News

1 November 2015: Please join the ClavisMO team at SuperComputing 2015! We will present our paper on multi-objective optimization as part of the Technical Program.
15 August 2013: Please join the ClavisMO team at SuperComputing 2013! We will present our work during the electronic demo session and the Doctoral Dissertation Showcase.
22 March 2013: The source code of our Choco-based solver is now available at the Git repository.
17 November 2012: Please join the ClavisMO team at SuperComputing 2012! We will present our paper on estimating performance degradation with data mining techniques as part of the Technical Program.
10 October 2012: Our poster appeared at OSDI.
15 September 2012: The project has been approved to access the Grid5000 datacenters.
14 July 2012: A short paper that describes our contention-aware virtualized HPC framework appeared in HPCS 2012.
8 June 2012: The white paper and deployment scripts have been released.
12 February 2012: The project has been approved to access the FutureGrid datacenters.
2 February 2011: The project page with the initial proposal and overall description of the idea has been set up.

Download

The source code is available at our Git repository.

Script sets for automated deployment of our contention-aware HPC/MapReduce framework on the FutureGrid and Grid5000 hardware facilities are available for download.

Documentation

Please refer to the READMEs included with the distribution.

The white paper that describes the framework can be obtained from here.

Selected publications that use ClavisMO

Please refer to the videos that show a few simple examples of how our solution works (the traces for the videos are here).
The videos were created with our custom-made iPad application, which visualizes ClavisMO's work on the fly! We are currently porting the source to the new iOS 7 framework and will release it soon!

Also check out Clavis: a user-level scheduler for multicore systems used by our framework.

Contact Info

For all questions about the project, please contact us directly:

Sergey Blagodurov (email: sergey_blagodurov@sfu.ca) PhD Candidate, School of Computing Science, SFU.

Alexandra Fedorova (email: sasha@ece.ubc.ca) Associate Professor, School of Computing Science, SFU.