Important update on 09/14/2013. I implemented the flow-based scheduler in this paper on Hadoop -- H-Quincy. For a much more detailed discussion, please visit this link.
How to achieve fair scheduling? In other words, a guy who gets up early and performs huge tasks on a cluster should not always monopolize most computing resource, and someone else's assignments should not be ignored. Otherwise, it is unfair for all the cluster users.
Fair sharing of the cluster resources
Job x takes t seconds, when running exclusively on the cluster. When the cluster has J jobs, x should take <= Jt seconds.
N computers and J jobs: each job gets at least N/J computers.
Traditionally, MPI Model, tasks are in a pipeline and then assigned to a part of cluster.
Dryad MapReduce Model
Fine-grain sharing model
Jobs uses N/J computers at a time but set in use varies over lifetime.
The solution is intended for data-intensive computing with locality. There is no SAN. However, data locality conflicts with fairness. So they present Quincy: a new, graph-based framework for cluster scheduling under a fine grain cluster resource-sharing model with locality constrains. 2 basic ideas:
Architecture: A core switch (CS) manages rack switches (RC). A rack switch (RC) manages computers (C). For example, C1, C2 and C3 in a rack have their own queues, and share a same rack queue. R1 R2 managed by the same core switch have their own queues above, and share a same core queue. Every time a task X is finished, X will be deleted from all the queues, no matter what hierarchy it is in.
So how to get fairness?
Simplify a scheduling problem to a matching problem.
How to minimize matching cost while still maintaining fairness?
Min-cost network flow.
There are U (unscheduled nodes), X (cluster aggregator nodes), R (rack aggregator nodes), C (computing nodes). In addition to queue-based scheduling, the edges connecting tasks to these nodes have weights showing the cost of the matching. The capacities on the outgoing edge of job j's unscheduled node Uj control the number of running tasks that the job will be allocated.
The authors advance a new fair schedule modeling for Dryad/MapReduce/Hadoop by min-cost network flow, achieving much better performance and effectiveness than traditional ways.