Tutorial: Build Your First Model

Overview

Rebalancer is a library for optimizing assignment problems, where the goal is to find an optimal assignment of objects to containers subject to different constraints and objectives.

Examples of assignment problems are:

Placing server racks into datacenters
Distributing jobs across machines
Routing user traffic to web clusters
Classic puzzles such as Sudoku and the eight queens problem

In this tutorial we'll build a solver for a small but realistic problem: placing tasks onto hosts in a compute cluster.

You don't need any prior context about Rebalancer to get started. We'll introduce the key concepts as we build the example.

What We're Going to Build

Our example has 12 tasks that start out crammed onto a single host, plus 4 hosts to spread them across. We'll build a model that places the tasks while considering the following features:

Balance memory: even out memory utilization across the hosts.
Drain a host: keep host0 empty, because it's about to go down for maintenance.
Spread for safety: keep the two tasks of each job in different racks, so one rack failure can't take a whole job down.

While a production model would have more moving parts (multiple resources like CPU and disk, or multiple failure domains like region and datacenter), the ideas are the same.

All the code below comes from a single runnable file, linked in the Full example section.

Defining the Problem

First we create a ProblemSolver. It holds the model, which we build up with the methods below, and then computes a solution. We use ProblemSolverFactory, which sets up a default thread pool for us; we just give it a service name and scope (used for logging).

// Create the solver object. The factory gives us one with a ready-to-use
// thread pool, so we only pass a name and scope (used for logging).
auto solver = interface::ProblemSolverFactory::makeProblemSolver(
    /*serviceName=*/"rebalancer", /*serviceScope=*/"examples");

With the solver created, every model needs three basic things:

Object name: a short name for the things we place. Here an object is a task.
Container name: a short name for where they go. Here a container is a host.
Initial assignment: which objects start in which containers. This serves two purposes: it tells Rebalancer the full set of tasks and hosts to work with, and it records where everything starts, which some objectives compare against---for example, "move as few tasks as possible".

In our example we have 12 tasks and 4 hosts, and all tasks start on host0.

// Tell the solver what we place (tasks) and where they go (hosts).
solver->setObjectName("task");
solver->setContainerName("host");

// Start with all 12 tasks on host0 and the other hosts empty. This also
// tells the solver the full list of tasks and hosts to work with.
std::map<std::string, std::vector<std::string>> hostToTasks;
for (const auto i : folly::irange(12)) {
  hostToTasks["host0"].push_back(fmt::format("task{}", i));
}
hostToTasks["host1"] = {};
hostToTasks["host2"] = {};
hostToTasks["host3"] = {};
solver->setAssignment(hostToTasks);

The names are just strings; you can name tasks and hosts however you like. The task0, host0, ... pattern is only a convention we picked to keep the example easy to follow.

Dimensions

To balance memory, we first have to tell Rebalancer how much memory things use. A dimension is a named value attached to objects and containers that represents some real-world attribute like memory, CPU, or disk---in this case, memory.

First, how much memory each task needs. To keep things simple, we will encode that every task needs 1 GB:

// How much memory each task needs, in GB. Every task here needs 1 GB.
std::map<std::string, double> taskToMemory;
for (const auto i : folly::irange(12)) {
  taskToMemory[fmt::format("task{}", i)] = 1.0;
}
solver->addObjectDimension("memory", taskToMemory);

Next, how much memory each host has. Each host has 10 GB, except host1 which has 20 GB:

// How much memory each host has, in GB. host1 is twice as big as the rest.
const std::map<std::string, double> hostToMemory = {
    {"host0", 10.0}, {"host1", 20.0}, {"host2", 10.0}, {"host3", 10.0}};
solver->addContainerDimension("memory", hostToMemory);

You can define as many dimensions as you need (CPU, disk, and so on) and use them in the goals and constraints that follow.

Goals

A goal (also called an objective) describes something to optimize: the closer the solution gets to it, the better. A model can have several goals.

Let's add a goal that balances memory use across hosts. For this, we use a BalanceSpec goal on the memory dimension we just defined:

// Goal: balance memory utilization across hosts.
interface::BalanceSpec balance;
balance.name() = "balance-hosts";
balance.scope() = "host";
balance.dimension() = "memory";
balance.formula() = interface::BalanceSpecFormula::LINEAR;
solver->addGoal(balance, /*weight=*/1.0, /*tuplePos=*/0);

A BalanceSpec goal pushes every host toward the same relative utilization---memory used divided by memory available. Here we use the default LINEAR formula; you can read more about BalanceSpec and the formulas it supports in the BalanceSpec reference.

addGoal() also takes two optional arguments. The weight (second argument, default 1) sets a goal's influence when several goals are combined into a weighted sum. The tuple position (third argument) puts a goal in its own priority tier: the solver fully optimizes a higher tier before trading anything away for a lower one. We use the defaults here.

Scopes

Real-world hosts are usually grouped---for example into racks, where one rack holds several hosts. A scope groups containers together so you can write goals and constraints at that level.

Here we list the hosts that belong to each rack: rack0 holds host0 and host1, and rack1 holds the other two:

// Group hosts into racks: rack0 holds host0 and host1, rack1 holds the rest.
const std::map<std::string, std::vector<std::string>> rackToHosts = {
    {"rack0", {"host0", "host1"}}, {"rack1", {"host2", "host3"}}};
solver->addScope("rack", rackToHosts);

A container is really the smallest possible scope---that's why we could use host directly as the scope of the BalanceSpec above.

Partitions

Just as scopes group containers, partitions group objects. (Easy way to remember it: scopes are for containers, partitions are for objects.)

Let's group tasks into jobs of two tasks each---task0 and task1 form job0, task2 and task3 form job1, and so on. We'll use these jobs in the fault-tolerance constraint in the next section:

// Group tasks into jobs of two tasks each: job0 = task0, task1; and so on.
std::map<std::string, std::vector<std::string>> jobToTasks;
for (const auto i : folly::irange(12)) {
  jobToTasks[fmt::format("job{}", i / 2)].push_back(fmt::format("task{}", i));
}
solver->addPartition("job", jobToTasks);

Constraints

A constraint is a hard requirement that a valid solution must satisfy. Unlike a goal, which is best-effort, a constraint must always hold.

First, host0 is going down for maintenance, so we need it empty. A ToFreeSpec constraint says host0 must hold no tasks in a valid solution:

// Constraint: host0 is going down for maintenance, so it must end up empty.
interface::ToFreeSpec toFree;
toFree.name() = "make-host0-empty";
toFree.containers() = {"host0"};
solver->addConstraint(toFree);

Second, we want each job to survive a rack failure: if a rack loses power, every host in it goes offline at once, so we keep a job's two tasks in different racks. Using the job partition and the rack scope, a group count constraint with a limit of 1 says: at most one task of any job per rack.

// Constraint: keep a job's two tasks in different racks, so losing one rack
// can't take down a whole job. (At most 1 task per job in any rack.)
interface::GroupCountSpec groupCount;
groupCount.name() = "job-rack-fault-tolerance";
groupCount.scope() = "rack";
groupCount.partitionName() = "job";
groupCount.bound() = interface::GroupCountSpecBound::MAX;
interface::Limit limit;
limit.type() = interface::LimitType::ABSOLUTE;
limit.globalLimit() = 1;
groupCount.limit() = std::move(limit);
solver->addConstraint(std::move(groupCount));

Solving the Problem

The model is complete. Now we choose a solver to compute the placement. Rebalancer offers a few strategies; for this example we use the local search solver, a good default that scales to large problems. We run it and print where each task ended up:

// Pick the local-search solver, then solve.
interface::LocalSearchSolverSpec localSearch;
localSearch.moveTypeList()->push_back(
    interface::ProblemSolver::makeMoveTypeSpec(
        interface::SingleMoveTypeSpec()));
solver->addSolver(localSearch);

const auto solution = solver->solve();

// solution.assignment() tells us, for each task, the host it landed on.
// Group the placed tasks by host.
std::map<std::string, std::vector<std::string>> hostToFinalTasks;
for (const auto& [task, host] : *solution.assignment()) {
  hostToFinalTasks[host].push_back(task);
}

// Look up each task's job so we can show it next to the task.
std::map<std::string, std::string> taskToJob;
for (const auto& [job, tasks] : jobToTasks) {
  for (const auto& task : tasks) {
    taskToJob[task] = job;
  }
}

// Reverse rackToHosts so we can look up each host's rack when printing.
std::map<std::string, std::string> hostToRack;
for (const auto& [rack, hosts] : rackToHosts) {
  for (const auto& host : hosts) {
    hostToRack[host] = rack;
  }
}

// Print each host with its rack, and each task with its job.
std::cout << "\n=== Final placement ===\n";
for (auto& [host, tasks] : hostToFinalTasks) {
  std::sort(tasks.begin(), tasks.end());
  std::cout << host << " (" << hostToRack.at(host) << "):";
  for (const auto& task : tasks) {
    std::cout << " " << task << "(" << taskToJob.at(task) << ")";
  }
  std::cout << "\n";
}

While solving, Rebalancer writes progress logs to standard error, so you can watch what it's doing. The final result here is printed below a header and looks like this (the exact placement may vary between runs):

=== Final placement ===
host1 (rack0): task1(job0) task10(job5) task2(job1) task4(job2) task6(job3) task9(job4)
host2 (rack1): task11(job5) task3(job1) task5(job2)
host3 (rack1): task0(job0) task7(job3) task8(job4)

Each line shows a host, its rack, and the job each task belongs to. This solution respects every constraint and reflects the objectives we set:

host0 is empty, so it doesn't appear.
host1 (rack0) holds twice as many tasks as the other hosts, so every host sits at about 30% memory utilization.
Each job's two tasks land in different racks, so no single rack failure can take down a whole job.

In general a perfect solution isn't always possible---one that perfectly balances every task, for instance. Rebalancer returns the best solution it can find given the set of objectives and constraints.

Build and Run

The default build produces only the Rebalancer library; the examples are gated behind a CMake option. Assuming you've already built Rebalancer from source (see Installation), re-run CMake with -DEXAMPLES=ON, then build and run this example's target:

# From the rebalancer/build/ directory
cmake -GNinja -DEXAMPLES=ON -DCMAKE_BUILD_TYPE=Debug ..
ninja tasks_on_hosts.exe
./tasks_on_hosts.exe

You should see the === Final placement === output shown above.

Full Example

The complete, runnable source for this tutorial is tasks_on_hosts.cpp.

Wrapping Up

Congratulations! You've built and solved your first model with Rebalancer. You've seen the core building blocks---objects and containers, dimensions, goals, constraints, scopes, and partitions---and you should be ready to start modeling your own problems.

Here are a few references as you explore more of Rebalancer:

Goals & Constraints: a list of all the specs available and their documentation.
Solvers: learn about the different solving strategies available and their trade-offs.
Core Concepts: a deeper look at objects, containers, dimensions, scopes, and partitions.
GitHub repository: source code, issues, and discussions.

Overview​

What We're Going to Build​

Defining the Problem​

Dimensions​

Goals​

Scopes​

Partitions​

Constraints​

Solving the Problem​

Build and Run​

Full Example​

Wrapping Up​