Performance Tuning

This guide covers performance optimization techniques for both Local Search and Optimal solvers.

Problem Size Recommendations

Objects	Containers	Solver	Expected Time	Tuning Priority
<100	<50	Optimal	Seconds	None needed
100-1K	50-500	Either	Seconds-Minutes	Solver choice
1K-10K	500-5K	Local Search	Seconds-Minutes	Goal/constraint complexity
10K-100K	5K-50K	Local Search	Minutes	Move limits, parallelization
>100K	>50K	Local Search	Minutes-Hours	All optimizations

Quick Wins

1. Choose the Right Solver

Impact: 10-1000x speedup

Python
C++

# Bad: Using Optimal for large problem
solver.add_solver(
        SolverSpec(
            optimalSolverSpec=OptimalSolverSpec()
        ))  # Will timeout/OOM

# Good: Use Local Search for 10K+ objects
solver.add_solver(
        SolverSpec(
            localSearchSolverSpec=LocalSearchSolverSpec(timeLimitMs=30000)
        ))

// Bad: Using Optimal for large problem
solver.addSolver(OptimalSolverSpec());  // Will timeout/OOM

// Good: Use Local Search for 10K+ objects
LocalSearchSolverSpec spec;
spec.timeLimitMs = 30000;
solver.addSolver(spec);

2. Set Appropriate Time Limits

Impact: Prevent wasted time

Python
C++

# Increase time limit for better quality
solver.addSolver(
    SolverSpecs(
        localSearchSolverSpec=LocalSearchSolverSpec(
            timeLimitMs=120000  # 2 minutes instead of 30 seconds
        )
    )
)

// Increase time limit for better quality
LocalSearchSolverSpec spec;
spec.moveTypeList()->push_back(
    ProblemSolver::makeMoveTypeSpec(SingleMoveTypeSpec()));
spec.solveTime() = 120; // 2 minutes instead of 30 seconds

solver.addSolver(spec);

3. Simplify When Possible

Impact: 2-10x speedup

Python
C++

# Bad: Too many low-priority goals
solver.add_goal(balance_cpu, weight=1.0)
solver.add_goal(balance_memory, weight=1.0)
solver.add_goal(balance_disk, weight=0.1)  # Low priority
solver.add_goal(balance_network, weight=0.05)  # Very low priority

# Good: Focus on what matters
solver.add_goal(balance_cpu, weight=1.0)
solver.add_goal(balance_memory, weight=1.0)
# Skip low-priority goals for faster solving

// Bad: Too many low-priority goals
solver.addGoal(balance_cpu, 1.0);
solver.addGoal(balance_memory, 1.0);
solver.addGoal(balance_disk, 0.1);  // Low priority
solver.addGoal(balance_network, 0.05);  // Very low priority

// Good: Focus on what matters
solver.addGoal(balance_cpu, 1.0);
solver.addGoal(balance_memory, 1.0);
// Skip low-priority goals for faster solving

Local Search Optimization

Move Type Selection

Different move types have different costs:

Move Type	Cost per Iteration	When to Use
SingleMove	O(objects × containers)	Always, fast
SwapMove	O(objects²)	Capacity constrained
TripleLoop	O(objects³)	Need escape local optima
ChainMoves	O(objects² × containers)	Dependencies

Optimization: Use only necessary move types

Iteration Limits

Balance quality vs speed:

Python
C++

# Fast but lower quality
solver.add_solver(
        SolverSpec(
            localSearchSolverSpec=LocalSearchSolverSpec(movesLimit=10000)
        ))

# Slower but better quality
solver.add_solver(
        SolverSpec(
            localSearchSolverSpec=LocalSearchSolverSpec(movesLimit=1000000)
        ))

# Let it run until local optimum (no limit)
solver.add_solver(
        SolverSpec(
            localSearchSolverSpec=LocalSearchSolverSpec(timeLimitMs=300000)
        ))  # Time limit only

// Fast but lower quality
LocalSearchSolverSpec spec1;
spec1.movesLimit = 10000;
solver.addSolver(spec1);

// Slower but better quality
LocalSearchSolverSpec spec2;
spec2.movesLimit = 1000000;
solver.addSolver(spec2);

// Let it run until local optimum (no limit)
LocalSearchSolverSpec spec3;
spec3.timeLimitMs = 300000;  // Time limit only
solver.addSolver(spec3);

Initial Assignment Quality

Impact: 2-5x speedup, better final quality

Python
C++

# Bad: Start with terrible assignment
initial = {"host0": all_objects, "host1": [], ...}  # Everything on one host

# Good: Start with reasonable distribution
initial = distribute_evenly(objects, containers)  # Pre-balance

solver.set_assignment(initial)

// Bad: Start with terrible assignment
std::map<std::string, std::vector<Object>> initial;
initial["host0"] = all_objects;  // Everything on one host
initial["host1"] = {};
// ...

// Good: Start with reasonable distribution
auto initial = distributeEvenly(objects, containers);  // Pre-balance

solver.setAssignment(initial);

Optimal Solver Optimization

MIP Gap Tolerance

Accept "good enough" solutions:

Python
C++

# Accept 5% gap instead of waiting for optimality
solver.addSolver(
    SolverSpecs(
        optimalSolverSpec=OptimalSolverSpec(
            solverPackage=OptimalSolverPackage.HIGHS,
            timeLimitMs=300000,
            mipGap=0.05,  # Stop when within 5%
        )
    )
)

// Accept 5% gap instead of waiting for optimality
OptimalSolverSpec spec;
spec.solverPackage() = OptimalSolverPackage::HIGHS;
spec.solveTime() = 300;

solver.addSolver(spec);

Impact: Can reduce solve time from hours to minutes.

Solver Selection

Use the best MIP solver available:

Python
C++

# Fast: Gurobi (if licensed)

# Fast: XPRESS (community license)

# Slower: HiGHS (open source)

// Fast: Gurobi (if licensed)

// Fast: XPRESS (community license)

// Slower: HiGHS (open source)

Impact: Gurobi can be 2-5x faster than HiGHS.

Thread Control

Python
C++

# Use 8 threads for faster solving
solver.addSolver(
    SolverSpecs(
        optimalSolverSpec=OptimalSolverSpec(
            solverPackage=OptimalSolverPackage.HIGHS,
            threads=8,
        )
    )
)

// Use 8 threads for faster solving
OptimalSolverSpec spec;
spec.solverPackage() = OptimalSolverPackage::HIGHS;

solver.addSolver(spec);

Warm Starting

Provide good initial solution:

Python
C++

# Run Local Search first for good initial solution
solver.addSolver(
    SolverSpecs(localSearchSolverSpec=LocalSearchSolverSpec(timeLimitMs=30000))
)

# Then refine with Optimal (uses LS result as warmstart)
solver.addSolver(
    SolverSpecs(
        optimalSolverSpec=OptimalSolverSpec(
            solverPackage=OptimalSolverPackage.HIGHS,
            timeLimitMs=180000,
            mipGap=0.02,
        )
    )
)

// Run Local Search first for good initial solution
LocalSearchSolverSpec lsSpec;
lsSpec.moveTypeList()->push_back(
    ProblemSolver::makeMoveTypeSpec(SingleMoveTypeSpec()));
lsSpec.solveTime() = 30;
solver.addSolver(lsSpec);

// Then refine with Optimal (uses LS result as warmstart)
OptimalSolverSpec optSpec;
optSpec.solverPackage() = OptimalSolverPackage::HIGHS;
optSpec.solveTime() = 180;
solver.addSolver(optSpec);

Impact: 2-10x speedup for Optimal solver.

Goal and Constraint Optimization

Reduce Complexity

Impact: 2-5x speedup

Python
C++

# Complex: Many scopes and dimensions
solver.add_goal(
        GoalSpec(
            balanceSpec=BalanceSpec(scope="host", dimension="cpu")
        ), 1.0)
solver.add_goal(
        GoalSpec(
            balanceSpec=BalanceSpec(scope="rack", dimension="cpu")
        ), 0.5)
solver.add_goal(
        GoalSpec(
            balanceSpec=BalanceSpec(scope="dc", dimension="cpu")
        ), 0.3)
solver.add_goal(
        GoalSpec(
            balanceSpec=BalanceSpec(scope="host", dimension="memory")
        ), 1.0)
solver.add_goal(
        GoalSpec(
            balanceSpec=BalanceSpec(scope="rack", dimension="memory")
        ), 0.5)
# ... many more ...

# Simpler: Focus on key goals
solver.add_goal(
        GoalSpec(
            balanceSpec=BalanceSpec(scope="host", dimension="cpu")
        ), 1.0)
solver.add_goal(
        GoalSpec(
            balanceSpec=BalanceSpec(scope="host", dimension="memory")
        ), 1.0)
# Skip less important goals

// Complex: Many scopes and dimensions
BalanceSpec spec1("host", "cpu");
solver.addGoal(spec1, 1.0);
BalanceSpec spec2("rack", "cpu");
solver.addGoal(spec2, 0.5);
BalanceSpec spec3("dc", "cpu");
solver.addGoal(spec3, 0.3);
BalanceSpec spec4("host", "memory");
solver.addGoal(spec4, 1.0);
BalanceSpec spec5("rack", "memory");
solver.addGoal(spec5, 0.5);
// ... many more ...

// Simpler: Focus on key goals
BalanceSpec cpu_spec("host", "cpu");
solver.addGoal(cpu_spec, 1.0);
BalanceSpec mem_spec("host", "memory");
solver.addGoal(mem_spec, 1.0);
// Skip less important goals

Use Constraints Wisely

Python
C++

# Expensive: Many individual constraints
for obj in objects:
    solver.add_constraint(
        ConstraintSpec(
            avoidMovingSpec=AvoidMovingSpec(objects=[obj])
        ))  # 1000s of constraints

# Cheaper: Single constraint with all objects
solver.add_constraint(
        ConstraintSpec(
            avoidMovingSpec=AvoidMovingSpec(objects=fixed_objects)
        ))  # 1 constraint

// Expensive: Many individual constraints
for (const auto& obj : objects) {
    AvoidMovingSpec spec({obj});
    solver.addConstraint(spec);  // 1000s of constraints
}

// Cheaper: Single constraint with all objects
AvoidMovingSpec spec(fixed_objects);
solver.addConstraint(spec);  // 1 constraint

Dimension Simplification

Python
C++

# Expensive: Many fine-grained dimensions
solver.add_object_dimension("cpu_user", ...)
solver.add_object_dimension("cpu_system", ...)
solver.add_object_dimension("cpu_io_wait", ...)

# Cheaper: Combined dimension
cpu_total = {obj: cpu_user[obj] + cpu_system[obj] + cpu_iowait[obj] for obj in objects}
solver.add_object_dimension("cpu_total", cpu_total)

// Expensive: Many fine-grained dimensions
solver.addObjectDimension("cpu_user", ...);
solver.addObjectDimension("cpu_system", ...);
solver.addObjectDimension("cpu_io_wait", ...);

// Cheaper: Combined dimension
std::map<std::string, double> cpu_total;
for (const auto& obj : objects) {
    cpu_total[obj] = cpu_user[obj] + cpu_system[obj] + cpu_iowait[obj];
}
solver.addObjectDimension("cpu_total", cpu_total);

Memory Optimization

Object/Container Limits

Each object and container uses memory:

Object: ~100-200 bytes
Container: ~50-100 bytes

Example: 100K objects + 10K containers ≈ 15-25 MB

For very large problems (1M+ objects):

Consider problem decomposition
Use streaming/chunking if possible

Dimension Storage

Each dimension adds memory:

Per dimension: ~8 bytes per object/container

Optimization: Only add necessary dimensions

Python
C++

# Adds memory: 10 dimensions × 100K objects = 8MB
for dim in all_possible_dimensions:
    solver.add_object_dimension(dim, values)

# Better: Only dimensions actually used
solver.add_object_dimension("cpu", cpu_values)
solver.add_object_dimension("memory", memory_values)

// Adds memory: 10 dimensions × 100K objects = 8MB
for (const auto& dim : all_possible_dimensions) {
    solver.addObjectDimension(dim, values);
}

// Better: Only dimensions actually used
solver.addObjectDimension("cpu", cpu_values);
solver.addObjectDimension("memory", memory_values);

Parallelization

Multi-Threading

Both solvers use multiple threads:

Local Search: Parallelizes move evaluation
Optimal: MIP solver uses parallel branch-and-bound

Optimization: Ensure enough cores available

# Check available cores
nproc
# Or
python -c "import os; print(os.cpu_count())"

Multiple Solves in Parallel

For batch processing:

Python
C++

from concurrent.futures import ThreadPoolExecutor

def solve_problem(problem_id):
    solver = ProblemSolver(...)
    # ... set up problem ...
    solver.add_solver(
        SolverSpec(
            localSearchSolverSpec=LocalSearchSolverSpec()
        ))
    return solver.solve()

# Solve multiple problems in parallel
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(solve_problem, problem_ids))

#include <future>
#include <vector>

auto solve_problem(const std::string& problem_id) {
    ProblemSolver solver(...);
    // ... set up problem ...
    LocalSearchSolverSpec spec;
    solver.addSolver(spec);
    return solver.solve();
}

// Solve multiple problems in parallel
std::vector<std::future<Solution>> futures;
for (const auto& problem_id : problem_ids) {
    futures.push_back(std::async(std::launch::async, solve_problem, problem_id));
}

std::vector<Solution> results;
for (auto& future : futures) {
    results.push_back(future.get());
}

Profiling and Debugging

Measure Solve Time

Python
C++

import time

start = time.time()
solution = solver.solve()
elapsed = time.time() - start

print(f"Total time: {elapsed:.2f}s")
print(f"Solver time: {solution["profile"].solveTime / 1000:.2f}s")
print(f"Setup overhead: {elapsed - solution["profile"].solveTime/1000:.2f}s")

#include <chrono>
#include <iostream>
#include <iomanip>

auto start = std::chrono::high_resolution_clock::now();
auto solution = solver.solve();
auto end = std::chrono::high_resolution_clock::now();
auto elapsed = std::chrono::duration<double>(end - start).count();

std::cout << std::fixed << std::setprecision(2);
std::cout << "Total time: " << elapsed << "s\n";
std::cout << "Solver time: " << solution.profile.solveTime / 1000.0 << "s\n";
std::cout << "Setup overhead: " << elapsed - solution.profile.solveTime/1000.0 << "s\n";

Identify Bottlenecks

Python
C++

# The solution includes profiling information
solution = solver.solve()

# Access profiler data from solution
profile = solution["problemProfile"]

print(f"Materialization time: {profile.materializationSec:.2f} seconds")
print(f"Solving time: {profile.solveSec:.2f} seconds")

# For Local Search, access detailed move type events
if profile.localSearchProfiles:
    ls_profile = profile.localSearchProfiles[0]
    print(f"Move types used: {ls_profile.moveTypeNames}")
    for event in ls_profile.moveTypeEvents:
        print(f"  {event.moveTypeName}: {event.count} moves")

# For hierarchical profiling (detailed breakdown)
if profile.hierarchicalProfileRoot:
    root = profile.hierarchicalProfileRoot
    print(f"Total {root.eventName}: {root.duration:.2f}ms")
    for child in root.children:
        print(f"  {child.eventName}: {child.duration:.2f}ms")

// The solution includes profiling information
auto solution = solver.solve();

// Access profiler data from solution
auto& profile = solution.problemProfile;

std::cout << "Materialization time: " << profile.materializationSec << " seconds\n";
std::cout << "Solving time: " << profile.solveSec << " seconds\n";

// For Local Search, access detailed move type events
if (!profile.localSearchProfiles.empty()) {
    auto& lsProfile = profile.localSearchProfiles[0];
    std::cout << "Move types used: ";
    for (const auto& name : lsProfile.moveTypeNames) {
        std::cout << name << " ";
    }
    std::cout << "\n";
}

// For hierarchical profiling (detailed breakdown)
if (profile.hierarchicalProfileRoot) {
    auto& root = *profile.hierarchicalProfileRoot;
    std::cout << "Total " << root.eventName << ": " << root.duration << "ms\n";
    for (const auto& child : root.children) {
        std::cout << "  " << child.eventName << ": " << child.duration << "ms\n";
    }
}

Monitor Progress

For long-running solves, you can monitor progress by checking the solution periodically or using solver time limits with multiple iterations:

Python
C++

# Iterative solving with progress monitoring
time_budget = 300000  # 5 minutes total
iteration_time = 30000  # 30 seconds per iteration

current_best = None
for i in range(time_budget // iteration_time):
    solver.add_solver(
        SolverSpec(
            optimalSolverSpec=OptimalSolverSpec(solveTime=iteration_time)
        ))
    solution = solver.solve()

    print(f"Iteration {i+1}: Objective = {solution["finalObjective"].value}")

    if current_best is None or solution["finalObjective"].value < current_best:
        current_best = solution["finalObjective"].value
        print(f"  New best solution found!")

    # Check solver status
    if solution["solverSummaries"]:
        end_reason = solution["solverSummaries"][0].endReason
        if end_reason == EndReason.OPTIMAL:
            print("  Proven optimal, stopping early")
            break

// Iterative solving with progress monitoring
int timeBudget = 300000;  // 5 minutes total
int iterationTime = 30000;  // 30 seconds per iteration

double currentBest = std::numeric_limits<double>::max();
for (int i = 0; i < timeBudget / iterationTime; i++) {
    OptimalSolverSpec spec;
    spec.solveTime_ref() = iterationTime;
    solver.addSolver(spec);

    auto solution = solver.solve();

    std::cout << "Iteration " << (i+1) << ": Objective = "
              << solution.finalObjective.value << "\n";

    if (solution.finalObjective.value < currentBest) {
        currentBest = solution.finalObjective.value;
        std::cout << "  New best solution found!\n";
    }

    // Check solver status
    if (!solution.solverSummaries.empty() &&
        solution.solverSummaries[0].endReason == EndReason::OPTIMAL) {
        std::cout << "  Proven optimal, stopping early\n";
        break;
    }
}

Note: There is no built-in progress callback API. For production use, consider running the solver in a separate thread and checking status periodically.

Common Performance Issues

Issue: Solver much slower than expected

Diagnosis:

Check problem size (objects, containers, dimensions)
Check number of goals/constraints
Check goal/constraint complexity

Solutions:

Simplify problem (fewer goals/constraints)
Use Local Search instead of Optimal
Increase time limits (may just need more time)

Issue: Runs out of memory

Diagnosis:

Solver crashes or system OOM
Very large problem or Optimal solver

Solutions:

Use Local Search (much lower memory)
Reduce problem size
Add more RAM
Problem decomposition

Issue: No progress, infinite loop

Diagnosis:

Solver running but no iterations/improvements
May indicate bug or infeasible problem

Solutions:

Check logs for errors
Verify problem is feasible (constraints not contradictory)
Try simpler problem first
Report bug if confirmed

Benchmarking

Measure Baseline

Python
C++

import time

start = time.time()
solution = solver.solve()
baseline_time = time.time() - start

print(f"Baseline: {baseline_time:.2f}s, objective={solution["objectiveValue"]}")

#include <chrono>
#include <iostream>
#include <iomanip>

auto start = std::chrono::high_resolution_clock::now();
auto solution = solver.solve();
auto end = std::chrono::high_resolution_clock::now();
auto baseline_time = std::chrono::duration<double>(end - start).count();

std::cout << std::fixed << std::setprecision(2);
std::cout << "Baseline: " << baseline_time << "s, objective="
          << solution.objectiveValue << "\n";

Test Optimizations

Python
C++

# Try different configurations
configs = [
    {"timeLimitMs": 10000},
    {"timeLimitMs": 30000},
    {"timeLimitMs": 60000},
]

for config in configs:
    solver = ProblemSolver(...)
    # ... set up problem ...
    solver.add_solver(
        SolverSpec(
            localSearchSolverSpec=LocalSearchSolverSpec(**config)
        ))

    start = time.time()
    solution = solver.solve()
    elapsed = time.time() - start

    print(f"Config {config}: {elapsed:.2f}s, objective={solution["objectiveValue"]}")

// Try different configurations
std::vector<int> time_limits = {10000, 30000, 60000};

for (int time_limit : time_limits) {
    ProblemSolver solver(...);
    // ... set up problem ...
    LocalSearchSolverSpec spec;
    spec.timeLimitMs = time_limit;
    solver.addSolver(spec);

    auto start = std::chrono::high_resolution_clock::now();
    auto solution = solver.solve();
    auto end = std::chrono::high_resolution_clock::now();
    auto elapsed = std::chrono::duration<double>(end - start).count();

    std::cout << std::fixed << std::setprecision(2);
    std::cout << "Config timeLimitMs=" << time_limit << ": "
              << elapsed << "s, objective=" << solution.objectiveValue << "\n";
}

Advanced Optimizations

Problem Decomposition

For very large problems, solve in pieces:

Python
C++

# Split objects into chunks
chunks = split_objects_into_chunks(all_objects, chunk_size=10000)

solutions = []
for chunk in chunks:
    solver = ProblemSolver(...)
    # Set up problem with just this chunk
    solver.set_assignment(chunk_assignment)
    # ... add goals/constraints ...
    solutions.append(solver.solve())

# Merge solutions
final_solution = merge_solutions(solutions)

// Split objects into chunks
auto chunks = splitObjectsIntoChunks(all_objects, 10000);

std::vector<Solution> solutions;
for (const auto& chunk : chunks) {
    ProblemSolver solver(...);
    // Set up problem with just this chunk
    solver.setAssignment(chunk_assignment);
    // ... add goals/constraints ...
    solutions.push_back(solver.solve());
}

// Merge solutions
auto final_solution = mergeSolutions(solutions);

Caching and Reuse

For repeated solves:

Python
C++

# Cache problem setup
problem_setup = build_problem_setup()  # Expensive

# Solve multiple times with different parameters
for params in parameter_sweep:
    solver = ProblemSolver(...)
    apply_problem_setup(solver, problem_setup)
    # Change only parameters
    solver.add_goal(
        GoalSpec(
            balanceSpec=BalanceSpec(...)
        ), weight=params['weight'])
    solution = solver.solve()

// Cache problem setup
auto problem_setup = buildProblemSetup();  // Expensive

// Solve multiple times with different parameters
for (const auto& params : parameter_sweep) {
    ProblemSolver solver(...);
    applyProblemSetup(solver, problem_setup);
    // Change only parameters
    BalanceSpec spec(...);
    solver.addGoal(spec, params.weight);
    auto solution = solver.solve();
}

Next Steps

Solver Comparison: Solver Overview
Local Search Details: Local Search Guide
Optimal Solver Details: Optimal Solver Guide
Advanced Strategies: Solver Strategies

Problem Size Recommendations​

Quick Wins​

1. Choose the Right Solver​

2. Set Appropriate Time Limits​

3. Simplify When Possible​

Local Search Optimization​

Move Type Selection​

Iteration Limits​

Initial Assignment Quality​

Optimal Solver Optimization​

MIP Gap Tolerance​

Solver Selection​

Thread Control​

Warm Starting​

Goal and Constraint Optimization​

Reduce Complexity​

Use Constraints Wisely​

Dimension Simplification​

Memory Optimization​

Object/Container Limits​

Dimension Storage​

Parallelization​

Multi-Threading​

Multiple Solves in Parallel​

Profiling and Debugging​

Measure Solve Time​

Identify Bottlenecks​

Monitor Progress​

Common Performance Issues​

Issue: Solver much slower than expected​

Issue: Runs out of memory​

Issue: No progress, infinite loop​

Benchmarking​

Measure Baseline​

Test Optimizations​

Advanced Optimizations​

Problem Decomposition​

Caching and Reuse​

Next Steps​

Problem Size Recommendations

Quick Wins

1. Choose the Right Solver

2. Set Appropriate Time Limits

3. Simplify When Possible

Local Search Optimization

Move Type Selection

Iteration Limits

Initial Assignment Quality

Optimal Solver Optimization

MIP Gap Tolerance

Solver Selection

Thread Control

Warm Starting

Goal and Constraint Optimization

Reduce Complexity

Use Constraints Wisely

Dimension Simplification

Memory Optimization

Object/Container Limits

Dimension Storage

Parallelization

Multi-Threading

Multiple Solves in Parallel

Profiling and Debugging

Measure Solve Time

Identify Bottlenecks

Monitor Progress

Common Performance Issues

Issue: Solver much slower than expected

Issue: Runs out of memory

Issue: No progress, infinite loop

Benchmarking

Measure Baseline

Test Optimizations

Advanced Optimizations

Problem Decomposition

Caching and Reuse

Next Steps