Skip to main content

Performance Tuning

This guide covers performance optimization techniques for both Local Search and Optimal solvers.

Problem Size Recommendations

ObjectsContainersSolverExpected TimeTuning Priority
<100<50OptimalSecondsNone needed
100-1K50-500EitherSeconds-MinutesSolver choice
1K-10K500-5KLocal SearchSeconds-MinutesGoal/constraint complexity
10K-100K5K-50KLocal SearchMinutesMove limits, parallelization
>100K>50KLocal SearchMinutes-HoursAll optimizations

Quick Wins

1. Choose the Right Solver

Impact: 10-1000x speedup

# Bad: Using Optimal for large problem
solver.add_solver(
SolverSpec(
optimalSolverSpec=OptimalSolverSpec()
)) # Will timeout/OOM

# Good: Use Local Search for 10K+ objects
solver.add_solver(
SolverSpec(
localSearchSolverSpec=LocalSearchSolverSpec(timeLimitMs=30000)
))

2. Set Appropriate Time Limits

Impact: Prevent wasted time

# Increase time limit for better quality
solver.addSolver(
SolverSpecs(
localSearchSolverSpec=LocalSearchSolverSpec(
timeLimitMs=120000 # 2 minutes instead of 30 seconds
)
)
)

3. Simplify When Possible

Impact: 2-10x speedup

# Bad: Too many low-priority goals
solver.add_goal(balance_cpu, weight=1.0)
solver.add_goal(balance_memory, weight=1.0)
solver.add_goal(balance_disk, weight=0.1) # Low priority
solver.add_goal(balance_network, weight=0.05) # Very low priority

# Good: Focus on what matters
solver.add_goal(balance_cpu, weight=1.0)
solver.add_goal(balance_memory, weight=1.0)
# Skip low-priority goals for faster solving

Local Search Optimization

Move Type Selection

Different move types have different costs:

Move TypeCost per IterationWhen to Use
SingleMoveO(objects × containers)Always, fast
SwapMoveO(objects²)Capacity constrained
TripleLoopO(objects³)Need escape local optima
ChainMovesO(objects² × containers)Dependencies

Optimization: Use only necessary move types

Iteration Limits

Balance quality vs speed:

# Fast but lower quality
solver.add_solver(
SolverSpec(
localSearchSolverSpec=LocalSearchSolverSpec(movesLimit=10000)
))

# Slower but better quality
solver.add_solver(
SolverSpec(
localSearchSolverSpec=LocalSearchSolverSpec(movesLimit=1000000)
))

# Let it run until local optimum (no limit)
solver.add_solver(
SolverSpec(
localSearchSolverSpec=LocalSearchSolverSpec(timeLimitMs=300000)
)) # Time limit only

Initial Assignment Quality

Impact: 2-5x speedup, better final quality

# Bad: Start with terrible assignment
initial = {"host0": all_objects, "host1": [], ...} # Everything on one host

# Good: Start with reasonable distribution
initial = distribute_evenly(objects, containers) # Pre-balance

solver.set_assignment(initial)

Optimal Solver Optimization

MIP Gap Tolerance

Accept "good enough" solutions:

# Accept 5% gap instead of waiting for optimality
solver.addSolver(
SolverSpecs(
optimalSolverSpec=OptimalSolverSpec(
solverPackage=OptimalSolverPackage.HIGHS,
timeLimitMs=300000,
mipGap=0.05, # Stop when within 5%
)
)
)

Impact: Can reduce solve time from hours to minutes.

Solver Selection

Use the best MIP solver available:

# Fast: Gurobi (if licensed)

# Fast: XPRESS (community license)

# Slower: HiGHS (open source)

Impact: Gurobi can be 2-5x faster than HiGHS.

Thread Control

# Use 8 threads for faster solving
solver.addSolver(
SolverSpecs(
optimalSolverSpec=OptimalSolverSpec(
solverPackage=OptimalSolverPackage.HIGHS,
threads=8,
)
)
)

Warm Starting

Provide good initial solution:

# Run Local Search first for good initial solution
solver.addSolver(
SolverSpecs(localSearchSolverSpec=LocalSearchSolverSpec(timeLimitMs=30000))
)

# Then refine with Optimal (uses LS result as warmstart)
solver.addSolver(
SolverSpecs(
optimalSolverSpec=OptimalSolverSpec(
solverPackage=OptimalSolverPackage.HIGHS,
timeLimitMs=180000,
mipGap=0.02,
)
)
)

Impact: 2-10x speedup for Optimal solver.

Goal and Constraint Optimization

Reduce Complexity

Impact: 2-5x speedup

# Complex: Many scopes and dimensions
solver.add_goal(
GoalSpec(
balanceSpec=BalanceSpec(scope="host", dimension="cpu")
), 1.0)
solver.add_goal(
GoalSpec(
balanceSpec=BalanceSpec(scope="rack", dimension="cpu")
), 0.5)
solver.add_goal(
GoalSpec(
balanceSpec=BalanceSpec(scope="dc", dimension="cpu")
), 0.3)
solver.add_goal(
GoalSpec(
balanceSpec=BalanceSpec(scope="host", dimension="memory")
), 1.0)
solver.add_goal(
GoalSpec(
balanceSpec=BalanceSpec(scope="rack", dimension="memory")
), 0.5)
# ... many more ...

# Simpler: Focus on key goals
solver.add_goal(
GoalSpec(
balanceSpec=BalanceSpec(scope="host", dimension="cpu")
), 1.0)
solver.add_goal(
GoalSpec(
balanceSpec=BalanceSpec(scope="host", dimension="memory")
), 1.0)
# Skip less important goals

Use Constraints Wisely

# Expensive: Many individual constraints
for obj in objects:
solver.add_constraint(
ConstraintSpec(
avoidMovingSpec=AvoidMovingSpec(objects=[obj])
)) # 1000s of constraints

# Cheaper: Single constraint with all objects
solver.add_constraint(
ConstraintSpec(
avoidMovingSpec=AvoidMovingSpec(objects=fixed_objects)
)) # 1 constraint

Dimension Simplification

# Expensive: Many fine-grained dimensions
solver.add_object_dimension("cpu_user", ...)
solver.add_object_dimension("cpu_system", ...)
solver.add_object_dimension("cpu_io_wait", ...)

# Cheaper: Combined dimension
cpu_total = {obj: cpu_user[obj] + cpu_system[obj] + cpu_iowait[obj] for obj in objects}
solver.add_object_dimension("cpu_total", cpu_total)

Memory Optimization

Object/Container Limits

Each object and container uses memory:

  • Object: ~100-200 bytes
  • Container: ~50-100 bytes

Example: 100K objects + 10K containers ≈ 15-25 MB

For very large problems (1M+ objects):

  • Consider problem decomposition
  • Use streaming/chunking if possible

Dimension Storage

Each dimension adds memory:

  • Per dimension: ~8 bytes per object/container

Optimization: Only add necessary dimensions

# Adds memory: 10 dimensions × 100K objects = 8MB
for dim in all_possible_dimensions:
solver.add_object_dimension(dim, values)

# Better: Only dimensions actually used
solver.add_object_dimension("cpu", cpu_values)
solver.add_object_dimension("memory", memory_values)

Parallelization

Multi-Threading

Both solvers use multiple threads:

  • Local Search: Parallelizes move evaluation
  • Optimal: MIP solver uses parallel branch-and-bound

Optimization: Ensure enough cores available

# Check available cores
nproc
# Or
python -c "import os; print(os.cpu_count())"

Multiple Solves in Parallel

For batch processing:

from concurrent.futures import ThreadPoolExecutor

def solve_problem(problem_id):
solver = ProblemSolver(...)
# ... set up problem ...
solver.add_solver(
SolverSpec(
localSearchSolverSpec=LocalSearchSolverSpec()
))
return solver.solve()

# Solve multiple problems in parallel
with ThreadPoolExecutor(max_workers=4) as executor:
results = list(executor.map(solve_problem, problem_ids))

Profiling and Debugging

Measure Solve Time

import time

start = time.time()
solution = solver.solve()
elapsed = time.time() - start

print(f"Total time: {elapsed:.2f}s")
print(f"Solver time: {solution["profile"].solveTime / 1000:.2f}s")
print(f"Setup overhead: {elapsed - solution["profile"].solveTime/1000:.2f}s")

Identify Bottlenecks

# The solution includes profiling information
solution = solver.solve()

# Access profiler data from solution
profile = solution["problemProfile"]

print(f"Materialization time: {profile.materializationSec:.2f} seconds")
print(f"Solving time: {profile.solveSec:.2f} seconds")

# For Local Search, access detailed move type events
if profile.localSearchProfiles:
ls_profile = profile.localSearchProfiles[0]
print(f"Move types used: {ls_profile.moveTypeNames}")
for event in ls_profile.moveTypeEvents:
print(f" {event.moveTypeName}: {event.count} moves")

# For hierarchical profiling (detailed breakdown)
if profile.hierarchicalProfileRoot:
root = profile.hierarchicalProfileRoot
print(f"Total {root.eventName}: {root.duration:.2f}ms")
for child in root.children:
print(f" {child.eventName}: {child.duration:.2f}ms")

Monitor Progress

For long-running solves, you can monitor progress by checking the solution periodically or using solver time limits with multiple iterations:

# Iterative solving with progress monitoring
time_budget = 300000 # 5 minutes total
iteration_time = 30000 # 30 seconds per iteration

current_best = None
for i in range(time_budget // iteration_time):
solver.add_solver(
SolverSpec(
optimalSolverSpec=OptimalSolverSpec(solveTime=iteration_time)
))
solution = solver.solve()

print(f"Iteration {i+1}: Objective = {solution["finalObjective"].value}")

if current_best is None or solution["finalObjective"].value < current_best:
current_best = solution["finalObjective"].value
print(f" New best solution found!")

# Check solver status
if solution["solverSummaries"]:
end_reason = solution["solverSummaries"][0].endReason
if end_reason == EndReason.OPTIMAL:
print(" Proven optimal, stopping early")
break

Note: There is no built-in progress callback API. For production use, consider running the solver in a separate thread and checking status periodically.

Common Performance Issues

Issue: Solver much slower than expected

Diagnosis:

  1. Check problem size (objects, containers, dimensions)
  2. Check number of goals/constraints
  3. Check goal/constraint complexity

Solutions:

  • Simplify problem (fewer goals/constraints)
  • Use Local Search instead of Optimal
  • Increase time limits (may just need more time)

Issue: Runs out of memory

Diagnosis:

  • Solver crashes or system OOM
  • Very large problem or Optimal solver

Solutions:

  • Use Local Search (much lower memory)
  • Reduce problem size
  • Add more RAM
  • Problem decomposition

Issue: No progress, infinite loop

Diagnosis:

  • Solver running but no iterations/improvements
  • May indicate bug or infeasible problem

Solutions:

  • Check logs for errors
  • Verify problem is feasible (constraints not contradictory)
  • Try simpler problem first
  • Report bug if confirmed

Benchmarking

Measure Baseline

import time

start = time.time()
solution = solver.solve()
baseline_time = time.time() - start

print(f"Baseline: {baseline_time:.2f}s, objective={solution["objectiveValue"]}")

Test Optimizations

# Try different configurations
configs = [
{"timeLimitMs": 10000},
{"timeLimitMs": 30000},
{"timeLimitMs": 60000},
]

for config in configs:
solver = ProblemSolver(...)
# ... set up problem ...
solver.add_solver(
SolverSpec(
localSearchSolverSpec=LocalSearchSolverSpec(**config)
))

start = time.time()
solution = solver.solve()
elapsed = time.time() - start

print(f"Config {config}: {elapsed:.2f}s, objective={solution["objectiveValue"]}")

Advanced Optimizations

Problem Decomposition

For very large problems, solve in pieces:

# Split objects into chunks
chunks = split_objects_into_chunks(all_objects, chunk_size=10000)

solutions = []
for chunk in chunks:
solver = ProblemSolver(...)
# Set up problem with just this chunk
solver.set_assignment(chunk_assignment)
# ... add goals/constraints ...
solutions.append(solver.solve())

# Merge solutions
final_solution = merge_solutions(solutions)

Caching and Reuse

For repeated solves:

# Cache problem setup
problem_setup = build_problem_setup() # Expensive

# Solve multiple times with different parameters
for params in parameter_sweep:
solver = ProblemSolver(...)
apply_problem_setup(solver, problem_setup)
# Change only parameters
solver.add_goal(
GoalSpec(
balanceSpec=BalanceSpec(...)
), weight=params['weight'])
solution = solver.solve()

Next Steps