
GroupMoveWithHintStrategies

Move Type: Group
Complexity: O(T × N × S × K), efficient for million-object problems
Primary Use: TorchRec shard placement, large-scale group-based placement

Moves groups of objects using a strategy hint for each group. Designed for large-scale problems in which multiple groups have different constraints.

Overview

GroupMoveWithHintStrategies (also known as GROUP_MOVE_WITH_HINT_STRATEGIES) applies a different move strategy to each group according to that group's constraints. Instead of trying every possible combination, which is impractical for million-object problems, this move type uses hints to guide the solver toward feasible moves for each group.

Use when:

  • The problem is large-scale (millions of objects)
  • Different groups have different constraints
  • You know which strategies work well for each group
  • The problem has a nested partition structure (primary + secondary)
  • You are handling TorchRec table sharding scenarios
  • You need efficient moves for heterogeneous groups

Avoid when:

  • The problem is simple and homogeneous
  • There are no group-specific constraints
  • You don't have strategy hints
  • A single partition is sufficient

Quick Example

# Apply different strategies for different shard types
solver.addSolver(
    SolverSpecs(
        localSearchSolverSpec=LocalSearchSolverSpec(
            moveTypeList=[
                GroupMoveWithHintStrategiesMoveTypeSpec(
                    primaryPartition="table",
                    secondaryPartition="shard_type",
                    moveStrategies=MoveStrategies(
                        groupToMoveStrategy={
                            "row_wise": MoveStrategy(
                                type=MoveStrategyType.RANDOM_SAMPLING_WITH_REPLACEMENT,
                                moveToScopeItems=MoveToScopeItemsSpec(
                                    defaultScopeItems=ScopeItemList(
                                        itemList=["node1", "node2"]
                                    ),
                                ),
                            ),
                            "column_wise": MoveStrategy(
                                type=MoveStrategyType.RANDOM_SAMPLING_WITHOUT_REPLACEMENT,
                                moveToScopeItems=MoveToScopeItemsSpec(
                                    defaultScopeItems=ScopeItemList(
                                        itemList=["node1", "node2"]
                                    ),
                                ),
                            ),
                        }
                    ),
                ),
            ]
        )
    )
)

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| primaryPartition | string | Yes | null | Outer partition (e.g., tables) |
| secondaryPartition | string | Yes | null | Inner partition (e.g., shard types) |
| moveStrategies | MoveStrategies | Yes | null | Map of group → strategy |
| unassignedContainer | string | No | null | Container for unassigned objects |
| secondaryGroupReplacementConfig | SecondaryGroupReplacementConfig | No | null | Allowed secondary group replacements |

Parameter Details

primaryPartition:

  • The outer partition defining primary groups
  • Example: "table" partition in TorchRec
  • Each primary group processed independently

secondaryPartition:

  • The inner partition splitting the primary partition
  • Example: "shard_type" partition in TorchRec
  • Defines subgroups within each primary group

moveStrategies:

  • Map: group_name → MoveStrategy
  • Each MoveStrategy contains:
    • type: RANDOM_SAMPLING_WITH_REPLACEMENT or RANDOM_SAMPLING_WITHOUT_REPLACEMENT
    • moveSetsGeneratedPerScopeItem: Number of move sets to generate (default: 1)
    • moveToScopeItems: Destination scope items specification
    • tertiaryPartition: Optional third partition level
    • numScopeItemsToExplorePerTertiaryGroup: Optional sampling for tertiary groups

unassignedContainer:

  • Optional container for initially unassigned objects
  • Enables group replacement when specified

secondaryGroupReplacementConfig:

  • Controls which secondary groups can replace each other
  • Only used when unassignedContainer is set
  • Map: secondary_group → allowed_replacement_groups
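
The replacement map can be read as a lookup with a permissive default. The sketch below is a plain-Python illustration, not the solver's implementation: `allowed_replacements` is a hypothetical helper, and treating a missing entry as "any other group" is an assumption based on the comment in the unassigned-container example later on this page.

```python
from typing import Dict, List, Set


def allowed_replacements(
    config: Dict[str, List[str]],
    group: str,
    all_groups: Set[str],
) -> Set[str]:
    """Return the secondary groups that may replace `group` (hypothetical helper)."""
    if group not in config:
        # Assumption: a group without an entry can be replaced by any other group.
        return all_groups - {group}
    # Only keep replacements that actually exist in the partition.
    return set(config[group]) & all_groups


# Plain-dict stand-in for secondaryGroupToAllowedReplacements.
config = {"type_a": ["type_b", "type_c"], "type_b": ["type_a"]}
groups = {"type_a", "type_b", "type_c"}

for name in sorted(groups):
    print(name, "->", sorted(allowed_replacements(config, name, groups)))
```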

How It Works

For each primary group (e.g., table):

  1. Identify secondary groups: Find all secondary groups (e.g., shard types) within this primary group
  2. Apply strategy per group: For each secondary group:
    • Look up the strategy hint for this group
    • Generate move sets according to the strategy
    • Sample with/without replacement as specified
  3. Evaluate move sets: Test all generated move sets in parallel
  4. Select best: Choose the move set that improves the objective the most
  5. Repeat: Apply the same steps to every remaining primary group
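
The steps above can be sketched for a single primary group in plain Python. This is a simplified illustration under stated assumptions, not the solver's code: `generate_move_set`, `best_move_set`, the dict-based hints, and the `imbalance` cost function are all hypothetical, lower cost is assumed to be better, and candidates are evaluated sequentially rather than in parallel.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Tuple

WITH_REPLACEMENT = "RANDOM_SAMPLING_WITH_REPLACEMENT"
WITHOUT_REPLACEMENT = "RANDOM_SAMPLING_WITHOUT_REPLACEMENT"
NODES = ["node1", "node2"]

MoveSet = List[Tuple[str, str]]  # (object, destination) pairs


def generate_move_set(objects: List[str], destinations: List[str],
                      strategy_type: str, rng: random.Random) -> MoveSet:
    """Pair objects with sampled destinations (hypothetical helper)."""
    if strategy_type == WITH_REPLACEMENT:
        # A destination may be picked for several objects.
        picks = rng.choices(destinations, k=len(objects))
    else:
        # Each destination is used at most once, so at most
        # len(destinations) objects move in one move set.
        picks = rng.sample(destinations, k=min(len(objects), len(destinations)))
    return list(zip(objects, picks))


def best_move_set(secondary_groups: Dict[str, List[str]],
                  strategies: Dict[str, dict],
                  cost: Callable[[MoveSet], float],
                  rng: random.Random) -> MoveSet:
    """Generate candidates per secondary group and keep the cheapest one."""
    candidates: List[MoveSet] = []
    for group_name, objects in secondary_groups.items():
        hint = strategies[group_name]  # step 2: look up the strategy hint
        for _ in range(hint.get("move_sets", 1)):
            candidates.append(
                generate_move_set(objects, hint["destinations"], hint["type"], rng)
            )
    # Steps 3-4: evaluate every candidate and select the best.
    return min(candidates, key=cost)


def imbalance(move_set: MoveSet) -> int:
    """Toy objective: difference between the busiest and idlest node."""
    counts = Counter(dest for _, dest in move_set)
    return max(counts.values()) - min(counts.get(node, 0) for node in NODES)


table1 = {"row_wise": ["s1", "s2", "s3"], "column_wise": ["s4", "s5"]}
strategies = {
    "row_wise": {"type": WITH_REPLACEMENT, "destinations": NODES, "move_sets": 3},
    "column_wise": {"type": WITHOUT_REPLACEMENT, "destinations": NODES},
}
best = best_move_set(table1, strategies, imbalance, random.Random(0))
print(best)
```

Step 5 would wrap `best_move_set` in a loop over all primary groups.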

Visual Example

Primary Partition: Tables
Secondary Partition: Shard Types

Table1:
├─ row_wise shards → Strategy: Random sampling with replacement
├─ column_wise shards → Strategy: Random sampling without replacement
└─ data_parallel shard → Strategy: Random sampling with replacement

Table2:
├─ row_wise shards → Strategy: Random sampling with replacement
├─ column_wise shards → Strategy: Random sampling without replacement
└─ data_parallel shard → Strategy: Random sampling with replacement

The strategy is looked up by shard type and applied separately to each table × shard type combination

Complexity

Per primary group: O(N × S × K)

Where:

  • N = average objects per secondary group
  • S = number of secondary groups
  • K = move sets generated per secondary group (controlled by moveSetsGeneratedPerScopeItem)

Total: O(T × N × S × K)

Where:

  • T = number of primary groups (tables)

Real-world TorchRec example:

  • Tables (T): 2,500
  • Shards per group (N): 10
  • Shard types (S): 10
  • Move sets per group (K): 1
  • Total evaluations: 2,500 × 10 × 10 × 1 = 250,000
  • At 100K eval/sec: ~2.5 seconds

Worst case (10M shards, 5K tables):

  • Tables (T): 5,000
  • Shards per group (N): 200
  • Shard types (S): 10
  • Move sets per group (K): 1
  • Total evaluations: 10,000,000
  • At 100K eval/sec: ~100 seconds
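
The two estimates above follow directly from the formula T × N × S × K. A minimal sketch that reproduces the arithmetic, where `estimate` is a hypothetical helper and the 100K evaluations/sec throughput is the figure assumed on this page:

```python
from typing import Tuple


def estimate(tables: int, objects_per_group: int, shard_types: int,
             move_sets: int, evals_per_sec: int = 100_000) -> Tuple[int, float]:
    """Return (total evaluations, estimated seconds) for T x N x S x K."""
    evaluations = tables * objects_per_group * shard_types * move_sets
    return evaluations, evaluations / evals_per_sec


scenarios = {
    "TorchRec example": (2_500, 10, 10, 1),
    "Worst case": (5_000, 200, 10, 1),
}
for label, args in scenarios.items():
    evaluations, seconds = estimate(*args)
    print(f"{label}: {evaluations:,} evaluations, ~{seconds:g} s")
# TorchRec example: 250,000 evaluations, ~2.5 s
# Worst case: 10,000,000 evaluations, ~100 s
```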

Usage Patterns

TorchRec Table Sharding

Different strategies for different shard types:

# TorchRec table sharding with different strategies per shard type
solver.addSolver(
    SolverSpecs(
        localSearchSolverSpec=LocalSearchSolverSpec(
            moveTypeList=[
                GroupMoveWithHintStrategiesMoveTypeSpec(
                    primaryPartition="table",
                    secondaryPartition="shard_type",
                    moveStrategies=MoveStrategies(
                        groupToMoveStrategy={
                            "row_wise": MoveStrategy(
                                type=MoveStrategyType.RANDOM_SAMPLING_WITH_REPLACEMENT,
                                moveToScopeItems=MoveToScopeItemsSpec(
                                    defaultScopeItems=ScopeItemList(
                                        itemList=["node1", "node2", "node3"]
                                    ),
                                ),
                            ),
                            "column_wise": MoveStrategy(
                                type=MoveStrategyType.RANDOM_SAMPLING_WITHOUT_REPLACEMENT,
                                moveToScopeItems=MoveToScopeItemsSpec(
                                    defaultScopeItems=ScopeItemList(
                                        itemList=["node1", "node2", "node3"]
                                    ),
                                ),
                            ),
                            "data_parallel": MoveStrategy(
                                type=MoveStrategyType.RANDOM_SAMPLING_WITH_REPLACEMENT,
                                moveToScopeItems=MoveToScopeItemsSpec(
                                    defaultScopeItems=ScopeItemList(
                                        itemList=["node1", "node2", "node3"]
                                    ),
                                ),
                            ),
                        }
                    ),
                ),
            ]
        )
    )
)

With Replacement vs Without

Different sampling strategies for different constraints:

# With replacement for flexible placement, without replacement for exclusive placement
solver.addSolver(
    SolverSpecs(
        localSearchSolverSpec=LocalSearchSolverSpec(
            moveTypeList=[
                GroupMoveWithHintStrategiesMoveTypeSpec(
                    primaryPartition="service",
                    secondaryPartition="tier",
                    moveStrategies=MoveStrategies(
                        groupToMoveStrategy={
                            "primary": MoveStrategy(
                                type=MoveStrategyType.RANDOM_SAMPLING_WITHOUT_REPLACEMENT,  # Exclusive
                                moveToScopeItems=MoveToScopeItemsSpec(
                                    defaultScopeItems=ScopeItemList(
                                        itemList=["rack1", "rack2"]
                                    ),
                                ),
                            ),
                            "replica": MoveStrategy(
                                type=MoveStrategyType.RANDOM_SAMPLING_WITH_REPLACEMENT,  # Flexible
                                moveToScopeItems=MoveToScopeItemsSpec(
                                    defaultScopeItems=ScopeItemList(
                                        itemList=["rack1", "rack2"]
                                    ),
                                ),
                            ),
                        }
                    ),
                ),
            ]
        )
    )
)

Multiple Move Sets

Generate multiple move sets per group:

# Generate multiple move sets for better solution quality
solver.addSolver(
    SolverSpecs(
        localSearchSolverSpec=LocalSearchSolverSpec(
            moveTypeList=[
                GroupMoveWithHintStrategiesMoveTypeSpec(
                    primaryPartition="table",
                    secondaryPartition="shard_type",
                    moveStrategies=MoveStrategies(
                        groupToMoveStrategy={
                            "type_a": MoveStrategy(
                                type=MoveStrategyType.RANDOM_SAMPLING_WITH_REPLACEMENT,
                                moveSetsGeneratedPerScopeItem=5,  # Generate 5 move sets
                                moveToScopeItems=MoveToScopeItemsSpec(
                                    defaultScopeItems=ScopeItemList(
                                        itemList=["node1", "node2"]
                                    ),
                                ),
                            ),
                            "type_b": MoveStrategy(
                                type=MoveStrategyType.RANDOM_SAMPLING_WITHOUT_REPLACEMENT,
                                moveSetsGeneratedPerScopeItem=3,  # Generate 3 move sets
                                moveToScopeItems=MoveToScopeItemsSpec(
                                    defaultScopeItems=ScopeItemList(
                                        itemList=["node1", "node2"]
                                    ),
                                ),
                            ),
                        }
                    ),
                ),
            ]
        )
    )
)

With Unassigned Container

Enable group replacement via unassigned container:

# Use unassigned container to enable group replacement
from rebalancer.interface.thrift.v2.SolverSpecs.thrift_types import (
    MoveStrategies,
    MoveStrategy,
    MoveStrategyType,
    MoveToScopeItemsSpec,
    SecondaryGroupReplacementConfig,
)

solver.addSolver(
    SolverSpecs(
        localSearchSolverSpec=LocalSearchSolverSpec(
            moveTypeList=[
                GroupMoveWithHintStrategiesMoveTypeSpec(
                    primaryPartition="table",
                    secondaryPartition="shard_type",
                    unassignedContainer="unassigned",  # Enable replacement
                    secondaryGroupReplacementConfig=SecondaryGroupReplacementConfig(
                        secondaryGroupToAllowedReplacements={
                            # type_a can be replaced by type_b or type_c
                            "type_a": ["type_b", "type_c"],
                            # type_b can be replaced by type_a
                            "type_b": ["type_a"],
                            # type_c has no entry, so it can be replaced by any group
                        }
                    ),
                    moveStrategies=MoveStrategies(
                        groupToMoveStrategy={
                            "type_a": MoveStrategy(
                                type=MoveStrategyType.RANDOM_SAMPLING_WITH_REPLACEMENT,
                                moveToScopeItems=MoveToScopeItemsSpec(
                                    defaultScopeItems=ScopeItemList(
                                        itemList=["node1", "node2"]
                                    )
                                ),
                            ),
                            "type_b": MoveStrategy(
                                type=MoveStrategyType.RANDOM_SAMPLING_WITH_REPLACEMENT,
                                moveToScopeItems=MoveToScopeItemsSpec(
                                    defaultScopeItems=ScopeItemList(
                                        itemList=["node1", "node2"]
                                    )
                                ),
                            ),
                            "type_c": MoveStrategy(
                                type=MoveStrategyType.RANDOM_SAMPLING_WITH_REPLACEMENT,
                                moveToScopeItems=MoveToScopeItemsSpec(
                                    defaultScopeItems=ScopeItemList(
                                        itemList=["node1", "node2"]
                                    )
                                ),
                            ),
                        }
                    ),
                ),
            ]
        )
    )
)

Performance Characteristics

Strategy Comparison

| Strategy | Replacement | Use Case |
| --- | --- | --- |
| RANDOM_SAMPLING_WITH_REPLACEMENT | Yes | Can place multiple objects on same container |
| RANDOM_SAMPLING_WITHOUT_REPLACEMENT | No | Each container used at most once |
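
The difference between the two strategy types is the ordinary distinction between sampling with and without replacement. The snippet below illustrates it with Python's standard `random` module; it is an analogy for the sampling behaviour only and makes no claim about how the solver draws its samples.

```python
import random

containers = ["node1", "node2", "node3"]
rng = random.Random(42)  # seeded so repeated runs draw the same samples

# With replacement: the same container may be drawn more than once.
with_replacement = rng.choices(containers, k=3)

# Without replacement: every drawn container is distinct.
without_replacement = rng.sample(containers, k=3)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

Note that `random.sample` raises `ValueError` when asked for more items than the population holds, which mirrors the "each container used at most once" restriction: without replacement, a move set cannot use more destinations than are available.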

Scalability

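| Objects | Tables | Shard Types | K | Evaluations | Time @100K/s |
| --- | --- | --- | --- | --- | --- |
| 250K | 2.5K | 10 | 1 | 250K | 2.5s |
| 250K | 2.5K | 10 | 5 | 1.25M | 12.5s |
| 10M | 5K | 10 | 1 | 10M | 100s |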

Key insight: Even with 10 million objects, this approach stays tractable (about 100 seconds at 100K evaluations per second) because strategy hints avoid trying every possible combination.

When Does It Help?

GroupMoveWithHintStrategies helps when:

  • Large scale: Millions of objects to place
  • Heterogeneous groups: Different groups have different constraints
  • Known strategies: You know which approach works for each group
  • Nested structure: Primary + secondary partition hierarchy
  • TorchRec workloads: Table sharding scenarios

GroupMoveWithHintStrategies does NOT help when:

  • Small problems: Overhead not worth it
  • Homogeneous groups: All groups have same constraints
  • No hints available: Don't know which strategies to use
  • Simple structure: Single partition sufficient

Comparison with Alternatives

| Move Type | Approach | Scale | Use Case |
| --- | --- | --- | --- |
| Single | Explore all | Small | Independent objects |
| ColocateGroups | Colocation | Medium | Related groups together |
| GroupMoveWithHintStrategies | Strategy hints | Large | Million+ objects with hints |

Troubleshooting

Problem: Too slow even with hints

Diagnosis: Too many move sets being generated

Solutions:

  • Reduce moveSetsGeneratedPerScopeItem (keep at 1 initially)
  • Ensure strategies are appropriate for each group
  • Check if tertiary partition is needed
  • Review number of secondary groups

Problem: Poor solution quality

Diagnosis: Strategy hints not optimal for groups

Solutions:

  • Review which strategy type works best for each group
  • Try RANDOM_SAMPLING_WITHOUT_REPLACEMENT for exclusive placement
  • Try RANDOM_SAMPLING_WITH_REPLACEMENT for flexible placement
  • Increase moveSetsGeneratedPerScopeItem (e.g., 3-5)
  • Consider giving each group its own moveToScopeItems

Problem: Groups not moving as expected

Diagnosis: Partition or strategy configuration issue

Solutions:

  • Verify primary and secondary partitions are correct
  • Check that all secondary groups have strategies
  • Review moveToScopeItems destinations
  • Ensure unassignedContainer is set if using replacement

Problem: Strategy not defined for group

Diagnosis: Missing strategy hint in moveStrategies map

Solutions:

  • Add strategy for every secondary group
  • Check group names match partition exactly
  • Review partition definition
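
A quick pre-flight check can catch missing or misspelled strategy keys before the solver runs. The sketch below works on plain Python collections; `missing_strategies` is a hypothetical helper, and how you obtain the list of secondary group names from your partition is left as an assumption.

```python
from typing import Dict, Iterable, List


def missing_strategies(secondary_groups: Iterable[str],
                       group_to_strategy: Dict[str, object]) -> List[str]:
    """Return secondary groups that have no entry in the strategy map."""
    return sorted(set(secondary_groups) - set(group_to_strategy))


# Group names as they appear in the secondary partition.
groups_in_partition = ["row_wise", "column_wise", "data_parallel"]

# Stand-in for groupToMoveStrategy; note the capitalisation mismatch.
configured = {
    "row_wise": "RANDOM_SAMPLING_WITH_REPLACEMENT",
    "Column_wise": "RANDOM_SAMPLING_WITHOUT_REPLACEMENT",
}

print(missing_strategies(groups_in_partition, configured))
# → ['column_wise', 'data_parallel']
```

Because the comparison is an exact string match, the misspelled "Column_wise" key is reported as a missing "column_wise" strategy.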

When to Use GroupMoveWithHintStrategies

DO use when:

  • The problem is large-scale (100K+ objects)
  • The problem has a nested partition structure (primary + secondary)
  • You know which strategies work for each group
  • Different groups have different constraints
  • You are running TorchRec or similar workloads

DO NOT use when:

  • The problem is small (<10K objects)
  • A single partition is sufficient
  • All groups are homogeneous
  • You don't have strategy hints
  • You need an exploratory approach
