Debugging FBOSS Without CLI

This guide covers debugging FBOSS when the CLI is not available, typically when the hardware agent (fboss_hw_agent) is down or not responding. The FBOSS CLI and L1 debugging commands require a running hardware agent. When that's not available, you'll need to use lower-level debugging tools.

Service Status and Logs

Checking Service Status

FBOSS services run under systemd. Check their status:

# Check all FBOSS services
systemctl status 'fboss*' 'platform_manager' 'qsfp_service' 'fsdb'

# Check specific service
systemctl status fboss_hw_agent@0

# Check if service is active
systemctl is-active fboss_sw_agent
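
The individual checks above can be folded into one loop. This is a minimal sketch, not part of FBOSS: the function name is hypothetical, and the default unit list is an assumption taken from the unit names used in this guide, so adjust it for your platform.

```shell
# Print one "unit: state" line per unit.
# With no arguments, falls back to the unit names used in this guide
# (an assumption; your platform may run a different set).
fboss_service_states() {
    if [ "$#" -eq 0 ]; then
        set -- platform_manager qsfp_service fsdb fboss_hw_agent@0 fboss_sw_agent
    fi
    for unit in "$@"; do
        # is-active prints the state (active, inactive, failed, ...) and
        # exits non-zero when the unit is not active, hence the "|| true".
        state=$(systemctl is-active "$unit" 2>/dev/null) || true
        printf '%s: %s\n' "$unit" "${state:-unknown}"
    done
}
```

Call it as `fboss_service_states` for the default list, or pass unit names explicitly, for example `fboss_service_states fboss_hw_agent@0 qsfp_service`.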

Viewing Service Logs

All FBOSS services log to journald. Use journalctl to view logs:

# View logs for a specific service
journalctl -u fboss_sw_agent

# Follow logs in real-time
journalctl -u fboss_hw_agent@0 -f

# View logs from the current boot
journalctl -u platform_manager -b

# View logs from the last hour
journalctl -u qsfp_service --since "1 hour ago"

# Show only messages at the given priority or more severe (levels include err, warning, info, debug)
journalctl -u fboss_sw_agent -p err

# Combine multiple services
journalctl -u fboss_sw_agent -u fboss_hw_agent@0 -u qsfp_service

# Export logs to file
journalctl -u fboss_sw_agent --since "2 hours ago" > /tmp/fboss_sw_agent.log

When a service crashes, journald also records the systemd-coredump entry for it, including a stack trace, which makes these logs valuable for debugging service failures.

Additional Log Files

FBOSS maintains several log files outside of journald:

# Boot history - shows WARM_BOOT vs COLD_BOOT events
cat /var/facebook/logs/fboss/fboss_sw_agent_boot_history.log

# Agent snapshots - detailed state snapshots
ls -lh /var/facebook/logs/fboss/fboss_sw_agent_snapshots.log
ls -lh /var/facebook/logs/fboss/fboss_hw_agent_snapshots.log

# QSFP service snapshots
ls -lh /var/facebook/logs/fboss/qsfp_service_snapshots.log

Boot history log format:

[ 2025 November 04 03:51:17 ]: Start of a WARM_BOOT, SDK version: sdk, Agent version: buildPackageVersion
[ 2025 November 04 03:52:55 ]: Start of a COLD_BOOT, SDK version: sdk, Agent version: buildPackageVersion
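
To see at a glance how often an agent has warm or cold booted, the log can be summarized with awk. The sketch below assumes the line format shown above; the function name is hypothetical.

```shell
# Count WARM_BOOT and COLD_BOOT entries and report the most recent boot type.
# Assumes lines of the form "[ <timestamp> ]: Start of a WARM_BOOT, ...".
boot_history_summary() {
    log=${1:-/var/facebook/logs/fboss/fboss_sw_agent_boot_history.log}
    awk '
        /Start of a WARM_BOOT/ { warm++; last = "WARM_BOOT" }
        /Start of a COLD_BOOT/ { cold++; last = "COLD_BOOT" }
        END { printf "warm=%d cold=%d last=%s\n", warm, cold, last }
    ' "$log"
}
```

An unexpected run of COLD_BOOT entries is a hint that the agent is not shutting down cleanly between restarts.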

Service Dependencies

FBOSS services have dependencies. If a service fails, check its dependencies:

# Check dependency tree for a service
systemctl list-dependencies fboss_sw_agent
systemctl list-dependencies fboss_hw_agent@0

# Example: Restart services in dependency order
systemctl restart platform_manager
systemctl restart qsfp_service
systemctl restart fboss_hw_agent@0
systemctl restart fboss_sw_agent
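
When restarting several services, it helps to stop at the first one that fails instead of continuing up the stack. A minimal sketch with a hypothetical helper name; the restart order is whatever you pass in, such as the order in the example above.

```shell
# Restart units in the order given, stopping at the first failure.
restart_in_order() {
    for unit in "$@"; do
        echo "restarting $unit"
        if ! systemctl restart "$unit"; then
            echo "restart failed: $unit" >&2
            return 1
        fi
        # Confirm the unit is up before moving on. A unit that crashes a few
        # seconds after starting will still pass this check.
        if ! systemctl is-active --quiet "$unit"; then
            echo "not active after restart: $unit" >&2
            return 1
        fi
    done
}
```

Example: `restart_in_order platform_manager qsfp_service fboss_hw_agent@0 fboss_sw_agent`.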

Crash Dumps and State Files

Crash Dump Locations

When FBOSS agents crash, they dump state to specific locations:

# Crash info directory
/var/facebook/fboss/crash/

# Bad state updates that caused crashes
/var/facebook/fboss/crash/bad_update/old_state
/var/facebook/fboss/crash/bad_update/new_state

# Check for crash dumps
ls -la /var/facebook/fboss/crash/

Analyzing Crash Dumps

# View crash state files
jq . /var/facebook/fboss/crash/bad_update/old_state
jq . /var/facebook/fboss/crash/bad_update/new_state
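
The two state files are usually large, so a diff is often more useful than reading either one in full. The sketch below assumes both files are JSON (as the jq commands above imply) and that jq and diff are installed; the function name is hypothetical.

```shell
# Show what changed between old_state and new_state.
# jq -S sorts object keys so the diff reflects content, not key order.
diff_crash_state() {
    dir=${1:-/var/facebook/fboss/crash/bad_update}
    old_sorted=$(mktemp) && new_sorted=$(mktemp) || return 1
    jq -S . "$dir/old_state" > "$old_sorted" &&
    jq -S . "$dir/new_state" > "$new_sorted" &&
    diff -u "$old_sorted" "$new_sorted"
    rc=$?   # 0 = identical, 1 = differences found, anything else = error
    rm -f "$old_sorted" "$new_sorted"
    return "$rc"
}
```

Lines prefixed with `-` come from old_state and lines prefixed with `+` from new_state.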

Core Dumps

Core dumps are captured by systemd-coredump (the service files set LimitCORE=32G):

# List core dumps
coredumpctl list

# Filter by FBOSS services
coredumpctl list | grep fboss

# View info about latest core
coredumpctl info

# View info for specific service
coredumpctl info fboss_sw_agent

# Extract core dump
coredumpctl dump fboss_sw_agent -o /tmp/fboss_sw_agent.core

# Debug with gdb
coredumpctl debug fboss_sw_agent

Note: Some core dumps may be marked as "inaccessible" if they were generated by root-owned processes. Run coredumpctl with sudo to access them.

Direct Hardware Access Utilities

Important: FBOSS utilities are located in /opt/fboss/bin/ and are not in the default PATH. Use the full path when running these commands.
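
If you expect to run many of these utilities in one session, you can instead add the directory to PATH for the current shell. A small sketch; the variable name is only illustrative, and the commands in the rest of this guide keep using full paths.

```shell
# Prepend the FBOSS utility directory to PATH for this shell session only.
# Does nothing if the directory is already on PATH.
fboss_bin=/opt/fboss/bin
case ":$PATH:" in
    *":$fboss_bin:"*) ;;
    *) PATH="$fboss_bin:$PATH"; export PATH ;;
esac
```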

wedge_qsfp_util - Transceiver Debugging

When qsfp_service is down, use --direct-i2c to access transceivers directly:

# Read transceiver info without qsfp_service
/opt/fboss/bin/wedge_qsfp_util eth1/1/1 --direct-i2c

# Verbose output with thresholds
/opt/fboss/bin/wedge_qsfp_util eth1/1/1 --direct-i2c --verbose

# Read specific register
/opt/fboss/bin/wedge_qsfp_util eth1/1/1 --direct-i2c --read-reg --offset 0 --length 128

# Reset transceiver
/opt/fboss/bin/wedge_qsfp_util eth1/1/1 --direct-i2c --qsfp_hard_reset

weutil - EEPROM Information

# Read chassis EEPROM
/opt/fboss/bin/weutil

# List all EEPROMs
/opt/fboss/bin/weutil --list

# Read all EEPROMs
/opt/fboss/bin/weutil --all

# Read specific EEPROM
/opt/fboss/bin/weutil --eeprom SCM

# JSON output
/opt/fboss/bin/weutil --json

fw_util - Firmware Versions

# Get all firmware versions
/opt/fboss/bin/fw_util --fw_action version --fw_target_name all

# Get specific component version
/opt/fboss/bin/fw_util --fw_action version --fw_target_name bios

fixmyfboss - Diagnostic Tool

fixmyfboss is an automated diagnostic tool that runs a set of health checks:

# List available checks
/opt/fboss/bin/fixmyfboss --list-checks

# Run all checks with verbose output
/opt/fboss/bin/fixmyfboss --verbose

# Run with debug logging
/opt/fboss/bin/fixmyfboss --debug

Available checks include:

  • MAC Address Correctness
  • PCI Devices Exist
  • Recent Manual Reboot Check
  • Recent Kernel Panic Check
  • Watchdog Did Not Stop Check
  • i801 SMBUS Timeout Check

showtech - System Information Collection

showtech collects comprehensive system information for troubleshooting:

# Collect all system information
/opt/fboss/bin/showtech --details all

# Collect specific information (fboss, logs, sensors, etc.)
/opt/fboss/bin/showtech --details fboss
/opt/fboss/bin/showtech --details logs
/opt/fboss/bin/showtech --details sensor

# Available options: all, fan, fanspinner, fboss, fwutil, gpio, host, i2c,
# i2cdump, logs, lspci, nvme, pem, port, powergood, psu, sensor, weutil
# Note: Some options, such as fwutil, i2c, i2cdump, pem, and psu, are disruptive
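
On a box that is still carrying traffic you may want everything except the disruptive sections. The sketch below filters the option list from the comment above and runs showtech once per remaining option. The helper names and output directory are hypothetical, and the disruptive list contains only the options named in this guide, which may not be exhaustive, so check it against your platform first.

```shell
# Options named in this guide; the disruptive list may not be exhaustive.
SHOWTECH_ALL="fan fanspinner fboss fwutil gpio host i2c i2cdump logs lspci nvme pem port powergood psu sensor weutil"
SHOWTECH_DISRUPTIVE="fwutil i2c i2cdump pem psu"

# Print the options not flagged as disruptive, one per line.
showtech_safe_options() {
    for opt in $SHOWTECH_ALL; do
        case " $SHOWTECH_DISRUPTIVE " in
            *" $opt "*) ;;          # skip disruptive option
            *) echo "$opt" ;;
        esac
    done
}

# Collect each remaining section into its own file under the given directory.
showtech_collect_safe() {
    out_dir=${1:-/tmp/showtech}
    mkdir -p "$out_dir" || return 1
    for opt in $(showtech_safe_options); do
        /opt/fboss/bin/showtech --details "$opt" > "$out_dir/$opt.txt" 2>&1
    done
}
```

Running `showtech_safe_options` on its own prints the list that would be collected, which is a cheap way to review it before running `showtech_collect_safe`.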

SAI Diagnostic Shell

The SAI diagnostic shell provides low-level ASIC debugging capabilities. It requires the hardware agent (fboss_hw_agent@N) to be running.

Note: This is an advanced debugging tool. Commands vary by ASIC vendor (Broadcom, Tajo, etc.).

# Connect to the SAI diagnostic shell
/opt/fboss/bin/diag_shell_client

# Optionally specify host and port
/opt/fboss/bin/diag_shell_client --host ::1 --port 5931

# Provide reason for production environments
/opt/fboss/bin/diag_shell_client --reason "SEV12345"

Other Diagnostic Utilities

# Rack monitoring
/opt/fboss/bin/rackmon

# Sensor service client
/opt/fboss/bin/sensor_service_client

Important Runtime Files and Directories

Configuration Files

# Agent configuration
/etc/coop/agent.conf

# QSFP service configuration
/etc/coop/qsfp.conf

# Running configuration
/var/facebook/fboss/running-agent.conf

Platform Information

# Platform name
cat /var/facebook/fboss/platform_name

# FRU ID information (product name, serial, MAC addresses)
jq . /var/facebook/fboss/fruid.json

# System information using dmidecode
dmidecode -s system-product-name
dmidecode -s system-serial-number
dmidecode -s system-manufacturer

# Full DMI information
dmidecode

Example FRUID output:

{
"Product Name": "MONTBLANC",
"Product Serial Number": "XXXXXXXXXXXX",
"Local MAC": "XX:XX:XX:XX:XX:XX",
"Extended MAC Base": "XX:XX:XX:XX:XX:XX",
"Extended MAC Address Size": "139"
}

Warm Boot State

Warm boot state is stored in shared memory:

# Warm boot directory
ls -la /dev/shm/fboss/warm_boot/

# SAI adaptor state
ls -lh /dev/shm/fboss/warm_boot/sai_adaptor_state_0

# Switch state
ls -lh /dev/shm/fboss/warm_boot/switch_state_0
ls -lh /dev/shm/fboss/warm_boot/thrift_switch_state
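
A quick presence check over these files can be scripted as below. This is a sketch with a hypothetical function name; it uses the file names listed above for switch index 0 and only tests that each file exists and is non-empty, which by itself does not guarantee that the next start will be a warm boot.

```shell
# Report which warm boot state files exist and are non-empty.
# File names are the ones listed above for switch index 0.
warm_boot_files_present() {
    dir=${1:-/dev/shm/fboss/warm_boot}
    for f in sai_adaptor_state_0 switch_state_0 thrift_switch_state; do
        if [ -s "$dir/$f" ]; then
            echo "present: $f"
        else
            echo "missing or empty: $f"
        fi
    done
}
```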

SDK Dumps

# SDK dump directories
ls -la /var/facebook/fboss/fboss_sdk_dump.*

Debug Builds

Building with Debug Symbols

Debug builds include full debug symbols and disable optimizations for easier debugging:

# Build FBOSS with Debug build type
./fboss/oss/scripts/run-getdeps.py build fboss --build-type Debug

For more information about the --build-type option and other build configurations, see Build Type Options.

Debug builds are written to the same output directory as other build types; they differ only in optimization level and debug symbols.

Using Debug Builds

# Stop the service
systemctl stop fboss_sw_agent

# Run debug binary manually with verbose logging
/path/to/debug/fboss_sw_agent --minloglevel=0

# Or with gdb
gdb --args /path/to/debug/fboss_sw_agent --minloglevel=0

Logging Control

FBOSS agents and tests support various logging flags to control verbosity and enable detailed debugging output.

Agent Logging Flags

# General logging level (glog levels)
--minloglevel=0 # 0=INFO, 1=WARNING, 2=ERROR, 3=FATAL

# FBOSS-specific verbose logging
--logging DBG0 # Minimal debug logging
--logging DBG1 # Basic debug logging
--logging DBG2 # Moderate debug logging
--logging DBG5 # Detailed debug logging
--logging DBG9 # Maximum debug logging (very verbose)

# Example: Run agent with detailed logging
systemctl stop fboss_hw_agent@0
/opt/fboss/bin/fboss_hw_agent \
--switchIndex=0 \
--minloglevel=0 \
--logging DBG5

SAI Logging Control

For SAI-based platforms, you can enable SAI adapter logging:

# Enable SAI logging at different levels
--enable_sai_log NOTICE # Basic SAI events
--enable_sai_log INFO # Informational SAI messages
--enable_sai_log DEBUG # Detailed SAI debugging (verbose)

# Combine with FBOSS logging for comprehensive debugging
/opt/fboss/bin/fboss_hw_agent \
--switchIndex=0 \
--minloglevel=0 \
--logging DBG9 \
--enable_sai_log DEBUG

SAI Log Levels:

  • CRITICAL: Only critical errors
  • ERROR: Error conditions
  • WARN: Warning conditions
  • NOTICE: Normal but significant conditions
  • INFO: Informational messages
  • DEBUG: Detailed debug information (generates significant log volume)

Test Logging

When running hardware tests or unit tests:

# Run test with detailed logging
./sai_test-sai_impl-<version> \
--gtest_filter=YourTestName \
--logging DBG9 \
--enable_sai_log DEBUG

# Redirect logs to file for analysis
./sai_test-sai_impl-<version> \
--gtest_filter=YourTestName \
--logging DBG9 \
--enable_sai_log DEBUG \
2>&1 | tee test_output.log

Persistent Logging Configuration

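To enable detailed logging permanently for a service, add a systemd drop-in override:

# Open a drop-in override for the service
systemctl edit fboss_hw_agent@0

# Add logging flags to ExecStart. The empty ExecStart= line clears the original
# command, so carry over any other flags the original ExecStart used: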
[Service]
ExecStart=
ExecStart=/opt/fboss/bin/fboss_hw_agent \
--switchIndex=0 \
--minloglevel=0 \
--logging DBG5 \
--enable_sai_log INFO

# Reload and restart
systemctl daemon-reload
systemctl restart fboss_hw_agent@0

Note: High logging levels (DBG9, SAI DEBUG) generate significant log volume and may impact performance. Use them only for active debugging and reduce verbosity for production use.
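
Once debugging is finished, the override can be removed with `systemctl revert`. A minimal sketch with a hypothetical function name; note that revert removes every drop-in for the unit, not only the logging change, so use it only if the logging override is the sole customization.

```shell
# Drop all drop-in overrides for a unit and restart it with its original settings.
restore_default_unit() {
    unit=${1:-fboss_hw_agent@0}
    systemctl revert "$unit" &&
    systemctl daemon-reload &&
    systemctl restart "$unit"
}
```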

GDB Debugging

Attaching to Running Process

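# Find process ID (-x matches the exact process name)
pgrep -x fboss_sw_agent

# Attach gdb (-o picks the oldest match if there is more than one)
gdb -p "$(pgrep -o -x fboss_sw_agent)"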

# Common gdb commands:
# (gdb) bt # backtrace
# (gdb) bt full # backtrace with local variables
# (gdb) thread apply all bt # backtrace all threads
# (gdb) info threads # list all threads
# (gdb) frame N # switch to frame N
# (gdb) print variable # print variable value
# (gdb) continue # continue execution
# (gdb) detach # detach without killing process
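
For a hung agent it is often enough to capture all thread backtraces non-interactively and let the process continue. The sketch below uses gdb's batch mode; the function name and default output path are hypothetical. The process is paused while gdb is attached, so expect a brief stall.

```shell
# Dump backtraces of all threads of a running process to a file.
# -batch makes gdb run the -ex commands and exit, detaching from the process.
dump_backtraces() {
    name=$1
    out=${2:-/tmp/$name.bt.txt}
    # -o picks the oldest matching process; -x requires an exact name match.
    pid=$(pgrep -o -x "$name") || { echo "no process named $name" >&2; return 1; }
    gdb -p "$pid" -batch -ex "thread apply all bt" > "$out" 2>&1
    echo "$out"
}
```

Example: `dump_backtraces fboss_sw_agent` writes the traces to /tmp/fboss_sw_agent.bt.txt and prints that path.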

Debugging Core Dumps

# Debug core dump with gdb
gdb /opt/fboss/bin/fboss_sw_agent /tmp/fboss_sw_agent.core

# Or use coredumpctl
coredumpctl debug fboss_sw_agent

# Useful gdb commands for core dumps:
# (gdb) bt full
# (gdb) thread apply all bt full
# (gdb) info registers
# (gdb) disassemble

Debugging Tests with GDB

When debugging unit tests or hardware tests with GDB, use these GTest flags for a better debugging experience:

# Run test under gdb with break on failure
gdb --args /path/to/test_binary \
--gtest_break_on_failure \
--gtest_catch_exceptions=0 \
--gtest_filter=YourTestName

# Inside gdb:
# (gdb) run
# Test will break at the exact point of failure

GTest Debug Flags:

  • --gtest_break_on_failure: Automatically breaks into the debugger when a test assertion fails. This allows you to inspect the exact state at the point of failure without manually setting breakpoints.

  • --gtest_catch_exceptions=0: Disables GTest's exception catching, allowing exceptions to propagate to the debugger. This is useful when debugging crashes or unexpected exceptions, as the debugger will stop at the throw point rather than GTest's exception handler.

Example debugging a hardware test:

# Debug a specific hardware test
gdb --args ./sai_test-sai_impl-<version> \
--gtest_break_on_failure \
--gtest_catch_exceptions=0 \
--gtest_filter=HwVlanTest.VlanApplyConfig \
--flexports \
--fruid_filepath /path/to/fruid.json \
--config /path/to/agent.conf

# Inside gdb, set breakpoints if needed:
# (gdb) break SomeFunction
# (gdb) run
# Test will break at assertion failure or exception

Common Troubleshooting Scenarios

Hardware Agent Won't Start

# 1. Check service status
systemctl status fboss_hw_agent@0

# 2. View recent logs
journalctl -u fboss_hw_agent@0 -n 100

# 3. Check dependencies
systemctl status platform_manager qsfp_service

# 4. Check for crash dumps
ls -la /var/facebook/fboss/crash/

# 5. Run diagnostic checks
/opt/fboss/bin/fixmyfboss --verbose

# 6. Try manual start with verbose logging
systemctl stop fboss_hw_agent@0
/opt/fboss/bin/fboss_hw_agent --switchIndex=0 --minloglevel=0
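
Steps 1 through 4 are read-only, so they can be captured into a directory in one pass before you change anything. A minimal sketch; the function name, output directory, and file names are hypothetical.

```shell
# Capture status, recent logs, dependency status, and the crash directory
# listing for a unit. Each step tolerates failure so the rest still run.
collect_hw_agent_triage() {
    unit=${1:-fboss_hw_agent@0}
    out_dir=${2:-/tmp/fboss_triage}
    mkdir -p "$out_dir" || return 1
    systemctl status "$unit" --no-pager -l > "$out_dir/status.txt" 2>&1 || true
    journalctl -u "$unit" -n 100 --no-pager > "$out_dir/journal.txt" 2>&1 || true
    systemctl status platform_manager qsfp_service --no-pager > "$out_dir/deps.txt" 2>&1 || true
    ls -la /var/facebook/fboss/crash/ > "$out_dir/crash_ls.txt" 2>&1 || true
    echo "$out_dir"
}
```

The function prints the directory it wrote to, so the files can be attached to a ticket or compared against a later run.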

Service Crashes Immediately

# 1. Check core dumps
coredumpctl list | grep fboss

# 2. Check for bad state updates
ls -la /var/facebook/fboss/crash/bad_update/

# 3. View crash state
jq . /var/facebook/fboss/crash/bad_update/old_state
jq . /var/facebook/fboss/crash/bad_update/new_state

# 4. Debug with gdb
coredumpctl debug fboss_sw_agent

No Logs Available

# Check journald status
systemctl status systemd-journald

# Check disk space
df -h /var/log

# Manually check service output
systemctl status fboss_sw_agent -l --no-pager