Debugging FBOSS Without CLI
This guide covers debugging FBOSS when the CLI is not available, typically when the hardware agent (fboss_hw_agent) is down or not responding. The FBOSS CLI and L1 debugging commands require a running hardware agent. When that's not available, you'll need to use lower-level debugging tools.
Service Status and Logs
Checking Service Status
FBOSS services run under systemd. Check their status:
# Check all FBOSS services
systemctl status 'fboss*' 'platform_manager' 'qsfp_service' 'fsdb'
# Check specific service
systemctl status fboss_hw_agent@0
# Check if service is active
systemctl is-active fboss_sw_agent
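When several services may be down at once, it can help to loop over them instead of checking each one by hand. The following is a minimal sketch; the function name report_inactive_services is illustrative and not part of FBOSS, and it relies only on the exit status of systemctl is-active.

```shell
# Print every unit from the argument list that is not currently active.
# Returns 0 if all units are active, 1 otherwise.
# report_inactive_services is a hypothetical helper, not an FBOSS tool.
report_inactive_services() {
    rc=0
    for unit in "$@"; do
        # --quiet suppresses output; only the exit status is used.
        if ! systemctl is-active --quiet "$unit"; then
            echo "NOT ACTIVE: $unit"
            rc=1
        fi
    done
    return "$rc"
}
```

For example: report_inactive_services platform_manager qsfp_service fboss_hw_agent@0 fboss_sw_agent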
Viewing Service Logs
All FBOSS services log to journald. Use journalctl to view logs:
# View logs for a specific service
journalctl -u fboss_sw_agent
# Follow logs in real-time
journalctl -u fboss_hw_agent@0 -f
# View logs since last boot
journalctl -u platform_manager -b
# View logs from the last hour
journalctl -u qsfp_service --since "1 hour ago"
# View logs with priority level (err, warning, info, debug)
journalctl -u fboss_sw_agent -p err
# Combine multiple services
journalctl -u fboss_sw_agent -u fboss_hw_agent@0 -u qsfp_service
# Export logs to file
journalctl -u fboss_sw_agent --since "2 hours ago" > /tmp/fboss_sw_agent.log
When a service crashes, journald also records the corresponding systemd-coredump entry, including a stack trace, which makes these logs a good first stop when debugging service failures.
Additional Log Files
FBOSS maintains several log files outside of journald:
# Boot history - shows WARM_BOOT vs COLD_BOOT events
cat /var/facebook/logs/fboss/fboss_sw_agent_boot_history.log
# Agent snapshots - detailed state snapshots
ls -lh /var/facebook/logs/fboss/fboss_sw_agent_snapshots.log
ls -lh /var/facebook/logs/fboss/fboss_hw_agent_snapshots.log
# QSFP service snapshots
ls -lh /var/facebook/logs/fboss/qsfp_service_snapshots.log
Boot history log format:
[ 2025 November 04 03:51:17 ]: Start of a WARM_BOOT, SDK version: sdk, Agent version: buildPackageVersion
[ 2025 November 04 03:52:55 ]: Start of a COLD_BOOT, SDK version: sdk, Agent version: buildPackageVersion
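A quick tally of warm versus cold boots can show whether an agent has been falling back to cold boot repeatedly. This is a sketch that assumes the line format shown above; count_boot_types is a hypothetical helper name.

```shell
# Tally WARM_BOOT and COLD_BOOT events in a boot history log.
# Assumes each event line contains "Start of a WARM_BOOT" or
# "Start of a COLD_BOOT", as in the sample format above.
# count_boot_types is a hypothetical helper, not an FBOSS tool.
count_boot_types() {
    log_file="$1"
    awk '
        /Start of a WARM_BOOT/ { warm++ }
        /Start of a COLD_BOOT/ { cold++ }
        END { printf "warm=%d cold=%d\n", warm, cold }
    ' "$log_file"
}
```

For example: count_boot_types /var/facebook/logs/fboss/fboss_sw_agent_boot_history.log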
Service Dependencies
FBOSS services depend on one another. If a service fails to start, check the units it depends on:
# Check dependency tree for a service
systemctl list-dependencies fboss_sw_agent
systemctl list-dependencies fboss_hw_agent@0
# Example: Restart services in dependency order
systemctl restart platform_manager
systemctl restart qsfp_service
systemctl restart fboss_hw_agent@0
systemctl restart fboss_sw_agent
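The restart sequence above can be wrapped so that it stops as soon as one service fails to come back, which avoids restarting the agents on top of a broken dependency. This is a sketch under the assumption that checking is-active right after the restart is a sufficient health signal; a service that crashes a few seconds later would not be caught. restart_in_order is a hypothetical helper name.

```shell
# Restart units in the order given and stop at the first one that does
# not report as active afterwards.
# restart_in_order is a hypothetical helper, not an FBOSS tool.
restart_in_order() {
    for unit in "$@"; do
        echo "Restarting $unit"
        systemctl restart "$unit" || return 1
        if ! systemctl is-active --quiet "$unit"; then
            echo "$unit did not become active; stopping here" >&2
            return 1
        fi
    done
}
```

For example: restart_in_order platform_manager qsfp_service fboss_hw_agent@0 fboss_sw_agent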
Crash Dumps and State Files
Crash Dump Locations
When FBOSS agents crash, they dump state to specific locations:
# Crash info directory
/var/facebook/fboss/crash/
# Bad state updates that caused crashes
/var/facebook/fboss/crash/bad_update/old_state
/var/facebook/fboss/crash/bad_update/new_state
# Check for crash dumps
ls -la /var/facebook/fboss/crash/
Analyzing Crash Dumps
# View crash state files
cat /var/facebook/fboss/crash/bad_update/old_state | jq .
cat /var/facebook/fboss/crash/bad_update/new_state | jq .
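Because old_state and new_state can be large, a diff is often more useful than reading each file in full. A minimal sketch, assuming both files are valid JSON (as the jq commands above imply); diff_state_files is a hypothetical helper, and it falls back to a plain diff if jq is not installed.

```shell
# Show a unified diff between two JSON state files.
# Exit status: 0 if identical, 1 if they differ, other values on error.
# diff_state_files is a hypothetical helper, not an FBOSS tool.
diff_state_files() {
    old="$1"
    new="$2"
    rc=0
    if command -v jq >/dev/null 2>&1; then
        old_sorted=$(mktemp) || return 2
        new_sorted=$(mktemp) || { rm -f "$old_sorted"; return 2; }
        {
            # -S sorts object keys so key ordering alone is not reported as a change.
            jq -S . "$old" > "$old_sorted" &&
            jq -S . "$new" > "$new_sorted" &&
            diff -u "$old_sorted" "$new_sorted"
        } || rc=$?
        rm -f "$old_sorted" "$new_sorted"
    else
        diff -u "$old" "$new" || rc=$?
    fi
    return "$rc"
}
```

For example: diff_state_files /var/facebook/fboss/crash/bad_update/old_state /var/facebook/fboss/crash/bad_update/new_state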
Core Dumps
Core dumps are handled by systemd-coredump, and the service files set LimitCORE=32G:
# List core dumps
coredumpctl list
# Filter by FBOSS services
coredumpctl list | grep fboss
# View info about latest core
coredumpctl info
# View info for specific service
coredumpctl info fboss_sw_agent
# Extract core dump
coredumpctl dump fboss_sw_agent -o /tmp/fboss_sw_agent.core
# Debug with gdb
coredumpctl debug fboss_sw_agent
Note: Some core dumps may be marked as "inaccessible" if they were generated by root-owned processes. Use sudo to access them.
Direct Hardware Access Utilities
Important: FBOSS utilities are located in /opt/fboss/bin/ and are not in the default PATH. Use the full path when running these commands.
wedge_qsfp_util - Transceiver Debugging
When qsfp_service is down, use --direct-i2c to access transceivers directly:
# Read transceiver info without qsfp_service
/opt/fboss/bin/wedge_qsfp_util eth1/1/1 --direct-i2c
# Verbose output with thresholds
/opt/fboss/bin/wedge_qsfp_util eth1/1/1 --direct-i2c --verbose
# Read specific register
/opt/fboss/bin/wedge_qsfp_util eth1/1/1 --direct-i2c --read-reg --offset 0 --length 128
# Reset transceiver
/opt/fboss/bin/wedge_qsfp_util eth1/1/1 --direct-i2c --qsfp_hard_reset
weutil - EEPROM Information
# Read chassis EEPROM
/opt/fboss/bin/weutil
# List all EEPROMs
/opt/fboss/bin/weutil --list
# Read all EEPROMs
/opt/fboss/bin/weutil --all
# Read specific EEPROM
/opt/fboss/bin/weutil --eeprom SCM
# JSON output
/opt/fboss/bin/weutil --json
fw_util - Firmware Versions
# Get all firmware versions
/opt/fboss/bin/fw_util --fw_action version --fw_target_name all
# Get specific component version
/opt/fboss/bin/fw_util --fw_action version --fw_target_name bios
fixmyfboss - Diagnostic Tool
fixmyfboss is an automated diagnostic tool that runs a set of health checks:
# List available checks
/opt/fboss/bin/fixmyfboss --list-checks
# Run all checks with verbose output
/opt/fboss/bin/fixmyfboss --verbose
# Run with debug logging
/opt/fboss/bin/fixmyfboss --debug
Available checks include:
- MAC Address Correctness
- PCI Devices Exist
- Recent Manual Reboot Check
- Recent Kernel Panic Check
- Watchdog Did Not Stop Check
- i801 SMBUS Timeout Check
showtech - System Information Collection
showtech collects comprehensive system information for troubleshooting:
# Collect all system information
/opt/fboss/bin/showtech --details all
# Collect specific information (fboss, logs, sensors, etc.)
/opt/fboss/bin/showtech --details fboss
/opt/fboss/bin/showtech --details logs
/opt/fboss/bin/showtech --details sensor
# Available options: all, fan, fanspinner, fboss, fwutil, gpio, host, i2c,
# i2cdump, logs, lspci, nvme, pem, port, powergood, psu, sensor, weutil
# Note: Some options like fwutil, i2c, i2cdump, pem, psu are disruptive
SAI Diagnostic Shell
The SAI diagnostic shell provides low-level ASIC debugging capabilities. It requires the hardware agent (fboss_hw_agent@N) to be running.
Note: This is an advanced debugging tool. Commands vary by ASIC vendor (Broadcom, Tajo, etc.).
# Connect to the SAI diagnostic shell
/opt/fboss/bin/diag_shell_client
# Optionally specify host and port
/opt/fboss/bin/diag_shell_client --host ::1 --port 5931
# Provide reason for production environments
/opt/fboss/bin/diag_shell_client --reason "SEV12345"
Other Diagnostic Utilities
# Rack monitoring
/opt/fboss/bin/rackmon
# Sensor service client
/opt/fboss/bin/sensor_service_client
Important Runtime Files and Directories
Configuration Files
# Agent configuration
/etc/coop/agent.conf
# QSFP service configuration
/etc/coop/qsfp.conf
# Running configuration
/var/facebook/fboss/running-agent.conf
Platform Information
# Platform name
cat /var/facebook/fboss/platform_name
# FRU ID information (product name, serial, MAC addresses)
cat /var/facebook/fboss/fruid.json | jq .
# System information using dmidecode
dmidecode -s system-product-name
dmidecode -s system-serial-number
dmidecode -s system-manufacturer
# Full DMI information
dmidecode
Example FRUID output:
{
"Product Name": "MONTBLANC",
"Product Serial Number": "XXXXXXXXXXXX",
"Local MAC": "XX:XX:XX:XX:XX:XX",
"Extended MAC Base": "XX:XX:XX:XX:XX:XX",
"Extended MAC Address Size": "139"
}
Warm Boot State
Warm boot state is stored in shared memory:
# Warm boot directory
ls -la /dev/shm/fboss/warm_boot/
# SAI adaptor state
ls -lh /dev/shm/fboss/warm_boot/sai_adaptor_state_0
# Switch state
ls -lh /dev/shm/fboss/warm_boot/switch_state_0
ls -lh /dev/shm/fboss/warm_boot/thrift_switch_state
SDK Dumps
# SDK dump directories
ls -la /var/facebook/fboss/fboss_sdk_dump.*
Debug Builds
Building with Debug Symbols
Debug builds include full debug symbols and disable optimizations for easier debugging:
# Build FBOSS with Debug build type
./fboss/oss/scripts/run-getdeps.py build fboss --build-type Debug
For more information about the --build-type option and other build configurations, see Build Type Options.
Debug builds are written to the same output directory as other build types; the binaries differ in optimization level and debug information.
Using Debug Builds
# Stop the service
systemctl stop fboss_sw_agent
# Run debug binary manually with verbose logging
/path/to/debug/fboss_sw_agent --minloglevel=0
# Or with gdb
gdb --args /path/to/debug/fboss_sw_agent --minloglevel=0
Logging Control
FBOSS agents and tests support various logging flags to control verbosity and enable detailed debugging output.
Agent Logging Flags
# General logging level (glog levels)
--minloglevel=0 # 0=INFO, 1=WARNING, 2=ERROR, 3=FATAL
# FBOSS-specific verbose logging
--logging DBG0 # Minimal debug logging
--logging DBG1 # Basic debug logging
--logging DBG2 # Moderate debug logging
--logging DBG5 # Detailed debug logging
--logging DBG9 # Maximum debug logging (very verbose)
# Example: Run agent with detailed logging
systemctl stop fboss_hw_agent@0
/opt/fboss/bin/fboss_hw_agent \
--switchIndex=0 \
--minloglevel=0 \
--logging DBG5
SAI Logging Control
For SAI-based platforms, you can enable SAI adapter logging:
# Enable SAI logging at different levels
--enable_sai_log NOTICE # Basic SAI events
--enable_sai_log INFO # Informational SAI messages
--enable_sai_log DEBUG # Detailed SAI debugging (verbose)
# Combine with FBOSS logging for comprehensive debugging
/opt/fboss/bin/fboss_hw_agent \
--switchIndex=0 \
--minloglevel=0 \
--logging DBG9 \
--enable_sai_log DEBUG
SAI Log Levels:
- CRITICAL: Only critical errors
- ERROR: Error conditions
- WARN: Warning conditions
- NOTICE: Normal but significant conditions
- INFO: Informational messages
- DEBUG: Detailed debug information (generates significant log volume)
Test Logging
When running hardware tests or unit tests:
# Run test with detailed logging
./sai_test-sai_impl-<version> \
--gtest_filter=YourTestName \
--logging DBG9 \
--enable_sai_log DEBUG
# Redirect logs to file for analysis
./sai_test-sai_impl-<version> \
--gtest_filter=YourTestName \
--logging DBG9 \
--enable_sai_log DEBUG \
2>&1 | tee test_output.log
Persistent Logging Configuration
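To enable detailed logging permanently for a service, add a systemd drop-in override rather than editing the installed service file:
# Create or edit a drop-in override for the unit (opens an editor)
systemctl edit fboss_hw_agent@0
# In the override, clear the original ExecStart with an empty assignment, then set the new command with logging flags: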
[Service]
ExecStart=
ExecStart=/opt/fboss/bin/fboss_hw_agent \
--switchIndex=0 \
--minloglevel=0 \
--logging DBG5 \
--enable_sai_log INFO
# Reload and restart
systemctl daemon-reload
systemctl restart fboss_hw_agent@0
Note: High logging levels (DBG9, SAI DEBUG) generate significant log volume and may impact performance. Use them only for active debugging and reduce verbosity for production use.
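Once debugging is finished, the override can be removed so the service returns to its installed settings. The sketch below uses systemctl revert, which removes all drop-in overrides for the unit, not only the logging one, so it assumes no other overrides need to be kept. remove_logging_override is a hypothetical helper name.

```shell
# Remove drop-in overrides for a unit and restart it with its
# installed settings.
# Caution: "systemctl revert" drops every override for the unit,
# so check for other overrides you want to keep before running it.
# remove_logging_override is a hypothetical helper, not an FBOSS tool.
remove_logging_override() {
    unit="$1"
    systemctl revert "$unit" || return 1
    systemctl restart "$unit"
}
```

For example: remove_logging_override fboss_hw_agent@0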
GDB Debugging
Attaching to Running Process
# Find process ID
pgrep fboss_sw_agent
# Attach gdb (-x matches the exact process name, -o picks the oldest match if there is more than one)
gdb -p "$(pgrep -o -x fboss_sw_agent)"
# Common gdb commands:
# (gdb) bt # backtrace
# (gdb) bt full # backtrace with local variables
# (gdb) thread apply all bt # backtrace all threads
# (gdb) info threads # list all threads
# (gdb) frame N # switch to frame N
# (gdb) print variable # print variable value
# (gdb) continue # continue execution
# (gdb) detach # detach without killing process
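When an agent appears hung, it is often enough to capture backtraces for all threads non-interactively and detach right away, so the process is paused only briefly. The following is a sketch using gdb's batch mode; dump_thread_backtraces is a hypothetical helper name, and the output is only as readable as the symbols available in the binary.

```shell
# Write backtraces for all threads of a running process to a file.
# gdb pauses the process while collecting them and detaches on exit.
# dump_thread_backtraces is a hypothetical helper, not an FBOSS tool.
dump_thread_backtraces() {
    proc_name="$1"
    out_file="$2"
    # -x: exact name match, -o: oldest matching process.
    pid=$(pgrep -o -x "$proc_name") || {
        echo "no running process named $proc_name" >&2
        return 1
    }
    gdb -p "$pid" -batch -ex "thread apply all bt" > "$out_file" 2>&1
}
```

For example: dump_thread_backtraces fboss_sw_agent /tmp/fboss_sw_agent.bt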
Debugging Core Dumps
# Debug core dump with gdb
gdb /opt/fboss/bin/fboss_sw_agent /tmp/fboss_sw_agent.core
# Or use coredumpctl
coredumpctl debug fboss_sw_agent
# Useful gdb commands for core dumps:
# (gdb) bt full
# (gdb) thread apply all bt full
# (gdb) info registers
# (gdb) disassemble
Debugging Tests with GDB
When debugging unit tests or hardware tests with GDB, use these GTest flags for a better debugging experience:
# Run test under gdb with break on failure
gdb --args /path/to/test_binary \
--gtest_break_on_failure \
--gtest_catch_exceptions=0 \
--gtest_filter=YourTestName
# Inside gdb:
# (gdb) run
# Test will break at the exact point of failure
GTest Debug Flags:
- --gtest_break_on_failure: Automatically breaks into the debugger when a test assertion fails. This allows you to inspect the exact state at the point of failure without manually setting breakpoints.
- --gtest_catch_exceptions=0: Disables GTest's exception catching, allowing exceptions to propagate to the debugger. This is useful when debugging crashes or unexpected exceptions, as the debugger will stop at the throw point rather than GTest's exception handler.
Example debugging a hardware test:
# Debug a specific hardware test
gdb --args ./sai_test-sai_impl-<version> \
--gtest_break_on_failure \
--gtest_catch_exceptions=0 \
--gtest_filter=HwVlanTest.VlanApplyConfig \
--flexports \
--fruid_filepath /path/to/fruid.json \
--config /path/to/agent.conf
# Inside gdb, set breakpoints if needed:
# (gdb) break SomeFunction
# (gdb) run
# Test will break at assertion failure or exception
Common Troubleshooting Scenarios
Hardware Agent Won't Start
# 1. Check service status
systemctl status fboss_hw_agent@0
# 2. View recent logs
journalctl -u fboss_hw_agent@0 -n 100
# 3. Check dependencies
systemctl status platform_manager qsfp_service
# 4. Check for crash dumps
ls -la /var/facebook/fboss/crash/
# 5. Run diagnostic checks
/opt/fboss/bin/fixmyfboss --verbose
# 6. Try manual start with verbose logging
systemctl stop fboss_hw_agent@0
/opt/fboss/bin/fboss_hw_agent --switchIndex=0 --minloglevel=0
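The read-only steps above (1 through 5) can be gathered into a single directory, which is convenient when handing the data to someone else. This is a sketch only: triage_hw_agent and the output file names are hypothetical, and the manual start in step 6 is deliberately left out because it changes system state.

```shell
# Collect the output of the read-only triage steps into one directory.
# Each command is allowed to fail so the remaining steps still run.
# triage_hw_agent and the file names are hypothetical, not FBOSS conventions.
triage_hw_agent() {
    unit="$1"
    out_dir="$2"
    mkdir -p "$out_dir" || return 1
    systemctl status "$unit" --no-pager > "$out_dir/status.txt" 2>&1 || true
    journalctl -u "$unit" -n 100 --no-pager > "$out_dir/journal.txt" 2>&1 || true
    systemctl status platform_manager qsfp_service --no-pager \
        > "$out_dir/dependencies.txt" 2>&1 || true
    ls -la /var/facebook/fboss/crash/ > "$out_dir/crash_listing.txt" 2>&1 || true
    /opt/fboss/bin/fixmyfboss --verbose > "$out_dir/fixmyfboss.txt" 2>&1 || true
    echo "Triage output written to $out_dir"
}
```

For example: triage_hw_agent fboss_hw_agent@0 /tmp/hw_agent_triage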
Service Crashes Immediately
# 1. Check core dumps
coredumpctl list | grep fboss
# 2. Check for bad state updates
ls -la /var/facebook/fboss/crash/bad_update/
# 3. View crash state
cat /var/facebook/fboss/crash/bad_update/old_state | jq .
cat /var/facebook/fboss/crash/bad_update/new_state | jq .
# 4. Debug with gdb
coredumpctl debug fboss_sw_agent
No Logs Available
# Check journald status
systemctl status systemd-journald
# Check disk space
df -h /var/log
# Manually check service output
systemctl status fboss_sw_agent -l --no-pager