This report compares the LLM inference performance of two AMD GPU systems running models locally with Ollama. The benchmarks were run with the llm-tester tool at a concurrency of 1, simulating a single-user workload.

Test Configuration

Systems Tested

  1. AMD Ryzen AI Max+ 395

    • Host: bosgame.localnet
    • ROCm: Custom installation in home directory
    • Memory: 32 GB system memory (unified memory shared with the GPU)
    • VRAM: 96 GB (allocated to the GPU from the same unified memory)
  2. AMD Radeon RX 7900 XTX

    • Host: rig.localnet
    • ROCm: System default installation
    • Memory: 96 GB system RAM
    • VRAM: 24 GB

Models Tested

  • deepseek-r1:1.5b
  • qwen3:latest

Test Methodology

  • Benchmark Tool: llm-tester (https://github.com/Laszlobeer/llm-tester)
  • Concurrent Requests: 1 (single-user simulation)
  • Tasks per Model: 5 diverse prompts
  • Timeout: 180 seconds per task
  • Backend: Ollama API (http://localhost:11434)
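
The report does not describe how llm-tester derives its figures. As a minimal sketch, assuming a tool like it relies on the timing fields returned by the Ollama API, one single-user task against the backend above could be measured as follows. `eval_count` and `eval_duration` (in nanoseconds) are fields of the non-streaming `/api/generate` response; the helper names `tokens_per_second` and `run_task` are hypothetical.

```python
import json
import urllib.request

# Backend from the methodology above; the helper names below are hypothetical.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Generation throughput from a non-streaming /api/generate response.

    eval_count is the number of generated tokens and eval_duration is the
    time spent generating them, in nanoseconds.
    """
    return response["eval_count"] / response["eval_duration"] * 1e9


def run_task(model: str, prompt: str, timeout: float = 180.0) -> dict:
    """Send one request and wait for the full reply (concurrency of 1).

    urlopen's timeout bounds each blocking socket operation; assuming the
    server sends nothing until generation finishes, it acts as a rough
    per-task limit.
    """
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.load(reply)
```

With Ollama running, `tokens_per_second(run_task("deepseek-r1:1.5b", "Explain recursion briefly."))` returns the generation throughput for one task. The same response also carries `total_duration` (nanoseconds), which also covers model loading and prompt processing, so it is closer to an end-to-end latency.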

Performance Results

deepseek-r1:1.5b Performance

System       Avg Tokens/s   Avg Latency   Total Time   Performance Ratio
AMD RX 7900  197.01         6.54s         32.72s       1.78x faster
Max+ 395     110.52         21.51s        107.53s      baseline

Detailed Results - AMD RX 7900:

  • Task 1: 196.88 tokens/s, Latency: 9.81s
  • Task 2: 185.87 tokens/s, Latency: 17.60s
  • Task 3: 200.72 tokens/s, Latency: 1.97s
  • Task 4: 200.89 tokens/s, Latency: 1.76s
  • Task 5: 200.70 tokens/s, Latency: 1.57s

Detailed Results - Max+ 395:

  • Task 1: 111.78 tokens/s, Latency: 13.38s
  • Task 2: 93.81 tokens/s, Latency: 82.23s
  • Task 3: 115.97 tokens/s, Latency: 3.83s
  • Task 4: 114.72 tokens/s, Latency: 4.52s
  • Task 5: 116.34 tokens/s, Latency: 3.57s
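
The summary row can be reproduced from the per-task results above: the averages of throughput and latency, the sum of latencies as total time, and the ratio of average throughputs. A short check, with the values copied from the lists:

```python
from statistics import mean

# Per-task deepseek-r1:1.5b results from the lists above: (tokens/s, latency in s)
RX_7900 = [(196.88, 9.81), (185.87, 17.60), (200.72, 1.97), (200.89, 1.76), (200.70, 1.57)]
MAX_395 = [(111.78, 13.38), (93.81, 82.23), (115.97, 3.83), (114.72, 4.52), (116.34, 3.57)]


def summarize(tasks):
    """Return (average tokens/s, average latency, summed latency) for one system."""
    speeds = [tps for tps, _ in tasks]
    latencies = [latency for _, latency in tasks]
    return mean(speeds), mean(latencies), sum(latencies)


rx = summarize(RX_7900)
mx = summarize(MAX_395)
ratio = rx[0] / mx[0]  # "Performance Ratio" = ratio of average tokens/s

print(f"RX 7900:  {rx[0]:.2f} tok/s, {rx[1]:.2f}s avg, {rx[2]:.2f}s total")
print(f"Max+ 395: {mx[0]:.2f} tok/s, {mx[1]:.2f}s avg, {mx[2]:.2f}s total")
print(f"Ratio: {ratio:.2f}x")
```

This prints 197.01 tok/s, 6.54s and 32.71s for the RX 7900, 110.52 tok/s, 21.51s and 107.53s for the Max+ 395, and a ratio of 1.78x. The 0.01s difference from the reported 32.72s total is presumably rounding in the per-task latencies.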

Figure: AMD RX 7900 XTX performance on deepseek-r1:1.5b model

Figure: Max+ 395 performance on deepseek-r1:1.5b model

qwen3:latest Performance

System       Avg Tokens/s   Avg Latency   Total Time   Performance Ratio
AMD RX 7900  86.46          12.81s        64.04s       2.71x faster
Max+ 395     31.85          41.00s        204.98s      baseline

Detailed Results - AMD RX 7900:

  • Task 1: 86.56 tokens/s, Latency: 15.07s
  • Task 2: 85.69 tokens/s, Latency: 18.37s
  • Task 3: 86.74 tokens/s, Latency: 7.15s
  • Task 4: 87.91 tokens/s, Latency: 1.56s
  • Task 5: 85.43 tokens/s, Latency: 21.90s

Detailed Results - Max+ 395:

  • Task 1: 32.21 tokens/s, Latency: 33.15s
  • Task 2: 27.53 tokens/s, Latency: 104.82s
  • Task 3: 33.47 tokens/s, Latency: 16.79s
  • Task 4: 34.96 tokens/s, Latency: 4.64s
  • Task 5: 31.08 tokens/s, Latency: 45.59s
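
Latency mixes two effects: generation speed and response length. Assuming each task's latency is mostly generation time (prompt processing and model loading are ignored here, which is an assumption), the number of generated tokens per task can be roughly estimated as tokens/s × latency:

```python
# Per-task qwen3:latest results from the lists above: (tokens/s, latency in s)
RX_7900 = [(86.56, 15.07), (85.69, 18.37), (86.74, 7.15), (87.91, 1.56), (85.43, 21.90)]
MAX_395 = [(32.21, 33.15), (27.53, 104.82), (33.47, 16.79), (34.96, 4.64), (31.08, 45.59)]


def estimated_tokens(tasks):
    """Approximate generated tokens per task as throughput x latency.

    Assumes latency is mostly generation time; any prompt evaluation or
    model load time included in latency makes these estimates high.
    """
    return [round(tps * latency) for tps, latency in tasks]


rx_tokens = estimated_tokens(RX_7900)
mx_tokens = estimated_tokens(MAX_395)
for task, (a, b) in enumerate(zip(rx_tokens, mx_tokens), start=1):
    print(f"Task {task}: RX 7900 ~{a} tokens, Max+ 395 ~{b} tokens")
```

By this estimate the Max+ 395 produced about 2,886 tokens on Task 2 against about 1,574 on the RX 7900, so part of its 104.82s latency reflects a longer answer rather than slower generation alone.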

Figure: AMD RX 7900 XTX performance on qwen3:latest model

Figure: Max+ 395 performance on qwen3:latest model

Comparative Analysis

Overall Performance Summary

Model              RX 7900        Max+ 395       Performance Multiplier
deepseek-r1:1.5b   197.01 tok/s   110.52 tok/s   1.78x
qwen3:latest       86.46 tok/s    31.85 tok/s    2.71x
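
The Performance Multiplier is the ratio of the two average throughputs, and a "percent faster" figure is that ratio minus one, times 100. A minimal sketch of the conversion, using the averages from the table:

```python
# Average tokens/s from the summary table: (RX 7900, Max+ 395)
SUMMARY = {
    "deepseek-r1:1.5b": (197.01, 110.52),
    "qwen3:latest": (86.46, 31.85),
}


def speedup(fast: float, slow: float) -> tuple[float, float]:
    """Return (multiplier, percent faster) for two throughput values."""
    multiplier = fast / slow
    return multiplier, (multiplier - 1.0) * 100.0


for model, (rx, mx) in SUMMARY.items():
    mult, pct = speedup(rx, mx)
    print(f"{model}: {mult:.2f}x, {pct:.0f}% faster")
```

This prints 1.78x (78% faster) for deepseek-r1:1.5b and 2.71x (171% faster) for qwen3:latest.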

Key Findings

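  1. RX 7900 Dominance: The AMD RX 7900 significantly outperforms the Max+ 395 on both models:

    • 78% faster on deepseek-r1:1.5b (197.01 vs 110.52 tok/s)
    • 171% faster on qwen3:latest (86.46 vs 31.85 tok/s)

  2. Model-Dependent Performance Gap: The gap is wider on the larger model, qwen3:latest (2.71x), than on deepseek-r1:1.5b (1.78x), suggesting that the RX 7900's advantage grows with model size

  3. Consistency: The RX 7900 delivers more consistent throughput across tasks: 185.87-200.89 tok/s on deepseek-r1:1.5b and 85.43-87.91 tok/s on qwen3:latest, versus 93.81-116.34 and 27.53-34.96 tok/s on the Max+ 395. Latency varies widely on both systems because it also depends on how long each response is

  4. Total Execution Time:

    • For deepseek-r1:1.5b: RX 7900 completed in 32.72s vs 107.53s (3.3x faster)
    • For qwen3:latest: RX 7900 completed in 64.04s vs 204.98s (3.2x faster)

    These ratios exceed the throughput ratios because total time also depends on response length, which most likely differed between runs: Task 2 on deepseek-r1:1.5b took 82.23s on the Max+ 395 vs 17.60s on the RX 7900 (4.7x), while its throughput differed only 2.0x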
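
One possible, unverified explanation for the model-dependent gap is memory bandwidth: when a single request generates tokens, each new token requires reading roughly all of the model's weights, so throughput is bounded by memory bandwidth divided by model size. The sketch below computes that ceiling. The bandwidth figures (about 960 GB/s for the RX 7900 XTX and about 256 GB/s for the Max+ 395) and the model sizes (about 1.1 GB and 5.2 GB for the default Ollama quantizations; `ollama list` shows the actual sizes) are assumptions from published specifications, not measurements from this report.

```python
# ASSUMED figures, not measured in this report; verify before relying on them.
BANDWIDTH_GBPS = {"RX 7900": 960.0, "Max+ 395": 256.0}  # peak memory bandwidth, GB/s
MODEL_SIZE_GB = {"deepseek-r1:1.5b": 1.1, "qwen3:latest": 5.2}  # approx. weight size

# Average tokens/s measured in this report.
MEASURED_TPS = {
    ("RX 7900", "deepseek-r1:1.5b"): 197.01,
    ("Max+ 395", "deepseek-r1:1.5b"): 110.52,
    ("RX 7900", "qwen3:latest"): 86.46,
    ("Max+ 395", "qwen3:latest"): 31.85,
}


def decode_ceiling(bandwidth_gbps: float, model_gb: float) -> float:
    """Rough tokens/s upper bound if each token reads all weights once."""
    return bandwidth_gbps / model_gb


for (system, model), measured in MEASURED_TPS.items():
    ceiling = decode_ceiling(BANDWIDTH_GBPS[system], MODEL_SIZE_GB[model])
    print(
        f"{system:<9} {model:<17} ceiling ~{ceiling:6.1f} tok/s, "
        f"measured {measured:6.2f} ({measured / ceiling:.0%} of ceiling)"
    )
```

Under these assumptions the RX 7900 XTX has 3.75 times the bandwidth. The small deepseek-r1:1.5b model runs at only about 23% of the RX 7900's ceiling, so fixed per-token overheads likely matter more there, while on qwen3:latest the measured gap (2.71x) moves closer to the bandwidth ratio. This is consistent with, but does not prove, a bandwidth-limited explanation.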

Comparison with Previous Results

Desktop PC (i9-9900K + RTX 2080, 8GB VRAM)

  • deepseek-r1:1.5b: 143 tokens/s
  • qwen3:latest: 63 tokens/s

M4 Mac (24GB Unified Memory)

  • deepseek-r1:1.5b: 81 tokens/s
  • qwen3:latest: Timeout issues (needed 120s timeout)

Performance Ranking

deepseek-r1:1.5b:

  1. AMD RX 7900: 197.01 tok/s ⭐
  2. RTX 2080 (CUDA): 143 tok/s
  3. Max+ 395: 110.52 tok/s
  4. M4 Mac: 81 tok/s

qwen3:latest:

  1. AMD RX 7900: 86.46 tok/s ⭐
  2. RTX 2080 (CUDA): 63 tok/s
  3. Max+ 395: 31.85 tok/s
  4. M4 Mac: Unable to complete within timeout

Cost-Benefit Analysis

System Pricing Context

  • Framework Desktop with Max+ 395: ~$2,500
  • AMD RX 7900 XTX: Available as a standalone GPU (~$600-800 used, ~$900-1000 new)

Value Proposition

The AMD RX 7900 delivers:

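  • 1.78-2.71x higher token throughput than the Max+ 395 on the models tested
  • Significantly better price-to-performance ratio (~$800 for the GPU alone vs ~$2,500 for the complete Framework Desktop, although the GPU also needs a host system)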
  • Dedicated GPU VRAM vs shared unified memory
  • Better thermal management in desktop form factor
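
The price-to-performance comparison above sets a GPU-only price against a complete system. A minimal sketch that normalizes throughput to cost, using the rough prices quoted in this section; `HOST_COST` is a hypothetical parameter for the rest of an RX 7900 XTX build (CPU, motherboard, RAM, storage, power supply, case), left at zero so the output matches the ~$800 vs $2,500 comparison:

```python
# Rough prices quoted in this section, in USD.
PRICE_MAX_395 = 2500.0  # complete Framework Desktop
PRICE_RX_7900 = 800.0   # GPU only
HOST_COST = 0.0         # hypothetical: cost of the rest of the RX 7900 build

# Average tokens/s from this report: (model, RX 7900, Max+ 395)
RESULTS = [("deepseek-r1:1.5b", 197.01, 110.52), ("qwen3:latest", 86.46, 31.85)]


def tokens_per_sec_per_1000_usd(tps: float, price_usd: float) -> float:
    """Throughput normalized to cost: tokens/s per $1,000 spent."""
    return tps / price_usd * 1000.0


for model, rx_tps, mx_tps in RESULTS:
    rx = tokens_per_sec_per_1000_usd(rx_tps, PRICE_RX_7900 + HOST_COST)
    mx = tokens_per_sec_per_1000_usd(mx_tps, PRICE_MAX_395)
    print(f"{model}: RX 7900 {rx:.1f} vs Max+ 395 {mx:.1f} tok/s per $1,000")
```

With `HOST_COST` at zero this gives about 246.3 vs 44.2 tok/s per $1,000 on deepseek-r1:1.5b and 108.1 vs 12.7 on qwen3:latest; setting `HOST_COST` to the real cost of the host system gives a fairer system-to-system figure.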

The $2,500 Framework Desktop budget could instead fund a complete desktop build, with budget remaining:

  • AMD RX 7900 GPU
  • High-performance desktop motherboard
  • AMD Ryzen CPU
  • 32-64GB DDR5 RAM
  • Storage and cooling

Conclusions

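  1. Clear Performance Winner: The AMD RX 7900 is substantially faster than the Max+ 395 for single-user inference on both models tested (1.78x to 2.71x higher throughput)

  2. Value Analysis: At $2,500, the Framework Desktop does not deliver competitive LLM inference performance compared with a desktop built around the RX 7900

  3. Use Case Consideration: The Framework Desktop offers portability and a large unified memory pool (96 GB allocated as VRAM vs 24 GB on the RX 7900), which can hold models too large for the RX 7900's VRAM. Both models tested here fit within 24 GB, so this advantage is not reflected in these results. If LLM inference speed is the primary concern, the RX 7900 desktop configuration is superior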

  4. ROCm Compatibility: Both systems successfully ran ROCm workloads, demonstrating AMD's growing ecosystem for AI/ML tasks

  5. Recommendation: For users prioritizing LLM inference performance per dollar, a desktop workstation with an RX 7900 provides significantly better value than the Max+ 395 Framework Desktop

Technical Notes

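  • All tests in this report used the same benchmark methodology with a single concurrent request
  • Both systems used ROCm, installed differently: a custom installation in the home directory on the Max+ 395 and the system default installation on the RX 7900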
  • Network latency was negligible (local Ollama API)
  • Results represent real-world single-user inference scenarios

Systems Information

Both systems are running:

  • Operating System: Linux
  • LLM Runtime: Ollama
  • Acceleration: ROCm (AMD GPU compute)
  • Python: 3.12.3