Benchmark Suites¶
vLLM provides comprehensive benchmarking tools for performance testing and evaluation:
- Benchmark CLI:
vllm benchCLI tools and specialized benchmark scripts for interactive performance testing. - Parameter Sweeps: Automate
vllm benchruns for multiple configurations, useful for optimization and tuning. - Performance Dashboard: Automated CI that publishes benchmarks on each commit.
测试指令模型的准确率(例如 GSM8K)¶
对于指令微调模型(如 Phi-4、Llama-3-Instruct 等),必须使用聊天补全接口才能获得符合预期的结果。以下示例展示如何对 GSM8K 测试集进行准确率评估。
启动服务¶
```bash python -m vllm.entrypoints.openai.api_server \ --model /path/to/phi4-mini-instruct \ --served-model-name phi4-mini \ --port 8004 \ --trust-remote-code