2026-05-08 04:59:31 网络安全文章来源：ZONE.CI 全球网 0 阅读模式

文章总结： 该文档详细介绍了本地部署GLM-5.1-FP8模型的技术要求与操作步骤，核心要点包括：需要8张H200/H20显卡（总计至少1T显存）的硬件条件，提供Docker部署和vLLM环境安装两种方案，并给出完整的服务启动命令及API调用示例。文档强调必须安装DeepGEMM组件以支持FP8精度，同时标注了CUDA版本兼容性注意事项。 综合评分： 72 文章分类： 技术标准,解决方案,AI安全,安全工具,安全开发

cover_image

本地部署GLM-5.1需要什么条件

原创

hyang0 hyang0

生有可恋

2026年4月27日 06:52 湖北

在小说阅读器读本章

去阅读

以8位精度的zai-org/GLM-5.1-FP8为例，官方推荐运行需要的算力为8卡 H200/H20，显存要求 141GB × 8，至少1T显存。

如果你刚好有算力卡，可以根据官方指导进行本地部署：

第一种部署方式，docker 部署：

docker&nbsp;run --gpus&nbsp;all&nbsp;\&nbsp; -p&nbsp;8000:8000&nbsp;\&nbsp; --ipc=host&nbsp;\&nbsp; -v ~/.cache/huggingface:/root/.cache/huggingface&nbsp;\&nbsp; vllm/vllm-openai:glm51 zai-org/GLM-5.1-FP8&nbsp;\&nbsp; &nbsp; --tensor-parallel-size&nbsp;8&nbsp;\&nbsp; &nbsp; --tool-call-parser glm47&nbsp;\&nbsp; &nbsp; --reasoning-parser glm45&nbsp;\&nbsp; &nbsp; --enable-auto-tool-choice&nbsp;\&nbsp; &nbsp; --chat-template-content-format=string&nbsp;\&nbsp; &nbsp; --served-model-name glm-5.1-fp8

提示：如果 CUDA 13+，使用 vllm/vllm-openai:glm51-cu130 镜像

本地部署前需要安装 vllm 环境：

uv venv
source&nbsp;.venv/bin/activate
uv pip install&nbsp;"vllm==0.19.0"&nbsp;--torch-backend=autouv pip install&nbsp;"transformers>=5.4.0"
# FP8 模型必须安装 DeepGEMMbash install_deepgemm.sh

install_deepgemm.sh 下载地址：

https://github.com/vllm-project/vllm/blob/v0.16.0rc0/tools/install_deepgemm.sh

本地部署命令（8×H200/H20）

vllm&nbsp;serve zai-org/GLM-5.1-FP8&nbsp;\&nbsp; &nbsp; &nbsp;--tensor-parallel-size&nbsp;8&nbsp;\&nbsp; &nbsp; &nbsp;--speculative-config.method mtp&nbsp;\&nbsp; &nbsp; &nbsp;--speculative-config.num_speculative_tokens&nbsp;3&nbsp;\&nbsp; &nbsp; &nbsp;--tool-call-parser glm47&nbsp;\&nbsp; &nbsp; &nbsp;--reasoning-parser glm45&nbsp;\&nbsp; &nbsp; &nbsp;--enable-auto-tool-choice&nbsp;\&nbsp; &nbsp; &nbsp;--chat-template-content-format=string&nbsp;\&nbsp; &nbsp; &nbsp;--served-model-name glm-5.1-fp8

API 调用示例

# Thinking 开启（默认）curl http://localhost:8000/v1/chat/completions \&nbsp; -H "Content-Type: application/json" \&nbsp; -d '{&nbsp; &nbsp;&nbsp;"model":&nbsp;"glm-5.1-fp8",&nbsp; &nbsp;&nbsp;"messages": [&nbsp; &nbsp; &nbsp; {"role":&nbsp;"system",&nbsp;"content":&nbsp;"You are a helpful assistant."},&nbsp; &nbsp; &nbsp; {"role":&nbsp;"user",&nbsp;"content":&nbsp;"Summarize GLM-5 in one sentence."}&nbsp; &nbsp; ],&nbsp; &nbsp;&nbsp;"temperature":&nbsp;1,&nbsp; &nbsp;&nbsp;"max_tokens":&nbsp;4096&nbsp; }'# Thinking 关闭curl http://localhost:8000/v1/chat/completions \&nbsp; -H "Content-Type: application/json" \&nbsp; -d '{&nbsp; &nbsp;&nbsp;"model":&nbsp;"glm-5.1-fp8",&nbsp; &nbsp;&nbsp;"messages": [&nbsp; &nbsp; &nbsp; {"role":&nbsp;"system",&nbsp;"content":&nbsp;"You are a helpful assistant."},&nbsp; &nbsp; &nbsp; {"role":&nbsp;"user",&nbsp;"content":&nbsp;"Summarize GLM-5 in one sentence."}&nbsp; &nbsp; ],&nbsp; &nbsp;&nbsp;"temperature":&nbsp;1,&nbsp; &nbsp;&nbsp;"max_tokens":&nbsp;4096,&nbsp; &nbsp;&nbsp;"chat_template_kwargs": {"enable_thinking":&nbsp;false}&nbsp; }'

参考文档：

https://github.com/vllm-project/recipes/blob/main/GLM/GLM5.md

全文完。

免责声明：

本文所载程序、技术方法仅面向合法合规的安全研究与教学场景，旨在提升网络安全防护能力，具有明确的技术研究属性。

任何单位或个人未经授权，将本文内容用于攻击、破坏等非法用途的，由此引发的全部法律责任、民事赔偿及连带责任，均由行为人独立承担，本站不承担任何连带责任。

本站内容均为技术交流与知识分享目的发布，若存在版权侵权或其他异议，请通过邮件联系处理，具体联系方式可点击页面上方的联系我。

本文转载自：生有可恋 hyang0 hyang0《本地部署GLM-5.1需要什么条件》