2026-05-16 06:14:24 · Cybersecurity article · Source: ZONE.CI 全球网

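Article summary: This article is an introductory guide to AI Agents. Its core point is how an Agent differs from a traditional chatbot: it can perceive its environment, make decisions autonomously, and carry out actions. It describes the closed loop formed by an Agent's four modules (perception, reasoning, action, and memory) and explains why LLMs evolved into Agents: to address stale knowledge, deepen reasoning, and interact with external systems. It closes with concrete steps for setting up a dual-stack Node.js/TypeScript and Python development environment from scratch. Overall rating: 85. Categories: AI security, secure development, technical standards, solutions, other.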


1. Agent Fundamentals and Environment Setup

Original

李北辰

SPEEDCoding

May 13, 2026, 19:59, Shanxi

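Welcome to the start of your Agent development journey. In this chapter you will learn what an AI Agent is, see how it differs fundamentally from an ordinary chatbot, and set up a complete development environment. The goal is simple: by the end, you should be able to run your first Agent and understand each core concept behind it. We will not use any complex framework, only basic API calls and clearly commented code, so that every detail of how an Agent works stays visible.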

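1.1 What Is an Agent: From LLM to Agent

1.1.1 Defining an Agent: An AI System Beyond Conversation

Before diving into code, we need to pin down one key concept: what exactly is an Agent?

A Large Language Model (LLM), such as GPT-4 or Claude, is at its core an extremely powerful text prediction engine. You give it an input (a prompt), and it generates the most likely continuation based on its training data. The limitation of this mode is obvious: an LLM on its own can only "say", not "do". It cannot query a database, call an API, or execute code, and it cannot remember conversation history beyond its context window [1].

The Agent was born from exactly this pain point. An Agent is an AI system that can perceive its environment, make decisions autonomously, and carry out actions. A simple analogy captures its core traits: if an LLM is a sage with vast knowledge who is confined to a room, an Agent gives that sage a telephone (tool calling), a notebook (memory), and a plan of action (the reasoning loop), so that the sage can actively interact with the outside world and complete tasks [2].

Architecturally, an Agent differs from a Chatbot along three dimensions:

This difference is a fundamental jump in capability. A Chatbot can only explain how you might find out today's weather in Tokyo; an Agent can actually call a weather API for live data, analyze the temperature trend, and automatically email you an alert when the temperature exceeds 30 degrees [3].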

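1.1.2 Core Components of an Agent: Four Modules Working Together

Every fully functional Agent is built from four core modules: Perception, Reasoning, Action, and Memory. Working together, they form what is usually called the perceive-think-act loop [4].

The Perception module receives external input: natural-language instructions from the user, structured data returned by tools, file-system change notifications, and even collaboration messages from other Agents. Its core task is to convert these heterogeneous inputs into a uniform format the LLM can understand, usually text or structured JSON.

The Reasoning module is the Agent's "brain" and is usually played by an LLM. It decides what to do next based on what has just been perceived and what is in memory: whether a tool is needed, which tool to choose, how to construct its arguments, and when to return a final result to the user. Reasoning patterns such as ReAct (Reasoning + Acting) and Chain-of-Thought can markedly improve the quality of an Agent's decisions [5].

The Action module executes the decisions made by the Reasoning module. The most common action is tool calling: the Agent invokes external functions through a standardized interface, for example sending an HTTP request, querying a database, or running a shell command. The result of the call is fed back to the Perception module as a new observation, which closes the loop.

The Memory module stores and retrieves information at two levels. Short-term memory holds the context of the current conversation so the Agent can follow it coherently; long-term memory uses technologies such as vector databases to store past conversations and knowledge, so the Agent can "recall" earlier interactions [6].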

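The way these four modules cooperate can be described by the following sequence:

User input → [Perception] parse intent → [Memory] retrieve relevant context → [Reasoning] LLM decides →
if a tool is needed → [Action] call the tool → [Perception] receive the tool result → [Memory] store the result →
[Reasoning] continue deciding with the new information → ... → [Reasoning] output the final answer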
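
The sequence above can be sketched as a small loop. This is only an illustration under stated assumptions: the names `Decision`, `Decide`, `Tool`, and `runLoop` are hypothetical, and `stubDecide` is a synchronous stand-in for what would be an asynchronous LLM call in a real Agent.

```typescript
// Hypothetical types for illustration; none of these names come from a real SDK.
type Decision =
  | { kind: 'tool'; tool: string; input: string }
  | { kind: 'final'; answer: string };

type Tool = (input: string) => string;
type Decide = (history: string[]) => Decision;

function runLoop(
  userInput: string,
  decide: Decide,
  tools: Record<string, Tool>,
  maxSteps = 5
): string {
  // Memory: here just an in-order list of everything observed so far.
  const history: string[] = [`user: ${userInput}`];

  for (let step = 0; step < maxSteps; step++) {
    // Reasoning: pick the next move from the accumulated history.
    const decision = decide(history);
    if (decision.kind === 'final') {
      return decision.answer;
    }
    // Action: run the requested tool, or report that it does not exist.
    const tool = tools[decision.tool];
    const observation =
      tool === undefined ? `error: unknown tool ${decision.tool}` : tool(decision.input);
    // Perception + memory: feed the tool result back in as a new observation.
    history.push(`tool(${decision.tool}): ${observation}`);
  }
  // A step limit keeps a confused model from looping forever.
  return 'stopped: step limit reached';
}

// Stub "LLM": asks for the weather once, then answers from the observation.
const stubDecide: Decide = (history) => {
  const last = history[history.length - 1] ?? '';
  return last.startsWith('tool(')
    ? { kind: 'final', answer: `Done. ${last}` }
    : { kind: 'tool', tool: 'weather', input: 'Tokyo' };
};

const demoTools: Record<string, Tool> = {
  weather: (city) => `${city}: 31C`, // canned value, no real API call
};

console.log(runLoop('How hot is Tokyo?', stubDecide, demoTools));
```

The step limit is a design choice worth keeping even in larger Agents: it bounds cost and guarantees termination when the model keeps requesting tools.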

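1.1.3 From Chatbot to Agent: Why an LLM Alone Is Not Enough

Understanding how Agents evolved helps you grasp what the technology really is. When ChatGPT launched in late 2022, it stunned the world with its fluent conversation and broad knowledge. Developers and researchers, however, soon ran into three obstacles [7].

The first obstacle is stale knowledge. An LLM's training data has a fixed cutoff date, so it does not know today's news, live stock prices, or the current weather. RAG (Retrieval-Augmented Generation) partly mitigates this, but RAG still only "reads" information; it cannot "execute" operations.

The second obstacle is depth of reasoning. Faced with a complex multi-step task such as "analyze this project's security vulnerabilities and generate fix recommendations", a single LLM call often falls short. The task has to be broken into subtasks, executed one by one, and adjusted based on intermediate results, which is exactly what an Agent loop is good at.

The third obstacle is interaction with the outside world. Software systems are valuable because they interact with infrastructure such as databases, APIs, file systems, and message queues. An LLM that only generates text cannot operate these systems directly; tool calling (Function Calling) removes that barrier [8].

In 2023, OpenAI introduced Function Calling in GPT-4, a milestone often seen as the start of the Agent era. The LLM no longer just generates text; it can output structured function-call instructions. Developers register a set of tools (each with a name, a description, and a parameter schema), and the LLM decides on its own whether a user request requires a tool and with which arguments. The model only proposes the call; the application executes it and returns the result. This breakthrough made the "LLM + tool calling + autonomous loop" Agent architecture practical [9].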
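
To make the registration idea concrete, here is a minimal sketch of a tool registry and a dispatcher. The shapes `ToolSpec` and `ToolCall`, the `get_weather` tool, and its canned reply are all assumptions made for illustration; real SDKs define their own types and full JSON Schema validation.

```typescript
// Hypothetical shapes for illustration; real SDKs define their own types.
interface ToolSpec {
  name: string;
  description: string;
  // A much-reduced stand-in for a JSON Schema: only the required keys.
  parameters: { required: string[] };
  run: (args: Record<string, unknown>) => string;
}

// What a model might emit instead of plain text: a tool name plus JSON arguments.
interface ToolCall {
  name: string;
  arguments: string; // JSON-encoded text
}

const registry: ToolSpec[] = [
  {
    name: 'get_weather',
    description: 'Look up the current temperature for a city.',
    parameters: { required: ['city'] },
    run: (args) => `Weather for ${String(args['city'])}: 31C (stubbed value)`,
  },
];

// Parse the argument text and accept only a plain JSON object.
function parseArgs(text: string): Record<string, unknown> | null {
  try {
    const parsed: unknown = JSON.parse(text);
    if (typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed)) {
      return parsed as Record<string, unknown>;
    }
    return null;
  } catch {
    return null;
  }
}

// The application, not the model, validates and executes the call.
function dispatchToolCall(call: ToolCall, tools: ToolSpec[]): string {
  const spec = tools.find((t) => t.name === call.name);
  if (spec === undefined) return `error: unknown tool "${call.name}"`;

  const args = parseArgs(call.arguments);
  if (args === null) return 'error: arguments must be a JSON object';

  const missing = spec.parameters.required.filter((key) => !(key in args));
  if (missing.length > 0) return `error: missing ${missing.join(', ')}`;

  return spec.run(args);
}

console.log(dispatchToolCall({ name: 'get_weather', arguments: '{"city":"Tokyo"}' }, registry));
```

Returning error strings instead of throwing is deliberate in this sketch: the error text can be handed back to the model as an observation so it can correct its next call.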

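MCP (Model Context Protocol) then pushed the Agent ecosystem further toward standardization. Proposed and open-sourced by Anthropic in 2024, MCP defines a standard protocol through which AI models can discover and call external tools in a uniform way. As of 2025, more than 10,000 shared MCP servers had been published, covering file systems, databases, GitHub, Slack, and most other mainstream services [10]. MCP's value lies in decoupling the "implementation" of a tool from its "consumption": a tool developer implements the MCP standard once, and any MCP-compatible Agent client can discover and call the tool.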
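
As a rough picture of what "discover and call" looks like on the wire, the sketch below builds the two request messages a client might send. It assumes MCP's JSON-RPC 2.0 framing and the `tools/list` and `tools/call` method names from the public specification; the `read_file` tool and its `path` argument are hypothetical, and transport, initialization, and responses are left out.

```typescript
// Minimal JSON-RPC 2.0 request shape (only the fields used here).
interface JsonRpcRequest {
  jsonrpc: '2.0';
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 1;

// Build a request with a fresh id; params are omitted when not supplied.
function makeRequest(method: string, params?: Record<string, unknown>): JsonRpcRequest {
  const base: JsonRpcRequest = { jsonrpc: '2.0', id: nextId++, method };
  return params === undefined ? base : { ...base, params };
}

// Discovery: ask the server which tools it offers.
const listTools = makeRequest('tools/list');

// Invocation: call one tool by name. "read_file" is a made-up example tool.
const callTool = makeRequest('tools/call', {
  name: 'read_file',
  arguments: { path: 'README.md' },
});

console.log(JSON.stringify(listTools));
console.log(JSON.stringify(callTool));
```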

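1.1.4 Agent Use Cases: From Coding Assistants to Security Analysis

Agent technology has already shown transformative potential in several fields. Knowing these use cases can help you find the direction that interests you [11].

Coding assistants and developer tools are the most mature Agent application area. Tools such as Cursor, GitHub Copilot, and Cline are essentially Agents: besides generating code, they can read project files, run terminal commands, execute test suites, and analyze error logs. By connecting to the rest of the developer's toolchain, in many cases through MCP, they have moved from "code completion" toward "autonomous programming" [12].

Automated operations and DevOps is another high-value scenario. An Agent can monitor server metrics, analyze logs, recognize anomalous patterns, and automatically run remediation when a problem is detected (restarting a service, rolling back a deployment, adjusting configuration). This turns "manual response" into "autonomous repair" and can substantially shorten recovery time [13].

In data analysis and business intelligence, an Agent can connect to databases, write SQL queries, generate charts, and draft analysis reports. The user only describes the need in natural language, for example "analyze last quarter's sales trends by region and find the cause of the steepest decline", and the Agent carries out the whole flow from data extraction to insight.

Security research and vulnerability analysis is the core direction of this book's hands-on project. Here an Agent can analyze binary files, scan for dangerous function calls, recognize potential vulnerability patterns, and generate risk assessment reports. Our goal is to build an Agent that understands binary structure, calls disassembly tools, and works with an LLM to analyze security risks [14].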

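1.2 Setting Up the Development Environment

This section walks you through building a complete Agent development environment from scratch. We will configure a dual-stack Node.js/TypeScript and Python environment, install the necessary development tools, and create the project's initial directory structure. Each step comes with concrete commands and a way to verify the result [15].

1.2.1 Node.js and TypeScript Configuration

The Agent frontend and part of the backend will be written in TypeScript. TypeScript adds a static type system to JavaScript and catches many errors at compile time, which matters especially for logic-heavy applications such as Agents.

Step 1: Install Node.js with nvm

nvm (Node Version Manager) is a widely used tool for managing Node.js versions; it lets you switch between multiple Node.js versions on the same machine.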

# Install nvm (if it is not installed yet)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
source ~/.bashrc

# Install Node.js 20 LTS (the long-term support release)
nvm install 20
nvm use 20
nvm alias default 20

# Verify the installation
node --version   # should print v20.x.x
npm --version    # should print 10.x.x

Step 2: Create and initialize the project directory

# Create the project root directory
mkdir -p binaryguard/{agent-frontend,agent-backend,agent-core}
cd binaryguard

# Initialize agent-core (the Agent's core logic)
cd agent-core
npm init -y

# Install core dependencies
npm install typescript @types/node ts-node dotenv @anthropic-ai/sdk openai ai @ai-sdk/openai zod

# Install development dependencies
npm install --save-dev eslint @typescript-eslint/parser @typescript-eslint/eslint-plugin prettier eslint-config-prettier nodemon

# Create the TypeScript configuration file
cat > tsconfig.json << 'EOF'
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "Node16",
    "moduleResolution": "Node16",
    "lib": ["ES2022"],
    "outDir": "./dist",
    "rootDir": "./src",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "forceConsistentCasingInFileNames": true,
    "resolveJsonModule": true,
    "declaration": true,
    "declarationMap": true,
    "sourceMap": true
  },
  "include": ["src/**/*"],
  "exclude": ["node_modules", "dist"]
}
EOF

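A few key settings in tsconfig.json deserve explanation. "module": "Node16" makes TypeScript follow Node.js's own module rules: each file is treated as an ES module or a CommonJS module depending on its extension and the "type" field in package.json, which keeps the output compatible with modern npm packages. "strict": true turns on TypeScript's family of strict checks (for example, no implicit any and strict null checks); this costs a little more effort up front but prevents many runtime errors in the long run. "esModuleInterop": true smooths interoperability between CommonJS and ES modules [16].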
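
A small sketch of what strict mode buys in practice. The `ToolInfo` type and the `knownTools` list are hypothetical; the point is that `Array.prototype.find` returns `T | undefined`, and under "strict": true the compiler refuses to let you read a property until the `undefined` case is handled.

```typescript
// Hypothetical data for illustration.
interface ToolInfo {
  name: string;
  description: string;
}

const knownTools: ToolInfo[] = [
  { name: 'disassemble', description: 'Disassemble a code section.' },
  { name: 'list_imports', description: 'List imported functions.' },
];

function describeTool(toolName: string): string {
  // find() is typed as ToolInfo | undefined.
  const tool = knownTools.find((t) => t.name === toolName);

  // With strict null checks on, writing `return tool.description;` here
  // is a compile-time error, because `tool` may be undefined.
  if (tool === undefined) {
    return `unknown tool: ${toolName}`;
  }
  // After the check, the compiler narrows `tool` to ToolInfo.
  return tool.description;
}

console.log(describeTool('disassemble'));
console.log(describeTool('format_disk'));
```

Without strict mode the unchecked property access would compile and then fail at run time with a TypeError, which is exactly the class of bug the setting is meant to catch early.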

Step 3: Configure ESLint and Prettier

# Create the ESLint configuration
# Note: .eslintrc.json is the legacy format. ESLint 9+ reads eslint.config.js (flat config)
# by default, so pin eslint@8 or migrate to flat config if this file is ignored.
cat > .eslintrc.json << 'EOF'
{
  "parser": "@typescript-eslint/parser",
  "extends": [
    "eslint:recommended",
    "plugin:@typescript-eslint/recommended",
    "prettier"
  ],
  "parserOptions": {
    "ecmaVersion": 2022,
    "sourceType": "module"
  },
  "rules": {
    "@typescript-eslint/explicit-function-return-type": "warn",
    "@typescript-eslint/no-unused-vars": "error",
    "@typescript-eslint/no-explicit-any": "warn"
  }
}
EOF

# Create the Prettier configuration
cat > .prettierrc << 'EOF'
{
  "semi": true,
  "trailingComma": "es5",
  "singleQuote": true,
  "printWidth": 100,
  "tabWidth": 2
}
EOF

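ESLint handles code quality checks: it flags unused variables, functions without a declared return type, and other likely error patterns. Prettier handles formatting: it unifies the team's code style and frees you from details such as indentation, quotes, and line breaks. Used together, they noticeably improve maintainability [17].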

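1.2.2 Python Environment Configuration

Our binary analysis engine and Agent backend will be written in Python. Python has the richest library support in the AI ecosystem; from the Capstone disassembly engine to the OpenAI and Anthropic SDKs, it is a first-class citizen.

Step 1: Create a Python virtual environment

# Switch from agent-core to the agent-backend directory
cd ../agent-backend

# Create the virtual environment (use Python 3.11+)
python3 -m venv .venv

# Activate the virtual environment
source .venv/bin/activate   # Linux/Mac
# .venv\Scripts\activate    # Windows

# Upgrade pip
pip install --upgrade pip

A virtual environment is a Python best practice. It isolates the project's dependencies in their own environment and avoids conflicts with the system Python or with other projects. Every Python project should have its own virtual environment [18].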

Step 2: Install core dependencies

# Create requirements.txt
cat > requirements.txt << 'EOF'
# Web framework
fastapi>=0.110.0
uvicorn[standard]>=0.27.0

# AI SDKs
openai>=1.30.0
anthropic>=0.28.0

# Data validation
pydantic>=2.7.0
pydantic-settings>=2.2.0

# Binary analysis
capstone>=5.0.0
pefile>=2023.2.7
pyelftools>=0.31
lief>=0.14.0

# Task queue and cache
celery>=5.3.0
redis>=5.0.0

# Database
sqlalchemy[asyncio]>=2.0.0
asyncpg>=0.29.0
alembic>=1.13.0

# Utilities
python-dotenv>=1.0.0
httpx>=0.27.0
aiofiles>=23.2.0
python-multipart>=0.0.9

# Development dependencies
pytest>=8.0.0
pytest-asyncio>=0.23.0
black>=24.0.0
ruff>=0.4.0
mypy>=1.9.0
EOF

# Install all dependencies
pip install -r requirements.txt

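Each of these packages was chosen for a reason. FastAPI, the backend framework, supports asynchronous handlers natively and generates OpenAPI documentation automatically, which makes it a popular choice for modern Python APIs. Capstone is a widely used disassembly engine that supports more than 15 processor architectures, including x86, ARM, and MIPS. LIEF (Library to Instrument Executable Formats) offers one API for parsing PE, ELF, and Mach-O binaries. Celery with Redis provides an asynchronous task queue, which we will use to move long-running work out of the request path when analyzing large binaries [19].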

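1.2.3 Development Tool Configuration

Recommended VS Code Extensions

If you use VS Code as your editor, the following extensions will noticeably improve the development experience:

VS Code Debug Configuration

Create .vscode/launch.json in the project root: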

{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Debug Agent Core",
      "type": "node",
      "request": "launch",
      "runtimeExecutable": "npx",
      "runtimeArgs": ["ts-node", "--esm"],
      "args": ["src/index.ts"],
      "cwd": "${workspaceFolder}/agent-core",
      "envFile": "${workspaceFolder}/agent-core/.env"
    },
    {
      "name": "Debug FastAPI Backend",
      "type": "debugpy",
      "request": "launch",
      "module": "uvicorn",
      "args": ["app.main:app", "--reload", "--port", "8000"],
      "cwd": "${workspaceFolder}/agent-backend",
      "envFile": "${workspaceFolder}/agent-backend/.env"
    }
  ]
}

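This configuration defines two debug entry points. The first is the TypeScript Agent core: it uses ts-node to run TypeScript files directly without a separate compile step, and the --esm flag enables ES module support. The second is the FastAPI backend: it launches the Uvicorn server under the Python debugger (debugpy). The envFile field points to the environment variable file, so that API keys and other secrets are not hard-coded in source [20].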

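1.2.4 Project Directory Structure

Before we start coding, let's look at the directory layout of the whole project. Knowing the end state helps you see the context and responsibility of every line of code you write.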

binaryguard/                              # Project root
├── agent-core/                           # Agent core logic (TypeScript)
│   ├── src/
│   │   ├── agents/                       # Agent class definitions
│   │   │   ├── base-agent.ts             # Abstract base class
│   │   │   ├── vulnerability-agent.ts    # Vulnerability analysis Agent
│   │   │   └── chat-agent.ts             # Conversational Agent
│   │   ├── llm/                          # LLM service layer
│   │   │   ├── llm-service.ts            # Unified LLM call interface
│   │   │   ├── providers/                # Provider implementations
│   │   │   │   ├── openai-provider.ts
│   │   │   │   └── anthropic-provider.ts
│   │   │   └── token-manager.ts          # Token management and cost control
│   │   ├── tools/                        # Tool definitions
│   │   │   ├── binary-tools.ts           # Binary analysis tools
│   │   │   └── system-tools.ts           # System tools
│   │   ├── memory/                       # Memory system
│   │   │   ├── short-term-memory.ts      # Short-term memory (conversation history)
│   │   │   └── long-term-memory.ts       # Long-term memory (vector store)
│   │   ├── mcp/                          # MCP protocol integration
│   │   │   ├── mcp-client.ts             # MCP client
│   │   │   └── mcp-server.ts             # MCP server
│   │   ├── types/                        # Shared type definitions
│   │   └── index.ts                      # Entry point
│   ├── tests/                            # Test files
│   ├── package.json
│   ├── tsconfig.json
│   └── .env                              # Environment variables (not committed to Git)
│
├── agent-backend/                        # Backend API service (Python)
│   ├── app/
│   │   ├── api/                          # API routes
│   │   │   ├── upload.py                 # File upload
│   │   │   ├── analysis.py               # Analysis task management
│   │   │   └── websocket.py              # WebSocket real-time push
│   │   ├── services/                     # Business services
│   │   │   ├── binary_analysis.py        # Binary analysis engine
│   │   │   ├── disassembly.py            # Disassembly service
│   │   │   └── file_service.py           # File handling service
│   │   ├── models/                       # Data models
│   │   ├── tasks/                        # Celery asynchronous tasks
│   │   ├── core/                         # Core configuration
│   │   │   ├── config.py                 # Application settings
│   │   │   └── database.py               # Database connection
│   │   └── main.py                       # FastAPI entry point
│   ├── migrations/                       # Database migrations
│   ├── tests/
│   ├── requirements.txt
│   └── .env
│
├── agent-frontend/                       # Frontend UI (Vue3)
│   ├── src/
│   │   ├── components/                   # UI components
│   │   ├── views/                        # Page views
│   │   ├── stores/                       # Pinia state management
│   │   ├── composables/                  # Composable functions
│   │   └── api/                          # API client
│   └── package.json
│
├── docker-compose.yml                    # Service orchestration
├── Dockerfile                            # Container build
└── README.md

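In this chapter we focus mainly on the agent-core directory, especially the src/agents/ and src/llm/ modules. The other directories will be filled in over later chapters [21].

Now let's run an environment verification script to confirm that every dependency is installed correctly.

Code Example 1: Environment verification script

Create the file agent-core/scripts/verify-env.ts: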

/**
 * Environment verification script
 * Run it to confirm that all core dependencies are installed correctly
 * Command: npx ts-node --esm scripts/verify-env.ts
 * Note: the require() calls below assume the file is loaded as CommonJS,
 * which is the case when package.json has no "type": "module" field.
 */

import { execSync } from 'child_process';
import fs from 'fs';

// Collector for check results
const results: { check: string; status: 'PASS' | 'FAIL' | 'WARN'; message: string }[] = [];

// Required check: a failure is recorded as FAIL
function check(description: string, fn: () => void): void {
  try {
    fn();
    results.push({ check: description, status: 'PASS', message: 'OK' });
  } catch (error: unknown) {
    const msg = error instanceof Error ? error.message : String(error);
    results.push({ check: description, status: 'FAIL', message: msg });
  }
}

// Optional check: a failure is recorded as WARN
function warn(description: string, fn: () => void): void {
  try {
    fn();
    results.push({ check: description, status: 'PASS', message: 'OK' });
  } catch (error: unknown) {
    const msg = error instanceof Error ? error.message : String(error);
    results.push({ check: description, status: 'WARN', message: msg });
  }
}

console.log('🔍 BinaryGuard environment check\n');

// 1. Check the Node.js version
check('Node.js version >= 18', () => {
  const version = process.version; // e.g., "v20.10.0"
  const major = parseInt(version.slice(1).split('.')[0], 10);
  if (major < 18) {
    throw new Error(`Found ${version}, need >= 18`);
  }
});

// 2. Check the TypeScript installation
check('TypeScript compiler', () => {
  const output = execSync('npx tsc --version', { encoding: 'utf8' });
  if (!output.includes('Version')) {
    throw new Error('tsc is not installed correctly');
  }
});

// 3. Check the core npm packages
check('OpenAI SDK', () => {
  require('openai');
});

check('Anthropic SDK', () => {
  require('@anthropic-ai/sdk');
});

check('Vercel AI SDK', () => {
  require('ai');
});

check('Zod validation library', () => {
  const z = require('zod');
  // Confirm that Zod actually works
  const schema = z.object({ name: z.string() });
  schema.parse({ name: 'test' });
});

// 4. Check the Python environment
warn('Python 3.11+', () => {
  const output = execSync('python3 --version', { encoding: 'utf8' });
  const version = output.trim().split(' ')[1]; // "3.11.x" from "Python 3.11.x"
  const [major, minor] = version.split('.').map((part) => parseInt(part, 10));
  // Compare major and minor so that, say, 2.11 or 4.0 is not misjudged
  if (major < 3 || (major === 3 && minor < 11)) {
    throw new Error(`Found Python ${version}, 3.11+ recommended`);
  }
});

// 5. Check key Python packages
warn('Python: FastAPI', () => {
  execSync('python3 -c "import fastapi; print(fastapi.__version__)"', { encoding: 'utf8' });
});

warn('Python: Capstone', () => {
  execSync('python3 -c "import capstone; print(capstone.__version__)"', { encoding: 'utf8' });
});

// 6. Check the environment variable file
warn('.env file exists', () => {
  if (!fs.existsSync('.env')) {
    throw new Error('.env not found; create it by copying .env.example');
  }
});

// Print the results
console.log('\n' + '='.repeat(50));
console.log('Summary');
console.log('='.repeat(50));

let passCount = 0;
let failCount = 0;
let warnCount = 0;

for (const r of results) {
  const icon = r.status === 'PASS' ? '✅' : r.status === 'FAIL' ? '❌' : '⚠️';
  console.log(`${icon} [${r.status}] ${r.check}`);
  if (r.status !== 'PASS') {
    console.log(`   → ${r.message}`);
  }
  if (r.status === 'PASS') passCount++;
  if (r.status === 'FAIL') failCount++;
  if (r.status === 'WARN') warnCount++;
}

console.log('\n' + '='.repeat(50));
console.log(`Total: ${passCount} passed, ${failCount} failed, ${warnCount} warnings`);

if (failCount > 0) {
  console.log('\n❌ The environment is incomplete; fix the errors listed above.');
  process.exit(1);
} else if (warnCount > 0) {
  console.log('\n⚠️ The core environment is ready; some optional components still need setup.');
} else {
  console.log('\n🎉 All checks passed, the environment is complete!');
}

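The verification script is written defensively: each check is wrapped in a try-catch block, so one failing check does not stop the others from running. At the end it prints a structured report that shows at a glance which components are ready and which need fixing. Checks at the warn level cover optional dependencies; you can skip them for now, but some features in later chapters may be limited [22].

The command to run the script is:

cd agent-core
npx ts-node --esm scripts/verify-env.ts

If every check passes, you will see the "🎉 All checks passed" message. If some fail, the script prints the specific error messages to help you locate the problem.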

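1.3 Your First Agent: Hello Agent

With the environment ready, we come to the most exciting part: writing your first Agent. We start with the most basic conversational Agent and then add a system prompt, conversation history management, and token monitoring step by step.

1.3.1 Connecting to the OpenAI/Anthropic APIs

Before using any LLM service you need an API key. The mainstream choices today are OpenAI's GPT series and Anthropic's Claude series, and each has strengths: GPT-4o is often praised for tool calling and structured output, while Claude 3.5 Sonnet is often preferred for long-text understanding and code generation [23].

Getting an API key:

OpenAI: create an API key at platform.openai.com
Anthropic: create an API key at console.anthropic.com

Storing the API key safely:

Never hard-code an API key in source code. Create a .env file to hold sensitive values: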

cd agent-core
cat > .env << 'EOF'
# LLM API configuration
OPENAI_API_KEY=sk-your-openai-key-here
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key-here

# Default model
DEFAULT_MODEL_PROVIDER=openai
DEFAULT_MODEL=gpt-4o-mini

# Optional: custom API base URL (for a proxy or a compatible service)
# OPENAI_BASE_URL=https://your-proxy.example.com/v1
EOF

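The .env file should be listed in .gitignore so that it is never accidentally committed to version control. The dotenv package loads these variables into process.env when the application calls dotenv.config() at startup [24].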
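
To show roughly what loading a .env file involves, here is a hand-rolled sketch. It is not the dotenv implementation: `parseEnv` and `requireKey` are hypothetical helpers, and the parser only handles simple KEY=value lines, comments, and one pair of surrounding quotes. The fail-fast `requireKey` check is the part most worth copying, since a missing key is easier to diagnose at startup than as an authentication error later.

```typescript
// Simplified .env parser for illustration only; the real dotenv package
// handles more cases (multiline values, escapes, `export` prefixes).
function parseEnv(text: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue; // skip blanks and comments
    const eq = line.indexOf('=');
    if (eq <= 0) continue; // no "=" at all, or an empty key
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const quoted = /^(['"])(.*)\1$/.exec(value);
    if (quoted !== null) value = quoted[2] ?? '';
    result[key] = value;
  }
  return result;
}

// Fail fast with a clear message when a required setting is absent.
function requireKey(env: Record<string, string>, key: string): string {
  const value = env[key];
  if (value === undefined || value === '') {
    throw new Error(`Missing required setting: ${key}`);
  }
  return value;
}

const sampleEnvText = [
  '# LLM API configuration',
  'OPENAI_API_KEY=sk-example',
  'DEFAULT_MODEL="gpt-4o-mini"',
  '',
].join('\n');

const sampleEnv = parseEnv(sampleEnvText);
console.log(requireKey(sampleEnv, 'DEFAULT_MODEL'));
```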

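1.3.2 A Minimal Agent: A Conversational Agent in About 50 Lines

Below is the book's first core code example: a basic Agent whose logic fits in about 50 lines. It shows the most essential working pattern of an Agent: receive user input, call the LLM, return the response.

Code Example 2: Hello Agent (minimal Agent implementation)

Create the file agent-core/src/agents/hello-agent.ts: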

/**
 * Hello Agent - a minimal Agent implementation
 * Shows the core working pattern: user input → LLM call → response output
 * Run with: npx ts-node --esm src/agents/hello-agent.ts
 */

import OpenAI from 'openai';
import dotenv from 'dotenv';

// Load environment variables (reads OPENAI_API_KEY from the .env file)
dotenv.config();

// Initialize the OpenAI client
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

/**
 * Agent entry point: runs one complete conversation loop
 * The loop handles inputs in order until it meets "exit" or "quit"
 */
async function runAgent(): Promise<void> {
  console.log('🤖 Hello Agent started');
  console.log('Type "exit" to leave the conversation\n');

  // Simulated command-line input
  // In a real application these would come from a web UI or an API request
  const userInputs: string[] = [
    'Hello, please introduce yourself',
    'What is binary security analysis?',
    'exit',
  ];

  for (const userInput of userInputs) {
    console.log(`👤 User: ${userInput}`);

    // Exit condition
    if (userInput.toLowerCase() === 'exit' || userInput.toLowerCase() === 'quit') {
      console.log('👋 Goodbye!');
      break;
    }

    // Core step: call the LLM to generate a response
    const response = await openai.chat.completions.create({
      model: 'gpt-4o-mini', // a lightweight model keeps cost down
      messages: [
        { role: 'system', content: 'You are a friendly technical assistant who specializes in security analysis.' },
        { role: 'user', content: userInput },
      ],
      temperature: 0.7, // randomness: 0 = conservative, higher = more varied (API range 0 to 2)
      max_tokens: 500, // cap the response length to control cost
    });

    // Extract and print the LLM's reply
    const reply = response.choices[0]?.message?.content ?? '(no response)';
    console.log(`🤖 Agent: ${reply}\n`);

    // Print token usage (the basis for cost monitoring)
    if (response.usage) {
      console.log(
        `📊 Token usage: input=${response.usage.prompt_tokens}, ` +
        `output=${response.usage.completion_tokens}, ` +
        `total=${response.usage.total_tokens}`
      );
    }
  }
}

// Run the Agent and catch any error
runAgent().catch((error: unknown) => {
  console.error('Agent failed:', error instanceof Error ? error.message : String(error));
  process.exit(1);
});

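This program is short (about 50 lines of logic), yet it contains the core elements of an Agent.

Walkthrough:

dotenv.config() reads the .env file and loads variables such as OPENAI_API_KEY into process.env. This is the standard way to manage sensitive configuration.

new OpenAI({ apiKey: process.env.OPENAI_API_KEY }) creates the OpenAI SDK client. The client wraps low-level details such as HTTP requests, authentication, and retries, so you can focus on business logic.

openai.chat.completions.create() is the core API call. The model parameter selects the model; gpt-4o-mini is a cost-effective choice for development and testing. The messages array carries the conversation: each message has a role (system/user/assistant/tool) and a content field. The API itself is stateless, so the model only sees what is in this array; in this minimal example each request carries just the system prompt and the current user message, which means the model does not see earlier turns [25].

The temperature parameter controls the randomness of the output; the OpenAI API accepts values from 0 to 2. At 0 the model almost always picks the highest-probability token, giving the most deterministic output; at 1 and above it picks lower-probability tokens more often, giving more varied output. For Agent development a value between 0.3 and 0.7 is recommended: flexible enough, without drifting into far-fetched output.

max_tokens is the key cost-control parameter. It caps the length of a single response and guards against an unexpected surge in token usage. Set a sensible upper bound during development and tune it per task in production.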
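
The effect of temperature can be illustrated with the textbook formulation: divide the model's raw scores (logits) by the temperature, then apply softmax. The sketch below uses made-up logits for three candidate tokens; how a particular provider implements sampling internally is not something this example claims to reproduce.

```typescript
// Softmax over logits scaled by temperature (textbook definition).
function softmaxWithTemperature(logits: number[], temperature: number): number[] {
  if (temperature <= 0) {
    // Treat 0 as greedy decoding: all probability mass on the largest logit.
    const best = logits.indexOf(Math.max(...logits));
    return logits.map((_, i) => (i === best ? 1 : 0));
  }
  const scaled = logits.map((x) => x / temperature);
  const maxScaled = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((x) => Math.exp(x - maxScaled));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Made-up scores for three candidate tokens.
const exampleLogits = [2.0, 1.0, 0.1];

for (const temperature of [0, 0.3, 0.7, 1.0]) {
  const probs = softmaxWithTemperature(exampleLogits, temperature).map((p) => p.toFixed(3));
  console.log(`T=${temperature}: ${probs.join(', ')}`);
}
```

Running it shows the distribution flattening as the temperature rises: the top token dominates at low values, and the alternatives gain probability as the value approaches 1.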

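Run the Agent:

cd agent-core
npx ts-node --esm src/agents/hello-agent.ts

If the API key is configured correctly, you will see the Agent handle the simulated inputs in order and print token usage after each reply; the third input, "exit", ends the loop. Simple as it is, this loop already has the most essential traits of an Agent: it keeps receiving input, calls the LLM to process it, and outputs the result [26].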

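1.3.3 Designing the System Prompt

The system prompt is the most underrated and yet one of the most influential factors in Agent development. It defines the Agent's identity, behavioral boundaries, and scope. A carefully designed system prompt can lift an ordinary model close to the level of an expensive one, while a poor one can make even the strongest model misbehave.

The system prompt is given in the message with role: 'system' in the messages array. Users do not see its content directly, but it strongly shapes how the model handles every user message.

Code Example 3: An Agent with a system prompt

Create the file agent-core/src/agents/persona-agent.ts: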

/**
 * Persona Agent - shows the effect of a system prompt
 * A carefully designed System Prompt gives the Agent a professional identity and structured output
 */

import OpenAI from 'openai';
import dotenv from 'dotenv';

dotenv.config();

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

/**
 * The Agent's system prompt
 * This one covers four parts: role definition, behavioral constraints
 * (including safety boundaries), output format, and injected context ("memory")
 */
const SYSTEM_PROMPT = `You are BinaryGuard, an expert Agent for binary security analysis.

## Role
- You focus on software security and excel at analyzing PE/ELF binaries
- You use precise security terminology but keep explanations clear
- Your analysis is rigorous and objective: you neither exaggerate risks nor overlook hazards

## Behavioral constraints
- Only discuss topics related to software security, binary analysis, and vulnerability research
- If the user asks about something unrelated, politely steer back to security analysis
- Mark uncertain information explicitly as "speculation"; never invent technical details
- When exploitation comes up, emphasize defense and remediation; do not provide offensive exploit code

## Output format
- Use Markdown with headings and lists for complex analysis
- Rate risk with these levels: [Critical] [High] [Medium] [Low] [Info]
- Every finding must include: description, impact, and recommended fix

## Memory
- File under analysis: {filename}
- Architecture: {architecture}
- Analysis stage: {stage}`;

interface AgentContext {
  filename: string;
  architecture: string;
  stage: string;
}

/**
 * Build the final System Prompt by replacing template variables with real values
 * (each placeholder appears once in the template, so a single replace per name suffices)
 */
function buildSystemPrompt(template: string, context: AgentContext): string {
  return template
    .replace('{filename}', context.filename)
    .replace('{architecture}', context.architecture)
    .replace('{stage}', context.stage);
}

/**
 * Run the Agent on one user query
 */
async function runPersonaAgent(userQuery: string, context: AgentContext): Promise<string> {
  const systemPrompt = buildSystemPrompt(SYSTEM_PROMPT, context);

  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: userQuery },
    ],
    temperature: 0.3, // security analysis calls for conservative, accurate output
    max_tokens: 1000,
  });

  return response.choices[0]?.message?.content ?? '(no response)';
}

// Example run
async function main(): Promise<void> {
  const context: AgentContext = {
    filename: 'sample.exe',
    architecture: 'x86-64 (PE32+)',
    stage: 'initial analysis',
  };

  const query = 'This file does not have ASLR enabled. What risk does that imply?';
  console.log(`👤 User: ${query}\n`);

  const reply = await runPersonaAgent(query, context);
  console.log(`🤖 BinaryGuard:\n${reply}`);
}

main().catch(console.error);

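This example shows the core method of system prompt design. The role definition puts the model into a particular frame of mind: told that it is an expert in binary security analysis, the model leans toward the terminology and reasoning style of the security field. The behavioral constraints set explicit boundaries that keep the model on topic and away from harmful output. The output format rules keep responses structurally consistent, which makes them easier to parse and display.

Note the placeholders such as {filename} and {architecture} in the system prompt. This template design lets you inject dynamic context at run time. It is more flexible than a static system prompt: the Agent can "know" which file it is analyzing and which stage the analysis is in [27].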
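
A slightly more defensive way to fill such a template is sketched below. The helper name `fillTemplate` is hypothetical. It replaces every occurrence of each `{name}` placeholder, fails loudly if a value is missing (so a literal `{stage}` never reaches the model unnoticed), and uses a replacer function so that characters like `$&` inside a value are inserted literally.

```typescript
function fillTemplate(template: string, values: Record<string, string>): string {
  const missing: string[] = [];
  // Replace every {name} occurrence. A function replacer avoids "$&"-style
  // special patterns being interpreted inside the substituted values.
  const filled = template.replace(/\{(\w+)\}/g, (match: string, key: string) => {
    const value = values[key];
    if (value === undefined) {
      missing.push(key);
      return match; // leave the placeholder in place; we throw below
    }
    return value;
  });
  if (missing.length > 0) {
    throw new Error(`Unfilled placeholders: ${missing.join(', ')}`);
  }
  return filled;
}

console.log(
  fillTemplate('File: {filename} ({architecture}), again: {filename}', {
    filename: 'sample.exe',
    architecture: 'x86-64',
  })
);
// prints "File: sample.exe (x86-64), again: sample.exe"
```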

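The setting temperature: 0.3 is also deliberate. Security analysis demands precision, and a high temperature makes the model more likely to produce inaccurate or even invented technical details. For analysis tasks, a temperature between 0.1 and 0.3 is recommended.

1.3.4 Managing Conversation History

A real Agent conversation is not a one-off exchange but an ongoing process. The Agent has to remember what was said earlier in order to understand follow-up questions that depend on context. Managing conversation history is a skill every Agent developer needs.

In the OpenAI API, conversation history is implemented through the messages array. Every user message and assistant reply is appended to the array and sent along with each API request. The LLM understands the context of the conversation from the whole array [28].

Code Example 4: An Agent with conversation history management

Create the file agent-core/src/agents/chat-agent.ts: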

/**
 * Chat Agent - a complete conversational Agent with history management and token monitoring
 * Shows how to maintain the message array, respect the context window, and track token usage
 */

import OpenAI from 'openai';
import dotenv from 'dotenv';

dotenv.config();

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Context-window limits per model. Only OpenAI models are listed because this class
// calls the OpenAI client directly; check the numbers against the current model docs.
const MODEL_CONFIG = {
  'gpt-4o-mini': { maxContextTokens: 128000, maxOutputTokens: 16384 },
  'gpt-4o': { maxContextTokens: 128000, maxOutputTokens: 4096 },
} as const;

type Message = {
  role: 'system' | 'user' | 'assistant';
  content: string;
};

interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  totalCost: number; // cumulative cost in USD
}

/**
 * ChatAgent - wraps conversation handling, history maintenance, and token monitoring
 */
class ChatAgent {
  private messages: Message[] = [];
  private model: keyof typeof MODEL_CONFIG;
  private maxContextTokens: number;
  private maxOutputTokens: number;
  private tokenUsage: TokenUsage;

  constructor(
    systemPrompt: string,
    model: keyof typeof MODEL_CONFIG = 'gpt-4o-mini'
  ) {
    this.model = model;
    const config = MODEL_CONFIG[model];
    this.maxContextTokens = config.maxContextTokens;
    this.maxOutputTokens = config.maxOutputTokens;

    // Initialize token usage statistics
    this.tokenUsage = {
      promptTokens: 0,
      completionTokens: 0,
      totalTokens: 0,
      totalCost: 0,
    };

    // The system prompt always comes first
    this.messages.push({ role: 'system', content: systemPrompt });
  }

  /**
   * Send a user message and get the Agent's response
   * This is the Agent's core interaction method
   */
  async sendMessage(userContent: string): Promise<string> {
    // 1. Append the user message to the history
    this.messages.push({ role: 'user', content: userContent });

    // 2. Call the LLM with the full conversation history
    const response = await openai.chat.completions.create({
      model: this.model,
      messages: this.messages,
      temperature: 0.7,
      max_tokens: this.maxOutputTokens,
    });

    // 3. Extract the assistant's reply
    const assistantContent = response.choices[0]?.message?.content ?? '';

    // 4. Append the reply to the history (context for the next turn)
    this.messages.push({ role: 'assistant', content: assistantContent });

    // 5. Update token usage statistics
    if (response.usage) {
      this.tokenUsage.promptTokens += response.usage.prompt_tokens;
      this.tokenUsage.completionTokens += response.usage.completion_tokens;
      this.tokenUsage.totalTokens += response.usage.total_tokens;
      this.tokenUsage.totalCost += this.calculateCost(response.usage);
    }

    // 6. Check the context window and truncate if necessary
    //    (done on every turn, whether or not usage data was returned)
    this.checkContextWindow();

    return assistantContent;
  }

  /**
   * Compute the cost of one API call
   * The prices below are an illustrative snapshot in USD per 1M tokens and may be
   * out of date; check OpenAI's pricing page for current values.
   */
  private calculateCost(usage: {
    prompt_tokens: number;
    completion_tokens: number;
  }): number {
    const pricing: Record<string, { input: number; output: number }> = {
      'gpt-4o-mini': { input: 0.15, output: 0.6 },
      'gpt-4o': { input: 5.0, output: 15.0 },
    };

    const p = pricing[this.model] ?? pricing['gpt-4o-mini'];
    // Convert to USD: tokens / 1,000,000 * unit price
    return (
      (usage.prompt_tokens / 1_000_000) * p.input +
      (usage.completion_tokens / 1_000_000) * p.output
    );
  }

  /**
   * Check whether the context window is close to its limit and drop early turns if so
   * A production Agent needs some form of this
   */
  private checkContextWindow(): void {
    // Conservative policy: truncate once the history exceeds 80% of the window
    const threshold = this.maxContextTokens * 0.8;

    // Rough estimate: about 4 characters per token. This is reasonable for English
    // but undercounts Chinese, where one character is often a token or more.
    const estimate = (): number =>
      this.messages.reduce((sum, msg) => sum + Math.ceil(msg.content.length / 4), 0);

    // Keep the system prompt (index 0) and drop the oldest user/assistant pair
    // (indexes 1 and 2) until the estimate fits or only recent turns remain
    while (estimate() > threshold && this.messages.length > 5) {
      console.log(`⚠️ Context near limit (${estimate()}/${this.maxContextTokens}), dropping the oldest turn`);
      this.messages.splice(1, 2);
    }
  }

  /**
   * Get the token usage report
   */
  getTokenReport(): TokenUsage {
    return { ...this.tokenUsage };
  }

  /**
   * Get the current conversation history (for debugging)
   */
  getHistory(): Message[] {
    return [...this.messages];
  }
}

// Demo run
async function demo(): Promise<void> {
  const agent = new ChatAgent(
    'You are a security assistant who specializes in binary analysis and vulnerability detection.'
  );

  // Simulated multi-turn conversation
  const queries = [
    'What is a buffer overflow vulnerability?',
    'Which function calls does it commonly appear in?', // needs context to resolve "it"
    'How can this kind of vulnerability be prevented?', // continues the same topic
  ];

  for (const query of queries) {
    console.log(`👤 User: ${query}`);
    const reply = await agent.sendMessage(query);
    console.log(`🤖 Agent: ${reply}\n`);
  }

  // Print the token usage report
  const report = agent.getTokenReport();
  console.log('📊 Session token usage:');
  console.log(`   Input tokens: ${report.promptTokens}`);
  console.log(`   Output tokens: ${report.completionTokens}`);
  console.log(`   Total tokens: ${report.totalTokens}`);
  console.log(`   Estimated cost: $${report.totalCost.toFixed(6)}`);
}

demo().catch(console.error);

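The ChatAgent class illustrates three core problems in conversation management, along with a solution for each.

Maintaining the message array: each call to sendMessage() appends the user message to the history, calls the API, and then appends the assistant's reply to the history. This guarantees that the next API call sends the full conversation context to the model. Pay particular attention to the role field: user input must be 'user' and model replies must be 'assistant'. An invalid role value is rejected by the API, and swapped roles silently confuse the model 29^.

Context window limit: every model has a maximum context window, the largest number of tokens a single request can handle. If the messages array grows too long, the API returns a "context length exceeded" error. checkContextWindow() implements a conservative truncation strategy: when the estimated token count exceeds 80% of the window, it deletes the oldest user/assistant pair while always keeping the system prompt. Each check removes at most one pair, so a single very long message can still leave the history above the threshold. In production, count tokens exactly with a tokenizer library such as tiktoken instead of estimating from character length 30^.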
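The truncation idea above can be sketched as a loop that keeps dropping the oldest pair until the history fits. This is a minimal sketch, not the ChatAgent implementation: `ChatMessage`, `truncateHistory`, and `roughCount` are hypothetical names, index 0 is assumed to be the system prompt, and the default counter is the same 4-characters-per-token guess, which a real tokenizer should replace.

```typescript
// Hypothetical helper types; not part of the ChatAgent class above.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

type TokenCounter = (text: string) => number;

// Rough default: about 4 characters per token (reasonable for English only).
const roughCount: TokenCounter = (text) => Math.ceil(text.length / 4);

function truncateHistory(
  messages: ChatMessage[],
  maxTokens: number,
  count: TokenCounter = roughCount
): ChatMessage[] {
  const result = [...messages]; // work on a copy; the input is not mutated
  const total = (): number =>
    result.reduce((sum, m) => sum + count(m.content), 0);

  // Assumption: result[0] is the system prompt and is never removed.
  // Stop once the budget is met or only the latest exchange remains.
  while (total() > maxTokens && result.length > 3) {
    result.splice(1, 2); // drop the oldest user/assistant pair
  }
  return result;
}
```

Passing the counter as a parameter keeps the loop independent of any particular tokenizer, so an exact counter can be swapped in later without touching the truncation logic.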

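Token consumption monitoring: calculateCost() computes the cost of each call from OpenAI's pricing model and adds it to totalCost. This is essential for keeping development costs under control and avoiding surprise bills. In a real project you may also want a budget cap that stops or raises an alert once the accumulated cost passes a threshold.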
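A budget cap of the kind mentioned above could look roughly like this. It is only a sketch under assumptions: `BudgetGuard`, its limit, and the 80% warning ratio are illustrative choices, not part of the ChatAgent class, and the cost passed to `record()` is expected to come from something like calculateCost().

```typescript
// Hypothetical budget guard; the class name and thresholds are illustrative.
class BudgetGuard {
  private spentUsd = 0;
  private readonly limitUsd: number;
  private readonly warnRatio: number;

  constructor(limitUsd: number, warnRatio: number = 0.8) {
    this.limitUsd = limitUsd;
    this.warnRatio = warnRatio;
  }

  /**
   * Record the cost of one call.
   * Throws if the call would push spending past the limit;
   * returns 'warn' once spending reaches the warning ratio.
   */
  record(costUsd: number): 'ok' | 'warn' {
    if (this.spentUsd + costUsd > this.limitUsd) {
      throw new Error(`Budget exceeded: limit is $${this.limitUsd.toFixed(2)}`);
    }
    this.spentUsd += costUsd;
    return this.spentUsd >= this.limitUsd * this.warnRatio ? 'warn' : 'ok';
  }

  get spent(): number {
    return this.spentUsd;
  }
}
```

Calling `record()` before committing to an expensive request, with an estimated cost, would let the Agent refuse the call instead of discovering the overrun afterwards.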

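1.4 Core concepts of Agent development

With your first Agent running, let's look more closely at a few core development concepts. They recur throughout this tutorial and are essential for building production-grade Agents.

1.4.1 Prompt engineering basics

Prompt engineering is the highest-leverage skill in Agent development. With the same model and the same tools, differences in the prompt alone can produce dramatically different output quality.

Zero-shot prompting is the most direct approach: describe the task and provide no examples. It is simple and works well for task types the model already understands thoroughly.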

const zeroShotPrompt = 'Replace the strcpy calls in the following C code with a bounded copy (for example strncpy with explicit null termination).\n\nCode: ...';

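Few-shot prompting "teaches" the model the pattern of a task by providing a few input-output examples. It noticeably improves how well the model follows a specific format or style.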

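const fewShotPrompt = `Convert a natural-language description into a structured vulnerability report.

Example 1:
Description: "The program uses the unsafe gets function, which can cause a buffer overflow"
Report: {"severity": "High", "type": "Buffer overflow", "function": "gets", "fix": "Use fgets instead"}

Example 2:
Description: "sprintf does not check the size of the destination buffer"
Report: {"severity": "High", "type": "Buffer overflow", "function": "sprintf", "fix": "Use snprintf instead"}

Now process:
Description: "memcpy copies a user-controlled length into a stack buffer"
Report:`;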
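When the examples live in data rather than in a hand-written string, a small builder keeps the few-shot layout consistent. The sketch below is one possible shape; `FewShotExample` and `buildFewShotPrompt` are hypothetical names, and the "Description / Report" labels simply mirror the prompt above.

```typescript
// Hypothetical helper for assembling few-shot prompts from data.
interface FewShotExample {
  input: string;   // natural-language description
  output: unknown; // expected structured report, serialized as JSON
}

function buildFewShotPrompt(
  instruction: string,
  examples: FewShotExample[],
  query: string
): string {
  const shots = examples.map(
    (ex, i) =>
      `Example ${i + 1}:\n` +
      `Description: ${JSON.stringify(ex.input)}\n` +
      `Report: ${JSON.stringify(ex.output)}`
  );
  // The prompt ends on an open "Report:" so the model continues with the answer.
  const task = `Now process:\nDescription: ${JSON.stringify(query)}\nReport:`;
  return [instruction, ...shots, task].join('\n\n');
}
```

Serializing each example with JSON.stringify guarantees that every demonstration is valid JSON, which matters because the model tends to copy the format of the examples, mistakes included.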

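Chain-of-Thought (CoT) prompting guides the model to "think step by step" inside the prompt, which markedly improves accuracy on complex reasoning tasks. It is the foundation of advanced Agent architectures such as ReAct 31^.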

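const cotPrompt = `Analyze the security features of the following binary. Think step by step:

1. First, check whether ASLR (Address Space Layout Randomization) is enabled
2. Next, check whether NX/DEP (Data Execution Prevention) is enabled
3. Then, check whether stack protection (a stack canary) is present
4. Finally, combine these findings into a risk assessment

Target: sample.exe
Header characteristics: DYNAMIC_BASE flag not set, NX_COMPAT set, no stack protection`;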

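The core insight behind CoT is that when an LLM is asked to produce intermediate reasoning steps before the final answer, the quality of that final answer improves markedly. It resembles how people work through a hard problem on scratch paper before writing down the answer.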

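1.4.2 Parsing structured output

An Agent's output usually has to be processed by downstream code. If the model returns free text, parsing it is fragile and error-prone. Structured output asks the model to return data in a machine-readable format such as JSON, and it is standard practice for production-grade Agents 32^.

Code example 5: Structured output with Zod schema validation

Create the file agent-core/src/agents/structured-agent.ts: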

/**
 * Structured Agent - demonstrates structured output and schema validation.
 * Uses Zod to define the output format so that data returned by the LLM
 * matches the expected structure.
 */

import OpenAI from 'openai';
import { z } from 'zod';
import dotenv from 'dotenv';

dotenv.config();

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// ====== Define the output schema with Zod ======
// Zod is a TypeScript-first schema validation library.
// We define the expected data structure first, then use it to validate the LLM output.

const VulnerabilitySchema = z.object({
  // Vulnerability type; must be one of the enum values
  type: z.enum([
    'Buffer overflow',
    'Format string',
    'Integer overflow',
    'Command injection',
    'Use-After-Free',
    'Information disclosure',
    'Other',
  ]),

  // Severity
  severity: z.enum(['Critical', 'High', 'Medium', 'Low', 'Info']),

  // Name of the affected function
  functionName: z.string().min(1).describe('Name of the vulnerable function'),

  // Vulnerability description
  description: z.string().min(10).describe('Detailed description of the vulnerability'),

  // Remediation advice
  recommendation: z.string().min(10).describe('Concrete remediation advice'),

  // Confidence (0-1)
  confidence: z.number().min(0).max(1).describe('Analysis confidence: 0 means unsure, 1 means fully certain'),
});

// Schema for the whole analysis report: a list of vulnerabilities plus a risk score
const AnalysisReportSchema = z.object({
  findings: z.array(VulnerabilitySchema).describe('List of vulnerabilities found'),
  riskScore: z.number().min(0).max(100).describe('Overall risk score (0-100)'),
  summary: z.string().min(20).describe('Analysis summary'),
});

// Infer the TypeScript type from the Zod schema,
// which gives us a type-safe analysis report.
type AnalysisReport = z.infer<typeof AnalysisReportSchema>;

/**
 * Analyze code and return a structured vulnerability report.
 * The optional model argument lets the caller choose a different model, e.g. for fallback.
 */
async function analyzeVulnerabilities(
  code: string,
  model: string = 'gpt-4o' // structured output benefits from a stronger model
): Promise<{
  report: AnalysisReport;
  rawResponse: string;
}> {
  // Build the prompt and explicitly ask for JSON output
  const prompt = `Analyze the following C code for security vulnerabilities and return the analysis report as JSON.

Code:
\`\`\`c
${code}
\`\`\`

Requirements:
1. Return strict JSON with no other text
2. The JSON must match this schema:
   - findings: array of vulnerabilities, each with type/severity/functionName/description/recommendation/confidence
   - type must be one of: ${VulnerabilitySchema.shape.type.options.join(', ')}
   - severity must be one of: ${VulnerabilitySchema.shape.severity.options.join(', ')}
   - riskScore: overall risk score from 0 to 100
   - summary: analysis summary
3. Only report vulnerabilities with reasonably high confidence (confidence > 0.5)
4. If no vulnerabilities are found, findings is an empty array and riskScore is 0`;

  const response = await openai.chat.completions.create({
    model,
    messages: [
      {
        role: 'system',
        content: 'You are a professional code security analyzer. You output JSON only, with no other text.',
      },
      { role: 'user', content: prompt },
    ],
    temperature: 0.2, // a low temperature makes the output more consistent
    max_tokens: 2000,
  });

  const rawResponse = response.choices[0]?.message?.content ?? '{}';

  try {
    // Try to parse the JSON
    const parsedJson = JSON.parse(rawResponse);

    // Validate the parsed data with Zod
    const report = AnalysisReportSchema.parse(parsedJson);

    return { report, rawResponse };
  } catch (error: unknown) {
    // On validation failure, print detailed diagnostics
    if (error instanceof z.ZodError) {
      console.error('Schema validation failed:');
      for (const issue of error.issues) {
        console.error(`  - ${issue.path.join('.')}: ${issue.message}`);
      }
    }
    throw new Error(
      `Failed to parse structured output: ${error instanceof Error ? error.message : String(error)}\nRaw response: ${rawResponse.slice(0, 500)}`
    );
  }
}

/**
 * Fallback handling: when the primary model fails, retry with a backup model.
 */
async function analyzeWithFallback(code: string): Promise<AnalysisReport> {
  try {
    // First attempt: gpt-4o (higher quality)
    console.log('🔄 Analyzing with gpt-4o...');
    const result = await analyzeVulnerabilities(code, 'gpt-4o');
    console.log('✅ gpt-4o analysis succeeded');
    return result.report;
  } catch (error: unknown) {
    console.warn(`⚠️ gpt-4o failed: ${error instanceof Error ? error.message : String(error)}`);

    // Second attempt: gpt-4o-mini (faster and cheaper, possibly lower quality)
    console.log('🔄 Falling back to gpt-4o-mini...');
    const fallbackResult = await analyzeVulnerabilities(code, 'gpt-4o-mini');
    console.log('✅ gpt-4o-mini analysis succeeded');
    return fallbackResult.report;
  }
}

// Run the example
async function main(): Promise<void> {
  const sampleCode = `
#include <stdio.h>
#include <string.h>

void process_input(char *user_input) {
    char buffer[64];
    strcpy(buffer, user_input);  // Dangerous: no length check
    printf("Input: %s\\n", buffer);
}

int main() {
    char input[256];
    gets(input);  // Dangerous: gets has no bounds check and was removed in C11
    process_input(input);
    return 0;
}
`;

  console.log('🔍 Starting security analysis...\n');
  const report = await analyzeWithFallback(sampleCode);

  console.log('\n📊 Analysis report:');
  console.log(`Risk score: ${report.riskScore}/100`);
  console.log(`Summary: ${report.summary}`);
  console.log(`\nFound ${report.findings.length} vulnerabilities:`);

  for (const finding of report.findings) {
    console.log(`\n  [${finding.severity}] ${finding.type}`);
    console.log(`  Function: ${finding.functionName}`);
    console.log(`  Description: ${finding.description}`);
    console.log(`  Fix: ${finding.recommendation}`);
    console.log(`  Confidence: ${(finding.confidence * 100).toFixed(1)}%`);
  }
}

main().catch(console.error);

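This example shows the full structured-output workflow.

Zod schema definition: VulnerabilitySchema uses Zod's chained API to pin down the type and constraints of every field. z.enum() restricts the allowed values, z.string().min(10) requires a minimum length, and z.number().min(0).max(1) bounds the numeric range. .describe() attaches a description to each field. Those descriptions help the model only if the schema itself is sent to it, for example after conversion to JSON Schema; in this example the schema is used purely for validation and the prompt describes the fields by hand 33^.

Type inference: z.infer<typeof AnalysisReportSchema> derives a TypeScript type from the Zod schema. You get full type checking and IDE completion at compile time plus Zod validation at runtime, all from a single definition.

Two-stage validation: the code first parses the raw string returned by the LLM with JSON.parse(), then validates the result in depth with AnalysisReportSchema.parse(). If the LLM returns well-formed JSON whose field types do not match (for example, confidence as a string instead of a number), Zod catches it and reports detailed diagnostics.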
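The first stage, JSON.parse(), is the more fragile of the two, because models often wrap the JSON in a Markdown code fence or add a sentence around it even when told not to. A tolerant extraction step before parsing can absorb that. The sketch below is an assumption about how one might do it; `extractJson` is a hypothetical helper, not part of the example above. The OpenAI Chat Completions API also offers a JSON mode (response_format: { type: 'json_object' }) that attacks the same problem from the request side.

```typescript
// Three backticks, built at runtime so this listing contains no literal fence.
const FENCE = '`'.repeat(3);
const FENCED_BLOCK = new RegExp(
  '^' + FENCE + '(?:json)?\\s*([\\s\\S]*?)\\s*' + FENCE + '$',
  'i'
);

/** Hypothetical helper: pull a JSON value out of a possibly decorated reply. */
function extractJson(raw: string): unknown {
  const trimmed = raw.trim();

  // Case 1: the whole reply is a fenced block, optionally tagged "json".
  const fenced = FENCED_BLOCK.exec(trimmed);
  const candidate = fenced ? fenced[1] : trimmed;

  try {
    return JSON.parse(candidate);
  } catch {
    // Case 2: JSON object embedded in prose; take the outermost {...} span.
    const start = candidate.indexOf('{');
    const end = candidate.lastIndexOf('}');
    if (start === -1 || end <= start) {
      throw new Error('No JSON object found in model output');
    }
    return JSON.parse(candidate.slice(start, end + 1));
  }
}
```

The result is still `unknown`, so it should go through schema validation afterwards exactly as before; extraction only makes the first stage less brittle.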

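Fallback strategy: analyzeWithFallback() implements graceful degradation. When the primary model (gpt-4o) fails, whether from a network problem, API rate limiting, or output that fails validation, it switches to the backup model (gpt-4o-mini). The design keeps the system available while still preferring the higher-quality model 34^.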
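The same idea generalizes to an ordered list of models. The following is a minimal sketch, assuming the caller supplies a function that runs one attempt for a given model name; `withModelFallback` is a hypothetical helper, and the model names passed to it are whatever the caller chooses.

```typescript
/**
 * Hypothetical helper: try each model in order and return the first success,
 * together with the name of the model that produced it.
 */
async function withModelFallback<T>(
  models: string[],
  run: (model: string) => Promise<T>
): Promise<{ model: string; value: T }> {
  const failures: string[] = [];

  for (const model of models) {
    try {
      return { model, value: await run(model) };
    } catch (error: unknown) {
      // Remember why this model failed, then move on to the next one.
      const reason = error instanceof Error ? error.message : String(error);
      failures.push(`${model}: ${reason}`);
    }
  }

  throw new Error(`All models failed -> ${failures.join(' | ')}`);
}
```

Returning the model name alongside the value makes it possible to log how often the fallback path is taken, which is a useful signal that the primary model or the prompt needs attention.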

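1.4.3 Error handling and retries

An Agent running in production will inevitably hit failures: API rate limits, network timeouts, unavailable models, and so on. Robust error handling is what separates a "toy" from a "product".

Exponential backoff is the standard strategy for handling API rate limits. On a 429 (Too Many Requests) error, do not retry immediately; wait for an interval that grows with each attempt before retrying. This avoids piling more load onto a server that is already overloaded 35^.

Code example 6: Error handling and retries

Create the file agent-core/src/utils/retry.ts: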

/**
 * Retry utilities.
 * Implements exponential backoff and classification of API errors.
 */

/**
 * API error categories.
 * Different kinds of errors call for different handling.
 */
export enum ErrorType {
  RATE_LIMIT = 'RATE_LIMIT',           // 429: rate limit exceeded, retry
  TIMEOUT = 'TIMEOUT',                 // network timeout, retry
  AUTH_ERROR = 'AUTH_ERROR',           // 401/403: authentication failed, do not retry
  INVALID_REQUEST = 'INVALID_REQUEST', // 400: bad request parameters, do not retry
  SERVER_ERROR = 'SERVER_ERROR',       // 500/502/503: server error, retry
  UNKNOWN = 'UNKNOWN',                 // unknown error, handled conservatively (no retry)
}

/**
 * Classify an API error.
 */
export function classifyError(error: unknown): { type: ErrorType; message: string } {
  if (error instanceof Error) {
    const message = error.message.toLowerCase();

    // Classify by keywords and status codes found in the error message
    if (message.includes('429') || message.includes('rate limit') || message.includes('too many requests')) {
      return { type: ErrorType.RATE_LIMIT, message: error.message };
    }
    if (message.includes('timeout') || message.includes('etimedout') || message.includes('econnreset')) {
      return { type: ErrorType.TIMEOUT, message: error.message };
    }
    if (message.includes('401') || message.includes('403') || message.includes('unauthorized')) {
      return { type: ErrorType.AUTH_ERROR, message: error.message };
    }
    if (message.includes('400') || message.includes('invalid')) {
      return { type: ErrorType.INVALID_REQUEST, message: error.message };
    }
    if (message.includes('500') || message.includes('502') || message.includes('503')) {
      return { type: ErrorType.SERVER_ERROR, message: error.message };
    }
  }

  return { type: ErrorType.UNKNOWN, message: String(error) };
}

/**
 * Retry configuration options.
 */
export interface RetryOptions {
  maxRetries: number;           // maximum number of retries
  baseDelayMs: number;          // base delay (milliseconds)
  maxDelayMs: number;           // maximum delay (milliseconds)
  retryableErrors: ErrorType[]; // error types that may trigger a retry
}

// Default configuration, suitable for the OpenAI/Anthropic APIs
export const DEFAULT_RETRY_OPTIONS: RetryOptions = {
  maxRetries: 3,
  baseDelayMs: 1000,   // start at 1 second
  maxDelayMs: 30000,   // wait at most 30 seconds
  retryableErrors: [
    ErrorType.RATE_LIMIT,
    ErrorType.TIMEOUT,
    ErrorType.SERVER_ERROR,
  ],
};

/**
 * Compute the exponential backoff delay.
 * Formula: min(baseDelay * 2^attempt + jitter, maxDelay)
 * The jitter is a random offset that keeps many clients from retrying
 * at the same instant (the "thundering herd" problem).
 */
function calculateDelay(attempt: number, options: RetryOptions): number {
  // Exponential growth: 1s, 2s, 4s, 8s...
  const exponentialDelay = options.baseDelayMs * Math.pow(2, attempt);

  // Add random jitter (within ±25%) to avoid synchronized retries
  const jitter = exponentialDelay * 0.25 * (Math.random() * 2 - 1);

  // Never exceed the maximum delay
  return Math.min(exponentialDelay + jitter, options.maxDelayMs);
}

/**
 * Wrap an async function with retry logic.
 *
 * Usage:
 * const result = await withRetry(
 *   () => openai.chat.completions.create({...}),
 *   { maxRetries: 3 }
 * );
 */
export async function withRetry<T>(
  fn: () => Promise<T>,
  options: Partial<RetryOptions> = {}
): Promise<T> {
  const opts = { ...DEFAULT_RETRY_OPTIONS, ...options };

  for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error: unknown) {
      const { type, message } = classifyError(error);

      // Non-retryable errors are thrown immediately
      if (!opts.retryableErrors.includes(type)) {
        throw new Error(`Unrecoverable error [${type}]: ${message}`);
      }

      // On the last attempt, give up and throw
      if (attempt === opts.maxRetries) {
        throw new Error(
          `Still failing after ${opts.maxRetries} retries [${type}]: ${message}`
        );
      }

      // Compute the backoff delay and wait
      const delay = calculateDelay(attempt, opts);
      console.log(`⚠️ Attempt ${attempt + 1} failed [${type}]; retrying in ${delay.toFixed(0)}ms...`);
      await sleep(delay);
    }
  }

  // TypeScript needs a terminal statement here (in practice this is unreachable)
  throw new Error('Unexpected end of retry loop');
}

/**
 * Timeout wrapper.
 * Rejects with a timeout error if the function does not finish in time.
 */
export async function withTimeout<T>(
  fn: () => Promise<T>,
  timeoutMs: number
): Promise<T> {
  return new Promise((resolve, reject) => {
    // Start the timeout timer.
    // The message must contain "timeout" so classifyError() maps it to
    // ErrorType.TIMEOUT and withRetry() treats it as retryable.
    const timer = setTimeout(() => {
      reject(new Error(`Timeout: operation exceeded ${timeoutMs}ms`));
    }, timeoutMs);

    // Run the target function
    fn()
      .then((result) => {
        clearTimeout(timer);
        resolve(result);
      })
      .catch((error) => {
        clearTimeout(timer);
        reject(error);
      });
  });
}

/**
 * Utility: sleep for the given number of milliseconds.
 */
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

/**
 * Combined use: an API call with both retry and timeout.
 * This is the recommended pattern for production.
 */
export async function resilientAPICall<T>(
  fn: () => Promise<T>,
  timeoutMs: number = 30000,
  retryOptions?: Partial<RetryOptions>
): Promise<T> {
  return withRetry(
    () => withTimeout(fn, timeoutMs),
    retryOptions
  );
}

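This retry module implements a production-grade error handling strategy.

Error classification is the prerequisite for retrying intelligently. classifyError() inspects keywords and status codes in the error message and sorts errors into five named categories, with UNKNOWN as the catch-all. Authentication errors (401/403) and bad-request errors (400) should not be retried, because a retry only wastes tokens and time. Rate-limit errors (429), timeouts, and server errors (the 500 series) are transient, so a retry has a good chance of succeeding 36^.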
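Matching substrings in the message is easy to write but can misfire: a message that mentions "max_tokens 4000", for instance, contains "400". When the error object carries a numeric HTTP status, classifying on that field is more reliable. The sketch below assumes the error exposes a `status` property (to my knowledge the OpenAI Node SDK's API errors do, but treat that as an assumption and check your client); `ErrorKind` and `classifyByStatus` are hypothetical names that mirror the categories above.

```typescript
// Hypothetical status-based classifier mirroring the categories above.
type ErrorKind =
  | 'RATE_LIMIT'
  | 'TIMEOUT'
  | 'AUTH_ERROR'
  | 'INVALID_REQUEST'
  | 'SERVER_ERROR'
  | 'UNKNOWN';

function classifyByStatus(error: unknown): ErrorKind {
  // Read error.status defensively: the value may be any thrown object.
  const status =
    typeof error === 'object' && error !== null && 'status' in error
      ? (error as { status?: unknown }).status
      : undefined;

  if (typeof status !== 'number') return 'UNKNOWN';
  if (status === 429) return 'RATE_LIMIT';
  if (status === 408) return 'TIMEOUT';
  if (status === 401 || status === 403) return 'AUTH_ERROR';
  if (status >= 500 && status <= 599) return 'SERVER_ERROR';
  if (status >= 400) return 'INVALID_REQUEST'; // remaining 4xx codes
  return 'UNKNOWN';
}
```

A practical combination is to try the status field first and fall back to message matching only when no status is present, for example for low-level network errors.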

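The heart of the exponential backoff algorithm is calculateDelay(). Its delay formula is baseDelay * 2^attempt, so before jitter the first retry waits 1 second, the second 2 seconds, and the third 4 seconds, capped at maxDelayMs. Jitter, the random offset, is a key detail: if many clients hit the rate limit together and all retry on the same schedule, they strike the server again at exactly the same moment. A random offset spreads the retries out and avoids this "thundering herd" effect.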
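To see the schedule concretely, the same formula can be written with an injectable random source so that the jitter can be switched off. This is an illustrative restatement rather than the module's code; `backoffDelay` and its `random` parameter are hypothetical.

```typescript
// Same formula as above, with the random source passed in for reproducibility.
function backoffDelay(
  attempt: number,
  baseMs: number,
  maxMs: number,
  random: () => number = Math.random
): number {
  const exponential = baseMs * 2 ** attempt;
  const jitter = exponential * 0.25 * (random() * 2 - 1); // within ±25%
  return Math.min(exponential + jitter, maxMs);
}

// random() fixed at 0.5 makes the jitter term zero, exposing the raw schedule.
const schedule = [0, 1, 2, 3, 4, 5].map((attempt) =>
  backoffDelay(attempt, 1000, 30000, () => 0.5)
);
console.log(schedule); // [1000, 2000, 4000, 8000, 16000, 30000]
```

The last entry shows the cap at work: 1000 × 2^5 would be 32000 ms, but maxMs limits it to 30000 ms. With real jitter, attempt 0 lands anywhere between 750 ms and 1250 ms.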

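Timeout control comes from withTimeout(). It races the target function against a timer, the same idea as Promise.race: whichever settles first decides the outcome, and if the timer fires first the wrapper rejects with a timeout error. This matters most when calling external APIs that may hang, because a call without timeout protection can leave the whole Agent waiting forever. Note that the wrapper only stops waiting; it does not cancel the underlying request.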
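If the underlying operation should actually be cancelled when the timer fires, the standard AbortController can carry that signal. The sketch below is one way to do it, assuming the wrapped function accepts an AbortSignal and honors it (fetch does, for example); `withAbortableTimeout` is a hypothetical variant, not part of the retry module.

```typescript
// Hypothetical variant of the timeout wrapper that also signals cancellation.
async function withAbortableTimeout<T>(
  fn: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;

  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort(); // ask the underlying operation to stop
      reject(new Error(`timeout after ${timeoutMs}ms`));
    }, timeoutMs);
  });

  try {
    // Whichever promise settles first decides the outcome.
    return await Promise.race([fn(controller.signal), timeout]);
  } finally {
    clearTimeout(timer); // never leave the timer pending
  }
}
```

A caller would pass the signal through to the HTTP layer, for example `withAbortableTimeout((signal) => fetch(url, { signal }), 30000)`, where `url` stands for whatever endpoint is being called.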

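resilientAPICall() combines the two and is the recommended pattern for production. It applies both retry and timeout protection, so the Agent can recover gracefully from a wide range of failures.

Usage example: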

import OpenAI from 'openai';
import { resilientAPICall } from './utils/retry';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Production-style API call: 30-second timeout, at most 3 retries
const response = await resilientAPICall(
  () => openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: 'Analyze this code' }],
  }),
  30000,  // 30-second timeout
  { maxRetries: 3, baseDelayMs: 1000 }
);

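1.4.4 Token management and cost control

Tokens are the billing unit of LLM APIs and also the boundary of what a model can process. Understanding tokens and knowing how to control cost is required learning for Agent development.

What is a token? A token is neither a word nor a character. It is the basic unit an LLM uses to process text: it may be a whole English word (such as "hello"), part of a word (for example, "unbelievable" might be split into "un", "believ", and "able"), or a Chinese character (one Chinese character usually maps to 1-2 tokens). OpenAI provides an online tool (platform.openai.com/tokenizer) for inspecting how text is split into tokens 37^.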
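When an exact tokenizer is not at hand, a rough estimate that treats Chinese characters separately is already better than dividing the whole string length by four. The sketch below is only a heuristic under stated assumptions: 1.5 tokens per Chinese character (the midpoint of the 1-2 range above) and about 4 characters per token for everything else. `estimateTokens` is a hypothetical helper, and real counts depend on the model's tokenizer.

```typescript
// Rough, tokenizer-free estimate; the two ratios are assumptions, not measurements.
const TOKENS_PER_CJK_CHAR = 1.5; // midpoint of "1-2 tokens per Chinese character"
const CHARS_PER_OTHER_TOKEN = 4; // common rule of thumb for English text

function estimateTokens(text: string): number {
  let cjk = 0;
  let other = 0;
  for (const ch of text) {
    // Basic CJK Unified Ideographs block as a proxy for Chinese text
    if (ch >= '\u4e00' && ch <= '\u9fff') {
      cjk++;
    } else {
      other++;
    }
  }
  return Math.ceil(cjk * TOKENS_PER_CJK_CHAR + other / CHARS_PER_OTHER_TOKEN);
}

console.log(estimateTokens('hello world!')); // 3
console.log(estimateTokens('缓冲区溢出'));     // 8
```

An estimate like this is suitable for deciding when to truncate history with some safety margin, but billing and hard limits should be checked against the usage figures the API returns.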

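Context windows and prices vary widely between models. The table below compares mainstream models as of 2025:

| Model | Context window (tokens) | Input price ($/1M tokens) | Output price ($/1M tokens) | Typical use |
| --- | --- | --- | --- | --- |
| GPT-3.5 Turbo | 16,385 | $0.50 | $1.50 | Simple chat, development and testing |
| GPT-4 | 8,192 | $30.00 | $60.00 | Complex reasoning (expensive) |
| GPT-4o | 128,000 | $5.00 | $15.00 | Multimodal tasks, tool calling |
| GPT-4o-mini | 128,000 | $0.15 | $0.60 | Development and testing |
| Claude 3 Haiku | 200,000 | $0.25 | $1.25 | Long text, high throughput |
| Claude 3 Sonnet | 200,000 | $3.00 | $15.00 | Code analysis, balanced cost and quality |
| Claude 3 Opus | 200,000 | $15.00 | $75.00 | Highest quality, complex analysis |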

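A few patterns stand out from the table. The Claude 3 models offer a 200K context window, more than 1.5 times that of GPT-4o (128K), which is a clear advantage when analyzing large binaries or long codebases. GPT-4o-mini, with its very low price and 128K context window, is a strong default for development and testing: an ordinary conversation usually costs less than $0.01 38^.

The accompanying chart of context window sizes makes the differences visible at a glance. The Claude 3 family leads on context length, while the early GPT-4, limited to an 8K window, needs an extra chunking strategy to handle long documents.

The accompanying price chart compares input and output prices across models. GPT-4 (input $30/1M, output $60/1M) is priced far above most of the others, with only Claude 3 Opus's output price ($75/1M) exceeding it, which reflects GPT-4's leading reasoning ability at the time of its release. For everyday development, GPT-4o-mini and Claude 3 Haiku offer very attractive value.

Based on these figures, the recommended cost strategy for Agent development is as follows: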

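During development, use gpt-4o-mini or claude-3-haiku. Both are cheap and capable enough for most development and debugging work. Suppose you make 100 API calls per day, each averaging 2,000 input tokens and 500 output tokens. With gpt-4o-mini the daily cost is about (2000 × $0.15 + 500 × $0.60) / 1,000,000 × 100 = $0.06 per day.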
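The arithmetic above is easy to turn into a helper for comparing models. This is a small sketch: `Pricing` and `costPerCall` are hypothetical names, and the prices are the per-million-token figures quoted in this section, which should be checked against current provider pricing before budgeting.

```typescript
// Prices are USD per one million tokens, as quoted in this section.
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

function costPerCall(
  inputTokens: number,
  outputTokens: number,
  pricing: Pricing
): number {
  return (
    (inputTokens * pricing.inputPerMillion +
      outputTokens * pricing.outputPerMillion) /
    1e6
  );
}

const gpt4oMini: Pricing = { inputPerMillion: 0.15, outputPerMillion: 0.6 };
const gpt4o: Pricing = { inputPerMillion: 5, outputPerMillion: 15 };

// 100 calls per day, 2,000 input and 500 output tokens each
const callsPerDay = 100;
console.log((costPerCall(2000, 500, gpt4oMini) * callsPerDay).toFixed(2)); // "0.06"
console.log((costPerCall(2000, 500, gpt4o) * callsPerDay).toFixed(2));     // "1.75"
```

At the same workload the GPT-4o figure is roughly 29 times the gpt-4o-mini figure, which is the gap the development-stage recommendation is built on.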

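In production, choose the model by task complexity. Keep lightweight models for simple classification and extraction tasks, and use gpt-4o or claude-3-sonnet for vulnerability analysis that needs deep reasoning. Combine this with smart routing: handle each task with the lightweight model first, and escalate to the stronger model only if its confidence is too low 39^.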
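The routing rule can be expressed independently of any particular API. The sketch below assumes each model call returns a value together with a confidence between 0 and 1 (for instance the confidence field from a structured report); `Scored`, `routeByConfidence`, and the 0.7 threshold are hypothetical choices for illustration.

```typescript
// Hypothetical confidence-based router; names and threshold are illustrative.
interface Scored<T> {
  value: T;
  confidence: number; // 0 to 1, as reported by the model or a validator
}

async function routeByConfidence<T>(
  light: () => Promise<Scored<T>>,
  strong: () => Promise<Scored<T>>,
  minConfidence: number = 0.7
): Promise<Scored<T> & { escalated: boolean }> {
  // Always try the cheap model first.
  const first = await light();
  if (first.confidence >= minConfidence) {
    return { ...first, escalated: false };
  }
  // Confidence too low: pay for the stronger model.
  const second = await strong();
  return { ...second, escalated: true };
}
```

Because the strong model runs only on the low-confidence fraction of tasks, the average cost per task stays close to the lightweight model's price as long as that fraction is small. Self-reported confidence is not always well calibrated, so the threshold is worth tuning against labeled samples.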

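Token usage optimization tips:

Trim the system prompt: the system prompt counts toward input tokens on every request. Move non-essential explanations and examples out of the system prompt into application documentation, and inject them as few-shot examples only when needed.
Truncate conversation history: as shown earlier, periodically drop early turns and keep only the most recent N messages. In production, count tokens precisely with tiktoken rather than estimating from characters.
Limit response length: cap output length with the max_tokens parameter. For simple tasks such as yes/no classification, max_tokens: 50 is enough and avoids paying for long outputs.
Cache repeated content: if many requests share the same prefix (such as the same system prompt and file contents), consider the API provider's prompt caching feature, if available, to avoid paying full price for the same input tokens again and again 40^.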


Reposted from: SPEEDCoding, 李北辰, "1. Agent Basics and Environment Setup"