diff --git a/.claude/design/bug/chat-file-remap-功能开发文档.md b/.claude/design/bug/chat-file-remap-功能开发文档.md index 990f811191..e7f232154d 100644 --- a/.claude/design/bug/chat-file-remap-功能开发文档.md +++ b/.claude/design/bug/chat-file-remap-功能开发文档.md @@ -38,7 +38,7 @@ | `packages/service/core/workflow/dispatch/ai/chat.ts` | 修改 | 并行处理 human messages,逐条重写 user query,文件内容不进 system | `Promise.all(...rewriteUserQueryWithFileContent(...))` | T2/T3 | | `packages/service/core/workflow/dispatch/ai/tool/index.ts` | 修改 | Tool LLM messages 同步并行重写;保留 `hasReadFilesTool` skip | `skip: hasReadFilesTool` | T2/T4 | | `packages/service/core/workflow/utils/context.ts` | 修改/复用 | 承载单条 user query 文件内容重写 helper | `rewriteUserQueryWithFileContent(...)` | T2 | -| `packages/service/core/workflow/dispatch/tools/readFiles.ts` | 修改/复用 | 保留可读文件 URL 标准化、读文件与解析文件能力,供 readFiles tool 和重写 helper 复用 | `normalizeReadableFileUrl(...)` / `getFileContentFromLinks(...)` | T2 | +| `packages/service/core/workflow/dispatch/tools/readFiles.ts` | 修改/复用 | 保留可读文件 URL 标准化、读文件与解析文件能力,供 readFiles tool 和重写 helper 复用 | `normalizeReadableFileUrl(...)` / `parseFileContentFromUrls(...)` | T2 | | `packages/service/core/ai/llm/utils.ts` | 修改/测试驱动 | 保持 `file_url` 过滤,确保同条 text 保留 | 不改协议行为 | T5 | | `test/cases/...` | 修改/新增 | 替换保存前增强测试,新增运行时逐条注入测试 | 当前轮/历史/Tool/maxFiles | T5 | @@ -65,7 +65,7 @@ const userMessages = await Promise.all( requestOrigin, maxFiles, customPdfParse, - getFileContentFromLinks, + parseFileContentFromUrls, teamId, tmbId }) @@ -95,11 +95,11 @@ N/A(无对外接口结构变化)。 | 模块 | 函数/类型 | 具体改动 | 依赖关系 | |---|---|---|---| -| `packages/service/core/workflow/dispatch/ai/chat.ts` | `getChatMessages` 附近 | 构造 LLM messages 前,对历史 human 与当前轮 user 做文件内容注入 | 依赖 `getFileContentFromLinks` | +| `packages/service/core/workflow/dispatch/ai/chat.ts` | `getChatMessages` 附近 | 构造 LLM messages 前,对历史 human 与当前轮 user 做文件内容注入 | 依赖 `parseFileContentFromUrls` | | `packages/service/core/workflow/dispatch/ai/chat.ts` | `getMultiInput` | 不再把文件正文作为 system 
quote;当前轮文件参与逐条注入 | 与 token 裁剪链路协同 | | `packages/service/core/workflow/dispatch/ai/tool/index.ts` | `dispatchRunTools` | 与 Chat 路径一致;无 `readFiles` tool 时注入,有则跳过 | 避免与 readFiles tool 重复预解析 | -| `packages/service/core/workflow/utils/context.ts` | `rewriteUserQueryWithFileContent` | 单条 user query 重写 ``,外层负责并行处理 history/current messages | 通过入参复用 `getFileContentFromLinks` | -| `packages/service/core/workflow/dispatch/tools/readFiles.ts` | `normalizeReadableFileUrl` / `getFileContentFromLinks` | 统一负责 URL 标准化、过滤、文件读取与解析;按单条 query URL 顺序与 `maxFiles` 控制解析量 | 保持现有错误兜底 | +| `packages/service/core/workflow/utils/context.ts` | `rewriteUserQueryWithFileContent` | 单条 user query 重写 ``,外层负责并行处理 history/current messages | 通过入参复用 `parseFileContentFromUrls` | +| `packages/service/core/workflow/dispatch/tools/readFiles.ts` | `normalizeReadableFileUrl` / `parseFileContentFromUrls` | 统一负责 URL 标准化、过滤、文件读取与解析;按单条 query URL 顺序与 `maxFiles` 控制解析量 | 保持现有错误兜底 | | `packages/service/core/ai/llm/utils.ts` | `loadRequestMessages` | 保持 `file_url` 过滤;回归验证 text part 不丢 | 最终模型请求安全过滤 | ### 3.3 运行时注入算法 @@ -108,7 +108,7 @@ N/A(无对外接口结构变化)。 2. Chat/Tool 外层通过 `Promise.all` 并行处理运行时 messages。 3. 非 human message 原样返回;human message 调用 `rewriteUserQueryWithFileContent`。 4. 单条 user query 内只收集本条 `file.url`,不做跨 message URL 去重或共享缓存。 -5. 调用 `getFileContentFromLinks` 统一完成 URL 标准化、过滤、`maxFiles` 截断与文件解析。 +5. 调用 `parseFileContentFromUrls` 统一完成 URL 标准化、过滤、`maxFiles` 截断与文件解析。 6. 
将解析结果回填到当前 user query: - message 原本有 text:追加分隔符和 ``。 - message 原本无 text:新增 text part。 @@ -144,7 +144,7 @@ const userMessages = await Promise.all( maxFiles, requestOrigin, customPdfParse, - getFileContentFromLinks, + parseFileContentFromUrls, teamId, tmbId }) diff --git a/.claude/design/bug/chat-file-remap-需求设计文档.md b/.claude/design/bug/chat-file-remap-需求设计文档.md index f0b4bf711b..f661395053 100644 --- a/.claude/design/bug/chat-file-remap-需求设计文档.md +++ b/.claude/design/bug/chat-file-remap-需求设计文档.md @@ -129,7 +129,7 @@ | `packages/service/core/workflow/dispatch/ai/chat.ts` | `getMultiInput/getChatMessages` | 构造 LLM messages 前增强运行时副本:历史和当前轮每条 user message 注入自己的文件内容;文件内容不进 system | Chat node 满足历史逐条注入 | | `packages/service/core/workflow/dispatch/ai/tool/index.ts` | `getMultiInput/dispatchRunTools` | 无 `readFiles` tool 时同 Chat;有 `readFiles` tool 时跳过预解析 | 避免与 readFiles tool 职责冲突 | | `packages/service/core/workflow/utils/context.ts` | `rewriteUserQueryWithFileContent` | 承载单条 user query 的文件内容重写逻辑,外层并行处理 history/current messages | 不污染 readFiles tool 职责 | -| `packages/service/core/workflow/dispatch/tools/readFiles.ts` | `normalizeReadableFileUrl` / `getFileContentFromLinks` | `getFileContentFromLinks` 统一负责 URL 标准化、过滤、文件读取与解析;`normalizeReadableFileUrl` 仅作为底层清洗工具 | 不改对外 API | +| `packages/service/core/workflow/dispatch/tools/readFiles.ts` | `normalizeReadableFileUrl` / `parseFileContentFromUrls` | `parseFileContentFromUrls` 统一负责 URL 标准化、过滤、文件读取与解析;`normalizeReadableFileUrl` 仅作为底层清洗工具 | 不改对外 API | | `packages/service/core/ai/llm/utils.ts` | `loadRequestMessages` | 保持 `file_url` 过滤逻辑;确保同条消息 text 不被过滤 | 回归保障 | ### 6.4 运行时注入规则 @@ -137,7 +137,7 @@ 1. 使用消息副本,不修改 `histories`、`query`、`userQuestion` 原对象。 2. Chat/Tool 外层用 `Promise.all` 并行处理运行时 messages。 3. 单条 user query 只收集本条 `file.url`;不做跨 message URL 去重,不共享解析缓存。 -4. `getFileContentFromLinks` 负责 URL 标准化、过滤、`maxFiles` 截断和文件解析。 +4. `parseFileContentFromUrls` 负责 URL 标准化、过滤、`maxFiles` 截断和文件解析。 5. 
文件解析结果回填到原本所属的 user message: - 原 message 已有 text:追加 `\n\n===---===---===\n\n...`。 - 原 message 只有 file:新增一个 text part 存放 ``。 diff --git a/.codex/design/bug/chat-file-remap-功能开发文档.md b/.codex/design/bug/chat-file-remap-功能开发文档.md index 990f811191..e7f232154d 100644 --- a/.codex/design/bug/chat-file-remap-功能开发文档.md +++ b/.codex/design/bug/chat-file-remap-功能开发文档.md @@ -38,7 +38,7 @@ | `packages/service/core/workflow/dispatch/ai/chat.ts` | 修改 | 并行处理 human messages,逐条重写 user query,文件内容不进 system | `Promise.all(...rewriteUserQueryWithFileContent(...))` | T2/T3 | | `packages/service/core/workflow/dispatch/ai/tool/index.ts` | 修改 | Tool LLM messages 同步并行重写;保留 `hasReadFilesTool` skip | `skip: hasReadFilesTool` | T2/T4 | | `packages/service/core/workflow/utils/context.ts` | 修改/复用 | 承载单条 user query 文件内容重写 helper | `rewriteUserQueryWithFileContent(...)` | T2 | -| `packages/service/core/workflow/dispatch/tools/readFiles.ts` | 修改/复用 | 保留可读文件 URL 标准化、读文件与解析文件能力,供 readFiles tool 和重写 helper 复用 | `normalizeReadableFileUrl(...)` / `getFileContentFromLinks(...)` | T2 | +| `packages/service/core/workflow/dispatch/tools/readFiles.ts` | 修改/复用 | 保留可读文件 URL 标准化、读文件与解析文件能力,供 readFiles tool 和重写 helper 复用 | `normalizeReadableFileUrl(...)` / `parseFileContentFromUrls(...)` | T2 | | `packages/service/core/ai/llm/utils.ts` | 修改/测试驱动 | 保持 `file_url` 过滤,确保同条 text 保留 | 不改协议行为 | T5 | | `test/cases/...` | 修改/新增 | 替换保存前增强测试,新增运行时逐条注入测试 | 当前轮/历史/Tool/maxFiles | T5 | @@ -65,7 +65,7 @@ const userMessages = await Promise.all( requestOrigin, maxFiles, customPdfParse, - getFileContentFromLinks, + parseFileContentFromUrls, teamId, tmbId }) @@ -95,11 +95,11 @@ N/A(无对外接口结构变化)。 | 模块 | 函数/类型 | 具体改动 | 依赖关系 | |---|---|---|---| -| `packages/service/core/workflow/dispatch/ai/chat.ts` | `getChatMessages` 附近 | 构造 LLM messages 前,对历史 human 与当前轮 user 做文件内容注入 | 依赖 `getFileContentFromLinks` | +| `packages/service/core/workflow/dispatch/ai/chat.ts` | `getChatMessages` 附近 | 构造 LLM messages 前,对历史 human 与当前轮 user 做文件内容注入 | 依赖 
`parseFileContentFromUrls` | | `packages/service/core/workflow/dispatch/ai/chat.ts` | `getMultiInput` | 不再把文件正文作为 system quote;当前轮文件参与逐条注入 | 与 token 裁剪链路协同 | | `packages/service/core/workflow/dispatch/ai/tool/index.ts` | `dispatchRunTools` | 与 Chat 路径一致;无 `readFiles` tool 时注入,有则跳过 | 避免与 readFiles tool 重复预解析 | -| `packages/service/core/workflow/utils/context.ts` | `rewriteUserQueryWithFileContent` | 单条 user query 重写 ``,外层负责并行处理 history/current messages | 通过入参复用 `getFileContentFromLinks` | -| `packages/service/core/workflow/dispatch/tools/readFiles.ts` | `normalizeReadableFileUrl` / `getFileContentFromLinks` | 统一负责 URL 标准化、过滤、文件读取与解析;按单条 query URL 顺序与 `maxFiles` 控制解析量 | 保持现有错误兜底 | +| `packages/service/core/workflow/utils/context.ts` | `rewriteUserQueryWithFileContent` | 单条 user query 重写 ``,外层负责并行处理 history/current messages | 通过入参复用 `parseFileContentFromUrls` | +| `packages/service/core/workflow/dispatch/tools/readFiles.ts` | `normalizeReadableFileUrl` / `parseFileContentFromUrls` | 统一负责 URL 标准化、过滤、文件读取与解析;按单条 query URL 顺序与 `maxFiles` 控制解析量 | 保持现有错误兜底 | | `packages/service/core/ai/llm/utils.ts` | `loadRequestMessages` | 保持 `file_url` 过滤;回归验证 text part 不丢 | 最终模型请求安全过滤 | ### 3.3 运行时注入算法 @@ -108,7 +108,7 @@ N/A(无对外接口结构变化)。 2. Chat/Tool 外层通过 `Promise.all` 并行处理运行时 messages。 3. 非 human message 原样返回;human message 调用 `rewriteUserQueryWithFileContent`。 4. 单条 user query 内只收集本条 `file.url`,不做跨 message URL 去重或共享缓存。 -5. 调用 `getFileContentFromLinks` 统一完成 URL 标准化、过滤、`maxFiles` 截断与文件解析。 +5. 调用 `parseFileContentFromUrls` 统一完成 URL 标准化、过滤、`maxFiles` 截断与文件解析。 6. 
将解析结果回填到当前 user query: - message 原本有 text:追加分隔符和 ``。 - message 原本无 text:新增 text part。 @@ -144,7 +144,7 @@ const userMessages = await Promise.all( maxFiles, requestOrigin, customPdfParse, - getFileContentFromLinks, + parseFileContentFromUrls, teamId, tmbId }) diff --git a/.codex/design/bug/chat-file-remap-需求设计文档.md b/.codex/design/bug/chat-file-remap-需求设计文档.md index f0b4bf711b..f661395053 100644 --- a/.codex/design/bug/chat-file-remap-需求设计文档.md +++ b/.codex/design/bug/chat-file-remap-需求设计文档.md @@ -129,7 +129,7 @@ | `packages/service/core/workflow/dispatch/ai/chat.ts` | `getMultiInput/getChatMessages` | 构造 LLM messages 前增强运行时副本:历史和当前轮每条 user message 注入自己的文件内容;文件内容不进 system | Chat node 满足历史逐条注入 | | `packages/service/core/workflow/dispatch/ai/tool/index.ts` | `getMultiInput/dispatchRunTools` | 无 `readFiles` tool 时同 Chat;有 `readFiles` tool 时跳过预解析 | 避免与 readFiles tool 职责冲突 | | `packages/service/core/workflow/utils/context.ts` | `rewriteUserQueryWithFileContent` | 承载单条 user query 的文件内容重写逻辑,外层并行处理 history/current messages | 不污染 readFiles tool 职责 | -| `packages/service/core/workflow/dispatch/tools/readFiles.ts` | `normalizeReadableFileUrl` / `getFileContentFromLinks` | `getFileContentFromLinks` 统一负责 URL 标准化、过滤、文件读取与解析;`normalizeReadableFileUrl` 仅作为底层清洗工具 | 不改对外 API | +| `packages/service/core/workflow/dispatch/tools/readFiles.ts` | `normalizeReadableFileUrl` / `parseFileContentFromUrls` | `parseFileContentFromUrls` 统一负责 URL 标准化、过滤、文件读取与解析;`normalizeReadableFileUrl` 仅作为底层清洗工具 | 不改对外 API | | `packages/service/core/ai/llm/utils.ts` | `loadRequestMessages` | 保持 `file_url` 过滤逻辑;确保同条消息 text 不被过滤 | 回归保障 | ### 6.4 运行时注入规则 @@ -137,7 +137,7 @@ 1. 使用消息副本,不修改 `histories`、`query`、`userQuestion` 原对象。 2. Chat/Tool 外层用 `Promise.all` 并行处理运行时 messages。 3. 单条 user query 只收集本条 `file.url`;不做跨 message URL 去重,不共享解析缓存。 -4. `getFileContentFromLinks` 负责 URL 标准化、过滤、`maxFiles` 截断和文件解析。 +4. `parseFileContentFromUrls` 负责 URL 标准化、过滤、`maxFiles` 截断和文件解析。 5. 
文件解析结果回填到原本所属的 user message: - 原 message 已有 text:追加 `\n\n===---===---===\n\n...`。 - 原 message 只有 file:新增一个 text part 存放 ``。 diff --git a/document/content/self-host/upgrading/4-14/41417.mdx b/document/content/self-host/upgrading/4-14/41417.mdx new file mode 100644 index 0000000000..b6003dab65 --- /dev/null +++ b/document/content/self-host/upgrading/4-14/41417.mdx @@ -0,0 +1,17 @@ +--- +title: 'V4.14.17(处理中)' +description: 'FastGPT V4.14.17 更新说明' +--- + +## 升级指南 + +### 1. 更新镜像 tag + +- 更新 fastgpt-app(fastgpt 主服务) 镜像 tag: v4.14.17 +- 更新 fastgpt-pro(fastgpt 商业版) 镜像 tag: v4.14.17 + +## 🐛 修复 + +1. API 知识库 parentId 类型校验错误。 +2. 门户页对话无法上传文件。 +3. 商业版未包含内部文件解析接口,如果未配置 S3 External Endpoint,会导致文件解析失败。 \ No newline at end of file diff --git a/document/content/self-host/upgrading/4-14/meta.en.json b/document/content/self-host/upgrading/4-14/meta.en.json index e2711085d9..a234171923 100644 --- a/document/content/self-host/upgrading/4-14/meta.en.json +++ b/document/content/self-host/upgrading/4-14/meta.en.json @@ -2,6 +2,7 @@ "title": "4.14.x", "description": "", "pages": [ + "41417", "41416", "41415", "41414", diff --git a/document/content/self-host/upgrading/4-14/meta.json b/document/content/self-host/upgrading/4-14/meta.json index e2711085d9..a234171923 100644 --- a/document/content/self-host/upgrading/4-14/meta.json +++ b/document/content/self-host/upgrading/4-14/meta.json @@ -2,6 +2,7 @@ "title": "4.14.x", "description": "", "pages": [ + "41417", "41416", "41415", "41414", diff --git a/document/content/self-host/upgrading/4-15/4150.mdx b/document/content/self-host/upgrading/4-15/4150.mdx index b37734fe81..885250c97a 100644 --- a/document/content/self-host/upgrading/4-15/4150.mdx +++ b/document/content/self-host/upgrading/4-15/4150.mdx @@ -7,6 +7,7 @@ description: 'FastGPT V4.15.0 更新说明' 1. 新增循环节点,弃用旧的批量执行。 2. 全局变量输入框支持输入 object 类型数据。 +3. 
工具调用模式下,如果开启了虚拟机功能,用户对话框上传的文件会直接注入到虚拟机中。 ## ⚙️ 优化 diff --git a/document/content/toc.mdx b/document/content/toc.mdx index e1d4f3515c..b2465c931a 100644 --- a/document/content/toc.mdx +++ b/document/content/toc.mdx @@ -118,6 +118,7 @@ description: FastGPT 文档目录 - [/self-host/upgrading/4-14/41414](/self-host/upgrading/4-14/41414) - [/self-host/upgrading/4-14/41415](/self-host/upgrading/4-14/41415) - [/self-host/upgrading/4-14/41416](/self-host/upgrading/4-14/41416) +- [/self-host/upgrading/4-14/41417](/self-host/upgrading/4-14/41417) - [/self-host/upgrading/4-14/4142](/self-host/upgrading/4-14/4142) - [/self-host/upgrading/4-14/4143](/self-host/upgrading/4-14/4143) - [/self-host/upgrading/4-14/4144](/self-host/upgrading/4-14/4144) diff --git a/document/data/doc-last-modified.json b/document/data/doc-last-modified.json index a9c4924b47..3b9f9d8b14 100644 --- a/document/data/doc-last-modified.json +++ b/document/data/doc-last-modified.json @@ -251,7 +251,7 @@ "content/self-host/upgrading/4-14/41481.mdx": "2026-04-26T21:08:47+08:00", "content/self-host/upgrading/4-14/4149.en.mdx": "2026-04-26T21:08:47+08:00", "content/self-host/upgrading/4-14/4149.mdx": "2026-04-26T21:08:47+08:00", - "content/self-host/upgrading/4-15/4150.mdx": "2026-04-28T13:31:00+08:00", + "content/self-host/upgrading/4-15/4150.mdx": "2026-04-28T15:10:52+08:00", "content/self-host/upgrading/outdated/40.en.mdx": "2026-04-26T21:08:47+08:00", "content/self-host/upgrading/outdated/40.mdx": "2026-04-26T21:08:47+08:00", "content/self-host/upgrading/outdated/41.en.mdx": "2026-04-26T21:08:47+08:00", diff --git a/packages/global/core/ai/sandbox/constants.ts b/packages/global/core/ai/sandbox/constants.ts index f5d32f7ce4..06c0a40c4a 100644 --- a/packages/global/core/ai/sandbox/constants.ts +++ b/packages/global/core/ai/sandbox/constants.ts @@ -77,11 +77,14 @@ export const SANDBOX_GET_FILE_URL_TOOL: ChatCompletionTool = { }; // Prompt -export const SANDBOX_SYSTEM_PROMPT = `你拥有一个独立的 Linux 沙盒环境(Ubuntu 22.04),可通过 
${SANDBOX_TOOL_NAME} 工具执行命令: -- 预装:bash / python3 / node / bun / git / curl +export const SANDBOX_USER_FILES_PATH = 'user_files/'; +export const SANDBOX_SYSTEM_PROMPT = `## 沙盒能力 +你拥有一个独立的 Linux 沙盒环境(Ubuntu 22.04),可通过 ${SANDBOX_TOOL_NAME} 工具执行命令。 +- 系统预装:bash / python3 / node / bun / git / curl - 可自行安装软件包(apt / pip / npm) - 生成的文件内容都保存在当前目录下即可 -- 若需要将生成的文件分享给用户,可使用 ${SANDBOX_GET_FILE_URL_TOOL_NAME} 工具获取文件的临时访问链接`; +- 用户主动上传的文件存储在 ${SANDBOX_USER_FILES_PATH} 目录下 +- 若需要将生成的文件分享给用户,可使用 ${SANDBOX_GET_FILE_URL_TOOL_NAME} 工具获取文件的临时访问链接`; // 聚合 export const sandboxToolMap: Record< diff --git a/packages/service/core/ai/llm/agentLoop/prompt.ts b/packages/service/core/ai/llm/agentLoop/prompt.ts index 7adf5ec83a..21e8001103 100644 --- a/packages/service/core/ai/llm/agentLoop/prompt.ts +++ b/packages/service/core/ai/llm/agentLoop/prompt.ts @@ -46,16 +46,18 @@ ${list} /* ===== Inject user query ===== */ export const getUserFilesPrompt = ( - files: { id: string; name: string; content?: string }[] = [] + files: { id?: string; name: string; sandboxPath?: string; content?: string }[] = [] ) => { if (files.length === 0) return ''; return `# Input Files -本次用户上传的文件: +用户本次上传的文件: ${files .map((file) => ` +${file.id ? `${file.id}` : ''} ${file.name} +${file.sandboxPath ? `${file.sandboxPath}` : ''} ${file.content ? 
`${file.content}` : ''} `.trim() ) diff --git a/packages/service/core/ai/sandbox/toolCall/index.ts b/packages/service/core/ai/sandbox/toolCall/index.ts index 0580adcb0d..451555e01c 100644 --- a/packages/service/core/ai/sandbox/toolCall/index.ts +++ b/packages/service/core/ai/sandbox/toolCall/index.ts @@ -6,6 +6,9 @@ import { toolMap as getFileUrlToolMap } from './getFileUrl.tool'; import { toolMap as shellToolMap } from './shell.tool'; import { getSandboxClient } from '../controller'; import { parseJsonArgs } from '../../utils'; +import { axios } from '../../../../common/api/axios'; +import { serverRequestBaseUrl } from '../../../../common/api/serverRequest'; +import type { FileWriteEntry } from '@fastgpt-sdk/sandbox-adapter'; const ToolMap = { ...getFileUrlToolMap, @@ -74,6 +77,39 @@ export const runSandboxTools = async ({ }; }; +export const injectSandboxFiles = async ({ + appId, + userId, + chatId, + files +}: { + appId: string; + userId: string; + chatId: string; + files: { path: string; url: string }[]; +}) => { + const instance = await getSandboxClient({ appId, userId, chatId }); + await instance.ensureAvailable(); + + const writeFilesData = await Promise.all( + files + .filter((file) => file.path) + .map(async ({ path, url }): Promise => { + const response = await axios.get(url, { + baseURL: serverRequestBaseUrl, + responseType: 'arraybuffer' + }); + + return { + path, + data: response.data + }; + }) + ); + + await instance.provider.writeFiles(writeFilesData); +}; + export const getSandboxToolInfo = (name: string, lang: localeType = LangEnum.en) => { if (name in sandboxToolMap) { const info = sandboxToolMap[name]; diff --git a/packages/service/core/workflow/dispatch/ai/chat.ts b/packages/service/core/workflow/dispatch/ai/chat.ts index 29ef5285ca..02f8107ccc 100644 --- a/packages/service/core/workflow/dispatch/ai/chat.ts +++ b/packages/service/core/workflow/dispatch/ai/chat.ts @@ -30,9 +30,9 @@ import { getHistoryPreview } from 
'@fastgpt/global/core/chat/utils'; import { computedMaxToken } from '../../../ai/utils'; import { formatTime2YMDHM } from '@fastgpt/global/common/string/time'; import type { AiChatQuoteRoleType } from '@fastgpt/global/core/workflow/template/system/aiChat/type'; -import { getFileContentFromLinks } from '../../utils/file'; +import { parseFileContentFromUrls } from '../../utils/file'; import { parseUrlToFileType } from '../../utils/context'; -import { rewriteUserQueryWithFiles } from '../../utils/file'; +import { formatUserQueryWithFiles } from '../../utils/file'; import { i18nT } from '../../../../../web/i18n/utils'; import { postTextCensor } from '../../../chat/postTextCensor'; import { createLLMResponse } from '../../../ai/llm/request'; @@ -167,9 +167,6 @@ export const dispatchChatCompletion = async (props: ChatProps): Promise { + const files = await parseFileContentFromUrls({ + urls, + requestOrigin, + maxFiles, + teamId: runningUserInfo.teamId, + tmbId: runningUserInfo.tmbId, + customPdfParse, + usageId + }); + + return files.map((file) => ({ + name: file.name, + content: file.content + })); + } + }); + return { ...message, - value: await rewriteUserQueryWithFiles({ - queryId: message.dataId || `${index}`, - userQuery: message.value, - requestOrigin, - maxFiles, - customPdfParse, - usageId, - teamId: runningUserInfo.teamId, - tmbId: runningUserInfo.tmbId - }) + value: query }; }) ); diff --git a/packages/service/core/workflow/dispatch/ai/toolcall/constants.ts b/packages/service/core/workflow/dispatch/ai/toolcall/constants.ts index 804f23b1d3..0e4cbb3c75 100644 --- a/packages/service/core/workflow/dispatch/ai/toolcall/constants.ts +++ b/packages/service/core/workflow/dispatch/ai/toolcall/constants.ts @@ -1,4 +1,3 @@ -import { replaceVariable } from '@fastgpt/global/common/string/tools'; import { FlowNodeTypeEnum } from '@fastgpt/global/core/workflow/node/constant'; import { getNanoid } from '@fastgpt/global/common/string/tools'; import type { ChildResponseItemType 
} from './type'; diff --git a/packages/service/core/workflow/dispatch/ai/toolcall/index.ts b/packages/service/core/workflow/dispatch/ai/toolcall/index.ts index 6b8095ac69..ff269db46c 100644 --- a/packages/service/core/workflow/dispatch/ai/toolcall/index.ts +++ b/packages/service/core/workflow/dispatch/ai/toolcall/index.ts @@ -4,6 +4,7 @@ import type { DispatchNodeResultType } from '@fastgpt/global/core/workflow/runti import { getLLMModel } from '../../../../ai/model'; import { filterToolNodeIdByEdges, getNodeErrResponse, getHistories } from '../../utils'; import { runToolCall } from './toolCall'; +import type { FileInputType } from './type'; import { type DispatchToolModuleProps, type ToolNodeItemType } from './type'; import type { UserChatItemFileItemType, ChatItemMiniType } from '@fastgpt/global/core/chat/type'; import { ChatRoleEnum } from '@fastgpt/global/core/chat/constants'; @@ -16,11 +17,12 @@ import { import { getHistoryPreview } from '@fastgpt/global/core/chat/utils'; import { filterToolResponseToPreview } from './utils'; import { parseUrlToFileType } from '../../../utils/context'; -import { rewriteUserQueryWithFiles } from '../../../utils/file'; +import { formatUserQueryWithFiles, parseFileInfoFromUrls } from '../../../utils/file'; import { postTextCensor } from '../../../../chat/postTextCensor'; import type { FlowNodeInputItemType } from '@fastgpt/global/core/workflow/type/io'; import type { McpToolDataType } from '@fastgpt/global/core/app/tool/mcpTool/type'; import { getToolConfigStatus } from '@fastgpt/global/core/app/formEdit/utils'; +import { SANDBOX_USER_FILES_PATH } from '@fastgpt/global/core/ai/sandbox/constants'; type Response = DispatchNodeResultType<{ [NodeOutputKeyEnum.answerText]: string; @@ -38,6 +40,7 @@ export const dispatchRunTools = async (props: DispatchToolModuleProps): Promise< runningUserInfo, externalProvider, usageId, + responseChatItemId, params: { model, systemPrompt, @@ -46,10 +49,13 @@ export const dispatchRunTools = async 
(props: DispatchToolModuleProps): Promise< fileUrlList: fileLinks, aiChatVision, aiChatReasoning, - isResponseAnswerText = true + isResponseAnswerText = true, + useAgentSandbox } } = props; + const useSandbox = !!useAgentSandbox && !!global.feConfigs?.show_agent_sandbox; + try { const toolModel = getLLMModel(model); const useVision = aiChatVision && toolModel.vision; @@ -120,11 +126,14 @@ export const dispatchRunTools = async (props: DispatchToolModuleProps): Promise< .filter(Boolean) .join('\n\n-----\n\n'); + const allFiles = new Map(); + const currentInputFiles: FileInputType[] = []; const messages = await (async () => { const value: ChatItemMiniType[] = [ ...getSystemPrompt_ChatItemType(concatenateSystemPrompt), ...chatHistories, { + dataId: responseChatItemId, obj: ChatRoleEnum.Human, value: runtimePrompt2ChatsValue({ text: userChatInput, @@ -142,18 +151,40 @@ export const dispatchRunTools = async (props: DispatchToolModuleProps): Promise< return message; } + const prefixId = message.dataId || `${index}`; + const query = await formatUserQueryWithFiles({ + userQuery: message.value, + parseFileFn: async (urls) => { + const files = await parseFileInfoFromUrls({ + urls, + requestOrigin, + maxFiles, + teamId: runningUserInfo.teamId + }).then((res) => + res + .filter((item) => item.success) + .map((item, index) => ({ + id: `${prefixId}-${index}`, + name: item.name, + url: item.url, + sandboxPath: useSandbox ? 
`${SANDBOX_USER_FILES_PATH}${item.name}` : undefined + })) + ); + + files.forEach((file) => { + allFiles.set(file.id, file); + }); + if (index === runtimeMessages.length - 1) { + currentInputFiles.push(...files); + } + + return files; + } + }); + return { ...message, - value: await rewriteUserQueryWithFiles({ - queryId: message.dataId || `${index}`, - userQuery: message.value, - requestOrigin, - maxFiles, - customPdfParse: chatConfig?.fileSelectConfig?.customPdfParse, - usageId, - teamId: runningUserInfo.teamId, - tmbId: runningUserInfo.tmbId - }) + value: query }; }) ); @@ -188,6 +219,8 @@ export const dispatchRunTools = async (props: DispatchToolModuleProps): Promise< return runToolCall({ ...props, + allFiles, + currentInputFiles, runtimeNodes, runtimeEdges, toolNodes, diff --git a/packages/service/core/workflow/dispatch/ai/toolcall/toolCall.ts b/packages/service/core/workflow/dispatch/ai/toolcall/toolCall.ts index a176c05eb2..0be5862263 100644 --- a/packages/service/core/workflow/dispatch/ai/toolcall/toolCall.ts +++ b/packages/service/core/workflow/dispatch/ai/toolcall/toolCall.ts @@ -19,7 +19,18 @@ import type { ToolCallChildrenInteractive } from '@fastgpt/global/core/workflow/ import type { JsonSchemaPropertiesItemType } from '@fastgpt/global/core/app/jsonschema'; import { SANDBOX_SYSTEM_PROMPT, SANDBOX_TOOLS } from '@fastgpt/global/core/ai/sandbox/constants'; import { getSandboxToolWorkflowResponse } from './constants'; -import { getSandboxToolInfo, runSandboxTools } from '../../../../ai/sandbox/toolCall'; +import { + getSandboxToolInfo, + injectSandboxFiles, + runSandboxTools +} from '../../../../ai/sandbox/toolCall'; +import { + dispatchReadFileTool, + ReadFileTooData, + ReadFileToolParamsSchema, + ReadFileToolSchema +} from './tools/file'; +import { parseI18nString } from '@fastgpt/global/common/i18n/utils'; type ResponseType = { requestIds: string[]; @@ -40,6 +51,8 @@ export const runToolCall = async (props: DispatchToolModuleProps): Promise 0) { + 
tools.push(ReadFileToolSchema); + } + + // 注入 sandbox tool if (useAgentSandbox && global.feConfigs?.show_agent_sandbox) { // 注入 sandbox_shell 工具 tools.push(...SANDBOX_TOOLS); @@ -128,9 +146,27 @@ export const runToolCall = async (props: DispatchToolModuleProps): Promise ({ + path: file.sandboxPath!, + url: file.url + })) + }); } const getToolInfo = (name: string) => { + if (name === ReadFileTooData.id) { + return { + type: 'file' as const, + name: parseI18nString(ReadFileTooData.name, workflowProps.lang), + avatar: ReadFileTooData.avatar + }; + } const sandboxToolInfo = getSandboxToolInfo(name, workflowProps.lang); if (sandboxToolInfo) { return { @@ -276,6 +312,20 @@ export const runToolCall = async (props: DispatchToolModuleProps): Promise ({ id, url: allFiles.get(id)?.url! })), + teamId: workflowProps.runningUserInfo.teamId, + tmbId: workflowProps.runningUserInfo.tmbId, + customPdfParse: workflowProps.chatConfig?.fileSelectConfig?.customPdfParse, + usageId: workflowProps.usageId + }); + return { + response, + usages, + nodeResponse + }; } else { const toolNode = toolInfo.rawData; diff --git a/packages/service/core/workflow/dispatch/ai/toolcall/tools/file.ts b/packages/service/core/workflow/dispatch/ai/toolcall/tools/file.ts new file mode 100644 index 0000000000..41d23fa931 --- /dev/null +++ b/packages/service/core/workflow/dispatch/ai/toolcall/tools/file.ts @@ -0,0 +1,113 @@ +import type { ChatCompletionTool } from '@fastgpt/global/core/ai/llm/type'; +import type { ChatNodeUsageType } from '@fastgpt/global/support/wallet/bill/type'; +import { getFileContentByUrl } from '../../../../utils/file'; +import { getErrText } from '@fastgpt/global/common/error/utils'; +import { getLogger } from '@fastgpt-sdk/otel/logger'; +import { LogCategories } from '../../../../../../common/logger'; +import { FlowNodeTypeEnum } from '@fastgpt/global/core/workflow/node/constant'; +import { i18nT } from '../../../../../../../web/i18n/utils'; +import z from 'zod'; + +const logger = 
getLogger(LogCategories.MODULE.AI.TOOL_CALL); + +export const ReadFileTooData = { + id: 'read_files', + name: { + 'zh-CN': '文件解析', + en: 'File parse', + 'zh-Hant': '文件解析' + }, + avatar: 'core/workflow/template/readFiles' +}; +export const ReadFileToolSchema: ChatCompletionTool = { + type: 'function', + function: { + name: ReadFileTooData.id, + description: '解析文件内容,获取文本。', + parameters: { + type: 'object', + properties: { + ids: { type: 'array', items: { type: 'string' } } + }, + required: ['ids'] + } + } +}; + +export const ReadFileToolParamsSchema = z.object({ + ids: z.array(z.string()) +}); +type FileReadParams = { + files: { id: string; url: string }[]; + + teamId: string; + tmbId: string; + customPdfParse?: boolean; + usageId?: string; +}; +export const dispatchReadFileTool = async ({ + files, + teamId, + tmbId, + customPdfParse, + usageId +}: FileReadParams) => { + try { + const usages: ChatNodeUsageType[] = []; + const readFilesResult = await Promise.all( + files.map(async ({ url, id }) => { + try { + const { name, content } = await getFileContentByUrl({ + url, + teamId, + tmbId, + customPdfParse, + usageId + }); + + return { + id, + name, + content + }; + } catch (error) { + return { + id, + name: url, + content: getErrText(error, 'Load file error') + }; + } + }) + ); + + // Stringify the result + const response = readFilesResult + .map( + (file) => ` +${file.id} +${file.content} +` + ) + .join('\n'); + + return { + response, + usages, + nodeResponse: { + moduleType: FlowNodeTypeEnum.readFiles, + moduleName: i18nT('chat:read_file') + } + }; + } catch (error) { + logger.error('[File Read] Failed to read files', { error }); + return { + response: `Failed to read file: ${getErrText(error)}`, + usages: [], + nodeResponse: { + moduleType: FlowNodeTypeEnum.readFiles, + moduleName: i18nT('chat:read_file'), + errorText: `Failed to read file: ${getErrText(error)}` + } + }; + } +}; diff --git 
a/packages/service/core/workflow/dispatch/ai/toolcall/type.ts b/packages/service/core/workflow/dispatch/ai/toolcall/type.ts index bf2888bdce..76102910e1 100644 --- a/packages/service/core/workflow/dispatch/ai/toolcall/type.ts +++ b/packages/service/core/workflow/dispatch/ai/toolcall/type.ts @@ -30,6 +30,8 @@ export type DispatchToolModuleProps = ModuleDispatchProps<{ toolNodes: ToolNodeItemType[]; toolModel: LLMModelItemType; childrenInteractiveParams?: ToolCallChildrenInteractive['params']; + allFiles: Map; + currentInputFiles: FileInputType[]; }; export type ToolNodeItemType = { @@ -49,3 +51,10 @@ export type ChildResponseItemType = { runTimes: DispatchFlowResponse['runTimes']; flowUsages: DispatchFlowResponse['flowUsages']; }; + +export type FileInputType = { + id: string; + name: string; + url: string; + sandboxPath?: string; +}; diff --git a/packages/service/core/workflow/dispatch/tools/readFiles.ts b/packages/service/core/workflow/dispatch/tools/readFiles.ts index 33a4b6bdc5..2c67f68caa 100644 --- a/packages/service/core/workflow/dispatch/tools/readFiles.ts +++ b/packages/service/core/workflow/dispatch/tools/readFiles.ts @@ -6,7 +6,7 @@ import { type DispatchNodeResultType } from '@fastgpt/global/core/workflow/runti import { ChatRoleEnum } from '@fastgpt/global/core/chat/constants'; import { type ChatItemMiniType } from '@fastgpt/global/core/chat/type'; import { getNodeErrResponse } from '../utils'; -import { getFileContentFromLinks } from '../../utils/file'; +import { parseFileContentFromUrls } from '../../utils/file'; import { getUserFilesPrompt } from '../../../ai/llm/agentLoop/prompt'; import { sliceStrStartEnd } from '@fastgpt/global/common/string/tools'; @@ -35,7 +35,7 @@ export const dispatchReadFiles = async (props: Props): Promise => { const filesFromHistories = version !== '489' ? 
[] : getHistoryFileLinks(histories); try { - const readFilesResult = await getFileContentFromLinks({ + const readFilesResult = await parseFileContentFromUrls({ // Concat fileUrlList and filesFromHistories; remove not supported files urls: [...fileUrlList, ...filesFromHistories], requestOrigin, @@ -47,7 +47,7 @@ export const dispatchReadFiles = async (props: Props): Promise => { }); const files = readFilesResult.map((item, index) => ({ id: `${index}`, - name: item.filename, + name: item.name, content: item.content })); @@ -61,14 +61,14 @@ export const dispatchReadFiles = async (props: Props): Promise => { data: { [NodeOutputKeyEnum.text]: text, [NodeOutputKeyEnum.rawResponse]: readFilesResult.map((item) => ({ - filename: item.filename, + filename: item.name, url: item.url, text: item.content })) }, [DispatchNodeResponseKeyEnum.nodeResponse]: { readFiles: readFilesResult.map((item) => ({ - name: item.filename, + name: item.name, url: item.url })), readFilesResult: getPreviewResponse diff --git a/packages/service/core/workflow/utils/file.ts b/packages/service/core/workflow/utils/file.ts index 88fc17708a..e96bc88bcf 100644 --- a/packages/service/core/workflow/utils/file.ts +++ b/packages/service/core/workflow/utils/file.ts @@ -29,19 +29,20 @@ type GetFileProps = { usageId?: string; }; -export const rewriteUserQueryWithFiles = async ({ - queryId, +export const formatUserQueryWithFiles = async ({ userQuery, - requestOrigin, - maxFiles, - customPdfParse, - teamId, - tmbId, - usageId -}: GetFileProps & { - queryId: string; + parseFileFn +}: { userQuery: UserChatItemValueItemType[]; -}) => { + parseFileFn: (urls: string[]) => Promise< + { + id?: string; + name: string; + sandboxPath?: string; + content?: string; + }[] + >; +}): Promise => { const urls = userQuery .map((item) => (item.file?.type === ChatFileTypeEnum.file ? 
item.file.url : '')) .filter(Boolean); @@ -50,29 +51,15 @@ export const rewriteUserQueryWithFiles = async ({ return userQuery; } - const readFilesResult = await getFileContentFromLinks({ - urls, - requestOrigin, - maxFiles, - teamId, - tmbId, - customPdfParse, - usageId - }); + const readFilesResult = await parseFileFn(urls); if (readFilesResult.length === 0) { return userQuery; } - const files = readFilesResult.map((item, index) => ({ - id: `${queryId}-${index}`, - name: item.filename, - content: item.content - })); - // 把 file 和 text 合并成一个 text(实际上应该只会有一个 text+多个 files) const text = userQuery.find((item) => item.text?.content)?.text?.content; - const fileQuery = getUserFilesPrompt(files); + const fileQuery = getUserFilesPrompt(readFilesResult); const finalQuery = injectUserQueryPrompt({ query: text, @@ -124,7 +111,122 @@ export const normalizeReadableFileUrl = ({ } }; -export const getFileContentFromLinks = async ({ +export const getFileInfoFromUrl = async ({ teamId, url }: { teamId: string; url: string }) => { + // Get file buffer data + const response = await axios.get(url, { + baseURL: serverRequestBaseUrl, + responseType: 'arraybuffer' + }); + + const urlObj = new URL(url, 'http://localhost:3000'); + const isChatExternalUrl = !urlObj.pathname.startsWith(`/${S3Buckets.private}/${S3Sources.chat}/`); + + // Get file name + const { filename, extension, imageParsePrefix } = (() => { + if (isChatExternalUrl) { + const contentDisposition = response.headers['content-disposition'] || ''; + const matchFilename = parseContentDispositionFilename(contentDisposition); + const filename = matchFilename || urlObj.pathname.split('/').pop() || 'file'; + const extension = path.extname(filename).replace('.', ''); + + return { + filename, + extension, + imageParsePrefix: getFileS3Key.temp({ teamId, filename }).fileParsedPrefix + }; + } + + return S3ChatSource.parseChatUrl(url); + })(); + + return { + isChatExternalUrl, + filename, + extension, + imageParsePrefix, + contentType: 
response.headers['content-type'], + stream: response.data + }; +}; + +export const getFileContentByUrl = async ({ + url, + teamId, + tmbId, + customPdfParse, + usageId +}: { + url: string; + teamId: string; + tmbId: string; + customPdfParse?: boolean; + usageId?: string; +}) => { + // Get from buffer + const rawTextBuffer = await getS3RawTextSource().getRawTextBuffer({ + sourceId: url, + customPdfParse + }); + if (rawTextBuffer) { + return { + name: rawTextBuffer.filename, + url, + content: rawTextBuffer.text + }; + } + + const { isChatExternalUrl, filename, extension, imageParsePrefix, contentType, stream } = + await getFileInfoFromUrl({ teamId, url }); + + const buffer = Buffer.from(stream, 'binary'); + // Get encoding + const encoding = (() => { + if (contentType) { + const charsetRegex = /charset=([^;]*)/; + const matches = charsetRegex.exec(contentType); + if (matches != null && matches[1]) { + return matches[1]; + } + } + + return detectFileEncoding(buffer); + })(); + + const { rawText } = await readFileContentByBuffer({ + extension, + teamId, + tmbId, + buffer, + encoding, + customPdfParse, + getFormatText: true, + imageKeyOptions: imageParsePrefix + ? { + prefix: imageParsePrefix, + // 聊天对话里面上传的外部链接,解析出来的图片过期时间设置为1天,而且是存储在临时文件夹的 + expiredTime: isChatExternalUrl ? 
addDays(new Date(), 1) : undefined + } + : undefined, + usageId + }); + + const replacedText = replaceS3KeyToPreviewUrl(rawText, addDays(new Date(), 90)); + + // Add to buffer + getS3RawTextSource().addRawTextBuffer({ + sourceId: url, + sourceName: filename, + text: replacedText, + customPdfParse + }); + + return { + name: filename, + url, + content: replacedText + }; +}; +export const parseFileContentFromUrls = async ({ urls, requestOrigin, maxFiles, @@ -134,7 +236,72 @@ export const getFileContentFromLinks = async ({ usageId }: GetFileProps & { urls: string[]; -}) => { +}): Promise< + { + success: boolean; + name: string; + url: string; + content: string; + }[] +> => { + const parseUrlList = urls + .map((url) => normalizeReadableFileUrl({ url, requestOrigin })) + .filter(Boolean) + .slice(0, maxFiles); + + const readFilesResult = await Promise.all( + parseUrlList + .map(async (url) => { + try { + if (await isInternalAddress(url)) { + return { + success: false, + name: '', + url, + content: PRIVATE_URL_TEXT + }; + } + + const { name, content } = await getFileContentByUrl({ + url, + teamId, + tmbId, + customPdfParse, + usageId + }); + + return { success: true, name, url, content: content }; + } catch (error) { + return { + success: false, + name: '', + url, + content: getErrText(error, 'Load file error') + }; + } + }) + .filter(Boolean) + ); + + return readFilesResult; +}; +export const parseFileInfoFromUrls = async ({ + urls, + requestOrigin, + maxFiles, + teamId +}: { + requestOrigin?: string; + maxFiles: number; + teamId: string; + urls: string[]; +}): Promise< + { + success: boolean; + name: string; + url: string; + }[] +> => { const parseUrlList = urls .map((url) => normalizeReadableFileUrl({ url, requestOrigin })) .filter(Boolean) @@ -146,102 +313,33 @@ export const getFileContentFromLinks = async ({ // Get from buffer const rawTextBuffer = await getS3RawTextSource().getRawTextBuffer({ sourceId: url, - customPdfParse + customPdfParse: false }); if 
(rawTextBuffer) { return { success: true, - filename: rawTextBuffer.filename, - url, - content: rawTextBuffer.text + name: rawTextBuffer.filename, + url }; } try { if (await isInternalAddress(url)) { - return Promise.reject(PRIVATE_URL_TEXT); + return { + success: false, + name: '', + url + }; } - // Get file buffer data - const response = await axios.get(url, { - baseURL: serverRequestBaseUrl, - responseType: 'arraybuffer' - }); + const { filename } = await getFileInfoFromUrl({ teamId, url }); - const buffer = Buffer.from(response.data, 'binary'); - - const urlObj = new URL(url, 'http://localhost:3000'); - const isChatExternalUrl = !urlObj.pathname.startsWith( - `/${S3Buckets.private}/${S3Sources.chat}/` - ); - - // Get file name - const { filename, extension, imageParsePrefix } = (() => { - if (isChatExternalUrl) { - const contentDisposition = response.headers['content-disposition'] || ''; - const matchFilename = parseContentDispositionFilename(contentDisposition); - const filename = matchFilename || urlObj.pathname.split('/').pop() || 'file'; - const extension = path.extname(filename).replace('.', ''); - - return { - filename, - extension, - imageParsePrefix: getFileS3Key.temp({ teamId, filename }).fileParsedPrefix - }; - } - - return S3ChatSource.parseChatUrl(url); - })(); - - // Get encoding - const encoding = (() => { - const contentType = response.headers['content-type']; - if (contentType) { - const charsetRegex = /charset=([^;]*)/; - const matches = charsetRegex.exec(contentType); - if (matches != null && matches[1]) { - return matches[1]; - } - } - - return detectFileEncoding(buffer); - })(); - - const { rawText } = await readFileContentByBuffer({ - extension, - teamId, - tmbId, - buffer, - encoding, - customPdfParse, - getFormatText: true, - imageKeyOptions: imageParsePrefix - ? { - prefix: imageParsePrefix, - // 聊天对话里面上传的外部链接,解析出来的图片过期时间设置为1天,而且是存储在临时文件夹的 - expiredTime: isChatExternalUrl ? 
addDays(new Date(), 1) : undefined - } - : undefined, - usageId - }); - - const replacedText = replaceS3KeyToPreviewUrl(rawText, addDays(new Date(), 90)); - - // Add to buffer - getS3RawTextSource().addRawTextBuffer({ - sourceId: url, - sourceName: filename, - text: replacedText, - customPdfParse - }); - - return { success: true, filename, url, content: replacedText }; + return { success: true, name: filename, url }; } catch (error) { return { success: false, - filename: '', - url, - content: getErrText(error, 'Load file error') + name: '', + url }; } }) diff --git a/packages/service/test/core/workflow/dispatch/tools/readFiles.test.ts b/packages/service/test/core/workflow/dispatch/tools/readFiles.test.ts index c281ab2fd0..e91613f0ed 100644 --- a/packages/service/test/core/workflow/dispatch/tools/readFiles.test.ts +++ b/packages/service/test/core/workflow/dispatch/tools/readFiles.test.ts @@ -4,10 +4,10 @@ import type { ChatItemMiniType } from '@fastgpt/global/core/chat/type'; import { NodeOutputKeyEnum } from '@fastgpt/global/core/workflow/constants'; import { DispatchNodeResponseKeyEnum } from '@fastgpt/global/core/workflow/runtime/constants'; -const mockGetFileContentFromLinks = vi.hoisted(() => vi.fn()); +const mockparseFileContentFromUrls = vi.hoisted(() => vi.fn()); vi.mock('@fastgpt/service/core/workflow/utils/file', () => ({ - getFileContentFromLinks: mockGetFileContentFromLinks + parseFileContentFromUrls: mockparseFileContentFromUrls })); import { @@ -28,13 +28,13 @@ const baseProps = { describe('dispatchReadFiles', () => { beforeEach(() => { vi.clearAllMocks(); - mockGetFileContentFromLinks.mockResolvedValue([]); + mockparseFileContentFromUrls.mockResolvedValue([]); }); it('成功读取并返回文本/原始响应/节点响应/工具响应结构', async () => { - mockGetFileContentFromLinks.mockResolvedValue([ - { success: true, filename: 'a.pdf', url: '/a.pdf', content: 'Alpha' }, - { success: true, filename: 'b.pdf', url: '/b.pdf', content: 'Beta' } + mockparseFileContentFromUrls.mockResolvedValue([ 
+ { success: true, name: 'a.pdf', url: '/a.pdf', content: 'Alpha' }, + { success: true, name: 'b.pdf', url: '/b.pdf', content: 'Beta' } ]); const result = await dispatchReadFiles({ @@ -42,7 +42,7 @@ describe('dispatchReadFiles', () => { params: { fileUrlList: ['/a.pdf', '/b.pdf'] } }); - expect(mockGetFileContentFromLinks).toHaveBeenCalledWith({ + expect(mockparseFileContentFromUrls).toHaveBeenCalledWith({ urls: ['/a.pdf', '/b.pdf'], requestOrigin: 'http://localhost:3000', maxFiles: 20, @@ -90,7 +90,7 @@ describe('dispatchReadFiles', () => { params: { fileUrlList: ['/a.pdf'] } }); - expect(mockGetFileContentFromLinks).toHaveBeenCalledWith( + expect(mockparseFileContentFromUrls).toHaveBeenCalledWith( expect.objectContaining({ maxFiles: 5, customPdfParse: true @@ -105,7 +105,7 @@ describe('dispatchReadFiles', () => { params: { fileUrlList: ['/a.pdf'] } }); - expect(mockGetFileContentFromLinks).toHaveBeenCalledWith( + expect(mockparseFileContentFromUrls).toHaveBeenCalledWith( expect.objectContaining({ maxFiles: 20, customPdfParse: false }) ); }); @@ -117,7 +117,7 @@ describe('dispatchReadFiles', () => { params: { fileUrlList: ['/a.pdf'] } }); - expect(mockGetFileContentFromLinks).toHaveBeenCalledWith( + expect(mockparseFileContentFromUrls).toHaveBeenCalledWith( expect.objectContaining({ maxFiles: 20 }) ); }); @@ -145,7 +145,7 @@ describe('dispatchReadFiles', () => { params: { fileUrlList: ['/current.pdf'] } }); - expect(mockGetFileContentFromLinks).toHaveBeenCalledWith( + expect(mockparseFileContentFromUrls).toHaveBeenCalledWith( expect.objectContaining({ urls: ['/current.pdf', '/history.pdf'] }) @@ -175,7 +175,7 @@ describe('dispatchReadFiles', () => { params: { fileUrlList: ['/current.pdf'] } }); - expect(mockGetFileContentFromLinks).toHaveBeenCalledWith( + expect(mockparseFileContentFromUrls).toHaveBeenCalledWith( expect.objectContaining({ urls: ['/current.pdf'] }) @@ -188,11 +188,13 @@ describe('dispatchReadFiles', () => { params: {} }); - 
expect(mockGetFileContentFromLinks).toHaveBeenCalledWith(expect.objectContaining({ urls: [] })); + expect(mockparseFileContentFromUrls).toHaveBeenCalledWith( + expect.objectContaining({ urls: [] }) + ); }); it('空文件结果返回空文本和空数组结构', async () => { - mockGetFileContentFromLinks.mockResolvedValue([]); + mockparseFileContentFromUrls.mockResolvedValue([]); const result = await dispatchReadFiles({ ...baseProps, @@ -209,8 +211,8 @@ describe('dispatchReadFiles', () => { it('超大内容下预览仍按 sliceStrStartEnd 截断 (start/end 各 1000)', async () => { const huge = 'x'.repeat(5000); - mockGetFileContentFromLinks.mockResolvedValue([ - { success: true, filename: 'big.txt', url: '/big.txt', content: huge } + mockparseFileContentFromUrls.mockResolvedValue([ + { success: true, name: 'big.txt', url: '/big.txt', content: huge } ]); const result = await dispatchReadFiles({ @@ -226,8 +228,8 @@ describe('dispatchReadFiles', () => { expect(preview).toContain('## big.txt'); }); - it('getFileContentFromLinks 抛错时通过 getNodeErrResponse 返回错误结构', async () => { - mockGetFileContentFromLinks.mockRejectedValue(new Error('boom')); + it('parseFileContentFromUrls 抛错时通过 getNodeErrResponse 返回错误结构', async () => { + mockparseFileContentFromUrls.mockRejectedValue(new Error('boom')); const result = await dispatchReadFiles({ ...baseProps, diff --git a/packages/service/test/core/workflow/utils/file.test.ts b/packages/service/test/core/workflow/utils/file.test.ts index a93e6dc5fb..7e5223d276 100644 --- a/packages/service/test/core/workflow/utils/file.test.ts +++ b/packages/service/test/core/workflow/utils/file.test.ts @@ -58,9 +58,10 @@ vi.mock('@fastgpt/service/common/s3/sources/chat/index', async (importOriginal) }); import { - getFileContentFromLinks, + parseFileContentFromUrls, + parseFileInfoFromUrls, normalizeReadableFileUrl, - rewriteUserQueryWithFiles + formatUserQueryWithFiles } from '@fastgpt/service/core/workflow/utils/file'; const createHumanMessage = (value: UserChatItemValueItemType[]): ChatItemMiniType => ({ 
@@ -68,6 +69,27 @@ const createHumanMessage = (value: UserChatItemValueItemType[]): ChatItemMiniTyp value }); +const createMockParseFileFn = ({ maxFiles = 20 }: { maxFiles?: number } = {}) => + vi.fn(async (urls: string[]) => { + const files = await Promise.all( + urls.slice(0, maxFiles).map(async (url) => { + const rawTextBuffer = await mockGetRawTextBuffer({ + sourceId: url, + customPdfParse: undefined + }); + + return rawTextBuffer + ? { + name: rawTextBuffer.filename, + content: rawTextBuffer.text + } + : undefined; + }) + ); + + return files.filter(Boolean) as { name: string; content: string }[]; + }); + const rewriteMessagesWithFileContent = async ({ messages, maxFiles = 20 @@ -83,12 +105,9 @@ const rewriteMessagesWithFileContent = async ({ return { ...message, - value: await rewriteUserQueryWithFiles({ - queryId: message.dataId || `${index}`, + value: await formatUserQueryWithFiles({ userQuery: message.value, - maxFiles, - teamId: 'team-1', - tmbId: 'tmb-1' + parseFileFn: createMockParseFileFn({ maxFiles }) }) }; }) @@ -135,7 +154,7 @@ describe('normalizeReadableFileUrl', () => { }); }); -describe('getFileContentFromLinks (buffer hit)', () => { +describe('parseFileContentFromUrls (buffer hit)', () => { beforeEach(() => { vi.clearAllMocks(); mockGetRawTextBuffer.mockImplementation(({ sourceId }: { sourceId: string }) => { @@ -154,7 +173,7 @@ describe('getFileContentFromLinks (buffer hit)', () => { }); it('在读取前统一标准化 URL', async () => { - const result = await getFileContentFromLinks({ + const result = await parseFileContentFromUrls({ urls: ['http://localhost:3000/a.pdf', '/b.pdf'], requestOrigin: 'http://localhost:3000', maxFiles: 20, @@ -176,7 +195,7 @@ describe('getFileContentFromLinks (buffer hit)', () => { }); }); -describe('getFileContentFromLinks (external fetch)', () => { +describe('parseFileContentFromUrls (external fetch)', () => { beforeEach(() => { vi.clearAllMocks(); // 默认 buffer 缓存未命中,强制走外部读取路径 @@ -185,21 +204,25 @@ 
describe('getFileContentFromLinks (external fetch)', () => { mockReadFileContentByBuffer.mockResolvedValue({ rawText: 'parsed text' }); }); - it('内部地址命中时整体 reject 抛出 PRIVATE_URL_TEXT', async () => { + it('内部地址命中时返回失败结果和 PRIVATE_URL_TEXT', async () => { mockIsInternalAddress.mockResolvedValue(true); - // 源码中使用 `return Promise.reject(...)`,async 函数的 try/catch 不会捕获, - // 因此整个 getFileContentFromLinks 会以 PRIVATE_URL_TEXT 作为 reason 拒绝 - await expect( - getFileContentFromLinks({ - urls: ['http://internal.svc/a.pdf'], - maxFiles: 20, - teamId: 'team-1', - tmbId: 'tmb-1' - }) - ).rejects.toBe(PRIVATE_URL_TEXT); + const result = await parseFileContentFromUrls({ + urls: ['http://internal.svc/a.pdf'], + maxFiles: 20, + teamId: 'team-1', + tmbId: 'tmb-1' + }); expect(mockAxiosGet).not.toHaveBeenCalled(); + expect(result).toEqual([ + { + success: false, + name: '', + url: 'http://internal.svc/a.pdf', + content: PRIVATE_URL_TEXT + } + ]); }); it('外部地址下载并使用 content-disposition 的文件名,按 charset 解码', async () => { @@ -211,7 +234,7 @@ describe('getFileContentFromLinks (external fetch)', () => { } }); - const result = await getFileContentFromLinks({ + const result = await parseFileContentFromUrls({ urls: ['http://example.com/raw'], maxFiles: 20, teamId: 'team-1', @@ -238,7 +261,7 @@ describe('getFileContentFromLinks (external fetch)', () => { ); expect(result[0]).toMatchObject({ success: true, - filename: 'report.pdf', + name: 'report.pdf', url: 'http://example.com/raw', content: 'parsed text' }); @@ -252,7 +275,7 @@ describe('getFileContentFromLinks (external fetch)', () => { } }); - const result = await getFileContentFromLinks({ + const result = await parseFileContentFromUrls({ urls: ['http://example.com/files/notes.txt'], maxFiles: 20, teamId: 'team-1', @@ -267,7 +290,7 @@ describe('getFileContentFromLinks (external fetch)', () => { ); expect(result[0]).toMatchObject({ success: true, - filename: 'notes.txt', + name: 'notes.txt', url: 'http://example.com/files/notes.txt' }); }); @@ 
-279,7 +302,7 @@ describe('getFileContentFromLinks (external fetch)', () => { headers: {} }); - const result = await getFileContentFromLinks({ + const result = await parseFileContentFromUrls({ urls: [chatUrl], maxFiles: 20, teamId: 'team-1', @@ -291,7 +314,7 @@ describe('getFileContentFromLinks (external fetch)', () => { ); expect(result[0]).toMatchObject({ success: true, - filename: 'abc123-doc.pdf', + name: 'abc123-doc.pdf', url: chatUrl }); }); @@ -302,7 +325,7 @@ describe('getFileContentFromLinks (external fetch)', () => { headers: {} }); - const result = await getFileContentFromLinks({ + const result = await parseFileContentFromUrls({ urls: ['http://example.com/?filename=fake.pdf'], maxFiles: 20, teamId: 'team-1', @@ -312,7 +335,7 @@ describe('getFileContentFromLinks (external fetch)', () => { // pathname 是 '/',split('/').pop() 返回 '',最终落到 'file' 兜底 expect(result[0]).toMatchObject({ success: true, - filename: 'file', + name: 'file', url: 'http://example.com/?filename=fake.pdf' }); }); @@ -320,7 +343,7 @@ describe('getFileContentFromLinks (external fetch)', () => { it('axios 抛错时返回失败结果,错误信息作为 content', async () => { mockAxiosGet.mockRejectedValue(new Error('network down')); - const result = await getFileContentFromLinks({ + const result = await parseFileContentFromUrls({ urls: ['http://example.com/x.pdf'], maxFiles: 20, teamId: 'team-1', @@ -330,14 +353,127 @@ describe('getFileContentFromLinks (external fetch)', () => { expect(mockAddRawTextBuffer).not.toHaveBeenCalled(); expect(result[0]).toMatchObject({ success: false, - filename: '', + name: '', url: 'http://example.com/x.pdf', content: 'network down' }); }); }); -describe('rewriteUserQueryWithFiles', () => { +describe('parseFileInfoFromUrls', () => { + beforeEach(() => { + vi.clearAllMocks(); + mockGetRawTextBuffer.mockResolvedValue(undefined); + mockIsInternalAddress.mockResolvedValue(false); + }); + + it('缓存命中时返回文件名,不下载文件内容', async () => { + mockGetRawTextBuffer.mockResolvedValue({ + filename: 'cached.pdf', 
+ text: 'cached text' + }); + + const result = await parseFileInfoFromUrls({ + urls: ['/cached.pdf'], + maxFiles: 20, + teamId: 'team-1' + }); + + expect(mockGetRawTextBuffer).toHaveBeenCalledWith({ + sourceId: '/cached.pdf', + customPdfParse: false + }); + expect(mockAxiosGet).not.toHaveBeenCalled(); + expect(result).toEqual([ + { + success: true, + name: 'cached.pdf', + url: '/cached.pdf' + } + ]); + }); + + it('缓存未命中时只读取文件信息,并按 maxFiles 和 requestOrigin 处理 URL', async () => { + mockAxiosGet.mockResolvedValue({ + data: Buffer.from('payload'), + headers: { + 'content-disposition': 'attachment; filename="report.pdf"' + } + }); + + const result = await parseFileInfoFromUrls({ + urls: ['http://localhost:3000/report.pdf', '/skip.pdf'], + requestOrigin: 'http://localhost:3000', + maxFiles: 1, + teamId: 'team-1' + }); + + expect(mockAxiosGet).toHaveBeenCalledTimes(1); + expect(mockAxiosGet).toHaveBeenCalledWith('/report.pdf', { + baseURL: expect.any(String), + responseType: 'arraybuffer' + }); + expect(mockReadFileContentByBuffer).not.toHaveBeenCalled(); + expect(result).toEqual([ + { + success: true, + name: 'report.pdf', + url: '/report.pdf' + } + ]); + }); + + it('内部地址返回失败项,并跳过下载', async () => { + mockIsInternalAddress.mockResolvedValue(true); + + const result = await parseFileInfoFromUrls({ + urls: ['http://internal.svc/a.pdf'], + maxFiles: 20, + teamId: 'team-1' + }); + + expect(mockAxiosGet).not.toHaveBeenCalled(); + expect(result).toEqual([ + { + success: false, + name: '', + url: 'http://internal.svc/a.pdf' + } + ]); + }); + + it('读取文件信息失败时返回失败项', async () => { + mockAxiosGet.mockRejectedValue(new Error('network down')); + + const result = await parseFileInfoFromUrls({ + urls: ['http://example.com/a.pdf'], + maxFiles: 20, + teamId: 'team-1' + }); + + expect(result).toEqual([ + { + success: false, + name: '', + url: 'http://example.com/a.pdf' + } + ]); + }); + + it('过滤不支持的 URL 后不触发读取', async () => { + const result = await parseFileInfoFromUrls({ + urls: 
['chat/a.pdf', '/image.png'], + maxFiles: 20, + teamId: 'team-1' + }); + + expect(mockGetRawTextBuffer).not.toHaveBeenCalled(); + expect(mockAxiosGet).not.toHaveBeenCalled(); + expect(result).toEqual([]); + }); +}); + +describe('formatUserQueryWithFiles', () => { beforeEach(() => { vi.clearAllMocks(); mockGetRawTextBuffer.mockImplementation(({ sourceId }: { sourceId: string }) => { @@ -358,39 +494,34 @@ describe('rewriteUserQueryWithFiles', () => { it('userQuery 不含文件时直接返回原 query', async () => { const userQuery: UserChatItemValueItemType[] = [{ text: { content: '只有文本' } }]; - const result = await rewriteUserQueryWithFiles({ - queryId: 'q1', + const parseFileFn = vi.fn(); + const result = await formatUserQueryWithFiles({ userQuery, - maxFiles: 20, - teamId: 'team-1', - tmbId: 'tmb-1' + parseFileFn }); - expect(mockGetRawTextBuffer).not.toHaveBeenCalled(); + expect(parseFileFn).not.toHaveBeenCalled(); expect(result).toBe(userQuery); }); - it('文件 URL 全部被标准化过滤后返回原 query', async () => { + it('parseFileFn 没有返回文件信息时返回原 query', async () => { const userQuery: UserChatItemValueItemType[] = [ { text: { content: '不应被改写' } }, { file: { type: ChatFileTypeEnum.file, name: 'bad.pdf', - // 不以 / http ws 开头,会被 normalizeReadableFileUrl 过滤掉 url: 'chat/bad.pdf' } } ]; - const result = await rewriteUserQueryWithFiles({ - queryId: 'q1', + const parseFileFn = vi.fn(async () => []); + const result = await formatUserQueryWithFiles({ userQuery, - maxFiles: 20, - teamId: 'team-1', - tmbId: 'tmb-1' + parseFileFn }); - expect(mockGetRawTextBuffer).not.toHaveBeenCalled(); + expect(parseFileFn).toHaveBeenCalledWith(['chat/bad.pdf']); expect(result).toBe(userQuery); }); @@ -405,18 +536,48 @@ describe('rewriteUserQueryWithFiles', () => { } } ]; - const result = await rewriteUserQueryWithFiles({ - queryId: 'q1', + const parseFileFn = vi.fn(); + const result = await formatUserQueryWithFiles({ userQuery, - maxFiles: 20, - teamId: 'team-1', - tmbId: 'tmb-1' + parseFileFn }); - 
expect(mockGetRawTextBuffer).not.toHaveBeenCalled(); + expect(parseFileFn).not.toHaveBeenCalled(); expect(result).toBe(userQuery); }); + it('把 parseFileFn 返回的 id、sandboxPath 和 content 注入到文本 prompt', async () => { + const parseFileFn = vi.fn(async () => [ + { + id: 'file-1', + name: 'a.pdf', + sandboxPath: 'user_files/a.pdf', + content: 'Alpha' + } + ]); + + const result = await formatUserQueryWithFiles({ + userQuery: [ + { text: { content: '总结这个文件' } }, + { + file: { + type: ChatFileTypeEnum.file, + name: 'a.pdf', + url: '/a.pdf' + } + } + ], + parseFileFn + }); + + const content = result[0].text?.content; + expect(content).toContain('总结这个文件'); + expect(content).toContain('file-1'); + expect(content).toContain('a.pdf'); + expect(content).toContain('user_files/a.pdf'); + expect(content).toContain('Alpha'); + }); + it('把历史和当前轮文件内容分别注入到所属 user message', async () => { const messages: ChatItemMiniType[] = [ createHumanMessage([ @@ -502,8 +663,8 @@ describe('rewriteUserQueryWithFiles', () => { }); it('同一条 user query 内重复 URL 不去重', async () => { - const result = await rewriteUserQueryWithFiles({ - queryId: 'q1', + const parseFileFn = createMockParseFileFn(); + const result = await formatUserQueryWithFiles({ userQuery: [ { text: { @@ -525,11 +686,10 @@ describe('rewriteUserQueryWithFiles', () => { } } ], - maxFiles: 20, - teamId: 'team-1', - tmbId: 'tmb-1' + parseFileFn }); + expect(parseFileFn).toHaveBeenCalledWith(['/a.pdf', '/a.pdf']); expect(mockGetRawTextBuffer).toHaveBeenCalledTimes(2); expect(mockGetRawTextBuffer).toHaveBeenNthCalledWith(1, { sourceId: '/a.pdf', diff --git a/packages/service/type/env.ts b/packages/service/type/env.ts index 017a57b694..7209fd284d 100644 --- a/packages/service/type/env.ts +++ b/packages/service/type/env.ts @@ -7,7 +7,6 @@ declare global { PRO_URL: string; LOG_DEPTH: string; DB_MAX_LINK: string; - FILE_TOKEN_KEY: string; STORAGE_VENDOR?: 'minio' | 'aws-s3' | 'cos' | 'oss'; STORAGE_PUBLIC_BUCKET?: string; diff --git a/pro b/pro index 
1d38337167..41720ca13d 160000 --- a/pro +++ b/pro @@ -1 +1 @@ -Subproject commit 1d38337167baeed33ece061772c84b0ccce71333 +Subproject commit 41720ca13d5c9c85a6f135bb8c71567308ef3d96
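The central design change in `packages/service/core/workflow/utils/file.ts` is that `formatUserQueryWithFiles` no longer fetches files itself: the caller injects a `parseFileFn`, so Chat/Tool dispatchers and tests can supply different parsing strategies. A minimal, self-contained sketch of that shape (types are simplified stand-ins; the real code uses `UserChatItemValueItemType`, `getUserFilesPrompt`, and `injectUserQueryPrompt` rather than the placeholder prompt layout shown here):

```typescript
type UserQueryItem = {
  text?: { content: string };
  file?: { type: 'file' | 'image'; name: string; url: string };
};

type ParsedFile = { id?: string; name: string; sandboxPath?: string; content?: string };

async function formatUserQueryWithFiles({
  userQuery,
  parseFileFn
}: {
  userQuery: UserQueryItem[];
  parseFileFn: (urls: string[]) => Promise<ParsedFile[]>;
}): Promise<UserQueryItem[]> {
  // Collect file URLs from this single query only (no cross-message dedup).
  const urls = userQuery
    .map((item) => (item.file?.type === 'file' ? item.file.url : ''))
    .filter(Boolean);
  if (urls.length === 0) return userQuery;

  const files = await parseFileFn(urls);
  if (files.length === 0) return userQuery; // nothing parsed: return query untouched

  // Merge parsed file contents and the original text into one text part
  // (placeholder layout; stands in for getUserFilesPrompt + injectUserQueryPrompt).
  const text = userQuery.find((item) => item.text?.content)?.text?.content ?? '';
  const filesBlock = files.map((f) => `## ${f.name}\n${f.content ?? ''}`).join('\n');
  return [{ text: { content: `${filesBlock}\n\n${text}` } }];
}
```

This is why the updated tests can drive the function with a `vi.fn()` stub (`createMockParseFileFn`) instead of mocking the S3 buffer and axios layers directly.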
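A second behavioral change, visible in the rewritten test `内部地址命中时返回失败结果和 PRIVATE_URL_TEXT`: `parseFileContentFromUrls` no longer does `return Promise.reject(PRIVATE_URL_TEXT)` for internal addresses (which rejected the whole `Promise.all` batch), and instead returns a per-item `{ success: false }` result. A sketch of that error-isolation pattern, with `isInternal`/`readOne` as illustrative stand-ins for the real `isInternalAddress` / `getFileContentByUrl` helpers and a placeholder `PRIVATE_URL_TEXT` value:

```typescript
const PRIVATE_URL_TEXT = 'Cannot read private address'; // placeholder value

type ReadResult = { success: boolean; name: string; url: string; content: string };

async function parseFileContentFromUrls({
  urls,
  isInternal,
  readOne
}: {
  urls: string[];
  isInternal: (url: string) => Promise<boolean>;
  readOne: (url: string) => Promise<{ name: string; content: string }>;
}): Promise<ReadResult[]> {
  return Promise.all(
    urls.map(async (url) => {
      // Each URL is handled independently: a private address or a fetch error
      // yields a failed item instead of rejecting the whole batch.
      try {
        if (await isInternal(url)) {
          return { success: false, name: '', url, content: PRIVATE_URL_TEXT };
        }
        const { name, content } = await readOne(url);
        return { success: true, name, url, content };
      } catch (error) {
        return {
          success: false,
          name: '',
          url,
          content: error instanceof Error ? error.message : 'Load file error'
        };
      }
    })
  );
}
```

One bad URL in a multi-file query therefore degrades to an inline error message in the prompt rather than failing the entire read step.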
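The newly added `parseFileInfoFromUrls` is a metadata-only sibling of the content parser: on a raw-text-buffer cache hit it returns the filename without any download, and on a miss it fetches only file info, never running `readFileContentByBuffer`. A simplified sketch under that reading, where `cacheLookup` and `fetchFilename` are hypothetical stand-ins for `getS3RawTextSource().getRawTextBuffer` and `getFileInfoFromUrl`:

```typescript
type FileInfo = { success: boolean; name: string; url: string };

async function parseFileInfoFromUrls({
  urls,
  maxFiles,
  cacheLookup,
  fetchFilename
}: {
  urls: string[];
  maxFiles: number;
  cacheLookup: (url: string) => Promise<{ filename: string } | undefined>;
  fetchFilename: (url: string) => Promise<string>;
}): Promise<FileInfo[]> {
  return Promise.all(
    // URL normalization omitted here; maxFiles truncation kept, as in the diff.
    urls.slice(0, maxFiles).map(async (url) => {
      const cached = await cacheLookup(url);
      if (cached) {
        // Cache hit: filename comes from the buffer, no network round trip.
        return { success: true, name: cached.filename, url };
      }
      try {
        return { success: true, name: await fetchFilename(url), url };
      } catch {
        return { success: false, name: '', url };
      }
    })
  );
}
```

This matches the new tests: a cached URL never triggers `axios.get`, and failures surface as `{ success: false, name: '' }` items.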
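`getFileContentByUrl` keeps the existing encoding resolution: prefer a `charset` from the `Content-Type` header, otherwise fall back to byte-level detection. Extracted as a standalone sketch (with `detectEncoding` as a stand-in for the real `detectFileEncoding` helper):

```typescript
function resolveEncoding(
  contentType: string | undefined,
  buffer: Uint8Array,
  detectEncoding: (buf: Uint8Array) => string
): string {
  if (contentType) {
    // e.g. "text/plain; charset=gb2312" -> "gb2312"
    const matches = /charset=([^;]*)/.exec(contentType);
    if (matches?.[1]) return matches[1];
  }
  // No usable header: detect from the bytes themselves.
  return detectEncoding(buffer);
}
```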
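The filename resolution for external URLs in `getFileInfoFromUrl` (also exercised by the `'file'`-fallback test) follows a three-step chain: `Content-Disposition` filename first, then the last path segment, then a literal `'file'`. A sketch with `parseDisposition` as an illustrative stand-in for `parseContentDispositionFilename`:

```typescript
function resolveExternalFilename(
  url: string,
  contentDisposition: string,
  parseDisposition: (header: string) => string | undefined
): string {
  const fromHeader = parseDisposition(contentDisposition);
  if (fromHeader) return fromHeader;

  const pathname = new URL(url, 'http://localhost:3000').pathname;
  // A bare pathname '/' splits to ['', ''], so pop() is '' and we hit the fallback.
  return pathname.split('/').pop() || 'file';
}
```

Note that query parameters (e.g. `?filename=fake.pdf`) are deliberately ignored, which is exactly what the `'file'` fallback test asserts.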