Optimize the file storage structure of the knowledge base (#386)

2025-10-17 08:37:59 +00:00 · 2023-10-10 22:41:05 +08:00
parent 29d152784f
commit d0041a98b4
41 changed files with 591 additions and 231 deletions
--- a/docSite/content/docs/development/configuration.md
+++ b/docSite/content/docs/development/configuration.md
@@ -84,6 +84,14 @@ weight: 520
    "maxToken": 16000,
    "price": 0,
    "prompt": ""
+  },
+  "QGModel": { // model used to generate next-step guidance
+    "model": "gpt-3.5-turbo",
+    "name": "GPT35-4k",
+    "maxToken": 4000,
+    "price": 0,
+    "prompt": "",
+    "functionCall": false
  }
 }
 ```
--- a/docSite/content/docs/development/design/_index.md
+++ b/docSite/content/docs/development/design/_index.md
@@ -0,0 +1,8 @@
+---
+weight: 540
+title: "Design"
+description: "Design notes for parts of FastGPT"
+icon: public
+draft: false
+images: []
+---
--- a/docSite/content/docs/development/design/dataset.md
+++ b/docSite/content/docs/development/design/dataset.md
@@ -0,0 +1,25 @@
+---
+weight: 541
+title: "Dataset"
+description: "Design of files and data in FastGPT datasets"
+icon: dataset
+draft: false
+images: []
+---
+
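+## Relationship between files and data
+
+In FastGPT, files are stored in MongoDB's FS, while the data itself is stored in PostgreSQL. Each row in PG has a file_id column that links it to the corresponding file. For compatibility with older versions, and to cover manually entered and annotated data, file_id also accepts a few special values:
+
+- manual: manually entered data
+- mark: manually annotated data
+
+Note that file_id is written only when the data is inserted; it cannot be changed by later updates.
+
+## File import flow
+
+1. Upload the file to MongoDB's FS and obtain its file_id; at this point the file is marked `unused`
+2. The browser parses the file to extract the text and split it into chunks
+3. Tag each chunk with the file_id
+4. On clicking upload data: the file's status is changed to `used`, and the data is pushed to the mongo `training` collection to await training
+5. The training thread reads the data from mongo and, once it has obtained the vectors, inserts it into pg.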
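
Reviewer note: the five-step import flow in `dataset.md` can be sketched roughly as follows. This is a minimal in-memory illustration, not FastGPT's implementation: the `Store` class, the function names, the `q` field, the fixed-size chunking, and `fake_embed` are all hypothetical stand-ins for MongoDB's FS, the mongo `training` collection, PostgreSQL, and the real embedding call.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List

# Special file_id values named in the design doc.
SPECIAL_FILE_IDS = {"manual", "mark"}


@dataclass
class Store:
    """In-memory stand-ins for the three stores (all hypothetical)."""
    files: Dict[str, dict] = field(default_factory=dict)  # stands in for MongoDB's FS
    training: List[dict] = field(default_factory=list)    # stands in for the mongo `training` collection
    pg_rows: List[dict] = field(default_factory=list)     # stands in for the PostgreSQL data rows


def upload_file(store: Store, name: str, text: str) -> str:
    """Step 1: store the file and mark it `unused` until its data is pushed."""
    file_id = uuid.uuid4().hex
    store.files[file_id] = {"name": name, "text": text, "status": "unused"}
    return file_id


def split_chunks(text: str, size: int) -> List[str]:
    """Step 2: naive fixed-size chunking (assumption; the real splitter is not shown in the doc)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def push_data(store: Store, file_id: str, chunks: List[str]) -> None:
    """Steps 3-4: tag each chunk with file_id, mark the file `used`, queue for training."""
    store.files[file_id]["status"] = "used"
    for chunk in chunks:
        store.training.append({"file_id": file_id, "q": chunk})


def fake_embed(text: str) -> List[float]:
    """Placeholder for the real embedding request (assumption)."""
    return [float(len(text))]


def train_once(store: Store) -> None:
    """Step 5: drain the training queue, attach a vector, insert into the pg stand-in."""
    while store.training:
        item = store.training.pop(0)
        store.pg_rows.append({**item, "vector": fake_embed(item["q"])})
```

Calling `upload_file`, `split_chunks`, `push_data`, and `train_once` in that order mirrors steps 1 to 5. Because file_id is attached in `push_data` and never touched afterwards, the sketch also reflects the rule that file_id is written only at insert time.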
--- a/docSite/content/docs/installation/upgrading/447.md
+++ b/docSite/content/docs/installation/upgrading/447.md
@@ -0,0 +1,29 @@
+---
+title: 'V4.4.7'
+description: 'FastGPT V4.4.7 update (requires running the upgrade script)'
+icon: 'upgrade'
+draft: false
+toc: true
+weight: 840
+---
+
+## Run the initialization API
+
+Send 1 HTTP request (replace {{rootkey}} with the `rootkey` from your environment variables and {{host}} with your own domain).
+
+1. https://xxxxx/api/admin/initv447
+
+```bash
+curl --location --request POST 'https://{{host}}/api/admin/initv447' \
+--header 'rootkey: {{rootkey}}' \
+--header 'Content-Type: application/json'
+```
+
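+This initializes the pg indexes and converts empty file_id values to manual. With a large amount of data it may take a long time; you can follow the progress in the logs.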
+
+## Features
+
+### FastGPT V4.4.7
+
+1. Optimized CRUD operations for database files.
+2. Added support for reading from links, which are recorded as the source.
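
Reviewer note: the initialization step in `447.md` is described as converting empty file_id values to manual. The rule could look something like the sketch below. It is an illustration only: the function names are hypothetical, rows are plain dicts rather than PostgreSQL records, and treating `None`, a missing key, and blank strings as "empty" is an assumption, since the upgrade note does not define the term.

```python
from typing import List, Optional

MANUAL_FILE_ID = "manual"  # special value named in the dataset design doc


def normalize_file_id(file_id: Optional[str]) -> str:
    """Map an empty file_id to `manual`; leave every other value unchanged."""
    if file_id is None or file_id.strip() == "":
        return MANUAL_FILE_ID
    return file_id


def migrate_rows(rows: List[dict]) -> int:
    """Normalize file_id on each row in place and return how many rows changed."""
    changed = 0
    for row in rows:
        new_id = normalize_file_id(row.get("file_id"))
        if row.get("file_id") != new_id:
            row["file_id"] = new_id
            changed += 1
    return changed
```

Running `migrate_rows` a second time on the same rows changes nothing, which is the property one would want from an upgrade script that may be retried.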