FastGPT/document/content/docs/introduction/development/design/dataset.mdx
Archer fe7abf22a9 New document (#5299)
2025-07-23 21:35:03 +08:00

---
title: Dataset
description: Design of files and data in FastGPT datasets
---
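## Relationship between files and data
In FastGPT, files are stored in MongoDB's FS, while the actual data is stored in PostgreSQL. Each data row in PG has a file_id column that links it to its source file. To stay compatible with older versions and to support manually entered and annotated data, file_id also accepts a few special values:
- manual: manually entered data
- mark: manually annotated data

Note: file_id is written only when a row is inserted; it cannot be modified by later updates.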
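
The file_id rules above can be modeled in a few lines of TypeScript. This is only an illustrative sketch, not FastGPT's actual code: the names `SPECIAL_FILE_IDS`, `isSpecialFileId`, `DataRow`, `insertRow`, and `updateRowText` are assumptions made for the example, and only the values `manual` and `mark` come from the text.

```typescript
// Hypothetical sketch: only the values "manual" and "mark" come from the document.
const SPECIAL_FILE_IDS = ["manual", "mark"] as const;
type SpecialFileId = (typeof SPECIAL_FILE_IDS)[number];

// True when file_id is one of the reserved values rather than a real file reference.
function isSpecialFileId(fileId: string): fileId is SpecialFileId {
  return SPECIAL_FILE_IDS.some((value) => value === fileId);
}

// `readonly` models the rule that file_id is written on insert and never changed.
interface DataRow {
  readonly file_id: string;
  text: string;
}

// Insert: the only place where file_id is set.
function insertRow(fileId: string, text: string): DataRow {
  return { file_id: fileId, text };
}

// Update: replaces the text and carries the original file_id over unchanged.
function updateRowText(row: DataRow, text: string): DataRow {
  return { file_id: row.file_id, text };
}
```

Marking `file_id` as `readonly` lets the compiler reject accidental reassignment, which mirrors the insert-only rule at the type level.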
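## File import flow
1. Upload the file to MongoDB's FS and obtain a file_id. At this point the file is marked as `unused`.
2. The browser parses the file and extracts the text and its chunks.
3. Tag each chunk with the file_id.
4. Click "Upload data": the file's status changes to `used`, and the data is pushed to the `training` collection in mongo, where it waits to be trained.
5. A training thread takes the data from mongo, obtains its vectors, and then inserts it into pg.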
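
The five steps above can be traced with a small in-memory TypeScript sketch. Everything here is a stand-in chosen for illustration: a `Map` plays the role of MongoDB's FS, one array plays the mongo `training` collection, another plays the pg table, and `fakeEmbed` is a placeholder rather than a real embedding model. The function names and the blank-line chunking rule are likewise assumptions, not FastGPT's implementation.

```typescript
// In-memory sketch of the import flow. All names are hypothetical stand-ins.
type FileStatus = "unused" | "used";

interface StoredFile { content: string; status: FileStatus }
interface Chunk { file_id: string; text: string }
interface VectorRow { file_id: string; text: string; vector: number[] }

const fileStore = new Map<string, StoredFile>(); // stands in for MongoDB's FS
const trainingQueue: Chunk[] = [];               // stands in for the mongo `training` collection
const pgRows: VectorRow[] = [];                  // stands in for the pg table
let nextFileNumber = 0;

// Step 1: store the file and return its file_id; it starts out as `unused`.
function uploadFile(content: string): string {
  nextFileNumber += 1;
  const fileId = `file-${nextFileNumber}`;
  fileStore.set(fileId, { content, status: "unused" });
  return fileId;
}

// Steps 2-3: split the text into chunks (assumed rule: blank lines) and tag each with file_id.
function parseToChunks(fileId: string, content: string): Chunk[] {
  return content
    .split(/\n\s*\n/)
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((text) => ({ file_id: fileId, text }));
}

// Step 4: mark the file as `used` and queue its chunks for training.
function pushData(fileId: string, chunks: Chunk[]): void {
  const file = fileStore.get(fileId);
  if (file === undefined) throw new Error(`unknown file_id: ${fileId}`);
  file.status = "used";
  for (const chunk of chunks) trainingQueue.push(chunk);
}

// Placeholder embedding: NOT a real model, just a deterministic stand-in.
function fakeEmbed(text: string): number[] {
  return [text.length];
}

// Step 5: drain the queue, compute a vector per chunk, then insert into "pg".
function runTrainingWorker(): number {
  let processed = 0;
  let chunk = trainingQueue.shift();
  while (chunk !== undefined) {
    pgRows.push({ file_id: chunk.file_id, text: chunk.text, vector: fakeEmbed(chunk.text) });
    processed += 1;
    chunk = trainingQueue.shift();
  }
  return processed;
}
```

Keeping the queue between step 4 and step 5 means the upload returns as soon as the chunks are queued, while the slower vector computation happens later in the worker.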