Archer 87b0bca30c Doc (#6493)
2026-03-03 17:39:47 +08:00


---
title: Dataset Design
description: FastGPT dataset file and data design
---
## Relationship Between Files and Data
In FastGPT, source files are stored in MongoDB GridFS, while the data extracted from them is stored in PostgreSQL (PG). Each PG row has a `file_id` column that references the file the row came from. For backward compatibility, and to support data that is entered or annotated by hand, `file_id` can also hold these special values:

- `manual`: manually entered data
- `mark`: manually annotated data

Note: `file_id` is written only when the data is inserted and cannot be modified afterward.

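
The special values turn `file_id` into a small tagged union. Below is a minimal sketch of how code reading a PG row might classify it. It assumes the special values are stored as the literal strings `manual` and `mark` and that any other value is a GridFS file ID. The `DataRow` type and the `dataSource` function are hypothetical illustrations, not FastGPT's own code. Marking `file_id` as `readonly` mirrors the rule that it is fixed at insertion time.

```typescript
// Hypothetical shape of a PG data row; only file_id matters for this sketch.
interface DataRow {
  readonly id: number;
  // Written once at insertion; readonly mirrors "cannot be modified afterward".
  readonly file_id: string;
}

// Where a row's data came from, derived from its file_id.
type DataSource =
  | { kind: "manual" }
  | { kind: "mark" }
  | { kind: "file"; gridFsId: string };

// Check the two special values first; anything else is treated as a GridFS ID.
function dataSource(row: DataRow): DataSource {
  switch (row.file_id) {
    case "manual":
      return { kind: "manual" };
    case "mark":
      return { kind: "mark" };
    default:
      return { kind: "file", gridFsId: row.file_id };
  }
}

// Example rows; "65f1c0ffee" is a made-up GridFS ID.
const rows: DataRow[] = [
  { id: 1, file_id: "manual" },
  { id: 2, file_id: "mark" },
  { id: 3, file_id: "65f1c0ffee" },
];
for (const row of rows) {
  console.log(row.id, dataSource(row).kind); // 1 manual, 2 mark, 3 file
}
```
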
## File Import Process
1. Upload the file to MongoDB GridFS and obtain a `file_id`. The file is marked as `unused` at this point.
2. The browser parses the file, extracts its text, and splits the text into chunks.
3. Each chunk is tagged with the `file_id`.
4. When the user clicks upload, the file status changes to `used` and the chunks are pushed to the MongoDB `training` collection to await processing.
5. The training thread pulls pending data from MongoDB, generates vectors, and inserts them into PG.
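
The five import steps can be sketched as a single-process simulation. Everything in it is a hypothetical stand-in: a `Map` and plain arrays replace GridFS, the MongoDB `training` collection, and PG; fixed-size splitting stands in for the browser's chunking, whose rules this page does not specify; and `fakeEmbed` replaces a real embedding model. It only models the order of state changes, not FastGPT's actual implementation.

```typescript
// Hypothetical in-memory stand-ins; not FastGPT's real storage layer.
type FileStatus = "unused" | "used";

interface StoredFile {
  id: string;
  name: string;
  status: FileStatus;
}

interface TrainingItem {
  fileId: string;
  text: string;
}

interface PgRow {
  file_id: string;
  text: string;
  vector: number[];
}

const gridFs = new Map<string, StoredFile>(); // stands in for GridFS
const trainingQueue: TrainingItem[] = []; // stands in for the `training` collection
const pgRows: PgRow[] = []; // stands in for the PG table
let nextId = 0;

// Step 1: store the file and mark it unused until the import is confirmed.
function uploadFile(name: string): string {
  const id = `file_${nextId++}`;
  gridFs.set(id, { id, name, status: "unused" });
  return id;
}

// Steps 2-3: split the extracted text into chunks, each tagged with the file ID.
// Fixed-size splitting is a placeholder for the real chunking rules.
function chunkText(fileId: string, text: string, size: number): TrainingItem[] {
  const items: TrainingItem[] = [];
  for (let i = 0; i < text.length; i += size) {
    items.push({ fileId, text: text.slice(i, i + size) });
  }
  return items;
}

// Step 4: on confirmation, mark the file used and enqueue its chunks.
function confirmUpload(fileId: string, chunks: TrainingItem[]): void {
  const file = gridFs.get(fileId);
  if (!file) throw new Error(`unknown file ${fileId}`);
  file.status = "used";
  trainingQueue.push(...chunks);
}

// Placeholder "embedding": a real system would call an embedding model here.
function fakeEmbed(text: string): number[] {
  return [text.length];
}

// Step 5: drain the queue, compute vectors, and write rows to "PG".
function runTrainingOnce(): number {
  const batch = trainingQueue.splice(0); // removes and returns every queued item
  for (const item of batch) {
    pgRows.push({ file_id: item.fileId, text: item.text, vector: fakeEmbed(item.text) });
  }
  return batch.length;
}

// Usage: import one small file end to end.
const fileId = uploadFile("notes.txt");
const statusBefore = gridFs.get(fileId)?.status; // "unused"
const chunks = chunkText(fileId, "abcdefghij", 4); // "abcd", "efgh", "ij"
confirmUpload(fileId, chunks);
const processed = runTrainingOnce(); // 3
console.log(statusBefore, gridFs.get(fileId)?.status, processed);
```

Because every queued item carries the `file_id` attached in step 3, each row written to PG keeps the link back to its GridFS file described under "Relationship Between Files and Data".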