mirror of
https://github.com/labring/FastGPT.git
synced 2026-02-28 01:02:28 +08:00
---
title: Dataset API
description: FastGPT OpenAPI Dataset API
---
|
||
|
||
| How to Get Dataset ID (datasetId) | How to Get Collection ID (collection_id) |
| --------------------------------- | ---------------------------------------- |
|                                   |                                          |
|
||
|
||
## Create Training Order
|
||
|
||
<Tabs items={['Request Example','Response Example']}>
|
||
<Tab value="Request Example" >
|
||
|
||
**New Example**
|
||
|
||
```bash
|
||
curl --location --request POST 'http://localhost:3000/api/support/wallet/usage/createTrainingUsage' \
|
||
--header 'Authorization: Bearer {{apikey}}' \
|
||
--header 'Content-Type: application/json' \
|
||
--data-raw '{
|
||
"datasetId": "Dataset ID",
|
||
"name": "Optional, custom order name, e.g.: Document Training-fastgpt.docx"
|
||
}'
|
||
```
|
||
|
||
</Tab>
|
||
<Tab value="Response Example" >
|
||
|
||
data is the billId, which can be used for bill aggregation when adding dataset data.
|
||
|
||
```json
|
||
{
|
||
"code": 200,
|
||
"statusText": "",
|
||
"message": "",
|
||
"data": "65112ab717c32018f4156361"
|
||
}
|
||
```
|
||
|
||
</Tab>
|
||
</Tabs>
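
The request and response above can also be sketched in Python. This is a minimal illustration using only the standard library, not an official client: the helper names `build_training_usage_request` and `extract_bill_id` are hypothetical, and the base URL is the one used in the curl example. The request is only built, never sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # same host as the curl examples


def build_training_usage_request(api_key, dataset_id, name=None):
    """Build (but do not send) the createTrainingUsage request."""
    payload = {"datasetId": dataset_id}
    if name is not None:
        payload["name"] = name  # optional custom order name
    return urllib.request.Request(
        f"{BASE_URL}/api/support/wallet/usage/createTrainingUsage",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_bill_id(response_body):
    """Return the billId carried in the `data` field of a successful response."""
    parsed = json.loads(response_body)
    if parsed.get("code") != 200:
        raise RuntimeError(f"request failed: {parsed.get('message')!r}")
    return parsed["data"]
```

To send the request, pass it to `urllib.request.urlopen` and feed the decoded response body to `extract_bill_id`.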
|
||
|
||
## Dataset
|
||
|
||
### Create a Dataset
|
||
|
||
<Tabs items={['Request Example','Parameters','Response Example']}>
|
||
<Tab value="Request Example" >
|
||
|
||
```bash
|
||
curl --location --request POST 'http://localhost:3000/api/core/dataset/create' \
|
||
--header 'Authorization: Bearer {{authorization}}' \
|
||
--header 'Content-Type: application/json' \
|
||
--data-raw '{
|
||
"parentId": null,
|
||
"type": "dataset",
|
"name":"Test",
"intro":"Introduction",
|
||
"avatar": "",
|
||
"vectorModel": "text-embedding-ada-002",
|
||
"agentModel": "gpt-3.5-turbo-16k",
|
||
"vlmModel": "gpt-4.1"
|
||
}'
|
||
```
|
||
|
||
</Tab>
|
||
<Tab value="Parameters" >
|
||
|
||
<div>
|
||
|
||
- parentId - Parent ID for building directory structure. Usually can be null or omitted.
|
||
- type - `dataset` or `folder`, represents regular dataset or folder. If not provided, creates a regular dataset.
|
||
- name - Dataset name (required)
|
||
- intro - Description (optional)
|
||
- avatar - Avatar URL (optional)
|
||
- vectorModel - Vector model (recommended to leave empty, use system default)
|
||
- agentModel - Text processing model (recommended to leave empty, use system default)
|
||
- vlmModel - Image understanding model (recommended to leave empty, use system default)
|
||
|
||
</div>
|
||
|
||
</Tab>
|
||
|
||
<Tab value="Response Example" >
|
||
|
||
```json
|
||
{
|
||
"code": 200,
|
||
"statusText": "",
|
||
"message": "",
|
||
"data": "65abc9bd9d1448617cba5e6c"
|
||
}
|
||
```
|
||
|
||
</Tab>
|
||
</Tabs>
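
Since the parameter notes recommend leaving the model fields empty, a client can simply omit them unless an override is needed. The sketch below shows one way to build such a request body in Python. The function name `build_create_dataset_payload` is hypothetical, and the validation rules are a client-side reading of the parameter list, not server behavior.

```python
MODEL_FIELDS = {"vectorModel", "agentModel", "vlmModel"}


def build_create_dataset_payload(name, intro="", avatar="", parent_id=None,
                                 dataset_type="dataset", **model_overrides):
    """Build the JSON body for /api/core/dataset/create.

    Model fields are left out unless explicitly overridden, so the
    system defaults apply.
    """
    if not name:
        raise ValueError("name is required")
    if dataset_type not in ("dataset", "folder"):
        raise ValueError("type must be 'dataset' or 'folder'")
    unknown = set(model_overrides) - MODEL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    payload = {
        "parentId": parent_id,
        "type": dataset_type,
        "name": name,
        "intro": intro,
        "avatar": avatar,
    }
    payload.update(model_overrides)
    return payload
```

For example, `build_create_dataset_payload("Test")` yields a body with no model fields, while passing `vectorModel="text-embedding-ada-002"` adds only that override.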
|
||
|
||
### Get Dataset List
|
||
|
||
<Tabs items={['Request Example','Parameters','Response Example']}>
|
||
<Tab value="Request Example" >
|
||
|
||
```bash
|
||
curl --location --request POST 'http://localhost:3000/api/core/dataset/list?parentId=' \
|
||
--header 'Authorization: Bearer xxxx' \
|
||
--header 'Content-Type: application/json' \
|
||
--data-raw '{
|
||
"parentId":""
|
||
}'
|
||
```
|
||
|
||
</Tab>
|
||
<Tab value="Parameters">
|
||
|
||
<div>
|
||
- parentId - Parent ID. Pass empty string or null to get datasets in the root directory
|
||
</div>
|
||
|
||
</Tab>
|
||
|
||
<Tab value="Response Example" >
|
||
|
||
```json
|
||
{
|
||
"code": 200,
|
||
"statusText": "",
|
||
"message": "",
|
||
"data": [
|
||
{
|
||
"_id": "65abc9bd9d1448617cba5e6c",
|
||
"parentId": null,
|
||
"avatar": "",
|
"name": "Test",
|
||
"intro": "",
|
||
"type": "dataset",
|
||
"permission": "private",
|
||
"canWrite": true,
|
||
"isOwner": true,
|
||
"vectorModel": {
|
||
"model": "text-embedding-ada-002",
|
||
"name": "Embedding-2",
|
||
"charsPointsPrice": 0,
|
||
"defaultToken": 512,
|
||
"maxToken": 8000,
|
||
"weight": 100
|
||
}
|
||
}
|
||
]
|
||
}
|
||
```
|
||
|
||
</Tab>
|
||
</Tabs>
|
||
|
||
### Get Dataset Details
|
||
|
||
<Tabs items={['Request Example','Parameters','Response Example']}>
|
||
<Tab value="Request Example" >
|
||
|
||
```bash
|
||
curl --location --request GET 'http://localhost:3000/api/core/dataset/detail?id=6593e137231a2be9c5603ba7' \
|
||
--header 'Authorization: Bearer {{authorization}}'
|
||
```
|
||
|
||
</Tab>
|
||
<Tab value="Parameters" >
|
||
|
||
<div>
|
||
- id: Dataset ID
|
||
</div>
|
||
|
||
</Tab>
|
||
|
||
<Tab value="Response Example" >
|
||
|
||
```json
|
||
{
|
||
"code": 200,
|
||
"statusText": "",
|
||
"message": "",
|
||
"data": {
|
||
"_id": "6593e137231a2be9c5603ba7",
|
||
"parentId": null,
|
||
"teamId": "65422be6aa44b7da77729ec8",
|
||
"tmbId": "65422be6aa44b7da77729ec9",
|
||
"type": "dataset",
|
||
"status": "active",
|
||
"avatar": "/icon/logo.svg",
|
||
"name": "FastGPT test",
|
||
"vectorModel": {
|
||
"model": "text-embedding-ada-002",
|
||
"name": "Embedding-2",
|
||
"charsPointsPrice": 0,
|
||
"defaultToken": 512,
|
||
"maxToken": 8000,
|
||
"weight": 100
|
||
},
|
||
"agentModel": {
|
||
"model": "gpt-3.5-turbo-16k",
|
||
"name": "FastAI-16k",
|
||
"maxContext": 16000,
|
||
"maxResponse": 16000,
|
||
"charsPointsPrice": 0
|
||
},
|
||
"intro": "",
|
||
"permission": "private",
|
||
"updateTime": "2024-01-02T10:11:03.084Z",
|
||
"canWrite": true,
|
||
"isOwner": true
|
||
}
|
||
}
|
||
```
|
||
|
||
</Tab>
|
||
</Tabs>
|
||
|
||
### Delete a Dataset
|
||
|
||
<Tabs items={['Request Example','Parameters','Response Example']}>
|
||
<Tab value="Request Example" >
|
||
|
||
```bash
|
||
curl --location --request DELETE 'http://localhost:3000/api/core/dataset/delete?id=65abc8729d1448617cba5df6' \
|
||
--header 'Authorization: Bearer {{authorization}}'
|
||
```
|
||
|
||
</Tab>
|
||
<Tab value="Parameters" >
|
||
|
||
<div>
|
||
- id: Dataset ID
|
||
</div>
|
||
|
||
</Tab>
|
||
|
||
<Tab value="Response Example" >
|
||
|
||
```json
|
||
{
|
||
"code": 200,
|
||
"statusText": "",
|
||
"message": "",
|
||
"data": null
|
||
}
|
||
```
|
||
|
||
</Tab>
|
||
</Tabs>
|
||
|
||
## Collection
|
||
|
||
### Common Creation Parameters (Must Read)
|
||
|
||
**Request**
|
||
|
||
| Parameter        | Description                                                                                                                                      | Required |
| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ | -------- |
| datasetId        | Dataset ID                                                                                                                                       | ✅       |
| parentId         | Parent ID. Defaults to root directory if not provided                                                                                            |          |
| trainingType     | Data processing method. chunk: split by text length; qa: Q&A extraction                                                                          | ✅       |
| indexPrefixTitle | Whether to auto-generate title index                                                                                                             |          |
| customPdfParse   | Whether to enable enhanced PDF parsing. Default false: disabled; true: enabled                                                                   |          |
| autoIndexes      | Whether to auto-generate indexes (commercial version only)                                                                                       |          |
| imageIndex       | Whether to auto-generate image indexes (commercial version only)                                                                                 |          |
| chunkSettingMode | Chunk parameter mode. auto: system default; custom: manual specification                                                                         |          |
| chunkSplitMode   | Chunk split mode. size: split by length; char: split by character. Ineffective when chunkSettingMode=auto.                                       |          |
| chunkSize        | Chunk size, default 1500. Ineffective when chunkSettingMode=auto.                                                                                |          |
| indexSize        | Index size, default 512, must be less than index model max token. Ineffective when chunkSettingMode=auto.                                        |          |
| chunkSplitter    | Custom highest priority split symbol. Won't split further unless exceeding file processing max context. Ineffective when chunkSettingMode=auto. |          |
| qaPrompt         | QA split prompt                                                                                                                                  |          |
| tags             | Collection tags (string array)                                                                                                                   |          |
| createTime       | File creation time (Date / String)                                                                                                               |          |
|
||
|
||
**Response**
|
||
|
||
- collectionId - New collection ID
|
||
- insertLen - Number of inserted chunks
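
Because several chunk parameters have no effect when `chunkSettingMode` is `auto`, a client may prefer not to send them at all in that mode. The following is a minimal sketch under that assumption. The helper name `build_collection_params` is hypothetical, and dropping the ineffective keys is a client-side convenience, not something the API requires.

```python
# Parameters the table marks as ineffective when chunkSettingMode=auto.
CHUNK_ONLY_KEYS = ("chunkSplitMode", "chunkSize", "indexSize", "chunkSplitter")


def build_collection_params(dataset_id, training_type,
                            chunk_setting_mode="auto", parent_id=None, **extra):
    """Build the common creation parameters shared by the create endpoints."""
    if training_type not in ("chunk", "qa"):
        raise ValueError("trainingType must be 'chunk' or 'qa'")
    if chunk_setting_mode not in ("auto", "custom"):
        raise ValueError("chunkSettingMode must be 'auto' or 'custom'")
    params = {
        "datasetId": dataset_id,
        "parentId": parent_id,
        "trainingType": training_type,
        "chunkSettingMode": chunk_setting_mode,
    }
    for key, value in extra.items():
        if chunk_setting_mode == "auto" and key in CHUNK_ONLY_KEYS:
            continue  # ignored by the server in auto mode, so skip it
        params[key] = value
    return params
```

With the default `auto` mode, `chunkSize=800` is silently dropped while `tags=["a"]` is kept. With `chunk_setting_mode="custom"`, both are sent.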
|
||
|
||
### Create an Empty Collection
|
||
|
||
<Tabs items={['Request Example','Parameters','Response Example']}>
|
||
<Tab value="Request Example" >
|
||
|
||
```bash
|
||
curl --location --request POST 'http://localhost:3000/api/core/dataset/collection/create' \
|
||
--header 'Authorization: Bearer {{authorization}}' \
|
||
--header 'Content-Type: application/json' \
|
||
--data-raw '{
|
||
"datasetId":"6593e137231a2be9c5603ba7",
|
||
"parentId": null,
|
"name":"Test",
|
||
"type":"virtual",
|
||
"metadata":{
|
||
"test":111
|
||
}
|
||
}'
|
||
```
|
||
|
||
</Tab>
|
||
<Tab value="Parameters" >
|
||
|
||
<div>
|
||
- datasetId: Dataset ID (required)
|
||
- parentId: Parent ID. Defaults to root directory if not provided
|
||
- name: Collection name (required)
|
||
- type: Collection type
  - folder: Folder
  - virtual: Virtual collection (manual collection)
|
||
- metadata: Metadata (not currently used)
|
||
</div>
|
||
|
||
</Tab>
|
||
|
||
<Tab value="Response Example" >
|
||
|
||
data is the collection ID.
|
||
|
||
```json
|
||
{
|
||
"code": 200,
|
||
"statusText": "",
|
||
"message": "",
|
||
"data": "65abcd009d1448617cba5ee1"
|
||
}
|
||
```
|
||
|
||
</Tab>
|
||
</Tabs>
|
||
|
||
### Create a Text Collection
|
||
|
||
Pass in text to create a collection. The text will be split accordingly.
|
||
|
||
<Tabs items={['Request Example','Parameters','Response Example']}>
|
||
<Tab value="Request Example" >
|
||
|
||
```bash
|
||
curl --location --request POST 'http://localhost:3000/api/core/dataset/collection/create/text' \
|
||
--header 'Authorization: Bearer {{authorization}}' \
|
||
--header 'Content-Type: application/json' \
|
||
--data-raw '{
|
||
"text":"xxxxxxxx",
|
||
"datasetId":"6593e137231a2be9c5603ba7",
|
||
"parentId": null,
|
"name":"Test training",
|
||
|
||
"trainingType": "qa",
|
||
"chunkSettingMode": "auto",
|
||
"qaPrompt":"",
|
||
|
||
"metadata":{}
|
||
}'
|
||
```
|
||
|
||
</Tab>
|
||
<Tab value="Parameters" >
|
||
|
||
<div>
|
||
- text: Original text
|
||
- datasetId: Dataset ID (required)
|
||
- parentId: Parent ID. Defaults to root directory if not provided
|
||
- name: Collection name (required)
|
||
- metadata: Metadata (not currently used)
|
||
</div>
|
||
|
||
</Tab>
|
||
|
||
<Tab value="Response Example" >
|
||
|
||
data contains the new collection ID (`collectionId`) and the insert results (`results`).
|
||
|
||
```json
|
||
{
|
||
"code": 200,
|
||
"statusText": "",
|
||
"message": "",
|
||
"data": {
|
||
"collectionId": "65abcfab9d1448617cba5f0d",
|
||
"results": {
|
"insertLen": 5, // Number of chunks the text was split into
|
||
"overToken": [],
|
||
"repeat": [],
|
||
"error": []
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
</Tab>
|
||
</Tabs>
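
The `results` object in the response reports how many chunks were inserted and which items were rejected. As a rough illustration, the Python sketch below condenses such a response into counts. The function name `summarize_insert_result` is hypothetical, and the input must be strict JSON (without the explanatory `//` comment shown in the example above).

```python
import json


def summarize_insert_result(response_body):
    """Condense a collection-creation response into simple counts."""
    data = json.loads(response_body)["data"]
    results = data["results"]
    return {
        "collectionId": data["collectionId"],
        "inserted": results["insertLen"],
        "overToken": len(results["overToken"]),
        "repeat": len(results["repeat"]),
        "error": len(results["error"]),
    }
```

A caller might log the summary and retry or inspect the source text whenever `overToken` or `error` is non-zero.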
|
||
|
||
### Create a Link Collection
|
||
|
||
Pass in a web link to create a collection. Content will be fetched from the webpage first, then split.
|
||
|
||
<Tabs items={['Request Example','Parameters','Response Example']}>
|
||
<Tab value="Request Example" >
|
||
|
||
```bash
|
||
curl --location --request POST 'http://localhost:3000/api/core/dataset/collection/create/link' \
|
||
--header 'Authorization: Bearer {{authorization}}' \
|
||
--header 'Content-Type: application/json' \
|
||
--data-raw '{
|
||
"link":"https://doc.fastgpt.io/docs/course/quick-start/",
|
||
"datasetId":"6593e137231a2be9c5603ba7",
|
||
"parentId": null,
|
||
|
||
"trainingType": "chunk",
|
||
"chunkSettingMode": "auto",
|
||
"qaPrompt":"",
|
||
|
||
"metadata":{
|
||
"webPageSelector":".docs-content"
|
||
}
|
||
}'
|
||
```
|
||
|
||
</Tab>
|
||
<Tab value="Parameters" >
|
||
|
||
<div>
|
||
- link: Web link
|
||
- datasetId: Dataset ID (required)
|
||
- parentId: Parent ID. Defaults to root directory if not provided
|
||
- metadata.webPageSelector: Web page selector to specify which element to use as text (optional)
|
||
</div>
|
||
|
||
</Tab>
|
||
|
||
<Tab value="Response Example" >
|
||
|
||
data contains the new collection ID (`collectionId`) and the insert results (`results`).
|
||
|
||
```json
|
||
{
|
||
"code": 200,
|
||
"statusText": "",
|
||
"message": "",
|
||
"data": {
|
||
"collectionId": "65abd0ad9d1448617cba6031",
|
||
"results": {
|
||
"insertLen": 1,
|
||
"overToken": [],
|
||
"repeat": [],
|
||
"error": []
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
</Tab>
|
||
</Tabs>
|
||
|
||
### Create a File Collection
|
||
|
||
Pass in a file to create a collection. File content will be read and split. Currently supports: pdf, docx, md, txt, html, csv.
|
||
|
||
<Tabs items={['Request Example','Parameters','Response Example']}>
|
||
<Tab value="Request Example" >
|
||
|
||
When uploading via code, note that Chinese filenames need to be encoded to avoid garbled text.
|
||
|
||
```bash
|
||
curl --location --request POST 'http://localhost:3000/api/core/dataset/collection/create/localFile' \
|
||
--header 'Authorization: Bearer {{authorization}}' \
|
||
--form 'file=@"C:\\Users\\user\\Desktop\\fastgpt-test-files\\index.html"' \
|
||
--form 'data="{\"datasetId\":\"6593e137231a2be9c5603ba7\",\"parentId\":null,\"trainingType\":\"chunk\",\"chunkSize\":512,\"chunkSplitter\":\"\",\"qaPrompt\":\"\",\"metadata\":{}}"'
|
||
```
|
||
|
||
</Tab>
|
||
<Tab value="Parameters" >
|
||
|
||
<div>
|
||
Use POST form-data format for upload. Contains file and data fields.
|
||
|
||
- file: File
|
||
- data: Dataset-related info (pass as serialized JSON). See "Common Creation Parameters" above
|
||
</div>
|
||
|
||
</Tab>
|
||
|
||
<Tab value="Response Example" >
|
||
|
||
data contains the new collection ID (`collectionId`) and the insert results (`results`).
|
||
|
||
```json
|
||
{
|
||
"code": 200,
|
||
"statusText": "",
|
||
"message": "",
|
||
"data": {
|
||
"collectionId": "65abc044e4704bac793fbd81",
|
||
"results": {
|
||
"insertLen": 1,
|
||
"overToken": [],
|
||
"repeat": [],
|
||
"error": []
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
</Tab>
|
||
</Tabs>
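
For uploads made from code, the form-data request and the filename encoding mentioned above can be sketched in Python with the standard library alone. This is an illustration under assumptions: the helper name `build_local_file_request` is hypothetical, and percent-encoding via `urllib.parse.quote` is assumed to be the encoding the server expects for non-ASCII filenames. The request is built but not sent.

```python
import json
import urllib.parse
import urllib.request
import uuid

BASE_URL = "http://localhost:3000"  # same host as the curl examples


def build_local_file_request(api_key, filename, file_bytes, data):
    """Build (but do not send) a multipart request for create/localFile."""
    boundary = uuid.uuid4().hex
    # Assumption: percent-encoding is the filename encoding the server expects.
    safe_name = urllib.parse.quote(filename)
    file_part = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{safe_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    data_part = (
        f"\r\n--{boundary}\r\n"
        'Content-Disposition: form-data; name="data"\r\n\r\n'
        f"{json.dumps(data)}\r\n"  # the data field is serialized JSON
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/core/dataset/collection/create/localFile",
        data=file_part + file_bytes + data_part,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

The `data` argument takes the same keys as the common creation parameters, for example `{"datasetId": "...", "trainingType": "chunk"}`.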
|
||
|
||
### Create an API Collection
|
||
|
||
Pass in a file ID to create a collection. File content will be read and split. Currently supports: pdf, docx, md, txt, html, csv.
|
||
|
||
<Tabs items={['Request Example','Parameters','Response Example']}>
|
||
<Tab value="Request Example" >
|
||
|
||
When uploading via code, note that Chinese filenames need to be encoded to avoid garbled text.
|
||
|
||
```bash
|
||
curl --location --request POST 'http://localhost:3000/api/core/dataset/collection/create/apiCollection' \
|
||
--header 'Authorization: Bearer fastgpt-xxx' \
|
||
--header 'Content-Type: application/json' \
|
||
--data-raw '{
|
||
"name": "A Quick Guide to Building a Discord Bot.pdf",
|
||
"apiFileId":"A Quick Guide to Building a Discord Bot.pdf",
|
||
|
||
"datasetId": "674e9e479c3503c385495027",
|
||
"parentId": null,
|
||
|
||
"trainingType": "chunk",
|
||
"chunkSize":512,
|
||
"chunkSplitter":"",
|
||
"qaPrompt":""
|
||
}'
|
||
```
|
||
|
||
</Tab>
|
||
<Tab value="Parameters" >
|
||
|
||
<div>
|
||
Send the parameters as a JSON request body.
|
||
|
||
- name: Collection name, recommended to use filename, required.
|
||
- apiFileId: File ID, required.
|
||
- datasetId: Dataset ID (required)
|
||
- parentId: Parent ID. Defaults to root directory if not provided
|
||
- trainingType: Training mode (required)
|
||
- chunkSize: Length of each chunk (optional). In chunk mode: 100 to 3000. In qa mode: 4000 up to the model's max token count (for 16k models, staying at or below 10000 is usually recommended)
|
||
- chunkSplitter: Custom highest priority split symbol (optional)
|
||
- qaPrompt: QA split custom prompt (optional)
|
||
</div>
|
||
|
||
</Tab>
|
||
|
||
<Tab value="Response Example" >
|
||
|
||
data contains the new collection ID (`collectionId`) and the insert results (`results`).
|
||
|
||
```json
|
||
{
|
||
"code": 200,
|
||
"statusText": "",
|
||
"message": "",
|
||
"data": {
|
||
"collectionId": "65abc044e4704bac793fbd81",
|
||
"results": {
|
||
"insertLen": 1,
|
||
"overToken": [],
|
||
"repeat": [],
|
||
"error": []
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
</Tab>
|
||
</Tabs>
|
||
|
||
### Create an External File Collection (Commercial)
|
||
|
||
<Tabs items={['Request Example','Parameters','Response Example']}>
|
||
<Tab value="Request Example" >
|
||
|
||
```bash
|
||
curl --location --request POST 'http://localhost:3000/api/proApi/core/dataset/collection/create/externalFileUrl' \
|
||
--header 'Authorization: Bearer {{authorization}}' \
|
||
--header 'User-Agent: Apifox/1.0.0 (https://apifox.com)' \
|
||
--header 'Content-Type: application/json' \
|
||
--data-raw '{
|
||
"externalFileUrl":"https://image.xxxxx.com/fastgpt-dev/%E6%91%82.pdf",
|
||
"externalFileId":"1111",
|
||
"createTime": "2024-05-01T00:00:00.000Z",
|
"filename":"custom-filename.pdf",
|
||
"datasetId":"6642d105a5e9d2b00255b27b",
|
||
"parentId": null,
|
||
"tags": ["tag1","tag2"],
|
||
|
||
"trainingType": "chunk",
|
||
"chunkSize":512,
|
||
"chunkSplitter":"",
|
||
"qaPrompt":""
|
||
}'
|
||
```
|
||
|
||
</Tab>
|
||
|
||
<Tab value="Parameters" >
|
||
|
||
| Parameter       | Description                             | Required |
| --------------- | --------------------------------------- | -------- |
| externalFileUrl | File access URL (can be temporary)      | ✅       |
| externalFileId  | External file ID                        |          |
| filename        | Custom filename with extension          |          |
| createTime      | File creation time (Date or ISO string) |          |
|
||
|
||
</Tab>
|
||
|
||
<Tab value="Response Example" >
|
||
|
||
data contains the new collection ID (`collectionId`) and the insert results (`results`).
|
||
|
||
```json
|
||
{
|
||
"code": 200,
|
||
"statusText": "",
|
||
"message": "",
|
||
"data": {
|
||
"collectionId": "6646fcedfabd823cdc6de746",
|
||
"results": {
|
||
"insertLen": 1,
|
||
"overToken": [],
|
||
"repeat": [],
|
||
"error": []
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
</Tab>
|
||
</Tabs>
|
||
|
||
### Get Collection List
|
||
|
||
<Tabs items={['Request Example','Parameters','Response Example']}>
|
||
<Tab value="Request Example" >
|
||
|
||
```bash
|
||
curl --location --request POST 'http://localhost:3000/api/core/dataset/collection/listV2' \
|
||
--header 'Authorization: Bearer {{authorization}}' \
|
||
--header 'Content-Type: application/json' \
|
||
--data-raw '{
|
||
"offset":0,
|
||
"pageSize": 10,
|
||
"datasetId":"6593e137231a2be9c5603ba7",
|
||
"parentId": null,
|
||
"searchText":""
|
||
}'
|
||
```
|
||
|
||
</Tab>
|
||
|
||
<Tab value="Parameters" >
|
||
|
||
<div>
|
||
- offset: Offset
|
||
- pageSize: Items per page, max 30 (optional)
|
||
- datasetId: Dataset ID (required)
|
||
- parentId: Parent ID (optional)
|
||
- searchText: Fuzzy search text (optional)
|
||
</div>
|
||
|
||
</Tab>
|
||
|
||
<Tab value="Response Example" >
|
||
|
||
```json
|
||
{
|
||
"code": 200,
|
||
"statusText": "",
|
||
"message": "",
|
||
"data": {
|
||
"list": [
|
||
{
|
||
"_id": "6593e137231a2be9c5603ba9",
|
||
"parentId": null,
|
||
"tmbId": "65422be6aa44b7da77729ec9",
|
||
"type": "virtual",
|
||
"name": "Manual entry",
|
||
"updateTime": "2099-01-01T00:00:00.000Z",
|
||
"dataAmount": 3,
|
||
"trainingAmount": 0,
|
||
"externalFileId": "1111",
|
"tags": ["11", "test"],
|
||
"forbid": false,
|
||
"trainingType": "chunk",
|
||
"permission": {
|
||
"value": 4294967295,
|
||
"isOwner": true,
|
||
"hasManagePer": true,
|
||
"hasWritePer": true,
|
||
"hasReadPer": true
|
||
}
|
||
},
|
||
{
|
||
"_id": "65abd0ad9d1448617cba6031",
|
||
"parentId": null,
|
||
"tmbId": "65422be6aa44b7da77729ec9",
|
||
"type": "link",
|
||
"name": "快速上手 | FastGPT",
|
||
"rawLink": "https://doc.fastgpt.io/docs/course/quick-start/",
|
||
"updateTime": "2024-01-20T13:54:53.031Z",
|
||
"dataAmount": 3,
|
||
"trainingAmount": 0,
|
||
"externalFileId": "222",
|
"tags": ["test"],
|
||
"forbid": false,
|
||
"trainingType": "chunk",
|
||
"permission": {
|
||
"value": 4294967295,
|
||
"isOwner": true,
|
||
"hasManagePer": true,
|
||
"hasWritePer": true,
|
||
"hasReadPer": true
|
||
}
|
||
}
|
||
],
|
||
"total": 93
|
||
}
|
||
}
|
||
```
|
||
|
||
</Tab>
|
||
</Tabs>
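
Since `pageSize` is capped at 30 and the response carries a `total`, listing every collection takes several requests with increasing `offset`. The sketch below generates the request bodies for such a loop. The function name `iter_page_requests` is hypothetical, and it assumes `total` is already known from a first response.

```python
MAX_PAGE_SIZE = 30  # documented upper bound for pageSize


def iter_page_requests(dataset_id, total, page_size=MAX_PAGE_SIZE,
                       parent_id=None, search_text=""):
    """Yield one listV2 request body per page needed to cover `total` items."""
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError(f"pageSize must be between 1 and {MAX_PAGE_SIZE}")
    for offset in range(0, total, page_size):
        yield {
            "offset": offset,
            "pageSize": page_size,
            "datasetId": dataset_id,
            "parentId": parent_id,
            "searchText": search_text,
        }
```

With the `total` of 93 from the response example and the maximum page size, this produces four request bodies with offsets 0, 30, 60, and 90.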
|
||
|
||
### Get Collection Details
|
||
|
||
<Tabs items={['Request Example','Parameters','Response Example']}>
|
||
<Tab value="Request Example" >
|
||
|
||
```bash
|
||
curl --location --request GET 'http://localhost:3000/api/core/dataset/collection/detail?id=65abcfab9d1448617cba5f0d' \
|
||
--header 'Authorization: Bearer {{authorization}}'
|
||
```
|
||
|
||
</Tab>
|
||
|
||
<Tab value="Parameters" >
|
||
|
||
<div>
|
||
- id: Collection ID
|
||
</div>
|
||
|
||
</Tab>
|
||
|
||
<Tab value="Response Example" >
|
||
|
||
```json
|
||
{
|
||
"code": 200,
|
||
"statusText": "",
|
||
"message": "",
|
||
"data": {
|
||
"_id": "65abcfab9d1448617cba5f0d",
|
||
"parentId": null,
|
||
"teamId": "65422be6aa44b7da77729ec8",
|
||
"tmbId": "65422be6aa44b7da77729ec9",
|
||
"datasetId": {
|
||
"_id": "6593e137231a2be9c5603ba7",
|
||
"parentId": null,
|
||
"teamId": "65422be6aa44b7da77729ec8",
|
||
"tmbId": "65422be6aa44b7da77729ec9",
|
||
"type": "dataset",
|
||
"status": "active",
|
||
"avatar": "/icon/logo.svg",
|
||
"name": "FastGPT test",
|
||
"vectorModel": "text-embedding-ada-002",
|
||
"agentModel": "gpt-3.5-turbo-16k",
|
||
"intro": "",
|
||
"permission": "private",
|
||
"updateTime": "2024-01-02T10:11:03.084Z"
|
||
},
|
||
"type": "virtual",
|
"name": "Test training",
|
||
"trainingType": "qa",
|
||
"chunkSize": 8000,
|
||
"chunkSplitter": "",
|
||
"qaPrompt": "11",
|
||
"rawTextLength": 40466,
|
||
"hashRawText": "47270840614c0cc122b29daaddc09c2a48f0ec6e77093611ab12b69cba7fee12",
|
||
"createTime": "2024-01-20T13:50:35.838Z",
|
||
"updateTime": "2024-01-20T13:50:35.838Z",
|
||
"canWrite": true,
|
"sourceName": "Test training"
|
||
}
|
||
}
|
||
```
|
||
|
||
</Tab>
|
||
</Tabs>
|
||
|
||
### Update Collection Info
|
||
|
||
<Tabs items={['Request Example','Parameters','Response Example']}>
|
||
<Tab value="Request Example" >
|
||
|
||
**Update Collection Info by Collection ID**
|
||
|
||
```bash
|
||
curl --location --request PUT 'http://localhost:3000/api/core/dataset/collection/update' \
|
||
--header 'Authorization: Bearer {{authorization}}' \
|
||
--header 'Content-Type: application/json' \
|
||
--data-raw '{
|
||
"id":"65abcfab9d1448617cba5f0d",
|
||
"parentId": null,
|
"name": "Test2222",
|
||
"tags": ["tag1", "tag2"],
|
||
"forbid": false,
|
||
"createTime": "2024-01-01T00:00:00.000Z"
|
||
}'
|
||
```
|
||
|
||
**Update Collection Info by External File ID**: replace `id` with `datasetId` and `externalFileId`.
|
||
|
||
```bash
|
||
curl --location --request PUT 'http://localhost:3000/api/core/dataset/collection/update' \
|
||
--header 'Authorization: Bearer {{authorization}}' \
|
||
--header 'Content-Type: application/json' \
|
||
--data-raw '{
|
||
"datasetId":"6593e137231a2be9c5603ba7",
|
||
"externalFileId":"1111",
|
||
"parentId": null,
|
"name": "Test2222",
|
||
"tags": ["tag1", "tag2"],
|
||
"forbid": false,
|
||
"createTime": "2024-01-01T00:00:00.000Z"
|
||
}'
|
||
```
|
||
|
||
</Tab>
|
||
|
||
<Tab value="Parameters" >
|
||
|
||
<div>
|
||
- id: Collection ID
|
||
- parentId: Update parent ID (optional)
|
||
- name: Update collection name (optional)
|
||
- tags: Update collection tags (optional)
|
||
- forbid: Update collection disabled status (optional)
|
||
- createTime: Update collection creation time (optional)
|
||
</div>
|
||
|
||
</Tab>
|
||
|
||
<Tab value="Response Example" >
|
||
|
||
```json
|
||
{
|
||
"code": 200,
|
||
"statusText": "",
|
||
"message": "",
|
||
"data": null
|
||
}
|
||
```
|
||
|
||
</Tab>
|
||
</Tabs>
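
The two ways of addressing a collection (by `id`, or by `datasetId` plus `externalFileId`) can be captured in one small helper. This is a sketch, not part of FastGPT: the function name `build_update_body` is hypothetical, and the set of updatable fields is taken from the parameter list above.

```python
UPDATABLE_FIELDS = {"parentId", "name", "tags", "forbid", "createTime"}


def build_update_body(*, collection_id=None, dataset_id=None,
                      external_file_id=None, **changes):
    """Build the body for collection/update, addressed either way."""
    unknown = set(changes) - UPDATABLE_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    if collection_id is not None:
        body = {"id": collection_id}
    elif dataset_id is not None and external_file_id is not None:
        body = {"datasetId": dataset_id, "externalFileId": external_file_id}
    else:
        raise ValueError("pass collection_id, or dataset_id with external_file_id")
    body.update(changes)
    return body
```

For example, `build_update_body(dataset_id="6593e137231a2be9c5603ba7", external_file_id="1111", forbid=False)` produces the second request body shown in the examples, minus the fields left unchanged.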
|
||
|
||
### Delete a Collection
|
||
|
||
<Tabs items={['Request Example','Parameters','Response Example']}>
|
||
<Tab value="Request Example" >
|
||
|
||
```bash
|
||
curl --location --request POST 'http://localhost:3000/api/core/dataset/collection/delete' \
|
||
--header 'Authorization: Bearer fastgpt-' \
|
||
--header 'Content-Type: application/json' \
|
||
--data-raw '{
|
||
"collectionIds": ["65a8cdcb0d70d3de0bf08d0a"]
|
||
}'
|
||
```
|
||
|
||
</Tab>
|
||
|
||
<Tab value="Parameters" >
|
||
|
||
<div>
|
||
- collectionIds: Collection ID list
|
||
</div>
|
||
|
||
</Tab>
|
||
|
||
<Tab value="Response Example" >
|
||
|
||
```json
|
||
{
|
||
"code": 200,
|
||
"statusText": "",
|
||
"message": "",
|
||
"data": null
|
||
}
|
||
```
|
||
|
||
</Tab>
|
||
</Tabs>
|
||
|
||
## Data
|
||
|
||
### Data Structure
|
||
|
||
**Data Structure**
|
||
|
||
| Field         | Type    | Description    | Required |
| ------------- | ------- | -------------- | -------- |
| teamId        | String  | Team ID        | ✅       |
| tmbId         | String  | Member ID      | ✅       |
| datasetId     | String  | Dataset ID     | ✅       |
| collectionId  | String  | Collection ID  | ✅       |
| q             | String  | Primary data   | ✅       |
| a             | String  | Auxiliary data | ✖        |
| fullTextToken | String  | Tokenization   | ✖        |
| indexes       | Index[] | Vector indexes | ✅       |
| updateTime    | Date    | Update time    | ✅       |
| chunkIndex    | Number  | Chunk index    | ✖        |
|
||
|
||
**Index Structure**
|
||
|
||
Maximum 5 custom indexes per data group
|
||
|
||
| Field  | Type   | Description                                                                                                                                        | Required |
| ------ | ------ | -------------------------------------------------------------------------------------------------------------------------------------------------- | -------- |
| type   | String | Index type. One of: `default` (default index), `custom` (custom index), `summary` (summary index), `question` (question index), `image` (image index) |          |
| dataId | String | Associated vector ID. Pass this ID when updating data for incremental updates instead of full updates                                              |          |
| text   | String | Text content                                                                                                                                       | ✅       |
|
||
|
||
If `type` is not provided, the index defaults to `custom`. A default index is also created from q/a, unless a `default` index is supplied, in which case no additional one is created.
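
The index rules above (required `text`, `type` defaulting to `custom`, at most 5 custom indexes) can be checked on the client before sending data. The sketch below mirrors those rules in Python. The function name `normalize_indexes` is hypothetical, and the server remains the authority on what is accepted.

```python
MAX_CUSTOM_INDEXES = 5
INDEX_TYPES = {"default", "custom", "summary", "question", "image"}


def normalize_indexes(indexes):
    """Apply the documented defaults and limits to a list of index dicts."""
    normalized = []
    for index in indexes:
        if not index.get("text"):
            raise ValueError("each index needs a non-empty text")
        entry = dict(index)  # copy so the caller's dict is not modified
        entry.setdefault("type", "custom")  # documented default
        if entry["type"] not in INDEX_TYPES:
            raise ValueError(f"unknown index type: {entry['type']!r}")
        normalized.append(entry)
    custom_count = sum(1 for entry in normalized if entry["type"] == "custom")
    if custom_count > MAX_CUSTOM_INDEXES:
        raise ValueError(f"at most {MAX_CUSTOM_INDEXES} custom indexes are allowed")
    return normalized
```

An index passed as `{"text": "Custom index 1"}` comes back with `"type": "custom"` filled in, while six such entries raise a `ValueError`.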
|
||
|
||
### Batch Add Data to Collection
|
||
|
||
Note: Maximum 200 data groups per push.
|
||
|
||
<Tabs items={['Request Example','Parameters','Response Example','QA Prompt Template']}>
|
||
<Tab value="Request Example" >
|
||
|
||
```bash
|
||
curl --location --request POST 'http://localhost:3000/api/core/dataset/data/pushData' \
|
||
--header 'Authorization: Bearer apikey' \
|
||
--header 'Content-Type: application/json' \
|
||
--data-raw '{
|
||
"collectionId": "64663f451ba1676dbdef0499",
|
||
"trainingType": "chunk",
|
||
"prompt": "Optional. QA split guide prompt, ignored in chunk mode",
|
"billId": "Optional. If provided, the data in this request is aggregated into a single order. The value can be reused. See [Create Training Order] for how to obtain it.",
|
||
"data": [
|
||
{
|
||
"q": "Who are you?",
|
||
"a": "I'm FastGPT Assistant"
|
||
},
|
||
{
|
||
"q": "What can you do?",
|
||
"a": "I can do anything",
|
||
"indexes": [
|
||
{
|
||
"text":"Custom index 1"
|
||
},
|
||
{
|
||
"text":"Custom index 2"
|
||
}
|
||
]
|
||
}
|
||
]
|
||
}'
|
||
```
|
||
|
||
</Tab>
|
||
|
||
<Tab value="Parameters" >
|
||
|
||
<div>
|
||
- collectionId: Collection ID (required)
|
||
- trainingType: Training mode (required)
|
||
- prompt: Custom QA split prompt. Must follow template strictly. Recommended not to pass. (optional)
|
||
- data: Data items to add
  - q: Primary data (required)
  - a: Auxiliary data (optional)
  - indexes: Custom indexes (optional). Can be omitted or passed as an empty array. By default, an index is created from q and a.
|
||
</div>
|
||
|
||
</Tab>
|
||
|
||
<Tab value="Response Example" >
|
||
|
||
```json
|
||
{
|
||
"code": 200,
|
||
"statusText": "",
|
||
"data": {
|
||
"insertLen": 1, // Final number of successful insertions
|
"overToken": [], // Items that exceeded the token limit
|
"repeat": [], // Duplicate items
|
||
"error": [] // Other errors
|
||
}
|
||
}
|
||
```
|
||
|
||
</Tab>
|
||
|
||
<Tab value="QA Prompt Template" >
|
||
|
||
Replace `[theme]` with the topic of your data. Default: "They may contain multiple theme contents".
|
||
|
||
```
|
||
I'll give you a text, [theme], learn it, and organize the learning results, requirements:
|
||
1. Propose up to 25 questions.
|
||
2. Provide answers to each question.
|
||
3. Answers should be detailed and complete, and can include plain text, links, code, tables, formulas, media links, and other markdown elements.
|
||
4. Return multiple questions and answers in format:
|
||
|
||
Q1: Question.
|
||
A1: Answer.
|
||
Q2:
|
||
A2:
|
||
……
|
||
|
||
My text:"""{{text}}"""
|
||
```
|
||
|
||
</Tab>
|
||
|
||
</Tabs>
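
Given the limit of 200 data groups per push, larger imports need to be split into several requests. The following sketch shows one way to do that in Python, reusing a single `billId` across batches so usage is aggregated. The function name `build_push_bodies` is hypothetical.

```python
MAX_ITEMS_PER_PUSH = 200  # documented limit per pushData request


def build_push_bodies(collection_id, items, training_type="chunk",
                      bill_id=None, batch_size=MAX_ITEMS_PER_PUSH):
    """Split `items` into pushData request bodies of at most `batch_size`."""
    if not 1 <= batch_size <= MAX_ITEMS_PER_PUSH:
        raise ValueError(f"batch_size must be between 1 and {MAX_ITEMS_PER_PUSH}")
    bodies = []
    for start in range(0, len(items), batch_size):
        body = {
            "collectionId": collection_id,
            "trainingType": training_type,
            "data": items[start:start + batch_size],
        }
        if bill_id is not None:
            body["billId"] = bill_id  # same order for every batch
        bodies.append(body)
    return bodies
```

For 450 items this yields three bodies holding 200, 200, and 50 items, each of which can be posted to `/api/core/dataset/data/pushData` in turn.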
|
||
|
||
### Get Collection Data List
|
||
|
||
<Tabs items={['Request Example','Parameters','Response Example']}>
|
||
<Tab value="Request Example" >
|
||
|
||
```bash
|
||
curl --location --request POST 'http://localhost:3000/api/core/dataset/data/v2/list' \
|
||
--header 'Authorization: Bearer {{authorization}}' \
|
||
--header 'Content-Type: application/json' \
|
||
--data-raw '{
|
||
"offset": 0,
|
||
"pageSize": 10,
|
||
"collectionId":"65abd4ac9d1448617cba6171",
|
||
"searchText":""
|
||
}'
|
||
```
|
||
|
||
</Tab>
|
||
|
||
<Tab value="Parameters" >
|
||
|
||
<div>
|
||
- offset: Offset (optional)
|
||
- pageSize: Items per page, max 30 (optional)
|
||
- collectionId: Collection ID (required)
|
||
- searchText: Fuzzy search term (optional)
|
||
</div>
|
||
|
||
</Tab>
|
||
|
||
<Tab value="Response Example" >
|
||
|
||
```json
|
||
{
|
||
"code": 200,
|
||
"statusText": "",
|
||
"message": "",
|
||
"data": {
|
||
"list": [
|
||
{
|
||
"_id": "65abd4b29d1448617cba61db",
|
||
"datasetId": "65abc9bd9d1448617cba5e6c",
|
||
"collectionId": "65abd4ac9d1448617cba6171",
|
"q": "N o . 2 0 2 2 1 2中 国 信 息 通 信 研 究 院京东探索研究院2022年 9月人工智能生成内容(AIGC)白皮书(2022 年)版权声明本白皮书版权属于中国信息通信研究院和京东探索研究院,并受法律保护。转载、摘编或利用其它方式使用本白皮书文字或者观点的,应注明“来源:中国信息通信研究院和京东探索研究院”。违反上述声明者,编者将追究其相关法律责任。前 言习近平总书记曾指出,“数字技术正以新理念、新业态、新模式全面融入人类经济、政治、文化、社会、生态文明建设各领域和全过程”。在当前数字世界和物理世界加速融合的大背景下,人工智能生成内容(Artificial Intelligence Generated Content,简称 AIGC)正在悄然引导着一场深刻的变革,重塑甚至颠覆数字内容的生产方式和消费模式,将极大地丰富人们的数字生活,是未来全面迈向数字文明新时代不可或缺的支撑力量。",
|
||
"a": "",
|
||
"chunkIndex": 0
|
||
},
|
||
{
|
||
"_id": "65abd4b39d1448617cba624d",
|
||
"datasetId": "65abc9bd9d1448617cba5e6c",
|
||
"collectionId": "65abd4ac9d1448617cba6171",
|
"q": "本白皮书重点从 AIGC 技术、应用和治理等维度进行了阐述。在技术层面,梳理提出了 AIGC 技术体系,既涵盖了对现实世界各种内容的数字化呈现和增强,也包括了基于人工智能的自主内容创作。在应用层面,重点分析了 AIGC 在传媒、电商、影视等行业和场景的应用情况,探讨了以虚拟数字人、写作机器人等为代表的新业态和新应用。在治理层面,从政策监管、技术能力、企业应用等视角,分析了AIGC 所暴露出的版权纠纷、虚假信息传播等各种问题。最后,从政府、行业、企业、社会等层面,给出了 AIGC 发展和治理建议。由于人工智能仍处于飞速发展阶段,我们对 AIGC 的认识还有待进一步深化,白皮书中存在不足之处,敬请大家批评指正。目 录一、 人工智能生成内容的发展历程与概念.............................................................. 1(一)AIGC 历史沿革 .......................................................................................... 1(二)AIGC 的概念与内涵 .................................................................................. 4二、人工智能生成内容的技术体系及其演进方向.................................................... 7(一)AIGC 技术升级步入深化阶段 .................................................................. 7(二)AIGC 大模型架构潜力凸显 .................................................................... 10(三)AIGC 技术演化出三大前沿能力 ............................................................ 18三、人工智能生成内容的应用场景.......................................................................... 26(一)AIGC+传媒:人机协同生产,",
|
||
"a": "",
|
||
"chunkIndex": 1
|
||
}
|
||
],
|
||
"total": 63
|
||
}
|
||
}
|
||
```
|
||
|
||
</Tab>
|
||
</Tabs>
|
||
|
||
### Get Single Data Details
|
||
|
||
<Tabs items={['Request Example','Parameters','Response Example']}>
|
||
<Tab value="Request Example" >
|
||
|
||
```bash
|
||
curl --location --request GET 'http://localhost:3000/api/core/dataset/data/detail?id=65abd4b29d1448617cba61db' \
|
||
--header 'Authorization: Bearer {{authorization}}'
|
||
```
|
||
|
||
</Tab>
|
||
|
||
<Tab value="Parameters" >
|
||
|
||
<div>
|
||
- id: Data ID
|
||
</div>
|
||
|
||
</Tab>
|
||
|
||
<Tab value="Response Example" >

```json
{
  "code": 200,
  "statusText": "",
  "message": "",
  "data": {
    "id": "65abd4b29d1448617cba61db",
    "q": "N o . 2 0 2 2 1 2中 国 信 息 通 信 研 究 院京东探索研究院2022年 9月人工智能生成内容(AIGC)白皮书(2022 年)版权声明本白皮书版权属于中国信息通信研究院和京东探索研究院,并受法律保护。转载、摘编或利用其它方式使用本白皮书文字或者观点的,应注明“来源:中国信息通信研究院和京东探索研究院”。违反上述声明者,编者将追究其相关法律责任。前 言习近平总书记曾指出,“数字技术正以新理念、新业态、新模式全面融入人类经济、政治、文化、社会、生态文明建设各领域和全过程”。在当前数字世界和物理世界加速融合的大背景下,人工智能生成内容(Artificial Intelligence Generated Content,简称 AIGC)正在悄然引导着一场深刻的变革,重塑甚至颠覆数字内容的生产方式和消费模式,将极大地丰富人们的数字生活,是未来全面迈向数字文明新时代不可或缺的支撑力量。",
    "a": "",
    "chunkIndex": 0,
    "indexes": [
      {
        "type": "default",
        "dataId": "3720083",
        "text": "N o . 2 0 2 2 1 2中 国 信 息 通 信 研 究 院京东探索研究院2022年 9月人工智能生成内容(AIGC)白皮书(2022 年)版权声明本白皮书版权属于中国信息通信研究院和京东探索研究院,并受法律保护。转载、摘编或利用其它方式使用本白皮书文字或者观点的,应注明“来源:中国信息通信研究院和京东探索研究院”。违反上述声明者,编者将追究其相关法律责任。前 言习近平总书记曾指出,“数字技术正以新理念、新业态、新模式全面融入人类经济、政治、文化、社会、生态文明建设各领域和全过程”。在当前数字世界和物理世界加速融合的大背景下,人工智能生成内容(Artificial Intelligence Generated Content,简称 AIGC)正在悄然引导着一场深刻的变革,重塑甚至颠覆数字内容的生产方式和消费模式,将极大地丰富人们的数字生活,是未来全面迈向数字文明新时代不可或缺的支撑力量。",
        "_id": "65abd4b29d1448617cba61dc"
      }
    ],
    "datasetId": "65abc9bd9d1448617cba5e6c",
    "collectionId": "65abd4ac9d1448617cba6171",
    "sourceName": "中文-AIGC白皮书2022.pdf",
    "sourceId": "65abd4ac9d1448617cba6166",
    "isOwner": true,
    "canWrite": true
  }
}
```

</Tab>
</Tabs>
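
The same call can be sketched in Python with only the standard library. The function names `build_detail_request`, `fetch_detail`, and `index_texts` are hypothetical, `BASE_URL` mirrors the host in the curl example, and treating any `code` other than 200 as a failure is an assumption based on the response example rather than documented behavior.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:3000"  # same host as the curl example


def build_detail_request(data_id: str, api_key: str) -> urllib.request.Request:
    """Build the GET request for a single data item without sending it."""
    query = urllib.parse.urlencode({"id": data_id})
    return urllib.request.Request(
        f"{BASE_URL}/api/core/dataset/data/detail?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_detail(data_id: str, api_key: str) -> dict:
    """Send the request and decode the JSON body (requires a running server)."""
    with urllib.request.urlopen(build_detail_request(data_id, api_key)) as resp:
        return json.load(resp)


def index_texts(response: dict) -> list[str]:
    """Extract the text of every index entry from a detail response."""
    # Assumption: a non-200 "code" field signals failure.
    if response.get("code") != 200:
        raise RuntimeError(f"request failed: {response.get('message')}")
    return [index["text"] for index in response["data"]["indexes"]]
```

Keeping request construction separate from sending makes the URL and headers easy to inspect before any network call is made.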

### Update Single Data

<Tabs items={['Request Example','Parameters','Response Example']}>
<Tab value="Request Example" >

```bash
curl --location --request PUT 'http://localhost:3000/api/core/dataset/data/update' \
--header 'Authorization: Bearer {{authorization}}' \
--header 'Content-Type: application/json' \
--data-raw '{
    "dataId":"65abd4b29d1448617cba61db",
    "q":"Test 111",
    "a":"sss",
    "indexes":[
        {
            "dataId": "xxxx",
            "type": "default",
            "text": "Default index"
        },
        {
            "dataId": "xxx",
            "type": "custom",
            "text": "Old custom index 1"
        },
        {
            "type":"custom",
            "text":"New custom index"
        }
    ]
}'
```

</Tab>

<Tab value="Parameters" >

<div>
- dataId: Data ID
- q: Primary data (optional)
- a: Auxiliary data (optional)
- indexes: Custom indexes (optional). See `Batch Add Data to Collection` for types. If custom indexes exist when created,
</div>

</Tab>

<Tab value="Response Example" >

```json
{
  "code": 200,
  "statusText": "",
  "message": "",
  "data": null
}
```

</Tab>
</Tabs>
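
Because `q`, `a`, and `indexes` are all optional, a client may want to send only the fields it intends to change. The Python sketch below builds such a request body; `build_update_request` is a hypothetical helper, and omitting unset fields (rather than sending `null`) is an assumption about what the endpoint accepts.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:3000"  # same host as the curl example


def build_update_request(
    data_id: str,
    api_key: str,
    q: Optional[str] = None,
    a: Optional[str] = None,
    indexes: Optional[list] = None,
) -> urllib.request.Request:
    """Build the PUT request, including only the optional fields that were supplied."""
    payload: dict = {"dataId": data_id}
    if q is not None:
        payload["q"] = q
    if a is not None:
        payload["a"] = a
    if indexes is not None:
        payload["indexes"] = indexes
    return urllib.request.Request(
        f"{BASE_URL}/api/core/dataset/data/update",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )


# Example: change the primary text and add one new custom index.
request = build_update_request(
    "65abd4b29d1448617cba61db",
    "fastgpt-xxxxx",
    q="Test 111",
    indexes=[{"type": "custom", "text": "New custom index"}],
)
print(request.data.decode("utf-8"))
```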

### Delete Single Data

<Tabs items={['Request Example','Parameters','Response Example']}>
<Tab value="Request Example" >

```bash
curl --location --request DELETE 'http://localhost:3000/api/core/dataset/data/delete?id=65abd4b39d1448617cba624d' \
--header 'Authorization: Bearer {{authorization}}'
```

</Tab>

<Tab value="Parameters" >

<div>
- id: Data ID
</div>

</Tab>

<Tab value="Response Example" >

```json
{
  "code": 200,
  "statusText": "",
  "message": "",
  "data": "success"
}
```

</Tab>
</Tabs>
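
A caller will usually want to confirm that the deletion went through. The following Python sketch builds the DELETE request and checks a decoded response against the example above; both helper names are hypothetical, and treating `"data": "success"` together with `"code": 200` as the success condition is inferred from the response example only.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:3000"  # same host as the curl example


def build_delete_request(data_id: str, api_key: str) -> urllib.request.Request:
    """Build the DELETE request for one data item without sending it."""
    query = urllib.parse.urlencode({"id": data_id})
    return urllib.request.Request(
        f"{BASE_URL}/api/core/dataset/data/delete?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="DELETE",
    )


def is_deleted(response: dict) -> bool:
    """Return True when a decoded response matches the documented success shape."""
    return response.get("code") == 200 and response.get("data") == "success"
```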

## Search Test

<Tabs items={['Request Example','Parameters','Response Example']}>
<Tab value="Request Example" >

```bash
curl --location --request POST 'http://localhost:3000/api/core/dataset/searchTest' \
--header 'Authorization: Bearer fastgpt-xxxxx' \
--header 'Content-Type: application/json' \
--data-raw '{
    "datasetId": "Dataset ID",
    "text": "Who is the director",
    "limit": 5000,
    "similarity": 0,
    "searchMode": "embedding",
    "usingReRank": false,

    "datasetSearchUsingExtensionQuery": true,
    "datasetSearchExtensionModel": "gpt-5",
    "datasetSearchExtensionBg": ""
}'
```

</Tab>

<Tab value="Parameters" >

<div>
- datasetId - Dataset ID
- text - Text to test
- limit - Maximum tokens
- similarity - Minimum similarity (0~1, optional)
- searchMode - Search mode: embedding | fullTextRecall | mixedRecall
- usingReRank - Use rerank
- datasetSearchUsingExtensionQuery - Use query extension
- datasetSearchExtensionModel - Query extension model
- datasetSearchExtensionBg - Query extension background description
</div>

</Tab>

<Tab value="Response Example" >

Returns the top k results. `limit` is the maximum number of tokens to return, capped at 20000.

```json
{
  "code": 200,
  "statusText": "",
  "data": [
    {
      "id": "65599c54a5c814fb803363cb",
      "q": "Who are you",
      "a": "I'm FastGPT Assistant",
      "datasetId": "6554684f7f9ed18a39a4d15c",
      "collectionId": "6556cd795e4b663e770bb66d",
      "sourceName": "GBT 15104-2021 装饰单板贴面人造板.pdf",
      "sourceId": "6556cd775e4b663e770bb65c",
      "score": 0.8050316572189331
    },
    ......
  ]
}
```

</Tab>
</Tabs>
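
The parameter constraints listed for this endpoint (three search modes, similarity between 0 and 1, at most 20000 tokens) can be checked before a request is sent. The Python sketch below does that and also filters returned results by score. The helpers `build_search_payload` and `filter_by_score` are hypothetical, and the client-side validation is an illustrative convenience, not something the API requires.

```python
SEARCH_MODES = {"embedding", "fullTextRecall", "mixedRecall"}
MAX_LIMIT = 20000  # documented upper bound for "limit"


def build_search_payload(
    dataset_id: str,
    text: str,
    limit: int = 5000,
    similarity: float = 0.0,
    search_mode: str = "embedding",
    using_rerank: bool = False,
) -> dict:
    """Validate the arguments and return the JSON body for /api/core/dataset/searchTest."""
    if search_mode not in SEARCH_MODES:
        raise ValueError(f"searchMode must be one of {sorted(SEARCH_MODES)}")
    if not 0 <= similarity <= 1:
        raise ValueError("similarity must be between 0 and 1")
    if not 0 < limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    return {
        "datasetId": dataset_id,
        "text": text,
        "limit": limit,
        "similarity": similarity,
        "searchMode": search_mode,
        "usingReRank": using_rerank,
    }


def filter_by_score(results: list, min_score: float) -> list:
    """Keep only results whose "score" is at least min_score, preserving order."""
    return [item for item in results if item["score"] >= min_score]


payload = build_search_payload("Dataset ID", "Who is the director")
print(payload)
```

The query-extension fields (`datasetSearchUsingExtensionQuery` and related) are left out of the sketch for brevity; they could be added to the returned dictionary in the same way.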