---
title: Integrating ChatGLM2-6B
description: Integrating the private ChatGLM2-6B model with FastGPT
---

import { Alert } from '@/components/docs/Alert';
## Introduction

FastGPT lets you use your own OpenAI API key to quickly call OpenAI APIs. It currently integrates GPT-3.5, GPT-4, and embedding models for building knowledge bases. However, for data security reasons, you may not want to send all of your data to cloud-based LLMs.

So how do you connect a private model to FastGPT? This guide walks through integrating Tsinghua's ChatGLM2 as an example.
## ChatGLM2-6B Overview

ChatGLM2-6B is the second-generation version of the open-source bilingual (Chinese-English) chat model ChatGLM-6B. For details, see the [ChatGLM2-6B project page](https://github.com/THUDM/ChatGLM2-6B).

<Alert context="warning">
Note: ChatGLM2-6B weights are fully open for academic research. Commercial use requires official written permission. This tutorial only demonstrates one integration method and does not grant any license.
</Alert>
## Recommended Configuration

According to official figures, generating 8192 tokens requires 12.8GB of VRAM at fp16, 8.1GB at int8, and 5.1GB at int4. Quantization has a slight, but not significant, impact on output quality.

Recommended configurations:

| Type | RAM    | VRAM   | Disk Space | Start Command           |
|------|--------|--------|------------|-------------------------|
| fp16 | >=16GB | >=16GB | >=25GB     | python openai_api.py 16 |
| int8 | >=16GB | >=9GB  | >=25GB     | python openai_api.py 8  |
| int4 | >=16GB | >=6GB  | >=25GB     | python openai_api.py 4  |
## Deployment

### Environment Requirements

- Python 3.8.10
- CUDA 11.8
- Network access to download models
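If you are starting from a bare machine, a conda environment is one way to satisfy these requirements. The following is a minimal sketch, assuming conda and an NVIDIA driver compatible with CUDA 11.8 are already installed:

```bash
# Create an isolated environment with the required Python version
conda create -n chatglm2 python=3.8.10 -y
conda activate chatglm2

# Confirm the GPU and driver are visible before downloading the model
nvidia-smi
```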
### Source Code Deployment

1. Set up the environment as described above;
2. Download the [Python file](https://github.com/labring/FastGPT/blob/main/plugins/model/llm-ChatGLM2/openai_api.py);
3. Run `pip install -r requirements.txt`;
4. Open the Python file and configure the token in the `verify_token` method; this adds a layer of authentication to prevent unauthorized access;
5. Run `python openai_api.py --model_name 16`, choosing the number based on the configuration table above. The full sequence is sketched below.
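Taken together, the steps above amount to something like the following. This is a sketch rather than an official script: the raw.githubusercontent.com URLs are assumed from the file link above, and requirements.txt is assumed to sit in the same repository directory.

```bash
# Fetch the server script and its dependency list (URLs assumed from the repo layout)
wget https://raw.githubusercontent.com/labring/FastGPT/main/plugins/model/llm-ChatGLM2/openai_api.py
wget https://raw.githubusercontent.com/labring/FastGPT/main/plugins/model/llm-ChatGLM2/requirements.txt

pip install -r requirements.txt

# Edit verify_token in openai_api.py to set your own token first,
# then start at fp16 (use 8 or 4 for the quantized variants)
python openai_api.py --model_name 16
```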
Wait for the model to download and load. If you encounter errors, try asking GPT for help.

On successful startup, you should see an address like this:

![](/imgs/chatglm2.png)

> `http://0.0.0.0:6006` is the connection address.
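Before wiring the service into One API, you can sanity-check it directly. Below is a minimal sketch, assuming the script exposes the OpenAI-compatible `/v1/chat/completions` route it emulates and that you kept the default token; the request body mirrors the test request later in this guide:

```bash
# Smoke test against the locally running service on port 6006
curl http://127.0.0.1:6006/v1/chat/completions \
  --header 'Authorization: Bearer sk-aaabbbcccdddeeefffggghhhiiijjjkkk' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "chatglm2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```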
### Docker Deployment

**Image and Port**

+ Image: `stawky/chatglm2:latest`
+ China mirror: `registry.cn-hangzhou.aliyuncs.com/fastgpt_docker/chatglm2:latest`
+ Port: 6006

```
# Set the security token (used as the channel key in One API)
Default: sk-aaabbbcccdddeeefffggghhhiiijjjkkk
You can also set it via the environment variable sk-key; see the Docker documentation for how to pass environment variables.
```
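A typical launch command might look like the sketch below. The `sk-key` variable name follows the note above; the `--gpus` flag assumes the NVIDIA Container Toolkit is installed, and the port mapping matches the 6006 listed earlier.

```bash
# Run the prebuilt image, exposing port 6006 and overriding the default token
docker run -d --name chatglm2 --gpus all \
  -p 6006:6006 \
  -e sk-key=sk-aaabbbcccdddeeefffggghhhiiijjjkkk \
  stawky/chatglm2:latest
```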
## Connect to One API

Add a channel for chatglm2 with the following parameters:

![](/imgs/model-m1.png)

Here, chatglm2 is used as the language model. The channel's proxy address should point at the ChatGLM2 service (the `http://0.0.0.0:6006` address from above, reachable from One API), and the channel key should be the token you configured.
## Test

curl example:

```bash
curl --location --request POST 'https://domain/v1/chat/completions' \
--header 'Authorization: Bearer sk-aaabbbcccdddeeefffggghhhiiijjjkkk' \
--header 'Content-Type: application/json' \
--data-raw '{
  "model": "chatglm2",
  "messages": [{"role": "user", "content": "Hello!"}]
}'
```

Set Authorization to sk-aaabbbcccdddeeefffggghhhiiijjjkkk, or to whatever token you configured in `verify_token`. The model field should match the custom model name you entered in One API.
## Integrate with FastGPT

Edit the config.json file and add chatglm2 to `llmModels`:

```json
"llmModels": [
  // Existing models
  {
    "model": "chatglm2",
    "name": "chatglm2",
    "maxContext": 4000,
    "maxResponse": 4000,
    "quoteMaxToken": 2000,
    "maxTemperature": 1,
    "vision": false,
    "defaultSystemChatPrompt": ""
  }
]
```

## Usage

Simply select chatglm2 as the model.