feat: vector integration test; feat: ob quantization (#6366)

* feat(vectordb): add OceanBase HNSW quantization (HNSW_SQ/HNSW_BQ) (#6348)

Support OceanBase vector index quantization via VECTOR_VQ_LEVEL:
- 32 (default): hnsw + inner_product
- 8: hnsw_sq + inner_product (2-3x memory savings)
- 1: hnsw_bq + cosine (~15x memory savings)

HNSW_BQ requires cosine distance per OceanBase docs.
Tested on OceanBase 4.3.5.5 (BP5).

Closes #6202
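The level-to-index mapping above can be sketched as follows (a minimal illustration; `resolveVqConfig` and the type names are assumptions, only the level/algorithm/metric pairs come from the list above):

```typescript
// Hypothetical sketch: map VECTOR_VQ_LEVEL to an OceanBase index
// algorithm and distance metric, per the table in the commit message.
type VqConfig = {
  indexType: 'hnsw' | 'hnsw_sq' | 'hnsw_bq';
  metric: 'inner_product' | 'cosine';
};

const resolveVqConfig = (level: number): VqConfig => {
  switch (level) {
    case 1:
      // HNSW_BQ requires cosine distance per OceanBase docs
      return { indexType: 'hnsw_bq', metric: 'cosine' };
    case 8:
      return { indexType: 'hnsw_sq', metric: 'inner_product' };
    case 32:
    default:
      return { indexType: 'hnsw', metric: 'inner_product' };
  }
};

console.log(resolveVqConfig(Number(process.env.VECTOR_VQ_LEVEL ?? 32)));
```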

* feat: add test inclusion for vectorDB tests in vitest configuration (#6358)

* feat: add test inclusion for vectorDB tests in vitest configuration

* refactor: update vectorDB README and setup for environment configuration

- Enhanced README to clarify the use of factory pattern for vectorDB integration tests.
- Updated instructions for setting up environment variables from a local file.
- Removed obsolete PG integration test file and adjusted test execution instructions.
- Improved structure explanation for shared test data and factory functions.

* perf: integrationTest

* feat: vector integration

---------

Co-authored-by: ZHANG Yixin <hi.yixinz@gmail.com>
Co-authored-by: Jingchao <alswlx@gmail.com>
Author: Archer
Date: 2026-02-02 18:48:25 +08:00
Committed by: GitHub
Parent: 358109f556
Commit: 64f70a41c1
38 changed files with 758 additions and 163 deletions
@@ -0,0 +1,10 @@
VECTOR_VQ_LEVEL=32
# PG
PG_URL=postgresql://username:password@localhost:6001/postgres
# OceanBase can be tested against a cloud service
# OCEANBASE_URL=mysql://root%40tenantname:tenantpassword@localhost:6005/mysql
# SeekDB vector database connection
SEEKDB_URL=mysql://root:seekdbpassword@127.0.0.1:6003/mysql
# Milvus vector database connection
MILVUS_ADDRESS=http://localhost:6002
MILVUS_TOKEN=
@@ -0,0 +1,42 @@
# Vector Database Integration Tests

Real-environment integration tests for FastGPT's vector store controllers (PGVector, with Oceanbase/Milvus to follow), to keep vector operations compatible and stable. A **factory pattern** is used: one set of data (fixtures) and one set of cases (the factory) drives n vector store test suites.

## Environment Variables

Test environment variables are provided by **test/.env.test.local** (not committed to git). Copy the template and fill it in:

```bash
cp test/.env.test.template test/.env.test.local
# Edit test/.env.test.local and fill in PG_URL etc.
```

`setup.ts` reads `test/.env.test.local` at test startup and injects the values into `process.env`.

| Variable | Description | Driver |
|------|------|----------|
| `PG_URL` | PostgreSQL + pgvector connection string | PgVectorCtrl |
| `OCEANBASE_URL` | Oceanbase connection string (upcoming) | ObVectorCtrl |
| `MILVUS_ADDRESS` | Milvus address (upcoming) | MilvusCtrl |

When the corresponding environment variable is unset, that driver's integration tests are **skipped as a whole** rather than failing.

## Running

From the project root:

```bash
# Run unit tests only (vectorDB integration tests are skipped when .env.test.local is missing or PG_URL is unset)
pnpm test

# Run all vector store tests (vectorDB integration tests plus related unit tests)
pnpm test:vector
```

## Structure

- **fixtures.ts**: shared test data (`TEST_TEAM_ID`, `TEST_DATASET_ID`, `TEST_COLLECTION_ID`, and the 1536-dimensional `TEST_VECTORS`), used by every vector store.
- **factory.ts**: the factory function `runVectorDBTests(driver)`; one set of cases (init, insert, getVectorCount, embRecall, getVectorDataByTime, delete) is reused by every driver.
- **integration.test.ts**: registers the drivers (PG, with Oceanbase/Milvus to follow) and decides whether to skip via `driver.envKey`; each driver runs the same `runVectorDBTests(driver)`.

To add a vector store: add one entry (`name`, `envKey`, `createCtrl`) to the `drivers` array in `integration.test.ts`; fixtures and factory need no changes.
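The per-driver skip decision described above can be sketched as follows (a minimal sketch; `isDriverEnabled` and the `Driver` shape are illustrative, only the `envKey` names come from the table above):

```typescript
// Minimal sketch: a driver's suite runs only when its environment
// variable is set; otherwise the whole suite is skipped, not failed.
type Driver = { name: string; envKey: string };

const drivers: Driver[] = [
  { name: 'PG', envKey: 'PG_URL' },
  { name: 'Oceanbase', envKey: 'OCEANBASE_URL' },
  { name: 'Milvus', envKey: 'MILVUS_ADDRESS' }
];

const isDriverEnabled = (driver: Driver, env: Record<string, string | undefined>): boolean =>
  Boolean(env[driver.envKey]);

// In integration.test.ts this would pick describe vs describe.skip:
// const suite = isDriverEnabled(driver, process.env) ? describe : describe.skip;
const enabled = drivers.filter((d) => isDriverEnabled(d, process.env)).map((d) => d.name);
console.log('enabled drivers:', enabled);
```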
@@ -0,0 +1,12 @@
// Global setup: report which vector DB environment variables were loaded before tests run
export default async function setup() {
console.log('Vector DB integration tests - environment loaded');
console.log('PG_URL configured:', Boolean(process.env.PG_URL));
console.log('OCEANBASE_URL configured:', Boolean(process.env.OCEANBASE_URL));
console.log('MILVUS_ADDRESS configured:', Boolean(process.env.MILVUS_ADDRESS));
console.log('SEEKDB_URL configured:', Boolean(process.env.SEEKDB_URL));
return async () => {
// Cleanup if needed
};
}
@@ -0,0 +1,11 @@
import { describe } from 'vitest';
import { MilvusCtrl } from '@fastgpt/service/common/vectorDB/milvus';
import { createVectorDBTestSuite } from '../testSuites';
const isEnabled = Boolean(process.env.MILVUS_ADDRESS);
const describeMilvus = isEnabled ? describe : describe.skip;
describeMilvus('Milvus Vector Integration', () => {
const vectorCtrl = new MilvusCtrl();
createVectorDBTestSuite(vectorCtrl);
});
@@ -0,0 +1,11 @@
import { describe } from 'vitest';
import { ObVectorCtrl } from '@fastgpt/service/common/vectorDB/oceanbase';
import { createVectorDBTestSuite } from '../testSuites';
const isEnabled = Boolean(process.env.OCEANBASE_URL);
const describeOceanbase = isEnabled ? describe : describe.skip;
describeOceanbase('Oceanbase Vector Integration', () => {
const vectorCtrl = new ObVectorCtrl({ type: 'oceanbase' });
createVectorDBTestSuite(vectorCtrl);
});
@@ -0,0 +1,11 @@
import { describe } from 'vitest';
import { PgVectorCtrl } from '@fastgpt/service/common/vectorDB/pg';
import { createVectorDBTestSuite } from '../testSuites';
const isEnabled = Boolean(process.env.PG_URL);
const describePg = isEnabled ? describe : describe.skip;
describePg('PG Vector Integration', () => {
const vectorCtrl = new PgVectorCtrl();
createVectorDBTestSuite(vectorCtrl);
});
@@ -0,0 +1,11 @@
import { describe } from 'vitest';
import { SeekVectorCtrl } from '@fastgpt/service/common/vectorDB/seekdb';
import { createVectorDBTestSuite } from '../testSuites';
const isEnabled = Boolean(process.env.SEEKDB_URL);
const describeSeekdb = isEnabled ? describe : describe.skip;
describeSeekdb('Seekdb Vector Integration', () => {
const vectorCtrl = new SeekVectorCtrl({ type: 'seekdb' });
createVectorDBTestSuite(vectorCtrl);
});
@@ -0,0 +1,4 @@
import { loadVectorDBEnv } from './utils';
// Load env before any modules that read process.env
loadVectorDBEnv({ envFileNames: ['.env.test.local'] });
@@ -0,0 +1,24 @@
export const VECTOR_DIM = 1536;
const buildBaseVector = () =>
Array.from({ length: VECTOR_DIM }, (_, index) => ((index % 10) + 1) / 100);
const baseVector = buildBaseVector();
export const TEST_VECTORS = [
baseVector,
baseVector.map((value) => value * 0.7),
baseVector.map((value) => value * 0.3)
];
export const QUERY_VECTOR = baseVector;
export const TEST_COLLECTION_IDS = ['col_1', 'col_2', 'col_3'];
export const createTestIds = () => {
const suffix = `${Date.now()}_${Math.random().toString(36).slice(2, 10)}`;
return {
teamId: 'test_team',
datasetId: `test_dataset_${suffix}`
};
};
@@ -0,0 +1,159 @@
import { beforeAll, describe, expect, test } from 'vitest';
import type { VectorControllerType } from '@fastgpt/service/common/vectorDB/type';
import { createTestIds, QUERY_VECTOR, TEST_COLLECTION_IDS, TEST_VECTORS } from './testData';
const insertTestVectors = async (
vectorCtrl: VectorControllerType,
teamId: string,
datasetId: string
) => {
const insertIds: string[] = [];
await Promise.all(
TEST_VECTORS.map(async (vector, index) => {
const { insertIds: ids } = await vectorCtrl.insert({
teamId,
datasetId,
collectionId: TEST_COLLECTION_IDS[index],
vectors: [vector]
});
insertIds.push(ids[0]);
})
);
// Give eventually-consistent stores time to make the inserts searchable
await new Promise((resolve) => setTimeout(resolve, 500));
return insertIds;
};
const cleanupTestVectors = async (
vectorCtrl: VectorControllerType,
teamId: string,
datasetId: string
) => {
try {
await vectorCtrl.delete({
teamId,
datasetIds: [datasetId]
});
} catch (error) {
// Ignore cleanup errors
}
};
export const createVectorDBTestSuite = (vectorCtrl: VectorControllerType) => {
describe.sequential('vectorDB integration', () => {
beforeAll(async () => {
await vectorCtrl.init();
});
test('insert and count', async () => {
const { teamId, datasetId } = createTestIds();
const insertIds = await insertTestVectors(vectorCtrl, teamId, datasetId);
expect(insertIds).toHaveLength(TEST_VECTORS.length);
const count = await vectorCtrl.getVectorCount({ teamId, datasetId });
expect(count).toBe(TEST_VECTORS.length);
const collectionCount = await vectorCtrl.getVectorCount({
teamId,
datasetId,
collectionId: TEST_COLLECTION_IDS[0]
});
expect(collectionCount).toBe(1);
await cleanupTestVectors(vectorCtrl, teamId, datasetId);
});
test('embRecall returns results', async () => {
const { teamId, datasetId } = createTestIds();
await insertTestVectors(vectorCtrl, teamId, datasetId);
const { results } = await vectorCtrl.embRecall({
teamId,
datasetIds: [datasetId],
vector: QUERY_VECTOR,
limit: 3,
forbidCollectionIdList: []
});
expect(results.length).toBeGreaterThan(0);
expect(results.every((item) => TEST_COLLECTION_IDS.includes(item.collectionId))).toBe(true);
await cleanupTestVectors(vectorCtrl, teamId, datasetId);
});
test('embRecall respects forbidCollectionIdList', async () => {
const { teamId, datasetId } = createTestIds();
await insertTestVectors(vectorCtrl, teamId, datasetId);
const { results } = await vectorCtrl.embRecall({
teamId,
datasetIds: [datasetId],
vector: QUERY_VECTOR,
limit: 10,
forbidCollectionIdList: [TEST_COLLECTION_IDS[0]]
});
expect(results.length).toBeGreaterThan(0);
expect(results.every((item) => item.collectionId !== TEST_COLLECTION_IDS[0])).toBe(true);
await cleanupTestVectors(vectorCtrl, teamId, datasetId);
});
test('embRecall respects filterCollectionIdList', async () => {
const { teamId, datasetId } = createTestIds();
await insertTestVectors(vectorCtrl, teamId, datasetId);
const { results } = await vectorCtrl.embRecall({
teamId,
datasetIds: [datasetId],
vector: QUERY_VECTOR,
limit: 10,
forbidCollectionIdList: [],
filterCollectionIdList: [TEST_COLLECTION_IDS[1]]
});
expect(results.length).toBeGreaterThan(0);
expect(results.every((item) => item.collectionId === TEST_COLLECTION_IDS[1])).toBe(true);
await cleanupTestVectors(vectorCtrl, teamId, datasetId);
});
test('getVectorDataByTime returns data', async () => {
const { teamId, datasetId } = createTestIds();
const insertIds = await insertTestVectors(vectorCtrl, teamId, datasetId);
// Allow time for the inserts to become visible before the time-range query
await new Promise((resolve) => setTimeout(resolve, 500));
const start = new Date(0);
const end = new Date(Date.now() + 600_000);
const data = await vectorCtrl.getVectorDataByTime(start, end);
const matchedIds = data
.filter((item) => item.teamId === teamId && item.datasetId === datasetId)
.map((item) => item.id);
expect(matchedIds.length).toBeGreaterThan(0);
expect(matchedIds).toEqual(expect.arrayContaining(insertIds));
await cleanupTestVectors(vectorCtrl, teamId, datasetId);
});
test('delete by idList removes vectors', async () => {
const { teamId, datasetId } = createTestIds();
const insertIds = await insertTestVectors(vectorCtrl, teamId, datasetId);
await vectorCtrl.delete({
teamId,
idList: insertIds.slice(0, 2)
});
const count = await vectorCtrl.getVectorCount({ teamId, datasetId });
expect(count).toBe(TEST_VECTORS.length - 2);
await cleanupTestVectors(vectorCtrl, teamId, datasetId);
});
});
};
@@ -0,0 +1,37 @@
import { existsSync, readFileSync } from 'fs';
import { resolve } from 'path';
type LoadVectorEnvOptions = {
envFileNames?: string[];
};
const parseEnvFile = (filePath: string) => {
const content = readFileSync(filePath, 'utf-8');
const lines = content.split('\n');
for (const rawLine of lines) {
const line = rawLine.trim();
if (!line || line.startsWith('#')) continue;
const separatorIndex = line.indexOf('=');
if (separatorIndex === -1) continue;
const key = line.slice(0, separatorIndex).trim();
const value = line.slice(separatorIndex + 1).trim();
if (!key || process.env[key]) continue;
process.env[key] = value;
}
};
export const loadVectorDBEnv = (options: LoadVectorEnvOptions = {}) => {
const envFileNames = options.envFileNames ?? ['.env.test.local'];
const baseDir = resolve(__dirname);
for (const envFileName of envFileNames) {
const filePath = resolve(baseDir, envFileName);
if (existsSync(filePath)) {
parseEnvFile(filePath);
}
}
};
@@ -0,0 +1,23 @@
import { resolve } from 'path';
import { defineConfig } from 'vitest/config';
export default defineConfig({
resolve: {
alias: {
'@': resolve(__dirname, '../../../projects/app/src'),
'@fastgpt': resolve(__dirname, '../../../packages'),
'@test': resolve(__dirname, '../..')
}
},
test: {
name: 'vectorDB',
root: resolve(__dirname),
setupFiles: './setup.ts',
include: ['**/*.test.ts'],
exclude: ['node_modules', 'dist'],
testTimeout: 60000,
hookTimeout: 60000,
fileParallelism: false,
reporters: ['verbose']
}
});
@@ -0,0 +1,159 @@
# docker-compose file for the vector DB integration-test environment:
# - PG mapped to 6001, Milvus to 6002, SeekDB to 6003/6004, OceanBase to 6005
# - Change the default credentials before running
version: '3.3'
services:
# pg DB
pgTest:
image: registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:0.8.0-pg15
container_name: pgTest
restart: always
ports:
- 6001:5432
networks:
- test
environment:
# These settings only take effect on the first run. Changing them and restarting the container has no effect; delete the persisted data and restart for changes to apply.
- POSTGRES_USER=username
- POSTGRES_PASSWORD=password
- POSTGRES_DB=postgres
volumes:
- ./local/pg/data:/var/lib/postgresql/data
healthcheck:
test: ['CMD', 'pg_isready', '-U', 'username', '-d', 'postgres']
interval: 5s
timeout: 5s
retries: 10
# Vector DB
milvus-test-minio:
container_name: milvus-test-minio
image: minio/minio:RELEASE.2023-03-20T20-16-18Z
environment:
MINIO_ACCESS_KEY: minioadmin
MINIO_SECRET_KEY: minioadmin
networks:
- testVector
volumes:
- ./local/milvus-minio:/minio_data
command: minio server /minio_data --console-address ":9001"
healthcheck:
test: ['CMD', 'curl', '-f', 'http://localhost:9000/minio/health/live']
interval: 30s
timeout: 20s
retries: 3
# milvus
milvus-test-etcd:
container_name: milvus-test-etcd
image: quay.io/coreos/etcd:v3.5.5
environment:
- ETCD_AUTO_COMPACTION_MODE=revision
- ETCD_AUTO_COMPACTION_RETENTION=1000
- ETCD_QUOTA_BACKEND_BYTES=4294967296
- ETCD_SNAPSHOT_COUNT=50000
networks:
- testVector
volumes:
- ./local/milvus/etcd:/etcd
command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
healthcheck:
test: ['CMD', 'etcdctl', 'endpoint', 'health']
interval: 30s
timeout: 20s
retries: 3
milvus-test:
container_name: milvus-test-standalone
image: milvusdb/milvus:v2.4.3
command: ['milvus', 'run', 'standalone']
ports:
- 6002:19530
security_opt:
- seccomp:unconfined
environment:
ETCD_ENDPOINTS: milvus-test-etcd:2379
MINIO_ADDRESS: milvus-test-minio:9000
networks:
- test
- testVector
volumes:
- ./local/milvus/data:/var/lib/milvus
healthcheck:
test: ['CMD', 'curl', '-f', 'http://localhost:9091/healthz']
interval: 30s
start_period: 90s
timeout: 20s
retries: 3
depends_on:
- 'milvus-test-etcd'
- 'milvus-test-minio'
# Ob
ob-test:
image: oceanbase/oceanbase-ce:4.3.5-lts
container_name: ob-test
restart: always
ports: # avoid exposing in production
- 6005:2881
networks:
- test
environment:
# These settings only take effect on the first run. Changing them and restarting the container has no effect; delete the persisted data and restart for changes to apply.
- OB_SYS_PASSWORD=obsyspassword
# Unlike traditional databases, an OceanBase account has extra fields: user name, tenant name, and cluster name. The classic format is "user@tenant#cluster".
# e.g. when connecting with a mysql client, this file's defaults require "-uroot@tenantname"
- OB_TENANT_NAME=tenantname
- OB_TENANT_PASSWORD=tenantpassword
# MODE is MINI or NORMAL; NORMAL uses as much of the host's resources as possible
- MODE=MINI
- OB_SERVER_IP=127.0.0.1
# More environment variables are documented in the official OceanBase docs: https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000002013494
volumes:
- ./local/ob/data:/root/ob
- ./local/ob/config:/root/.obd/cluster
configs:
- source: init_sql
target: /root/boot/init.d/init.sql
healthcheck:
# The health check connects to the business tenant, which can take a long time to initialize; hence the high retry count.
# A faster sys-tenant alternative: obclient -h127.0.0.1 -P2881 -uroot@sys -pobsyspassword -e "SELECT 1;"
test:
[
'CMD-SHELL',
'obclient -h$${OB_SERVER_IP} -P2881 -uroot@$${OB_TENANT_NAME} -p$${OB_TENANT_PASSWORD} -e "SELECT 1;"'
]
interval: 30s
timeout: 10s
retries: 1000
start_period: 60s
# Seekdb
seekdb-test:
image: oceanbase/seekdb:1.0.1.0-100000392025122619
container_name: seekdb-test
restart: always
ports: # avoid exposing in production
- 6003:2881
- 6004:2886
networks:
- test
environment:
# SeekDB connection settings (MySQL protocol compatible)
- ROOT_PASSWORD=seekdbpassword
# MODE is MINI or NORMAL; NORMAL uses as much of the host's resources as possible
- MODE=MINI
volumes:
- ./local/seekdb/data:/var/lib/mysql
- ./local/seekdb/config:/etc/mysql/conf.d
healthcheck:
test: ['CMD', 'mysqladmin', 'ping', '-h', '127.0.0.1', '-P2881', '-uroot', '-pseekdbpassword']
interval: 30s
timeout: 10s
retries: 1000
start_period: 10s
networks:
test:
testVector:
configs:
init_sql:
name: init_sql
content: |
ALTER SYSTEM SET ob_vector_memory_limit_percentage = 30;