chanzhi82020
31c17999b8
This PR introduces evaluation support designed specifically to track and benchmark applications built on the FastGPT platform. ( #5476 )
...
- Adds a lightweight evaluation framework for app-level tracking and benchmarking.
- Changes: 28 files, +1455 additions, -66 deletions.
- Branch: add-evaluations -> main.
- PR: https://github.com/chanzhi82020/FastGPT/pull/1
Applications built on FastGPT need repeatable, comparable benchmarks to measure regressions, track improvements, and validate releases. This initial implementation provides the primitives to define evaluation scenarios, run them against app endpoints or model components, and persist results for later analysis.
I updated the PR description to emphasize that the evaluation system is targeted at FastGPT-built apps and expanded the explanation of the core pieces so reviewers understand the scope and intended use. The new description outlines the feature intent, core components, and how results are captured and aggregated for benchmarking.
- Evaluation definitions
- Define evaluation tasks that reference an app (app id, version, endpoint), test datasets or input cases, expected outputs (when applicable), and run configuration (parallelism, timeouts).
- Support for custom metric plugins so teams can add domain-specific measures.
- Runner / Executor
- Executes evaluation cases against app endpoints or internal model interfaces.
- Captures raw responses, response times, status codes, and any runtime errors.
- Computes per-case metrics (e.g., correctness, latency) immediately after each case run.
- Metrics & Aggregation
- Built-in metrics: accuracy/success rate, latency (p50/p90/p99), throughput, error rate.
- Aggregation produces per-run summaries and per-app historical summaries for trend analysis.
- Allows combining metrics into composite scores for high-level benchmarking.
- Persistence & Logging
- Stores run results, input/output pairs (when needed), timestamps, environment info, and app/version metadata so runs are reproducible and auditable.
- Logs are retained to facilitate debugging and root-cause analysis of regressions.
- Reporting & Comparison
- Produces aggregated reports suitable for CI gating, release notes, or dashboards.
- Supports comparing multiple app versions or deployments side-by-side.
- Extensibility & Integration
- Designed to plug into CI (automated runs on PRs or releases), dashboards, and downstream analysis tools.
- Easy to add new metrics, evaluators, or dataset connectors.
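The per-case metrics and per-run aggregation described above could look roughly like the sketch below. It is illustrative only: the type names (`EvalCase`, `CaseResult`, `RunSummary`), the helper functions, the exact-match correctness check, and the nearest-rank percentile method are all assumptions made for this example, not the interfaces this PR ships.

```typescript
// Hypothetical shapes for illustration; none of these names come from the PR itself.
interface EvalCase {
  id: string;
  input: string;
  expected: string | null; // null when no reference output applies
}

interface RawResponse {
  output: string;
  latencyMs: number;
  statusCode: number;
  error: string | null;
}

interface CaseResult extends RawResponse {
  caseId: string;
  correct: boolean | null; // null when the case has no expected output
}

interface RunSummary {
  total: number;
  errorRate: number;
  accuracy: number | null; // null when no case could be graded
  latencyMs: { p50: number; p90: number; p99: number };
}

// Nearest-rank percentile over an ascending-sorted array (an assumed method).
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) return NaN;
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length, Math.max(rank, 1)) - 1;
  return sorted[index];
}

// Per-case metric: exact-match correctness, computed right after the case runs.
function scoreCase(evalCase: EvalCase, raw: RawResponse): CaseResult {
  const correct =
    evalCase.expected === null
      ? null
      : raw.output.trim() === evalCase.expected.trim();
  return { caseId: evalCase.id, ...raw, correct };
}

// Per-run aggregation: error rate, accuracy over graded cases, latency percentiles.
function summarize(results: CaseResult[]): RunSummary {
  const total = results.length;
  const failed = results.filter(
    (r) => r.error !== null || r.statusCode >= 400
  ).length;
  const graded = results.filter((r) => r.correct !== null);
  const latencies = results.map((r) => r.latencyMs).sort((a, b) => a - b);
  return {
    total,
    errorRate: total === 0 ? 0 : failed / total,
    accuracy:
      graded.length === 0
        ? null
        : graded.filter((r) => r.correct === true).length / graded.length,
    latencyMs: {
      p50: percentile(latencies, 50),
      p90: percentile(latencies, 90),
      p99: percentile(latencies, 99),
    },
  };
}
```

A CI gate could then compare two `RunSummary` values (for example, the current PR against the last release) and fail the build when accuracy drops or p99 latency rises past a team-chosen threshold.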
By centering the evaluation system on FastGPT apps, teams can benchmark full application behavior (not only raw model outputs), correlate metrics with deployment configurations, and make informed release decisions.
- Expand built-in metric suite (e.g., F1, BLEU/ROUGE where applicable), add dataset connectors, and provide example evaluation scenarios for sample apps.
- Integrate with CI pipelines and add basic dashboarding for trend visualization.
Related Issue: N/A
Co-authored-by: Archer <545436317@qq.com >
2025-09-16 15:20:59 +08:00
Archer
cb7d1a3205
perf: init shell ( #5651 )
...
* perf: init shell
* fix: tool run select
* border radius
2025-09-15 22:21:24 +08:00
Archer
2ed1545eb5
V4.12.4 features ( #5626 )
...
* fix: push again, user select option button and form input radio content overflow (#5601 )
* fix: push again, user select option button and form input radio content overflow
* fix: use useCallback instead of useMemo, fix unnecessary delete
* fix: Move the variable inside the component
* fix: do not pass valueLabel to MySelect
* ui
* del collection api adapt
* refactor: inherit permission (#5529 )
* refactor: permission update conflict check function
* refactor(permission): app collaborator update api
* refactor(permission): support app update collaborator
* feat: support fe permission conflict check
* refactor(permission): app permission
* refactor(permission): dataset permission
* refactor(permission): team permission
* chore: fe adjust
* fix: type error
* fix: audit pagination
* fix: tc
* chore: initv4130
* fix: app/dataset auth logic
* chore: move code
* refactor(permission): remove selfPermission
* fix: mock
* fix: test
* fix: app & dataset auth
* fix: inherit
* test(inheritPermission): test syncChildrenPermission
* prompt editor add list plugin (#5620 )
* perf: search result (#5608 )
* fix: table size (#5598 )
* temp: list value
* backspace
* optimize code
---------
Co-authored-by: Archer <545436317@qq.com >
Co-authored-by: 伍闲犬 <whoeverimf5@gmail.com >
* fix: fe & member list (#5619 )
* chore: initv4130
* fix: MemberItemCard
* fix: MemberItemCard
* chore: fe adjust & init script
* perf: test code
* doc
* fix debug variables (#5617 )
* perf: search result (#5608 )
* fix: table size (#5598 )
* fix debug variables
* fix
---------
Co-authored-by: Archer <545436317@qq.com >
Co-authored-by: 伍闲犬 <whoeverimf5@gmail.com >
* perf: member ui
* fix: inherit bug (#5624 )
* refactor(permission): remove getClbsWithInfo, which is useless
* fix: app list privateApp
* fix: get infos
* perf(fe): remove delete icon when it is disable in MemberItemCard
* fix: dataset private dataset
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
---------
Co-authored-by: Archer <545436317@qq.com >
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* perf: auto coupon
* chore: upgrade script & get infos avatar (#5625 )
* fix: get infos
* chore: initv4130
* feat: support WecomRobot publish, and fix AesKey can not save bug (#5526 )
* feat: resolve conflicts
* fix: add param 'show_publish_wecom'
* feat: abstract out WecomCrypto type
* doc: wecom robot document
* fix: solve instability in AI output
* doc: update some pictures
* feat: remove functions from request.ts to chat.ts and toolCall.ts
* doc: wecom robot doc update
* fix
* delete unused code
* doc: update version and prompt
* feat: remove wecom crypto, delete wecom code in workflow
* feat: delete unused codes
---------
Co-authored-by: heheer <zhiyu44@qq.com >
* remove test
* rename init shell
* feat: collection page store
* reload sandbox
* pysandbox
* remove log
* chore: remove useless code (#5629 )
* chore: remove useless code
* fix: checkConflict
* perf: support hidden type for RoleList
* fix: copy node
* update doc
* fix(permission): some bug (#5632 )
* fix: app/dataset list
* fix: inherit bug
* perf: del app;i18n;save chat
* fix: test
* i18n
* fix: sumper overflow return OwnerRoleVal (#5633 )
* remove invalid code
* fix: scroll
* fix: objectId
* update next
* update package
* object id
* mock redis
* feat: add redis append to resolve wecom stream response (#5643 )
* feat: resolve conflicts
* fix: add param 'show_publish_wecom'
* feat: abstract out WecomCrypto type
* doc: wecom robot document
* fix: solve instability in AI output
* doc: update some pictures
* feat: remove functions from request.ts to chat.ts and toolCall.ts
* doc: wecom robot doc update
* fix
* delete unused code
* doc: update version and prompt
* feat: remove wecom crypto, delete wecom code in workflow
* feat: delete unused codes
* feat: add redis append method
---------
Co-authored-by: heheer <zhiyu44@qq.com >
* cache per
* fix(test): init team sub when creating mocked user (#5646 )
* fix: button is not vertically centered (#5647 )
* doc
* fix: gridFs objectId (#5649 )
---------
Co-authored-by: Zeng Qingwen <143274079+fishwww-ww@users.noreply.github.com >
Co-authored-by: Finley Ge <32237950+FinleyGe@users.noreply.github.com >
Co-authored-by: heheer <heheer@sealos.io >
Co-authored-by: 伍闲犬 <whoeverimf5@gmail.com >
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
Co-authored-by: heheer <zhiyu44@qq.com >
v4.12.4
2025-09-15 20:02:54 +08:00
dependabot[bot]
c8934e3d22
chore(deps): bump jsondiffpatch from 0.6.0 to 0.7.2 ( #5634 )
...
Bumps [jsondiffpatch](https://github.com/benjamine/jsondiffpatch ) from 0.6.0 to 0.7.2.
- [Release notes](https://github.com/benjamine/jsondiffpatch/releases )
- [Commits](https://github.com/benjamine/jsondiffpatch/commits )
---
updated-dependencies:
- dependency-name: jsondiffpatch
dependency-version: 0.7.2
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-14 13:16:30 +08:00
dependabot[bot]
1f0b62e895
chore(deps): bump axios in /plugins/webcrawler/SPIDER ( #5637 )
...
Bumps [axios](https://github.com/axios/axios ) from 1.8.2 to 1.12.0.
- [Release notes](https://github.com/axios/axios/releases )
- [Changelog](https://github.com/axios/axios/blob/v1.x/CHANGELOG.md )
- [Commits](https://github.com/axios/axios/compare/v1.8.2...v1.12.0 )
---
updated-dependencies:
- dependency-name: axios
dependency-version: 1.12.0
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-14 13:15:24 +08:00
Zeng Qingwen
768fb63b88
feat: scan QRCode auto redeem coupon ( #5616 )
...
* feat: scan QRCode auto redeem coupon
* feat: use hook to auto redeem, instead of modify auth
2025-09-12 09:54:36 +08:00
heheer
635e606ef1
fix http plugin chatconfig undefined ( #5621 )
...
* fix http plugin chatconfig undefined
* fix
2025-09-11 16:04:16 +08:00
伍闲犬
bdb89a3e0c
fix: table size ( #5598 )
2025-09-09 15:05:45 +08:00
Archer
25207c5060
perf: search result ( #5608 )
2025-09-09 14:06:31 +08:00
Deepturn
3fc163c5a0
Update 4101.mdx ( #5602 )
2025-09-08 20:07:20 +08:00
Deepturn
deced4c67f
Update sso.mdx ( #5603 )
2025-09-08 20:07:04 +08:00
Finley Ge
4e17fcfdda
fix: node card copy toolConfig ( #5605 )
2025-09-08 20:06:49 +08:00
Archer
c4632a2222
V4.12.3 document ( #5600 )
...
* doc
* doc
* perf: log
v4.12.3
2025-09-07 20:55:14 +08:00
Archer
3f9b0fa1d4
V4.12.3 features ( #5595 )
...
* refactor: remove ModelProviderIdType and update related types (#5549 )
* perf: model provider
* fix eval create split (#5570 )
* git rebase --continuedoc
* add more variable types (#5540 )
* variable types
* password
* time picker
* internal var
* file
* fix-test
* time select default value & range
* password & type render
* fix
* fix build
* fix
* move method
* split date select
* icon
* perf: variable code
* prompt editor add markdown plugin (#5556 )
* editor markdown
* fix build
* pnpm lock
* add props
* update code
* fix list
* editor ui
* fix variable reset (#5586 )
* perf: variables type code
* customize lexical indent (#5588 )
* perf: multiple selector
* perf: tab plugin
* doc
* refactor: update workflow constants to use ToolTypeEnum (#5491 )
* refactor: replace FlowNodeTemplateTypeEnum with string literals in workflow templates
* perf: tool type
---------
Co-authored-by: archer <545436317@qq.com >
* update doc
* fix: make table's row more natural while dragging it (#5596 )
* feat: add APIGetTemplate function and refactor template fetching logic (#5498 )
* feat: add APIGetTemplate function and refactor template fetching logic
* chore: adjust the code
* chore: update sdk
---------
Co-authored-by: FinleyGe <m13203533462@163.com >
* perf init system
* doc
* remove log
* remove i18n
* perf: variables render
---------
Co-authored-by: Ctrlz <143257420+ctrlz526@users.noreply.github.com >
Co-authored-by: heheer <heheer@sealos.io >
Co-authored-by: 伍闲犬 <whoeverimf5@gmail.com >
Co-authored-by: FinleyGe <m13203533462@163.com >
2025-09-07 14:41:48 +08:00
Archer
c747fc03ad
Update docs-deploy.yml ( #5594 )
2025-09-04 23:03:47 +08:00
Archer
0ede427a96
fix: var selector ( #5593 )
...
* fix: var selector
* doc
* doc time
* doc time
* doc time
* doc time
* doc time
* doc time
* doc time
v4.12.2-fix3
2025-09-04 22:46:19 +08:00
Archer
85ea117481
Fix workflow ( #5592 )
...
* fix: fileselector default
* fix: workflow run process
2025-09-04 20:22:35 +08:00
伍闲犬
9be1e591d3
fix: delete "Content-Length" while redirect request to pro api ( #5589 )
2025-09-04 14:17:07 +08:00
Archer
c67e645469
perf: login page ( #5571 )
2025-09-01 21:03:58 +08:00
Finley Ge
f41775fe56
fix: leave team ( #5554 )
...
* chore: fe copywriting
* feat: auto accept invitation when login
* perf: only show forbidden filter in sync mode
* chore: auto accept invitation
2025-09-01 20:33:29 +08:00
伍闲犬
76a03a8363
fix: incorrect popover position ( #5568 )
2025-09-01 11:52:20 +08:00
伍闲犬
6e8bf8c804
fix: favorite apps ui and permission; fix favorite settings' table row ( #5553 )
...
* fix: fix favorite apps ui and permission; fix favorite settings' table row
* fix: tag overflow
* chore: remove default value
2025-09-01 11:32:37 +08:00
Archer
42e249f30f
perf: rrf code ( #5559 )
v4.12.2-fix2
2025-08-29 01:32:08 +08:00
Archer
a952539875
perf: rrf code ( #5558 )
2025-08-29 01:24:19 +08:00
YeYuheng
e4756c76dd
rrf_weight ( #5551 )
...
Co-authored-by: xxYyh <xxyyh@xxYyhdeMacBook-Pro.local >
2025-08-29 00:54:29 +08:00
Finley Ge
486d791b94
fix: extract node cannot extract when using a tool-calling-capable model. ( #5555 )
2025-08-28 18:09:33 +08:00
Deepturn
8e77060f9c
Update teamMode.mdx ( #5550 )
...
Synchronization mode cannot add users.
2025-08-27 16:59:57 +08:00
Archer
c4799df3fd
perf: workflow code ( #5548 )
...
* perf: workflow code
* add tool call limit
2025-08-27 11:45:46 +08:00
Finley Ge
610634e1a1
fix: mcp tool node hide the version selection ( #5547 )
2025-08-27 10:52:32 +08:00
Archer
4e194d64e1
Update doc ( #5545 )
...
* fix: app list
* fix: toolset conflict interactive node
* fix: doc
* update yml
2025-08-27 00:31:33 +08:00
Archer
324aaae769
fix: ai response test ( #5544 )
...
* fix: ai response test
* fix: skip edge check
* fix: app list
* fix: toolset conflict interactive node
* fix: username show
v4.12.2-fix
2025-08-27 00:08:22 +08:00
Archer
d2d4c76bd5
update doc ( #5543 )
2025-08-26 21:14:41 +08:00
Archer
4d7e0ed78f
fix: team avatar select ( #5542 )
v4.12.2
2025-08-26 18:27:32 +08:00
伍闲犬
3fb1ff2614
fix: read permission; incorrect name; redirect ( #5541 )
2025-08-26 17:57:59 +08:00
Archer
93e9cb675d
fix: oceanbase insert ( #5539 )
2025-08-26 17:29:42 +08:00
伍闲犬
4939271abb
fix: fix redirect timing while enableHome is false; tweak UI ( #5538 )
2025-08-26 15:57:38 +08:00
Archer
2e2e919d1d
fix: chat navbar ( #5537 )
2025-08-26 15:15:21 +08:00
Archer
970476950a
update package ( #5535 )
2025-08-26 15:09:47 +08:00
Archer
7cd4a8b0bc
feat: Store pdfparse in local ( #5534 )
2025-08-26 14:35:39 +08:00
伍闲犬
3c2bf20666
feat: add switch to control if enable home ( #5531 )
2025-08-26 12:00:18 +08:00
Archer
3b25bf57c4
perf: search key refresh parentId ( #5530 )
2025-08-26 11:30:02 +08:00
Archer
830eb19055
feature: V4.12.2 ( #5525 )
...
* feat: favorite apps & quick apps with their own configuration (#5515 )
* chore: extract chat history and drawer; fix model selector
* feat: display favourite apps and make it configurable
* feat: favorite apps & quick apps with their own configuration
* fix: fix tab title and add loading state for searching
* fix: cascade delete favorite app and quick app while deleting relative app
* chore: make improvements
* fix: favourite apps ui
* fix: add permission for quick apps
* chore: fix permission & clear redundant code
* perf: chat home page code
* chatbox ui
* fix: 4.12.2-dev (#5520 )
* fix: add empty placeholder; fix app quick status; fix tag and layout
* chore: add tab query for the setting tabs
* chore: use `useConfirm` hook instead of `MyModal`
* remove log
* fix: fix modal padding (#5521 )
* perf: manage app
* feat: enhance model provider handling and update icon references (#5493 )
* perf: model provider
* sdk package
* refactor: create llm response (#5499 )
* feat: add LLM response processing functions, including the creation of stream-based and complete responses
* feat: add volta configuration for node and pnpm versions
* refactor: update LLM response handling and event structure in tool choice logic
* feat: update LLM response structure and integrate with tool choice logic
* refactor: clean up imports and remove unused streamResponse function in chat and toolChoice modules
* refactor: rename answer variable to answerBuffer for clarity in LLM response handling
* feat: enhance LLM response handling with tool options and integrate tools into chat and tool choice logic
* refactor: remove volta configuration from package.json
* refactor: reorganize LLM response types and ensure default values for token counts
* refactor: streamline LLM response handling by consolidating response structure and removing redundant checks
* refactor: enhance LLM response handling by consolidating tool options and streamlining event callbacks
* fix: build error
* refactor: update tool type definitions for consistency in tool handling
* feat: llm request function
* fix: ts
* fix: ts
* fix: ahook ts
* fix: variable name
* update lock
* ts version
* doc
* remove log
* fix: translation type
* perf: workflow status check
* fix: ts
* fix: prompt tool call
* fix: fix missing plugin interact window & make tag draggable (#5527 )
* fix: incorrect select quick apps state; filter apps type (#5528 )
* fix: usesafe translation
* perf: add quickapp modal
---------
Co-authored-by: 伍闲犬 <whoeverimf5@gmail.com >
Co-authored-by: Ctrlz <143257420+ctrlz526@users.noreply.github.com >
Co-authored-by: francis <zhichengfan18@gmail.com >
2025-08-25 19:19:43 +08:00
Mingcheng Su
d6af93074b
fix: increase MCP auth config value field maxLength ( #5523 )
...
Fixes #5229
2025-08-24 16:08:50 +08:00
Zeng Qingwen
b5169436cb
feat: new code block style in document ( #5468 )
...
* feat: new code block style in document
* feat: update fonts and icon in code block
* feat: dark theme color, delete fonts files, move copy icon to correct position
* style: code block more obvious in light and dark theme
2025-08-23 13:29:46 +08:00
Archer
d9e28a5b1a
Fix: document preview action ( #5524 )
...
* perf: tag popover color
* fix: action
2025-08-23 13:25:50 +08:00
dependabot[bot]
84127a31b9
chore(deps): bump mermaid from 10.9.3 to 10.9.4 ( #5522 )
...
Bumps [mermaid](https://github.com/mermaid-js/mermaid ) from 10.9.3 to 10.9.4.
- [Release notes](https://github.com/mermaid-js/mermaid/releases )
- [Commits](https://github.com/mermaid-js/mermaid/compare/v10.9.3...v10.9.4 )
---
updated-dependencies:
- dependency-name: mermaid
dependency-version: 10.9.4
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-23 13:08:41 +08:00
dependabot[bot]
1da44a1db1
chore(deps): bump sha.js in /plugins/webcrawler/SPIDER ( #5519 )
...
Bumps [sha.js](https://github.com/crypto-browserify/sha.js ) from 2.4.11 to 2.4.12.
- [Changelog](https://github.com/browserify/sha.js/blob/master/CHANGELOG.md )
- [Commits](https://github.com/crypto-browserify/sha.js/compare/v2.4.11...v2.4.12 )
---
updated-dependencies:
- dependency-name: sha.js
dependency-version: 2.4.12
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-23 13:08:26 +08:00
Archer
b6318aa35a
fix: version schema ref error ( #5518 )
2025-08-22 11:34:46 +08:00
Archer
95325346ff
perf: vector format ( #5516 )
...
* perf: vector format
* feat: embedding batch size
2025-08-22 10:18:24 +08:00
heheer
a92917c05f
fix team app template search ( #5514 )
2025-08-21 20:41:46 +08:00