FastGPT/packages/web/i18n/en/dataset.json
Archer c30f069f2f V4.9.11 feature (#4969)
* Feat: Images dataset collection (#4941)

* New pic (#4858)

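* Update dataset-related types and add support for image file IDs and preview URLs; improve dataset import and add an image dataset processing component; fix some i18n text; update the file upload logic to support the new features.

* Differences from the original code

* Add V4.9.10 release notes: support setting the `systemEnv.hnswMaxScanTuples` parameter for PG, improve LLM stream call timeouts, and fix sorting across multiple knowledge bases in full-text search. Also update the dataset indexes, removing the datasetId field to simplify queries.

* Switch to the fileId_image logic and add training queue matching logic

* Add image collection detection logic and improve the preview URL generation flow so that preview URLs are generated only when the dataset is an image collection; add related log output for debugging.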

* Refactor Docker Compose configuration to comment out exposed ports for production environments, update image versions for pgvector, fastgpt, and mcp_server, and enhance Redis service with a health check. Additionally, standardize dataset collection labels in constants and improve internationalization strings across multiple languages.

* Enhance TrainingStates component by adding internationalization support for the imageParse training mode and update defaultCounts to include imageParse mode in trainingDetail API.

* Enhance dataset import context by adding additional steps for image dataset import process and improve internationalization strings for modal buttons in the useEditTitle hook.

* Update DatasetImportContext to conditionally render MyStep component based on data source type, improving the import process for non-image datasets.

* Refactor image dataset handling by improving internationalization strings, enhancing error messages, and streamlining the preview URL generation process.

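* Upload images to the newly created dataset_collection_images table and adjust the logic accordingly

* Fix issues in everything except the controller

* Consolidate the image dataset logic into the controller

* Add missing i18n

* Add missing i18n

* Resolve review comments: mainly upload logic changes and component reuse

* Show an icon next to image names

* Fix naming issues that caused compile errors

* Remove the unneeded collectionid parts

* Clean up redundant files and change a delete button

* Resolve everything except loading and the unified imageId

* Fix icon errors

* Reuse MyPhotoView and rename imageFileId to imageId via a global replace

* Revert unnecessary file changes

* Fix errors and update fields

* Delete temporary files after a successful upload and revert some changes

* Remove the path field, store images in GridFS, and update the create/delete code accordingly

* Fix compile errors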

---------

Co-authored-by: archer <545436317@qq.com>

* perf: image dataset

* feat: insert image

* perf: image icon

* fix: training state

---------

Co-authored-by: Zhuangzai fa <143257420+ctrlz526@users.noreply.github.com>

* fix: ts (#4948)

* Thirddatasetmd (#4942)

* add thirddataset.md

* fix thirddataset.md

* fix

* delete wrong png

---------

Co-authored-by: dreamer6680 <146868355@qq.com>

* perf: api dataset code

* perf: log

* add secondary.tsx (#4946)

* add secondary.tsx

* fix

---------

Co-authored-by: dreamer6680 <146868355@qq.com>

* perf: multiple menu

* perf: i18n

* feat: parse queue (#4960)

* feat: parse queue

* feat: sync parse queue

* fix thirddataset.md (#4962)

* fix thirddataset-4.png (#4963)

* feat: Dataset template import (#4934)

* Template import is done except for the documentation

* Fix template import build errors

* Document production

* compress pictures

* Change some constants to variables

---------

Co-authored-by: Archer <545436317@qq.com>

* perf: template import

* doc

* llm paragraph

* bocha tool

* fix: del collection

---------

Co-authored-by: Zhuangzai fa <143257420+ctrlz526@users.noreply.github.com>
Co-authored-by: dreamer6680 <1468683855@qq.com>
Co-authored-by: dreamer6680 <146868355@qq.com>
2025-06-06 14:48:44 +08:00

214 lines
13 KiB
JSON

{
"Enable": "Enable",
"add_file": "Import",
"api_file": "API Dataset",
"api_url": "API Url",
"apidataset_configuration": "Configuration information",
"auto_indexes": "Automatically generate supplementary indexes",
"auto_indexes_tips": "Additional index generation is performed through large models to improve semantic richness and improve retrieval accuracy.",
"auto_training_queue": "Enhanced index queueing",
"backup_collection": "Backup data",
"backup_dataset": "Backup import",
"backup_dataset_success": "The backup was created successfully",
"backup_dataset_tip": "You can reimport the downloaded csv file when exporting the knowledge base.",
"backup_mode": "Backup import",
"backup_template_invalid": "The backup file format is incorrect, it should be the csv file with the first column as q,a,indexes",
"chunk_max_tokens": "max_tokens",
"chunk_process_params": "Block processing parameters",
"chunk_size": "Block size",
"chunk_trigger": "Blocking conditions",
"chunk_trigger_force_chunk": "Forced chunking",
"chunk_trigger_max_size": "The original text length is greater than the maximum context of the file processing model 70%",
"chunk_trigger_min_size": "The original text is greater than",
"chunk_trigger_tips": "Block storage is triggered when certain conditions are met, otherwise the original text will be stored in full directly",
"close_auto_sync": "Are you sure you want to turn off automatic sync?",
"collection.Create update time": "Creation/Update Time",
"collection.Training type": "Training",
"collection.training_type": "Chunk type",
"collection_data_count": "Data amount",
"collection_metadata_custom_pdf_parse": "PDF enhancement analysis",
"collection_name": "Collection name",
"collection_not_support_retraining": "This collection type does not support retuning parameters",
"collection_not_support_sync": "This collection does not support synchronization",
"collection_sync": "Sync data",
"collection_sync_confirm_tip": "Confirm to start synchronizing data? \nThe system will pull the latest data for comparison. If the contents are different, a new collection will be created and the old collection will be deleted. Please confirm!",
"collection_tags": "Collection Tags",
"common.dataset.data.Input Error Tip": "[Image Dataset] Process error:",
"common.error.unKnow": "Unknown error",
"common_dataset": "General Dataset",
"common_dataset_desc": "Building a knowledge base by importing files, web page links, or manual entry",
"condition": "condition",
"config_sync_schedule": "Configure scheduled synchronization",
"confirm_import_images": "Total {{num}} | Confirm create",
"confirm_to_rebuild_embedding_tip": "Are you sure you want to switch the index for the Dataset?\nSwitching the index is a significant operation that requires re-indexing all data in your Dataset, which may take a long time. Please ensure your account has sufficient remaining points.\n\nAdditionally, you need to update the applications that use this Dataset to avoid conflicts with other indexed model Datasets.",
"core.dataset.Image collection": "Image dataset",
"core.dataset.import.Adjust parameters": "Adjust parameters",
"custom_data_process_params": "Custom",
"custom_data_process_params_desc": "Customize data processing rules",
"custom_split_sign_tip": "Allows you to chunk according to custom delimiters. \nUsually used for processed data, using specific separators for precise chunking. \nYou can use the | symbol to represent multiple splitters, such as: \".|.\" to represent a period in Chinese and English.\n\nTry to avoid using special symbols related to regular, such as: * () [] {}, etc.",
"data_amount": "{{dataAmount}} Datas, {{indexAmount}} Indexes",
"data_error_amount": "{{errorAmount}} Group training exception",
"data_index_image": "Image index",
"data_index_num": "Index {{index}}",
"data_parsing": "Data analysis",
"data_process_params": "Params",
"data_process_setting": "Processing config",
"data_uploading": "Data is being uploaded: {{num}}%",
"dataset.Chunk_Number": "Block number",
"dataset.Completed": "Finish",
"dataset.Delete_Chunk": "delete",
"dataset.Edit_Chunk": "edit",
"dataset.Error_Message": "Report an error message",
"dataset.No_Error": "No exception information yet",
"dataset.Operation": "operate",
"dataset.ReTrain": "Retrain",
"dataset.Training Process": "Training status",
"dataset.Training_Count": "{{count}} Group training",
"dataset.Training_Errors": "Errors",
"dataset.Training_QA": "{{count}} Group Q&A pair training",
"dataset.Training_Status": "Training status",
"dataset.Training_Waiting": "Need to wait for {{count}} group data",
"dataset.Unsupported operation": "dataset.Unsupported operation",
"dataset.no_collections": "No datasets available",
"dataset.no_tags": "No tags available",
"default_params": "default",
"default_params_desc": "Use system default parameters and rules",
"download_csv_template": "Click to download the CSV template",
"edit_dataset_config": "Edit knowledge base configuration",
"empty_collection": "Blank dataset",
"enhanced_indexes": "Index enhancement",
"error.collectionNotFound": "Collection not found~",
"external_file": "External File Library",
"external_file_dataset_desc": "You can use external file library to build a knowledge library through the API",
"external_id": "File Reading ID",
"external_other_dataset_desc": "Customize API, Feishu, Yuque and other external documents as knowledge bases",
"external_read_url": "External Preview URL",
"external_read_url_tip": "Configure the reading URL of your file library for user authentication. Use the {{fileId}} variable to refer to the external file ID.",
"external_url": "File Access URL",
"failedToLoadRootDirectories": "Failed to load root directories",
"failedToLoadSubDirectories": "Failed to load subdirectories",
"feishu_dataset": "Feishu Dataset",
"feishu_dataset_config": "Feishu Dataset Config",
"feishu_dataset_desc": "Can build a dataset using Feishu documents by configuring permissions, without secondary storage",
"file_list": "File list",
"file_model_function_tip": "Enhances indexing and QA generation",
"filename": "Filename",
"folder_dataset": "Folder",
"getDirectoryFailed": "Get directory failed",
"image_auto_parse": "Automatic image indexing",
"image_auto_parse_tips": "Call VLM to automatically label the pictures in the document and generate additional search indexes",
"image_training_queue": "Queue of image processing",
"images_creating": "Creating",
"immediate_sync": "Immediate Synchronization",
"import.Auto mode Estimated Price Tips": "The text understanding model needs to be called, which requires more points: {{price}} points/1K tokens",
"import.Embedding Estimated Price Tips": "Only use the index model and consume a small amount of AI points: {{price}} points/1K tokens",
"import_confirm": "Confirm upload",
"import_data_preview": "Data preview",
"import_data_process_setting": "Data processing method settings",
"import_file_parse_setting": "File parsing settings",
"import_model_config": "Model selection",
"import_param_setting": "Parameter settings",
"import_select_file": "Select a file",
"import_select_link": "Enter link",
"index_size": "Index size",
"index_size_tips": "When vectorized, the system will automatically further segment the blocks according to this size.",
"input_required_field_to_select_baseurl": "Please enter the required information first",
"insert_images": "Added pictures",
"insert_images_success": "The new picture is successfully added, and you need to wait for the training to be completed before it will be displayed.",
"is_open_schedule": "Enable scheduled synchronization",
"keep_image": "Keep the picture",
"loading": "Loading...",
"max_chunk_size": "Maximum chunk size",
"move.hint": "After moving, the selected knowledge base/folder will inherit the permission settings of the new folder, and the original permission settings will become invalid.",
"noChildren": "No subdirectories",
"noSelectedFolder": "No selected folder",
"noSelectedId": "No selected ID",
"noValidId": "No valid ID",
"open_auto_sync": "After scheduled synchronization is turned on, the system will try to synchronize the collection from time to time every day. During the collection synchronization period, the collection data will not be searched.",
"other_dataset": "Third-party knowledge base",
"paragraph_max_deep": "Maximum paragraph depth",
"paragraph_split": "Partition by paragraph",
"paragraph_split_tip": "Priority is given to chunking according to the Makdown title paragraph. If the chunking is too long, then chunking is done according to the length.",
"params_config": "Config",
"pdf_enhance_parse": "PDF enhancement analysis",
"pdf_enhance_parse_price": "{{price}} points/page",
"pdf_enhance_parse_tips": "Calling PDF recognition model for parsing, you can convert it into Markdown and retain pictures in the document. At the same time, you can also identify scanned documents, which will take a long time to identify them.",
"permission.des.manage": "Can manage the entire knowledge base data and information",
"permission.des.read": "View knowledge base content",
"permission.des.write": "Ability to add and change knowledge base content",
"pleaseFillUserIdAndToken": "Please fill in User ID and Token",
"preview_chunk": "Preview chunks",
"preview_chunk_empty": "File content is empty",
"preview_chunk_intro": "A total of {{total}} blocks, up to 10",
"preview_chunk_not_selected": "Click on the file on the left to preview",
"process.Auto_Index": "Automatic index generation",
"process.Get QA": "Q&A extraction",
"process.Image_Index": "Image index generation",
"process.Is_Ready": "Ready",
"process.Is_Ready_Count": "{{count}} Group is ready",
"process.Parse_Image": "Image analysis",
"process.Parsing": "Parsing",
"process.Vectorizing": "Index vectorization",
"process.Waiting": "Queue",
"rebuild_embedding_start_tip": "Index model switching task has started",
"rebuilding_index_count": "Number of indexes being rebuilt: {{count}}",
"request_headers": "Request headers, will automatically append 'Bearer '",
"retain_collection": "Adjust Training Parameters",
"retrain_task_submitted": "The retraining task has been submitted",
"rootDirectoryFormatError": "Root directory data format is incorrect",
"rootdirectory": "/rootdirectory",
"same_api_collection": "The same API set exists",
"selectDirectory": "Choose",
"selectRootFolder": "Select Root Folder",
"split_chunk_char": "Block by specified splitter",
"split_chunk_size": "Block by length",
"split_sign_break": "1 newline character",
"split_sign_break2": "2 newline characters",
"split_sign_custom": "Customize",
"split_sign_exclamatiob": "exclamation mark",
"split_sign_null": "Not set",
"split_sign_period": "period",
"split_sign_question": "question mark",
"split_sign_semicolon": "semicolon",
"start_sync_website_tip": "Confirm to start synchronizing data? \nThe old data will be deleted and retrieved again, please confirm!",
"status_error": "Running exception",
"sync_collection_failed": "Synchronization collection error, please check whether the source file can be accessed normally",
"sync_schedule": "Timing synchronization",
"sync_schedule_tip": "Only existing collections will be synchronized. \nIncludes linked collections and all collections in the API knowledge base. \nThe system will poll for updates every day, and the specific update time cannot be determined.",
"table_model_tip": "Store each row of data as a chunk",
"tag.Add_new_tag": "add_new Tag",
"tag.Edit_tag": "Edit Tag",
"tag.add": "Create",
"tag.add_new": "add_new",
"tag.cancel": "Cancel",
"tag.delete_tag_confirm": "Confirm to delete the tag?",
"tag.manage": "Tagging",
"tag.searchOrAddTag": "Search or Add Tag",
"tag.tags": "Tags",
"tag.total_tags": "Total {{total}} tags",
"template_dataset": "Template import",
"template_file_invalid": "The template file format is incorrect, it should be the csv file with the first column as q,a,indexes",
"template_mode": "Template import",
"the_knowledge_base_has_indexes_that_are_being_trained_or_being_rebuilt": "The Dataset has indexes that are being trained or rebuilt",
"total_num_files": "Total {{total}} files",
"training.Error": "{{count}} Group exception",
"training.Normal": "Normal",
"training_mode": "Chunk mode",
"training_queue_tip": "Training queue status",
"training_ready": "{{count}} Group",
"upload_by_template_format": "Upload by template file",
"uploading_progress": "Uploading: {{num}}%",
"vector_model_max_tokens_tip": "Each chunk of data has a maximum length of 3000 tokens",
"vector_training_queue": "Vector training queue",
"vllm_model": "Image understanding model",
"vlm_model_required_tooltip": "A Vision Language Model is required to create image collections",
"vlm_model_required_warning": "Image datasets require a Vision Language Model (VLM) to be configured. Please add a model that supports image understanding in the model configuration first.",
"waiting_for_training": "Waiting for training",
"website_dataset": "Website Sync",
"website_dataset_desc": "Build knowledge base by crawling web page data in batches",
"website_info": "Website Information",
"yuque_dataset": "Yuque Knowledge Base",
"yuque_dataset_config": "Configure Yuque Knowledge Base",
"yuque_dataset_desc": "Build knowledge base using Yuque documents by configuring document permissions, documents will not be stored twice"
}