mirror of
https://github.com/labring/FastGPT.git
synced 2026-04-27 02:08:10 +08:00
4b24472106
---
title: Web Site Sync
description: Introduction and usage of the FastGPT Web Site Sync feature
---

This feature is currently only available to commercial edition users.

## What is Web Site Sync

Web Site Sync uses crawler technology to automatically discover all pages under the `same domain` from an entry URL, supporting up to `200` sub-pages. For compliance and security reasons, FastGPT only supports crawling `static sites`, primarily intended for quickly building knowledge bases from documentation sites.
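
To make the discovery behavior concrete, here is a minimal Python sketch of same-domain page discovery with a page limit. It is not FastGPT's actual crawler. It assumes that "same domain" means an identical host, that the entry page counts toward the limit, and that `fetch_html` is a hypothetical callable you supply that returns the HTML for a URL.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

MAX_PAGES = 200  # page limit stated above; counting the entry page is an assumption


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def discover_pages(entry_url, fetch_html, max_pages=MAX_PAGES):
    """Breadth-first discovery of same-host pages, starting at entry_url.

    fetch_html is a hypothetical callable: URL -> HTML string ("" on failure).
    """
    host = urlparse(entry_url).netloc
    start = urldefrag(entry_url).url
    seen = {start}
    queue = deque([start])
    pages = []
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        pages.append(url)
        parser = LinkParser()
        parser.feed(fetch_html(url))
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Links to other hosts are skipped, and once `max_pages` URLs have been collected, any remaining queued links are ignored.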

Tip: Most China-based media sites are not supported, including WeChat Official Accounts, CSDN, Zhihu, etc. You can verify whether a site is static by sending a `curl` request from the terminal:

```bash
curl https://doc.fastgpt.io/docs/intro/
```
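
If you would rather script the check than read the `curl` output by eye, the following Python sketch applies one rough heuristic: a static page already contains its readable text in the raw HTML, while a page rendered by JavaScript is mostly an empty shell. The `looks_static` name and the `min_chars` threshold are assumptions for illustration, not part of FastGPT.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text outside <script>, <style>, and <noscript> elements."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.depth = 0  # how many skipped elements we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def looks_static(html, min_chars=200):
    """Heuristic: True if the raw HTML carries at least min_chars of visible text.

    min_chars is an arbitrary, assumed threshold; tune it for your sites.
    """
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks)) >= min_chars
```

Pass it the response body you get from `curl` (or from `urllib.request.urlopen(url).read().decode()`). A `False` result suggests the content is rendered client-side and may not be crawlable.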

## How to Use

### 1. Create a New Knowledge Base and Select Web Site Sync

### 2. Click to Configure Site Information

### 3. Enter the URL and Selector

Click Start Sync and wait for the system to automatically crawl the site content.

## Create an App and Bind the Knowledge Base

## How to Use Selectors

Selectors use the CSS selector syntax familiar from HTML/CSS/JS. Use them to crawl only specific content rather than the entire site. Here's how:

### Open the Browser DevTools (usually F12, or Right-click > Inspect)

### Enter the Element Selector

For a CSS selectors reference, see the [MDN CSS Selectors guide](https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_selectors).

In the image above, we selected an area corresponding to a `div` tag with three attributes: `data-prismjs-copy`, `data-prismjs-copy-success`, and `data-prismjs-copy-error`. We only need one, so the selector is:
**`div[data-prismjs-copy]`**

Besides attribute selectors, class and ID selectors are also common. For example:

The `class` attribute in the image holds the element's class names (there may be several, separated by spaces; pick any one). The selector would be: **`.docs-content`**

### Using Multiple Selectors

In the earlier demo, we used multiple selectors for the FastGPT documentation site, separated by commas.

We want to select content from the two tags shown above, which requires two selectors. The first is `.docs-content .mb-0.d-flex`, meaning elements nested anywhere inside an element with the `docs-content` class that have both the `mb-0` and `d-flex` classes.

The second is `.docs-content div[data-prismjs-copy]`, meaning `div` elements nested anywhere inside an element with the `docs-content` class that have the `data-prismjs-copy` attribute.

Separate the two selectors with a comma: `.docs-content .mb-0.d-flex, .docs-content div[data-prismjs-copy]`
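
To illustrate how such a comma-separated list is interpreted, here is a small Python sketch that matches elements against the subset of selector syntax used on this page: tag names, `.class`, `[attribute]`, the descendant combinator (a space), and comma lists. It is a teaching aid built on the standard library, not FastGPT's implementation and not a full CSS engine. All names in it (`Node`, `TreeBuilder`, `select`) are hypothetical.

```python
import re
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}
TOKEN = re.compile(r"\.([\w-]+)|\[([\w-]+)\]|([a-zA-Z][\w-]*)")


class Node:
    def __init__(self, tag, attrs, parent):
        self.tag, self.attrs, self.parent = tag, attrs, parent
        self.classes = set((attrs.get("class") or "").split())


class TreeBuilder(HTMLParser):
    """Records every element in document order, together with its parent."""

    def __init__(self):
        super().__init__()
        self.current = None
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs), self.current)
        self.nodes.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        node = self.current  # close the nearest open element with this tag
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None:
            self.current = node.parent


def parse_compound(text):
    """Split e.g. 'div[data-x].a.b' into (tag, classes, attribute names)."""
    tag, classes, attrs = None, set(), set()
    for cls, attr, name in TOKEN.findall(text):
        if cls:
            classes.add(cls)
        elif attr:
            attrs.add(attr)
        else:
            tag = name
    return tag, classes, attrs


def matches_compound(node, compound):
    tag, classes, attrs = compound
    return ((tag is None or node.tag == tag)
            and classes <= node.classes
            and all(a in node.attrs for a in attrs))


def matches(node, selector):
    """True if node matches a single space-separated (descendant) selector."""
    parts = [parse_compound(p) for p in selector.split()]
    if not matches_compound(node, parts[-1]):
        return False
    ancestor = node.parent
    for compound in reversed(parts[:-1]):
        while ancestor is not None and not matches_compound(ancestor, compound):
            ancestor = ancestor.parent
        if ancestor is None:
            return False
        ancestor = ancestor.parent
    return True


def select(html, selector_list):
    """Return elements matching ANY selector in the comma-separated list."""
    builder = TreeBuilder()
    builder.feed(html)
    selectors = [s.strip() for s in selector_list.split(",") if s.strip()]
    return [n for n in builder.nodes if any(matches(n, s) for s in selectors)]
```

The comma acts as "or": an element is kept if it matches either selector. The space inside each selector requires the right-hand element to sit somewhere inside the left-hand one, so a `div[data-prismjs-copy]` outside `.docs-content` is not selected.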