---
title: Integrating MinerU PDF Parsing
description: Use MinerU to parse PDF documents with image extraction, layout recognition, table recognition, and formula recognition
---

## Background

PDF is a relatively complex file format. FastGPT's built-in PDF parser relies on the pdfjs library, which parses the file's logical structure and handles complex PDFs poorly. Results are often poor when a PDF contains images, tables, formulas, or other non-plain-text content.

Several PDF parsing solutions are available. [MinerU](https://github.com/opendatalab/MinerU) uses YOLO, PaddleOCR, and table recognition models for vision-based parsing, which lets it extract images, tables, formulas, and other complex content effectively.

Community edition users can add the `systemEnv.customPdfParse` configuration to `config.json` to use MinerU for PDF parsing. Commercial edition users can configure it through a form in the Admin panel. The tutorial below covers both.

## Tutorial

Hardware requirements: 16GB+ GPU VRAM and at least 16GB RAM (32GB+ recommended). See the [official page](https://github.com/opendatalab/MinerU) for other requirements.

### 1. Install MinerU

Quick installation with Docker: pull the fastgpt-mineru image, create and start the parsing service container, then add the deployed URL to the FastGPT configuration file.

```bash
# Pull the fastgpt-mineru image
docker pull crpi-h3snc261q1dosroc.cn-hangzhou.personal.cr.aliyuncs.com/fastgpt_ck/mineru:v1

# Create and start the parsing service container (host port 7231 -> container port 8001)
docker run --gpus all -itd -p 7231:8001 --name mode_pdf_minerU crpi-h3snc261q1dosroc.cn-hangzhou.personal.cr.aliyuncs.com/fastgpt_ck/mineru:v1
```

This MinerU integration runs in pipeline mode, with parallelization built into the Docker container. It creates multiple processes based on the number of GPUs so that uploaded PDFs are handled concurrently.

### 2. Add FastGPT Configuration

```json
{
  xxx
  "systemEnv": {
    xxx
    "customPdfParse": {
      "url": "http://xxxx.com/v2/parse/file", // Custom PDF parsing service URL for MinerU
      "key": "", // Custom PDF parsing service key
      "doc2xKey": "", // doc2x service key
      "price": 0 // PDF parsing service price
    }
  }
}
```

For the commercial edition, configure as shown below:

![alt text](/imgs/mineru6.png)

**Note:** Services added via the configuration file take effect only after a restart.

### 3. Test

Upload a PDF file through the Knowledge Base and enable the `Enhanced PDF Parsing` option.

![alt text](/imgs/mineru1.png)

After uploading, you should see logs like the following (`LOG_LEVEL` must be set to `info` or `debug`):

```
[Info] 2024-12-05 15:04:42 Parsing files from an external service
[Info] 2024-12-05 15:07:08 Custom file parsing is complete, time: 1316ms
```

Similarly, in apps you can enable `Enhanced PDF Parsing` in the file upload settings.

![alt text](/imgs/mineru2.png)

## Results

Using Tsinghua's [ChatDev Communicative Agents for Software Develop.pdf](https://arxiv.org/abs/2307.07924) as an example:

|                                  |                                  |                                  |
| -------------------------------- | -------------------------------- | -------------------------------- |
| ![alt text](/imgs/mineru3-1.png) | ![alt text](/imgs/mineru4-1.png) | ![alt text](/imgs/mineru5-1.png) |
| ![alt text](/imgs/mineru3.png)   | ![alt text](/imgs/mineru4.png)   | ![alt text](/imgs/mineru5.png)   |

The top row shows the chunked results; the bottom row shows the original PDF. Images, formulas, and handwriting (via OCR) are all extracted effectively.

Note that [MinerU](https://github.com/opendatalab/MinerU) is licensed under the `GPL-3.0` license. Please ensure compliance with the license when using it.
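
As a supplement to step 2 of the tutorial, the sketch below shows one way the `systemEnv.customPdfParse` block could be filled in programmatically. It is a minimal illustration, not part of FastGPT or MinerU: the helper name `with_custom_pdf_parse` and the key `someExistingSetting` are hypothetical, and the service URL is assumed to combine the host port from the `docker run` command (`7231`) with the `/v2/parse/file` path from the configuration example. The function works on an already-parsed dictionary, since a `config.json` that contains `//` comments cannot be read with Python's standard `json` module.

```python
import json
from copy import deepcopy


def with_custom_pdf_parse(config: dict, host: str, port: int = 7231,
                          key: str = "", price: float = 0) -> dict:
    """Return a copy of `config` with systemEnv.customPdfParse filled in.

    Hypothetical helper: the URL layout (host port 7231 plus the
    /v2/parse/file path) is an assumption based on the tutorial's examples.
    """
    updated = deepcopy(config)  # leave the caller's dictionary untouched
    system_env = updated.setdefault("systemEnv", {})
    current = system_env.get("customPdfParse", {})
    system_env["customPdfParse"] = {
        "url": f"http://{host}:{port}/v2/parse/file",
        "key": key,
        "doc2xKey": current.get("doc2xKey", ""),  # keep an existing doc2x key
        "price": price,
    }
    return updated


if __name__ == "__main__":
    # `someExistingSetting` stands in for whatever the config already holds.
    existing = {"systemEnv": {"someExistingSetting": 1}}
    print(json.dumps(with_custom_pdf_parse(existing, "192.168.1.10"), indent=2))
```

The helper returns a new dictionary rather than editing in place, so the existing settings can be compared against the result before anything is written back. As noted in step 2, a change made through the configuration file still requires a restart to take effect.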