Document Loader MCP Server

Model Context Protocol (MCP) server for document parsing and content extraction

This MCP server provides tools to parse and extract content from various document formats including PDF, Word documents, Excel spreadsheets, PowerPoint presentations, and images.

Features

PDF Text Extraction: Extract text content from PDF files using pdfplumber
Word Document Processing: Convert DOCX/DOC files to markdown using markitdown
Excel Spreadsheet Reading: Parse XLSX/XLS files and convert to markdown
PowerPoint Presentation Processing: Extract content from PPTX/PPT files
Image Loading: Load and display various image formats (PNG, JPG, GIF, BMP, TIFF, WEBP)
Slide Image Extraction: Extract individual slides/pages as PNG images from PPTX, PPT, or PDF files using LibreOffice and poppler

Prerequisites

Installation Requirements

Install uv from Astral or the GitHub README
Install Python 3.10 or newer, for example with uv python install 3.10

Optional: Slide Image Extraction

The extract_slides_as_images tool requires external system packages:

LibreOffice (for PPTX/PPT → PDF conversion):
- Ubuntu/Debian: sudo apt install libreoffice
- macOS: brew install --cask libreoffice
- Windows: Download from libreoffice.org
poppler-utils (for PDF → image rendering):
- Ubuntu/Debian: sudo apt install poppler-utils
- macOS: brew install poppler
- Windows: Download from GitHub and add to PATH
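
Before calling extract_slides_as_images, it can help to confirm that both system packages are reachable on PATH. The following is a minimal sketch, not part of this server. The executable names are assumptions: LibreOffice usually installs soffice (some Linux packages also provide libreoffice), and poppler-utils provides pdftoppm. The find_slide_tools helper is hypothetical.

```python
import shutil

# Assumed executable names for each optional dependency (see note above).
SLIDE_TOOL_CANDIDATES = {
    "LibreOffice": ("soffice", "libreoffice"),
    "poppler-utils": ("pdftoppm",),
}


def find_slide_tools(which=shutil.which):
    """Map each dependency to the first executable found on PATH, or None."""
    found = {}
    for dependency, executables in SLIDE_TOOL_CANDIDATES.items():
        found[dependency] = next(
            (path for path in map(which, executables) if path), None
        )
    return found


if __name__ == "__main__":
    for dependency, path in find_slide_tools().items():
        print(f"{dependency}: {path or 'not found on PATH'}")
```

The which parameter exists only so the lookup can be replaced in tests; in normal use the default shutil.which is enough.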

Installation

Configure the MCP server in your MCP client configuration:

{
  "mcpServers": {
    "awslabs.document-loader-mcp-server": {
      "command": "uvx",
      "args": ["awslabs.document-loader-mcp-server@latest"],
      "env": {
        "FASTMCP_LOG_LEVEL": "ERROR"
      },
      "disabled": false,
      "autoApprove": []
    }
  }
}

For Kiro MCP configuration, see the Kiro IDE documentation or the Kiro CLI documentation for details.

For global configuration, edit ~/.kiro/settings/mcp.json. For project-specific configuration, edit .kiro/settings/mcp.json in your project directory.
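
If an mcp.json file already lists other servers, the entry above has to be merged in rather than pasted over the file. A minimal sketch of that merge, assuming the file is valid JSON with a top-level mcpServers object; add_server and update_config_file are hypothetical helpers, not part of this package.

```python
import json
from pathlib import Path

SERVER_NAME = "awslabs.document-loader-mcp-server"
# Same entry as the JSON configuration shown above.
SERVER_ENTRY = {
    "command": "uvx",
    "args": ["awslabs.document-loader-mcp-server@latest"],
    "env": {"FASTMCP_LOG_LEVEL": "ERROR"},
    "disabled": False,
    "autoApprove": [],
}


def add_server(config: dict) -> dict:
    """Return a copy of config with the document loader entry added or replaced."""
    servers = dict(config.get("mcpServers", {}))
    servers[SERVER_NAME] = SERVER_ENTRY
    return {**config, "mcpServers": servers}


def update_config_file(path: Path) -> None:
    """Hypothetical helper: read, update, and rewrite one mcp.json file."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_server(config), indent=2) + "\n")


# Example (global Kiro configuration):
# update_config_file(Path.home() / ".kiro" / "settings" / "mcp.json")
```

Other servers and unrelated top-level keys are left untouched; only the document loader entry is added or overwritten.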

Available Tools

read_document: Extract content from various document formats by specifying file_path and file_type ('pdf', 'docx', 'doc', 'xlsx', 'xls', 'pptx', 'ppt')
read_image: Load image files for LLM viewing and analysis
extract_slides_as_images: Extract slides/pages as individual PNG images from PPTX, PPT, or PDF files. Requires LibreOffice (for PPTX/PPT) and poppler-utils (for PDF-to-image rendering)
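
An MCP client normally constructs tool calls for you, but it can be useful to see what a read_document request looks like on the wire. The sketch below builds the JSON-RPC 2.0 tools/call message defined by the MCP specification, using the file_path and file_type arguments listed above. Inferring file_type from the file extension is an assumption of this example, and build_read_document_call and the sample path are hypothetical.

```python
import json
from pathlib import Path

# File types listed for read_document in this README.
SUPPORTED_TYPES = {"pdf", "docx", "doc", "xlsx", "xls", "pptx", "ppt"}


def build_read_document_call(file_path: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 tools/call request for read_document."""
    # Assumption: the extension is a reliable indicator of the file type.
    file_type = Path(file_path).suffix.lstrip(".").lower()
    if file_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported file type: {file_type!r}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "read_document",
            "arguments": {"file_path": file_path, "file_type": file_type},
        },
    }


# Hypothetical path, used only to show the message shape.
print(json.dumps(build_read_document_call("reports/q3.pdf"), indent=2))
```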

Environment Variables

FASTMCP_LOG_LEVEL: Set logging level (ERROR, INFO, DEBUG)
MAX_FILE_SIZE_MB: Maximum allowed file size in megabytes (default: 50). Must be a positive integer.
DOCUMENT_BASE_DIR: Base directory for file access security. Restricts document loading to files within this directory. Defaults to the current working directory.
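
To illustrate how MAX_FILE_SIZE_MB and DOCUMENT_BASE_DIR could interact, here is a minimal sketch of a path and size check. It is not the server's actual implementation: the function names are hypothetical, and treating one megabyte as 1024 * 1024 bytes is an assumption.

```python
import os
from pathlib import Path


def max_file_size_bytes(env=os.environ) -> int:
    """Read MAX_FILE_SIZE_MB (default 50) and convert it to bytes."""
    raw = env.get("MAX_FILE_SIZE_MB", "50")
    try:
        megabytes = int(raw)
    except ValueError:
        megabytes = 0
    if megabytes <= 0:
        raise ValueError("MAX_FILE_SIZE_MB must be a positive integer")
    return megabytes * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def resolve_allowed_path(file_path: str, env=os.environ) -> Path:
    """Resolve file_path and reject it if it falls outside DOCUMENT_BASE_DIR."""
    base = Path(env.get("DOCUMENT_BASE_DIR", os.getcwd())).resolve()
    candidate = (base / file_path).resolve()  # normalizes ".." and symlinks
    if not candidate.is_relative_to(base):  # requires Python 3.9+
        raise PermissionError(f"{candidate} is outside {base}")
    if candidate.is_file() and candidate.stat().st_size > max_file_size_bytes(env):
        raise ValueError(f"{candidate} exceeds MAX_FILE_SIZE_MB")
    return candidate
```

Resolving the path before comparing it with the base directory is what stops inputs such as ../secrets.pdf from escaping the allowed tree.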

Development

Setup

# Clone the repository
git clone https://github.com/awslabs/mcp.git
cd mcp/src/document-loader-mcp-server

# Install dependencies
uv sync

# Install in development mode
uv pip install -e .

Testing

# Run tests
uv run pytest

# Run with coverage
uv run pytest --cov=awslabs.document_loader_mcp_server

The test suite includes:

Server functionality validation
Document parsing tests with generated sample files
Error handling verification

Sample Documents

The test suite automatically generates sample documents for testing:

PDF with multi-page content
DOCX with formatted text and lists
XLSX with multiple sheets and data
PPTX with slides and content
Various image formats

Docker

You can also run this server in a Docker container:

docker build -t document-loader-mcp-server .
docker run -p 8000:8000 document-loader-mcp-server

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Contributing

We welcome contributions! Please see CONTRIBUTING.md for details.

Support

For issues and questions, please use the GitHub issue tracker.
