
@rusackas (Member) commented Dec 23, 2025

User description

Summary

This PR introduces an automated database documentation system that generates documentation pages from the DATABASE_DOCS dictionary in superset/db_engine_specs/lib.py.

Key Features

  • Single source of truth: All database documentation lives in lib.py's DATABASE_DOCS dict
  • Auto-generated pages: Individual MDX pages for each database with connection strings, drivers, auth methods
  • Overview table: Searchable/filterable table showing all databases with scores, features, time grains
  • Compatible databases: PostgreSQL-compatible databases (YugabyteDB, TimescaleDB, etc.) appear in the overview table
  • README sync: the script can update the database logos in the main README.md via yarn update:readme-db-logos
  • Promoted navigation: the Databases section moved to the top-level nav (no longer under Configuration)

Changes

  • superset/db_engine_specs/lib.py: Added logo, homepage_url, docs_url fields to DATABASE_DOCS
  • docs/scripts/generate-database-docs.mjs: New script to generate JSON + MDX from lib.py
  • docs/src/components/databases/: React components for index table and detail pages
  • docs/docs/databases/: Auto-generated MDX pages (53 databases)
  • README.md: Database logos now auto-generated between marker comments
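For orientation, here is a minimal Python sketch of the data flow the Changes list describes: one dict entry per database, serialized to JSON that the docs components read. Only the logo, homepage_url and docs_url field names come from this PR; the other keys and all values are hypothetical placeholders rather than the real schema in lib.py, and the real serializer is the JavaScript generator (generate-database-docs.mjs), not this function.

```python
import json

# Hypothetical DATABASE_DOCS entry. "logo", "homepage_url" and "docs_url"
# are field names mentioned in the PR; the other keys and all values
# are illustrative placeholders, not the actual schema in lib.py.
DATABASE_DOCS = {
    "PostgreSQL": {
        "logo": "postgresql.svg",  # hypothetical filename
        "homepage_url": "https://www.postgresql.org/",
        "docs_url": "https://www.postgresql.org/docs/",
        "drivers": ["psycopg2"],  # hypothetical key
        "connection_string": "postgresql://{user}:{password}@{host}:{port}/{database}",  # hypothetical key
    },
}


def to_json(docs: dict) -> str:
    """Serialize the docs dict for the docs site to consume.

    sort_keys (applied recursively) makes the output deterministic, so
    regenerating from an unchanged dict yields a byte-identical file.
    """
    return json.dumps(docs, indent=2, sort_keys=True) + "\n"
```

Deterministic output matters for a generated file that is rebuilt on every docs start/build: key order in the source dict then never shows up as a diff.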

Usage

# Regenerate docs (runs automatically on build)
cd docs && yarn generate:database-docs

# Update README logos (manual, opt-in)
cd docs && yarn update:readme-db-logos

Screenshots

Database overview table with search, filters, and feature badges.
Individual database pages with logos, connection strings, and driver info.

Test Plan

  • yarn build succeeds in docs/
  • Database pages render correctly at /docs/databases/
  • Compatible databases appear in overview table
  • Scores and time grains display correctly
  • README update is opt-in (doesn't break CI)

🤖 Generated with Claude Code


CodeAnt-AI Description

Auto-generate database documentation pages and index from engine specs

What Changed

  • Documentation for 50+ databases is now produced from a single source (DATABASE_DOCS in superset/db_engine_specs/lib.py) and exposed as a filterable index and individual pages showing logos, connection strings, drivers, authentication methods, engine parameters, supported features, and compatible databases.
  • A new generator script is run during docs start/build (yarn generate:database-docs) to produce JSON/MDX data used by the docs site; individual MDX pages and a searchable table component render the generated content.
  • The site homepage database grid now links directly to each database's documentation page. Sidebars and README links were updated to point to the new /docs/databases location, and README logos are generated from the same data.

Impact

✅ Clearer database pages with connection strings, drivers, and auth methods
✅ Simpler documentation updates via a single source of truth (DATABASE_DOCS)
✅ Direct database documentation links from the homepage


rusackas and others added 7 commits December 21, 2025 10:46
Rebuild the database documentation system so that lib.py is the
single source of truth. The script outputs JSON that React components
consume to render the documentation pages.

Changes:
- Add comprehensive DATABASE_DOCS dictionary to lib.py with 53 databases
- Create generate-database-docs.mjs build script
- Create DatabaseIndex and DatabasePage React components
- Replace 1900 lines of manual markdown with component-based rendering
- Integrate into docs build pipeline (yarn start/build)

To update documentation, just update DATABASE_DOCS in lib.py.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Run the diagnostic tests within a Flask app context to get actual feature
scores for each database engine spec.

Top scores:
- Presto: 159/201
- Trino: 149/201
- Apache Hive/Spark: 140/201
- PostgreSQL: 104/201

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add compatible databases (YugabyteDB, TimescaleDB, Hologres) to the
overview table with a link to their parent database's documentation.

Compatible DBs show a "PostgreSQL compatible" tag and inherit feature
scores from their parent.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
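The row-flattening rule in the commit above (compatible databases get their own table row, a "PostgreSQL compatible" tag, and the parent's feature score) can be sketched in Python as follows. The compatible_databases key, the Row type and both function names are hypothetical; the actual logic lives in the JavaScript generator and React components.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Row:
    name: str
    score: int
    compatible_with: Optional[str] = None  # parent engine, for compatible DBs


def build_rows(docs: Dict[str, dict], scores: Dict[str, int]) -> List[Row]:
    """Flatten engines and their compatible databases into table rows."""
    rows: List[Row] = []
    for engine, info in docs.items():
        parent_score = scores.get(engine, 0)
        rows.append(Row(engine, parent_score))
        # Hypothetical key: compatible DBs have no engine spec of their own,
        # so they reuse the parent's diagnostic score.
        for compat in info.get("compatible_databases", []):
            rows.append(Row(compat, parent_score, compatible_with=engine))
    return rows


def tag(row: Row) -> str:
    """Badge text such as "PostgreSQL compatible"; empty for native engines."""
    return f"{row.compatible_with} compatible" if row.compatible_with else ""
```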
- Update generate-database-docs.mjs to create individual MDX files
- Each database now has its own page at /docs/configuration/databases/{slug}
- Overview page at /docs/configuration/databases/ with filterable table
- Fix category counts in filter dropdown
- Links in table now point to individual pages
- Use cached databases.json when it has full diagnostic data

Generated 64 database pages + index page.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
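Giving each database its own page at /docs/configuration/databases/{slug} needs a stable name-to-slug mapping. The PR does not show its slug rule, so this Python sketch assumes a common one (lowercase, runs of non-alphanumerics collapsed to a single hyphen) and fails loudly if two names map to the same URL, since one generated page would otherwise silently overwrite another. slugify and build_pages are hypothetical names.

```python
import re
from typing import Dict, Iterable


def slugify(name: str) -> str:
    """Assumed slug rule: lowercase, non-alphanumeric runs -> single hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def build_pages(
    names: Iterable[str], base: str = "/docs/configuration/databases"
) -> Dict[str, str]:
    """Map each page URL to its database name, rejecting slug collisions."""
    pages: Dict[str, str] = {}
    for name in names:
        url = f"{base}/{slugify(name)}"
        if url in pages:
            raise ValueError(f"{name!r} and {pages[url]!r} both map to {url}")
        pages[url] = name
    return pages
```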
- Move Databases section from Configuration to top-level navigation
- Add Databases to Documentation dropdown menu in navbar
- Set "Next" version as default documentation version
- Improve database page layout with larger logos (height: 120)
- Hide duplicate H1 headings via hide_title frontmatter
- Fix diagnostics preservation in fallback mode when the Flask context is unavailable
- Add logos and homepage URLs to DATABASE_DOCS in lib.py
- Show compatible databases (e.g., YugabyteDB) in overview table
- Dynamically generate front page database grid from databases.json

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
The generate-database-docs script now updates the main README.md
with database logos between marker comments:
- <!-- SUPPORTED_DATABASES_START -->
- <!-- SUPPORTED_DATABASES_END -->

This ensures the README stays in sync with DATABASE_DOCS in lib.py.
Also updated docs links to point to new /docs/databases path.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
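The marker-comment approach from the commit above amounts to replacing whatever sits between the two comments while leaving the rest of README.md untouched. A minimal Python sketch (the real script is JavaScript; replace_between_markers is a hypothetical name, and rendering the logo HTML itself is left out):

```python
START = "<!-- SUPPORTED_DATABASES_START -->"
END = "<!-- SUPPORTED_DATABASES_END -->"


def replace_between_markers(text: str, new_body: str) -> str:
    """Swap the content between START and END, keeping both markers."""
    start = text.find(START)
    end = text.find(END)
    if start == -1 or end == -1 or end < start:
        raise ValueError("README markers missing or out of order")
    head = text[: start + len(START)]  # everything up to and including START
    tail = text[end:]  # END marker and everything after it
    return f"{head}\n{new_body}\n{tail}"
```

Running it twice with the same body is a no-op, so regenerating an up-to-date README produces no diff.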
The generate-database-docs script now only updates README.md when
explicitly requested via:
- --update-readme flag
- UPDATE_README=true env var

Added npm script: yarn update:readme-db-logos

This prevents CI from failing due to uncommitted README changes
during docs builds.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
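The opt-in rule from the commit above (touch README.md only when --update-readme is passed or UPDATE_README=true is set) reduces to a small predicate. This Python sketch assumes the env var must be exactly "true"; the actual .mjs check may be looser. should_update_readme is a hypothetical name, and argv/env are injectable so the rule can be tested without touching the real process state.

```python
import os
import sys
from typing import Mapping, Optional, Sequence


def should_update_readme(
    argv: Optional[Sequence[str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Return True only on explicit opt-in, so plain docs builds never touch README.md."""
    argv = sys.argv[1:] if argv is None else argv
    env = os.environ if env is None else env
    # Assumption: the env var must be exactly "true" (case-sensitive).
    return "--update-readme" in argv or env.get("UPDATE_README") == "true"
```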
@github-actions github-actions bot added doc Namespace | Anything related to documentation and removed size/XXL labels Dec 23, 2025
@codeant-ai-for-open-source codeant-ai-for-open-source bot added the size:XXL This PR changes 1000+ lines, ignoring generated files label Dec 23, 2025
rusackas and others added 2 commits December 23, 2025 11:05
Fixes CodeQL security alert: incomplete string escaping.
Backslashes must be escaped before quotes to prevent
malformed YAML frontmatter in generated MDX files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
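The ordering rule in this commit is easy to get wrong, so here it is concretely in Python. For a double-quoted YAML scalar, backslashes must be doubled before quotes are escaped; doing it the other way round doubles the backslash that was just inserted in front of each quote, leaving the quote unescaped. Function names are hypothetical, and control characters such as newlines are out of scope for this sketch.

```python
def escape_yaml_double_quoted(value: str) -> str:
    """Quote value for YAML frontmatter, e.g. title: "..."."""
    # Order matters: double existing backslashes first, then escape quotes.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def escape_wrong_order(value: str) -> str:
    # Buggy variant kept for comparison: the backslash inserted before
    # each quote is itself doubled, so the quote ends the string early.
    escaped = value.replace('"', '\\"').replace("\\", "\\\\")
    return f'"{escaped}"'
```

For backslash and double quote, YAML double-quoted scalars use the same escapes as JSON strings, so json.loads(escape_yaml_double_quoted(s)) == s is a quick round-trip check.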
Fixes CodeQL security warning about shell commands built from
environment values. Now uses spawnSync with:
- cwd option instead of cd in shell command
- env option for environment variables
- arguments passed as array (no shell parsing)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
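The spawnSync pattern described above (argument array, cwd and env options, no shell) has a direct Python analogue in subprocess.run, sketched here for illustration only; the actual fix is in the JavaScript generator. run_extractor is a hypothetical helper. Because the command is an argument list and shell stays at its default of False, values containing shell metacharacters reach the child process verbatim instead of being interpreted.

```python
import os
import subprocess
import sys
from typing import Mapping, Optional, Sequence


def run_extractor(
    script_path: str,
    cwd: str,
    args: Sequence[str] = (),
    extra_env: Optional[Mapping[str, str]] = None,
) -> str:
    """Run a Python script without a shell and return its stdout."""
    # env option instead of inline VAR=... in a shell string
    env = {**os.environ, **(extra_env or {})}
    result = subprocess.run(
        [sys.executable, script_path, *args],  # argv list: no shell parsing
        cwd=cwd,  # replaces "cd <dir> && ..." in a shell command
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on non-zero exit
    )
    return result.stdout
```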
rusackas and others added 2 commits December 23, 2025 11:10
Security fix for CodeQL warning about shell command injection.
Converted extractDatabaseDocs() and extractDatabaseDocsSimple()
to use spawnSync with cwd option instead of execSync with shell
string interpolation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Compatible databases can share the same name across multiple parent
engines. Using only the name as rowKey leads to duplicate React keys.
Fixed by combining parent engine name with database name for compatible
database entries.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
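A Python sketch of the key fix in this commit: native engines keep their name as the key, while compatible entries are prefixed with their parent engine, so the same compatible name under two parents no longer collides. The ":" separator, the function names, and the EngineA/EngineB/SharedDB names in the test are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def row_key(name: str, parent: Optional[str] = None) -> str:
    """Key for a table row: parent-qualified for compatible databases."""
    return f"{parent}:{name}" if parent else name


def check_unique(rows: Iterable[Tuple[str, Optional[str]]]) -> List[str]:
    """Return all row keys, raising if any two rows would share a key."""
    keys: List[str] = []
    seen = set()
    for name, parent in rows:
        key = row_key(name, parent)
        if key in seen:
            raise ValueError(f"duplicate row key: {key}")
        seen.add(key)
        keys.append(key)
    return keys
```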
rusackas and others added 3 commits December 23, 2025 11:12
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
The language prop was defined but never used since CodeBlock
doesn't implement syntax highlighting.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
The fallback extraction was importing from superset, which requires
werkzeug and other dependencies not available in CI. Fixed by using
pure Python AST parsing to extract DATABASE_DOCS directly from lib.py
without any imports.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
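The import-free fallback in this commit relies on parsing lib.py rather than executing it. A minimal Python sketch with the standard ast module: find the module-level DATABASE_DOCS assignment and evaluate its value with ast.literal_eval, which accepts only literals. This assumes DATABASE_DOCS is written as a pure literal; a function call or variable reference inside it would make literal_eval raise ValueError. extract_literal is a hypothetical name.

```python
import ast


def extract_literal(source: str, name: str = "DATABASE_DOCS"):
    """Return the literal value assigned to `name` at module level, without importing."""
    tree = ast.parse(source)  # parses only; import statements are never executed
    for node in tree.body:
        if isinstance(node, ast.Assign):
            targets, value = node.targets, node.value
        elif isinstance(node, ast.AnnAssign) and node.value is not None:
            targets, value = [node.target], node.value  # e.g. DATABASE_DOCS: dict = {...}
        else:
            continue
        for target in targets:
            if isinstance(target, ast.Name) and target.id == name:
                return ast.literal_eval(value)
    raise KeyError(f"{name} not found at module level")
```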
rusackas and others added 2 commits December 23, 2025 11:29
JPG doesn't support transparency, causing ugly background artifacts.
Updated lib.py to use sqlite.png instead of sqlite.jpg.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
rusackas and others added 3 commits December 23, 2025 15:16
- Upgrade MySQL and SAP HANA from JPG to PNG for transparency
- Add logo and homepage_url to IBM Db2 entry
- Remove 21 obsolete/duplicate logo files:
  - Databases without engine specs (Greenplum, MonetDB, Sybase)
  - Duplicate formats (JPG when PNG/SVG exists)
  - Typos (google-biquery.png)
  - Unused alternatives (trino2.jpg, presto.png, etc.)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
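The "duplicate formats" cleanup rule above (drop a JPG logo when a PNG or SVG of the same name exists) can be expressed as a small filter. This Python sketch compares filenames only; redundant_jpgs is a hypothetical name, and the file list in the test is illustrative, not the repository's actual contents.

```python
from collections import defaultdict
from pathlib import PurePath
from typing import Dict, Iterable, List, Set

JPG_EXTS = {".jpg", ".jpeg"}
PREFERRED = {".png", ".svg"}  # formats that support transparency


def redundant_jpgs(filenames: Iterable[str]) -> List[str]:
    """Return JPG logos whose stem also exists as a PNG or SVG."""
    names = list(filenames)
    formats: Dict[str, Set[str]] = defaultdict(set)
    for name in names:
        path = PurePath(name)
        formats[path.stem].add(path.suffix.lower())
    return sorted(
        name
        for name in names
        if PurePath(name).suffix.lower() in JPG_EXTS
        and formats[PurePath(name).stem] & PREFERRED
    )
```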
Add engine specs and documentation for three databases with
existing SQLAlchemy dialects:

- Greenplum: PostgreSQL-based MPP database (extends PostgresEngineSpec)
- MonetDB: Column-oriented OLAP database with custom time grains
- SAP Sybase: Enterprise RDBMS using T-SQL (extends MssqlEngineSpec)

All three have active SQLAlchemy dialect packages:
- sqlalchemy-greenplum (0.2.1, Aug 2024)
- sqlalchemy-monetdb (1.0.0+)
- sqlalchemy-sybase (2.0.0)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Restored the logo files that were deleted earlier, now that these
databases have official engine specs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Added DATABASE_DOCS entries for databases that already have engine specs
but were missing documentation:

- Amazon DynamoDB: NoSQL with PartiQL SQL support (pydynamodb)
- MotherDuck: Serverless DuckDB cloud platform (duckdb-engine)
- IBM Db2 for i: AS/400 integrated database (sqlalchemy-ibmi)

Total databases: 59 (up from 56)

Sources:
- https://github.com/passren/PyDynamoDB
- https://motherduck.com/docs/integrations/language-apis-and-drivers/python/sqlalchemy/
- https://github.com/IBM/sqlalchemy-ibmi

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Labels

  • doc (Namespace | Anything related to documentation)
  • preset-io
  • size/XXL
  • size:XXL (This PR changes 1000+ lines, ignoring generated files)
