StarRocks Roadmap 2024 #39686

Open
@Dshadowzh

Refer to the 2023 and 2022 roadmaps.

Shared-data & StarOS

  • Align all functionality with shared-nothing
    • Sync materialized view
    • Generated column
    • Partial update with column mode
    • Optimize table and manual compaction
  • Better cache system
    • Multi-layer cache
    • Global cache
    • Cache warmup
    • Cache blacklist/whitelist
    • Refine evict algorithm
  • StarOS internal optimization
    • Multi-replicas for shard management
    • Shard schedule optimization for large scale (more than 10M shards)
    • Local storage for StarOS
  • Decoupled storage for FE (Finished design)
  • Open API for StarRocks table format (sink and source)
  • Time Travel
  • Backup support (Snapshot for shared-data #53999)

Performance

  • Full columnar JSON index (Flat JSON)
  • Cost model with primary key and foreign key constraints
  • Arm optimization for codecs
  • Adaptive DOP and adaptive query engine
  • Global dictionary encoding
  • Enhance IO schedule framework
  • JIT / Codegen
  • Fine-grained FE lock (from DB level to table level)

Easy to use

  • Online optimize table
  • List partition optimization
  • Arrow Flight interface (Support for Apache Arrow Flight SQL #22944)
  • Improve files table function
    • Improve schema inference
    • CSV and JSON format support
    • Other formats: Avro, Arrow, Protobuf
    • Better read performance, predicate pushdown
  • Insert statement improvement (on duplicate key, insert properties)
  • Unified data ingestion with Pipe
    • Pipe for continuous ingestion from Kafka
    • Read from external stream table (Kafka)
    • Continuous data ingestion from SQS with Pipe
  • Out-of-the-box parameters

Data lake analytics

  • Better file format support
    • Parquet reader tuning
    • ORC reader tuning
  • Better table format support
    Lake        Query  Insert  DDL  Update/Delete/Merge into  MV
    Hive        1.18   3.2     -    -                         2.5
    Iceberg     2.1    3.1     3.0  -                         -
    Hudi        2.2    -       -    -                         3.0
    Paimon      3.0    -       -    -                         3.2
    Delta lake  3.0    -       -    -                         3.2
  • Iceberg metadata optimization (Iceberg metadata super optimization #43460, 3.3)
  • Materialized view improvement
    • Improve partition mapping (list partition, expression partition) (3.4 unifies all partition methods)
    • Task scheduler DAG & Lineage
    • Better query rewrite
  • JDBC catalog improvement
  • Enhance JNI reader and implement JNI writer
  • Text file format support (basic CSV format in 3.3)
  • Presto/Trino/Spark/Hive SQL compatibility
  • Presto/Trino/Spark/Hive UDF compatibility
  • Automatic cooldown to lake format

Data warehousing (batch and streaming)

Batch processing & ETL improvement

Streaming processing & real-time update

  • Schemaless partial update
  • Merge into statement
  • Binlog to Flink and Spark Streaming
  • Transaction-level incremental refresh in materialized view (Aggregation, Join, functions)
  • Incremental refresh for Iceberg/Hudi/Paimon materialized view

All-in-one scenarios

Release
