Shared-data & StarOS
- Align all functionality with shared-nothing
- Sync materialized view
- Generated column
- Partial update with column mode
- Optimize table and manual compaction
- Better cache system
- Multi-layer cache
- Global cache
- Cache warmup
- Cache blacklist/whitelist
- Refine eviction algorithm
- StarOS internal optimization
- Multi-replicas for shard management
- Shard schedule optimization for large scale (more than 10M shards)
- Local storage for StarOS
- Decoupled storage for FE (Finished design)
- Open API for StarRocks table format (sink and source)
- Time Travel
- Backup support (Snapshot for shared-data #53999)
Performance
- Full columnar JSON index (Flat JSON)
- Cost model with primary key and foreign key constraints
- Arm optimization for codecs
- Adaptive DOP and adaptive query engine
- Global dictionary encoding
- Enhance IO schedule framework
- JIT / Codegen
- Fine-grained FE lock (from DB level to table level)
Easy to use
- Online optimize table
- List partition optimization
- Arrow Flight interface (Support for apache arrow flight SQL #22944)
- Improve `files` table function
- Improve schema inference
- CSV and JSON format support
- Other format: Avro, Arrow, Protobuf
- Better performance for read, predicates pushdown
- Insert statement improvement (on duplicate key, insert properties)
- Unified data ingestion with Pipe
- Pipe for continuous ingestion from Kafka
- Read from external stream table (Kafka)
- Continuous data ingestion from SQS with Pipe
- Out-of-the-box parameters
Data lake analytics
- Better file format support
- Parquet reader tuning
- ORC reader tuning
- Better table format support
| Lake | Query | Insert | DDL | Update/Delete/Merge into | MV |
|---|---|---|---|---|---|
| Hive | 1.18 | 3.2 | 2.5 | | |
| Iceberg | 2.1 | 3.1 | 3.0 | | |
| Hudi | 2.2 | 3.0 | | | |
| Paimon | 3.0 | 3.2 | | | |
| Delta lake | 3.0 | 3.2 | | | |
- Iceberg metadata optimization (Iceberg metadata super optimization #43460, 3.3)
- Materialized view improvement
- Improve partition mapping (list partition, expression partition) (3.4 unified all partition methods)
- Task scheduler DAG & Lineage
- Better query rewrite
- JDBC catalog improvement
- Enhance JNI reader and implement JNI writer
- Text file format support (basic CSV format, 3.3)
- Presto/Trino/Spark/Hive SQL compatibility
- Presto/Trino/Spark/Hive UDF compatibility
- Automatic cooldown to lake format
Data warehousing (batch and streaming)
Batch processing & ETL improvement
- Enable spilling to GA (3.3)
- Multi-statement transaction (Support Transaction Statement #53978) (3.5)
- Temporary table (3.3)
- Group execution ([Feature] support Group execution #42352) (3.3)
- Task auto retry
Streaming processing & real-time update
- Schemaless partial update
- Merge into statement
- Binlog to Flink and Spark Streaming
- Transaction level incremental refresh in materialized view (Aggregation, Join, functions)
- Incremental refresh for Iceberg/Hudi/Paimon materialized view
All-in-one scenarios
- Search: Optimize full text inverted index ([inverted_index](https://docs.starrocks.io/docs/table_design/indexes/inverted_index/))
- Row store: Optimize row store for high-concurrency point lookup (Hybrid row-column store)
- Time series DB: ASOF join, high-concurrency ingestion
- Vector database: vector index ([Feature] Support vector index and ANNS. #46678) (3.4)
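
For readers unfamiliar with the Asof join item in the time series entry, here is a minimal Python sketch of what such a join computes. It is not StarRocks syntax and says nothing about how StarRocks will implement it. The function name `asof_join` and the sample data are hypothetical, and the sketch assumes the common "backward" semantics, where each left row is matched with the most recent right row whose timestamp is less than or equal to its own.

```python
from bisect import bisect_right


def asof_join(left, right):
    """Match each left row with the latest right row at or before its timestamp.

    Both inputs are lists of (timestamp, payload) tuples.
    `right` must already be sorted by timestamp (an assumption of this sketch).
    """
    right_ts = [ts for ts, _ in right]
    joined = []
    for ts, payload in left:
        # Number of right rows whose timestamp is <= ts.
        i = bisect_right(right_ts, ts)
        # No earlier-or-equal right row means no match (None).
        match = right[i - 1][1] if i > 0 else None
        joined.append((ts, payload, match))
    return joined


# Hypothetical sample data: trades joined to the latest known quote.
trades = [(3, "buy"), (7, "sell"), (1, "buy")]
quotes = [(2, 100.0), (5, 101.5), (7, 102.0)]
result = asof_join(trades, quotes)
```

Sorting the right side once and using binary search keeps each lookup logarithmic, which is why this style of join suits large, time-ordered tables.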