StarRocks Roadmap 2024

> For earlier roadmaps, see [2023](https://github.com/StarRocks/starrocks/issues/16445) and [2022](https://github.com/StarRocks/starrocks/issues/1244).

# Shared-data & StarOS

- Align all functionality with shared-nothing
    - [x] Sync materialized view
    - [x] Generated column
    - [x] Partial update with column mode
    - [x] Optimize table and manual compaction
- Better cache system 
    - [x] Multi-layer cache
    - [ ] Global cache
    - [x] [Cache warmup](https://docs.starrocks.io/docs/data_source/data_cache_warmup/)
    - [x] Cache blacklist/~~whitelist~~
    - [x] Refine eviction algorithm
- StarOS internal optimization
    - [x] Multi-replicas for shard management
    - [ ] Shard scheduling optimization at large scale (more than 10M shards)
    - [ ] Local storage for StarOS
- [ ] Decoupled storage for FE (design finished)
- [ ] Open API for StarRocks table format  (sink and source)
- [ ] Time Travel
- [ ] Backup support https://github.com/StarRocks/starrocks/issues/53999
   
# Performance 
- [x] Full columnar JSON index [Flat JSON](https://docs.starrocks.io/docs/using_starrocks/Flat_json/)
- [x] Cost model with primary key and foreign key constraints
- [x] ARM optimization for codecs
- [ ] Adaptive DOP and adaptive query engine
- [x] Global dictionary encoding 
- [ ] Enhance I/O scheduling framework
- [x] JIT / Codegen
- [x] Fine-grained FE lock (from database level to table level)

# Easy to use
- [x] Online optimize table
- [x] List partition optimization
- [x] Arrow flight interface https://github.com/StarRocks/starrocks/issues/22944
- Improve `files` table function 
    - [x] Improve schema inference
    - [x] CSV and JSON format support
    - [ ] Other formats: Avro, Arrow, Protobuf
    - [ ] Better read performance, predicate pushdown
- Insert statement improvement (on duplicate key, insert properties)
- Unified data ingestion with Pipe
    - [ ] Pipe for continuous ingestion from Kafka
    - [ ] Read from external stream table (Kafka)
    - [ ] Continuous data ingestion from SQS with Pipe
- [ ] Out-of-the-box parameters

# Data lake analytics
- Better file format support
   - [x] Parquet reader tuning
   - [ ] ORC reader tuning
- Better table format support

| Lake | Query | Insert | DDL | Update/Delete/Merge into | MV |
| --- | --- | --- | --- | --- | --- |
| Hive | 1.18 | 3.2 |  |  | 2.5 |
| Iceberg | 2.1 | 3.1 |  |  | 3.0 |
| Hudi | 2.2 |  |  |  | 3.0 |
| Paimon | 3.0 |  |  |  | 3.2 |
| Delta Lake | 3.0 |  |  |  | 3.2 |

- [x] Iceberg metadata optimization https://github.com/StarRocks/starrocks/issues/43460 (3.3)
- Materialized view improvement 
  - [x] Improve partition mapping (list partition, expression partition) (3.4 unifies all partition methods)
  - [ ] Task scheduler DAG & Lineage
  - [x] Better query rewrite 
- [x] JDBC catalog improvement
- [ ] Enhance JNI reader and implement JNI writer
- [x] Text file format support (basic CSV format in 3.3)
- [ ] Presto/Trino/Spark/Hive SQL compatibility 
- [ ] Presto/Trino/Spark/Hive UDF compatibility
- [ ] Automatic cooldown to lake format

# Data warehousing (batch and streaming)
## Batch processing & ETL improvement 
- [x] Bring spilling to GA (3.3)
- [ ] Multi-statement transaction https://github.com/StarRocks/starrocks/issues/53978 (3.5)
- [x] Temporary table (3.3)
- [x] Group execution https://github.com/StarRocks/starrocks/pull/42352 (3.3)
- [ ] Task auto retry
## Streaming processing & real-time update
- [ ] Schemaless partial update
- [ ] Merge into statement
- [ ] Binlog to Flink and Spark Streaming
- [ ] Transaction-level incremental refresh in materialized views (Aggregation, Join, functions)
- [ ] Incremental refresh for Iceberg/Hudi/Paimon materialized views

# All-in-one scenarios
- [x] Search: Optimize full-text inverted index [inverted_index](https://docs.starrocks.io/docs/table_design/indexes/inverted_index/)
- [x] Row store: Optimize row store for high-concurrency point lookups [Hybrid row-column store](https://docs.starrocks.io/docs/table_design/hybrid_table/)
- [ ] Time-series database: ASOF join, high-concurrency ingestion
- [x] Vector database:  vector index https://github.com/StarRocks/starrocks/issues/46678 (3.4)


# Release 
- https://github.com/StarRocks/starrocks/issues/40907
- https://github.com/StarRocks/starrocks/issues/52573
