
[POC] Perform a Join flint based PPL query using OpenSearch indices #998


Description

@YANG-DB

Is your feature request related to a problem?
Create a POC that performs a Spark Flint-based join query over tables that are mapped to OpenSearch indices.
This will demonstrate how Spark can be leveraged to join OpenSearch indices using the Spark engine, without the need for the legacy OpenSearch-hadoop plugin.

What solution would you like?
Flint currently provides the following capabilities with respect to communicating with OpenSearch:

  • Use the OpenSearchCatalog class, which allows Spark to interact with OpenSearch indices as tables. It supports read and write operations, enabling seamless data processing and querying across Spark and OpenSearch.
// To configure and initialize the catalog in your Spark session, set the following configurations:

spark.conf.set("spark.sql.catalog.dev", "org.apache.spark.opensearch.catalog.OpenSearchCatalog")
spark.conf.set("spark.sql.catalog.dev.opensearch.port", "9200")
spark.conf.set("spark.sql.catalog.dev.opensearch.scheme", "http")
spark.conf.set("spark.sql.catalog.dev.opensearch.auth", "noauth")
val df = spark.sql("source=dev.default.customer | join ON c_custkey = o_custkey dev.default.orders | join ON c_nationkey = n_nationkey dev.default.nation | fields c_custkey, c_mktsegment, o_orderkey, o_orderstatus, o_totalprice, n_name | head 10")
...
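
For comparison, the same three-way join can also be expressed directly in Spark SQL against the catalog tables. The sketch below is illustrative only: it assumes the dev catalog is configured as above and that the customer, orders, and nation indices exist in OpenSearch with the TPC-H-style columns used in the PPL example.

// Spark SQL equivalent of the PPL join above (illustrative; table and column
// names are assumed to match the TPC-H-style schema used in the example).
val joined = spark.sql("""
  SELECT c.c_custkey, c.c_mktsegment,
         o.o_orderkey, o.o_orderstatus, o.o_totalprice,
         n.n_name
  FROM dev.default.customer c
  JOIN dev.default.orders   o ON c.c_custkey   = o.o_custkey
  JOIN dev.default.nation   n ON c.c_nationkey = n.n_nationkey
  LIMIT 10
""")
joined.show()

This goes through the same OpenSearchCatalog read path, so it can serve as a baseline to validate the results of the PPL join in the POC.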

Do you have any additional context?
