Is your feature request related to a problem?
Create a POC that performs a Spark (Flint) based join query over tables that are mapped to OpenSearch indices.
This will demonstrate how Spark can be leveraged to join OpenSearch indices using the Spark engine, without the need for the legacy OpenSearch-hadoop plugin.
What solution would you like?
Flint today has the following capabilities for communicating with OpenSearch:
- Use the `OpenSearchCatalog` class, which allows Spark to interact with OpenSearch indices as tables. It supports read and write operations, enabling seamless data processing and querying across Spark and OpenSearch.
To configure and initialize the catalog in your Spark session, set the following configurations:

```scala
spark.conf.set("spark.sql.catalog.dev", "org.apache.spark.opensearch.catalog.OpenSearchCatalog")
spark.conf.set("spark.sql.catalog.dev.opensearch.port", "9200")
spark.conf.set("spark.sql.catalog.dev.opensearch.scheme", "http")
spark.conf.set("spark.sql.catalog.dev.opensearch.auth", "noauth")

val df = spark.sql("source=dev.default.customer | join ON c_custkey = o_custkey dev.default.orders | join ON c_nationkey = n_nationkey dev.default.nation | fields c_custkey, c_mktsegment, o_orderkey, o_orderstatus, o_totalprice, n_name | head 10")
...
```
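For the POC itself, once the `dev` catalog is registered, the same three-way join could presumably also be expressed in plain Spark SQL against the catalog tables instead of PPL. The sketch below is a minimal illustration under that assumption; it reuses the `customer`, `orders`, and `nation` indices from the example above, and the `joined` variable name is purely illustrative.

```scala
// Sketch only: assumes the OpenSearchCatalog configuration shown above has
// already been applied to this SparkSession, and that the customer, orders,
// and nation indices exist in the cluster behind the dev catalog.
val joined = spark.sql("""
  SELECT c.c_custkey,
         c.c_mktsegment,
         o.o_orderkey,
         o.o_orderstatus,
         o.o_totalprice,
         n.n_name
  FROM dev.default.customer c
  JOIN dev.default.orders o ON c.c_custkey = o.o_custkey
  JOIN dev.default.nation n ON c.c_nationkey = n.n_nationkey
  LIMIT 10
""")

// The join itself is planned and executed by the Spark engine; OpenSearch only
// serves the per-index scans through the catalog's read path.
joined.show()
```

If the catalog's write support extends to Spark's DataFrameWriterV2 API, the result could in principle be persisted back to a (hypothetical) `joined_customers` index with something like `joined.writeTo("dev.default.joined_customers").createOrReplace()`, which would round out the read-and-write demonstration for the POC.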
Do you have any additional context?