Public Preview: This SDK is supported for production use cases and is available to all customers. Databricks is actively working on stabilizing the Zerobus Ingest SDK for Python. Minor version updates may include backwards-incompatible changes.
We welcome your feedback on this SDK. Please file issues, and we will address them.
The Databricks Zerobus Ingest SDK for Python provides a high-performance client for ingesting data directly into Databricks Delta tables using the Zerobus streaming protocol. | See also the SDK for Rust | See also the SDK for Java
- Disclaimer
- Features
- Requirements
- Quick Start User Guide
- Usage Examples
- Authentication
- Configuration
- Error Handling
- API Reference
- Best Practices
- High-throughput ingestion: Optimized for high-volume data ingestion
- Automatic recovery: Built-in retry and recovery mechanisms
- Flexible configuration: Customizable stream behavior and timeouts
- Multiple serialization formats: Support for JSON and Protocol Buffers
- OAuth 2.0 authentication: Secure authentication with client credentials
- Configurable TLS: Custom TLS configuration support for advanced security requirements
- Sync and Async support: Both synchronous and asynchronous APIs
- Comprehensive logging: Detailed logging using Python's standard logging framework
- Python: 3.9 or higher
- Databricks workspace with Zerobus access enabled
- protobuf >= 4.25.0, < 7.0
- grpcio >= 1.60.0, < 2.0
- requests >= 2.28.1, < 3
Before using the SDK, you'll need the following:
After logging into your Databricks workspace, look at the browser URL:
https://<databricks-instance>.cloud.databricks.com/o=<workspace-id>
- Workspace URL: The part before /o= → https://<databricks-instance>.cloud.databricks.com
- Workspace ID: The part after /o= → <workspace-id>
Note: The examples above show AWS endpoints (.cloud.databricks.com). For Azure deployments, the workspace URL will be https://<databricks-instance>.azuredatabricks.net.
Example:
- Full URL: https://dbc-a1b2c3d4-e5f6.cloud.databricks.com/o=1234567890123456
- Workspace URL: https://dbc-a1b2c3d4-e5f6.cloud.databricks.com
- Workspace ID: 1234567890123456
Create a table using Databricks SQL:
CREATE TABLE <catalog_name>.default.air_quality (
device_name STRING,
temp INT,
humidity BIGINT
)
USING DELTA;

Replace <catalog_name> with your catalog name (e.g., main).
- Navigate to Settings > Identity and Access in your Databricks workspace
- Click Service principals and create a new service principal
- Generate a new secret for the service principal and save it securely
- Grant the following permissions:
  - USE_CATALOG on the catalog (e.g., main)
  - USE_SCHEMA on the schema (e.g., default)
  - MODIFY and SELECT on the table (e.g., air_quality)
Grant permissions using SQL:
-- Grant catalog permission
GRANT USE CATALOG ON CATALOG <catalog_name> TO `<service-principal-application-id>`;
-- Grant schema permission
GRANT USE SCHEMA ON SCHEMA <catalog_name>.default TO `<service-principal-application-id>`;
-- Grant table permissions
GRANT SELECT, MODIFY ON TABLE <catalog_name>.default.air_quality TO `<service-principal-application-id>`;

Install the latest stable version using pip:

pip install databricks-zerobus-ingest-sdk

Alternatively, clone the repository and install from source:
git clone https://github.com/databricks/zerobus-sdk-py.git
cd zerobus-sdk-py
pip install -e .

The SDK supports two serialization formats:
- JSON - Simple, no schema compilation needed. Good for getting started.
- Protocol Buffers (default, to maintain backwards compatibility) - Strongly typed schemas; more efficient over the wire.
Synchronous Example:
import json
import logging
from zerobus.sdk.sync import ZerobusSdk
from zerobus.sdk.shared import RecordType, StreamConfigurationOptions, TableProperties
# Configure logging (optional but recommended)
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)
# Configuration
# For AWS:
server_endpoint = "1234567890123456.zerobus.us-west-2.cloud.databricks.com"
workspace_url = "https://dbc-a1b2c3d4-e5f6.cloud.databricks.com"
# For Azure:
# server_endpoint = "1234567890123456.zerobus.us-west-2.azuredatabricks.net"
# workspace_url = "https://dbc-a1b2c3d4-e5f6.azuredatabricks.net"
table_name = "main.default.air_quality"
client_id = "your-service-principal-application-id"
client_secret = "your-service-principal-secret"
# Initialize SDK
sdk = ZerobusSdk(server_endpoint, workspace_url)
# Configure table properties
table_properties = TableProperties(table_name)
# Configure stream with JSON record type
options = StreamConfigurationOptions(record_type=RecordType.JSON)
# Create stream
stream = sdk.create_stream(client_id, client_secret, table_properties, options)
try:
# Ingest records
for i in range(100):
# Create JSON record
record_dict = {
"device_name": f"sensor-{i % 10}",
"temp": 20 + (i % 15),
"humidity": 50 + (i % 40)
}
json_record = json.dumps(record_dict)
ack = stream.ingest_record(json_record)
ack.wait_for_ack() # Optional: Wait for durability confirmation
print(f"Ingested record {i + 1}")
print("Successfully ingested 100 records!")
finally:
    stream.close()

Asynchronous Example:
import asyncio
import json
import logging
from zerobus.sdk.aio import ZerobusSdk
from zerobus.sdk.shared import RecordType, StreamConfigurationOptions, TableProperties
# Configure logging (optional but recommended)
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)
async def main():
# Configuration
# For AWS:
server_endpoint = "1234567890123456.zerobus.us-west-2.cloud.databricks.com"
workspace_url = "https://dbc-a1b2c3d4-e5f6.cloud.databricks.com"
# For Azure:
# server_endpoint = "1234567890123456.zerobus.us-west-2.azuredatabricks.net"
# workspace_url = "https://dbc-a1b2c3d4-e5f6.azuredatabricks.net"
table_name = "main.default.air_quality"
client_id = "your-service-principal-application-id"
client_secret = "your-service-principal-secret"
# Initialize SDK
sdk = ZerobusSdk(server_endpoint, workspace_url)
# Configure table properties
table_properties = TableProperties(table_name)
# Configure stream with JSON record type
options = StreamConfigurationOptions(record_type=RecordType.JSON)
# Create stream
stream = await sdk.create_stream(client_id, client_secret, table_properties, options)
try:
# Ingest records
for i in range(100):
# Create JSON record
record_dict = {
"device_name": f"sensor-{i % 10}",
"temp": 20 + (i % 15),
"humidity": 50 + (i % 40)
}
json_record = json.dumps(record_dict)
future = await stream.ingest_record(json_record)
await future # Optional: Wait for durability confirmation
print(f"Ingested record {i + 1}")
print("Successfully ingested 100 records!")
finally:
await stream.close()
asyncio.run(main())

You'll need to define and compile a protobuf schema.
Create a file named record.proto:
syntax = "proto2";
message AirQuality {
optional string device_name = 1;
optional int32 temp = 2;
optional int64 humidity = 3;
}

Compile the protobuf:
pip install "grpcio-tools>=1.60.0,<2.0"
python -m grpc_tools.protoc --python_out=. --proto_path=. record.proto

This generates a record_pb2.py file compatible with protobuf 6.x.
Instead of manually writing your protobuf schema, you can automatically generate it from an existing Unity Catalog table using the included generate_proto.py tool.
Basic Usage:
python -m zerobus.tools.generate_proto \
--uc-endpoint "https://dbc-a1b2c3d4-e5f6.cloud.databricks.com" \
--client-id "your-service-principal-application-id" \
--client-secret "your-service-principal-secret" \
--table "main.default.air_quality" \
--output "record.proto" \
--proto-msg "AirQuality"

Parameters:
- --uc-endpoint: Your workspace URL (required)
- --client-id: Service principal application ID (required)
- --client-secret: Service principal secret (required)
- --table: Fully qualified table name in format catalog.schema.table (required)
- --output: Output path for the generated proto file (required)
- --proto-msg: Name of the protobuf message (optional, defaults to table name)
After generating, compile it as shown above.
Type Mappings:
| Delta Type | Proto2 Type |
|---|---|
| INT, SMALLINT, SHORT | int32 |
| BIGINT, LONG | int64 |
| FLOAT | float |
| DOUBLE | double |
| STRING, VARCHAR | string |
| BOOLEAN | bool |
| BINARY | bytes |
| DATE | int32 |
| TIMESTAMP | int64 |
| ARRAY<type> | repeated type |
| MAP<key, value> | map<key, value> |
| STRUCT<fields> | nested message |
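As an illustration of the collection and nested mappings, a message generated from a table with ARRAY, MAP, and STRUCT columns can be populated like this in Python (the module, message, and field names below are hypothetical, not actual generator output):

# Hypothetical module generated from a table with columns:
#   tags ARRAY<STRING>, attributes MAP<STRING, STRING>, location STRUCT<lat: DOUBLE, lon: DOUBLE>
import sensor_pb2  # module name is hypothetical

record = sensor_pb2.SensorReading(
    tags=["indoor", "calibrated"],                         # ARRAY<STRING>       -> repeated string
    attributes={"site": "hq", "floor": "3"},               # MAP<STRING, STRING> -> map<string, string>
    location=sensor_pb2.Location(lat=47.61, lon=-122.33),  # STRUCT<lat, lon>    -> nested message
)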
Synchronous Example:
import logging
from zerobus.sdk.sync import ZerobusSdk
from zerobus.sdk.shared import TableProperties
import record_pb2
# Configure logging (optional but recommended)
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)
# Configuration
# For AWS:
server_endpoint = "1234567890123456.zerobus.us-west-2.cloud.databricks.com"
workspace_url = "https://dbc-a1b2c3d4-e5f6.cloud.databricks.com"
# For Azure:
# server_endpoint = "1234567890123456.zerobus.us-west-2.azuredatabricks.net"
# workspace_url = "https://dbc-a1b2c3d4-e5f6.azuredatabricks.net"
table_name = "main.default.air_quality"
client_id = "your-service-principal-application-id"
client_secret = "your-service-principal-secret"
# Initialize SDK
sdk = ZerobusSdk(server_endpoint, workspace_url)
# Configure table properties with protobuf descriptor
table_properties = TableProperties(table_name, record_pb2.AirQuality.DESCRIPTOR)
# Create stream
stream = sdk.create_stream(client_id, client_secret, table_properties)
try:
# Ingest records
for i in range(100):
record = record_pb2.AirQuality(
device_name=f"sensor-{i % 10}",
temp=20 + (i % 15),
humidity=50 + (i % 40)
)
ack = stream.ingest_record(record)
ack.wait_for_ack() # Optional: Wait for durability confirmation
print(f"Ingested record {i + 1}")
print("Successfully ingested 100 records!")
finally:
    stream.close()

Asynchronous Example:
import asyncio
import logging
from zerobus.sdk.aio import ZerobusSdk
from zerobus.sdk.shared import TableProperties
import record_pb2
# Configure logging (optional but recommended)
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)
async def main():
# Configuration
# For AWS:
server_endpoint = "1234567890123456.zerobus.us-west-2.cloud.databricks.com"
workspace_url = "https://dbc-a1b2c3d4-e5f6.cloud.databricks.com"
# For Azure:
# server_endpoint = "1234567890123456.zerobus.us-west-2.azuredatabricks.net"
# workspace_url = "https://dbc-a1b2c3d4-e5f6.azuredatabricks.net"
table_name = "main.default.air_quality"
client_id = "your-service-principal-application-id"
client_secret = "your-service-principal-secret"
# Initialize SDK
sdk = ZerobusSdk(server_endpoint, workspace_url)
# Configure table properties with protobuf descriptor
table_properties = TableProperties(table_name, record_pb2.AirQuality.DESCRIPTOR)
# Create stream
stream = await sdk.create_stream(client_id, client_secret, table_properties)
try:
# Ingest records
for i in range(100):
record = record_pb2.AirQuality(
device_name=f"sensor-{i % 10}",
temp=20 + (i % 15),
humidity=50 + (i % 40)
)
future = await stream.ingest_record(record)
await future # Optional: Wait for durability confirmation
print(f"Ingested record {i + 1}")
print("Successfully ingested 100 records!")
finally:
await stream.close()
asyncio.run(main())

See the examples/ directory for complete, runnable examples in both JSON and protobuf formats (sync and async variants). See examples/README.md for detailed instructions.

Synchronous ingestion (JSON):
import json
import logging
from zerobus.sdk.sync import ZerobusSdk
from zerobus.sdk.shared import RecordType, StreamConfigurationOptions, TableProperties
logging.basicConfig(level=logging.INFO)
sdk = ZerobusSdk(server_endpoint, workspace_url)
table_properties = TableProperties(table_name)
options = StreamConfigurationOptions(record_type=RecordType.JSON)
stream = sdk.create_stream(client_id, client_secret, table_properties, options)
try:
for i in range(1000):
record_dict = {
"device_name": f"sensor-{i}",
"temp": 20 + i % 15,
"humidity": 50 + i % 40
}
json_record = json.dumps(record_dict)
ack = stream.ingest_record(json_record)
ack.wait_for_ack() # Optional: Wait for durability confirmation
finally:
    stream.close()

Asynchronous high-throughput ingestion (JSON):

import asyncio
import json
import logging
from zerobus.sdk.aio import ZerobusSdk
from zerobus.sdk.shared import RecordType, StreamConfigurationOptions, TableProperties
logging.basicConfig(level=logging.INFO)
async def main():
options = StreamConfigurationOptions(
record_type=RecordType.JSON,
max_inflight_records=50000,
ack_callback=lambda response: print(
f"Acknowledged offset: {response.durability_ack_up_to_offset}"
)
)
sdk = ZerobusSdk(server_endpoint, workspace_url)
table_properties = TableProperties(table_name)
stream = await sdk.create_stream(client_id, client_secret, table_properties, options)
futures = []
try:
for i in range(100000):
record_dict = {
"device_name": f"sensor-{i % 10}",
"temp": 20 + i % 15,
"humidity": 50 + i % 40
}
json_record = json.dumps(record_dict)
future = await stream.ingest_record(json_record)
futures.append(future)
await stream.flush()
await asyncio.gather(*futures)
finally:
await stream.close()
asyncio.run(main())

Synchronous ingestion (Protocol Buffers):

import logging
from zerobus.sdk.sync import ZerobusSdk
from zerobus.sdk.shared import TableProperties
import record_pb2
logging.basicConfig(level=logging.INFO)
sdk = ZerobusSdk(server_endpoint, workspace_url)
table_properties = TableProperties(table_name, record_pb2.AirQuality.DESCRIPTOR)
stream = sdk.create_stream(client_id, client_secret, table_properties)
try:
for i in range(1000):
record = record_pb2.AirQuality(
device_name=f"sensor-{i}",
temp=20 + i % 15,
humidity=50 + i % 40
)
ack = stream.ingest_record(record)
ack.wait_for_ack() # Optional: Wait for durability confirmation
finally:
    stream.close()

Asynchronous high-throughput ingestion (Protocol Buffers):

import asyncio
import logging
from zerobus.sdk.aio import ZerobusSdk
from zerobus.sdk.shared import TableProperties, StreamConfigurationOptions
import record_pb2
logging.basicConfig(level=logging.INFO)
async def main():
options = StreamConfigurationOptions(
max_inflight_records=50000,
ack_callback=lambda response: print(
f"Acknowledged offset: {response.durability_ack_up_to_offset}"
)
)
sdk = ZerobusSdk(server_endpoint, workspace_url)
table_properties = TableProperties(table_name, record_pb2.AirQuality.DESCRIPTOR)
stream = await sdk.create_stream(client_id, client_secret, table_properties, options)
futures = []
try:
for i in range(100000):
record = record_pb2.AirQuality(
device_name=f"sensor-{i % 10}",
temp=20 + i % 15,
humidity=50 + i % 40
)
future = await stream.ingest_record(record)
futures.append(future)
await stream.flush()
await asyncio.gather(*futures)
finally:
await stream.close()
asyncio.run(main())

The SDK uses OAuth 2.0 Client Credentials for authentication:
from zerobus.sdk.sync import ZerobusSdk
from zerobus.sdk.shared import TableProperties
import record_pb2
sdk = ZerobusSdk(server_endpoint, workspace_url)
table_properties = TableProperties(table_name, record_pb2.AirQuality.DESCRIPTOR)
# Create stream with OAuth authentication
stream = sdk.create_stream(client_id, client_secret, table_properties)

The SDK automatically handles OAuth 2.0 authentication and uses secure TLS connections with system CA certificates by default.
For advanced use cases requiring custom authentication headers or TLS configuration, see the HeadersProvider and TlsConfig sections in the API Reference below.
| Option | Default | Description |
|---|---|---|
| record_type | RecordType.PROTO | Serialization format: RecordType.PROTO or RecordType.JSON |
| max_inflight_records | 50000 | Maximum number of unacknowledged records |
| recovery | True | Enable automatic stream recovery |
| recovery_timeout_ms | 15000 | Timeout for recovery operations (ms) |
| recovery_backoff_ms | 2000 | Delay between recovery attempts (ms) |
| recovery_retries | 3 | Maximum number of recovery attempts |
| flush_timeout_ms | 300000 | Timeout for flush operations (ms) |
| server_lack_of_ack_timeout_ms | 60000 | Server acknowledgment timeout (ms) |
| ack_callback | None | Callback invoked on record acknowledgment |
from zerobus.sdk.shared import StreamConfigurationOptions
options = StreamConfigurationOptions(
max_inflight_records=10000,
recovery=True,
recovery_timeout_ms=20000,
ack_callback=lambda response: print(
f"Ack: {response.durability_ack_up_to_offset}"
)
)
stream = sdk.create_stream(
client_id,
client_secret,
table_properties,
options
)

The SDK raises two types of exceptions:

- ZerobusException: Retriable errors (e.g., network issues, temporary server errors)
- NonRetriableException: Non-retriable errors (e.g., invalid credentials, missing table)
from zerobus.sdk.shared import ZerobusException, NonRetriableException
try:
stream.ingest_record(record)
except NonRetriableException as e:
# Fatal error - do not retry
print(f"Non-retriable error: {e}")
raise
except ZerobusException as e:
# Retriable error - can retry with backoff
print(f"Retriable error: {e}")
    # Implement retry logic

Main entry point for the SDK.
Synchronous API:
from zerobus.sdk.sync import ZerobusSdk
sdk = ZerobusSdk(server_endpoint, unity_catalog_endpoint)

Constructor Parameters:

- server_endpoint (str) - The Zerobus gRPC endpoint (e.g., <workspace-id>.zerobus.<region>.cloud.databricks.com for AWS, or <workspace-id>.zerobus.<region>.azuredatabricks.net for Azure)
- unity_catalog_endpoint (str) - The Unity Catalog endpoint (your workspace URL)
Methods:
def create_stream(
client_id: str,
client_secret: str,
table_properties: TableProperties,
options: StreamConfigurationOptions = None,
tls_config: TlsConfig = None,
headers_provider: HeadersProvider = None
) -> ZerobusStream

Creates a new ingestion stream using OAuth 2.0 Client Credentials authentication.
Parameters:
- client_id (str) - OAuth client ID (ignored if headers_provider is provided)
- client_secret (str) - OAuth client secret (ignored if headers_provider is provided)
- table_properties (TableProperties) - Target table configuration
- options (StreamConfigurationOptions) - Stream behavior configuration (optional)
- tls_config (TlsConfig) - Custom TLS configuration (optional, defaults to SecureTlsConfig)
- headers_provider (HeadersProvider) - Custom headers provider (optional, defaults to OAuth)
Automatically includes these headers (when using default OAuth):
"authorization": "Bearer <oauth_token>"(fetched via OAuth 2.0 Client Credentials flow)"x-databricks-zerobus-table-name": "<table_name>"
Returns a ZerobusStream instance.
Asynchronous API:
from zerobus.sdk.aio import ZerobusSdk
sdk = ZerobusSdk(server_endpoint, unity_catalog_endpoint)

Methods:
async def create_stream(
client_id: str,
client_secret: str,
table_properties: TableProperties,
options: StreamConfigurationOptions = None,
tls_config: TlsConfig = None,
headers_provider: HeadersProvider = None
) -> ZerobusStream

Creates a new ingestion stream using OAuth 2.0 Client Credentials authentication.
Parameters:
- client_id (str) - OAuth client ID (ignored if headers_provider is provided)
- client_secret (str) - OAuth client secret (ignored if headers_provider is provided)
- table_properties (TableProperties) - Target table configuration
- options (StreamConfigurationOptions) - Stream behavior configuration (optional)
- tls_config (TlsConfig) - Custom TLS configuration (optional, defaults to SecureTlsConfig)
- headers_provider (HeadersProvider) - Custom headers provider (optional, defaults to OAuth)
Automatically includes these headers (when using default OAuth):
"authorization": "Bearer <oauth_token>"(fetched via OAuth 2.0 Client Credentials flow)"x-databricks-zerobus-table-name": "<table_name>"
Returns a ZerobusStream instance.
Represents an active ingestion stream.
Synchronous Methods:
def ingest_record(record: Union[str, bytes, Message]) -> RecordAcknowledgment

Ingests a single record. Pass a JSON string (JSON mode) or a protobuf message/bytes (protobuf mode). Returns a RecordAcknowledgment for tracking.

def flush() -> None

Flushes all pending records and waits for server acknowledgment. Does not close the stream.

def close() -> None

Flushes and closes the stream gracefully. Always call in a finally block.

def get_state() -> StreamState

Returns the current stream state.

@property
def stream_id() -> str

Returns the unique stream ID assigned by the server.
Asynchronous Methods:
async def ingest_record(record: Union[str, bytes, Message]) -> Awaitable

Ingests a single record. Pass a JSON string (JSON mode) or a protobuf message/bytes (protobuf mode). Returns an awaitable that completes when the record is durably written.

async def flush() -> None

Flushes all pending records and waits for server acknowledgment. Does not close the stream.

async def close() -> None

Flushes and closes the stream gracefully. Always call in a finally block.

def get_state() -> StreamState

Returns the current stream state.

@property
def stream_id() -> str

Returns the unique stream ID assigned by the server.
Configuration for the target table.
Constructor:
TableProperties(table_name: str, descriptor: Descriptor = None)

Parameters:

- table_name (str) - Fully qualified table name (e.g., catalog.schema.table)
- descriptor (Descriptor) - Protobuf message descriptor (e.g., MyMessage.DESCRIPTOR). Required for protobuf mode, not needed for JSON mode.
Examples:
# JSON mode
table_properties = TableProperties("catalog.schema.table")
# Protobuf mode (default)
table_properties = TableProperties("catalog.schema.table", record_pb2.MyMessage.DESCRIPTOR)Abstract base class for providing authentication headers to gRPC streams.
Default: The SDK uses OAuthHeadersProvider internally, which handles OAuth 2.0 Client Credentials authentication automatically when you call create_stream().
Custom Implementation: For advanced use cases, you can implement a custom HeadersProvider by extending the base class and implementing the get_headers() method. Custom providers must include both the authorization and x-databricks-zerobus-table-name headers. See example files for implementation details.
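For illustration only, a custom provider might look like the sketch below; the base-class import path, the return type of get_headers(), and the token handling are assumptions:

from zerobus.sdk.shared import HeadersProvider  # import path is an assumption


class StaticTokenHeadersProvider(HeadersProvider):
    """Supplies a pre-fetched bearer token instead of the default OAuth flow (sketch)."""

    def __init__(self, token: str, table_name: str):
        self._token = token
        self._table_name = table_name

    def get_headers(self) -> dict:
        # Both headers are required, as noted above.
        return {
            "authorization": f"Bearer {self._token}",
            "x-databricks-zerobus-table-name": self._table_name,
        }


# client_id and client_secret are ignored when a headers_provider is supplied:
# stream = sdk.create_stream(client_id, client_secret, table_properties,
#                            headers_provider=StaticTokenHeadersProvider(token, table_name))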
Abstract base class for configuring TLS/SSL settings for gRPC connections.
Default: The SDK uses SecureTlsConfig with system CA certificates automatically when you call create_stream().
Custom Implementation: For advanced use cases (custom CA certificates, mutual TLS, custom cipher suites), you can implement a custom TlsConfig by extending the base class and implementing the to_channel_credentials() method. See example files for implementation details.
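For illustration only, a custom TLS configuration that trusts a private CA bundle might look like this sketch; the base-class import path and expected return type are assumptions, while grpc.ssl_channel_credentials is the standard gRPC API:

import grpc

from zerobus.sdk.shared import TlsConfig  # import path is an assumption


class CustomCaTlsConfig(TlsConfig):
    """Trusts a private CA bundle instead of the system certificates (sketch)."""

    def __init__(self, ca_cert_path: str):
        self._ca_cert_path = ca_cert_path

    def to_channel_credentials(self) -> grpc.ChannelCredentials:
        with open(self._ca_cert_path, "rb") as f:
            root_certs = f.read()
        return grpc.ssl_channel_credentials(root_certificates=root_certs)


# Usage:
# stream = sdk.create_stream(client_id, client_secret, table_properties,
#                            tls_config=CustomCaTlsConfig("/path/to/ca.pem"))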
Configuration options for stream behavior.
Constructor:
StreamConfigurationOptions(
record_type: RecordType = RecordType.PROTO,
max_inflight_records: int = 50000,
recovery: bool = True,
recovery_timeout_ms: int = 15000,
recovery_backoff_ms: int = 2000,
recovery_retries: int = 3,
flush_timeout_ms: int = 300000,
server_lack_of_ack_timeout_ms: int = 60000,
ack_callback: Callable = None
)

Parameters:

- record_type (RecordType) - Serialization format: RecordType.PROTO (default) or RecordType.JSON
- max_inflight_records (int) - Maximum number of unacknowledged records (default: 50000)
- recovery (bool) - Enable or disable automatic stream recovery (default: True)
- recovery_timeout_ms (int) - Recovery operation timeout in milliseconds (default: 15000)
- recovery_backoff_ms (int) - Delay between recovery attempts in milliseconds (default: 2000)
- recovery_retries (int) - Maximum number of recovery attempts (default: 3)
- flush_timeout_ms (int) - Flush operation timeout in milliseconds (default: 300000)
- server_lack_of_ack_timeout_ms (int) - Server acknowledgment timeout in milliseconds (default: 60000)
- ack_callback (Callable) - Callback invoked when records are acknowledged by the server (default: None)
Future-like object for waiting on acknowledgments.
Methods:
def wait_for_ack(timeout_sec: float = None) -> None

Blocks until the record is acknowledged or the timeout is reached.

def add_done_callback(callback: Callable) -> None

Adds a callback to be invoked when the record is acknowledged.

def is_done() -> bool

Returns True if the record has been acknowledged.
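A minimal usage sketch, assuming stream and record from the synchronous examples above (the arguments passed to the done callback are an assumption):

# Assumes `stream` and `record` from the synchronous examples above.
ack = stream.ingest_record(record)

# The callback fires once the record is acknowledged; its arguments are an assumption.
ack.add_done_callback(lambda *args: print("record acknowledged"))

if not ack.is_done():
    ack.wait_for_ack(timeout_sec=30.0)  # block for at most 30 seconds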
Represents the lifecycle state of a stream.
Values:
- UNINITIALIZED - Stream created but not yet initialized
- OPENED - Stream is open and accepting records
- FLUSHING - Stream is flushing pending records
- RECOVERING - Stream is recovering from a failure
- CLOSED - Stream has been gracefully closed
- FAILED - Stream has failed and cannot be recovered
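For example, you can check the state before ingesting; the StreamState import path below is an assumption:

from zerobus.sdk.shared import StreamState  # import path is an assumption

if stream.get_state() == StreamState.OPENED:
    stream.ingest_record(record)
else:
    print(f"Stream {stream.stream_id} is in state {stream.get_state()}; not ingesting")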
Base exception for retriable errors.
Constructor:
ZerobusException(message: str, cause: Exception = None)

Exception for non-retriable errors (extends ZerobusException).
Constructor:
NonRetriableException(message: str, cause: Exception = None)

- Reuse SDK instances: Create one ZerobusSdk instance per application
- Stream lifecycle: Always close streams in a finally block to ensure all records are flushed
- Batch size: Adjust max_inflight_records based on your throughput requirements
- Error handling: Implement proper retry logic for retriable errors (see the sketch after this list)
- Monitoring: Use ack_callback to track ingestion progress
- Choose the right API: Use the sync API for low-volume and the async API for high-volume ingestion
- Token refresh: Tokens are automatically refreshed on stream creation and recovery
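A sketch of the error-handling practice above, using only the APIs documented in this README (the backoff values and loop structure are illustrative, not prescriptive):

import time

from zerobus.sdk.shared import NonRetriableException, ZerobusException


def ingest_with_retry(stream, record, max_attempts=3, backoff_sec=1.0):
    """Retry retriable failures with exponential backoff; surface fatal errors immediately."""
    for attempt in range(1, max_attempts + 1):
        try:
            ack = stream.ingest_record(record)
            ack.wait_for_ack()
            return
        except NonRetriableException:
            raise  # fatal: invalid credentials, missing table, etc.
        except ZerobusException:
            if attempt == max_attempts:
                raise
            time.sleep(backoff_sec * (2 ** (attempt - 1)))  # exponential backoff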