
Conversation

@singhpk234 singhpk234 commented Dec 6, 2025

About the change

Presently, not all of the location properties are stored in the table entity, so even existing use cases such as the credentials endpoint have to do a full loadTable. Because Polaris does not keep a complete copy of the table metadata in its persistence, that call is fairly expensive.

With this change we get a way to do credential vending, and in the future remote signing, without going to the object store.
Note that if we take a dependency on these properties we will have to think about backfill, but as a first step this seems reasonable; I would love to know what other folks think. For reference, the relevant credentials endpoint code:

LoadTableResponse loadTableResponse =
    catalog.loadTableWithAccessDelegation(
        tableIdentifier,
        "all",
        Optional.of(new PolarisResourcePaths(prefix).credentialsPath(tableIdentifier)));
return Response.ok(
        ImmutableLoadCredentialsResponse.builder()
            .credentials(loadTableResponse.credentials())
            .build())
    .build();
});
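
A minimal sketch of what this enables, purely for illustration: once the write locations are persisted on the entity (under the keys that show up later in this thread), credential vending could compute the allowed location set from the stored entity properties alone, without a loadTable against the object store. The helper below is hypothetical and not part of this PR's diff.

// Hypothetical helper, not part of this PR: build the location set for
// credential vending from the persisted entity properties only.
static Set<String> locationsForCredentialVending(Map<String, String> entityProps) {
  Set<String> locations = new HashSet<>();
  locations.add(entityProps.get(IcebergTableLikeEntity.LOCATION));
  // These keys are only present when the table overrides its default
  // data/metadata locations; otherwise the base location covers everything.
  Optional.ofNullable(
          entityProps.get(IcebergTableLikeEntity.USER_SPECIFIED_WRITE_DATA_LOCATION_KEY))
      .ifPresent(locations::add);
  Optional.ofNullable(
          entityProps.get(IcebergTableLikeEntity.USER_SPECIFIED_WRITE_METADATA_LOCATION_KEY))
      .ifPresent(locations::add);
  return locations;
}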

Co-author: @adutra

Checklist

  • 🛡️ Don't disclose security issues! (contact [email protected])
  • 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • 💡 Added comments for complex logic
  • 🧾 Updated CHANGELOG.md (if needed)
  • 📚 Updated documentation in site/content/in-dev/unreleased (if needed)

@adnanhemani adnanhemani left a comment

Looks like a reasonable change to me: I don't see storing these storage locations as a heavy or unreasonable addition to the entities. Happy to change my opinion if anyone has valid concerns about adding these locations.

@snazy snazy left a comment

This change raises a question:
Does Polaris actively support setting these properties at all?
Polaris puts a lot of effort into ensuring that table/view/namespace locations do not overlap or conflict with each other.
Seeing this change, I realize that neither of these two properties is subject to the same location-overlap or location-validity checks.

I think this change deserves a broader dev mailing list discussion about these two properties in general, mentioning that these properties (and their deprecated, but still evaluated, older counterparts) can break both the location-validity and location-overlap checks and assumptions.
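
For illustration only (hypothetical identifiers and bucket names; `catalog` and `schema` are assumed to exist), this is the kind of configuration the concern is about: write locations pointing entirely outside the table's base location.

// Illustration only: write locations pointing outside the table's base location.
catalog.createTable(
    TableIdentifier.of("ns", "t"),
    schema,
    PartitionSpec.unpartitioned(),
    "s3://allowed-bucket/ns/t", // base location, covered by overlap checks
    Map.of(
        TableProperties.WRITE_DATA_LOCATION, "s3://some-other-bucket/data",
        TableProperties.WRITE_METADATA_LOCATION, "s3://some-other-bucket/metadata"));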

@dimas-b dimas-b left a comment

The diff in this PR LGTM in isolation.

However, making use of the new entity properties can be tricky as Polaris code will have to deal with old entities (on upgrade), which may not have the new properties.

So all in all, having a dev discussion could be useful, indeed.

@adutra commented Dec 8, 2025

I might be wrong, but it seems the locations are taken into account for overlap checks in BasePolarisTableOperations#doCommit:

Set<String> dataLocations =
    StorageUtil.getLocationsUsedByTable(metadata.location(), metadata.properties());
CatalogUtils.validateLocationsForTableLike(
    realmConfig, tableIdentifier, dataLocations, resolvedStorageEntity);
// also validate that the table location doesn't overlap an existing table
dataLocations.forEach(
    location ->
        validateNoLocationOverlap(
The problem is that later on in that same method, the internal properties map is computed:

Map<String, String> storedProperties = buildTableMetadataPropertiesMap(metadata);

But the map currently doesn't include write.data.path or write.metadata.path:

private static Map<String, String> buildTableMetadataPropertiesMap(TableMetadata metadata) {
  Map<String, String> storedProperties = new HashMap<>();
  // Location specific properties
  storedProperties.put(IcebergTableLikeEntity.LOCATION, metadata.location());
  if (metadata.properties().containsKey(TableProperties.WRITE_DATA_LOCATION)) {
    storedProperties.put(
        IcebergTableLikeEntity.USER_SPECIFIED_WRITE_DATA_LOCATION_KEY,
        metadata.properties().get(TableProperties.WRITE_DATA_LOCATION));
  }
  if (metadata.properties().containsKey(TableProperties.WRITE_METADATA_LOCATION)) {
    storedProperties.put(
        IcebergTableLikeEntity.USER_SPECIFIED_WRITE_METADATA_LOCATION_KEY,
        metadata.properties().get(TableProperties.WRITE_METADATA_LOCATION));
  }
  storedProperties.put(
      IcebergTableLikeEntity.FORMAT_VERSION, String.valueOf(metadata.formatVersion()));
  storedProperties.put(IcebergTableLikeEntity.TABLE_UUID, metadata.uuid());
  storedProperties.put(
      IcebergTableLikeEntity.CURRENT_SCHEMA_ID, String.valueOf(metadata.currentSchemaId()));
  if (metadata.currentSnapshot() != null) {
    storedProperties.put(
        IcebergTableLikeEntity.CURRENT_SNAPSHOT_ID,
        String.valueOf(metadata.currentSnapshot().snapshotId()));
  }
  storedProperties.put(
      IcebergTableLikeEntity.LAST_COLUMN_ID, String.valueOf(metadata.lastColumnId()));
  storedProperties.put(IcebergTableLikeEntity.NEXT_ROW_ID, String.valueOf(metadata.nextRowId()));
  storedProperties.put(
      IcebergTableLikeEntity.LAST_SEQUENCE_NUMBER, String.valueOf(metadata.lastSequenceNumber()));
  storedProperties.put(
      IcebergTableLikeEntity.LAST_UPDATED_MILLIS, String.valueOf(metadata.lastUpdatedMillis()));
  if (metadata.sortOrder() != null) {
    storedProperties.put(
        IcebergTableLikeEntity.DEFAULT_SORT_ORDER_ID,
        String.valueOf(metadata.defaultSortOrderId()));
  }
  if (metadata.spec() != null) {
    storedProperties.put(
        IcebergTableLikeEntity.DEFAULT_SPEC_ID, String.valueOf(metadata.defaultSpecId()));
    storedProperties.put(
        IcebergTableLikeEntity.LAST_PARTITION_ID,
        String.valueOf(metadata.lastAssignedPartitionId()));
  }
  return storedProperties;
}

So, the only way to check those locations is to load the table from the object store.
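
For context, a rough sketch of what a helper like StorageUtil.getLocationsUsedByTable has to gather (an approximation based on the Iceberg table properties involved, not the actual Polaris implementation): the table's base location plus any user-specified write locations, including the deprecated variants mentioned above.

// Sketch only: an approximation of the location set that matters for
// overlap/validity checks, derived from the table metadata properties.
static Set<String> locationsUsedByTable(String tableLocation, Map<String, String> properties) {
  Set<String> locations = new HashSet<>();
  locations.add(tableLocation);
  for (String key :
      new String[] {
        TableProperties.WRITE_DATA_LOCATION,          // write.data.path
        TableProperties.WRITE_METADATA_LOCATION,      // write.metadata.path
        TableProperties.OBJECT_STORE_PATH,            // deprecated, still evaluated
        TableProperties.WRITE_FOLDER_STORAGE_LOCATION // deprecated, still evaluated
      }) {
    String value = properties.get(key);
    if (value != null) {
      locations.add(value);
    }
  }
  return locations;
}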

@adutra commented Dec 8, 2025

> However, making use of the new entity properties can be tricky as Polaris code will have to deal with old entities (on upgrade), which may not have the new properties.

Correct: the code would have to be lenient about the properties being absent. For request signing at least, the idea would be to load the table if the properties are absent, then update the entity, so that next time the properties are there.
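
A minimal sketch of that lenient read path (all names here are hypothetical, not existing Polaris APIs, except StorageUtil.getLocationsUsedByTable quoted above):

// Hypothetical sketch: prefer the locations stored on the entity; if the
// entity predates this change and the properties are missing, load the
// metadata once from the object store and backfill the entity.
Set<String> resolveTableLocations(IcebergTableLikeEntity entity) {
  Optional<Set<String>> stored = readStoredLocations(entity);  // assumed helper
  if (stored.isPresent()) {
    return stored.get();
  }
  TableMetadata metadata = loadTableMetadata(entity);          // goes to the object store
  persistLocationProperties(entity, metadata);                 // backfill for next time
  return StorageUtil.getLocationsUsedByTable(metadata.location(), metadata.properties());
}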

@singhpk234 (PR author) commented Dec 9, 2025

> So, the only way to check those locations is to load the table from the object store.

Precisely. The idea is to keep the location properties in sync with what is present in the object store, to avoid the redundant call: at the moment the loadCredentials endpoint makes an unnecessary call to load metadata from the object store.
From my point of view, this PR (at least the first step toward what I want to achieve) is an extension of Mike's recent PR.

When we plan to actually use these properties we will need to handle backfill; the discussion with Alex above covers the same topic.

Re: opening a mailing list thread, sure, I'm happy to facilitate the discussion on the mailing list about how to handle backfill. Alex and I had some ideas in this context (which Alex mentioned above), but more inputs are certainly welcome.
