Core, AWS, REST: Promote the S3 signing endpoint to the main spec#15112
Core, AWS, REST: Promote the S3 signing endpoint to the main spec#15112adutra wants to merge 17 commits intoapache:mainfrom
Conversation
ad95a85 to
f3fc095
Compare
open-api/rest-catalog-open-api.yaml
Outdated
|
|
||
| If remote signing for a specific storage provider is enabled, clients must respect the following configurations when creating a remote signer client: | ||
| - `signer.uri`: the base URI of the remote signer endpoint. Optional; if absent, defaults to the catalog's base URI. | ||
| - `signer.endpoint`: the path of the remote signer endpoint. Required. Should be concatenated with `signer.uri` to form the complete URI. |
There was a problem hiding this comment.
It's complicated 😄
The signer client impl uses org.apache.iceberg.rest.RESTUtil#resolveEndpoint to perform the concatenation of signer.uri and signer.endpoint.
So, signer.endpoint could also be an absolute URL, in which case, signer.uri would be ignored.
I will try to come up with a better wording.
There was a problem hiding this comment.
Rephrased, lmk what you think!
| allOf: | ||
| - $ref: '#/components/schemas/Expression' | ||
|
|
||
| MultiValuedMap: |
There was a problem hiding this comment.
I believe this is S3Headers eq section in the s3 signer spec ? can we say like ObjectStoreProviderHeader ?
There was a problem hiding this comment.
I went for a more generic name because there is nothing specific to remote signing here. This component could perfectly be used for something else in the spec.
| - `s3.secret-access-key`: secret for credentials that provide access to data in S3 | ||
| - `s3.session-token`: if present, this value should be used for as the session token | ||
| - `s3.remote-signing-enabled`: if `true` remote signing should be performed as described in the `s3-signer-open-api.yaml` specification | ||
| - `s3.remote-signing-enabled`: if `true` remote signing should be performed as described in the `RemoteSignRequest` schema section of this spec document. |
There was a problem hiding this comment.
FYI I chose to keep this property specific to S3. I think that even if the signer endpoint is now generic, enablement should be performed for each specific object storage.
There was a problem hiding this comment.
you can actually do google GCS cloud access via its s3 gateway; same signing algorithm, just a few different settings to change listing version, endpoint, &c
aws/src/main/java/org/apache/iceberg/aws/s3/signer/S3V4RestSignerClient.java
Outdated
Show resolved
Hide resolved
| * @deprecated since 1.11.0, will be removed in 1.12.0; use {@link CatalogProperties#SIGNER_URI} | ||
| * instead. | ||
| */ | ||
| @Deprecated public static final String S3_SIGNER_URI = CatalogProperties.SIGNER_URI; |
There was a problem hiding this comment.
I don't think we can just change the value here as that would break backwards compatibility
| "true", | ||
| CatalogProperties.URI, | ||
| uri, | ||
| CatalogProperties.SIGNER_ENDPOINT, |
There was a problem hiding this comment.
this wasn't needed before but is needed now, which indicates that this is a breaking change for users?
There was a problem hiding this comment.
For now it's not required, but it will become required in a future release (1.12 or later).
There is a check + warning here:
I proactively updated the tests so that they don't break when we make this property required.
|
|
||
| paths: | ||
|
|
||
| /v1/aws/s3/sign: |
There was a problem hiding this comment.
I don't think we would want to remove this spec yet. We should probably first deprecate it
| } | ||
| } | ||
|
|
||
| public static class RemoteSignRequestSerializer<T extends RemoteSignRequest> |
There was a problem hiding this comment.
these all should probably just be package-private and not public
| gen.writeEndArray(); | ||
| } | ||
| gen.writeEndObject(); | ||
| public static void headersToJson( |
There was a problem hiding this comment.
not sure whether we need to make this one and the one below public
| default: | ||
| throw new UnsupportedOperationException("Unsupported grant_type: " + grantType); | ||
| protected void validateSignRequest(RemoteSignRequest request) { | ||
| if ("POST".equalsIgnoreCase(request.method()) && request.uri().getQuery().contains("delete")) { |
There was a problem hiding this comment.
minor : should we use rest constants instead of raw string for "POST" ?
There was a problem hiding this comment.
Sorry I went back to raw strings because org.apache.iceberg.rest.HttpMethod is missing constants for PUT, OPTIONS, etc. which was causing integration tests to fail.
| * @param response the HTTP response to add headers to | ||
| */ | ||
| protected void addSignResponseHeaders(RemoteSignRequest request, HttpServletResponse response) { | ||
| if (request.method().equalsIgnoreCase("GET") || request.method().equalsIgnoreCase("HEAD")) { |
There was a problem hiding this comment.
we previously had this defined in a CACHEABLE_METHODS set, so would be good to keep this for easier readability
aws/src/main/java/org/apache/iceberg/aws/s3/signer/S3V4RestSignerClient.java
Show resolved
Hide resolved
| } | ||
|
|
||
| /** | ||
| * The base URI of the remote signer endpoint. Optional, defaults to {@linkplain |
There was a problem hiding this comment.
nit: why not just {@link CatalogProperties#URI} here?
core/src/main/java/org/apache/iceberg/rest/requests/RemoteSignRequestParser.java
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/rest/responses/RemoteSignResponseParser.java
Show resolved
Hide resolved
|
|
||
| lint-spec: | ||
| uv run yamllint --strict rest-catalog-open-api.yaml | ||
| uv run yamllint --strict ../aws/src/main/resources/s3-signer-open-api.yaml |
There was a problem hiding this comment.
I think we should still leave this in until we actually remove the openapi spec file
| * @deprecated since 1.11.0, will be removed in 1.12.0; use {@link | ||
| * org.apache.iceberg.rest.requests.RemoteSignRequestParser} instead. | ||
| */ | ||
| @Deprecated |
There was a problem hiding this comment.
I actually liked that you switched the impl here to using functionality from the new RemoteSignRequestParser for stuff, not sure why you decided to change that
There was a problem hiding this comment.
OK 😅 I thought it was a bit too invasive. Let me go back to the previous version.
| * @deprecated since 1.11.0, will be removed in 1.12.0; the serializers for S3 signing are now | ||
| * registered in {@link RESTSerializers}. | ||
| */ | ||
| @Deprecated |
There was a problem hiding this comment.
this class I would probably just deprecate and not switch to using RESTSerializers.registerAll(MAPPER).
nastra
left a comment
There was a problem hiding this comment.
overall LGTM, but left a few minor comments that would be good to address
7ee473b to
90e4eba
Compare
|
Heads up: I had to rebase because of a conflict with fec9800. |
0fcc2da to
ee33e03
Compare
Dev ML discussion: https://lists.apache.org/thread/2kqdqb46j7jww36wwg4txv6pl2hqq9w7 This commit promotes the S3 remote signing endpoint from an AWS-specific implementation to a first-class REST catalog API endpoint. This enables other storage providers (GCS, Azure, etc.) to eventually reuse the same signing endpoint pattern without duplicating the API definition. OpenAPI Specification changes: - Add `/v1/{prefix}/namespaces/{namespace}/tables/{table}/sign/{provider}` endpoint to the main REST catalog OpenAPI spec - Define `RemoteSignRequest`, `RemoteSignResult` and `RemoteSignResponse` schemas - Remove the separate `s3-signer-open-api.yaml` from the AWS module - Update the Python client Core Module changes (iceberg-core): - Add `RemoteSignRequest` and `RemoteSignResponse` model classes, copied from the iceberg-aws module - Add `RemoteSignRequestParser` and `RemoteSignResponseParser` for JSON serialization, copied from the iceberg-aws module - Add `SIGNER_URI` and `SIGNER_ENDPOINT` properties to `CatalogProperties` for configuring the signing endpoint - Add `V1_TABLE_REMOTE_SIGN` field and `remoteSign()` method to `ResourcePaths` - Register the new endpoint in `Endpoint.java` - Add abstract `RemoteSignerServlet` base class for remote signing tests, copied from the iceberg-aws module AWS Module changes (iceberg-aws): - Deprecate `S3SignRequest` and `S3SignResponse` for removal - Deprecate `S3SignRequestParser` and `S3SignResponseParser` for removal - Deprecate `S3ObjectMapper` for removal - Refactor `S3SignerServlet` to extend `RemoteSignerServlet` - Update `S3V4RestSignerClient`
This reverts commit 432ca04.
914bb9b to
852f1cb
Compare
| for requests where the body of the message contains content which must be validated before a request is | ||
| signed, such as the S3 DeleteObjects call. | ||
| provider: | ||
| type: string |
There was a problem hiding this comment.
should we define an ENUM for this, as s3a / s3e .... can create confusion for the catalog, we can say all are s3 ? similar for GCS / Azure check the resolving file IO i believe we do something similar there
There was a problem hiding this comment.
An enum means we need a spec change for every new provider – unless we go ahead an create enums for all major cloud providers?
There was a problem hiding this comment.
unless we go ahead an create enums for all major cloud providers?
thats what we had in mind, like let say s3 / azure / gcp and we can add more in case to case basis, open string still leave room for interpretation in catalog is what i think, but i can be convinced other wise :)
There was a problem hiding this comment.
OK I'm fine with that! I just introduced the enum, the constants are s3, gcs and adls (these names match the naming convention used by ResolvingFileIO).
singhpk234
left a comment
There was a problem hiding this comment.
Spec changes looks good to me.
Added a suggestion for the provider being ENUM
|
@adutra I'm doing some stuff with the signer (#15417 with draft pr and tests in #15428 15428). The s3 signer needs to track which headers are being signed, so when it is safe/unsafe to use the cached signature. new Range? good everwhere. if-modified-since? only with #15428 in. new aws-encryption option: nothing. Either it parses the signature string coming back from the s3 servlet or it gets back a header explicitly listing those headers signed. That seems cleaner, but might need to be part of this spec
Given this PR is explicitly a promotion of the current signing, it'll have to be a followup. It's just at the moment the signing is very brittle |
| - s3 | ||
| - gcs | ||
| - adls | ||
| default: s3 |
There was a problem hiding this comment.
should we maybe omit the default? I also don't see defaulting to s3 anywhere in the java impl
|
The code changes PR is also up for review: #15451. |
Dev ML discussion: https://lists.apache.org/thread/2kqdqb46j7jww36wwg4txv6pl2hqq9w7
This commit promotes the S3 remote signing endpoint from an AWS-specific implementation to a first-class REST catalog API endpoint.
This enables other storage providers (GCS, Azure, etc.) to eventually reuse the same signing endpoint pattern without duplicating the API definition.
OpenAPI Specification changes:
/v1/{prefix}/namespaces/{namespace}/tables/{table}/sign/{provider}endpoint to the main REST catalog OpenAPI specRemoteSignRequest,RemoteSignResultandRemoteSignResponseschemass3-signer-open-api.yamlfrom the AWS moduleCore Module changes (iceberg-core):
RemoteSignRequestandRemoteSignResponsemodel classes, copied from the iceberg-aws moduleRemoteSignRequestParserandRemoteSignResponseParserfor JSON serialization, copied from the iceberg-aws moduleSIGNER_URIandSIGNER_ENDPOINTproperties toCatalogPropertiesfor configuring the signing endpointV1_TABLE_REMOTE_SIGNfield andremoteSign()method toResourcePathsEndpoint.javaRemoteSignerServletbase class for remote signing tests, copied from the iceberg-aws moduleAWS Module changes (iceberg-aws):
S3SignRequestandS3SignResponsefor removalS3SignRequestParserandS3SignResponseParserfor removalS3ObjectMapperfor removalS3SignerServletto extendRemoteSignerServletS3V4RestSignerClient