AWS: S3V4RestSignerClient should strip out more request headers for caching#15428

Open
steveloughran wants to merge 13 commits into apache:main from steveloughran:pr/15417-S3SignerServlet

@steveloughran
Contributor

@steveloughran steveloughran commented Feb 24, 2026

Fixes #15417

Expands the set of headers that the S3V4RestSignerClient should ignore when signing to include

  1. all AWS SDK headers
  2. classic HTTP headers that may change between requests
  3. if-modified/unless-modified conditional headers.

The intent is to reduce the risk of different requests having conflicting headers from any cached signature where the cache key is (verb, url, region), as seen in #15166.

@steveloughran steveloughran marked this pull request as draft February 24, 2026 11:03
@github-actions github-actions bot added the AWS label Feb 24, 2026
@steveloughran
Contributor Author

I don't like that the server client needs to guess which headers in a new request are excluded from the sign and hence safe to cache. Change encryption for example and everything blows up with a signing failure.

Either the signer should parse and cache the header list from the signature, or (slightly better) the rest servlet should return that list independently. The signer can then use that header list in the cache information to decide whether to reuse. Simplest strategy: include the list of headers alongside the signature, and if changed, don't use the cache value.

@steveloughran steveloughran changed the title from "S3SignerServlet should strip out more request headers for caching" to "AWS: S3SignerServlet should strip out more request headers for caching" Feb 25, 2026
@steveloughran steveloughran force-pushed the pr/15417-S3SignerServlet branch from c56182e to 7f8cbfb February 25, 2026 14:13
@steveloughran
Contributor Author

If the signing string is parsed then the cache should already have everything needed to identify conflicting s3 requests.

@steveloughran steveloughran changed the title from "AWS: S3SignerServlet should strip out more request headers for caching" to "WiP: AWS: S3SignerServlet should strip out more request headers for caching" Mar 2, 2026
@steveloughran
Contributor Author

Thinking about how best to do this

  • the cached request should only include the signed values; the rest range from irrelevant (referer) to dangerous (if-modified)
  • the list of headers known to be signed can be hard-coded (current PR), but that's too brittle.
  • Best to parse the signature, get the list of signed headers from there, and discard the rest.
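
As a rough illustration of the last bullet, the signed header names can be read out of the `SignedHeaders=` field of a SigV4 `Authorization` value and used to drop everything else. This is only a sketch: `SignedHeaderParser` and its methods are hypothetical names, not code from this PR, and it assumes a single-valued header map.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of the PR. */
final class SignedHeaderParser {
  // A SigV4 Authorization value carries "SignedHeaders=a;b;c" as one comma-separated field.
  private static final Pattern SIGNED_HEADERS = Pattern.compile("SignedHeaders=([^,\\s]+)");

  private SignedHeaderParser() {}

  /** Returns the lower-case signed header names, or an empty list if none are declared. */
  static List<String> signedHeaderNames(String authorization) {
    Matcher m = SIGNED_HEADERS.matcher(authorization);
    if (!m.find()) {
      return List.of();
    }
    return Arrays.asList(m.group(1).toLowerCase(Locale.ROOT).split(";"));
  }

  /** Keeps only the headers named in the signature; all other headers are discarded. */
  static Map<String, String> retainSigned(Map<String, String> headers, String authorization) {
    List<String> signed = signedHeaderNames(authorization);
    Map<String, String> kept = new TreeMap<>();
    headers.forEach((name, value) -> {
      String lower = name.toLowerCase(Locale.ROOT);
      if (signed.contains(lower)) {
        kept.put(lower, value);
      }
    });
    return kept;
  }
}
```

With a signature that lists only `host;x-amz-content-sha256;x-amz-date`, a request map that also contains `If-Match` or `User-Agent` would come back without those two entries.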

@steveloughran
Contributor Author

FWIW, headers on requests picked up by running some cloudstore commands against s3 london, s3a client set to use sse-kms encryption. The bucket is versioned so on repeated reads the if-match header would be used to declare the version.

HEAD /f

2026-03-03 16:22:07,956 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-2 >> "HEAD /f HTTP/1.1[\r][\n]"
2026-03-03 16:22:07,956 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-2 >> "Host: stevel-london.s3.eu-west-2.amazonaws.com[\r][\n]"
2026-03-03 16:22:07,956 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-2 >> "amz-sdk-invocation-id: a31481a9-7186-008d-b3b3-ea125702f45a[\r][\n]"
2026-03-03 16:22:07,956 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-2 >> "amz-sdk-request: attempt=1; max=3[\r][\n]"
2026-03-03 16:22:07,956 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-2 >> "Authorization: AWS4-HMAC-SHA256 Credential=AKI............7/20260303/eu-west-2/s3/aws4_request, SignedHeaders=amz-sdk-invocation-id;amz-sdk-request;host;referer;x-amz-content-sha256;x-amz-date, Signature=1dc......b[\r][\n]"
2026-03-03 16:22:07,956 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-2 >> "Referer: https://audit.example.org/hadoop/1/op_delete/4022a50f-dd44-4915-b6ec-85c6cb7337c5-00000026/?op=op_delete&p1=s3a://stevel-london/f&pr=stevel&ps=0b35c845-dbf9-4a18-9edd-61469b1f3644&cm=StoreDiag&id=4022a50f-dd44-4915-b6ec-85c6cb7337c5-00000026&t0=1&fs=4022a50f-dd44-4915-b6ec-85c6cb7337c5&t1=1&ts=1772554927951[\r][\n]"
2026-03-03 16:22:07,957 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-2 >> "User-Agent: Hadoop 3.4.3 aws-sdk-java/2.35.4 md/io#sync md/http#Apache ua/2.1 api/S3#2.35.x os/Mac_OS_X#26.3 lang/java#17.0.17 md/OpenJDK_64-Bit_Server_VM#17.0.17+10-LTS md/vendor#Amazon.com_Inc. md/en_GB m/F,G hll/cross-region[\r][\n]"
2026-03-03 16:22:07,957 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-2 >> "x-amz-content-sha256: UNSIGNED-PAYLOAD[\r][\n]"
2026-03-03 16:22:07,957 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-2 >> "X-Amz-Date: 20260303T162207Z[\r][\n]"
2026-03-03 16:22:07,957 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-2 >> "Connection: Keep-Alive[\r][\n]"
2026-03-03 16:22:07,957 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-2 >> "[\r][\n]"

PUT of a directory marker (not relevant to S3FileIO). Adds x-amz-server-side-encryption, x-amz-server-side-encryption-aws-kms-key-id, Content-Length and Content-Type.

2026-03-03 16:22:07,852 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-2 >> "PUT /f/ HTTP/1.1[\r][\n]"
2026-03-03 16:22:07,852 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-2 >> "Host: stevel-london.s3.eu-west-2.amazonaws.com[\r][\n]"
2026-03-03 16:22:07,852 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-2 >> "amz-sdk-invocation-id: 8473b1fd-3959-a715-6236-0ed4ac5e4b21[\r][\n]"
2026-03-03 16:22:07,852 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-2 >> "amz-sdk-request: attempt=1; max=3[\r][\n]"
2026-03-03 16:22:07,853 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-2 >> "Authorization: AWS4-HMAC-SHA256 Credential=A...7/./eu-west-2/s3/aws4_request, SignedHeaders=amz-sdk-invocation-id;amz-sdk-request;content-length;content-type;host;referer;x-amz-content-sha256;x-amz-date;x-amz-server-side-encryption;x-amz-server-side-encryption-aws-kms-key-id, Signature=5...44[\r][\n]"
2026-03-03 16:22:07,853 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-2 >> "Content-Type: application/x-directory[\r][\n]"
2026-03-03 16:22:07,853 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-2 >> "Referer: https://audit.example.org/hadoop/1/op_delete/4022a50f-dd44-4915-b6ec-85c6cb7337c5-00000025/?op=op_delete&p1=s3a://stevel-london/f/dir-2cbe3df1-1998-49da-b466-947aae0920f1&pr=stevel&ps=0b35c845-dbf9-4a18-9edd-61469b1f3644&ks=1&cm=StoreDiag&id=4022a50f-dd44-4915-b6ec-85c6cb7337c5-00000025&t0=1&fs=4022a50f-dd44-4915-b6ec-85c6cb7337c5&t1=1&ts=1772554927525[\r][\n]"
2026-03-03 16:22:07,853 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-2 >> "User-Agent: Hadoop 3.4.3 aws-sdk-java/2.35.4 md/io#sync md/http#Apache ua/2.1 api/S3#2.35.x os/Mac_OS_X#26.3 lang/java#17.0.17 md/OpenJDK_64-Bit_Server_VM#17.0.17+10-LTS md/vendor#Amazon.com_Inc. md/en_GB md/rb#u m/F,G hll/cross-region[\r][\n]"
2026-03-03 16:22:07,853 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-2 >> "x-amz-content-sha256: UNSIGNED-PAYLOAD[\r][\n]"
2026-03-03 16:22:07,853 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-2 >> "X-Amz-Date: 20260303T162207Z[\r][\n]"
2026-03-03 16:22:07,853 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-2 >> "x-amz-server-side-encryption: aws:kms[\r][\n]"
2026-03-03 16:22:07,855 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-2 >> "x-amz-server-side-encryption-aws-kms-key-id: arn:aws:kms:eu-west-2:152813717728:key/c92f3bc9-ecb1-49d3-a259-9004dbac4443[\r][\n]"
2026-03-03 16:22:07,856 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-2 >> "Content-Length: 0[\r][\n]"
2026-03-03 16:22:07,856 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-2 >> "Connection: Keep-Alive[\r][\n]"
2026-03-03 16:22:07,856 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-2 >> "[\r][\n]"

single object DELETE

2026-03-03 16:22:07,667 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(133)) - http-outgoing-2 >> DELETE /f/dir-2cbe3df1-1998-49da-b466-947aae0920f1/ HTTP/1.1
2026-03-03 16:22:07,667 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-2 >> Host: stevel-london.s3.eu-west-2.amazonaws.com
2026-03-03 16:22:07,667 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-2 >> amz-sdk-invocation-id: 92132426-701f-3a48-b96a-cd8bb52ec7a8
2026-03-03 16:22:07,667 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-2 >> amz-sdk-request: attempt=1; max=3
2026-03-03 16:22:07,667 [main] DEBUG http.headers 
2026-03-03 16:22:07,667 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-2 >> Referer: https://audit.example.org/hadoop/1/op_delete/4022a50f-dd44-4915-b6ec-85c6cb7337c5-00000025/?op=op_delete&p1=s3a://stevel-london/f/dir-2cbe3df1-1998-49da-b466-947aae0920f1&pr=stevel&ps=0b35c845-dbf9-4a18-9edd-61469b1f3644&ks=1&cm=StoreDiag&id=4022a50f-dd44-4915-b6ec-85c6cb7337c5-00000025&t0=1&fs=4022a50f-dd44-4915-b6ec-85c6cb7337c5&t1=1&ts=1772554927525
2026-03-03 16:22:07,667 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-2 >> User-Agent: Hadoop 3.4.3 aws-sdk-java/2.35.4 md/io#sync md/http#Apache ua/2.1 api/S3#2.35.x os/Mac_OS_X#26.3 lang/java#17.0.17 md/OpenJDK_64-Bit_Server_VM#17.0.17+10-LTS md/vendor#Amazon.com_Inc. md/en_GB m/F,G hll/cross-region
2026-03-03 16:22:07,667 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-2 >> x-amz-content-sha256: UNSIGNED-PAYLOAD
2026-03-03 16:22:07,667 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-2 >> X-Amz-Date: 20260303T162207Z
2026-03-03 16:22:07,667 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-2 >> Connection: Keep-Alive

initiate multipart PUT

2026-03-03 16:43:01,615 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "POST /multipart?uploads HTTP/1.1[\r][\n]"
2026-03-03 16:43:01,615 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "Host: stevel-london.s3.eu-west-2.amazonaws.com[\r][\n]"
2026-03-03 16:43:01,615 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "amz-sdk-invocation-id: 713e0c15-ffa7-b7c8-cdc3-91644ce93434[\r][\n]"
2026-03-03 16:43:01,615 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "amz-sdk-request: attempt=1; max=3[\r][\n]"
2026-03-03 16:43:01,615 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "Authorization: AWS4-HMAC-SHA256 Credential=AKIASHFDIJDQPVOFIO47/20260303/eu-west-2/s3/aws4_request, SignedHeaders=amz-sdk-invocation-id;amz-sdk-request;content-length;content-type;host;referer;x-amz-content-sha256;x-amz-date;x-amz-meta-headername;x-amz-server-side-encryption;x-amz-server-side-encryption-aws-kms-key-id, Signature=682a8b56055d50b2832ab72f911e0975b4b15b49324ac94dd3e0e2beaa27ff85[\r][\n]"
2026-03-03 16:43:01,615 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "Content-Type: binary/octet-stream[\r][\n]"
2026-03-03 16:43:01,615 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "Referer: https://audit.example.org/hadoop/1/op_createfile/16b1c6a1-d5a4-4a6e-849a-21eef332bd6d-00000005/?op=op_createfile&p1=multipart&pr=stevel&ps=0565a125-fb37-4a3d-9115-729515d614fb&cm=Put&id=16b1c6a1-d5a4-4a6e-849a-21eef332bd6d-00000005&t0=1&fs=16b1c6a1-d5a4-4a6e-849a-21eef332bd6d&t1=1&ts=1772556180848[\r][\n]"
2026-03-03 16:43:01,615 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "User-Agent: Hadoop 3.5.0 aws-sdk-java/2.35.4 md/io#sync md/http#Apache ua/2.1 api/S3#2.35.x os/Mac_OS_X#26.3 lang/java#17.0.17 md/OpenJDK_64-Bit_Server_VM#17.0.17+10-LTS md/vendor#Amazon.com_Inc. md/en_GB m/F,G hll/cross-region[\r][\n]"
2026-03-03 16:43:01,615 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "x-amz-content-sha256: UNSIGNED-PAYLOAD[\r][\n]"
2026-03-03 16:43:01,615 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "X-Amz-Date: 20260303T164301Z[\r][\n]"
2026-03-03 16:43:01,615 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "x-amz-meta-headername: value[\r][\n]"
2026-03-03 16:43:01,615 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "x-amz-server-side-encryption: aws:kms[\r][\n]"
2026-03-03 16:43:01,615 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "x-amz-server-side-encryption-aws-kms-key-id: arn:aws:kms:eu-west-2:152813717728:key/c92f3bc9-ecb1-49d3-a259-9004dbac4443[\r][\n]"
2026-03-03 16:43:01,615 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "Content-Length: 0[\r][\n]"
2026-03-03 16:43:01,615 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "Connection: Keep-Alive[\r][\n]"

The x-amz-meta-headername header is present because "headername" was set as an explicit custom header.

part

2026-03-03 16:43:01,807 [s3a-transfer-stevel-london-bounded-pool1-t1] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "PUT /multipart?partNumber=1&uploadId=1YjxcUi4oeYfKetmEQ5BKuHvy0lJuHVl.2s3N3v97RkzUxXbB4j0KZJXcJlQjTCY8qnAJeFKey2wpuwOwcEWCIf9Mg6srz2TddP.w5KoFTIWuPZrodVDycOz0xHeUysj HTTP/1.1[\r][\n]"
2026-03-03 16:43:01,808 [s3a-transfer-stevel-london-bounded-pool1-t1] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "Host: stevel-london.s3.eu-west-2.amazonaws.com[\r][\n]"
2026-03-03 16:43:01,808 [s3a-transfer-stevel-london-bounded-pool1-t1] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "amz-sdk-invocation-id: 20b5c96e-dc4c-1528-cf05-0643832e8fc0[\r][\n]"
2026-03-03 16:43:01,808 [s3a-transfer-stevel-london-bounded-pool1-t1] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "amz-sdk-request: attempt=1; max=3[\r][\n]"
2026-03-03 16:43:01,808 [s3a-transfer-stevel-london-bounded-pool1-t1] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "Authorization: AWS4-HMAC-SHA256 Credential=st, SignedHeaders=amz-sdk-invocation-id;amz-sdk-request;content-length;content-type;host;referer;x-amz-content-sha256;x-amz-date, Signature=2...2[\r][\n]"
2026-03-03 16:43:01,808 [s3a-transfer-stevel-london-bounded-pool1-t1] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "Content-Type: application/octet-stream[\r][\n]"
2026-03-03 16:43:01,808 [s3a-transfer-stevel-london-bounded-pool1-t1] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "Expect: 100-continue[\r][\n]"
2026-03-03 16:43:01,808 [s3a-transfer-stevel-london-bounded-pool1-t1] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "Referer: https://audit.example.org/hadoop/1/op_createfile/16b1c6a1-d5a4-4a6e-849a-21eef332bd6d-00000005/?op=op_createfile&p1=multipart&pr=stevel&ps=0565a125-fb37-4a3d-9115-729515d614fb&cm=Put&id=16b1c6a1-d5a4-4a6e-849a-21eef332bd6d-00000005&t0=1&fs=16b1c6a1-d5a4-4a6e-849a-21eef332bd6d&t1=24&ts=1772556180848[\r][\n]"
2026-03-03 16:43:01,810 [s3a-transfer-stevel-london-bounded-pool1-t1] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "User-Agent: Hadoop 3.5.0 aws-sdk-java/2.35.4 md/io#sync md/http#Apache ua/2.1 api/S3#2.35.x os/Mac_OS_X#26.3 lang/java#17.0.17 md/OpenJDK_64-Bit_Server_VM#17.0.17+10-LTS md/vendor#Amazon.com_Inc. md/en_GB md/rb#u m/F,G hll/cross-region[\r][\n]"
2026-03-03 16:43:01,810 [s3a-transfer-stevel-london-bounded-pool1-t1] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "x-amz-content-sha256: UNSIGNED-PAYLOAD[\r][\n]"
2026-03-03 16:43:01,810 [s3a-transfer-stevel-london-bounded-pool1-t1] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "X-Amz-Date: 20260303T164301Z[\r][\n]"
2026-03-03 16:43:01,810 [s3a-transfer-stevel-london-bounded-pool1-t1] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "Content-Length: 9201[\r][\n]"
2026-03-03 16:43:01,810 [s3a-transfer-stevel-london-bounded-pool1-t1] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "Connection: Keep-Alive[\r][\n]"

adds Expect: 100-continue, but like Connection: Keep-Alive this is being put out on the wire after signing has taken place. The signer doesn't see it, so it's not relevant.

completion of MPU

2026-03-03 16:43:01,932 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(133)) - http-outgoing-0 >> POST /multipart?uploadId=1YjxcUi4oeYfKetmEQ5BKuHvy0lJuHVl.2s3N3v97RkzUxXbB4j0KZJXcJlQjTCY8qnAJeFKey2wpuwOwcEWCIf9Mg6srz2TddP.w5KoFTIWuPZrodVDycOz0xHeUysj HTTP/1.1
2026-03-03 16:43:01,933 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-0 >> Host: stevel-london.s3.eu-west-2.amazonaws.com
2026-03-03 16:43:01,933 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-0 >> amz-sdk-invocation-id: 76067516-d2b1-644b-65bd-d6184521df63
2026-03-03 16:43:01,933 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-0 >> amz-sdk-request: attempt=1; max=3
2026-03-03 16:43:01,933 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-0 >> Authorization: AWS4-HMAC-SHA256 Credential=AK3/eu-west-2/s3/aws4_request, SignedHeaders=amz-sdk-invocation-id;amz-sdk-request;content-length;content-type;host;referer;x-amz-content-sha256;x-amz-date, Signature=44
2026-03-03 16:43:01,933 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-0 >> Content-Type: application/xml
2026-03-03 16:43:01,933 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-0 >> Referer: https://audit.example.org/hadoop/1/op_createfile/16b1c6a1-d5a4-4a6e-849a-21eef332bd6d-00000005/?op=op_createfile&p1=multipart&pr=stevel&ps=0565a125-fb37-4a3d-9115-729515d614fb&cm=Put&id=16b1c6a1-d5a4-4a6e-849a-21eef332bd6d-00000005&t0=1&fs=16b1c6a1-d5a4-4a6e-849a-21eef332bd6d&t1=1&ts=1772556180848
2026-03-03 16:43:01,933 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-0 >> User-Agent: Hadoop 3.5.0 aws-sdk-java/2.35.4 md/io#sync md/http#Apache ua/2.1 api/S3#2.35.x os/Mac_OS_X#26.3 lang/java#17.0.17 md/OpenJDK_64-Bit_Server_VM#17.0.17+10-LTS md/vendor#Amazon.com_Inc. md/en_GB m/F,G hll/cross-region
2026-03-03 16:43:01,933 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-0 >> x-amz-content-sha256: UNSIGNED-PAYLOAD
2026-03-03 16:43:01,933 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-0 >> X-Amz-Date: 20260303T164301Z
2026-03-03 16:43:01,933 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-0 >> Content-Length: 233
2026-03-03 16:43:01,933 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-0 >> Connection: Keep-Alive
2026-03-03 16:43:01,933 [main] DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-0 >> "POST /multipart?uploadId=1YjxcUi4oeYfKetmEQ5BKuHvy0lJuHVl.2s3N3v97RkzUxX

no new headers.

single file PUT

This was with the options

fs.s3a.performance.flags=*
fs.s3a.create.conditional.enabled=true

But I think as the put command always overwrites, we don't get delayed if-overwrite checks.

2026-03-03 16:58:27,830 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(133)) - http-outgoing-0 >> PUT /oneline HTTP/1.1
2026-03-03 16:58:27,830 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-0 >> Host: stevel-london.s3.eu-west-2.amazonaws.com
2026-03-03 16:58:27,830 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-0 >> amz-sdk-invocation-id: 8445eba8-0eb8-9588-0cec-1c54aa726cf5
2026-03-03 16:58:27,830 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-0 >> amz-sdk-request: attempt=1; max=3
2026-03-03 16:58:27,830 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-0 >> Authorization: AWS4-HMAC-SHA256 Credential=st, SignedHeaders=amz-sdk-invocation-id;amz-sdk-request;content-length;content-type;host;referer;x-amz-content-sha256;x-amz-date;x-amz-server-side-encryption;x-amz-server-side-encryption-aws-kms-key-id, Signature=51
2026-03-03 16:58:27,830 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-0 >> Content-Type: application/octet-stream
2026-03-03 16:58:27,830 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-0 >> Expect: 100-continue
2026-03-03 16:58:27,830 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-0 >> Referer: https://audit.example.org/hadoop/1/op_createfile/eca05bec-6b3a-4f47-90e4-d4b23028134a-00000005/?op=op_createfile&p1=oneline&pr=stevel&ps=3d5c8edb-b0da-424f-b19e-ffb6be9bebde&cm=Put&id=eca05bec-6b3a-4f47-90e4-d4b23028134a-00000005&t0=1&fs=eca05bec-6b3a-4f47-90e4-d4b23028134a&t1=1&ts=1772557106981
2026-03-03 16:58:27,831 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-0 >> User-Agent: Hadoop 3.5.0 aws-sdk-java/2.35.4 md/io#sync md/http#Apache ua/2.1 api/S3#2.35.x os/Mac_OS_X#26.3 lang/java#17.0.17 md/OpenJDK_64-Bit_Server_VM#17.0.17+10-LTS md/vendor#Amazon.com_Inc. md/en_GB md/rb#u m/F,G hll/cross-region
2026-03-03 16:58:27,831 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-0 >> x-amz-content-sha256: UNSIGNED-PAYLOAD
2026-03-03 16:58:27,831 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-0 >> X-Amz-Date: 20260303T165827Z
2026-03-03 16:58:27,831 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-0 >> x-amz-server-side-encryption: aws:kms
2026-03-03 16:58:27,831 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-0 >> x-amz-server-side-encryption-aws-kms-key-id: arn:aws:kms:eu-west-2:152813717728:key/c92f3bc9-ecb1-49d3-a259-9004dbac4443
2026-03-03 16:58:27,831 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-0 >> Content-Length: 5
2026-03-03 16:58:27,831 [main] DEBUG http.headers (LoggingManagedHttpClientConnection.java:onRequestSubmitted(136)) - http-outgoing-0 >> Connection: Keep-Alive

@steveloughran
Contributor Author

FWIW there's also an x-amz-te header which comes with PUT requests. Ignoring it, as only GET and HEAD requests are cacheable.

@steveloughran steveloughran marked this pull request as ready for review March 4, 2026 18:28
@steveloughran steveloughran changed the title from "WiP: AWS: S3SignerServlet should strip out more request headers for caching" to "AWS: S3V4RestSignerClient and S3SignerServlet should strip out more request headers for caching" Mar 4, 2026
@steveloughran
Contributor Author

steveloughran commented Mar 5, 2026

And a PUT to S3 Express with S3 sessions enabled adds x-amz-s3session-token, which is the key mechanism for permission performance with S3 Express. If set, that MUST be retained for performance. It does not change from request to request.

"PUT /oneline HTTP/1.1[\r][\n]"
"Host: stevel--usw2-az1--x-s3.s3express-usw2-az1.us-west-2.amazonaws.com[\r][\n]"
"amz-sdk-invocation-id: 805794ee-7ae6-b968-d406-7288c2c9f9e0[\r][\n]"
"amz-sdk-request: attempt=1; max=3[\r][\n]"
"Authorization: AWS4-HMAC-SHA256 Credential=3CQPBE3VTCSR3XYY7GTRGQNSYI/20260305/us-west-2/s3express/aws4_request, SignedHeaders=amz-sdk-invocation-id;amz-sdk-request;content-length;content-type;host;referer;x-amz-content-sha256;x-amz-date;x-amz-s3session-token, Signature=9a5c2c8769447555b232f388b77c9d0e70e2133ab9e7da6c8ef47957b783d020[\r][\n]"
"Content-Type: application/octet-stream[\r][\n]"
"Expect: 100-continue[\r][\n]"
"Referer: https://audit.example.org/hadoop/1/op_createfile/8dcd2c60-f463-4b76-9910-a5af09f3454a-00000005/?op=op_createfile&p1=oneline&pr=stevel&ps=bac2e4e3-7d13-4b40-8b16-bfe1cb0ffbb9&cm=Put&id=8dcd2c60-f463-4b76-9910-a5af09f3454a-00000005&t0=1&fs=8dcd2c60-f463-4b76-9910-a5af09f3454a&t1=1&ts=1772723079703[\r][\n]"
"User-Agent: Hadoop 3.5.0 aws-sdk-java/2.35.4 md/io#sync md/http#Apache ua/2.1 api/S3#2.35.x os/Mac_OS_X#26.3 lang/java#17.0.17 md/OpenJDK_64-Bit_Server_VM#17.0.17+10-LTS md/vendor#Amazon.com_Inc. md/en_GB md/rb#u m/F,G hll/cross-region[\r][\n]"
"x-amz-content-sha256: UNSIGNED-PAYLOAD[\r][\n]"
"X-Amz-Date: 20260305T150441Z[\r][\n]"
"x-amz-s3session-token: AwAAAAMAAAATx5El1XCwcU+jKOJF+FJAL/5VuUPb08Ah4fHUBAAAAAAAAACKPc+fkgMAAAAAAADFX8QAa/tUHq6h5W/p1/pdbCq0rdsCIj84DVh8gByA65vdFvMC6heyI2oLJ23QUX0UVAo9udlJuBLOoeMwr4DGV6MkRsVNk0sozghkZHE2DR/QMXTqszqvWmqEISeJLrfwJfWzJllIoLussagjAySaJ+refJcliXoiUYhYPr1xNdWVw8TkrMUS+7k0Vb8xbkYowDRWX1esRjrKBT/v820SGpLGg0wvPzi3kmwYmxa0ycGbtScdhQ5PPDipZxK/OfRVapiCNtVOvfZVZ0zmuEIpmdO3BxpJsTvPEfBRWsrPWt65kBD6vE9VQYUBpTiPZQmHhU2l+b+U4n2AxrwhpTNFtzJqVm7OvpDf9qx0Sf/IJul3/cYU8EsPUzao3IkHZ9V4jyLDw3VzSmABJ9rnFgXIz9l7cZF7TN/gI0pIZukUPLz0H/DyP86iUnMPAOLh20I4pBYOT4qPn4VdBEjvdLK4LhppBEKk93/9+yrAi4brUDR/oFUvlAUknQ3CbNZOc7ALyj6hNe9p8AzLf3FQX8EJ+z2axhTavSvaHV0MkrfJSdAZx9CQfxIWMLIeqYNDeKHH8Mrk9Dliz2Parr0uuDp6itww0NbdHh2xiAEKRFku4YY4b8/HoD82kgimLNrU5mH8Qzs1ktHWMOvniu1/lHsgGzfhg0ixllv6Y2cMq7f3o2nsyvYLZs+5nkTPa7TTBuKC/9nif70CQ2aJDWWuBaL6apNfoZI1cQsrvtdUI5355ikF+r54Xd/3GHjKBTj0ad63Lq/scNaaSja+E+n8ZlKcZWnKYdvKn7rcnKtBYSEg9h1K+uhUJJUkhmtqxHQAKvh4w1BMZULeiDEC67IKS1hBBg7Eq3Z7wopyiNLfwn8QZ7qA1VCLRl+DPAdVxV/Qw+m8g6bNefsvixOoJw4KC/INLpk1q3Vu5SFvXz6GbJgNazWIXk56CaJiWRxDA2U3+7w3Y/Lx06pskf3yoVkW9+zURrbbx6ka5Q5Zksvf84C9H7LPbhGowGbZ24oNnvtR/cCQJsIQQGyrCLVpKyBPG3loXmVlxSDoDZtNTcnCGNRKhxuzt9dHNeZj8VpQGnpDsBR13VKw8bOkIl6ZKHJEvCB+ZfLgBdIddqgC9LKmt8i3QWfaXtePKVCfcPnOg8VJerAmv/7/XRecTmAi6DHWp/AhZJvXI06HcNOgnvp9vjnCFRbIG1dn26tz8KNq55QLdOBsRA==[\r][\n]"
"Content-Length: 5[\r][\n]"
"Connection: Keep-Alive[\r][\n]"
"[\r][\n]"

@danielcweeks
Contributor

Just cross-referencing my comment from #15171.

I think we're leaning too much on the client to handle the complexity. This should be catalog responsibility.

@steveloughran
Contributor Author

steveloughran commented Mar 5, 2026

@danielcweeks look at the top for my alternative proposal; I've discussed this with @adutra and will implement it. In fact, if you look at 777cbcf you can see that this was part of my original design: the auth header splitting exists, though without any tests.

  1. server does all filtering, makes cache/no-cache decision
  2. client uses key for the request, but caches only the signed headers.
  3. next request, client does lookup as present, but compares signed headers and considers it a cache miss if those headers don't match what it wants to send.
  4. If a hit, it adds its unsigned headers to the request.

Outcome

  1. server is in charge of what to sign
  2. client never sends requests with invalid signature
  3. there's never unintentional retention of an optional header which affects the new request (example, a range header in cache when the new request doesn't have one).

Does this make sense? Full control in server, with the client adapting to its choices. Polaris can choose what to sign, clients can choose various checksum options, and the only consequence is an increase in cache misses if the signing service signs more headers than before.
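
The four client-side steps above could look roughly like the following. This is a minimal illustration under stated assumptions, not the PR's code: `SignedHeaderCache`, `Cached` and `key` are hypothetical names, a plain `HashMap` stands in for the real cache, header names are assumed to be lower-cased already, and per-request values such as `x-amz-date` are assumed to be handled elsewhere.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

/** Hypothetical client-side cache for illustration; not the PR's classes. */
final class SignedHeaderCache {
  /** What the server returned: the signature plus the header values it covered. */
  static final class Cached {
    final String authorization;
    final Map<String, String> signedHeaders;

    Cached(String authorization, Map<String, String> signedHeaders) {
      this.authorization = authorization;
      this.signedHeaders = Map.copyOf(signedHeaders);
    }
  }

  private final Map<String, Cached> entries = new HashMap<>();

  /** Step 2: the key is still derived from the request (verb, region, URL). */
  static String key(String method, String region, String uri) {
    return method + " " + region + " " + uri;
  }

  /** Step 2: only the headers the server actually signed are stored. */
  void put(String key, Cached cached) {
    entries.put(key, cached);
  }

  /**
   * Step 3: a hit requires every signed header to be present in the new request
   * with the same value. Headers outside the signed set are not compared; on a
   * hit the caller adds them to the outgoing request unsigned (step 4).
   */
  Optional<Cached> lookup(String key, Map<String, String> requestHeaders) {
    Cached cached = entries.get(key);
    if (cached == null) {
      return Optional.empty();
    }
    for (Map.Entry<String, String> signed : cached.signedHeaders.entrySet()) {
      if (!Objects.equals(signed.getValue(), requestHeaders.get(signed.getKey()))) {
        return Optional.empty();
      }
    }
    return Optional.of(cached);
  }
}
```

In this sketch a cached entry whose signed set includes `range` is a miss for a new request without a `range` header, which is the retention problem named in outcome 3.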

}
}
if (headerEnum == null) {
LOG.warn("No signed headers found in response: {}", response);
Contributor Author

let's not print the signature

A bit concerned that the cache is static so spans all signers; for
s3a I'd like a prefix in the cache with every fs instance having
a different prefix.
This allows for different encryption settings per instance
untested and not-wired-up code to enumerateSignedHeaders() from an auth response.
----
AWS: Fix S3V4RestSignerClient cache key to include all request components

    The cache key for signed responses only included method, region, and URI, but not headers like `x-amz-content-sha256` that are part of the signature. This caused 403 errors when different content was uploaded to the same URI within the cache TTL.

    This fix uses the full `S3SignRequest` as the cache key. This is the only 100% safe option because we cannot know which headers the server will sign and which ones it will ignore; any header included in the signature *must* be part of the cache key.

    **This change reduces cache efficiency**; but that's the price to pay for correctness.
----

However, this version retains the SignedComponent design.

What is different is: headers we consider acceptable to not-sign are not attached to the signing request,
hence not to the cached entry.

This means that
* no need to worry about if the signer will/will-not sign those values
* provides the correctness adutra needed: if a signed header changes, the cache will not retrieve a signature
  which will be rejected by the S3 endpoint
Add assertions in TestS3RestSigner to verify cache hits/misses
are as expected.
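
A sketch of the filtering described in this commit message: headers considered acceptable to leave unsigned are dropped before the signing request, and hence the cache key, is built. `UnsignedHeaderFilter` and the contents of `SKIPPED` are illustrative assumptions; the real set of skipped headers is whatever the PR defines.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical filter for illustration; the header list is an assumption. */
final class UnsignedHeaderFilter {
  // Illustrative only: SDK bookkeeping, general HTTP and conditional headers.
  static final Set<String> SKIPPED = Set.of(
      "amz-sdk-invocation-id",
      "amz-sdk-request",
      "referer",
      "user-agent",
      "if-match",
      "if-none-match",
      "if-modified-since",
      "if-unmodified-since");

  private UnsignedHeaderFilter() {}

  /** Returns the headers to attach to the signing request, keyed by lower-case name. */
  static Map<String, String> headersToSign(Map<String, String> headers) {
    Map<String, String> toSign = new TreeMap<>();
    headers.forEach((name, value) -> {
      String lower = name.toLowerCase(Locale.ROOT);
      if (!SKIPPED.contains(lower)) {
        toSign.put(lower, value);
      }
    });
    return toSign;
  }
}
```

Because skipped headers never reach the cached entry, two requests that differ only in, say, `If-Match` map to the same entry, while a change to any remaining header produces a different one.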
@steveloughran steveloughran force-pushed the pr/15417-S3SignerServlet branch from 2d4911a to b0691a5 March 5, 2026 20:28
@danielcweeks
Copy link
Contributor

danielcweeks commented Mar 5, 2026

Does this make sense? Full control in server, with the client adapting to its choices. Polaris can choose what to sign, clients can choose various checksum options, and the only consequence is an increase in cache misses if the signing service signs more headers than before.

This just feels like we're overcomplicating the actual use case and how this is beneficial to the client and the service.

The only two requests that should be cached are HEAD and GET. HEAD requests shouldn't really have any difference in the request. GET requests are reused to reduce the sign request rate for range based queries when reading parquet. Each request is isolated to a single path and there's very little reuse beyond what happens within the context of a single file read action.

What led to this line of exploration appears to stem from trying to implement PUT or other methods, which should not be cached. A server shouldn't do that because it's not secure for it to omit critical headers like checksum or size from the signature when creating/overwriting objects. That also renders those types of operations un-cacheable, so they shouldn't be considered here.
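
For illustration, scoping the cache to the two methods named above is a one-line check on the client. `CacheableMethods` is a hypothetical name, not something in the PR or the existing client.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical guard for illustration: only GET and HEAD signatures are cached. */
final class CacheableMethods {
  private static final Set<String> CACHEABLE = Set.of("GET", "HEAD");

  private CacheableMethods() {}

  /** True if a signature for this HTTP method may be stored in, or served from, the cache. */
  static boolean isCacheable(String method) {
    return method != null && CACHEABLE.contains(method.toUpperCase(Locale.ROOT));
  }
}
```

Any PUT, POST or DELETE would then always go to the signing service, regardless of which headers it carries.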

Edit:

Just to add one more thing. I'm actually quite supportive of improving this in meaningful ways. I just don't think it's necessary to overly complicate this, so maybe there's a middle ground, like having the client scope caching to GET and HEAD only and providing guidance on which headers should or should not be signed. I'm open to reviewing what you're proposing, but I strongly urge focusing on simplicity and not complicating the protocol.

We've had multiple versions of production systems running securely with the existing functionality for years, so I'm very hesitant to say there's something fundamentally wrong with the current approach. Changes should be grounded in real observable problems.

assertCacheHitsAndMisses(0, 1);

// the etag is passed in: the same object is returned and the same cached signature is retained.
// if the ifMatch header was cached, this would have resulted in a failure as there would
Contributor

Suggested change
- // if the ifMatch header was cached, this would have resulted in a failure as there would
+ // if the ifMatch header was signed, this would have resulted in a failure as there would


@Test
public void validatePutObject() {
int hits = S3V4RestSignerClient.cacheHits();
Contributor

Suggested change
- int hits = S3V4RestSignerClient.cacheHits();

Comment on lines +265 to +266
int hits = S3V4RestSignerClient.cacheHits();
int misses = S3V4RestSignerClient.cacheMisses();
Contributor

Suggested change
- int hits = S3V4RestSignerClient.cacheHits();
- int misses = S3V4RestSignerClient.cacheMisses();

s3.listObjectsV2(ListObjectsV2Request.builder().bucket(BUCKET).prefix("some/prefix/").build());
// list is a GET.
assertCacheHitsAndMisses(1, 1);
assertCacheHitsAndMisses(1, 1);
Contributor

Suggested change
- assertCacheHitsAndMisses(1, 1);

static final Cache<S3SignRequest, SignedComponent> SIGNED_COMPONENT_CACHE =
Caffeine.newBuilder().expireAfterWrite(30, TimeUnit.SECONDS).maximumSize(100).build();

private static final AtomicInteger CACHE_HITS = new AtomicInteger();
Contributor

Can't we use Caffeine's .recordStats() instead? And ideally, we would call .recordStats() only when running tests.

Contributor Author

@steveloughran steveloughran Mar 6, 2026

Happily, though I was thinking of another stat, "entry in cache but header mismatch", as that's also of interest.
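A minimal sketch of that third statistic, assuming the cached entry remembers which headers the server signed and the values they had, which is the strategy floated earlier in this thread. Every name here (`HeaderAwareCacheSketch`, `Outcome`, `store`, `lookup`) is hypothetical; Caffeine's built-in statistics have no notion of a header mismatch, so this counter would need its own accounting.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HeaderAwareCacheSketch {
  enum Outcome { HIT, MISS, HEADER_MISMATCH }

  // Hypothetical: per cache key, the signed header names and the values they had when signed.
  private final Map<List<String>, Map<String, String>> signedHeadersByKey = new HashMap<>();
  int hits;
  int misses;
  int headerMismatches;

  void store(List<String> key, Map<String, String> signedHeaders) {
    signedHeadersByKey.put(key, Map.copyOf(signedHeaders));
  }

  Outcome lookup(List<String> key, Map<String, String> requestHeaders) {
    Map<String, String> signed = signedHeadersByKey.get(key);
    if (signed == null) {
      misses++;
      return Outcome.MISS;
    }
    // Reuse is only safe if every signed header carries the same value in the new request.
    boolean unchanged =
        signed.entrySet().stream()
            .allMatch(e -> e.getValue().equals(requestHeaders.get(e.getKey())));
    if (unchanged) {
      hits++;
      return Outcome.HIT;
    }
    headerMismatches++;
    return Outcome.HEADER_MISMATCH;
  }

  public static void main(String[] args) {
    HeaderAwareCacheSketch cache = new HeaderAwareCacheSketch();
    List<String> key = List.of("GET", "https://bucket.example/file.parquet", "us-east-1");
    System.out.println(cache.lookup(key, Map.of("range", "bytes=0-99")));
    cache.store(key, Map.of("range", "bytes=0-99"));
    System.out.println(cache.lookup(key, Map.of("range", "bytes=0-99")));
    System.out.println(cache.lookup(key, Map.of("range", "bytes=100-199")));
  }
}
```

A rising mismatch count would indicate that the server signs a header that varies between otherwise identical requests.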

AwsS3V4SignerParams signerParams =
extractSignerParams(AwsS3V4SignerParams.builder(), executionAttributes).build();

// Strip-off headers that should be signed
Contributor

Suggested change
- // Strip-off headers that should be signed
+ // Strip-off headers that shouldn't be signed

@adutra
Contributor

adutra commented Mar 6, 2026

> The only two requests that should be cached are HEAD and GET.

But this rule isn't enforced. Nothing in the specs prevents a server from sending a Cache-Control: private header for other methods. And by doing so, it would break the client. I'm not saying it makes sense to do so, I'm saying that it's not fair for a server to break the client so easily. If the client's cache is designed to only handle these two methods and nothing else, I think the client should make sure to filter out other methods. It seems a bit pointless to me to require servers to send the Cache-Control header when the client already knows what requests it can and cannot cache.
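The client-side filter argued for here could look like the following minimal sketch. `isCacheable` is a hypothetical helper, and it assumes the server opts in to caching with a `Cache-Control: private` response header, as described in this thread.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;

final class CacheabilitySketch {
  // Assumption: only these two methods are ever eligible for signature caching.
  private static final Set<String> CACHEABLE_METHODS = Set.of("GET", "HEAD");

  static boolean isCacheable(String method, String cacheControl) {
    if (cacheControl == null || !CACHEABLE_METHODS.contains(method.toUpperCase(Locale.ROOT))) {
      return false;
    }
    // Cache-Control may carry several comma-separated directives, e.g. "private, max-age=30".
    return Arrays.stream(cacheControl.split(","))
        .map(directive -> directive.trim().toLowerCase(Locale.ROOT))
        .anyMatch("private"::equals);
  }

  public static void main(String[] args) {
    System.out.println(isCacheable("GET", "private")); // true
    System.out.println(isCacheable("PUT", "private")); // false: the method is filtered out
    System.out.println(isCacheable("HEAD", null)); // false: the server did not opt in
  }
}
```

With a gate like this, a server that sends the header on a PUT response would no longer be able to put a signature for that request into the client cache.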

> What led to this line of exploration appears to stem from trying to implement PUT or other methods, which should not be cached.

Yes but in fact, the most problematic scenario for me is a GET request with a range header. If a server decides to sign the range header (which is imho totally valid), the client would break. The prevailing philosophy is that "the server decides what to sign," but in reality, the server's control appears limited due to potential client-side cache issues. Again, it appears to me that, if the client already knows that it would break if the server signs some header, it's best for the client to proactively remove that header from the request to sign.

But as I said, having now a good understanding of the limitations, I've already adapted Polaris to adhere to the implicit constraints. I'm leaving the ongoing work here for @steveloughran to continue, and will close #15171.

@steveloughran
Contributor Author

@adutra @danielcweeks I've gone back to the servlet-only decision about what to sign, but with an expanded set of headers: more SDK, more "aws-" service and some of the classic http protocol fluff. That's all I care about, as it's enough to reduce the risk that slightly different GET requests will match in the cache but be inconsistently signed.
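As a rough illustration of stripping such headers before signing, here is a minimal sketch. The prefix and header names below are illustrative assumptions drawn from the categories named in the PR description (SDK headers, classic HTTP headers, conditional headers); the real list lives in the PR diff, and `HeaderStripSketch` is a hypothetical class.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class HeaderStripSketch {
  // Illustrative assumptions only, not the list used by the PR.
  private static final List<String> IGNORED_PREFIXES = List.of("amz-sdk-");
  private static final Set<String> IGNORED_NAMES =
      Set.of("user-agent", "if-match", "if-none-match", "if-modified-since", "if-unmodified-since");

  static Map<String, List<String>> headersToSign(Map<String, List<String>> requestHeaders) {
    Map<String, List<String>> kept = new TreeMap<>();
    requestHeaders.forEach(
        (name, values) -> {
          // HTTP header names are case-insensitive, so compare in lower case.
          String lower = name.toLowerCase(Locale.ROOT);
          boolean ignored =
              IGNORED_NAMES.contains(lower)
                  || IGNORED_PREFIXES.stream().anyMatch(lower::startsWith);
          if (!ignored) {
            kept.put(lower, values);
          }
        });
    return kept;
  }

  public static void main(String[] args) {
    Map<String, List<String>> headers =
        Map.of(
            "Host", List.of("bucket.example"),
            "amz-sdk-invocation-id", List.of("1234"),
            "If-Match", List.of("\"etag\""),
            "User-Agent", List.of("sketch"));
    System.out.println(headersToSign(headers).keySet()); // prints "[host]"
  }
}
```

Two GET requests that differ only in the ignored headers then produce the same set of headers to sign, so a cached signature remains valid for both.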

Alex's tests in TestS3V4RestSignerClient are retained, as are mine in TestS3RestSigner, which expands validation of multipart upload completion and a few more operations. It doesn't make any assertions on cache hits/misses though, due to the roll-back of S3V4RestSignerClient; instead it just makes sure that the operations work at all.

I'm personally happy with just the expanded set of stripped headers, which this iteration provides.

@danielcweeks
Contributor

danielcweeks commented Mar 6, 2026

> > The only two requests that should be cached are HEAD and GET.

> But this rule isn't enforced. Nothing in the specs prevents a server from sending a Cache-Control: private header for other methods. And by doing so, it would break the client. I'm not saying it makes sense to do so, I'm saying that it's not fair for a server to break the client so easily. If the client's cache is designed to only handle these two methods and nothing else, I think the client should make sure to filter out other methods. It seems a bit pointless to me to require servers to send the Cache-Control header when the client already knows what requests it can and cannot cache.

I believe the default is that the client doesn't cache unless told to do so, which makes caching a server responsibility. While it might make sense to limit to just the two methods we expect, it shouldn't be the client's responsibility to fix a bad server implementation. Yes, the client would break, but it's really the server that needs to be fixed.

> Yes but in fact, the most problematic scenario for me is a GET request with a range header. If a server decides to sign the range header (which is imho totally valid), the client would break. The prevailing philosophy is that "the server decides what to sign," but in reality, the server's control appears limited due to potential client-side cache issues. Again, it appears to me that, if the client already knows that it would break if the server signs some header, it's best for the client to proactively remove that header from the request to sign.

I think that's putting too much control in the client's hands and limits what functionality the server has in deciding what to sign for. If a client "hides" the range header, a server would only have the option to sign for everything or nothing. While in practice I don't know of any implementation that is protecting ranges of files, it is entirely feasible, and since it's the server's responsibility to protect the data, it should have the final say on what it allows to be read.

@steveloughran steveloughran changed the title AWS: S3V4RestSignerClient and S3SignerServlet should strip out more request headers for caching AWS: S3V4RestSignerClient should strip out more request headers for caching Mar 8, 2026
@steveloughran
Contributor Author

@danielcweeks it'd be hard to read a parquet file with a signing service restricting range access, not least because that service would need to be aware of the layout of the specific .par files. Restricting avro access is possible. But if the service was signing ranges, then it'd have to decide what to do if the range wasn't specified, as otherwise a client would just bypass any restrictions by not passing in the range.

Presumably the service would have to reject the request as "incomplete headers"

Which is out of scope of the current PR that simply expands the set of ignored headers.

It might be useful to add as a response to the API specification though, somehow.
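The "incomplete headers" rejection described above could be sketched as follows, assuming a hypothetical policy in which the signing service only signs reads whose Range header falls inside an allowed byte window. `RangePolicySketch`, `Decision`, and the window parameters are invented for illustration; nothing like this exists in the current PR or, as far as this thread establishes, in the API specification.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RangePolicySketch {
  enum Decision { SIGN, REJECT_INCOMPLETE_HEADERS, REJECT_OUT_OF_BOUNDS }

  // Accepts only a single closed range such as "bytes=0-1023".
  private static final Pattern BYTE_RANGE = Pattern.compile("bytes=(\\d{1,18})-(\\d{1,18})");

  // Assumes header names in the map have already been lower-cased.
  static Decision decide(Map<String, String> headers, long allowedStart, long allowedEnd) {
    String range = headers.get("range");
    if (range == null) {
      // Without this check a client could bypass the restriction by omitting the header.
      return Decision.REJECT_INCOMPLETE_HEADERS;
    }
    Matcher matcher = BYTE_RANGE.matcher(range.trim());
    if (!matcher.matches()) {
      // Open-ended or multi-part ranges are treated as not permitted in this sketch.
      return Decision.REJECT_OUT_OF_BOUNDS;
    }
    long start = Long.parseLong(matcher.group(1));
    long end = Long.parseLong(matcher.group(2));
    boolean inside = start <= end && start >= allowedStart && end <= allowedEnd;
    return inside ? Decision.SIGN : Decision.REJECT_OUT_OF_BOUNDS;
  }

  public static void main(String[] args) {
    System.out.println(decide(Map.of(), 0, 1023)); // REJECT_INCOMPLETE_HEADERS
    System.out.println(decide(Map.of("range", "bytes=0-511"), 0, 1023)); // SIGN
    System.out.println(decide(Map.of("range", "bytes=512-2047"), 0, 1023)); // REJECT_OUT_OF_BOUNDS
  }
}
```

A service applying such a rule would also have to sign the Range header, which is exactly the case where a client cache keyed only on (verb, URL, region) would return a stale signature.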

@steveloughran
Contributor Author

I should add that's where the discussion on parquet dev about having binary files outside the file itself gets interesting. That's mainly driven by the need to manage and ingest large scale BLOBs, but it'd also permit a catalog which was aware of the external files to restrict access to specific users.

I hadn't thought of the security aspects of it.

@steveloughran
Contributor Author

@danielcweeks @adutra this is ready for review; it's so minimal that it shouldn't be controversial.


Development

Successfully merging this pull request may close these issues.

AWS: S3SignerServlet should strip out more request headers for caching

3 participants