Skip to content

feat(arrow-avro): Configurable Arrow timezone ID for Avro timestamps#9280

Open
mzabaluev wants to merge 15 commits intoapache:mainfrom
mzabaluev:configurable-utc-tz-id
Open

feat(arrow-avro): Configurable Arrow timezone ID for Avro timestamps#9280
mzabaluev wants to merge 15 commits intoapache:mainfrom
mzabaluev:configurable-utc-tz-id

Conversation

@mzabaluev
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

Enable an alternative representation of UTC timestamp data types with the "UTC" timezone ID, which is useful for interoperability with applications preferring that form.

What changes are included in this PR?

In the ReaderBuilder API, add a new method with_tz that allows users to specify the timezone ID for Avro logical types that represent UTC timestamps. The choices are between "+00:00" and "UTC" and can be selected by the new Tz enumeration.

Are these changes tested?

Added unit tests to verify the representation with different Tz parameter values.

Are there any user-facing changes?

A new with_tz method is added to arrow_avro::reader::Builder.

In the ReaderBuilder API, add a new method `with_tz` that allows users
to specify the timezone ID for Avro logical types that represent
UTC timestamps. The choices are between "+00:00" and "UTC"
and can be selected by the new Tz enumeration.
@github-actions github-actions bot added arrow Changes to the arrow crate arrow-avro arrow-avro crate labels Jan 27, 2026
@EmilyMatt
Copy link
Contributor

This makes more sense in my opinion than using "+00:00" exclusively

Copy link
Contributor

@jecsand838 jecsand838 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mzabaluev LMGTM. Really like this improvement to arrow-avro!

For me the big things are:

  1. An e2e test in reader/mod.rs using the new with_tz.
  2. Better documentation on the public with_tz method.

Comment on lines +1170 to +1175
/// Sets the timezone representation for Avro timestamp fields.
pub fn with_tz(mut self, tz: Tz) -> Self {
self.use_tz = tz;
self
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding an e2e test using this?

Also may help to explain in the doc comment what the default value is.

Comment on lines +990 to +998
Codec::TimestampMillis(Some(Tz::OffsetZero)) => Self::TimestampMillisUtc,
Codec::TimestampMillis(Some(Tz::Utc)) => Self::TimestampMillisUtc,
Codec::TimestampMillis(None) => Self::TimestampMillisLocal,
Codec::TimestampMicros(Some(Tz::OffsetZero)) => Self::TimestampMicrosUtc,
Codec::TimestampMicros(Some(Tz::Utc)) => Self::TimestampMicrosUtc,
Codec::TimestampMicros(None) => Self::TimestampMicrosLocal,
Codec::TimestampNanos(Some(Tz::OffsetZero)) => Self::TimestampNanosUtc,
Codec::TimestampNanos(Some(Tz::Utc)) => Self::TimestampNanosUtc,
Codec::TimestampNanos(None) => Self::TimestampNanosLocal,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This maybe a bit cleaner.

Suggested change
Codec::TimestampMillis(Some(Tz::OffsetZero)) => Self::TimestampMillisUtc,
Codec::TimestampMillis(Some(Tz::Utc)) => Self::TimestampMillisUtc,
Codec::TimestampMillis(None) => Self::TimestampMillisLocal,
Codec::TimestampMicros(Some(Tz::OffsetZero)) => Self::TimestampMicrosUtc,
Codec::TimestampMicros(Some(Tz::Utc)) => Self::TimestampMicrosUtc,
Codec::TimestampMicros(None) => Self::TimestampMicrosLocal,
Codec::TimestampNanos(Some(Tz::OffsetZero)) => Self::TimestampNanosUtc,
Codec::TimestampNanos(Some(Tz::Utc)) => Self::TimestampNanosUtc,
Codec::TimestampNanos(None) => Self::TimestampNanosLocal,
Codec::TimestampMillis(Some(_)) => Self::TimestampMillisUtc,
Codec::TimestampMillis(None) => Self::TimestampMillisLocal,
Codec::TimestampMicros(Some(_)) => Self::TimestampMicrosUtc,
Codec::TimestampMicros(None) => Self::TimestampMicrosLocal,
Codec::TimestampNanos(Some(_)) => Self::TimestampNanosUtc,
Codec::TimestampNanos(None) => Self::TimestampNanosLocal,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wary of matching Some(_) because if Tz ever gets non-UTC variants, the suggested code will still compile, but will be wrong.

Comment on lines +749 to +760
Self::TimestampMillis(tz) => DataType::Timestamp(
TimeUnit::Millisecond,
tz.as_ref().map(|tz| tz.to_string().into()),
),
Self::TimestampMicros(tz) => DataType::Timestamp(
TimeUnit::Microsecond,
tz.as_ref().map(|tz| tz.to_string().into()),
),
Self::TimestampNanos(tz) => DataType::Timestamp(
TimeUnit::Nanosecond,
tz.as_ref().map(|tz| tz.to_string().into()),
),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then down here you could do:

Suggested change
Self::TimestampMillis(tz) => DataType::Timestamp(
TimeUnit::Millisecond,
tz.as_ref().map(|tz| tz.to_string().into()),
),
Self::TimestampMicros(tz) => DataType::Timestamp(
TimeUnit::Microsecond,
tz.as_ref().map(|tz| tz.to_string().into()),
),
Self::TimestampNanos(tz) => DataType::Timestamp(
TimeUnit::Nanosecond,
tz.as_ref().map(|tz| tz.to_string().into()),
),
Self::TimestampMillis(tz) => DataType::Timestamp(
TimeUnit::Millisecond,
tz.as_ref().map(|tz| tz.as_str().into()),
),
Self::TimestampMicros(tz) => DataType::Timestamp(
TimeUnit::Microsecond,
tz.as_ref().map(|tz| tz.as_str().into()),
),
Self::TimestampNanos(tz) => DataType::Timestamp(
TimeUnit::Nanosecond,
tz.as_ref().map(|tz| tz.as_str().into()),
),

@alamb alamb marked this pull request as draft February 11, 2026 20:25
@alamb
Copy link
Contributor

alamb commented Feb 11, 2026

Marking as draft as I think this PR is no longer waiting on feedback and I am trying to make it easier to find PRs in need of review. Please mark it as ready for review when it is ready for another look

mzabaluev and others added 9 commits February 26, 2026 22:08
Add the as_str method to avro Tz as a more performant
way to get the timezone ID string.

Co-authored-by: Connor Sanders <170039284+jecsand838@users.noreply.github.com>
Add with_tz method to AsyncAvroFileReaderBuilder to allow configuring
the time zone representation for Avro timestamp fields the same way
as the synchronous Avro reader.
@mzabaluev mzabaluev marked this pull request as ready for review March 3, 2026 22:25
@mzabaluev
Copy link
Contributor Author

@alamb I have added tests, the same builder option is also added to the async reader now that it's merged.

Copy link
Contributor

@jecsand838 jecsand838 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate arrow-avro arrow-avro crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Configurable Arrow representation of UTC timestamps for Avro reader

4 participants