Implement TimeStampXXXTZVector for parquet isAdjustedToUTC #926 #576 #577#927
Implement TimeStampXXXTZVector for parquet isAdjustedToUTC #926 #576 #577#927laurentperez wants to merge 7 commits intoKotlin:masterfrom
Conversation
|
@fb64 might interest you ;) |
dataframe-arrow/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/arrowReadingImpl.kt
Show resolved
Hide resolved
| if (isNull(i)) { | ||
| null | ||
| } else { | ||
| DateUtility.getLocalDateTimeFromEpochMilli(TimeUnit.SECONDS.toMillis(getObject(it)), this.timeZone) |
There was a problem hiding this comment.
at second glance I'm not so sure about this one.
Parquet does not have seconds precision and my PR is for Parquet, but pyarrow feather can .floor('S') its datetimes.
I'm not sure what's going to be present in its .feather file, either seconds or milliseconds.
I need to test this against a .feather file together with seconds precision and timezone awareness, perhaps one from https://github.com/Kotlin/dataframe/tree/master/dataframe-arrow/src/test/resources
There was a problem hiding this comment.
Your pull request focuses on Parquet, but it would be beneficial to incorporate upgrades for all Arrow TimeStamp types as well, because Parquet reading (in my PR) relies totally on Arrow.
The test testTimeStamp covers both cases for IPC format and Feather format (by using ArrowStreamWriter and ArrowFileWriter), so you only should add TimeStampNanoTZVector,TimeStampMicroVector and TimeStampSecTZVector to the writeArrowTimestamp method and to the expected DataFrame in testTimeStamp as you did for TimeStampMicroTZVector.
Finally as your PR improves Arrow types compatibility it could be merge independently of #577 (IMHO) 😃
#926 #576 #577