Skip to content

Core: Move Hadoop conf serialization into SerializableConfiguration#15583

Merged
nastra merged 3 commits intoapache:mainfrom
nastra:hadoop-conf-serde
Mar 13, 2026
Merged

Core: Move Hadoop conf serialization into SerializableConfiguration#15583
nastra merged 3 commits intoapache:mainfrom
nastra:hadoop-conf-serde

Conversation

@nastra
Copy link
Contributor

@nastra nastra commented Mar 11, 2026

This moves the Hadoop-specific config serialization out of SerializableTable into Hadoop's SerializableConfiguration

@github-actions github-actions bot added the core label Mar 11, 2026
@nastra nastra force-pushed the hadoop-conf-serde branch 2 times, most recently from ccdace1 to a2cd680 Compare March 11, 2026 09:04
@nastra nastra force-pushed the hadoop-conf-serde branch 2 times, most recently from 7a6914b to d9036b9 Compare March 11, 2026 11:36
@nastra nastra requested review from RussellSpitzer and rdblue March 11, 2026 12:47
this(new SerializableConfiguration(hadoopConf));
}

public HadoopFileIO(SerializableSupplier<Configuration> hadoopConf) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we deprecate this constructor and the serializeConfWith method? That method will double-wrap the configuration if it is called. I like that this should now be Kryo-compatible as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like at least the method is being used in

private void checkAndSkipIoConfigSerialization(Configuration config, Table table) {
if (table != null
&& config.getBoolean(
InputFormatConfig.CONFIG_SERIALIZATION_DISABLED,
InputFormatConfig.CONFIG_SERIALIZATION_DISABLED_DEFAULT)
&& table.io() instanceof HadoopConfigurable) {
((HadoopConfigurable) table.io())
.serializeConfWith(conf -> new NonSerializingConfig(config)::get);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a question of whether that is important enough to keep this method. As far as I can tell it's just for an optimization of not serializing the Hadoop Conf?

It's only opt in (CONFIG_SERIALIZATION_DISABLED) is false by default and I'm dubious it's even that beneficial now. Since we are going to be serializing this all over the place in non-hadoop situations right? I would recommend we just deprecate it then remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I've deprecated both

@nastra nastra force-pushed the hadoop-conf-serde branch from d9036b9 to df2dc69 Compare March 12, 2026 06:28
@RussellSpitzer
Copy link
Member

Two other places that may need updating?

SerializationUtil.serializeToBytes line 46:

return serializeToBytes(obj, conf -> new SerializableConfiguration(conf)::get);

ResolvingFileIO.setConf line 157:

this.hadoopConf = new SerializableConfiguration(conf)::get;

@nastra nastra force-pushed the hadoop-conf-serde branch from df2dc69 to e34f70e Compare March 13, 2026 13:22
@nastra
Copy link
Contributor Author

nastra commented Mar 13, 2026

thanks everyone for the reviews, I'll go ahead and merge this

@nastra nastra merged commit ba40300 into apache:main Mar 13, 2026
35 checks passed
@nastra nastra deleted the hadoop-conf-serde branch March 13, 2026 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants