@abstractdog abstractdog commented Feb 9, 2026

What changes were proposed in this pull request?

See the Jira ticket for the initial analysis.
The patch adds a table createTime check to lookups in the acid dir cache. The table's create_time property is propagated to the Tez AM in the vertex-level JobConf, so the AM can invalidate stale entries that belong to a previous instance of the same table (the one that existed before DROP).
An RPC-based solution wouldn't work: HS2 has no such interface to the AMs, and adding one to TezClient/DagClient would be an epic hack, so it's not an option.
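
To make the check concrete, here is a minimal sketch of the idea in Python. It is not the patch itself: the real change is Java inside Hive, and every name below (`AcidDirCache`, `CachedDirInfo`, `stored_at`, `table_create_time`) is hypothetical, as is keying the cache by table name. It only assumes what the log message further down suggests, namely that an entry is dropped when the table's create time is later than the time the entry was cached.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedDirInfo:
    """Hypothetical cache entry: a directory snapshot plus when it was stored."""
    snapshot: str
    stored_at: int  # epoch seconds at which the entry was cached


class AcidDirCache:
    """Toy stand-in for the AM-side acid dir cache (keyed by table name here)."""

    def __init__(self) -> None:
        self._entries: Dict[str, CachedDirInfo] = {}

    def put(self, table: str, snapshot: str, now: int) -> None:
        self._entries[table] = CachedDirInfo(snapshot, now)

    def get(self, table: str, table_create_time: int) -> Optional[str]:
        entry = self._entries.get(table)
        if entry is None:
            return None
        # If the table was created after the entry was stored, the entry
        # describes a dropped instance of the table: evict it and report a miss.
        if table_create_time > entry.stored_at:
            del self._entries[table]
            return None
        return entry.snapshot
```

In this sketch the caller passes the create time it received with the job configuration, so no extra call from HS2 to the AM is needed; the stale entry is simply discarded on the next lookup.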

Why are the changes needed?

Stale cache entries can cause problems that are hard to investigate, and HIVE-26060 did not take care of them.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manually tested with minihs2.

mvn clean install -Dtest=StartMiniHS2Cluster -DminiHS2.clusterType=llap -DminiHS2.run=true -DminiHS2.usePortsFromConf=true -T 1C -Denforcer.skip=true -pl itests/hive-unit -pl itests/util -Pitests -nsu -DminiHS2.isMetastoreRemote=true

set hive.explain.user=false;
set hive.query.results.cache.enabled=false;
set hive.fetch.task.conversion=none;

set hive.support.concurrency=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

set hive.exec.orc.split.strategy=BI;


CREATE TABLE test_part(id int) PARTITIONED BY(dt string) CLUSTERED BY (id) INTO 2 BUCKETS STORED AS ORC TBLPROPERTIES ('transactional'='true');
INSERT OVERWRITE TABLE test_part PARTITION (dt) SELECT 1, '1';

SELECT * FROM test_part;

DROP TABLE test_part;


CREATE TABLE test_part(id int) PARTITIONED BY(dt string) CLUSTERED BY (id) INTO 2 BUCKETS STORED AS ORC TBLPROPERTIES ('transactional'='true');
INSERT OVERWRITE TABLE test_part PARTITION (dt) SELECT 1, '1';

SELECT * FROM test_part;

Then look for the invalidation entry in the AM log:

find . -name "syslog" | xargs grep "invalidating entry"
...
2026-02-09T08:12:00,509 INFO [ORC_GET_SPLITS #1] io.AcidUtils: Table default.test_part was recreated (at: 1770653502) since it was stored in acid cache (at: 1770653453), invalidating entry

@abstractdog changed the title from "HIVE-27328: Acid dirCache is not invalidated in TezAMs while dropping…" to "HIVE-27328: Acid dirCache is not invalidated in TezAMs while dropping table" on Feb 9, 2026
abstractdog commented Feb 10, 2026

Oh, this causes a lot of qtest noise because the create_time value is added to the TableDesc properties; I need to think about this.
I think I can put it directly into the MapWork's conf somehow, so that it is consumed in the split generation process.

UPDATE: solved, the create_time value now comes from MapWork.
