
Add strict_provenance config flag for upstream-only enforcement in make() #1425

@dimitri-yatsenko


Summary

Add a dj.config["strict_provenance"] flag that, when enabled, enforces the upstream-only convention at runtime: inside make(), self.upstream is the only way to read data, and self (including its Part tables) is the only table that can be written to.

Context

Discussion: #1232
Depends on: #1423 (Diagram.trace()), #1424 (self.upstream in make())

Problem

The upstream-only convention — that make() should only access declared upstream dependencies — is the foundation of DataJoint's provenance guarantee. Today the framework defines this convention but does not enforce it. A make() method can fetch() from any table, making the undeclared dependency invisible to the provenance graph. Similarly, make() can insert into arbitrary tables, not just the target table and its parts.

Design

dj.config["strict_provenance"] = True

Read enforcement

When enabled, data inside make() can be read only through self.upstream[Table]. Calling fetch(), to_dicts(), to_pandas(), or to_arrays() directly on a table object that is not part of the pre-restricted ancestor graph raises a DataJointError.
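The read rule can be illustrated with a minimal, framework-free sketch. Nothing here is DataJoint's actual API: FakeTable, guarded_fetch, and ProvenanceError are hypothetical stand-ins, and the ancestor graph is reduced to a set of table names.

```python
class ProvenanceError(Exception):
    """Stand-in for the DataJointError the proposal would raise."""


class FakeTable:
    """Hypothetical stand-in for a table; holds its rows in memory."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows


def guarded_fetch(table, allowed_ancestors, strict):
    """Return the table's rows, or raise if strict mode forbids the read.

    allowed_ancestors: names of tables in the pre-restricted ancestor graph.
    strict: the value of the proposed strict_provenance flag.
    """
    if strict and table.name not in allowed_ancestors:
        raise ProvenanceError(
            f"{table.name} is not a declared upstream dependency"
        )
    return list(table.rows)


session = FakeTable("Session", [{"session_id": 1}])
lookup = FakeTable("UndeclaredLookup", [{"param": 0.5}])
ancestors = {"Session"}

# A declared ancestor can be read in strict mode.
declared_rows = guarded_fetch(session, ancestors, strict=True)

# An undeclared table is rejected in strict mode.
try:
    guarded_fetch(lookup, ancestors, strict=True)
    blocked = False
except ProvenanceError:
    blocked = True

# With the flag off, the same read goes through unchanged.
legacy_rows = guarded_fetch(lookup, ancestors, strict=False)
```

In the real framework the check would presumably sit inside query execution rather than in a wrapper function, so that every fetch variant is covered by a single gate.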

Write enforcement

When enabled, inserts inside make() are restricted to:

  • self — the target table being populated
  • self's Part tables — e.g. self.PartName.insert(...)

Inserts into any other table raise an error. Additionally, every inserted row's primary key must be consistent with the current key — preventing make() from inserting rows for keys it wasn't called with.

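| Operation      | Allowed target              | Blocked                 |
|----------------|-----------------------------|-------------------------|
| Read (fetch)   | self.upstream[Ancestor]     | All other tables        |
| Write (insert) | self and self's Part tables | All other tables        |
| Key scope      | Must match current key      | Mismatched primary keys |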
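The two write rules (allowed target, matching key) might be checked along these lines. This is a sketch under stated assumptions: tables are represented by name only, check_insert and key_is_consistent are hypothetical helpers, and "consistent with the current key" is interpreted as "every attribute of the current key appears in the row with the same value", which the proposal does not spell out.

```python
class ProvenanceError(Exception):
    """Stand-in for the DataJointError the proposal would raise."""


def key_is_consistent(row, key):
    """Assumed rule: each attribute of the current key must appear in the
    row with the same value. Extra attributes in the row are allowed."""
    return all(name in row and row[name] == value for name, value in key.items())


def check_insert(target, rows, *, self_name, part_names, key):
    """Raise ProvenanceError unless the insert satisfies both write rules."""
    if target != self_name and target not in part_names:
        raise ProvenanceError(
            f"insert into {target} is outside {self_name} and its parts"
        )
    for row in rows:
        if not key_is_consistent(row, key):
            raise ProvenanceError(f"row {row} does not match current key {key}")


def is_blocked(target, rows, **scope):
    """Convenience wrapper for the demonstration below."""
    try:
        check_insert(target, rows, **scope)
    except ProvenanceError:
        return True
    return False


scope = dict(
    self_name="Analysis",
    part_names={"Analysis.Trace"},
    key={"session_id": 1},
)

blocked_self = is_blocked("Analysis", [{"session_id": 1, "score": 0.9}], **scope)
blocked_part = is_blocked("Analysis.Trace", [{"session_id": 1, "t": 0}], **scope)
blocked_other = is_blocked("Session", [{"session_id": 1}], **scope)
blocked_key = is_blocked("Analysis", [{"session_id": 2, "score": 0.1}], **scope)
```

Only the last two inserts are rejected: one targets a table other than self or its parts, the other carries a primary key that differs from the key make() was called with.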

Default behavior

When the flag is not enabled (the default), everything works as before, so the change is not breaking.

Provenance guarantee

Strict mode ensures:

  • The trace only contains declared ancestors (provenance-complete)
  • The trace is restricted by the key (no access to unrelated entities)
  • Writes are scoped to the target table and its parts with matching keys
  • Every data access is mediated and auditable
  • Undeclared dependencies become impossible, not just unconventional

Implementation approach

The framework sets a context flag during make() execution.

  • Read gating: Query execution on table objects checks whether strict provenance is active and whether the table is part of the current self.upstream ancestor graph. If not, it raises a DataJointError.
  • Write gating: Insert calls check whether the target is self or one of self's Part tables, and whether the primary key is consistent with the current key. The existing _allow_insert mechanism on the class can be narrowed to enforce this.
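One way the context flag could be held is a contextvars.ContextVar that is set for the duration of make() and reset afterwards, even if make() raises. The sketch below is an assumption about the mechanism, not the planned implementation; make_context, read_allowed, and _active_upstream are hypothetical names.

```python
import contextvars
from contextlib import contextmanager

# None means "not inside make()"; otherwise the set of readable table names.
_active_upstream = contextvars.ContextVar("active_upstream", default=None)


@contextmanager
def make_context(upstream_names):
    """Mark the enclosed block as running inside make()."""
    token = _active_upstream.set(frozenset(upstream_names))
    try:
        yield
    finally:
        # Always restore the previous state, including after an exception.
        _active_upstream.reset(token)


def read_allowed(table_name, strict):
    """Decide whether a query on table_name may execute."""
    allowed = _active_upstream.get()
    if not strict or allowed is None:
        return True  # flag off, or not inside make(): no gating
    return table_name in allowed


outside = read_allowed("Other", strict=True)

with make_context({"Session"}):
    inside_declared = read_allowed("Session", strict=True)
    inside_undeclared = read_allowed("Other", strict=True)
    inside_flag_off = read_allowed("Other", strict=False)

try:
    with make_context({"Session"}):
        raise RuntimeError("make() failed")
except RuntimeError:
    pass
reset_after_failure = _active_upstream.get() is None
```

A context variable, rather than a plain module global, keeps the flag isolated per thread or async task, which may matter if populate() runs make() calls concurrently.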

This is an operational concern, not a schema property: the same schema definitions work in both modes. Teams can enable it globally without touching schema constructors, for example turning it on in production while leaving it off during development and debugging.


Labels: feature
