[FEAT] Expose total_pages_processed in execution API response by pk-zipstack · Pull Request #1801 · Zipstack/unstract

pk-zipstack · 2026-02-23T08:51:46Z

What

Add UsageHelper.get_aggregated_pages_processed() to aggregate page count from PageUsage model per run_id
Add aggregated_total_pages_processed property on WorkflowExecution model to aggregate pages across all file executions
Enrich per-file metadata with total_pages_processed for API destinations in _process_final_output()
Enrich combined metadata with total_pages_processed for DB destinations in get_combined_metadata()
Expose aggregated_total_pages_processed in ExecutionSerializer for the execution list API

Why

The PageUsage model already tracks pages_processed per run_id (file execution ID), but this data was not surfaced in API responses
Exposing page counts alongside token usage and cost data enables clients to track document processing volume per execution

How

usage_v2/helper.py: Added get_aggregated_pages_processed(run_id) static method that queries PageUsage.objects.filter(run_id=run_id) and aggregates Sum('pages_processed'), returning int | None
workflow_v2/models/execution.py: Added aggregated_total_pages_processed property that collects file execution IDs via self.file_executions, converts to strings, and queries PageUsage with run_id__in
workflow_v2/file_execution_tasks.py: In _process_final_output(), after destination.get_metadata() for API destinations, injects total_pages_processed into execution_metadata
endpoint_v2/destination.py: In get_combined_metadata(), adds total_pages_processed alongside existing usage token data
execution/serializer/execution.py: Added aggregated_total_pages_processed as a SerializerMethodField

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

No. All changes are additive — new helper method, new model property, new serializer field, and new keys in metadata dicts. No existing fields or behavior are modified. The total_pages_processed field gracefully returns None when no PageUsage data exists.

Database Migrations

None

Env Config

None

Relevant Docs

N/A

Related Issues or PRs

N/A

Dependencies Versions

No changes

Notes on Testing

Trigger a workflow execution via API deployment and verify total_pages_processed appears in per-file response metadata
Trigger a workflow execution via DB destination and verify total_pages_processed appears in combined metadata
Call the execution list API and verify aggregated_total_pages_processed appears in the response
Verify None is returned gracefully when no PageUsage records exist for an execution

Screenshots

Checklist

I have read and understood the Contribution Guidelines.

…adata Surface page usage data from PageUsage model in API responses to support tracking total pages processed per file execution and per workflow execution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

for more information, see https://pre-commit.ci

coderabbitai · 2026-02-23T08:52:14Z

Summary by CodeRabbit

New Features
- Aggregated total pages processed is now tracked for workflow executions.
- The aggregated value is exposed in execution details and serialized output, and added to API destination metadata for runs.
- If no pages are found for a run, the field will be empty/blank.

Walkthrough

Adds aggregated pages-processed tracking: a helper that sums PageUsage by run_id(s), a WorkflowExecution property exposing the aggregate, a serializer field, and enrichment of endpoint/task metadata with the aggregated total_pages_processed. Returns None when no input or no matching records.

Changes

Cohort / File(s)	Summary
Usage Aggregation Helper `backend/usage_v2/helper.py`	Added `UsageHelper.get_aggregated_pages_processed(run_id: str
Model Layer `backend/workflow_manager/workflow_v2/models/execution.py`	Added `aggregated_total_pages_processed` property on `WorkflowExecution` that collects related `file_execution` IDs and delegates aggregation to `UsageHelper.get_aggregated_pages_processed(run_ids=...)`.
Serializer `backend/workflow_manager/execution/serializer/execution.py`	Added `aggregated_total_pages_processed` SerializerMethodField and `get_aggregated_total_pages_processed` to expose the model property in serialized output.
Endpoint & Task Enrichment `backend/workflow_manager/endpoint_v2/destination.py`, `backend/workflow_manager/workflow_v2/file_execution_tasks.py`	Attach `total_pages_processed` to combined metadata / execution metadata for API destinations by calling `UsageHelper.get_aggregated_pages_processed(run_id=...)`.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Endpoint as Endpoint/Destination
    participant Task as FileExecutionTasks
    participant Serializer
    participant Model as WorkflowExecution
    participant Helper as UsageHelper
    participant DB as PageUsage

    Client->>Endpoint: request combined metadata
    Endpoint->>Helper: get_aggregated_pages_processed(run_id)
    Helper->>DB: query PageUsage by run_id/run_ids
    DB-->>Helper: return records
    Helper-->>Endpoint: return summed pages_processed
    Endpoint-->>Client: return metadata (includes total_pages_processed)

    Task->>Helper: get_aggregated_pages_processed(run_id)
    Helper->>DB: query PageUsage
    DB-->>Helper: return records
    Helper-->>Task: return sum
    Task->>Serializer: include total_pages_processed in execution metadata

    Serializer->>Model: access aggregated_total_pages_processed
    Model->>Helper: get_aggregated_pages_processed(run_ids)
    Helper-->>Model: return sum
    Serializer-->>Client: serialized execution with aggregated_total_pages_processed

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely describes the main change: exposing total_pages_processed in the execution API response, which aligns with the core objective of the PR.
Description check	✅ Passed	The description is comprehensive and well-structured, covering all required template sections including What, Why, How, breaking changes assessment, testing notes, and includes visual evidence via a screenshot.
Docstring Coverage	✅ Passed	Docstring coverage is 88.89% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feat/expose-total-pages-in-extraction-llm-metadata

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (4)

backend/workflow_manager/workflow_v2/models/execution.py (2)
262-280: PageUsage.run_id has no database index, but is now on multiple hot query paths.

The PageUsage model only indexes organization_id. Both filter(run_id=run_id) (in UsageHelper.get_aggregated_pages_processed) and filter(run_id__in=str_ids) (here) will full-scan the page_usage table as it grows. These queries are now triggered per-file in API deployments and per-row in execution list views.

A migration adding a db_index=True on run_id (or a Meta.indexes entry) is recommended:
# In account_usage/models.py – PageUsage.Meta
indexes = [
    models.Index(fields=["organization_id"]),
    models.Index(fields=["run_id"]),   # +
]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/workflow_manager/workflow_v2/models/execution.py` around lines 262 -
280, Add a database index on PageUsage.run_id to avoid full table scans for
queries used by aggregated_total_pages_processed and
UsageHelper.get_aggregated_pages_processed: update the PageUsage model Meta to
include an index for "run_id" (e.g., add models.Index(fields=["run_id"])
alongside the existing organization_id index) and create/apply a Django
migration so the new index is created in the database.
274-280: Same redundant .exists() + .aggregate() double-query as in UsageHelper.

Sum("pages_processed") on an empty queryset returns None for the key, so the .exists() check buys nothing. Collapsing to a single .aggregate() call saves one round-trip per property access (and this property is called per row in list-view serialization).
♻️ Proposed fix
-        str_ids = [str(fid) for fid in file_execution_ids]
-        queryset = PageUsage.objects.filter(run_id__in=str_ids)
-        if not queryset.exists():
-            return None
-
-        result = queryset.aggregate(total_pages=Sum("pages_processed"))
-        return result.get("total_pages")
+        str_ids = [str(fid) for fid in file_execution_ids]
+        result = PageUsage.objects.filter(run_id__in=str_ids).aggregate(
+            total_pages=Sum("pages_processed")
+        )
+        return result.get("total_pages")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/workflow_manager/workflow_v2/models/execution.py` around lines 274 -
280, The current code does a redundant .exists() followed by .aggregate() which
causes an extra DB round-trip; replace the two-step pattern in the block that
builds str_ids and assigns queryset =
PageUsage.objects.filter(run_id__in=str_ids) with a single aggregate call: call
PageUsage.objects.filter(run_id__in=str_ids).aggregate(total_pages=Sum("pages_processed"))
and return result.get("total_pages") (this preserves None for no rows) — update
the logic around the variables file_execution_ids, queryset and the use of
Sum("pages_processed")/pages_processed to remove the .exists() check and avoid
the double query.
backend/workflow_manager/execution/serializer/execution.py (1)
36-38: aggregated_total_pages_processed adds up to 3 extra DB queries per execution in list views.

The model property fires: (1) a values_list on file_executions, (2) a PageUsage.exists(), (3) a PageUsage.aggregate(). Combined with the existing per-item queries for get_successful_files and get_failed_files, execution list endpoints are now executing ~7 queries per row. The class-level TODO already calls this out; this field makes addressing it more urgent.

Consider annotating the aggregate in the queryset that feeds the list view (e.g., via a subquery annotation or a bulk prefetch_related approach) rather than resolving it lazily per object.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/workflow_manager/execution/serializer/execution.py` around lines 36 -
38, The current get_aggregated_total_pages_processed serializer method calls the
WorkflowExecution model property aggregated_total_pages_processed which triggers
multiple DB queries per row; instead annotate the queryset that feeds the list
view with the aggregated total (using a Subquery/OuterRef or a bulk aggregation
with Prefetch over file_executions/PageUsage) and change
get_aggregated_total_pages_processed to return that annotated value (e.g., read
obj.annotated_aggregated_total_pages_processed or similar) so no per-object
queries are run; update the view/queryset that constructs the list to include
the annotation name you choose and ensure the serializer reads that attribute
rather than accessing the model property.
backend/usage_v2/helper.py (1)
37-45: Redundant .exists() check creates an unnecessary extra DB query.

Sum on an empty queryset already returns None for the aggregated key, so result.get("total_pages") will return None when there are no records — the .exists() guard is superfluous. The sibling method get_aggregated_token_count uses a single .aggregate() call for this reason.

Additionally, two static-analysis hints are valid here:

BLE001: Replace bare except Exception with a narrower exception type, or at minimum annotate the intent.

TRY400: logger.error suppresses the traceback; logger.exception (or logger.error(..., exc_info=True)) is preferred.
♻️ Proposed fix
-        try:
-            queryset = PageUsage.objects.filter(run_id=run_id)
-            if not queryset.exists():
-                return None
-            result = queryset.aggregate(total_pages=Sum("pages_processed"))
-            return result.get("total_pages")
-        except Exception as e:
-            logger.error(f"Error aggregating pages processed for run_id {run_id}: {e}")
-            return None
+        try:
+            result = PageUsage.objects.filter(run_id=run_id).aggregate(
+                total_pages=Sum("pages_processed")
+            )
+            return result.get("total_pages")
+        except Exception as e:  # noqa: BLE001
+            logger.exception(f"Error aggregating pages processed for run_id {run_id}: {e}")
+            return None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/usage_v2/helper.py` around lines 37 - 45, Remove the redundant
.exists() check and perform a single aggregate call on
PageUsage.objects.filter(run_id=run_id) returning result.get("total_pages")
(same pattern as get_aggregated_token_count); replace the bare except Exception
with a narrower exception (e.g., catch django.db.DatabaseError) and log the
failure with full traceback using logger.exception(...) (or logger.error(...,
exc_info=True)) to preserve stack information while keeping the behavior of
returning None on error.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@backend/usage_v2/helper.py`:
- Around line 37-45: Remove the redundant .exists() check and perform a single
aggregate call on PageUsage.objects.filter(run_id=run_id) returning
result.get("total_pages") (same pattern as get_aggregated_token_count); replace
the bare except Exception with a narrower exception (e.g., catch
django.db.DatabaseError) and log the failure with full traceback using
logger.exception(...) (or logger.error(..., exc_info=True)) to preserve stack
information while keeping the behavior of returning None on error.

In `@backend/workflow_manager/execution/serializer/execution.py`:
- Around line 36-38: The current get_aggregated_total_pages_processed serializer
method calls the WorkflowExecution model property
aggregated_total_pages_processed which triggers multiple DB queries per row;
instead annotate the queryset that feeds the list view with the aggregated total
(using a Subquery/OuterRef or a bulk aggregation with Prefetch over
file_executions/PageUsage) and change get_aggregated_total_pages_processed to
return that annotated value (e.g., read
obj.annotated_aggregated_total_pages_processed or similar) so no per-object
queries are run; update the view/queryset that constructs the list to include
the annotation name you choose and ensure the serializer reads that attribute
rather than accessing the model property.

In `@backend/workflow_manager/workflow_v2/models/execution.py`:
- Around line 262-280: Add a database index on PageUsage.run_id to avoid full
table scans for queries used by aggregated_total_pages_processed and
UsageHelper.get_aggregated_pages_processed: update the PageUsage model Meta to
include an index for "run_id" (e.g., add models.Index(fields=["run_id"])
alongside the existing organization_id index) and create/apply a Django
migration so the new index is created in the database.
- Around line 274-280: The current code does a redundant .exists() followed by
.aggregate() which causes an extra DB round-trip; replace the two-step pattern
in the block that builds str_ids and assigns queryset =
PageUsage.objects.filter(run_id__in=str_ids) with a single aggregate call: call
PageUsage.objects.filter(run_id__in=str_ids).aggregate(total_pages=Sum("pages_processed"))
and return result.get("total_pages") (this preserves None for no rows) — update
the logic around the variables file_execution_ids, queryset and the use of
Sum("pages_processed")/pages_processed to remove the .exists() check and avoid
the double query.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between d7babec and 25de508.

📒 Files selected for processing (5)

backend/usage_v2/helper.py
backend/workflow_manager/endpoint_v2/destination.py
backend/workflow_manager/execution/serializer/execution.py
backend/workflow_manager/workflow_v2/file_execution_tasks.py
backend/workflow_manager/workflow_v2/models/execution.py

chandrasekharan-zipstack · 2026-02-25T08:54:33Z

backend/usage_v2/helper.py

+        except Exception as e:
+            logger.error(f"Error aggregating pages processed for run_id {run_id}: {e}")
+            return None


NIT: Check this exception handling behaviour. In almost all cases its better to let the exception bubble up and let it get handled by the middleware. Test it after removing this - we need to ensure that user sees an appropriate error message (need not be too specific) but at the same time, we should log it with a traceback if its a 5xx error

chandrasekharan-zipstack · 2026-02-25T08:56:53Z

backend/usage_v2/helper.py


 class UsageHelper:
+    @staticmethod
+    def get_aggregated_pages_processed(run_id: str) -> int | None:


There's some code duplication between this method and aggregated_total_pages_processed() of WorkflowExecution model. Unify these implementations

chandrasekharan-zipstack · 2026-02-25T08:57:13Z

backend/workflow_manager/workflow_v2/models/execution.py

+        queryset = PageUsage.objects.filter(run_id__in=str_ids)
+        if not queryset.exists():
+            return None
+
+        result = queryset.aggregate(total_pages=Sum("pages_processed"))


NIT: Code duplication

…ption handling - Unify UsageHelper.get_aggregated_pages_processed() to accept either run_id or run_ids, eliminating duplicate PageUsage query logic - WorkflowExecution.aggregated_total_pages_processed now delegates to UsageHelper instead of duplicating the aggregation - Remove broad try/except so exceptions bubble up to middleware Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

sonarqubecloud · 2026-02-25T09:42:41Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

github-actions · 2026-02-25T09:42:41Z

Test Results

Summary

✅ Runner Tests: 11 passed, 0 failed (11 total)
✅ SDK1 Tests: 66 passed, 0 failed (66 total)

Runner Tests - Full Report

filepath	function	$$\textcolor{#23d18b}{\tt{passed}}$$	SUBTOTAL
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_logs}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_cleanup}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_cleanup\_skip}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_client\_init}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_get\_image\_exists}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_get\_image}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_get\_container\_run\_config}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_get\_container\_run\_config\_without\_mount}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_run\_container}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_get\_image\_for\_sidecar}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_sidecar\_container}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{TOTAL}}$$		$$\textcolor{#23d18b}{\tt{11}}$$	$$\textcolor{#23d18b}{\tt{11}}$$

SDK1 Tests - Full Report

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

backend/usage_v2/helper.py (1)

49-52: Remove the redundant exists() check — it causes an extra DB round-trip.

Django's Sum on an empty queryset returns {"total_pages": None}, so result.get("total_pages") already returns None with no records present. The exists() guard adds a second database hit for every call without changing the outcome.

♻️ Proposed simplification

-        if not queryset.exists():
-            return None
-        result = queryset.aggregate(total_pages=Sum("pages_processed"))
-        return result.get("total_pages")
+        result = queryset.aggregate(total_pages=Sum("pages_processed"))
+        return result.get("total_pages")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@backend/usage_v2/helper.py` around lines 49 - 52, The current code performs
an extra DB round-trip by calling queryset.exists() before aggregating; remove
the exists() check and directly call
queryset.aggregate(total_pages=Sum("pages_processed")) and return
result.get("total_pages") — keep the aggregate on the same queryset (using Sum
and "pages_processed") so empty querysets yield None without the extra exists()
query.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/workflow_manager/workflow_v2/models/execution.py`:
- Around line 261-277: The aggregated_total_pages_processed property issues
per-object DB queries (file_executions.values_list(...) and
UsageHelper.get_aggregated_pages_processed(...)) causing an N+1 when used in
ExecutionSerializer list endpoints; fix by either adding a DB index on
PageUsage.run_id to speed each aggregate query and adding a migration for that
index, or (preferred) batch the totals in the list view/serializer by computing
aggregated page totals for all execution IDs in one query and attaching them to
the queryset (override the list view or ExecutionSerializer to accept a
precomputed map keyed by execution id and avoid calling
aggregated_total_pages_processed per instance); reference the property
aggregated_total_pages_processed, the call to file_executions.values_list, and
UsageHelper.get_aggregated_pages_processed when implementing the change.

---

Nitpick comments:
In `@backend/usage_v2/helper.py`:
- Around line 49-52: The current code performs an extra DB round-trip by calling
queryset.exists() before aggregating; remove the exists() check and directly
call queryset.aggregate(total_pages=Sum("pages_processed")) and return
result.get("total_pages") — keep the aggregate on the same queryset (using Sum
and "pages_processed") so empty querysets yield None without the extra exists()
query.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 25de508 and adf2dc2.

📒 Files selected for processing (2)

backend/usage_v2/helper.py
backend/workflow_manager/workflow_v2/models/execution.py

coderabbitai · 2026-02-25T09:47:23Z

backend/workflow_manager/workflow_v2/models/execution.py

+    @property
+    def aggregated_total_pages_processed(self) -> int | None:
+        """Retrieve aggregated total pages processed for this execution.
+
+        Returns:
+            int | None: Total pages processed across all file executions,
+            or None if no page usage data exists.
+        """
+        from usage_v2.helper import UsageHelper
+
+        file_execution_ids = list(self.file_executions.values_list("id", flat=True))
+        if not file_execution_ids:
+            return None
+
+        return UsageHelper.get_aggregated_pages_processed(
+            run_ids=[str(fid) for fid in file_execution_ids]
+        )


⚠️ Potential issue | 🟠 Major

N+1 query risk when aggregated_total_pages_processed is serialized in a list endpoint.

Each call to this property issues at minimum 2 queries (1 for file_executions.values_list, 1 for PageUsage.aggregate). The AI summary confirms this property is exposed in ExecutionSerializer for the execution list API, making this O(2N) extra queries for N executions — on top of any existing per-object properties already doing the same.

Unlike aggregated_usage_cost (which filters Usage directly by execution_id), this property must first resolve file_execution_ids and then fan out to PageUsage, because there is no direct execution_id column on PageUsage (only run_id, which maps to a file execution ID). The fan-out is structurally inherent to the data model.

Mitigation options to consider:

Add a DB index on PageUsage.run_id to at least make each aggregate query fast.

Batch-load in the serializer: override the list view's queryset to prefetch/annotate page totals per execution, bypassing the per-object property for list responses.

Accept the cost if the list endpoint is paginated tightly (e.g., page size ≤ 10) and usage is low, but make this explicit.

Run the following to confirm there's no existing index on PageUsage.run_id:

#!/bin/bash # Confirm PageUsage model's Meta.indexes and any migrations that add an index on run_id rg -n "run_id" --type py -C3 backend/account_usage/models.py # Also check migrations for any index on page_usage.run_id rg -rn "page_usage" --type py -g "**/migrations/**" -C2

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backend/workflow_manager/workflow_v2/models/execution.py` around lines 261 - 277, The aggregated_total_pages_processed property issues per-object DB queries (file_executions.values_list(...) and UsageHelper.get_aggregated_pages_processed(...)) causing an N+1 when used in ExecutionSerializer list endpoints; fix by either adding a DB index on PageUsage.run_id to speed each aggregate query and adding a migration for that index, or (preferred) batch the totals in the list view/serializer by computing aggregated page totals for all execution IDs in one query and attaching them to the queryset (override the list view or ExecutionSerializer to accept a precomputed map keyed by execution id and avoid calling aggregated_total_pages_processed per instance); reference the property aggregated_total_pages_processed, the call to file_executions.values_list, and UsageHelper.get_aggregated_pages_processed when implementing the change.

pk-zipstack and others added 2 commits February 23, 2026 14:21

[pre-commit.ci] auto fixes from pre-commit.com hooks

25de508

for more information, see https://pre-commit.ci

pk-zipstack requested review from Deepak-Kesavan and chandrasekharan-zipstack February 23, 2026 08:55

pk-zipstack self-assigned this Feb 23, 2026

coderabbitai bot reviewed Feb 23, 2026

View reviewed changes

chandrasekharan-zipstack requested a review from muhammad-ali-e February 25, 2026 08:50

chandrasekharan-zipstack reviewed Feb 25, 2026

View reviewed changes

coderabbitai bot reviewed Feb 25, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

[FEAT] Expose total_pages_processed in execution API response#1801

[FEAT] Expose total_pages_processed in execution API response#1801
pk-zipstack wants to merge 3 commits intomainfrom
feat/expose-total-pages-in-extraction-llm-metadata

pk-zipstack commented Feb 23, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Feb 23, 2026 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

chandrasekharan-zipstack Feb 25, 2026

Uh oh!

chandrasekharan-zipstack Feb 25, 2026

Uh oh!

chandrasekharan-zipstack Feb 25, 2026

Uh oh!

sonarqubecloud bot commented Feb 25, 2026

Uh oh!

github-actions bot commented Feb 25, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Feb 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Comments

Conversation

pk-zipstack commented Feb 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Why

How

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

Database Migrations

Env Config

Relevant Docs

Related Issues or PRs

Dependencies Versions

Notes on Testing

Screenshots

Checklist

Uh oh!

coderabbitai bot commented Feb 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

chandrasekharan-zipstack Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

chandrasekharan-zipstack Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

chandrasekharan-zipstack Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud bot commented Feb 25, 2026

Quality Gate passed

Uh oh!

github-actions bot commented Feb 25, 2026

Test Results

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pk-zipstack commented Feb 23, 2026 •

edited

Loading

coderabbitai bot commented Feb 23, 2026 •

edited

Loading