Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings. Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
Walkthrough

Adds a Dashboard Metrics feature: new Django app with models, migrations, Celery tasks and worker, Redis-backed bucket cache, a core metrics API and queue backend, services and management commands, tests, frontend dashboard UI/components/hooks/styles, and runtime tooling (docker service and poe task).
Sequence Diagram(s)

```mermaid
sequenceDiagram
    rect rgba(220,240,255,0.5)
        participant Client as Frontend Client
    end
    rect rgba(240,255,220,0.5)
        participant API as DashboardMetricsViewSet
    end
    rect rgba(255,240,220,0.5)
        participant Cache as Redis Cache
        participant Service as MetricsQueryService
        participant DB as Metrics DB
    end
    Client->>API: GET /metrics/summary?org=X&start=Y&end=Z
    API->>API: _select_source(params)
    alt hourly source & bucket-cache enabled
        API->>Cache: mget_metrics_buckets(org, range)
        Cache-->>API: cached buckets + missing list
        API->>Service: query missing buckets
        Service->>DB: fetch missing rows
        DB-->>Service: rows
        Service-->>API: rows
        API->>Cache: mset_metrics_buckets(new buckets)
    else daily/monthly or full cache miss
        API->>Service: query aggregated data
        Service->>DB: fetch aggregates
        DB-->>Service: aggregates
        Service-->>API: aggregates
    end
    API->>Client: HTTP 200 MetricsResponse
```
```mermaid
sequenceDiagram
    rect rgba(255,240,240,0.5)
        participant Producer as Integration Point
        participant Capture as MetricsCapture
        participant Core as core.metrics.record
    end
    rect rgba(240,255,240,0.5)
        participant Backend as QueueBackend
        participant Broker as RabbitMQ
        participant Worker as worker-metrics
        participant Task as process_dashboard_metric_events
        participant DB as Metrics DB
    end
    Producer->>Capture: record_*(...)
    Capture->>Core: record(org_id, metric_name, value,...)
    Core->>Backend: backend.record(MetricEvent)
    Backend->>Broker: publish Celery-style message to queue
    Broker->>Worker: deliver to dashboard_metric_events
    Worker->>Task: process batch of events
    Task->>DB: bulk upsert EventMetricsHourly/Daily/Monthly
    DB-->>Task: upsert results
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (3 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 8
Fix all issues with AI Agents 🤖
In @backend/dashboard_metrics/capture.py:
- Around line 51-338: MetricsCapture methods call record() with the metric enum
as the first positional arg and keyword names like org_id= and value=, but core
record signature expects org_id first, metric_name second and metric_value
third, so change every call in record_api_request, record_etl_execution,
record_document_processed (both DOCUMENTS_PROCESSED and PAGES_PROCESSED),
record_llm_call (LLM_CALLS and LLM_USAGE), record_challenge,
record_summarization, and record_prompt_execution to match the core signature:
call record(org_id, MetricName.<X>, <metric_value>, labels=metric_labels,
project=project, tag=None) (use 1 for single-count events, pages for pages, cost
for LLM_USAGE as float), and remove the invalid value= keyword and duplicate
org_id keyword usage.
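A minimal sketch of the corrected call shape described above, assuming the core signature takes `org_id` first, `metric_name` second, and `metric_value` third. The `record` stub and the enum subset below are illustrative stand-ins, not the real `core.metrics` implementations:

```python
from enum import Enum


class MetricName(str, Enum):
    # Illustrative subset of the enum referenced in the review
    DOCUMENTS_PROCESSED = "documents_processed"
    LLM_USAGE = "llm_usage"


def record(org_id, metric_name, metric_value, labels=None, project=None, tag=None):
    """Stub with the positional order the core API expects:
    org_id first, metric_name second, metric_value third."""
    return (org_id, metric_name, metric_value, labels, project, tag)


# Before (buggy): record(MetricName.DOCUMENTS_PROCESSED, org_id=..., value=1)
# After (matches the core signature):
result = record(
    "org-1", MetricName.DOCUMENTS_PROCESSED, 1,
    labels={"source": "etl"}, project="demo", tag=None,
)
```

The same reordering applies to each `record_*` helper listed above, with `1` for single-count events, the page count for pages, and the cost as a float for LLM_USAGE.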
In @backend/dashboard_metrics/urls.py:
- Around line 9-15: The URL configuration is missing a mapping for the
DashboardMetricsViewSet.health action; add a new view mapping similar to the
others (e.g., create metrics_health = DashboardMetricsViewSet.as_view({"get":
"health"})) and include it in the URL patterns so GET /api/v2/metrics/health/ is
exposed; update any import/urlpatterns list references where other metrics_*
views are registered to include metrics_health.
In @backend/dashboard_metrics/views.py:
- Around line 267-273: Both live_summary and live_series call
UserContext.get_organization() and access .id directly; add the same guard used
in get_queryset(): if organization is None (or falsy) log an explanatory message
and raise PermissionDenied before accessing organization.id, then pass org_id to
MetricsQueryService.get_all_metrics_summary / get_all_metrics_series; apply the
same check for the other occurrence around lines referenced (the block using
UserContext.get_organization() / organization.id at ~310-312).
In @unstract/core/src/unstract/core/metrics/types.py:
- Around line 35-48: Add a unit test that ensures every value of the MetricName
enum has a corresponding key in METRIC_TYPE_MAP to prevent KeyError in
get_metric_type(); specifically, iterate over MetricName (or
MetricName.__members__.values()/MetricName) and assert that each .value is
present in METRIC_TYPE_MAP, referencing METRIC_TYPE_MAP and MetricName; this
complements validate_metric() and prevents missing entries that would cause
get_metric_type() to fail.
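The requested coverage test can be sketched like this, using a hypothetical miniature `MetricName` and `METRIC_TYPE_MAP` (the real ones live in `unstract/core/src/unstract/core/metrics/types.py`):

```python
from enum import Enum


class MetricName(str, Enum):
    # Hypothetical subset; the real enum is in core/metrics/types.py
    API_REQUESTS = "api_requests"
    DOCUMENTS_PROCESSED = "documents_processed"
    LLM_USAGE = "llm_usage"


METRIC_TYPE_MAP = {
    MetricName.API_REQUESTS.value: "counter",
    MetricName.DOCUMENTS_PROCESSED.value: "counter",
    MetricName.LLM_USAGE.value: "histogram",
}


def test_every_metric_has_a_type():
    """Each enum value must have a METRIC_TYPE_MAP entry, so
    get_metric_type() can never raise KeyError for a valid metric."""
    missing = [m.value for m in MetricName if m.value not in METRIC_TYPE_MAP]
    assert not missing, f"METRIC_TYPE_MAP missing entries: {missing}"
```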
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/pyproject.toml (1)
15-21: celery-batches version too old; use >=0.9.0 for Celery 5.3.4 and Python 3.12 support

The constraint `celery-batches>=0.8.0` pins to an outdated release (Jun 2023). For Celery 5.3.4 and Python 3.12, use `celery-batches>=0.9.0`, which is documented to support Celery ~5.0 and Python 3.9–3.13. Version 0.8.0 predates Python 3.12 and lacks testing against current Celery 5.x releases.
🧹 Nitpick comments (16)
backend/dashboard_metrics/tasks.py (2)
37-75: Time bucketing helpers look correct; naive datetimes are assumed UTC

The truncation helpers correctly normalize numeric timestamps to UTC and bucket to hour/day/month. Note that naive datetimes are implicitly treated as UTC; ensure all producers either send UTC or explicitly aware datetimes so you don't silently mis-bucket mixed timezones.
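The bucketing behavior described above can be sketched as follows (the function name and exact handling are illustrative, not the code under review):

```python
from datetime import datetime, timezone


def truncate_to_hour(ts):
    """Numeric timestamps are interpreted as UTC epoch seconds;
    naive datetimes are assumed to already be UTC."""
    if isinstance(ts, (int, float)):
        dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    elif ts.tzinfo is None:
        dt = ts.replace(tzinfo=timezone.utc)  # naive => assume UTC
    else:
        dt = ts.astimezone(timezone.utc)
    return dt.replace(minute=0, second=0, microsecond=0)
```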
77-206: Batched aggregation logic is solid; consider tightening event validation

The batching and in-memory aggregation into hourly/daily/monthly buckets looks good and aligns with celery-batches' request shape. Two minor points to consider:

- `event = request.args[0] if request.args else request.kwargs.get("event", {})` assumes the first arg is always a dict; if other producers appear later, a simple `isinstance(..., dict)` guard could avoid noisy warnings.
- `timestamp` is expected as float/datetime; if there's any chance of ISO strings coming through, you may want to normalize them explicitly rather than relying on a generic `except Exception` to skip them.

Nothing blocking here; more of a future-hardening note.
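Both hardening points above can be sketched together; the helper name is hypothetical and the naive-means-UTC convention follows the previous comment:

```python
from datetime import datetime, timezone


def extract_event(args, kwargs):
    """Type-check the first positional arg and normalize
    ISO-8601 string timestamps to epoch seconds."""
    event = args[0] if args else kwargs.get("event", {})
    if not isinstance(event, dict):  # guard against non-dict producers
        return None
    ts = event.get("timestamp")
    if isinstance(ts, str):
        # Explicitly accept ISO strings instead of silently skipping them
        dt = datetime.fromisoformat(ts)
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)  # naive => assume UTC
        event["timestamp"] = dt.timestamp()
    return event
```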
backend/backend/urls_v2.py (1)
63-67: Metrics URL include is consistent with existing routing

Mounting `dashboard_metrics.urls` at `"metrics/"` follows the same pattern as other feature areas (e.g., `"usage/"`, `"notifications/"`). Just ensure any external docs or clients expecting `/api/v2/metrics/...` are updated to match this base path if needed.

docker/docker-compose.yaml (1)
120-138: worker-metrics service wiring looks consistent with existing workers

The new `worker-metrics` service mirrors the existing Celery worker patterns (image, entrypoint, queue, autoscale, env) and targets the `dashboard_metric_events` queue as intended. If you expect metrics tasks to need shared workflow data or tool configs, consider whether it should also mount the same volumes as other workers; otherwise this setup looks good.

backend/pyproject.toml (1)
145-148: worker-metrics Poe task aligns with docker-compose configuration

The `worker-metrics` Poe task mirrors the docker-compose command (same app, log level, queue, autoscale), which is helpful for running the metrics worker locally outside Docker. Keep this in sync with any future queue or autoscale changes to avoid drift between CLI and compose.

backend/dashboard_metrics/tests/test_tasks.py (1)
149-164: Consider renaming unused loop variable.

The loop variable `i` is not used within the loop body. Following Python convention, consider renaming it to `_` to indicate it's intentionally unused.

🔎 Proposed refactor

```diff
-        for i in range(5):
+        for _ in range(5):
             mock_request = MagicMock()
```

unstract/core/src/unstract/core/metrics/__init__.py (1)
113-115: Use logging.exception to preserve the traceback.

When catching exceptions during event creation, using `logger.exception()` instead of `logger.error()` will include the full traceback, which aids debugging.

🔎 Suggested fix

```diff
         except Exception as e:
-            logger.error(f"Failed to create metric event: {e}")
+            logger.exception(f"Failed to create metric event: {e}")
             return False
```

unstract/core/src/unstract/core/metrics/registry.py (1)
42-48: Non-deterministic ordering in `get_all_metrics()`.

The function returns `list(_REGISTERED_METRICS)` where `_REGISTERED_METRICS` is a set. While this works, the order of returned metrics may vary between calls. If consistent ordering is needed for API responses or testing, consider sorting the result.

🔎 Optional fix for deterministic ordering

```diff
 def get_all_metrics() -> list[str]:
     """Get all registered metric names.

     Returns:
         List of all registered metric names
     """
-    return list(_REGISTERED_METRICS)
+    return sorted(_REGISTERED_METRICS)
```

backend/dashboard_metrics/migrations/0004_update_cleanup_periodic_tasks.py (1)
38-57: Clarify naming: task cleans "daily" aggregates on a weekly schedule.

The comment says "weekly cleanup task for daily metrics" but the task is named `dashboard_metrics_cleanup_daily`. This is technically correct (it cleans data from the daily aggregation table on a weekly schedule), but the naming could cause confusion. Consider renaming to `dashboard_metrics_cleanup_daily_weekly` or updating the description to clarify it runs weekly to clean daily-aggregated data.
85-96: Consider defensive error handling in `from_dict`.

If required keys (`org_id`, `metric_name`, `metric_value`, `metric_type`) are missing from `data`, a `KeyError` will be raised. While the caller should ensure valid data, adding a brief check or using `.get()` with explicit validation could provide clearer error messages during deserialization failures.

unstract/core/src/unstract/core/metrics/backends/queue.py (1)
151-156: Use `logger.exception` for error logging with traceback.

Replace `logger.error` with `logger.exception` in the exception handlers to include stack traces, which aids debugging production issues.

🔎 Suggested fix

```diff
         except KombuError as e:
-            logger.error(f"Failed to queue metric {event.metric_name}: {e}")
+            logger.exception(f"Failed to queue metric {event.metric_name}: {e}")
             return False
         except Exception as e:
-            logger.error(f"Unexpected error queuing metric: {e}")
+            logger.exception(f"Unexpected error queuing metric: {e}")
             return False
```

backend/dashboard_metrics/views.py (1)
335-369: Improve error handling and observability in the `live_series` metric loop.

Catching a blanket `Exception` for each metric is reasonable to keep the endpoint partially available, but the current logging (`logger.error`) loses the traceback and makes debugging failures harder. Switching to `logger.exception` here will preserve the stack trace while keeping behavior the same.

Proposed logging tweak

```diff
-        except Exception as e:
-            logger.error(f"Failed to fetch metric {metric_name}: {e}")
+        except Exception:
+            logger.exception("Failed to fetch metric %s", metric_name)
             errors.append(metric_name)
```

backend/dashboard_metrics/cache.py (1)
38-69: Time-aware TTL helper is well-designed but currently unused.

`get_time_aware_cache_ttl()` correctly distinguishes queries that include the current hour from purely historical ones, which would be useful to avoid over-caching near-real-time data. Right now it isn't wired into `cache_metrics_response`, so all endpoints rely on static TTLs. Consider integrating `get_time_aware_cache_ttl()` into the decorator (e.g., deriving TTL from a parsed `end_date` param) for summary/series endpoints that query recent data.

backend/dashboard_metrics/models.py (3)
14-19: Avoid duplicating `MetricType` that already exists in core metrics

There's already a `MetricType` enum in `unstract/core/src/unstract/core/metrics/types.py` with the same values (`"counter"`, `"histogram"`). Having two different `MetricType` definitions (one `Enum`, one `TextChoices`) increases the chance of divergence or confusion.

Consider either:
- Reusing the core `MetricType` as the single source of truth and mapping its values into a `TextChoices`, or
- Renaming this one (e.g., `DashboardMetricType`) to make the separation explicit.
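The single-source-of-truth option can be sketched without Django: derive the choices list from the core enum instead of redefining a parallel class (in the actual code this would feed a `models.CharField(choices=...)`; the mapping below is illustrative):

```python
from enum import Enum


class MetricType(str, Enum):
    """Stand-in for the core enum in unstract/core .../types.py."""
    COUNTER = "counter"
    HISTOGRAM = "histogram"


# Derive Django-style (value, label) choices from the core enum so the
# two definitions can never drift apart.
METRIC_TYPE_CHOICES = [(m.value, m.name.title()) for m in MetricType]
```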
21-37: Manager inheritance can be simplified

`DefaultOrganizationManagerMixin` already subclasses `models.Manager` (per `backend/utils/models/organization_mixin.py`), so inheriting from both is redundant:

```python
class EventMetricsHourlyManager(DefaultOrganizationManagerMixin):
    ...
```

Same applies to `EventMetricsDailyManager` and `EventMetricsMonthlyManager`.
113-130: Address Ruff RUF012 on mutable Meta attributes (`indexes`, `constraints`)

Ruff is flagging `indexes` and `constraints` as mutable class attributes (RUF012). In Django `Meta` classes this is common, but if you want to satisfy the linter and improve type hints you can annotate them as `ClassVar`:

```python
from typing import ClassVar

from django.db import models


class Meta:
    indexes: ClassVar[list[models.Index]] = [
        ...
    ]
    constraints: ClassVar[list[models.Constraint]] = [
        ...
    ]
```

You can apply the same pattern to the `Meta` classes of all three models, or alternatively disable `RUF012` for Django `Meta` if you prefer the standard Django style.

Also applies to: 219-236, 325-342
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
⛔ Files ignored due to path filters (1)
`backend/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (30)
- backend/backend/celery_config.py
- backend/backend/settings/base.py
- backend/backend/urls_v2.py
- backend/dashboard_metrics/__init__.py
- backend/dashboard_metrics/apps.py
- backend/dashboard_metrics/cache.py
- backend/dashboard_metrics/capture.py
- backend/dashboard_metrics/migrations/0001_initial.py
- backend/dashboard_metrics/migrations/0002_setup_cleanup_periodic_task.py
- backend/dashboard_metrics/migrations/0003_add_daily_monthly_tables.py
- backend/dashboard_metrics/migrations/0004_update_cleanup_periodic_tasks.py
- backend/dashboard_metrics/migrations/__init__.py
- backend/dashboard_metrics/models.py
- backend/dashboard_metrics/serializers.py
- backend/dashboard_metrics/services.py
- backend/dashboard_metrics/tasks.py
- backend/dashboard_metrics/tests/__init__.py
- backend/dashboard_metrics/tests/test_tasks.py
- backend/dashboard_metrics/urls.py
- backend/dashboard_metrics/views.py
- backend/pyproject.toml
- docker/docker-compose.yaml
- unstract/core/src/unstract/core/metrics/__init__.py
- unstract/core/src/unstract/core/metrics/backends/__init__.py
- unstract/core/src/unstract/core/metrics/backends/base.py
- unstract/core/src/unstract/core/metrics/backends/noop.py
- unstract/core/src/unstract/core/metrics/backends/queue.py
- unstract/core/src/unstract/core/metrics/config.py
- unstract/core/src/unstract/core/metrics/registry.py
- unstract/core/src/unstract/core/metrics/types.py
🧰 Additional context used
🧬 Code graph analysis (16)
unstract/core/src/unstract/core/metrics/backends/noop.py (3)
- unstract/core/src/unstract/core/metrics/types.py (1): `MetricEvent` (52-96)
- unstract/core/src/unstract/core/metrics/backends/base.py (3): `AbstractMetricBackend` (8-40), `record` (15-24), `flush` (27-32)
- unstract/core/src/unstract/core/metrics/__init__.py (1): `record` (58-119)

unstract/core/src/unstract/core/metrics/backends/base.py (4)
- unstract/core/src/unstract/core/metrics/types.py (1): `MetricEvent` (52-96)
- unstract/core/src/unstract/core/metrics/__init__.py (1): `record` (58-119)
- unstract/core/src/unstract/core/metrics/backends/noop.py (2): `record` (25-40), `flush` (42-44)
- unstract/core/src/unstract/core/metrics/backends/queue.py (3): `record` (117-156), `flush` (200-205), `close` (207-215)

unstract/core/src/unstract/core/metrics/registry.py (2)
- unstract/core/src/unstract/core/metrics/types.py (2): `MetricName` (15-32), `MetricType` (8-12)
- backend/dashboard_metrics/models.py (1): `MetricType` (14-18)

backend/dashboard_metrics/tasks.py (1)
- backend/dashboard_metrics/models.py (4): `EventMetricsDaily` (145-248), `EventMetricsHourly` (39-142), `EventMetricsMonthly` (251-354), `MetricType` (14-18)

unstract/core/src/unstract/core/metrics/types.py (1)
- backend/dashboard_metrics/models.py (1): `MetricType` (14-18)

backend/dashboard_metrics/urls.py (1)
- backend/dashboard_metrics/views.py (1): `DashboardMetricsViewSet` (40-451)

backend/dashboard_metrics/views.py (4)
- backend/permissions/permission.py (1): `IsOrganizationMember` (17-22)
- backend/utils/user_context.py (1): `UserContext` (8-32)
- backend/dashboard_metrics/cache.py (1): `cache_metrics_response` (88-145)
- backend/dashboard_metrics/services.py (2): `MetricsQueryService` (24-447), `get_documents_processed` (54-86)

backend/dashboard_metrics/models.py (3)
- backend/utils/models/organization_mixin.py (2): `DefaultOrganizationManagerMixin` (26-29), `DefaultOrganizationMixin` (7-23)
- unstract/core/src/unstract/core/metrics/types.py (1): `MetricType` (8-12)
- backend/dashboard_metrics/serializers.py (1): `Meta` (110-124)

backend/dashboard_metrics/capture.py (5)
- unstract/core/src/unstract/core/metrics/types.py (2): `MetricName` (15-32), `MetricType` (8-12)
- unstract/core/src/unstract/core/metrics/__init__.py (1): `record` (58-119)
- unstract/core/src/unstract/core/metrics/backends/base.py (1): `record` (15-24)
- unstract/core/src/unstract/core/metrics/backends/noop.py (1): `record` (25-40)
- unstract/core/src/unstract/core/metrics/backends/queue.py (1): `record` (117-156)

unstract/core/src/unstract/core/metrics/backends/queue.py (3)
- unstract/core/src/unstract/core/metrics/types.py (1): `MetricEvent` (52-96)
- unstract/core/src/unstract/core/metrics/backends/base.py (4): `AbstractMetricBackend` (8-40), `close` (34-40), `record` (15-24), `flush` (27-32)
- unstract/core/src/unstract/core/metrics/__init__.py (1): `record` (58-119)

unstract/core/src/unstract/core/metrics/config.py (3)
- unstract/core/src/unstract/core/metrics/backends/base.py (1): `AbstractMetricBackend` (8-40)
- unstract/core/src/unstract/core/metrics/backends/noop.py (1): `NoopBackend` (14-44)
- unstract/core/src/unstract/core/metrics/backends/queue.py (1): `QueueBackend` (28-215)

backend/dashboard_metrics/serializers.py (2)
- backend/dashboard_metrics/models.py (4): `EventMetricsHourly` (39-142), `Meta` (109-142), `Meta` (215-248), `Meta` (321-354)
- backend/dashboard_metrics/views.py (2): `series` (113-189), `summary` (70-109)

backend/dashboard_metrics/migrations/0001_initial.py (1)
- backend/dashboard_metrics/migrations/0003_add_daily_monthly_tables.py (1): `Migration` (10-261)

unstract/core/src/unstract/core/metrics/backends/__init__.py (3)
- unstract/core/src/unstract/core/metrics/backends/base.py (1): `AbstractMetricBackend` (8-40)
- unstract/core/src/unstract/core/metrics/backends/noop.py (1): `NoopBackend` (14-44)
- unstract/core/src/unstract/core/metrics/backends/queue.py (1): `QueueBackend` (28-215)

backend/dashboard_metrics/migrations/0004_update_cleanup_periodic_tasks.py (2)
- backend/dashboard_metrics/migrations/0002_setup_cleanup_periodic_task.py (1): `Migration` (40-51)
- backend/dashboard_metrics/migrations/0003_add_daily_monthly_tables.py (1): `Migration` (10-261)

backend/dashboard_metrics/cache.py (3)
- backend/utils/user_context.py (1): `UserContext` (8-32)
- backend/utils/cache_service.py (1): `clear_cache` (41-43)
- backend/dashboard_metrics/views.py (1): `overview` (193-245)
🪛 Ruff (0.14.10)
unstract/core/src/unstract/core/metrics/backends/noop.py
25-25: Unused method argument: event
(ARG002)
unstract/core/src/unstract/core/metrics/__init__.py
39-53: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
113-113: Do not catch blind exception: Exception
(BLE001)
114-114: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
backend/backend/celery_config.py
42-49: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
56-66: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
69-73: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
unstract/core/src/unstract/core/metrics/registry.py
38-38: Avoid specifying long messages outside the exception class
(TRY003)
backend/dashboard_metrics/migrations/0002_setup_cleanup_periodic_task.py
6-6: Unused function argument: schema_editor
(ARG001)
33-33: Unused function argument: schema_editor
(ARG001)
41-44: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
46-51: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/tasks.py
184-184: Do not catch blind exception: Exception
(BLE001)
257-257: Do not catch blind exception: Exception
(BLE001)
258-258: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
311-311: Do not catch blind exception: Exception
(BLE001)
312-312: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
365-365: Do not catch blind exception: Exception
(BLE001)
366-366: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
backend/dashboard_metrics/tests/test_tasks.py
149-149: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
backend/dashboard_metrics/views.py
47-47: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
48-48: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
50-50: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
51-51: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
52-52: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
60-60: Abstract raise to an inner function
(TRY301)
60-60: Avoid specifying long messages outside the exception class
(TRY003)
65-65: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
66-66: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
66-66: Avoid specifying long messages outside the exception class
(TRY003)
193-193: Unused method argument: request
(ARG002)
359-359: Do not catch blind exception: Exception
(BLE001)
360-360: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
403-403: Unused method argument: request
(ARG002)
436-436: Consider moving this statement to an else block
(TRY300)
437-437: Do not catch blind exception: Exception
(BLE001)
438-438: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
448-448: Consider moving this statement to an else block
(TRY300)
449-449: Do not catch blind exception: Exception
(BLE001)
450-450: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
backend/dashboard_metrics/models.py
113-130: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
131-142: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
219-236: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
237-248: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
325-342: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
343-354: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/capture.py
45-45: Consider moving this statement to an else block
(TRY300)
unstract/core/src/unstract/core/metrics/backends/queue.py
95-96: try-except-pass detected, consider logging the exception
(S110)
95-95: Do not catch blind exception: Exception
(BLE001)
101-101: Consider moving this statement to an else block
(TRY300)
103-103: Do not catch blind exception: Exception
(BLE001)
113-113: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
115-115: Avoid specifying long messages outside the exception class
(TRY003)
149-149: Consider moving this statement to an else block
(TRY300)
152-152: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
154-154: Do not catch blind exception: Exception
(BLE001)
155-155: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
212-212: Do not catch blind exception: Exception
(BLE001)
backend/dashboard_metrics/migrations/0003_add_daily_monthly_tables.py
11-14: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
16-261: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/serializers.py
112-123: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/migrations/0001_initial.py
13-15: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
17-136: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/migrations/0004_update_cleanup_periodic_tasks.py
9-9: Unused function argument: schema_editor
(ARG001)
60-60: Unused function argument: schema_editor
(ARG001)
69-72: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
74-79: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/cache.py
84-84: Probable use of insecure hash functions in hashlib: md5
(S324)
163-163: Consider moving this statement to an else block
(TRY300)
164-164: Do not catch blind exception: Exception
(BLE001)
165-165: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
252-252: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (28)
backend/dashboard_metrics/__init__.py (1)
1-4: LGTM! Clean package initializer with clear documentation describing the Dashboard Metrics app's purpose.
backend/dashboard_metrics/migrations/__init__.py (1)
1-1: LGTM! Standard migrations package initializer following Django conventions.
backend/backend/settings/base.py (1)
345-345: LGTM! The dashboard_metrics app is correctly registered in SHARED_APPS, making it available across all organizations in the multi-tenant setup. This is appropriate for organization-scoped metrics collection and querying.
unstract/core/src/unstract/core/metrics/backends/__init__.py (1)
1-11: LGTM! Clean public API definition for the metrics backends package. The `__all__` declaration correctly matches the imported backends, providing clear control over the public interface.

backend/dashboard_metrics/apps.py (1)
6-11: LGTM! Standard Django app configuration following Django 4.2+ best practices. The BigAutoField default is appropriate, and the app naming is correct.
backend/dashboard_metrics/tests/__init__.py (1)
1-1: Tests package init is fine

A docstring-only `__init__` is harmless and documents the test package clearly.

unstract/core/src/unstract/core/metrics/backends/base.py (1)
1-40: LGTM! Clean abstract interface design.

The abstract base class follows SOLID principles with a minimal, focused interface. The concrete `close()` method with a default implementation calling `flush()` is a good design choice that simplifies backend implementations while allowing customization when needed.

unstract/core/src/unstract/core/metrics/backends/noop.py (1)
21-44: LGTM! Appropriate no-op implementation.

The NoopBackend correctly implements the abstract interface and provides helpful one-time logging to inform users that metrics are disabled. The `event` parameter in `record()` is required by the `AbstractMetricBackend` interface, so the static analysis hint (ARG002) is a false positive.

backend/dashboard_metrics/tests/test_tasks.py (1)
1-553: Excellent test coverage! The test suite is comprehensive and well-organized, covering time truncation helpers, batch processing, bulk upsert operations, cleanup tasks, and full integration flows. Good use of fixtures, edge cases, and assertions to verify data integrity across all metrics tables.
backend/dashboard_metrics/migrations/0002_setup_cleanup_periodic_task.py (1)
1-51: LGTM! Migration follows Django conventions.

The migration correctly sets up a periodic cleanup task with proper forward and rollback operations. The static analysis hints (ARG001, RUF012) are false positives; the `schema_editor` parameter and migration class attributes follow Django's required conventions for data migrations.

unstract/core/src/unstract/core/metrics/config.py (1)
1-68: LGTM! Well-designed configuration module.

The environment-driven backend selection with singleton caching is clean and robust. The fallback to `NoopBackend` when metrics are disabled or misconfigured ensures safe defaults, and the logging at each decision point aids troubleshooting. The `reset_backend()` function is a good addition for testing scenarios.

unstract/core/src/unstract/core/metrics/__init__.py (1)
58-119: Well-structured public API.

The `record()` function provides a clean entry point with proper validation, enum-to-string conversion, and graceful error handling that returns `False` on failures rather than propagating exceptions. The design appropriately decouples the public API from backend implementation details.
1-60: Clean registry implementation.

The registry provides efficient O(1) validation via set lookup, clear separation between validation and type retrieval, and appropriate error handling with `ValueError` for unknown metrics.

backend/backend/celery_config.py (2)
36-53: Good production-ready resilience settings.

The broker connection retry configuration with exponential backoff (`interval_step`, `interval_max`), socket keepalive, and worker limits (`max_tasks_per_child`) will help maintain stability under broker disruptions and prevent memory leaks from long-running workers.
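The retry timing those settings control can be illustrated with a small sketch. This assumes kombu's linear-with-cap semantics for `interval_start`/`interval_step`/`interval_max` (each retry waits a bit longer, capped at the max); the helper itself is illustrative, not part of the codebase:

```python
def backoff_schedule(interval_start, interval_step, interval_max, retries):
    """Compute the wait (seconds) before each of the first `retries`
    reconnection attempts: start + n * step, capped at interval_max."""
    return [
        min(interval_start + n * interval_step, interval_max)
        for n in range(retries)
    ]
```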
68-73: Task names in `task_routes` are correctly defined and match registered Celery tasks.

The task names (`dashboard_metrics.process_events`, `dashboard_metrics.cleanup_hourly_data`, `dashboard_metrics.cleanup_daily_data`) match their corresponding `@shared_task` definitions in `backend/dashboard_metrics/tasks.py`. Routing is correctly configured.

unstract/core/src/unstract/core/metrics/types.py (1)
8-32: Well-structured metric type definitions.

The dual inheritance from `str` and `Enum` enables both type safety and string comparisons. The separation of counters (summed values) and histograms (distributions) is clearly documented and follows established metrics patterns.
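The dual-inheritance behavior noted above can be demonstrated in isolation (the member names here are an illustrative subset):

```python
from enum import Enum


class MetricName(str, Enum):
    """A str-Enum: members are type-safe enum values that still
    compare equal to their plain-string values."""
    API_REQUESTS = "api_requests"
    LLM_CALLS = "llm_calls"


# String comparison works directly, no .value needed:
assert MetricName.API_REQUESTS == "api_requests"
# And round-tripping from a raw string recovers the member:
assert MetricName("llm_calls") is MetricName.LLM_CALLS
```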
17-33: Clean URL configuration with proper ordering.

The URL patterns are well-organized with specific paths before the UUID capture pattern, preventing routing conflicts. Using `format_suffix_patterns` enables content negotiation for API responses.

backend/dashboard_metrics/migrations/0004_update_cleanup_periodic_tasks.py (1)
60-65: Reverse migration is correct as written.

Migration 0002 creates the `dashboard_metrics_cleanup_hourly` task. Migration 0004's forward operation uses `update_or_create` on that existing task (not creating a new one), adds the daily task, and deletes a legacy task. The reverse should return to the state after 0002, which means retaining the hourly task (still managed by 0002's reverse function) and removing only what 0004 added: the daily task. The current implementation is correct.

Likely an incorrect or invalid review comment.
unstract/core/src/unstract/core/metrics/backends/queue.py (1)
178-198: Celery protocol v2 format is compatible with the project's Celery version (5.3.4+).

The message construction follows the standard kombu protocol v2 format `[args, kwargs, embed]`, which remains compatible with Celery 5.3.4+ (and 5.5.3 in the workers package). The headers correctly specify the task name and ID for Celery task routing. No compatibility concerns.

backend/dashboard_metrics/views.py (1)
402-450: Health checks look sound and side‑effect free.
`health`, `_check_database`, and `_check_cache` do minimal, safe operations (an `exists()` call and a short-lived cache set/get) and return structured status with appropriate HTTP 200/503 responses; this is suitable for liveness/readiness probes.
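The healthy/unhealthy response shape described above can be sketched framework-free (the payload layout and helper name are illustrative, not the actual view's response):

```python
def health_payload(db_ok, cache_ok):
    """Combine individual check results into (status_code, body):
    200 when every check passes, 503 otherwise."""
    checks = {"database": db_ok, "cache": cache_ok}
    healthy = all(checks.values())
    status = 200 if healthy else 503
    body = {"status": "healthy" if healthy else "unhealthy", "checks": checks}
    return status, body
```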
16-260: Schema and indexing for daily/monthly tables look consistent and appropriate. The daily/monthly models mirror the hourly schema (including the `labels` default, project/tag, organization FK, unique constraints, and GIN/index coverage), which should give predictable semantics and good query performance across all granularities.
backend/dashboard_metrics/services.py (2)
383-447: `get_all_metrics_summary` aggregates are consistent with the individual query methods. The summary method correctly reuses the per-metric query functions and normalizes potential `NULL` sums (e.g., pages and LLM usage) with `or 0`, producing a clean dict of totals for the live summary endpoint.
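The `or 0` normalization matters because SQL `SUM` over zero rows yields `NULL` (surfaced as `None` in Python); a plain-Python sketch of the pattern (hypothetical metric names):

```python
def summarize(aggregates):
    """Normalize aggregate sums so empty querysets report 0, not None.

    Django's Sum() returns None when no rows match; `or 0` keeps every
    total numeric so the summary dict is safe for arithmetic and JSON.
    """
    return {name: (value or 0) for name, value in aggregates.items()}
```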
230-269: The join logic in `get_deployed_api_requests` is correct. `WorkflowExecution.pipeline_id` is a UUIDField designed to hold IDs of either `Pipeline` or `APIDeployment` entities (as evidenced by the conditional logic in `backend/workflow_manager/internal_serializers.py`). The subquery filtering `pipeline_id__in=Subquery(APIDeployment.objects.filter(...).values("id"))` correctly extracts deployed API request counts by matching executions whose `pipeline_id` refers to an API deployment.
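The subquery's semantics reduce to membership filtering; a simplified in-memory sketch (hypothetical dicts standing in for ORM rows, illustration only):

```python
def deployed_api_request_count(executions, api_deployment_ids):
    """Count executions whose pipeline_id refers to an API deployment.

    Mirrors pipeline_id__in=Subquery(APIDeployment...values("id")) in
    plain Python: executions pointing at ETL pipelines fall through.
    """
    deployed = set(api_deployment_ids)
    return sum(1 for e in executions if e["pipeline_id"] in deployed)
```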
17-135: Hourly metrics migration aligns with the model definition and indexing strategy. Field definitions, indexes, and the `unique_hourly_metric` constraint all match the `EventMetricsHourly` model description, so the initial schema for hourly metrics looks correct and production-ready.
backend/dashboard_metrics/serializers.py (1)
11-124: Query and response serializers are consistent with view usage and enforce sane bounds. `MetricsQuerySerializer.validate()` correctly defaults the date range, enforces `start_date <= end_date`, and caps queries at 90 days, while the nested series/summary serializers match the shapes constructed in `DashboardMetricsViewSet`. The `EventMetricsHourlySerializer` is read-only and exposes the expected fields.
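The validation contract can be sketched outside DRF (a hypothetical helper; the 30-day default window is an assumption for illustration, only the 90-day cap and ordering rule come from the review):

```python
from datetime import datetime, timedelta

MAX_RANGE_DAYS = 90


def validate_range(start=None, end=None, now=None):
    """Default missing bounds, require start <= end, cap the span at 90 days."""
    now = now or datetime.utcnow()
    end = end or now
    start = start or end - timedelta(days=30)  # assumed default window
    if start > end:
        raise ValueError("start_date must be before end_date")
    if end - start > timedelta(days=MAX_RANGE_DAYS):
        raise ValueError("date range cannot exceed 90 days")
    return start, end
```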
71-145: Caching decorator behavior and keying strategy look solid. The decorator scopes cache entries by organization, endpoint name, and a stable hash of sorted query params, and only caches 200 responses. It also cleanly bypasses caching when no organization context is present, which is important for correctness with per-org data.
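The keying strategy can be sketched as follows (a hypothetical helper; the hash algorithm and prefix are assumptions, matching only the key shape quoted elsewhere in this review):

```python
import hashlib

CACHE_PREFIX = "dashboard_metrics"  # assumed prefix from the review's key examples


def build_cache_key(org_id, endpoint, params):
    """Scope a cache entry by org and endpoint, with a stable param hash.

    Sorting the params first means the same query in any order maps to
    the same key, while different orgs can never collide.
    """
    canonical = "&".join(f"{k}={params[k]}" for k in sorted(params))
    digest = hashlib.sha256(canonical.encode()).hexdigest()[:16]
    return f"{CACHE_PREFIX}:{org_id}:{endpoint}:{digest}"
```

Note that the per-org prefix is also what makes pattern-based invalidation (`dashboard_metrics:{org_id}:*`) possible.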
148-167: No changes needed; the code is correct. `CacheService` is a class in `backend/utils/cache_service.py`, and `clear_cache` is a static method of that class. The import and call in `invalidate_metrics_cache` are correct. The implementation will properly invalidate cached metrics for the organization.
Likely an incorrect or invalid review comment.
backend/dashboard_metrics/models.py (1)
57-85: Confirm `BaseModel` doesn't define `id`, and that `metric_count`'s default is intentional. Two small checks:

- If `BaseModel` already defines an `id` field, redefining `id` here as a primary key can cause confusion or conflicts. If `BaseModel` only adds timestamps and no `id`, you're fine; otherwise, consider centralizing the PK definition in one place.
- `metric_count` defaults to `1`. That makes sense if every row is always created from at least one event, but if you ever create rows with explicit `metric_count` or rely on DB-side aggregation, this default could overcount silently.

Please double-check that `BaseModel` does not define `id`, and that you never rely on the default `metric_count` for anything other than "one event just created this row".
chandrasekharan-zipstack
left a comment
Changes mostly LGTM. Left some minor comments.
One main question I have is around cache invalidation of the cached, aggregated data - does it happen only on the respective TTLs? I hope possibly receiving stale data for that duration is okay in this usecase
NIT: This is a cool idea and if refactored a bit to make it generic, we can easily extend the same to any other API / DB call that we wish to cache. Consider this suggestion if its easy to make it generic
Actionable comments posted: 6
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
🤖 Fix all issues with AI agents
In @backend/dashboard_metrics/management/commands/backfill_metrics.py:
- Around line 269-298: The outer transaction.atomic() in _upsert_hourly wraps
all update_or_create calls so a DB error (e.g., IntegrityError) can leave the
transaction unusable and abort the rest of the loop; remove the outer
transaction.atomic() (or alternatively use a per-row savepoint) so each
EventMetricsHourly._base_manager.update_or_create call runs in its own atomic
context and the loop can continue on exceptions; also change the unused returned
variable name obj to _ and apply the same change (remove outer atomic or use
per-row savepoints and rename obj → _) in the corresponding daily/monthly upsert
methods that use update_or_create.
In @backend/dashboard_metrics/management/commands/generate_metrics_test_data.py:
- Around line 235-283: The re-query after bulk_create in
_generate_workflow_executions is unreliable and may misalign _target_created_at
values; instead use the list returned by
WorkflowExecution.objects.bulk_create(executions) (these objects have PKs
populated) and iterate that returned list to update each record's created_at via
individual QuerySet.update(id=exec_obj.id) using exec_obj._target_created_at (do
not attempt bulk_update because created_at uses auto_now_add).
In @backend/dashboard_metrics/tasks.py:
- Around line 222-262: The current broad except in the upsert loop swallows
IntegrityError from concurrent inserts causing lost increments; import
django.db.IntegrityError and explicitly catch IntegrityError around the
get_or_create in EventMetricsHourly upsert logic (the block that calls
select_for_update().get_or_create and currently sets obj, was_created), then on
IntegrityError re-query the row (EventMetricsHourly.objects.get(...) for the
same organization_id, timestamp, metric_name, project, tag), set
was_created=False and proceed to update counters and labels as in the existing
else-path; keep the generic Exception handler for other errors but ensure you
continue after logging; apply the same explicit IntegrityError catch-and-retry
pattern to the _bulk_upsert_daily and _bulk_upsert_monthly blocks as well.
In @unstract/core/src/unstract/core/metrics/backends/queue.py:
- Around line 35-62: The QueueBackend __init__ creates self.exchange =
Exchange("", type="direct") which uses the AMQP default exchange and doesn't
match the implicit queue-named exchange used by Celery; change the exchange to a
named direct exchange that matches the queue name (e.g., use
Exchange(self.queue_name, type="direct") or Exchange("dashboard_metric_events",
type="direct") when constructing self.exchange) so published messages route to
the KombuQueue (self.queue) consumers.
🟠 Major comments (19)
backend/dashboard_metrics/management/commands/populate_test_metrics.py-22-33 (1)
22-33: Metric types appear inconsistent with documentation. The `METRICS` list defines:

- `pages_processed` as `COUNTER` (line 25)
- `deployed_api_requests` as `HISTOGRAM` (line 30)
- `etl_pipeline_executions` as `HISTOGRAM` (line 31)

However, the README documentation (lines 173-179) indicates:

- `pages_processed` should be `HISTOGRAM` (sums page counts)
- `deployed_api_requests` and `etl_pipeline_executions` should be `COUNTER` (counting events)

This mismatch could cause incorrect test data generation and potentially mask issues during testing.
🐛 Proposed fix

```diff
 METRICS = [
     ("documents_processed", MetricType.COUNTER),
-    ("pages_processed", MetricType.COUNTER),
+    ("pages_processed", MetricType.HISTOGRAM),
     ("prompt_executions", MetricType.COUNTER),
     ("llm_calls", MetricType.COUNTER),
     ("challenges", MetricType.COUNTER),
     ("summarization_calls", MetricType.COUNTER),
-    ("deployed_api_requests", MetricType.HISTOGRAM),
-    ("etl_pipeline_executions", MetricType.HISTOGRAM),
+    ("deployed_api_requests", MetricType.COUNTER),
+    ("etl_pipeline_executions", MetricType.COUNTER),
     ("llm_usage", MetricType.HISTOGRAM),
 ]
```

frontend/src/hooks/useMetricsData.js-21-75 (1)
21-75: Fix hook dependency handling: include callbacks in effect dependency arrays instead of using `eslint-disable-line`. The `eslint-disable-line` comments hide a real issue: `useMetricsOverview`, `useMetricsSummary`, and `useMetricsSeries` all omit their fetch callbacks from the effect dependencies (lines 141, 227, 313), even though these callbacks are in the useCallback dependencies. If `axiosPrivate` changes (e.g., during auth token refresh), the callbacks are recreated, but the effects won't re-run, so they'll continue calling stale callbacks with outdated `axiosPrivate` instances.

Additionally, `useMetricsData` unnecessarily depends on the entire `options` object instead of destructuring individual properties, causing callback recreation whenever a new `options` object is passed, even if its values haven't changed.

Proposed fixes:

For `useMetricsData` (lines 21-75), destructure `options` and depend on individual properties:

- Add at the top: `const { startDate, endDate, granularity, metricName } = options;`
- Change the useCallback deps from `[axiosPrivate, options, orgId]` to `[axiosPrivate, orgId, startDate, endDate, granularity, metricName]`
- Update references from `options.startDate` → `startDate` etc.

For the three other hooks (lines 137-145, 223-231, 309-317), include the fetch callback in the effect dependencies:

- `useMetricsOverview`: `}, [fetchOverview]);` (removes the guard; the callback has its own internal guard)
- `useMetricsSummary`: `}, [fetchSummary]);`
- `useMetricsSeries`: `}, [fetchSeries]);`

backend/dashboard_metrics/migrations/0001_initial.py-17-136 (1)
17-136: Make `organization` non-nullable in all three metrics tables. PostgreSQL UNIQUE constraints allow multiple NULL values (NULL ≠ NULL in SQL), so with `organization=null` the constraint cannot prevent duplicate rows for the same (bucket, metric_name, project, tag). All actual metric creation explicitly passes `organization_id` (never None), and the manager filters exclude NULL organizations. Since this is a multi-tenant application and metrics are always org-scoped, `organization` should be `null=False` with no `default=None` to ensure constraint integrity and prevent orphaned records.
Also applies to: 137-258 (EventMetricsDaily), 259-380 (EventMetricsMonthly)
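The NULL-distinct behavior is easy to demonstrate with stdlib `sqlite3`, which treats NULLs in UNIQUE constraints the same way PostgreSQL does (the table and column names below are illustrative, not the migration's actual DDL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metrics ("
    " organization_id TEXT,"
    " bucket TEXT,"
    " metric_name TEXT,"
    " UNIQUE (organization_id, bucket, metric_name))"
)
# Two identical rows with organization_id = NULL both insert successfully,
# because NULL never compares equal to NULL inside the UNIQUE constraint:
conn.execute("INSERT INTO metrics VALUES (NULL, '2024-01-01', 'pages')")
conn.execute("INSERT INTO metrics VALUES (NULL, '2024-01-01', 'pages')")
dupes = conn.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]
# With organization_id declared NOT NULL, the second insert would be rejected.
```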
backend/backend/celery_config.py-36-49 (1)
36-49: Fix invalid socket options in `broker_transport_options`: `socket_timeout` and `socket_keepalive` are not recognized by Kombu/py-amqp. In Celery 5.x with RabbitMQ (py-amqp), the socket timeout and keepalive options in `broker_transport_options` are misconfigured:

- `socket_timeout` is not a valid transport option; use `read_timeout` and `write_timeout` instead.
- `socket_keepalive` is not a valid transport option; py-amqp enables SO_KEEPALIVE by default. If you need to tune keepalive behavior, use the `socket_settings` dict with platform-specific constants.

These invalid keys will be silently ignored, leaving sockets without timeout protection. The other configuration keys (`broker_connection_retry_on_startup`, `broker_connection_retry`, `broker_connection_max_retries`, and `task_routes`) are correctly formatted for Celery 5.x.
Replace the current
broker_transport_optionswith:broker_transport_options = { "max_retries": 3, "interval_start": 0, "interval_step": 0.2, "interval_max": 1, "read_timeout": 30, "write_timeout": 30, }Or, if you need platform-specific keepalive tuning:
from amqp.platform import TCP_USER_TIMEOUT broker_transport_options = { "max_retries": 3, "interval_start": 0, "interval_step": 0.2, "interval_max": 1, "read_timeout": 30, "write_timeout": 30, "socket_settings": { TCP_USER_TIMEOUT: 60000, # ms }, }frontend/src/components/metrics-dashboard/MetricsDashboard.jsx-55-95 (1)
55-95: Normalize date range boundaries to prevent timezone-induced off-by-one-day queries. Converting local dates directly to UTC ISO strings can shift day boundaries by hours. When a user picks "Jan 30" in EST (UTC-5), `.toISOString()` produces `2024-01-30T05:00:00Z`, causing the query to start mid-day and miss early morning metrics. For hourly aggregations this causes timestamp misalignment; for daily/monthly it silently excludes or includes unintended days.

Explicitly set `.startOf("day")` and `.endOf("day")` before converting to ISO to ensure the backend receives the full day range in UTC.
Proposed fix (normalize dates before sending to hooks)

```diff
@@
   const [dateRange, setDateRange] = useState([
-    dayjs().subtract(30, "day"),
-    dayjs(),
+    dayjs().subtract(30, "day").startOf("day"),
+    dayjs().endOf("day"),
   ]);
@@
-  } = useMetricsOverview(
-    dateRange[0]?.toISOString(),
-    dateRange[1]?.toISOString()
-  );
+  const startIso = dateRange[0]?.startOf("day").toISOString();
+  const endIso = dateRange[1]?.endOf("day").toISOString();
+
+  } = useMetricsOverview(startIso, endIso);
@@
   } = useMetricsSummary(
-    shouldFetchSummary ? dateRange[0]?.toISOString() : null,
-    shouldFetchSummary ? dateRange[1]?.toISOString() : null,
+    shouldFetchSummary ? startIso : null,
+    shouldFetchSummary ? endIso : null,
     null, // metricName
     dataSource
   );
@@
   const handleDateChange = useCallback((dates) => {
     if (dates && dates.length === 2) {
-      setDateRange(dates);
+      setDateRange([dates[0].startOf("day"), dates[1].endOf("day")]);
     }
   }, []);
```

backend/dashboard_metrics/tasks.py-526-716 (1)
526-716: `aggregate_metrics_from_sources()` writes `metric_count` as "number of buckets", not "number of events" (breaks averages/derived stats). You do `count += 1` per returned period row (lines 608-609 / 633-634 / 649-652), so `metric_count` becomes "hours with data" / "days with data", not the underlying event count. This will conflict with event-based ingestion, where `metric_count` is the number of recorded events.

At minimum, define/rename the semantic (bucket_count vs event_count) or adjust the `MetricsQueryService` queries to also return an event count (e.g., `Count("id")`) so `metric_count` is consistent.
backend/dashboard_metrics/views.py-786-797 (1)
786-797: `live_series` error entries don't match the success schema (the frontend may crash on missing keys). Success entries include `metric_type`, `data`, `total_value`; error entries omit `metric_type` and also omit the `project`/`tag`/`total_count` fields that the aggregated endpoints provide.
Proposed fix (keep shape stable)

```diff
 series.append(
     {
         "metric_name": metric_name,
+        "metric_type": "histogram"
+        if metric_name == "llm_usage"
+        else "counter",
         "error": "unavailable",
         "data": [],
         "total_value": 0,
+        "total_count": 0,
     }
 )
```

backend/dashboard_metrics/views.py-277-310 (1)
277-310: `average_value` should use `total_count` (events), not `len(values)` (buckets). `metric_value` is aggregated and `metric_count` is the number of events; dividing by bucket count will skew averages (especially if there are sparse buckets).
Proposed fix

```diff
 summary_list.append(
     {
@@
-        "average_value": agg["total_value"] / len(values)
-        if values
-        else 0,
+        "average_value": (
+            agg["total_value"] / agg["total_count"]
+            if agg["total_count"]
+            else 0
+        ),
         "min_value": min(values) if values else 0,
         "max_value": max(values) if values else 0,
     }
 )
```

backend/dashboard_metrics/views.py-129-152 (1)
129-152: Monthly filtering is wrong when `start_date` is mid-month (you'll drop the starting month). `month` rows are typically stored as the first day of the month; filtering `month__gte=start_date.date()` will exclude that month unless `start_date` is on day 1.
Proposed fix

```diff
 queryset = queryset.filter(
     **{
-        f"{ts_field}__gte": params["start_date"]
-        if source == "hourly"
-        else params["start_date"].date(),
+        f"{ts_field}__gte": (
+            params["start_date"]
+            if source == "hourly"
+            else (
+                params["start_date"].date().replace(day=1)
+                if source == "monthly"
+                else params["start_date"].date()
+            )
+        ),
         f"{ts_field}__lte": params["end_date"]
         if source == "hourly"
         else params["end_date"].date(),
     }
 )
```

backend/dashboard_metrics/services.py-1-35 (1)
1-35: Docstring says "Uses _base_manager for models…", but only `Usage` is bypassed (Celery aggregation will likely return zeros). `APIDeployment`, `Pipeline`, and often `WorkflowExecution`/`PageUsage` also use organization-aware managers; in Celery context `UserContext` is absent, so `.objects` may auto-filter to empty.

```bash
#!/bin/bash
# Verify which models use DefaultOrganizationManagerMixin-like filtering.
rg -n --type=py "DefaultOrganizationManagerMixin|DefaultOrganizationMixin|UserContext\.get_organization\(" backend | head -n 200
rg -n --type=py "objects\s*=\s*.*ModelManager|objects\s*=" backend/api_v2/models.py backend/pipeline_v2/models.py backend/workflow_manager/workflow_v2/models/execution.py backend/account_usage/models.py 2>/dev/null || true
```

backend/dashboard_metrics/tasks.py-79-209 (1)
79-209: Normalize `metric_type` (string vs enum) and protect against non-numeric `metric_value`. `QueueBackend` sends `metric_type` as a string; here the default uses `MetricType.COUNTER` (an enum). Mixed types can leak into DB writes.
Proposed fix

```diff
-    metric_value = event.get("metric_value", 1)
-    metric_type = event.get("metric_type", MetricType.COUNTER)
+    metric_value = event.get("metric_value", 1)
+    try:
+        metric_value = float(metric_value)
+    except (TypeError, ValueError):
+        raise ValueError(f"Invalid metric_value: {metric_value!r}")
+
+    metric_type_raw = event.get("metric_type", MetricType.COUNTER)
+    try:
+        metric_type = MetricType(metric_type_raw)
+    except Exception:
+        metric_type = MetricType.COUNTER
```

backend/dashboard_metrics/services.py-67-399 (1)
67-399: Use a consistent "unfiltered manager" helper for all source models used by Celery tasks. Otherwise `aggregate_metrics_from_sources()` will compute incorrect results for at least `deployed_api_requests`/`etl_pipeline_executions` (and possibly others).
Suggested pattern

```diff
+def _all(model):
+    # Prefer _base_manager to bypass org-context managers in Celery.
+    return getattr(model, "_base_manager", model.objects).all()
@@
-    WorkflowFileExecution.objects.filter(
+    _all(WorkflowFileExecution).filter(
@@
-    PageUsage.objects.filter(
+    _all(PageUsage).filter(
@@
-    api_subquery = APIDeployment.objects.filter(
+    api_subquery = _all(APIDeployment).filter(
         organization_id=organization_id
     ).values("id")
@@
-    WorkflowExecution.objects.filter(
+    _all(WorkflowExecution).filter(
@@
-    pipeline_subquery = Pipeline.objects.filter(
+    pipeline_subquery = _all(Pipeline).filter(
         organization_id=organization_id,
         pipeline_type="ETL",
     ).values("id")
@@
-    WorkflowExecution.objects.filter(
+    _all(WorkflowExecution).filter(
@@
-    WorkflowExecution.objects.filter(
+    _all(WorkflowExecution).filter(
```

backend/dashboard_metrics/management/commands/backfill_metrics.py-278-280 (1)
278-280: Be consistent with `_base_manager` vs `.objects` when bypassing org-context filters. Hourly uses `_base_manager`; daily/monthly use `.objects`. If these models have a default manager that applies org scoping (as noted in the PR context), daily/monthly backfill could be incomplete.
Proposed fix

```diff
-    obj, was_created = EventMetricsDaily.objects.update_or_create(
+    _, was_created = EventMetricsDaily._base_manager.update_or_create(
@@
-    obj, was_created = EventMetricsMonthly.objects.update_or_create(
+    _, was_created = EventMetricsMonthly._base_manager.update_or_create(
```

Also applies to: 309-311, 340-343
backend/dashboard_metrics/cache.py-163-179 (1)
163-179: Cache invalidation does not clear bucket caches for the org. `invalidate_metrics_cache()` deletes `dashboard_metrics:{org_id}:*`, but bucket keys are `dashboard_metrics:bucket:<granularity>:<org_id>:...`, so they survive invalidation.
Proposed fix (clear both namespaces)

```diff
 def invalidate_metrics_cache(org_id: str) -> int:
@@
-    pattern = f"{CACHE_PREFIX}:{org_id}:*"
+    patterns = [
+        f"{CACHE_PREFIX}:{org_id}:*",
+        f"{CACHE_PREFIX}:bucket:*:{org_id}:*",
+    ]
     try:
-        CacheService.clear_cache(pattern)
+        for pattern in patterns:
+            CacheService.clear_cache(pattern)
         logger.info(f"Invalidated metrics cache for org {org_id}")
         return 1  # Pattern-based delete doesn't return count
```

Also applies to: 333-364
backend/dashboard_metrics/capture.py-79-82 (1)
79-82: Don't mutate the caller-provided `labels` dict. `metric_labels = labels or {}` plus adding keys mutates the caller's dict when `labels` is passed in.
Proposed fix

```diff
-    metric_labels = labels or {}
+    metric_labels = {**(labels or {})}
```

Apply similarly in all methods that enrich labels.
Also applies to: 116-119, 155-158, 208-213, 260-263, 328-331
backend/dashboard_metrics/capture.py-33-35 (1)
33-35: Avoid import-time settings capture for `METRICS_ENABLED` (stale toggle in tests/runtime). `METRICS_ENABLED` is fixed at import time, so changing `settings.DASHBOARD_METRICS_ENABLED` during tests or at runtime won't take effect.
Proposed fix

```diff
-# Flag to control whether metrics are captured (can be disabled for testing)
-METRICS_ENABLED = getattr(settings, "DASHBOARD_METRICS_ENABLED", True)
+def _metrics_enabled() -> bool:
+    # Read at call time so tests/runtime overrides work.
+    return getattr(settings, "DASHBOARD_METRICS_ENABLED", True)
```

Then replace `if not METRICS_ENABLED:` with `if not _metrics_enabled():` in all methods.
backend/dashboard_metrics/management/commands/backfill_metrics.py-35-49 (1)
35-49: Use `ClassVar` and ensure `metric_name` values match the rest of the metrics system. Two concerns:

- `METRIC_CONFIGS` is a mutable class attribute (RUF012).
- Using raw strings (`"documents_processed"`, etc.) risks drift vs the canonical `MetricName` identifiers used by capture/queue processing.

Proposed fix

```diff
+from typing import ClassVar
@@
-    METRIC_CONFIGS = [
+    METRIC_CONFIGS: ClassVar[list[tuple[str, Any, bool]]] = [
         ("documents_processed", MetricsQueryService.get_documents_processed, False),
```

Separately, consider importing the canonical metric name enum/constants and using them here (if available).
backend/dashboard_metrics/management/commands/backfill_metrics.py-58-63 (1)
58-63: `--org-id` uses the wrong field: it should query `organization_id`, not `id`. The help text says "organization ID" and the parameter accepts a string (line 60), but the code queries the integer `id` field (line 105). Throughout the codebase, the standard pattern is to use `Organization.objects.get(organization_id=org_id)` for string identifiers. If a caller passes the organization's string identifier (the `organization_id` field), this will silently fail with `Organization.DoesNotExist`.

Change line 105 to:

```python
orgs = [Organization.objects.get(organization_id=org_id)]
```

This aligns with the consistent pattern used in `internal_api_views.py`, `user_context.py`, and the rate limit management commands.
backend/dashboard_metrics/management/commands/backfill_metrics.py-194-257 (1)
194-257: `metric_count` currently counts periods (rows returned), not events or samples, which breaks histogram semantics and is meaningless for counters. Given that `MetricsQueryService.*` returns pre-aggregated data (one row per period with the value already aggregated), `count += 1` makes `metric_count` represent the number of time periods rather than the number of events or samples. For example:

- `documents_processed`: returns 1 row per hour with `value=Count(id)` of 100 documents, but `metric_count=1` (the row count)
- `pages_processed` (histogram): returns 1 row per hour with `value=Sum(pages_processed)` of 500 pages from 10 source records, but `metric_count=1` instead of 10

To fix this, query methods need to return sample/event count information, or the backfill code should be restructured to count source records directly. For counters, `metric_count` could equal `value` (the event count is already in the value); for histograms, it should reflect the actual number of aggregated records.
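Returning both the aggregated value and the underlying record count per period resolves the ambiguity; a plain-Python sketch of the shape a query method could emit (hypothetical helper, mirroring `Sum(...)` plus `Count("id")` per bucket):

```python
from collections import defaultdict


def aggregate_pages(rows):
    """Group page-usage rows by hour, keeping both Sum(value) and Count(rows).

    Each row is (hour_bucket, pages); the result carries an explicit
    event_count so metric_count no longer degrades to "rows returned".
    """
    buckets = defaultdict(lambda: {"value": 0, "event_count": 0})
    for hour, pages in rows:
        buckets[hour]["value"] += pages
        buckets[hour]["event_count"] += 1
    return dict(buckets)
```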
🟡 Minor comments (13)
backend/dashboard_metrics/README.md-729-733 (1)
729-733: Migration file names may be out of sync. The file structure section lists migrations as `0002_setup_cleanup_tasks.py` and `0003_setup_aggregation_task.py`, but the PR description indicates different migration names (`0002_setup_cleanup_periodic_task`, `0003_add_daily_monthly_tables`, `0004_update_cleanup_periodic_tasks`). Please update the documentation to match the actual migration filenames.
backend/dashboard_metrics/management/commands/generate_metrics_test_data.py-397-425 (1)
397-425: `PageUsage.created_at` backfill ignores the originally chosen timestamps. You compute `created_at` per record, but then overwrite it with a new random timestamp during the update loop. Store `_target_created_at` like the other generators and update from that.
frontend/src/hooks/useMetricsData.js-77-84 (1)
77-84: Docs don't match behavior for the default date range. The JSDoc says overview defaults to "7 days ago → now", but the hook doesn't implement those defaults (it passes `null` through). Either implement the defaults or fix the docs to avoid caller confusion.
Also applies to: 85-92
frontend/src/components/metrics-dashboard/MetricsChart.jsx-74-87 (1)
74-87: Handle "no metrics selected" when `allowClear` is used. With `allowClear`, `selectedMetrics` can become `[]`, and the chart renders with no series (looks broken). Consider either preventing empty selection, auto-falling back to the first available metric, or rendering an `<Empty />` state when `metricsToShow.length === 0`.
Also applies to: 118-258
frontend/src/components/metrics-dashboard/MetricsTable.jsx-64-77 (1)
64-77: Handle `null` values in min/max render functions. The check `value !== undefined` passes for `null`, but `null.toFixed(2)` throws a TypeError. The API may return `null` for metrics without min/max values.
🐛 Proposed fix

```diff
 {
   title: "Min",
   dataIndex: "min_value",
   key: "min_value",
-  render: (value) => (value !== undefined ? value.toFixed(2) : "-"),
+  render: (value) => (value != null ? value.toFixed(2) : "-"),
   align: "right",
 },
 {
   title: "Max",
   dataIndex: "max_value",
   key: "max_value",
-  render: (value) => (value !== undefined ? value.toFixed(2) : "-"),
+  render: (value) => (value != null ? value.toFixed(2) : "-"),
   align: "right",
 },
```

frontend/src/components/metrics-dashboard/MetricsSummary.jsx-13-118 (1)
13-118: Potential React key collision if totals aren't unique by `metric_name`. If the API ever returns multiple rows for the same `metric_name` (e.g., per `project`/`tag`), cards will overwrite each other due to `key={metric.metric_name}`. Consider keying by a stable composite (or falling back to the index).
unstract/core/src/unstract/core/metrics/__init__.py-94-101 (1)
94-101: Use parameterized logging for the unknown-metric warning. Avoid f-strings in logger calls.
Proposed fix

```diff
@@
     if not validate_metric(metric_name):
-        logger.warning(f"Unknown metric: {metric_name}. Skipping.")
+        logger.warning("Unknown metric: %s. Skipping.", metric_name)
         return False
```

backend/backend/celery_config.py-34-74 (1)
34-74: Add `ClassVar` annotations for mutable `CeleryConfig` attributes (Ruff RUF012). This avoids accidental instance-level mutation patterns and satisfies the linter.
Proposed fix

```diff
@@
 from urllib.parse import quote_plus

 from django.conf import settings
 from kombu import Queue
+from typing import Any, ClassVar


 class CeleryConfig:
@@
-    broker_transport_options = {
+    broker_transport_options: ClassVar[dict[str, Any]] = {
         "max_retries": 3,
         "interval_start": 0,
         "interval_step": 0.2,
         "interval_max": 1,
         "socket_timeout": 30,
         "socket_keepalive": True,
     }
@@
-    task_queues = [
+    task_queues: ClassVar[list[Queue]] = [
         Queue("celery", routing_key="celery"),
         Queue("dashboard_metric_events", routing_key="dashboard_metric_events"),
         Queue("celery_periodic_logs", routing_key="celery_periodic_logs"),
         Queue("celery_log_task_queue", routing_key="celery_log_task_queue"),
         Queue("celery_api_deployments", routing_key="celery_api_deployments"),
         Queue("file_processing", routing_key="file_processing"),
         Queue("api_file_processing", routing_key="api_file_processing"),
         Queue("file_processing_callback", routing_key="file_processing_callback"),
         Queue("api_file_processing_callback", routing_key="api_file_processing_callback"),
     ]
@@
-    task_routes = {
+    task_routes: ClassVar[dict[str, dict[str, str]]] = {
         "dashboard_metrics.process_events": {"queue": "dashboard_metric_events"},
         "dashboard_metrics.aggregate_from_sources": {"queue": "dashboard_metric_events"},
         "dashboard_metrics.cleanup_hourly_data": {"queue": "dashboard_metric_events"},
         "dashboard_metrics.cleanup_daily_data": {"queue": "dashboard_metric_events"},
     }
```

backend/dashboard_metrics/tasks.py-553-567 (1)
553-567: `pages_processed` is marked histogram, but it's a counter-like metric (a sum of pages). Unless you intentionally treat it as a histogram in UI/rollups, set it to `False` here.
Proposed fix

```diff
-    ("pages_processed", MetricsQueryService.get_pages_processed, True),
+    ("pages_processed", MetricsQueryService.get_pages_processed, False),
```

backend/dashboard_metrics/cache.py-185-196 (1)
185-196: Type mismatch: `set_cached_overview` expects a dict but `warm_cache()` passes a list. Either change the annotation to `list[dict[str, Any]]` or change what's stored.
Proposed fix (make it a list everywhere)

```diff
 class MetricsCacheService:
@@
-    def get_cached_overview(org_id: str) -> dict[str, Any] | None:
+    def get_cached_overview(org_id: str) -> list[dict[str, Any]] | None:
@@
-    def set_cached_overview(org_id: str, data: dict[str, Any]) -> None:
+    def set_cached_overview(org_id: str, data: list[dict[str, Any]]) -> None:
```

Also applies to: 231-256
backend/dashboard_metrics/capture.py-215-233 (1)
215-233: The return value should reflect both the `LLM_CALLS` and `LLM_USAGE` writes (partial success is hidden). If `cost > 0` and the usage write fails, the method still returns `call_result`.
Proposed fix

```diff
-    if cost > 0:
-        record(
+    usage_result = True
+    if cost > 0:
+        usage_result = record(
             org_id=org_id,
             metric_name=MetricName.LLM_USAGE,
             metric_value=cost,
             labels=metric_labels,
             project=project,
         )
-
-    return call_result
+    return call_result and usage_result
```

backend/dashboard_metrics/capture.py-159-177 (1)
159-177: Validate `pages` to avoid negative/zero page metrics. As-is, `pages` can be 0 or negative and will write incorrect histogram/counter values.
Proposed fix

```diff
 def record_document_processed(
     org_id: str,
     pages: int = 1,
@@
     if record is None:
         return False
+    if pages < 0:
+        return False
```

backend/dashboard_metrics/cache.py-142-157 (1)
142-157: Cached `Response` from serialized data is fragile for future customization. The current pattern caches `response.data` (a dict) and reconstructs it as `Response(cached)` on cache hits. This works fine for plain JSON endpoints, and the four decorated endpoints here (overview, live_summary, live_series, live-summary) currently set no custom headers or renderers, so there's no immediate impact.

However, this pattern is brittle: if custom headers or non-default renderers are added to a cached endpoint in the future, they would silently be lost on cache hits without warning. Consider documenting that this decorator is only suitable for endpoints that return unadorned JSON, or cache/return the full response object with headers intact.
🧹 Nitpick comments (16)
backend/dashboard_metrics/README.md (1)
15-23: Consider adding language hints to code blocks. Several fenced code blocks in the documentation (ASCII diagrams, data flow, file structure) lack language specifiers. While these are primarily visual diagrams, adding `text` or `plaintext` as the language identifier would satisfy markdown linters and improve consistency.
frontend/src/helpers/metricsCache.js (1)
24-30: Consider handling complex parameter values. The `buildCacheKey` function converts params to strings using template literals, which would produce `[object Object]` for nested objects or inconsistent results for arrays. If the metrics API ever uses complex query parameters, this could cause cache key collisions or misses.
♻️ Suggested improvement

```diff
 function buildCacheKey(endpoint, params) {
   const sortedKeys = Object.keys(params).sort();
-  const paramStr = sortedKeys.map((k) => `${k}=${params[k]}`).join("&");
+  const paramStr = sortedKeys
+    .map((k) => `${k}=${JSON.stringify(params[k])}`)
+    .join("&");
   // Use simple encoding to avoid btoa issues with unicode
   const hash = paramStr ? `_${encodeURIComponent(paramStr)}` : "";
   return `${CACHE_PREFIX}${endpoint}${hash}`;
 }
```

backend/dashboard_metrics/management/commands/populate_test_metrics.py (1)
60-67: Consider chaining exceptions for better debugging. When re-raising exceptions, chaining them preserves the original traceback, which aids debugging.
♻️ Suggested improvement

```diff
 if org_id:
     try:
         org = Organization.objects.get(id=org_id)
     except Organization.DoesNotExist:
-        raise CommandError(f"Organization with ID {org_id} not found")
+        raise CommandError(f"Organization with ID {org_id} not found") from None
 else:
     org = Organization.objects.first()
     if not org:
         raise CommandError("No organizations found. Create one first.")
```

frontend/src/components/metrics-dashboard/MetricsChart.jsx (1)
266-292: Stabilize `MultiSeriesChart` metric order for consistent colors/legend. `metricNames` comes from `Object.entries(data.series)` iteration order; sorting will keep line colors stable across runs and payload shapes.

Proposed diff

```diff
 const { chartData, metricNames } = useMemo(() => {
   if (!data?.series) return { chartData: [], metricNames: [] };
   const dateMap = new Map();
   const names = [];
   Object.entries(data.series).forEach(([metricName, values]) => {
     names.push(metricName);
     values.forEach(({ date, value }) => {
       if (!dateMap.has(date)) {
         dateMap.set(date, { date });
       }
       dateMap.get(date)[metricName] = value;
     });
   });
   const sorted = Array.from(dateMap.values()).sort(
     (a, b) => new Date(a.date) - new Date(b.date)
   );
-  return { chartData: sorted, metricNames: names };
+  return { chartData: sorted, metricNames: names.sort() };
 }, [data]);
```

Also applies to: 341-353
backend/dashboard_metrics/models.py (1)
63-105: Consider making `metric_count` non-negative. `metric_count` should never be negative; `PositiveIntegerField` (or `PositiveBigIntegerField` if large) better encodes the invariant.

Also applies to: 169-211, 275-317
unstract/core/src/unstract/core/metrics/__init__.py (1)
39-53: Ruff RUF022: `__all__` ordering. Not functionally important, but easy to clean up if you want to keep lint noise down.

unstract/core/src/unstract/core/metrics/backends/queue.py (2)
69-115: Retry/backoff here can stall request threads for several seconds when RabbitMQ is down. If `record()` is called in synchronous web paths, the exponential sleeps (Line 111) can become user-facing latency. Consider making retry configurable per environment (or "fail fast" by default), and/or using Kombu's publish retry policies instead of reconnect sleeps.
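As a sketch of the "fail fast by default, opt into bounded backoff" idea (pure Python; `publish_with_retry` and its defaults are hypothetical, not the backend's API):

```python
import time

def publish_with_retry(publish, max_retries=0, base_delay=0.2):
    """Call `publish`; by default raise immediately on broker errors
    (fail fast for synchronous web paths). Async callers can opt into
    a small, bounded exponential backoff via `max_retries`."""
    attempt = 0
    while True:
        try:
            return publish()
        except ConnectionError:
            if attempt >= max_retries:
                raise  # fail fast: surface the broker outage to the caller
            time.sleep(base_delay * (2 ** attempt))
            attempt += 1
```

In Kombu itself, `Producer.publish(..., retry=True, retry_policy={...})` covers the same ground without hand-rolled sleeps.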
90-97: Swallowing connection-close errors hides useful diagnostics. At least log at debug level so broker flaps are diagnosable.

backend/dashboard_metrics/views.py (4)
165-232: The bucket-cache DB fallback issues N queries per missing hour (a cold cache over 7 days can mean ~168 queries). This will be expensive under load. Prefer one ranged query covering the whole missing span, then bucket the rows in Python.
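The single-ranged-query shape could look like this (pure-Python bucketing; `fill_missing_buckets` and the row shape are illustrative assumptions, not code from this PR):

```python
from datetime import datetime

def fill_missing_buckets(rows, missing_hours):
    """Bucket rows from ONE ranged query into the still-uncached hours.

    rows: (timestamp, value) pairs covering the whole missing span.
    missing_hours: hour-start datetimes the cache could not serve.
    Returns {hour_start: total}, with 0 for hours that had no rows.
    """
    buckets = {hour: 0 for hour in missing_hours}
    for ts, value in rows:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        if hour in buckets:
            buckets[hour] += value
    return buckets
```

This replaces ~168 point lookups with one query plus an O(rows) pass.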
391-401: Remove the inner `from datetime import datetime` (it's redundant and easy to shadow). You already import `datetime` at the top (Line 5); the inner import (Line 391) is unnecessary.
486-523: Monthly path uses `TruncDay(month)` and then returns "daily"-shaped points; clarify the granularity semantics. If the UI requests `granularity=week/day` but the source is monthly, you'll still return one point per month-start day. If that's intended, fine; otherwise consider forcing `granularity` to `month` when `source=monthly` (and exposing `month` in the serializer choices).
868-878: The cache health-check key is global; use a namespaced/random key to avoid cross-worker interference. Multiple instances probing the same key can produce false negatives.

backend/dashboard_metrics/services.py (1)
101-134: Sums can return NULL; consider `Coalesce(..., 0)` to avoid downstream `None` checks. You already guard in `get_all_metrics_summary` for some metrics; pushing it into the query keeps call sites simpler.

Also applies to: 330-364

backend/dashboard_metrics/tasks.py (1)
backend/dashboard_metrics/tasks.py (1)
577-580: Iterate orgs with `.iterator()` to avoid loading all org IDs into memory. Especially if this runs every 15 minutes.

backend/dashboard_metrics/serializers.py (1)
11-72: Validation is solid; consider making naive datetimes explicitly timezone-aware. Depending on DRF settings, `DateTimeField` may accept naive values; the views mix both patterns (overview makes aware). Centralizing this in the serializer avoids surprises.

backend/dashboard_metrics/cache.py (1)
101-105: Prefer SHA-256 over MD5 for cache-key hashing (FIPS/security posture). MD5 isn't a security risk here, but it is often disallowed in hardened environments.

Proposed fix

```diff
-    params_hash = hashlib.md5(sorted_params.encode()).hexdigest()[:12]
+    params_hash = hashlib.sha256(sorted_params.encode()).hexdigest()[:12]
```
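A sketch of the suggested change in isolation (the key layout `metrics:<endpoint>:<hash>` is illustrative, not the module's actual prefix):

```python
import hashlib

def cache_key(endpoint: str, params: dict) -> str:
    """Deterministic, FIPS-friendly cache key: sorted params + SHA-256."""
    sorted_params = "&".join(f"{k}={params[k]}" for k in sorted(params))
    digest = hashlib.sha256(sorted_params.encode()).hexdigest()[:12]
    return f"metrics:{endpoint}:{digest}"
```

Sorting the params keeps the key order-independent; only the digest function changes relative to the MD5 version.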
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
⛔ Files ignored due to path filters (2)
- `backend/uv.lock` is excluded by `!**/*.lock`
- `frontend/src/assets/dashboard.svg` is excluded by `!**/*.svg`
📒 Files selected for processing (36)
- `backend/backend/celery_config.py`
- `backend/backend/settings/base.py`
- `backend/dashboard_metrics/README.md`
- `backend/dashboard_metrics/cache.py`
- `backend/dashboard_metrics/capture.py`
- `backend/dashboard_metrics/management/__init__.py`
- `backend/dashboard_metrics/management/commands/__init__.py`
- `backend/dashboard_metrics/management/commands/backfill_metrics.py`
- `backend/dashboard_metrics/management/commands/generate_metrics_test_data.py`
- `backend/dashboard_metrics/management/commands/populate_test_metrics.py`
- `backend/dashboard_metrics/migrations/0001_initial.py`
- `backend/dashboard_metrics/migrations/0002_setup_cleanup_tasks.py`
- `backend/dashboard_metrics/migrations/0003_setup_aggregation_task.py`
- `backend/dashboard_metrics/models.py`
- `backend/dashboard_metrics/serializers.py`
- `backend/dashboard_metrics/services.py`
- `backend/dashboard_metrics/tasks.py`
- `backend/dashboard_metrics/tests/test_tasks.py`
- `backend/dashboard_metrics/urls.py`
- `backend/dashboard_metrics/views.py`
- `backend/pyproject.toml`
- `frontend/package.json`
- `frontend/src/components/metrics-dashboard/MetricsChart.jsx`
- `frontend/src/components/metrics-dashboard/MetricsDashboard.css`
- `frontend/src/components/metrics-dashboard/MetricsDashboard.jsx`
- `frontend/src/components/metrics-dashboard/MetricsSummary.jsx`
- `frontend/src/components/metrics-dashboard/MetricsTable.jsx`
- `frontend/src/components/metrics-dashboard/index.js`
- `frontend/src/components/navigations/side-nav-bar/SideNavBar.jsx`
- `frontend/src/helpers/metricsCache.js`
- `frontend/src/hooks/useMetricsData.js`
- `frontend/src/pages/MetricsDashboardPage.jsx`
- `frontend/src/routes/useMainAppRoutes.js`
- `unstract/core/src/unstract/core/metrics/__init__.py`
- `unstract/core/src/unstract/core/metrics/backends/noop.py`
- `unstract/core/src/unstract/core/metrics/backends/queue.py`
✅ Files skipped from review due to trivial changes (1)
- frontend/src/components/metrics-dashboard/MetricsDashboard.css
🚧 Files skipped from review as they are similar to previous changes (4)
- backend/pyproject.toml
- backend/dashboard_metrics/tests/test_tasks.py
- backend/dashboard_metrics/urls.py
- backend/backend/settings/base.py
🧰 Additional context used
🧬 Code graph analysis (8)
frontend/src/components/metrics-dashboard/MetricsChart.jsx (1)
- frontend/src/components/metrics-dashboard/MetricsDashboard.jsx (1): `Typography` (34-34)

frontend/src/components/metrics-dashboard/MetricsDashboard.jsx (1)
- frontend/src/components/metrics-dashboard/MetricsChart.jsx (2): `useMemo` (130-164), `useMemo` (267-291)

backend/dashboard_metrics/services.py (5)
- backend/api_v2/models.py (1): `APIDeployment` (43-154)
- backend/pipeline_v2/models.py (1): `Pipeline` (35-156)
- backend/usage_v2/models.py (1): `Usage` (26-81)
- backend/workflow_manager/file_execution/models.py (1): `WorkflowFileExecution` (68-250)
- backend/workflow_manager/workflow_v2/models/execution.py (2): `WorkflowExecution` (102-372), `organization_id` (266-274)

backend/dashboard_metrics/serializers.py (2)
- backend/dashboard_metrics/models.py (4): `EventMetricsHourly` (39-142), `Meta` (109-142), `Meta` (215-248), `Meta` (321-354)
- backend/dashboard_metrics/views.py (2): `series` (355-566), `summary` (243-352)

backend/dashboard_metrics/management/commands/backfill_metrics.py (3)
- backend/account_v2/models.py (1): `Organization` (12-46)
- backend/dashboard_metrics/models.py (4): `EventMetricsDaily` (145-248), `EventMetricsHourly` (39-142), `EventMetricsMonthly` (251-354), `MetricType` (14-18)
- backend/dashboard_metrics/services.py (10): `MetricsQueryService` (37-464), `get_documents_processed` (67-99), `get_pages_processed` (102-133), `get_llm_calls` (136-169), `get_challenges` (172-206), `get_summarization_calls` (209-243), `get_deployed_api_requests` (246-285), `get_etl_pipeline_executions` (288-328), `get_llm_usage_cost` (331-364), `get_prompt_executions` (367-398)

unstract/core/src/unstract/core/metrics/backends/noop.py (3)
- unstract/core/src/unstract/core/metrics/backends/base.py (3): `AbstractMetricBackend` (8-40), `record` (15-24), `flush` (27-32)
- unstract/core/src/unstract/core/metrics/types.py (1): `MetricEvent` (52-96)
- unstract/core/src/unstract/core/metrics/__init__.py (1): `record` (58-119)

backend/dashboard_metrics/migrations/0001_initial.py (2)
- backend/dashboard_metrics/migrations/0003_setup_aggregation_task.py (1): `Migration` (40-51)
- backend/dashboard_metrics/migrations/0002_setup_cleanup_tasks.py (1): `Migration` (70-81)

unstract/core/src/unstract/core/metrics/__init__.py (5)
- unstract/core/src/unstract/core/metrics/config.py (2): `get_backend` (28-60), `reset_backend` (63-68)
- unstract/core/src/unstract/core/metrics/registry.py (3): `get_all_metrics` (42-48), `get_metric_type` (25-39), `validate_metric` (13-22)
- unstract/core/src/unstract/core/metrics/types.py (3): `MetricEvent` (52-96), `MetricName` (15-32), `MetricType` (8-12)
- unstract/core/src/unstract/core/metrics/backends/noop.py (1): `record` (25-40)
- unstract/core/src/unstract/core/metrics/backends/queue.py (1): `record` (117-156)
🪛 markdownlint-cli2 (0.18.1)
backend/dashboard_metrics/README.md
15-15: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
26-26: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
47-47: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
54-54: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
86-86: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
187-187: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
299-299: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
482-482: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
512-512: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
712-712: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.11)
backend/dashboard_metrics/migrations/0003_setup_aggregation_task.py
6-6: Unused function argument: schema_editor
(ARG001)
33-33: Unused function argument: schema_editor
(ARG001)
41-44: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
46-51: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/backend/celery_config.py
42-49: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
56-66: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
69-74: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
unstract/core/src/unstract/core/metrics/backends/queue.py
95-96: try-except-pass detected, consider logging the exception
(S110)
95-95: Do not catch blind exception: Exception
(BLE001)
101-101: Consider moving this statement to an else block
(TRY300)
103-103: Do not catch blind exception: Exception
(BLE001)
113-113: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
115-115: Avoid specifying long messages outside the exception class
(TRY003)
149-149: Consider moving this statement to an else block
(TRY300)
212-212: Do not catch blind exception: Exception
(BLE001)
backend/dashboard_metrics/models.py
113-130: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
131-142: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
219-236: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
237-248: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
325-342: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
343-354: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/management/commands/populate_test_metrics.py
23-33: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
53-53: Unused method argument: args
(ARG002)
63-63: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
63-63: Avoid specifying long messages outside the exception class
(TRY003)
67-67: Avoid specifying long messages outside the exception class
(TRY003)
91-91: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
117-117: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
167-167: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
172-172: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
174-174: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
176-176: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
177-177: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
184-186: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
187-187: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
190-190: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
193-193: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
backend/dashboard_metrics/tasks.py
186-186: Do not catch blind exception: Exception
(BLE001)
398-398: Unpacked variable obj is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
439-439: Unpacked variable obj is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
480-480: Unpacked variable obj is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
669-669: Do not catch blind exception: Exception
(BLE001)
692-692: Redundant exception object included in logging.exception call
(TRY401)
backend/dashboard_metrics/management/commands/generate_metrics_test_data.py
29-32: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
34-41: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
43-45: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
48-54: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
56-62: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
89-89: Unused method argument: args
(ARG002)
232-232: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
244-244: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
246-246: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
250-252: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
253-255: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
258-258: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
259-259: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
285-285: Unused method argument: start_time
(ARG002)
285-285: Unused method argument: end_time
(ARG002)
292-292: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
299-299: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
302-302: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
307-307: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
308-308: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
309-309: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
313-313: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
351-351: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
361-361: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
364-364: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
365-365: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
369-369: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
379-379: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
408-408: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
409-409: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
411-411: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
412-412: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
421-421: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
backend/dashboard_metrics/migrations/0002_setup_cleanup_tasks.py
6-6: Unused function argument: schema_editor
(ARG001)
58-58: Unused function argument: schema_editor
(ARG001)
71-74: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
76-81: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/serializers.py
121-132: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/capture.py
45-45: Consider moving this statement to an else block
(TRY300)
backend/dashboard_metrics/management/commands/backfill_metrics.py
35-49: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
85-85: Unused method argument: args
(ARG002)
258-258: Do not catch blind exception: Exception
(BLE001)
278-278: Unpacked variable obj is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
295-295: Do not catch blind exception: Exception
(BLE001)
309-309: Unpacked variable obj is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
326-326: Do not catch blind exception: Exception
(BLE001)
340-340: Unpacked variable obj is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
357-357: Do not catch blind exception: Exception
(BLE001)
unstract/core/src/unstract/core/metrics/backends/noop.py
25-25: Unused method argument: event
(ARG002)
backend/dashboard_metrics/migrations/0001_initial.py
13-15: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
17-381: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/cache.py
103-103: Probable use of insecure hash functions in hashlib: md5
(S324)
176-176: Consider moving this statement to an else block
(TRY300)
277-277: Do not catch blind exception: Exception
(BLE001)
470-470: Consider moving this statement to an else block
(TRY300)
510-510: Do not catch blind exception: Exception
(BLE001)
backend/dashboard_metrics/views.py
59-59: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
60-60: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
62-62: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
63-63: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
64-64: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
72-72: Abstract raise to an inner function
(TRY301)
72-72: Avoid specifying long messages outside the exception class
(TRY003)
78-78: Avoid specifying long messages outside the exception class
(TRY003)
84-84: Avoid specifying long messages outside the exception class
(TRY003)
235-235: Loop control variable bucket_ts not used within loop body
(B007)
237-237: Loop control variable bucket_ts not used within loop body
(B007)
679-679: Avoid specifying long messages outside the exception class
(TRY003)
737-737: Avoid specifying long messages outside the exception class
(TRY003)
830-830: Unused method argument: request
(ARG002)
863-863: Consider moving this statement to an else block
(TRY300)
875-875: Consider moving this statement to an else block
(TRY300)
unstract/core/src/unstract/core/metrics/__init__.py
39-53: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
```python
    def _upsert_hourly(self, aggregations: dict) -> tuple[int, int]:
        """Upsert hourly aggregations."""
        created, updated = 0, 0
        with transaction.atomic():
            for key, agg in aggregations.items():
                org_id, ts_str, metric_name, project, tag = key
                timestamp = datetime.fromisoformat(ts_str)

                try:
                    obj, was_created = EventMetricsHourly._base_manager.update_or_create(
                        organization_id=org_id,
                        timestamp=timestamp,
                        metric_name=metric_name,
                        project=project,
                        tag=tag,
                        defaults={
                            "metric_type": agg["metric_type"],
                            "metric_value": agg["value"],
                            "metric_count": agg["count"],
                            "labels": agg["labels"],
                        },
                    )
                    if was_created:
                        created += 1
                    else:
                        updated += 1
                except Exception as e:
                    logger.warning(f"Error upserting hourly {key}: {e}")

        return created, updated
```
Catching exceptions inside a single outer `transaction.atomic()` can break the whole backfill.
If any DB error occurs (e.g., `IntegrityError`), the transaction may become unusable and subsequent iterations can fail even though exceptions are caught.
Simplest fix: remove the outer `transaction.atomic()` (each `update_or_create()` is already atomic), or use a per-row savepoint.
Proposed fix (remove outer atomic + keep loop going safely)
```diff
 def _upsert_hourly(self, aggregations: dict) -> tuple[int, int]:
     """Upsert hourly aggregations."""
     created, updated = 0, 0
-    with transaction.atomic():
-        for key, agg in aggregations.items():
-            org_id, ts_str, metric_name, project, tag = key
-            timestamp = datetime.fromisoformat(ts_str)
+    for key, agg in aggregations.items():
+        org_id, ts_str, metric_name, project, tag = key
+        timestamp = datetime.fromisoformat(ts_str)
-            try:
-                obj, was_created = EventMetricsHourly._base_manager.update_or_create(
+        try:
+            _, was_created = EventMetricsHourly._base_manager.update_or_create(
                 organization_id=org_id,
                 timestamp=timestamp,
                 metric_name=metric_name,
                 project=project,
                 tag=tag,
                 defaults={
                     "metric_type": agg["metric_type"],
                     "metric_value": agg["value"],
                     "metric_count": agg["count"],
                     "labels": agg["labels"],
                 },
             )
-            if was_created:
-                created += 1
-            else:
-                updated += 1
-            except Exception as e:
-                logger.warning(f"Error upserting hourly {key}: {e}")
+            if was_created:
+                created += 1
+            else:
+                updated += 1
+        except Exception as e:
+            logger.warning(f"Error upserting hourly {key}: {e}")
     return created, updated
```

Apply the same pattern to daily/monthly; also prefix the unused `obj` as `_` (RUF059).
Also applies to: 300-329, 331-360
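The per-row isolation this comment argues for can be sketched generically (pure Python; `upsert_all` and `upsert_one` are illustrative stand-ins for the Django upserts, not code from this PR):

```python
def upsert_all(items, upsert_one):
    """Process each row independently, so one bad row is counted and
    skipped instead of poisoning the whole batch. This mirrors the
    effect of dropping the outer transaction (or wrapping each row in
    its own savepoint)."""
    created = updated = errors = 0
    for item in items:
        try:
            was_created = upsert_one(item)  # True -> created, False -> updated
        except Exception:
            errors += 1
            continue
        if was_created:
            created += 1
        else:
            updated += 1
    return created, updated, errors
```

In Django terms, each `update_or_create()` would run in its own implicit transaction (or `with transaction.atomic():` savepoint), so a failing row no longer invalidates the connection for later rows.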
🧰 Tools
🪛 Ruff (0.14.11)
278-278: Unpacked variable obj is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
295-295: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In @backend/dashboard_metrics/management/commands/backfill_metrics.py around
lines 269 - 298, The outer transaction.atomic() in _upsert_hourly wraps all
update_or_create calls so a DB error (e.g., IntegrityError) can leave the
transaction unusable and abort the rest of the loop; remove the outer
transaction.atomic() (or alternatively use a per-row savepoint) so each
EventMetricsHourly._base_manager.update_or_create call runs in its own atomic
context and the loop can continue on exceptions; also change the unused returned
variable name obj to _ and apply the same change (remove outer atomic or use
per-row savepoints and rename obj → _) in the corresponding daily/monthly upsert
methods that use update_or_create.
No, that concern is not relevant here. The `backfill_metrics.py` command does not use `transaction.atomic()` at all. Each `update_or_create` call in `_upsert_hourly`, `_upsert_daily`, and `_upsert_monthly` is independently atomic (Django's default), and exceptions are caught per-row with `except Exception`, so one failure doesn't break subsequent iterations.
It looks like that transaction.atomic() issue was from a previous version that was already fixed. The current code is fine.
@athul-rs, I appreciate the clarification, but looking at the diff for this file, I can see that `transaction.atomic()` is present in the current code. Line 271 in the diff shows:

```python
def _upsert_hourly(self, aggregations: dict) -> tuple[int, int]:
    """Upsert hourly aggregations."""
    created, updated = 0, 0
    with transaction.atomic():  # ← Line 271
        for key, agg in aggregations.items():
```

The outer `with transaction.atomic():` wraps all iterations of the loop, which means that if any `update_or_create()` call raises a database error (e.g., `IntegrityError`), the entire transaction becomes unusable and subsequent iterations will fail even though the exception is caught.
Are you perhaps viewing a different version of the file, or has this already been fixed in a more recent commit not shown in this diff?
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@backend/dashboard_metrics/views.py`:
- Around lines 845-848: The current parsing of `limit` uses `limit = min(int(request.query_params.get("limit", 10)), 50)`, which will raise a `ValueError` for non-integer input and cause a 500. Update the parsing in the view (where `organization = self._get_organization()` is used and `limit` is computed) to defensively parse the query param by catching `ValueError`/`TypeError`, falling back to a safe default (e.g., 10), enforcing a minimum (e.g., 1) and a maximum of 50, and then assigning the sanitized value to `limit`; reference the `request.query_params.get("limit", 10)` expression and the `limit` variable when making the change.
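The defensive parsing described above is small enough to sketch directly (the helper name `parse_limit` is hypothetical):

```python
def parse_limit(raw, default=10, minimum=1, maximum=50):
    """Clamp a user-supplied ?limit= value; never raise on junk input."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return default
    return max(minimum, min(value, maximum))
```

This turns the brittle `min(int(...), 50)` expression into something like `limit = parse_limit(request.query_params.get("limit"))`, which can no longer produce a 500 from malformed input.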
🧹 Nitpick comments (6)
backend/dashboard_metrics/views.py (1)
331-341: Avoid a per-metric lookup for `metric_type` in summary. The loop issues one query per metric to fetch `metric_type`. You can pull it in the initial aggregation to remove the N+1.

♻️ Suggested refactor

```diff
-        summary = (
-            queryset.values("metric_name")
-            .annotate(
-                total_value=Sum("metric_value"),
-                total_count=Sum("metric_count"),
-                average_value=Avg("metric_value"),
-                min_value=Min("metric_value"),
-                max_value=Max("metric_value"),
-            )
-            .order_by("metric_name")
-        )
-
-        # Add metric_type from the first record of each metric
-        summary_list = []
-        for row in summary:
-            metric_type_record = (
-                queryset.filter(metric_name=row["metric_name"])
-                .values("metric_type")
-                .first()
-            )
-            row["metric_type"] = (
-                metric_type_record["metric_type"] if metric_type_record else "counter"
-            )
-            summary_list.append(row)
+        summary_list = list(
+            queryset.values("metric_name", "metric_type")
+            .annotate(
+                total_value=Sum("metric_value"),
+                total_count=Sum("metric_count"),
+                average_value=Avg("metric_value"),
+                min_value=Min("metric_value"),
+                max_value=Max("metric_value"),
+            )
+            .order_by("metric_name")
+        )
```

frontend/src/components/metrics-dashboard/MetricsChart.jsx (5)
122-128: Clean up the `forEach` callback to avoid returning a value. The static analysis correctly identifies that `Set.add()` returns a value, and `forEach` callbacks should not return values. While this works, it's cleaner to use a block body.

♻️ Suggested fix

```diff
 data.daily_trend.forEach((item) => {
   if (item.metrics) {
-    Object.keys(item.metrics).forEach((key) => metricNamesSet.add(key));
+    Object.keys(item.metrics).forEach((key) => {
+      metricNamesSet.add(key);
+    });
   }
 });
```
179-197: Move static color constants to module level. `METRIC_BAR_COLORS`, `DEFAULT_BAR_COLORS`, and `getBarColor` are recreated on every render. Since these are static, move them outside the component for a minor performance improvement and consistency with `TREND_COLORS` at line 21.

♻️ Suggested refactor

Move these definitions after line 31 (after `TREND_COLORS`):

```javascript
// Bar colors - metric-specific for key metrics, fallback array for others
const METRIC_BAR_COLORS = {
  pages_processed: "#1890ff",
  documents_processed: "#52c41a",
  failed_pages: "#ff4d4f",
  llm_calls: "#722ed1",
  prompt_executions: "#faad14",
  llm_usage: "#13c2c2",
};

const DEFAULT_BAR_COLORS = [
  "#1890ff",
  "#faad14",
  "#52c41a",
  "#722ed1",
  "#13c2c2",
];

const getBarColor = (metric, idx) =>
  METRIC_BAR_COLORS[metric] ||
  DEFAULT_BAR_COLORS[idx % DEFAULT_BAR_COLORS.length];
```
410-416: Same `forEach` callback issue as noted earlier. Apply the same fix here for consistency.

♻️ Suggested fix

```diff
 data.daily_trend.forEach((item) => {
   if (item.metrics) {
-    Object.keys(item.metrics).forEach((key) => metricNamesSet.add(key));
+    Object.keys(item.metrics).forEach((key) => {
+      metricNamesSet.add(key);
+    });
   }
 });
```
779-792: Move `TREND_BAR_COLORS` and `getTrendColor` to module level. Same issue as in `MetricsChart`: these static constants are recreated on every render.
396-551: Consider extracting shared chart logic into reusable utilities. `MetricsBreakdown` and `TrendAnalysisChart` share nearly identical patterns:

- Filter dropdown state and handlers (`selectedMetrics`, `handleFilterClick`, `cardExtra`)
- Metric extraction from `daily_trend`
- Loading/empty state rendering

A custom hook (e.g., `useMetricFilter`) and a wrapper component for the card chrome could reduce ~100 lines of duplication across these components. This is optional and can be deferred.

Also applies to: 674-848
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@backend/dashboard_metrics/tasks.py`:
- Around line 586-694: The exception handlers inside the org-processing block
are currently catching all Exceptions (the inner try around the metric loop and
the outer try wrapping org processing), which swallows
DatabaseError/OperationalError and prevents Celery's autoretry; change both
handlers to re-raise DB errors so Celery can retry: in the inner except around
the metric loop (where logger.warning(...) and stats["errors"] += 1 happen),
detect and re-raise DatabaseError and OperationalError before handling other
exceptions, and in the outer except (where logger.exception(...) and
stats["errors"] += 1 happen) do the same; keep the existing logging/stats
increments for non-DB exceptions and ensure _upsert_hourly_replace,
_upsert_daily_replace, and _upsert_monthly_replace DB errors will propagate.
In `@frontend/src/components/metrics-dashboard/MetricsChart.jsx`:
- Around line 258-286: The chart can render blank when metricsToShow (computed
from selectedMetrics and availableMetrics) is empty; add an explicit early
return that checks metricsToShow.length === 0 and renders the same Card used for
other empty states (e.g., title "Trend Analysis (Last 30 Days)" with an Empty
component and a helpful description like "No metrics selected or available")
before the chart rendering logic so users see a clear empty state instead of a
blank chart.
In `@frontend/src/hooks/useMetricsData.js`:
- Around line 114-118: When orgId becomes falsy the effect should clear the
recent-activity state to avoid showing stale data; inside the useEffect that
currently calls fetchActivity() when orgId is truthy (the effect depending on
[orgId, limit]), add an else branch that resets your activity state (e.g., call
the state updater used to store recent activity such as setRecentActivity([]) or
setActivity([])) so recent activity is cleared whenever orgId is
null/undefined/empty.
- Around line 67-71: In useMetricsData's useEffect (the effect that depends on
[orgId, startDate, endDate]) ensure that when orgId is falsy you clear the
hook's cached metric state instead of simply skipping fetchOverview; add an else
branch that resets the relevant state variables (e.g., metrics/overview state,
loading/error flags) so stale data from the previous org is removed on
logout/org switch; update the effect that currently calls fetchOverview() to
conditionally call fetchOverview() when orgId is present and otherwise call the
state-reset logic.
🧹 Nitpick comments (1)
backend/dashboard_metrics/tasks.py (1)
396-411: Avoid unused `obj` from `update_or_create`.
Rename to `_` to address RUF059 and reduce lint noise.

♻️ Proposed cleanup

```diff
-        obj, was_created = EventMetricsHourly._base_manager.update_or_create(
+        _, was_created = EventMetricsHourly._base_manager.update_or_create(
             organization_id=org_id,
             timestamp=hour_ts,
             metric_name=metric_name,
             project=project,
             tag=tag,
             defaults={
                 "metric_type": agg["metric_type"],
                 "metric_value": agg["value"],
                 "metric_count": agg["count"],
                 "labels": agg["labels"],
             },
         )
@@
-        obj, was_created = EventMetricsDaily._base_manager.update_or_create(
+        _, was_created = EventMetricsDaily._base_manager.update_or_create(
             organization_id=org_id,
             date=date_val,
             metric_name=metric_name,
             project=project,
             tag=tag,
             defaults={
                 "metric_type": agg["metric_type"],
                 "metric_value": agg["value"],
                 "metric_count": agg["count"],
                 "labels": agg["labels"],
             },
         )
@@
-        obj, was_created = EventMetricsMonthly._base_manager.update_or_create(
+        _, was_created = EventMetricsMonthly._base_manager.update_or_create(
             organization_id=org_id,
             month=month_val,
             metric_name=metric_name,
             project=project,
             tag=tag,
             defaults={
                 "metric_type": agg["metric_type"],
                 "metric_value": agg["value"],
                 "metric_count": agg["count"],
                 "labels": agg["labels"],
             },
         )
```

Also applies to: 438-452, 479-492
- Implement multi-tier metrics aggregation with celery-batches
- Add hourly/daily/monthly storage tables with proper indexing
- Add dedicated metrics Celery worker (worker-metrics)
- Add live query endpoints for immediate data access
- Add production readiness: retry config, rate limiting, health checks
- Add queue routing and broker resilience configuration
- Add comprehensive unit tests for batch processing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
for more information, see https://pre-commit.ci
Backend:
- Add EventMetricsHourly/Daily/Monthly models for pre-aggregated data
- Add MetricsQueryService for querying source tables (Usage, PageUsage)
- Add aggregate_from_sources Celery task (runs every 15 min)
- Add cleanup tasks for hourly (30 days) and daily (365 days) retention
- Add /overview/, /usage/, /activity/ API endpoints
- Fix Celery task queue routing to dashboard_metric_events
- Use _base_manager to bypass org context filter in Celery tasks

Frontend:
- Add MetricsDashboard component with tabs (Overview, Usage, Activity)
- Add MetricsChart, MetricsSummary, MetricsTable components
- Add useMetricsData hook with caching
- Add recharts dependency for visualizations
- Add dashboard.svg icon and sidebar navigation
- Support date range picker for all tabs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove redundant Detailed View tab and MetricsTable component
- Add Recent Activity component showing real-time file executions
  - Differentiates by type: ETL, API, Workflow
  - Shows status, file name, and relative timestamps
- Convert charts to line charts for 30-day visualization
  - Create PagesChart (pages_processed + failed_pages)
  - Create TrendAnalysisChart (other metrics with filter dropdown)
- Date picker now only affects summary cards, charts fixed at 30 days
- Add recent-activity API endpoint with proper URL registration
- Fix organization filter queries in services.py

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Reduce default limit from 10 to 5 items
- Add max-height with overflow scroll for compact display

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Cleanup after dashboard refactoring:
- Remove unused chart components (MetricsChart, MetricsBreakdown, MultiSeriesChart, MetricsComparison)
- Remove unused hooks (useMetricsData, useMetricsSummary, useMetricsSeries)
- Remove unused imports (BarChart, Bar, Progress)
- Remove unused CSS styles (.usage-quota-*, .trend-*, .breakdown-*)
- Update exports to only include used components
- Add failed_pages to metrics aggregation task

Reduces bundle size by ~982 lines of dead code.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Force-pushed from acb9e9c to ce6bba3.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/pyproject.toml (1)
15-20: ⚠️ Potential issue | 🟠 Major

Update celery-batches to 0.10 or later for Celery 5.3.x and Python 3.12 compatibility.

celery-batches 0.9.0 targets Celery 5.0–5.1 and does not support Celery 5.3.x. For Celery 5.3.x and Python 3.12, use `celery-batches>=0.10` (or latest 0.11). No known security advisories exist for this package.
🤖 Fix all issues with AI agents
In `@backend/dashboard_metrics/management/commands/backfill_metrics.py`:
- Around line 202-256: The code currently increments the per-key "count" by 1 in
hourly_agg/daily_agg/monthly_agg which yields bucket counts instead of
source-record counts; update the increments in the loops in backfill_metrics.py
(the blocks that update hourly_agg[key]["count"], daily_agg[key]["count"], and
monthly_agg[key]["count"]) to use an explicit row-level count value from the
query result (e.g., row.get("count")) falling back to 1 if missing: set
row_count = row.get("count", None) and then hourly_agg[key]["count"] +=
row_count if row_count is not None else (value if metric_type == "counter" else
1); apply the same change for the daily and monthly aggregation loops so
averages use source-record counts (and confirm MetricsQueryService returns a
"count" field).
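A minimal sketch of the count-handling fix described above; the row and key shapes here are illustrative, not the command's real ones:

```python
# Prefer the row-level "count" from the query result; fall back to the
# metric value for counters (each counted unit is one source record) or
# to 1 otherwise, so averages divide by source-record counts.
from collections import defaultdict


def aggregate(rows, metric_type):
    agg = defaultdict(lambda: {"value": 0, "count": 0})
    for row in rows:
        key = (row["org_id"], row["bucket"])
        value = row["value"]
        agg[key]["value"] += value
        row_count = row.get("count")
        if row_count is not None:
            agg[key]["count"] += row_count
        else:
            agg[key]["count"] += value if metric_type == "counter" else 1
    return dict(agg)
```

The same fallback applies identically in the hourly, daily, and monthly loops.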
In `@backend/dashboard_metrics/tasks.py`:
- Around line 121-129: The event handler is passing a None tag into model
upserts which conflicts with the EventMetrics* CharField (null=False); normalize
tag to a non-null string (e.g., "") right after reading it from the event
(replace tag = event.get("tag") with a normalized value) and use that normalized
variable in all subsequent get_or_create()/update_or_create() calls so the model
default is not overridden by None.
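A sketch of that normalization, assuming events are plain dicts (the helper name is hypothetical):

```python
# Coerce a missing or None tag to "" once, right after reading the event,
# so the not-null CharField default is never overridden by None downstream.
def normalize_event(event: dict) -> dict:
    tag = event.get("tag") or ""  # None / missing / "" all become ""
    return {**event, "tag": tag}
```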
In `@frontend/src/components/metrics-dashboard/MetricsDashboard.jsx`:
- Around line 26-70: chartStart/chartEnd are frozen by useMemo at mount so
refetchChart() keeps using stale dates; update handleRefresh to compute a fresh
30-day range and pass it to the chart refetch, e.g. compute const newStart =
dayjs().subtract(30, "day").toISOString() and const newEnd =
dayjs().toISOString() inside handleRefresh and call refetchChart(newStart,
newEnd). If refetchChart currently doesn't accept parameters, modify
useMetricsOverview(chartStart, chartEnd) to return a refetch that accepts
optional (start,end) overrides and use those in handleRefresh; touch the symbols
chartStart, chartEnd, refetchChart, useMetricsOverview, and handleRefresh.
In `@frontend/src/components/metrics-dashboard/RecentActivity.jsx`:
- Around line 91-125: The List rendering of dataSource={data.activity} lacks a
stable key causing reconciliation warnings; update the List component (the one
using dataSource, renderItem, and STATUS_CONFIG/TYPE_CONFIG) to include
rowKey="id" so React uses each activity's stable id field for keys (ensure the
id property on items is used as the unique key).
🧹 Nitpick comments (4)
unstract/core/src/unstract/core/metrics/registry.py (1)
42-48: Return a stable order from `get_all_metrics()`.
Converting a set to a list yields nondeterministic ordering; sorting avoids flaky output and makes results predictable.

♻️ Suggested fix

```diff
-    return list(_REGISTERED_METRICS)
+    return sorted(_REGISTERED_METRICS)
```

backend/backend/celery_config.py (1)
1-74: Annotate mutable class attributes as `ClassVar` to satisfy Ruff.
This clears RUF012 warnings and makes the config intent explicit.

♻️ Suggested fix

```diff
-from urllib.parse import quote_plus
+from urllib.parse import quote_plus
+from typing import ClassVar

 from django.conf import settings
 from kombu import Queue
@@
-    broker_transport_options = {
+    broker_transport_options: ClassVar[dict[str, object]] = {
         "max_retries": 3,
         "interval_start": 0,
         "interval_step": 0.2,
         "interval_max": 1,
         "socket_timeout": 30,
         "socket_keepalive": True,
     }
@@
-    task_queues = [
+    task_queues: ClassVar[list[Queue]] = [
         Queue("celery", routing_key="celery"),
         Queue("dashboard_metric_events", routing_key="dashboard_metric_events"),
         Queue("celery_periodic_logs", routing_key="celery_periodic_logs"),
         Queue("celery_log_task_queue", routing_key="celery_log_task_queue"),
         Queue("celery_api_deployments", routing_key="celery_api_deployments"),
         Queue("file_processing", routing_key="file_processing"),
         Queue("api_file_processing", routing_key="api_file_processing"),
         Queue("file_processing_callback", routing_key="file_processing_callback"),
         Queue("api_file_processing_callback", routing_key="api_file_processing_callback"),
     ]
@@
-    task_routes = {
+    task_routes: ClassVar[dict[str, dict[str, str]]] = {
         "dashboard_metrics.process_events": {"queue": "dashboard_metric_events"},
         "dashboard_metrics.aggregate_from_sources": {"queue": "dashboard_metric_events"},
         "dashboard_metrics.cleanup_hourly_data": {"queue": "dashboard_metric_events"},
         "dashboard_metrics.cleanup_daily_data": {"queue": "dashboard_metric_events"},
     }
```

backend/dashboard_metrics/capture.py (1)
79-88: Avoid mutating caller-supplied `labels`.
`labels or {}` followed by in-place updates will modify the caller's dict. Copy before extending and apply the same pattern in all record_* methods.

♻️ Suggested pattern

```diff
-        metric_labels = labels or {}
+        metric_labels = dict(labels) if labels else {}
```

backend/dashboard_metrics/management/commands/populate_test_metrics.py (1)
53-77: Make the exception cause explicit for better debugging.
When re-raising as `CommandError`, either chain the original exception or explicitly suppress chaining so the intent is clear.

🔧 Suggested improvement

```diff
     try:
         org = Organization.objects.get(id=org_id)
     except Organization.DoesNotExist:
-        raise CommandError(f"Organization with ID {org_id} not found")
+        raise CommandError(f"Organization with ID {org_id} not found") from None
```
- Remove event-driven capture code (signals, kafka dependency, real-time hooks)
- Add per-workflow LLM token usage endpoint with Redis caching
- Add LLM Usage tab with sortable table showing tokens/cost per workflow
- Add HITL Reviews & Completions bar chart to overview
- Add HITL metrics (reviews, completions) to summary cards and trend chart
- Add frontend caching layer (localStorage) for workflow token usage
- Fix double queryset evaluation in MetricsQueryService
- Add migration to fix tag NOT NULL constraint

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@backend/dashboard_metrics/README.md`:
- Around line 786-794: Update the migration filenames listed in the README to
match the actual migrations in the PR: replace `0002_setup_cleanup_tasks.py`
with `0002_setup_cleanup_periodic_task`, ensure `0003_setup_aggregation_task.py`
is updated to the actual `0003_add_daily_monthly_tables` (or keep
`0003_setup_aggregation_task` only if that file exists), and add the missing
`0004_fix_tag_not_null` entry so the README's migrations block reflects
`0001_initial.py`, `0002_setup_cleanup_periodic_task.py`,
`0003_add_daily_monthly_tables.py` (or `0003_setup_aggregation_task.py` if
applicable), and `0004_fix_tag_not_null.py`.
In `@backend/dashboard_metrics/services.py`:
- Around line 649-670: The loop building results can raise if
execution.workflow_execution.workflow is None; update the dict construction
inside the recent loop (where execution, workflow_execution, and
workflow.workflow_name are referenced) to safely handle a missing workflow by
using a conditional or getattr to set "workflow_name" to a sensible fallback
(e.g., None or empty string) when execution.workflow_execution.workflow is None,
and keep all other fields unchanged.
In `@backend/dashboard_metrics/tasks.py`:
- Around line 296-369: The aggregator is currently incrementing "count" by 1 per
returned row (thus counting buckets, not events); change it to consume the
event-count returned by query methods (use row.get("count", 1) instead of += 1)
in hourly_agg and daily_agg and when building monthly_buckets (add
row.get("count", 1) to the bucket["count"]), and update MetricsQueryService
methods to return both "value" and "count" (e.g., Count("id")) so metric records
reflect the actual number of events aggregated into each record.
In `@frontend/src/hooks/useMetricsData.js`:
- Around line 182-186: When orgId becomes falsy the hook currently skips calling
fetchUsage but leaves previous org's data in state; inside the useEffect that
depends on [orgId, startDate, endDate] (the block that calls fetchUsage), add
logic to detect when orgId is falsy and explicitly reset the hook's state (the
usage/metrics state variables managed by useMetricsData, e.g., the variables
updated by fetchUsage) to their initial empty values before returning so stale
cross-org data is cleared; implement the reset where you currently call
fetchUsage (or in the effect cleanup) and reference useEffect, fetchUsage,
orgId, startDate, endDate and the hook's state setters to locate the change.
🧹 Nitpick comments (4)
backend/dashboard_metrics/README.md (1)
15-15: Add language identifiers to fenced code blocks.
Several ASCII-art / plain-text fenced code blocks lack a language specifier (e.g., ```` ```text ```` or ```` ```plaintext ````). This triggers MD040 warnings and prevents proper rendering in some Markdown processors.

Also applies to: 86-86, 299-299, 480-480, 771-771
backend/dashboard_metrics/migrations/0004_fix_tag_not_null.py (1)
69-98: `reverse_sql` drops NOT NULL but doesn't drop the DEFAULT.
The forward SQL sets both `SET DEFAULT ''` and `SET NOT NULL`, but the reverse only does `DROP NOT NULL`. If this migration is reversed, the `DEFAULT ''` will remain. This is harmless in practice but technically asymmetric.

Suggested fix (if full reversibility is desired)

```diff
             reverse_sql=(
-                "ALTER TABLE event_metrics_hourly " "ALTER COLUMN tag DROP NOT NULL;"
+                "ALTER TABLE event_metrics_hourly "
+                "ALTER COLUMN tag DROP NOT NULL, "
+                "ALTER COLUMN tag DROP DEFAULT;"
             ),
```

Apply similarly to the other two tables.
frontend/src/helpers/metricsCache.js (1)
66-83: After `clearMetricsCache()` on quota error, the current write is silently lost.
When `QuotaExceededError` is caught, all metrics cache is cleared but the current `data` is not retried. This means the response that triggered the write will not be cached. Consider retrying the `setItem` once after clearing.

frontend/src/components/metrics-dashboard/MetricsChart.jsx (1)
322-334: `TREND_BAR_COLORS` is re-created on every render.
This constant map is defined inside the component body after early returns. Consider hoisting it to module scope alongside `TREND_COLORS` to avoid unnecessary re-creation.

Suggested move

```diff
+// Colors for specific trend metrics
+const TREND_BAR_COLORS = {
+  documents_processed: "#faad14",
+  llm_calls: "#722ed1",
+  prompt_executions: "#52c41a",
+  deployed_api_requests: "#13c2c2",
+  llm_usage: "#eb2f96",
+  etl_pipeline_executions: "#fa8c16",
+  challenges: "#2f54eb",
+  summarization_calls: "#a0d911",
+  hitl_reviews: "#283593",
+  hitl_completions: "#00897b",
+};
+
 // Metrics for Pages chart
 const PAGES_METRICS = ["pages_processed", "failed_pages"];
```

Then remove the identical block from inside `TrendAnalysisChart`.
chandrasekharan-zipstack
left a comment
Only checked a few files myself, here's some review comments from my agent:
- N+1 aggregation task - queries ALL orgs x ALL metrics every 15 min - aggregate_metrics_from_sources iterates every org, runs 12 metric queries x 3 granularities (hourly/daily/monthly) = ~36 DB queries per org. With many orgs, this is a DB performance bomb every 15 minutes. No batching or parallelism.
- recent_activity endpoint lacks rate limiting/caching - Unlike other endpoints, recent_activity has no @cache_metrics_response decorator and does multiple DB queries (Pipeline, APIDeployment, WorkflowFileExecution, Usage) on every request
- Individual update_or_create calls inside loops (no bulk operations) - _upsert_hourly_replace, _upsert_daily_replace, _upsert_monthly_replace all do an individual update_or_create per record inside a single transaction.atomic(). For large datasets this is extremely slow. Should use bulk_create(..., update_conflicts=True).
- Frontend adds recharts dependency but no lockfile update
- Some FE code has inline CSS that needs to be avoided
- Avoid management commands for seeding the DB to test this dashboard feature, instead make it a part of test scaffolding
- The cleanup of this also dangerously clears other data in some of the tables
- Check if `capture.py` + the core metrics library is dead code. The agent reports that it's not called from anywhere.
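The bulk-upsert point above could be sketched roughly as follows; `chunked` is a hypothetical helper, and the commented-out Django call only illustrates how batches would replace per-row `update_or_create` (Django 4.1+ supports `update_conflicts` on PostgreSQL):

```python
# Batch rows so each DB round-trip upserts many records at once instead
# of one update_or_create per record.
from itertools import islice


def chunked(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch


# Illustrative only -- field names are assumptions, not the PR's code:
# for batch in chunked(hourly_rows, 500):
#     EventMetricsHourly._base_manager.bulk_create(
#         batch,
#         update_conflicts=True,
#         unique_fields=["organization_id", "timestamp", "metric_name", "project", "tag"],
#         update_fields=["metric_type", "metric_value", "metric_count"],
#     )
```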
backend/backend/settings/base.py (outdated)

```python
# Dashboard Metrics Cache TTL settings (in seconds)
DASHBOARD_CACHE_TTL_CURRENT_HOUR = int(
```
@athul-rs do you think a 30s cache TTL is too aggressive here for hourly information?
- How expensive is the query which is being cached?
- If the data that we show using this cache is mainly charts over a longer period, I don't think it needs to be this recent (maybe 1-2 mins could be fine)
It's a simple indexed range scan — not expensive per bucket. But the data it's caching is for dashboard charts showing trends over days/weeks. Users aren't expecting real-time second-by-second updates on chart data. 60s/120s is a reasonable middle ground.
Resolved (outdated) review threads:
- backend/dashboard_metrics/management/commands/generate_metrics_test_data.py
- backend/dashboard_metrics/management/commands/populate_test_metrics.py
- Revert celery_config.py to original (remove redundant/side-effect-prone global settings that affected all workers)
- Remove redundant DASHBOARD_CACHE_TTL_* from base.py (cache.py uses getattr with same defaults)
- Fix backfill_metrics.py: add missing failed_pages, fix tag=None→"", use _base_manager for daily/monthly upserts
- Restore HITL metrics with try/except ImportError guard (safe for OSS, functional on cloud)
- Make HITLChart data-driven: returns null when no HITL data available
- Filter zero-value metrics from summary cards

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
In PostgreSQL, when an exception occurs inside transaction.atomic(), the transaction is marked as aborted and ALL subsequent queries in that block fail with InternalError. Since we catch exceptions per-row to allow partial progress, the outer transaction.atomic() defeats that intent. Each update_or_create() is already individually atomic in Django's default autocommit mode. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove abandoned unstract.core.metrics real-time capture package (8 files, zero external imports)
- Remove labels JSONField and GIN indexes from all 3 metric models and migration
- Remove unused serializers (MetricDataPointSerializer, MetricSeriesSerializer, MetricsSummarySerializer, MetricsResponseSerializer)
- Remove unused cache functions (invalidate_metrics_cache, MetricsCacheService, get_bucket_cache_stats, get_time_aware_cache_ttl)
- Remove dead _apply_filters method from ViewSet
- Remove deleted management commands (generate_metrics_test_data, populate_test_metrics)
- Remove capture.py and consolidated stale migrations
- Fix cross-org data bleed in useMetricsData.js hooks (reset state when orgId is falsy)
- Restore corrupted upsert functions in tasks.py
- Update README with architecture docs and current metrics list

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
README excerpt:

```text
- Hourly cleanup: Daily at 2:00 AM UTC, removes hourly data older than 30 days
- Daily cleanup: Weekly on Sundays at 3:00 AM UTC, removes daily data older than 365 days
```

Suggested change:

```diff
-- Hourly cleanup: Daily at 2:00 AM UTC, removes hourly data older than 30 days
-- Daily cleanup: Weekly on Sundays at 3:00 AM UTC, removes daily data older than 365 days
+- Hourly data cleanup: Daily at 2:00 AM UTC, removes hourly data older than 30 days
+- Daily data cleanup: Weekly on Sundays at 3:00 AM UTC, removes daily data older than 365 days
```
@athul-rs how will this command be run during deployment? Is it a separate K8s job? Why isn't it a data migration instead?
Intentionally done this way: Backfilling 90 days of data across many orgs could take significant time. A data migration would block the entire migration chain and slow every deploy. Rerunning if required is also a factor.
NIT: Consider following a rule of only 1 model per file. It makes it easy for humans and agents to understand context easily. Should be an easy refactor with Claude
@athul-rs A few minor points from one of the screenshots:
- Move RangePicker and refresh button into tab bar extra content - Replace topbar-right with Documentation and Slack Community buttons - Default trend chart to documents_processed only Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@athul-rs please move Celery tasks from backend to workers.
But I think the beat tasks are still in backend, so my comment might be invalidated.
```python
PeriodicTask = apps.get_model("django_celery_beat", "PeriodicTask")

# --- Aggregation task: every 15 minutes ---
schedule_15min, _ = IntervalSchedule.objects.get_or_create(
```
It would be better to create a worker that runs every N seconds.
Please check workers/log_consumer.
Avoid using periodic tasks or Celery Beat for this.
Any reason to avoid using Celery beat for this?
```python
from datetime import datetime, timedelta
from typing import Any

from celery import shared_task
```
Please use workers for tasks; we are already de-Django-fying Celery and moving it away from the backend.
Note: But I think the beat tasks are still in backend, so my comment might be invalidated.
What exactly needs to be changed? The "de-Django-fying" pattern applies to worker apps, not scheduling. Correct me if I'm wrong.
```python
@action(detail=False, methods=["get"], url_path="live-series")
@cache_metrics_response(endpoint="live_series")
def live_series(self, request: Request) -> Response:
```
Is it from the aggregated tables or the actual source tables?
It queries directly from source tables (Usage, PageUsage, WorkflowExecution, etc.)
It is being removed, not used now.
```yaml
      - ${TOOL_REGISTRY_CONFIG_SRC_PATH}:/data/tool_registry_config

  # Celery worker for dashboard metrics processing
  worker-metrics:
```
What does this worker do? How is it different from the tasks in backend/dashboard_metrics? Is it not triggered by beat?
dashboard_metrics/tasks.py defines the tasks (aggregate_metrics_from_sources, cleanup_hourly_metrics, cleanup_daily_metrics)
Celery Beat schedules them (via the migration that creates PeriodicTask entries routed to the dashboard_metric_events queue)
worker-metrics docker service consumes that queue: celery -A backend worker -Q dashboard_metric_events
It's the same relationship as worker-logging consuming the celery_periodic_logs queue. The dedicated docker service gives it its own process/scaling independent of the general worker.
@chandrasekharan-zipstack Updated.
Frontend:
- Rename side nav "Metrics" to "Dashboard" with "New" tag
- Rename header from "Metrics Dashboard" to "Dashboard"
- Fix dashboard icon color to light grey (#90A4B7) matching other nav icons
- HITL chart colors changed to blue/gold palette (#69b1ff/#faad14)
- HITL and Trend charts side by side (lg={12})
- LLM Usage table: header outside card, styled table with background
- Summary cards: remove suffixes, use Math.round(), fix prettier
- Add cloud-only PlanBanner import via try-catch require()
Backend:
- Move cache TTLs to Django settings (base.py + sample.env)
- Fix _base_manager usage in tasks.py and services.py for Celery context
- Simplify views.py: use model manager for org filtering
- Add recent_activity caching, fix limit parsing
- Use logger.exception() instead of f-string in error handling
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace md5 with sha256 for cache key hashing
- Add noopener,noreferrer to window.open calls
- Remove unused DASHBOARD_METRICS_ENABLED from docker-compose

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Athul <89829560+athul-rs@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Test Results Summary
Runner Tests - Full Report
SDK1 Tests - Full Report
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Code review — found 1 issue:

unstract/frontend/src/components/navigations/side-nav-bar/SideNavBar.jsx, lines 322 to 332 in 3756ed4

🤖 Generated with Claude Code
```text
Uses Redis through Django's cache framework.

Cache TTL Strategy:
- Current hour/day/month data: Configurable via DASHBOARD_CACHE_TTL_CURRENT_BUCKET (default 60s)
```
Hope its count and size are not huge, so it won’t fill the Redis memory quickly.
```shell
# Examples: 900 (15 min), 1800 (30 min), 3600 (60 min)
MIN_SCHEDULE_INTERVAL_SECONDS=1800

# Dashboard Metrics Cache TTLs (in seconds)
```
```python
def _get_trunc_func(granularity: str):
    """Get the appropriate truncation function for the granularity."""
    trunc_map = {
        "hour": TruncHour,
```
Making these ("hour", "day", etc.) an enum would be better.
What
Why
How
- `backend/dashboard_metrics/` Django app with models, views, services, and tasks
- `unstract/core/src/unstract/core/metrics/` library with queue backend using Kombu
- Tasks routed to the `dashboard_metric_events` queue

Can this PR break any existing features? If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
Database Migrations
Env Config
Relevant Docs
Related Issues or PRs
Dependencies Versions
Notes on Testing
- `pytest backend/dashboard_metrics/tests/`
- `docker compose up worker-metrics`
- `GET /api/v2/metrics/health/`

Screenshots
Checklist
I have read and understood the Contribution Guidelines.