Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings. Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
Walkthrough

Adds a Dashboard Metrics feature: new Django app with models, migrations, Celery tasks and worker, Redis-backed bucket cache, a core metrics API and queue backend, services and management commands, tests, frontend dashboard UI/components/hooks/styles, and runtime tooling (docker service and poe task).
Sequence Diagram(s)

```mermaid
sequenceDiagram
    rect rgba(220,240,255,0.5)
        participant Client as Frontend Client
    end
    rect rgba(240,255,220,0.5)
        participant API as DashboardMetricsViewSet
    end
    rect rgba(255,240,220,0.5)
        participant Cache as Redis Cache
        participant Service as MetricsQueryService
        participant DB as Metrics DB
    end
    Client->>API: GET /metrics/summary?org=X&start=Y&end=Z
    API->>API: _select_source(params)
    alt hourly source & bucket-cache enabled
        API->>Cache: mget_metrics_buckets(org, range)
        Cache-->>API: cached buckets + missing list
        API->>Service: query missing buckets
        Service->>DB: fetch missing rows
        DB-->>Service: rows
        Service-->>API: rows
        API->>Cache: mset_metrics_buckets(new buckets)
    else daily/monthly or full cache miss
        API->>Service: query aggregated data
        Service->>DB: fetch aggregates
        DB-->>Service: aggregates
        Service-->>API: aggregates
    end
    API->>Client: HTTP 200 MetricsResponse
```
```mermaid
sequenceDiagram
    rect rgba(255,240,240,0.5)
        participant Producer as Integration Point
        participant Capture as MetricsCapture
        participant Core as core.metrics.record
    end
    rect rgba(240,255,240,0.5)
        participant Backend as QueueBackend
        participant Broker as RabbitMQ
        participant Worker as worker-metrics
        participant Task as process_dashboard_metric_events
        participant DB as Metrics DB
    end
    Producer->>Capture: record_*(...)
    Capture->>Core: record(org_id, metric_name, value,...)
    Core->>Backend: backend.record(MetricEvent)
    Backend->>Broker: publish Celery-style message to queue
    Broker->>Worker: deliver to dashboard_metric_events
    Worker->>Task: process batch of events
    Task->>DB: bulk upsert EventMetricsHourly/Daily/Monthly
    DB-->>Task: upsert results
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (3 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 8
Fix all issues with AI Agents 🤖
In @backend/dashboard_metrics/capture.py:
- Around line 51-338: MetricsCapture methods call record() with the metric enum
as the first positional arg and keyword names like org_id= and value=, but core
record signature expects org_id first, metric_name second and metric_value
third, so change every call in record_api_request, record_etl_execution,
record_document_processed (both DOCUMENTS_PROCESSED and PAGES_PROCESSED),
record_llm_call (LLM_CALLS and LLM_USAGE), record_challenge,
record_summarization, and record_prompt_execution to match the core signature:
call record(org_id, MetricName.<X>, <metric_value>, labels=metric_labels,
project=project, tag=None) (use 1 for single-count events, pages for pages, cost
for LLM_USAGE as float), and remove the invalid value= keyword and duplicate
org_id keyword usage.
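A minimal sketch of the corrected call shape described above, assuming the core signature takes `org_id` first, `metric_name` second, and `metric_value` third. The `record` stub and the enum subset below are illustrative stand-ins, not the real `core.metrics` implementations:

```python
from enum import Enum


class MetricName(str, Enum):
    # Illustrative subset of the enum referenced in the review
    DOCUMENTS_PROCESSED = "documents_processed"
    LLM_USAGE = "llm_usage"


def record(org_id, metric_name, metric_value, labels=None, project=None, tag=None):
    """Stub with the positional order the core API expects:
    org_id first, metric_name second, metric_value third."""
    return (org_id, metric_name, metric_value, labels, project, tag)


# Before (buggy): record(MetricName.DOCUMENTS_PROCESSED, org_id=..., value=1)
# After (matches the core signature):
result = record(
    "org-1", MetricName.DOCUMENTS_PROCESSED, 1,
    labels={"source": "etl"}, project="demo", tag=None,
)
```

The same reordering applies to each `record_*` helper listed above, with `1` for single-count events, the page count for pages, and the cost as a float for LLM_USAGE.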
In @backend/dashboard_metrics/urls.py:
- Around line 9-15: The URL configuration is missing a mapping for the
DashboardMetricsViewSet.health action; add a new view mapping similar to the
others (e.g., create metrics_health = DashboardMetricsViewSet.as_view({"get":
"health"})) and include it in the URL patterns so GET /api/v2/metrics/health/ is
exposed; update any import/urlpatterns list references where other metrics_*
views are registered to include metrics_health.
In @backend/dashboard_metrics/views.py:
- Around line 267-273: Both live_summary and live_series call
UserContext.get_organization() and access .id directly; add the same guard used
in get_queryset(): if organization is None (or falsy) log an explanatory message
and raise PermissionDenied before accessing organization.id, then pass org_id to
MetricsQueryService.get_all_metrics_summary / get_all_metrics_series; apply the
same check for the other occurrence around lines referenced (the block using
UserContext.get_organization() / organization.id at ~310-312).
In @unstract/core/src/unstract/core/metrics/types.py:
- Around line 35-48: Add a unit test that ensures every value of the MetricName
enum has a corresponding key in METRIC_TYPE_MAP to prevent KeyError in
get_metric_type(); specifically, iterate over MetricName (or
MetricName.__members__.values()/MetricName) and assert that each .value is
present in METRIC_TYPE_MAP, referencing METRIC_TYPE_MAP and MetricName; this
complements validate_metric() and prevents missing entries that would cause
get_metric_type() to fail.
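The requested coverage test can be sketched like this, using a hypothetical miniature `MetricName` and `METRIC_TYPE_MAP` (the real ones live in `unstract/core/src/unstract/core/metrics/types.py`):

```python
from enum import Enum


class MetricName(str, Enum):
    # Hypothetical subset; the real enum is in core/metrics/types.py
    API_REQUESTS = "api_requests"
    DOCUMENTS_PROCESSED = "documents_processed"
    LLM_USAGE = "llm_usage"


METRIC_TYPE_MAP = {
    MetricName.API_REQUESTS.value: "counter",
    MetricName.DOCUMENTS_PROCESSED.value: "counter",
    MetricName.LLM_USAGE.value: "histogram",
}


def test_every_metric_has_a_type():
    """Each enum value must have a METRIC_TYPE_MAP entry, so
    get_metric_type() can never raise KeyError for a valid metric."""
    missing = [m.value for m in MetricName if m.value not in METRIC_TYPE_MAP]
    assert not missing, f"METRIC_TYPE_MAP missing entries: {missing}"
```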
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/pyproject.toml (1)
15-21: celery-batches version too old; use >=0.9.0 for Celery 5.3.4 and Python 3.12 support

The constraint `celery-batches>=0.8.0` pins to an outdated release (Jun 2023). For Celery 5.3.4 and Python 3.12, use `celery-batches>=0.9.0`, which is documented to support Celery ~5.0 and Python 3.9–3.13. Version 0.8.0 predates Python 3.12 and lacks testing against current Celery 5.x releases.
🧹 Nitpick comments (16)
backend/dashboard_metrics/tasks.py (2)
37-75: Time bucketing helpers look correct; naive datetimes are assumed UTC

The truncation helpers correctly normalize numeric timestamps to UTC and bucket to hour/day/month. Note that naive datetimes are implicitly treated as UTC; ensure all producers either send UTC or explicitly aware datetimes so you don't silently mis-bucket mixed timezones.
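The bucketing behavior described above can be sketched as follows (the function name and exact handling are illustrative, not the code under review):

```python
from datetime import datetime, timezone


def truncate_to_hour(ts):
    """Numeric timestamps are interpreted as UTC epoch seconds;
    naive datetimes are assumed to already be UTC."""
    if isinstance(ts, (int, float)):
        dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    elif ts.tzinfo is None:
        dt = ts.replace(tzinfo=timezone.utc)  # naive => assume UTC
    else:
        dt = ts.astimezone(timezone.utc)
    return dt.replace(minute=0, second=0, microsecond=0)
```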
77-206: Batched aggregation logic is solid; consider tightening event validation

The batching and in-memory aggregation into hourly/daily/monthly buckets looks good and aligns with celery-batches' request shape. Two minor points to consider:

- `event = request.args[0] if request.args else request.kwargs.get("event", {})` assumes the first arg is always a dict; if other producers appear later, a simple `isinstance(..., dict)` guard could avoid noisy warnings.
- `timestamp` is expected as float/datetime; if there's any chance of ISO strings coming through, you may want to normalize them explicitly rather than relying on a generic `except Exception` to skip them.

Nothing blocking here; more of a future-hardening note.
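Both hardening points above can be sketched together; the helper name is hypothetical and the naive-means-UTC convention follows the previous comment:

```python
from datetime import datetime, timezone


def extract_event(args, kwargs):
    """Type-check the first positional arg and normalize
    ISO-8601 string timestamps to epoch seconds."""
    event = args[0] if args else kwargs.get("event", {})
    if not isinstance(event, dict):  # guard against non-dict producers
        return None
    ts = event.get("timestamp")
    if isinstance(ts, str):
        # Explicitly accept ISO strings instead of silently skipping them
        dt = datetime.fromisoformat(ts)
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)  # naive => assume UTC
        event["timestamp"] = dt.timestamp()
    return event
```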
backend/backend/urls_v2.py (1)
63-67: Metrics URL include is consistent with existing routing

Mounting `dashboard_metrics.urls` at `"metrics/"` follows the same pattern as other feature areas (e.g., `"usage/"`, `"notifications/"`). Just ensure any external docs or clients expecting `/api/v2/metrics/...` are updated to match this base path if needed.

docker/docker-compose.yaml (1)
120-138: worker-metrics service wiring looks consistent with existing workers

The new `worker-metrics` service mirrors the existing Celery worker patterns (image, entrypoint, queue, autoscale, env) and targets the `dashboard_metric_events` queue as intended. If you expect metrics tasks to need shared workflow data or tool configs, consider whether it should also mount the same volumes as other workers; otherwise this setup looks good.

backend/pyproject.toml (1)
145-148: worker-metrics Poe task aligns with docker-compose configuration

The `worker-metrics` Poe task mirrors the docker-compose command (same app, log level, queue, autoscale), which is helpful for running the metrics worker locally outside Docker. Keep this in sync with any future queue or autoscale changes to avoid drift between CLI and compose.

backend/dashboard_metrics/tests/test_tasks.py (1)
149-164: Consider renaming unused loop variable.

The loop variable `i` is not used within the loop body. Following Python convention, consider renaming it to `_` to indicate it's intentionally unused.

🔎 Proposed refactor

```diff
-        for i in range(5):
+        for _ in range(5):
             mock_request = MagicMock()
```

unstract/core/src/unstract/core/metrics/__init__.py (1)
113-115: Use logging.exception to preserve the traceback.

When catching exceptions during event creation, using `logger.exception()` instead of `logger.error()` will include the full traceback, which aids debugging.

🔎 Suggested fix

```diff
         except Exception as e:
-            logger.error(f"Failed to create metric event: {e}")
+            logger.exception(f"Failed to create metric event: {e}")
             return False
```

unstract/core/src/unstract/core/metrics/registry.py (1)
42-48: Non-deterministic ordering in `get_all_metrics()`.

The function returns `list(_REGISTERED_METRICS)` where `_REGISTERED_METRICS` is a set. While this works, the order of returned metrics may vary between calls. If consistent ordering is needed for API responses or testing, consider sorting the result.

🔎 Optional fix for deterministic ordering

```diff
 def get_all_metrics() -> list[str]:
     """Get all registered metric names.

     Returns:
         List of all registered metric names
     """
-    return list(_REGISTERED_METRICS)
+    return sorted(_REGISTERED_METRICS)
```

backend/dashboard_metrics/migrations/0004_update_cleanup_periodic_tasks.py (1)
38-57: Clarify naming: task cleans "daily" aggregates on a weekly schedule.

The comment says "weekly cleanup task for daily metrics" but the task is named `dashboard_metrics_cleanup_daily`. This is technically correct (it cleans data from the daily aggregation table on a weekly schedule), but the naming could cause confusion. Consider renaming to `dashboard_metrics_cleanup_daily_weekly` or updating the description to clarify it runs weekly to clean daily-aggregated data.
85-96: Consider defensive error handling in `from_dict`.

If required keys (`org_id`, `metric_name`, `metric_value`, `metric_type`) are missing from `data`, a `KeyError` will be raised. While the caller should ensure valid data, adding a brief check or using `.get()` with explicit validation could provide clearer error messages during deserialization failures.

unstract/core/src/unstract/core/metrics/backends/queue.py (1)
151-156: Use `logger.exception` for error logging with traceback.

Replace `logger.error` with `logger.exception` in the exception handlers to include stack traces, which aids debugging production issues.

🔎 Suggested fix

```diff
         except KombuError as e:
-            logger.error(f"Failed to queue metric {event.metric_name}: {e}")
+            logger.exception(f"Failed to queue metric {event.metric_name}: {e}")
             return False
         except Exception as e:
-            logger.error(f"Unexpected error queuing metric: {e}")
+            logger.exception(f"Unexpected error queuing metric: {e}")
             return False
```

backend/dashboard_metrics/views.py (1)
335-369: Improve error handling and observability in the `live_series` metric loop.

Catching a blanket `Exception` for each metric is reasonable to keep the endpoint partially available, but the current logging (`logger.error`) loses the traceback and makes debugging failures harder. Switching to `logger.exception` here will preserve the stack trace while keeping behavior the same.

Proposed logging tweak

```diff
-        except Exception as e:
-            logger.error(f"Failed to fetch metric {metric_name}: {e}")
+        except Exception:
+            logger.exception("Failed to fetch metric %s", metric_name)
             errors.append(metric_name)
```

backend/dashboard_metrics/cache.py (1)
38-69: Time-aware TTL helper is well-designed but currently unused.

`get_time_aware_cache_ttl()` correctly distinguishes queries that include the current hour from purely historical ones, which would be useful to avoid over-caching near-real-time data. Right now it isn't wired into `cache_metrics_response`, so all endpoints rely on static TTLs. Consider integrating `get_time_aware_cache_ttl()` into the decorator (e.g., deriving TTL from a parsed `end_date` param) for summary/series endpoints that query recent data.

backend/dashboard_metrics/models.py (3)
14-19: Avoid duplicating `MetricType` that already exists in core metrics

There's already a `MetricType` enum in `unstract/core/src/unstract/core/metrics/types.py` with the same values (`"counter"`, `"histogram"`). Having two different `MetricType` definitions (one `Enum`, one `TextChoices`) increases the chance of divergence or confusion.

Consider either:
- Reusing the core `MetricType` as the single source of truth and mapping its values into a `TextChoices`, or
- Renaming this one (e.g., `DashboardMetricType`) to make the separation explicit.
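The single-source-of-truth option can be sketched without Django: derive the choices list from the core enum instead of redefining a parallel class (in the actual code this would feed a `models.CharField(choices=...)`; the mapping below is illustrative):

```python
from enum import Enum


class MetricType(str, Enum):
    """Stand-in for the core enum in unstract/core .../types.py."""
    COUNTER = "counter"
    HISTOGRAM = "histogram"


# Derive Django-style (value, label) choices from the core enum so the
# two definitions can never drift apart.
METRIC_TYPE_CHOICES = [(m.value, m.name.title()) for m in MetricType]
```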
21-37: Manager inheritance can be simplified

`DefaultOrganizationManagerMixin` already subclasses `models.Manager` (per `backend/utils/models/organization_mixin.py`), so inheriting from both is redundant:

```python
class EventMetricsHourlyManager(DefaultOrganizationManagerMixin):
    ...
```

Same applies to `EventMetricsDailyManager` and `EventMetricsMonthlyManager`.
113-130: Address Ruff RUF012 on mutable Meta attributes (`indexes`, `constraints`)

Ruff is flagging `indexes` and `constraints` as mutable class attributes (RUF012). In Django `Meta` classes this is common, but if you want to satisfy the linter and improve type hints you can annotate them as `ClassVar`:

```python
from typing import ClassVar

from django.db import models


class Meta:
    indexes: ClassVar[list[models.Index]] = [
        ...
    ]
    constraints: ClassVar[list[models.Constraint]] = [
        ...
    ]
```

You can apply the same pattern to the `Meta` classes of all three models, or alternatively disable `RUF012` for Django `Meta` if you prefer the standard Django style.

Also applies to: 219-236, 325-342
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
⛔ Files ignored due to path filters (1)
`backend/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (30)
- backend/backend/celery_config.py
- backend/backend/settings/base.py
- backend/backend/urls_v2.py
- backend/dashboard_metrics/__init__.py
- backend/dashboard_metrics/apps.py
- backend/dashboard_metrics/cache.py
- backend/dashboard_metrics/capture.py
- backend/dashboard_metrics/migrations/0001_initial.py
- backend/dashboard_metrics/migrations/0002_setup_cleanup_periodic_task.py
- backend/dashboard_metrics/migrations/0003_add_daily_monthly_tables.py
- backend/dashboard_metrics/migrations/0004_update_cleanup_periodic_tasks.py
- backend/dashboard_metrics/migrations/__init__.py
- backend/dashboard_metrics/models.py
- backend/dashboard_metrics/serializers.py
- backend/dashboard_metrics/services.py
- backend/dashboard_metrics/tasks.py
- backend/dashboard_metrics/tests/__init__.py
- backend/dashboard_metrics/tests/test_tasks.py
- backend/dashboard_metrics/urls.py
- backend/dashboard_metrics/views.py
- backend/pyproject.toml
- docker/docker-compose.yaml
- unstract/core/src/unstract/core/metrics/__init__.py
- unstract/core/src/unstract/core/metrics/backends/__init__.py
- unstract/core/src/unstract/core/metrics/backends/base.py
- unstract/core/src/unstract/core/metrics/backends/noop.py
- unstract/core/src/unstract/core/metrics/backends/queue.py
- unstract/core/src/unstract/core/metrics/config.py
- unstract/core/src/unstract/core/metrics/registry.py
- unstract/core/src/unstract/core/metrics/types.py
🧰 Additional context used
🧬 Code graph analysis (16)
unstract/core/src/unstract/core/metrics/backends/noop.py (3)
- unstract/core/src/unstract/core/metrics/types.py (1): `MetricEvent` (52-96)
- unstract/core/src/unstract/core/metrics/backends/base.py (3): `AbstractMetricBackend` (8-40), `record` (15-24), `flush` (27-32)
- unstract/core/src/unstract/core/metrics/__init__.py (1): `record` (58-119)

unstract/core/src/unstract/core/metrics/backends/base.py (4)
- unstract/core/src/unstract/core/metrics/types.py (1): `MetricEvent` (52-96)
- unstract/core/src/unstract/core/metrics/__init__.py (1): `record` (58-119)
- unstract/core/src/unstract/core/metrics/backends/noop.py (2): `record` (25-40), `flush` (42-44)
- unstract/core/src/unstract/core/metrics/backends/queue.py (3): `record` (117-156), `flush` (200-205), `close` (207-215)

unstract/core/src/unstract/core/metrics/registry.py (2)
- unstract/core/src/unstract/core/metrics/types.py (2): `MetricName` (15-32), `MetricType` (8-12)
- backend/dashboard_metrics/models.py (1): `MetricType` (14-18)

backend/dashboard_metrics/tasks.py (1)
- backend/dashboard_metrics/models.py (4): `EventMetricsDaily` (145-248), `EventMetricsHourly` (39-142), `EventMetricsMonthly` (251-354), `MetricType` (14-18)

unstract/core/src/unstract/core/metrics/types.py (1)
- backend/dashboard_metrics/models.py (1): `MetricType` (14-18)

backend/dashboard_metrics/urls.py (1)
- backend/dashboard_metrics/views.py (1): `DashboardMetricsViewSet` (40-451)

backend/dashboard_metrics/views.py (4)
- backend/permissions/permission.py (1): `IsOrganizationMember` (17-22)
- backend/utils/user_context.py (1): `UserContext` (8-32)
- backend/dashboard_metrics/cache.py (1): `cache_metrics_response` (88-145)
- backend/dashboard_metrics/services.py (2): `MetricsQueryService` (24-447), `get_documents_processed` (54-86)

backend/dashboard_metrics/models.py (3)
- backend/utils/models/organization_mixin.py (2): `DefaultOrganizationManagerMixin` (26-29), `DefaultOrganizationMixin` (7-23)
- unstract/core/src/unstract/core/metrics/types.py (1): `MetricType` (8-12)
- backend/dashboard_metrics/serializers.py (1): `Meta` (110-124)

backend/dashboard_metrics/capture.py (5)
- unstract/core/src/unstract/core/metrics/types.py (2): `MetricName` (15-32), `MetricType` (8-12)
- unstract/core/src/unstract/core/metrics/__init__.py (1): `record` (58-119)
- unstract/core/src/unstract/core/metrics/backends/base.py (1): `record` (15-24)
- unstract/core/src/unstract/core/metrics/backends/noop.py (1): `record` (25-40)
- unstract/core/src/unstract/core/metrics/backends/queue.py (1): `record` (117-156)

unstract/core/src/unstract/core/metrics/backends/queue.py (3)
- unstract/core/src/unstract/core/metrics/types.py (1): `MetricEvent` (52-96)
- unstract/core/src/unstract/core/metrics/backends/base.py (4): `AbstractMetricBackend` (8-40), `close` (34-40), `record` (15-24), `flush` (27-32)
- unstract/core/src/unstract/core/metrics/__init__.py (1): `record` (58-119)

unstract/core/src/unstract/core/metrics/config.py (3)
- unstract/core/src/unstract/core/metrics/backends/base.py (1): `AbstractMetricBackend` (8-40)
- unstract/core/src/unstract/core/metrics/backends/noop.py (1): `NoopBackend` (14-44)
- unstract/core/src/unstract/core/metrics/backends/queue.py (1): `QueueBackend` (28-215)

backend/dashboard_metrics/serializers.py (2)
- backend/dashboard_metrics/models.py (4): `EventMetricsHourly` (39-142), `Meta` (109-142), `Meta` (215-248), `Meta` (321-354)
- backend/dashboard_metrics/views.py (2): `series` (113-189), `summary` (70-109)

backend/dashboard_metrics/migrations/0001_initial.py (1)
- backend/dashboard_metrics/migrations/0003_add_daily_monthly_tables.py (1): `Migration` (10-261)

unstract/core/src/unstract/core/metrics/backends/__init__.py (3)
- unstract/core/src/unstract/core/metrics/backends/base.py (1): `AbstractMetricBackend` (8-40)
- unstract/core/src/unstract/core/metrics/backends/noop.py (1): `NoopBackend` (14-44)
- unstract/core/src/unstract/core/metrics/backends/queue.py (1): `QueueBackend` (28-215)

backend/dashboard_metrics/migrations/0004_update_cleanup_periodic_tasks.py (2)
- backend/dashboard_metrics/migrations/0002_setup_cleanup_periodic_task.py (1): `Migration` (40-51)
- backend/dashboard_metrics/migrations/0003_add_daily_monthly_tables.py (1): `Migration` (10-261)

backend/dashboard_metrics/cache.py (3)
- backend/utils/user_context.py (1): `UserContext` (8-32)
- backend/utils/cache_service.py (1): `clear_cache` (41-43)
- backend/dashboard_metrics/views.py (1): `overview` (193-245)
🪛 Ruff (0.14.10)
unstract/core/src/unstract/core/metrics/backends/noop.py
25-25: Unused method argument: event
(ARG002)
unstract/core/src/unstract/core/metrics/__init__.py
39-53: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
113-113: Do not catch blind exception: Exception
(BLE001)
114-114: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
backend/backend/celery_config.py
42-49: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
56-66: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
69-73: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
unstract/core/src/unstract/core/metrics/registry.py
38-38: Avoid specifying long messages outside the exception class
(TRY003)
backend/dashboard_metrics/migrations/0002_setup_cleanup_periodic_task.py
6-6: Unused function argument: schema_editor
(ARG001)
33-33: Unused function argument: schema_editor
(ARG001)
41-44: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
46-51: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/tasks.py
184-184: Do not catch blind exception: Exception
(BLE001)
257-257: Do not catch blind exception: Exception
(BLE001)
258-258: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
311-311: Do not catch blind exception: Exception
(BLE001)
312-312: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
365-365: Do not catch blind exception: Exception
(BLE001)
366-366: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
backend/dashboard_metrics/tests/test_tasks.py
149-149: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
backend/dashboard_metrics/views.py
47-47: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
48-48: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
50-50: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
51-51: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
52-52: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
60-60: Abstract raise to an inner function
(TRY301)
60-60: Avoid specifying long messages outside the exception class
(TRY003)
65-65: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
66-66: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
66-66: Avoid specifying long messages outside the exception class
(TRY003)
193-193: Unused method argument: request
(ARG002)
359-359: Do not catch blind exception: Exception
(BLE001)
360-360: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
403-403: Unused method argument: request
(ARG002)
436-436: Consider moving this statement to an else block
(TRY300)
437-437: Do not catch blind exception: Exception
(BLE001)
438-438: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
448-448: Consider moving this statement to an else block
(TRY300)
449-449: Do not catch blind exception: Exception
(BLE001)
450-450: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
backend/dashboard_metrics/models.py
113-130: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
131-142: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
219-236: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
237-248: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
325-342: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
343-354: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/capture.py
45-45: Consider moving this statement to an else block
(TRY300)
unstract/core/src/unstract/core/metrics/backends/queue.py
95-96: try-except-pass detected, consider logging the exception
(S110)
95-95: Do not catch blind exception: Exception
(BLE001)
101-101: Consider moving this statement to an else block
(TRY300)
103-103: Do not catch blind exception: Exception
(BLE001)
113-113: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
115-115: Avoid specifying long messages outside the exception class
(TRY003)
149-149: Consider moving this statement to an else block
(TRY300)
152-152: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
154-154: Do not catch blind exception: Exception
(BLE001)
155-155: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
212-212: Do not catch blind exception: Exception
(BLE001)
backend/dashboard_metrics/migrations/0003_add_daily_monthly_tables.py
11-14: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
16-261: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/serializers.py
112-123: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/migrations/0001_initial.py
13-15: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
17-136: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/migrations/0004_update_cleanup_periodic_tasks.py
9-9: Unused function argument: schema_editor
(ARG001)
60-60: Unused function argument: schema_editor
(ARG001)
69-72: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
74-79: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/cache.py
84-84: Probable use of insecure hash functions in hashlib: md5
(S324)
163-163: Consider moving this statement to an else block
(TRY300)
164-164: Do not catch blind exception: Exception
(BLE001)
165-165: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
252-252: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (28)
backend/dashboard_metrics/__init__.py (1)
1-4: LGTM! Clean package initializer with clear documentation describing the Dashboard Metrics app's purpose.
backend/dashboard_metrics/migrations/__init__.py (1)
1-1: LGTM! Standard migrations package initializer following Django conventions.
backend/backend/settings/base.py (1)
345-345: LGTM! The dashboard_metrics app is correctly registered in SHARED_APPS, making it available across all organizations in the multi-tenant setup. This is appropriate for organization-scoped metrics collection and querying.
unstract/core/src/unstract/core/metrics/backends/__init__.py (1)
1-11: LGTM! Clean public API definition for the metrics backends package. The `__all__` declaration correctly matches the imported backends, providing clear control over the public interface.

backend/dashboard_metrics/apps.py (1)
6-11: LGTM! Standard Django app configuration following Django 4.2+ best practices. The BigAutoField default is appropriate, and the app naming is correct.
backend/dashboard_metrics/tests/__init__.py (1)
1-1: Tests package init is fine

A docstring-only `__init__` is harmless and documents the test package clearly.

unstract/core/src/unstract/core/metrics/backends/base.py (1)
1-40: LGTM! Clean abstract interface design.

The abstract base class follows SOLID principles with a minimal, focused interface. The concrete `close()` method with a default implementation calling `flush()` is a good design choice that simplifies backend implementations while allowing customization when needed.

unstract/core/src/unstract/core/metrics/backends/noop.py (1)
21-44: LGTM! Appropriate no-op implementation.

The NoopBackend correctly implements the abstract interface and provides helpful one-time logging to inform users that metrics are disabled. The `event` parameter in `record()` is required by the `AbstractMetricBackend` interface, so the static analysis hint (ARG002) is a false positive.

backend/dashboard_metrics/tests/test_tasks.py (1)
1-553: Excellent test coverage! The test suite is comprehensive and well-organized, covering time truncation helpers, batch processing, bulk upsert operations, cleanup tasks, and full integration flows. Good use of fixtures, edge cases, and assertions to verify data integrity across all metrics tables.
backend/dashboard_metrics/migrations/0002_setup_cleanup_periodic_task.py (1)
1-51: LGTM! Migration follows Django conventions.

The migration correctly sets up a periodic cleanup task with proper forward and rollback operations. The static analysis hints (ARG001, RUF012) are false positives; the `schema_editor` parameter and migration class attributes follow Django's required conventions for data migrations.

unstract/core/src/unstract/core/metrics/config.py (1)
1-68: LGTM! Well-designed configuration module.

The environment-driven backend selection with singleton caching is clean and robust. The fallback to `NoopBackend` when metrics are disabled or misconfigured ensures safe defaults, and the logging at each decision point aids troubleshooting. The `reset_backend()` function is a good addition for testing scenarios.

unstract/core/src/unstract/core/metrics/__init__.py (1)
58-119: Well-structured public API.

The `record()` function provides a clean entry point with proper validation, enum-to-string conversion, and graceful error handling that returns `False` on failures rather than propagating exceptions. The design appropriately decouples the public API from backend implementation details.
1-60: Clean registry implementation.

The registry provides efficient O(1) validation via set lookup, clear separation between validation and type retrieval, and appropriate error handling with `ValueError` for unknown metrics.

backend/backend/celery_config.py (2)
36-53: Good production-ready resilience settings.

The broker connection retry configuration with exponential backoff (`interval_step`, `interval_max`), socket keepalive, and worker limits (`max_tasks_per_child`) will help maintain stability under broker disruptions and prevent memory leaks from long-running workers.
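The retry timing those settings control can be illustrated with a small sketch. This assumes kombu's linear-with-cap semantics for `interval_start`/`interval_step`/`interval_max` (each retry waits a bit longer, capped at the max); the helper itself is illustrative, not part of the codebase:

```python
def backoff_schedule(interval_start, interval_step, interval_max, retries):
    """Compute the wait (seconds) before each of the first `retries`
    reconnection attempts: start + n * step, capped at interval_max."""
    return [
        min(interval_start + n * interval_step, interval_max)
        for n in range(retries)
    ]
```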
68-73: Task names in `task_routes` are correctly defined and match registered Celery tasks.

The task names (`dashboard_metrics.process_events`, `dashboard_metrics.cleanup_hourly_data`, `dashboard_metrics.cleanup_daily_data`) match their corresponding `@shared_task` definitions in `backend/dashboard_metrics/tasks.py`. Routing is correctly configured.

unstract/core/src/unstract/core/metrics/types.py (1)
8-32: Well-structured metric type definitions.

The dual inheritance from `str` and `Enum` enables both type safety and string comparisons. The separation of counters (summed values) and histograms (distributions) is clearly documented and follows established metrics patterns.
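The dual-inheritance behavior noted above can be demonstrated in isolation (the member names here are an illustrative subset):

```python
from enum import Enum


class MetricName(str, Enum):
    """A str-Enum: members are type-safe enum values that still
    compare equal to their plain-string values."""
    API_REQUESTS = "api_requests"
    LLM_CALLS = "llm_calls"


# String comparison works directly, no .value needed:
assert MetricName.API_REQUESTS == "api_requests"
# And round-tripping from a raw string recovers the member:
assert MetricName("llm_calls") is MetricName.LLM_CALLS
```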
17-33: Clean URL configuration with proper ordering.

The URL patterns are well-organized with specific paths before the UUID capture pattern, preventing routing conflicts. Using `format_suffix_patterns` enables content negotiation for API responses.

backend/dashboard_metrics/migrations/0004_update_cleanup_periodic_tasks.py (1)
60-65: Reverse migration is correct as written.

Migration 0002 creates the `dashboard_metrics_cleanup_hourly` task. Migration 0004's forward operation uses `update_or_create` on that existing task (not creating a new one), adds the daily task, and deletes a legacy task. The reverse should return to the state after 0002, which means retaining the hourly task (still managed by 0002's reverse function) and removing only what 0004 added: the daily task. The current implementation is correct.

Likely an incorrect or invalid review comment.
unstract/core/src/unstract/core/metrics/backends/queue.py (1)
178-198: Celery protocol v2 format is compatible with the project's Celery version (5.3.4+).

The message construction follows the standard kombu protocol v2 format `[args, kwargs, embed]`, which remains compatible with Celery 5.3.4+ (and 5.5.3 in the workers package). The headers correctly specify the task name and ID for Celery task routing. No compatibility concerns.

backend/dashboard_metrics/views.py (1)
402-450: Health checks look sound and side‑effect free.
`health`, `_check_database`, and `_check_cache` do minimal, safe operations (an `exists()` call and a short-lived cache set/get) and return structured status with appropriate HTTP 200/503 responses; this is suitable for liveness/readiness probes.
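The healthy/unhealthy response shape described above can be sketched framework-free (the payload layout and helper name are illustrative, not the actual view's response):

```python
def health_payload(db_ok, cache_ok):
    """Combine individual check results into (status_code, body):
    200 when every check passes, 503 otherwise."""
    checks = {"database": db_ok, "cache": cache_ok}
    healthy = all(checks.values())
    status = 200 if healthy else 503
    body = {"status": "healthy" if healthy else "unhealthy", "checks": checks}
    return status, body
```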
16-260: Schema and indexing for daily/monthly tables look consistent and appropriate. The daily/monthly models mirror the hourly schema (including the `labels` default, project/tag, organization FK, unique constraints, and GIN/index coverage), which should give predictable semantics and good query performance across all granularities.
backend/dashboard_metrics/services.py (2)
383-447: `get_all_metrics_summary` aggregates are consistent with the individual query methods. The summary method correctly reuses the per-metric query functions and normalizes potential `NULL` sums (e.g., pages and LLM usage) with `or 0`, producing a clean dict of totals for the live summary endpoint.
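The `or 0` normalization matters because SQL `SUM` over zero rows yields `NULL` (surfaced as `None` in Python); a plain-Python sketch of the pattern (hypothetical metric names):

```python
def summarize(aggregates):
    """Normalize aggregate sums so empty querysets report 0, not None.

    Django's Sum() returns None when no rows match; `or 0` keeps every
    total numeric so the summary dict is safe for arithmetic and JSON.
    """
    return {name: (value or 0) for name, value in aggregates.items()}
```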
230-269: The join logic in `get_deployed_api_requests` is correct. `WorkflowExecution.pipeline_id` is a UUIDField designed to hold IDs of either `Pipeline` or `APIDeployment` entities (as evidenced by the conditional logic in `backend/workflow_manager/internal_serializers.py`). The subquery filtering `pipeline_id__in=Subquery(APIDeployment.objects.filter(...).values("id"))` correctly extracts deployed API request counts by matching executions whose `pipeline_id` refers to an API deployment.
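The subquery's semantics reduce to membership filtering; a simplified in-memory sketch (hypothetical dicts standing in for ORM rows, illustration only):

```python
def deployed_api_request_count(executions, api_deployment_ids):
    """Count executions whose pipeline_id refers to an API deployment.

    Mirrors pipeline_id__in=Subquery(APIDeployment...values("id")) in
    plain Python: executions pointing at ETL pipelines fall through.
    """
    deployed = set(api_deployment_ids)
    return sum(1 for e in executions if e["pipeline_id"] in deployed)
```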
17-135: Hourly metrics migration aligns with the model definition and indexing strategy. Field definitions, indexes, and the `unique_hourly_metric` constraint all match the `EventMetricsHourly` model description, so the initial schema for hourly metrics looks correct and production-ready.
backend/dashboard_metrics/serializers.py (1)
11-124: Query and response serializers are consistent with view usage and enforce sane bounds. `MetricsQuerySerializer.validate()` correctly defaults the date range, enforces `start_date <= end_date`, and caps queries at 90 days, while the nested series/summary serializers match the shapes constructed in `DashboardMetricsViewSet`. The `EventMetricsHourlySerializer` is read-only and exposes the expected fields.
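The validation contract can be sketched outside DRF (a hypothetical helper; the 30-day default window is an assumption for illustration, only the 90-day cap and ordering rule come from the review):

```python
from datetime import datetime, timedelta

MAX_RANGE_DAYS = 90


def validate_range(start=None, end=None, now=None):
    """Default missing bounds, require start <= end, cap the span at 90 days."""
    now = now or datetime.utcnow()
    end = end or now
    start = start or end - timedelta(days=30)  # assumed default window
    if start > end:
        raise ValueError("start_date must be before end_date")
    if end - start > timedelta(days=MAX_RANGE_DAYS):
        raise ValueError("date range cannot exceed 90 days")
    return start, end
```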
71-145: Caching decorator behavior and keying strategy look solid. The decorator scopes cache entries by organization, endpoint name, and a stable hash of sorted query params, and only caches 200 responses. It also cleanly bypasses caching when no organization context is present, which is important for correctness with per-org data.
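The keying strategy can be sketched as follows (a hypothetical helper; the hash algorithm and prefix are assumptions, matching only the key shape quoted elsewhere in this review):

```python
import hashlib

CACHE_PREFIX = "dashboard_metrics"  # assumed prefix from the review's key examples


def build_cache_key(org_id, endpoint, params):
    """Scope a cache entry by org and endpoint, with a stable param hash.

    Sorting the params first means the same query in any order maps to
    the same key, while different orgs can never collide.
    """
    canonical = "&".join(f"{k}={params[k]}" for k in sorted(params))
    digest = hashlib.sha256(canonical.encode()).hexdigest()[:16]
    return f"{CACHE_PREFIX}:{org_id}:{endpoint}:{digest}"
```

Note that the per-org prefix is also what makes pattern-based invalidation (`dashboard_metrics:{org_id}:*`) possible.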
148-167: No changes needed; the code is correct. `CacheService` is a class in `backend/utils/cache_service.py`, and `clear_cache` is a static method of that class. The import and call in `invalidate_metrics_cache` are correct. The implementation will properly invalidate cached metrics for the organization.
Likely an incorrect or invalid review comment.
backend/dashboard_metrics/models.py (1)
57-85: Confirm `BaseModel` doesn't define `id`, and that `metric_count`'s default is intentional. Two small checks:

- If `BaseModel` already defines an `id` field, redefining `id` here as a primary key can cause confusion or conflicts. If `BaseModel` only adds timestamps and no `id`, you're fine; otherwise, consider centralizing the PK definition in one place.
- `metric_count` defaults to `1`. That makes sense if every row is always created from at least one event, but if you ever create rows with explicit `metric_count` or rely on DB-side aggregation, this default could overcount silently.

Please double-check that `BaseModel` does not define `id`, and that you never rely on the default `metric_count` for anything other than "one event just created this row".
chandrasekharan-zipstack
left a comment
Changes mostly LGTM. Left some minor comments.
One main question I have is around cache invalidation of the cached, aggregated data - does it happen only on the respective TTLs? I hope possibly receiving stale data for that duration is okay in this usecase
NIT: This is a cool idea and if refactored a bit to make it generic, we can easily extend the same to any other API / DB call that we wish to cache. Consider this suggestion if its easy to make it generic
Actionable comments posted: 6
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
🤖 Fix all issues with AI agents
In @backend/dashboard_metrics/management/commands/backfill_metrics.py:
- Around line 269-298: The outer transaction.atomic() in _upsert_hourly wraps
all update_or_create calls so a DB error (e.g., IntegrityError) can leave the
transaction unusable and abort the rest of the loop; remove the outer
transaction.atomic() (or alternatively use a per-row savepoint) so each
EventMetricsHourly._base_manager.update_or_create call runs in its own atomic
context and the loop can continue on exceptions; also change the unused returned
variable name obj to _ and apply the same change (remove outer atomic or use
per-row savepoints and rename obj → _) in the corresponding daily/monthly upsert
methods that use update_or_create.
In @backend/dashboard_metrics/management/commands/generate_metrics_test_data.py:
- Around line 235-283: The re-query after bulk_create in
_generate_workflow_executions is unreliable and may misalign _target_created_at
values; instead use the list returned by
WorkflowExecution.objects.bulk_create(executions) (these objects have PKs
populated) and iterate that returned list to update each record's created_at via
individual QuerySet.update(id=exec_obj.id) using exec_obj._target_created_at (do
not attempt bulk_update because created_at uses auto_now_add).
In @backend/dashboard_metrics/tasks.py:
- Around line 222-262: The current broad except in the upsert loop swallows
IntegrityError from concurrent inserts causing lost increments; import
django.db.IntegrityError and explicitly catch IntegrityError around the
get_or_create in EventMetricsHourly upsert logic (the block that calls
select_for_update().get_or_create and currently sets obj, was_created), then on
IntegrityError re-query the row (EventMetricsHourly.objects.get(...) for the
same organization_id, timestamp, metric_name, project, tag), set
was_created=False and proceed to update counters and labels as in the existing
else-path; keep the generic Exception handler for other errors but ensure you
continue after logging; apply the same explicit IntegrityError catch-and-retry
pattern to the _bulk_upsert_daily and _bulk_upsert_monthly blocks as well.
In @unstract/core/src/unstract/core/metrics/backends/queue.py:
- Around line 35-62: The QueueBackend __init__ creates self.exchange =
Exchange("", type="direct") which uses the AMQP default exchange and doesn't
match the implicit queue-named exchange used by Celery; change the exchange to a
named direct exchange that matches the queue name (e.g., use
Exchange(self.queue_name, type="direct") or Exchange("dashboard_metric_events",
type="direct") when constructing self.exchange) so published messages route to
the KombuQueue (self.queue) consumers.
🟠 Major comments (19)
backend/dashboard_metrics/management/commands/populate_test_metrics.py-22-33 (1)
22-33: Metric types appear inconsistent with documentation. The `METRICS` list defines:

- `pages_processed` as `COUNTER` (line 25)
- `deployed_api_requests` as `HISTOGRAM` (line 30)
- `etl_pipeline_executions` as `HISTOGRAM` (line 31)

However, the README documentation (lines 173-179) indicates:

- `pages_processed` should be `HISTOGRAM` (sums page counts)
- `deployed_api_requests` and `etl_pipeline_executions` should be `COUNTER` (counting events)

This mismatch could cause incorrect test data generation and potentially mask issues during testing.
🐛 Proposed fix

```diff
 METRICS = [
     ("documents_processed", MetricType.COUNTER),
-    ("pages_processed", MetricType.COUNTER),
+    ("pages_processed", MetricType.HISTOGRAM),
     ("prompt_executions", MetricType.COUNTER),
     ("llm_calls", MetricType.COUNTER),
     ("challenges", MetricType.COUNTER),
     ("summarization_calls", MetricType.COUNTER),
-    ("deployed_api_requests", MetricType.HISTOGRAM),
-    ("etl_pipeline_executions", MetricType.HISTOGRAM),
+    ("deployed_api_requests", MetricType.COUNTER),
+    ("etl_pipeline_executions", MetricType.COUNTER),
     ("llm_usage", MetricType.HISTOGRAM),
 ]
```

frontend/src/hooks/useMetricsData.js-21-75 (1)
21-75: Fix hook dependency handling: include callbacks in effect dependency arrays instead of using `eslint-disable-line`. The `eslint-disable-line` comments hide a real issue: `useMetricsOverview`, `useMetricsSummary`, and `useMetricsSeries` all omit their fetch callbacks from the effect dependencies (lines 141, 227, 313), even though these callbacks are in the useCallback dependencies. If `axiosPrivate` changes (e.g., during auth token refresh), the callbacks are recreated, but the effects won't re-run, so they'll continue calling stale callbacks with outdated `axiosPrivate` instances.

Additionally, `useMetricsData` unnecessarily depends on the entire `options` object instead of destructuring individual properties, causing callback recreation whenever a new `options` object is passed, even if its values haven't changed.

Proposed fixes:

For `useMetricsData` (lines 21-75), destructure `options` and depend on individual properties:

- Add at the top: `const { startDate, endDate, granularity, metricName } = options;`
- Change the useCallback deps from `[axiosPrivate, options, orgId]` to `[axiosPrivate, orgId, startDate, endDate, granularity, metricName]`
- Update references from `options.startDate` → `startDate` etc.

For the three other hooks (lines 137-145, 223-231, 309-317), include the fetch callback in the effect dependencies:

- `useMetricsOverview`: `}, [fetchOverview]);` (removes the guard; the callback has its own internal guard)
- `useMetricsSummary`: `}, [fetchSummary]);`
- `useMetricsSeries`: `}, [fetchSeries]);`

backend/dashboard_metrics/migrations/0001_initial.py-17-136 (1)
17-136: Make `organization` non-nullable in all three metrics tables. PostgreSQL UNIQUE constraints allow multiple NULL values (NULL ≠ NULL in SQL), so with `organization=null` the constraint cannot prevent duplicate rows for the same (bucket, metric_name, project, tag). All actual metric creation explicitly passes `organization_id` (never None), and the manager filters exclude NULL organizations. Since this is a multi-tenant application and metrics are always org-scoped, `organization` should be `null=False` with no `default=None` to ensure constraint integrity and prevent orphaned records.
Also applies to: 137-258 (EventMetricsDaily), 259-380 (EventMetricsMonthly)
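The NULL-distinct behavior is easy to demonstrate with stdlib `sqlite3`, which treats NULLs in UNIQUE constraints the same way PostgreSQL does (the table and column names below are illustrative, not the migration's actual DDL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metrics ("
    " organization_id TEXT,"
    " bucket TEXT,"
    " metric_name TEXT,"
    " UNIQUE (organization_id, bucket, metric_name))"
)
# Two identical rows with organization_id = NULL both insert successfully,
# because NULL never compares equal to NULL inside the UNIQUE constraint:
conn.execute("INSERT INTO metrics VALUES (NULL, '2024-01-01', 'pages')")
conn.execute("INSERT INTO metrics VALUES (NULL, '2024-01-01', 'pages')")
dupes = conn.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]
# With organization_id declared NOT NULL, the second insert would be rejected.
```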
backend/backend/celery_config.py-36-49 (1)
36-49: Fix invalid socket options in `broker_transport_options`: `socket_timeout` and `socket_keepalive` are not recognized by Kombu/py-amqp. In Celery 5.x with RabbitMQ (py-amqp), the socket timeout and keepalive options in `broker_transport_options` are misconfigured:

- `socket_timeout` is not a valid transport option; use `read_timeout` and `write_timeout` instead.
- `socket_keepalive` is not a valid transport option; py-amqp enables SO_KEEPALIVE by default. If you need to tune keepalive behavior, use the `socket_settings` dict with platform-specific constants.

These invalid keys will be silently ignored, leaving sockets without timeout protection. The other configuration keys (`broker_connection_retry_on_startup`, `broker_connection_retry`, `broker_connection_max_retries`, and `task_routes`) are correctly formatted for Celery 5.x.
Replace the current
broker_transport_optionswith:broker_transport_options = { "max_retries": 3, "interval_start": 0, "interval_step": 0.2, "interval_max": 1, "read_timeout": 30, "write_timeout": 30, }Or, if you need platform-specific keepalive tuning:
from amqp.platform import TCP_USER_TIMEOUT broker_transport_options = { "max_retries": 3, "interval_start": 0, "interval_step": 0.2, "interval_max": 1, "read_timeout": 30, "write_timeout": 30, "socket_settings": { TCP_USER_TIMEOUT: 60000, # ms }, }frontend/src/components/metrics-dashboard/MetricsDashboard.jsx-55-95 (1)
55-95: Normalize date range boundaries to prevent timezone-induced off-by-one-day queries. Converting local dates directly to UTC ISO strings can shift day boundaries by hours. When a user picks "Jan 30" in EST (UTC-5), `.toISOString()` produces `2024-01-30T05:00:00Z`, causing the query to start mid-day and miss early morning metrics. For hourly aggregations this causes timestamp misalignment; for daily/monthly it silently excludes or includes unintended days.

Explicitly set `.startOf("day")` and `.endOf("day")` before converting to ISO to ensure the backend receives the full day range in UTC.
Proposed fix (normalize dates before sending to hooks)

```diff
@@
   const [dateRange, setDateRange] = useState([
-    dayjs().subtract(30, "day"),
-    dayjs(),
+    dayjs().subtract(30, "day").startOf("day"),
+    dayjs().endOf("day"),
   ]);
@@
-  } = useMetricsOverview(
-    dateRange[0]?.toISOString(),
-    dateRange[1]?.toISOString()
-  );
+  const startIso = dateRange[0]?.startOf("day").toISOString();
+  const endIso = dateRange[1]?.endOf("day").toISOString();
+
+  } = useMetricsOverview(startIso, endIso);
@@
   } = useMetricsSummary(
-    shouldFetchSummary ? dateRange[0]?.toISOString() : null,
-    shouldFetchSummary ? dateRange[1]?.toISOString() : null,
+    shouldFetchSummary ? startIso : null,
+    shouldFetchSummary ? endIso : null,
     null, // metricName
     dataSource
   );
@@
   const handleDateChange = useCallback((dates) => {
     if (dates && dates.length === 2) {
-      setDateRange(dates);
+      setDateRange([dates[0].startOf("day"), dates[1].endOf("day")]);
     }
   }, []);
```

backend/dashboard_metrics/tasks.py-526-716 (1)
526-716: `aggregate_metrics_from_sources()` writes `metric_count` as "number of buckets", not "number of events" (breaks averages/derived stats). You do `count += 1` per returned period row (lines 608-609 / 633-634 / 649-652), so `metric_count` becomes "hours with data" / "days with data", not the underlying event count. This will conflict with event-based ingestion, where `metric_count` is the number of recorded events.

At minimum, define/rename the semantic (bucket_count vs event_count) or adjust the `MetricsQueryService` queries to also return an event count (e.g., `Count("id")`) so `metric_count` is consistent.
backend/dashboard_metrics/views.py-786-797 (1)
786-797: `live_series` error entries don't match the success schema (the frontend may crash on missing keys). Success entries include `metric_type`, `data`, `total_value`; error entries omit `metric_type` and also omit the `project`/`tag`/`total_count` fields that the aggregated endpoints provide.
Proposed fix (keep shape stable)

```diff
 series.append(
     {
         "metric_name": metric_name,
+        "metric_type": "histogram"
+        if metric_name == "llm_usage"
+        else "counter",
         "error": "unavailable",
         "data": [],
         "total_value": 0,
+        "total_count": 0,
     }
 )
```

backend/dashboard_metrics/views.py-277-310 (1)
277-310: `average_value` should use `total_count` (events), not `len(values)` (buckets). `metric_value` is aggregated and `metric_count` is the number of events; dividing by bucket count will skew averages (especially if there are sparse buckets).
Proposed fix

```diff
 summary_list.append(
     {
@@
-        "average_value": agg["total_value"] / len(values)
-        if values
-        else 0,
+        "average_value": (
+            agg["total_value"] / agg["total_count"]
+            if agg["total_count"]
+            else 0
+        ),
         "min_value": min(values) if values else 0,
         "max_value": max(values) if values else 0,
     }
 )
```

backend/dashboard_metrics/views.py-129-152 (1)
129-152: Monthly filtering is wrong when `start_date` is mid-month (you'll drop the starting month). `month` rows are typically stored as the first day of the month; filtering `month__gte=start_date.date()` will exclude that month unless `start_date` is on day 1.
Proposed fix

```diff
 queryset = queryset.filter(
     **{
-        f"{ts_field}__gte": params["start_date"]
-        if source == "hourly"
-        else params["start_date"].date(),
+        f"{ts_field}__gte": (
+            params["start_date"]
+            if source == "hourly"
+            else (
+                params["start_date"].date().replace(day=1)
+                if source == "monthly"
+                else params["start_date"].date()
+            )
+        ),
         f"{ts_field}__lte": params["end_date"]
         if source == "hourly"
         else params["end_date"].date(),
     }
 )
```

backend/dashboard_metrics/services.py-1-35 (1)
1-35: Docstring says "Uses _base_manager for models…", but only `Usage` is bypassed (Celery aggregation will likely return zeros). `APIDeployment`, `Pipeline`, and often `WorkflowExecution`/`PageUsage` also use organization-aware managers; in Celery context `UserContext` is absent, so `.objects` may auto-filter to empty.

```bash
#!/bin/bash
# Verify which models use DefaultOrganizationManagerMixin-like filtering.
rg -n --type=py "DefaultOrganizationManagerMixin|DefaultOrganizationMixin|UserContext\.get_organization\(" backend | head -n 200
rg -n --type=py "objects\s*=\s*.*ModelManager|objects\s*=" backend/api_v2/models.py backend/pipeline_v2/models.py backend/workflow_manager/workflow_v2/models/execution.py backend/account_usage/models.py 2>/dev/null || true
```

backend/dashboard_metrics/tasks.py-79-209 (1)
79-209: Normalize `metric_type` (string vs enum) and protect against non-numeric `metric_value`. `QueueBackend` sends `metric_type` as a string; here the default uses `MetricType.COUNTER` (an enum). Mixed types can leak into DB writes.
Proposed fix

```diff
-    metric_value = event.get("metric_value", 1)
-    metric_type = event.get("metric_type", MetricType.COUNTER)
+    metric_value = event.get("metric_value", 1)
+    try:
+        metric_value = float(metric_value)
+    except (TypeError, ValueError):
+        raise ValueError(f"Invalid metric_value: {metric_value!r}")
+
+    metric_type_raw = event.get("metric_type", MetricType.COUNTER)
+    try:
+        metric_type = MetricType(metric_type_raw)
+    except Exception:
+        metric_type = MetricType.COUNTER
```

backend/dashboard_metrics/services.py-67-399 (1)
67-399: Use a consistent "unfiltered manager" helper for all source models used by Celery tasks. Otherwise `aggregate_metrics_from_sources()` will compute incorrect results for at least `deployed_api_requests`/`etl_pipeline_executions` (and possibly others).
Suggested pattern

```diff
+def _all(model):
+    # Prefer _base_manager to bypass org-context managers in Celery.
+    return getattr(model, "_base_manager", model.objects).all()
@@
-    WorkflowFileExecution.objects.filter(
+    _all(WorkflowFileExecution).filter(
@@
-    PageUsage.objects.filter(
+    _all(PageUsage).filter(
@@
-    api_subquery = APIDeployment.objects.filter(
+    api_subquery = _all(APIDeployment).filter(
         organization_id=organization_id
     ).values("id")
@@
-    WorkflowExecution.objects.filter(
+    _all(WorkflowExecution).filter(
@@
-    pipeline_subquery = Pipeline.objects.filter(
+    pipeline_subquery = _all(Pipeline).filter(
         organization_id=organization_id,
         pipeline_type="ETL",
     ).values("id")
@@
-    WorkflowExecution.objects.filter(
+    _all(WorkflowExecution).filter(
@@
-    WorkflowExecution.objects.filter(
+    _all(WorkflowExecution).filter(
```

backend/dashboard_metrics/management/commands/backfill_metrics.py-278-280 (1)
278-280: Be consistent with `_base_manager` vs `.objects` when bypassing org-context filters. Hourly uses `_base_manager`; daily/monthly use `.objects`. If these models have a default manager that applies org scoping (as noted in the PR context), daily/monthly backfill could be incomplete.
Proposed fix

```diff
-    obj, was_created = EventMetricsDaily.objects.update_or_create(
+    _, was_created = EventMetricsDaily._base_manager.update_or_create(
@@
-    obj, was_created = EventMetricsMonthly.objects.update_or_create(
+    _, was_created = EventMetricsMonthly._base_manager.update_or_create(
```

Also applies to: 309-311, 340-343
backend/dashboard_metrics/cache.py-163-179 (1)
163-179: Cache invalidation does not clear bucket caches for the org. `invalidate_metrics_cache()` deletes `dashboard_metrics:{org_id}:*`, but bucket keys are `dashboard_metrics:bucket:<granularity>:<org_id>:...`, so they survive invalidation.
Proposed fix (clear both namespaces)

```diff
 def invalidate_metrics_cache(org_id: str) -> int:
@@
-    pattern = f"{CACHE_PREFIX}:{org_id}:*"
+    patterns = [
+        f"{CACHE_PREFIX}:{org_id}:*",
+        f"{CACHE_PREFIX}:bucket:*:{org_id}:*",
+    ]
     try:
-        CacheService.clear_cache(pattern)
+        for pattern in patterns:
+            CacheService.clear_cache(pattern)
         logger.info(f"Invalidated metrics cache for org {org_id}")
         return 1  # Pattern-based delete doesn't return count
```

Also applies to: 333-364
backend/dashboard_metrics/capture.py-79-82 (1)
79-82: Don't mutate the caller-provided `labels` dict. `metric_labels = labels or {}` plus adding keys mutates the caller's dict when `labels` is passed in.
Proposed fix

```diff
-    metric_labels = labels or {}
+    metric_labels = {**(labels or {})}
```

Apply similarly in all methods that enrich labels.
Also applies to: 116-119, 155-158, 208-213, 260-263, 328-331
backend/dashboard_metrics/capture.py-33-35 (1)
33-35: Avoid import-time settings capture for `METRICS_ENABLED` (stale toggle in tests/runtime). `METRICS_ENABLED` is fixed at import time, so changing `settings.DASHBOARD_METRICS_ENABLED` during tests or at runtime won't take effect.
Proposed fix

```diff
-# Flag to control whether metrics are captured (can be disabled for testing)
-METRICS_ENABLED = getattr(settings, "DASHBOARD_METRICS_ENABLED", True)
+def _metrics_enabled() -> bool:
+    # Read at call time so tests/runtime overrides work.
+    return getattr(settings, "DASHBOARD_METRICS_ENABLED", True)
```

Then replace `if not METRICS_ENABLED:` with `if not _metrics_enabled():` in all methods.
backend/dashboard_metrics/management/commands/backfill_metrics.py-35-49 (1)
35-49: Use `ClassVar` and ensure `metric_name` values match the rest of the metrics system. Two concerns:

- `METRIC_CONFIGS` is a mutable class attribute (RUF012).
- Using raw strings (`"documents_processed"`, etc.) risks drift vs the canonical `MetricName` identifiers used by capture/queue processing.

Proposed fix

```diff
+from typing import ClassVar
@@
-    METRIC_CONFIGS = [
+    METRIC_CONFIGS: ClassVar[list[tuple[str, Any, bool]]] = [
         ("documents_processed", MetricsQueryService.get_documents_processed, False),
```

Separately, consider importing the canonical metric name enum/constants and using them here (if available).
backend/dashboard_metrics/management/commands/backfill_metrics.py-58-63 (1)
58-63: `--org-id` uses the wrong field: it should query `organization_id`, not `id`. The help text says "organization ID" and the parameter accepts a string (line 60), but the code queries the integer `id` field (line 105). Throughout the codebase, the standard pattern is to use `Organization.objects.get(organization_id=org_id)` for string identifiers. If a caller passes the organization's string identifier (the `organization_id` field), this will silently fail with `Organization.DoesNotExist`.

Change line 105 to:

```python
orgs = [Organization.objects.get(organization_id=org_id)]
```

This aligns with the consistent pattern used in `internal_api_views.py`, `user_context.py`, and the rate limit management commands.
backend/dashboard_metrics/management/commands/backfill_metrics.py-194-257 (1)
194-257: `metric_count` currently counts periods (rows returned), not events or samples, which breaks histogram semantics and is meaningless for counters. Given that `MetricsQueryService.*` returns pre-aggregated data (one row per period with the value already aggregated), `count += 1` makes `metric_count` represent the number of time periods rather than the number of events or samples. For example:

- `documents_processed`: returns 1 row per hour with `value=Count(id)` of 100 documents, but `metric_count=1` (the row count)
- `pages_processed` (histogram): returns 1 row per hour with `value=Sum(pages_processed)` of 500 pages from 10 source records, but `metric_count=1` instead of 10

To fix this, query methods need to return sample/event count information, or the backfill code should be restructured to count source records directly. For counters, `metric_count` could equal `value` (the event count is already in the value); for histograms, it should reflect the actual number of aggregated records.
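Returning both the aggregated value and the underlying record count per period resolves the ambiguity; a plain-Python sketch of the shape a query method could emit (hypothetical helper, mirroring `Sum(...)` plus `Count("id")` per bucket):

```python
from collections import defaultdict


def aggregate_pages(rows):
    """Group page-usage rows by hour, keeping both Sum(value) and Count(rows).

    Each row is (hour_bucket, pages); the result carries an explicit
    event_count so metric_count no longer degrades to "rows returned".
    """
    buckets = defaultdict(lambda: {"value": 0, "event_count": 0})
    for hour, pages in rows:
        buckets[hour]["value"] += pages
        buckets[hour]["event_count"] += 1
    return dict(buckets)
```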
🟡 Minor comments (13)
backend/dashboard_metrics/README.md-729-733 (1)
729-733: Migration file names may be out of sync. The file structure section lists migrations as `0002_setup_cleanup_tasks.py` and `0003_setup_aggregation_task.py`, but the PR description indicates different migration names (`0002_setup_cleanup_periodic_task`, `0003_add_daily_monthly_tables`, `0004_update_cleanup_periodic_tasks`). Please update the documentation to match the actual migration filenames.
backend/dashboard_metrics/management/commands/generate_metrics_test_data.py-397-425 (1)
397-425: `PageUsage.created_at` backfill ignores the originally chosen timestamps. You compute `created_at` per record, but then overwrite it with a new random timestamp during the update loop. Store `_target_created_at` like the other generators and update from that.
frontend/src/hooks/useMetricsData.js-77-84 (1)
77-84: Docs don't match behavior for the default date range. The JSDoc says overview defaults to "7 days ago → now", but the hook doesn't implement those defaults (it passes `null` through). Either implement the defaults or fix the docs to avoid caller confusion.
Also applies to: 85-92
frontend/src/components/metrics-dashboard/MetricsChart.jsx-74-87 (1)
74-87: Handle "no metrics selected" when `allowClear` is used. With `allowClear`, `selectedMetrics` can become `[]`, and the chart renders with no series (looks broken). Consider either preventing empty selection, auto-falling back to the first available metric, or rendering an `<Empty />` state when `metricsToShow.length === 0`.
Also applies to: 118-258
frontend/src/components/metrics-dashboard/MetricsTable.jsx-64-77 (1)
64-77: Handle `null` values in min/max render functions. The check `value !== undefined` passes for `null`, but `null.toFixed(2)` throws a TypeError. The API may return `null` for metrics without min/max values.
🐛 Proposed fix

```diff
 {
   title: "Min",
   dataIndex: "min_value",
   key: "min_value",
-  render: (value) => (value !== undefined ? value.toFixed(2) : "-"),
+  render: (value) => (value != null ? value.toFixed(2) : "-"),
   align: "right",
 },
 {
   title: "Max",
   dataIndex: "max_value",
   key: "max_value",
-  render: (value) => (value !== undefined ? value.toFixed(2) : "-"),
+  render: (value) => (value != null ? value.toFixed(2) : "-"),
   align: "right",
 },
```

frontend/src/components/metrics-dashboard/MetricsSummary.jsx-13-118 (1)
13-118: Potential React key collision if totals aren't unique by `metric_name`. If the API ever returns multiple rows for the same `metric_name` (e.g., per `project`/`tag`), cards will overwrite each other due to `key={metric.metric_name}`. Consider keying by a stable composite (or falling back to the index).
unstract/core/src/unstract/core/metrics/__init__.py-94-101 (1)
94-101: Use parameterized logging for the unknown-metric warning. Avoid f-strings in logger calls.
Proposed fix

```diff
@@
     if not validate_metric(metric_name):
-        logger.warning(f"Unknown metric: {metric_name}. Skipping.")
+        logger.warning("Unknown metric: %s. Skipping.", metric_name)
         return False
```

backend/backend/celery_config.py-34-74 (1)
34-74: Add `ClassVar` annotations for mutable `CeleryConfig` attributes (Ruff RUF012). This avoids accidental instance-level mutation patterns and satisfies the linter.
Proposed fix

```diff
@@
 from urllib.parse import quote_plus

 from django.conf import settings
 from kombu import Queue
+from typing import Any, ClassVar


 class CeleryConfig:
@@
-    broker_transport_options = {
+    broker_transport_options: ClassVar[dict[str, Any]] = {
         "max_retries": 3,
         "interval_start": 0,
         "interval_step": 0.2,
         "interval_max": 1,
         "socket_timeout": 30,
         "socket_keepalive": True,
     }
@@
-    task_queues = [
+    task_queues: ClassVar[list[Queue]] = [
         Queue("celery", routing_key="celery"),
         Queue("dashboard_metric_events", routing_key="dashboard_metric_events"),
         Queue("celery_periodic_logs", routing_key="celery_periodic_logs"),
         Queue("celery_log_task_queue", routing_key="celery_log_task_queue"),
         Queue("celery_api_deployments", routing_key="celery_api_deployments"),
         Queue("file_processing", routing_key="file_processing"),
         Queue("api_file_processing", routing_key="api_file_processing"),
         Queue("file_processing_callback", routing_key="file_processing_callback"),
         Queue("api_file_processing_callback", routing_key="api_file_processing_callback"),
     ]
@@
-    task_routes = {
+    task_routes: ClassVar[dict[str, dict[str, str]]] = {
         "dashboard_metrics.process_events": {"queue": "dashboard_metric_events"},
         "dashboard_metrics.aggregate_from_sources": {"queue": "dashboard_metric_events"},
         "dashboard_metrics.cleanup_hourly_data": {"queue": "dashboard_metric_events"},
         "dashboard_metrics.cleanup_daily_data": {"queue": "dashboard_metric_events"},
     }
```

backend/dashboard_metrics/tasks.py-553-567 (1)
553-567: `pages_processed` is marked histogram, but it's a counter-like metric (a sum of pages). Unless you intentionally treat it as a histogram in UI/rollups, set it to `False` here.
Proposed fix

```diff
-    ("pages_processed", MetricsQueryService.get_pages_processed, True),
+    ("pages_processed", MetricsQueryService.get_pages_processed, False),
```

backend/dashboard_metrics/cache.py-185-196 (1)
185-196: Type mismatch: `set_cached_overview` expects a dict but `warm_cache()` passes a list. Either change the annotation to `list[dict[str, Any]]` or change what's stored.
Proposed fix (make it a list everywhere)

```diff
 class MetricsCacheService:
@@
-    def get_cached_overview(org_id: str) -> dict[str, Any] | None:
+    def get_cached_overview(org_id: str) -> list[dict[str, Any]] | None:
@@
-    def set_cached_overview(org_id: str, data: dict[str, Any]) -> None:
+    def set_cached_overview(org_id: str, data: list[dict[str, Any]]) -> None:
```

Also applies to: 231-256
backend/dashboard_metrics/capture.py-215-233 (1)
215-233: The return value should reflect both the `LLM_CALLS` and `LLM_USAGE` writes (partial success is hidden). If `cost > 0` and the usage write fails, the method still returns `call_result`.
Proposed fix

```diff
-    if cost > 0:
-        record(
+    usage_result = True
+    if cost > 0:
+        usage_result = record(
             org_id=org_id,
             metric_name=MetricName.LLM_USAGE,
             metric_value=cost,
             labels=metric_labels,
             project=project,
         )
-
-    return call_result
+    return call_result and usage_result
```

backend/dashboard_metrics/capture.py-159-177 (1)
159-177: Validate `pages` to avoid negative/zero page metrics. As-is, `pages` can be 0 or negative and will write incorrect histogram/counter values.
Proposed fix

```diff
 def record_document_processed(
     org_id: str,
     pages: int = 1,
@@
     if record is None:
         return False
+    if pages < 0:
+        return False
```

backend/dashboard_metrics/cache.py-142-157 (1)
142-157: Cached `Response` from serialized data is fragile for future customization. The current pattern caches `response.data` (a dict) and reconstructs it as `Response(cached)` on cache hits. This works fine for plain JSON endpoints, and the four decorated endpoints here (overview, live_summary, live_series, live-summary) currently set no custom headers or renderers, so there's no immediate impact.

However, this pattern is brittle: if custom headers or non-default renderers are added to a cached endpoint in the future, they would silently be lost on cache hits without warning. Consider documenting that this decorator is only suitable for endpoints that return unadorned JSON, or cache/return the full response object with headers intact.
🧹 Nitpick comments (16)
backend/dashboard_metrics/README.md (1)
15-23: Consider adding language hints to code blocks. Several fenced code blocks in the documentation (ASCII diagrams, data flow, file structure) lack language specifiers. While these are primarily visual diagrams, adding `text` or `plaintext` as the language identifier would satisfy markdown linters and improve consistency.
frontend/src/helpers/metricsCache.js (1)
24-30: Consider handling complex parameter values. The `buildCacheKey` function converts params to strings using template literals, which would produce `[object Object]` for nested objects or inconsistent results for arrays. If the metrics API ever uses complex query parameters, this could cause cache key collisions or misses.
♻️ Suggested improvement

```diff
 function buildCacheKey(endpoint, params) {
   const sortedKeys = Object.keys(params).sort();
-  const paramStr = sortedKeys.map((k) => `${k}=${params[k]}`).join("&");
+  const paramStr = sortedKeys
+    .map((k) => `${k}=${JSON.stringify(params[k])}`)
+    .join("&");
   // Use simple encoding to avoid btoa issues with unicode
   const hash = paramStr ? `_${encodeURIComponent(paramStr)}` : "";
   return `${CACHE_PREFIX}${endpoint}${hash}`;
 }
```

backend/dashboard_metrics/management/commands/populate_test_metrics.py (1)
60-67: Consider chaining exceptions for better debugging. When re-raising exceptions, chaining them preserves the original traceback, which aids debugging.
♻️ Suggested improvement

```diff
 if org_id:
     try:
         org = Organization.objects.get(id=org_id)
     except Organization.DoesNotExist:
-        raise CommandError(f"Organization with ID {org_id} not found")
+        raise CommandError(f"Organization with ID {org_id} not found") from None
 else:
     org = Organization.objects.first()
     if not org:
         raise CommandError("No organizations found. Create one first.")
```

frontend/src/components/metrics-dashboard/MetricsChart.jsx (1)
266-292: Stabilize `MultiSeriesChart` metric order for consistent colors/legend. `metricNames` comes from `Object.entries(data.series)` iteration order; sorting will keep line colors stable across runs and payload shapes.

Proposed diff

```diff
 const { chartData, metricNames } = useMemo(() => {
   if (!data?.series) return { chartData: [], metricNames: [] };
   const dateMap = new Map();
   const names = [];
   Object.entries(data.series).forEach(([metricName, values]) => {
     names.push(metricName);
     values.forEach(({ date, value }) => {
       if (!dateMap.has(date)) {
         dateMap.set(date, { date });
       }
       dateMap.get(date)[metricName] = value;
     });
   });
   const sorted = Array.from(dateMap.values()).sort(
     (a, b) => new Date(a.date) - new Date(b.date)
   );
-  return { chartData: sorted, metricNames: names };
+  return { chartData: sorted, metricNames: names.sort() };
 }, [data]);
```

Also applies to: 341-353
backend/dashboard_metrics/models.py (1)
63-105: Consider making `metric_count` non-negative. `metric_count` should never be negative; `PositiveIntegerField` (or `PositiveBigIntegerField` if large) better encodes the invariant.

Also applies to: 169-211, 275-317
unstract/core/src/unstract/core/metrics/__init__.py (1)
39-53: Ruff RUF022: `__all__` ordering. Not functionally important, but easy to clean up if you want to keep lint noise down.

unstract/core/src/unstract/core/metrics/backends/queue.py (2)
69-115: Retry/backoff here can stall request threads for several seconds when RabbitMQ is down. If `record()` is called in synchronous web paths, the exponential sleeps (Line 111) can become user-facing latency. Consider making retry configurable per environment (or "fail fast" by default), and/or using Kombu's publish retry policies instead of reconnect sleeps.
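As a sketch of the "fail fast by default, opt into bounded backoff" idea (pure Python; `publish_with_retry` and its defaults are hypothetical, not the backend's API):

```python
import time

def publish_with_retry(publish, max_retries=0, base_delay=0.2):
    """Call `publish`; by default raise immediately on broker errors
    (fail fast for synchronous web paths). Async callers can opt into
    a small, bounded exponential backoff via `max_retries`."""
    attempt = 0
    while True:
        try:
            return publish()
        except ConnectionError:
            if attempt >= max_retries:
                raise  # fail fast: surface the broker outage to the caller
            time.sleep(base_delay * (2 ** attempt))
            attempt += 1
```

In Kombu itself, `Producer.publish(..., retry=True, retry_policy={...})` covers the same ground without hand-rolled sleeps.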
90-97: Swallowing connection-close errors hides useful diagnostics. At least log at debug level so broker flaps are diagnosable.

backend/dashboard_metrics/views.py (4)
165-232: The bucket-cache DB fallback issues N queries per missing hour (a cold cache over 7 days can mean ~168 queries). This will be expensive under load. Prefer one ranged query covering the whole missing span, then bucket the rows in Python.
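The single-ranged-query shape could look like this (pure-Python bucketing; `fill_missing_buckets` and the row shape are illustrative assumptions, not code from this PR):

```python
from datetime import datetime

def fill_missing_buckets(rows, missing_hours):
    """Bucket rows from ONE ranged query into the still-uncached hours.

    rows: (timestamp, value) pairs covering the whole missing span.
    missing_hours: hour-start datetimes the cache could not serve.
    Returns {hour_start: total}, with 0 for hours that had no rows.
    """
    buckets = {hour: 0 for hour in missing_hours}
    for ts, value in rows:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        if hour in buckets:
            buckets[hour] += value
    return buckets
```

This replaces ~168 point lookups with one query plus an O(rows) pass.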
391-401: Remove the inner `from datetime import datetime` (it's redundant and easy to shadow). You already import `datetime` at the top (Line 5); the inner import (Line 391) is unnecessary.
486-523: Monthly path uses `TruncDay(month)` and then returns "daily"-shaped points; clarify the granularity semantics. If the UI requests `granularity=week/day` but the source is monthly, you'll still return one point per month-start day. If that's intended, fine; otherwise consider forcing `granularity` to `month` when `source=monthly` (and exposing `month` in the serializer choices).
868-878: The cache health-check key is global; use a namespaced/random key to avoid cross-worker interference. Multiple instances probing the same key can produce false negatives.

backend/dashboard_metrics/services.py (1)
101-134: Sums can return NULL; consider `Coalesce(..., 0)` to avoid downstream `None` checks. You already guard in `get_all_metrics_summary` for some metrics; pushing it into the query keeps call sites simpler.

Also applies to: 330-364

backend/dashboard_metrics/tasks.py (1)
backend/dashboard_metrics/tasks.py (1)
577-580: Iterate orgs with `.iterator()` to avoid loading all org IDs into memory. Especially if this runs every 15 minutes.

backend/dashboard_metrics/serializers.py (1)
11-72: Validation is solid; consider making naive datetimes explicitly timezone-aware. Depending on DRF settings, `DateTimeField` may accept naive values; the views mix both patterns (overview makes aware). Centralizing this in the serializer avoids surprises.

backend/dashboard_metrics/cache.py (1)
101-105: Prefer SHA-256 over MD5 for cache-key hashing (FIPS/security posture). MD5 isn't a security risk here, but it is often disallowed in hardened environments.

Proposed fix

```diff
-    params_hash = hashlib.md5(sorted_params.encode()).hexdigest()[:12]
+    params_hash = hashlib.sha256(sorted_params.encode()).hexdigest()[:12]
```
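A sketch of the suggested change in isolation (the key layout `metrics:<endpoint>:<hash>` is illustrative, not the module's actual prefix):

```python
import hashlib

def cache_key(endpoint: str, params: dict) -> str:
    """Deterministic, FIPS-friendly cache key: sorted params + SHA-256."""
    sorted_params = "&".join(f"{k}={params[k]}" for k in sorted(params))
    digest = hashlib.sha256(sorted_params.encode()).hexdigest()[:12]
    return f"metrics:{endpoint}:{digest}"
```

Sorting the params keeps the key order-independent; only the digest function changes relative to the MD5 version.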
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
⛔ Files ignored due to path filters (2)
- `backend/uv.lock` is excluded by `!**/*.lock`
- `frontend/src/assets/dashboard.svg` is excluded by `!**/*.svg`
📒 Files selected for processing (36)
- `backend/backend/celery_config.py`
- `backend/backend/settings/base.py`
- `backend/dashboard_metrics/README.md`
- `backend/dashboard_metrics/cache.py`
- `backend/dashboard_metrics/capture.py`
- `backend/dashboard_metrics/management/__init__.py`
- `backend/dashboard_metrics/management/commands/__init__.py`
- `backend/dashboard_metrics/management/commands/backfill_metrics.py`
- `backend/dashboard_metrics/management/commands/generate_metrics_test_data.py`
- `backend/dashboard_metrics/management/commands/populate_test_metrics.py`
- `backend/dashboard_metrics/migrations/0001_initial.py`
- `backend/dashboard_metrics/migrations/0002_setup_cleanup_tasks.py`
- `backend/dashboard_metrics/migrations/0003_setup_aggregation_task.py`
- `backend/dashboard_metrics/models.py`
- `backend/dashboard_metrics/serializers.py`
- `backend/dashboard_metrics/services.py`
- `backend/dashboard_metrics/tasks.py`
- `backend/dashboard_metrics/tests/test_tasks.py`
- `backend/dashboard_metrics/urls.py`
- `backend/dashboard_metrics/views.py`
- `backend/pyproject.toml`
- `frontend/package.json`
- `frontend/src/components/metrics-dashboard/MetricsChart.jsx`
- `frontend/src/components/metrics-dashboard/MetricsDashboard.css`
- `frontend/src/components/metrics-dashboard/MetricsDashboard.jsx`
- `frontend/src/components/metrics-dashboard/MetricsSummary.jsx`
- `frontend/src/components/metrics-dashboard/MetricsTable.jsx`
- `frontend/src/components/metrics-dashboard/index.js`
- `frontend/src/components/navigations/side-nav-bar/SideNavBar.jsx`
- `frontend/src/helpers/metricsCache.js`
- `frontend/src/hooks/useMetricsData.js`
- `frontend/src/pages/MetricsDashboardPage.jsx`
- `frontend/src/routes/useMainAppRoutes.js`
- `unstract/core/src/unstract/core/metrics/__init__.py`
- `unstract/core/src/unstract/core/metrics/backends/noop.py`
- `unstract/core/src/unstract/core/metrics/backends/queue.py`
✅ Files skipped from review due to trivial changes (1)
- frontend/src/components/metrics-dashboard/MetricsDashboard.css
🚧 Files skipped from review as they are similar to previous changes (4)
- backend/pyproject.toml
- backend/dashboard_metrics/tests/test_tasks.py
- backend/dashboard_metrics/urls.py
- backend/backend/settings/base.py
🧰 Additional context used
🧬 Code graph analysis (8)
frontend/src/components/metrics-dashboard/MetricsChart.jsx (1)
- frontend/src/components/metrics-dashboard/MetricsDashboard.jsx (1): `Typography` (34-34)

frontend/src/components/metrics-dashboard/MetricsDashboard.jsx (1)
- frontend/src/components/metrics-dashboard/MetricsChart.jsx (2): `useMemo` (130-164), `useMemo` (267-291)

backend/dashboard_metrics/services.py (5)
- backend/api_v2/models.py (1): `APIDeployment` (43-154)
- backend/pipeline_v2/models.py (1): `Pipeline` (35-156)
- backend/usage_v2/models.py (1): `Usage` (26-81)
- backend/workflow_manager/file_execution/models.py (1): `WorkflowFileExecution` (68-250)
- backend/workflow_manager/workflow_v2/models/execution.py (2): `WorkflowExecution` (102-372), `organization_id` (266-274)

backend/dashboard_metrics/serializers.py (2)
- backend/dashboard_metrics/models.py (4): `EventMetricsHourly` (39-142), `Meta` (109-142), `Meta` (215-248), `Meta` (321-354)
- backend/dashboard_metrics/views.py (2): `series` (355-566), `summary` (243-352)

backend/dashboard_metrics/management/commands/backfill_metrics.py (3)
- backend/account_v2/models.py (1): `Organization` (12-46)
- backend/dashboard_metrics/models.py (4): `EventMetricsDaily` (145-248), `EventMetricsHourly` (39-142), `EventMetricsMonthly` (251-354), `MetricType` (14-18)
- backend/dashboard_metrics/services.py (10): `MetricsQueryService` (37-464), `get_documents_processed` (67-99), `get_pages_processed` (102-133), `get_llm_calls` (136-169), `get_challenges` (172-206), `get_summarization_calls` (209-243), `get_deployed_api_requests` (246-285), `get_etl_pipeline_executions` (288-328), `get_llm_usage_cost` (331-364), `get_prompt_executions` (367-398)

unstract/core/src/unstract/core/metrics/backends/noop.py (3)
- unstract/core/src/unstract/core/metrics/backends/base.py (3): `AbstractMetricBackend` (8-40), `record` (15-24), `flush` (27-32)
- unstract/core/src/unstract/core/metrics/types.py (1): `MetricEvent` (52-96)
- unstract/core/src/unstract/core/metrics/__init__.py (1): `record` (58-119)

backend/dashboard_metrics/migrations/0001_initial.py (2)
- backend/dashboard_metrics/migrations/0003_setup_aggregation_task.py (1): `Migration` (40-51)
- backend/dashboard_metrics/migrations/0002_setup_cleanup_tasks.py (1): `Migration` (70-81)

unstract/core/src/unstract/core/metrics/__init__.py (5)
- unstract/core/src/unstract/core/metrics/config.py (2): `get_backend` (28-60), `reset_backend` (63-68)
- unstract/core/src/unstract/core/metrics/registry.py (3): `get_all_metrics` (42-48), `get_metric_type` (25-39), `validate_metric` (13-22)
- unstract/core/src/unstract/core/metrics/types.py (3): `MetricEvent` (52-96), `MetricName` (15-32), `MetricType` (8-12)
- unstract/core/src/unstract/core/metrics/backends/noop.py (1): `record` (25-40)
- unstract/core/src/unstract/core/metrics/backends/queue.py (1): `record` (117-156)
🪛 markdownlint-cli2 (0.18.1)
backend/dashboard_metrics/README.md
15-15: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
26-26: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
47-47: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
54-54: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
86-86: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
187-187: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
299-299: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
482-482: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
512-512: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
712-712: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.11)
backend/dashboard_metrics/migrations/0003_setup_aggregation_task.py
6-6: Unused function argument: schema_editor
(ARG001)
33-33: Unused function argument: schema_editor
(ARG001)
41-44: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
46-51: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/backend/celery_config.py
42-49: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
56-66: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
69-74: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
unstract/core/src/unstract/core/metrics/backends/queue.py
95-96: try-except-pass detected, consider logging the exception
(S110)
95-95: Do not catch blind exception: Exception
(BLE001)
101-101: Consider moving this statement to an else block
(TRY300)
103-103: Do not catch blind exception: Exception
(BLE001)
113-113: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
115-115: Avoid specifying long messages outside the exception class
(TRY003)
149-149: Consider moving this statement to an else block
(TRY300)
212-212: Do not catch blind exception: Exception
(BLE001)
backend/dashboard_metrics/models.py
113-130: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
131-142: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
219-236: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
237-248: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
325-342: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
343-354: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/management/commands/populate_test_metrics.py
23-33: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
53-53: Unused method argument: args
(ARG002)
63-63: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
63-63: Avoid specifying long messages outside the exception class
(TRY003)
67-67: Avoid specifying long messages outside the exception class
(TRY003)
91-91: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
117-117: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
167-167: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
172-172: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
174-174: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
176-176: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
177-177: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
184-186: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
187-187: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
190-190: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
193-193: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
backend/dashboard_metrics/tasks.py
186-186: Do not catch blind exception: Exception
(BLE001)
398-398: Unpacked variable obj is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
439-439: Unpacked variable obj is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
480-480: Unpacked variable obj is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
669-669: Do not catch blind exception: Exception
(BLE001)
692-692: Redundant exception object included in logging.exception call
(TRY401)
backend/dashboard_metrics/management/commands/generate_metrics_test_data.py
29-32: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
34-41: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
43-45: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
48-54: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
56-62: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
89-89: Unused method argument: args
(ARG002)
232-232: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
244-244: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
246-246: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
250-252: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
253-255: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
258-258: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
259-259: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
285-285: Unused method argument: start_time
(ARG002)
285-285: Unused method argument: end_time
(ARG002)
292-292: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
299-299: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
302-302: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
307-307: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
308-308: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
309-309: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
313-313: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
351-351: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
361-361: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
364-364: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
365-365: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
369-369: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
379-379: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
408-408: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
409-409: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
411-411: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
412-412: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
421-421: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
backend/dashboard_metrics/migrations/0002_setup_cleanup_tasks.py
6-6: Unused function argument: schema_editor
(ARG001)
58-58: Unused function argument: schema_editor
(ARG001)
71-74: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
76-81: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/serializers.py
121-132: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/capture.py
45-45: Consider moving this statement to an else block
(TRY300)
backend/dashboard_metrics/management/commands/backfill_metrics.py
35-49: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
85-85: Unused method argument: args
(ARG002)
258-258: Do not catch blind exception: Exception
(BLE001)
278-278: Unpacked variable obj is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
295-295: Do not catch blind exception: Exception
(BLE001)
309-309: Unpacked variable obj is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
326-326: Do not catch blind exception: Exception
(BLE001)
340-340: Unpacked variable obj is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
357-357: Do not catch blind exception: Exception
(BLE001)
unstract/core/src/unstract/core/metrics/backends/noop.py
25-25: Unused method argument: event
(ARG002)
backend/dashboard_metrics/migrations/0001_initial.py
13-15: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
17-381: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/cache.py
103-103: Probable use of insecure hash functions in hashlib: md5
(S324)
176-176: Consider moving this statement to an else block
(TRY300)
277-277: Do not catch blind exception: Exception
(BLE001)
470-470: Consider moving this statement to an else block
(TRY300)
510-510: Do not catch blind exception: Exception
(BLE001)
backend/dashboard_metrics/views.py
59-59: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
60-60: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
62-62: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
63-63: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
64-64: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
72-72: Abstract raise to an inner function
(TRY301)
72-72: Avoid specifying long messages outside the exception class
(TRY003)
78-78: Avoid specifying long messages outside the exception class
(TRY003)
84-84: Avoid specifying long messages outside the exception class
(TRY003)
235-235: Loop control variable bucket_ts not used within loop body
(B007)
237-237: Loop control variable bucket_ts not used within loop body
(B007)
679-679: Avoid specifying long messages outside the exception class
(TRY003)
737-737: Avoid specifying long messages outside the exception class
(TRY003)
830-830: Unused method argument: request
(ARG002)
863-863: Consider moving this statement to an else block
(TRY300)
875-875: Consider moving this statement to an else block
(TRY300)
unstract/core/src/unstract/core/metrics/__init__.py
39-53: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
```python
    def _upsert_hourly(self, aggregations: dict) -> tuple[int, int]:
        """Upsert hourly aggregations."""
        created, updated = 0, 0
        with transaction.atomic():
            for key, agg in aggregations.items():
                org_id, ts_str, metric_name, project, tag = key
                timestamp = datetime.fromisoformat(ts_str)

                try:
                    obj, was_created = EventMetricsHourly._base_manager.update_or_create(
                        organization_id=org_id,
                        timestamp=timestamp,
                        metric_name=metric_name,
                        project=project,
                        tag=tag,
                        defaults={
                            "metric_type": agg["metric_type"],
                            "metric_value": agg["value"],
                            "metric_count": agg["count"],
                            "labels": agg["labels"],
                        },
                    )
                    if was_created:
                        created += 1
                    else:
                        updated += 1
                except Exception as e:
                    logger.warning(f"Error upserting hourly {key}: {e}")

        return created, updated
```
Catching exceptions inside a single outer `transaction.atomic()` can break the whole backfill.
If any DB error occurs (e.g., `IntegrityError`), the transaction may become unusable and subsequent iterations can fail even though exceptions are caught.
Simplest fix: remove the outer `transaction.atomic()` (each `update_or_create()` is already atomic), or use a per-row savepoint.
Proposed fix (remove outer atomic + keep loop going safely)
```diff
 def _upsert_hourly(self, aggregations: dict) -> tuple[int, int]:
     """Upsert hourly aggregations."""
     created, updated = 0, 0
-    with transaction.atomic():
-        for key, agg in aggregations.items():
-            org_id, ts_str, metric_name, project, tag = key
-            timestamp = datetime.fromisoformat(ts_str)
+    for key, agg in aggregations.items():
+        org_id, ts_str, metric_name, project, tag = key
+        timestamp = datetime.fromisoformat(ts_str)
-            try:
-                obj, was_created = EventMetricsHourly._base_manager.update_or_create(
+        try:
+            _, was_created = EventMetricsHourly._base_manager.update_or_create(
                 organization_id=org_id,
                 timestamp=timestamp,
                 metric_name=metric_name,
                 project=project,
                 tag=tag,
                 defaults={
                     "metric_type": agg["metric_type"],
                     "metric_value": agg["value"],
                     "metric_count": agg["count"],
                     "labels": agg["labels"],
                 },
             )
-            if was_created:
-                created += 1
-            else:
-                updated += 1
-            except Exception as e:
-                logger.warning(f"Error upserting hourly {key}: {e}")
+            if was_created:
+                created += 1
+            else:
+                updated += 1
+        except Exception as e:
+            logger.warning(f"Error upserting hourly {key}: {e}")
     return created, updated
```

Apply the same pattern to daily/monthly; also prefix the unused `obj` as `_` (RUF059).
Also applies to: 300-329, 331-360
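The per-row isolation this comment argues for can be sketched generically (pure Python; `upsert_all` and `upsert_one` are illustrative stand-ins for the Django upserts, not code from this PR):

```python
def upsert_all(items, upsert_one):
    """Process each row independently, so one bad row is counted and
    skipped instead of poisoning the whole batch. This mirrors the
    effect of dropping the outer transaction (or wrapping each row in
    its own savepoint)."""
    created = updated = errors = 0
    for item in items:
        try:
            was_created = upsert_one(item)  # True -> created, False -> updated
        except Exception:
            errors += 1
            continue
        if was_created:
            created += 1
        else:
            updated += 1
    return created, updated, errors
```

In Django terms, each `update_or_create()` would run in its own implicit transaction (or `with transaction.atomic():` savepoint), so a failing row no longer invalidates the connection for later rows.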
🧰 Tools
🪛 Ruff (0.14.11)
278-278: Unpacked variable obj is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
295-295: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In @backend/dashboard_metrics/management/commands/backfill_metrics.py around
lines 269 - 298, The outer transaction.atomic() in _upsert_hourly wraps all
update_or_create calls so a DB error (e.g., IntegrityError) can leave the
transaction unusable and abort the rest of the loop; remove the outer
transaction.atomic() (or alternatively use a per-row savepoint) so each
EventMetricsHourly._base_manager.update_or_create call runs in its own atomic
context and the loop can continue on exceptions; also change the unused returned
variable name obj to _ and apply the same change (remove outer atomic or use
per-row savepoints and rename obj → _) in the corresponding daily/monthly upsert
methods that use update_or_create.
No, that concern is not relevant here. The `backfill_metrics.py` command does not use `transaction.atomic()` at all. Each `update_or_create` call in `_upsert_hourly`, `_upsert_daily`, and `_upsert_monthly` is independently atomic (Django's default), and exceptions are caught per-row with `except Exception`, so one failure doesn't break subsequent iterations.
It looks like that transaction.atomic() issue was from a previous version that was already fixed. The current code is fine.
@athul-rs, I appreciate the clarification, but looking at the diff for this file, I can see that `transaction.atomic()` is present in the current code. Line 271 in the diff shows:

```python
def _upsert_hourly(self, aggregations: dict) -> tuple[int, int]:
    """Upsert hourly aggregations."""
    created, updated = 0, 0
    with transaction.atomic():  # ← Line 271
        for key, agg in aggregations.items():
```

The outer `with transaction.atomic():` wraps all iterations of the loop, which means that if any `update_or_create()` call raises a database error (e.g., `IntegrityError`), the entire transaction becomes unusable and subsequent iterations will fail even though the exception is caught.
Are you perhaps viewing a different version of the file, or has this already been fixed in a more recent commit not shown in this diff?
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@backend/dashboard_metrics/views.py`:
- Around lines 845-848: The current parsing of `limit` uses `limit = min(int(request.query_params.get("limit", 10)), 50)`, which will raise a `ValueError` for non-integer input and cause a 500. Update the parsing in the view (where `organization = self._get_organization()` is used and `limit` is computed) to defensively parse the query param by catching `ValueError`/`TypeError`, falling back to a safe default (e.g., 10), enforcing a minimum (e.g., 1) and a maximum of 50, and then assigning the sanitized value to `limit`; reference the `request.query_params.get("limit", 10)` expression and the `limit` variable when making the change.
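The defensive parsing described above is small enough to sketch directly (the helper name `parse_limit` is hypothetical):

```python
def parse_limit(raw, default=10, minimum=1, maximum=50):
    """Clamp a user-supplied ?limit= value; never raise on junk input."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return default
    return max(minimum, min(value, maximum))
```

This turns the brittle `min(int(...), 50)` expression into something like `limit = parse_limit(request.query_params.get("limit"))`, which can no longer produce a 500 from malformed input.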
🧹 Nitpick comments (6)
backend/dashboard_metrics/views.py (1)
331-341: Avoid a per-metric lookup for `metric_type` in summary. The loop issues one query per metric to fetch `metric_type`. You can pull it in the initial aggregation to remove the N+1.

♻️ Suggested refactor

```diff
-        summary = (
-            queryset.values("metric_name")
-            .annotate(
-                total_value=Sum("metric_value"),
-                total_count=Sum("metric_count"),
-                average_value=Avg("metric_value"),
-                min_value=Min("metric_value"),
-                max_value=Max("metric_value"),
-            )
-            .order_by("metric_name")
-        )
-
-        # Add metric_type from the first record of each metric
-        summary_list = []
-        for row in summary:
-            metric_type_record = (
-                queryset.filter(metric_name=row["metric_name"])
-                .values("metric_type")
-                .first()
-            )
-            row["metric_type"] = (
-                metric_type_record["metric_type"] if metric_type_record else "counter"
-            )
-            summary_list.append(row)
+        summary_list = list(
+            queryset.values("metric_name", "metric_type")
+            .annotate(
+                total_value=Sum("metric_value"),
+                total_count=Sum("metric_count"),
+                average_value=Avg("metric_value"),
+                min_value=Min("metric_value"),
+                max_value=Max("metric_value"),
+            )
+            .order_by("metric_name")
+        )
```

frontend/src/components/metrics-dashboard/MetricsChart.jsx (5)
122-128: Clean up the `forEach` callback to avoid returning a value. The static analysis correctly identifies that `Set.add()` returns a value, and `forEach` callbacks should not return values. While this works, it's cleaner to use a block body.

♻️ Suggested fix

```diff
 data.daily_trend.forEach((item) => {
   if (item.metrics) {
-    Object.keys(item.metrics).forEach((key) => metricNamesSet.add(key));
+    Object.keys(item.metrics).forEach((key) => {
+      metricNamesSet.add(key);
+    });
   }
 });
```
179-197: Move static color constants to module level. `METRIC_BAR_COLORS`, `DEFAULT_BAR_COLORS`, and `getBarColor` are recreated on every render. Since these are static, move them outside the component for a minor performance improvement and consistency with `TREND_COLORS` at line 21.

♻️ Suggested refactor

Move these definitions after line 31 (after `TREND_COLORS`):

```javascript
// Bar colors - metric-specific for key metrics, fallback array for others
const METRIC_BAR_COLORS = {
  pages_processed: "#1890ff",
  documents_processed: "#52c41a",
  failed_pages: "#ff4d4f",
  llm_calls: "#722ed1",
  prompt_executions: "#faad14",
  llm_usage: "#13c2c2",
};

const DEFAULT_BAR_COLORS = [
  "#1890ff",
  "#faad14",
  "#52c41a",
  "#722ed1",
  "#13c2c2",
];

const getBarColor = (metric, idx) =>
  METRIC_BAR_COLORS[metric] ||
  DEFAULT_BAR_COLORS[idx % DEFAULT_BAR_COLORS.length];
```
410-416: Same `forEach` callback issue as noted earlier. Apply the same fix here for consistency.

♻️ Suggested fix

```diff
 data.daily_trend.forEach((item) => {
   if (item.metrics) {
-    Object.keys(item.metrics).forEach((key) => metricNamesSet.add(key));
+    Object.keys(item.metrics).forEach((key) => {
+      metricNamesSet.add(key);
+    });
   }
 });
```
779-792: Move `TREND_BAR_COLORS` and `getTrendColor` to module level. Same issue as in `MetricsChart`: these static constants are recreated on every render.
396-551: Consider extracting shared chart logic into reusable utilities. `MetricsBreakdown` and `TrendAnalysisChart` share nearly identical patterns:

- Filter dropdown state and handlers (`selectedMetrics`, `handleFilterClick`, `cardExtra`)
- Metric extraction from `daily_trend`
- Loading/empty state rendering

A custom hook (e.g., `useMetricFilter`) and a wrapper component for the card chrome could reduce ~100 lines of duplication across these components. This is optional and can be deferred.

Also applies to: 674-848
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@backend/dashboard_metrics/tasks.py`:
- Around line 586-694: The exception handlers inside the org-processing block
are currently catching all Exceptions (the inner try around the metric loop and
the outer try wrapping org processing), which swallows
DatabaseError/OperationalError and prevents Celery's autoretry; change both
handlers to re-raise DB errors so Celery can retry: in the inner except around
the metric loop (where logger.warning(...) and stats["errors"] += 1 happen),
detect and re-raise DatabaseError and OperationalError before handling other
exceptions, and in the outer except (where logger.exception(...) and
stats["errors"] += 1 happen) do the same; keep the existing logging/stats
increments for non-DB exceptions and ensure _upsert_hourly_replace,
_upsert_daily_replace, and _upsert_monthly_replace DB errors will propagate.
In `@frontend/src/components/metrics-dashboard/MetricsChart.jsx`:
- Around line 258-286: The chart can render blank when metricsToShow (computed
from selectedMetrics and availableMetrics) is empty; add an explicit early
return that checks metricsToShow.length === 0 and renders the same Card used for
other empty states (e.g., title "Trend Analysis (Last 30 Days)" with an Empty
component and a helpful description like "No metrics selected or available")
before the chart rendering logic so users see a clear empty state instead of a
blank chart.
In `@frontend/src/hooks/useMetricsData.js`:
- Around line 114-118: When orgId becomes falsy the effect should clear the
recent-activity state to avoid showing stale data; inside the useEffect that
currently calls fetchActivity() when orgId is truthy (the effect depending on
[orgId, limit]), add an else branch that resets your activity state (e.g., call
the state updater used to store recent activity such as setRecentActivity([]) or
setActivity([])) so recent activity is cleared whenever orgId is
null/undefined/empty.
- Around line 67-71: In useMetricsData's useEffect (the effect that depends on
[orgId, startDate, endDate]) ensure that when orgId is falsy you clear the
hook's cached metric state instead of simply skipping fetchOverview; add an else
branch that resets the relevant state variables (e.g., metrics/overview state,
loading/error flags) so stale data from the previous org is removed on
logout/org switch; update the effect that currently calls fetchOverview() to
conditionally call fetchOverview() when orgId is present and otherwise call the
state-reset logic.
🧹 Nitpick comments (1)
backend/dashboard_metrics/tasks.py (1)
396-411: Avoid unused `obj` from `update_or_create`.
Rename to `_` to address RUF059 and reduce lint noise.

♻️ Proposed cleanup

```diff
-        obj, was_created = EventMetricsHourly._base_manager.update_or_create(
+        _, was_created = EventMetricsHourly._base_manager.update_or_create(
             organization_id=org_id,
             timestamp=hour_ts,
             metric_name=metric_name,
             project=project,
             tag=tag,
             defaults={
                 "metric_type": agg["metric_type"],
                 "metric_value": agg["value"],
                 "metric_count": agg["count"],
                 "labels": agg["labels"],
             },
         )
@@
-        obj, was_created = EventMetricsDaily._base_manager.update_or_create(
+        _, was_created = EventMetricsDaily._base_manager.update_or_create(
             organization_id=org_id,
             date=date_val,
             metric_name=metric_name,
             project=project,
             tag=tag,
             defaults={
                 "metric_type": agg["metric_type"],
                 "metric_value": agg["value"],
                 "metric_count": agg["count"],
                 "labels": agg["labels"],
             },
         )
@@
-        obj, was_created = EventMetricsMonthly._base_manager.update_or_create(
+        _, was_created = EventMetricsMonthly._base_manager.update_or_create(
             organization_id=org_id,
             month=month_val,
             metric_name=metric_name,
             project=project,
             tag=tag,
             defaults={
                 "metric_type": agg["metric_type"],
                 "metric_value": agg["value"],
                 "metric_count": agg["count"],
                 "labels": agg["labels"],
             },
         )
```

Also applies to: 438-452, 479-492
- Implement multi-tier metrics aggregation with celery-batches
- Add hourly/daily/monthly storage tables with proper indexing
- Add dedicated metrics Celery worker (worker-metrics)
- Add live query endpoints for immediate data access
- Add production readiness: retry config, rate limiting, health checks
- Add queue routing and broker resilience configuration
- Add comprehensive unit tests for batch processing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
for more information, see https://pre-commit.ci
Backend:
- Add EventMetricsHourly/Daily/Monthly models for pre-aggregated data
- Add MetricsQueryService for querying source tables (Usage, PageUsage)
- Add aggregate_from_sources Celery task (runs every 15 min)
- Add cleanup tasks for hourly (30 days) and daily (365 days) retention
- Add /overview/, /usage/, /activity/ API endpoints
- Fix Celery task queue routing to dashboard_metric_events
- Use _base_manager to bypass org context filter in Celery tasks

Frontend:
- Add MetricsDashboard component with tabs (Overview, Usage, Activity)
- Add MetricsChart, MetricsSummary, MetricsTable components
- Add useMetricsData hook with caching
- Add recharts dependency for visualizations
- Add dashboard.svg icon and sidebar navigation
- Support date range picker for all tabs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove redundant Detailed View tab and MetricsTable component
- Add Recent Activity component showing real-time file executions
  - Differentiates by type: ETL, API, Workflow
  - Shows status, file name, and relative timestamps
- Convert charts to line charts for 30-day visualization
  - Create PagesChart (pages_processed + failed_pages)
  - Create TrendAnalysisChart (other metrics with filter dropdown)
- Date picker now only affects summary cards, charts fixed at 30 days
- Add recent-activity API endpoint with proper URL registration
- Fix organization filter queries in services.py

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Reduce default limit from 10 to 5 items
- Add max-height with overflow scroll for compact display

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Cleanup after dashboard refactoring:
- Remove unused chart components (MetricsChart, MetricsBreakdown, MultiSeriesChart, MetricsComparison)
- Remove unused hooks (useMetricsData, useMetricsSummary, useMetricsSeries)
- Remove unused imports (BarChart, Bar, Progress)
- Remove unused CSS styles (.usage-quota-*, .trend-*, .breakdown-*)
- Update exports to only include used components
- Add failed_pages to metrics aggregation task

Reduces bundle size by ~982 lines of dead code.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Force-pushed from acb9e9c to ce6bba3.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/pyproject.toml (1)
15-20: ⚠️ Potential issue | 🟠 Major

Update celery-batches to 0.10 or later for Celery 5.3.x and Python 3.12 compatibility.

celery-batches 0.9.0 targets Celery 5.0–5.1 and does not support Celery 5.3.x. For Celery 5.3.x and Python 3.12, use `celery-batches>=0.10` (or latest 0.11). No known security advisories exist for this package.
🤖 Fix all issues with AI agents
In `@backend/dashboard_metrics/management/commands/backfill_metrics.py`:
- Around line 202-256: The code currently increments the per-key "count" by 1 in
hourly_agg/daily_agg/monthly_agg which yields bucket counts instead of
source-record counts; update the increments in the loops in backfill_metrics.py
(the blocks that update hourly_agg[key]["count"], daily_agg[key]["count"], and
monthly_agg[key]["count"]) to use an explicit row-level count value from the
query result (e.g., row.get("count")) falling back to 1 if missing: set
row_count = row.get("count", None) and then hourly_agg[key]["count"] +=
row_count if row_count is not None else (value if metric_type == "counter" else
1); apply the same change for the daily and monthly aggregation loops so
averages use source-record counts (and confirm MetricsQueryService returns a
"count" field).
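A minimal sketch of the count-handling fix described above; the row and key shapes here are illustrative, not the command's real ones:

```python
# Prefer the row-level "count" from the query result; fall back to the
# metric value for counters (each counted unit is one source record) or
# to 1 otherwise, so averages divide by source-record counts.
from collections import defaultdict


def aggregate(rows, metric_type):
    agg = defaultdict(lambda: {"value": 0, "count": 0})
    for row in rows:
        key = (row["org_id"], row["bucket"])
        value = row["value"]
        agg[key]["value"] += value
        row_count = row.get("count")
        if row_count is not None:
            agg[key]["count"] += row_count
        else:
            agg[key]["count"] += value if metric_type == "counter" else 1
    return dict(agg)
```

The same fallback applies identically in the hourly, daily, and monthly loops.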
In `@backend/dashboard_metrics/tasks.py`:
- Around line 121-129: The event handler is passing a None tag into model
upserts which conflicts with the EventMetrics* CharField (null=False); normalize
tag to a non-null string (e.g., "") right after reading it from the event
(replace tag = event.get("tag") with a normalized value) and use that normalized
variable in all subsequent get_or_create()/update_or_create() calls so the model
default is not overridden by None.
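A sketch of that normalization, assuming events are plain dicts (the helper name is hypothetical):

```python
# Coerce a missing or None tag to "" once, right after reading the event,
# so the not-null CharField default is never overridden by None downstream.
def normalize_event(event: dict) -> dict:
    tag = event.get("tag") or ""  # None / missing / "" all become ""
    return {**event, "tag": tag}
```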
In `@frontend/src/components/metrics-dashboard/MetricsDashboard.jsx`:
- Around line 26-70: chartStart/chartEnd are frozen by useMemo at mount so
refetchChart() keeps using stale dates; update handleRefresh to compute a fresh
30-day range and pass it to the chart refetch, e.g. compute const newStart =
dayjs().subtract(30, "day").toISOString() and const newEnd =
dayjs().toISOString() inside handleRefresh and call refetchChart(newStart,
newEnd). If refetchChart currently doesn't accept parameters, modify
useMetricsOverview(chartStart, chartEnd) to return a refetch that accepts
optional (start,end) overrides and use those in handleRefresh; touch the symbols
chartStart, chartEnd, refetchChart, useMetricsOverview, and handleRefresh.
In `@frontend/src/components/metrics-dashboard/RecentActivity.jsx`:
- Around line 91-125: The List rendering of dataSource={data.activity} lacks a
stable key causing reconciliation warnings; update the List component (the one
using dataSource, renderItem, and STATUS_CONFIG/TYPE_CONFIG) to include
rowKey="id" so React uses each activity's stable id field for keys (ensure the
id property on items is used as the unique key).
🧹 Nitpick comments (4)
unstract/core/src/unstract/core/metrics/registry.py (1)
42-48: Return a stable order from `get_all_metrics()`.
Converting a set to a list yields nondeterministic ordering; sorting avoids flaky output and makes results predictable.

♻️ Suggested fix

```diff
-    return list(_REGISTERED_METRICS)
+    return sorted(_REGISTERED_METRICS)
```

backend/backend/celery_config.py (1)
1-74: Annotate mutable class attributes as `ClassVar` to satisfy Ruff.
This clears RUF012 warnings and makes the config intent explicit.

♻️ Suggested fix

```diff
-from urllib.parse import quote_plus
+from urllib.parse import quote_plus
+from typing import ClassVar

 from django.conf import settings
 from kombu import Queue
@@
-    broker_transport_options = {
+    broker_transport_options: ClassVar[dict[str, object]] = {
         "max_retries": 3,
         "interval_start": 0,
         "interval_step": 0.2,
         "interval_max": 1,
         "socket_timeout": 30,
         "socket_keepalive": True,
     }
@@
-    task_queues = [
+    task_queues: ClassVar[list[Queue]] = [
         Queue("celery", routing_key="celery"),
         Queue("dashboard_metric_events", routing_key="dashboard_metric_events"),
         Queue("celery_periodic_logs", routing_key="celery_periodic_logs"),
         Queue("celery_log_task_queue", routing_key="celery_log_task_queue"),
         Queue("celery_api_deployments", routing_key="celery_api_deployments"),
         Queue("file_processing", routing_key="file_processing"),
         Queue("api_file_processing", routing_key="api_file_processing"),
         Queue("file_processing_callback", routing_key="file_processing_callback"),
         Queue("api_file_processing_callback", routing_key="api_file_processing_callback"),
     ]
@@
-    task_routes = {
+    task_routes: ClassVar[dict[str, dict[str, str]]] = {
         "dashboard_metrics.process_events": {"queue": "dashboard_metric_events"},
         "dashboard_metrics.aggregate_from_sources": {"queue": "dashboard_metric_events"},
         "dashboard_metrics.cleanup_hourly_data": {"queue": "dashboard_metric_events"},
         "dashboard_metrics.cleanup_daily_data": {"queue": "dashboard_metric_events"},
     }
```

backend/dashboard_metrics/capture.py (1)
79-88: Avoid mutating caller-supplied `labels`.
`labels or {}` followed by in-place updates will modify the caller's dict. Copy before extending and apply the same pattern in all record_* methods.

♻️ Suggested pattern

```diff
-        metric_labels = labels or {}
+        metric_labels = dict(labels) if labels else {}
```

backend/dashboard_metrics/management/commands/populate_test_metrics.py (1)
53-77: Make the exception cause explicit for better debugging.
When re-raising as `CommandError`, either chain the original exception or explicitly suppress chaining so the intent is clear.

🔧 Suggested improvement

```diff
     try:
         org = Organization.objects.get(id=org_id)
     except Organization.DoesNotExist:
-        raise CommandError(f"Organization with ID {org_id} not found")
+        raise CommandError(f"Organization with ID {org_id} not found") from None
```
- Remove event-driven capture code (signals, kafka dependency, real-time hooks)
- Add per-workflow LLM token usage endpoint with Redis caching
- Add LLM Usage tab with sortable table showing tokens/cost per workflow
- Add HITL Reviews & Completions bar chart to overview
- Add HITL metrics (reviews, completions) to summary cards and trend chart
- Add frontend caching layer (localStorage) for workflow token usage
- Fix double queryset evaluation in MetricsQueryService
- Add migration to fix tag NOT NULL constraint

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@backend/dashboard_metrics/README.md`:
- Around line 786-794: Update the migration filenames listed in the README to
match the actual migrations in the PR: replace `0002_setup_cleanup_tasks.py`
with `0002_setup_cleanup_periodic_task`, ensure `0003_setup_aggregation_task.py`
is updated to the actual `0003_add_daily_monthly_tables` (or keep
`0003_setup_aggregation_task` only if that file exists), and add the missing
`0004_fix_tag_not_null` entry so the README's migrations block reflects
`0001_initial.py`, `0002_setup_cleanup_periodic_task.py`,
`0003_add_daily_monthly_tables.py` (or `0003_setup_aggregation_task.py` if
applicable), and `0004_fix_tag_not_null.py`.
In `@backend/dashboard_metrics/services.py`:
- Around line 649-670: The loop building results can raise if
execution.workflow_execution.workflow is None; update the dict construction
inside the recent loop (where execution, workflow_execution, and
workflow.workflow_name are referenced) to safely handle a missing workflow by
using a conditional or getattr to set "workflow_name" to a sensible fallback
(e.g., None or empty string) when execution.workflow_execution.workflow is None,
and keep all other fields unchanged.
In `@backend/dashboard_metrics/tasks.py`:
- Around line 296-369: The aggregator is currently incrementing "count" by 1 per
returned row (thus counting buckets, not events); change it to consume the
event-count returned by query methods (use row.get("count", 1) instead of += 1)
in hourly_agg and daily_agg and when building monthly_buckets (add
row.get("count", 1) to the bucket["count"]), and update MetricsQueryService
methods to return both "value" and "count" (e.g., Count("id")) so metric records
reflect the actual number of events aggregated into each record.
In `@frontend/src/hooks/useMetricsData.js`:
- Around line 182-186: When orgId becomes falsy the hook currently skips calling
fetchUsage but leaves previous org's data in state; inside the useEffect that
depends on [orgId, startDate, endDate] (the block that calls fetchUsage), add
logic to detect when orgId is falsy and explicitly reset the hook's state (the
usage/metrics state variables managed by useMetricsData, e.g., the variables
updated by fetchUsage) to their initial empty values before returning so stale
cross-org data is cleared; implement the reset where you currently call
fetchUsage (or in the effect cleanup) and reference useEffect, fetchUsage,
orgId, startDate, endDate and the hook's state setters to locate the change.
🧹 Nitpick comments (4)
backend/dashboard_metrics/README.md (1)
15-15: Add language identifiers to fenced code blocks.
Several ASCII-art / plain-text fenced code blocks lack a language specifier (e.g., ```` ```text ```` or ```` ```plaintext ````). This triggers MD040 warnings and prevents proper rendering in some Markdown processors.

Also applies to: 86-86, 299-299, 480-480, 771-771
backend/dashboard_metrics/migrations/0004_fix_tag_not_null.py (1)
69-98: `reverse_sql` drops NOT NULL but doesn't drop the DEFAULT.
The forward SQL sets both `SET DEFAULT ''` and `SET NOT NULL`, but the reverse only does `DROP NOT NULL`. If this migration is reversed, the `DEFAULT ''` will remain. This is harmless in practice but technically asymmetric.

Suggested fix (if full reversibility is desired)

```diff
             reverse_sql=(
-                "ALTER TABLE event_metrics_hourly " "ALTER COLUMN tag DROP NOT NULL;"
+                "ALTER TABLE event_metrics_hourly "
+                "ALTER COLUMN tag DROP NOT NULL, "
+                "ALTER COLUMN tag DROP DEFAULT;"
             ),
```

Apply similarly to the other two tables.
frontend/src/helpers/metricsCache.js (1)
66-83: After `clearMetricsCache()` on quota error, the current write is silently lost.
When `QuotaExceededError` is caught, all metrics cache is cleared but the current `data` is not retried. This means the response that triggered the write will not be cached. Consider retrying the `setItem` once after clearing.

frontend/src/components/metrics-dashboard/MetricsChart.jsx (1)
322-334: `TREND_BAR_COLORS` is re-created on every render.
This constant map is defined inside the component body after early returns. Consider hoisting it to module scope alongside `TREND_COLORS` to avoid unnecessary re-creation.

Suggested move

```diff
+// Colors for specific trend metrics
+const TREND_BAR_COLORS = {
+  documents_processed: "#faad14",
+  llm_calls: "#722ed1",
+  prompt_executions: "#52c41a",
+  deployed_api_requests: "#13c2c2",
+  llm_usage: "#eb2f96",
+  etl_pipeline_executions: "#fa8c16",
+  challenges: "#2f54eb",
+  summarization_calls: "#a0d911",
+  hitl_reviews: "#283593",
+  hitl_completions: "#00897b",
+};
+
 // Metrics for Pages chart
 const PAGES_METRICS = ["pages_processed", "failed_pages"];
```

Then remove the identical block from inside `TrendAnalysisChart`.
chandrasekharan-zipstack
left a comment
Only checked a few files myself, here's some review comments from my agent:
- N+1 aggregation task - queries ALL orgs x ALL metrics every 15 min - aggregate_metrics_from_sources iterates every org, runs 12 metric queries x 3 granularities (hourly/daily/monthly) = ~36 DB queries per org. With many orgs, this is a DB performance bomb every 15 minutes. No batching or parallelism.
- recent_activity endpoint lacks rate limiting/caching - Unlike other endpoints, recent_activity has no @cache_metrics_response decorator and does multiple DB queries (Pipeline, APIDeployment, WorkflowFileExecution, Usage) on every request
- Individual update_or_create calls inside loops (no bulk operations) - _upsert_hourly_replace, _upsert_daily_replace, _upsert_monthly_replace all do an individual update_or_create per record inside a single transaction.atomic(). For large datasets this is extremely slow. Should use bulk_create(..., update_conflicts=True).
- Frontend adds recharts dependency but no lockfile update
- Some FE code has inline CSS that needs to be avoided
- Avoid management commands for seeding the DB to test this dashboard feature, instead make it a part of test scaffolding
- The cleanup of this also dangerously clears other data in some of the tables
- Check if `capture.py` + the core metrics library is dead code. The agent reports that it's not called from anywhere.
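The bulk-upsert point above could be sketched roughly as follows; `chunked` is a hypothetical helper, and the commented-out Django call only illustrates how batches would replace per-row `update_or_create` (Django 4.1+ supports `update_conflicts` on PostgreSQL):

```python
# Batch rows so each DB round-trip upserts many records at once instead
# of one update_or_create per record.
from itertools import islice


def chunked(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch


# Illustrative only -- field names are assumptions, not the PR's code:
# for batch in chunked(hourly_rows, 500):
#     EventMetricsHourly._base_manager.bulk_create(
#         batch,
#         update_conflicts=True,
#         unique_fields=["organization_id", "timestamp", "metric_name", "project", "tag"],
#         update_fields=["metric_type", "metric_value", "metric_count"],
#     )
```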
backend/backend/settings/base.py (outdated)

```python
# Dashboard Metrics Cache TTL settings (in seconds)
DASHBOARD_CACHE_TTL_CURRENT_HOUR = int(
```
@athul-rs do you think a 30s cache TTL is too aggressive here for hourly information?
- How expensive is the query which is being cached?
- If the data that we show using this cache is mainly charts over a longer period, I don't think it needs to be this recent (maybe 1-2 mins could be fine)
It's a simple indexed range scan — not expensive per bucket. But the data it's caching is for dashboard charts showing trends over days/weeks. Users aren't expecting real-time second-by-second updates on chart data. 60s/120s is a reasonable middle ground.
Resolved (outdated) review threads:
- backend/dashboard_metrics/management/commands/generate_metrics_test_data.py
- backend/dashboard_metrics/management/commands/populate_test_metrics.py
- Revert celery_config.py to original (remove redundant/side-effect-prone global settings that affected all workers)
- Remove redundant DASHBOARD_CACHE_TTL_* from base.py (cache.py uses getattr with same defaults)
- Fix backfill_metrics.py: add missing failed_pages, fix tag=None→"", use _base_manager for daily/monthly upserts
- Restore HITL metrics with try/except ImportError guard (safe for OSS, functional on cloud)
- Make HITLChart data-driven: returns null when no HITL data available
- Filter zero-value metrics from summary cards

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
In PostgreSQL, when an exception occurs inside transaction.atomic(), the transaction is marked as aborted and ALL subsequent queries in that block fail with InternalError. Since we catch exceptions per-row to allow partial progress, the outer transaction.atomic() defeats that intent. Each update_or_create() is already individually atomic in Django's default autocommit mode. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove abandoned unstract.core.metrics real-time capture package (8 files, zero external imports)
- Remove labels JSONField and GIN indexes from all 3 metric models and migration
- Remove unused serializers (MetricDataPointSerializer, MetricSeriesSerializer, MetricsSummarySerializer, MetricsResponseSerializer)
- Remove unused cache functions (invalidate_metrics_cache, MetricsCacheService, get_bucket_cache_stats, get_time_aware_cache_ttl)
- Remove dead _apply_filters method from ViewSet
- Remove deleted management commands (generate_metrics_test_data, populate_test_metrics)
- Remove capture.py and consolidated stale migrations
- Fix cross-org data bleed in useMetricsData.js hooks (reset state when orgId is falsy)
- Restore corrupted upsert functions in tasks.py
- Update README with architecture docs and current metrics list

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
README excerpt:

```text
- Hourly cleanup: Daily at 2:00 AM UTC, removes hourly data older than 30 days
- Daily cleanup: Weekly on Sundays at 3:00 AM UTC, removes daily data older than 365 days
```

Suggested change:

```diff
-- Hourly cleanup: Daily at 2:00 AM UTC, removes hourly data older than 30 days
-- Daily cleanup: Weekly on Sundays at 3:00 AM UTC, removes daily data older than 365 days
+- Hourly data cleanup: Daily at 2:00 AM UTC, removes hourly data older than 30 days
+- Daily data cleanup: Weekly on Sundays at 3:00 AM UTC, removes daily data older than 365 days
```
@athul-rs how will this command be run during deployment? Is it a separate K8s job? Why isn't it a data migration instead?
Intentionally done this way: Backfilling 90 days of data across many orgs could take significant time. A data migration would block the entire migration chain and slow every deploy. Rerunning if required is also a factor.
NIT: Consider following a rule of only 1 model per file. It makes it easy for humans and agents to understand context easily. Should be an easy refactor with Claude
@athul-rs A few minor points from one of the screenshots:
- Move RangePicker and refresh button into tab bar extra content - Replace topbar-right with Documentation and Slack Community buttons - Default trend chart to documents_processed only Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@athul-rs please move Celery tasks from backend to workers.
But I think the beat tasks are still in backend, so my comment might be invalidated.
```python
PeriodicTask = apps.get_model("django_celery_beat", "PeriodicTask")

# --- Aggregation task: every 15 minutes ---
schedule_15min, _ = IntervalSchedule.objects.get_or_create(
```
It would be better to create a worker that runs every N seconds.
Please check workers/log_consumer.
Avoid using periodic tasks or Celery Beat for this.
Any reason to avoid using Celery beat for this?
```python
from datetime import datetime, timedelta
from typing import Any

from celery import shared_task
```
Please use workers for tasks; we are already de-Django-fying Celery and moving it away from the backend.
Note: But I think the beat tasks are still in backend, so my comment might be invalidated.
What exactly needs to be changed? The "de-Django-fying" pattern applies to worker apps, not scheduling. Correct me if I'm wrong.
```python
@action(detail=False, methods=["get"], url_path="live-series")
@cache_metrics_response(endpoint="live_series")
def live_series(self, request: Request) -> Response:
```
Is it from the aggregated tables or the actual source tables?
It queries directly from source tables (Usage, PageUsage, WorkflowExecution, etc.)
It is being removed, not used now.
```yaml
      - ${TOOL_REGISTRY_CONFIG_SRC_PATH}:/data/tool_registry_config

  # Celery worker for dashboard metrics processing
  worker-metrics:
```
What does this worker do? How is it different from the tasks in backend/dashboard_metrics? Is it not triggered by beat?
dashboard_metrics/tasks.py defines the tasks (aggregate_metrics_from_sources, cleanup_hourly_metrics, cleanup_daily_metrics)
Celery Beat schedules them (via the migration that creates PeriodicTask entries routed to the dashboard_metric_events queue)
worker-metrics docker service consumes that queue: celery -A backend worker -Q dashboard_metric_events
It's the same relationship as worker-logging consuming the celery_periodic_logs queue. The dedicated docker service gives it its own process/scaling independent of the general worker.
@chandrasekharan-zipstack Updated.
Frontend:
- Rename side nav "Metrics" to "Dashboard" with "New" tag
- Rename header from "Metrics Dashboard" to "Dashboard"
- Fix dashboard icon color to light grey (#90A4B7) matching other nav icons
- HITL chart colors changed to blue/gold palette (#69b1ff/#faad14)
- HITL and Trend charts side by side (lg={12})
- LLM Usage table: header outside card, styled table with background
- Summary cards: remove suffixes, use Math.round(), fix prettier
- Add cloud-only PlanBanner import via try-catch require()
Backend:
- Move cache TTLs to Django settings (base.py + sample.env)
- Fix _base_manager usage in tasks.py and services.py for Celery context
- Simplify views.py: use model manager for org filtering
- Add recent_activity caching, fix limit parsing
- Use logger.exception() instead of f-string in error handling
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace md5 with sha256 for cache key hashing
- Add noopener,noreferrer to window.open calls
- Remove unused DASHBOARD_METRICS_ENABLED from docker-compose

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Athul <89829560+athul-rs@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Test Results Summary
Runner Tests - Full Report
SDK1 Tests - Full Report
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Code review — found 1 issue:

unstract/frontend/src/components/navigations/side-nav-bar/SideNavBar.jsx, lines 322 to 332 in 3756ed4

🤖 Generated with Claude Code
```text
Uses Redis through Django's cache framework.

Cache TTL Strategy:
- Current hour/day/month data: Configurable via DASHBOARD_CACHE_TTL_CURRENT_BUCKET (default 60s)
```
Hope its count and size are not huge, so it won’t fill the Redis memory quickly.
```shell
# Examples: 900 (15 min), 1800 (30 min), 3600 (60 min)
MIN_SCHEDULE_INTERVAL_SECONDS=1800

# Dashboard Metrics Cache TTLs (in seconds)
```
```python
def _get_trunc_func(granularity: str):
    """Get the appropriate truncation function for the granularity."""
    trunc_map = {
        "hour": TruncHour,
```
Making these ("hour", "day", etc.) an enum would be better.
What
Why
How
- `backend/dashboard_metrics/` Django app with models, views, services, and tasks
- `unstract/core/src/unstract/core/metrics/` library with queue backend using Kombu
- Tasks routed to the `dashboard_metric_events` queue

Can this PR break any existing features? If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
Database Migrations
Env Config
Relevant Docs
Related Issues or PRs
Dependencies Versions
Notes on Testing
- `pytest backend/dashboard_metrics/tests/`
- `docker compose up worker-metrics`
- `GET /api/v2/metrics/health/`

Screenshots
Checklist
I have read and understood the Contribution Guidelines.