Merged
5 changes: 5 additions & 0 deletions .dockerignore
@@ -0,0 +1,5 @@
node_modules
.git
.gitignore
*.md
dist
23 changes: 23 additions & 0 deletions Dockerfile
@@ -0,0 +1,23 @@
# Base
FROM node:24-slim AS base
ENV PNPM_HOME="/pnpm"
ENV PATH="$PNPM_HOME:$PATH"
RUN corepack enable

FROM base AS build
COPY . /usr/src/app
WORKDIR /usr/src/app
RUN --mount=type=cache,id=pnpm,target=/pnpm/store pnpm install --filter @framework-tracker/cwv-stats... --prod --frozen-lockfile
RUN pnpm deploy --filter=@framework-tracker/cwv-stats --prod /prod/cwv-stats --legacy

FROM base AS cwv-stats-base
ENV NODE_ENV=production
COPY --from=build /prod/cwv-stats/node_modules /app/node_modules
COPY --from=build /prod/cwv-stats/src /app/src
USER node
WORKDIR /app
CMD [ "node", "src/lcp/index.ts" ]

# LCP Stats
FROM cwv-stats-base AS cwv-stats-lcp
CMD [ "node", "src/lcp/index.ts" ]
13 changes: 13 additions & 0 deletions docker-compose.yml
@@ -0,0 +1,13 @@
services:
  cwv-stats-lcp:
    build:
      context: .
      dockerfile: Dockerfile
      target: cwv-stats-lcp
    image: cwv-stats-lcp
    volumes:
      - ~/.config/gcloud/application_default_credentials.json:/app/application_default_credentials.json:ro
      - ./packages/cwv-stats/src:/app/src
    environment:
      - GOOGLE_APPLICATION_CREDENTIALS=/app/application_default_credentials.json
      - GOOGLE_CLOUD_PROJECT=framework-tracker
123 changes: 123 additions & 0 deletions packages/cwv-stats/README.md
@@ -0,0 +1,123 @@
# CWV (Core Web Vitals) Stats – `cwv-stats`

`cwv-stats` is a small Node.js utility that queries the public HTTP Archive dataset in Google BigQuery and
computes **Core Web Vitals statistics for popular JavaScript frameworks**, starting with:

- React
- Next.js

Right now it focuses on **Largest Contentful Paint (LCP)** and returns percentile statistics (p50, p75, p90,
p95, p99) plus basic counts for each framework.
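
As a rough illustration of what these percentiles mean, the sketch below computes exact quantile boundaries locally and reads p50, p75 and p99 from them, mirroring the `APPROX_QUANTILES(lcp_ms, 100)[OFFSET(n)]` pattern used in the query. The helper `quantileBoundaries` and the sample values are assumptions made for illustration; BigQuery's approximate algorithm is not reproduced here.

```typescript
// Hypothetical helper: exact, nearest-rank analogue of APPROX_QUANTILES(x, buckets).
// Returns buckets + 1 boundaries (minimum first, maximum last).
function quantileBoundaries(values: number[], buckets: number): number[] {
  if (values.length === 0) throw new Error('need at least one value')
  const sorted = [...values].sort((a, b) => a - b)
  const boundaries: number[] = []
  for (let i = 0; i <= buckets; i++) {
    // Position of the i-th boundary within the sorted sample
    const index = Math.round((i / buckets) * (sorted.length - 1))
    boundaries.push(sorted[index])
  }
  return boundaries
}

// Hypothetical LCP samples in milliseconds: 1000, 1010, ..., 2000
const lcpSamples = Array.from({ length: 101 }, (_, i) => 1000 + i * 10)

const boundaries = quantileBoundaries(lcpSamples, 100)
const p50 = boundaries[50] // 1500
const p75 = boundaries[75] // 1750
const p99 = boundaries[99] // 1990
```

With 100 buckets, the offset into the boundary array is the percentile itself, which is why the query indexes the result with `OFFSET(50)`, `OFFSET(75)` and so on.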

The code that runs the query lives in:

- `src/frameworks/frameworks.ts` – list of frameworks
- `src/query-client/client.ts` – thin BigQuery client wrapper
- `src/lcp/lcp.ts` – LCP query + result shaping
- `src/lcp/index.ts` – entrypoint that runs the LCP query and logs the results
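
The "result shaping" step can be pictured as turning one flat query row into a nested metric object. The sketch below is a simplified, dependency-free stand-in for that step: the real code validates rows with `zod`, whereas `readNumber` and `shapeLcpRow` are hypothetical helpers and the row values are made up.

```typescript
type FrameworkMetric = {
  framework: string
  coreWebVital: 'LCP'
  numSitesMeasured: number
  stats: {
    numericUnit: string
    p50: number
    p75: number
    p90: number
    p95: number
    p99: number
  }
}

// Hypothetical helper: reads a numeric column and fails loudly if it is missing
function readNumber(row: Record<string, unknown>, column: string): number {
  const value = row[column]
  if (typeof value !== 'number' || Number.isNaN(value)) {
    throw new Error(`Expected a number in column "${column}"`)
  }
  return value
}

// Hypothetical helper: maps the query's column names onto the metric shape
function shapeLcpRow(row: Record<string, unknown>): FrameworkMetric {
  const framework = row['framework']
  if (typeof framework !== 'string') {
    throw new Error('Expected a string in column "framework"')
  }
  return {
    framework,
    coreWebVital: 'LCP',
    numSitesMeasured: readNumber(row, 'total_sites'),
    stats: {
      numericUnit: 'ms',
      p50: readNumber(row, 'p50_lcp'),
      p75: readNumber(row, 'p75_lcp'),
      p90: readNumber(row, 'p90_lcp'),
      p95: readNumber(row, 'p95_lcp'),
      p99: readNumber(row, 'p99_lcp'),
    },
  }
}

// Made-up row, for illustration only
const exampleMetric = shapeLcpRow({
  framework: 'REACT',
  total_sites: 120,
  p50_lcp: 2100,
  p75_lcp: 3200,
  p90_lcp: 4800,
  p95_lcp: 6100,
  p99_lcp: 9900,
})
```

Validating before shaping means a renamed or missing column surfaces as an error rather than as `undefined` in the output.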

---

## Prerequisites

To run `cwv-stats` locally (inside Docker) you’ll need:

- **Google Cloud account and project**.
Set up a Google Cloud account and create a new project named `framework-tracker` with access to BigQuery and the HTTP Archive public dataset
(e.g. `httparchive.sample_data.pages_10k`, or the full `httparchive.latest.pages` if you have billing enabled on this account).
See [Getting started accessing the HTTP Archive with BigQuery](https://har.fyi/guides/getting-started/).
- **Docker** installed and running on your machine. For macOS, see [Docker Mac Installation](https://docs.docker.com/desktop/setup/install/mac-install/).

You do _not_ need Node or pnpm installed on your host machine if you only use the container workflow.

### Google Cloud / BigQuery setup

1. **Install the gcloud CLI**
You need it to authenticate with BigQuery.
If you're running on a Mac, you can install it via Homebrew:

```
brew update && brew install --cask gcloud-cli
```

More details on installing with Homebrew can be found [here](https://docs.cloud.google.com/sdk/docs/downloads-homebrew).

Non-Mac users can follow [Install the Google Cloud CLI](https://docs.cloud.google.com/sdk/docs/install-sdk).

2. **Create a credentials file for authentication**
Once you have set up the gcloud CLI, you need to authenticate with your account.
First, initialise the Google Cloud CLI by running:

```
gcloud init
```

Then create your local authentication credentials:

```
gcloud auth application-default login
```

This will create the following file:
- Linux, macOS: `$HOME/.config/gcloud/application_default_credentials.json`
- Windows: `%APPDATA%\gcloud\application_default_credentials.json`

When we run the container using `docker compose` on our local machine, this file is mounted into the container. The volume mount in
`docker-compose.yml` uses the Linux/macOS path, so Windows users may need to update it.

3. **Set the project ID** that should be billed for BigQuery queries.
In `docker-compose.yml`, set the `GOOGLE_CLOUD_PROJECT` environment variable to the ID of the GCP project you use to query
the HTTP Archive. **Note: This is temporary until we have a GCP account with billing that we can use to query the full dataset.**

For general reference on the Node.js BigQuery client library, see:
[BigQuery API Client Libraries](https://docs.cloud.google.com/bigquery/docs/reference/libraries#client-libraries-usage-nodejs)

---

## Running the container locally

For local development, this repo uses **Docker Compose**. The `cwv-stats-lcp` service builds the image from
the repo root `Dockerfile` and runs the `src/lcp/index.ts` entrypoint to execute the LCP query.

### 1. Authenticate (ADC)

This service uses Google Cloud **Application Default Credentials (ADC)**. The simplest setup for local
development is:

```bash
gcloud auth application-default login
```

### 2. Build the service image

```bash
docker compose build cwv-stats-lcp
```

If you’ve recently changed/installed dependencies or Dockerfile stages, you may want a clean build:

```bash
docker compose build --no-cache cwv-stats-lcp
```

### 3. Run it

From the **repo root**:

```bash
docker compose run cwv-stats-lcp
```

What Compose does (see `docker-compose.yml`):

- Mounts your local ADC credentials file into the container.
- Sets `GOOGLE_APPLICATION_CREDENTIALS` so the BigQuery client can authenticate.
- Sets `GOOGLE_CLOUD_PROJECT` so BigQuery knows which project to bill.
- Mounts `./packages/cwv-stats/src` into `/app/src` so code changes are reflected without rebuilding.
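
Since the container depends on both environment variables being present, a small pre-flight check can make misconfiguration easier to spot. This is only a sketch: `missingEnvVars` is a hypothetical helper that does not exist in the package, and the variable names are taken from `docker-compose.yml`.

```typescript
// Variable names as set in docker-compose.yml
const requiredEnvVars = [
  'GOOGLE_APPLICATION_CREDENTIALS',
  'GOOGLE_CLOUD_PROJECT',
] as const

// Hypothetical helper: returns the names of required variables that are unset or blank
function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return requiredEnvVars.filter((name) => !env[name]?.trim())
}

// Example with an explicit object; in the container you would pass process.env instead
const missing = missingEnvVars({
  GOOGLE_APPLICATION_CREDENTIALS: '/app/application_default_credentials.json',
})
// missing is ['GOOGLE_CLOUD_PROJECT']
```

An entrypoint could call this before running the query and exit early with a readable message if the returned list is not empty.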

---

## Adding a new image

To add a new image, build on the `cwv-stats-base` stage and set the `CMD` to run the entrypoint you are interested in.
Multi-stage builds are used to keep the overall image size as small as possible.
19 changes: 19 additions & 0 deletions packages/cwv-stats/eslint.config.js
@@ -0,0 +1,19 @@
// @ts-check
import js from '@eslint/js'

export default [
  {
    ignores: ['node_modules/**'],
  },
  js.configs.recommended,
  {
    languageOptions: {
      globals: {
        console: 'readonly',
        process: 'readonly',
      },
      ecmaVersion: 2022,
      sourceType: 'module',
    },
  },
]
13 changes: 13 additions & 0 deletions packages/cwv-stats/package.json
@@ -0,0 +1,13 @@
{
  "name": "@framework-tracker/cwv-stats",
  "version": "0.0.1",
  "private": true,
  "type": "module",
  "devDependencies": {
    "@types/node": "^25.0.3"
  },
  "dependencies": {
    "@google-cloud/bigquery": "^8.1.1",
    "zod": "^4.3.6"
  }
}
3 changes: 3 additions & 0 deletions packages/cwv-stats/src/frameworks/frameworks.ts
@@ -0,0 +1,3 @@
export const frameworks = ['REACT', 'NEXT.JS'] as const

export type Framework = (typeof frameworks)[number]
12 changes: 12 additions & 0 deletions packages/cwv-stats/src/lcp/index.ts
@@ -0,0 +1,12 @@
import { getFrameworksLCP } from './lcp.ts'
import { frameworks } from '../frameworks/frameworks.ts'

async function main() {
  console.info('Starting LCP Query')
  const stats = await getFrameworksLCP([...frameworks])
  console.info(stats)
}

main()
  .catch((error) => {
    // Log the failure and report it to the caller (e.g. Docker) via a non-zero exit code
    console.error(error)
    process.exitCode = 1
  })
  .finally(() => console.info('Finished LCP Query'))
92 changes: 92 additions & 0 deletions packages/cwv-stats/src/lcp/lcp.ts
@@ -0,0 +1,92 @@
import { type Framework, frameworks } from '../frameworks/frameworks.ts'
import { runQuery } from '../query-client/client.ts'
import * as z from 'zod'

const lcpSchema = z.object({
  framework: z.enum(frameworks),
  total_sites: z.number(),
  min_lcp: z.number(),
  p50_lcp: z.number(),
  p75_lcp: z.number(),
  p90_lcp: z.number(),
  p95_lcp: z.number(),
  p99_lcp: z.number(),
  max_lcp: z.number(),
})

type CoreWebVital = 'LCP'

type PercentileStatistics = {
  numericUnit: string
  p50: number
  p75: number
  p90: number
  p95: number
  p99: number
}

type FrameworkMetric = {
  framework: Framework
  coreWebVital: CoreWebVital
  numSitesMeasured: number
  stats: PercentileStatistics
}

// TODO - Once we have GCP credits replace httparchive.sample_data.pages_10k with httparchive.latest.pages
// httparchive.latest.pages is a view that reflects the latest monthly snapshot.
// httparchive.crawl.pages is all data from 2011

export async function getFrameworksLCP(
  frameworks: Array<Framework>,
): Promise<Array<FrameworkMetric>> {
  console.info(`Running LCP Query for frameworks: [${frameworks.join(',')}]`)

  // We use APPROX_QUANTILES for better performance on large datasets
  // https://docs.cloud.google.com/bigquery/docs/reference/standard-sql/approximate_aggregate_functions#approx_quantiles
  const query = `
    WITH framework_metrics AS (
      SELECT
        tech.technology AS framework,
        SAFE.FLOAT64(JSON_EXTRACT(lighthouse, '$.audits.largest-contentful-paint.numericValue')) AS lcp_ms
      FROM
        \`httparchive.sample_data.pages_10k\`,
        UNNEST(technologies) AS tech
      WHERE
        client = 'desktop' AND
        LOWER(tech.technology) IN ('react', 'next.js')
    )
    SELECT
      UPPER(framework) AS framework,
      COUNT(lcp_ms) AS total_sites,
      MIN(lcp_ms) AS min_lcp,
      APPROX_QUANTILES(lcp_ms, 100)[OFFSET(50)] AS p50_lcp,
      APPROX_QUANTILES(lcp_ms, 100)[OFFSET(75)] AS p75_lcp,
      APPROX_QUANTILES(lcp_ms, 100)[OFFSET(90)] AS p90_lcp,
      APPROX_QUANTILES(lcp_ms, 100)[OFFSET(95)] AS p95_lcp,
      APPROX_QUANTILES(lcp_ms, 100)[OFFSET(99)] AS p99_lcp,
      MAX(lcp_ms) AS max_lcp
    FROM
      framework_metrics
    WHERE
      lcp_ms IS NOT NULL
    GROUP BY framework_metrics.framework
  `

  const rows = await runQuery(query)

  const metrics = rows.map((row) => lcpSchema.parse(row))

  return metrics.map((metric) => ({
    framework: metric.framework,
    coreWebVital: 'LCP',
    numSitesMeasured: metric.total_sites,
    stats: {
      numericUnit: 'ms',
      p50: metric.p50_lcp,
      p75: metric.p75_lcp,
      p90: metric.p90_lcp,
      p95: metric.p95_lcp,
      p99: metric.p99_lcp,
    },
  }))
}
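
Note that `getFrameworksLCP` receives a `frameworks` argument, but the SQL filters on a hard-coded `('react', 'next.js')` list, so the two can drift apart. One possible way to keep them in sync is to pass the list as a named query parameter. The sketch below only builds the options object and shortens the query for brevity; `buildLcpQueryOptions` is a hypothetical helper, and it assumes `runQuery` would be extended to forward `params` to `bigquery.createQueryJob`.

```typescript
type LcpQueryOptions = {
  query: string
  location: string
  params: { frameworks: string[] }
}

// Hypothetical helper: builds query options with a named array parameter.
// Assumption: runQuery forwards these options unchanged to bigquery.createQueryJob.
function buildLcpQueryOptions(frameworks: readonly string[]): LcpQueryOptions {
  if (frameworks.length === 0) {
    throw new Error('at least one framework is required')
  }
  return {
    // Shortened query: only the filter differs from the original
    query: `
      SELECT tech.technology AS framework
      FROM \`httparchive.sample_data.pages_10k\`,
        UNNEST(technologies) AS tech
      WHERE client = 'desktop'
        AND LOWER(tech.technology) IN UNNEST(@frameworks)`,
    location: 'US',
    // The SQL lowercases the column, so lowercase the values to match
    params: { frameworks: frameworks.map((name) => name.toLowerCase()) },
  }
}

const lcpQueryOptions = buildLcpQueryOptions(['REACT', 'NEXT.JS'])
// lcpQueryOptions.params.frameworks is ['react', 'next.js']
```

Using a parameter rather than interpolating the names into the SQL string also avoids having to quote or escape framework names by hand.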
19 changes: 19 additions & 0 deletions packages/cwv-stats/src/query-client/client.ts
@@ -0,0 +1,19 @@
import { BigQuery } from '@google-cloud/bigquery'

const bigquery = new BigQuery()

// Rows are returned as `unknown` so that callers validate them (e.g. with zod) before use.
export async function runQuery(query: string): Promise<Array<unknown>> {
  // For all options, see https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query
  const options = {
    query,
    // Location must match that of the dataset(s) referenced in the query.
    location: 'US',
  }

  const [job] = await bigquery.createQueryJob(options)
  console.info(`Job ${job.id} started.`)

  const [rows] = await job.getQueryResults()

  return rows
}
21 changes: 21 additions & 0 deletions packages/cwv-stats/tsconfig.json
@@ -0,0 +1,21 @@
{
  "compilerOptions": {
    "target": "ES2022",
    "lib": ["ES2022"],
    "module": "ESNext",
    "allowImportingTsExtensions": true,
    "moduleResolution": "bundler",
    "noEmit": true,
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "resolveJsonModule": true,
    "allowSyntheticDefaultImports": true,
    "forceConsistentCasingInFileNames": true,
    "types": ["node"],
    "erasableSyntaxOnly": true,
    "preserveSymlinks": false
  },
  "include": ["apps/**/*.ts", "src/lcp/index.ts"],
  "exclude": ["node_modules"]
}