
Add AI bot classification for event enrichment#57

Open
jaredmixpanel wants to merge 4 commits into master from feature/ai-bot-classification

jaredmixpanel commented Feb 19, 2026

Summary

Adds AI bot classification via a BotClassifyingMessageBuilder decorator that detects AI crawler requests from the $user_agent event property and enriches tracked events with classification properties.

What it does

  • Classifies user-agent strings against a database of 12 known AI bots
  • Enriches events with $is_ai_bot, $ai_bot_name, $ai_bot_provider, and $ai_bot_category properties
  • Supports custom bot patterns that take priority over built-in patterns
  • Case-insensitive matching

AI Bots Detected

GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-User, Google-Extended, PerplexityBot, Bytespider, CCBot, Applebot-Extended, Meta-ExternalAgent, cohere-ai

Implementation Details

Architecture

  • BotClassifyingMessageBuilder is a drop-in replacement for MessageBuilder at call sites: it mirrors MessageBuilder's methods but wraps rather than extends it, and no existing SDK files are modified
  • Intercepts event() and importEvent(); all people/group methods delegate unchanged to the wrapped MessageBuilder
  • Deep-copies JSONObject properties via new JSONObject(properties.toString()) serialization to prevent mutation of the caller's object
  • Builder pattern for custom classifiers (AiBotClassifier.Builder)
  • Flyweight NOT_A_BOT singleton (AiBotClassification.noMatch()) for non-bot classification results — avoids allocations on the hot path
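The architecture above can be illustrated with a small, dependency-free sketch. It is not the PR's source: SimpleMessageBuilder, ClassifyingBuilder, and Classification are hypothetical stand-ins, properties are plain Map<String, Object> values instead of JSONObject, the copy is a shallow HashMap copy (enough for flat properties) rather than the JSONObject.toString() round-trip the PR uses, and only GPTBot is recognized. It shows composition instead of inheritance, unchanged pass-through for non-event methods, and one shared NOT_A_BOT instance for the common case.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class DecoratorSketch {

    // Hypothetical stand-in for the SDK's MessageBuilder: it only packages messages.
    static class SimpleMessageBuilder {
        Map<String, Object> event(String distinctId, String eventName, Map<String, Object> props) {
            Map<String, Object> message = new HashMap<>();
            message.put("event", eventName);
            message.put("distinct_id", distinctId);
            message.put("properties", props);
            return message;
        }

        Map<String, Object> set(String distinctId, Map<String, Object> props) {
            Map<String, Object> message = new HashMap<>();
            message.put("$distinct_id", distinctId);
            message.put("$set", props);
            return message;
        }
    }

    // Immutable result; one shared instance stands for every non-bot result.
    static final class Classification {
        static final Classification NOT_A_BOT = new Classification(false, null);
        private static final Pattern GPTBOT = Pattern.compile("GPTBot", Pattern.CASE_INSENSITIVE);

        final boolean isAiBot;
        final String botName;

        private Classification(boolean isAiBot, String botName) {
            this.isAiBot = isAiBot;
            this.botName = botName;
        }

        static Classification of(String userAgent) {
            if (userAgent != null && GPTBOT.matcher(userAgent).find()) {
                return new Classification(true, "GPTBot");
            }
            return NOT_A_BOT; // no allocation on the common, non-bot path
        }
    }

    // Wraps (does not extend) the builder: event() is enriched, set() passes through.
    static class ClassifyingBuilder {
        private final SimpleMessageBuilder delegate;

        ClassifyingBuilder(SimpleMessageBuilder delegate) {
            this.delegate = delegate;
        }

        Map<String, Object> event(String distinctId, String eventName, Map<String, Object> props) {
            return delegate.event(distinctId, eventName, enrich(props));
        }

        Map<String, Object> set(String distinctId, Map<String, Object> props) {
            return delegate.set(distinctId, props); // unchanged delegation
        }

        private static Map<String, Object> enrich(Map<String, Object> props) {
            if (props == null) {
                return null; // sketch choice: nothing to classify
            }
            Map<String, Object> copy = new HashMap<>(props); // the caller's map is never modified
            Object ua = copy.get("$user_agent");
            Classification c = Classification.of(ua instanceof String ? (String) ua : null);
            copy.put("$is_ai_bot", c.isAiBot);
            if (c.isAiBot) {
                copy.put("$ai_bot_name", c.botName); // sketch choice: name only added for bots
            }
            return copy;
        }
    }

    public static void main(String[] args) {
        ClassifyingBuilder builder = new ClassifyingBuilder(new SimpleMessageBuilder());
        Map<String, Object> props = new HashMap<>();
        props.put("$user_agent", "Mozilla/5.0 (compatible; GPTBot/1.0)");

        Map<String, Object> message = builder.event("user-123", "page_view", props);
        System.out.println(message.get("properties"));
        System.out.println(props.containsKey("$is_ai_bot")); // false: the input map was copied
    }
}
```

Because the wrapper holds the delegate instead of subclassing it, the wrapped builder's code is untouched; the trade-off is that the wrapper must re-declare every method it forwards.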

Public API

AiBotClassifier
  • static AiBotClassification classify(String userAgent): Classify a user-agent against the default 12-bot database (static convenience)
  • AiBotClassification classifyUserAgent(String userAgent): Instance method; also checks custom bots added via the Builder
  • static List<AiBotEntry> getBotDatabase(): Returns an unmodifiable view of the default bot database

AiBotClassifier.Builder
  • Builder addBot(AiBotEntry entry): Add a single custom bot entry (checked before built-ins)
  • Builder addBots(List<AiBotEntry> entries): Add multiple custom bot entries
  • AiBotClassifier build(): Build the classifier instance

AiBotClassification
  • boolean isAiBot(): Whether the user-agent matched an AI bot
  • String getBotName(): Bot name (e.g. "GPTBot"), or null if not a bot
  • String getProvider(): Provider (e.g. "OpenAI"), or null if not a bot
  • String getCategory(): Category ("indexing", "retrieval", or "agent"), or null if not a bot

AiBotEntry
  • AiBotEntry(Pattern pattern, String name, String provider, String category, String description): Immutable bot definition with a compiled regex pattern
  • boolean matches(String userAgent): Test whether the user-agent matches this bot's pattern via Matcher.find()

BotClassifyingMessageBuilder
  • BotClassifyingMessageBuilder(MessageBuilder delegate): Wrap a MessageBuilder with default bot classification
  • BotClassifyingMessageBuilder(MessageBuilder delegate, AiBotClassifier classifier): Wrap with a custom classifier
  • JSONObject event(String distinctId, String eventName, JSONObject properties): Create an event with bot enrichment
  • JSONObject importEvent(String distinctId, String eventName, JSONObject properties): Create an import event with bot enrichment
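AiBotEntry.matches() is listed above as using Matcher.find(). The distinction matters because real user-agents embed the bot token inside a longer string: find() succeeds when the pattern occurs anywhere in the input, whereas Matcher.matches() requires the entire input to match. The standalone sketch below uses only java.util.regex; the FindVsMatches class and its entryMatches helper (including the null guard) are illustrative, not the PR's code.

```java
import java.util.regex.Pattern;

public class FindVsMatches {

    // Illustrative stand-in for an entry's matching logic.
    static boolean entryMatches(Pattern pattern, String userAgent) {
        if (userAgent == null) {
            return false; // null guard: Pattern.matcher(null) would throw NullPointerException
        }
        return pattern.matcher(userAgent).find(); // substring search, not a full-string match
    }

    public static void main(String[] args) {
        Pattern gptBot = Pattern.compile("GPTBot", Pattern.CASE_INSENSITIVE);
        String ua = "Mozilla/5.0 (compatible; GPTBot/1.0)";

        System.out.println(gptBot.matcher(ua).find());    // true: token found inside the UA
        System.out.println(gptBot.matcher(ua).matches()); // false: the whole UA is not "GPTBot"
        System.out.println(entryMatches(gptBot, "mozilla/5.0 (compatible; gptbot/1.0)")); // true: CASE_INSENSITIVE
        System.out.println(entryMatches(gptBot, null));   // false
    }
}
```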

Notable Design Decisions

  1. Decorator, not subclass: BotClassifyingMessageBuilder wraps MessageBuilder via composition rather than extending it, keeping the change fully additive with zero edits to existing SDK source files.
  2. Static + instance classification: AiBotClassifier.classify(ua) provides a zero-setup static path for the common case; AiBotClassifier.Builder + classifyUserAgent(ua) supports custom bots when needed. Custom entries are prepended so they take priority over built-in patterns.
  3. Defensive deep-copy: enrichProperties() serializes the input JSONObject to a string and re-parses it (new JSONObject(properties.toString())) before injecting properties, ensuring the caller's original object is never mutated. On JSONException, the original properties are returned unchanged.
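Decision 2 can be sketched without the SDK as follows. PriorityClassifier, its Entry type, and the String-returning API are hypothetical simplifications: the PR returns an AiBotClassification and ships 12 built-in entries, while this sketch has two. What it illustrates is the ordering: custom entries are placed ahead of the built-ins, and the first pattern that Matcher.find() locates wins, so a custom pattern can shadow a built-in one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

public final class PriorityClassifier {

    // Hypothetical, simplified bot entry: a compiled case-insensitive pattern plus a name.
    static final class Entry {
        final Pattern pattern;
        final String name;

        Entry(String regex, String name) {
            this.pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
            this.name = name;
        }
    }

    // Two illustrative built-ins (declared before DEFAULT so they are initialized first).
    private static final List<Entry> BUILT_INS = Collections.unmodifiableList(Arrays.asList(
            new Entry("GPTBot", "GPTBot"),
            new Entry("ClaudeBot", "ClaudeBot")));

    // Shared instance backing the zero-setup static path.
    private static final PriorityClassifier DEFAULT = new PriorityClassifier(Collections.emptyList());

    private final List<Entry> entries;

    private PriorityClassifier(List<Entry> custom) {
        List<Entry> all = new ArrayList<>(custom); // custom entries first...
        all.addAll(BUILT_INS);                     // ...then the built-ins
        this.entries = Collections.unmodifiableList(all);
    }

    // Static convenience: built-in entries only.
    public static String classify(String userAgent) {
        return DEFAULT.classifyUserAgent(userAgent);
    }

    // Instance path: returns the first matching entry's name, or null for no match.
    public String classifyUserAgent(String userAgent) {
        if (userAgent == null || userAgent.isEmpty()) {
            return null;
        }
        for (Entry e : entries) {
            if (e.pattern.matcher(userAgent).find()) {
                return e.name;
            }
        }
        return null;
    }

    public static final class Builder {
        private final List<Entry> custom = new ArrayList<>();

        public Builder addBot(String regex, String name) {
            custom.add(new Entry(regex, name));
            return this;
        }

        public PriorityClassifier build() {
            return new PriorityClassifier(custom);
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("Mozilla/5.0 (compatible; GPTBot/1.0)")); // GPTBot

        // Hypothetical custom entry that overlaps the built-in GPTBot pattern.
        PriorityClassifier c = new Builder().addBot("GPTBot/2\\.", "GPTBot-v2").build();
        System.out.println(c.classifyUserAgent("Mozilla/5.0 (compatible; GPTBot/2.0)")); // GPTBot-v2
        System.out.println(c.classifyUserAgent("Mozilla/5.0 (compatible; GPTBot/1.0)")); // GPTBot
        System.out.println(classify("Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"));    // null
    }
}
```

The static path needs no setup and reuses one shared instance; the builder is only needed when custom patterns are involved.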

Usage Examples

Drop-in MessageBuilder Replacement

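import com.mixpanel.mixpanelapi.BotClassifyingMessageBuilder;
import com.mixpanel.mixpanelapi.MessageBuilder;
import org.json.JSONObject;

MessageBuilder base = new MessageBuilder("YOUR_TOKEN");
BotClassifyingMessageBuilder builder = new BotClassifyingMessageBuilder(base);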

// Use builder.event() exactly like MessageBuilder.event()
JSONObject props = new JSONObject();
props.put("$user_agent", "Mozilla/5.0 (compatible; GPTBot/1.0)");
props.put("page", "/pricing");

JSONObject message = builder.event("user-123", "page_view", props);
// message properties now include:
//   $is_ai_bot: true
//   $ai_bot_name: "GPTBot"
//   $ai_bot_provider: "OpenAI"
//   $ai_bot_category: "indexing"

Standalone Classification

AiBotClassification result = AiBotClassifier.classify(
    "Mozilla/5.0 (compatible; ClaudeBot/1.0; +https://claudebot.ai)"
);

if (result.isAiBot()) {
    System.out.println(result.getBotName());    // "ClaudeBot"
    System.out.println(result.getProvider());   // "Anthropic"
    System.out.println(result.getCategory());   // "indexing"
}

Custom Bot Patterns

import java.util.regex.Pattern;

AiBotClassifier classifier = new AiBotClassifier.Builder()
    .addBot(new AiBotEntry(
        Pattern.compile("MyInternalBot/", Pattern.CASE_INSENSITIVE),
        "MyInternalBot", "Acme Corp", "agent", "Internal AI agent"
    ))
    .build();

// Custom bots are checked before built-in bots
AiBotClassification result = classifier.classifyUserAgent("MyInternalBot/2.0");
// result.getBotName() -> "MyInternalBot"

Full Tracking Flow

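// Assumes a servlet context: request is an HttpServletRequest and userId is the visitor's distinct ID
MessageBuilder base = new MessageBuilder("YOUR_TOKEN");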
BotClassifyingMessageBuilder builder = new BotClassifyingMessageBuilder(base);
MixpanelAPI mixpanel = new MixpanelAPI();

// Build the event
JSONObject props = new JSONObject();
props.put("$user_agent", request.getHeader("User-Agent"));
props.put("path", request.getRequestURI());

JSONObject event = builder.event(userId, "page_view", props);

// Deliver to Mixpanel (sendMessage can throw IOException, so handle or declare it)
mixpanel.sendMessage(event);

Files Added

  • src/main/java/com/mixpanel/mixpanelapi/AiBotClassification.java
  • src/main/java/com/mixpanel/mixpanelapi/AiBotClassifier.java
  • src/main/java/com/mixpanel/mixpanelapi/AiBotEntry.java
  • src/main/java/com/mixpanel/mixpanelapi/BotClassifyingMessageBuilder.java
  • src/test/java/com/mixpanel/mixpanelapi/AiBotClassifierTest.java
  • src/test/java/com/mixpanel/mixpanelapi/BotClassifyingMessageBuilderTest.java

Files Modified

  • None

Test Plan

  • All 12 AI bot user-agents correctly classified
  • Non-AI-bot user-agents return $is_ai_bot: false (Chrome, Googlebot, curl, etc.)
  • Null and empty-string inputs handled gracefully
  • Case-insensitive matching works
  • Custom bot patterns checked before built-in
  • Event properties preserved through enrichment
  • No regressions in existing test suite

Part of the AI bot classification feature for the Java SDK.

Copilot AI left a comment

Pull request overview

Adds optional AI-bot user-agent classification and event enrichment to the Mixpanel Java SDK via a BotClassifyingMessageBuilder decorator, plus unit tests for classification and enrichment behavior.

Changes:

  • Introduces an AI bot “database” (AiBotEntry) and classifier (AiBotClassifier) that returns an immutable result (AiBotClassification).
  • Adds BotClassifyingMessageBuilder wrapper to enrich event and importEvent properties with $is_ai_bot and related $ai_bot_* fields based on $user_agent.
  • Adds focused JUnit tests for classifier behavior and message enrichment/passthrough behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.

  • src/main/java/com/mixpanel/mixpanelapi/AiBotClassification.java: Immutable classification result model returned by the classifier.
  • src/main/java/com/mixpanel/mixpanelapi/AiBotClassifier.java: Default bot database, classification logic, and builder for custom bot patterns.
  • src/main/java/com/mixpanel/mixpanelapi/AiBotEntry.java: Immutable database entry mapping regex patterns to bot metadata.
  • src/main/java/com/mixpanel/mixpanelapi/BotClassifyingMessageBuilder.java: Decorator around MessageBuilder that enriches event/import properties based on $user_agent.
  • src/test/java/com/mixpanel/mixpanelapi/AiBotClassifierTest.java: Tests for default classification, negative cases, case-insensitivity, and custom pattern priority.
  • src/test/java/com/mixpanel/mixpanelapi/BotClassifyingMessageBuilderTest.java: Tests for enrichment, passthrough behavior, preservation, and end-to-end delivery serialization.

- Add null guard in AiBotEntry.matches()
- Validate null elements in Builder.addBots()
- Remove unused ArrayList import
- Fix getBotDatabase() Javadoc