derekvan/Readwise-recommender

Readwise Recommendation Engine

A standalone Node.js recommendation system that surfaces relevant articles from your Readwise queue based on your reading interests.

Overview

This system analyzes your reading interests and generates daily HTML recommendations from your Readwise "Later" queue and tagged collections. It scores articles based on thematic keyword matching, filters out previously recommended items, and hosts the results as a bookmarkable webpage.

Claude Code is optional: it can help create your interest profile by analyzing your highlights, but you can also create the profile manually.

Prerequisites

Setup

1. Clone and Install

git clone https://github.com/yourusername/Readwise-recommender.git
cd Readwise-recommender
npm install

2. Set Your Readwise API Token

export READWISE_TOKEN="your-api-token-here"

Add this to your ~/.bashrc or ~/.zshrc to make it permanent.
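A script that depends on the token can fail fast with a readable message when the variable is missing. The snippet below is a minimal sketch of such a check; `requireToken` is a hypothetical helper name and is not taken from this repository.

```javascript
// Hypothetical helper: return the Readwise token or throw a readable error.
// `env` defaults to process.env so the function is easy to test in isolation.
function requireToken(env = process.env) {
  const token = (env.READWISE_TOKEN || "").trim();
  if (!token) {
    throw new Error(
      'READWISE_TOKEN is not set. Run: export READWISE_TOKEN="your-api-token-here"'
    );
  }
  return token;
}
```

Calling `requireToken()` at the top of a fetch script would surface a missing token before any network request is attempted.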

3. Create Your Interest Profile

Your interest profile defines the topics you care about. There are two ways to create it:

Option A: Use Claude Code (Recommended)

Claude Code can analyze your Readwise highlights and automatically generate a high-quality interest profile. This approach identifies thematic patterns across thousands of highlights, picking keywords and weights you might not think of.

Step 1: Fetch your highlights

Fetch all my Readwise highlights using readwise_export_highlights
and save to data/highlights_raw.json

Step 2: Generate interest profile

Analyze all highlights in data/highlights_raw.json. Identify 25 thematic clusters
representing my core interests. For each cluster, provide: theme name, 5-10 keywords,
and calculate weight based on how many highlights match. Save to data/interest_profile.json.

Claude Code will produce a JSON file with your personalized reading themes.

Option B: Create Manually

Create data/interest_profile.json with this structure:

{
  "version": "1.0",
  "metadata": {
    "created_at": "2026-01-29T10:00:00Z",
    "cluster_count": 15
  },
  "clusters": [
    {
      "id": "cluster_001",
      "theme": "Decision Making & Cognitive Frameworks",
      "keywords": ["decision-making", "intuition", "biases", "rationality", "thinking"],
      "weight": 0.18,
      "description": "Frameworks for making better decisions and understanding cognitive biases"
    },
    {
      "id": "cluster_002",
      "theme": "Writing & Communication",
      "keywords": ["writing", "storytelling", "editing", "clarity", "communication"],
      "weight": 0.12,
      "description": "Techniques for clear, effective writing and communication"
    },
    {
      "id": "cluster_003",
      "theme": "Productivity & Focus",
      "keywords": ["productivity", "focus", "deep work", "distraction", "habits"],
      "weight": 0.15,
      "description": "Systems for managing attention and building effective habits"
    }
  ]
}

Field guide:

  • theme: A topic you care about (e.g., "Decision Making")
  • keywords: Words that signal this topic in article titles/summaries (lowercase, 5-10 per theme)
  • weight: Importance of this theme (0.0-1.0). Higher = more likely to surface articles on this topic. Weights don't need to sum to 1.0.
  • description: Human-readable explanation of what this cluster represents

Tips:

  • Start with 10-25 themes that represent your core reading interests
  • Think about what words appear in article titles/summaries for each topic
  • Higher weights prioritize that theme in scoring
  • You can refine your profile over time based on which recommendations feel right
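A hand-written profile is easy to get subtly wrong (an uppercase keyword, a weight above 1.0). The following is a rough sketch of a sanity check based on the field guide above; `validateProfile` is a hypothetical function, not part of the repository, and the rules it enforces are only the ones stated in this README.

```javascript
// Hypothetical validator for data/interest_profile.json (not part of the repo).
// Returns a list of human-readable problems; an empty list means no issue was found.
function validateProfile(profile) {
  const problems = [];
  if (!profile || !Array.isArray(profile.clusters) || profile.clusters.length === 0) {
    return ["profile.clusters must be a non-empty array"];
  }
  profile.clusters.forEach((cluster, i) => {
    const where = `clusters[${i}]`;
    if (typeof cluster.theme !== "string" || cluster.theme.trim() === "") {
      problems.push(`${where}: theme must be a non-empty string`);
    }
    if (!Array.isArray(cluster.keywords) || cluster.keywords.length === 0) {
      problems.push(`${where}: keywords must be a non-empty array`);
    } else if (cluster.keywords.some((k) => typeof k !== "string" || k !== k.toLowerCase())) {
      problems.push(`${where}: keywords should be lowercase strings`);
    }
    if (!Number.isFinite(cluster.weight) || cluster.weight < 0 || cluster.weight > 1) {
      problems.push(`${where}: weight must be a number between 0.0 and 1.0`);
    }
  });
  return problems;
}
```

One way to use it is to parse the file with `JSON.parse(fs.readFileSync("data/interest_profile.json", "utf8"))` and print whatever `validateProfile` returns before running the scoring scripts.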

4. Configure Settings

Copy the example settings file:

cp utils/settings.example.json utils/settings.json

Edit utils/settings.json to customize:

{
  "tagBucket": {
    "enabled": true,
    "tag": "bankruptcy2025-12",
    "label": "Bankruptcy Readings",
    "emoji": "💼",
    "cooldownMonths": 6,
    "count": 3
  },
  "laterBucket": {
    "label": "Top Picks",
    "emoji": "✨",
    "cooldownMonths": 2,
    "count": 2
  },
  "batchCount": 3,
  "recommendationsPerBucket": 5,
  "excludedCategories": ["pdf", "epub"],
  "upload": {
    "method": "local"
  }
}

See Configuration Reference below for all options.

5. Fetch Initial Data

Fetch documents from your "Later" queue:

node utils/fetch_later_incremental.js
node utils/merge_chunks.js
node utils/build_scored_cache.js

(Optional) Fetch from a tagged collection:

node utils/fetch_tag_incremental.js "bankruptcy2025-12"
node utils/merge_tag_chunks.js "bankruptcy2025-12"
node utils/score_tag_documents.js "bankruptcy2025-12"

6. Generate Your First Recommendations

node daily_recs.js

This generates an HTML file at output/recommendations/YYYY-MM-DD.html and prints the file path.
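For reference, the dated filename can be derived as in the sketch below. `recommendationPath` is a hypothetical helper, and the use of the local calendar date (rather than UTC) is an assumption; the README only states the `YYYY-MM-DD` pattern.

```javascript
// Hypothetical helper: build the dated output path from the local calendar date,
// following the output/recommendations/YYYY-MM-DD.html pattern described above.
function recommendationPath(date = new Date()) {
  const yyyy = date.getFullYear();
  const mm = String(date.getMonth() + 1).padStart(2, "0"); // getMonth() is zero-based
  const dd = String(date.getDate()).padStart(2, "0");
  return `output/recommendations/${yyyy}-${mm}-${dd}.html`;
}
```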

Daily Usage

Once setup is complete, run one command each day:

node daily_recs.js

What this does:

  • Generates HTML recommendations from cached, scored data
  • Filters out previously recommended items
  • Creates multiple batches of recommendations (default: 3 batches with 2 "Later" + 3 tagged items each)
  • Shows batch 1 by default with a "Get More Recommendations" button to reveal additional batches
  • Uploads based on your configured method (or saves locally)
  • Prints the URL or file path

Time: < 10 seconds

The generated page includes:

  • Multiple batches of recommendations with progressive disclosure
  • Scores, matched themes, and reading time estimates
  • Archive buttons to remove articles from your queue without reading
  • Mobile-friendly, bookmarkable design
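The batching behavior described above can be pictured with a short sketch. This is an illustration under stated assumptions, not the code in `daily_recs.js`: `buildBatches`, the `{ id, score }` document shape, and the `seenIds` set are all hypothetical, and only the default counts (3 batches, 2 "Later" items, 3 tagged items) come from this README.

```javascript
// Hypothetical sketch of batch assembly (the repository's real logic may differ).
// `later` and `tagged` are arrays of { id, score } objects; `seenIds` is a Set of
// document IDs that were already recommended and should be skipped.
function buildBatches(later, tagged, seenIds, { batchCount = 3, laterCount = 2, tagCount = 3 } = {}) {
  // Drop previously recommended items, then order by score (highest first).
  // filter() returns a new array, so the caller's arrays are not mutated by sort().
  const rank = (docs) =>
    docs.filter((doc) => !seenIds.has(doc.id)).sort((a, b) => b.score - a.score);
  const laterRanked = rank(later);
  const taggedRanked = rank(tagged);

  // Batch i takes the i-th slice of each ranked list; later batches may be short or empty.
  const batches = [];
  for (let i = 0; i < batchCount; i++) {
    batches.push({
      later: laterRanked.slice(i * laterCount, (i + 1) * laterCount),
      tagged: taggedRanked.slice(i * tagCount, (i + 1) * tagCount),
    });
  }
  return batches;
}
```

Under this scheme batch 1 always holds the highest-scoring unseen items, which fits the idea of showing it first and revealing the rest on demand.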

Refreshing Cached Data

The system caches scored documents to keep daily runs fast. Refresh when you've added new articles:

Refresh "Later" Documents:

node utils/fetch_later_incremental.js
node utils/merge_chunks.js
node utils/build_scored_cache.js

Refresh Tagged Collection:

node utils/fetch_tag_incremental.js "your-tag-name"
node utils/merge_tag_chunks.js "your-tag-name"
node utils/score_tag_documents.js "your-tag-name"

Recommendation: Refresh weekly, or whenever you've added 50+ new articles.
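If you want a script to tell you when a refresh is due, a check along these lines would do. `isCacheStale` is a hypothetical helper; the 7-day default simply mirrors the weekly recommendation above and is not read from the repository's code.

```javascript
// Hypothetical helper: decide whether a cached file is due for a refresh.
// `lastRefreshMs` and `nowMs` are epoch timestamps in milliseconds.
function isCacheStale(lastRefreshMs, nowMs = Date.now(), refreshIntervalDays = 7) {
  const MS_PER_DAY = 24 * 60 * 60 * 1000;
  return nowMs - lastRefreshMs >= refreshIntervalDays * MS_PER_DAY;
}
```

A last-refresh timestamp could come from the cache file itself, for example `fs.statSync("data/documents_later.json").mtimeMs`.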

Configuration Reference

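Edit utils/settings.json to customize behavior. The // comments below are annotations only; standard JSON does not permit comments, so leave them out of your own file: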

{
  "tagBucket": {
    "enabled": true,              // Enable/disable tag-based recommendations
    "tag": "bankruptcy2025-12",   // Readwise tag to track
    "label": "Bankruptcy Readings", // Display name in HTML
    "emoji": "💼",                // Section emoji
    "cooldownMonths": 6,          // Months before re-recommending same item
    "count": 3                    // Items per batch from this bucket (default: 3)
  },
  "laterBucket": {
    "label": "Top Picks",         // Display name for "Later" recommendations (default: "Top Picks")
    "emoji": "✨",
    "cooldownMonths": 2,
    "count": 2                    // Items per batch from this bucket (default: 2)
  },
  "batchCount": 3,                // Number of batches to generate (default: 3)
  "recommendationsPerBucket": 5,  // Fallback count if bucket.count not specified
  "excludedCategories": ["pdf", "epub"], // Document types to skip
  "scoring": {
    "strongestThemeMultiplier": 2,  // Boost for strongest theme match
    "rawScoreMultiplier": 6,        // Overall score scaling
    "maxMatchedClusters": 5         // Max themes shown per document
  },
  "cache": {
    "refreshIntervalDays": 7,     // Days between full cache refreshes
    "maxIncrementalUpdates": 10   // Max incremental updates before full refresh
  },
  "upload": {
    "method": "local",            // local, github-pages, or scp
    "remoteFilename": "recs.html",
    "urlPath": "/recommendations/"
  }
}

Note: utils/settings.json is gitignored, so your settings stay private.
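The relationship between a bucket's `count` and the global `recommendationsPerBucket` fallback can be expressed as a small function. This is a sketch of the documented fallback rule only; `itemsPerBatch` is a hypothetical name, and how the real scripts resolve the value is an assumption.

```javascript
// Hypothetical sketch: resolve how many items a bucket contributes per batch.
// Uses bucket.count when it is a positive integer, otherwise the
// recommendationsPerBucket fallback from the same settings object.
function itemsPerBatch(bucket, settings) {
  const own = bucket && bucket.count;
  return Number.isInteger(own) && own > 0 ? own : settings.recommendationsPerBucket;
}
```

With the example settings above, `itemsPerBatch(settings.tagBucket, settings)` would resolve to 3, and a bucket without a `count` would fall back to 5.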

Hosting Options

Choose how to view your daily recommendations:

Option 1: Local File (Easiest)

Open the HTML file directly in your browser:

open output/recommendations/$(date +%Y-%m-%d).html

Pros: Zero setup, complete privacy, works offline
Cons: Not bookmarkable, need to find file each day

Set "upload.method": "local" in settings.json.

Option 2: GitHub Pages (Free Bookmarkable URL)

Host on GitHub Pages for a consistent, bookmarkable URL.

Initial Setup:

  1. Create a public GitHub repository (e.g., my-recommendations)

  2. Clone it locally and create the docs folder:

    git clone https://github.com/yourusername/my-recommendations.git
    cd my-recommendations
    mkdir docs
    echo "# My Recommendations" > docs/README.md
    git add docs/ && git commit -m "Initialize docs" && git push
    
  3. Enable GitHub Pages:

    • Go to repository Settings → Pages
    • Source: Deploy from a branch
    • Branch: main → /docs folder → Save
    • Your site: https://yourusername.github.io/my-recommendations/
  4. Configure settings.json:

    {
      "upload": {
        "method": "github-pages",
        "pagesRepoPath": "/absolute/path/to/my-recommendations",
        "pagesUrl": "https://yourusername.github.io/my-recommendations/"
      }
    }
    

Daily Publishing:

node daily_recs.js automatically copies the HTML to docs/index.html, commits, and pushes.

Pros: Free hosting, bookmarkable, archive buttons work from anywhere
Cons: Repository must be public (article titles visible), requires git push per update

Security Note: HTML contains article titles and reading times. If this is sensitive, use local hosting instead.

Option 3: Personal Web Server (SCP)

Upload to your own server via SCP.

Setup:

  1. Configure SCP in utils/config.json:

    {
      "sshHost": "your.server.com",
      "sshUser": "username",
      "remotePath": "/path/to/webroot/",
      "remoteFilename": "recs.html",
      "urlPath": "/recommendations/"
    }
    
  2. Set upload method in utils/settings.json:

    {
      "upload": {
        "method": "scp"
      }
    }
    
  3. Ensure SSH key authentication is configured (no password prompts).

Pros: Complete privacy, custom domain, full control
Cons: Requires web server with SSH access

Option 4: No Upload (Local Archive Only)

Generate HTML without uploading:

{
  "upload": {
    "method": "local"
  }
}

node daily_recs.js prints the local file path without uploading.

How Scoring Works

Hybrid Scoring Algorithm

For each document:

  1. Extract text: title + summary + author (lowercased)
  2. Match keywords: For each theme, count keyword matches
  3. Calculate theme contribution: (matches / total_keywords) × theme_weight
  4. Apply hybrid scoring: (strongest_theme × 2) + sum_of_other_contributions
  5. Scale to 1-10: Multiply by 6 and cap at 10

Why hybrid? This prioritizes depth (strong single-theme match) over breadth (many weak matches).

Example:

  • Document A: Strong "Meditation" match (0.045) + weak matches → Score: 2.5
  • Document B: Many weak matches across 5 themes → Score: 2.0

Document A wins because it strongly matches a high-weight theme, even though Document B matches more themes weakly.
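The five steps above can be sketched in code as follows. This is an illustration, not the contents of `scoring_engine.js`: the function name, the document shape, and two details the README leaves open (a keyword counts once if it appears anywhere as a substring, and a document with no matches scores 0) are assumptions.

```javascript
// Sketch of the hybrid scoring steps described above. Names are illustrative.
function scoreDocument(doc, clusters, { strongestThemeMultiplier = 2, rawScoreMultiplier = 6 } = {}) {
  // Step 1: build one lowercase text blob from title, summary, and author.
  const text = [doc.title, doc.summary, doc.author].filter(Boolean).join(" ").toLowerCase();

  // Steps 2-3: per-theme contribution = (matched keywords / total keywords) * weight.
  // Assumption: each keyword counts at most once, matched as a substring.
  const contributions = clusters
    .map((cluster) => {
      const matches = cluster.keywords.filter((kw) => text.includes(kw.toLowerCase())).length;
      return { theme: cluster.theme, value: (matches / cluster.keywords.length) * cluster.weight };
    })
    .filter((c) => c.value > 0)
    .sort((a, b) => b.value - a.value);

  // Assumption: a document that matches nothing gets a score of 0.
  if (contributions.length === 0) return { score: 0, themes: [] };

  // Step 4: the strongest theme is multiplied, the remaining themes are summed as-is.
  const [strongest, ...others] = contributions;
  const raw = strongest.value * strongestThemeMultiplier + others.reduce((sum, c) => sum + c.value, 0);

  // Step 5: scale and cap at 10.
  return { score: Math.min(10, raw * rawScoreMultiplier), themes: contributions.map((c) => c.theme) };
}
```

Because only the strongest contribution is multiplied, a document that fully matches one heavily weighted theme outranks one that touches several themes lightly, which is the depth-over-breadth behavior described above.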

Automating with Cron

Run recommendations automatically each morning using cron.

Setup Instructions

  1. Find your node path:

    which node
    # Example output: /usr/local/bin/node or /opt/homebrew/bin/node
    
  2. Get your project's absolute path:

    cd /path/to/Readwise-recommender
    pwd
    # Example output: /Users/yourusername/Code/Readwise-recommender
    
  3. Set READWISE_TOKEN permanently:

    Add to your shell profile (~/.bashrc, ~/.zshrc, or ~/.bash_profile):

    export READWISE_TOKEN="your-api-token-here"
    

    Then reload:

    source ~/.zshrc  # or ~/.bashrc
    
  4. Edit crontab:

    crontab -e
    
  5. Add cron job:

    For 6:00 AM daily (replace paths with your actual paths):

    0 6 * * * cd /Users/yourusername/Code/Readwise-recommender && /usr/local/bin/node daily_recs.js >> /tmp/daily_recs.log 2>&1
    

    Cron time format: minute hour day month weekday

    • 0 6 * * * = 6:00 AM daily
    • 0 9 * * * = 9:00 AM daily
    • 30 7 * * * = 7:30 AM daily
  6. Verify cron is running:

    crontab -l  # List your cron jobs
    
  7. Check logs:

    tail -f /tmp/daily_recs.log  # Watch output in real-time
    cat /tmp/daily_recs.log      # View full log
    

Common Issues

Cron job not running?

  • Verify cron service is running: sudo launchctl list | grep cron (macOS)
  • Check system logs: grep CRON /var/log/syslog (Linux)
  • Ensure full paths are used (no ~ shorthand)

Environment variables not available?

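  • Cron runs with a minimal environment and does not read your shell profile, so an export in ~/.bashrc or ~/.zshrc is not visible to the job. Define the variable in the crontab itself (a READWISE_TOKEN=your-api-token-here line above the job) or set it inline at the start of the command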
  • Test the exact cron command manually: cd /path && /usr/local/bin/node daily_recs.js

Wrong node version?

  • Cron may use system node instead of nvm/asdf node
  • Use full path from which node to ensure correct version

Alternative: Using launchd (macOS)

For more reliable scheduling on macOS, consider using launchd instead of cron. Create ~/Library/LaunchAgents/com.user.readwise-recs.plist:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.user.readwise-recs</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/node</string>
        <string>/Users/yourusername/Code/Readwise-recommender/daily_recs.js</string>
    </array>
    <key>EnvironmentVariables</key>
    <dict>
        <key>READWISE_TOKEN</key>
        <string>your-api-token-here</string>
    </dict>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>6</integer>
        <key>Minute</key>
        <integer>0</integer>
    </dict>
    <key>StandardOutPath</key>
    <string>/tmp/readwise-recs.log</string>
    <key>StandardErrorPath</key>
    <string>/tmp/readwise-recs-error.log</string>
</dict>
</plist>

Load it:

launchctl load ~/Library/LaunchAgents/com.user.readwise-recs.plist

Check status:

launchctl list | grep readwise-recs

Data Schemas

Interest Profile (data/interest_profile.json)

{
  "version": "1.0",
  "metadata": {
    "created_at": "2026-01-29T10:00:00Z",
    "cluster_count": 25
  },
  "clusters": [
    {
      "id": "cluster_001",
      "theme": "Decision Making & Cognitive Frameworks",
      "keywords": ["decision-making", "intuition", "fear", "reason", "biases"],
      "weight": 0.15,
      "description": "Frameworks for making better decisions..."
    }
  ]
}

Recommendation Log (data/recommendation_log.json)

{
  "version": "1.0",
  "recommendations": [
    {
      "date": "2026-01-29",
      "bucket_1_later": [
        {
          "document_id": "01kdnmypxhaphr6be3vv7xkgn4",
          "title": "How Can I Stop Being So Afraid of Changing My Life?",
          "score": 10.0,
          "themes": ["Decision Making", "Overcoming Fear"]
        }
      ],
      "bucket_2_bankruptcy": [...],
      "bonus_doc_ids": ["01xyz...", "01abc..."],
      "bonus_buckets": {
        "bucket_1_later": ["01xyz..."],
        "bucket_2_bankruptcy": ["01abc..."]
      }
    }
  ],
  "recommended_doc_ids": ["01kdnmypxhaphr6be3vv7xkgn4", ...]
}
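To illustrate how a log in this shape could drive the `cooldownMonths` setting, the sketch below collects the IDs recommended within the cooldown window for one bucket. `idsInCooldown` is a hypothetical function, and whether the real script filters per bucket in this way is an assumption; only the log structure is taken from the schema above.

```javascript
// Hypothetical sketch: collect document IDs recommended within the cooldown window.
// `log` follows the recommendation_log.json shape shown above, `bucketKey` is a key
// such as "bucket_1_later", and `todayIso` is a YYYY-MM-DD string.
function idsInCooldown(log, bucketKey, cooldownMonths, todayIso) {
  const [y, m, d] = todayIso.split("-").map(Number);
  // Date.UTC normalizes out-of-range months, so subtracting months works across years.
  const cutoffMs = Date.UTC(y, m - 1 - cooldownMonths, d);
  const ids = new Set();
  for (const entry of log.recommendations) {
    // Entries older than the cutoff are outside the window and may be recommended again.
    if (Date.parse(`${entry.date}T00:00:00Z`) < cutoffMs) continue;
    for (const item of entry[bucketKey] || []) ids.add(item.document_id);
  }
  return ids;
}
```

The resulting set could then be used to exclude candidates before batches are assembled.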

Troubleshooting

No high-scoring recommendations?

  • Check your interest profile (data/interest_profile.json) and ensure keywords match how articles describe topics
  • Lower the threshold or refine keyword lists based on article titles in your queue

Recommendations not aligned with interests?

  • Review cluster weights in data/interest_profile.json
  • Add missing keywords that appear in article titles you're interested in
  • If using Claude Code, regenerate the profile after accumulating new highlights

Want more variety?

  • Increase batchCount in settings.json to generate more batches (e.g., change from 3 to 5)
  • Adjust laterBucket.count and tagBucket.count for more items per batch
  • Use the "Get More Recommendations" button to reveal additional batches beyond the first set

Archive buttons not working?

  • Archive buttons require READWISE_TOKEN in browser localStorage
  • Open browser console and run: localStorage.setItem('READWISE_TOKEN', 'your-token-here')
  • This is per-origin (GitHub Pages users need to set it once per browser)

Data fetch failing?

Project Structure

Readwise-recommender/
├── data/                          # Your data (gitignored)
│   ├── highlights_raw.json       # Cached highlights (optional, for profile generation)
│   ├── interest_profile.json     # Your thematic clusters
│   ├── recommendation_log.json   # Tracking previously recommended items
│   ├── documents_later.json      # Scored candidates from "Later" queue
│   └── documents_bankruptcy.json # Scored candidates from bankruptcy tags
│
├── output/
│   └── recommendations/           # Daily HTML files (local archive)
│       └── YYYY-MM-DD.html
│
├── utils/
│   ├── fetch_later_incremental.js    # Fetch "Later" documents
│   ├── fetch_tag_incremental.js      # Fetch tagged documents
│   ├── merge_chunks.js               # Merge fetched data
│   ├── build_scored_cache.js         # Score documents
│   ├── upload_recommendations.js     # SCP upload helper
│   ├── upload_github_pages.js        # GitHub Pages upload helper
│   ├── settings.example.json         # Configuration template
│   ├── settings.json                 # Your settings (gitignored)
│   ├── config.json.template          # SCP config template
│   └── config.json                   # SCP credentials (gitignored)
│
├── daily_recs.js                      # Main daily script
├── scoring_engine.js                  # Scoring algorithm
└── README.md                          # This file

Future Enhancements

  • Feedback Loop: Track read vs. skipped items and adjust theme weights automatically
  • Diversity Scoring: Ensure recommendations span multiple themes by default
  • Engagement Analysis: "What themes did I engage with most this month?"
  • Conversational Refinement: Natural language profile updates ("I want less philosophy, more practical advice")
  • Weekly Summaries: Reading pattern analysis and theme trends

Questions or feedback? Open an issue at https://github.com/yourusername/Readwise-recommender/issues
