Improve node summary prompt and add timestamped output filenames #124

Open
kasatgauravrisa wants to merge 1 commit into VectifyAI:main from kasatgauravrisa:improve-node-summary-and-timestamped-output

@kasatgauravrisa

Summary

  • Improved node summary generation: Updated the prompt in generate_node_summary to include the section title and instruct the LLM to focus only on that specific section's content, rather than treating the entire page text generically. This produces more accurate, scoped summaries.
  • Timestamped output filenames: Added a timestamp (YYYYMMDD_HHMMSS) to the output JSON filename in run_pageindex.py, so multiple runs on the same PDF don't overwrite previous results.

Motivation

The original summary prompt passed the full page text without any section context, so a summary could drift into content from other sections on the same page. Anchoring the prompt to the node's title keeps the generated description scoped to that section.
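A minimal sketch of what a title-anchored prompt could look like. The helper name `build_node_summary_prompt`, its parameters, and the prompt wording are all hypothetical and for illustration only; this is not the actual prompt text used in `generate_node_summary`.

```python
def build_node_summary_prompt(title: str, text: str) -> str:
    """Hypothetical helper: build a summary prompt scoped to one section.

    `title` is the node's section title and `text` is the page text that
    contains it. The wording below is an assumption, not the PR's prompt.
    """
    return (
        "You are given the text of a document page and the title of one "
        f'section on it: "{title}".\n'
        "Describe only the main points covered by that section. "
        "Ignore content on the page that belongs to other sections.\n\n"
        f"Page text:\n{text}\n\n"
        "Return only the description."
    )


# Example: the page also mentions another section, which the prompt asks the model to ignore.
prompt = build_node_summary_prompt(
    title="2.1 Data Collection",
    text="2.1 Data Collection\nWe sampled 500 reports.\n2.2 Evaluation\nWe measured recall.",
)
```

The only structural change compared with a generic prompt is that the title appears explicitly and the instruction restricts the summary to it; the page text is still passed in full.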

Timestamped filenames are a simple quality-of-life improvement for users who iterate on the same document.
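The filename change can be sketched as follows. This is an illustration under stated assumptions: the helper name `timestamped_output_path`, the `results` directory, and the `_structure` infix are hypothetical; only the `YYYYMMDD_HHMMSS` format comes from the summary above.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def timestamped_output_path(
    pdf_path: str,
    output_dir: str = "results",      # assumed output directory
    now: Optional[datetime] = None,   # injectable clock, useful for testing
) -> Path:
    """Hypothetical helper: derive a per-run JSON output path from a PDF path."""
    moment = now if now is not None else datetime.now()
    stamp = moment.strftime("%Y%m%d_%H%M%S")  # YYYYMMDD_HHMMSS
    stem = Path(pdf_path).stem                # PDF filename without directory or extension
    return Path(output_dir) / f"{stem}_structure_{stamp}.json"


# Example with a fixed clock so the result is deterministic.
out = timestamped_output_path("docs/report.pdf", now=datetime(2025, 1, 2, 3, 4, 5))
print(out.name)  # report_structure_20250102_030405.json
```

Because the timestamp has one-second resolution, two runs on the same PDF that start within the same second would still map to the same filename.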

Files Changed

  • pageindex/utils.py — Updated prompt in generate_node_summary()
  • run_pageindex.py — Added datetime-based suffix to output filename

Made with Cursor

