
Async support, built-in polling helper & webhook callbacks for document processing #122

@shashank-518


🐛 Problem Statement

I am integrating the PageIndex Python SDK into a production application and have successfully submitted documents using submit_document(). However, when polling for status via:

status = pi_client.get_document(doc_id)["status"]

The status remains "processing" for an extended period, and the SDK provides no built-in way to wait for completion. The only option today is a blocking while loop with time.sleep(). That is unsuitable for production async backends, because time.sleep() blocks the entire event loop and stalls every other request the server is handling.


💡 Proposed Improvements

1. ⚡ Async / Await Support

Provide an async-native client so status checks are non-blocking and integrate cleanly with FastAPI, asyncio, and other modern frameworks:

# Non-blocking, compatible with FastAPI / asyncio
doc = await pi_client.get_document(doc_id)
status = doc["status"]
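Until a native async client exists, one stopgap with the current synchronous SDK is to run each blocking call in a worker thread with `asyncio.to_thread` (Python 3.9+). A minimal sketch follows. `AsyncClientWrapper` is a hypothetical name, not part of the SDK. It assumes `get_document()` returns a dict with a `"status"` key, as in the snippet at the top of this issue.

```python
import asyncio
from typing import Any


class AsyncClientWrapper:
    """Hypothetical async facade over the current synchronous PageIndex client.

    Each call runs the blocking SDK method in a worker thread via
    asyncio.to_thread, so the event loop keeps serving other tasks.
    """

    def __init__(self, sync_client: Any) -> None:
        self._client = sync_client

    async def get_document(self, doc_id: str) -> dict:
        # Offload the blocking call; returns whatever the sync client returns.
        return await asyncio.to_thread(self._client.get_document, doc_id)
```

Usage: `status = (await AsyncClientWrapper(pi_client).get_document(doc_id))["status"]`. Each in-flight call still occupies a thread from the event loop's default executor. A native async client would avoid that cost.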

2. 🔁 Built-in wait_until_completed() Helper

A high-level helper that handles polling internally, with a configurable timeout and poll interval:

result = pi_client.wait_until_completed(
    doc_id,
    timeout=300,       # seconds
    poll_interval=5    # seconds
)
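The proposed semantics can be sketched using only the existing `get_document()` call. The helper polls at a fixed interval against a monotonic deadline and raises on timeout instead of looping forever. This is a hypothetical standalone function, not the SDK's API. Treating `"completed"` and `"failed"` as the terminal statuses is an assumption, because only `"processing"` and `"completed"` appear in this issue.

```python
import time
from typing import Any

# Assumption: "completed" comes from this issue; "failed" is a guess.
TERMINAL_STATUSES = ("completed", "failed")


def wait_until_completed(
    client: Any,
    doc_id: str,
    timeout: float = 300.0,
    poll_interval: float = 5.0,
) -> dict:
    """Block until doc_id reaches a terminal status or `timeout` seconds pass.

    Hypothetical stand-in for the proposed pi_client.wait_until_completed();
    `client` is anything with a get_document(doc_id) -> dict method.
    """
    if poll_interval <= 0:
        raise ValueError("poll_interval must be positive")
    deadline = time.monotonic() + timeout
    while True:
        doc = client.get_document(doc_id)
        if doc["status"] in TERMINAL_STATUSES:
            return doc
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(
                f"document {doc_id!r} still {doc['status']!r} after {timeout}s"
            )
        # Sleep for the poll interval, but never past the deadline.
        time.sleep(min(poll_interval, remaining))
```

Usage: `doc = wait_until_completed(pi_client, doc_id, timeout=300, poll_interval=5)`. Using `time.monotonic()` rather than wall-clock time keeps the deadline correct if the system clock is adjusted mid-wait.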

3. 🔔 Webhook / Callback Support

Webhook notifications on completion would eliminate polling entirely, which is ideal for event-driven architectures:

pi_client.submit_document(
    file_path="report.pdf",
    webhook_url="https://myapp.com/callbacks/pageindex"
)
# PageIndex POSTs to the URL when processing completes
# Payload: { "doc_id": "...", "status": "completed" }
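On the receiving side, the proposed callback needs only a small endpoint. A minimal sketch follows, using the standard library's `http.server` and assuming the payload shape shown above (`doc_id` and `status`). The `/callbacks/pageindex` path, `parse_callback`, and `on_document_done` are hypothetical. A production endpoint would also need to verify the sender, which this proposal does not yet specify.

```python
import json
from http.server import BaseHTTPRequestHandler


def parse_callback(body: bytes) -> tuple[str, str]:
    """Extract (doc_id, status) from the proposed webhook payload."""
    payload = json.loads(body)
    return payload["doc_id"], payload["status"]


def on_document_done(doc_id: str, status: str) -> None:
    # Hypothetical hook: enqueue follow-up work here instead of polling.
    print(f"{doc_id} -> {status}")


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/callbacks/pageindex":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            doc_id, status = parse_callback(self.rfile.read(length))
        except (ValueError, KeyError, TypeError):
            # Invalid JSON (ValueError), missing keys, or a non-object payload.
            self.send_error(400, "malformed payload")
            return
        on_document_done(doc_id, status)
        # Acknowledge immediately with an empty 204 response.
        self.send_response(204)
        self.end_headers()
```

To run it: `HTTPServer(("", 8000), CallbackHandler).serve_forever()` (import `HTTPServer` from `http.server`). The handler replies quickly, so any slow follow-up work belongs behind `on_document_done` rather than inside the request.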

✅ Expected Benefit

These improvements would make the SDK considerably easier to use in production backends, especially those built on async frameworks. They would:

  • Reduce boilerplate polling code
  • Prevent resource waste from tight polling loops
  • Align the SDK with modern Python ecosystem expectations

Is there already a recommended best practice for handling long-running document processing tasks with the current SDK? Happy to contribute a PR if the team is open to it.
