🐛 Problem Statement
I am integrating the PageIndex Python SDK into a production application and have successfully submitted documents using submit_document(). However, when polling for status via:
status = pi_client.get_document(doc_id)["status"]
The status remains "processing" for an extended period. The SDK provides no built-in mechanism to wait for completion; the only current option is a blocking while loop with time.sleep(), which is unsuitable for production-grade async backends.
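As a stopgap, the blocking call can be pushed onto a worker thread so the event loop stays free. The following is only a minimal sketch under assumptions: `poll_until_done` is a hypothetical helper name, not part of the SDK; it assumes `get_document()` is synchronous and returns a dict with a `"status"` key as shown above; and the terminal status values (`"completed"`, `"failed"`) are guesses that should be checked against the actual API.

```python
import asyncio


async def poll_until_done(client, doc_id, poll_interval=5.0, timeout=300.0,
                          terminal_statuses=("completed", "failed")):
    """Poll client.get_document() without blocking the event loop.

    Hypothetical helper: the status names are assumptions, not SDK facts.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        # Run the synchronous SDK call in a worker thread (Python 3.9+).
        doc = await asyncio.to_thread(client.get_document, doc_id)
        if doc["status"] in terminal_statuses:
            return doc
        if loop.time() >= deadline:
            raise TimeoutError(
                f"document {doc_id} still {doc['status']!r} after {timeout}s"
            )
        # Yield to other tasks between polls instead of time.sleep().
        await asyncio.sleep(poll_interval)
```

Inside a FastAPI handler or any other coroutine this would be called as `doc = await poll_until_done(pi_client, doc_id)`. It still polls, so it reduces the blocking problem rather than removing the need for native support.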
💡 Proposed Improvements
1. ⚡ Async / Await Support
Provide an async-native client so status checks are non-blocking and integrate cleanly with FastAPI, asyncio, and other modern frameworks:
# Non-blocking, compatible with FastAPI / asyncio
status = await pi_client.get_document(doc_id)
2. 🔁 Built-in wait_until_completed() Helper
A high-level helper that handles polling internally, with configurable timeout and retry interval:
result = pi_client.wait_until_completed(
    doc_id,
    timeout=300,      # seconds
    poll_interval=5   # seconds
)
3. 🔔 Webhook / Callback Support
Supporting webhook notifications when processing completes would eliminate polling entirely — ideal for event-driven architectures:
pi_client.submit_document(
    file_path="report.pdf",
    webhook_url="https://myapp.com/callbacks/pageindex"
)
# PageIndex POSTs to the URL when processing completes
# Payload: { "doc_id": "...", "status": "completed" }
✅ Expected Benefit
These improvements would significantly enhance usability for production backend systems — especially those built on async frameworks. They would:
- Reduce boilerplate polling code
- Prevent resource waste from tight polling loops
- Align the SDK with modern Python ecosystem expectations
Is there already a recommended best practice for handling long-running document processing tasks with the current SDK? Happy to contribute a PR if the team is open to it.
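To make the PR offer concrete, here is a rough sketch of what the synchronous `wait_until_completed()` helper from item 2 might look like if layered on the existing `get_document()` call. Everything beyond `get_document()` and the `"status"` key is an assumption for illustration: the `DocumentProcessingTimeout` exception, the terminal status values, and the injectable `sleep` parameter (included only to keep the helper testable) are hypothetical and not part of the current SDK.

```python
import time


class DocumentProcessingTimeout(Exception):
    """Hypothetical exception raised when the timeout elapses."""


def wait_until_completed(client, doc_id, timeout=300.0, poll_interval=5.0,
                         terminal_statuses=("completed", "failed"),
                         sleep=time.sleep):
    """Block until the document reaches a terminal status or timeout expires.

    Sketch only: the status names are assumed, not taken from the SDK.
    """
    deadline = time.monotonic() + timeout
    while True:
        doc = client.get_document(doc_id)
        if doc["status"] in terminal_statuses:
            return doc
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise DocumentProcessingTimeout(
                f"document {doc_id} still {doc['status']!r} after {timeout}s"
            )
        # Never sleep past the deadline.
        sleep(min(poll_interval, remaining))
```

Returning the full document dict rather than just the status lets callers tell a successful result from a failed one without a second request.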