…not be treated as dependable software.
hlbmtc left a comment
Looking great! A couple of small nits:
@@ -0,0 +1,189 @@
"""Main content panel showing resolution status and live agent feed."""

from __future__ import annotations
Let's remove this. The current Python version already supports annotations out of the box.
if isinstance(question, BinaryQuestion):
    return await self._resolve_binary(question)
else:
    return NotImplemented
I think we should change this to raise, so that an exception is triggered explicitly.
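A minimal sketch of the suggested change, assuming stand-in `BinaryQuestion` and `NumericQuestion` classes and a simplified `resolve_question` (none of these are the PR's actual definitions). `NotImplemented` is a sentinel intended for binary operator methods; returned from an ordinary method it is just a value the caller may pass along unnoticed, whereas raising `NotImplementedError` fails at the call site.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class BinaryQuestion:  # hypothetical stand-in for the real question type
    text: str


@dataclass
class NumericQuestion:  # hypothetical type the resolver does not support
    text: str


class SketchResolver:
    async def _resolve_binary(self, question: BinaryQuestion) -> bool:
        # Placeholder: the real resolver would run research and a model call.
        return True

    async def resolve_question(self, question: object) -> bool:
        if isinstance(question, BinaryQuestion):
            return await self._resolve_binary(question)
        # Raise instead of returning the NotImplemented sentinel.
        raise NotImplementedError(
            f"Unsupported question type: {type(question).__name__}"
        )


def try_resolve(question: object) -> str:
    """Run the resolver and report either the result or the error."""
    try:
        return str(asyncio.run(SketchResolver().resolve_question(question)))
    except NotImplementedError as exc:
        return f"error: {exc}"
```

With this shape, an unsupported question type surfaces as an exception the caller has to handle, rather than as a return value that looks like a resolution.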
) -> Optional[BinaryResolution]:

    # Rephrase question if its time context has passed
    question = await self._rephrase_question_if_needed(question)
Interesting. Isn’t Sonnet good enough to compose search queries without this extra step?
)
searcher = AskNewsSearcher()
if cutoff_date is not None:
    return await searcher.get_formatted_news_before_date_async(
Quick question: did you test this with a cutoff_date older than 48h? I remember AskNews had a bug where it returned [] when using historical=True together with end_timestamp. I might be mistaken, but I recall seeing something like that before
        self.binary_threshold = binary_threshold
        self.mc_threshold = mc_threshold

    @abstractmethod
Nit: remove this. resolve_question is actually a concrete subclass's implementation here.
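To illustrate why the decorator matters, here is a small sketch, assuming the class hierarchy derives from `abc.ABC` (the class names below are hypothetical, not the PR's). A method that keeps `@abstractmethod` stays abstract even when it has a full body, so the class cannot be instantiated.

```python
from abc import ABC, abstractmethod


class BaseResolver(ABC):  # hypothetical base class
    @abstractmethod
    def resolve_question(self, question: str) -> str:
        ...


class DecoratedConcrete(BaseResolver):
    # Still counted as abstract, even though the body is a real implementation.
    @abstractmethod
    def resolve_question(self, question: str) -> str:
        return f"resolved: {question}"


class PlainConcrete(BaseResolver):
    # Without the decorator, the override satisfies the abstract contract.
    def resolve_question(self, question: str) -> str:
        return f"resolved: {question}"


def can_instantiate(cls: type) -> bool:
    """Return True if the class can be constructed with no arguments."""
    try:
        cls()
        return True
    except TypeError:
        return False
```

Here `can_instantiate(DecoratedConcrete)` is false and `can_instantiate(PlainConcrete)` is true, which is the practical effect of dropping the decorator from a concrete implementation.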
    Supports concurrent resolution of questions when assessing resolvers.
    """

    def __init__(self, resolver: AutoResolver, allowed_types: list[QuestionBasicType], questions: list[int | str] = [], tournaments: list[int | str] = [], max_concurrency: int = 3):
Do we flag uncertain cases for human review? Do you think we should also add certainty level for calibration purposes?
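One possible shape for this, as a sketch only: the `ResolutionResult` type, its `confidence` field, and the 0.8 threshold are all assumptions for illustration and do not come from the PR. Results below the threshold, or without a resolution, go to a human review queue, and the stored confidences can later be scored against the true outcomes with a Brier score.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResolutionResult:  # hypothetical result type
    question_id: int
    resolution: Optional[bool]  # None means the resolver could not decide
    confidence: float  # assumed self-reported confidence in [0, 1]

    def needs_human_review(self, threshold: float = 0.8) -> bool:
        return self.resolution is None or self.confidence < threshold


def split_for_review(results, threshold: float = 0.8):
    """Partition results into (auto_accepted, flagged_for_review)."""
    auto, review = [], []
    for result in results:
        bucket = review if result.needs_human_review(threshold) else auto
        bucket.append(result)
    return auto, review


def brier_score(pairs) -> float:
    """Mean squared error between confidence and correctness (1.0 or 0.0).

    `pairs` is an iterable of (confidence, was_correct) tuples; lower is better.
    """
    pairs = list(pairs)
    return sum((conf - float(ok)) ** 2 for conf, ok in pairs) / len(pairs)
```

A fixed threshold is the simplest policy; the calibration score is what would tell us whether that threshold is trustworthy.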
1. **Current Status**: What is the current state of affairs related to this question?
2. **Resolution Criteria**: Have the resolution criteria been met?
3. **Timeline Check**: Consider the scheduled resolution date and current date
Should we also pass the current date into the prompt, just in case?
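A sketch of what that could look like, assuming a hypothetical `PROMPT_TEMPLATE` and `build_resolver_prompt` helper (the PR's real prompt is longer and is not reproduced here). The date is injected explicitly because the model has no reliable notion of "today" on its own.

```python
from datetime import date, datetime, timezone
from typing import Optional

# Hypothetical template; only the date lines are the point of this sketch.
PROMPT_TEMPLATE = (
    "Today's date (UTC): {today}\n"
    "Scheduled resolution date: {scheduled}\n\n"
    "Question: {question}\n"
)


def build_resolver_prompt(
    question: str,
    scheduled: date,
    today: Optional[date] = None,
) -> str:
    """Render the prompt, defaulting `today` to the current UTC date."""
    if today is None:
        today = datetime.now(timezone.utc).date()
    return PROMPT_TEMPLATE.format(
        today=today.isoformat(),
        scheduled=scheduled.isoformat(),
        question=question,
    )
```

Accepting `today` as an optional parameter also keeps the prompt reproducible when re-running the resolver against historical questions.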
model_for_resolver: str = "openrouter/anthropic/claude-sonnet-4.6",
model_for_output_structure: str = "openrouter/anthropic/claude-sonnet-4.6",
model_for_researcher: str = "openrouter/anthropic/claude-sonnet-4.6",
model_for_rephraser: str = "openrouter/anthropic/claude-sonnet-4.6",
A small nit: maybe we can switch it to something lighter, e.g. Haiku?
# Handoff

When you've gathered sufficient information, hand off to the resolver
Hm, I think handoff is not defined for the research agent
Summary
Initial implementation of the auto-resolver, along with a basic TUI for interacting with the resolution output. The resolver uses the Agents SDK and follows the process:
What works well:
What needs work:
Supporting evidence
The following image shows the result of running on 60 random questions from the fall AIB tournament.
The following image shows the results of running on all questions currently in the spring AIB tournament. Note that ~67 questions were marked as not yet automatically resolvable because their resolution date had not passed.
The following images depict two instances where the auto-resolver picked up on an event that is not yet reflected on Metaculus for the spring AIB tournament.
(Backing validation)