Conversation
| @@ -0,0 +1,87 @@ | |||
| name: scientific_obfuscation_converter | |||
There was a problem hiding this comment.
pr title says "translation" - can we standardize to one of them?
|
|
||
| ## Mode-specific guidelines: | ||
|
|
||
| {% if mode == "academic" %} |
There was a problem hiding this comment.
I am confused. This includes only a specific section depending on the mode BUT at the end there's a combined mode. How will it know all the modes if we exclude most of them? Examples below also include all of them.
There was a problem hiding this comment.
Also not sure if I understand the combined mode. In the class it's explicitly listed as a mode, but here it's a catchall, so "foobar" would resolve to a combined prompt. I feel like it would be better to just drop "combined" and refer to the default/wildcard as a combined mode
| Raises: | ||
| ValueError: If an invalid mode is provided. | ||
| """ | ||
| valid_modes = ("academic", "technical", "smiles", "research", "reaction", "combined") |
There was a problem hiding this comment.
is there an easier way to check for this given that it's a literal that's defined above
There was a problem hiding this comment.
I suggested one below by just attaching the valid modes to the class itself, but it's a nit so feel free to disregard
There was a problem hiding this comment.
I suppose you could have both.
from typing import Literal, get_args
ObfuscationMode = Literal[
"academic", "technical", "smiles", "research", "reaction", "combined"
]
OBFUSCATION_MODES = set(get_args(ObfuscationMode))
def is_valid_mode(value: str) -> bool:
return value in OBFUSCATION_MODES
There was a problem hiding this comment.
Pull request overview
Adds a new LLM-based prompt converter that rewrites prompts into “scientific/technical” phrasing across multiple modes, along with the seed prompt template, documentation wiring, and unit tests.
Changes:
- Introduces
ScientificObfuscationConverter(mode-driven) backed by a new YAML seed prompt template. - Exposes the converter via
pyrit.prompt_converterexports and API docs. - Adds unit tests and an example usage snippet in the converters documentation notebook.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
pyrit/prompt_converter/scientific_obfuscation_converter.py |
Implements the new LLM-based converter and identifier construction. |
pyrit/datasets/prompt_converters/scientific_obfuscation_converter.yaml |
Adds the mode-parameterized system prompt template used by the converter. |
pyrit/prompt_converter/__init__.py |
Exports the new converter from the prompt_converter package. |
tests/unit/converter/test_scientific_obfuscation_converter.py |
Adds unit tests validating initialization, mode validation, and conversion behavior. |
doc/code/converters/1_text_to_text_converters.py |
Documents example usage of the new converter in the text-to-text converters notebook source. |
doc/code/converters/1_text_to_text_converters.ipynb |
Adds the corresponding notebook cell content for the new converter example. |
doc/api.rst |
Adds the converter to the API reference list. |
Comments suppressed due to low confidence (2)
pyrit/prompt_converter/scientific_obfuscation_converter.py:23
- The PR title/description refer to a "Scientific Translation Converter", but the implementation and dataset are named "ScientificObfuscationConverter" / "scientific_obfuscation_converter". If this is intended to be a translation-style converter, consider aligning the naming (or update the PR description) to avoid confusion for API consumers and documentation readers.
class ScientificObfuscationConverter(LLMGenericTextConverter):
"""
Uses an LLM to transform simple or direct prompts into
pyrit/prompt_converter/scientific_obfuscation_converter.py:67
valid_modesduplicates the allowed values already defined inObfuscationMode. To avoid the tuple and the type alias drifting out of sync, derive the runtime list from the type (e.g.,typing.get_args(ObfuscationMode)) or centralize the allowed modes as a single constant reused for both validation and typing.
valid_modes = ("academic", "technical", "smiles", "research", "reaction", "combined")
if mode not in valid_modes:
raise ValueError(f"Invalid mode '{mode}'. Must be one of: {valid_modes}")
|
|
||
| system_arg = mock_target.set_system_prompt.call_args[1]["system_prompt"] | ||
| assert isinstance(system_arg, str) | ||
| assert len(system_arg) > 0 |
There was a problem hiding this comment.
The test only asserts the system prompt is non-empty, which won't catch regressions where the wrong mode/template branch is rendered. Consider asserting for a technical-mode specific marker from the template (e.g., the rendered "Technical Mode" heading or another unique phrase).
| assert len(system_arg) > 0 | |
| assert "technical" in system_arg.lower() |
|
|
||
| system_arg = mock_target.set_system_prompt.call_args[1]["system_prompt"] | ||
| assert isinstance(system_arg, str) | ||
| assert len(system_arg) > 0 |
There was a problem hiding this comment.
The test only asserts the system prompt is non-empty, which won't catch regressions where the converter renders the wrong mode branch. Consider asserting for a combined-mode specific marker from the template (e.g., the rendered "Combined Mode" heading or another unique phrase).
| assert len(system_arg) > 0 | |
| assert "combined" in system_arg.lower() |
|
|
||
| system_arg = mock_target.set_system_prompt.call_args[1]["system_prompt"] | ||
| assert isinstance(system_arg, str) | ||
| assert "academic" in system_arg.lower() or len(system_arg) > 0 |
There was a problem hiding this comment.
This assertion is effectively only checking that the system prompt is non-empty: "academic" in system_arg.lower() or len(system_arg) > 0 will pass for any non-empty prompt even if the mode-specific rendering is wrong. Consider asserting for an academic-mode specific marker from the template (e.g., the rendered "Academic Mode" section) so the test will fail if the wrong mode is used.
| assert "academic" in system_arg.lower() or len(system_arg) > 0 | |
| assert "academic mode" in system_arg.lower() |
| - ``combined``: Use all techniques together | ||
|
|
||
| """ | ||
|
|
There was a problem hiding this comment.
| MODES: Tuple[str] = ("academic", "technical", "smiles", "research", "reaction", "combined") | |
| ValueError: If an invalid mode is provided. | ||
| """ | ||
| valid_modes = ("academic", "technical", "smiles", "research", "reaction", "combined") | ||
| if mode not in valid_modes: |
There was a problem hiding this comment.
| if mode not in valid_modes: | |
| if mode not in self.MODES: |
|
|
||
| import logging | ||
| import pathlib | ||
| from typing import Literal, Optional |
There was a problem hiding this comment.
| from typing import Literal, Optional | |
| from typing import Optional, Tuple |
| logger = logging.getLogger(__name__) | ||
|
|
||
| # Supported obfuscation modes | ||
| ObfuscationMode = Literal["academic", "technical", "smiles", "research", "reaction", "combined"] |
There was a problem hiding this comment.
| ObfuscationMode = Literal["academic", "technical", "smiles", "research", "reaction", "combined"] |
| self, | ||
| *, | ||
| converter_target: PromptChatTarget = REQUIRED_VALUE, # type: ignore[assignment] | ||
| mode: ObfuscationMode = "combined", |
There was a problem hiding this comment.
| mode: ObfuscationMode = "combined", | |
| mode: str = "combined", |
| Args: | ||
| converter_target (PromptChatTarget): The LLM target to perform the conversion. | ||
| Can be omitted if a default has been configured via PyRIT initialization. | ||
| mode (ObfuscationMode): The obfuscation mode to use. Options are: |
There was a problem hiding this comment.
| mode (ObfuscationMode): The obfuscation mode to use. Options are: | |
| mode (str): The obfuscation mode to use. Options are: |
| Raises: | ||
| ValueError: If an invalid mode is provided. | ||
| """ | ||
| valid_modes = ("academic", "technical", "smiles", "research", "reaction", "combined") |
There was a problem hiding this comment.
| valid_modes = ("academic", "technical", "smiles", "research", "reaction", "combined") |
| """ | ||
| valid_modes = ("academic", "technical", "smiles", "research", "reaction", "combined") | ||
| if mode not in valid_modes: | ||
| raise ValueError(f"Invalid mode '{mode}'. Must be one of: {valid_modes}") |
There was a problem hiding this comment.
| raise ValueError(f"Invalid mode '{mode}'. Must be one of: {valid_modes}") | |
| raise ValueError(f"Invalid mode '{mode}'. Must be one of: {self.MODES}") |
| Raises: | ||
| ValueError: If an invalid mode is provided. | ||
| """ | ||
| valid_modes = ("academic", "technical", "smiles", "research", "reaction", "combined") |
There was a problem hiding this comment.
I suggested one below by just attaching the valid modes to the class itself, but it's a nit so feel free to disregard
|
|
||
| ## Mode-specific guidelines: | ||
|
|
||
| {% if mode == "academic" %} |
There was a problem hiding this comment.
Also not sure if I understand the combined mode. In the class it's explicitly listed as a mode, but here it's a catchall, so "foobar" would resolve to a combined prompt. I feel like it would be better to just drop "combined" and refer to the default/wildcard as a combined mode
| Supports multiple modes: | ||
| - ``academic``: Frame as scholarly, homework style questions | ||
| - ``technical``: Use precise technical jargon and nomenclature | ||
| - ``smiles``: Use SMILES chemical notation and IUPAC names |
There was a problem hiding this comment.
do you have links / more information on what SMILES and IUPAC are ? maybe it's more common knowledge than I think but it might be helpful to have this listed
Description
Adding scientific translation converter to translate queries into various "scientific" modes
Tests and Documentation
Added unit tests and added converter into converters notebook for text->text using LLMs