diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000..dea56fc --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,182 @@ +# AGENTS.md — asr-client-python + +Guidance for AI coding agents working in this repository. + +--- + +## What this repo is + +A gRPC CLI client and Python library for the Techmo ASR (Automatic Speech Recognition) Service. Entry point: `asr_client.__main__:main` (installed as `asr-client`). The gRPC stubs come from a Git submodule (`submodules/asr-api-python`) and are generated at install time — they are never committed. + +--- + +## Setup + +Run once after cloning: + +```bash +./setup.sh # initialise submodules +./install.sh # create .venv and install package + test deps +``` + +`install.sh` requires system packages to build `pyaudio` — it will tell you exactly what's missing: + +```bash +apt install python3-dev portaudio19-dev build-essential +``` + +If `from asr_api import ...` fails, the submodule is not initialised. Run `./setup.sh` then `./install.sh`. + +--- + +## Running tests + +```bash +# Fast — no live service required +pytest tests/ -k "not integration" + +# With coverage +pytest tests/ --cov=asr_client --cov-report=term-missing -k "not integration" + +# Full matrix (Python 3.8–3.14), requires tox-uv +tox + +# Single Python version +tox -e py311 + +# Integration tests — requires a live ASR service +pytest tests/test_integration.py --asr-service-address HOST:PORT +``` + +All tests except integration are self-contained. Integration tests auto-skip when `--asr-service-address` is absent. 
+ +--- + +## Package layout + +``` +asr_client/ +├── __init__.py # create_grpc_channel(), create_grpc_channel_credentials(), +│ # _generate_request_with_traceback decorator +├── __main__.py # CLI: argument parsing, TLS setup, streaming dispatch +├── audio_processing.py # AudioFile, AudioFileStream, MicrophoneStream +├── v1.py # v1 and v1p1 API (shared implementation, stub selected by arg) +├── dictation.py # Legacy dictation API +└── VERSION.py # __version__ string +``` + +The `asr_api` package (gRPC stubs) is installed from `submodules/asr-api-python` under the name `techmo-asr-api`. It is not on PyPI. + +--- + +## Hard constraints + +**Never commit generated files.** `*_pb2.py` and `*_pb2_grpc.py` are produced by `grpc_tools.protoc` at install time. They live in the submodule's build output, not in this repo. + +**Do not add modules to `ignore_errors` in `pyproject.toml`.** All existing source modules have mypy errors suppressed (`ignore_errors = true`) as a legacy exception. New code must be written with full `mypy --strict` compliance. + +**Do not remove the `warnings.catch_warnings()` block in `audio_processing.py`.** It suppresses a `SyntaxWarning` from `pydub` on Python 3.14. Removing it breaks the import on that version. + +**Do not change gRPC metadata key names** without verifying server-side expectations: +- `v1` / `v1p1` API: session metadata key is `"session-id"` (hyphen) +- `dictation` API: session metadata key is `"session_id"` (underscore) + +**`grpc.RpcError` exits 0.** `main()` catches it, prints it, and returns normally. Do not write tests that assert `returncode != 0` to detect recognition failures — check stdout/stderr content instead. 
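The `grpc.RpcError` constraint above can be illustrated with a small, self-contained sketch. It does not invoke `asr-client`. Instead, a hypothetical stand-in script (`FAKE_CLI`) mimics a process that prints an error and still exits 0, and `run_and_check` is an illustrative helper name, not part of this repository. The assumption that the printed error contains the text `StatusCode.` is also only for illustration.

```python
import subprocess
import sys

# Hypothetical stand-in for a CLI that reports an RPC failure on stderr
# but still exits with status 0 (the behaviour described above).
FAKE_CLI = "import sys; print('StatusCode.UNAVAILABLE: connection refused', file=sys.stderr)"


def run_and_check(code: str) -> "tuple[int, bool]":
    """Run the stand-in and return (returncode, failure_detected_in_output)."""
    proc = subprocess.run([sys.executable, "-c", code], capture_output=True, text=True, check=False)
    # The return code alone cannot reveal the failure, so inspect both output streams.
    failed = "StatusCode." in proc.stdout or "StatusCode." in proc.stderr
    return proc.returncode, failed


returncode, failed = run_and_check(FAKE_CLI)
```

A test written this way asserts on `failed` rather than on `returncode`, which stays 0 even though the stand-in reported an error.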
+ +--- + +## Code style + +| Rule | Value | +|------|-------| +| Line length | **160** (not 88 or 79) | +| Linter | `ruff` — rules `E,W,F,I,S,UP,B`; ignores `S101`, `S603`, `S607` | +| Type checking | `mypy --strict` | +| Shell scripts | `shellcheck` + `shfmt` (4-space indent) | + +160 is the enforced line length. Standard PEP 8 limits do not apply here. + +--- + +## Key patterns + +### Generator → gRPC stub: always use `_generate_request_with_traceback` + +gRPC's C runtime silently swallows exceptions raised inside generator functions passed to a stub. Without this decorator the failure surfaces only as `StatusCode.UNKNOWN "Exception iterating requests!"` with no traceback. Apply it to every generator that feeds a stub: + +```python +from asr_client import _generate_request_with_traceback +from itertools import chain + +@_generate_request_with_traceback +def _generate_config_request(...): yield ... + +@_generate_request_with_traceback +def _generate_data_requests(audio_stream, ...): yield ... + +responses = stub.StreamingRecognize( + chain(_generate_config_request(...), _generate_data_requests(...)), + metadata=..., +) +``` + +### API version dispatch + +| `--api-version` | Module | Stub | Response key | +|----------------|--------|------|--------------| +| `v1p1` (default) | `v1.py` | `asr_api.v1p1.AsrStub` | `"result"` | +| `v1` | `v1.py` | `asr_api.v1.AsrStub` | `"result"` | +| `dictation` | `dictation.py` | `asr_api.dictation.SpeechStub` | `"results"` | + +`v1` and `v1p1` share one implementation file; the stub is selected by the `api_patch_version` argument (`None` → v1, `1` → v1p1). + +### Additional config key encoding + +`build_additional_config_specs_dict()` transforms kwarg names to server keys: +- `__` → `.` (double underscore → dot) +- `_` → `-` (single underscore → hyphen) + +New tuning CLI args must follow this convention: `--decoder.new-param` with `dest="decoder__new_param"`. Args with value `"NA"` are excluded. 
`--max-hypotheses-for-softmax` is always forwarded regardless. + +### CLI argument validators + +Defined as local functions inside `parse_args()`: `assure_int`, `positive_int`, `unsigned_int`, `non_empty_str`. The `Once` action prevents an argument from being repeated. Use these when adding new CLI arguments. + +### gRPC 4 MB limit + +The default max incoming message size is 4 MB. Sending a file larger than ~4 MB as a single chunk fails with `StatusCode.RESOURCE_EXHAUSTED`. Always use `--audio-stream-chunk-duration` for large files in tests and examples. + +--- + +## Adding things + +### New CLI flag +1. Add inside `parse_args()` in `__main__.py` using an existing validator or a new one following the same pattern. +2. Thread the value through to the request builder or config dict. +3. Add a test in `test_cli.py`. + +### New API version +1. Add `v2.py` following the pattern in `v1.py`. +2. Add the version string to `--api-version` choices. +3. Add `elif args.api_version == "v2":` in the dispatch in `main()` before `else: raise AssertionError`. +4. Add integration tests in `test_integration.py`. + +### Version bump +1. Edit `asr_client/VERSION.py`. +2. Update `CHANGELOG.md`. +3. Do not edit `pyproject.toml` — it reads the version dynamically from `VERSION.py`. + +--- + +## Test layout + +| File | What it tests | +|------|--------------| +| `test_cli.py` | Argument parsing, help text | +| `test_audio.py` | WAV loading, streaming (`wav_path` fixture generates silence) | +| `test_channel.py` | gRPC channel creation | +| `test_version.py` | Version string format | +| `test_integration.py` | Live ASR service (auto-skips without `--asr-service-address`) | +| `conftest.py` | Registers `--asr-service-address`; provides `wav_path` fixture | + +`asr_service_address` and `audio_wav` fixtures are defined in `test_integration.py`, not `conftest.py`. `audio_wav` requires `data/audio.wav` and auto-skips if absent. 
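Returning to the key encoding rules under "Additional config key encoding" above, the transformation can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's `build_additional_config_specs_dict()`: the names `encode_config_key` and `build_specs_sketch` are hypothetical, and the special handling of `--max-hypotheses-for-softmax` is not modelled.

```python
from typing import Dict


def encode_config_key(dest_name: str) -> str:
    """Map an argparse-style dest name to a server config key.

    Double underscores are replaced first so that they are not
    consumed by the single-underscore rule.
    """
    return dest_name.replace("__", ".").replace("_", "-")


def build_specs_sketch(**kwargs: str) -> Dict[str, str]:
    """Illustrative stand-in: encode keys and drop the "NA" no-op values."""
    return {encode_config_key(name): value for name, value in kwargs.items() if value != "NA"}


# "decoder__other_param" is a made-up name used only to show the "NA" exclusion.
specs = build_specs_sketch(decoder__new_param="5", decoder__other_param="NA")
```

The order of the two `replace` calls matters: applying the single-underscore rule first would turn `__` into `--` and the dot would never be produced.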
diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..944fbbf --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,247 @@ +# CLAUDE.md — asr-client-python + +## Project Overview + +**asr-client-python** is a gRPC client library and CLI tool for the Techmo ASR (Automatic Speech Recognition) Service. It wraps the `asr-api-python` Python bindings (from `submodules/asr-api-python`) into a user-facing package with: +- A Python API (`asr_client` package) +- A CLI tool (`asr-client` entry point → `asr_client.__main__:main`) + +Current version: see `asr_client/VERSION.py` (`__version__` attribute). + +--- + +## Repository Setup (run once after cloning) + +```bash +./setup.sh # Sync git submodules +./install.sh # Create .venv + install package with test extras +``` + +- `setup.sh` — submodule sync/init only (`git submodule sync/update --init --recursive`). +- `install.sh` — creates `.venv` via `uv`, runs `uv pip install -e ".[test]"`. Accepts an optional `VENV_PATH` argument (default: `.venv`). Checks upfront for `uv`, `gcc`, Python headers, and PortAudio headers — fails with a clear message if any are missing. 
+ +--- + +## Package Architecture + +``` +asr_client/ +├── __init__.py # Public API: create_grpc_channel(), create_grpc_channel_credentials(), +│ # and the _generate_request_with_traceback decorator +├── __main__.py # CLI entry point: argument parsing, TLS, streaming logic +├── audio_processing.py # AudioFile, AudioStream, AudioFileStream, MicrophoneStream +├── v1.py # v1 and v1p1 API implementation +├── dictation.py # Legacy dictation API implementation +└── VERSION.py # __version__ attribute +``` + +### API version map + +| `--api-version` value | Module | gRPC stub | Response key | +|----------------------|--------|-----------|--------------| +| `v1p1` (default) | `v1.py` | `asr_api.v1p1.AsrStub` | `"result"` | +| `v1` | `v1.py` | `asr_api.v1.AsrStub` | `"result"` | +| `dictation` | `dictation.py` | `asr_api.dictation.SpeechStub` | `"results"` (plural) | + +`v1` and `v1p1` share `v1.py`; the stub is selected via `api_patch_version` (`None` → v1, `1` → v1p1). + +**Session ID metadata key differs by API version:** `v1`/`v1p1` sends gRPC metadata key `"session-id"` (hyphen); `dictation` sends `"session_id"` (underscore). Do not change either without verifying the server-side expectation. + +--- + +## gRPC / Protobuf Rules + +- Generated protobuf files (`*_pb2.py`, `*_pb2_grpc.py`) come from `submodules/asr-api-python` and are produced at install time — never commit them. +- All gRPC stubs are imported via `from asr_api import v1, v1p1, dictation`. The `asr_api` package is installed from the submodule under the name `techmo-asr-api` (see `[tool.uv.sources]` in `pyproject.toml`), not from PyPI. +- If `asr_api` imports fail after cloning, run `./setup.sh` then `./install.sh`. +- `tox.ini` installs the submodule directly as a dep: `deps = -e {toxinidir}/submodules/asr-api-python`. 
+ +--- + +## Key Submodule + +| Path | Purpose | +|------|---------| +| `submodules/asr-api-python/` | gRPC stubs for all ASR API versions (the `asr_api` package) | + +Run `./setup.sh` after any commit that changes `.gitmodules`. + +--- + +## System Prerequisites + +Before `./install.sh` can succeed on a fresh machine (Debian/Ubuntu): + +```bash +apt install python3-dev portaudio19-dev build-essential +``` + +`python3-dev` and `build-essential` are needed to compile `pyaudio`'s C extension; `portaudio19-dev` provides the PortAudio headers. `./install.sh` checks all three and fails with a clear message if any are missing. + +--- + +## Dependencies — Sharp Edges + +| Dependency | Why it matters | +|-----------|----------------| +| `audioop-lts` | `audioop` removed from stdlib in Python 3.13; conditional dep for `python_version >= '3.13'` | +| `pydub` | Emits `SyntaxWarning` on Python 3.14 during import; suppressed with `warnings.catch_warnings()` in `audio_processing.py` — do not remove or move that guard | +| `grpcio` | Pinned `<1.71.0` for Python 3.8 only; unpinned on 3.9+ | +| `techmo-asr-api` | Local package from submodule, not on PyPI | + +**gRPC 4 MB message limit:** Default max incoming message size is 4 MB. Sending a file larger than ~4 MB as a single chunk will fail with `StatusCode.RESOURCE_EXHAUSTED`. Use `--audio-stream-chunk-duration` for large files. 
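To make the 4 MB limit concrete, here is a rough sketch of duration-based chunking for raw mono PCM data. It is not the code behind `--audio-stream-chunk-duration` or `AudioFileStream`; `iter_pcm_chunks` and its parameters are hypothetical names, and the sample rate and chunk duration are arbitrary example values.

```python
from typing import Iterator

GRPC_DEFAULT_MAX_MESSAGE_BYTES = 4 * 1024 * 1024  # default limit mentioned above


def iter_pcm_chunks(pcm: bytes, sample_rate_hz: int, chunk_duration_ms: int, sample_width_bytes: int = 2) -> Iterator[bytes]:
    """Yield consecutive slices of mono PCM data, each covering chunk_duration_ms."""
    if chunk_duration_ms <= 0:
        raise ValueError("chunk_duration_ms must be positive")
    frames_per_chunk = max(1, sample_rate_hz * chunk_duration_ms // 1000)
    chunk_bytes = frames_per_chunk * sample_width_bytes
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start : start + chunk_bytes]


# Five minutes of 16 kHz, 16-bit mono silence: 9,600,000 bytes, too large for one message.
audio = bytes(16000 * 2 * 300)
chunks = list(iter_pcm_chunks(audio, sample_rate_hz=16000, chunk_duration_ms=200))
```

With these example values each chunk is 6,400 bytes, so every message stays far below the limit while the concatenated chunks still reproduce the original audio.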
+ +--- + +## Testing + +### Running tests + +```bash +# Unit tests only (no service needed) +pytest tests/ -k "not integration" + +# With coverage +pytest tests/ --cov=asr_client --cov-report=term-missing -k "not integration" + +# Integration tests (requires a live ASR service) +pytest tests/test_integration.py --asr-service-address HOST:PORT + +# Full tox matrix (Python 3.8–3.14) +tox + +# Single version +tox -e py311 + +# Integration via tox +tox -e py311 -- tests/test_integration.py --asr-service-address HOST:PORT +``` + +### Test structure + +| File | Type | Notes | +|------|------|-------| +| `test_cli.py` | Unit | CLI argument parsing and help text | +| `test_audio.py` | Unit | WAV loading and streaming; uses `wav_path` fixture (generated silence) | +| `test_channel.py` | Unit | gRPC channel creation | +| `test_version.py` | Unit | Version format validation | +| `test_integration.py` | Integration | Live ASR service; auto-skipped if `--asr-service-address` absent | +| `conftest.py` | Config | Registers `--asr-service-address`; provides `wav_path` fixture | + +**Important:** The `asr_service_address` and `audio_wav` fixtures are defined inside `test_integration.py` itself — not in `conftest.py`. `audio_wav` expects `data/audio.wav` to exist; auto-skips if absent. + +--- + +## Code Patterns + +### `_generate_request_with_traceback` decorator + +Defined in `__init__.py`. gRPC's C runtime silently swallows Python exceptions raised inside generator functions passed to a stub — they surface as `StatusCode.UNKNOWN "Exception iterating requests!"` with no traceback. This decorator catches exceptions, prints the traceback via `traceback.print_exc()`, then re-raises. Apply it to every generator function that feeds a gRPC stub. + +Usage pattern (from `v1.py` and `dictation.py`): + +```python +from itertools import chain + +@_generate_request_with_traceback +def _generate_config_request(...) -> Iterator[...]: + yield ConfigRequest(...) 
+ +@_generate_request_with_traceback +def _generate_data_requests(audio_stream, ...) -> Iterator[...]: + for chunk in audio_stream: + yield DataRequest(audio=chunk) + +responses = stub.StreamingRecognize( + chain(_generate_config_request(...), _generate_data_requests(audio_stream, ...)), + metadata=..., + timeout=..., +) +``` + +### TLS / channel creation + +`create_grpc_channel_credentials` takes **bytes**, not file paths: + +```python +# Insecure +with create_grpc_channel("host:port") as channel: ... + +# One-way TLS (system root CAs) +creds = create_grpc_channel_credentials() +with create_grpc_channel("host:port", credentials=creds) as channel: ... + +# mTLS +creds = create_grpc_channel_credentials( + tls_certificate_chain=Path("client.crt").read_bytes(), + tls_private_key=Path("client.key").read_bytes(), + tls_root_certificates=Path("ca.crt").read_bytes(), +) +``` + +### Audio classes + +- `AudioFile` — reads a WAV file via `pydub`, validates mono 16-bit PCM. Raises `AudioFileError` for format errors. +- `AudioFileStream(AudioStream)` — iterates `AudioFile` in chunks. `chunk_duration_ms` optional; omitting it sends the whole file as one chunk. +- `MicrophoneStream(AudioStream)` — PyAudio-based live capture. **Not a context manager** — cleanup via `__del__`. Requires `--audio-stream-chunk-duration`. + +**Constraint:** `--audio-stream-chunk-duration` is required when `--audio-mic` is used. Enforced by a post-parse check in `parse_args()` — not by argparse itself. + +### CLI argument validation + +Custom argparse validators are local functions inside `parse_args()` in `__main__.py`: `assure_int()`, `positive_int()`, `unsigned_int()`, `non_empty_str()`. The `Once` action (also local) prevents an argument from being specified more than once. Use these patterns when adding new arguments. 
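The validator and `Once` patterns described above might look roughly like the sketch below. The names `positive_int` and `Once` come from the description, but the bodies are assumptions written for illustration, not copies of `parse_args()`. The `--example-count` flag is hypothetical.

```python
import argparse


def positive_int(text: str) -> int:
    """argparse type callable: accept only integers greater than zero."""
    try:
        value = int(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected an integer, got {text!r}") from None
    if value <= 0:
        raise argparse.ArgumentTypeError(f"expected a positive integer, got {value}")
    return value


class Once(argparse.Action):
    """Store the value, but reject a second occurrence of the same option."""

    def __call__(self, parser, namespace, values, option_string=None):
        seen = getattr(namespace, "_once_seen", set())
        if self.dest in seen:
            parser.error(f"{option_string} may be given only once")
        seen.add(self.dest)
        namespace._once_seen = seen
        setattr(namespace, self.dest, values)


parser = argparse.ArgumentParser(prog="sketch")
parser.add_argument("--example-count", type=positive_int, action=Once, default=1)  # hypothetical flag

args = parser.parse_args(["--example-count", "3"])
```

Keeping validation in the `type` callable and repetition checks in the action lets argparse report both kinds of error through its usual usage message.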
+ +### `build_additional_config_specs_dict` — key name transformation + +Transforms kwarg names into server config keys: +- `__` (double underscore) → `.` (dot) +- `_` (single underscore) → `-` (hyphen) + +Example: `decoder__beam_size` → `decoder.beam-size`. + +**Convention for new tuning args:** +- CLI flag: dotted notation with hyphens — `--decoder.new-param` +- `dest`: double underscores for dots, single underscores for hyphens — `dest="decoder__new_param"` + +Args whose value equals `"NA"` are excluded (that is the no-op default for optional `str` args). **Exception:** `--max-hypotheses-for-softmax` always forwarded (`default=10`, `type=unsigned_int`). + +**Known inconsistency:** `--decoder.beam-threshold` has `dest="decoder__beam_size_threshold"` (spurious "size" — acknowledged by a `TODO`). The kwarg passed to the builder is correct. Do not use this `dest` as a naming template. + +### `grpc.RpcError` exits 0 + +`main()` catches `grpc.RpcError`, prints it, and returns normally — process exits 0. **Do not write tests that assert `returncode != 0` to detect recognition failures.** Inspect stdout/stderr instead. + +--- + +## Code Style + +The codebase was written to these standards (even though pre-commit is not configured in this public repo): + +| Convention | Detail | +|-----------|--------| +| `ruff` | Rules `E,W,F,I,S,UP,B`; ignores `S101` (asserts ok), `S603`, `S607` | +| line length | **160** — not 88 or 79 | +| `mypy --strict` | Full strict; per-module relaxations in `pyproject.toml` | + +**The critical number is 160 — not 88.** Existing modules have `ignore_errors = true` in `pyproject.toml` for pre-existing issues. `asr_client/__init__.py` is actively checked. Write new code with full strict mypy compliance; do not add new modules to the `ignore_errors` list. + +--- + +## Common Tasks + +### Add a new CLI flag +1. Add the argument inside `parse_args()` in `__main__.py` in the appropriate group. +2. 
Use an existing local validator or write one following the same pattern. +3. Pass the value into the request builder or config dict. +4. Add a unit test in `test_cli.py`. + +### Add support for a new API version +1. Add `v2.py` following the pattern in `v1.py` — an `Asr` class with `streaming_recognize`. +2. Add the version string to `--api-version` choices in `__main__.py`. +3. Add `elif args.api_version == "v2":` in the dispatch inside `main()` (before `else: raise AssertionError`). +4. Add integration tests in `test_integration.py`. + +### Bump the version +1. Edit `asr_client/VERSION.py` (`__version__` string). +2. Update `CHANGELOG.md`. +3. `pyproject.toml` reads the version dynamically — do not edit it for version bumps. diff --git a/README.md b/README.md index 29154b6..a9afe90 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,14 @@ **Table of contents** -[[_ToC_]] +- [Overview](#overview) +- [Setup](#setup) + - [Requirements](#requirements) + - [Manual submodule update](#manual-submodule-update) +- [Install](#install) + - [Using the provided script](#using-the-provided-script) + - [Manual installation](#manual-installation) +- [Usage](#usage) + - [ASR Client](#asr-client) # ASR Client (Python) diff --git a/doc/DOCUMENTATION.md b/doc/DOCUMENTATION.md index b6fbb11..b6d13b7 100644 --- a/doc/DOCUMENTATION.md +++ b/doc/DOCUMENTATION.md @@ -40,7 +40,7 @@ However, with this approach the service treats each audio file as coming from a Example: -```bash, +```bash python -m asr_client -s 0.0.0.0:30384 -a ./audio/*.wav --session-id "$(whoami)'s session" ```