The gRPC Python client for Techmo ASR Service.
For project details, its structure, and functionality, head to the documentation.
The project can be used as-is and does not require any additional setup.
For basic development use, consider the convenient ./setup.sh script.
- Python >=3.8
- uv (install: curl -LsSf https://astral.sh/uv/install.sh | sh)
- PortAudio 19.6.0 (required by the PyAudio Python package)
It is the duty of the build configuration to clone all the necessary submodules. However, it sometimes fails, for example, when building a Docker image from an uninitialized repository. In that case, the solution is to download the missing dependencies manually.
Example:
git submodule update --init --depth 1 submodules/asr-api-python
Remember that submodules can have submodules of their own. If needed, add the --recursive flag.
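To find out which submodules still need fetching, one option is to parse the output of `git submodule status`, which prefixes every uninitialized submodule with `-`. The sketch below is only an illustration: `parse_uninitialized` and `uninitialized_submodules` are hypothetical helpers, not part of this project, and they assume submodule paths without spaces.

```python
import subprocess
from typing import List


def parse_uninitialized(status_output: str) -> List[str]:
    """Return paths of submodules that `git submodule status` marks with '-'."""
    paths = []
    for line in status_output.splitlines():
        if line.startswith("-"):
            # Uninitialized entries look like "-<sha> <path>".
            fields = line[1:].split()
            if len(fields) >= 2:
                paths.append(fields[1])
    return paths


def uninitialized_submodules(repo_dir: str = ".") -> List[str]:
    """Run git in repo_dir and list submodules that are not initialized yet."""
    result = subprocess.run(
        ["git", "submodule", "status", "--recursive"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_uninitialized(result.stdout)
```

Each returned path can then be passed to `git submodule update --init --depth 1 <path>`.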
./install.sh
Creates a .venv virtualenv with uv and installs the package with its dependencies.
uv venv .venv
source .venv/bin/activate
uv pip install .
If installation fails, the troubleshooting section of the documentation may be helpful.
Performs speech recognition on an ASR Service instance.
asr_client [-h, --help] [-v, --version] [OPTIONS]... [-s, ]--service-address ADDRESS [-m, ]--audio-mic --audio-stream-chunk-duration ARG
asr_client [-h, --help] [-v, --version] [OPTIONS]... [-s, ]--service-address ADDRESS [-a, ]--audio-paths PATH...
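When the client is driven from another Python program, the two invocation forms above can be assembled as argument lists. This is a minimal sketch that assumes the microphone and file forms are alternatives, as the synopsis suggests. `build_asr_command` is a hypothetical helper, not part of the package, and it uses only the flags shown in the synopsis.

```python
import sys
from typing import List, Optional, Sequence


def build_asr_command(
    service_address: str,
    audio_paths: Optional[Sequence[str]] = None,
    mic_chunk_ms: Optional[int] = None,
) -> List[str]:
    """Build an argv list for `python -m asr_client` (hypothetical helper)."""
    use_files = bool(audio_paths)
    use_mic = mic_chunk_ms is not None
    if use_files == use_mic:
        raise ValueError("pass either audio_paths or mic_chunk_ms, not both or neither")

    cmd = [sys.executable, "-m", "asr_client", "-s", service_address]
    if use_files:
        # File mode: -a accepts one or more paths.
        cmd += ["-a", *audio_paths]
    else:
        # Microphone mode: chunk duration is given in milliseconds.
        cmd += ["-m", "--audio-stream-chunk-duration", str(mic_chunk_ms)]
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string avoids quoting problems with paths.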
Examples:
- perform speech recognition on an audio stream coming from a file:
    python -m asr_client -s 0.0.0.0:30384 -a ./audio.wav
- perform speech recognition on an audio stream coming from a microphone in 200-millisecond chunks:
    python -m asr_client -s 0.0.0.0:30384 -m --audio-stream-chunk-duration 200
- perform speech recognition on an audio stream coming from a file, using a zipformer model named my_zipformer_model with decoder.criterion-type set to S2S and extractor.sampling-frequency set to 16000:
    python -m asr_client -s 0.0.0.0:30384 -a ./audio.wav --speech-model my_zipformer_model --decoder.criterion-type S2S --extractor.sampling-frequency 16000
- prepend 150 ms of silence to the audio before sending (useful when speech starts immediately at the beginning of the file and the voice activity detector needs a moment to prime):
    python -m asr_client -s 0.0.0.0:30384 -a ./audio.wav --audio-prepend-silence 150
- append 300 ms of trailing silence (useful to ensure the detector finalises the last utterance):
    python -m asr_client -s 0.0.0.0:30384 -a ./audio.wav --audio-append-silence 300
For more usage scenarios, head to the documentation.
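To make the millisecond values in the examples above concrete, the sketch below converts durations to sample frames and pads raw PCM audio with silence. It illustrates what the options mean in terms of audio data and is not the client's actual implementation. The helper names are hypothetical, and 16 kHz, 16-bit, mono PCM is assumed, where silence is a run of zero bytes.

```python
def ms_to_frames(duration_ms: int, sample_rate_hz: int) -> int:
    """Number of sample frames covering duration_ms at the given rate."""
    return sample_rate_hz * duration_ms // 1000


def pad_with_silence(
    pcm: bytes,
    sample_rate_hz: int,
    prepend_ms: int = 0,
    append_ms: int = 0,
    sample_width: int = 2,  # bytes per sample; 2 means 16-bit (assumption)
    channels: int = 1,      # mono (assumption)
) -> bytes:
    """Return pcm with zero-valued frames added before and after it."""
    frame_bytes = sample_width * channels
    head = b"\x00" * (ms_to_frames(prepend_ms, sample_rate_hz) * frame_bytes)
    tail = b"\x00" * (ms_to_frames(append_ms, sample_rate_hz) * frame_bytes)
    return head + pcm + tail
```

Under these assumptions a 200 ms chunk holds 3200 frames, 150 ms of prepended silence adds 2400 frames (4800 bytes), and 300 ms of trailing silence adds 4800 frames (9600 bytes).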