ML-powered Harmonized System (HS) code classification from goods descriptions. Multi-level hierarchical classifier predicting chapter (2-digit), heading (4-digit), and subheading (6-digit) codes. Supports multi-language input (EN/RU/AZ) with TF-IDF + gradient boosting and optional transformer embeddings.
- Overview
- Model Architecture
- Performance Metrics
- Installation
- Quick Start
- API Usage
- Training
- Project Structure
- Development
- License
The HS Code Classifier automates the assignment of Harmonized System codes to trade goods descriptions. It is designed for customs brokers, trade compliance teams, and logistics platforms that need fast, accurate tariff classification.
Key capabilities:
- Hierarchical classification: Predicts codes at chapter (2-digit), heading (4-digit), and subheading (6-digit) levels in a top-down cascade.
- Multi-language support: Processes descriptions in English, Russian, and Azerbaijani with language-aware preprocessing.
- Confidence scoring: Returns calibrated confidence scores at each classification level.
- Batch processing: Classify single items or thousands of descriptions in one call.
- REST API: Production-ready FastAPI service with health checks and OpenAPI docs.
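Per-level confidence scores make it possible to fall back to a coarser code when a finer one is uncertain. The following is a minimal sketch of that idea, assuming results have the shape shown in the Quick Start output; the helper `deepest_confident_code` and the 0.85 threshold are hypothetical and not part of the package.

```python
from typing import Optional

# Levels ordered from coarsest to finest, matching the keys of a prediction result.
LEVELS = ("chapter", "heading", "subheading")


def deepest_confident_code(result: dict, threshold: float = 0.85) -> Optional[str]:
    """Return the finest code whose confidence, and that of every level above it,
    meets the threshold. Hypothetical helper, not part of the classifier package."""
    accepted = None
    for level in LEVELS:
        prediction = result[level]
        if prediction["confidence"] < threshold:
            break  # stop at the first uncertain level; finer levels depend on it
        accepted = prediction["code"]
    return accepted


example = {
    "chapter": {"code": "39", "confidence": 0.94},
    "heading": {"code": "3923", "confidence": 0.88},
    "subheading": {"code": "392321", "confidence": 0.82},
}
print(deepest_confident_code(example))        # 3923
print(deepest_confident_code(example, 0.95))  # None
```

A result of `None` could be routed to manual review, while a partial code such as `3923` narrows the search for a human classifier.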
Input Description (EN/RU/AZ)
|
Preprocessor
(lowercasing, stopwords, abbreviation expansion, normalization)
|
Feature Extraction
(TF-IDF + character n-grams + keyword/unit detection)
|
+-----------------------+
| Chapter Classifier | (2-digit, LightGBM)
+-----------------------+
|
+-----------------------+
| Heading Classifier | (4-digit, LightGBM, conditioned on chapter)
+-----------------------+
|
+-----------------------+
| Subheading Classifier | (6-digit, LightGBM, conditioned on heading)
+-----------------------+
|
Output: { chapter, heading, subheading, confidences }
Each level is a separate LightGBM classifier. The heading classifier receives the predicted chapter as an additional feature, and the subheading classifier receives the predicted heading. This hierarchical cascade enforces consistency across levels.
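The cascade described above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's implementation: the scorers are stand-in functions rather than trained LightGBM models, and restricting candidates to codes that extend the predicted parent is an assumption added here, since passing the parent as a feature does not by itself guarantee matching prefixes.

```python
from typing import Callable, Dict, Tuple

# A scorer maps (description, parent_code) to {candidate_code: probability}.
# In the real project each scorer would wrap a trained model; these are stand-ins.
Scorer = Callable[[str, str], Dict[str, float]]


def predict_level(scorer: Scorer, description: str, parent: str) -> Tuple[str, float]:
    """Pick the most probable code among candidates that extend the parent code."""
    scores = scorer(description, parent)
    # Assumption: only children of the predicted parent are eligible.
    children = {code: p for code, p in scores.items() if code.startswith(parent)}
    if not children:
        raise ValueError(f"no candidate codes under parent {parent!r}")
    best = max(children, key=children.get)
    # Renormalise over the eligible children so the confidence sums to 1 within them.
    return best, children[best] / sum(children.values())


def cascade(description: str, chapter: Scorer, heading: Scorer, subheading: Scorer) -> dict:
    """Run the three levels top-down, feeding each prediction to the next level."""
    c_code, c_conf = predict_level(chapter, description, "")
    h_code, h_conf = predict_level(heading, description, c_code)
    s_code, s_conf = predict_level(subheading, description, h_code)
    return {
        "chapter": {"code": c_code, "confidence": c_conf},
        "heading": {"code": h_code, "confidence": h_conf},
        "subheading": {"code": s_code, "confidence": s_conf},
    }


# Toy scorers with fixed outputs, for illustration only.
def toy_chapter(description: str, parent: str) -> Dict[str, float]:
    return {"39": 0.9, "42": 0.1}


def toy_heading(description: str, parent: str) -> Dict[str, float]:
    # The highest raw score belongs to another chapter and is filtered out.
    return {"4202": 0.5, "3923": 0.4, "3926": 0.1}


def toy_subheading(description: str, parent: str) -> Dict[str, float]:
    return {"392321": 0.75, "392329": 0.25}


result = cascade("polyethylene plastic bags", toy_chapter, toy_heading, toy_subheading)
print(result["chapter"]["code"], result["heading"]["code"], result["subheading"]["code"])
# 39 3923 392321
```

In the toy run the heading scorer ranks `4202` first, but the prefix filter keeps the output inside chapter 39.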
Optional transformer embeddings (e.g., multilingual BERT) can be concatenated with TF-IDF features for improved accuracy on ambiguous descriptions.
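As a rough illustration of that concatenation, the sketch below joins a small character n-gram TF-IDF block with a dense embedding block. Everything in it is an assumption for demonstration: the vocabulary and IDF weights are made up, the embedding is a placeholder list rather than the output of a real transformer, and normalising each block separately is one possible design choice, not necessarily what the project does.

```python
import math
from collections import Counter
from typing import Dict, List


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of a lowercased description."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def l2_normalise(values: List[float]) -> List[float]:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm else list(values)


def combine(text: str, vocabulary: List[str], idf: Dict[str, float],
            embedding: List[float]) -> List[float]:
    """Concatenate an n-gram TF-IDF block with a dense embedding block."""
    counts = char_ngrams(text)
    tfidf = [counts[gram] * idf.get(gram, 0.0) for gram in vocabulary]
    # Normalise each block separately so neither dominates by scale alone.
    return l2_normalise(tfidf) + l2_normalise(embedding)


# Hypothetical vocabulary, IDF weights, and placeholder embedding.
vocabulary = ["bag", "ags", "ste"]
idf = {"bag": 1.0, "ags": 1.0, "ste": 2.0}
placeholder_embedding = [0.6, 0.8]

features = combine("plastic bags", vocabulary, idf, placeholder_embedding)
print(len(features))  # 5
```

The combined vector has one entry per vocabulary n-gram followed by one entry per embedding dimension, so a downstream classifier sees both blocks as ordinary numeric features.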
Evaluated on a held-out test set of trade declarations:
| Level | Accuracy | Top-3 Accuracy | F1 (macro) |
|---|---|---|---|
| Chapter | 92.4% | 97.8% | 0.91 |
| Heading | 85.1% | 94.2% | 0.83 |
| Subheading | 78.6% | 91.5% | 0.76 |
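For readers unfamiliar with the columns, the sketch below shows how top-k accuracy and macro-averaged F1 are commonly computed. It uses toy labels rather than the evaluation data behind the table, and the function names are illustrative; the project's own evaluation code may differ.

```python
from typing import Sequence


def top_k_accuracy(y_true: Sequence[str], ranked: Sequence[Sequence[str]], k: int) -> float:
    """Fraction of items whose true code appears among the k highest-ranked candidates."""
    hits = sum(1 for truth, candidates in zip(y_true, ranked) if truth in candidates[:k])
    return hits / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every class seen in truth or predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Toy chapter-level data: true codes and candidates ranked best-first.
y_true = ["03", "39", "39", "94"]
ranked = [
    ["03", "16", "02"],
    ["40", "39", "38"],
    ["39", "40", "38"],
    ["44", "46", "48"],
]
y_pred = [candidates[0] for candidates in ranked]

print(top_k_accuracy(y_true, ranked, k=1))  # 0.5
print(top_k_accuracy(y_true, ranked, k=3))  # 0.75
print(round(macro_f1(y_true, y_pred), 3))
```

Macro F1 weights every class equally, so rare codes that are often misclassified pull it below plain accuracy, which is consistent with the gap between the Accuracy and F1 columns.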
# Clone the repository
git clone https://github.com/shahinhasanov/hs-code-classifier.git
cd hs-code-classifier
# Install dependencies
make install
# Or manually
pip install -r requirements.txt
pip install -e .
Requirements: Python 3.9+
from classifier.model import HSClassifier
# Load a trained model
model = HSClassifier.load("models/hs_classifier.pkl")
# Classify a goods description
result = model.predict("atlantic salmon fillets, frozen, 10 kg boxes")
print(result)
# {
# "chapter": {"code": "03", "description": "Fish and crustaceans", "confidence": 0.96},
# "heading": {"code": "0304", "description": "Fish fillets", "confidence": 0.91},
# "subheading": {"code": "030481", "description": "Frozen fillets of salmon", "confidence": 0.87}
# }
make serve
# or
uvicorn classifier.api:app --host 0.0.0.0 --port 8000
Classify a single goods description.
curl -X POST http://localhost:8000/classify \
-H "Content-Type: application/json" \
-d '{"description": "polyethylene plastic bags", "language": "en"}'
Response:
{
"chapter": {"code": "39", "description": "Plastics and articles thereof", "confidence": 0.94},
"heading": {"code": "3923", "description": "Articles for conveyance or packing", "confidence": 0.88},
"subheading": {"code": "392321", "description": "Sacks and bags of polymers of ethylene", "confidence": 0.82}
}
Classify multiple descriptions in a single request.
curl -X POST http://localhost:8000/classify/batch \
-H "Content-Type: application/json" \
-d '{"items": [{"description": "cotton t-shirts"}, {"description": "steel bolts M10"}]}'
Get top-K candidate codes with confidence scores.
curl -X POST http://localhost:8000/suggest \
-H "Content-Type: application/json" \
-d '{"description": "wooden furniture", "top_k": 5}'
Health check endpoint.
curl http://localhost:8000/health
# Train with default configuration
make train
# Or run the training script directly
python -m classifier.training --config config/model_config.yaml --data data/training_data.csv
Training configuration is managed via config/model_config.yaml. See the file for all available hyperparameters.
hs-code-classifier/
|-- src/
|   |-- classifier/
|       |-- __init__.py
|       |-- model.py          # Hierarchical classifier
|       |-- features.py       # Feature extraction (TF-IDF, n-grams)
|       |-- preprocessor.py   # Text preprocessing
|       |-- hierarchy.py      # HS code hierarchy management
|       |-- training.py       # Training pipeline
|       |-- api.py            # FastAPI endpoints
|       |-- schemas.py        # Pydantic schemas
|-- tests/
|   |-- test_model.py
|   |-- test_preprocessor.py
|   |-- test_hierarchy.py
|   |-- test_features.py
|-- data/
|   |-- hs_chapters.json      # HS chapter codes
|-- config/
|   |-- model_config.yaml     # Model configuration
|-- models/                   # Trained model artifacts
|-- requirements.txt
|-- setup.py
|-- Makefile
|-- Dockerfile
|-- LICENSE
# Run tests
make test
# Run linter
make lint
# Clean build artifacts
make clean
MIT License. See LICENSE for details.
Copyright (c) 2022 Shahin Hasanov