HS Code Classifier

ML-powered Harmonized System (HS) code classification from goods descriptions. A multi-level hierarchical classifier that predicts chapter (2-digit), heading (4-digit), and subheading (6-digit) codes. Supports multi-language input (EN/RU/AZ) using TF-IDF features with gradient boosting, plus optional transformer embeddings.

Overview

The HS Code Classifier automates the assignment of Harmonized System codes to trade goods descriptions. It is designed for customs brokers, trade compliance teams, and logistics platforms that need fast, accurate tariff classification.

Key capabilities:

- Hierarchical classification: Predicts codes at chapter (2-digit), heading (4-digit), and subheading (6-digit) levels in a top-down cascade.
- Multi-language support: Processes descriptions in English, Russian, and Azerbaijani with language-aware preprocessing.
- Confidence scoring: Returns calibrated confidence scores at each classification level.
- Batch processing: Classifies single items or thousands of descriptions in one call.
- REST API: Production-ready FastAPI service with health checks and OpenAPI docs.
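
The three levels nest by prefix: the first two digits of a six-digit subheading are its chapter, and the first four are its heading. A minimal sketch of that decomposition follows. `split_hs_code` is a hypothetical helper written for illustration and is not a function from this repository.

```python
from typing import Dict


def split_hs_code(code: str) -> Dict[str, str]:
    """Split a 6-digit HS subheading into chapter, heading, and subheading.

    Hypothetical helper for illustration; not part of the classifier package.
    """
    # Accept common written forms such as "3923.21" or "3923 21".
    digits = code.replace(".", "").replace(" ", "")
    if len(digits) != 6 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected a 6-digit HS code, got {code!r}")
    return {
        "chapter": digits[:2],     # first 2 digits
        "heading": digits[:4],     # first 4 digits
        "subheading": digits,      # all 6 digits
    }


print(split_hs_code("3923.21"))
```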

Model Architecture

Input Description (EN/RU/AZ)
        |
   Preprocessor
   (lowercasing, stopwords, abbreviation expansion, normalization)
        |
   Feature Extraction
   (TF-IDF + character n-grams + keyword/unit detection)
        |
   +-----------------------+
   | Chapter Classifier    |  (2-digit, LightGBM)
   +-----------------------+
        |
   +-----------------------+
   | Heading Classifier    |  (4-digit, LightGBM, conditioned on chapter)
   +-----------------------+
        |
   +-----------------------+
   | Subheading Classifier |  (6-digit, LightGBM, conditioned on heading)
   +-----------------------+
        |
   Output: { chapter, heading, subheading, confidences }
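
The preprocessing and character n-gram steps in the diagram can be sketched roughly as follows. This is an assumption-laden illustration, not the code in preprocessor.py or features.py: the function names, the stopword sets, and the abbreviation table are all made up here and kept deliberately tiny.

```python
import re
from collections import Counter
from typing import Dict, List

# Tiny illustrative tables (assumptions); the project's real lists may differ.
STOPWORDS = {
    "en": {"of", "and", "the", "in", "for"},
    "ru": {"и", "в", "для", "из"},
}
ABBREVIATIONS = {"pcs": "pieces", "kg": "kilogram"}


def preprocess(text: str, language: str = "en") -> List[str]:
    """Lowercase, strip punctuation, expand abbreviations, drop stopwords."""
    text = text.lower()
    # Replace punctuation (anything outside \w and whitespace) with a space.
    text = re.sub(r"[^\w\s]", " ", text)
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    stop = STOPWORDS.get(language, set())
    return [tok for tok in tokens if tok not in stop]


def char_ngrams(tokens: List[str], n: int = 3) -> Dict[str, int]:
    """Count character n-grams over the space-joined tokens."""
    joined = " ".join(tokens)
    return dict(Counter(joined[i:i + n] for i in range(len(joined) - n + 1)))


tokens = preprocess("Bags of Polyethylene, 10 KG")
print(tokens)
print(char_ngrams(tokens[:1]))
```

Character n-grams are useful here because they tolerate misspellings and inflected forms, which are common in Russian and Azerbaijani descriptions.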

Each level is a separate LightGBM classifier. The heading classifier receives the predicted chapter as an additional feature, and the subheading classifier receives the predicted heading, so each prediction is conditioned on its parent code. This top-down cascade promotes consistency across levels.
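
A minimal sketch of the top-down control flow, assuming each level can be treated as a function from (features, parent code) to a (code, confidence) pair. `LevelModel`, `cascade_predict`, and `is_consistent` are hypothetical names, and the lambdas below are stand-ins for the trained LightGBM models so that the flow can be run on its own.

```python
from typing import Callable, Dict, Optional, Tuple

# A level model maps (features, parent_code) -> (predicted_code, confidence).
LevelModel = Callable[[Dict[str, float], Optional[str]], Tuple[str, float]]


def cascade_predict(
    features: Dict[str, float],
    chapter_model: LevelModel,
    heading_model: LevelModel,
    subheading_model: LevelModel,
) -> Dict[str, Dict[str, object]]:
    """Run the three levels top-down, passing each prediction to the next level."""
    chapter, p_chapter = chapter_model(features, None)
    heading, p_heading = heading_model(features, chapter)
    subheading, p_subheading = subheading_model(features, heading)
    return {
        "chapter": {"code": chapter, "confidence": p_chapter},
        "heading": {"code": heading, "confidence": p_heading},
        "subheading": {"code": subheading, "confidence": p_subheading},
    }


def is_consistent(result: Dict[str, Dict[str, object]]) -> bool:
    """True when each code starts with the code of the level above it."""
    chapter = str(result["chapter"]["code"])
    heading = str(result["heading"]["code"])
    subheading = str(result["subheading"]["code"])
    return heading.startswith(chapter) and subheading.startswith(heading)


# Stand-in models (assumptions) that always answer with fixed codes.
demo = cascade_predict(
    {"tfidf:polyethylene": 0.7},
    chapter_model=lambda feats, parent: ("39", 0.94),
    heading_model=lambda feats, parent: (parent + "23", 0.88),
    subheading_model=lambda feats, parent: (parent + "21", 0.82),
)
print(demo["subheading"]["code"], is_consistent(demo))
```

A prefix check like `is_consistent` is a cheap safeguard: passing the parent code as a feature makes mismatched levels unlikely, but a feature alone does not rule them out.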

Optional transformer embeddings (e.g., multilingual BERT) can be concatenated with TF-IDF features for improved accuracy on ambiguous descriptions.

Performance Metrics

Evaluated on a held-out test set of trade declarations:

Level       Accuracy  Top-3 Accuracy  F1 (macro)
Chapter     92.4%     97.8%           0.91
Heading     85.1%     94.2%           0.83
Subheading  78.6%     91.5%           0.76
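
For readers unfamiliar with the column headings, the sketch below shows the standard definitions of top-k accuracy and macro-averaged F1 in plain Python. The README does not say how the project computes these numbers, so treat this as the textbook definition rather than the evaluation code; the function names and the toy data are illustrative only.

```python
from typing import List, Sequence


def top_k_accuracy(
    y_true: Sequence[str], ranked_preds: Sequence[Sequence[str]], k: int
) -> float:
    """Fraction of items whose true code is among the first k ranked candidates."""
    hits = sum(1 for truth, ranked in zip(y_true, ranked_preds) if truth in ranked[:k])
    return hits / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every class in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    scores: List[float] = []
    for cls in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: true chapters and ranked candidate chapters per item.
y_true = ["03", "39", "39", "73"]
ranked = [["03", "16"], ["40", "39"], ["39", "40"], ["72", "84"]]
top1 = [candidates[0] for candidates in ranked]

print(top_k_accuracy(y_true, ranked, k=1))
print(top_k_accuracy(y_true, ranked, k=2))
print(macro_f1(y_true, top1))
```

Macro F1 weights every class equally, so rare codes pull the score down more than they affect accuracy, which is consistent with F1 sitting below accuracy at each level in the table.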

Installation

# Clone the repository
git clone https://github.com/shahinhasanov/hs-code-classifier.git
cd hs-code-classifier

# Install dependencies
make install

# Or manually
pip install -r requirements.txt
pip install -e .

Requirements: Python 3.9+

Quick Start

from classifier.model import HSClassifier

# Load a trained model
model = HSClassifier.load("models/hs_classifier.pkl")

# Classify a goods description
result = model.predict("frozen Atlantic salmon fillets, 10 kg boxes")
print(result)
# {
#     "chapter": {"code": "03", "description": "Fish and crustaceans", "confidence": 0.96},
#     "heading": {"code": "0304", "description": "Fish fillets", "confidence": 0.91},
#     "subheading": {"code": "030481", "description": "Frozen fillets of salmon", "confidence": 0.87}
# }
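
One way to use the per-level confidences is to accept a prediction only down to the deepest level that clears a threshold and route the rest to manual review. The sketch below assumes the result dictionary has the shape printed above; `deepest_confident_level` and the 0.90 threshold are illustrative choices, not part of the package.

```python
from typing import Dict, Optional

LEVELS = ("chapter", "heading", "subheading")


def deepest_confident_level(
    result: Dict[str, Dict[str, float]], threshold: float = 0.90
) -> Optional[str]:
    """Return the deepest level whose confidence, and every level above it, meets the threshold."""
    accepted = None
    for level in LEVELS:
        if result[level]["confidence"] < threshold:
            break  # stop at the first level that falls below the threshold
        accepted = level
    return accepted


# Confidences copied from the example output above (other fields omitted).
sample = {
    "chapter": {"confidence": 0.96},
    "heading": {"confidence": 0.91},
    "subheading": {"confidence": 0.87},
}
print(deepest_confident_level(sample))
```

With this sample, the heading would be accepted automatically while the subheading would be flagged for review.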

API Usage

Start the server

make serve
# or
uvicorn classifier.api:app --host 0.0.0.0 --port 8000

Endpoints

POST /classify

Classify a single goods description.

curl -X POST http://localhost:8000/classify \
  -H "Content-Type: application/json" \
  -d '{"description": "polyethylene plastic bags", "language": "en"}'

Response:

{
  "chapter": {"code": "39", "description": "Plastics and articles thereof", "confidence": 0.94},
  "heading": {"code": "3923", "description": "Articles for conveyance or packing", "confidence": 0.88},
  "subheading": {"code": "392321", "description": "Sacks and bags of polymers of ethylene", "confidence": 0.82}
}
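
The same call can be made from Python using only the standard library. This is a sketch under the assumption that the server runs at http://localhost:8000 as in the curl example; `build_classify_request` and `classify` are hypothetical helper names, not part of the package.

```python
import json
import urllib.request
from typing import Any, Dict


def build_classify_request(
    description: str, language: str = "en", base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build (but do not send) a POST /classify request."""
    payload = json.dumps({"description": description, "language": language})
    return urllib.request.Request(
        url=f"{base_url}/classify",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(description: str, language: str = "en", timeout: float = 10.0) -> Dict[str, Any]:
    """Send the request to a running server and decode the JSON response."""
    request = build_classify_request(description, language)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_classify_request("polyethylene plastic bags")
print(request.get_method(), request.full_url)
```

Calling `classify("polyethylene plastic bags")` requires the server to be running; building the request separately keeps the payload easy to inspect and test offline.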

POST /classify/batch

Classify multiple descriptions in a single request.

curl -X POST http://localhost:8000/classify/batch \
  -H "Content-Type: application/json" \
  -d '{"items": [{"description": "cotton t-shirts"}, {"description": "steel bolts M10"}]}'
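
For large inputs it can help to split the descriptions into several batch requests. The sketch below builds request bodies in the `{"items": [...]}` shape shown above; `batch_payloads` is a hypothetical helper, and the default of 500 items per request is an assumption, since this README does not document a batch size limit.

```python
from typing import Dict, Iterator, List


def batch_payloads(
    descriptions: List[str], batch_size: int = 500
) -> Iterator[Dict[str, List[Dict[str, str]]]]:
    """Yield request bodies for POST /classify/batch, at most batch_size items each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(descriptions), batch_size):
        chunk = descriptions[start:start + batch_size]
        yield {"items": [{"description": text} for text in chunk]}


payloads = list(batch_payloads(["cotton t-shirts", "steel bolts M10", "wooden furniture"], 2))
print(len(payloads))
```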

POST /suggest

Get top-K candidate codes with confidence scores.

curl -X POST http://localhost:8000/suggest \
  -H "Content-Type: application/json" \
  -d '{"description": "wooden furniture", "top_k": 5}'

GET /health

Health check endpoint.

curl http://localhost:8000/health

Training

# Train with default configuration
make train

# Or run the training script directly
python -m classifier.training --config config/model_config.yaml --data data/training_data.csv

Training configuration is managed via config/model_config.yaml. See the file for all available hyperparameters.

Project Structure

hs-code-classifier/
|-- src/
|   |-- classifier/
|       |-- __init__.py
|       |-- model.py            # Hierarchical classifier
|       |-- features.py         # Feature extraction (TF-IDF, n-grams)
|       |-- preprocessor.py     # Text preprocessing
|       |-- hierarchy.py        # HS code hierarchy management
|       |-- training.py         # Training pipeline
|       |-- api.py              # FastAPI endpoints
|       |-- schemas.py          # Pydantic schemas
|-- tests/
|   |-- test_model.py
|   |-- test_preprocessor.py
|   |-- test_hierarchy.py
|   |-- test_features.py
|-- data/
|   |-- hs_chapters.json        # HS chapter codes
|-- config/
|   |-- model_config.yaml       # Model configuration
|-- models/                     # Trained model artifacts
|-- requirements.txt
|-- setup.py
|-- Makefile
|-- Dockerfile
|-- LICENSE

Development

# Run tests
make test

# Run linter
make lint

# Clean build artifacts
make clean

License

MIT License. See LICENSE for details.
