A FastAPI service that masks PII/PCI in support emails and classifies each email into one of four categories: Incident, Request, Change, or Problem.
Input Email → PII Masker (Regex + spaCy NER) → TF-IDF + LinearSVC Classifier → JSON Response
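The classification stage of the pipeline above can be sketched with scikit-learn. This is a minimal illustration, not the project's actual `models.py`: the toy training sentences, the `ngram_range` setting, and the `classify` helper are assumptions made for the example, and the real model is trained on the CSV dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy, made-up training data (assumption); the service trains on its CSV dataset.
texts = [
    "server unreachable outage since morning",
    "database crashed production outage",
    "please grant access to billing dashboard",
    "requesting new laptop for onboarding",
    "schedule firewall rule update next week",
    "upgrade staging cluster version",
    "recurring login failures root cause unknown",
    "intermittent timeouts keep returning every month",
]
labels = [
    "Incident", "Incident",
    "Request", "Request",
    "Change", "Change",
    "Problem", "Problem",
]

# TF-IDF turns each (already masked) email into a sparse vector;
# LinearSVC learns one linear decision function per category.
classifier = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
    ("svm", LinearSVC()),
])
classifier.fit(texts, labels)


def classify(masked_email: str) -> str:
    """Return the predicted category for one masked email (hypothetical helper)."""
    return str(classifier.predict([masked_email])[0])
```

Because masking runs first, placeholders such as `[email]` are tokenised like any other text, so the classifier does not see the raw PII.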
.
├── main.py # FastAPI app entry point & /classify router
├── models.py # ML model training and inference logic
├── utils.py # PII/PCI masking utilities
├── train_model.py # Script to train and save the model
├── email_classifier.joblib # Serialised trained model (generated by train_model.py)
├── combined_emails_with_natural_pii.csv # Training dataset
├── requirements.txt
└── README.md
| Entity | Method | Example |
|---|---|---|
| full_name | Regex trigger phrases + spaCy NER | "My name is John Doe" |
| email | Regex | user@example.com |
| phone_number | Regex | +1-555-123-4567 |
| dob | Regex | 15/06/1990 |
| aadhar_num | Regex | 1234 5678 9012 |
| credit_debit_no | Regex | 4111 1111 1111 1111 |
| cvv_no | Regex | CVV: 123 → masks 123 |
| expiry_no | Regex | 12/25 |
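The regex part of the masking table can be sketched as follows. This is a simplified illustration, not the code in `utils.py`: the three patterns and the `mask_pii` function name are assumptions, only a subset of the entity types is covered, and the spaCy NER step for names is omitted. Offsets are recorded against the original text, and replacements are applied right to left so earlier offsets stay valid.

```python
import re

# Illustrative patterns only (assumption); the service's real expressions may differ.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "credit_debit_no": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "dob": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
}


def mask_pii(text):
    """Return (masked_text, entities) using the response schema's field names."""
    # Collect every match with its span in the ORIGINAL text.
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append({
                "position": [match.start(), match.end()],
                "classification": label,
                "entity": match.group(),
            })
    found.sort(key=lambda e: e["position"][0])

    # Drop overlapping matches, keeping the one that starts first.
    entities = []
    last_end = -1
    for entity in found:
        if entity["position"][0] >= last_end:
            entities.append(entity)
            last_end = entity["position"][1]

    # Replace from right to left so the recorded offsets remain correct.
    masked = text
    for entity in reversed(entities):
        start, end = entity["position"]
        masked = masked[:start] + f"[{entity['classification']}]" + masked[end:]
    return masked, entities
```

For example, `mask_pii("Reach me at user@example.com")` yields the masked text `Reach me at [email]` together with one entity whose `position` slices back to the original address.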
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python -m spacy download en_core_web_sm

python train_model.py
# Optionally specify paths:
python train_model.py --csv combined_emails_with_natural_pii.csv --model email_classifier.joblib

uvicorn main:app --host 0.0.0.0 --port 7860

Request:
{
"input_email_body": "Subject: Server Down\n\nMy name is John Smith. The server is unreachable. You can reach me at john@company.com."
}

Response:
{
"input_email_body": "Subject: Server Down\n\nMy name is John Smith...",
"list_of_masked_entities": [
{
"position": [33, 43],
"classification": "full_name",
"entity": "John Smith"
},
{
"position": [92, 108],
"classification": "email",
"entity": "john@company.com"
}
],
"masked_email": "Subject: Server Down\n\nMy name is [full_name]. The server is unreachable. You can reach me at [email].",
"category_of_the_email": "Incident"
}

Returns {"status": "ok"}.
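A client call to the classification endpoint might look like the sketch below, using only the standard library. The `/classify` path, the `input_email_body` key, and port 7860 come from this README; the helper names `build_classify_request` and `classify_email`, the `localhost` base URL, and the timeout are assumptions for illustration.

```python
import json
import urllib.request


def build_classify_request(email_body, base_url="http://localhost:7860"):
    """Build (but do not send) a POST request for the /classify endpoint."""
    payload = json.dumps({"input_email_body": email_body}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/classify",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify_email(email_body, base_url="http://localhost:7860"):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_classify_request(email_body, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running locally, `classify_email("Subject: Server Down\n\n...")["category_of_the_email"]` would return the predicted category, and `["masked_email"]` the masked text.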
The model was trained on ~24,000 multilingual support emails. Results on the held-out test set (15% of the data):
| Class | Precision | Recall | F1 |
|---|---|---|---|
| Change | 0.94 | 0.89 | 0.92 |
| Incident | 0.73 | 0.86 | 0.79 |
| Problem | 0.62 | 0.43 | 0.51 |
| Request | 0.94 | 0.94 | 0.94 |
| Accuracy | | | 0.79 |
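As a reading aid for the table, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded table values; the `f1_score` helper is written here for illustration and is not part of the project. Because the inputs are already rounded to two decimals, a recomputed value can differ from the reported one in the last digit (Change comes out as 0.91 rather than 0.92).

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above.
reported = {
    "Change": (0.94, 0.89),
    "Incident": (0.73, 0.86),
    "Problem": (0.62, 0.43),
    "Request": (0.94, 0.94),
}

recomputed = {
    label: round(f1_score(p, r), 2) for label, (p, r) in reported.items()
}
```

The low recall for Problem (0.43) is what pulls its F1 well below the other classes.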
Deploy as a Hugging Face Docker Space (no Gradio/Streamlit frontend). The classification endpoint is then served at:
POST https://<username>-<space-name>.hf.space/classify
The email_classifier.joblib model file must be committed to the Space repository.