ThunderShadows/Email-Classification-For-Support-Team-Using-PII-Masking

Email Classification & PII Masking API

A FastAPI service that masks PII/PCI from support emails and classifies them into one of four categories: Incident, Request, Change, or Problem.

Architecture

Input Email → PII Masker (Regex + spaCy NER) → TF-IDF + LinearSVC Classifier → JSON Response
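The classification stage of this pipeline can be sketched with scikit-learn. This is a minimal illustration, not the repository's models.py: the toy training sentences, the `build_classifier` helper, and the vectorizer settings are assumptions made for the example, and the real model is trained on the CSV dataset listed under Project Structure.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy, made-up training data (assumption); the project trains on its own CSV.
TRAIN_TEXTS = [
    "The server is down and nobody can log in",
    "Production database is unreachable since this morning",
    "Please create a new user account for our intern",
    "I would like to request access to the billing portal",
    "We need to change the firewall rule for port 443",
    "Please update the configuration of the mail gateway",
    "The same crash keeps recurring after every restart",
    "Login failures keep coming back every single week",
]
TRAIN_LABELS = [
    "Incident", "Incident",
    "Request", "Request",
    "Change", "Change",
    "Problem", "Problem",
]


def build_classifier() -> Pipeline:
    """TF-IDF features followed by a linear SVM (hypothetical settings)."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
        ("svc", LinearSVC()),
    ])


clf = build_classifier()
clf.fit(TRAIN_TEXTS, TRAIN_LABELS)

# The classifier receives the masked text, so placeholders such as
# [email] are just ordinary tokens to the vectorizer.
prediction = clf.predict(["The server is unreachable. Contact me at [email]."])[0]
print(prediction)
```

Wrapping both steps in one `Pipeline` means a single object can be serialised with joblib and reused at inference time, which matches the single `.joblib` file in the project layout.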

Project Structure

.
├── main.py                          # FastAPI app entry point & /classify router
├── models.py                        # ML model training and inference logic
├── utils.py                         # PII/PCI masking utilities
├── train_model.py                   # Script to train and save the model
├── email_classifier.joblib          # Serialised trained model (generated by train_model.py)
├── combined_emails_with_natural_pii.csv  # Training dataset
├── requirements.txt
└── README.md

PII Entities Detected

Entity          | Method                             | Example
----------------|------------------------------------|----------------------
full_name       | Regex trigger phrases + spaCy NER  | "My name is John Doe"
email           | Regex                              | user@example.com
phone_number    | Regex                              | +1-555-123-4567
dob             | Regex                              | 15/06/1990
aadhar_num      | Regex                              | 1234 5678 9012
credit_debit_no | Regex                              | 4111 1111 1111 1111
cvv_no          | Regex                              | CVV: 123 → masks 123
expiry_no       | Regex                              | 12/25
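The regex-based part of the masking can be sketched as follows. The patterns here are illustrative assumptions and are not taken from utils.py; the `mask_pii` function and `PATTERNS` list are hypothetical names. Name detection is left out because it depends on spaCy NER.

```python
import re

# Illustrative patterns only (assumption); the project's real expressions may differ.
# Order matters: earlier entries win when two matches overlap.
PATTERNS = [
    ("email", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("credit_debit_no", re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")),
    ("aadhar_num", re.compile(r"\b\d{4} \d{4} \d{4}\b")),
    ("dob", re.compile(r"\b\d{2}/\d{2}/\d{4}\b")),
]


def mask_pii(text):
    """Return (masked_text, entities); positions refer to the original text."""
    found = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            start, end = match.start(), match.end()
            # Skip spans that overlap a higher-priority match, e.g. the
            # first 12 digits of a card number that also look like an Aadhaar.
            if any(start < e["position"][1] and e["position"][0] < end for e in found):
                continue
            found.append({
                "position": [start, end],
                "classification": label,
                "entity": match.group(),
            })
    found.sort(key=lambda e: e["position"][0])

    # Replace from the end of the string so earlier offsets stay valid.
    masked = text
    for entity in reversed(found):
        start, end = entity["position"]
        masked = masked[:start] + f"[{entity['classification']}]" + masked[end:]
    return masked, found


masked, entities = mask_pii("Card 4111 1111 1111 1111, reach me at user@example.com")
print(masked)
```

Recording positions against the original text before substituting placeholders keeps the entity list usable for restoring the unmasked email later.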

Setup & Installation

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python -m spacy download en_core_web_sm

Training the Model

python train_model.py
# Optionally specify paths:
python train_model.py --csv combined_emails_with_natural_pii.csv --model email_classifier.joblib

Running the API Locally

uvicorn main:app --host 0.0.0.0 --port 7860

API Usage

POST /classify

Request:

{
  "input_email_body": "Subject: Server Down\n\nMy name is John Smith. The server is unreachable. You can reach me at john@company.com."
}

Response:

{
  "input_email_body": "Subject: Server Down\n\nMy name is John Smith...",
  "list_of_masked_entities": [
    {
      "position": [33, 43],
      "classification": "full_name",
      "entity": "John Smith"
    },
    {
      "position": [92, 108],
      "classification": "email",
      "entity": "john@company.com"
    }
  ],
  "masked_email": "Subject: Server Down\n\nMy name is [full_name]. The server is unreachable. You can reach me at [email].",
  "category_of_the_email": "Incident"
}
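A client for this endpoint can be sketched with the standard library alone. The URL assumes the local uvicorn command shown earlier, and the helper names (`build_request`, `classify`, `entities_match`) are hypothetical. The check also assumes that `position` holds `[start, end)` character offsets into the original email, which is consistent with the `full_name` span in the sample response.

```python
import json
import urllib.request

# Assumes the service is running locally on the port used in this README.
API_URL = "http://localhost:7860/classify"


def build_request(email_body, url=API_URL):
    """Build the POST request carrying the JSON body the API expects."""
    payload = json.dumps({"input_email_body": email_body}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(email_body, url=API_URL):
    """Send the email to the API and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(email_body, url), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def entities_match(original, response):
    """Check that every reported span slices back to the reported entity."""
    for entity in response["list_of_masked_entities"]:
        start, end = entity["position"]
        if original[start:end] != entity["entity"]:
            return False
    return True


# Example usage (requires the API to be running):
#   email = "My name is John Smith. Reach me at john@company.com."
#   result = classify(email)
#   print(result["category_of_the_email"], entities_match(email, result))
```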

GET /health

Returns {"status": "ok"}.

Model Performance

The model was trained on roughly 24,000 multilingual support emails. The metrics below were measured on a held-out test set (15% of the data):

Class    | Precision | Recall | F1
---------|-----------|--------|-----
Change   | 0.94      | 0.89   | 0.92
Incident | 0.73      | 0.86   | 0.79
Problem  | 0.62      | 0.43   | 0.51
Request  | 0.94      | 0.94   | 0.94

Overall accuracy: 0.79
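The F1 column is the harmonic mean of precision and recall. The short sketch below recomputes it from the rounded values in the table; the `f1_score` helper is a hypothetical name written for this example. Because the inputs are already rounded to two decimals, a recomputed value can differ from the reported one in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above.
reported = {
    "Change": (0.94, 0.89),
    "Incident": (0.73, 0.86),
    "Problem": (0.62, 0.43),
    "Request": (0.94, 0.94),
}

for name, (precision, recall) in reported.items():
    print(f"{name}: F1 = {f1_score(precision, recall):.2f}")
```

The Problem class stands out: its recall of 0.43 means more than half of the true Problem emails are assigned to another category, which pulls its F1 well below the other classes.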

Hugging Face Spaces Deployment

Deploy the service as a Docker Space (no Gradio/Streamlit frontend). The classification endpoint is then available at:

POST https://<username>-<space-name>.hf.space/classify

The email_classifier.joblib model file must be committed to the Space repository.

About

A system that classifies incoming support emails into predefined categories and masks personally identifiable information (PII) before processing. After classification, the masked data is restored to its original form.
