bmc_hub/docs/EMAIL_RULES_VS_WORKFLOWS_ANALYSIS.md
Christian 3fb43783a6 feat: Implement Email Workflow System with comprehensive documentation and migration scripts
- Added Email Workflow System with automated actions based on email classification.
- Created database schema with tables for workflows, executions, and actions.
- Developed API endpoints for CRUD operations on workflows and execution history.
- Included pre-configured workflows for invoice processing, time confirmation, and bankruptcy alerts.
- Introduced user guide and workflow system improvements for better usability.
- Implemented backup system for automated backup jobs and notifications.
- Established email activity log to track all actions and events related to emails.
2025-12-15 12:28:12 +01:00


Email Rules vs Workflows - Analysis

🔍 Overview

BMC Hub has two systems for automatic email processing:

  1. Email Rules (legacy) - email_rules table
  2. Email Workflows (newer) - email_workflows table

⚙️ How Do They Work?

Email Processing Flow

📧 New Email Received
    ↓
1️⃣ Save email to database (email_messages)
    ↓
2️⃣ Classify with AI/simple classifier
    ↓  (classification + confidence_score are stored)
    ↓
3️⃣ Execute WORKFLOWS first 🆕
    ├─ Finds workflows with a matching classification
    ├─ Checks confidence_threshold
    ├─ Checks sender/subject patterns (regex)
    ├─ Executes workflow steps in order
    └─ Stops if stop_on_match=true
    ↓
4️⃣ Match RULES afterwards (legacy) 🕰️
    ├─ Finds rules with matching conditions
    ├─ Checks sender, domain, classification, subject
    ├─ Executes the rule action (only 1 action per rule)
    └─ Stops after the first match
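
The workflow checks in step 3 can be sketched roughly as follows. This is a minimal illustration, not the code in email_workflow_service.py: the Workflow dataclass, the sender_pattern and subject_pattern field names, and the assumption that a lower priority value is evaluated first are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workflow:
    # Field names follow the terms used in this document; the real schema may differ.
    name: str
    classification: str
    priority: int = 100
    confidence_threshold: float = 0.0
    sender_pattern: Optional[str] = None   # regex (hypothetical column name)
    subject_pattern: Optional[str] = None  # regex (hypothetical column name)
    stop_on_match: bool = False


def select_workflows(email: dict, workflows: list[Workflow]) -> list[Workflow]:
    """Return the workflows that would run for this email, in priority order."""
    selected = []
    # Assumption: a lower priority number means the workflow is evaluated earlier.
    for wf in sorted(workflows, key=lambda w: w.priority):
        if wf.classification != email["classification"]:
            continue
        if email["confidence_score"] < wf.confidence_threshold:
            continue
        if wf.sender_pattern and not re.search(wf.sender_pattern, email["sender"]):
            continue
        if wf.subject_pattern and not re.search(wf.subject_pattern, email["subject"]):
            continue
        selected.append(wf)
        if wf.stop_on_match:
            break  # later workflows are not considered
    return selected
```

Keeping the selection separate from the step execution makes it possible to test which workflows would fire for a given email without touching the database.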

🆚 Differences

| Feature | Email Rules (Legacy) | Email Workflows (New) |
|---|---|---|
| Flexibility | Single action per rule | Multiple steps per workflow |
| Priority | Yes (priority field) | Yes (priority field) |
| Stop on match | Implicit (first match wins) | Explicit (stop_on_match flag) |
| Pattern matching | Basic (exact match, contains) | Advanced (regex patterns) |
| Confidence check | No | Yes (confidence_threshold) |
| Execution tracking | No | Yes (email_workflow_executions) |
| Statistics | Yes (match_count) | Yes (execution_count, success/failure) |
| Actions | 5 types | 10+ types |
| Database table | email_rules | email_workflows |
| Enabled by | EMAIL_RULES_ENABLED | EMAIL_WORKFLOWS_ENABLED |
| Auto-execute | EMAIL_RULES_AUTO_PROCESS | Always (if enabled) |

⚠️ PROBLEM: Duplication and Conflicts

1. Both Can Run on the Same Email

Scenario:

Email: Invoice from leverandør@example.com
Classification: invoice, confidence: 0.95

WORKFLOW matches:
  - "Invoice Processing Workflow"
    → Steps: link_to_vendor, extract_invoice_data, mark_as_processed
    → Executes first! ✅

RULE matches:
  - "Link Supplier Emails" 
    → Action: link_supplier
    → Executes after! ⚠️

RESULT: the vendor link is performed twice (link_to_vendor by the workflow, then link_supplier by the rule)!

2. No Coordination

Workflows do not know whether rules have run (or vice versa).

Problem:

  • The email can be marked as "processed" by a workflow
  • The rule still tries to run its action afterwards
  • The result is logged in 2 places (workflow_executions + rule match_count)

3. Overlapping Actions

The same functionality exists in both systems:

| Action Type | Rule Action | Workflow Action |
|---|---|---|
| Link vendor | link_supplier | link_to_vendor |
| Link customer | link_customer | link_to_customer |
| Mark spam | mark_spam | (missing) |
| Link case | link_case | create_ticket |
| Invoice extraction | (missing) | extract_invoice_data |

4. Auto-Process Flag Does Not Apply to Workflows

In the code:

# Rules respect the auto-process flag
if self.auto_process:
    await self._execute_rule_action(email_data, rule)
else:
    logger.info(f"⏭️ Auto-process disabled - rule action not executed")

# Workflows ALWAYS run if enabled=true
workflow_result = await email_workflow_service.execute_workflows(email_data)

Problem: Workflow auto-execution cannot be disabled without disabling the entire workflow system.

What Works Well

1. Workflows Are More Powerful

  • Multi-step execution
  • Better tracking (execution history)
  • Regex pattern matching
  • Confidence threshold check
  • Success/failure statistics

2. Rules Are Simpler

  • Good for simple if-then logic
  • Easier to understand for non-technical users
  • Work fine for basic email routing

3. Both Have Priority Ordering

  • Workflows execute in priority order
  • Rules are matched in priority order
  • The first match can stop the chain (if configured)

🐛 Concrete Bugs Found

Bug #1: Workflows ALWAYS Execute

Code: email_processor_service.py, lines 77-79

# Step 4: Execute workflows based on classification
workflow_result = await email_workflow_service.execute_workflows(email_data)

Problem: There is no check of EMAIL_RULES_AUTO_PROCESS or any similar flag before the call.

Fix:

if settings.EMAIL_WORKFLOWS_ENABLED:
    workflow_result = await email_workflow_service.execute_workflows(email_data)

Bug #2: Rules Run After Workflows

Code: email_processor_service.py, lines 84-88

# Step 5: Match against rules (legacy support)
if self.rules_enabled:
    matched = await self._match_rules(email_data)

Problem: If a workflow has already processed the email, the rule should not run.

Fix:

# Step 5: Match against rules (legacy support) - skip if already processed by workflow
if self.rules_enabled and not email_data.get('_workflow_processed'):
    matched = await self._match_rules(email_data)
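
The fix above reads a _workflow_processed flag but does not show where it is set. One way this could look is sketched below. The shape of the workflow result (a dict with a workflows_executed count) and the function and parameter names are assumptions for illustration, not the actual service API.

```python
import asyncio


async def process_email(email_data, execute_workflows, match_rules, rules_enabled=True):
    """Run workflows first, then rules only if no workflow handled the email."""
    workflow_result = await execute_workflows(email_data)
    # Assumption: the result is a dict with a 'workflows_executed' count.
    if workflow_result.get("workflows_executed", 0) > 0:
        email_data["_workflow_processed"] = True

    matched = False
    if rules_enabled and not email_data.get("_workflow_processed"):
        matched = await match_rules(email_data)
    return workflow_result, matched


if __name__ == "__main__":
    async def fake_workflows(email):  # stub standing in for email_workflow_service
        return {"workflows_executed": 1}

    async def fake_rules(email):      # stub standing in for _match_rules
        return True

    print(asyncio.run(process_email({"id": 1}, fake_workflows, fake_rules)))
```

Passing the two services in as callables keeps the ordering logic testable with stubs, as the small demo at the bottom shows.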

Bug #3: Missing Deduplication

Problem: The same action can be executed by both a workflow and a rule.

Fix: Add a check in rule execution:

# Check if email already processed by workflow
already_processed = execute_query(
    "SELECT id FROM email_workflow_executions WHERE email_id = %s AND status = 'completed'",
    (email_id,), fetchone=True
)
if already_processed:
    logger.info(f"⏭️ Email already processed by workflow, skipping rule")
    return False

Bug #4: extract_invoice_data Workflow Action Can Fail Silently

Code: email_workflow_service.py, line 380+

if not file_path.exists():
    # No error raised! Just continues...

Problem: If the PDF file does not exist, the workflow does not fail; it simply continues.

Fix: Raise an exception:

if not file_path.exists():
    raise FileNotFoundError(f"Attachment file not found: {file_path}")

💡 Recommendations

Recommendation #1: Choose ONE System

Option A: Deprecate Rules (recommended)

  • Workflows are more powerful
  • Better tracking and debugging
  • Future-proof architecture

Migration plan:

  1. Create workflows that match all active rules
  2. Disable rules (set enabled=false)
  3. Test the workflows thoroughly
  4. Remove rule execution from the processor
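
Step 1 of the migration plan could be partly automated. The sketch below converts one rule into a one-step workflow definition. The rule field names (action_type, sender_email, subject_contains), the workflow dict layout, and the default priority of 100 are assumptions for illustration, since the real schemas are not shown in this document. The action mapping comes from the overlapping-actions table.

```python
import re

# Rule action -> equivalent workflow action, per the overlapping-actions table.
ACTION_MAP = {
    "link_supplier": "link_to_vendor",
    "link_customer": "link_to_customer",
    "link_case": "create_ticket",
}


def rule_to_workflow(rule: dict) -> dict:
    """Build a one-step workflow definition equivalent to a legacy rule."""
    action = rule["action_type"]
    if action not in ACTION_MAP:
        # e.g. mark_spam has no workflow counterpart listed in this document
        raise ValueError(f"No workflow action known for rule action {action!r}")
    workflow = {
        "name": f"Migrated: {rule['name']}",
        "priority": rule.get("priority", 100),
        "classification": rule.get("classification"),
        "stop_on_match": True,  # rules implicitly stop after the first match
        "steps": [{"action": ACTION_MAP[action]}],
    }
    # Rules use exact/contains matching; workflows use regex, so escape the text.
    if rule.get("sender_email"):
        workflow["sender_pattern"] = "^" + re.escape(rule["sender_email"]) + "$"
    if rule.get("subject_contains"):
        workflow["subject_pattern"] = re.escape(rule["subject_contains"])
    return workflow
```

Rules whose action has no workflow equivalent raise an error instead of being silently dropped, so they can be reviewed by hand.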

Option B: Keep Both, but Coordinate

  • Add a _workflow_processed flag to email_data
  • Skip rules if a workflow has run
  • Document clearly when to use rules vs workflows

Recommendation #2: Add a Workflow Auto-Process Flag

Add to the email_workflows table:

ALTER TABLE email_workflows ADD COLUMN auto_execute BOOLEAN DEFAULT true;

Check the flag before execution:

if workflow.get('auto_execute', True):
    result = await self._execute_workflow(workflow, email_data)

Recommendation #3: Unified Action Registry

Create shared action handlers:

# shared/email_actions.py
class EmailActions:
    @staticmethod
    async def link_to_vendor(email_id, vendor_id):
        # Single implementation used by both rules and workflows
        ...
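
A registry along these lines could let both systems resolve their own action names to the same handler. This is a minimal sketch: the register decorator, the ACTION_HANDLERS dict, run_action, and the placeholder handler body are all hypothetical and only illustrate the idea.

```python
import asyncio

# Maps every action name (rule or workflow) to a single shared handler.
ACTION_HANDLERS = {}


def register(*names):
    """Register one handler under every name the two systems use for it."""
    def decorator(func):
        for name in names:
            ACTION_HANDLERS[name] = func
        return func
    return decorator


@register("link_supplier", "link_to_vendor")
async def link_to_vendor(email_id, vendor_id):
    # Placeholder body: a real handler would update the database here.
    return {"email_id": email_id, "vendor_id": vendor_id, "action": "link_to_vendor"}


async def run_action(name, **kwargs):
    """Look up a handler by rule or workflow action name and run it."""
    try:
        handler = ACTION_HANDLERS[name]
    except KeyError:
        raise ValueError(f"Unknown email action: {name}") from None
    return await handler(**kwargs)


if __name__ == "__main__":
    # The legacy rule name resolves to the same handler as the workflow name.
    print(asyncio.run(run_action("link_supplier", email_id=1, vendor_id=2)))
```

Because link_supplier and link_to_vendor point at the same function, a fix in the handler applies to rules and workflows alike.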

Recommendation #4: Better Conflict Detection

Add an admin UI warning:

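# Check for overlapping rules and workflows
def check_conflicts(active_rules, active_workflows):
    """Return rule/workflow pairs that may both act on the same email."""
    conflicts = []
    for rule in active_rules:
        for workflow in active_workflows:
            # might_conflict is a helper that still has to be written
            if might_conflict(rule, workflow):
                conflicts.append({
                    'rule': rule['name'],
                    'workflow': workflow['name'],
                    'reason': 'Both match same classification'
                })
    return conflicts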
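
The might_conflict helper is not defined in this analysis. One possible heuristic, matching the 'Both match same classification' reason above, is sketched here. The classification key on both dicts is an assumed field name, and treating a rule without a classification condition as a potential conflict is a design assumption.

```python
def might_conflict(rule: dict, workflow: dict) -> bool:
    """Heuristic: a rule and a workflow may both fire for the same classification."""
    rule_cls = rule.get("classification")    # hypothetical field name
    wf_cls = workflow.get("classification")  # hypothetical field name
    if rule_cls is None:
        # A rule with no classification condition can match any email,
        # so it may overlap with every workflow.
        return True
    return rule_cls == wf_cls
```

A stricter version could also compare sender and subject conditions, at the cost of more false negatives when a rule uses "contains" and a workflow uses a regex.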

Recommendation #5: Execution Log Consolidation

A single view of all actions:

CREATE VIEW email_action_log AS
SELECT 
    'workflow' as source,
    e.email_id,
    w.name as action_name,
    e.status,
    e.started_at
FROM email_workflow_executions e
JOIN email_workflows w ON w.id = e.workflow_id
UNION ALL
SELECT 
    'rule' as source,
    em.id as email_id,
    er.name as action_name,
    CASE WHEN em.auto_processed THEN 'completed' ELSE 'skipped' END as status,
    em.updated_at as started_at
FROM email_messages em
JOIN email_rules er ON er.id = em.rule_id
WHERE em.rule_id IS NOT NULL
ORDER BY started_at DESC;

🎯 Action Plan

Immediately (Critical):

  1. Add an EMAIL_WORKFLOWS_ENABLED check before workflow execution
  2. Add a workflow-processed check before rule matching
  3. Fix the extract_invoice_data silent failure
  4. Add duplicate action detection

Short Term:

  1. Add an auto_execute column to the email_workflows table
  2. Create unified action handlers
  3. Add a conflict detection admin tool
  4. Document clearly when to use which system

Long Term:

  1. Decide: deprecate rules or keep both?
  2. Migrate existing rules to workflows (if deprecating)
  3. Create a unified execution log view
  4. Add a UI for viewing all email actions in one dashboard

📊 What Should You Do Now?

Questions for you:

  1. Do you want to keep both systems or only workflows?

    • If only workflows: we can migrate rules → workflows now
    • If both: we need to fix the coordination
  2. Should it be possible to disable workflow auto-execution without switching the system off entirely?

    • Yes → we add an auto_execute flag
    • No → workflows always run when enabled=true
  3. Are there active rules in production right now?

    • Yes → we need to be careful with changes
    • No → we can simply disable the rule system

Quick Fix (5 min): I can add the 4 critical fixes now if you want to continue with both systems.

Long Fix (1 hour): I can deprecate rules and migrate to workflows if you want to simplify.

Which do you prefer? 🤔