bmc_hub/docs/EMAIL_RULES_VS_WORKFLOWS_ANALYSIS.md
Christian 3fb43783a6 feat: Implement Email Workflow System with comprehensive documentation and migration scripts
- Added Email Workflow System with automated actions based on email classification.
- Created database schema with tables for workflows, executions, and actions.
- Developed API endpoints for CRUD operations on workflows and execution history.
- Included pre-configured workflows for invoice processing, time confirmation, and bankruptcy alerts.
- Introduced user guide and workflow system improvements for better usability.
- Implemented backup system for automated backup jobs and notifications.
- Established email activity log to track all actions and events related to emails.
2025-12-15 12:28:12 +01:00


# Email Rules vs Workflows - Analysis
## 🔍 Overview
BMC Hub has **2 systems** for automatic email processing:
1. **Email Rules** (legacy) - `email_rules` table
2. **Email Workflows** (newer) - `email_workflows` table
## ⚙️ How Do They Work?
### Email Processing Flow
```
📧 New email received
1️⃣ Save email to database (email_messages)
2️⃣ Classify with AI/simple classifier
    ↓ (classification + confidence_score are stored)
3️⃣ Execute WORKFLOWS first 🆕
    ├─ Finds workflows with a matching classification
    ├─ Checks confidence_threshold
    ├─ Checks sender/subject patterns (regex)
    ├─ Executes workflow steps in order
    └─ Stops if stop_on_match=true
4️⃣ Match RULES afterwards (legacy) 🕰️
    ├─ Finds rules with matching conditions
    ├─ Checks sender, domain, classification, subject
    ├─ Executes the rule action (only 1 action per rule)
    └─ Stops after the first match
```
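The matching logic in step 3 can be sketched in a few lines. This is an illustration under stated assumptions, not the code in `email_workflow_service.py`: the key names `sender_pattern` and `subject_pattern`, the dict layout, and the convention that a lower `priority` number runs first are all hypothetical. Only `classification`, `confidence_threshold`, `priority` and `stop_on_match` appear in the flow above.

```python
import re


def workflow_matches(workflow: dict, email: dict) -> bool:
    """Return True if every condition on the workflow holds for this email."""
    if workflow["classification"] != email["classification"]:
        return False
    if email["confidence_score"] < workflow.get("confidence_threshold", 0.0):
        return False
    # Optional regex patterns (hypothetical keys); a missing pattern matches anything.
    for field, key in (("sender", "sender_pattern"), ("subject", "subject_pattern")):
        pattern = workflow.get(key)
        if pattern and not re.search(pattern, email.get(field, ""), re.IGNORECASE):
            return False
    return True


def select_workflows(workflows: list[dict], email: dict) -> list[dict]:
    """Collect matching workflows in priority order, honouring stop_on_match."""
    selected = []
    # Assumption: a lower priority number means "runs earlier".
    for workflow in sorted(workflows, key=lambda w: w["priority"]):
        if workflow_matches(workflow, email):
            selected.append(workflow)
            if workflow.get("stop_on_match"):
                break
    return selected
```

Rule matching would look similar but simpler: no confidence check, basic string comparison instead of regex, and an unconditional stop after the first match.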
## 🆚 Differences
| Feature | Email Rules (Legacy) | Email Workflows (New) |
|---------|---------------------|---------------------|
| **Flexibility** | Single action per rule | Multiple steps per workflow |
| **Priority** | Yes (priority field) | Yes (priority field) |
| **Stop on match** | Implicit (first match wins) | Explicit (stop_on_match flag) |
| **Pattern matching** | Basic (exact match, contains) | Advanced (regex patterns) |
| **Confidence check** | No | Yes (confidence_threshold) |
| **Execution tracking** | No | Yes (email_workflow_executions) |
| **Statistics** | Yes (match_count) | Yes (execution_count, success/failure) |
| **Actions** | 5 types | 10+ types |
| **Database table** | email_rules | email_workflows |
| **Enabled by** | EMAIL_RULES_ENABLED | EMAIL_WORKFLOWS_ENABLED |
| **Auto-execute** | EMAIL_RULES_AUTO_PROCESS | Always (if enabled) |
## ⚠️ PROBLEM: Duplication and Conflicts
### 1. Both Can Run on the Same Email
**Scenario:**
```
Email: Invoice from leverandør@example.com
Classification: invoice, confidence: 0.95

WORKFLOW matches:
- "Invoice Processing Workflow"
  → Steps: link_to_vendor, extract_invoice_data, mark_as_processed
  → Executes first! ✅

RULE matches:
- "Link Supplier Emails"
  → Action: link_supplier
  → Executes after! ⚠️

RESULT: the vendor link runs twice (link_to_vendor, then link_supplier)!
```
### 2. No Coordination
Workflows do not know whether rules have run (or vice versa).
**Problem:**
- An email can be marked as "processed" by a workflow
- A rule still tries to run its action afterwards
- The result is logged in 2 places (workflow_executions + rule match_count)
### 3. Overlapping Actions
**Same functionality in both systems:**
| Action Type | Rule Name | Workflow Action |
|-------------|-----------|----------------|
| Link vendor | `link_supplier` | `link_to_vendor` |
| Link customer | `link_customer` | `link_to_customer` |
| Mark spam | `mark_spam` | *(missing)* |
| Link case | `link_case` | `create_ticket` |
| Invoice extraction | *(missing)* | `extract_invoice_data` |
### 4. Auto-Process Flag Does Not Apply to Workflows
**In the code:**
```python
# Rules respect the auto-process flag
if self.auto_process:
    await self._execute_rule_action(email_data, rule)
else:
    logger.info("⏭️ Auto-process disabled - rule action not executed")

# Workflows ALWAYS run if enabled=true
workflow_result = await email_workflow_service.execute_workflows(email_data)
```
**Problem:** There is no way to disable workflow auto-execution without disabling the entire workflow system.
## ✅ What Works Well
### 1. Workflows Are More Powerful
- Multi-step execution
- Better tracking (execution history)
- Regex pattern matching
- Confidence threshold check
- Success/failure statistics
### 2. Rules Are Simpler
- Good for simple if-then logic
- Easier to understand for non-technical users
- Work fine for basic email routing
### 3. Both Have Priority Ordering
- Workflows execute in priority order
- Rules match in priority order
- The first match can stop the chain (if configured)
## 🐛 Concrete Bugs Found
### Bug #1: Workflows ALWAYS Execute
**Code:** `email_processor_service.py` lines 77-79
```python
# Step 4: Execute workflows based on classification
workflow_result = await email_workflow_service.execute_workflows(email_data)
```
**Problem:** No check of `EMAIL_RULES_AUTO_PROCESS` or a similar flag.
**Fix:**
```python
# Default to None so later code can tell that no workflows ran
workflow_result = None
if settings.EMAIL_WORKFLOWS_ENABLED:
    workflow_result = await email_workflow_service.execute_workflows(email_data)
```
### Bug #2: Rules Run After Workflows
**Code:** `email_processor_service.py` lines 84-88
```python
# Step 5: Match against rules (legacy support)
if self.rules_enabled:
    matched = await self._match_rules(email_data)
```
**Problem:** If a workflow has already processed the email, the rule should not run.
**Fix:**
```python
# Step 5: Match against rules (legacy support) - skip if already processed by workflow
if self.rules_enabled and not email_data.get('_workflow_processed'):
    matched = await self._match_rules(email_data)
```
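The fix above reads `_workflow_processed`, so something has to set it after the workflow step. A minimal sketch of how the two steps could be wired together, assuming the workflow service returns a dict with a `workflows_executed` count (that key, the `process_email` function and the stand-in services below are hypothetical, not taken from the codebase):

```python
import asyncio


async def process_email(email_data, execute_workflows, match_rules, rules_enabled=True):
    """Run workflows first, then rules only if no workflow handled the email."""
    workflow_result = await execute_workflows(email_data)

    # Assumption: the result reports how many workflows actually executed.
    if workflow_result.get("workflows_executed", 0) > 0:
        email_data["_workflow_processed"] = True

    matched = False
    if rules_enabled and not email_data.get("_workflow_processed"):
        matched = await match_rules(email_data)
    return matched


# Stand-ins for the real services, for demonstration only.
async def fake_workflows(email_data):
    executed = 1 if email_data["classification"] == "invoice" else 0
    return {"workflows_executed": executed}


async def fake_rules(email_data):
    return True


# The invoice is handled by a workflow, so rule matching is skipped for it.
invoice_matched = asyncio.run(
    process_email({"classification": "invoice"}, fake_workflows, fake_rules)
)
# No workflow ran for this email, so rules get their turn.
other_matched = asyncio.run(
    process_email({"classification": "general"}, fake_workflows, fake_rules)
)
```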
### Bug #3: Missing Deduplication
**Problem:** The same action can be executed by both a workflow and a rule.
**Fix:** Add a check in rule execution:
```python
# Check if email already processed by workflow
already_processed = execute_query(
    "SELECT id FROM email_workflow_executions WHERE email_id = %s AND status = 'completed'",
    (email_id,), fetchone=True
)
if already_processed:
    logger.info("⏭️ Email already processed by workflow, skipping rule")
    return False
```
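The same check can be tried in isolation. The sketch below uses an in-memory `sqlite3` database as a stand-in for the project's `execute_query` helper, so it uses `?` placeholders instead of `%s`. The three-column table is a simplified assumption, and `already_processed_by_workflow` is a hypothetical helper name.

```python
import sqlite3


def already_processed_by_workflow(conn: sqlite3.Connection, email_id: int) -> bool:
    """True if at least one workflow execution completed for this email."""
    row = conn.execute(
        "SELECT id FROM email_workflow_executions "
        "WHERE email_id = ? AND status = 'completed' LIMIT 1",
        (email_id,),
    ).fetchone()
    return row is not None


# Simplified stand-in schema; the real table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE email_workflow_executions ("
    "id INTEGER PRIMARY KEY, email_id INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO email_workflow_executions (email_id, status) VALUES (?, ?)",
    [(1, "completed"), (2, "failed")],
)
```

Note that this check is coarse: it skips every rule once any workflow has completed, even when the rule's action does not overlap with the workflow's steps.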
### Bug #4: `extract_invoice_data` Workflow Action Can Fail Silently
**Code:** `email_workflow_service.py` line 380+
```python
if not file_path.exists():
    # No error raised! Execution just continues...
    pass
```
**Problem:** If the PDF file is not found, the workflow does not fail; it simply continues.
**Fix:** Raise an exception:
```python
if not file_path.exists():
    raise FileNotFoundError(f"Attachment file not found: {attachment_path}")
```
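Raising only helps if the step runner turns the exception into a visible failure. A hedged sketch of that pairing, where `require_attachment`, `run_extract_step` and the `failed` status string are assumptions for illustration (only `completed` appears as a status elsewhere in this analysis):

```python
from pathlib import Path


def require_attachment(attachment_path: str) -> Path:
    """Return the attachment path, or raise if the file is missing."""
    file_path = Path(attachment_path)
    if not file_path.exists():
        raise FileNotFoundError(f"Attachment file not found: {attachment_path}")
    return file_path


def run_extract_step(attachment_path: str) -> dict:
    """Run the step and record a failure instead of continuing silently."""
    try:
        file_path = require_attachment(attachment_path)
    except FileNotFoundError as exc:
        return {"status": "failed", "error": str(exc)}
    # The real extraction would happen here; this sketch only reports success.
    return {"status": "completed", "file": str(file_path)}
```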
## 💡 Recommendations
### Recommendation #1: Choose ONE System
**Option A: Deprecate Rules (recommended)**
- Workflows are more powerful
- Better tracking and debugging
- Future-proof architecture
**Migration plan:**
1. Create workflows that match all active rules
2. Disable rules (set enabled=false)
3. Test workflows thoroughly
4. Remove rule execution from the processor
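Step 1 of the migration plan could be partly automated. The sketch below converts a rule row into a single-step workflow definition using the action equivalences from the overlap table earlier in this analysis. The dict keys `action_type` and `steps`, the default priority of 100, and the function name are hypothetical; the real column names may differ.

```python
# Rule action -> workflow action, taken from the overlap table in this analysis.
# mark_spam is absent because the table lists no workflow counterpart for it.
RULE_TO_WORKFLOW_ACTION = {
    "link_supplier": "link_to_vendor",
    "link_customer": "link_to_customer",
    "link_case": "create_ticket",
}


def rule_to_workflow(rule: dict) -> dict:
    """Build a one-step workflow definition equivalent to a legacy rule."""
    action = rule["action_type"]
    if action not in RULE_TO_WORKFLOW_ACTION:
        raise ValueError(f"No workflow equivalent for rule action: {action}")
    return {
        "name": f"Migrated: {rule['name']}",
        "classification": rule.get("classification"),
        "priority": rule.get("priority", 100),
        # Rules stop at the first match implicitly, so make that explicit.
        "stop_on_match": True,
        "steps": [{"action": RULE_TO_WORKFLOW_ACTION[action]}],
    }
```

Rules whose action has no workflow equivalent raise an error rather than being dropped, so they show up during migration instead of silently disappearing.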
**Option B: Keep Both, but Coordinate**
- Add a `_workflow_processed` flag to email_data
- Skip rules if a workflow has run
- Document clearly when to use rules vs workflows
### Recommendation #2: Add a Workflow Auto-Process Flag
**Add to the `email_workflows` table:**
```sql
ALTER TABLE email_workflows ADD COLUMN auto_execute BOOLEAN DEFAULT true;
```
**Check the flag before execution:**
```python
if workflow.get('auto_execute', True):
    result = await self._execute_workflow(workflow, email_data)
```
### Recommendation #3: Unified Action Registry
**Create shared action handlers:**
```python
# shared/email_actions.py
class EmailActions:
    @staticmethod
    async def link_to_vendor(email_id, vendor_id):
        # Single implementation used by both rules and workflows
        ...
```
### Recommendation #4: Better Conflict Detection
**Add an admin UI warning:**
```python
# Check for overlapping rules and workflows
def check_conflicts(active_rules, active_workflows):
    # might_conflict(rule, workflow) is a predicate that still has to be written
    conflicts = []
    for rule in active_rules:
        for workflow in active_workflows:
            if might_conflict(rule, workflow):
                conflicts.append({
                    'rule': rule['name'],
                    'workflow': workflow['name'],
                    'reason': 'Both match same classification'
                })
    return conflicts
```
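The conflict check depends on a `might_conflict` predicate that is not defined in this analysis. A minimal, self-contained sketch of one possible heuristic, assuming rules and workflows are dicts with a `classification` key (an assumption about the row shape):

```python
def might_conflict(rule: dict, workflow: dict) -> bool:
    """Heuristic: a rule and a workflow overlap if they target the same classification."""
    rule_classification = rule.get("classification")
    return (
        rule_classification is not None
        and rule_classification == workflow.get("classification")
    )


def check_conflicts(active_rules: list[dict], active_workflows: list[dict]) -> list[dict]:
    """Pair up every rule and workflow that might act on the same email."""
    return [
        {
            "rule": rule["name"],
            "workflow": workflow["name"],
            "reason": "Both match same classification",
        }
        for rule in active_rules
        for workflow in active_workflows
        if might_conflict(rule, workflow)
    ]
```

This heuristic misses rules that match only on sender or subject, so it is a starting point for the admin warning rather than a complete check.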
### Recommendation #5: Execution Log Consolidation
**Single view of all actions:**
```sql
CREATE VIEW email_action_log AS
SELECT
    'workflow' AS source,
    e.email_id,
    w.name AS action_name,
    e.status,
    e.started_at
FROM email_workflow_executions e
JOIN email_workflows w ON w.id = e.workflow_id
UNION ALL
SELECT
    'rule' AS source,
    em.id AS email_id,
    er.name AS action_name,
    CASE WHEN em.auto_processed THEN 'completed' ELSE 'skipped' END AS status,
    em.updated_at AS started_at
FROM email_messages em
JOIN email_rules er ON er.id = em.rule_id
WHERE em.rule_id IS NOT NULL
ORDER BY started_at DESC;
```
## 🎯 Action Plan
### Immediate (Critical):
1. ✅ Add `EMAIL_WORKFLOWS_ENABLED` check before workflow execution
2. ✅ Add workflow-processed check before rule matching
3. ✅ Fix `extract_invoice_data` silent failure
4. ✅ Add duplicate action detection
### Short Term:
5. Add `auto_execute` column to the workflows table
6. Create unified action handlers
7. Add conflict detection admin tool
8. Document clearly when to use which system
### Long Term:
9. Decide: deprecate rules or keep both?
10. Migrate existing rules to workflows (if deprecating)
11. Create unified execution log view
12. Add UI for viewing all email actions in one dashboard
## 📊 What Should You Do Now?
**Questions for you:**
1. **Do you want to keep both systems or only workflows?**
   - If only workflows: we can migrate rules → workflows now
   - If both: we need to fix the coordination
2. **Should it be possible to disable workflows without switching the system off entirely?**
   - Yes → we add the auto_execute flag
   - No → workflows always run when enabled=true
3. **Are there active rules in production right now?**
   - Yes → we need to be careful with changes
   - No → we can simply disable the rule system

**Quick Fix (5 min):**
I can add the 4 critical fixes now if you want to continue with both systems.

**Long Fix (1 hour):**
I can deprecate rules and migrate to workflows if you want to simplify.

Which do you prefer? 🤔