# Email Rules vs Workflows - Analysis

## 🔍 Overview

BMC Hub has **two systems** for automatic email processing:

1. **Email Rules** (legacy) - `email_rules` table
2. **Email Workflows** (newer) - `email_workflows` table

## ⚙️ How Do They Work?

### Email Processing Flow
```
📧 New Email Received
    ↓
1️⃣ Save email to database (email_messages)
    ↓
2️⃣ Classify with AI/simple classifier
    ↓ (classification + confidence_score are stored)
3️⃣ Execute WORKFLOWS first 🆕
    ├─ Finds workflows with matching classification
    ├─ Checks confidence_threshold
    ├─ Checks sender/subject patterns (regex)
    ├─ Executes workflow steps in order
    └─ Stops if stop_on_match=true
    ↓
4️⃣ Match RULES afterwards (legacy) 🕰️
    ├─ Finds rules with matching conditions
    ├─ Checks sender, domain, classification, subject
    ├─ Executes rule action (only 1 action per rule)
    └─ Stops after first match
```
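
The workflow-matching part of step 3 can be sketched roughly as below. This is a minimal illustration, not the actual `email_workflow_service` code. Only `confidence_threshold`, `stop_on_match`, `priority`, `classification`, and `confidence_score` are named in this document; the keys `classification_trigger`, `sender_pattern`, `subject_pattern`, `sender_email`, and `subject`, and the convention that a lower `priority` number runs first, are assumptions made for the example.

```python
import re


def workflow_matches(workflow: dict, email: dict) -> bool:
    """Return True if the email satisfies every condition the workflow defines."""
    if workflow["classification_trigger"] != email["classification"]:
        return False
    if email["confidence_score"] < workflow.get("confidence_threshold", 0.0):
        return False
    # Optional regex patterns; a missing pattern matches every email.
    sender_pattern = workflow.get("sender_pattern")
    if sender_pattern and not re.search(sender_pattern, email["sender_email"], re.IGNORECASE):
        return False
    subject_pattern = workflow.get("subject_pattern")
    if subject_pattern and not re.search(subject_pattern, email["subject"], re.IGNORECASE):
        return False
    return True


def select_workflows(workflows: list[dict], email: dict) -> list[dict]:
    """Pick matching workflows in priority order, honoring stop_on_match."""
    selected = []
    for workflow in sorted(workflows, key=lambda w: w["priority"]):
        if not workflow_matches(workflow, email):
            continue
        selected.append(workflow)
        if workflow.get("stop_on_match"):
            break  # later workflows are not considered
    return selected
```

Keeping the match predicate separate from the selection loop makes it possible to unit-test the conditions without executing any workflow steps.
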
## 🆚 Differences

| Feature | Email Rules (Legacy) | Email Workflows (New) |
|---------|---------------------|---------------------|
| **Flexibility** | Single action per rule | Multiple steps per workflow |
| **Priority** | Yes (priority field) | Yes (priority field) |
| **Stop on match** | Implicit (first match wins) | Explicit (stop_on_match flag) |
| **Pattern matching** | Basic (exact match, contains) | Advanced (regex patterns) |
| **Confidence check** | No | Yes (confidence_threshold) |
| **Execution tracking** | No | Yes (email_workflow_executions) |
| **Statistics** | Yes (match_count) | Yes (execution_count, success/failure) |
| **Actions** | 5 types | 10+ types |
| **Database table** | `email_rules` | `email_workflows` |
| **Enabled by** | `EMAIL_RULES_ENABLED` | `EMAIL_WORKFLOWS_ENABLED` |
| **Auto-execute** | `EMAIL_RULES_AUTO_PROCESS` | Always (if enabled) |
## ⚠️ PROBLEM: Duplication and Conflicts

### 1. Both Can Run on the Same Email

**Scenario:**
```
Email: Invoice from leverandør@example.com
Classification: invoice, confidence: 0.95

WORKFLOW matches:
- "Invoice Processing Workflow"
  → Steps: link_to_vendor, extract_invoice_data, mark_as_processed
  → Executes first! ✅

RULE matches:
- "Link Supplier Emails"
  → Action: link_supplier
  → Executes after! ⚠️

RESULT: the vendor link runs twice (link_to_vendor, then link_supplier)!
```
### 2. No Coordination

Workflows do not know whether rules have run (or vice versa).

**Problem:**
- An email can be marked as "processed" by a workflow
- A rule still tries to run its action afterwards
- The result is logged in 2 places (`email_workflow_executions` + rule `match_count`)

### 3. Overlapping Actions

**Same functionality in both systems:**

| Action Type | Rule Action | Workflow Action |
|-------------|-----------|----------------|
| Link vendor | `link_supplier` | `link_to_vendor` |
| Link customer | `link_customer` | `link_to_customer` |
| Mark spam | `mark_spam` | *(missing)* |
| Link case | `link_case` | `create_ticket` |
| Invoice extraction | *(missing)* | `extract_invoice_data` |
### 4. Auto-Process Flag Does Not Apply to Workflows

**In the code:**
```python
# Rules respect the auto-process flag
if self.auto_process:
    await self._execute_rule_action(email_data, rule)
else:
    logger.info("⏭️ Auto-process disabled - rule action not executed")

# Workflows ALWAYS run if enabled=true
workflow_result = await email_workflow_service.execute_workflows(email_data)
```
**Problem:** Workflow auto-execution cannot be disabled without disabling the entire workflow system.
## ✅ What Works Well

### 1. Workflows Are More Powerful
- Multi-step execution
- Better tracking (execution history)
- Regex pattern matching
- Confidence threshold check
- Success/failure statistics

### 2. Rules Are Simpler
- Good for simple if-then logic
- Easier to understand for non-technical users
- Work fine for basic email routing

### 3. Both Have Priority Ordering
- Workflows execute in priority order
- Rules are matched in priority order
- The first match can stop the chain (if configured)
## 🐛 Concrete Bugs Found

### Bug #1: Workflows ALWAYS Execute

**Code:** `email_processor_service.py` lines 77-79
```python
# Step 4: Execute workflows based on classification
workflow_result = await email_workflow_service.execute_workflows(email_data)
```
**Problem:** There is no check of `EMAIL_RULES_AUTO_PROCESS` or a similar flag before workflows run.

**Fix:**
```python
if settings.EMAIL_WORKFLOWS_ENABLED:
    workflow_result = await email_workflow_service.execute_workflows(email_data)
```
### Bug #2: Rules Run After Workflows

**Code:** `email_processor_service.py` lines 84-88
```python
# Step 5: Match against rules (legacy support)
if self.rules_enabled:
    matched = await self._match_rules(email_data)
```
**Problem:** If a workflow has already processed the email, the rule should not run.

**Fix:**
```python
# Step 5: Match against rules (legacy support) - skip if already processed by workflow
if self.rules_enabled and not email_data.get('_workflow_processed'):
    matched = await self._match_rules(email_data)
```
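
The fix above depends on a `_workflow_processed` key that none of the quoted excerpts set. A minimal sketch of how the processor might set it is shown below, assuming that `execute_workflows` returns a dict containing a `workflows_executed` count. That result shape is hypothetical, and both service functions are replaced by stand-ins so the example runs on its own.

```python
import asyncio


async def execute_workflows(email_data: dict) -> dict:
    """Stand-in for email_workflow_service.execute_workflows (hypothetical result shape)."""
    executed = 1 if email_data.get("classification") == "invoice" else 0
    return {"workflows_executed": executed}


async def match_rules(email_data: dict) -> bool:
    """Stand-in for the legacy rule matcher; records that it ran."""
    email_data["_rule_ran"] = True
    return True


async def process_email(email_data: dict, rules_enabled: bool = True) -> dict:
    workflow_result = await execute_workflows(email_data)
    # Mark the email so that the legacy rule step can be skipped.
    if workflow_result.get("workflows_executed", 0) > 0:
        email_data["_workflow_processed"] = True

    if rules_enabled and not email_data.get("_workflow_processed"):
        await match_rules(email_data)
    return email_data


invoice = asyncio.run(process_email({"classification": "invoice"}))
newsletter = asyncio.run(process_email({"classification": "newsletter"}))
```

In this sketch the invoice email is handled by a workflow and never reaches the rule matcher, while the newsletter falls through to the legacy rules.
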
### Bug #3: Missing Deduplication

**Problem:** The same action can be executed by both a workflow and a rule.

**Fix:** Add a check in rule execution:
```python
# Check if email already processed by workflow
already_processed = execute_query(
    "SELECT id FROM email_workflow_executions WHERE email_id = %s AND status = 'completed'",
    (email_id,), fetchone=True
)
if already_processed:
    logger.info("⏭️ Email already processed by workflow, skipping rule")
    return False
```
### Bug #4: `extract_invoice_data` Workflow Action Can Fail Silently

**Code:** `email_workflow_service.py` line 380+
```python
if not file_path.exists():
    # No error raised! Just continues...
```
**Problem:** If the PDF file is not found, the workflow does not fail - it just continues.

**Fix:** Raise an exception:
```python
if not file_path.exists():
    raise FileNotFoundError(f"Attachment file not found: {file_path}")
```
## 💡 Recommendations

### Recommendation #1: Choose ONE System

**Option A: Deprecate Rules (recommended)**
- Workflows are more powerful
- Better tracking and debugging
- Future-proof architecture

**Migration plan:**
1. Create workflows that match all active rules
2. Disable rules (set enabled=false)
3. Test workflows thoroughly
4. Remove rule execution from the processor

**Option B: Keep Both, But Coordinate**
- Add a `_workflow_processed` flag to email_data
- Skip rules if a workflow has run
- Document clearly when to use rules vs workflows
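
Step 1 of the migration plan could be partly scripted. The sketch below converts one rule row into a single-step workflow definition, using the action pairs from the overlap table in this document. The rule keys (`name`, `priority`, `action_type`, `conditions`) and the workflow keys (`classification_trigger`, `workflow_steps`) are assumptions about the schema, and `rule_to_workflow` is a hypothetical helper.

```python
# Rule action -> workflow action, taken from the overlap table in this document.
RULE_TO_WORKFLOW_ACTION = {
    "link_supplier": "link_to_vendor",
    "link_customer": "link_to_customer",
    "link_case": "create_ticket",
}


def rule_to_workflow(rule: dict) -> dict:
    """Build a single-step workflow definition equivalent to one legacy rule."""
    action = rule["action_type"]
    if action not in RULE_TO_WORKFLOW_ACTION:
        # For example mark_spam, which has no workflow counterpart yet.
        raise ValueError(f"No workflow action for rule action {action!r}")
    return {
        "name": f"Migrated: {rule['name']}",
        "priority": rule["priority"],
        "classification_trigger": rule.get("conditions", {}).get("classification"),
        "workflow_steps": [{"action": RULE_TO_WORKFLOW_ACTION[action]}],
        # Rules stop after the first match, so the migrated workflow does too.
        "stop_on_match": True,
    }
```

Rules whose action has no workflow equivalent raise an error instead of being dropped silently, so they can be reviewed by hand before rules are disabled.
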
### Recommendation #2: Add a Workflow Auto-Process Flag

**Add to the `email_workflows` table:**
```sql
ALTER TABLE email_workflows ADD COLUMN auto_execute BOOLEAN DEFAULT true;
```
**Check the flag before execution:**
```python
if workflow.get('auto_execute', True):
    result = await self._execute_workflow(workflow, email_data)
```
### Recommendation #3: Unified Action Registry

**Create shared action handlers:**
```python
# shared/email_actions.py
class EmailActions:
    @staticmethod
    async def link_to_vendor(email_id, vendor_id):
        # Single implementation used by both rules and workflows
        ...
```
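
One possible way to let both systems share handlers is a registry keyed by action name, with the legacy rule name registered as an alias of the workflow name. The following is a sketch under that assumption. `register_action`, `run_action`, and the handler body are hypothetical and not part of the existing code.

```python
import asyncio
from typing import Awaitable, Callable

ActionHandler = Callable[..., Awaitable[dict]]
ACTION_REGISTRY: dict[str, ActionHandler] = {}


def register_action(*names: str) -> Callable[[ActionHandler], ActionHandler]:
    """Register one handler under several action names (workflow name + rule alias)."""
    def decorator(handler: ActionHandler) -> ActionHandler:
        for name in names:
            ACTION_REGISTRY[name] = handler
        return handler
    return decorator


@register_action("link_to_vendor", "link_supplier")
async def link_to_vendor(email_id: int, vendor_id: int) -> dict:
    # Placeholder body: a real handler would update the database.
    return {"email_id": email_id, "vendor_id": vendor_id, "linked": True}


async def run_action(name: str, **kwargs) -> dict:
    """Dispatch an action by name, regardless of which system requested it."""
    try:
        handler = ACTION_REGISTRY[name]
    except KeyError:
        raise ValueError(f"Unknown email action: {name}") from None
    return await handler(**kwargs)


result = asyncio.run(run_action("link_supplier", email_id=1, vendor_id=42))
```

Because `link_supplier` and `link_to_vendor` resolve to the same function object, a deduplication check only needs to compare handlers rather than action names.
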
### Recommendation #4: Better Conflict Detection

**Add an admin UI warning:**
```python
# Check for overlapping rules and workflows
def check_conflicts(active_rules, active_workflows):
    """Return rule/workflow pairs that may act on the same email."""
    conflicts = []
    for rule in active_rules:
        for workflow in active_workflows:
            # might_conflict(rule, workflow) -> bool is a helper not shown here
            if might_conflict(rule, workflow):
                conflicts.append({
                    'rule': rule['name'],
                    'workflow': workflow['name'],
                    'reason': 'Both match same classification',
                })
    return conflicts
```
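
`might_conflict` is left undefined in the snippet above. One possible heuristic, sketched here, flags a pair when the rule's classification condition is compatible with the workflow's trigger and the workflow contains the counterpart of the rule's action. The keys `conditions`, `action_type`, `classification_trigger`, and `workflow_steps` are assumed field names, not confirmed schema.

```python
# Rule action -> equivalent workflow action (from the overlap table in this document).
EQUIVALENT_ACTIONS = {
    "link_supplier": "link_to_vendor",
    "link_customer": "link_to_customer",
    "link_case": "create_ticket",
}


def might_conflict(rule: dict, workflow: dict) -> bool:
    """Heuristic: compatible classification and the workflow repeats the rule's action."""
    rule_classification = rule.get("conditions", {}).get("classification")
    # A rule without a classification condition can match any email.
    if rule_classification and rule_classification != workflow.get("classification_trigger"):
        return False
    workflow_actions = {step["action"] for step in workflow.get("workflow_steps", [])}
    return EQUIVALENT_ACTIONS.get(rule["action_type"]) in workflow_actions
```

This heuristic ignores sender and subject patterns, so it may report pairs that can never match the same email. For an admin warning, false positives are usually preferable to missed conflicts.
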
### Recommendation #5: Execution Log Consolidation

**Single view of all actions:**
```sql
CREATE VIEW email_action_log AS
SELECT
    'workflow' AS source,
    e.email_id,
    w.name AS action_name,
    e.status,
    e.started_at
FROM email_workflow_executions e
JOIN email_workflows w ON w.id = e.workflow_id
UNION ALL
SELECT
    'rule' AS source,
    em.id AS email_id,
    er.name AS action_name,
    CASE WHEN em.auto_processed THEN 'completed' ELSE 'skipped' END AS status,
    em.updated_at AS started_at
FROM email_messages em
JOIN email_rules er ON er.id = em.rule_id
WHERE em.rule_id IS NOT NULL
ORDER BY started_at DESC;
```
## 🎯 Action Plan

### Immediate (Critical):
1. ✅ Add `EMAIL_WORKFLOWS_ENABLED` check before workflow execution
2. ✅ Add workflow-processed check before rule matching
3. ✅ Fix `extract_invoice_data` silent failure
4. ✅ Add duplicate action detection

### Short Term:
5. Add `auto_execute` column to the workflows table
6. Create unified action handlers
7. Add conflict detection admin tool
8. Document clearly when to use which system

### Long Term:
9. Decide: deprecate rules or keep both?
10. Migrate existing rules to workflows (if deprecating)
11. Create unified execution log view
12. Add UI for viewing all email actions in one dashboard
## 📊 What Should You Do Now?

**Questions for you:**

1. **Do you want to keep both systems or only workflows?**
   - If only workflows: we can migrate rules → workflows now
   - If both: we need to fix the coordination
2. **Should it be possible to disable workflow execution without switching the whole system off?**
   - Yes → we add the auto_execute flag
   - No → workflows always run when enabled=true
3. **Are there active rules in production right now?**
   - Yes → we need to be careful with changes
   - No → we can simply disable the rule system

**Quick Fix (5 min):**
I can apply the 4 critical fixes now if you want to continue with both systems.

**Long Fix (1 hour):**
I can deprecate rules and migrate to workflows if you want to simplify.

Which do you prefer? 🤔