# AI Template Generation - Perfect Prompt Example
## Prompt for Ollama/LLM
```
TASK: Analyze this Danish invoice and extract information for template generation.
RETURN ONLY VALID JSON - no explanation, no markdown, only pure JSON!
REQUIRED JSON STRUCTURE:
{
  "vendor_cvr": {
    "value": "17630903",
    "pattern": "DK\\s*(\\d{8})",
    "group": 1
  },
  "invoice_number": {
    "value": "974733485",
    "pattern": "Nummer\\s*(\\d+)",
    "group": 1
  },
  "invoice_date": {
    "value": "30.06.2025",
    "pattern": "Dato\\s*(\\d{1,2}[\\/.\\-]\\d{1,2}[\\/.\\-]\\d{4})",
    "group": 1,
    "format": "DD.MM.YYYY"
  },
  "total_amount": {
    "value": "5.165,61",
    "pattern": "Total\\s*([\\d.,]+)",
    "group": 1
  },
  "detection_patterns": [
    {"type": "text", "pattern": "ALSO A/S", "weight": 0.5},
    {"type": "text", "pattern": "Mårkærvej 2", "weight": 0.3},
    {"type": "text", "pattern": "Faktura", "weight": 0.2}
  ],
  "lines_start": {
    "pattern": "Position Varenr\\. Beskrivelse Antal/Enhed"
  },
  "lines_end": {
    "pattern": "Subtotal|I alt ekskl\\. moms"
  }
}
RULES:
1. Pattern must be a regex with escaped backslashes (\\s, \\d)
2. Group specifies which regex capture group to extract (1-based)
3. Value must be the actual value found in the document
4. Detection_patterns must be 3-5 unique text strings that identify the vendor
5. lines_start is the text IMMEDIATELY BEFORE the line items start
6. lines_end is the text AFTER the line items end
7. DO NOT CREATE line_pattern - the system automatically uses multi-line extraction
PDF TEXT:
[PDF_CONTENT_HERE]
RETURN ONLY JSON - nothing else!
```
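Rules 5 and 6 describe `lines_start` and `lines_end` as markers around the line items. The document does not show how the system applies them, so the following is only a minimal sketch of one plausible reading: take the text between the first `lines_start` match and the next `lines_end` match. The helper name `slice_line_items` and the sample invoice text are hypothetical.

```python
import re


def slice_line_items(pdf_text: str, start_pattern: str, end_pattern: str) -> str:
    """Return the text between the first start match and the next end match."""
    start = re.search(start_pattern, pdf_text)
    if start is None:
        return ""
    # Look for the end marker only after the start marker.
    end = re.search(end_pattern, pdf_text[start.end():])
    stop = start.end() + end.start() if end else len(pdf_text)
    return pdf_text[start.end():stop].strip()


# Made-up sample text, for illustration only.
sample_text = (
    "Faktura\n"
    "Position Varenr. Beskrivelse Antal/Enhed\n"
    "1 12345 Kabel 2 stk\n"
    "2 67890 Switch 1 stk\n"
    "Subtotal 4.132,49\n"
)

region = slice_line_items(
    sample_text,
    r"Position Varenr\. Beskrivelse Antal/Enhed",
    r"Subtotal|I alt ekskl\. moms",
)
print(region)
```

If the end marker is missing, the sketch falls back to the rest of the document; a real implementation might prefer to report that as an error.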
## Example Response (what you should get back)
```json
{
  "vendor_cvr": {
    "value": "17630903",
    "pattern": "DK\\s*(\\d{8})",
    "group": 1
  },
  "invoice_number": {
    "value": "974733485",
    "pattern": "Nummer\\s*(\\d+)",
    "group": 1
  },
  "invoice_date": {
    "value": "30.06.2025",
    "pattern": "Dato\\s*(\\d{1,2}[\\/.\\-]\\d{1,2}[\\/.\\-]\\d{4})",
    "group": 1,
    "format": "DD.MM.YYYY"
  },
  "total_amount": {
    "value": "5.165,61",
    "pattern": "beløb\\s*([\\d.,]+)",
    "group": 1
  },
  "detection_patterns": [
    {"type": "text", "pattern": "ALSO A/S", "weight": 0.5},
    {"type": "text", "pattern": "Mårkærvej 2", "weight": 0.3},
    {"type": "text", "pattern": "DK-2630 Taastrup", "weight": 0.2}
  ],
  "lines_start": {
    "pattern": "Position Varenr\\. Beskrivelse Antal/Enhed Pris pr\\. enhed Total pris"
  },
  "lines_end": {
    "pattern": "Subtotal"
  }
}
```
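Rule 3 of the prompt requires each `value` to be the text actually found in the document, and rule 2 makes `group` 1-based. One way to check a response against those rules is to re-run every pattern on the PDF text and compare the captured group with the claimed value. This is a sketch, not the project's validator: `check_fields` is a hypothetical helper and the sample text is made up.

```python
import re


def check_fields(ai_result: dict, pdf_text: str) -> dict:
    """Map each field name to True if its pattern extracts its value from the text."""
    report = {}
    for name, config in ai_result.items():
        # Skip entries without a claimed value (detection_patterns, lines_start, lines_end).
        if not isinstance(config, dict) or "value" not in config:
            continue
        try:
            match = re.search(config["pattern"], pdf_text)
            extracted = match.group(config.get("group", 1)) if match else None
        except (re.error, IndexError):
            # Invalid regex or a group index that the pattern does not define.
            extracted = None
        report[name] = extracted == config["value"]
    return report


# Made-up sample text, for illustration only.
sample_text = "ALSO A/S\nCVR DK 17630903\nNummer 974733485\nDato 30.06.2025\n"
ai_result = {
    "vendor_cvr": {"value": "17630903", "pattern": r"DK\s*(\d{8})", "group": 1},
    "invoice_number": {"value": "974733485", "pattern": r"Nummer\s*(\d+)", "group": 1},
    "total_amount": {"value": "5.165,61", "pattern": r"beløb\s*([\d.,]+)", "group": 1},
}

report = check_fields(ai_result, sample_text)
print(report)
```

In this sample `total_amount` fails the check because the text contains no "beløb" label, which is the kind of mismatch worth catching before a template is saved.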
## How to use it in code
```python
import json

import requests

pdf_text = "ALSO A/S\nMårkærvej 2\n2630 Taastrup\nNummer 974733485..."

# Non-raw f-string: "\\\\s" becomes "\\s" in the prompt, and "{{" / "}}" become literal braces.
prompt = f"""TASK: Analyze this Danish invoice and extract information for template generation.
RETURN ONLY VALID JSON - no explanation, only JSON!
REQUIRED JSON STRUCTURE:
{{
  "vendor_cvr": {{"value": "17630903", "pattern": "DK\\\\s*(\\\\d{{8}})", "group": 1}},
  "invoice_number": {{"value": "974733485", "pattern": "Nummer\\\\s*(\\\\d+)", "group": 1}},
  "invoice_date": {{"value": "30.06.2025", "pattern": "Dato\\\\s*(\\\\d{{1,2}}[\\\\/.\\\\-]\\\\d{{1,2}}[\\\\/.\\\\-]\\\\d{{4}})", "group": 1}},
  "total_amount": {{"value": "5.165,61", "pattern": "Total\\\\s*([\\\\d.,]+)", "group": 1}},
  "detection_patterns": [{{"type": "text", "pattern": "ALSO A/S", "weight": 0.5}}],
  "lines_start": {{"pattern": "Position Varenr"}},
  "lines_end": {{"pattern": "Subtotal"}}
}}
PDF TEXT:
{pdf_text[:2000]}
RETURN ONLY JSON!"""

# Send to Ollama
response = requests.post(
    'http://localhost:11434/api/generate',
    json={
        'model': 'llama3.2',
        'prompt': prompt,
        'stream': False,
        'options': {'temperature': 0.1},
    },
    timeout=120,
)
response.raise_for_status()  # fail early on HTTP errors

# The generated text is in the 'response' field; json.loads raises if it is not valid JSON
result = json.loads(response.json()['response'])
print(json.dumps(result, indent=2, ensure_ascii=False))
```
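The call to `json.loads` above fails if the model wraps its answer in a markdown code fence or adds a sentence around it, which can happen even when the prompt forbids it. A minimal sketch of a more tolerant parser, assuming the reply contains exactly one top-level JSON object (`parse_llm_json` is a hypothetical helper, not part of the project):

```python
import json


def parse_llm_json(raw: str) -> dict:
    """Parse a model reply as JSON, tolerating code fences and surrounding text."""
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    # json.JSONDecodeError is a subclass of ValueError, so callers can catch ValueError.
    return json.loads(raw[start:end + 1])


fence = "`" * 3  # built this way so the example does not close its own code fence
reply = f'Here is the result:\n{fence}json\n{{"lines_end": {{"pattern": "Subtotal"}}}}\n{fence}'

result = parse_llm_json(reply)
print(result["lines_end"]["pattern"])  # Subtotal
```

Slicing from the first `{` to the last `}` is deliberately simple; it does not handle replies that contain several separate JSON objects.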
## Test via curl
```bash
# Fetch the PDF text
PDF_TEXT=$(curl -s -X POST http://localhost:8001/api/v1/supplier-invoices/reprocess/4 | jq -r '.pdf_text')
# Send to the AI endpoint; jq builds the body so quotes and newlines in the text are escaped
jq -n --arg pdf_text "$PDF_TEXT" '{pdf_text: $pdf_text, vendor_id: 1}' \
  | curl -X POST http://localhost:8001/api/v1/supplier-invoices/ai-analyze \
      -H "Content-Type: application/json" \
      --data-binary @- | jq .
```
## Tips for best results
1. **Use temperature 0.1** - For consistent JSON responses
2. **Escaping**: Use `\\\\s` in non-raw Python strings (becomes `\\s` in JSON, `\s` in the regex)
3. **Specify the format**: Show example output in the prompt
4. **Show the structure**: Give a clear JSON structure with all required fields
5. **Limit the text**: Send only the first 2000 characters (they contain the most important information)
6. **Validation**: Check that the response is valid JSON before using it
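The three escaping levels in tip 2 are easy to mix up, so here is a small sketch that walks through them using the `vendor_cvr` pattern from the example. The test string `"CVR-nr. DK 17630903"` is made up for illustration.

```python
import json
import re

# What the model must emit inside the JSON document: two backslashes per regex escape.
json_text = r'{"pattern": "DK\\s*(\\d{8})"}'

# json.loads removes one level of escaping, leaving the regex itself.
pattern = json.loads(json_text)["pattern"]
print(pattern)  # DK\s*(\d{8})

# In a normal (non-raw) Python string literal every backslash is doubled once more.
python_literal = "DK\\\\s*(\\\\d{8})"
assert python_literal == r"DK\\s*(\\d{8})"

# The decoded pattern is what the regex engine receives.
match = re.search(pattern, "CVR-nr. DK 17630903")
print(match.group(1))  # 17630903
```

Writing the prompt with a raw string (`r"..."`) would remove one level of doubling, but raw strings cannot be combined as easily with the `{{`/`}}` escaping that f-strings require.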
## Converting to the template format
The AI returns a nested format, but the template expects a flat format:
```python
ai_result = {
    "vendor_cvr": {"value": "17630903", "pattern": "DK\\s*(\\d{8})", "group": 1},
    "invoice_number": {"value": "974733485", "pattern": "Nummer\\s*(\\d+)", "group": 1}
}

# Convert to template field_mappings
field_mappings = {}
for field_name, config in ai_result.items():
    # detection_patterns is a list and is stored separately from the field mappings
    if field_name != 'detection_patterns':
        field_mappings[field_name] = {
            'pattern': config['pattern'],
            'group': config.get('group', 1)
        }
        # Only some fields (such as invoice_date) carry a format
        if 'format' in config:
            field_mappings[field_name]['format'] = config['format']
```
## Expected output format
The template system expects:
```json
{
  "vendor_id": 1,
  "template_name": "ALSO A/S",
  "detection_patterns": [
    {"type": "text", "pattern": "ALSO A/S", "weight": 0.5}
  ],
  "field_mappings": {
    "vendor_cvr": {
      "pattern": "DK\\s*(\\d{8})",
      "group": 1
    },
    "invoice_number": {
      "pattern": "Nummer\\s*(\\d+)",
      "group": 1
    },
    "lines_start": {
      "pattern": "Position Varenr"
    },
    "lines_end": {
      "pattern": "Subtotal"
    }
  }
}
```
IMPORTANT: No `line_item` pattern - the system automatically uses multi-line extraction!
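Putting the conversion step and the expected format together, a full template payload could be assembled roughly as follows. This is a sketch under assumptions: `build_template` is a hypothetical helper, and copying only the keys that are present (so `lines_start` and `lines_end` get no `group`) is inferred from the example JSON above rather than from the project's code.

```python
import json


def build_template(ai_result: dict, vendor_id: int, template_name: str) -> dict:
    """Flatten the nested AI result into the shape the template system expects."""
    field_mappings = {}
    for name, config in ai_result.items():
        if name == "detection_patterns":
            continue
        # Keep only the keys the template uses; "value" is dropped.
        field_mappings[name] = {
            key: config[key] for key in ("pattern", "group", "format") if key in config
        }
    return {
        "vendor_id": vendor_id,
        "template_name": template_name,
        "detection_patterns": ai_result.get("detection_patterns", []),
        "field_mappings": field_mappings,
    }


ai_result = {
    "vendor_cvr": {"value": "17630903", "pattern": r"DK\s*(\d{8})", "group": 1},
    "detection_patterns": [{"type": "text", "pattern": "ALSO A/S", "weight": 0.5}],
    "lines_start": {"pattern": "Position Varenr"},
}

template = build_template(ai_result, vendor_id=1, template_name="ALSO A/S")
print(json.dumps(template, indent=2, ensure_ascii=False))
```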