Christian dcb4d8a280 feat: Implement supplier invoices management with e-conomic integration
- Added FastAPI views for supplier invoices in the billing frontend.
- Created EconomicService for handling e-conomic API interactions, including safety modes for read-only and dry-run operations.
- Developed database migration for supplier invoices, including tables for invoices, line items, and settings.
- Documented kassekladde module features, architecture, API endpoints, and usage guide in KASSEKLADDE.md.
- Implemented views for overdue invoices and pending e-conomic sync.
2025-12-07 03:29:54 +01:00


AI Template Generation - Perfect Prompt Example

Prompt for Ollama/LLM

```
OPGAVE: Analyser denne danske faktura og udtræk information til template-generering.

RETURNER KUN VALID JSON - ingen forklaring, ingen markdown, kun ren JSON!

REQUIRED JSON STRUKTUR:
{
    "vendor_cvr": {
        "value": "17630903",
        "pattern": "DK\\s*(\\d{8})",
        "group": 1
    },
    "invoice_number": {
        "value": "974733485",
        "pattern": "Nummer\\s*(\\d+)",
        "group": 1
    },
    "invoice_date": {
        "value": "30.06.2025",
        "pattern": "Dato\\s*(\\d{1,2}[\\/.\\-]\\d{1,2}[\\/.\\-]\\d{4})",
        "group": 1,
        "format": "DD.MM.YYYY"
    },
    "total_amount": {
        "value": "5.165,61",
        "pattern": "Total\\s*([\\d.,]+)",
        "group": 1
    },
    "detection_patterns": [
        {"type": "text", "pattern": "ALSO A/S", "weight": 0.5},
        {"type": "text", "pattern": "Mårkærvej 2", "weight": 0.3},
        {"type": "text", "pattern": "Faktura", "weight": 0.2}
    ],
    "lines_start": {
        "pattern": "Position Varenr\\. Beskrivelse Antal/Enhed"
    },
    "lines_end": {
        "pattern": "Subtotal|I alt ekskl\\. moms"
    }
}

REGLER:
1. Pattern skal være regex med escaped backslashes (\\s, \\d)
2. Group angiver hvilken gruppe i regex der skal udtrækkes (1-baseret)
3. Value skal være den faktiske værdi fundet i dokumentet
4. Detection_patterns skal være 3-5 unikke tekststrenge der identificerer leverandøren
5. lines_start er teksten LIGE FØR varelinjer starter
6. lines_end er teksten EFTER varelinjer slutter
7. LAV IKKE line_pattern - systemet bruger automatisk multi-line extraction

PDF TEKST:
[PDF_CONTENT_HER]

RETURNER KUN JSON - intet andet!
```
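Building this prompt with an f-string, as the Python example further down does, requires doubled braces (`{{8}}`) and quadrupled backslashes. A simpler alternative is to keep the prompt in a raw string and substitute the `[PDF_CONTENT_HER]` placeholder with `str.replace`. The sketch below uses an abbreviated template, and `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names:

```python
# Abbreviated template: paste the full prompt from above in practice.
# Raw string: the JSON escapes (\\s, \\d) and regex braces ({8}) are written
# exactly as the model should see them, with no f-string brace doubling.
PROMPT_TEMPLATE = r"""OPGAVE: Analyser denne danske faktura og udtræk information til template-generering.

RETURNER KUN VALID JSON - ingen forklaring, ingen markdown, kun ren JSON!

REQUIRED JSON STRUKTUR:
{
    "vendor_cvr": {"value": "17630903", "pattern": "DK\\s*(\\d{8})", "group": 1},
    "invoice_number": {"value": "974733485", "pattern": "Nummer\\s*(\\d+)", "group": 1}
}

PDF TEKST:
[PDF_CONTENT_HER]

RETURNER KUN JSON - intet andet!"""


def build_prompt(pdf_text: str, max_chars: int = 2000) -> str:
    """Insert the (truncated) PDF text in place of the placeholder."""
    return PROMPT_TEMPLATE.replace("[PDF_CONTENT_HER]", pdf_text[:max_chars])


prompt = build_prompt("ALSO A/S\nMårkærvej 2\n2630 Taastrup\nNummer 974733485")
print(prompt)
```

Because `str.replace` treats the PDF text as plain data, braces or backslashes inside the invoice text cannot break the template.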

Example response (what you should get back)

```json
{
    "vendor_cvr": {
        "value": "17630903",
        "pattern": "DK\\s*(\\d{8})",
        "group": 1
    },
    "invoice_number": {
        "value": "974733485",
        "pattern": "Nummer\\s*(\\d+)",
        "group": 1
    },
    "invoice_date": {
        "value": "30.06.2025",
        "pattern": "Dato\\s*(\\d{1,2}[\\/.\\-]\\d{1,2}[\\/.\\-]\\d{4})",
        "group": 1,
        "format": "DD.MM.YYYY"
    },
    "total_amount": {
        "value": "5.165,61",
        "pattern": "beløb\\s*([\\d.,]+)",
        "group": 1
    },
    "detection_patterns": [
        {"type": "text", "pattern": "ALSO A/S", "weight": 0.5},
        {"type": "text", "pattern": "Mårkærvej 2", "weight": 0.3},
        {"type": "text", "pattern": "DK-2630 Taastrup", "weight": 0.2}
    ],
    "lines_start": {
        "pattern": "Position Varenr\\. Beskrivelse Antal/Enhed Pris pr\\. enhed Total pris"
    },
    "lines_end": {
        "pattern": "Subtotal"
    }
}
```
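Rules 2 and 3 of the prompt can be checked directly. Apply each returned pattern to the invoice text and compare the captured group with `value`. This is a minimal sketch using four fields from the example response. The invoice text is invented for illustration and is not real ALSO A/S PDF output. `check_fields` is a hypothetical helper name:

```python
import json
import re

# Four fields from the example response, kept as raw JSON text
# (r"" preserves the \\ escapes exactly as the model returns them).
RESPONSE_TEXT = r"""{
    "vendor_cvr": {"value": "17630903", "pattern": "DK\\s*(\\d{8})", "group": 1},
    "invoice_number": {"value": "974733485", "pattern": "Nummer\\s*(\\d+)", "group": 1},
    "invoice_date": {"value": "30.06.2025", "pattern": "Dato\\s*(\\d{1,2}[\\/.\\-]\\d{1,2}[\\/.\\-]\\d{4})", "group": 1},
    "total_amount": {"value": "5.165,61", "pattern": "beløb\\s*([\\d.,]+)", "group": 1}
}"""

# Hypothetical invoice text, made up for this illustration only.
SAMPLE_TEXT = (
    "ALSO A/S\nMårkærvej 2\nDK-2630 Taastrup\n"
    "CVR DK 17630903\nFaktura\nNummer 974733485\nDato 30.06.2025\n"
    "beløb 5.165,61\n"
)


def check_fields(fields: dict, text: str) -> dict:
    """Apply each pattern to the text; return {field: (extracted, matches_value)}."""
    results = {}
    for name, spec in fields.items():
        # json.loads has already turned "\\s" into the regex token \s
        match = re.search(spec["pattern"], text)
        # "group" is 1-based: group 1 is the first parenthesised capture group
        extracted = match.group(spec.get("group", 1)) if match else None
        results[name] = (extracted, extracted == spec["value"])
    return results


fields = json.loads(RESPONSE_TEXT)
for name, (extracted, ok) in check_fields(fields, SAMPLE_TEXT).items():
    print(f"{name}: {extracted!r} {'OK' if ok else 'MISMATCH'}")
```

With this sample text, all four fields print OK. The CVR pattern skips `DK-2630` because `\d{8}` needs eight digits directly after `DK` and optional whitespace. A MISMATCH or `None` means the pattern needs adjusting before it goes into the template.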

How to use it in code

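```python
import json

import requests

# Example PDF text (in practice, the text extracted from the invoice PDF)
pdf_text = "ALSO A/S\nMårkærvej 2\n2630 Taastrup\nNummer 974733485..."

# Regular (non-raw) f-string: {{ }} become literal braces and \\\\ becomes \\,
# so the model sees JSON-escaped regexes such as "DK\\s*(\\d{8})".
prompt = f"""OPGAVE: Analyser denne danske faktura og udtræk information til template-generering.

RETURNER KUN VALID JSON - ingen forklaring, kun JSON!

REQUIRED JSON STRUKTUR:
{{
    "vendor_cvr": {{"value": "17630903", "pattern": "DK\\\\s*(\\\\d{{8}})", "group": 1}},
    "invoice_number": {{"value": "974733485", "pattern": "Nummer\\\\s*(\\\\d+)", "group": 1}},
    "invoice_date": {{"value": "30.06.2025", "pattern": "Dato\\\\s*(\\\\d{{1,2}}[\\\\/.\\\\-]\\\\d{{1,2}}[\\\\/.\\\\-]\\\\d{{4}})", "group": 1}},
    "total_amount": {{"value": "5.165,61", "pattern": "Total\\\\s*([\\\\d.,]+)", "group": 1}},
    "detection_patterns": [{{"type": "text", "pattern": "ALSO A/S", "weight": 0.5}}],
    "lines_start": {{"pattern": "Position Varenr"}},
    "lines_end": {{"pattern": "Subtotal"}}
}}

PDF TEKST:
{pdf_text[:2000]}

RETURNER KUN JSON!"""

# Send to Ollama; with stream=False the whole answer arrives in one JSON object
response = requests.post(
    'http://localhost:11434/api/generate',
    json={
        'model': 'llama3.2',
        'prompt': prompt,
        'stream': False,
        'format': 'json',  # ask Ollama to constrain the output to JSON
        'options': {'temperature': 0.1},
    },
    timeout=120,  # generous: local models can be slow
)
response.raise_for_status()

# The model's answer is a string in the "response" field; parse it as JSON
try:
    result = json.loads(response.json()['response'])
except json.JSONDecodeError as exc:
    raise ValueError(f"The model did not return valid JSON: {exc}") from exc

print(json.dumps(result, indent=2, ensure_ascii=False))
```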
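Some models still wrap their answer in a markdown code fence or add a sentence around it, even with the instruction to return only JSON, and then `json.loads` fails. A tolerant fallback is sketched below. It uses the standard library's `json.JSONDecoder.raw_decode`, which parses one JSON value starting at a given index and ignores what follows. `parse_model_json` is a hypothetical helper name:

```python
import json


def parse_model_json(text: str) -> dict:
    """Parse model output as JSON, tolerating surrounding prose or markdown fences."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    start = text.find("{")
    if start == -1:
        raise ValueError("No JSON object found in the model output")
    # raw_decode returns (object, end_index) and ignores trailing text;
    # it still raises JSONDecodeError if the object itself is malformed
    obj, _end = json.JSONDecoder().raw_decode(text, start)
    return obj


print(parse_model_json('Her er resultatet:\n{"vendor_cvr": {"value": "17630903"}}\nHåber det hjælper'))
```

This sketch only recovers the JSON text. The content still needs the checks described in the tips.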

Test via curl

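```bash
# Fetch the PDF text (the reprocess endpoint returns it in .pdf_text)
PDF_TEXT=$(curl -s -X POST http://localhost:8001/api/v1/supplier-invoices/reprocess/4 | jq -r '.pdf_text')

# Send to the AI endpoint; jq builds the JSON body so quotes, backslashes
# and newlines in the PDF text are escaped correctly
jq -n --arg pdf_text "$PDF_TEXT" '{pdf_text: $pdf_text, vendor_id: 1}' \
  | curl -s -X POST http://localhost:8001/api/v1/supplier-invoices/ai-analyze \
      -H "Content-Type: application/json" \
      --data-binary @- | jq .
```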

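Tips for best results

  1. Use temperature 0.1 - for consistent JSON responses
  2. Escaping: write \\\\s in regular (non-raw) Python strings; it becomes \\s in the JSON text and \s in the regex
  3. Specify the format: show example output in the prompt
  4. Show the structure: give a clear JSON structure with all required fields
  5. Limit the text: send only the first 2000 characters, which usually contain the header fields (vendor, invoice number, date); on longer invoices the total and the lines_end marker may fall outside this window
  6. Validation: check that the response is valid JSON before using it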
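Validation can go beyond checking that `json.loads` succeeds, because the rules in the prompt (REGLER) are concrete enough to check mechanically. This is a sketch that assumes the response has already been parsed. `validate_ai_result`, `REQUIRED_FIELDS` and `MARKER_FIELDS` are hypothetical names. The checks mirror rules 2, 4 and 7, and the sketch also confirms that every pattern compiles as a regex:

```python
import json
import re

# Field names taken from the REQUIRED JSON STRUKTUR in the prompt
REQUIRED_FIELDS = ("vendor_cvr", "invoice_number", "invoice_date", "total_amount")
MARKER_FIELDS = ("lines_start", "lines_end")


def validate_ai_result(result: dict) -> list:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    for name in REQUIRED_FIELDS + MARKER_FIELDS:
        spec = result.get(name)
        if not isinstance(spec, dict) or not isinstance(spec.get("pattern"), str):
            problems.append(f"{name}: missing or without a pattern")
            continue
        try:
            compiled = re.compile(spec["pattern"])
        except re.error as exc:
            problems.append(f"{name}: invalid regex ({exc})")
            continue
        if name in REQUIRED_FIELDS:
            # Rule 2: group is 1-based and must exist in the pattern
            group = spec.get("group", 1)
            if not isinstance(group, int) or not 1 <= group <= compiled.groups:
                problems.append(f"{name}: group {group!r} does not exist in the pattern")
    # Rule 4: 3-5 detection patterns
    detection = result.get("detection_patterns")
    if not isinstance(detection, list) or not 3 <= len(detection) <= 5:
        problems.append("detection_patterns: expected a list of 3-5 entries")
    # Rule 7: the model should not produce a line_pattern
    if "line_pattern" in result:
        problems.append("line_pattern: should not be present")
    return problems


response_text = r'''{
  "vendor_cvr": {"value": "17630903", "pattern": "DK\\s*(\\d{8})", "group": 1},
  "invoice_number": {"value": "974733485", "pattern": "Nummer\\s*(\\d+)", "group": 1},
  "invoice_date": {"value": "30.06.2025", "pattern": "Dato\\s*(\\d{1,2}[\\/.\\-]\\d{1,2}[\\/.\\-]\\d{4})", "group": 1},
  "total_amount": {"value": "5.165,61", "pattern": "beløb\\s*([\\d.,]+)", "group": 1},
  "detection_patterns": [
    {"type": "text", "pattern": "ALSO A/S", "weight": 0.5},
    {"type": "text", "pattern": "Mårkærvej 2", "weight": 0.3},
    {"type": "text", "pattern": "DK-2630 Taastrup", "weight": 0.2}
  ],
  "lines_start": {"pattern": "Position Varenr\\. Beskrivelse"},
  "lines_end": {"pattern": "Subtotal"}
}'''
print(validate_ai_result(json.loads(response_text)))  # → []
```

A non-empty list is a good reason to retry the request or reject the template rather than save it.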

Converting to template format

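The AI returns each field together with the value it found. The template's field_mappings only need pattern, group and an optional format, and detection_patterns belong at the top level of the template rather than in field_mappings: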

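```python
# Parsed AI result (after json.loads), abbreviated
ai_result = {
    "vendor_cvr": {"value": "17630903", "pattern": "DK\\s*(\\d{8})", "group": 1},
    "invoice_number": {"value": "974733485", "pattern": "Nummer\\s*(\\d+)", "group": 1},
    "lines_start": {"pattern": "Position Varenr"},
    "lines_end": {"pattern": "Subtotal"},
}

# Convert to template field_mappings: drop "value", keep pattern, group and format
field_mappings = {}
for field_name, config in ai_result.items():
    if field_name == 'detection_patterns':
        continue  # goes at the template's top level, not in field_mappings
    mapping = {'pattern': config['pattern']}
    # lines_start/lines_end are markers without a capture group
    if field_name not in ('lines_start', 'lines_end'):
        mapping['group'] = config.get('group', 1)
    if 'format' in config:
        mapping['format'] = config['format']
    field_mappings[field_name] = mapping
```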

Expected output format

The template system expects:

```json
{
    "vendor_id": 1,
    "template_name": "ALSO A/S",
    "detection_patterns": [
        {"type": "text", "pattern": "ALSO A/S", "weight": 0.5}
    ],
    "field_mappings": {
        "vendor_cvr": {
            "pattern": "DK\\s*(\\d{8})",
            "group": 1
        },
        "invoice_number": {
            "pattern": "Nummer\\s*(\\d+)",
            "group": 1
        },
        "lines_start": {
            "pattern": "Position Varenr"
        },
        "lines_end": {
            "pattern": "Subtotal"
        }
    }
}
```
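This document does not show how the template system uses the detection_patterns weights. In the example response they sum to 1.0 (0.5 + 0.3 + 0.2), which suggests an additive score. The sketch below assumes plain substring matching for `"type": "text"` and sums the weights of the patterns found. Both the scoring rule and the name `detection_score` are assumptions, not the system's actual implementation:

```python
def detection_score(text: str, detection_patterns: list) -> float:
    """Sum the weights of "text" patterns that occur verbatim in the PDF text.

    Assumption: substring matching and additive weights; the real
    template system may score differently.
    """
    score = 0.0
    for entry in detection_patterns:
        pattern = entry.get("pattern")
        if entry.get("type") == "text" and pattern and pattern in text:
            score += float(entry.get("weight", 0.0))
    return score


patterns = [
    {"type": "text", "pattern": "ALSO A/S", "weight": 0.5},
    {"type": "text", "pattern": "Mårkærvej 2", "weight": 0.3},
    {"type": "text", "pattern": "DK-2630 Taastrup", "weight": 0.2},
]
text = "ALSO A/S\nMårkærvej 2\n2630 Taastrup\nNummer 974733485"
print(detection_score(text, patterns))
```

Here only the first two patterns are found (the text has "2630 Taastrup" without the "DK-" prefix), so the score is 0.5 + 0.3. A template would then be chosen when the score clears some threshold, which this document does not specify.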

IMPORTANT: No line_item pattern - the system uses automatic multi-line extraction!
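The multi-line extraction itself is not shown in this document. Rules 5 and 6 of the prompt say `lines_start` is the text right before the item lines and `lines_end` is the text after them. As an illustration of what those two markers delimit, the region between them could be cut out like this. `extract_line_block` is a hypothetical helper, not the system's implementation, and the item lines in the sample are invented:

```python
import re


def extract_line_block(text: str, start_pattern: str, end_pattern: str):
    """Return the text between the lines_start and lines_end matches, or None."""
    start = re.search(start_pattern, text)
    if start is None:
        return None
    # Look for the end marker only after the start marker
    end = re.compile(end_pattern).search(text, start.end())
    if end is None:
        return None
    return text[start.end():end.start()].strip()


# Hypothetical invoice text; the two item lines are invented for illustration
invoice_text = (
    "Faktura\nNummer 974733485\n"
    "Position Varenr. Beskrivelse Antal/Enhed\n"
    "1 ABC123 Kabel 2 stk\n"
    "2 XYZ789 Switch 1 stk\n"
    "Subtotal\n"
)
block = extract_line_block(
    invoice_text,
    r"Position Varenr\. Beskrivelse Antal/Enhed",  # lines_start from the prompt
    r"Subtotal|I alt ekskl\. moms",              # lines_end from the prompt
)
print(block)
```

Splitting that block into individual items is the job of the system's own multi-line extraction, which is why the prompt asks the model not to produce a line pattern.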