# AI Template Generation - Perfect Prompt Example

## Prompt for Ollama/LLM

```
TASK: Analyze this Danish invoice and extract information for template generation.

RETURN ONLY VALID JSON - no explanation, no markdown, only raw JSON!

REQUIRED JSON STRUCTURE:
{
  "vendor_cvr": {
    "value": "17630903",
    "pattern": "DK\\s*(\\d{8})",
    "group": 1
  },
  "invoice_number": {
    "value": "974733485",
    "pattern": "Nummer\\s*(\\d+)",
    "group": 1
  },
  "invoice_date": {
    "value": "30.06.2025",
    "pattern": "Dato\\s*(\\d{1,2}[\\/.\\-]\\d{1,2}[\\/.\\-]\\d{4})",
    "group": 1,
    "format": "DD.MM.YYYY"
  },
  "total_amount": {
    "value": "5.165,61",
    "pattern": "Total\\s*([\\d.,]+)",
    "group": 1
  },
  "detection_patterns": [
    {"type": "text", "pattern": "ALSO A/S", "weight": 0.5},
    {"type": "text", "pattern": "Mårkærvej 2", "weight": 0.3},
    {"type": "text", "pattern": "Faktura", "weight": 0.2}
  ],
  "lines_start": {
    "pattern": "Position Varenr\\. Beskrivelse Antal/Enhed"
  },
  "lines_end": {
    "pattern": "Subtotal|I alt ekskl\\. moms"
  }
}

RULES:
1. Pattern must be a regex with escaped backslashes (\\s, \\d)
2. Group specifies which regex capture group to extract (1-based)
3. Value must be the actual value found in the document
4. Detection_patterns must be 3-5 unique text strings that identify the vendor
5. lines_start is the text IMMEDIATELY BEFORE the line items begin
6. lines_end is the text AFTER the line items end
7. DO NOT CREATE line_pattern - the system automatically uses multi-line extraction

PDF TEXT:
[PDF_CONTENT_HERE]

RETURN ONLY JSON - nothing else!
```

## Example Response (what you should get back)

```json
{
  "vendor_cvr": {
    "value": "17630903",
    "pattern": "DK\\s*(\\d{8})",
    "group": 1
  },
  "invoice_number": {
    "value": "974733485",
    "pattern": "Nummer\\s*(\\d+)",
    "group": 1
  },
  "invoice_date": {
    "value": "30.06.2025",
    "pattern": "Dato\\s*(\\d{1,2}[\\/.\\-]\\d{1,2}[\\/.\\-]\\d{4})",
    "group": 1,
    "format": "DD.MM.YYYY"
  },
  "total_amount": {
    "value": "5.165,61",
    "pattern": "beløb\\s*([\\d.,]+)",
    "group": 1
  },
  "detection_patterns": [
    {"type": "text", "pattern": "ALSO A/S", "weight": 0.5},
    {"type": "text", "pattern": "Mårkærvej 2", "weight": 0.3},
    {"type": "text", "pattern": "DK-2630 Taastrup", "weight": 0.2}
  ],
  "lines_start": {
    "pattern": "Position Varenr\\. Beskrivelse Antal/Enhed Pris pr\\. enhed Total pris"
  },
  "lines_end": {
    "pattern": "Subtotal"
  }
}
```

## How to use it in code

```python
import json
import requests

pdf_text = "ALSO A/S\nMårkærvej 2\n2630 Taastrup\nNummer 974733485..."

# In this f-string, {{ and }} produce literal braces and \\\\ produces \\ in the prompt text
prompt = f"""TASK: Analyze this Danish invoice and extract information for template generation.
RETURN ONLY VALID JSON - no explanation, only JSON!
REQUIRED JSON STRUCTURE:
{{
  "vendor_cvr": {{"value": "17630903", "pattern": "DK\\\\s*(\\\\d{{8}})", "group": 1}},
  "invoice_number": {{"value": "974733485", "pattern": "Nummer\\\\s*(\\\\d+)", "group": 1}},
  "invoice_date": {{"value": "30.06.2025", "pattern": "Dato\\\\s*(\\\\d{{1,2}}[\\\\/.\\\\-]\\\\d{{1,2}}[\\\\/.\\\\-]\\\\d{{4}})", "group": 1}},
  "total_amount": {{"value": "5.165,61", "pattern": "Total\\\\s*([\\\\d.,]+)", "group": 1}},
  "detection_patterns": [{{"type": "text", "pattern": "ALSO A/S", "weight": 0.5}}],
  "lines_start": {{"pattern": "Position Varenr"}},
  "lines_end": {{"pattern": "Subtotal"}}
}}

PDF TEXT:
{pdf_text[:2000]}

RETURN ONLY JSON!"""

# Send to Ollama
response = requests.post('http://localhost:11434/api/generate', json={
    'model': 'llama3.2',
    'prompt': prompt,
    'stream': False,
    'options': {'temperature': 0.1}
})
response.raise_for_status()  # fail early on HTTP errors

# With stream=False, the generated text is in the 'response' field
result = json.loads(response.json()['response'])
print(json.dumps(result, indent=2, ensure_ascii=False))
```

## Test via curl

```bash
# Fetch the PDF text
PDF_TEXT=$(curl -s -X POST http://localhost:8001/api/v1/supplier-invoices/reprocess/4 | jq -r '.pdf_text')

# Send to the AI endpoint; jq builds the JSON body so quotes and newlines in the PDF text are escaped
jq -n --arg pdf_text "$PDF_TEXT" '{pdf_text: $pdf_text, vendor_id: 1}' |
  curl -X POST http://localhost:8001/api/v1/supplier-invoices/ai-analyze \
    -H "Content-Type: application/json" \
    -d @- | jq .
```

## Tips for best results

1. **Use temperature 0.1** - for consistent JSON responses
2. **Escaping**: Use `\\\\s` in regular (non-raw) Python strings (becomes `\\s` in JSON, `\s` in the regex)
3. **Specify the format**: Show example output in the prompt
4. **Show the structure**: Provide a clear JSON structure with all required fields
5. **Limit the text**: Send only the first 2000 characters (they contain the most important information)
6. **Validation**: Check that the response is valid JSON before using it

## Converting to template format

The AI returns a nested format that includes the extracted `value`, but the template expects a flat format with only the pattern settings:

```python
ai_result = {
    "vendor_cvr": {"value": "17630903", "pattern": "DK\\s*(\\d{8})", "group": 1},
    "invoice_number": {"value": "974733485", "pattern": "Nummer\\s*(\\d+)", "group": 1}
}

# Convert to template field_mappings
field_mappings = {}
for field_name, config in ai_result.items():
    # detection_patterns is a list and is stored separately in the template
    if field_name != 'detection_patterns':
        field_mappings[field_name] = {
            'pattern': config['pattern'],
            'group': config.get('group', 1)
        }
        # Carry over the optional date format
        if 'format' in config:
            field_mappings[field_name]['format'] = config['format']
```

## Expected output format

The template system expects:

```json
{
  "vendor_id": 1,
  "template_name": "ALSO A/S",
  "detection_patterns": [
    {"type": "text", "pattern": "ALSO A/S", "weight": 0.5}
  ],
  "field_mappings": {
    "vendor_cvr": {
      "pattern": "DK\\s*(\\d{8})",
      "group": 1
    },
    "invoice_number": {
      "pattern": "Nummer\\s*(\\d+)",
      "group": 1
    },
    "lines_start": {
      "pattern": "Position Varenr"
    },
    "lines_end": {
      "pattern": "Subtotal"
    }
  }
}
```

IMPORTANT: No `line_item` pattern - the system automatically uses multi-line extraction!
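
The tips above recommend checking that the response is valid JSON before using it. The following is a minimal sketch of such a check, not part of the template system itself: `validate_ai_response`, `REQUIRED_FIELDS`, and the sample invoice text are hypothetical names and data introduced for illustration. Beyond parsing the JSON, it re-runs each returned pattern against the PDF text and compares the captured group with the reported `value`, on the assumption that a pattern which cannot reproduce its own value should not be stored in a template.

```python
import json
import re

# Hypothetical list of fields to verify; adjust to whatever the template needs.
REQUIRED_FIELDS = ("vendor_cvr", "invoice_number", "invoice_date", "total_amount")


def validate_ai_response(raw_response: str, pdf_text: str) -> tuple[dict, list[str]]:
    """Parse the model output and list every problem found with its patterns."""
    try:
        result = json.loads(raw_response)
    except json.JSONDecodeError as exc:
        return {}, [f"response is not valid JSON: {exc}"]
    if not isinstance(result, dict):
        return {}, ["response is not a JSON object"]

    problems = []
    for name in REQUIRED_FIELDS:
        config = result.get(name)
        if not isinstance(config, dict) or "pattern" not in config:
            problems.append(f"{name}: missing field or pattern")
            continue
        try:
            match = re.search(config["pattern"], pdf_text)
        except re.error as exc:
            problems.append(f"{name}: invalid regex ({exc})")
            continue
        if match is None:
            problems.append(f"{name}: pattern does not match the PDF text")
            continue
        try:
            extracted = match.group(config.get("group", 1))
        except IndexError:
            problems.append(f"{name}: capture group does not exist in pattern")
            continue
        if extracted != config.get("value"):
            problems.append(
                f"{name}: extracted {extracted!r}, expected {config.get('value')!r}"
            )

    # Rule 7 of the prompt: the model must not invent a line_pattern.
    if "line_pattern" in result:
        problems.append("line_pattern must not be present")
    return result, problems


# Made-up invoice text and a matching response, for demonstration only.
sample_text = "ALSO A/S\nDK 17630903\nNummer 974733485\nDato 30.06.2025\nTotal 5.165,61"
sample_response = json.dumps({
    "vendor_cvr": {"value": "17630903", "pattern": r"DK\s*(\d{8})", "group": 1},
    "invoice_number": {"value": "974733485", "pattern": r"Nummer\s*(\d+)", "group": 1},
    "invoice_date": {
        "value": "30.06.2025",
        "pattern": r"Dato\s*(\d{1,2}[\/.\-]\d{1,2}[\/.\-]\d{4})",
        "group": 1,
    },
    "total_amount": {"value": "5.165,61", "pattern": r"Total\s*([\d.,]+)", "group": 1},
})

result, problems = validate_ai_response(sample_response, sample_text)
print(problems or "all patterns reproduce their values")
```

Returning a list of problems instead of raising on the first one makes it easier to log everything the model got wrong in a single pass, or to feed the list back into a retry prompt.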