bmc_hub/migrations/140_email_extracted_vendor_fields.sql
Christian c6d310e96d feat: analyze PDF attachments for invoice extraction v2.2.18
- email_analysis_service: extract PDF text from attachments as PRIMARY source
  - _build_invoice_extraction_context: reads PDF bytes (in-memory or DB)
  - _extract_pdf_texts_from_attachments: pdfplumber on in-memory bytes
  - _get_attachment_texts_from_db: fallback to content_data/file_path
  - _build_extraction_prompt: comprehensive schema (vendor, CVR, lines, dates)
  - num_predict 300→3000, timeout 30→120s, format=json
- email_processor_service: _update_extracted_fields saves vendor_name, CVR, invoice_date
- migration 140: extracted_vendor_name, extracted_vendor_cvr, extracted_invoice_date columns

Sender (forwarder/external bookkeeper) is now ignored for vendor detection.
The actual invoice PDF determines vendor/amounts/lines.
2026-03-02 00:17:41 +01:00
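
The commit message says that `_update_extracted_fields` in `email_processor_service` now saves the vendor name, CVR and invoice date. That service code is not shown here, so what follows is only a minimal PostgreSQL sketch of an equivalent write. It assumes that `email_messages` has an integer primary key `id` and that a NULL coming back from the extractor should not overwrite a value that is already stored. The prepared-statement name and the sample values are hypothetical.

```sql
-- Hypothetical sketch of a write that _update_extracted_fields might perform.
-- Assumptions (not from the commit): integer primary key "id", and
-- COALESCE so a missing extraction result keeps the previously stored value.
PREPARE update_extracted_fields (integer, varchar, varchar, date) AS
UPDATE email_messages
SET extracted_vendor_name  = COALESCE($2, extracted_vendor_name),
    extracted_vendor_cvr   = COALESCE($3, extracted_vendor_cvr),
    extracted_invoice_date = COALESCE($4, extracted_invoice_date)
WHERE id = $1;

-- Example call with made-up values: message 42, vendor name, 8-digit CVR, invoice date.
EXECUTE update_extracted_fields(42, 'Example ApS', '12345678', DATE '2026-02-28');

-- Release the prepared statement when done.
DEALLOCATE update_extracted_fields;
```

Using COALESCE is only one design option. If the service is meant to clear stale values when a newer PDF lacks a field, it would assign the parameters directly.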


-- Migration 140: Add vendor extraction fields to email_messages
-- Stores vendor info extracted from attached invoice PDFs
ALTER TABLE email_messages
ADD COLUMN IF NOT EXISTS extracted_vendor_name VARCHAR(255),
ADD COLUMN IF NOT EXISTS extracted_vendor_cvr VARCHAR(20),
ADD COLUMN IF NOT EXISTS extracted_invoice_date DATE;
COMMENT ON COLUMN email_messages.extracted_vendor_name IS 'Vendor name from attached invoice PDF';
COMMENT ON COLUMN email_messages.extracted_vendor_cvr IS 'Vendor CVR from attached invoice PDF';
COMMENT ON COLUMN email_messages.extracted_invoice_date IS 'Invoice date from attached invoice PDF';
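
Each `ADD COLUMN` uses `IF NOT EXISTS`, and `COMMENT ON` simply replaces any existing comment, so the migration can be re-applied without error. The commit does not ship a rollback. If one were needed, a minimal sketch (hypothetical, not part of migration 140) would drop the three columns. In PostgreSQL this also discards their comments and any values already extracted into them.

```sql
-- Hypothetical down migration for 140 (not included in the original commit).
-- IF EXISTS keeps it safe to run even if the columns were never added.
ALTER TABLE email_messages
DROP COLUMN IF EXISTS extracted_vendor_name,
DROP COLUMN IF EXISTS extracted_vendor_cvr,
DROP COLUMN IF EXISTS extracted_invoice_date;
```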