- email_analysis_service: extract PDF text from attachments as the PRIMARY source
- _build_invoice_extraction_context: reads PDF bytes (in memory or from the DB)
- _extract_pdf_texts_from_attachments: runs pdfplumber on in-memory bytes
- _get_attachment_texts_from_db: falls back to content_data/file_path
- _build_extraction_prompt: comprehensive schema (vendor, CVR, lines, dates)
- num_predict 300→3000, timeout 30→120s, format=json
- email_processor_service: _update_extracted_fields saves vendor_name, CVR, invoice_date
- migration 140: extracted_vendor_name, extracted_vendor_cvr, extracted_invoice_date columns

The sender (forwarder/external bookkeeper) is now ignored for vendor detection. The actual invoice PDF determines the vendor, amounts, and lines.
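The extraction order described above (PDF bytes in memory first, stored content as a fallback) can be sketched roughly as follows. This is a minimal sketch, not the service's actual code: `Attachment`, `collect_invoice_texts`, and the `load_from_db` callable are hypothetical stand-ins for `_extract_pdf_texts_from_attachments` and `_get_attachment_texts_from_db`. Only `pdfplumber.open` and `page.extract_text()` are real library calls.

```python
import io
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Attachment:
    # Hypothetical attachment record; field names are illustrative only.
    filename: str
    content_type: str
    data: Optional[bytes] = None          # bytes already held in memory
    attachment_id: Optional[int] = None   # key used by the DB fallback


def pdf_bytes_to_text(data: bytes) -> str:
    """Extract text from every page of an in-memory PDF with pdfplumber."""
    import pdfplumber  # lazy import: only needed when real PDFs are parsed

    with pdfplumber.open(io.BytesIO(data)) as pdf:
        # extract_text() can return None for image-only pages.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def is_pdf(att: Attachment) -> bool:
    return att.content_type == "application/pdf" or att.filename.lower().endswith(".pdf")


def collect_invoice_texts(
    attachments: List[Attachment],
    load_from_db: Callable[[Attachment], Optional[bytes]],
    extract: Callable[[bytes], str] = pdf_bytes_to_text,
) -> List[str]:
    """Return the text of each PDF attachment, preferring in-memory bytes."""
    texts: List[str] = []
    for att in attachments:
        if not is_pdf(att):
            continue
        # Primary source: bytes already in memory; otherwise ask storage
        # (hypothetically content_data or file_path, per the commit message).
        data = att.data if att.data is not None else load_from_db(att)
        if not data:
            continue
        text = extract(data).strip()
        if text:
            texts.append(text)
    return texts
```

Passing `extract` as a parameter keeps the selection logic testable without pdfplumber installed; in production the default `pdf_bytes_to_text` would be used.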
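The `num_predict` and `format=json` settings match the option names of Ollama's `/api/generate` endpoint. The commit does not name the backend, so the sketch below assumes Ollama on its default local port. The schema field list, the prompt wording, and the helper names are hypothetical approximations of `_build_extraction_prompt`. The 120 s value is assumed to be the HTTP request timeout.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Assumed local Ollama endpoint; adjust to the real deployment.
OLLAMA_URL = "http://localhost:11434/api/generate"
REQUEST_TIMEOUT_S = 120   # raised from 30 s per the commit
NUM_PREDICT = 3000        # raised from 300 so long line-item lists are not cut off

# Hypothetical field list approximating the "comprehensive schema".
SCHEMA_FIELDS = ["vendor_name", "vendor_cvr", "invoice_number", "invoice_date",
                 "due_date", "total_amount", "currency", "lines"]


def build_extraction_prompt(pdf_texts: List[str]) -> str:
    docs = "\n\n---\n\n".join(pdf_texts)
    return (
        "Extract invoice data from the PDF text below. Ignore the email sender; "
        "the vendor is the party issuing the invoice.\n"
        f"Return one JSON object with the keys: {', '.join(SCHEMA_FIELDS)}.\n"
        "Use null for missing values and YYYY-MM-DD for dates.\n\n"
        f"{docs}"
    )


def build_generate_payload(model: str, prompt: str) -> Dict[str, Any]:
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,   # one JSON response instead of streamed chunks
        "format": "json",  # ask the model to emit valid JSON
        "options": {"num_predict": NUM_PREDICT},
    }


def parse_generate_response(body: Dict[str, Any]) -> Dict[str, Any]:
    # With stream=False the generated text is in the "response" field.
    return json.loads(body["response"])


def call_model(payload: Dict[str, Any]) -> Dict[str, Any]:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=REQUEST_TIMEOUT_S) as resp:
        return parse_generate_response(json.load(resp))
```

The larger `num_predict` matters because a truncated generation yields invalid JSON, which `json.loads` would then reject.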
12 lines
642 B
SQL
-- Migration 140: Add vendor extraction fields to email_messages
-- Stores vendor info extracted from attached invoice PDFs

ALTER TABLE email_messages
    ADD COLUMN IF NOT EXISTS extracted_vendor_name VARCHAR(255),
    ADD COLUMN IF NOT EXISTS extracted_vendor_cvr VARCHAR(20),
    ADD COLUMN IF NOT EXISTS extracted_invoice_date DATE;

COMMENT ON COLUMN email_messages.extracted_vendor_name IS 'Vendor name from attached invoice PDF';
COMMENT ON COLUMN email_messages.extracted_vendor_cvr IS 'Vendor CVR from attached invoice PDF';
COMMENT ON COLUMN email_messages.extracted_invoice_date IS 'Invoice date from attached invoice PDF';
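Once the model has returned its JSON, the three new columns can be filled with a single parameterized UPDATE. The sketch below is hypothetical and is not `_update_extracted_fields` itself. It assumes the extracted dict uses the keys `vendor_name`, `vendor_cvr` and `invoice_date`, that `email_messages` has an `id` primary key, and that the DB-API driver uses `%s` placeholders (as psycopg2 does). It trims values to the `VARCHAR(255)`/`VARCHAR(20)` limits above and only writes a date the `DATE` column can accept.

```python
from datetime import date
from typing import Any, Dict, List, Optional, Tuple


def parse_iso_date(value: Any) -> Optional[date]:
    """Accept 'YYYY-MM-DD' (optionally followed by a time); anything else -> None."""
    if not value:
        return None
    try:
        return date.fromisoformat(str(value)[:10])
    except ValueError:
        return None


def build_extracted_fields_update(
    email_id: int, extracted: Dict[str, Any]
) -> Optional[Tuple[str, List[Any]]]:
    columns: Dict[str, Any] = {}
    vendor = str(extracted.get("vendor_name") or "").strip()
    if vendor:
        columns["extracted_vendor_name"] = vendor[:255]   # VARCHAR(255)
    cvr = str(extracted.get("vendor_cvr") or "").strip()
    if cvr:
        columns["extracted_vendor_cvr"] = cvr[:20]        # VARCHAR(20)
    invoice_date = parse_iso_date(extracted.get("invoice_date"))
    if invoice_date:
        columns["extracted_invoice_date"] = invoice_date  # DATE
    if not columns:
        return None  # nothing extracted; skip the write
    # Column names come from the fixed keys above, never from model output,
    # so formatting them into the SQL string is safe; values stay parameters.
    assignments = ", ".join(f"{col} = %s" for col in columns)
    sql = f"UPDATE email_messages SET {assignments} WHERE id = %s"
    return sql, [*columns.values(), email_id]
```

With a cursor from the assumed driver, the result would be applied as `cursor.execute(sql, params)`. Skipping empty fields leaves any previously stored values untouched.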