
Why automate invoice extraction?
Manually extracting invoice data is time-consuming and error-prone. This n8n workflow removes the manual step: as soon as a PDF lands in the watched Google Drive folder, its data is extracted, formatted, and appended to Google Sheets automatically, with no manual entry and consistent results.
How the automation works
- Google Drive Trigger: Watches a specific folder every minute for new PDF invoices.
- Download PDF: Retrieves the new file via Google Drive integration.
- Extract Plain Text: Uses the “Extract from File” node to get raw text from the PDF.
- Clean & Format: The “Edit Fields” node cleans the text, removes noise, and structures the data for the AI step.
- AI Agent (Groq – qwen-qwq-32b): Parses the invoice text and extracts key fields such as invoice number, date, line items, prices, tax, and grand total. Missing fields are marked “NA.”
- Save to Google Sheets: Appends a new row with extracted data. A conditional check ensures no duplicates are added.
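The parsing and “NA” handling steps above can be sketched in plain Python, outside n8n. This is a minimal sketch, not the workflow's actual node code: it assumes the model replies with a JSON object, and the field names in `FIELDS` as well as the helpers `normalize_invoice` and `to_sheet_row` are hypothetical names chosen for illustration.

```python
import json

# Hypothetical field list; the workflow's actual column names may differ.
FIELDS = ["invoice_number", "invoice_date", "line_items", "tax", "grand_total"]


def normalize_invoice(model_output: str) -> dict:
    """Parse the model's JSON reply and mark missing or empty fields as "NA"."""
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError:
        data = {}  # Unparseable reply: every field falls back to "NA".
    if not isinstance(data, dict):
        data = {}
    row = {}
    for field in FIELDS:
        value = data.get(field)
        row[field] = "NA" if value in (None, "", []) else value
    return row


def to_sheet_row(invoice: dict) -> list:
    """Flatten a normalized invoice into one spreadsheet row, in FIELDS order."""
    row = []
    for field in FIELDS:
        value = invoice[field]
        # Nested values such as line items are serialized to fit in one cell.
        row.append(json.dumps(value) if isinstance(value, (list, dict)) else str(value))
    return row
```

Normalizing before the append step keeps every row the same width, so a partially read invoice still lands in the sheet with explicit “NA” cells instead of shifting columns.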
Tips, variations, & advanced ideas
- 🛡️ Duplicate prevention: Customize the Google Sheets lookup to compare invoice numbers or timestamps.
- 💬 Notifications: Add Slack or email nodes to notify your team when a new invoice is processed.
- 📥 Backup: Save original PDFs into a different Drive folder or archive them in AWS S3 for redundancy.
- ⚙️ Scalability: Swap qwen-qwq-32b for a smaller model if you process high volumes or need to control costs.
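As an illustration of the duplicate prevention tip, the comparison on invoice numbers could look like the following. This is a sketch under stated assumptions, not the workflow's built-in check: `invoice_key` and `is_duplicate` are hypothetical helpers, and it assumes the existing invoice numbers have already been read from the sheet into a list.

```python
def invoice_key(invoice_number: str) -> str:
    """Normalize an invoice number so spacing and case differences do not hide a match."""
    return invoice_number.strip().casefold()


def is_duplicate(existing_numbers: list, invoice_number: str) -> bool:
    """Return True if the invoice number already appears in the sheet column."""
    key = invoice_key(invoice_number)
    # An invoice whose number was not extracted ("NA") cannot be matched reliably,
    # so it is never treated as a duplicate here.
    if key == "na":
        return False
    return key in {invoice_key(n) for n in existing_numbers}
```

For invoices without a usable number, a timestamp or file name comparison would be a reasonable fallback key.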
Ready to streamline your invoice process?
Get help implementing this workflow or building a custom solution: