Files
siddharthd ce67e38d77 docs: update CLAUDE.md and README to reflect recent changes
- Add reconciled_with_id and created_at columns to transactions table docs
- Document split_payments, expense_metadata, rule_apply_runs tables
- Update /api/transactions route docs with has_split filter and all sort options
- Add /api/transactions/reconcile and /api/split-payments to API table
- Document import date (created_at) behaviour and reconciliation caveat
- Add Prisma regeneration note to CLAUDE.md
- Note schema drift for tables added without migration files
2026-05-10 16:52:55 +10:00

383 lines
19 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Finance App
Personal finance tracker built on Next.js 16 (App Router), PostgreSQL, and Prisma. Bank statements are ingested automatically from Paperless-NGX via an N8N workflow that uses Gemini to extract structured data from PDF statements.
## Stack
- **Frontend**: Next.js 16 App Router, TypeScript, Tailwind CSS, Recharts
- **Backend**: Next.js API routes, raw PostgreSQL via `pg` + `@prisma/adapter-pg`
- **Database**: PostgreSQL (`postgres-personal` container)
- **Auth**: `X-Forwarded-User` header (email) set by Traefik forward-auth → mapped to `participants.email`
- **Ingestion**: N8N workflow → Gemini 2.5 Flash (PDF parsing) → PostgreSQL
---
## Data Model
### `statements`
The top-level document, one row per billing period per account.
| Column | Type | Description |
|--------|------|-------------|
| `id` | int | Primary key |
| `bank_name` | text | Normalised bank name (e.g. "American Express") |
| `card_name` | text | Product name (e.g. "Rewards Travel Adventures") |
| `account_number` | text | Account/card number (spaces stripped) |
| `account_type` | text | Raw account type string from statement |
| `statement_type` | text | Normalised type: `Credit Card`, `Business Card`, `multi-currency account`, etc. |
| `account_holder_name` | text | Name on the account if extracted |
| `billing_start_date` | date | Period start |
| `billing_end_date` | date | Period end — used as the deduplication anchor |
| `opening_balance` | numeric | Balance at start of period |
| `closing_balance` | numeric | Balance at end of period |
| `total_credits` | numeric | Sum of all credits in period |
| `total_debits` | numeric | Sum of all debits in period |
| `total_amount_due` | numeric | Amount due (credit cards) |
| `minimum_amount_due` | numeric | Minimum payment due (credit cards) |
| `payment_due_date` | date | Payment due date (credit cards) |
| `credit_limit` | numeric | Credit limit (credit cards) |
| `available_credit` | numeric | Available credit at statement date |
| `interest_charged` | numeric | Interest charged this period (from statement summary) |
| `fees_charged` | numeric | Fees charged this period (from statement summary) |
| `currency` | text | Statement currency (e.g. `AUD`, `USD`) |
| `exchange_rate_to_aud` | numeric | FX rate at ingestion time (live from open.er-api.com) |
| `owner_id` | int FK → `participants` | Which person owns this statement |
| `paperless_doc_id` | int | Paperless-NGX document ID — deduplication key |
| `tier_used` | text | AI model used for extraction (e.g. `gemini-2.5-flash`) |
| `event_created` | bool | Whether a Google Calendar reminder was created for payment due date |
**Deduplication**: unique index on `(bank_name, account_number, billing_end_date)` prevents re-ingestion of the same period. `paperless_doc_id` has a separate unique index for Paperless-linked documents.
**Credit card detection**: `statement_type ILIKE '%card%'`
---
### `transactions`
One row per line item within a statement. Cascade-deleted when the parent statement is deleted.
| Column | Type | Description |
|--------|------|-------------|
| `id` | int | Primary key |
| `statement_id` | int FK → `statements` (nullable) | Parent statement; NULL for manually-entered transactions |
| `owner_id` | int FK → `participants` (nullable) | Owner for manual transactions (no statement); statement-linked transactions derive owner from `statements.owner_id` |
| `transaction_date` | date | Date of transaction |
| `description` | text | Raw description from the statement |
| `amount` | numeric | Original amount in statement currency |
| `amount_aud` | numeric | AUD-converted amount (= amount if already AUD) |
| `transaction_type` | text | `debit`, `credit`, `payment`, `refund`, `fee`, `interest`, `transfer` |
| `merchant_name` | text | Raw merchant name extracted by Gemini |
| `merchant_normalized` | text | Cleaned/normalised merchant name (Gemini) |
| `location` | text | Location if present on statement |
| `foreign_currency_amount` | numeric | Original foreign amount if this was an FX transaction |
| `foreign_currency_code` | text | Foreign currency code (e.g. `USD`) |
| `category` | text | AI-assigned category (see category taxonomy below) |
| `row_index` | int | Position in statement — used for deduplication |
| `reconciled_with_id` | int FK → `transactions` (nullable) | Links a manually-entered transaction to its matching statement transaction after reconciliation |
| `created_at` | timestamptz | When the row was inserted — the "import date". For reconciled transactions the UI shows the original manual/CSV `created_at`, not the statement's |
**Deduplication**: unique index on `(statement_id, transaction_date, description, amount, row_index)`.
**Analytics**: all spend queries use `amount_aud` for cross-currency consistency. Split-adjusted queries apply `amount_aud * share_percent / 100` where a split exists for the current user.
---
### `transaction_overrides`
User corrections to AI-extracted data. Stored separately to preserve the original extraction.
| Column | Type | Description |
|--------|------|-------------|
| `transaction_id` | int FK → `transactions` (unique) | One override per transaction |
| `merchant_normalized` | text | User-corrected merchant name |
| `category_override` | text | User-corrected category |
| `notes` | text | Free-text notes |
All analytics queries use `COALESCE(o.category_override, t.category)` and `COALESCE(o.merchant_normalized, t.merchant_normalized, t.merchant_name)` to prefer overrides over AI values.
---
### `transaction_splits`
Shared expense tracking — records that a transaction was split between participants.
| Column | Type | Description |
|--------|------|-------------|
| `transaction_id` | int FK → `transactions` | The transaction being split |
| `participant_id` | int FK → `participants` | Who shares in this transaction |
| `share_percent` | numeric(5,2) | Their percentage (1100) |
| `settled` | bool | Whether this share has been settled |
| `settled_at` | timestamptz | When it was settled |
A transaction can be split across multiple participants. The statement owner's own share is implicit (`100 - SUM(other shares)`). Analytics queries LEFT JOIN `transaction_splits` on `participant_id = current_user.id` — if no split row exists, the full amount belongs to the owner.
---
### `transaction_tags`
Many-to-many join between transactions and tags.
| Column | Type |
|--------|------|
| `transaction_id` | int FK → `transactions` |
| `tag_id` | int FK → `tags` |
---
### `tags`
User-defined coloured labels for ad-hoc transaction grouping beyond the fixed category taxonomy.
| Column | Type | Description |
|--------|------|-------------|
| `id` | int | Primary key |
| `name` | text (unique) | Tag name |
| `color` | text | Hex colour (default `#6366f1`) |
---
### `participants`
People who own statements or share expenses.
| Column | Type | Description |
|--------|------|-------------|
| `id` | int | Primary key |
| `name` | text (unique) | Display name |
| `email` | text (unique) | Login identity — matched against `X-Forwarded-User` header |
---
### `account_owner_mappings`
Persists `(bank, account_number) → owner` assignments so future ingestion auto-assigns the correct owner without manual intervention.
| Column | Type | Description |
|--------|------|-------------|
| `bank_name` | text | |
| `account_number` | text | |
| `owner_id` | int FK → `participants` | |
Written when a user reassigns a statement owner in the UI. Consulted by the N8N workflow on every new statement insert.
---
### `rules`
Saved auto-categorisation rules. Applied in bulk via the Rules page.
| Column | Type | Description |
|--------|------|-------------|
| `owner_id` | int FK → `participants` | Rule belongs to this user |
| `name` | text | Rule label |
| `conditions` | jsonb | Array of `{field, operator, value}` — AND logic |
| `actions` | jsonb | `{set_category, add_tag_ids, set_merchant}` |
| `enabled` | bool | |
| `priority` | int | Higher priority rules run first |
**Condition fields**: `merchant_normalized`, `description`, `category`, `bank_name`, `amount`, `transaction_type`
**Condition operators**: `contains`, `equals`, `starts_with`, `gt`, `lt`, `not_equals`
**Actions**: `set_category`, `set_merchant`, `add_tag_ids`, `apply_split`
---
### `split_payments`
Records of actual cash settlements between participants.
| Column | Type | Description |
|--------|------|-------------|
| `from_participant_id` | int FK → `participants` | Who paid |
| `to_participant_id` | int FK → `participants` | Who received |
| `amount` | numeric | Amount settled |
| `payment_date` | date | Date of settlement |
| `notes` | text | Optional note (e.g. "bank transfer") |
| `linked_transaction_id` | int FK → `transactions` (nullable) | If the payment was itself a transaction |
---
### `expense_metadata`
Enrichment records for non-statement expenses (email receipts, manual entries). Linked to a `transaction` if one exists; otherwise a standalone record awaiting reconciliation.
| Column | Type | Description |
|--------|------|-------------|
| `transaction_id` | int FK → `transactions` (unique, nullable) | Linked transaction; NULL until reconciled |
| `source` | text | Origin: `email`, `manual` |
| `paperless_doc_id` | int | Paperless-NGX document ID |
| `payment_method` | text | `credit_card`, `debit_card`, `paypal`, `afterpay`, `cash`, etc. |
| `payment_method_detail` | text | Card last-4 or provider detail |
| `order_reference` | text | Order/confirmation number |
| `line_items` | jsonb | Array of `{description, qty, unit_price, total}` |
| `merchant_normalized` | text | Canonical merchant for matching |
| `amount` / `transaction_date` | numeric / date | Used for reconciliation matching when `transaction_id IS NULL` |
| `extraction_model` | text | AI model used (`gemini-2.5-flash`) |
Partial index on `(merchant_normalized, transaction_date) WHERE transaction_id IS NULL` powers reconciliation queries.
---
### `rule_apply_runs`
Audit log of bulk rule-apply operations. Each run captures which transactions were affected and a full snapshot for revert support.
| Column | Type | Description |
|--------|------|-------------|
| `owner_id` | int FK → `participants` | |
| `applied_at` | timestamptz | When the run executed |
| `split_from` | date | Optional date filter used for this run |
| `matched` | int | Number of rules matched |
| `transactions_affected` | int | Number of transactions changed |
| `reverted_at` | timestamptz | Set when run was reverted |
| `snapshot` | jsonb | Pre-run state of all affected transactions |
---
### `budgets`
Monthly spend targets per category. Stored but currently unused in the UI (replaced by the analytics/insights views).
| Column | Type | Description |
|--------|------|-------------|
| `owner_id` | int FK → `participants` | |
| `category` | text | Category name |
| `month` | date | Always first of month (e.g. `2026-03-01`) |
| `amount_limit` | numeric | Spend target for that category/month |
---
## Category Taxonomy
Fixed set defined in `src/lib/categories.ts`. Applied by Gemini at ingestion and overridable by the user or rules engine:
`groceries` · `dining` · `transport` · `fuel` · `shopping` · `utilities` · `entertainment` · `travel` · `health` · `insurance` · `subscriptions` · `cash_advance` · `government` · `education` · `rent` · `home_goods` · `home_maintenance` · `transfers` · `income` · `investment` · `personal_care` · `pets` · `gifts` · `charity` · `other`
- **home_goods** — items purchased for the house (appliances, furniture, kitchenware, electronics)
- **home_maintenance** — services on the property (cleaning, mowing, repairs)
**Committed spend** (Insights page): `rent`, `utilities`, `insurance`, `subscriptions`
**Excluded from spend analytics**: `transfers`, `investment`
---
## API Routes
All routes require authentication via `X-Forwarded-User` header (set by Traefik). Responses are always scoped to the authenticated user's `owner_id`.
| Method | Route | Description |
|--------|-------|-------------|
| GET | `/api/statements` | All statements for current user |
| GET / PATCH | `/api/statements/[id]` | Get statement; PATCH to reassign owner (also writes `account_owner_mappings`) |
| GET | `/api/transactions` | Paginated transactions. Filters: `from`, `to`, `categories`, `bank_names`, `tag_ids`, `transaction_types`, `search`, `statement_id`, `amount_min`, `amount_max`, `has_split` (`yes`/`no`). Sort: `sort_by` (`transaction_date`\|`amount`\|`created_at`), `sort_dir` (`asc`\|`desc`) |
| POST | `/api/transactions` | Create a manual transaction (no statement) |
| GET / PATCH | `/api/transactions/[id]` | Get transaction; PATCH to upsert override (category, merchant, notes) |
| GET / POST | `/api/transactions/[id]/splits` | List or create splits on a transaction |
| GET / POST | `/api/transactions/[id]/tags` | List or apply tags to a transaction |
| POST | `/api/transactions/bulk` | Bulk update category/merchant across multiple transactions |
| POST | `/api/transactions/reconcile` | Link manual transactions to statement transactions; copies overrides, tags, splits across |
| GET | `/api/analytics/monthly` | Split-adjusted monthly spend by category + income + investments. Params: `months` (124, default 6) |
| GET | `/api/analytics/subscriptions` | Recurring charge detection — merchants with ≥3 occurrences at consistent intervals |
| GET | `/api/analytics/fees` | Fees and interest from statement summaries + individual fee/interest transactions |
| GET | `/api/shared-transactions` | Transactions with active splits; sorted client-side by date/imported/amount in the UI |
| POST | `/api/splits/settle` | Mark a split as settled |
| GET / POST | `/api/split-payments` | List or record cash settlements between participants |
| GET / POST | `/api/participants` | List participants; POST to create (with optional `email`) |
| GET | `/api/participants/[id]/balance` | Net balance owed by/to a specific participant |
| GET | `/api/participants/balances` | All participant balances |
| GET / POST | `/api/rules` | List or create rules |
| PATCH / DELETE | `/api/rules/[id]` | Update or delete a rule |
| POST | `/api/rules/apply` | Run all enabled rules against all transactions; returns `{matched, transactions_affected}` |
| GET / POST | `/api/budgets` | List budgets for a month (`?month=YYYY-MM`); upsert budget |
| DELETE | `/api/budgets/[id]` | Delete a budget |
| GET | `/api/merchants` | Merchant name autocomplete suggestions |
| GET | `/api/me` | Current user info derived from `X-Forwarded-User` header |
| GET / POST | `/api/tags` | List or create tags |
| PATCH / DELETE | `/api/tags/[id]` | Update or delete a tag |
---
## Ingestion Pipeline
```
Paperless-NGX
└─ documents tagged "Bank Statement" + "Credit Card" (without "cc-processor")
N8N workflow — polls every 5 minutes (workflow ID: FysADdFwEtwONQl4)
├─ Duplicate check: SELECT WHERE paperless_doc_id = <id>
│ └─ Already processed → skip, mark in Paperless
├─ Download PDF binary from Paperless API
├─ Gemini 2.5 Flash — PDF → structured JSON
│ responseSchema: { summary: {...}, transactions: [...] }
│ timeout: 180s, retryOnFail: 3×, delay: 30s
├─ Parse & normalise
│ account_number: strip spaces
│ bank_name: title-case
│ FX rate: fetch live from open.er-api.com if non-AUD
├─ Statement exists? (bank + account + billing_end_date)
│ └─ Duplicate → skip, mark in Paperless
├─ New bank? → Slack approval gate (human confirms before insert)
├─ Lookup account_owner_mappings → resolve owner_id (default: 1 = "Me")
├─ INSERT statements + transactions
├─ Google Calendar reminder for payment_due_date (credit cards)
└─ Paperless: PATCH document to add "cc-processor" tag
```
N8N workflow JSON: `docker/automation/workflows/cc-statement-processor-paperless.json` in the smarthome repo.
---
## Schema Migrations
Located in `prisma/migrations/`. Applied manually against the running container:
```bash
docker exec postgres-personal psql -U personal -d personal \
< prisma/migrations/<migration>/migration.sql
```
| Migration | What it adds |
|-----------|-------------|
| `0001_init` | `statements`, `transactions`, `participants` |
| `0002_splits` | `transaction_splits` |
| `0003_owner_segregation` | `owner_id` on statements, `account_owner_mappings`, `email` on participants |
| `0004_tags` | `tags`, `transaction_tags` |
| `0005_rules` | `rules` |
| `0006_budgets` | `budgets` |
| `0007_cashflow` | `amount_aud`, `exchange_rate_to_aud` on transactions; `exchange_rate_to_aud` on statements |
> `paperless_doc_id` on statements and the `uq_statements_paperless_doc_id` index were added directly (not tracked in a migration file).
> `owner_id` on transactions and `statement_id` made nullable were applied directly (March 2026) to support manual transaction entry without a fake statement.
> `reconciled_with_id` on transactions, `expense_metadata`, `rule_apply_runs`, `split_payments` were added directly and are covered by the Prisma schema but lack individual migration files.
---
## Known Gaps / TODOs
### Payment Provider tracking
Currently `merchant_normalized` conflates the *payment provider* with the *merchant*. Transactions processed through PayPal, Afterpay, Zip, Alipay, etc. end up with the provider as the merchant when the real merchant can't be recovered.
**What's been done so far:**
- PayPal entries that embed the merchant name (e.g. `PAYPAL *BUNNINGSGRO`) were cleaned up — the real merchant was extracted during the March 2026 consolidation pass.
- Pure PayPal/Afterpay/Zip entries where the merchant is unrecoverable were left as-is.
- A one-time SQL consolidation pass normalised ~50 merchant name variant groups (March 2026).
**Remaining work:**
1. **DB migration**: `ALTER TABLE transactions ADD COLUMN payment_provider text` and same on `transaction_overrides`.
2. **Gemini prompt**: add `payment_provider` to the `responseSchema` so the AI extracts it separately (`"PayPal"`, `"Afterpay"`, `"Zip"`, `null`, etc.) — the raw bank description usually contains enough signal.
3. **Backfill**: for existing transactions, derive `payment_provider` from `merchant_name` patterns (`PAYPAL *`, `AFTERPAY`, `ZIP/ZIPPAY`, `BPAY`).
4. **App**: surface `payment_provider` as a filter/column in the transactions view; exclude payment providers from merchant analytics so they don't inflate the merchant list.
---
## Deployment
Runs as a Docker container alongside the rest of the home lab stack. Build and deploy:
```bash
# From smarthome repo root
docker compose --env-file docker/common.env --env-file docker/finance/.env \
-f docker/finance/docker-compose.yml up -d --build
```
The container uses Next.js standalone output. `@prisma/adapter-pg` and `pg` are listed in `serverExternalPackages` in `next.config.ts` to ensure they are included in the standalone bundle.