From a7461ff83bbfc52b46bd863423067c9f47dcd798 Mon Sep 17 00:00:00 2001
From: siddharthd
Date: Mon, 9 Mar 2026 23:12:20 +1100
Subject: [PATCH] docs: replace boilerplate README with full data model and architecture reference

---
 README.md | 310 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 288 insertions(+), 22 deletions(-)

diff --git a/README.md b/README.md
index e215bc4..b0cf8ef 100644
--- a/README.md
+++ b/README.md
@@ -1,36 +1,302 @@
-This is a [Next.js](https://nextjs.org) project bootstrapped with [`create-next-app`](https://nextjs.org/docs/app/api-reference/cli/create-next-app).
+# Finance App
 
-## Getting Started
+Personal finance tracker built on Next.js 16 (App Router), PostgreSQL, and Prisma. Bank statements are ingested automatically from Paperless-NGX via an N8N workflow that uses Gemini to extract structured data from PDF statements.
 
-First, run the development server:
+## Stack
 
-```bash
-npm run dev
-# or
-yarn dev
-# or
-pnpm dev
-# or
-bun dev
+- **Frontend**: Next.js 16 App Router, TypeScript, Tailwind CSS, Recharts
+- **Backend**: Next.js API routes, raw PostgreSQL via `pg` + `@prisma/adapter-pg`
+- **Database**: PostgreSQL (`postgres-personal` container)
+- **Auth**: `X-Forwarded-User` header (email) set by Traefik forward-auth → mapped to `participants.email`
+- **Ingestion**: N8N workflow → Gemini 2.5 Flash (PDF parsing) → PostgreSQL
+
+---
+
+## Data Model
+
+### `statements`
+The top-level document, one row per billing period per account.
+
+| Column | Type | Description |
+|--------|------|-------------|
+| `id` | int | Primary key |
+| `bank_name` | text | Normalised bank name (e.g. "American Express") |
+| `card_name` | text | Product name (e.g. "Rewards Travel Adventures") |
+| `account_number` | text | Account/card number (spaces stripped) |
+| `account_type` | text | Raw account type string from statement |
+| `statement_type` | text | Normalised type: `Credit Card`, `Business Card`, `multi-currency account`, etc. |
+| `account_holder_name` | text | Name on the account if extracted |
+| `billing_start_date` | date | Period start |
+| `billing_end_date` | date | Period end, used as the deduplication anchor |
+| `opening_balance` | numeric | Balance at start of period |
+| `closing_balance` | numeric | Balance at end of period |
+| `total_credits` | numeric | Sum of all credits in period |
+| `total_debits` | numeric | Sum of all debits in period |
+| `total_amount_due` | numeric | Amount due (credit cards) |
+| `minimum_amount_due` | numeric | Minimum payment due (credit cards) |
+| `payment_due_date` | date | Payment due date (credit cards) |
+| `credit_limit` | numeric | Credit limit (credit cards) |
+| `available_credit` | numeric | Available credit at statement date |
+| `interest_charged` | numeric | Interest charged this period (from statement summary) |
+| `fees_charged` | numeric | Fees charged this period (from statement summary) |
+| `currency` | text | Statement currency (e.g. `AUD`, `USD`) |
+| `exchange_rate_to_aud` | numeric | FX rate at ingestion time (live from open.er-api.com) |
+| `owner_id` | int FK → `participants` | Which person owns this statement |
+| `paperless_doc_id` | int | Paperless-NGX document ID (deduplication key) |
+| `tier_used` | text | AI model used for extraction (e.g. `gemini-2.5-flash`) |
+| `event_created` | bool | Whether a Google Calendar reminder was created for payment due date |
+
+**Deduplication**: unique index on `(bank_name, account_number, billing_end_date)` prevents re-ingestion of the same period. `paperless_doc_id` has a separate unique index for Paperless-linked documents.
+
+**Credit card detection**: `statement_type ILIKE '%card%'`
+
+---
+
+### `transactions`
+One row per line item within a statement. Cascade-deleted when the parent statement is deleted.
+
+| Column | Type | Description |
+|--------|------|-------------|
+| `id` | int | Primary key |
+| `statement_id` | int FK → `statements` | Parent statement |
+| `transaction_date` | date | Date of transaction |
+| `description` | text | Raw description from the statement |
+| `amount` | numeric | Original amount in statement currency |
+| `amount_aud` | numeric | AUD-converted amount (= amount if already AUD) |
+| `transaction_type` | text | `debit`, `credit`, `payment`, `refund`, `fee`, `interest`, `transfer` |
+| `merchant_name` | text | Raw merchant name extracted by Gemini |
+| `merchant_normalized` | text | Cleaned/normalised merchant name (Gemini) |
+| `location` | text | Location if present on statement |
+| `foreign_currency_amount` | numeric | Original foreign amount if this was an FX transaction |
+| `foreign_currency_code` | text | Foreign currency code (e.g. `USD`) |
+| `category` | text | AI-assigned category (see category taxonomy below) |
+| `row_index` | int | Position in statement, used for deduplication |
+
+**Deduplication**: unique index on `(statement_id, transaction_date, description, amount, row_index)`.
+
+**Analytics**: all spend queries use `amount_aud` for cross-currency consistency. Split-adjusted queries apply `amount_aud * share_percent / 100` where a split exists for the current user.
+
+---
+
+### `transaction_overrides`
+User corrections to AI-extracted data. Stored separately to preserve the original extraction.
+
+| Column | Type | Description |
+|--------|------|-------------|
+| `transaction_id` | int FK → `transactions` (unique) | One override per transaction |
+| `merchant_normalized` | text | User-corrected merchant name |
+| `category_override` | text | User-corrected category |
+| `notes` | text | Free-text notes |
+
+All analytics queries use `COALESCE(o.category_override, t.category)` and `COALESCE(o.merchant_normalized, t.merchant_normalized, t.merchant_name)` to prefer overrides over AI values.
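The override and split rules above can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not the app's actual query code: the names `TransactionRow`, `OverrideRow`, `coalesce`, `effectiveCategory`, `effectiveMerchant` and `splitAdjustedAmount` are hypothetical, the sample values are made up, and only the column names, the `COALESCE` order and the `amount_aud * share_percent / 100` formula come from the text.

```typescript
// Hypothetical row shapes; only the column names are taken from the tables above.
interface TransactionRow {
  category: string | null;
  merchant_name: string | null;
  merchant_normalized: string | null;
  amount_aud: number;
}

interface OverrideRow {
  category_override: string | null;
  merchant_normalized: string | null;
}

// Mirrors SQL COALESCE: returns the first argument that is neither null nor undefined.
function coalesce<T>(...values: Array<T | null | undefined>): T | null {
  for (const value of values) {
    if (value !== null && value !== undefined) return value;
  }
  return null;
}

// COALESCE(o.category_override, t.category)
function effectiveCategory(t: TransactionRow, o?: OverrideRow): string | null {
  return coalesce(o?.category_override, t.category);
}

// COALESCE(o.merchant_normalized, t.merchant_normalized, t.merchant_name)
function effectiveMerchant(t: TransactionRow, o?: OverrideRow): string | null {
  return coalesce(o?.merchant_normalized, t.merchant_normalized, t.merchant_name);
}

// amount_aud * share_percent / 100 when a split exists; otherwise the full amount.
function splitAdjustedAmount(amountAud: number, sharePercent: number | null): number {
  return sharePercent === null ? amountAud : (amountAud * sharePercent) / 100;
}

// Made-up sample data for illustration only.
const sampleTx: TransactionRow = {
  category: "shopping",
  merchant_name: "AMZN MKTP AU",
  merchant_normalized: "Amazon",
  amount_aud: 120,
};
const sampleOverride: OverrideRow = { category_override: "gifts", merchant_normalized: null };

console.log(effectiveCategory(sampleTx, sampleOverride)); // "gifts"
console.log(effectiveMerchant(sampleTx, sampleOverride)); // "Amazon"
console.log(splitAdjustedAmount(sampleTx.amount_aud, 25)); // 30
```

Like SQL `COALESCE`, the helper treats only missing values as absent, so an empty-string override would still win over the AI value.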
+
+---
+
+### `transaction_splits`
+Shared expense tracking: records that a transaction was split between participants.
+
+| Column | Type | Description |
+|--------|------|-------------|
+| `transaction_id` | int FK → `transactions` | The transaction being split |
+| `participant_id` | int FK → `participants` | Who shares in this transaction |
+| `share_percent` | numeric(5,2) | Their percentage (1–100) |
+| `settled` | bool | Whether this share has been settled |
+| `settled_at` | timestamptz | When it was settled |
+
+A transaction can be split across multiple participants. The statement owner's own share is implicit (`100 - SUM(other shares)`). Analytics queries LEFT JOIN `transaction_splits` on `participant_id = current_user.id`; if no split row exists, the full amount belongs to the owner.
+
+---
+
+### `transaction_tags`
+Many-to-many join between transactions and tags.
+
+| Column | Type |
+|--------|------|
+| `transaction_id` | int FK → `transactions` |
+| `tag_id` | int FK → `tags` |
+
+---
+
+### `tags`
+User-defined coloured labels for ad-hoc transaction grouping beyond the fixed category taxonomy.
+
+| Column | Type | Description |
+|--------|------|-------------|
+| `id` | int | Primary key |
+| `name` | text (unique) | Tag name |
+| `color` | text | Hex colour (default `#6366f1`) |
+
+---
+
+### `participants`
+People who own statements or share expenses.
+
+| Column | Type | Description |
+|--------|------|-------------|
+| `id` | int | Primary key |
+| `name` | text (unique) | Display name |
+| `email` | text (unique) | Login identity, matched against `X-Forwarded-User` header |
+
+---
+
+### `account_owner_mappings`
+Persists `(bank, account_number) → owner` assignments so future ingestion auto-assigns the correct owner without manual intervention.
+
+| Column | Type | Description |
+|--------|------|-------------|
+| `bank_name` | text | |
+| `account_number` | text | |
+| `owner_id` | int FK → `participants` | |
+
+Written when a user reassigns a statement owner in the UI. Consulted by the N8N workflow on every new statement insert.
+
+---
+
+### `rules`
+Saved auto-categorisation rules. Applied in bulk via the Rules page.
+
+| Column | Type | Description |
+|--------|------|-------------|
+| `owner_id` | int FK → `participants` | Rule belongs to this user |
+| `name` | text | Rule label |
+| `conditions` | jsonb | Array of `{field, operator, value}` — AND logic |
+| `actions` | jsonb | `{set_category, add_tag_ids, set_merchant}` |
+| `enabled` | bool | |
+| `priority` | int | Higher-priority rules run first |
+
+**Condition fields**: `merchant_normalized`, `description`, `category`, `bank_name`, `amount`
+**Condition operators**: `contains`, `equals`, `starts_with`, `gt`, `lt`, `not_equals`
+
+---
+
+### `budgets`
+Monthly spend targets per category. Stored but currently unused in the UI (replaced by the analytics/insights views).
+
+| Column | Type | Description |
+|--------|------|-------------|
+| `owner_id` | int FK → `participants` | |
+| `category` | text | Category name |
+| `month` | date | Always first of month (e.g. `2026-03-01`) |
+| `amount_limit` | numeric | Spend target for that category/month |
+
+---
+
+## Category Taxonomy
+
+Fixed set defined in `src/lib/categories.ts`. Applied by Gemini at ingestion and overridable by the user or rules engine:
+
+`groceries` · `dining` · `transport` · `fuel` · `shopping` · `utilities` · `entertainment` · `travel` · `health` · `insurance` · `subscriptions` · `cash_advance` · `government` · `education` · `rent` · `transfers` · `income` · `investment` · `personal_care` · `pets` · `gifts` · `charity` · `other`
+
+**Committed spend** (Insights page): `rent`, `utilities`, `insurance`, `subscriptions`
+**Excluded from spend analytics**: `transfers`, `investment`
+
+---
+
+## API Routes
+
+All routes require authentication via `X-Forwarded-User` header (set by Traefik). Responses are always scoped to the authenticated user's `owner_id`.
+
+| Method | Route | Description |
+|--------|-------|-------------|
+| GET | `/api/statements` | All statements for current user |
+| GET / PATCH | `/api/statements/[id]` | Get statement; PATCH to reassign owner (also writes `account_owner_mappings`) |
+| GET | `/api/transactions` | Paginated transactions with filters: `from`, `to`, `category`, `merchant`, `statement_id`, `search`, `sort`, `dir` |
+| GET / PATCH | `/api/transactions/[id]` | Get transaction; PATCH to upsert override (category, merchant, notes) |
+| GET / POST | `/api/transactions/[id]/splits` | List or create splits on a transaction |
+| GET / POST | `/api/transactions/[id]/tags` | List or apply tags to a transaction |
+| POST | `/api/transactions/bulk` | Bulk update category/merchant across multiple transactions |
+| GET | `/api/analytics/monthly` | Split-adjusted monthly spend by category + income + investments. Params: `months` (1–24, default 6) |
+| GET | `/api/analytics/subscriptions` | Recurring charge detection: merchants with ≥3 occurrences at consistent intervals |
+| GET | `/api/analytics/fees` | Fees and interest from statement summaries + individual fee/interest transactions |
+| GET | `/api/shared-transactions` | Transactions that have active splits |
+| POST | `/api/splits/settle` | Mark a split as settled |
+| GET / POST | `/api/participants` | List participants; POST to create (with optional `email`) |
+| GET | `/api/participants/[id]/balance` | Net balance owed by/to a specific participant |
+| GET | `/api/participants/balances` | All participant balances |
+| GET / POST | `/api/rules` | List or create rules |
+| PATCH / DELETE | `/api/rules/[id]` | Update or delete a rule |
+| POST | `/api/rules/apply` | Run all enabled rules against all transactions; returns `{matched, transactions_affected}` |
+| GET / POST | `/api/budgets` | List budgets for a month (`?month=YYYY-MM`); upsert budget |
+| DELETE | `/api/budgets/[id]` | Delete a budget |
+| GET | `/api/merchants` | Merchant name autocomplete suggestions |
+| GET | `/api/me` | Current user info derived from `X-Forwarded-User` header |
+| GET / POST | `/api/tags` | List or create tags |
+| PATCH / DELETE | `/api/tags/[id]` | Update or delete a tag |
+
+---
+
+## Ingestion Pipeline
+
+```
+Paperless-NGX
+  └─ documents tagged "Bank Statement" + "Credit Card" (without "cc-processor")
+        │
+        ▼
+  N8N workflow: polls every 5 minutes (workflow ID: FysADdFwEtwONQl4)
+        │
+        ├─ Duplicate check: SELECT WHERE paperless_doc_id =
+        │    └─ Already processed → skip, mark in Paperless
+        │
+        ├─ Download PDF binary from Paperless API
+        │
+        ├─ Gemini 2.5 Flash: PDF → structured JSON
+        │    responseSchema: { summary: {...}, transactions: [...] }
+        │    timeout: 180s, retryOnFail: 3×, delay: 30s
+        │
+        ├─ Parse & normalise
+        │    account_number: strip spaces
+        │    bank_name: title-case
+        │    FX rate: fetch live from open.er-api.com if non-AUD
+        │
+        ├─ Statement exists? (bank + account + billing_end_date)
+        │    └─ Duplicate → skip, mark in Paperless
+        │
+        ├─ New bank? → Slack approval gate (human confirms before insert)
+        │
+        ├─ Lookup account_owner_mappings → resolve owner_id (default: 1 = "Me")
+        │
+        ├─ INSERT statements + transactions
+        │
+        ├─ Google Calendar reminder for payment_due_date (credit cards)
+        │
+        └─ Paperless: PATCH document to add "cc-processor" tag
 ```
 
-Open [http://localhost:3000](http://localhost:3000) with your browser to see the result.
+N8N workflow JSON: `docker/automation/workflows/cc-statement-processor-paperless.json` in the smarthome repo.
 
-You can start editing the page by modifying `app/page.tsx`. The page auto-updates as you edit the file.
+---
 
-This project uses [`next/font`](https://nextjs.org/docs/app/building-your-application/optimizing/fonts) to automatically optimize and load [Geist](https://vercel.com/font), a new font family for Vercel.
+## Schema Migrations
 
-## Learn More
+Located in `prisma/migrations/`. Applied manually against the running container:
 
-To learn more about Next.js, take a look at the following resources:
+```bash
+docker exec -i postgres-personal psql -U personal -d personal \
+  < prisma/migrations//migration.sql
+```
 
-- [Next.js Documentation](https://nextjs.org/docs) - learn about Next.js features and API.
-- [Learn Next.js](https://nextjs.org/learn) - an interactive Next.js tutorial.
+| Migration | What it adds |
+|-----------|-------------|
+| `0001_init` | `statements`, `transactions`, `participants` |
+| `0002_splits` | `transaction_splits` |
+| `0003_owner_segregation` | `owner_id` on statements, `account_owner_mappings`, `email` on participants |
+| `0004_tags` | `tags`, `transaction_tags` |
+| `0005_rules` | `rules` |
+| `0006_budgets` | `budgets` |
+| `0007_cashflow` | `amount_aud`, `exchange_rate_to_aud` on transactions; `exchange_rate_to_aud` on statements |
 
-You can check out [the Next.js GitHub repository](https://github.com/vercel/next.js) - your feedback and contributions are welcome!
+> `paperless_doc_id` on statements and the `uq_statements_paperless_doc_id` index were added directly (not tracked in a migration file).
 
-## Deploy on Vercel
+---
 
-The easiest way to deploy your Next.js app is to use the [Vercel Platform](https://vercel.com/new?utm_medium=default-template&filter=next.js&utm_source=create-next-app&utm_campaign=create-next-app-readme) from the creators of Next.js.
+## Deployment
 
-Check out our [Next.js deployment documentation](https://nextjs.org/docs/app/building-your-application/deploying) for more details.
+Runs as a Docker container alongside the rest of the home lab stack. Build and deploy:
+
+```bash
+# From smarthome repo root
+docker compose --env-file docker/common.env --env-file docker/finance/.env \
+  -f docker/finance/docker-compose.yml up -d --build
+```
+
+The container uses Next.js standalone output. `@prisma/adapter-pg` and `pg` are listed in `serverExternalPackages` in `next.config.ts` to ensure they are included in the standalone bundle.
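For orientation, the relevant part of `next.config.ts` might look roughly like the sketch below. It assumes standalone output is enabled through `output: "standalone"` and shows only the two options mentioned above; the real file is not part of this patch and may set more. The object is left untyped so the snippet has no imports.

```typescript
// Sketch of next.config.ts; an assumed shape, not copied from the repo.
const nextConfig = {
  // Emit the self-contained server build that the Docker image runs.
  output: "standalone",
  // Load these packages from node_modules at runtime instead of bundling them.
  serverExternalPackages: ["@prisma/adapter-pg", "pg"],
};

export default nextConfig;
```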