---
stepsCompleted: [1, 2, 3, 4, 5, 6, 7, 8]
lastStep: 8
status: 'complete'
completedAt: '2026-05-25'
inputDocuments:
  - docs/prds/prd-financials-2026-05-25/prd.md
workflowType: 'architecture'
project_name: 'financials'
user_name: 'sayre'
date: '2026-05-25'
---

# Architecture Decision Document — Personal Finance App

_This document builds collaboratively through step-by-step discovery. Sections are appended as we work through each architectural decision together._

---

## Project Context Analysis

### Requirements Overview

**Functional Requirements:**
37 FRs across 8 feature areas in 3 delivery phases. Phase 1 establishes the data model and all manual workflows. Phase 2 adds the PDF parsing pipeline on top of the stable Phase 1 foundation. Phase 3 operates as a read-heavy analytics layer over accumulated transaction history.

The **Paydown Planner** (FR-1.20–1.27) is the most mathematically intensive component: 5 amortization strategies, expandable month-by-month projection tables, active plan monitoring with deviation flags, and plan archival. Calculation logic must be fully isolated from Flask routing for testability. Critically, "active monitoring" is a behavioral requirement — not just math. It requires a mechanism to reconcile actual transactions against a planned schedule, an on-demand recalculation trigger to evaluate deviation, and a defined business rule for what constitutes a deviation (missed payment, underpayment, balance exceeds projection by >5%).

The **PDF Import pipeline** (FR-2.1–2.10) is the most structurally complex: 7 issuer-specific parsers, automatic issuer detection, staged review/commit flow (nothing touches the main database until the user confirms), duplicate detection within a 3-day window, and merchant normalization. Demands a modular, plugin-style parser architecture with an explicit parser contract: what a parser must return (`List[StagedTransaction]`), what it raises on failure, and how partial results are handled. Parsers must not touch the database — they return structured data only.

**Non-Functional Requirements:**
- Performance: Dashboard < 2s; PDF parse+stage < 30s per file
- Data integrity: SQLite WAL mode; all import writes staged and transactional; nothing written to main tables until user confirms review
- Recoverability: SQLite file download + CSV transaction export (scope: all transactions or filtered subset — to be defined)
- Usability: ≤3 clicks for primary flows (add transaction, categorize import batch, view paydown projection, check dashboard)
- No external calls: all assets (Primer CSS, Chart.js) vendored locally in `static/vendor/`

**Scale & Complexity:**
- Primary domain: Full-stack web (Python/Flask backend + HTML/JS/CSS frontend)
- Complexity level: Medium-High (Phase 2 PDF pipeline and Phase 3 analytics drive this; Phase 1 alone is Medium)
- Estimated architectural components: ~10 Flask blueprints, 7 PDF parser modules, 5 amortization strategy implementations, 1 analytics service (pandas), 12+ database models, ~20 HTML templates

### Technical Constraints & Dependencies

- Python 3.10+ · Flask 3.x · SQLite (WAL mode, connection-per-request, `check_same_thread=False`)
- `pdfplumber` for PDF text/table extraction (coordinate-based; issuer layouts shift — versioning strategy required)
- `pandas` for aggregation, analytics, and budget recommendations
- Primer CSS (local copy) · Chart.js (local copy) · Vanilla JS only
- Apache mod_wsgi or local reverse proxy; LAN/localhost only
- Single user; no authentication layer required
- `dateutil` (or equivalent) for month-end safe date arithmetic — one shared utility module, no per-file date logic

### Cross-Cutting Concerns Identified

1. **SQLite concurrency model** — WAL mode is required. Connection-per-request pattern mandatory (`check_same_thread=False`); no shared module-level connection. Phase 3 pandas analytics runs long reads; if background tasks are added for PDF processing, each thread needs its own connection to avoid `database is locked` errors.

2. **Staging pipeline mechanism** — Parse → review → commit must be fully transactional. Two options to decide: (a) separate `import_staging` SQLite file (cleaner isolation, easier to abandon) or (b) `staged_transactions` table in the main DB (simpler FK validation against accounts/categories). This decision is load-bearing for schema design.

3. **Duplicate detection contract** — Dedup key must be defined before schema is finalized (adding `dedup_hash` later is a live-data migration). Key candidate: `(normalized_merchant, amount, date)` within a 3-day window. Must specify: does dedup run against staged records only, committed records only, or both? Normalization must run *before* dedup, not after, to avoid false negatives comparing raw vs. normalized strings.

4. **Merchant normalization (3 layered problems)** — (a) Parser-level: each issuer formats descriptions differently; (b) Cross-issuer: Chase and BofA may describe the same merchant differently; (c) User-correction persistence: user fixes propagate via a `merchant_mappings` table (alias map). This table is a Phase 1 schema requirement even if populated in Phase 2.

5. **Category system** — Shared across transactions, budgets, PDF import auto-categorization, and analytics. Must define: user-definable vs. fixed list; whether analytics assumes a normalized taxonomy; whether parsers auto-assign categories or the user assigns during staged review. Category mutation path must be explicit: what happens when a user renames or deletes a category referenced by transactions, budgets, and merchant mappings (CASCADE vs. RESTRICT, soft-delete vs. hard-delete).

6. **Parser error contract** — Each issuer parser must implement a common interface. Partial results (some transactions parsed, some not) must surface per-row error context (page number, raw text sample) rather than failing silently. The architecture must define the error handling strategy before the happy path.

7. **Date/month arithmetic** — Budget cycles, bill recurrence, and amortization projections must all share a single date utility module with explicit month-end conventions (e.g., Jan 31 + 1 month = Feb 28). No per-file date logic.

8. **Phase 1 schema extensibility** — The transaction schema must reserve nullable columns for Phase 2 import metadata (issuer, raw description, import batch ID, confidence score) from the start. A Phase 2 migration against live SQLite data is avoidable with upfront planning.

9. **Jinja2 template hierarchy** — With 10 blueprints and shared components (navigation, category dropdowns, date pickers), template inheritance must be deliberately designed. Undisciplined inheritance leads to duplication and fragile partial updates.

10. **Testing seams** — Paydown Planner, PDF parsers, and analytics service must all be isolated from Flask routing and the database. Parsers return `List[StagedTransaction]` only. Analytics service accepts a DataFrame or query result, not a DB session. Unit tests must be runnable without a live database.

11. **Asset vendoring** — Primer CSS and Chart.js served from `static/vendor/`; no CDN dependencies per NFR-6.

---

## Starter Template & Project Scaffold

### Approach: Manual Flask Application Factory

No cookiecutter template used. Hand-scaffolded following Flask's Application Factory
pattern: testable `create_app()`, blueprint registration at startup, extensions
deferred until factory runs, per-request SQLite connections.

### Initialization Commands

```bash
python -m venv .venv
source .venv/bin/activate
pip install flask flask-sqlalchemy flask-migrate pdfplumber pandas python-dateutil pytest
pip freeze > requirements.txt
```

### Project Structure

```
financials/
├── app/
│   ├── __init__.py              # Application factory (create_app)
│   ├── extensions.py            # db, migrate — instantiated here, init'd in factory
│   │                            # WAL mode via @event.listens_for(engine, "connect")
│   │                            # PRAGMA journal_mode=WAL wired here, never assumed
│   ├── models/
│   │   ├── account.py
│   │   ├── transaction.py       # Nullable Phase 2 columns pre-populated:
│   │   │                        # issuer, import_batch_id, merchant_raw, confidence_score
│   │   ├── staged_transaction.py # Staging model; promotion to transaction is explicit
│   │   ├── category.py          # Soft-delete via is_active; hard-delete RESTRICT FK
│   │   ├── budget.py
│   │   ├── bill.py
│   │   ├── debt.py
│   │   ├── paydown_plan.py
│   │   ├── import_batch.py
│   │   ├── merchant_mapping.py  # Phase 1 requirement — alias map for normalization
│   │   └── settings.py
│   ├── blueprints/
│   │   ├── dashboard/
│   │   ├── transactions/
│   │   ├── budgets/
│   │   ├── settings/
│   │   ├── bills/
│   │   ├── paydown/
│   │   ├── import_pdf/          # Wizard flow — step-specific templates
│   │   │   └── templates/
│   │   │       ├── upload.html
│   │   │       ├── review.html  # Editable staging table before commit
│   │   │       └── confirm.html
│   │   ├── analytics/
│   │   └── api/                 # JSON endpoints — sub-modules by domain
│   │       ├── transactions.py
│   │       ├── budgets.py
│   │       ├── paydown.py
│   │       └── analytics.py
│   ├── services/
│   │   ├── amortization.py      # All 5 strategies flat in one module; extract only
│   │   │                        # when strategy divergence justifies a package
│   │   ├── pdf_parsers/
│   │   │   ├── base.py          # Abstract base: parse(pdf_path) → List[StagedTransaction]
│   │   │   │                    # Parsers NEVER access the database
│   │   │   ├── detector.py      # raw text → issuer string ONLY; never imports issuer modules
│   │   │   ├── chase.py
│   │   │   ├── bofa.py
│   │   │   ├── apple.py         # OQ-3: pytesseract OCR fallback may be required
│   │   │   ├── kohls.py
│   │   │   ├── dcu.py           # OQ-5: manual extraction template may be required
│   │   │   ├── target.py        # OQ-4: handles both credit and debit RedCard formats
│   │   │   └── discover.py
│   │   ├── staging_pipeline.py  # Orchestrates: parse → normalize → dedup → stage → commit
│   │   ├── analytics/           # Package from the start given Phase 3 complexity
│   │   │   ├── __init__.py
│   │   │   ├── budget_analytics.py
│   │   │   └── trend_analytics.py
│   │   ├── category_service.py  # All category mutation logic (rename, soft-delete, cascade)
│   │   ├── merchant_normalizer.py
│   │   └── duplicate_detector.py # Dedup key: (normalized_merchant, amount, date) ±3 days
│   ├── utils/                   # Pure functions; no business logic, no DB access
│   │   ├── date_utils.py        # relativedelta wrapper; month-end conventions documented
│   │   └── validators.py
│   ├── templates/
│   │   ├── base.html            # Nav active_page convention documented in header comment
│   │   ├── components/          # ALL 8 established before any blueprint view is written
│   │   │   ├── alert.html       # Inline contextual (distinct from flash page-level)
│   │   │   ├── empty_state.html # Zero-state: title, description, optional CTA
│   │   │   ├── page_header.html # Title + subtitle + primary action button
│   │   │   ├── stat_card.html   # Dashboard metric tile
│   │   │   ├── pagination.html
│   │   │   ├── confirm_modal.html # Data-attribute triggered; focus trap; Esc closes
│   │   │   ├── form_row.html    # Label + input + error message unit
│   │   │   └── progress_bar.html # % + color-state token (safe/warning/danger) + label
│   │   └── {blueprint}/
│   └── static/
│       ├── vendor/
│       │   ├── primer/
│       │   └── chartjs/
│       ├── css/
│       │   ├── tokens.css       # Domain semantic states → Primer utility class mappings
│       │   └── app.css          # Primer overrides and layout utilities
│       └── js/
│           ├── chart-defaults.js # Chart.js global config (palette, tooltips, responsive)
│           ├── modal.js          # Confirm modal: focus trap, Esc handler, trigger convention
│           ├── loading.js        # data-loading attribute toggle convention
│           └── {feature}.js
├── migrations/
├── tests/
│   ├── conftest.py              # FIRST FILE WRITTEN: in-memory SQLite, TESTING=True
│   ├── fixtures/pdfs/           # Scrubbed PDF samples — one per issuer (7 files)
│   ├── test_models/             # Schema contracts, FK constraints
│   ├── test_services/
│   │   └── test_duplicate_detector.py  # Written BEFORE first migration (TDD order)
│   ├── test_parsers/            # One file per issuer against PDF fixtures
│   ├── test_analytics/
│   └── test_blueprints/         # Route smoke tests, HTTP status codes
├── instance/                    # Runtime-generated — in .gitignore, never committed
├── .gitignore                   # instance/, .venv/, __pycache__/, *.pyc, *.db
├── config.py                    # Config classes; DB path via environment variable
├── wsgi.py                      # application = create_app() for mod_wsgi
└── requirements.txt
```

### Key Architectural Decisions from Scaffold

| Decision | Choice | Rationale |
|----------|--------|-----------|
| ORM | Flask-SQLAlchemy 3.x | Connection-per-request lifecycle; pool handles `check_same_thread` |
| WAL mode | `extensions.py` event listener | Explicit, not assumed; fires on every connection |
| Staging boundary | `staged_transaction` model + `staging_pipeline.py` | Clear promotion path; nothing auto-commits |
| Amortization | Single flat `amortization.py` | Extract to package only when divergence justifies it |
| Analytics | Package `services/analytics/` | Phase 3 complexity warrants it from the start |
| Date arithmetic | `app/utils/date_utils.py` | Pure utility; not a service; wraps `relativedelta` |
| Parser isolation | `base.py` contract enforced | Parsers return data only; no DB access ever |
| Category mutation | Soft-delete + `category_service.py` | RESTRICT FK; no silent cascade |
| `api/` blueprint | Sub-modules by domain | Prevents 800-line flat `routes.py` |
| Components | 8 partials before first view | Trust through consistency; established upfront |
| Tokens | `tokens.css` before first stylesheet | Domain semantics → Primer utilities; one change point |
| `import_pdf/` | 3-step wizard templates | Upload → Review → Confirm; not a single scrolling form |

**Note:** Project scaffold creation is the first implementation story.

---

## Core Architectural Decisions

### Decision Priority Analysis

**Critical (Block Implementation):**
- Staging pipeline: separate `import_staging.db` file ✅
- Category taxonomy: system defaults + user-extensible ✅
- Dedup key: `(normalized_merchant, amount, date)` ±3 days, normalization-first ✅

**Important (Shape Architecture):**
- Form handling: Flask-WTF ✅
- Error handling: `@app.errorhandler` + flash (HTML) / JSON error body (API) ✅
- CSV export: both all-transactions and filtered-view options ✅
- Paydown monitoring trigger: on load + on balance update ✅

**Deferred (Post-MVP):**
- Logging strategy: Python `logging` to rotating file — wire up after Phase 1 baseline
- SQLite backup automation: manual download only for Phase 1; automation deferred

---

### Data Architecture

**Staging Pipeline — Separate `import_staging.db`**
- Mechanism: second SQLite file (`instance/import_staging.db`) opened only during an
  active import session; abandoned imports leave no debris in the main database
- FK validation against accounts/categories is done in application code (not DB-level
  FK) before promotion; this is acceptable given single-user, low-volume imports
- `staging_pipeline.py` opens both connections, orchestrates the full flow, closes the
  staging connection and wipes the staging DB after successful commit or explicit abandon
- Cascading implication: `staged_transaction.py` model file operates against the staging
  DB, not the main DB; SQLAlchemy bind key or separate engine instance required

**Category Taxonomy — System Defaults + User-Extensible**
- System categories (Housing, Groceries, Dining, Transportation, Utilities, Healthcare,
  Entertainment, Shopping, Subscriptions, Personal Care, Debt Payments, Income,
  Transfers, Uncategorized) are seeded at DB init and marked `is_system = True`
- Users can add custom categories (`is_system = False`) and rename any category
- System categories cannot be deleted; user categories can be soft-deleted
  (`is_active = False`); hard-delete blocked by RESTRICT FK if transactions reference them
- Analytics (Phase 3) operates on the full category set; subscription detection and
  pattern analysis handle arbitrary user-defined names (no normalized taxonomy assumed)
- `category_service.py` owns all mutation logic

**Dedup Key — `(normalized_merchant, amount, date)` ±3 days**
- Normalization runs *before* dedup in `staging_pipeline.py`; raw strings are never
  compared against normalized ones
- Dedup checks against both staged records (current import session) and committed
  transactions (main DB) — catches statement-period overlaps across multiple imports
- `dedup_hash` column stored on `transaction` and `staged_transaction` models as
  `SHA256(normalized_merchant + str(amount) + date.isoformat())`; computed at parse time
- False positives (legitimate duplicate amounts) surfaced to user for manual confirm/skip
- `test_duplicate_detector.py` written before first migration (TDD order enforced)

**Migrations — Flask-Migrate (Alembic)**
- All schema changes version-controlled; `flask db migrate` + `flask db upgrade`
- Phase 1 `transaction` model pre-populates nullable Phase 2 columns:
  `issuer VARCHAR`, `merchant_raw TEXT`, `import_batch_id INTEGER FK nullable`,
  `confidence_score FLOAT nullable`
- Eliminates live-data migration at Phase 2 onboarding

**CSV Export — Dual Options**
- "Export All" button on transactions list: exports full unfiltered transaction table
- "Export View" button: exports currently filtered/paginated result set (all pages,
  not just current page)
- Both produce standard CSV with columns: date, merchant, amount, category, account, notes

---

### Authentication & Security

**No Authentication Required**
- Single user, LAN/localhost only; no login screen, no session management
- Flask-WTF CSRF tokens applied to all state-changing forms as good practice
  (low overhead, prevents accidental cross-origin form submission even on LAN)
- SQLite file stored in `instance/` (outside web root); not directly accessible via HTTP
- No sensitive data encrypted at rest beyond filesystem-level security of the host

---

### API & Communication Patterns

**REST-Style JSON Endpoints (`api/` blueprint sub-modules)**
- `GET` for data fetches (dashboard widgets, chart data, plan projections)
- `POST` for state changes from JS (balance updates, staging confirmations)
- All `api/` routes return `Content-Type: application/json`
- Error format: `{"error": "human-readable message", "code": HTTP_STATUS_INT}`

**Error Handling**
- HTML routes: `@app.errorhandler(404)` and `@app.errorhandler(500)` render
  `errors/404.html` and `errors/500.html`; user-facing errors use `flash()` + redirect
- API routes: return JSON error body with appropriate HTTP status code
- Shared utility: `app/utils/errors.py` — `api_error(message, code)` helper function
- PDF parse failures: structured error list returned to `import_pdf/` blueprint for
  display in `review.html` (page number + raw text sample per failed row)

**Flask-WTF — Form Handling**
- All state-changing HTML forms use Flask-WTF `FlaskForm` subclasses
- Validation logic lives in form classes, not route functions
- CSRF token included automatically via `{{ form.hidden_tag() }}` in templates

---

### Frontend Architecture

**Vanilla JS + Fetch API**
- AJAX calls use `fetch()` to `api/` endpoints; no jQuery, no framework
- `loading.js` toggles `data-loading` attribute on target elements during fetch;
  CSS handles spinner visibility via attribute selector
- Chart.js initialized via `chart-defaults.js` (palette, tooltips, responsive config
  set on `Chart.defaults` once at page load)
- `modal.js` handles single confirm modal pattern: focus trap, Esc closes,
  focus returns to trigger element on close

**Template Rendering — Jinja2 + Primer CSS**
- `base.html` passes `active_page` string to nav; each route passes `active_page=
  "blueprint_name"` in `render_template()` call — convention documented in `base.html`
- `tokens.css` maps domain semantic states to Primer utility classes:
  `--color-safe`, `--color-warning`, `--color-danger` used throughout templates
  instead of hardcoded Primer class names

---

### Infrastructure & Deployment

**Self-Hosted — Apache + mod_wsgi**
- `wsgi.py` exposes `application = create_app()` for mod_wsgi
- DB path, secret key, and debug flag driven by environment variables in `config.py`
- `flask run` for local development; mod_wsgi for stable serving

**Paydown Monitoring Recalculation**
- Triggered on both: (a) every load of the Paydown Monitor view, (b) every
  `POST` to update a card balance (FR-1.26)
- Recalculation is synchronous (fast enough for single-user; no background task needed)
- Result cached in `paydown_plans.last_recalculated_at`; stale indicator shown if
  > 30 days since last balance update

**Backup & Recovery**
- Manual: "Download Database" button downloads `instance/financials.db` as a file
- Manual: "Export CSV" on transactions view (all or filtered)
- No automated backup in Phase 1; deferred to user-level cron if desired later

---

### Decision Impact Analysis

**Implementation Sequence (critical path):**
1. Scaffold + `conftest.py` + in-memory SQLite fixture
2. `test_duplicate_detector.py` spec → `duplicate_detector.py` → first migration
3. Core models (transaction with Phase 2 nullable columns, category with is_system,
   merchant_mapping, staged_transaction)
4. `date_utils.py` + `category_service.py` + `amortization.py` (all pure, testable)
5. Phase 1 blueprints (dashboard last — depends on all others)
6. `staging_pipeline.py` + PDF parsers (Phase 2)
7. `services/analytics/` package (Phase 3)

**Cross-Component Dependencies:**
- `merchant_mapping` model must exist before `staging_pipeline.py` is written
- `staged_transaction` model uses separate SQLAlchemy engine bound to staging DB
- `category_service.py` is called by transactions blueprint, import blueprint, and
  analytics — cannot have circular imports with any of them
- `amortization.py` has zero dependencies on models or Flask — pure Python functions
- `services/analytics/` depends on pandas + SQLAlchemy query results; never on
  Flask request context

---

## Implementation Patterns & Consistency Rules

### Naming Conventions

**Python (backend):**
- Models: PascalCase singular — `Transaction`, `MerchantMapping`, `PaydownPlan`
- DB tables: snake_case plural — `transactions`, `merchant_mappings`, `paydown_plans`
- DB columns: snake_case — `merchant_raw`, `import_batch_id`, `created_at`
- FK columns: `{table_singular}_id` — `account_id`, `category_id`, `plan_id`
- Functions/methods: snake_case — `calculate_avalanche()`, `get_monthly_spend()`
- Blueprint names: snake_case matching folder — `import_pdf`, `paydown`
- Form classes: `{Entity}Form` — `TransactionForm`, `BudgetForm`
- Service functions: module-level, not class-based — `normalize_merchant(raw)`,
  not `MerchantNormalizer().normalize(raw)`

**Routes:**
- HTML routes: kebab-case plural resource — `/transactions/`, `/import-pdf/`
- HTML route actions: `/{resource}/{id}/edit`, `/{resource}/{id}/delete`
- API routes: `/api/{resource}` plural — `/api/transactions`, `/api/budgets`
- Route function names: `{blueprint}_{action}` — `transactions_list`,
  `transactions_create`, `api_transactions_get`

**JavaScript:**
- Functions: camelCase — `loadDashboardChart()`, `submitBalanceUpdate()`
- DOM data attributes: kebab-case — `data-account-id`, `data-plan-id`
- JS module filenames: kebab-case matching blueprint — `import-pdf.js`, `paydown.js`

**Templates:**
- Template filenames: snake_case — `transaction_list.html`, `budget_edit.html`
- Template variables: snake_case; primary form always named `form`
- Always pass `active_page='{blueprint_name}'` in every `render_template()` call

**CSS:**
- Custom classes: kebab-case prefixed `fin-` — `fin-burn-bar`, `fin-stat-card`
  (prevents collision with Primer class names)

---

### Structure Patterns

**Blueprint internal structure (every blueprint):**
```
{blueprint}/
├── __init__.py      # Blueprint definition only
├── routes.py        # All HTML route functions
├── forms.py         # FlaskForm subclasses for this blueprint
└── templates/
    └── {blueprint}/ # Templates namespaced under blueprint name
```

**Service functions — module-level only, never class-based:**
```python
# CORRECT
def calculate_avalanche(cards: list, extra_monthly: Decimal) -> AmortizationResult: ...

# WRONG
class AmortizationService:
    def calculate_avalanche(self, cards, extra_monthly): ...
```

**Models — one file per model; FK defined on the owning side:**
```python
# transaction.py owns the FK to category
category_id = db.Column(db.Integer,
                        db.ForeignKey('categories.id', ondelete='RESTRICT'),
                        nullable=True)
```

**Test file naming:** `test_{module_name}.py` in the appropriate `tests/` subfolder

---

### API Response Formats

**Data fetch (GET) — direct JSON, no envelope wrapper:**
```json
[{"id": 1, "merchant": "Amazon", "amount": "29.99", "date": "2026-05-01"}]
```

**Action (POST) — success:**
```json
{"success": true, "id": 42}
```

**Action (POST/DELETE) — error:**
```json
{"error": "Category cannot be deleted: 12 transactions reference it.", "code": 409}
```

**Paginated list (GET):**
```json
{"items": [...], "page": 1, "per_page": 50, "total": 234}
```

---

### Data Format Standards

| Type | JSON | Python | SQLite |
|------|------|--------|--------|
| Money amounts | `"29.99"` (string) | `Decimal` | `NUMERIC(10,2)` |
| Dates | `"2026-05-01"` (ISO 8601) | `datetime.date` | `TEXT` |
| Booleans | `true`/`false` | `True`/`False` | `INTEGER` 1/0 |
| Nulls | `null` | `None` | `NULL` — never empty string |
| IDs | `42` (integer) | `int` | `INTEGER` |

---

### Form Handling — Post/Redirect/Get (PRG)

```python
@bp.route('/transactions/create', methods=['GET', 'POST'])
def transactions_create():
    form = TransactionForm()
    if form.validate_on_submit():
        # write to DB, then:
        flash('Transaction added.', 'success')
        return redirect(url_for('transactions.transactions_list'))
    return render_template('transactions/create.html', form=form,
                           active_page='transactions')
```

- `flash()` categories: `'success'`, `'error'`, `'warning'`, `'info'` only
- Never re-render the form template on successful POST — always redirect
- `{{ form.hidden_tag() }}` in every HTML form (Flask-WTF CSRF)

---

### Error Handling Patterns

**HTML routes — flash + redirect:**
```python
try:
    category_service.delete_category(id)
except IntegrityError:
    flash('Category in use — cannot be deleted.', 'error')
    return redirect(url_for('budgets.budgets_list'))
```

**API routes — `api_error()` helper:**
```python
from app.utils.errors import api_error
if not account:
    return api_error('Account not found.', 404)
```

**Parser contract — structured errors, never raise on partial failure:**
```python
def parse(pdf_path: str) -> tuple[list[StagedTransaction], list[ParseError]]:
    # Returns (results, errors); ParseError carries page_number + raw_text
    # NEVER raises; partial results are valid return values
```

---

### Loading State Pattern (Vanilla JS)

```javascript
element.dataset.loading = 'true';
fetch('/api/transactions')
  .then(r => r.json())
  .then(data => { element.dataset.loading = 'false'; render(data); });
```
```css
/* app.css — CSS handles visibility via attribute selector */
[data-loading="true"] .fin-content { opacity: 0.4; pointer-events: none; }
[data-loading="true"] .fin-spinner { display: block; }
```

---

### SQLAlchemy Query Patterns

**Eager-load relationships when iterating (avoid N+1):**
```python
# CORRECT
transactions = Transaction.query.options(
    joinedload(Transaction.category),
    joinedload(Transaction.account)
).filter_by(account_id=id).all()
```

**Dashboard aggregations — SQL aggregation, never Python loops over all rows:**
```python
# CORRECT
monthly_total = db.session.query(func.sum(Transaction.amount)).filter(
    extract('month', Transaction.date) == current_month
).scalar()
```

---

### Enforcement — All Agents MUST / MUST NOT

**MUST:**
- Pass `active_page='{blueprint_name}'` in every `render_template()` call
- Use PRG for all state-changing HTML routes
- Return `(results, errors)` tuple from all parser `parse()` methods
- Use `Decimal` for all monetary arithmetic — never `float`
- Use `tests/conftest.py` in-memory DB fixture for all DB-touching tests
- Call `api_error()` from `app/utils/errors.py` for all API error responses
- Prefix all custom CSS classes with `fin-`
- Commit to DB at the route/blueprint layer only — never inside service functions

**MUST NOT:**
- Use `float` for money amounts
- Access `db.session` inside `services/pdf_parsers/` modules
- Import issuer parser modules inside `detector.py`
- Commit `instance/` directory to git
- Hardcode category names as strings in service or analytics logic
  (query by `is_system=True` or by ID)

---

## Project Structure & Boundaries

> **Scope constraint:** All structural decisions below assume single-user,
> self-hosted, LAN-only deployment. No auth layer, no concurrency model beyond
> SQLite WAL, no horizontal scaling. This constraint is load-bearing — several
> decisions are only correct because of it.

### Service Layer Taxonomy

Three distinct service categories — mixing them is the primary source of
untestable Flask business logic:

| Layer | Modules | Rule |
|-------|---------|------|
| **Pure computation** | `amortization.py`, `rolling_aggregator.py`, `date_utils.py` | Zero imports outside stdlib + `decimal`. No SQLAlchemy, Flask, or pandas. |
| **Detection / classification** | `duplicate_detector.py`, `merchant_normalizer.py` | Read-only; fingerprints/data passed in, not the session. No writes. |
| **Lifecycle orchestration** | `plan_lifecycle.py`, `staging_pipeline.py`, `category_service.py` | Multi-step transactional writes. Called from routes only. |
| **Read-only analytics** | `services/analytics/` | Accepts DataFrames or query results. No Flask context. No writes. |
| **Generative insights** | `services/insights/` | Produces actionable outputs (recommendations, flags). Read-only against DB; outputs presented to user via blueprint — not auto-written. |
| **Cross-blueprint aggregation** | `services/summary.py` | Dashboard queries spanning multiple models. Keeps dashboard blueprint free of direct multi-model imports. |

---

### Requirements to Structure Mapping

| FR Group | Phase | Primary Location | Notes |
|----------|-------|-----------------|-------|
| FR-1.1–1.3 Dashboard | 1 | `blueprints/dashboard/` · `services/summary.py` · `api/dashboard.py` | `summary.py` owns the 5 aggregation queries |
| FR-1.4–1.9 Transactions | 1 | `blueprints/transactions/` · `models/transaction.py` | Manual entry calls `duplicate_detector` |
| FR-1.10–1.15 Budgets | 1 | `blueprints/budgets/` · `models/budget.py` | Recommendations from `insights/budget_recommender.py` |
| FR-1.15a Settings | 1 | `blueprints/settings/` · `models/settings.py` | Income passed as param to insight services — never imported directly |
| FR-1.16–1.19 Bills & Debts | 1 | `blueprints/bills/` · `models/bill.py` · `models/debt.py` | |
| FR-1.20–1.24 Paydown Calculator | 1 | `blueprints/paydown/` · `services/amortization.py` | Pure computation; all 5 strategies flat in one module |
| FR-1.25 Deviation Monitoring | 1 | `services/plan_monitor.py` · `blueprints/paydown/` (display only) | Deviation computation in service, not blueprint |
| FR-1.26–1.27 Plan Lifecycle | 1 | `services/plan_lifecycle.py` · `models/paydown_plan.py` | Archive + activate as single transaction |
| FR-2.1–2.10 PDF Import | 2 | `blueprints/import_pdf/` · `services/pdf_parsers/` · `services/staging_pipeline.py` · `services/merchant_normalizer.py` · `services/duplicate_detector.py` | |
| FR-3.1–3.2 Spending Patterns | 3 | `blueprints/analytics/` · `services/analytics/trend_analytics.py` | Read-only; Chart.js charts |
| FR-3.3 Subscription Detector | 3 | `services/insights/subscription_detector.py` | Generative: surfaces list for user review |
| FR-3.4 Pattern Flags | 3 | `services/insights/pattern_flags.py` · `services/rolling_aggregator.py` | Reads settings.monthly_income as a param |
| FR-3.5 Budget Recommendations | 3 | `services/insights/budget_recommender.py` · `services/rolling_aggregator.py` | Shared rolling-window infrastructure |
| FR-3.6 Merchant Summary | 3 | `services/analytics/merchant_analytics.py` | Read-only aggregation |

**Cross-cutting → location:**

| Concern | Location |
|---------|----------|
| Category mutation | `services/category_service.py` |
| Merchant normalization | `models/merchant_mapping.py` · `services/merchant_normalizer.py` |
| Shared rolling-window calculations | `services/rolling_aggregator.py` |
| Staged→committed mapper | `services/staging_pipeline.staged_txn_to_model()` |
| Date arithmetic | `app/utils/date_utils.py` |
| API error helper | `app/utils/errors.py` |
| Shared Jinja2 partials | `app/templates/components/` (8 components) |
| Domain CSS tokens | `app/static/css/tokens.css` |

---

### Complete Service Directory

```
app/services/
├── amortization.py              # Pure: 5 strategy functions + AmortizationResult @dataclass
│                                # AmortizationResult holds plain Python values ONLY
├── plan_monitor.py              # FR-1.25: deviation computation (actual vs projected)
├── plan_lifecycle.py            # FR-1.26–1.27: archive + activate as single transaction
├── rolling_aggregator.py        # Shared rolling-window calculations (insights + analytics)
├── summary.py                   # Dashboard: 5 sequential aggregation queries
├── category_service.py          # Category mutation: rename, soft-delete, cascade
├── merchant_normalizer.py       # Normalization: parser-level + cross-issuer + alias lookup
├── duplicate_detector.py        # Called by staging_pipeline AND transactions blueprint
│                                # Input: list[str] fingerprints — NOT a DB session
│                                # Dedup key: SHA256(normalized_merchant + amount + date)
├── pdf_parsers/
│   ├── base.py                  # Abstract: parse(path) → tuple[list[StagedTransaction], list[ParseError]]
│   │                            # StagedTransaction is a plain @dataclass — zero SQLAlchemy
│   ├── detector.py              # Content-based registry → issuer string | raises DetectionError
│   │                            # NEVER imports from issuer modules
│   └── chase.py … discover.py  # 7 issuer parsers
├── staging_pipeline.py          # Orchestrates import; owns import_staging.db engine
│                                # staged_txn_to_model(): StagedTransaction → StagedTransactionModel
│                                # Commit: set committed_at before promoting (soft-delete pattern)
├── analytics/
│   ├── __init__.py
│   ├── budget_analytics.py      # Read-only: actuals vs budget aggregations
│   ├── trend_analytics.py       # Read-only: category trends, MoM/YoY
│   └── merchant_analytics.py   # Read-only: top-N merchant spend
└── insights/                   # Generative — read DB, return outputs to blueprint (no auto-writes)
    ├── __init__.py
    ├── subscription_detector.py # FR-3.3: recurring charge detection
    ├── pattern_flags.py         # FR-3.4: overspend, subscription gap, income ratio flags
    └── budget_recommender.py    # FR-3.5: 3-month rolling avg → per-category recommendations
```

---

### Architectural Boundaries (9 total)

**Boundary 1 — Flask Request / Service**
Routes own `flask.request`, `flash()`, `redirect()`. Services never import from `flask`.
Settings values extracted in routes, passed as plain parameters to services.

**Boundary 2 — Service / Database**
`db.session.commit()` at route layer only. Services `add()`/`delete()` but never commit.
Exception: `staging_pipeline.py` manages its own staging DB commit internally — the only
permitted exception, explicitly documented.

**Boundary 3 — Parser / Database**
Parsers have zero SQLAlchemy imports. Return `tuple[list[StagedTransaction], list[ParseError]]`.
`StagedTransaction` is a plain `@dataclass`. `staged_txn_to_model()` in `staging_pipeline.py`
is the only conversion point.

**Boundary 4 — Staging / Main Database**
`import_staging.db` opened by `staging_pipeline.py` only via dedicated `staging_engine`.
Commit strategy: set `committed_at` on staged rows before promoting to main DB; cleanup
step removes `committed_at IS NOT NULL` rows. Idempotent — safe if cleanup fails.

**Boundary 5 — Analytics + Insights / Flask Context**
`services/analytics/` and `services/insights/` never access `flask.g`, `current_app`,
or request context. Settings values (e.g., `monthly_income`) passed as function params.

**Boundary 6 — Amortization / Everything**
`services/amortization.py`: zero imports beyond `decimal` + stdlib.
`AmortizationResult` is a plain `@dataclass` with primitive fields only — no ORM refs
(prevents `DetachedInstanceError` in templates).

**Boundary 7 — Duplicate Detector / DB Session**
`duplicate_detector.flag_duplicates()` receives `list[str]` fingerprint hashes — not a
session or query object. Caller queries existing fingerprints and passes them in.
Stateless and testable without a DB fixture.

**Boundary 8 — Pandas Isolation**
`pandas` imported only in `services/analytics/` and `services/rolling_aggregator.py`.
No blueprint, model, utility, or other service file imports pandas.

**Boundary 9 — Single-User Scope**
No concurrency handling beyond SQLite WAL + SQLAlchemy connection-per-request.
No auth, no session management, no row-level ownership. Decisions depending on this
constraint are marked in comments; violating this scope requires architectural review.

---

### Data Flows (Corrected)

**Flow 1 — Manual Transaction Entry**
```
POST /transactions/create
  → TransactionForm.validate_on_submit()
  → existing_fps = query recent transaction fingerprints
  → duplicate_detector.flag_duplicates([new_fp], existing_fps)
      → if duplicate: flash warning, re-render with flag
      → if clean: db.session.add(Transaction(...))
  → db.session.commit() → flash('success') + redirect (PRG)
```

**Flow 2 — PDF Import**
```
POST /import-pdf/upload
  → staging_pipeline.begin_import(pdf_paths)
      [any exception → rollback all staging writes for this batch]
      → detector.detect_issuer(raw_text) → issuer | raises DetectionError
      → parser.parse(path) → (list[StagedTransaction], list[ParseError])
      → merchant_normalizer.normalize(staged_txns)   ← BEFORE dedup
      → existing_fps = query main DB recent fingerprints
      → duplicate_detector.flag_duplicates(normalized, existing_fps)
      → staged_txn_to_model() for each → write to import_staging.db
  → redirect to /import-pdf/review/{batch_id}

POST /import-pdf/confirm/{batch_id}
  → staging_pipeline.commit_import(batch_id)
      → BEGIN TRANSACTION (main DB)
      → promote staged rows → transactions
      → write import_batch record
      → COMMIT
      → set committed_at on staged rows (cleanup runs separately)
  → flash('Imported N, skipped M duplicates.') + redirect
```

**Flow 3 — Dashboard Render**
```
GET /
  → summary.get_dashboard_data(month, active_plan_id, monthly_income)
      [5 SEQUENTIAL queries — single db.session, no threading]
      → budget_burn, upcoming_bills, paydown_progress,
         recent_txns, monthly_summary
  → render_template('dashboard/index.html', **data, active_page='dashboard')
```

**Flow 4 — Paydown Plan Activation**
```
POST /paydown/activate/{strategy}
  → validate: extra_monthly > 0, at least one card with balance > 0
  → plan_lifecycle.activate_plan(strategy, cards, extra_monthly)
      → BEGIN TRANSACTION
      → db.session.add(new PaydownPlan(status='active'))   ← write NEW first
      → old_plan.status = 'archived'; old_plan.archived_at = now()
      → COMMIT  [if commit fails, nothing archived]
  → flash('Plan activated.') + redirect to /paydown/monitor
```

**Flow 5 — Budget Recommendations**
```
GET /budgets/recommend
  → settings = Settings.query.first()
  → months_available = count distinct transaction months
  → if months_available >= 2:
      rolling_avgs = rolling_aggregator.category_rolling_avg(months=3)
      recs = insights.budget_recommender.from_history(rolling_avgs)
    else:
      recs = insights.budget_recommender.fifty_thirty_twenty(settings.monthly_income)
  → render_template('budgets/recommend.html', recommendations=recs)
```

---

### Development Workflow

```bash
# Local development
export FLASK_APP=wsgi.py FLASK_DEBUG=1
export DATABASE_URL=sqlite:///financials.db
flask db upgrade && flask run

# Tests
pytest tests/                                           # full suite
pytest tests/test_services/test_duplicate_detector.py  # run BEFORE first migration
pytest tests/test_parsers/                              # Phase 2 (requires PDF fixtures)
pytest -k "not test_parsers"                            # Phase 1 only

# Apache mod_wsgi deployment
# WSGIDaemonProcess financials python-home=/path/to/.venv
# WSGIScriptAlias / /var/www/html/financials/wsgi.py
```

---

## Architecture Validation Results

### Coherence Validation ✅

All technology choices compatible. Patterns align with stack. 9 boundaries are
internally consistent and non-contradictory. Service layer taxonomy (6 layers)
cleanly separates pure computation, detection, orchestration, analytics, insights,
and cross-blueprint aggregation.

**Minor note:** `summary.get_dashboard_data()` must receive `monthly_income` as a
parameter extracted at the route layer (not query Settings internally) — consistent
with Boundary 1 and Boundary 5.

### Requirements Coverage ✅

All 37 FRs mapped to specific modules. All 6 NFRs architecturally addressed.
Five previously-at-risk FRs (FR-1.25, FR-1.26/1.27, FR-2.7, FR-3.3–3.5) have
dedicated service modules after Party Mode review.

### Gap Analysis

**Important (non-blocking — address at implementation start):**

**1. Index strategy** — Add to Phase 1 first migration:
```sql
CREATE INDEX idx_transactions_date ON transactions(date);
CREATE INDEX idx_transactions_category ON transactions(category_id);
CREATE INDEX idx_transactions_account ON transactions(account_id);
CREATE INDEX idx_bills_due_date ON bills(due_date);
CREATE INDEX idx_balance_updates_account ON paydown_balance_updates(account_id, updated_at);
```

**2. Canonical `StagedTransaction` @dataclass fields:**
```python
@dataclass
class StagedTransaction:
    date: datetime.date
    merchant_raw: str
    merchant_normalized: str
    amount: Decimal
    is_credit: bool
    issuer: str
    dedup_hash: str        # SHA256(normalized_merchant + str(amount) + date.isoformat())
    confidence_score: float  # 0.0–1.0; below 0.7 flagged for user review
    raw_text: str          # Original description string from PDF
```

**3. Canonical `ParseError` @dataclass fields:**
```python
@dataclass
class ParseError:
    page_number: int
    raw_text: str
    reason: str
    parser_version: str    # e.g. "chase-v1" — aids layout drift diagnosis
```

**Deferred (fully deferrable):**
- `conftest.py` internal fixture structure
- Chart.js per-page initialization beyond `chart-defaults.js`
- `requirements.txt` pinned versions

---

### Architecture Completeness Checklist

**Requirements Analysis**
- [x] Project context thoroughly analyzed
- [x] Scale and complexity assessed
- [x] Technical constraints identified
- [x] Cross-cutting concerns mapped

**Architectural Decisions**
- [x] Critical decisions documented with versions
- [x] Technology stack fully specified
- [x] Integration patterns defined
- [x] Performance considerations addressed

**Implementation Patterns**
- [x] Naming conventions established
- [x] Structure patterns defined
- [x] Communication patterns specified
- [x] Process patterns documented

**Project Structure**
- [x] Complete directory structure defined
- [x] Component boundaries established
- [x] Integration points mapped
- [x] Requirements to structure mapping complete

---

### Architecture Readiness Assessment

**Overall Status:** READY FOR IMPLEMENTATION
**Confidence Level:** High

**Key Strengths:**
- 9 enforced architectural boundaries prevent the most common Flask failure modes
  (parser/DB entanglement, analytics in Flask context, float money arithmetic)
- Service layer taxonomy established before any code is written — prevents business
  logic from calcifying in blueprints
- Phase 1 schema pre-populates Phase 2 nullable columns — no live migration risk
- `duplicate_detector` correctly positioned as shared infrastructure (import + manual)
- Staged import with soft-delete commit is resilient to partial failures
- `AmortizationResult` and `StagedTransaction` as plain dataclasses prevents
  `DetachedInstanceError` class of bugs entirely

**Areas for Future Enhancement:**
- Parser versioning strategy (OQ-3/4/5 open: Apple Card OCR, Target dual-card, DCU)
- Automated SQLite backup (manual download only in Phase 1)
- Database index monitoring as transaction volume grows

---

### Implementation Handoff

**AI Agent Guidelines:**
- Follow all 9 architectural boundaries as documented — no exceptions without review
- Check service layer taxonomy before placing any new module
- `tests/conftest.py` is the first file written — before any model or service
- `test_duplicate_detector.py` is written before the first Alembic migration
- `StagedTransaction` and `ParseError` canonical field specs above are the
  authoritative definition for all 7 parser implementations
- `AmortizationResult` fields must be plain Python primitives — no ORM references

**First Implementation Step:**
```bash
# 1. Scaffold directories
mkdir -p app/{models,blueprints/{dashboard,transactions,budgets,settings,bills,paydown,import_pdf,analytics,api},services/{pdf_parsers,analytics,insights},utils,templates/components,static/{vendor/{primer,chartjs},css,js}}

# 2. Write tests/conftest.py (in-memory SQLite fixture)

# 3. Define StagedTransaction and ParseError dataclasses in services/pdf_parsers/base.py

# 4. Write tests/test_services/test_duplicate_detector.py

# 5. pip install flask flask-sqlalchemy flask-migrate pdfplumber pandas python-dateutil flask-wtf pytest
# flask db init
```
