# Story 3.2: Import Processing & Data Normalization

## Story

**As a** system administrator,
**I want** uploaded CSV files to be parsed, validated, and normalized into the database automatically,
**So that** supplier enrichment data is available for categorization without manual data entry.

## Status

done

## Acceptance Criteria

All ACs met. See Dev Notes for design decisions.

## Tasks / Subtasks

- [x] **Task 1: Migrations** (AC: schema ACs)
  - [x] `20260523000002_create_products_table.php` — id, import_job_id (FK, SET NULL), sku, mpn, name, manufacturer, manufacturer_category, status (default 'imported'), original_data (jsonb), created_at, updated_at. Indexes on sku, mpn, import_job_id, status.
  - [x] `20260523000003_create_product_attributes_table.php` — id, product_id (FK, CASCADE), attribute_name, attribute_value, original_value, created_at. Index on product_id.
  - [x] `20260523000004_create_import_job_errors_table.php` — id, import_job_id (FK, CASCADE), row_number, field_name, error_reason, original_value, created_at. Index on import_job_id.

- [x] **Task 2: Create `ImportProcessor`** (AC: all processing ACs)
  - [x] `process(int $jobId)` — marks job `processing`, opens file, calls `processStream()`, handles path as absolute or PROJECT_ROOT-relative.
  - [x] `processStream()` — streams CSV via `fgetcsv()` (one row at a time; never loads whole file). Trims all headers and cell values. Validates identifier column presence in header. Per row: checks identifier presence, calls `insertProduct()` or `recordError()`. Updates `processed_rows` every 100 rows. Sets final status (completed / completed_with_errors / failed) and logs summary.
  - [x] `insertProduct()` — extracts mapped fields as strings via `getString()` (no numeric casting ever). Stores raw row in `original_data` JSONB. Inserts unmapped columns as `product_attributes` rows (with both trimmed and original values).
  - [x] `buildColumnIndex()` — case-insensitive alias resolution: `mpn`/`part_number` → `mpn`, `name`/`product_name` → `name`, `manufacturer_category`/`category` → `manufacturer_category`.
  - [x] `failJob()` — sets `status='failed'`, `completed_at`, `error_message`, writes `Logger::error`.

- [x] **Task 3: Create `scripts/process_import.php`** (AC: cron AC)
  - [x] SAPI guard + arg validation. Bootstraps autoloader + dotenv. Calls `ImportProcessor::process()`. Reads final DB status for exit code. Header comment documents cron setup.

- [x] **Task 4: Tests** (`tests/Unit/ImportProcessorTest.php`)
  - [x] 12 end-to-end tests against live DB: successful import, leading-zero SKU preserved, `part_number` alias, mixed rows → completed_with_errors, missing identifier column → failed, unmapped columns → product_attributes, original_data JSONB, header/cell whitespace trimmed, empty CSV fails.

## Dev Notes

### String-safe identifiers — enforced in `getString()`

`getString()` reads from `$trimmedRow[header]` (already a string from `fgetcsv()`). No conversion is applied at any point. Leading zeros, hyphens, and special characters are preserved verbatim.

### Absolute vs relative stored_filepath

`ImportProcessor::process()` checks `str_starts_with($path, '/')`: absolute paths (used in tests) are used as-is; relative paths get `PROJECT_ROOT . '/'` prepended (production convention).

### File list

**New files:** `db/migrations/20260523000002-4_*.php`, `src/Services/ImportProcessor.php`, `scripts/process_import.php`, `tests/Unit/ImportProcessorTest.php`
