# AI Cats — User Guide

AI Cats (AI Categorization System) is a self-hosted web application that predicts an Akeneo category for each product in your catalog and scores every prediction with an AI-generated confidence value. High-confidence predictions are submitted to Akeneo automatically; low-confidence predictions are queued for a data steward to review, adjust, and approve.

This guide covers every user-facing function in the order a typical operator would encounter it.

---

## Table of Contents

1. [Accessing the Application](#1-accessing-the-application)
2. [Admin Setup](#2-admin-setup)
3. [Importing Products via CSV](#3-importing-products-via-csv)
4. [Syncing Products from Akeneo](#4-syncing-products-from-akeneo)
5. [Running Categorization](#5-running-categorization)
6. [Reviewing Predictions](#6-reviewing-predictions)
7. [Adjusting a Category Inline](#7-adjusting-a-category-inline)
8. [Submitting to Akeneo Collaborative Workflow](#8-submitting-to-akeneo-collaborative-workflow)
9. [Health Dashboard & Monitoring](#9-health-dashboard--monitoring)
10. [Logs & Troubleshooting](#10-logs--troubleshooting)
11. [Backups](#11-backups)
12. [Database Reference](#12-database-reference)
13. [CSV Format Reference](#13-csv-format-reference)
14. [Confidence Scoring Reference](#14-confidence-scoring-reference)
15. [Status Reference](#15-status-reference)

---

## 1. Accessing the Application

The application runs at **`http://<host>:8888/`**. The default deployment uses Apache on port 8888.

On first visit you are redirected to **`/login`**. Enter your username and password. There is no self-registration; accounts are seeded directly in the database (see [Database Reference](#12-database-reference)).

After login you land on the **Predictions** page, which is the main working view for data stewards. Admin-only pages (Settings, Logs, Backups, Health) are accessible from the sidebar.

---

## 2. Admin Setup

Before processing any products, an administrator must configure the Akeneo connection and set the confidence threshold.

### 2.1 Configure Akeneo Credentials

Navigate to **Admin → Settings** (`/admin/settings`).

| Field | What to enter |
|---|---|
| Base URL | Your Akeneo instance root, e.g. `https://akeneo.example.com` |
| Client ID | OAuth 2.0 client ID from Akeneo → Connect → API Connections |
| Client Secret | OAuth 2.0 client secret (masked; leave blank to keep the existing value) |
| Username | Akeneo user account username |
| Password | Akeneo user account password (masked; leave blank to keep) |

Click **Save Settings**, then click **Test Akeneo Connection**. A green "Connection successful" banner confirms the credentials work. The result is also visible on the Health dashboard.

### 2.2 Set the Confidence Threshold

On the same Settings page, set **Confidence Threshold** (1–100, default **90**).

- Predictions scoring **at or above** this value are **auto-approved** and, for products that already have an Akeneo identifier, submitted to Akeneo without human review.
- Predictions scoring **below** this value are placed in the **Needs Review** queue for a data steward.

A threshold of 90 means "only act automatically when the AI is highly certain." Lowering it reduces the review queue but increases the risk of incorrect categorizations being sent to Akeneo.

---

## 3. Importing Products via CSV

Products enter the system either by CSV upload or by syncing from Akeneo (section 4). Most workflows start with CSV.

### 3.1 Create an Import Job

Navigate to **Import Jobs** (`/import-jobs`) and click **New Import**. On the upload form:

- Select a `.csv` file (max 50 MB).
- Click **Upload & Import**.

The file is stored on the server under `public/uploads/csv/` and a background job processes it row by row. You are redirected to the job detail page (`/import-jobs/:id`), which polls for progress every few seconds.

### 3.2 Monitor Import Progress

The job detail page shows:

- **Status badge** — `Processing`, `Completed`, `Completed with Errors`, or `Failed`.
- **Row counters** — Total rows, processed, succeeded, errored.
- **Error table** — For each failed row: row number, field name, error reason, and the original value that caused the problem.

A `Completed with Errors` job means some rows were imported successfully and some were skipped. The error table tells you which rows need attention.

### 3.3 CSV Format

See the full [CSV Format Reference](#13-csv-format-reference) for column names and rules. The short version:

- **Required:** at least one of `sku`, `mpn`, or `part_number` per row.
- **Standard columns** are mapped to known product fields; all other columns become searchable product attributes used by the AI.
- Headers are case-insensitive. Column order does not matter.

---

## 4. Syncing Products from Akeneo

If your products already exist in Akeneo you can pull them into AI Cats rather than uploading a CSV.

### 4.1 Manual Sync

Navigate to **Admin → Health** (`/admin/health`) and click **Sync Products from Akeneo**. The sync runs in the background. The Health page polls for progress and shows the result (products fetched/created/updated) when it completes.

**What the sync does:**

1. Authenticates with Akeneo using the stored OAuth credentials.
2. Fetches all products from Akeneo via paginated API calls (100 per page).
3. For each Akeneo product: matches it to a local product by SKU or MPN (case-insensitive). If a match is found, the `akeneo_identifier` and sync timestamp are updated. If no match is found, a new product row is created.
4. Also syncs the Akeneo category tree into the local `categories` table (used by the inline adjust autocomplete).
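
The matching rule in step 3 can be sketched as follows. This is a minimal illustration, not the application's actual code: the `LocalProduct` class, the `upsert_from_akeneo` function, and the `akeneo_synced_at` field name are hypothetical, and an in-memory list stands in for the `products` table.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class LocalProduct:
    """Hypothetical stand-in for one row of the products table."""
    sku: Optional[str]
    mpn: Optional[str]
    akeneo_identifier: Optional[str] = None
    akeneo_synced_at: Optional[datetime] = None


def _same(a: Optional[str], b: Optional[str]) -> bool:
    """Case-insensitive equality that never matches on missing values."""
    return bool(a) and bool(b) and a.casefold() == b.casefold()


def upsert_from_akeneo(local: List[LocalProduct], identifier: str,
                       sku: Optional[str], mpn: Optional[str]) -> str:
    """Link an Akeneo product to a local row, or create one.

    Returns "updated" when a SKU or MPN match was found, else "created".
    """
    now = datetime.now(timezone.utc)
    for product in local:
        if _same(product.sku, sku) or _same(product.mpn, mpn):
            product.akeneo_identifier = identifier
            product.akeneo_synced_at = now
            return "updated"
    local.append(LocalProduct(sku, mpn, identifier, now))
    return "created"
```

Calling `upsert_from_akeneo(catalog, "cb-0042", "cb-0042", None)` against a catalog that already holds SKU `CB-0042` returns `"updated"`, because the comparison ignores case.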

### 4.2 Scheduled Sync (Cron)

To sync automatically, add a crontab entry for the sync script:

```
# Sync Akeneo products and categories every hour
0 * * * * php /var/www/html/ai_cats/scripts/sync_akeneo.php >> /var/www/html/ai_cats/logs/sync.log 2>&1
```

---

## 5. Running Categorization

After products are in the system, you submit a batch for AI categorization.

### 5.1 Select Products

Navigate to **Products** (`/products`). The table shows all products with their current status. Use the checkboxes to select the products you want to categorize. The **Select all** checkbox in the header selects all products on the current page. A floating action bar appears at the bottom showing how many are selected.

**Tip:** Filter by status `imported` to see products that have not yet been categorized.

### 5.2 Submit a Batch

Click **Submit for Categorization** in the floating action bar. A confirmation dialog appears; confirm to create the batch.

You are redirected to **Categorization Batches** (`/categorization-batches`), where the new batch appears with status `Pending` → `Processing`. The page polls for updates every 10 seconds.

### 5.3 What Happens During Categorization

The PHP application spawns a Python script (`python/categorize.py`) in the background. The script:

1. Loads each product's data and all its attributes from the database.
2. Runs the **evidence builder** — extracts signals from part number patterns, manufacturer category keywords, and enrichment attributes from the CSV.
3. Computes a **confidence score** (10–100) from the weighted evidence signals.
4. Assigns `confidence_level`: `high` (≥ threshold), `medium` (70 to threshold−1), or `low` (< 70).
5. Saves a prediction row and individual evidence records to the database.

After the Python script completes, PHP applies **threshold routing**:

- **Score ≥ threshold** → status `auto_approved`; if the product has an Akeneo identifier, the category is submitted to Akeneo immediately via the REST API.
- **Score < threshold** → status `needs_review`; the prediction is placed in the review queue.
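
As a rough sketch, the routing rule reduces to a single comparison. The function name `route_prediction` and its return shape are illustrative assumptions; the real routing runs in PHP and is not reproduced here.

```python
from typing import Optional, Tuple


def route_prediction(score: float, threshold: int = 90,
                     akeneo_identifier: Optional[str] = None) -> Tuple[str, bool]:
    """Return (status, submit_now) for one prediction.

    submit_now is True only for auto-approved predictions whose product
    already has an Akeneo identifier.
    """
    if score >= threshold:
        return "auto_approved", akeneo_identifier is not None
    return "needs_review", False
```

A score equal to the threshold is auto-approved, matching the "at or above" wording used on the Settings page.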

The batch detail page (`/categorization-batches/:id`) shows a confidence distribution (high/medium/low counts and average score) once the batch completes.

### 5.4 Retrying a Failed Batch

If a batch fails or has errors, a **Retry** button appears on the batch detail page. Retrying re-submits only the products that did not get a successful prediction (already-predicted products are skipped). A batch may be retried up to **3 times** total.
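
The retry selection can be pictured like this. It is a sketch under assumptions: `products_to_retry` and `retry_count` are hypothetical names, and the limit is read as "no further retry once three have been used".

```python
MAX_RETRIES = 3  # from this guide: a batch may be retried up to 3 times


def products_to_retry(batch_product_ids, predicted_ids, retry_count):
    """Return the product IDs that still need a prediction.

    batch_product_ids: every product in the batch, in order.
    predicted_ids: products that already have a successful prediction.
    retry_count: how many retries this batch has already used.
    """
    if retry_count >= MAX_RETRIES:
        raise ValueError("retry limit reached for this batch")
    done = set(predicted_ids)
    return [pid for pid in batch_product_ids if pid not in done]
```

For a batch of products `[1, 2, 3, 4]` where 2 and 4 already have predictions, the first retry re-submits `[1, 3]`.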

---

## 6. Reviewing Predictions

The Predictions page (`/predictions`) is the main workspace for data stewards. Its Needs Review tab lists every prediction that scored below the confidence threshold and needs a human decision; the other tabs track predictions after they leave the queue.

### 6.1 Navigating the Queue

The page has **status tabs** across the top:

| Tab | Description |
|---|---|
| Needs Review | Below-threshold predictions awaiting a decision |
| Submitted | Submitted to Akeneo Collaborative Workflow, awaiting outcome |
| Awaiting Akeneo Approval | Submitted; workflow is actively in progress |
| Approved | Approved by a reviewer in Akeneo |
| Rejected | Rejected by a reviewer in Akeneo |
| Auto-Approved | High-confidence predictions sent directly to Akeneo |
| All | Every prediction regardless of status |

Each tab shows a count badge. The **queue counter** below the page title ("X items awaiting review") tracks how many predictions still need attention on the Needs Review tab.

### 6.2 Filtering the Queue

- **Confidence level filter** (dropdown): All levels / High (≥ threshold%) / Medium (70–threshold−1%) / Low (< 70%).
- **Search** (text input): Filters visible rows by SKU, MPN, or product name without a server round-trip.

### 6.3 Reading a Row

Each row shows:

| Column | Description |
|---|---|
| SKU | Product SKU (string, preserves leading zeros) |
| MPN | Manufacturer part number |
| Product Name | Product name from import or Akeneo |
| Suggested Category | AI-recommended Akeneo category code and label. A pencil icon (✎) appears if the category was manually adjusted |
| Confidence | Score badge: green (high), yellow (medium), red (low) |
| Top Evidence | The two strongest signals the AI used to make its recommendation |
| Status | Current prediction status |
| Actions | Submit and/or Adjust buttons (on Needs Review rows) |

### 6.4 Viewing Full Evidence

Click anywhere on a prediction row (not on a button or checkbox) to expand the **evidence panel** below it. The panel shows:

- **Why this category was suggested** — each evidence signal with its source type, matched value, and percentage contribution to the confidence score.
- **Missing attributes** — fields that exist in the schema but had no value for this product, which could improve confidence if populated.
- **Akeneo Workflow status** — if the prediction was submitted to Akeneo Collaborative Workflow, the current workflow status is shown.

Click the row again to collapse the panel.

### 6.5 Empty State

When the last Needs Review prediction is acted on, the table is replaced with a **completion panel** showing:

- "All predictions reviewed. Nothing waiting on you."
- A session activity summary (only shown if you took at least one action during the session): "Auto-approved: X · Submitted to Akeneo: Y · Adjusted: Z"
- Links to view submitted predictions and start a new categorization batch.

---

## 7. Adjusting a Category Inline

If the AI suggested the wrong category, you can correct it directly in the review queue row before submitting it to Akeneo.

### 7.1 Open the Adjust Editor

On any **Needs Review** row, click **Adjust** in the Actions column. The Suggested Category cell is replaced with a text input pre-filled with the current category label, and the action buttons change to **Save** and **Cancel**.

### 7.2 Search for a Category

Start typing in the input (at least 2 characters). A dropdown appears showing up to 10 matching categories from the Akeneo category tree, formatted as:

> Industrial Fasteners (cat_fasteners_industrial)

Select an entry from the dropdown. The input updates to show the selected label and code.

**Note:** The dropdown is populated from the `categories` table, which is synced from Akeneo during product sync. If the dropdown always shows "No matching categories found", run an Akeneo sync (section 4) to populate the category tree.
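
The lookup behind the dropdown might look like the sketch below. The guide does not say how matching works, so the case-insensitive substring match on label or code is an assumption, as is the function name `search_categories`.

```python
def search_categories(query, categories, limit=10, min_chars=2):
    """Return up to `limit` dropdown entries formatted as "Label (code)".

    categories: iterable of (code, label) pairs, e.g. rows of the
    categories table. Matching rule (assumed): case-insensitive
    substring match against either the label or the code.
    """
    needle = query.strip().casefold()
    if len(needle) < min_chars:
        return []  # the dropdown only opens after 2 characters
    hits = [
        f"{label} ({code})"
        for code, label in categories
        if needle in label.casefold() or needle in code.casefold()
    ]
    return hits[:limit]
```

An empty result from this kind of lookup corresponds to the "No matching categories found" message in the UI.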

### 7.3 Save or Cancel

- **Save** — validates the selection, updates the prediction in the database, and writes an audit log entry recording the old category code, new category code, and confidence score. The row reverts to normal display with the new category shown and a ✎ icon indicating it was manually adjusted. The row stays in the Needs Review queue (adjusted predictions still need to be submitted to Akeneo).
- **Cancel** — discards all changes and reverts the row to its original state.

If you click Save without selecting a category from the dropdown, an inline error appears: "Please select a valid category from the list."

Pressing **Escape** while editing also cancels without saving.

---

## 8. Submitting to Akeneo Collaborative Workflow

Once you're satisfied with a prediction's category (either the AI recommendation or an adjusted one), submit it to Akeneo for collaborative review.

### 8.1 Single-Row Submit

Click **Submit** on any Needs Review row. The row disappears from the queue and the tab counter decrements. The prediction status moves to `submitted`.

### 8.2 Bulk Submit

Use the **checkboxes** to select multiple rows. The floating action bar at the bottom of the page shows how many are selected and a **Submit to Akeneo Workflow** button. Click it to submit all selected predictions in one request.

If all visible Needs Review rows are selected and there are more in the queue (across pages), a notice appears: "X items on this page selected. Select all Y items in queue?" Clicking that link extends the selection to every Needs Review prediction, regardless of the current page.

### 8.3 Bulk Submit Results

After a bulk submit:

- Successfully submitted rows are removed from the queue.
- If any submissions failed, a banner lists each failed SKU and the reason (e.g., "Product has no Akeneo identifier — sync from Akeneo first.").

### 8.4 What Happens in Akeneo

For each submitted prediction, AI Cats:

1. Creates or updates a product draft in Akeneo (`POST /api/rest/v1/products-draft/:identifier`).
2. Sets the `categories` field to the suggested category code.
3. Attaches a top-3 evidence summary as the `ai_evidence_summary` attribute.
4. Returns the proposal code, which is stored locally in `akeneo_workflow` for status tracking.

Akeneo then routes the proposal through its own Collaborative Workflow. A background script (`scripts/poll_akeneo_workflow.php`) checks proposal statuses and updates predictions to `approved` or `rejected` as Akeneo responds.
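
For illustration, a top-3 evidence summary like the one in step 3 could be built as shown here. The ranking key (weight multiplied by strength), the output format, and the dictionary keys are all assumptions; the actual content of `ai_evidence_summary` is not documented in this guide.

```python
def top_evidence_summary(evidence, limit=3):
    """Summarize the strongest evidence signals as one string.

    evidence: list of dicts with 'source', 'value', 'weight' and
    'strength' keys (hypothetical shape). Signals are ranked by
    weight * strength, strongest first.
    """
    ranked = sorted(evidence, key=lambda e: e["weight"] * e["strength"], reverse=True)
    return "; ".join(f"{e['source']}: {e['value']}" for e in ranked[:limit])
```

Keeping the summary to three signals keeps the attribute short enough for an Akeneo reviewer to scan alongside the proposed category.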

---

## 9. Health Dashboard & Monitoring

Navigate to **Admin → Health** (`/admin/health`).

### Akeneo Connection

Shows the last test result (Connected / Disconnected / Failed / Untested) and the timestamp. Click **Test Now** to re-test.

### Akeneo Product Sync

Shows the status of the most recent sync (running / completed / failed), when it started and finished, and counts of products fetched/created/updated. Click **Sync Products from Akeneo** to trigger a manual sync.

### Import Jobs (Last 7 Days)

Total imports, and breakdowns of completed / failed / in-progress.

### Categorization Batches (Last 7 Days)

Total batches, auto-approved count, sent-to-workflow count, failed count.

### Recent Import Jobs Table

Lists the five most recent import jobs with live status badges (polling every 10 seconds for any that are still processing).

---

## 10. Logs & Troubleshooting

Navigate to **Admin → Logs** (`/admin/logs`).

### Reading Logs

Logs are stored as newline-delimited text files on the server. Each line contains a timestamp, level (`INFO`, `WARNING`, `ERROR`), message, and a JSON context object.
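
If you need to process the log files outside the admin UI, a parser along these lines may help. The exact layout of a line is not specified in this guide, so the format assumed here (`timestamp LEVEL message {json}`, separated by spaces) is a guess that you should check against a real line from `logs/app.log`.

```python
import json
import re

# Assumed layout: <timestamp> <LEVEL> <message> <JSON object>
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>INFO|WARNING|ERROR)\s+(?P<msg>.*?)\s+(?P<ctx>\{.*\})$"
)


def parse_log_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return {
        "timestamp": match["ts"],
        "level": match["level"],
        "message": match["msg"],
        "context": json.loads(match["ctx"]),
    }
```

Filtering for `entry["level"] == "ERROR"` over the parsed lines gives a quick view of failures without opening the Logs page.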

The Logs page shows the most recent entries from `logs/app.log` and `logs/error.log`. A **Clear Logs** button truncates the file. The page warns if log files are approaching the 50 MB read cap.

### Per-Batch Categorization Logs

Each categorization batch writes its own log at `logs/categorize_{batch_id}.log`. These are not shown in the admin UI but can be inspected directly on the server for debugging Python script failures.

### Common Issues

| Symptom | Likely cause | Fix |
|---|---|---|
| All predictions stuck in `predicted` status | Threshold routing not applied | Reload the batch detail page; routing is applied lazily when the batch status is read |
| Predictions review queue always empty but batch shows predictions | Same as above | Navigate to `/categorization-batches/:id` |
| "Product has no Akeneo identifier" on submit | Product was not synced from Akeneo | Run an Akeneo product sync (section 4) |
| Category search returns no results | `categories` table is empty | Run an Akeneo sync; categories are pulled alongside products |
| Confidence always 10 (minimum) | No evidence signals matched the product | Check that part numbers follow a known pattern, or that `manufacturer_category` was imported |
| Batch stuck in `processing` for > 30 minutes | Python script crashed silently | Check `logs/categorize_{batch_id}.log` for Python tracebacks |
| Akeneo connection fails | Credentials changed or token expired | Re-enter credentials in Admin → Settings and test the connection |

---

## 11. Backups

Navigate to **Admin → Backups** (`/admin/backups`).

### Manual Backup

Click **Run Backup Now**. AI Cats calls `pg_dump` to create a compressed `.sql.gz` backup of the PostgreSQL database and stores it in the `storage/backups/` directory. The backup table shows the filename, size, status, and any error message.
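
An equivalent backup can be produced by hand with a short script such as the sketch below. The filename pattern and function names are hypothetical, and the script assumes `pg_dump` can authenticate on its own (for example through `PGPASSWORD` or a `.pgpass` file).

```python
import gzip
import shutil
import subprocess
from datetime import datetime
from pathlib import Path


def backup_filename(db_name, now):
    """Build a timestamped name such as ai_cats_20240501_020000.sql.gz."""
    return f"{db_name}_{now:%Y%m%d_%H%M%S}.sql.gz"


def run_backup(db_name, user, host="127.0.0.1", out_dir="storage/backups"):
    """Stream pg_dump output into a gzip file and return its path."""
    target = Path(out_dir) / backup_filename(db_name, datetime.now())
    target.parent.mkdir(parents=True, exist_ok=True)
    cmd = ["pg_dump", "-h", host, "-U", user, "-d", db_name]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc, gzip.open(target, "wb") as gz:
        shutil.copyfileobj(proc.stdout, gz)
    if proc.returncode != 0:
        target.unlink(missing_ok=True)  # do not keep a partial dump
        raise RuntimeError(f"pg_dump exited with code {proc.returncode}")
    return target
```

Streaming through `gzip.open` avoids writing an uncompressed dump to disk first, which matters on servers with limited space.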

### Scheduled Backups

The Settings page includes backup schedule configuration. To run scheduled backups from cron, add a crontab entry such as:

```
# Daily backup at 2:00 AM
0 2 * * * php /var/www/html/ai_cats/scripts/backup.php >> /var/log/ai_cats_backup.log 2>&1
```

---

## 12. Database Reference

AI Cats uses **PostgreSQL**. Connection details are stored in `/var/www/html/ai_cats/.env` (not in version control). The database name, host, port, user, and password are set there.

### Where the Data Lives

| Table | What it stores |
|---|---|
| `users` | Application login accounts |
| `system_settings` | Admin-configured settings (Akeneo credentials, confidence threshold) |
| `import_jobs` | One row per CSV upload — status, filename, row counts |
| `import_job_errors` | Per-row errors from failed import rows |
| `products` | One row per product — SKU, MPN, name, manufacturer, status |
| `product_attributes` | Arbitrary key/value pairs imported from CSV columns beyond the standard fields |
| `categorization_batches` | One row per AI categorization run — status, product list, counters |
| `predictions` | One row per product per batch — suggested category, confidence score, status, is_adjusted |
| `evidence` | Per-signal evidence records linked to a prediction |
| `audit_log` | Append-only log of every status-changing event |
| `categories` | Akeneo category tree (synced from Akeneo; used for adjust autocomplete) |
| `akeneo_sync_log` | One row per Akeneo product sync run |
| `akeneo_workflow` | Tracks submitted proposals and their Akeneo status |
| `backups` | Backup job history |

### Connecting Directly

```bash
# From the server (using credentials from .env):
psql -h 127.0.0.1 -U <DB_USER> -d <DB_NAME>
```

### Applying Migrations

The database schema is managed with [Phinx](https://phinx.org). To apply pending migrations after a code update:

```bash
cd /var/www/html/ai_cats
vendor/bin/phinx migrate
```

---

## 13. CSV Format Reference

### Required Column

At least one of the following must be present in every row, or the row is skipped with an error:

| Column name(s) | Maps to |
|---|---|
| `sku` | `products.sku` |
| `mpn` or `part_number` | `products.mpn` |

### Standard Optional Columns

| Column name(s) | Maps to | Notes |
|---|---|---|
| `name` or `product_name` | `products.name` | Human-readable product name |
| `manufacturer` | `products.manufacturer` | Brand or manufacturer name |
| `manufacturer_category` or `category` | `products.manufacturer_category` | Supplier's own category label; used by the AI |

Column matching is **case-insensitive** and **order-independent**.

### Extra Columns

Every column that does not match a standard name is imported as a `product_attributes` row (key = column header, value = cell value). These attributes are available to the AI evidence builder as enrichment signals and can improve categorization accuracy.

### Data Rules

- **SKU, MPN, part_number** are always stored as strings — leading zeros, hyphens, and mixed-case are preserved exactly as entered.
- Empty cells are stored as `null` and do not cause errors unless they are in the identifier columns.
- Each row's full original data is stored as JSON (`products.original_data`) for traceability.
- Rows with a SKU or MPN that already exists in the database are **updated**, not duplicated.
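
The column and data rules above can be sketched with Python's standard `csv` module. This is an illustration rather than the importer itself: `split_row` and the `STANDARD` map are hypothetical names, and the choice that the first non-empty value wins when two aliases (for example `mpn` and `part_number`) appear in the same file is an assumption.

```python
import csv
import io

# Header aliases from the tables above, mapped to product fields.
STANDARD = {
    "sku": "sku",
    "mpn": "mpn",
    "part_number": "mpn",
    "name": "name",
    "product_name": "name",
    "manufacturer": "manufacturer",
    "manufacturer_category": "manufacturer_category",
    "category": "manufacturer_category",
}


def split_row(row):
    """Split one CSV row into (standard fields, extra attributes)."""
    fields, attributes = {}, {}
    for header, raw in row.items():
        value = raw if raw else None  # empty cell -> None; text kept as-is
        key = header.strip().lower()  # headers are case-insensitive
        if key in STANDARD:
            target = STANDARD[key]
            if fields.get(target) is None:  # assumed: first non-empty alias wins
                fields[target] = value
        else:
            attributes[header.strip()] = value
    if not fields.get("sku") and not fields.get("mpn"):
        raise ValueError("row needs at least one of sku, mpn, part_number")
    return fields, attributes


sample = "SKU,Name,Series\n0001-PWR,12V 5A Power Supply,RS-60\n"
for row in csv.DictReader(io.StringIO(sample)):
    print(split_row(row))
```

Because `csv.DictReader` yields every cell as a string, identifiers such as `0001-PWR` keep their leading zeros without any extra handling.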

### Example CSV

```csv
sku,mpn,name,manufacturer,manufacturer_category,series,voltage_rating
CB-0042,AMPCB42,Ribbon Cable 42-pin,Amphenol,Cables,FlatFlex,5V
R00471,RES-470R,470 Ohm Resistor,Yageo,Resistors,RC Series,0.25W
0001-PWR,PSU-12V5A,12V 5A Power Supply,Mean Well,Power Supplies,RS-60,12V
```

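Row 1 has `sku=CB-0042`. The AI sees the `CB-` prefix as a part number pattern signal pointing to the Cables category, and `manufacturer_category=Cables` as a second confirming signal. With the typical signal strengths listed in the [Confidence Scoring Reference](#14-confidence-scoring-reference), two agreeing signals like these score roughly 78, which is medium confidence under the default threshold of 90, so the prediction would go to Needs Review.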

Row 3 has `sku=0001-PWR` with a leading zero — stored as the string `"0001-PWR"`, not `1-PWR`.

The columns `series` and `voltage_rating` do not map to standard fields, so they become product attributes and are available to the AI as enrichment signals.

---

## 14. Confidence Scoring Reference

Every prediction includes a confidence score from **10 to 100**.

### Score Calculation

The Python categorizer extracts evidence signals from the product data. Each signal has a **weight** (how much it contributes to the overall score) and a **signal strength** (how confident that specific signal is).

| Evidence source | Weight | Typical signal strength |
|---|---|---|
| Part number prefix pattern | 0.40 (40%) | 0.70–0.80 |
| Manufacturer category keyword | 0.35 (35%) | 0.75 |
| Enrichment attribute match | 0.15 (15%) | 0.50 |
| Fallback (no evidence found) | 0.001 | 0.10 |

**Formula:** `confidence_score = (Σ weight × strength) / (Σ weight) × 100`, clamped to [10, 100].

A product with a strong part number pattern match (`weight=0.40, strength=0.80`) and a matching manufacturer category (`weight=0.35, strength=0.75`) scores (0.40 × 0.80 + 0.35 × 0.75) / (0.40 + 0.35) × 100 ≈ 77.7, or approximately 78. That is medium confidence under the default threshold of 90.
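
The formula can be checked with a few lines of Python. This sketch follows the formula as written in this section; the function name `confidence_score` is illustrative, and whether the real categorizer rounds the result to an integer is not stated here, so the value is returned as a float.

```python
def confidence_score(signals):
    """Weighted-average confidence, clamped to [10, 100].

    signals: list of (weight, strength) pairs. An empty list falls back
    to the Fallback row of the table (weight 0.001, strength 0.10).
    """
    if not signals:
        signals = [(0.001, 0.10)]
    total_weight = sum(weight for weight, _ in signals)
    weighted_sum = sum(weight * strength for weight, strength in signals)
    raw = weighted_sum / total_weight * 100
    return max(10.0, min(100.0, raw))


# Part number pattern plus manufacturer category, as in the example above.
print(round(confidence_score([(0.40, 0.80), (0.35, 0.75)])))  # 78
```

Because the score is a weighted average of signal strengths, adding a weaker signal can lower the result: including an enrichment match at strength 0.50 brings the same product down to about 73.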

### Confidence Levels

| Level | Score range | Meaning |
|---|---|---|
| High | ≥ threshold (default 90%) | AI is highly certain; auto-approved |
| Medium | 70% to threshold−1% | Reasonable match; flagged for review |
| Low | < 70% | Weak evidence; flagged for review |

Only High predictions bypass the review queue. Medium and Low always go to Needs Review.
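
The level boundaries, and the per-batch distribution shown on the batch detail page, can be sketched as follows. The function names are hypothetical; the boundaries come from the table above.

```python
from collections import Counter


def confidence_level(score, threshold=90):
    """Map a score to 'high', 'medium', or 'low'."""
    if score >= threshold:
        return "high"
    if score >= 70:
        return "medium"
    return "low"


def summarize(scores, threshold=90):
    """Count predictions per level and compute the average score."""
    counts = Counter(confidence_level(score, threshold) for score in scores)
    average = sum(scores) / len(scores) if scores else 0.0
    return {
        "high": counts["high"],
        "medium": counts["medium"],
        "low": counts["low"],
        "average": average,
    }
```

In this sketch the threshold is checked first, so setting it to 70 or lower leaves the medium band empty: every score at or above the threshold counts as high.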

### Part Number Prefix Patterns

The AI recognises these built-in patterns (case-insensitive prefix match):

| Prefix pattern | Suggested category |
|---|---|
| `CB-` or `CAB-` | Cables |
| `R` or `RES` followed by digits | Resistors |
| `C` or `CAP` followed by digits | Capacitors |
| `IC` or `U` followed by digits | Integrated Circuits |
| `SW-` or `BTN-` | Switches |
| `PSU-` or `PWR-` | Power Supplies |
| `LED-` or `DIODE-` | Diodes |
| `FAN-` or `COOL-` | Cooling |

Products whose part numbers don't match any pattern rely on manufacturer category keywords and enrichment attributes for evidence. If no evidence is found, a Fallback signal is used and the prediction will score near the minimum (≈10).
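
One way to express the pattern table is as a list of regular expressions. This is a sketch, not the categorizer's source: it reads "followed by digits" strictly (a digit must come directly after the prefix), and `category_from_part_number` is a hypothetical name.

```python
import re

# (pattern, category) pairs mirroring the table above; all case-insensitive.
PATTERNS = [
    (re.compile(r"^(?:CB|CAB)-", re.IGNORECASE), "Cables"),
    (re.compile(r"^(?:RES|R)\d", re.IGNORECASE), "Resistors"),
    (re.compile(r"^(?:CAP|C)\d", re.IGNORECASE), "Capacitors"),
    (re.compile(r"^(?:IC|U)\d", re.IGNORECASE), "Integrated Circuits"),
    (re.compile(r"^(?:SW|BTN)-", re.IGNORECASE), "Switches"),
    (re.compile(r"^(?:PSU|PWR)-", re.IGNORECASE), "Power Supplies"),
    (re.compile(r"^(?:LED|DIODE)-", re.IGNORECASE), "Diodes"),
    (re.compile(r"^(?:FAN|COOL)-", re.IGNORECASE), "Cooling"),
]


def category_from_part_number(part_number):
    """Return the category for the first matching prefix, or None."""
    for pattern, category in PATTERNS:
        if pattern.match(part_number):
            return category
    return None
```

Under this strict reading, `R00471` maps to Resistors while `0001-PWR` matches nothing, because `PWR` appears as a suffix rather than a prefix.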

---

## 15. Status Reference

### Prediction Status

| Status | Meaning | Next steps |
|---|---|---|
| `predicted` | Categorized by AI; threshold routing not yet applied | Transient; routing is applied when the batch detail page loads |
| `needs_review` | Below threshold; awaiting data steward decision | Submit or Adjust in the review queue |
| `submitted` | Submitted to Akeneo Collaborative Workflow | Waiting for Akeneo response |
| `awaiting_akeneo` | Akeneo proposal is actively in workflow | No action needed |
| `approved` | Akeneo reviewer approved the proposal | Terminal |
| `rejected` | Akeneo reviewer rejected the proposal | Optionally re-submit with an adjusted category |
| `auto_approved` | Scored ≥ threshold; auto-submitted to Akeneo | Terminal |

### Product Status

| Status | Meaning |
|---|---|
| `imported` | In system from CSV or Akeneo sync; not yet categorized |
| `predicted` | Has at least one prediction (may still be routing) |
| `needs_review` | Active prediction is below threshold |
| `auto_approved` | Active prediction was auto-approved |
| `submitted` | Submitted to Akeneo workflow |
| `approved` | Akeneo-approved |
| `rejected` | Akeneo-rejected |

### Import Job Status

| Status | Meaning |
|---|---|
| `pending` | Queued; not yet started |
| `processing` | File is being read row by row |
| `completed` | All rows processed without errors |
| `completed_with_errors` | Some rows succeeded, some failed (see error table) |
| `failed` | Processing could not complete |

### Categorization Batch Status

| Status | Meaning |
|---|---|
| `pending` | Queued; Python script not yet started |
| `processing` | Python script is actively categorizing |
| `completed` | All products categorized successfully |
| `completed_with_errors` | Some products succeeded, some had per-product errors |
| `failed` | Script crashed or a fatal error occurred (retry available) |
