# Python Setup

The AI categorization engine is written in Python 3 and communicates with PHP via the PostgreSQL database. PHP spawns the categorization script as a background process and polls `categorization_batches.status` for progress — it does not read Python stdout or stderr directly.

## Prerequisites

- Python 3.9 or later
- PostgreSQL credentials configured in the project `.env` file

## Install dependencies

From the project root:

```bash
python3 -m venv python/.venv
python/.venv/bin/pip install -r python/requirements.txt
```

## Environment variables

The Python services read database credentials from the same `.env` file used by PHP:

| Variable      | Description                  | Example              |
|---------------|------------------------------|----------------------|
| `DB_HOST`     | PostgreSQL host               | `localhost`          |
| `DB_PORT`     | PostgreSQL port               | `5432`               |
| `DB_NAME`     | Database name                 | `ai_cats`            |
| `DB_USER`     | Database user                 | `ai_cats_user`       |
| `DB_PASSWORD` | Database password             | *(set in .env)*      |
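
As an illustration of how these variables might be loaded, here is a minimal sketch. It is not the project's actual `config.py`: the function name `load_db_settings`, the `REQUIRED_VARS` tuple, and the choice to treat all five variables as mandatory are assumptions made for this example.

```python
import os

try:
    # python-dotenv is listed in requirements.txt; degrade gracefully without it.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None

# Assumption: every variable in the table above is mandatory.
REQUIRED_VARS = ("DB_HOST", "DB_PORT", "DB_NAME", "DB_USER", "DB_PASSWORD")


def load_db_settings(env=None):
    """Return the DB settings as a dict, raising if any variable is missing."""
    if env is None:
        if load_dotenv is not None:
            # Loads .env into os.environ; existing variables are not overridden.
            load_dotenv()
        env = os.environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "host": env["DB_HOST"],
        "port": int(env["DB_PORT"]),  # ports arrive as strings in the environment
        "dbname": env["DB_NAME"],
        "user": env["DB_USER"],
        "password": env["DB_PASSWORD"],
    }
```

Failing fast on a missing variable gives a clearer error in the batch log than a connection failure later on.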

## Running the categorizer manually

```bash
python/.venv/bin/python python/categorize.py <batch_id>
```

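PHP typically spawns this in the background, appending both stdout and stderr to a per-batch log file. The example below uses the system `python3`; if the dependencies are installed only in the virtual environment, substitute `python/.venv/bin/python`: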

```bash
python3 python/categorize.py 42 >> logs/categorize_42.log 2>&1 &
```
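
For reference, the command-line handling in an entry point like `categorize.py` could be sketched as follows. The function names `parse_batch_id` and `main`, and the rule that `batch_id` must be a positive integer, are assumptions for this example rather than the script's confirmed behavior.

```python
import argparse


def parse_batch_id(argv=None):
    """Parse and validate the batch_id positional argument."""
    parser = argparse.ArgumentParser(description="Run categorization for one batch.")
    parser.add_argument("batch_id", type=int, help="ID of the categorization_batches row")
    args = parser.parse_args(argv)  # argv=None falls back to sys.argv[1:]
    if args.batch_id <= 0:
        # Assumption: batch IDs are positive; parser.error exits with status 2.
        parser.error("batch_id must be a positive integer")
    return args.batch_id


def main(argv=None):
    batch_id = parse_batch_id(argv)
    # Placeholder: the real script would hand batch_id to the categorizer service.
    print(f"Processing batch {batch_id}")
    return 0
```

A script would typically end with `sys.exit(main())` under an `if __name__ == "__main__":` guard. Because PHP never reads the exit code or output directly, any usage error only appears in the redirected log file.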

## Directory structure

```
python/
├── requirements.txt          # psycopg2-binary, python-dotenv
├── .gitignore                # excludes .venv/, __pycache__/, *.pyc
├── config.py                 # reads DB env vars via python-dotenv
├── categorize.py             # entry point: accepts batch_id as CLI arg
├── services/
│   ├── __init__.py
│   ├── categorizer.py        # orchestrates per-product prediction + DB writes
│   ├── evidence_builder.py   # extracts evidence signals from product data
│   └── confidence_scorer.py  # computes confidence score and level
└── models/
    ├── __init__.py
    └── database.py           # psycopg2 connection factory
```
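
One way a connection factory such as `models/database.py` might be structured is shown below. This is a sketch, not the project's code: the name `db_connection` and the injectable `connect` parameter are hypothetical, and the commit-on-success, rollback-on-error policy is an assumption.

```python
from contextlib import contextmanager


@contextmanager
def db_connection(settings, connect=None):
    """Yield a connection; commit on success, roll back on error, always close.

    settings: keyword arguments for psycopg2.connect
              (host, port, dbname, user, password).
    connect:  hypothetical injection point so tests can pass a fake factory.
    """
    if connect is None:
        import psycopg2  # imported lazily so this module loads without the driver
        connect = psycopg2.connect
    conn = connect(**settings)
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()
```

Callers would use it as `with db_connection(settings) as conn:` and run queries through `conn.cursor()`.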

## PHP / Python communication contract

1. PHP writes a `categorization_batches` record with `status = 'pending'` and `product_ids` stored as a JSONB array.
2. PHP spawns `python3 python/categorize.py {batch_id}` in the background.
3. Python sets `status = 'processing'`, iterates `product_ids`, writes `predictions` and `evidence` rows, and updates `processed_products` after each product.
4. PHP polls `/api/categorization-batches/{id}/status` every 5 seconds — the API reads directly from the DB.
5. On completion, Python sets `status = 'completed'` or `'completed_with_errors'`; on a fatal error, it sets `status = 'failed'`.
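
The Python side of this contract (steps 3 and 5) can be sketched as a small status lifecycle. The example uses an in-memory dict in place of the `categorization_batches` row. The names `process_batch`, `predict`, and `save_progress` are hypothetical, and two behaviors are assumptions: a per-product failure does not abort the batch, and failed products still count toward `processed_products`.

```python
def process_batch(batch, predict, save_progress):
    """Run the status lifecycle for one batch and return the final status.

    batch:         dict with a 'product_ids' list; mutated in place.
    predict:       callable(product_id); raises on a per-product failure.
    save_progress: callable(batch); stands in for the DB update PHP polls.
    """
    batch["status"] = "processing"
    batch["processed_products"] = 0
    save_progress(batch)
    errors = 0
    try:
        for product_id in batch["product_ids"]:
            try:
                predict(product_id)
            except Exception:
                errors += 1  # assumption: keep going after a single bad product
            batch["processed_products"] += 1
            save_progress(batch)  # progress is persisted after each product
        batch["status"] = "completed_with_errors" if errors else "completed"
    except Exception:
        batch["status"] = "failed"  # fatal error outside per-product handling
    save_progress(batch)
    return batch["status"]
```

Persisting progress after every product is what lets the polling side show a live count instead of jumping from zero to done.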
