---

project_name: 'AI-Assisted Product Categorization System'
user_name: 'Sayre'
date: '2026-05-12'
sections_completed:
  - project_purpose
  - technology_stack
  - architecture_rules
  - database_model
  - api_contracts
  - workflow_rules
  - implementation_rules
  - testing_expectations

---

# Project Context for BMAD AI Agents

## 1. Purpose

This project is a self-hosted AI-assisted product categorization system for product data enrichment workflows.

The system should help load supplier/product data, normalize product and attribute records, run categorization logic, store predictions, and prepare clean output for Akeneo PIM. Human approval should occur inside Akeneo using the Akeneo Collaborative Workflow process, not inside this custom application.

The goal is not to create a generic product catalog app. The goal is to support controlled, auditable product categorization and enrichment recommendations at scale, while using Akeneo Collaborative Workflow as the formal approval layer.

## 2. Core Product Goals

* Import product/SKU data from CSV files.
* Normalize products and product attributes into PostgreSQL.
* Categorize products using existing enrichment data, supplier data, category rules, required attributes, and evidence.
* Store categorization predictions with confidence scores and explanation/evidence.
* Route categorization decisions that require human review into Akeneo Collaborative Workflow.
* Preserve auditability: every import, prediction, Akeneo workflow submission, approval outcome, and assignment should be traceable.
* Keep the system self-hosted and avoid unapproved cloud dependencies.
* Keep the UI simple, clean, and focused on operational review workflows.

## 3. Important Business Rules

* The confidence threshold for automatic categorization eligibility starts at `90%`.
* Predictions below the threshold must be routed into Akeneo Collaborative Workflow instead of being applied automatically.
* Categorization decisions should be based on existing enrichment data and supplier/product data stored in the database.
* Do not treat AI/model output as final truth without confidence, evidence, and Akeneo workflow review controls.
* Product data is operationally important; avoid destructive updates unless explicitly required.
* Preserve original supplier values where useful for traceability, even when normalized values are created.
* SKU/MPN/part number fields must be handled as strings. Never coerce part numbers into numeric values.
* Trim imported column names and values before processing.
* Imports must be repeatable and diagnosable through job/error tables.
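
The threshold rules above can be sketched as a small routing function. This is a minimal illustration in Python, not the project's implementation: the `Prediction` shape, the routing labels, and the choice to store confidence as a fraction between 0 and 1 are all assumptions. Treating a score of exactly `90%` as eligible, and sending predictions without evidence to review regardless of score, are also assumptions that the final threshold policy may change.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumption: confidence is stored as a fraction in [0, 1];
# 0.90 mirrors the 90% starting threshold.
AUTO_CATEGORIZATION_THRESHOLD = 0.90


@dataclass(frozen=True)
class Prediction:
    """Hypothetical prediction record; field names are illustrative."""
    sku: str                      # part numbers stay strings, never numbers
    category_code: str
    confidence: float
    evidence: Tuple[str, ...] = ()


def route_prediction(prediction: Prediction,
                     threshold: float = AUTO_CATEGORIZATION_THRESHOLD) -> str:
    """Return a hypothetical routing label for one prediction."""
    if not 0.0 <= prediction.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Assumption: a prediction without evidence is never auto-eligible.
    if not prediction.evidence:
        return "akeneo_workflow_review"
    if prediction.confidence >= threshold:
        return "auto_eligible"
    return "akeneo_workflow_review"
```

Keeping the threshold as a parameter rather than a hard-coded comparison makes it easier to adjust once the final policy is confirmed.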

## 4. Technology Stack

### Preferred Stack

This project should prioritize familiar, maintainable web technologies:

* HTML
* CSS
* PHP
* JavaScript
* TypeScript where it provides clear value
* Python where it provides clear value for data processing, AI/ML, embeddings, batch jobs, or analysis
* PostgreSQL

The default implementation should not assume React, Next.js, Laravel, Node.js, Express, or other frameworks unless a specific story explicitly justifies them. Python is approved for targeted backend/data tasks, but the primary web application should remain PHP-first unless a story explicitly justifies otherwise.

### Backend

* Primary backend language: PHP
* Database access: PDO or a clearly approved database abstraction layer
* File uploads: PHP upload handling with explicit validation
* CSV parsing: PHP-based CSV parsing unless a specific story justifies another tool
* API responses: JSON from PHP endpoints
* Long-running jobs: PHP CLI scripts, queue tables, cron jobs, or worker-style processes, depending on the story

### PHP

* Use clear, readable PHP with small functions/classes.
* Use PDO prepared statements or another approved parameterized database layer.
* Keep route/controller code thin; move business logic into reusable services.
* Validate uploaded files before processing.
* Keep long-running import/categorization work out of normal page loads when practical.
* Store job status and row-level errors for diagnosability.
* Keep environment variables and secrets server-side only.

### JavaScript / TypeScript

* Use JavaScript for browser interactivity and workflow screens.
* Use TypeScript only where it provides clear benefits, such as complex service logic, shared types, or isolated tooling.
* Do not introduce unnecessary build tooling for simple frontend behavior.
* Avoid optional chaining (`?.`) in frontend JavaScript unless compatibility with the target browsers is confirmed.
* Validate external input before using it.

### Python

* Use Python for data processing, AI/ML, embeddings, statistical analysis, batch jobs, and complex file transformations when it is the better tool.
* Good Python use cases include CSV/Excel processing, data profiling, embeddings, similarity scoring, model experimentation, batch categorization, statistical analysis, and offline enrichment jobs.
* Python should not replace the PHP web application by default.
* Keep Python jobs isolated and callable from cron, queue workers, CLI commands, or controlled backend processes.
* Python should communicate through PostgreSQL tables, files, or clearly defined command/API boundaries.
* Store job progress, errors, and outputs so the PHP UI can report status.
* Avoid making Python the primary web stack unless explicitly approved by the story.
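
As one example of an isolated Python job, the import rules on trimming and string handling could look roughly like the sketch below. The function name and the returned record shape are hypothetical. Python's `csv` module returns every cell as a string, so SKU/MPN values such as `00123` keep their leading zeros as long as nothing converts them later.

```python
import csv
import io
from typing import Dict, Iterator


def read_supplier_rows(csv_text: str) -> Iterator[Dict[str, object]]:
    """Yield one record per data row with original and trimmed values.

    Hypothetical helper: every value stays a string, so part numbers
    are never coerced into numeric values.
    """
    reader = csv.reader(io.StringIO(csv_text))
    try:
        raw_header = next(reader)
    except StopIteration:
        return  # empty input: nothing to yield
    header = [name.strip() for name in raw_header]

    # The header is row 1, so data rows start at 2.
    for row_number, raw_row in enumerate(reader, start=2):
        if not raw_row:
            continue  # skip completely blank lines
        # Note: zip() silently truncates on a column-count mismatch;
        # a real import would record that as a row-level error instead.
        yield {
            "row_number": row_number,
            "original": dict(zip(header, raw_row)),
            "normalized": dict(zip(header, (v.strip() for v in raw_row))),
        }
```

Keeping the untrimmed supplier values next to the normalized ones is one way to preserve traceability without a second pass over the file.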

### SQL/PostgreSQL

* Prefer readable SQL with explicit column names.
* Avoid destructive migrations without backup/rollback notes.
* Add indexes intentionally for high-volume lookup paths such as SKU, MPN, category, import job, and prediction status.
* Keep confidence calculations explainable.
* Store evidence in a way that can be displayed in the UI.

### Frontend

* Keep the interface minimal and task-focused.
* Use clear status labels: imported, failed, predicted, submitted to Akeneo workflow, awaiting Akeneo approval, approved in Akeneo, rejected in Akeneo, exported.
* Show evidence and missing required attributes near each prediction so the Akeneo workflow reviewer can understand why the recommendation was made.
* Avoid cluttered screens; prioritize review queues and actionable exceptions.
* Use plain JavaScript patterns compatible with the target browser/runtime.
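
To keep the status labels above consistent between batch jobs and the UI, they could be held in one enumeration. The sketch below is illustrative only: the enum name, the stored codes, and the `label_for` helper are assumptions, and only the label text comes from the list above.

```python
from enum import Enum


class RecordStatus(Enum):
    """Hypothetical status codes (member names) mapped to display labels."""
    IMPORTED = "imported"
    FAILED = "failed"
    PREDICTED = "predicted"
    SUBMITTED_TO_AKENEO_WORKFLOW = "submitted to Akeneo workflow"
    AWAITING_AKENEO_APPROVAL = "awaiting Akeneo approval"
    APPROVED_IN_AKENEO = "approved in Akeneo"
    REJECTED_IN_AKENEO = "rejected in Akeneo"
    EXPORTED = "exported"


def label_for(code: str) -> str:
    """Return the display label for a stored code such as 'approved_in_akeneo'.

    Raises KeyError for unknown codes so typos are caught early.
    """
    return RecordStatus[code.strip().upper()].value
```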

## 10. UX Preferences

* Minimal, clean interface.
* Muted/dark-capable styling is preferred where practical.
* Use tables, cards, and compact filters for operational workflows.
* Prioritize clarity over decoration.
* Important screens should make it obvious:

  * what needs to be submitted to Akeneo workflow,
  * why a category was suggested,
  * what data is missing,
  * what has already been approved or rejected in Akeneo,
  * what action the user should take next.

## 11. Error Handling and Logging

* Every import should have a clear success/failure summary.
* Row-level import failures should include row number, field name when possible, reason, and original value.
* API errors should return consistent JSON responses.
* Do not expose stack traces or secrets to the frontend.
* Log enough detail server-side to troubleshoot failed imports and categorization issues.
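
A possible shape for the import summary, row-level errors, and a consistent API error body is sketched below. All class names, field names, and the `ok`/`error` envelope are assumptions for illustration; the project's actual API contract may differ. The same JSON shape could be produced from a PHP endpoint with `json_encode`.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class RowError:
    """One row-level import failure (hypothetical field names)."""
    row_number: int
    reason: str
    original_value: str
    field_name: Optional[str] = None  # not always known


@dataclass
class ImportSummary:
    """Success/failure summary for one import job (hypothetical shape)."""
    job_id: str
    rows_read: int = 0
    rows_imported: int = 0
    errors: List[RowError] = field(default_factory=list)

    def to_json(self) -> str:
        payload = asdict(self)  # nested RowError objects become dicts
        payload["rows_failed"] = len(self.errors)
        return json.dumps(payload)


def api_error(code: str, message: str) -> str:
    """Build an error body with a fixed shape and no stack trace or secrets."""
    return json.dumps({"ok": False, "error": {"code": code, "message": message}})
```

Full exception details would go to the server-side log, while the browser only receives the short code and message.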

## 12. Security and Configuration

* Secrets must live in `.env` or secure environment configuration.
* Never commit `.env` files or credentials.
* Do not expose database credentials to the browser.
* Validate uploaded file type and size.
* Treat uploaded files as untrusted input.
* Use least-privilege database credentials where possible.
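
The upload checks above might be sketched as follows. The size limit, the allowed extension set, and the function name are assumptions rather than project decisions. An extension check alone does not prove the content is CSV, so the file should still be parsed as untrusted input.

```python
from pathlib import Path
from typing import List

# Assumptions for illustration only; real limits belong in configuration.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024
ALLOWED_EXTENSIONS = {".csv"}


def validate_upload(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems: List[str] = []
    # Use only the final path component of the client-supplied name.
    name = Path(filename).name
    if Path(name).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("unsupported file type")
    if size_bytes <= 0:
        problems.append("file is empty")
    elif size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds size limit")
    return problems
```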

## 13. Testing Expectations

At minimum, implementation stories should consider tests or manual validation for:

* CSV upload success path.
* CSV validation failure path.
* Import job creation.
* Row-level import error capture.
* Product normalization.
* Attribute normalization.
* Single-product categorization.
* Batch categorization.
* Confidence threshold handling.
* Akeneo Collaborative Workflow submission and approval-outcome handling.
* Database error handling.
* API response shape consistency.

## 14. BMAD Agent Guidance

### Analyst / PM

Focus on operational workflows, data quality problems, Akeneo Collaborative Workflow routing, confidence thresholds, and auditability. Avoid vague AI promises. Requirements should clearly identify what data is used, what decisions are automated, what is submitted to Akeneo, and what requires Akeneo approval.

### Architect

Respect the preferred stack: PostgreSQL, PHP, HTML, CSS, JavaScript, TypeScript only where it adds clear value, and Python for data processing/AI/ML/batch work where it is the better tool. Do not introduce unnecessary frameworks, Node/Express services, frontend frameworks, or cloud services unless explicitly justified by the story. Prioritize maintainability, auditability, and clear data flow.

### Scrum Master

Stories should be small enough to implement safely. Each story should include acceptance criteria covering data integrity, errors, Akeneo workflow submission behavior, and approval-outcome handling.

### Developer

Follow existing patterns before introducing new ones. Preserve database auditability. Do not bypass confidence threshold rules or Akeneo Collaborative Workflow requirements. Do not create destructive operations without explicit acceptance criteria.

### QA

Review for data integrity, edge cases, import failures, threshold behavior, security, Akeneo workflow routing, and whether evidence is sufficient for an Akeneo workflow reviewer to trust or reject a prediction.

## 15. Definition of Done

A story is not complete unless:

* The implemented behavior matches the acceptance criteria.
* Input validation is handled.
* Errors are logged or surfaced with enough detail to troubleshoot.
* Relevant database changes are documented.
* Confidence and Akeneo Collaborative Workflow rules are preserved.
* UI changes remain minimal and clear.
* No secrets or unsafe assumptions are introduced.
* Manual test steps are provided when automated tests are not included.

## 16. Open Questions / To Confirm

* Exact project name to use throughout BMAD documents.
* Final confidence threshold policy after the initial `90%` starting point.
* Whether categorization recommendations should be pushed directly to Akeneo through the API or staged in an intermediate export file first.
* Whether Akeneo Collaborative Workflow can support the desired bulk review/approval process.
* Whether rejected or corrected Akeneo workflow decisions should be used as training/evidence feedback.
* Which Akeneo users/roles are allowed to approve category assignments in Collaborative Workflow.
* Required retention period for import files and job logs.