# Price Comparison PWA Solution Structure

This document outlines the solution structure for a price comparison PWA with a NestJS backend and a PostgreSQL database.

## System Architecture

```mermaid
graph TD
    A[Web Scrapers] -->|Extract Data| B[NestJS Backend]
    B -->|Store Data| C[PostgreSQL Database]
    B -->|Serve API| D[PWA Frontend]
    E[Users] -->|Use| D
```

## Backend Structure

### Database Schema

```mermaid
erDiagram
    PRODUCT {
        int id PK
        string name
        string description
        string category
        boolean availability
    }
    PRICE {
        int id PK
        int product_id FK
        float regular_price
        float discounted_price
        float discount_percentage
        string unit_price
        string promotion_type
        date promotion_start
        date promotion_end
        date last_updated
        int source_id FK
    }
    SOURCE {
        int id PK
        string name
        string url
        string logo
        datetime last_scraped
    }
    PRODUCT ||--o{ PRICE : has
    SOURCE ||--o{ PRICE : provides
```

### Additional Database Fields

The source data format requires a few fields beyond the schema above:
- **Product**: add `sourceProductId` to track each product's ID in the original source
- **Price**: add a `vatIncluded` boolean flag, since the listed prices include VAT
- **Source**: add `lastUpdateTime` to store the "Последно ажурирање" ("Last updated") timestamp shown by the source

### Data Transformation Rules

1. **Text Processing**
   - Decode text as UTF-8 so Cyrillic characters are preserved
   - Parse product names and descriptions
   - Extract category from description field

2. **Price Processing**
   - Convert prices from string to float
   - Handle the "ден/кг" (denars per kilogram) unit price format
   - Store both VAT-included and VAT-excluded prices

3. **Date Processing**
   - Parse dates from "DD/MM/YYYY" format
   - Handle time in "HH:mm" format for last update
   - Store timestamps in UTC
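
The price rules above can be sketched as follows. This is a minimal sketch, not the project's implementation: it assumes source prices use a comma as the decimal separator (for example `1.234,50`), and the helper names `parsePrice`, `parseUnitPrice`, and `priceWithoutVat`, as well as the `vatRate` parameter, are hypothetical.

```typescript
// Matches the first numeric token in a cell, e.g. "1.234,50" in "1.234,50 ден".
const NUMBER_TOKEN = /\d(?:[\d.,]*\d)?/;

// Converts a price string to a number, or null when no number is present.
// Assumption: the last "." or "," is the decimal separator and any earlier
// separators group thousands. A lone separator is treated as decimal.
function parsePrice(raw: string): number | null {
  const match = raw.match(NUMBER_TOKEN);
  if (!match) return null;
  const token = match[0];
  const lastSep = Math.max(token.lastIndexOf(","), token.lastIndexOf("."));
  if (lastSep === -1) return Number(token);
  const whole = token.slice(0, lastSep).replace(/[.,]/g, "");
  const fraction = token.slice(lastSep + 1);
  const value = Number(`${whole}.${fraction}`);
  return Number.isFinite(value) ? value : null;
}

// Splits a unit price such as "45,50 ден/кг" into a number and its unit label.
function parseUnitPrice(raw: string): { price: number; measurement: string } | null {
  const match = raw.match(NUMBER_TOKEN);
  if (!match) return null;
  const price = parsePrice(match[0]);
  if (price === null) return null;
  const end = (match.index ?? 0) + match[0].length;
  return { price, measurement: raw.slice(end).trim() };
}

// Derives the VAT-excluded price from a VAT-included one, rounded to 2 decimals.
// The rate is supplied by the caller; this document does not specify it.
function priceWithoutVat(grossPrice: number, vatRate: number): number {
  return Math.round((grossPrice / (1 + vatRate)) * 100) / 100;
}
```

Returning `null` instead of `NaN` for unparseable cells keeps "no price" explicit, which matches the nullable `discounted` field in `ProcessedProduct`.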

### Scraper Implementation

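The scraper reads each row of the source HTML table into `RawProductData` (raw cell strings, annotated with the source column headers) and converts it to `ProcessedProduct`: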

```typescript
interface RawProductData {
  productName: string;      // "Назив на стока" (name of goods)
  regularPrice: string;     // "Продажна цена (со ДДВ)" (selling price, VAT included)
  unitPrice: string;        // "Единечна цена" (unit price)
  availability: string;     // "Достапност во продажен објект" (availability in store)
  description: string;      // "Опис на стока" (description of goods)
  discountPrice: string;    // "Цена со попуст" (discounted price)
  discountPercent: string;  // "Попуст (%)" (discount, %)
  promotionType: string;    // "Вид на продажно потикнување" (type of sales promotion)
  promotionPeriod: string;  // "Времетраење на промоција или попуст" (duration of promotion or discount)
}

interface ProcessedProduct {
  name: string;
  description: string;
  category: string;  // Extracted from description
  availability: boolean;
  prices: {
    regular: number;
    discounted: number | null;
    unit: {
      price: number;
      measurement: string;  // "ден/кг" (denars per kg), etc.
    };
  };
  promotion: {
    type: string;
    discountPercentage: number;
    startDate: Date;
    endDate: Date;
  } | null;
}
```

### HTML Parsing Strategy

1. **Table Structure**
   ```typescript
   const parseTable = async (html: string): Promise<RawProductData[]> => {
     // Use cheerio or similar for HTML parsing
     // Target structure: table > tr > td
     // Skip header row (first row)
     // Handle Cyrillic encoding
   }
   ```

2. **Data Extraction**
   ```typescript
   const extractProduct = (row: CheerioElement): RawProductData => {
     // Extract td contents
     // Clean and normalize text
     // Handle special characters
   }
   ```

3. **Data Transformation**
   ```typescript
   const transformProduct = (raw: RawProductData): ProcessedProduct => {
     // Convert prices to numbers
     // Parse dates
     // Extract category
     // Convert availability to boolean
   }
   ```
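
As a hedged illustration of the extraction step, the sketch below maps the text of one row's cells onto `RawProductData`. It takes plain strings rather than parser nodes so that it does not depend on a specific HTML library. The column order is the one listed in the Data Extraction Process section of this document, and the names `cleanText` and `rowToRawProduct` are hypothetical.

```typescript
interface RawProductData {
  productName: string;
  regularPrice: string;
  unitPrice: string;
  availability: string;
  description: string;
  discountPrice: string;
  discountPercent: string;
  promotionType: string;
  promotionPeriod: string;
}

// Collapses runs of whitespace (including non-breaking spaces) and applies
// Unicode NFC normalization so visually identical strings compare equal.
function cleanText(value: string): string {
  return value.normalize("NFC").replace(/\s+/g, " ").trim();
}

// Assumed column order: name, regular price, unit price, availability,
// description, regular price (repeated), discounted price, discount %,
// promotion type, promotion period.
function rowToRawProduct(cells: string[]): RawProductData | null {
  if (cells.length < 10) return null; // malformed or header-only row
  const [
    productName,
    regularPrice,
    unitPrice,
    availability,
    description,
    , // repeated regular price, ignored
    discountPrice,
    discountPercent,
    promotionType,
    promotionPeriod,
  ] = cells.map(cleanText);
  return {
    productName,
    regularPrice,
    unitPrice,
    availability,
    description,
    discountPrice,
    discountPercent,
    promotionType,
    promotionPeriod,
  };
}
```

Keeping this step string-only means the HTML parser can be swapped without touching the mapping, and the mapping can be unit-tested with hand-written rows.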

### NestJS Modules

1. **Scraper Module**
   - Service for each data source
   - HTML parsing utilities
   - Scheduling for regular updates
   - Error handling and retry logic

2. **Product Module**
   - Product entity and repository
   - CRUD operations
   - Search and filtering

3. **Price Module**
   - Price entity and repository
   - Price history tracking
   - Discount calculations

4. **Source Module**
   - Source entity and repository
   - Source metadata management

5. **API Module**
   - RESTful endpoints
   - GraphQL API (optional)
   - Authentication and rate limiting
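
For the Scraper Module's error handling and retry logic, one possible shape is a generic retry helper with exponential backoff. This is a sketch under assumptions: the names `backoffDelays` and `withRetry` and the default values are hypothetical, and the NestJS wiring (providers, scheduling) is intentionally left out.

```typescript
// Delay before each retry: base, 2*base, 4*base, ... capped at maxDelayMs.
function backoffDelays(retries: number, baseDelayMs: number, maxDelayMs: number): number[] {
  return Array.from({ length: retries }, (_, i) => Math.min(baseDelayMs * 2 ** i, maxDelayMs));
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs `task`, retrying when it rejects. After the last retry fails,
// the most recent error is rethrown to the caller.
async function withRetry<T>(
  task: () => Promise<T>,
  retries = 3,
  baseDelayMs = 1000,
  maxDelayMs = 30000,
  wait: (ms: number) => Promise<void> = sleep, // injectable for tests
): Promise<T> {
  const delays = backoffDelays(retries, baseDelayMs, maxDelayMs);
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt < retries) await wait(delays[attempt]);
    }
  }
  throw lastError;
}
```

A scraper service could wrap its page download in `withRetry`, so that a transient network failure does not fail the whole scheduled run.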

## Frontend Structure (PWA)

1. **Core Components**
   - Product listing
   - Product details
   - Price comparison
   - Search and filters
   - Favorites/Watchlist

2. **PWA Features**
   - Offline support
   - Push notifications for price drops
   - App installation
   - Responsive design
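
Push notifications for price drops need a check that decides when a notification is due. A minimal sketch of that check is shown below. The types `WatchlistEntry`, `CurrentPrice`, and `PriceDrop`, the function `findPriceDrops`, and the `minDropPercent` threshold are all hypothetical, and delivering the push message itself is out of scope here.

```typescript
interface WatchlistEntry {
  productId: number;
  lastNotifiedPrice: number | null; // price the user was last told about
}

interface CurrentPrice {
  productId: number;
  price: number; // current lowest price for the product
}

interface PriceDrop {
  productId: number;
  from: number;
  to: number;
  dropPercent: number; // rounded to one decimal place
}

// Returns the watched products whose current price is lower than the last
// notified price by at least `minDropPercent`.
function findPriceDrops(
  watchlist: WatchlistEntry[],
  current: CurrentPrice[],
  minDropPercent = 0,
): PriceDrop[] {
  const priceById = new Map(current.map((p) => [p.productId, p.price] as const));
  const drops: PriceDrop[] = [];
  for (const entry of watchlist) {
    const now = priceById.get(entry.productId);
    const before = entry.lastNotifiedPrice;
    // Skip products with no current price or no usable baseline.
    if (now === undefined || before === null || before <= 0) continue;
    const dropPercent = ((before - now) / before) * 100;
    if (now < before && dropPercent >= minDropPercent) {
      drops.push({
        productId: entry.productId,
        from: before,
        to: now,
        dropPercent: Math.round(dropPercent * 10) / 10,
      });
    }
  }
  return drops;
}
```

Comparing against the last notified price, rather than the previous scrape, avoids sending a second notification for a drop the user has already been told about.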

## Implementation Plan

### Phase 1: Backend Setup

1. Initialize NestJS project
2. Set up PostgreSQL connection
3. Define database entities
4. Create basic API endpoints

### Phase 2: Scraper Implementation

1. Create scraper services for each source
2. Implement HTML parsing based on the provided structure
3. Set up scheduled scraping jobs
4. Implement data normalization and storage

### Phase 3: Frontend Development

1. Set up PWA framework
2. Implement core UI components
3. Connect to backend API
4. Implement offline functionality

### Phase 4: Testing & Deployment

1. Unit and integration testing
2. Performance optimization
3. Deployment setup
4. Monitoring and analytics

## Scraper Implementation Details

Based on the HTML structure provided, each parsed row is normalized into the following record before storage:

```typescript
interface ProductData {
  name: string;
  regularPrice: number;
  unitPrice: string;
  availability: boolean;
  description: string;
  discountedPrice: number | null;
  discountPercentage: number | null;
  promotionType: string | null;
  promotionPeriod: {
    start: Date | null;
    end: Date | null;
  };
  lastUpdated: Date;
  source: string;
}
```

The scraper will:

1. Fetch the HTML content
2. Parse the table structure
3. Extract data from each row
4. Transform dates and numeric values
5. Store normalized data in the database
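
Steps 3 to 5 can be organized so that one malformed row does not abort the whole run. The sketch below shows one way to do that, assuming the transform signals bad input by throwing. The names `processRows`, `RowFailure`, and `BatchResult` are hypothetical, and the transform function is passed in rather than defined here.

```typescript
interface RowFailure<R> {
  index: number; // position of the row in the scraped table
  row: R;
  reason: string;
}

interface BatchResult<R, P> {
  products: P[];
  failures: RowFailure<R>[];
}

// Applies `transform` to every row, collecting successes and failures
// separately instead of stopping at the first error.
function processRows<R, P>(rows: R[], transform: (row: R) => P): BatchResult<R, P> {
  const result: BatchResult<R, P> = { products: [], failures: [] };
  rows.forEach((row, index) => {
    try {
      result.products.push(transform(row));
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      result.failures.push({ index, row, reason });
    }
  });
  return result;
}
```

The `failures` list can be logged per source, which makes it easier to notice when a site changes its table layout.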

## Data Extraction Process

The source HTML presents product information as a table. Each row represents a product with the following columns:

- Product name
- Regular price (with VAT)
- Unit price
- Availability
- Product description
- Regular price (repeated)
- Discounted price
- Discount percentage
- Type of promotion
- Promotion duration

The scraper will need to handle:

- Text encoding (content is Cyrillic, decoded as UTF-8)
- Date parsing (format: DD/MM/YYYY)
- Price conversion to numeric values
- Availability conversion to boolean
- Extracting promotion date ranges
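
The date and availability items above can be sketched as follows. Several details are assumptions, since this document does not show sample cell values: the promotion cell is assumed to contain up to two `DD/MM/YYYY` dates separated by arbitrary text, dates are interpreted as midnight UTC, and "Да" is assumed to be the value meaning "available". The names `parseDate`, `parsePromotionPeriod`, and `parseAvailability` are hypothetical.

```typescript
// Parses "DD/MM/YYYY" into a Date at 00:00 UTC, or null if invalid.
function parseDate(raw: string): Date | null {
  const m = raw.trim().match(/^(\d{1,2})\/(\d{1,2})\/(\d{4})$/);
  if (!m) return null;
  const day = Number(m[1]);
  const month = Number(m[2]);
  const year = Number(m[3]);
  const date = new Date(Date.UTC(year, month - 1, day));
  // Date.UTC silently rolls over values like 31/02, so verify the round trip.
  const valid =
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day;
  return valid ? date : null;
}

// Extracts the first two dates found in the cell as start and end.
function parsePromotionPeriod(raw: string): { start: Date | null; end: Date | null } {
  const tokens = raw.match(/\d{1,2}\/\d{1,2}\/\d{4}/g) ?? [];
  return {
    start: tokens.length > 0 ? parseDate(tokens[0]) : null,
    end: tokens.length > 1 ? parseDate(tokens[1]) : null,
  };
}

// Assumption: the listed values (default "да") mean "available";
// anything else is treated as unavailable.
function parseAvailability(raw: string, truthyValues: string[] = ["да"]): boolean {
  return truthyValues.includes(raw.trim().toLowerCase());
}
```

The round-trip check in `parseDate` matters because JavaScript's `Date` accepts out-of-range days and shifts them into the next month instead of reporting an error.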

## API Endpoints

The backend will provide the following key API endpoints:

1. **Products**
   - `GET /products` - List all products with pagination
   - `GET /products/:id` - Get product details
   - `GET /products/search` - Search products by name/category

2. **Prices**
   - `GET /prices/product/:id` - Get all prices for a product
   - `GET /prices/compare/:ids` - Compare prices for multiple products
   - `GET /prices/history/:id` - Get price history for a product

3. **Sources**
   - `GET /sources` - List all data sources
   - `GET /sources/:id/products` - Get products from a specific source

4. **User Features**
   - `POST /watchlist` - Add product to watchlist
   - `GET /watchlist` - Get user's watchlist
   - `POST /notifications` - Configure price drop notifications
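
As an illustration of the logic behind `GET /prices/compare/:ids`, the sketch below picks the cheapest current offer per product across sources. It is a sketch under assumptions: the field names mirror the PRICE schema in camelCase, the encoding of `:ids` is not specified in this document so the function simply takes an array, and `effectivePrice`, `cheapestPerProduct`, and `BestOffer` are hypothetical names.

```typescript
interface PriceRow {
  productId: number;
  sourceId: number;
  regularPrice: number;
  discountedPrice: number | null;
  promotionStart: Date | null;
  promotionEnd: Date | null;
}

interface BestOffer {
  sourceId: number;
  price: number;
}

// The discounted price applies only while the promotion window is open.
// A missing bound is treated as open-ended; both bounds are inclusive.
function effectivePrice(row: PriceRow, at: Date): number {
  const t = at.getTime();
  const started = row.promotionStart === null || row.promotionStart.getTime() <= t;
  const notEnded = row.promotionEnd === null || t <= row.promotionEnd.getTime();
  return row.discountedPrice !== null && started && notEnded
    ? row.discountedPrice
    : row.regularPrice;
}

// Returns, for each requested product, the source with the lowest effective price.
function cheapestPerProduct(
  rows: PriceRow[],
  productIds: number[],
  at: Date,
): Map<number, BestOffer> {
  const best = new Map<number, BestOffer>();
  for (const row of rows) {
    if (!productIds.includes(row.productId)) continue;
    const price = effectivePrice(row, at);
    const current = best.get(row.productId);
    if (current === undefined || price < current.price) {
      best.set(row.productId, { sourceId: row.sourceId, price });
    }
  }
  return best;
}
```

Passing the comparison time in as `at` keeps the function deterministic, so expired promotions can be tested without depending on the system clock.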