sporedimk/systemarhitecture.md
2025-05-06 10:01:03 +02:00


Price Comparison PWA Solution Structure

This document outlines the solution structure for a price comparison PWA with a NestJS backend and a PostgreSQL database.

System Architecture

graph TD
    A[Web Scrapers] -->|Extract Data| B[NestJS Backend]
    B -->|Store Data| C[PostgreSQL Database]
    B -->|Serve API| D[PWA Frontend]
    E[Users] -->|Use| D

Backend Structure

Database Schema

erDiagram
    PRODUCT {
        int id PK
        string name
        string description
        string category
        boolean availability
    }
    PRICE {
        int id PK
        int product_id FK
        float regular_price
        float discounted_price
        float discount_percentage
        string unit_price
        string promotion_type
        date promotion_start
        date promotion_end
        date last_updated
        int source_id FK
    }
    SOURCE {
        int id PK
        string name
        string url
        string logo
        datetime last_scraped
    }
    PRODUCT ||--o{ PRICE : has
    SOURCE ||--o{ PRICE : provides

Additional Database Fields

We'll add these fields to handle the source's data format:

  • Product: Add sourceProductId to track the source's original product IDs
  • Price: Add vatIncluded boolean flag since the scraped prices include VAT
  • Source: Add lastUpdateTime to track the source's "Последно ажурирање" ("Last updated") timestamp

Data Transformation Rules

  1. Text Processing

    • Handle Cyrillic text encoding (UTF-8)
    • Parse product names and descriptions
    • Extract category from description field
  2. Price Processing

    • Convert prices from string to float
    • Handle the "ден/кг" (denars per kilogram) unit price format
    • Store both VAT-included and VAT-excluded prices
  3. Date Processing

    • Parse dates from "DD/MM/YYYY" format
    • Handle time in "HH:mm" format for last update
    • Store timestamps in UTC
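
The price rules above could be sketched as follows. This is only an illustration: the helper names parsePrice and parseUnitPrice are hypothetical, and the input formats ("129,00", "1.299,50", "250,00 ден/кг") are assumptions that should be checked against the real source before relying on them.

```typescript
// Assumption: prices look like "129,00", "1.299,50" or "129.00".
function parsePrice(text: string): number | null {
  // Keep only digits and separators; this drops currency labels and spaces.
  const cleaned = text.replace(/[^\d.,]/g, "");
  if (cleaned === "") return null;
  // When a comma is present, treat it as the decimal separator and dots as
  // thousands separators; otherwise a dot (if any) is the decimal point.
  const normalized = cleaned.includes(",")
    ? cleaned.replace(/\./g, "").replace(",", ".")
    : cleaned;
  const value = Number(normalized);
  return Number.isFinite(value) ? value : null;
}

interface UnitPrice {
  price: number;
  measurement: string; // e.g. "ден/кг"
}

// Assumption: unit prices look like "<number> <unit>", e.g. "250,00 ден/кг".
function parseUnitPrice(text: string): UnitPrice | null {
  const match = text.trim().match(/^([\d.,]+)\s*(.+)$/);
  if (!match) return null;
  const price = parsePrice(match[1]);
  return price === null ? null : { price, measurement: match[2].trim() };
}
```

Returning null instead of NaN makes unparseable cells explicit, so the caller can decide whether to skip the row or log it.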

Scraper Implementation

The scraper will read each row of the source HTML table into RawProductData and convert it to ProcessedProduct:

interface RawProductData {
  productName: string;       // "Назив на стока" (product name)
  regularPrice: string;      // "Продажна цена (со ДДВ)" (sale price, with VAT)
  unitPrice: string;         // "Единечна цена" (unit price)
  availability: string;      // "Достапност во продажен објект" (availability in store)
  description: string;       // "Опис на стока" (product description)
  discountPrice: string;     // "Цена со попуст" (discounted price)
  discountPercent: string;   // "Попуст (%)" (discount, %)
  promotionType: string;     // "Вид на продажно потикнување" (type of sales promotion)
  promotionPeriod: string;   // "Времетраење на промоција или попуст" (duration of promotion or discount)
}

interface ProcessedProduct {
  name: string;
  description: string;
  category: string;  // Extracted from description
  availability: boolean;
  prices: {
    regular: number;
    discounted: number | null;
    unit: {
      price: number;
      measurement: string;  // "ден/кг", etc.
    };
  };
  promotion: {
    type: string;
    discountPercentage: number;
    startDate: Date;
    endDate: Date;
  } | null;
}

HTML Parsing Strategy

  1. Table Structure

    const parseTable = async (html: string): Promise<RawProductData[]> => {
      // Use cheerio or similar for HTML parsing
      // Target structure: table > tr > td
      // Skip header row (first row)
      // Handle Cyrillic encoding
    }
    
  2. Data Extraction

    const extractProduct = (row: CheerioElement): RawProductData => {
      // Extract td contents
      // Clean and normalize text
      // Handle special characters
    }
    
  3. Data Transformation

    const transformProduct = (raw: RawProductData): ProcessedProduct => {
      // Convert prices to numbers
      // Parse dates
      // Extract category
      // Convert availability to boolean
    }
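
The extraction step can be illustrated without committing to a particular HTML library. The sketch below assumes the cell texts of one table row have already been pulled out (for example with cheerio) into a string array, and that the columns appear in the ten-column order this document describes, with the repeated regular price at index 5. The helper names normalizeText and rowToRaw are hypothetical.

```typescript
interface RawProductData {
  productName: string;
  regularPrice: string;
  unitPrice: string;
  availability: string;
  description: string;
  discountPrice: string;
  discountPercent: string;
  promotionType: string;
  promotionPeriod: string;
}

// Normalize Unicode and collapse whitespace so visually identical Cyrillic
// strings compare equal. In JavaScript, \s also matches non-breaking spaces.
function normalizeText(text: string): string {
  return text.normalize("NFC").replace(/\s+/g, " ").trim();
}

// Assumed column order: name, regular price, unit price, availability,
// description, regular price (repeated, skipped), discounted price,
// discount percentage, promotion type, promotion period.
function rowToRaw(cells: string[]): RawProductData | null {
  if (cells.length < 10) return null; // header, spacer, or malformed row
  const c = cells.map(normalizeText);
  return {
    productName: c[0],
    regularPrice: c[1],
    unitPrice: c[2],
    availability: c[3],
    description: c[4],
    discountPrice: c[6],
    discountPercent: c[7],
    promotionType: c[8],
    promotionPeriod: c[9],
  };
}
```

Keeping this mapping separate from the HTML parsing means the column order can be changed in one place if the source table layout changes.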
    

NestJS Modules

  1. Scraper Module

    • Service for each data source
    • HTML parsing utilities
    • Scheduling for regular updates
    • Error handling and retry logic
  2. Product Module

    • Product entity and repository
    • CRUD operations
    • Search and filtering
  3. Price Module

    • Price entity and repository
    • Price history tracking
    • Discount calculations
  4. Source Module

    • Source entity and repository
    • Source metadata management
  5. API Module

    • RESTful endpoints
    • GraphQL API (optional)
    • Authentication and rate limiting
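
As an illustration of the Price Module's discount calculations, the sketch below derives a discount percentage from a regular and a discounted price. The function name discountPercentage and the rounding to one decimal place are assumptions, not part of the design above.

```typescript
// Percentage saved relative to the regular price, rounded to one decimal.
// Returns null when there is no real discount or the inputs are unusable.
function discountPercentage(
  regular: number,
  discounted: number | null,
): number | null {
  if (discounted === null) return null;
  if (regular <= 0 || discounted < 0 || discounted >= regular) return null;
  return Math.round(((regular - discounted) / regular) * 1000) / 10;
}
```

Computing the value on the backend could also serve as a cross-check against the discount percentage scraped from the source, since the two may disagree.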

Frontend Structure (PWA)

  1. Core Components

    • Product listing
    • Product details
    • Price comparison
    • Search and filters
    • Favorites/Watchlist
  2. PWA Features

    • Offline support
    • Push notifications for price drops
    • App installation
    • Responsive design
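
Deciding when a price drop deserves a push notification could look like the following sketch. It would most likely run on the backend after each scrape rather than in the PWA itself. The names PriceSnapshot, PriceDrop, and findPriceDrops, and the 5% default threshold, are hypothetical.

```typescript
interface PriceSnapshot {
  productId: number;
  price: number;
}

interface PriceDrop {
  productId: number;
  from: number;
  to: number;
  dropPercent: number;
}

// Compare the previous and current price per product and report the drops
// that reach the minimum percentage threshold.
function findPriceDrops(
  previous: PriceSnapshot[],
  current: PriceSnapshot[],
  minDropPercent = 5,
): PriceDrop[] {
  const before = new Map<number, number>();
  for (const p of previous) before.set(p.productId, p.price);

  const drops: PriceDrop[] = [];
  for (const { productId, price } of current) {
    const old = before.get(productId);
    // Skip new products, invalid old prices, and prices that did not fall.
    if (old === undefined || old <= 0 || price >= old) continue;
    const dropPercent = ((old - price) / old) * 100;
    if (dropPercent >= minDropPercent) {
      drops.push({ productId, from: old, to: price, dropPercent });
    }
  }
  return drops;
}
```

A threshold avoids notifying users about very small fluctuations; the right value would depend on user preferences stored with the watchlist.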

Implementation Plan

Phase 1: Backend Setup

  1. Initialize NestJS project
  2. Set up PostgreSQL connection
  3. Define database entities
  4. Create basic API endpoints

Phase 2: Scraper Implementation

  1. Create scraper services for each source
  2. Implement HTML parsing based on the provided structure
  3. Set up scheduled scraping jobs
  4. Implement data normalization and storage

Phase 3: Frontend Development

  1. Set up PWA framework
  2. Implement core UI components
  3. Connect to backend API
  4. Implement offline functionality

Phase 4: Testing & Deployment

  1. Unit and integration testing
  2. Performance optimization
  3. Deployment setup
  4. Monitoring and analytics

Scraper Implementation Details

Based on the HTML structure provided, each parsed row will be normalized into the following shape:

interface ProductData {
  name: string;
  regularPrice: number;
  unitPrice: string;
  availability: boolean;
  description: string;
  discountedPrice: number | null;
  discountPercentage: number | null;
  promotionType: string | null;
  promotionPeriod: {
    start: Date | null;
    end: Date | null;
  };
  lastUpdated: Date;
  source: string;
}

The scraper will:

  1. Fetch the HTML content
  2. Parse the table structure
  3. Extract data from each row
  4. Transform dates and numeric values
  5. Store normalized data in the database

Data Extraction Process

The HTML structure contains product information in a table format. Each row represents a product with the following columns:

  • Product name
  • Regular price (with VAT)
  • Unit price
  • Availability
  • Product description
  • Regular price (repeated)
  • Discounted price
  • Discount percentage
  • Type of promotion
  • Promotion duration

The scraper will need to handle:

  • Cyrillic text (decoded as UTF-8)
  • Date parsing (format: DD/MM/YYYY)
  • Price conversion to numeric values
  • Availability conversion to boolean
  • Extracting promotion date ranges
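
The availability and date handling listed above could be sketched as follows. The exact source strings are not specified in this document, so the sketch assumes availability is reported as "Да" (yes) or "Не" (no) and that a promotion period contains two DD/MM/YYYY dates, such as "01/05/2025 - 07/05/2025". All function names are hypothetical.

```typescript
// Assumption: "Да" means available; anything else is treated as unavailable.
function parseAvailability(text: string): boolean {
  return text.trim().toLowerCase() === "да";
}

// Parse "DD/MM/YYYY" into a Date at UTC midnight; null for invalid input.
function parseDate(text: string): Date | null {
  const m = text.trim().match(/^(\d{2})\/(\d{2})\/(\d{4})$/);
  if (!m) return null;
  const day = Number(m[1]);
  const month = Number(m[2]);
  const year = Number(m[3]);
  const date = new Date(Date.UTC(year, month - 1, day));
  // Reject overflowed dates such as 31/02/2025, which Date rolls forward.
  const valid =
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day;
  return valid ? date : null;
}

// Take the first two DD/MM/YYYY dates found in the text as start and end.
function parsePromotionPeriod(text: string): {
  start: Date | null;
  end: Date | null;
} {
  const found = text.match(/\d{2}\/\d{2}\/\d{4}/g) ?? [];
  return {
    start: found.length > 0 ? parseDate(found[0]) : null,
    end: found.length > 1 ? parseDate(found[1]) : null,
  };
}
```

The dates are interpreted as UTC calendar dates for simplicity. If the source publishes local times, a timezone conversion would be needed before storing timestamps.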

API Endpoints

The backend will provide the following key API endpoints:

  1. Products

    • GET /products - List all products with pagination
    • GET /products/:id - Get product details
    • GET /products/search - Search products by name/category
  2. Prices

    • GET /prices/product/:id - Get all prices for a product
    • GET /prices/compare/:ids - Compare prices for multiple products
    • GET /prices/history/:id - Get price history for a product
  3. Sources

    • GET /sources - List all data sources
    • GET /sources/:id/products - Get products from a specific source
  4. User Features

    • POST /watchlist - Add product to watchlist
    • GET /watchlist - Get user's watchlist
    • POST /notifications - Configure price drop notifications
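
Two small pieces of request handling implied by these endpoints are sketched below: parsing the :ids parameter of GET /prices/compare/:ids and paginating a product list. The comma-separated format of :ids, the limits (20 ids, 100 items per page), and the names parseIdList, Page, and paginate are all assumptions. In the real backend, pagination would normally be pushed down to the database query rather than done in memory.

```typescript
// Assumption: ":ids" is a comma-separated list such as "12,15,40".
function parseIdList(param: string, maxIds = 20): number[] {
  const ids = param
    .split(",")
    .map((s) => s.trim())
    .filter((s) => /^\d+$/.test(s)) // drop anything that is not a plain integer
    .map(Number);
  // De-duplicate while preserving order, then cap the list size.
  return Array.from(new Set(ids)).slice(0, maxIds);
}

interface Page<T> {
  items: T[];
  page: number;
  pageSize: number;
  total: number;
}

// In-memory pagination with 1-based page numbers and a clamped page size.
function paginate<T>(all: T[], page = 1, pageSize = 20): Page<T> {
  const safePage = Math.max(1, Math.floor(page));
  const safeSize = Math.min(100, Math.max(1, Math.floor(pageSize)));
  const start = (safePage - 1) * safeSize;
  return {
    items: all.slice(start, start + safeSize),
    page: safePage,
    pageSize: safeSize,
    total: all.length,
  };
}
```

Returning total alongside the items lets the PWA render page controls without a second request.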