sporedimk/systemarhitecture.md
2025-05-06 10:01:03 +02:00

# Price Comparison PWA Solution Structure
This document outlines the solution structure for a price comparison PWA with a NestJS backend and a PostgreSQL database.
## System Architecture
```mermaid
graph TD
    A[Web Scrapers] -->|Extract Data| B[NestJS Backend]
    B -->|Store Data| C[PostgreSQL Database]
    B -->|Serve API| D[PWA Frontend]
    E[Users] -->|Use| D
```
## Backend Structure
### Database Schema
```mermaid
erDiagram
    PRODUCT {
        int id PK
        string name
        string description
        string category
        boolean availability
    }
    PRICE {
        int id PK
        int product_id FK
        float regular_price
        float discounted_price
        float discount_percentage
        string unit_price
        string promotion_type
        date promotion_start
        date promotion_end
        date last_updated
        int source_id FK
    }
    SOURCE {
        int id PK
        string name
        string url
        string logo
        datetime last_scraped
    }
    PRODUCT ||--o{ PRICE : has
    SOURCE ||--o{ PRICE : provides
```
### Additional Database Fields
The scraped data format requires a few fields beyond the base schema:
- **Product**: Add `sourceProductId` to track the product ID used by the original source
- **Price**: Add a `vatIncluded` boolean flag, since the scraped prices include VAT
- **Source**: Add `lastUpdateTime` to store the source's "Последно ажурирање" ("Last updated") timestamp
### Data Transformation Rules
1. **Text Processing**
   - Handle Cyrillic text encoding (UTF-8)
   - Parse product names and descriptions
   - Extract category from description field
2. **Price Processing**
   - Convert prices from string to float
   - Handle "ден/кг" unit price format
   - Store both VAT-included and VAT-excluded prices
3. **Date Processing**
   - Parse dates from "DD/MM/YYYY" format
   - Handle time in "HH:mm" format for last update
   - Store timestamps in UTC
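
The price and date rules above can be sketched as a few pure helper functions. This is a minimal sketch, not a definitive implementation. The helper names (`parsePrice`, `parseUnitPrice`, `parseTimestamp`) are hypothetical, and it assumes that prices use a comma as the decimal separator and a dot as the thousands separator, that the unit price cell looks like `250,00 ден/кг`, and that the caller supplies the source's UTC offset in minutes. All three assumptions should be checked against real scraped rows.

```typescript
// Assumption: "1.234,50 ден" style numbers (dot = thousands, comma = decimal).
function parsePrice(text: string): number | null {
  const cleaned = text.replace(/[^\d.,]/g, "");
  if (!/\d/.test(cleaned)) return null; // empty cell, "-", "n/a", ...
  const value = Number(cleaned.replace(/\./g, "").replace(",", "."));
  return Number.isFinite(value) ? value : null;
}

interface UnitPrice {
  price: number;
  measurement: string; // e.g. "ден/кг"
}

// Assumption: the cell is a number followed by a unit label.
function parseUnitPrice(text: string): UnitPrice | null {
  const match = text.trim().match(/^([\d.,]+)\s*([^\d.,\s].*)$/);
  if (!match) return null;
  const price = parsePrice(match[1]);
  return price === null ? null : { price, measurement: match[2].trim() };
}

// Parses "DD/MM/YYYY" plus optional "HH:mm" and returns the instant in UTC.
// utcOffsetMinutes is the source's local offset (hypothetical parameter).
function parseTimestamp(date: string, time = "00:00", utcOffsetMinutes = 0): Date | null {
  const d = date.trim().match(/^(\d{2})\/(\d{2})\/(\d{4})$/);
  const t = time.trim().match(/^(\d{2}):(\d{2})$/);
  if (!d || !t) return null;
  const day = Number(d[1]);
  const month = Number(d[2]);
  const year = Number(d[3]);
  const hour = Number(t[1]);
  const minute = Number(t[2]);
  const local = new Date(Date.UTC(year, month - 1, day, hour, minute));
  // Date.UTC silently rolls over values like 31/02 or 25:00, so verify the round trip.
  const valid =
    local.getUTCDate() === day &&
    local.getUTCMonth() === month - 1 &&
    local.getUTCHours() === hour &&
    local.getUTCMinutes() === minute;
  return valid ? new Date(local.getTime() - utcOffsetMinutes * 60000) : null;
}
```

Returning `null` instead of throwing lets the scraper log and skip a malformed cell without aborting the whole run. A fixed offset ignores daylight saving time, so a time zone library may be a better fit if exact update times matter.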
### Scraper Implementation
The scraper will process the HTML table structure:
```typescript
interface RawProductData {
  productName: string;     // "Назив на стока" (product name)
  regularPrice: string;    // "Продажна цена (со ДДВ)" (sale price, VAT included)
  unitPrice: string;       // "Единечна цена" (unit price)
  availability: string;    // "Достапност во продажен објект" (availability in store)
  description: string;     // "Опис на стока" (product description)
  discountPrice: string;   // "Цена со попуст" (discounted price)
  discountPercent: string; // "Попуст (%)" (discount, %)
  promotionType: string;   // "Вид на продажно потикнување" (type of sales promotion)
  promotionPeriod: string; // "Времетраење на промоција или попуст" (duration of promotion or discount)
}
interface ProcessedProduct {
  name: string;
  description: string;
  category: string; // Extracted from description
  availability: boolean;
  prices: {
    regular: number;
    discounted: number | null;
    unit: {
      price: number;
      measurement: string; // "ден/кг" (denars per kg), etc.
    };
  };
  promotion: {
    type: string;
    discountPercentage: number;
    startDate: Date;
    endDate: Date;
  } | null;
}
```
### HTML Parsing Strategy
1. **Table Structure**
   ```typescript
   const parseTable = async (html: string): Promise<RawProductData[]> => {
     // Use cheerio or similar for HTML parsing
     // Target structure: table > tr > td
     // Skip header row (first row)
     // Handle Cyrillic encoding
   }
   ```
2. **Data Extraction**
   ```typescript
   const extractProduct = (row: CheerioElement): RawProductData => {
     // Extract td contents
     // Clean and normalize text
     // Handle special characters
   }
   ```
3. **Data Transformation**
   ```typescript
   const transformProduct = (raw: RawProductData): ProcessedProduct => {
     // Convert prices to numbers
     // Parse dates
     // Extract category
     // Convert availability to boolean
   }
   ```
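
The "clean and normalize text" step in the extraction stage could look like the sketch below. `normalizeCell` is a hypothetical helper name, and the set of characters it strips is an assumption about what typically shows up in scraped table cells, not something confirmed from the source page.

```typescript
// Normalizes the text content of one table cell.
function normalizeCell(text: string): string {
  return text
    .normalize("NFC")               // compose letters such as "ѓ" into a single code point
    .replace(/[\u200b\ufeff]/g, "") // drop zero-width spaces and stray BOMs
    .replace(/\s+/g, " ")           // collapse whitespace, including non-breaking spaces
    .trim();
}
```

NFC normalization matters for Cyrillic search and deduplication: a letter like "ѓ" can arrive either as one code point or as "г" plus a combining accent, and the two forms do not compare equal as strings until normalized.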
### NestJS Modules
1. **Scraper Module**
   - Service for each data source
   - HTML parsing utilities
   - Scheduling for regular updates
   - Error handling and retry logic
2. **Product Module**
   - Product entity and repository
   - CRUD operations
   - Search and filtering
3. **Price Module**
   - Price entity and repository
   - Price history tracking
   - Discount calculations
4. **Source Module**
   - Source entity and repository
   - Source metadata management
5. **API Module**
   - RESTful endpoints
   - GraphQL API (optional)
   - Authentication and rate limiting
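
As an illustration of the Price Module's discount calculations, the sketch below derives a discount percentage from the two prices and cross-checks it against the percentage reported by the source. Both function names and the 0.5 percentage point tolerance are assumptions chosen for the example.

```typescript
// Returns the discount as a percentage rounded to one decimal place,
// or null when there is no usable discount.
function discountPercentage(regular: number, discounted: number | null): number | null {
  if (discounted === null || regular <= 0 || discounted > regular) return null;
  return Math.round(((regular - discounted) / regular) * 1000) / 10;
}

// Flags rows where the scraped "discount %" disagrees with the scraped prices.
function isDiscountConsistent(
  regular: number,
  discounted: number,
  reportedPercent: number,
  tolerance = 0.5, // assumed tolerance, in percentage points
): boolean {
  const computed = discountPercentage(regular, discounted);
  return computed !== null && Math.abs(computed - reportedPercent) <= tolerance;
}
```

Computing the percentage independently makes it possible to detect rows where a price or percentage cell was parsed incorrectly before the data reaches the database.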
## Frontend Structure (PWA)
1. **Core Components**
   - Product listing
   - Product details
   - Price comparison
   - Search and filters
   - Favorites/Watchlist
2. **PWA Features**
   - Offline support
   - Push notifications for price drops
   - App installation
   - Responsive design
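
Price drop notifications need a rule for deciding when a change is worth a push message. The sketch below is one possible rule, comparing two price snapshots and reporting drops above a threshold. The types, the function name `findPriceDrops`, and the 5% default threshold are all hypothetical.

```typescript
interface PriceSnapshot {
  productId: number;
  price: number;
}

interface PriceDrop {
  productId: number;
  oldPrice: number;
  newPrice: number;
  dropPercent: number;
}

// Compares the previous and current snapshot and returns the drops
// that are at least minDropPercent (assumed default: 5%).
function findPriceDrops(
  previous: PriceSnapshot[],
  current: PriceSnapshot[],
  minDropPercent = 5,
): PriceDrop[] {
  const oldPrices = new Map<number, number>();
  for (const snapshot of previous) oldPrices.set(snapshot.productId, snapshot.price);

  const drops: PriceDrop[] = [];
  for (const { productId, price } of current) {
    const oldPrice = oldPrices.get(productId);
    // Skip new products, invalid old prices, and prices that did not fall.
    if (oldPrice === undefined || oldPrice <= 0 || price >= oldPrice) continue;
    const dropPercent = ((oldPrice - price) / oldPrice) * 100;
    if (dropPercent >= minDropPercent) {
      drops.push({ productId, oldPrice, newPrice: price, dropPercent });
    }
  }
  return drops;
}
```

A threshold keeps small fluctuations from generating notifications for every item on a user's watchlist.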
## Implementation Plan
### Phase 1: Backend Setup
1. Initialize NestJS project
2. Set up PostgreSQL connection
3. Define database entities
4. Create basic API endpoints
### Phase 2: Scraper Implementation
1. Create scraper services for each source
2. Implement HTML parsing based on the provided structure
3. Set up scheduled scraping jobs
4. Implement data normalization and storage
### Phase 3: Frontend Development
1. Set up PWA framework
2. Implement core UI components
3. Connect to backend API
4. Implement offline functionality
### Phase 4: Testing & Deployment
1. Unit and integration testing
2. Performance optimization
3. Deployment setup
4. Monitoring and analytics
## Scraper Implementation Details
Based on the HTML structure provided, each parsed table row is normalized into the following shape:
```typescript
interface ProductData {
  name: string;
  regularPrice: number;
  unitPrice: string;
  availability: boolean;
  description: string;
  discountedPrice: number | null;
  discountPercentage: number | null;
  promotionType: string | null;
  promotionPeriod: {
    start: Date | null;
    end: Date | null;
  };
  lastUpdated: Date;
  source: string;
}
```
The scraper will:
1. Fetch the HTML content
2. Parse the table structure
3. Extract data from each row
4. Transform dates and numeric values
5. Store normalized data in the database
## Data Extraction Process
The HTML structure contains product information in a table format. Each row represents a product with the following columns:
- Product name
- Regular price (with VAT)
- Unit price
- Availability
- Product description
- Regular price (repeated)
- Discounted price
- Discount percentage
- Type of promotion
- Promotion duration
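
Given the column order listed above, mapping one row's cells to named fields could be sketched as follows. The field names mirror `RawProductData`, while `COLUMNS`, `RawRow`, and `mapCells` are hypothetical names introduced for this example. The sketch assumes every data row has exactly ten cells.

```typescript
// Column order as listed above; the sixth column repeats the regular price and is skipped.
const COLUMNS = [
  "productName",
  "regularPrice",
  "unitPrice",
  "availability",
  "description",
  null, // regular price (repeated)
  "discountPrice",
  "discountPercent",
  "promotionType",
  "promotionPeriod",
] as const;

type RawField = Exclude<(typeof COLUMNS)[number], null>;
type RawRow = Record<RawField, string>;

// Maps the text of each <td> in a row to its named field.
function mapCells(cells: string[]): RawRow {
  if (cells.length !== COLUMNS.length) {
    throw new Error(`Expected ${COLUMNS.length} cells, got ${cells.length}`);
  }
  const row = {} as RawRow;
  COLUMNS.forEach((field, index) => {
    if (field !== null) row[field] = cells[index].trim();
  });
  return row;
}
```

Failing loudly on an unexpected cell count is deliberate: if the source adds or removes a column, a silent positional shift would store prices under the wrong fields.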
The scraper will need to handle:
- Text encoding (the content is Cyrillic and must be decoded as UTF-8)
- Date parsing (format: DD/MM/YYYY)
- Price conversion to numeric values
- Availability conversion to boolean
- Extracting promotion date ranges
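
Two of the conversions above, availability and promotion date ranges, can be sketched as below. The source does not show the actual cell values, so this example assumes that availability appears as "Да"/"Не" (yes/no) and that the promotion period contains exactly two `DD/MM/YYYY` dates. Both assumptions, and the function names, are hypothetical and need to be checked against real rows.

```typescript
// Assumption: availability is expressed as "Да" (yes) or "Не" (no).
function parseAvailability(text: string): boolean | null {
  const value = text.trim().toLowerCase();
  if (value === "да") return true;
  if (value === "не") return false;
  return null; // unknown label: surface it rather than guess
}

interface PromotionPeriod {
  start: Date | null;
  end: Date | null;
}

// Parses a single "DD/MM/YYYY" date as midnight UTC.
function parseDmy(text: string): Date | null {
  const m = text.match(/^(\d{1,2})\/(\d{1,2})\/(\d{4})$/);
  if (!m) return null;
  const day = Number(m[1]);
  const month = Number(m[2]);
  const date = new Date(Date.UTC(Number(m[3]), month - 1, day));
  // Reject impossible dates that Date.UTC would roll over (e.g. 31/02).
  return date.getUTCDate() === day && date.getUTCMonth() === month - 1 ? date : null;
}

// Assumption: the cell contains two dates, start first, e.g. "01/05/2025 - 15/05/2025".
function parsePromotionPeriod(text: string): PromotionPeriod {
  const dates = text.match(/\d{1,2}\/\d{1,2}\/\d{4}/g) ?? [];
  if (dates.length !== 2) return { start: null, end: null };
  return { start: parseDmy(dates[0]), end: parseDmy(dates[1]) };
}
```

Searching for date patterns instead of splitting on a fixed separator keeps the parser tolerant of different dashes or surrounding words in the cell.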
## API Endpoints
The backend will provide the following key API endpoints:
1. **Products**
   - `GET /products` - List all products with pagination
   - `GET /products/:id` - Get product details
   - `GET /products/search` - Search products by name/category
2. **Prices**
   - `GET /prices/product/:id` - Get all prices for a product
   - `GET /prices/compare/:ids` - Compare prices for multiple products
   - `GET /prices/history/:id` - Get price history for a product
3. **Sources**
   - `GET /sources` - List all data sources
   - `GET /sources/:id/products` - Get products from a specific source
4. **User Features**
   - `POST /watchlist` - Add product to watchlist
   - `GET /watchlist` - Get user's watchlist
   - `POST /notifications` - Configure price drop notifications
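
Two of these endpoints take input that needs validation before it reaches the database: the pagination of `GET /products` and the `:ids` parameter of `GET /prices/compare/:ids`. The sketch below shows one way to validate both in plain TypeScript. The query parameter names `page` and `limit`, the comma-separated format of `:ids`, and the limits used are assumptions, since the document does not specify them.

```typescript
interface Page {
  limit: number;
  offset: number;
}

// Assumption: pagination arrives as ?page=<n>&limit=<n>, with 1-based pages.
function parsePagination(
  query: { page?: string; limit?: string },
  defaultLimit = 20,
  maxLimit = 100,
): Page {
  const page = Math.max(1, Math.floor(Number(query.page ?? 1)) || 1);
  const requested = Math.floor(Number(query.limit ?? defaultLimit)) || defaultLimit;
  const limit = Math.min(maxLimit, Math.max(1, requested));
  return { limit, offset: (page - 1) * limit };
}

// Assumption: ":ids" is a comma-separated list such as "3,7,12".
function parseIds(param: string, maxIds = 10): number[] {
  const ids = param.split(",").map((part) => Number(part.trim()));
  if (ids.some((id) => !Number.isInteger(id) || id <= 0)) {
    throw new Error("ids must be positive integers");
  }
  const unique = Array.from(new Set(ids)); // drop duplicates, keep order
  if (unique.length > maxIds) {
    throw new Error(`at most ${maxIds} ids can be compared`);
  }
  return unique;
}
```

In a NestJS controller the same checks would typically live in pipes or DTO validation rather than in hand-written helpers. One routing detail is also worth keeping in mind: static routes such as `/products/search` should be declared before parameterized ones such as `/products/:id`, otherwise `search` may be captured as an `id`.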