# Price Comparison PWA Solution Structure

This document outlines the solution structure for a price comparison PWA with a NestJS backend and a PostgreSQL database.

## System Architecture

```mermaid
graph TD
    A[Web Scrapers] -->|Extract Data| B[NestJS Backend]
    B -->|Store Data| C[PostgreSQL Database]
    B -->|Serve API| D[PWA Frontend]
    E[Users] -->|Use| D
```
## Backend Structure

### Database Schema

```mermaid
erDiagram
    PRODUCT {
        int id PK
        string name
        string description
        string category
        boolean availability
    }
    PRICE {
        int id PK
        int product_id FK
        float regular_price
        float discounted_price
        float discount_percentage
        string unit_price
        string promotion_type
        date promotion_start
        date promotion_end
        date last_updated
        int source_id FK
    }
    SOURCE {
        int id PK
        string name
        string url
        string logo
        datetime last_scraped
    }
    PRODUCT ||--o{ PRICE : has
    SOURCE ||--o{ PRICE : provides
```
### Additional Database Fields

We'll add these fields to handle the specific data format (an ORM entity sketch follows this list):

- **Product**: Add `sourceProductId` to track the original product IDs from each source
- **Price**: Add a `vatIncluded` boolean flag, since the scraped prices include VAT
- **Source**: Add `lastUpdateTime` to track the "Последно ажурирање" ("Last updated") timestamp shown on the source page
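
To make the schema concrete, here is a minimal sketch of the `Product` and `Price` entities, including the additional fields above. It assumes TypeORM as the ORM (the document does not mandate one); the class, column, and relation names are illustrative, and the `last_updated` field and the `Source` relation are omitted for brevity.

```typescript
import { Entity, PrimaryGeneratedColumn, Column, ManyToOne, OneToMany } from 'typeorm';

@Entity()
export class Product {
  @PrimaryGeneratedColumn()
  id: number;

  @Column()
  name: string;

  @Column({ nullable: true })
  description: string;

  @Column({ nullable: true })
  category: string;

  @Column({ default: true })
  availability: boolean;

  // Additional field: original product ID as exposed by the source site
  @Column({ nullable: true })
  sourceProductId: string;

  @OneToMany(() => Price, (price) => price.product)
  prices: Price[];
}

@Entity()
export class Price {
  @PrimaryGeneratedColumn()
  id: number;

  @Column('decimal', { precision: 10, scale: 2 })
  regularPrice: number;

  @Column('decimal', { precision: 10, scale: 2, nullable: true })
  discountedPrice: number;

  @Column('decimal', { precision: 5, scale: 2, nullable: true })
  discountPercentage: number;

  @Column({ nullable: true })
  unitPrice: string;

  @Column({ nullable: true })
  promotionType: string;

  @Column({ type: 'date', nullable: true })
  promotionStart: Date;

  @Column({ type: 'date', nullable: true })
  promotionEnd: Date;

  // Additional field: scraped prices are VAT-inclusive
  @Column({ default: true })
  vatIncluded: boolean;

  @ManyToOne(() => Product, (product) => product.prices)
  product: Product;
}
```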
### Data Transformation Rules

1. **Text Processing**
   - Handle Cyrillic text encoding (UTF-8)
   - Parse product names and descriptions
   - Extract the category from the description field

2. **Price Processing** (price and date parsing helpers are sketched after this list)
   - Convert prices from strings to floats
   - Handle the "ден/кг" (denars per kilogram) unit price format
   - Store both VAT-included and VAT-excluded prices

3. **Date Processing**
   - Parse dates from the "DD/MM/YYYY" format
   - Handle times in "HH:mm" format for the last update
   - Store timestamps in UTC
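
The price and date rules above can be captured in two small helpers. This is a minimal sketch under assumptions: the exact input strings (decimal commas, the "ден/кг" suffix, the DD/MM/YYYY HH:mm layout) are guesses about the scraped data, and the helper names are illustrative.

```typescript
// Parse a scraped price such as "123,45 ден" or "99.90 ден/кг" into a number.
// Assumes prices may use either a decimal comma or a decimal point.
const parsePrice = (value: string): number | null => {
  const cleaned = value.replace(/[^\d.,-]/g, '').replace(',', '.');
  const parsed = Number.parseFloat(cleaned);
  return Number.isNaN(parsed) ? null : parsed;
};

// Parse "DD/MM/YYYY" (optionally followed by "HH:mm") into a UTC Date.
const parseMacedonianDate = (value: string): Date | null => {
  const match = value.trim().match(/^(\d{2})\/(\d{2})\/(\d{4})(?:\s+(\d{2}):(\d{2}))?$/);
  if (!match) return null;
  const [, day, month, year, hours = '0', minutes = '0'] = match;
  return new Date(Date.UTC(+year, +month - 1, +day, +hours, +minutes));
};
```

For example, `parsePrice("449,00 ден")` returns `449`, and `parseMacedonianDate("01/05/2025")` returns a UTC date at midnight.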
### Scraper Implementation

The scraper will process the HTML table structure:

```typescript
interface RawProductData {
  productName: string;      // "Назив на стока" (product name)
  regularPrice: string;     // "Продажна цена (со ДДВ)" (retail price, VAT included)
  unitPrice: string;        // "Единечна цена" (unit price)
  availability: string;     // "Достапност во продажен објект" (availability in store)
  description: string;      // "Опис на стока" (product description)
  discountPrice: string;    // "Цена со попуст" (discounted price)
  discountPercent: string;  // "Попуст (%)" (discount percentage)
  promotionType: string;    // "Вид на продажно потикнување" (type of sales promotion)
  promotionPeriod: string;  // "Времетраење на промоција или попуст" (duration of promotion or discount)
}

interface ProcessedProduct {
  name: string;
  description: string;
  category: string; // Extracted from the description
  availability: boolean;
  prices: {
    regular: number;
    discounted: number | null;
    unit: {
      price: number;
      measurement: string; // "ден/кг", etc.
    };
  };
  promotion: {
    type: string;
    discountPercentage: number;
    startDate: Date;
    endDate: Date;
  } | null;
}
```
### HTML Parsing Strategy

1. **Table Structure** (a combined cheerio-based sketch follows this list)

   ```typescript
   const parseTable = async (html: string): Promise<RawProductData[]> => {
     // Use cheerio or similar for HTML parsing
     // Target structure: table > tr > td
     // Skip the header row (first row)
     // Handle Cyrillic encoding
   };
   ```

2. **Data Extraction**

   ```typescript
   const extractProduct = (row: CheerioElement): RawProductData => {
     // Extract td contents
     // Clean and normalize the text
     // Handle special characters
   };
   ```

3. **Data Transformation**

   ```typescript
   const transformProduct = (raw: RawProductData): ProcessedProduct => {
     // Convert prices to numbers
     // Parse dates
     // Extract the category
     // Convert availability to a boolean
   };
   ```
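
Taken together, steps 1 and 2 could look roughly like the sketch below. It assumes the `cheerio` package, and the column indices follow the order described under "Data Extraction Process" later in this document (with the repeated regular-price column at index 5 skipped). Parsing itself is synchronous, so the `async` wrapper from step 1 is dropped here.

```typescript
import * as cheerio from 'cheerio';

const parseTable = (html: string): RawProductData[] => {
  const $ = cheerio.load(html); // cheerio works on UTF-8 strings, so Cyrillic text is preserved

  return $('table tr')
    .slice(1) // skip the header row
    .toArray()
    .map((row) => {
      // Read every <td> in the row as trimmed text
      const cells = $(row)
        .find('td')
        .toArray()
        .map((td) => $(td).text().trim());

      // Assumed column order; index 5 repeats the regular price and is skipped
      return {
        productName: cells[0],
        regularPrice: cells[1],
        unitPrice: cells[2],
        availability: cells[3],
        description: cells[4],
        discountPrice: cells[6],
        discountPercent: cells[7],
        promotionType: cells[8],
        promotionPeriod: cells[9],
      };
    });
};
```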
### NestJS Modules

1. **Scraper Module**
   - Service for each data source
   - HTML parsing utilities
   - Scheduling for regular updates (see the scheduling sketch after this list)
   - Error handling and retry logic

2. **Product Module**
   - Product entity and repository
   - CRUD operations
   - Search and filtering

3. **Price Module**
   - Price entity and repository
   - Price history tracking
   - Discount calculations

4. **Source Module**
   - Source entity and repository
   - Source metadata management

5. **API Module**
   - RESTful endpoints
   - GraphQL API (optional)
   - Authentication and rate limiting
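
As a sketch of how the Scraper Module could schedule regular updates, the service below uses `@nestjs/schedule`. The service name, the placeholder URL, and the reuse of the `parseTable` helper from the earlier sketch are all assumptions.

```typescript
import { Injectable, Logger } from '@nestjs/common';
import { Cron, CronExpression } from '@nestjs/schedule';

@Injectable()
export class ScraperService {
  private readonly logger = new Logger(ScraperService.name);

  // Runs once a day; requires ScheduleModule.forRoot() in the app module
  @Cron(CronExpression.EVERY_DAY_AT_MIDNIGHT)
  async scrapeAllSources(): Promise<void> {
    try {
      // Placeholder URL - each real source gets its own service/configuration
      const response = await fetch('https://example-source.example/prices'); // Node 18+ global fetch
      const html = await response.text();

      const rows = parseTable(html); // sketch from "HTML Parsing Strategy"
      this.logger.log(`Scraped ${rows.length} products`);
      // TODO: transform rows and persist them via the Product/Price repositories
    } catch (error) {
      // Retry logic (e.g. exponential backoff) would hook in here
      this.logger.error('Scraping failed', (error as Error).stack);
    }
  }
}
```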
## Frontend Structure (PWA)

1. **Core Components**
   - Product listing
   - Product details
   - Price comparison
   - Search and filters
   - Favorites/watchlist

2. **PWA Features** (service worker registration is sketched after this list)
   - Offline support
   - Push notifications for price drops
   - App installation
   - Responsive design
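
Offline support and installability both hinge on a registered service worker. Below is a minimal, framework-agnostic registration sketch; the `/service-worker.js` path is a placeholder that depends on the chosen PWA tooling.

```typescript
// Register the service worker once the page has loaded.
if ('serviceWorker' in navigator) {
  window.addEventListener('load', async () => {
    try {
      const registration = await navigator.serviceWorker.register('/service-worker.js');
      console.log('Service worker registered with scope:', registration.scope);
    } catch (error) {
      console.error('Service worker registration failed:', error);
    }
  });
}
```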
## Implementation Plan

### Phase 1: Backend Setup

1. Initialize NestJS project
2. Set up PostgreSQL connection
3. Define database entities
4. Create basic API endpoints

### Phase 2: Scraper Implementation

1. Create scraper services for each source
2. Implement HTML parsing based on the provided structure
3. Set up scheduled scraping jobs
4. Implement data normalization and storage

### Phase 3: Frontend Development

1. Set up PWA framework
2. Implement core UI components
3. Connect to backend API
4. Implement offline functionality

### Phase 4: Testing & Deployment

1. Unit and integration testing
2. Performance optimization
3. Deployment setup
4. Monitoring and analytics
## Scraper Implementation Details

Based on the HTML structure provided, here's how we'll parse the data:

```typescript
interface ProductData {
  name: string;
  regularPrice: number;
  unitPrice: string;
  availability: boolean;
  description: string;
  discountedPrice: number | null;
  discountPercentage: number | null;
  promotionType: string | null;
  promotionPeriod: {
    start: Date | null;
    end: Date | null;
  };
  lastUpdated: Date;
  source: string;
}
```

The scraper will:

1. Fetch the HTML content
2. Parse the table structure
3. Extract data from each row
4. Transform dates and numeric values (see the sketch after this list)
5. Store normalized data in the database
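
A sketch of step 4, turning a raw row into a `ProductData` record, is shown below. It reuses the `parsePrice` and `parseMacedonianDate` helpers sketched earlier and assumes that availability is reported as "Да"/"Не" (yes/no) and that promotion periods are written as two dates separated by a dash; both assumptions would need to be confirmed against the real data.

```typescript
const transformRow = (raw: RawProductData, source: string): ProductData => {
  // Promotion period assumed to look like "01/05/2025 - 15/05/2025"
  const [start, end] = raw.promotionPeriod
    ? raw.promotionPeriod.split('-').map((part) => parseMacedonianDate(part.trim()))
    : [null, null];

  return {
    name: raw.productName.trim(),
    regularPrice: parsePrice(raw.regularPrice) ?? 0,
    unitPrice: raw.unitPrice.trim(),
    // Assumed "Да" (yes) / "Не" (no) availability values
    availability: raw.availability.trim().toLowerCase().startsWith('да'),
    description: raw.description.trim(),
    discountedPrice: parsePrice(raw.discountPrice),
    discountPercentage: parsePrice(raw.discountPercent),
    promotionType: raw.promotionType.trim() || null,
    promotionPeriod: { start: start ?? null, end: end ?? null },
    lastUpdated: new Date(),
    source,
  };
};
```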
## Data Extraction Process

The HTML structure contains product information in a table format. Each row represents a product with the following columns:

- Product name
- Regular price (with VAT)
- Unit price
- Availability
- Product description
- Regular price (repeated)
- Discounted price
- Discount percentage
- Type of promotion
- Promotion duration

The scraper will need to handle:

- Text encoding (the source text is Cyrillic)
- Date parsing (format: DD/MM/YYYY)
- Price conversion to numeric values
- Availability conversion to boolean
- Extracting promotion date ranges
## API Endpoints

The backend will provide the following key API endpoints (a minimal controller sketch follows the list):

1. **Products**
   - `GET /products` - List all products with pagination
   - `GET /products/:id` - Get product details
   - `GET /products/search` - Search products by name/category

2. **Prices**
   - `GET /prices/product/:id` - Get all prices for a product
   - `GET /prices/compare/:ids` - Compare prices for multiple products
   - `GET /prices/history/:id` - Get price history for a product

3. **Sources**
   - `GET /sources` - List all data sources
   - `GET /sources/:id/products` - Get products from a specific source

4. **User Features**
   - `POST /watchlist` - Add a product to the watchlist
   - `GET /watchlist` - Get the user's watchlist
   - `POST /notifications` - Configure price-drop notifications
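
As an illustration of the Products endpoints in NestJS, here is a minimal controller sketch. The `ProductsService` and its method names are assumptions; note that the `search` route is declared before `:id` so that `/products/search` is not captured as an ID.

```typescript
import { Controller, Get, Param, Query } from '@nestjs/common';
import { ProductsService } from './products.service'; // assumed service

@Controller('products')
export class ProductsController {
  constructor(private readonly productsService: ProductsService) {}

  // GET /products?page=1&limit=20 - paginated listing
  @Get()
  findAll(@Query('page') page = '1', @Query('limit') limit = '20') {
    return this.productsService.findAll(Number(page), Number(limit));
  }

  // GET /products/search?q=... - search by name or category
  @Get('search')
  search(@Query('q') query: string) {
    return this.productsService.search(query);
  }

  // GET /products/:id - product details
  @Get(':id')
  findOne(@Param('id') id: string) {
    return this.productsService.findOne(Number(id));
  }
}
```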