Scraping Dashboard - Docket Rocket Admin

🏗️ Scraping Architecture

Scheduler

Weekly cron jobs trigger scraping on Wednesdays at 2:00 AM AEST

node-cron, Node.js

→

Coles Scraper

Puppeteer-based scraper for product data extraction

Puppeteer, Headless Chrome

→

Woolworths Scraper

Puppeteer-based scraper with stealth plugin

Puppeteer, Stealth

→

Data Processing

Product matching and price validation

Fuzzy Match, Validation

→

Database Storage

PostgreSQL tables for products and price history

PostgreSQL, Indexing
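
The Data Processing stage is described only as product matching and price validation. As a rough sketch of what that could involve, the snippet below compares product names with a Sørensen-Dice score over character bigrams and rejects implausible prices. The function names, the 0.8 threshold, and the price bounds are illustrative assumptions, not details of the actual service.

```javascript
// Hypothetical helpers: names, threshold and price bounds are assumptions.

// Lowercase, replace punctuation with spaces and collapse whitespace so that
// "Coca-Cola 1.25L" and "coca cola 1.25l" normalize to the same string.
function normalizeName(name) {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9. ]+/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Count the character bigrams of a string (bigram -> occurrences).
function bigrams(text) {
  const counts = new Map();
  for (let i = 0; i < text.length - 1; i++) {
    const pair = text.slice(i, i + 2);
    counts.set(pair, (counts.get(pair) || 0) + 1);
  }
  return counts;
}

// Sørensen-Dice similarity in [0, 1]; 1 means identical after normalization.
function nameSimilarity(a, b) {
  const left = normalizeName(a);
  const right = normalizeName(b);
  if (left === right) return 1;
  if (left.length < 2 || right.length < 2) return 0;
  const leftPairs = bigrams(left);
  const rightPairs = bigrams(right);
  let overlap = 0;
  for (const [pair, count] of leftPairs) {
    overlap += Math.min(count, rightPairs.get(pair) || 0);
  }
  return (2 * overlap) / (left.length - 1 + (right.length - 1));
}

// Treat two names as the same product when they are similar enough.
function isSameProduct(a, b, threshold = 0.8) {
  return nameSimilarity(a, b) >= threshold;
}

// Accept a price only if it is a finite number inside the assumed bounds;
// a sale price, when present, must not exceed the regular price.
function isValidPrice(regularPrice, salePrice = null) {
  const inRange = (p) => Number.isFinite(p) && p > 0 && p < 10000;
  if (!inRange(regularPrice)) return false;
  if (salePrice === null) return true;
  return inRange(salePrice) && salePrice <= regularPrice;
}
```

A bigram score is only one option; an edit-distance measure would fit the same interface, and the threshold would need tuning against real catalogue data.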

🛒 Coles

Implemented

📅 Update Schedule

Wednesday 2:00 AM

Full price update - All retailers scraped for complete product catalog

Daily 6:00 AM

Specials check - Monitor for new deals and promotions

Hourly (Dev)

Test scrapes - Single category validation during development

Manual Trigger

Admin-initiated scrapes - Test specific retailers or emergency updates
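
For reference, the recurring entries above can be written as standard five-field cron expressions. The sketch below lists them and includes a deliberately minimal matcher to show which jobs fall due at a given time. The job names and the matcher are hypothetical, and the real scheduler configuration may differ.

```javascript
// Cron expressions (minute hour day-of-month month day-of-week) for the
// schedule above. The job names are hypothetical labels.
const SCHEDULES = {
  weeklyFullUpdate: "0 2 * * 3",   // Wednesday 2:00 AM
  dailySpecialsCheck: "0 6 * * *", // every day 6:00 AM
  hourlyDevTest: "0 * * * *",      // top of every hour (development only)
};

// Minimal matcher that understands only "*" and plain numbers, which is
// enough for the expressions above. Ranges, lists and steps are not supported.
function cronMatches(expression, time) {
  const fields = expression.trim().split(/\s+/);
  if (fields.length !== 5) {
    throw new Error(`Expected 5 cron fields, got ${fields.length}`);
  }
  const values = [
    time.minute,
    time.hour,
    time.dayOfMonth,
    time.month,
    time.dayOfWeek, // 0 = Sunday ... 6 = Saturday
  ];
  return fields.every((field, i) => field === "*" || Number(field) === values[i]);
}

// Names of the jobs that fall due at the given local time.
function jobsDueAt(time) {
  return Object.keys(SCHEDULES).filter((job) => cronMatches(SCHEDULES[job], time));
}
```

With node-cron, each expression would typically be passed to `cron.schedule(expression, handler, { timezone })`. The zone matters: `Australia/Sydney` moves to daylight time in summer, while `Australia/Brisbane` stays on AEST all year. Which zone the service uses is not stated here.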

🗄️ Database Schema

-- Products scraped from retailer websites
CREATE TABLE products_scraped (
    id SERIAL PRIMARY KEY,
    retailer_id INTEGER REFERENCES retailers(id),
    product_name VARCHAR(255) NOT NULL,
    brand VARCHAR(100),
    regular_price DECIMAL(10,2) NOT NULL,
    sale_price DECIMAL(10,2),
    on_special BOOLEAN DEFAULT FALSE,
    category VARCHAR(100),
    scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Price history tracking
CREATE TABLE price_history_scraped (
    id SERIAL PRIMARY KEY,
    product_scraped_id INTEGER REFERENCES products_scraped(id),
    price DECIMAL(10,2) NOT NULL,
    recorded_date TIMESTAMP NOT NULL
);

-- Scraping job monitoring
CREATE TABLE scraping_logs (
    id SERIAL PRIMARY KEY,
    retailer_id INTEGER REFERENCES retailers(id),
    job_type VARCHAR(50) NOT NULL,
    status VARCHAR(20) NOT NULL,
    products_found INTEGER DEFAULT 0,
    started_at TIMESTAMP,
    completed_at TIMESTAMP
);
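
As an illustration of how a scraped row might feed the history table, the sketch below builds a parameterized INSERT for price_history_scraped from a products_scraped row. Recording the sale price while a product is on special is an assumption, as are the helper names; the schema itself does not say which price is stored.

```javascript
// Hypothetical helpers: which price counts as "current" is an assumption.

// Use the sale price while the product is on special, else the regular price.
function effectivePrice(product) {
  const onSale = product.on_special && product.sale_price != null;
  return onSale ? product.sale_price : product.regular_price;
}

// Build a parameterized INSERT for price_history_scraped in the
// { text, values } shape accepted by node-postgres query calls.
function buildPriceHistoryInsert(product, recordedAt = new Date()) {
  return {
    text:
      "INSERT INTO price_history_scraped (product_scraped_id, price, recorded_date) " +
      "VALUES ($1, $2, $3)",
    values: [product.id, effectivePrice(product), recordedAt],
  };
}
```

Keeping the values in a separate array, rather than interpolating them into the SQL string, avoids injection problems with scraped product data.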

📋 Recent Activity

2024-01-18 10:30

🎯 Scraping framework initialized

2024-01-18 10:32

🗄️ Database schema created

2024-01-18 10:35

🛒 Coles scraper implemented

2024-01-18 10:40

📅 Weekly scheduler configured

2024-01-21

🛍️ Woolworths scraper implemented

2024-01-21

🎯 Coles expanded to 14 categories

🔍 Real-time Scraping Logs

// Real-time logs will appear here when scraping starts...
// Logs are streamed directly from the scraping service
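
How the logs reach this panel is not described beyond being streamed from the scraping service. One way to picture the producing side is the in-process sketch below, built on Node's EventEmitter. The class name, the "line" event, and the line format are all assumptions, and the real transport (for example WebSocket or server-sent events) is not specified here.

```javascript
const { EventEmitter } = require("node:events");

// Hypothetical in-process log bus: the scraper calls log(), and any
// subscriber (such as a dashboard connection) listens for "line" events.
class ScrapeLogStream extends EventEmitter {
  // Format one log line, emit it to all subscribers, and return it.
  log(retailer, message, at = new Date()) {
    const line = `[${at.toISOString()}] [${retailer}] ${message}`;
    this.emit("line", line);
    return line;
  }
}

// Usage: collect lines as they are emitted.
const stream = new ScrapeLogStream();
const received = [];
stream.on("line", (line) => received.push(line));
stream.log("coles", "Scrape started", new Date("2024-01-18T10:30:00Z"));
```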