πŸ•·οΈ Scraping Dashboard

← Back to Dashboard
⚠️ Super Admin Only - Web scraping infrastructure monitoring and control

πŸ“Š Dashboard Status

πŸ’€ Idle polling (120s) Last refresh: Never
-
Active Scrapers
Wed 2AM
Next Full Update
-
Products Scraped
2
Target Retailers

πŸƒ Active Scraping Processes

No active scraping processes

πŸ—οΈ Scraping Architecture

Scheduler

Weekly cron jobs trigger scraping on Wednesday 2AM AEST

node-cron Node.js
β†’

Coles Scraper

Puppeteer-based scraper for product data extraction

Puppeteer Headless Chrome
β†’

Woolworths Scraper

Puppeteer-based scraper with stealth plugin

Puppeteer Stealth
β†’

Data Processing

Product matching and price validation

Fuzzy Match Validation
β†’

Database Storage

PostgreSQL tables for products and price history

PostgreSQL Indexing
πŸ›’ Coles
Implemented
14
Categories
-
Products

URL: coles.com.au
Method: Puppeteer + Infinite Scroll
Update: Wednesday 2AM AEST

πŸ›οΈ Woolworths
Pending
15
Categories
-
Products

URL: woolworths.com.au
Method: Puppeteer + Stealth
Update: Wednesday 2AM AEST

πŸͺ IGA
Ready
11
Categories
-
Products

URL: igashop.com.au
Method: Enhanced Anti-Bot
Update: On-demand

πŸ“… Update Schedule

Wednesday 2:00 AM
Full price update - All retailers scraped for complete product catalog
Daily 6:00 AM
Specials check - Monitor for new deals and promotions
Hourly (Dev)
Test scrapes - Single category validation during development
Manual Trigger
Admin-initiated scrapes - Test specific retailers or emergency updates

πŸŽ›οΈ Control Panel

πŸ—„οΈ Database Schema

-- Products scraped from retailer websites CREATE TABLE products_scraped ( id SERIAL PRIMARY KEY, retailer_id INTEGER REFERENCES retailers(id), product_name VARCHAR(255) NOT NULL, brand VARCHAR(100), regular_price DECIMAL(10,2) NOT NULL, sale_price DECIMAL(10,2), on_special BOOLEAN DEFAULT FALSE, category VARCHAR(100), scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); -- Price history tracking CREATE TABLE price_history_scraped ( id SERIAL PRIMARY KEY, product_scraped_id INTEGER REFERENCES products_scraped(id), price DECIMAL(10,2) NOT NULL, recorded_date TIMESTAMP NOT NULL ); -- Scraping job monitoring CREATE TABLE scraping_logs ( id SERIAL PRIMARY KEY, retailer_id INTEGER REFERENCES retailers(id), job_type VARCHAR(50) NOT NULL, status VARCHAR(20) NOT NULL, products_found INTEGER DEFAULT 0, started_at TIMESTAMP, completed_at TIMESTAMP );

πŸ“‹ Recent Activity

2024-01-18 10:30
🎯 Scraping framework initialized
2024-01-18 10:32
πŸ—„οΈ Database schema created
2024-01-18 10:35
πŸ›’ Coles scraper implemented
2024-01-18 10:40
πŸ“… Weekly scheduler configured
2024-01-21
πŸ›οΈ Woolworths scraper implemented
2024-01-21
🎯 Coles expanded to 14 categories

πŸ” Real-time Scraping Logs

Waiting for scrape to start...
// Real-time logs will appear here when scraping starts...
// Logs are streamed directly from the scraping service