Data Architecture

Technical documentation for the NTZ Conveyor Belt system

Entity Hierarchy

Data is organized in a strict hierarchy, from the collection mission down to individual gaze samples:

Mission (ski trip, e.g., "Tahoe 2025")
└── Day (single date)
    └── Session (one skier's data collection)
        └── Run (single descent, ~10 min)
            └── Segment (contiguous skiing or viewing)
                └── Turn (one compression/extension cycle)
                    └── Frame (single timestamp: video + IMU + gaze)
                        └── GazePoint (x, y, NTZ classification)

Current Data Model

Mission
───────
  id: int (PK)
  name: string              "Tahoe Mogul Study 2025"
  start_date: date
  end_date: date
  location: string
  notes: text

    │ 1:many
    ▼

Day
───
  id: int (PK)
  mission_id: int (FK)
  date: date
  notes: text

    │ 1:many
    ▼

Session
───────
  id: int (PK)
  day_id: int (FK)
  skier_name: string
  start_time: datetime
  notes: text

    │ 1:many
    ▼

Run
───
  id: int (PK)
  session_id: int (FK)
  run_number: int
  video_file: string        path to video file
  notes: text

    │ 1:many
    ▼

Segment
───────
  id: int (PK)
  run_id: int (FK)
  segment_number: int
  segment_type: enum        'skiing' | 'stationary'
  start_frame_number: int
  end_frame_number: int

    │ 1:many
    ▼

Turn
────
  id: int (PK)
  segment_id: int (FK)
  turn_number: int              resets per segment
  cumulative_turn_number: int   within run (optional)

    │ 1:many
    ▼

Frame
─────
  id: int (PK)
  turn_id: int (FK)
  frame_number: int         within turn
  timestamp: float          seconds from run start
  video_frame_number: int   absolute in video
  phase_angle: float        0-360°
  imu_data: JSON            {accel: {x,y,z}, gyro: {x,y,z}}

    │ 1:many
    ▼

GazePoint
─────────
  id: int (PK)
  frame_id: int (FK)
  x: float                  video frame x coord (0-1920)
  y: float                  video frame y coord (0-1080)
  confidence: float         from Pupil Labs (0-1)
  ntz_index: int            computed: 0, 1, 2, 3...
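The two leaf tables carry almost all of the data volume. A minimal sketch of them in SQLite, with column names taken from the diagram above (the DDL itself is illustrative, not the production schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE frame (
    id INTEGER PRIMARY KEY,
    turn_id INTEGER NOT NULL,          -- FK -> turn.id
    frame_number INTEGER,              -- within turn
    timestamp REAL,                    -- seconds from run start
    video_frame_number INTEGER,        -- absolute in video
    phase_angle REAL,                  -- 0-360 degrees
    imu_data TEXT                      -- JSON: {accel: {x,y,z}, gyro: {x,y,z}}
);
CREATE TABLE gaze_point (
    id INTEGER PRIMARY KEY,
    frame_id INTEGER NOT NULL REFERENCES frame(id),
    x REAL,                            -- video frame x coord (0-1920)
    y REAL,                            -- video frame y coord (0-1080)
    confidence REAL,                   -- from Pupil Labs (0-1)
    ntz_index INTEGER                  -- computed NTZ band
);
""")
```

Storing `imu_data` as a JSON blob keeps the per-frame row narrow; if per-axis queries become common, splitting it into columns would be the usual next step.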

Data Volume Estimates

Entity       Per Run    Per Day (5 runs)    Per Mission (3 days)
Runs         1          5                   15
Segments     ~4         ~20                 ~60
Turns        ~50        ~250                ~750
Frames       ~18,000    ~90,000             ~270,000
GazePoints   ~18,000    ~90,000             ~270,000

Assuming 30fps video, ~10 min runs, ~20 turns per skiing segment.
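The frame counts above follow directly from those recording parameters; a quick sanity check:

```python
FPS = 30                  # video frame rate
RUN_MINUTES = 10          # approximate run length
RUNS_PER_DAY = 5
DAYS_PER_MISSION = 3

frames_per_run = FPS * RUN_MINUTES * 60                  # 18,000
frames_per_day = frames_per_run * RUNS_PER_DAY           # 90,000
frames_per_mission = frames_per_day * DAYS_PER_MISSION   # 270,000

print(frames_per_run, frames_per_day, frames_per_mission)  # 18000 90000 270000
```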

Data Model Options

Option A: Current Model (Implemented)

Turn-centric hierarchy

Frames belong to Turns. Turn boundaries are determined by the phase angle crossing 0°/360°.

+ Natural for "aggregate by turn" analysis

+ Turn number available at query time

− Requires pre-processing to assign frames to turns

− Can't easily redefine turn boundaries later
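The pre-processing step this option requires can be sketched as a wrap-around scan over the phase angle series. A minimal version, assuming frames arrive in time order (the 180° jump threshold is an assumption, not the importer's actual parameter):

```python
def turn_boundaries(phase_angles):
    """Return the indices where the phase angle wraps past 360° back
    toward 0°, i.e. where a new compression/extension cycle begins.
    Assumes the list is in time order."""
    boundaries = [0]
    for i in range(1, len(phase_angles)):
        # A large negative jump (e.g. 355 -> 5) marks the 0/360 crossing.
        if phase_angles[i] < phase_angles[i - 1] - 180:
            boundaries.append(i)
    return boundaries

# Two turns: phase climbs toward 360, wraps, climbs again.
print(turn_boundaries([10, 120, 250, 355, 5, 130, 300]))  # [0, 4]
```

Because these boundaries are baked in at import time, changing the detection rule later means re-running the import (the drawback noted above).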

Option B: Flat Frame Model (Alternative)

Frames belong directly to Segments

Segment
└── Frame (with phase_angle, no turn FK)
    └── GazePoint

Turn numbers are computed at query time from the phase angle.

+ Simpler import (no turn boundary detection)

+ Can redefine turn boundaries without re-import

− Turn aggregation requires window functions

− Heavier queries for turn-level stats
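The query-time computation mirrors what a SQL window function would do: scan frames in order and increment a counter at each phase wrap. A sketch in Python (the wrap-detection threshold is the same assumption as in Option A):

```python
def assign_turn_numbers(phase_angles):
    """Derive a per-frame turn number from phase angle alone, as the
    flat model would at query time. The counter increments whenever
    the phase wraps past 0°/360°. Assumes time-ordered input."""
    turn = 0
    turns = []
    prev = None
    for phase in phase_angles:
        if prev is not None and phase < prev - 180:  # wrap detected
            turn += 1
        turns.append(turn)
        prev = phase
    return turns

print(assign_turn_numbers([10, 200, 350, 20, 180, 340, 15]))
# [0, 0, 0, 1, 1, 1, 2]
```

This is cheap for a single segment but, as noted above, aggregating turn-level stats across many segments pushes this logic into window functions on every query.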

Option C: Time-series Store (Future)

Store frames as time-series data

Use TimescaleDB or similar for efficient time-range queries.

+ Optimized for "give me frames 1000-2000"

+ Better compression for large datasets

− Additional infrastructure

− Overkill for prototype scale

Option D: Denormalized for Viz (Alternative)

Pre-compute flat table for visualization

VizData (materialized view or table)
──────────────────────────────────────────
segment_id | turn_number | phase_angle | timestamp | gaze_x | gaze_y | ntz_index

Keep normalized model for storage, denormalized for fast reads.

+ Single query for visualization

+ Best of both worlds

− Data duplication

− Refresh on updates
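The refresh step amounts to walking the normalized hierarchy and emitting one flat row per gaze point. A sketch of that flattening, with column names taken from the VizData layout above (the nested-dict input shape is an assumption for illustration):

```python
def build_viz_rows(segments):
    """Flatten nested segment -> turn -> frame -> gaze records into
    VizData-style rows. Column names follow the VizData layout; the
    input shape is a stand-in for whatever the ORM returns."""
    rows = []
    for seg in segments:
        for turn in seg["turns"]:
            for frame in turn["frames"]:
                for gaze in frame["gaze_points"]:
                    rows.append({
                        "segment_id": seg["id"],
                        "turn_number": turn["turn_number"],
                        "phase_angle": frame["phase_angle"],
                        "timestamp": frame["timestamp"],
                        "gaze_x": gaze["x"],
                        "gaze_y": gaze["y"],
                        "ntz_index": gaze["ntz_index"],
                    })
    return rows
```

Rebuilding the table per segment on write keeps the duplication bounded: only segments whose frames changed need a refresh.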

NTZ Classification Options

Current: Fixed Pixel Bands

NTZ is determined by the gaze Y coordinate in the video frame:

y > 700 → NTZ-0 (bottom of frame = close)
y > 500 → NTZ-1
y > 300 → NTZ-2
y ≤ 300 → NTZ-3+ (top of frame = far ahead)

+ Simple, fast

− Doesn't account for head tilt or terrain angle

− Arbitrary thresholds
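The band lookup above is a few comparisons. A sketch, with the thresholds parameterized since the document itself flags them as arbitrary (note y = 0 is the top of the 1080p frame, so larger y means closer terrain):

```python
def ntz_from_y(y, thresholds=(700, 500, 300)):
    """Classify a gaze point's NTZ index from its video-frame y
    coordinate. Defaults match the fixed pixel bands above; the
    parameterization is illustrative, to make re-tuning easy."""
    if y > thresholds[0]:
        return 0  # bottom of frame = close
    if y > thresholds[1]:
        return 1
    if y > thresholds[2]:
        return 2
    return 3      # top of frame = far ahead ("NTZ-3+")
```

For example, `ntz_from_y(850)` falls in the bottom band and returns 0, while `ntz_from_y(100)` is near the top of the frame and returns 3.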

Alternative: Perspective-Corrected

Map video Y to estimated real-world distance, then classify.

Requires camera calibration and terrain model.

+ More accurate

− Needs additional data (camera params, terrain)

Alternative: PTL-Based

Detect actual mogul locations (PTLs) in video, classify gaze by nearest PTL.

Requires computer vision / drone mapping.

+ Ground truth accuracy

− Significant additional work

API Endpoints

Endpoint                                    Purpose
/api/missions/                              List/detail missions
/api/segments/                              List segments (filter by type)
/api/conveyor/segment_data/?segment_id=X    Flat data for visualization
/api/conveyor/ntz_summary/?segment_id=X     NTZ distribution by phase bin

Open Design Questions
