
Self-calibrating cross-camera homography for real-time ghost prediction in multi-camera person tracking[P]

The problem: In multi-camera tracking, when camera A loses track of a person but camera B still sees them, naive approaches extrapolate pixel coordinates linearly. This fails immediately because cameras have completely different coordinate systems. A person at pixel (400, 300) on camera B might be at (800, 500) on camera A, depending on relative position and angle.

Approach: When both cameras simultaneously observe the same person (matched via 64-dim HSV appearance descriptors, L2-normalized, EMA-smoothed at alpha=0.3), we record foot-point correspondence pairs. The bottom-center of the bounding box (the foot point) in each view projects to the same physical ground-plane point.

After 4+ such pairs, cv2.findHomography() with RANSAC yields a 3x3 matrix H mapping camera B's pixel space to camera A's. The system re-estimates H every 5 new pairs and monitors reprojection error, discarding H if the error spikes (which usually means a camera moved).

Three fallback paths:

  • Path A (H-PROJ, green): homography projection from any source camera with valid H. Most accurate.
  • Path B (EXTRAP, red): pixel extrapolation with adaptive budget min(250px, 80 + 40*t). Last resort.
  • Path C (WORLD, orange): world-coordinate pinhole projection from fused 3D Kalman state. Always available.
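The selection logic among the three paths can be sketched like this. The priority order (H-PROJ, then WORLD, then EXTRAP) is inferred from the post's "most accurate" / "always available" / "last resort" labels, not stated explicitly:

```python
def extrap_budget(t_lost_s):
    """Path B's adaptive pixel budget: min(250px, 80 + 40*t)."""
    return min(250.0, 80.0 + 40.0 * t_lost_s)

def choose_path(has_valid_H, t_lost_s, extrap_dist_px, has_world_state=True):
    """Assumed fallback priority: H-PROJ > WORLD > EXTRAP."""
    if has_valid_H:
        return "H-PROJ"   # green: homography projection, most accurate
    if has_world_state:
        return "WORLD"    # orange: pinhole projection of fused 3D Kalman state
    if extrap_dist_px <= extrap_budget(t_lost_s):
        return "EXTRAP"   # red: raw pixel extrapolation, last resort
    return None           # budget exhausted: stop predicting
```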

Costs:

  • Homography re-estimation: < 0.1ms (called every 5 new pairs)
  • Per-prediction projection: < 0.001ms

Tracking: Hungarian assignment with a 0.6 * IoU + 0.4 * cosine-appearance cost. DeepSORT (MobileNet embeddings) is the primary tracker, falling back to Hungarian assignment (scipy), then centroid matching.
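The scipy fallback path with the stated cost weights might look like this (boxes in (x1, y1, x2, y2) form, features unit-norm so cosine similarity is a dot product):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def assign(track_boxes, det_boxes, track_feats, det_feats):
    """Hungarian assignment on 0.6*IoU + 0.4*cosine similarity,
    negated into a cost. Returns (track_idx, det_idx) match pairs."""
    cost = np.zeros((len(track_boxes), len(det_boxes)))
    for i, (tb, tf) in enumerate(zip(track_boxes, track_feats)):
        for j, (db, df) in enumerate(zip(det_boxes, det_feats)):
            sim = 0.6 * iou(tb, db) + 0.4 * float(np.dot(tf, df))
            cost[i, j] = 1.0 - sim
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))
```

A real implementation would also gate matches (e.g. reject pairs above a cost threshold) before confirming them; that threshold isn't given in the post.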

Sensor trust: Each camera earns a trust score in [0.1, 1.0] based on measurement consistency. High-innovation measurements get down-weighted. The Kalman measurement noise R is rescaled on every update based on detection confidence, bbox area, and sensor trust.
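One plausible shape for the R-scaling, assuming each factor multiplies a base covariance (the post names the three factors but not the exact formula; `ref_area` and the clip bounds here are illustrative):

```python
import numpy as np

def adaptive_R(base_R, det_conf, bbox_area, trust, ref_area=10000.0):
    """Inflate Kalman measurement noise when the detection is low-confidence,
    the box is small (far away), or the camera has low trust."""
    conf_term = 1.0 / max(det_conf, 1e-3)                      # low conf -> noisier
    area_term = float(np.clip(ref_area / max(bbox_area, 1.0),  # small box -> noisier
                              0.5, 4.0))
    trust_term = 1.0 / float(np.clip(trust, 0.1, 1.0))         # low trust -> noisier
    return base_R * conf_term * area_term * trust_term
```

With this shape, a fully trusted camera seeing a confident, reference-sized detection leaves R untouched, and each degraded factor inflates it multiplicatively.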

Full implementation: github.com/mandarwagh9/overwatch. 57 unit tests covering Kalman, homography, tracking. CI on GitHub Actions.

Limitations: ground-plane homography breaks for elevated cameras with steep angles. Re-ID via HSV histograms is weak for people in similar clothing at close spatial proximity.

Curious if anyone has tackled non-ground-plane cross-camera projection or used learned embeddings instead of HSV histograms for re-ID at this inference budget.

submitted by /u/Straight_Stable_6095


Tagged with

#cross-camera homography
#real-time ghost prediction
#multi-camera tracking
#pixel coordinates
#64-dim HSV appearance descriptors
#RANSAC
#foot-point correspondence
#3x3 matrix H
#reprojection error
#Hungarian assignment