Computer-Vision Feasibility Probe

The computer-vision feasibility probe gates the smoother’s vision front-end on the transition imagery (code). It feeds the Corrected Ground-Truth Smoother, works on the imagery described under Cameras, and asks one falsifiable question across The Outdoor-to-Indoor Transition Handoff: can the download’s camera data improve a ground-truth-like reconstruction through the gap, where ground_truth_8hz and mocap_vehicle never overlap? The gap is not empty of estimates — the marker bridge, the ground_truth_80hz lane, and rs_odom VIO all continue through it and disagree — so the probe measures whether an independent camera reconstruction can adjudicate them. The handover gaps are 4.97 s, 5.50 s, and 11.11 s on transition_1/transition_2/transition_3.

Result

Four measure-only tests, each with a pass/fail.

Tag-bridge gate (diagnostic; gates the covisibility test). Every gap frame sees at least four ArUco tags (99 / 110 / 222 frames on the three runs), with no no-tag, blurred, or mis-exposed frame. The tags do not bridge for a structural reason: no single tag is co-visible on both immediate handover flanks. The scene is continuously textured, so the gate is favorable; bridging needs the in-gap covisibility chain or natural features.

Out-of-plane conditioning, transition_1 (pass). A two-view natural-feature cloud is genuinely non-planar — smallest-to-largest singular ratio 0.109 versus 0.0069 for the coplanar tag corners, median off-plane depth 7× the tags. Planar-only PnP is degenerate; adding the natural geometry drops the PnP condition number from 89 (tags only) to 24 (natural). Natural features supply the 3-D structure the coplanar tags lack.

Outdoor-to-indoor covisibility chain (pass, all runs). A DISK+LightGlue per-frame relay from an RTK-anchored outdoor frame to a mocap-anchored indoor frame is geometrically verified and unbroken on every run; the weakest link holds 867-1045 inliers, and the bottleneck sits at the real transition point yet still holds. An independent feature chain reaches the mocap-anchored side, overcoming the tag-only limit.

Stereo visual-inertial honesty, transition_3 (pass). Stereo is usable (1261 metric 3-D points per frame, depth 0.35-5.4 m). The stereo VO is metric (Umeyama scale 1.08, ~8% short) and tracks the onboard VIO to 316 mm rigid ATE over a padded window; its vertical agrees with the LRF and barometer to under 1 m. A full visual-inertial bundle adjustment with IMU would remove the residual scale bias.

The honesty test also adjudicates the in-gap reconstructions by strict-gap frame-invariant net displacement:

In-gap source transition_3 net displacement (m)
rs_odom onboard VIO 3.24
Marker bridge 3.40
Independent stereo VO 2.63
Shipped ground_truth_80hz fill 8.00

Two independent sensing families agree at roughly 3 m while the shipped ground_truth_80hz lane is a ~2.4× outlier, so the independent stereo flags the 80 Hz gap-fill as wrong. This is consistent with the smoother’s exclusion of that lane as a scoring source.

Verdict

Cameras can carry the transition gap: coverage is continuous, natural features add the 3-D structure the coplanar tags lack, a dense covisibility relay bridges outdoor-to-indoor on every run, and metric stereo VO carries the gap while catching the 80 Hz gap-fill error. This motivates the tightly-coupled visual-inertial smoother. The static fiducial rig is the physical anchor tying the field together: its pose is effectively identical across the three transitions, so a tag map built on one transition transfers to the others — the premise the Visual-Inertial Bundle Adjustment builds on.