Corrected Ground-Truth Smoother

A corrected ground-truth-like product across all 26 INSANE flights at the PX4-IMU scoring point, vetted against the maintainer ground truth.

The corrected ground-truth smoother is a robust GTSAM batch factor graph that produces a ground-truth-like product across all 26 INSANE flights at the PX4-IMU scoring point. It exports raw formal per-pose covariance, accompanied by a standalone, empirically NEES-calibrated scoring-time position-covariance artifact (code). It preserves the shipped maintainer truth where dual-fixed RTK already makes it strong, repairs the structural attitude defect documented on Ground-Truth Attitude Quality, and bridges the outdoor-to-indoor transition gap described on The Outdoor-to-Indoor Transition Handoff. Three dependency studies feed it.

The graph carries, per regime:

IMU preintegration as a GTSAM CombinedImuFactor off the Kalibr noise model, supplying inertial dynamics, gravity observability, and bias evolution.
Raw dual-RTK position factors at the rtk_gps1/rtk_gps2 antenna lever arms, each weighted by its per-row covariance with the centimetre RTK floors, and a dual-antenna baseline-heading factor.
A gravity-leveling static-attitude factor at the static epoch that repairs the shipped Wahba GPS+mag attitude’s structural gravity inconsistency, plus a static zero-velocity (ZUPT) factor.
A barometer vertical factor fused at the actual px4_baro event times, with a per-keyframe pressure-bias random walk that respects the sensor’s recurring dropouts.

The dual-RTK position factors are the only outdoor heading source, so attitude is validated against held-out evidence rather than self-consistency.

Outdoor and Mars Result

On the 20 outdoor/Mars runs (19 Mars missions plus outdoor_1), the smoother reproduces the trustworthy position where dual-fixed RTK supports it and repairs the attitude where the shipped solution is gravity-inconsistent.

Quantity	Median	Worst
Position RMS vs maintainer truth (m)	0.062	0.351 (`mars_14`)
Static-gravity tilt, recovered (deg)	1.16	—
Static-gravity tilt improvement vs shipped (deg)	7.13	53.7 (`outdoor_1`, 16.9 → 0.28)
Held-out-baseline heading residual p95 (deg)	3.11	6.30

Heading is validated by holding out the dual-antenna baseline-heading factor over a window placed in a dual-fixed-rich region, re-solving, and scoring recovered keyframe yaw against the withheld measured baseline heading versus gap length. A gyro-rate-integral consistency witness bounds the yaw-rate increment independently, and a mars_5/6/7 shipped-attitude cross-check is applied where those runs’ attitude is trustworthy. Across 2 s, 5 s, 10 s, and 20 s held-out gaps the median recovered-yaw error p95 stays at 0.4-1.6 deg; the gyro-integral witness confirms the shape datum independently.

Covariance Calibration

The formal pose marginals are optimistic. Calibration is empirical and split by regime, fitting a position-variance inflation scalar against held-out NEES computed in the world frame (the body-frame marginal translation block rotated by R Cov R^T; the world-over-body NEES ratio is 0.96, so the choice of frame barely moves the position NEES):

Regime	Keyframes	Raw NEES (median)	Variance inflation	Calibrated in-χ² fraction
Dual-fixed RTK	1218	77.5	32.8×	0.67
Float RTK	558	93.2	39.4×	0.75
Indoor	420	0.5	1.0×	0.75

The raw RTK marginals are roughly 33-39× too tight in variance on held-out segments. The indoor regime is the opposite: its formal covariance is rigid-body↔︎PX4-IMU-calibration-dominated and already conservative (raw NEES median 0.5, below the χ²₃ median 2.37), so under an inflate-only rule it is left at the formal marginal — a held-out segment removes truth the exported product carries, so a sub-unit scalar would publish an over-sharp ground-truth covariance. The transition in-gap pure-visual segment carries no co-temporal holdout truth, so it is excluded from the campaign-wide calibration; instead each transition export carries a separate residual-based cov_pose6_honest_ingap field alongside the raw cov_pose6_formal, inflating the formal marginal (inflate-only) to per-tag-residual NEES consistency. The campaign-wide calibration itself is a standalone per-regime artifact applied to the exported position block at scoring time; the estimation graph and trajectory are untouched.

Transition Result

The transition runs add the indoor mocap flank to the outdoor RTK flank through a jointly-solved ENU↔︎mocap transform T_enu_mocap and a rigid-body↔︎PX4-IMU calibration T_body_pximu, stereo visual-odometry between-factors mapped into the IMU body frame, and per-tag in-gap fiducial factors from the pooled board-fixed map. No transition run has co-temporal ground_truth_8hz and mocap_vehicle overlap, and the raw dual-RTK rows that overlap mocap_vehicle arrive at degraded RTK status, so closure comes from the visual and inertial chain across the gap rather than a direct truth-lane overlap.

transition_1 and transition_2 converge cleanly (backbone under Powell’s Dogleg over multifrontal QR in 12 and 16 iterations; final fiducial passes at 4 iterations each). transition_3 is the exception: its two backbone passes reach the Levenberg-Marquardt iteration cap of 100 (cost still reducing at the cap; only its final fiducial pass converges, at 31 iterations). The plain-LM backbone is kept on transition_3 on purpose — the Dogleg+QR that converges the other two settles it into a basin that opens the fiducial gate and pushes the displacement out of the witness cluster — so transition_3 is witness-supported but optimizer-limited, a lm_capped state flagged explicitly. Numerical closure of that backbone is tracked as open work.

Run	Gap (s)	Recovered net gap displacement (m)	ENU↔︎mocap yaw p95 (deg)	Fiducial yaw p95 (deg)	Stereo yaw-rate p95 (deg/s)
`transition_1`	4.97	3.20	0.23	2.09	4.31
`transition_2`	5.50	4.07	0.37	2.06	3.10
`transition_3`	11.11	3.21	0.63	0.75	3.96

Each recovered net gap displacement lands inside its per-run acceptance band — the independent-witness cluster set by rs_odom onboard VIO, the marker-bridge reconstruction, and the independent stereo VO (transition_3 [2.75, 3.41] m, transition_2 [4.00, 4.07] m, transition_1 [3.14, 3.20] m) — while the shipped ground_truth_80hz in-gap fill on transition_3 (~8.0 m) is a 2.48× outlier and is excluded as a scoring lane. Transition heading is validated by three physically independent through-gap consistency checks rather than self-consistency: one rigid ENU↔︎mocap yaw that reconciles the outdoor dual-RTK heading with the indoor mocap heading, stereo yaw-rate continuity against the stereo-VO between-factors, and per-tag fiducial orientation through the board-fixed map. The gap has no fully independent absolute in-gap yaw witness, so these guard against common-mode yaw error rather than supplying a second external truth.

Dependency Studies

The Computer-Vision Feasibility Probe established that cameras can carry the transition gap and gates the vision front-end. The Visual-Inertial Bundle Adjustment provides the pooled board-fixed fiducial map and the stereo relative-pose front-end. The Camera-LRF Calibration Tests resolve the laser-range-finder beam direction with an explicit uncertainty band.

Limitations and Open Work

The crossing carries two physically distinct error processes, modeled separately. Near the building (the outdoor flank, exposed by transition_2) the limiting error is RTK bias carrying false confidence: the reported RTK covariance saturates at its ~1.4 cm horizontal floor while the true position σ rises to ~0.6–0.7 m at the Dronehall entrance. A distance-ramped spatial RTK σ-floor inside a 4 m door band — max-combined with the reported σ so it only ever widens, keyed to the near-door dual-fixed signature rather than a run name — cut the transition_2 outdoor-flank truth RMS from 0.39 m to 0.16 m by no longer over-trusting the near-door fixes. The fiducial consistency gate has a narrow usable window (near ~1.5 m; at ≥3 m it admits mis-registered frames), so the residual fragility is a fiducial-system property — the gate and the absolute board→ENU registration — not a missing in-gap witness.
Inside the gap (exposed by the deep transition_3 gap) the limiting error is the absolute registration of the pooled fiducial field into the world frame. A corner-level fiducial bundle adjustment was evaluated for map-tightening and ruled out: the pooled board map is already ~1 cm metric and each in-gap frame’s multi-tag PnP agrees on the camera pose to ~3 cm, so map quality was never the bottleneck — the residual is IMU/VO drift plus board→ENU registration drift across the unpinned gap, which a tighter map cannot move. (Whether a per-frame multi-tag PnP versus single-square IPPE choice moves the transition_3 in-gap solve is a distinct, still-open question, not a map-tightening one.)
A level-then-yaw decomposition of the ENU↔︎mocap registration was tested as an A/B diagnostic and is benign, not a negative result: on transition_1/transition_2 it matches the plain SO(3)-mean registration to ~1e-5, and on a transition_3 rerun that isolates the seed effect the two seeds differ by 0.2 mm in gap displacement (2.75250 vs 2.75270 m) and 0.003° in the T_enu_mocap yaw, with both converging. It is retained as a diagnostic of the solved registration; the plain SO(3)-mean registration ships.
The three ground UWB anchors (mounted on the door fiducial boards) and the airframe UWB-tag lever arm — absent from the shipped calibration — are recovered, closing the previously-unobservable UWB geometry. As an in-gap factor it is dominated, though: the pooled fiducial map already pins the in-gap position to ~1 cm while the recovered anchors carry ~0.3 m range σ, and the door anchors go non-line-of-sight inside the gap once the vehicle crosses the doorway. The board-fixed fiducial map at millimetre-to-centimetre accuracy dominates both non-vision in-gap witnesses: the LRF nadir-range factor is likewise dominated (the in-gap vertical is the weak axis but fiducial-pinned to 16 mm against the LRF’s ~50-137 mm range prediction), and the nav↔︎stereo cameras agree to 7-13 cm in the board frame on the approach but see disjoint boards at disjoint times. UWB, LRF, and the cross-camera tie therefore stay validation cross-checks, not factors.
The indoor-only product is complete: indoor_1/2/3 run the same batch graph with mocap through the solved registration, exported at the PX4-IMU scoring point in a gravity-leveled ENU (z-axis gravity-up; yaw and origin the conventional gauge, since there is no GNSS indoors), reproducing the mocap truth to 5.8–7.2 mm RMS and bridging held-out 2.5 s mocap windows from inertial support to ~19 mm RMS. All 26 flights now carry a product with raw formal per-pose marginals plus the standalone per-regime scoring-time position-covariance calibration; the open covariance question is whether a sharper-but-honest covariance is recoverable for the transition in-gap pure-visual segment, which carries no co-temporal truth.