Robbyant/lingbot-map
Original Summary
A feed-forward 3D foundation model for reconstructing scenes from streaming data

LingBot-Map: Geometric Context Transformer for Streaming 3D Reconstruction
Robbyant Team

https://github.com/user-attachments/assets/fe39e095-af2c-4ec9-b68d-a8ba97e505ab

🗺️ Meet LingBot-Map! We've built a feed-forward 3D foundation model for streaming 3D reconstruction! 🏗️🌍

LingBot-Map focuses on:

Geometric Context Transformer: Architecturally unifies coordinate grounding, dense geometric cues, and long-range drift correction within a single streaming framework through anchor context, a pose-reference window, and trajectory memory.

High-Efficiency Streaming Inference: A feed-forward architecture with paged KV cache attention, enabling stable inference at ~20 FPS at 518×378 resolution over long sequences exceeding 10,000 frames.

State-of-the-Art Reconstruction: Superior performance on diverse benchmarks compared to both existing streaming and iterative optimization-based approaches.

📑 Table of Contents
📰 News
📋 TODO
⚙️ Installation
📦 Model Download
🚀 Quick Start
🎬 Interactive Demo (demo.py)
Try the Example Scenes
Streaming with Keyframe Interval
Windowed Inference (for lon…