M.010.20 — Android Google Lens Integration (Execution Spec)

Date: 2026-04-04
Owner: Copilot execution lane
Status: Ready for implementation

Goal

Integrate Google Lens-style visual understanding into HoloScript mobile flow so users can point at real objects, extract semantic labels/relations, and inject those as typed entities into a .holo scene graph.

Product behavior

User opens camera in mobile authoring mode.
User points at an object and taps Understand.
Vision service returns:
- object label(s)
- confidence
- optional text/OCR
- coarse spatial bounds
System maps result to HoloScript semantic nodes (object, tag, metadata).
Scene graph updates in real time and can be edited before commit.

Door-opening impact

Dramatically lowers scene-authoring friction from blank-canvas to semantic bootstrap.
Converts real world context into structured holographic primitives.
Creates a practical “AI-to-Holo” pipeline useful for demos, enterprise PoCs, and onboarding.

Scope

In scope (v1)

Mobile camera capture + analysis trigger
Vision-to-scene mapping for top-k object labels
Confidence-aware insertion and user review UI
.holo export path with attached semantic metadata

Out of scope (v1)

Full segmentation mesh extraction
Continuous always-on analysis stream
Multi-frame SLAM-grade object persistence

Data contracts

interface LensDetection {
  id: string;
  label: string;
  confidence: number; // 0..1
  boundingBox?: { x: number; y: number; w: number; h: number };
  text?: string;
  attributes?: Record<string, string | number | boolean>;
}

interface SceneSemanticInsertion {
  nodeId: string;
  kind: 'object' | 'tag' | 'annotation';
  sourceDetectionId: string;
  label: string;
  confidence: number;
  metadata: Record<string, unknown>;
}

Technical architecture

1) Mobile capture adapter

Capture frame on user action
Normalize image dimensions + orientation
Route to configured analyzer backend (Lens-compatible interface)

2) Analyzer gateway

Adapter interface for provider abstraction
Returns canonical LensDetection[]
Applies confidence threshold + dedup

3) Semantic mapper

Maps detections into scene entities and tags
Emits staged insertions in review buffer
Supports accept/reject per insertion

4) Scene graph commit

Accepted insertions merged into active .holo composition
Adds provenance metadata:
- analyzer provider
- detection timestamp
- confidence score

Proposed code touch points

packages/studio/:
- mobile camera authoring panel
- review/approve insertion UX
packages/core/:
- semantic insertion schema + scene graph merge utility
packages/llm-provider/ or adapter layer:
- analyzer gateway + provider abstraction

Failure taxonomy

ANALYZER_UNAVAILABLE
FRAME_CAPTURE_FAILED
NO_DETECTIONS
LOW_CONFIDENCE_ONLY
SCENE_INSERTION_CONFLICT

Each failure should provide actionable UX fallback (retry, manual label entry, skip).

Acceptance criteria

Single-frame object analysis returns semantic candidates in under 2s on target device/network.
User can approve/reject candidates before scene mutation.
Approved candidates appear in scene graph and persist in exported .holo.
Insertion metadata includes confidence + source provenance.
Failure states never corrupt existing scene graph.

Test plan

Unit

detection canonicalization and threshold filtering
semantic mapping correctness
conflict-safe insertion merge

Integration

camera capture -> analyzer -> review buffer -> commit
analyzer timeout fallback path
duplicate detection suppression

Device validation

Android capture orientation correctness
low-light and cluttered-background behavior
latency and success-rate sampling

Shipping slices

Slice A: analyzer gateway + schema + mapper core
Slice B: mobile review UX + commit pipeline
Slice C: export/provenance + reliability hardening

Metrics

lens_analysis_requests_total
lens_analysis_success_total
lens_analysis_latency_ms
semantic_candidates_generated_total
semantic_candidates_accepted_total
scene_insert_conflict_total

Definition of done

End-to-end point-and-understand flow works on Android mobile authoring path
acceptance criteria pass
.holo exports contain semantic insertions with provenance
operator docs include setup, thresholds, and known limitations

M.010.20 — Android Google Lens Integration (Execution Spec) ​

Goal ​

Product behavior ​

Door-opening impact ​

Scope ​

In scope (v1) ​

Out of scope (v1) ​

Data contracts ​

Technical architecture ​

1) Mobile capture adapter ​

2) Analyzer gateway ​

3) Semantic mapper ​

4) Scene graph commit ​

Proposed code touch points ​

Failure taxonomy ​

Acceptance criteria ​

Test plan ​

Unit ​

Integration ​

Device validation ​

Shipping slices ​

Metrics ​

Definition of done ​