Architecture
How the package is organized
```text
strands_sapiens/
├── __init__.py   # re-exports the 8 @tool functions + TOOLS list
├── _common.py    # checkpoint discovery, input resolution, response helpers
└── tools.py      # all @tool implementations
```
Everything heavy is imported lazily: a fresh `import strands_sapiens` does not import torch, sapiens, or cv2. That keeps agent startup fast and makes the smoke tests runnable on CPU-only CI.
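A CI smoke test can stop this guarantee from regressing. It imports the package in a fresh interpreter and then checks `sys.modules`. This is a minimal sketch, and `heavy_modules_loaded` is a hypothetical helper that is not part of the package:

```python
from __future__ import annotations

import subprocess
import sys

HEAVY_MODULES = ("torch", "sapiens", "cv2")


def heavy_modules_loaded(package: str, heavy: tuple[str, ...] = HEAVY_MODULES) -> list[str]:
    """Import `package` in a fresh interpreter; return which `heavy` modules it pulled in."""
    # A subprocess gives a clean sys.modules, unaffected by imports made earlier in this process.
    probe = (
        "import importlib, sys\n"
        f"importlib.import_module({package!r})\n"
        f"print(','.join(m for m in {list(heavy)!r} if m in sys.modules))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe], capture_output=True, text=True, check=True
    )
    out = result.stdout.strip()
    return out.split(",") if out else []
```

A pytest smoke test would then assert `heavy_modules_loaded("strands_sapiens") == []`. The subprocess matters because, inside a shared test session, another test may already have imported torch.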
Tool anatomy
Each tool follows the same shape:
```python
import traceback

from strands import tool

from ._common import checkpoint_path, err, ok, validate_size  # helpers used below


@tool
def sapiens_<task>(...) -> dict:  # task-specific params, including model_size
    """Docstring the agent sees."""
    try:
        size = validate_size("<task>", model_size)  # normalize + check
        ckpt = checkpoint_path("<task>", size)      # compute expected path
        if not ckpt.exists():
            return err(f"Missing checkpoint: {ckpt}")

        # Lazy imports so the module is still importable without these deps
        import cv2
        import torch
        from sapiens.<...> import ...

        # inference, visualization, save results
        return ok("...", outputs=[...], checkpoint=str(ckpt))
    except Exception as e:
        return err(f"sapiens_<task> failed: {e}", traceback=traceback.format_exc())
```
Benefits:
- Agent-safe failure mode: a missing or broken checkpoint never raises. The tool returns a structured error instead, with a traceback for unexpected exceptions, so an agent can inspect it and recover.
- Short import time: a fresh Python shell imports the whole package in ~50 ms.
- Testable on CPU-only CI without any weights or GPU.
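The try/except in the template could also be factored into a decorator. That trades per-tool visibility for less repetition. This is a hypothetical sketch, and `never_raises` is not in the package. The error dict mirrors the `status`/`content` shape the tools return. Putting the traceback in a `json` block is an assumption:

```python
import functools
import traceback
from typing import Any, Callable


def never_raises(fn: Callable[..., dict]) -> Callable[..., dict]:
    """Hypothetical decorator: turn any exception from `fn` into a structured error dict."""

    @functools.wraps(fn)  # copies name, docstring, and annotations onto the wrapper
    def wrapper(*args: Any, **kwargs: Any) -> dict:
        try:
            return fn(*args, **kwargs)
        except Exception as e:
            return {
                "status": "error",
                "content": [
                    {"text": f"{fn.__name__} failed: {e}"},
                    {"json": {"traceback": traceback.format_exc()}},
                ],
            }

    return wrapper
```

If used, it would sit below `@tool`. That way the schema-generating decorator sees the original name, docstring, and annotations through `functools.wraps`.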
Response contract
Every tool returns the standard Strands ToolResult format:
```python
{
    "status": "success",  # or "error"
    "content": [
        {"text": "seg complete on 3 image(s)"},  # summary
        {"image": {"format": "jpeg", "source": {"bytes": b"..."}}},  # inline vis (up to 5)
        {"json": {  # structured data
            "task": "seg",
            "model_size": "0.4b",
            "checkpoint": "...",
            "output_dir": "...",
            "outputs": [{"input": "...", "vis": "...", "pred": "..."}],
        }},
    ],
}
```
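The following is a minimal sketch of how `ok` and `err` in `_common.py` could build this shape. The names come from the tool template. The signatures, the `images` parameter, and the JPEG-only encoding are assumptions:

```python
from __future__ import annotations

from collections.abc import Iterable
from pathlib import Path
from typing import Any

MAX_INLINE_IMAGES = 5  # the contract inlines at most five visualizations


def _content(text: str, images: Iterable[str | Path], data: dict[str, Any]) -> list[dict]:
    content: list[dict] = [{"text": text}]  # human-readable summary first
    for path in list(images)[:MAX_INLINE_IMAGES]:
        # Assumes visualizations are JPEG files on disk (e.g. vis.jpg).
        content.append({"image": {"format": "jpeg", "source": {"bytes": Path(path).read_bytes()}}})
    if data:
        content.append({"json": data})  # structured payload last
    return content


def ok(text: str, images: Iterable[str | Path] = (), **data: Any) -> dict:
    return {"status": "success", "content": _content(text, images, data)}


def err(text: str, **data: Any) -> dict:
    return {"status": "error", "content": _content(text, (), data)}
```

Passing structured fields as keyword arguments keeps call sites short. For example, `ok("...", outputs=[...], checkpoint=str(ckpt))` lands in the `json` block unchanged.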
Compatibility strategy
Sapiens2 is active research code, so config paths and API surfaces shift over time. The wrapper hedges against this with:
- Arch-name helper (`_common.arch_name`): converts `0.1b` → `sapiens2_01b` and handles `1b_4k` → `sapiens2_1b`.
- Config `rglob` fallback: if `configs/seg/shutterstock_goliath/sapiens2_0.4b_seg-1024x768.py` is gone, `rglob("sapiens2_0.4b_seg*.py")` under `configs/seg/` still finds it.
- Forward output polymorphism: backbone outputs may be a tensor, list, tuple, or dict; the wrapper picks the final-stage feature map in all cases.
- Forward signature fallback: `model(inputs)` first, then `model(inputs, data_samples)`.
- Multi-path pose API: tries `Inferencer` → `init_pose_model` + `PoseVisualizer` → scripted fallback.
- Visualizer kwargs filtering: only forwards `kpt_thres` / `line_thickness` / `radius` if the installed `PoseVisualizer.__init__` signature actually accepts them.
- Inline visualizers for normals / albedo / pointmap: no dependency on upstream visualizer classes that come and go.
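Several of these hedges are small enough to sketch in plain Python. The helpers below are hypothetical stand-ins that follow the rules listed above, not the package's actual code. Two behaviors are assumptions: treating the last dict entry as the final stage in `final_feature`, and the `TypeError`-based retry in `call_forward`:

```python
from __future__ import annotations

import inspect
from pathlib import Path
from typing import Any, Callable


def arch_name(size: str) -> str:
    """'0.1b' -> 'sapiens2_01b'; '1b_4k' -> 'sapiens2_1b'."""
    base = size.split("_", 1)[0]  # drop suffixes such as '_4k'
    return f"sapiens2_{base.replace('.', '')}"


def find_config(configs_root: Path, task: str, size: str, preferred: str) -> Path:
    """Return configs/<task>/<preferred> if present, else the first rglob match under configs/<task>/."""
    exact = configs_root / task / preferred
    if exact.exists():
        return exact
    matches = sorted((configs_root / task).rglob(f"sapiens2_{size}_{task}*.py"))
    if not matches:
        raise FileNotFoundError(f"no sapiens2_{size}_{task}*.py under {configs_root / task}")
    return matches[0]


def final_feature(out: Any) -> Any:
    """Unwrap list / tuple / dict backbone outputs down to the last-stage feature map."""
    while isinstance(out, (list, tuple, dict)):
        if not out:
            raise ValueError("empty backbone output")
        # Assumption: for dicts, the last-inserted entry is the final stage.
        out = list(out.values())[-1] if isinstance(out, dict) else out[-1]
    return out  # a tensor (or any other leaf) passes through unchanged


def call_forward(model: Callable[..., Any], inputs: Any, data_samples: Any = None) -> Any:
    """Try model(inputs); on a TypeError, retry as model(inputs, data_samples)."""
    try:
        return model(inputs)
    except TypeError:
        # Caveat: a TypeError raised *inside* a one-argument forward also lands here.
        return model(inputs, data_samples)


def accepted_kwargs(cls: type, **kwargs: Any) -> dict[str, Any]:
    """Keep only the kwargs that cls.__init__ declares (all of them if it takes **kwargs)."""
    if cls.__init__ is object.__init__:
        return {}  # no custom __init__, so no keyword arguments are accepted
    params = inspect.signature(cls.__init__).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}
```

A visualizer would then be built as `PoseVisualizer(**accepted_kwargs(PoseVisualizer, kpt_thres=..., radius=...))`. Inspecting the signature of `forward` instead of catching `TypeError` would avoid the caveat noted in `call_forward`. The cost is more code.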
Data flow (dense tasks)
```mermaid
flowchart LR
    A[Image path or folder] --> B[resolve_input]
    B --> C[For each image<br/>cv2.imread]
    C --> D[model.pipeline<br/>data_preprocessor]
    D --> E[model.forward]
    E --> F[F.interpolate → input H,W]
    F --> G[argmax or raw]
    G --> H[Inline visualizer]
    H --> I[Write vis + .npy]
    I --> J[Structured dict]
```
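The first step, `resolve_input`, turns a path or folder into the work list for the per-image loop. This is a minimal sketch. It assumes non-recursive folder scanning and the extension set shown. Both are assumptions, and the real helper in `_common.py` may differ:

```python
from __future__ import annotations

from pathlib import Path

# Assumed extension set; cv2.imread reads more formats, but these cover typical inputs.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def resolve_input(path: str | Path) -> list[Path]:
    """Expand a single image path or a folder into a sorted list of image files."""
    p = Path(path).expanduser()
    if p.is_file():
        return [p]
    if p.is_dir():
        # Non-recursive scan; subfolders and non-image files are skipped.
        return sorted(f for f in p.iterdir() if f.is_file() and f.suffix.lower() in IMAGE_EXTS)
    raise FileNotFoundError(f"No such image or folder: {p}")
```

Sorting keeps the output order deterministic, so the `outputs` list in the structured dict lines up across runs.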
Data flow (pose)
```mermaid
flowchart LR
    A[Image path] --> B[DETR person detector<br/>person bboxes]
    B --> C[Crop per person]
    C --> D[Sapiens2 pose head<br/>308 heatmaps]
    D --> E[Argmax → keypoints]
    E --> F[PoseVisualizer]
    F --> G[Save vis.jpg + .json]
```
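The argmax step maps each of the 308 heatmaps to one keypoint and then back from crop space to image space. This is a minimal NumPy sketch with two assumptions. First, each person crop is resized straight to the heatmap's aspect ratio, with no padding or affine warp. Second, there is no sub-pixel refinement. `heatmaps_to_keypoints` is a hypothetical name:

```python
import numpy as np


def heatmaps_to_keypoints(heatmaps: np.ndarray, bbox: tuple[float, float, float, float]) -> np.ndarray:
    """Return a (K, 3) array of (x, y, score) in original-image pixels."""
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1)
    idx = flat.argmax(axis=1)            # hottest cell per keypoint
    scores = flat[np.arange(k), idx]     # peak value doubles as a confidence score
    ys, xs = np.unravel_index(idx, (h, w))
    x1, y1, x2, y2 = bbox
    # Scale heatmap cells to crop size, then offset by the crop's top-left corner.
    img_x = x1 + xs * (x2 - x1) / w
    img_y = y1 + ys * (y2 - y1) / h
    return np.stack([img_x, img_y, scores], axis=1)
```

Production pose pipelines usually invert the exact crop transform and refine peaks below one heatmap cell. This sketch only shows the basic mapping.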
Why Strands?
- The `@tool` decorator gives every function a schema the agent can reason over.
- Dicts in, dicts out: no ORM, no global state, no `context` object.
- Composable: cherry-pick one tool, or import the whole `TOOLS` list for a Sapiens-literate agent.
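Results are plain dicts, so code outside an agent, such as a test or a batch script, can consume them with ordinary Python. Below is a hypothetical helper that assumes the response contract described earlier:

```python
from __future__ import annotations

from typing import Any


def tool_payload(result: dict[str, Any]) -> dict[str, Any]:
    """Return the `json` block of a successful tool result; raise with the summary on error."""
    content = result.get("content", [])
    if result.get("status") != "success":
        texts = [block["text"] for block in content if "text" in block]
        raise RuntimeError(texts[0] if texts else "tool returned an error")
    for block in content:
        if "json" in block:
            return block["json"]
    return {}  # a successful result that carries no structured data
```

Raising on `"error"` suits scripts, where failing fast is what you want. An agent receives the same dict unchanged and can decide how to recover.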
Roadmap
- True batching: the current loop processes one image at a time. Batched inference is expected to give a 3–5× speedup on dense tasks.
- TensorRT: on Jetson-class hardware, TRT conversion can roughly halve latency.
- Downstream heads: `sapiens_body_part_attention(image, limb)` and similar tools, built on seg + pointmap.
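The batching item would replace the per-image loop with fixed-size groups. This is a minimal sketch of the grouping step only, and `chunked` is a hypothetical helper. Stacking a batch into one tensor also requires equal input shapes after the pipeline's resize, which this sketch does not handle:

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator
from itertools import islice
from typing import TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of up to batch_size items; the last one may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch
```

On Python 3.12+, `itertools.batched` does the same grouping but yields tuples instead of lists.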