sapiens_seg and sapiens_normal on a single image (0.4b model, NVIDIA Thor)

Strands Sapiens exposes Meta's Sapiens2 - a family of high-resolution transformers pretrained on 1B human images - as idiomatic Strands Agents tools.
One Python import, every human-centric vision head: body-part segmentation, surface normals, intrinsic albedo, 3D pointmaps, 308-keypoint pose, plus raw pretrained backbone features.
What it gives you
- 29-class body-part segmentation
  → Guide
- Surface normals & albedo
- 3D pointmap (per-pixel 3D)
  → Pointmap guide
- 308-keypoint 2D pose
  Face 274 + body + hands + feet.
  → Pose guide
- Pretrain backbone features
  Drop-in dense features for RAG / downstream heads.
  → Backbone guide
- Checkpoint / env discovery
  → Checkpoints
- Video processing
  Frame-by-frame inference on any video file.
  → Video guide
Why
Sapiens2 is the current state-of-the-art for human perception at native high resolution (up to 4096×3072). It's one of the highest-signal models ever open-sourced for human-centric understanding - but it ships as a research codebase with CLI scripts, config gymnastics, and manual checkpoint wiring.
Strands Sapiens turns each head into a one-line, agent-callable tool with structured responses, defensive fallbacks, and compatibility handling for the upstream project's breaking API changes.
60-second quickstart
# 1) CUDA PyTorch (platform-specific; e.g. CUDA 12.4)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
# 2) Sapiens2
pip install git+https://github.com/facebookresearch/sapiens2.git
# 3) This wrapper
pip install git+ssh://git@github.com/cagataycali/strands-sapiens.git
# 4) Drop a checkpoint into ~/sapiens2_host/seg/sapiens2_0.4b_seg.safetensors
export SAPIENS_CHECKPOINT_ROOT=~/sapiens2_host
# 5) Go
python -c "from strands_sapiens import sapiens_info; print(sapiens_info())"
Full installation →
API reference →
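Before running step 5, it can help to confirm that the checkpoint actually landed where the wrapper will look. The sketch below is not part of strands-sapiens: `find_checkpoints` is a hypothetical helper, and it assumes the `<root>/<task>/<name>.safetensors` layout implied by step 4, falling back to `~/sapiens2_host` when `SAPIENS_CHECKPOINT_ROOT` is unset. The package's own `sapiens_info()` remains the authoritative discovery call.

```python
import os
from pathlib import Path


def find_checkpoints(root=None):
    """Group *.safetensors files by their task folder (hypothetical helper).

    Assumes the layout <root>/<task>/<name>.safetensors from the quickstart.
    """
    # Fall back to the env var, then to the quickstart's example directory.
    base = Path(root or os.environ.get("SAPIENS_CHECKPOINT_ROOT", "~/sapiens2_host"))
    base = base.expanduser()
    found = {}
    if not base.is_dir():
        return found  # nothing to report if the root does not exist
    for path in sorted(base.glob("*/*.safetensors")):
        found.setdefault(path.parent.name, []).append(path.name)
    return found


if __name__ == "__main__":
    for task, files in find_checkpoints().items():
        print(f"{task}: {', '.join(files)}")
```

An empty result usually means the environment variable points at the wrong directory or the file sits one level too deep.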
Use from a Strands agent
from strands import Agent
from strands_sapiens import TOOLS  # list of @tool-decorated functions
agent = Agent(tools=TOOLS)
agent("Segment every person in /data/photos and save to /data/out")
agent("Run 308-kpt pose on /data/photos/jump.jpg")
agent("What Sapiens2 checkpoints do I have available locally?")
Every tool returns a structured dict:
{
    "status": "success" | "error",
    "message": "...",
    "outputs": [...],        # per-image entries
    "checkpoint": "...",
    # task-specific keys
}
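Because every tool shares this envelope, calling code can branch on `status` before touching any task-specific keys. The following is a minimal sketch of that pattern; `summarize_result` is a hypothetical helper, not part of the package, and the sample dict is hand-written for illustration, not real tool output.

```python
def summarize_result(result):
    """Return a one-line summary of a tool response, or raise on error.

    Relies only on the documented keys: status, message, outputs, checkpoint.
    """
    if result.get("status") != "success":
        # Surface the tool's own message rather than failing later on missing keys.
        raise RuntimeError(f"Sapiens tool failed: {result.get('message', 'no message')}")
    outputs = result.get("outputs", [])
    checkpoint = result.get("checkpoint", "unknown")
    return f"{len(outputs)} output(s) from checkpoint {checkpoint}"


if __name__ == "__main__":
    # Hand-written example response; values are illustrative only.
    sample = {
        "status": "success",
        "message": "ok",
        "outputs": [{"image": "a.jpg"}, {"image": "b.jpg"}],
        "checkpoint": "sapiens2_0.4b_seg",
    }
    print(summarize_result(sample))
```

Checking `status` first keeps error handling uniform across heads, since only the task-specific keys differ between tools.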
Verified environment
Tested on NVIDIA Thor (JetPack 6, aarch64) with CUDA PyTorch 2.7+ and the
sapiens2_0.4b_seg / sapiens2_0.1b_pretrain checkpoints.
Python 3.10+ (Thor's default 3.10 works; newer versions also work).
Built on top of Meta's Sapiens2, powered by Strands Agents.