Clew Module (stx.clew)
Hash-based provenance tracking for reproducible science. Clew (Ariadne’s
thread) records file hashes during @stx.session runs and traces
dependency chains back to source.
How It Works
@stx.session starts a tracking session
stx.io.load() records input file hashes
stx.io.save() records output file hashes
Session close computes a combined hash of all inputs/outputs
Later, stx.clew can verify that nothing has changed
import scitex as stx

# Automatic -- just use @stx.session + stx.io
@stx.session
def main():
    data = stx.io.load("input.csv")      # Tracked as input
    result = process(data)               # process() stands in for your own analysis code
    stx.io.save(result, "output.png")    # Tracked as output
    return 0

# Verify later
stx.clew.status()               # Like git status
stx.clew.run("session_id")      # Verify by hash
stx.clew.chain("output.png")    # Trace to source
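To make the hashing steps concrete, here is a minimal standard-library sketch of how per-file hashes might be folded into one combined session hash. It is not Clew's actual implementation: the function names file_hash and combined_hash, the choice of SHA-256, and the "role:path:hash" encoding are all assumptions made for illustration.

```python
import hashlib


def file_hash(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def combined_hash(input_hashes, output_hashes):
    """Fold per-file hashes into one digest (hypothetical scheme).

    Both arguments map a file path to its hex digest. Paths are sorted
    so the result does not depend on the order of load/save calls.
    """
    digest = hashlib.sha256()
    for role, hashes in (("input", input_hashes), ("output", output_hashes)):
        for path in sorted(hashes):
            line = f"{role}:{path}:{hashes[path]}\n"
            digest.update(line.encode("utf-8"))
    return digest.hexdigest()
```

Under this scheme, changing any single input or output file changes the combined hash, which is what lets a later check detect drift with one comparison.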
CLI Commands
scitex clew status # Show changed files
scitex clew list # List all tracked runs
scitex clew run <session_id> # Verify a specific run
scitex clew chain <file> # Trace dependency chain
scitex clew stats # Database statistics
Verification Levels
CACHE – Hash comparison only (fast). Checks if files match stored hashes.
RERUN – Re-execute scripts and compare outputs (thorough). Catches logic errors.
# Fast: hash comparison
result = stx.clew.run("session_id")
# Thorough: re-execute and compare
result = stx.clew.run("session_id", from_scratch=True)
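The difference between the two levels can be illustrated with a small standalone sketch of a rerun-style check: execute the script again in a scratch directory and compare the hash of the fresh output with the stored one. The helper names sha256_of and verify_rerun are hypothetical, and Clew's own rerun logic may work quite differently.

```python
import hashlib
import subprocess
import sys
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def verify_rerun(script, output_name, stored_hash):
    """Re-execute `script` in a scratch directory and compare its output.

    Hypothetical helper: assumes the script writes `output_name` into its
    current working directory and needs no other input files.
    """
    script = Path(script).resolve()  # absolute path, since cwd changes below
    with tempfile.TemporaryDirectory() as workdir:
        subprocess.run([sys.executable, str(script)], cwd=workdir, check=True)
        fresh = Path(workdir) / output_name
        return fresh.exists() and sha256_of(fresh) == stored_hash
```

A check like this only passes when the script is deterministic; outputs that embed timestamps or unseeded random numbers will differ byte for byte even if the analysis is unchanged.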
Dependency Chains
Clew follows parent_session links to build a DAG from the final output
back to the original source:
chain = stx.clew.chain("final_figure.png")
# Shows: source.py → intermediate.csv → analysis.py → final_figure.png
# Visualize as Mermaid DAG
stx.clew.mermaid("session_id")
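As a rough illustration of the idea, the following sketch walks producer records backwards from a target file and returns the chain with the source first. The record layout (a dict mapping each output file to the script that wrote it and the inputs it read) and the name trace_chain are assumptions for this example, not Clew's database schema or API.

```python
def trace_chain(target, produced_by):
    """Return the scripts and files leading to `target`, source first.

    `produced_by` maps an output path to a dict with the keys "script"
    and "inputs" (hypothetical layout). Files without a record are
    treated as original sources.
    """
    order = []
    seen = set()

    def visit(path):
        if path in seen:
            return
        seen.add(path)
        record = produced_by.get(path)
        if record is not None:
            # Visit everything this file depends on before the file itself.
            for parent in record["inputs"]:
                visit(parent)
            if record["script"] not in seen:
                seen.add(record["script"])
                order.append(record["script"])
        order.append(path)

    visit(target)
    return order


records = {
    "intermediate.csv": {"script": "source.py", "inputs": []},
    "final_figure.png": {"script": "analysis.py", "inputs": ["intermediate.csv"]},
}
print(" -> ".join(trace_chain("final_figure.png", records)))
# source.py -> intermediate.csv -> analysis.py -> final_figure.png
```

The depth-first walk places every dependency before the file that needs it, so the same function also handles outputs with several inputs.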
Verification Statuses
VERIFIED – Files match expected hashes
MISMATCH – Files differ from stored hashes
MISSING – Files no longer exist
UNKNOWN – No prior tracking data
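One way to picture how a single file could be mapped to one of these four statuses is sketched below. The Status enum, the check_file helper, and the order of the checks (no record first, then missing file, then hash comparison) are assumptions for illustration, not Clew's internals.

```python
import hashlib
from enum import Enum
from pathlib import Path


class Status(Enum):
    VERIFIED = "verified"
    MISMATCH = "mismatch"
    MISSING = "missing"
    UNKNOWN = "unknown"


def check_file(path, stored_hashes):
    """Classify one file against a dict of {path string: stored SHA-256 hex}."""
    stored = stored_hashes.get(str(path))
    if stored is None:
        return Status.UNKNOWN      # never tracked
    file_path = Path(path)
    if not file_path.exists():
        return Status.MISSING      # tracked, but no longer on disk
    current = hashlib.sha256(file_path.read_bytes()).hexdigest()
    return Status.VERIFIED if current == stored else Status.MISMATCH
```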
Key Functions
status() – Show changed items (like git status)
run(session_id) – Verify a specific run
chain(target_file) – Trace dependency chain
list_runs(limit, status) – List tracked runs
stats() – Database statistics
API Reference
See scitex.clew API Reference for the auto-generated Python API.