Repro Module (stx.repro)
Reproducibility utilities: random state management, ID generation, timestamps, and array hashing.
Quick Reference
import scitex as stx
# Fix all random seeds (numpy, torch, random, ...)
rng = stx.repro.get() # Global manager (seed=42)
rng = stx.repro.reset(seed=123) # Reset with new seed
# Named generators for deterministic results
data_gen = rng.get_np_generator("data")
data = data_gen.random(100) # Same seed+name = same result
# Unique identifiers
stx.repro.gen_id()
# → "2026Y-02M-13D-14h30m15s_a3Bc9xY2"
stx.repro.gen_timestamp()
# → "2026-0213-1430"
# Verify reproducibility
rng.verify(data, "train_data") # First: caches hash
rng.verify(data, "train_data") # Later: verifies match
RandomStateManager
Central class for managing random states across libraries.
rng = stx.repro.RandomStateManager(seed=42)
# Named generators (same name + seed = deterministic)
np_gen = rng.get_np_generator("experiment")
torch_gen = rng.get_torch_generator("model")
# Checkpoint and restore
rng.checkpoint("before_training")
rng.restore("before_training.pkl")
# Temporary seed change
with rng.temporary_seed(999):
    noise = rng.get_np_generator("noise").random(10)
Automatically fixes seeds for random, numpy, torch (including CUDA), tensorflow, and jax.
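The checkpoint, restore, and temporary-seed pattern can be illustrated with the standard library alone. This is a minimal sketch, not the RandomStateManager implementation: it assumes only Python's `random` module, and `temporary_seed` here is a hypothetical stand-alone helper that snapshots the global state, reseeds, and restores the snapshot on exit.

```python
import random
from contextlib import contextmanager


@contextmanager
def temporary_seed(seed):
    """Hypothetical helper: reseed `random` inside the block, then restore."""
    saved = random.getstate()      # snapshot the current generator state
    random.seed(seed)
    try:
        yield
    finally:
        random.setstate(saved)     # resume the outer stream where it paused


random.seed(42)
checkpoint = random.getstate()     # in-memory "checkpoint" of the outer stream
first = random.random()

with temporary_seed(999):
    noise_a = random.random()
with temporary_seed(999):
    noise_b = random.random()

second = random.random()

# Restoring the checkpoint replays the outer stream from the beginning.
random.setstate(checkpoint)
replay = [random.random(), random.random()]
```

In this sketch `noise_a` equals `noise_b`, and `replay` equals `[first, second]`, because the temporary blocks never advance the outer stream.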
Available Functions
- get(verbose) – Get or create the global RandomStateManager singleton
- reset(seed, verbose) – Reset the global instance with a new seed
- fix_seeds(seed, ...) – Legacy function (use RandomStateManager instead)
- gen_id(time_format, N) – Generate a unique ID from a timestamp plus random characters
- gen_timestamp() – Generate a timestamp string for file naming
- hash_array(array_data) – SHA256 hash of a NumPy array (16 characters)
API Reference
scitex-repro — Reproducibility utilities for scientific computing.
Provides tools for reproducible scientific computing:
- Random state management (RandomStateManager)
- ID generation (gen_ID)
- Timestamp generation (gen_timestamp)
- Array hashing (hash_array)
- scitex.repro.gen_ID(time_format='%YY-%mM-%dD-%Hh%Mm%Ss', N=8, *, now_fn=None)
Generate a unique identifier with timestamp and random characters.
Creates a unique ID by combining a formatted timestamp with random alphanumeric characters. Useful for creating unique experiment IDs, run identifiers, or temporary file names.
- Parameters:
time_format (str, optional) – Format string for timestamp portion. Default is “%YY-%mM-%dD-%Hh%Mm%Ss” which produces “2025Y-05M-31D-12h30m45s” format.
N (int, optional) – Number of random characters to append. Default is 8.
now_fn (callable, optional) – Zero-argument callable returning a datetime-like object with .strftime(). Defaults to datetime.now. Injection point for deterministic tests — pass a fake that returns a fixed datetime instead of mocking datetime globally.
- Returns:
Unique identifier in format “{timestamp}_{random_chars}”
- Return type:
str
Examples
>>> id1 = gen_id()
>>> print(id1)
2025Y-05M-31D-12h30m45s_a3Bc9xY2
>>> id2 = gen_id(time_format="%Y%m%d", N=4)
>>> print(id2)
20250531_xY9a
>>> # For experiment tracking
>>> exp_id = gen_id()
>>> save_path = f"results/experiment_{exp_id}.pkl"
- scitex.repro.gen_id(time_format='%YY-%mM-%dD-%Hh%Mm%Ss', N=8, *, now_fn=None)[source]
Generate a unique identifier with timestamp and random characters.
Creates a unique ID by combining a formatted timestamp with random alphanumeric characters. Useful for creating unique experiment IDs, run identifiers, or temporary file names.
- Parameters:
time_format (str, optional) – Format string for timestamp portion. Default is “%YY-%mM-%dD-%Hh%Mm%Ss” which produces “2025Y-05M-31D-12h30m45s” format.
N (int, optional) – Number of random characters to append. Default is 8.
now_fn (callable, optional) – Zero-argument callable returning a datetime-like object with .strftime(). Defaults to datetime.now. Injection point for deterministic tests — pass a fake that returns a fixed datetime instead of mocking datetime globally.
- Returns:
Unique identifier in format “{timestamp}_{random_chars}”
- Return type:
str
Examples
>>> id1 = gen_id()
>>> print(id1)
2025Y-05M-31D-12h30m45s_a3Bc9xY2
>>> id2 = gen_id(time_format="%Y%m%d", N=4)
>>> print(id2)
20250531_xY9a
>>> # For experiment tracking
>>> exp_id = gen_id()
>>> save_path = f"results/experiment_{exp_id}.pkl"
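The `now_fn` injection point is easiest to see with a fake clock. The sketch below does not call scitex: `sketch_gen_id` is a hypothetical stand-in that mirrors the documented signature using only the standard library, and `fixed_clock` is an illustrative fake. How the real function draws its random characters is not stated in this page, so `random.choices` over letters and digits is an assumption.

```python
import random
import string
from datetime import datetime


def sketch_gen_id(time_format="%YY-%mM-%dD-%Hh%Mm%Ss", N=8, *, now_fn=None):
    """Hypothetical stand-in mirroring the documented gen_id signature."""
    now = (now_fn or datetime.now)()           # fall back to the real clock
    alphabet = string.ascii_letters + string.digits
    suffix = "".join(random.choices(alphabet, k=N))
    return f"{now.strftime(time_format)}_{suffix}"


def fixed_clock():
    """Fake clock for tests: always returns the same datetime."""
    return datetime(2025, 5, 31, 12, 30, 45)


run_id = sketch_gen_id(now_fn=fixed_clock)
stamp, suffix = run_id.rsplit("_", 1)
print(stamp)        # 2025Y-05M-31D-12h30m45s
print(len(suffix))  # 8
```

Only the timestamp portion is pinned by the fake clock; the random suffix still varies between calls, so a test should assert on the prefix and on the suffix length rather than on the full ID.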
- scitex.repro.gen_timestamp(*, now_fn=None)[source]
Generate a timestamp string for file naming.
Returns a timestamp in the format YYYY-MMDD-HHMM, suitable for creating unique filenames or version identifiers.
- Parameters:
now_fn (callable, optional) – Zero-argument callable returning a datetime-like object with .strftime(). Defaults to datetime.now. Injection point for deterministic tests — pass a fake that returns a fixed datetime instead of mocking datetime globally.
- Returns:
Timestamp string in format “YYYY-MMDD-HHMM”
- Return type:
str
Examples
>>> timestamp = gen_timestamp()
>>> print(timestamp)
2025-0531-1230
>>> filename = f"experiment_{gen_timestamp()}.csv"
>>> print(filename)
experiment_2025-0531-1230.csv
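One practical property of the YYYY-MMDD-HHMM layout is that sorting filenames as strings also sorts them by time. The sketch below uses only `datetime`; the `strftime` pattern `"%Y-%m%d-%H%M"` is an assumption that reproduces the documented layout, not a value taken from the scitex source.

```python
from datetime import datetime

# Assumed strftime pattern for the documented "YYYY-MMDD-HHMM" layout.
TIMESTAMP_FORMAT = "%Y-%m%d-%H%M"

moments = [
    datetime(2025, 12, 1, 9, 5),
    datetime(2025, 5, 31, 12, 30),
    datetime(2024, 5, 31, 23, 59),
]
names = [f"experiment_{m.strftime(TIMESTAMP_FORMAT)}.csv" for m in moments]

# Zero-padded fields ordered from year down to minute make string order
# match chronological order.
chronological = [name for _, name in sorted(zip(moments, names))]
print(sorted(names) == chronological)  # True
print(sorted(names)[0])                # experiment_2024-0531-2359.csv
```

Because the format stops at minutes, two files stamped within the same minute get the same name; gen_id adds a random suffix for cases where that matters.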
- scitex.repro.timestamp(*, now_fn=None)
Generate a timestamp string for file naming.
Returns a timestamp in the format YYYY-MMDD-HHMM, suitable for creating unique filenames or version identifiers.
- Parameters:
now_fn (callable, optional) – Zero-argument callable returning a datetime-like object with .strftime(). Defaults to datetime.now. Injection point for deterministic tests — pass a fake that returns a fixed datetime instead of mocking datetime globally.
- Returns:
Timestamp string in format “YYYY-MMDD-HHMM”
- Return type:
str
Examples
>>> timestamp = gen_timestamp()
>>> print(timestamp)
2025-0531-1230
>>> filename = f"experiment_{gen_timestamp()}.csv"
>>> print(filename)
experiment_2025-0531-1230.csv
- scitex.repro.hash_array(array_data)[source]
Generate hash for array data.
Creates a deterministic hash for numpy arrays, useful for verifying data integrity and reproducibility.
- Parameters:
array_data (np.ndarray) – Array to hash
- Returns:
16-character hash string
- Return type:
str
Examples
>>> import numpy as np
>>> data = np.array([1, 2, 3, 4, 5])
>>> hash1 = hash_array(data)
>>> hash2 = hash_array(data)
>>> hash1 == hash2
True
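The idea behind a short deterministic hash can be sketched without NumPy. The page states only that hash_array returns a 16-character SHA256 hash; the assumption here is that the hash is taken over the raw data buffer and the hex digest is truncated. `sketch_hash` is a hypothetical helper, and the standard-library `array` type stands in for a NumPy array (for an `ndarray`, `arr.tobytes()` would supply the bytes).

```python
import hashlib
from array import array


def sketch_hash(raw: bytes, length: int = 16) -> str:
    """Hypothetical helper: SHA-256 hex digest of raw bytes, cut to `length` chars."""
    return hashlib.sha256(raw).hexdigest()[:length]


a = array("d", [1.0, 2.0, 3.0])
b = array("d", [1.0, 2.0, 3.0])   # equal contents, separate object
c = array("d", [1.0, 2.0, 3.5])   # one value changed

hash_a = sketch_hash(a.tobytes())
hash_b = sketch_hash(b.tobytes())
hash_c = sketch_hash(c.tobytes())
```

Here `hash_a` equals `hash_b` and differs from `hash_c`. Hashing the buffer alone ignores shape and dtype, so a reshaped array with the same bytes would hash identically in this sketch; whether hash_array accounts for that is not stated in this page.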
- class scitex.repro.RandomStateManager(seed=42, verbose=False)[source]
Simple, robust random state manager for scientific computing.
Examples
>>> from scitex_repro import RandomStateManager
>>>
>>> # Method 1: Direct usage
>>> rng = RandomStateManager(seed=42)
>>> data = rng("data").random(100)
>>>
>>> # Verify reproducibility
>>> rng.verify(data, "my_data")
- get_np_generator(name)[source]
Get or create a named NumPy random generator.
- Parameters:
name (str) – Generator name (e.g., “data”, “model”, “augment”)
- Returns:
Independent NumPy random generator
- Return type:
numpy.random.Generator
Examples
>>> rng = RandomStateManager(42)
>>> gen = rng.get_np_generator("data")
>>> values = gen.random(100)
>>> perm = gen.permutation(100)
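One common way to make "same seed + same name = same stream" hold is to derive each generator's seed from both values. The sketch below illustrates that idea with the standard library only; it is an assumption about the general technique, not a description of how RandomStateManager derives its seeds, and `named_generator` is a hypothetical helper returning `random.Random` rather than a NumPy generator.

```python
import hashlib
import random


def named_generator(seed: int, name: str) -> random.Random:
    """Hypothetical helper: derive an independent stream from (seed, name)."""
    digest = hashlib.sha256(f"{seed}:{name}".encode("utf-8")).digest()
    derived_seed = int.from_bytes(digest[:8], "big")   # stable across processes
    return random.Random(derived_seed)


gen_a = named_generator(42, "data")
gen_b = named_generator(42, "data")      # same seed and name
gen_c = named_generator(42, "augment")   # same seed, different name

draws_a = [gen_a.random() for _ in range(3)]
draws_b = [gen_b.random() for _ in range(3)]
draws_c = [gen_c.random() for _ in range(3)]
```

`draws_a` and `draws_b` are identical, while `draws_c` differs. The sketch uses `hashlib` instead of the built-in `hash()` because string hashes are salted per process by default, which would give a different derived seed on every run.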
- __call__(name, verbose=None)[source]
Get or create a named NumPy random generator.
This is a backward-compatibility wrapper for get_np_generator(). Prefer calling get_np_generator() directly for clarity.
- Parameters:
- Returns:
NumPy random generator with deterministic seed
- Return type:
numpy.random.Generator
- verify(obj, name=None, verbose=True)[source]
Verify object matches cached hash (detects broken reproducibility).
First call: caches the object’s hash.
Later calls: verify that the object matches the cached hash.
- Parameters:
obj (Any) – Object to verify (array, tensor, data, model weights, etc.). Supports NumPy arrays, torch tensors, TensorFlow tensors, JAX arrays, lists, dicts, pandas DataFrames, and basic types.
name (str, optional) – Cache name. Auto-generated if not provided.
- Returns:
True if matches cache (or first call), False if different
- Return type:
bool
Examples
>>> data = generate_data()
>>> rng.verify(data, "train_data")  # First run: caches
>>> # Next run:
>>> rng.verify(data, "train_data")  # Verifies match
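The cache-then-compare behavior of verify can be sketched in a few lines. This is an illustration under stated assumptions, not the scitex implementation: `HashCache` is a hypothetical class, it keeps hashes in memory rather than persisting them between runs, and it serializes objects with `json`, which limits it to lists, dicts, and basic types.

```python
import hashlib
import json


class HashCache:
    """Hypothetical in-memory cache: first call stores a hash, later calls compare."""

    def __init__(self):
        self._hashes = {}

    def verify(self, obj, name):
        # Canonical JSON keeps dict key order from affecting the hash.
        payload = json.dumps(obj, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(payload).hexdigest()
        if name not in self._hashes:
            self._hashes[name] = digest   # first call: cache and accept
            return True
        return self._hashes[name] == digest


cache = HashCache()
first_call = cache.verify([0.1, 0.2, 0.3], "train_data")
same_data = cache.verify([0.1, 0.2, 0.3], "train_data")
drifted = cache.verify([0.1, 0.2, 0.30001], "train_data")
```

`first_call` and `same_data` are `True`, and `drifted` is `False`: any change in the data, however small, changes the digest. Checking across separate runs, as the example above describes, would additionally require writing the cached hashes to disk.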
- get_sklearn_random_state(name)[source]
Get a random state for scikit-learn.
Scikit-learn accepts an integer for its random_state parameter.
Examples
>>> rng = RandomStateManager(42)
>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test = train_test_split(
...     X, test_size=0.2,
...     random_state=rng.get_sklearn_random_state("split")
... )
- get_torch_generator(name)[source]
Get or create a named PyTorch generator.
- Parameters:
name (str) – Generator name
- Returns:
PyTorch generator with deterministic seed
- Return type:
torch.Generator
Examples
>>> import torch
>>> rng = RandomStateManager(42)
>>> gen = rng.get_torch_generator("model")
>>> torch.randn(5, 5, generator=gen)
- scitex.repro.get(verbose=False)[source]
Get or create the global RandomStateManager instance.
- Parameters:
verbose (bool, optional) – Whether to print status messages (default: False)
- Returns:
Global instance
- Return type:
RandomStateManager
Examples
>>> from scitex_repro import get
>>> rng = get()
>>> data = rng("data").random(100)
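The "get or create" behavior of a global instance, together with reset, follows a familiar module-level singleton pattern. The sketch below shows that pattern in isolation; `Manager`, `get`, and `reset` are hypothetical stand-ins defined here, and the default seed of 42 is taken from the Quick Reference comment rather than from the scitex source.

```python
import random

# Module-level holder for the shared instance (empty until first use).
_STATE = {"instance": None}


class Manager:
    """Hypothetical stand-in for a seeded state manager."""

    def __init__(self, seed=42):
        self.seed = seed
        self.rng = random.Random(seed)


def get():
    """Return the shared instance, creating it with the default seed on first use."""
    if _STATE["instance"] is None:
        _STATE["instance"] = Manager()
    return _STATE["instance"]


def reset(seed):
    """Replace the shared instance with one built from a new seed."""
    _STATE["instance"] = Manager(seed)
    return _STATE["instance"]


a = get()
b = get()            # same object as `a`
c = reset(seed=123)  # new object; later get() calls return it
```

In this sketch, code that holds a reference obtained before `reset` keeps using the old instance, so calling `get()` again after a reset is the safer habit.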