Repro Module (stx.repro)

Reproducibility utilities: random state management, ID generation, timestamps, and array hashing.

Quick Reference

import scitex as stx

# Fix all random seeds (numpy, torch, random, ...)
rng = stx.repro.get()           # Global manager (seed=42)
rng = stx.repro.reset(seed=123) # Reset with new seed

# Named generators for deterministic results
data_gen = rng.get_np_generator("data")
data = data_gen.random(100)  # Same seed+name = same result

# Unique identifiers
stx.repro.gen_id()
# → "2026Y-02M-13D-14h30m15s_a3Bc9xY2"

stx.repro.gen_timestamp()
# → "2026-0213-1430"

# Verify reproducibility
rng.verify(data, "train_data")  # First: caches hash
rng.verify(data, "train_data")  # Later: verifies match

RandomStateManager

Central class for managing random states across libraries.

rng = stx.repro.RandomStateManager(seed=42)

# Named generators (same name + seed = deterministic)
np_gen = rng.get_np_generator("experiment")
torch_gen = rng.get_torch_generator("model")

# Checkpoint and restore
rng.checkpoint("before_training")
rng.restore("before_training.pkl")

# Temporary seed change
with rng.temporary_seed(999):
    noise = rng.get_np_generator("noise").random(10)

Automatically fixes seeds for: random, numpy, torch (+ CUDA), tensorflow, jax.
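
As a rough illustration of what fixing seeds across libraries involves, the sketch below seeds the standard library and, if they happen to be installed, NumPy and PyTorch. The helper name `seed_everything` is hypothetical and this is not the scitex implementation; it only relies on seeding calls those libraries document.

```python
import os
import random


def seed_everything(seed: int = 42) -> list[str]:
    """Seed each library that can be imported (hypothetical helper)."""
    seeded = []

    # Only affects subprocesses started afterwards; the running
    # interpreter fixed its own hash seed at startup.
    os.environ["PYTHONHASHSEED"] = str(seed)

    random.seed(seed)
    seeded.append("random")

    try:
        import numpy as np

        np.random.seed(seed)  # legacy global NumPy state
        seeded.append("numpy")
    except ImportError:
        pass

    try:
        import torch

        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
        seeded.append("torch")
    except ImportError:
        pass

    return seeded


seed_everything(123)
first = random.random()
seed_everything(123)
second = random.random()
print(first == second)  # True: the same seed replays the same stream
```

Returning the list of seeded libraries makes it easy to log which ones were actually detected on a given machine.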

Available Functions

  • get(verbose) – Get or create global RandomStateManager singleton

  • reset(seed, verbose) – Reset global instance with new seed

  • fix_seeds(seed, ...) – Legacy function (use RandomStateManager instead)

  • gen_id(time_format, N) – Generate unique timestamp + random ID

  • gen_timestamp() – Generate timestamp string for file naming

  • hash_array(array_data) – SHA256 hash of numpy array (16 chars)

API Reference

scitex-repro — Reproducibility utilities for scientific computing.

Provides tools for reproducible scientific computing:

  • Random state management (RandomStateManager)

  • ID generation (gen_ID)

  • Timestamp generation (gen_timestamp)

  • Array hashing (hash_array)

scitex.repro.gen_ID(time_format='%YY-%mM-%dD-%Hh%Mm%Ss', N=8, *, now_fn=None)

Documented identically to scitex.repro.gen_id (same parameters, return value, and examples); see that entry below.

scitex.repro.gen_id(time_format='%YY-%mM-%dD-%Hh%Mm%Ss', N=8, *, now_fn=None)

Generate a unique identifier with timestamp and random characters.

Creates a unique ID by combining a formatted timestamp with random alphanumeric characters. Useful for creating unique experiment IDs, run identifiers, or temporary file names.

Parameters:
  • time_format (str, optional) – Format string for timestamp portion. Default is “%YY-%mM-%dD-%Hh%Mm%Ss” which produces “2025Y-05M-31D-12h30m45s” format.

  • N (int, optional) – Number of random characters to append. Default is 8.

  • now_fn (callable, optional) – Zero-argument callable returning a datetime-like object with .strftime(). Defaults to datetime.now. Injection point for deterministic tests — pass a fake that returns a fixed datetime instead of mocking datetime globally.

Returns:

Unique identifier in format “{timestamp}_{random_chars}”

Return type:

str

Examples

>>> id1 = gen_id()
>>> print(id1)
2025Y-05M-31D-12h30m45s_a3Bc9xY2
>>> id2 = gen_id(time_format="%Y%m%d", N=4)
>>> print(id2)
20250531_xY9a
>>> # For experiment tracking
>>> exp_id = gen_id()
>>> save_path = f"results/experiment_{exp_id}.pkl"
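
The now_fn injection point can be illustrated with a small stand-in. The function `sketch_gen_id` and the helper `fixed_now` below are hypothetical, written only to mirror the documented signature; how scitex draws its random characters is not stated here, so `random.choices` over letters and digits is an assumption. In the default format, the letters following each strftime directive (Y, M, D, h, m, s) are literal characters.

```python
import random
import string
from datetime import datetime


def sketch_gen_id(time_format="%YY-%mM-%dD-%Hh%Mm%Ss", N=8, *, now_fn=None):
    """Illustrative stand-in for gen_id, not the scitex source."""
    now_fn = now_fn or datetime.now
    stamp = now_fn().strftime(time_format)
    # Assumption: the suffix is drawn from ASCII letters and digits.
    alphabet = string.ascii_letters + string.digits
    suffix = "".join(random.choices(alphabet, k=N))
    return f"{stamp}_{suffix}"


def fixed_now():
    """Fake clock for tests: always returns the same moment."""
    return datetime(2025, 5, 31, 12, 30, 45)


run_id = sketch_gen_id(now_fn=fixed_now)
stamp, suffix = run_id.rsplit("_", 1)
print(stamp)        # 2025Y-05M-31D-12h30m45s
print(len(suffix))  # 8
```

Passing a fake clock keeps the timestamp portion fixed in a test without patching datetime globally; the random suffix still varies unless the random module is seeded as well.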
scitex.repro.gen_timestamp(*, now_fn=None)

Generate a timestamp string for file naming.

Returns a timestamp in the format YYYY-MMDD-HHMM, suitable for creating unique filenames or version identifiers.

Parameters:

now_fn (callable, optional) – Zero-argument callable returning a datetime-like object with .strftime(). Defaults to datetime.now. Injection point for deterministic tests — pass a fake that returns a fixed datetime instead of mocking datetime globally.

Returns:

Timestamp string in format “YYYY-MMDD-HHMM”

Return type:

str

Examples

>>> timestamp = gen_timestamp()
>>> print(timestamp)
2025-0531-1230
>>> filename = f"experiment_{gen_timestamp()}.csv"
>>> print(filename)
experiment_2025-0531-1230.csv
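
One consequence of the YYYY-MMDD-HHMM format is that it stops at minutes. The sketch below is a hypothetical stand-in (`sketch_gen_timestamp` is not the scitex source) that assumes the format corresponds to the strftime pattern "%Y-%m%d-%H%M", and shows two calls inside the same minute producing the same string.

```python
from datetime import datetime


def sketch_gen_timestamp(*, now_fn=None):
    """Illustrative stand-in for gen_timestamp, not the scitex source."""
    now_fn = now_fn or datetime.now
    # Assumption: YYYY-MMDD-HHMM maps to this strftime pattern.
    return now_fn().strftime("%Y-%m%d-%H%M")


early = sketch_gen_timestamp(now_fn=lambda: datetime(2025, 5, 31, 12, 30, 5))
late = sketch_gen_timestamp(now_fn=lambda: datetime(2025, 5, 31, 12, 30, 55))
print(early)          # 2025-0531-1230
print(early == late)  # True: seconds are not part of the format
```

If filenames must differ between runs started within the same minute, gen_id, which appends random characters, is likely the safer choice.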
scitex.repro.timestamp(*, now_fn=None)

Documented identically to scitex.repro.gen_timestamp above (same parameter, return value, and examples).

scitex.repro.hash_array(array_data)

Generate hash for array data.

Creates a deterministic hash for numpy arrays, useful for verifying data integrity and reproducibility.

Parameters:

array_data (np.ndarray) – Array to hash

Returns:

16-character hash string

Return type:

str

Examples

>>> import numpy as np
>>> data = np.array([1, 2, 3, 4, 5])
>>> hash1 = hash_array(data)
>>> hash2 = hash_array(data)
>>> hash1 == hash2
True
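
A truncated SHA256 hash of this kind can be sketched with the standard library alone. `sketch_hash_array` below is hypothetical and not the scitex source: it reads the data through the buffer protocol (which NumPy arrays also expose) so that the example runs with the stdlib `array` type, and folding the element format and shape into the digest is an assumption of this sketch.

```python
import hashlib
from array import array


def sketch_hash_array(array_data) -> str:
    """Return the first 16 hex characters of a SHA256 digest."""
    view = memoryview(array_data)
    digest = hashlib.sha256()
    # Assumption: include element format and shape so that equal bytes
    # with a different layout do not collide.
    digest.update(repr((view.format, view.shape)).encode())
    digest.update(view.tobytes())
    return digest.hexdigest()[:16]


a = array("d", [1.0, 2.0, 3.0])
b = array("d", [1.0, 2.0, 3.0])
c = array("d", [1.0, 2.0, 3.5])
print(sketch_hash_array(a) == sketch_hash_array(b))  # True
print(sketch_hash_array(a) == sketch_hash_array(c))  # False
print(len(sketch_hash_array(a)))                     # 16
```

Sixteen hex characters keep 64 bits of the digest, which is ample for spotting accidental changes in data but is not a substitute for the full hash where tamper resistance matters.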
class scitex.repro.RandomStateManager(seed=42, verbose=False)

Simple, robust random state manager for scientific computing.

Examples

>>> from scitex_repro import RandomStateManager
>>>
>>> # Direct usage
>>> rng = RandomStateManager(seed=42)
>>> data = rng.get_np_generator("data").random(100)
>>>
>>> # Verify reproducibility
>>> rng.verify(data, "my_data")
__init__(seed=42, verbose=False)

Initialize the manager, detecting which supported libraries are installed and seeding each of them.

get_np_generator(name)

Get or create a named NumPy random generator.

Parameters:

name (str) – Generator name (e.g., “data”, “model”, “augment”)

Returns:

Independent NumPy random generator

Return type:

numpy.random.Generator

Examples

>>> rng = RandomStateManager(42)
>>> gen = rng.get_np_generator("data")
>>> values = gen.random(100)
>>> perm = gen.permutation(100)
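
One way to obtain "same name + seed = same result" is to derive each generator's seed from the base seed and the name. The sketch below is an assumption about how such a scheme could work, not the scitex implementation; `derive_seed` and `NamedStreams` are hypothetical names, and the stdlib `random.Random` stands in for a NumPy Generator so the example has no third-party dependencies.

```python
import hashlib
import random


def derive_seed(base_seed: int, name: str) -> int:
    """Map (base_seed, name) to a stable 32-bit integer."""
    # hashlib is used instead of the built-in hash(), whose result for
    # strings changes between interpreter runs.
    digest = hashlib.sha256(f"{base_seed}:{name}".encode()).digest()
    return int.from_bytes(digest[:4], "big")


class NamedStreams:
    """Get-or-create registry of independent named random streams."""

    def __init__(self, seed: int = 42):
        self.seed = seed
        self._streams = {}

    def get(self, name: str) -> random.Random:
        if name not in self._streams:
            self._streams[name] = random.Random(derive_seed(self.seed, name))
        return self._streams[name]


a = NamedStreams(42).get("data").random()
b = NamedStreams(42).get("data").random()
c = NamedStreams(42).get("model").random()
print(a == b)  # True: same seed and name replay the same stream
print(a == c)  # False: a different name gives a different stream
```

Because each name owns its own stream, adding a new consumer (say, "augment") does not shift the numbers drawn by existing ones. The derived integer could also be handed to any API that expects an integer seed.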
__call__(name, verbose=None)

Get or create a named NumPy random generator.

This is a backward-compatibility wrapper around get_np_generator(). Prefer calling get_np_generator() directly for clarity.

Parameters:
  • name (str) – Generator name

  • verbose (bool, optional) – Whether to show deprecation warning

Returns:

NumPy random generator with deterministic seed

Return type:

numpy.random.Generator

verify(obj, name=None, verbose=True)

Verify that an object matches its cached hash (detects broken reproducibility).

  • First call: caches the object’s hash.

  • Later calls: verify that the object matches the cached hash.

Parameters:
  • obj (Any) – Object to verify (array, tensor, model weights, etc.). Supported types: NumPy arrays, torch tensors, TensorFlow tensors, JAX arrays, lists, dicts, pandas DataFrames, and basic types.

  • name (str, optional) – Cache name. Auto-generated if not provided.

Returns:

True if the object matches the cached hash (or on the first call), False otherwise

Return type:

bool

Examples

>>> data = generate_data()
>>> rng.verify(data, "train_data")  # First run: caches
>>> # Next run:
>>> rng.verify(data, "train_data")  # Verifies match
checkpoint(name='checkpoint')

Save current state of all generators.

restore(checkpoint)

Restore from checkpoint.
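
The idea behind checkpointing generator state can be shown with the stdlib `random` module. The functions `sketch_checkpoint` and `sketch_restore` are hypothetical, and the pickle file layout is an assumption; what scitex actually stores in its checkpoint files is not described here.

```python
import pickle
import random
import tempfile
from pathlib import Path


def sketch_checkpoint(path: Path) -> None:
    """Write the current state of the global `random` generator to disk."""
    path.write_bytes(pickle.dumps(random.getstate()))


def sketch_restore(path: Path) -> None:
    """Load a saved state back into the global `random` generator."""
    random.setstate(pickle.loads(path.read_bytes()))


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "before_training.pkl"
    random.seed(42)
    sketch_checkpoint(ckpt)
    first = [random.random() for _ in range(3)]
    sketch_restore(ckpt)
    replay = [random.random() for _ in range(3)]

print(first == replay)  # True: restoring rewinds the stream
```

Restoring a state differs from reseeding: it resumes from the exact position where the checkpoint was taken, even if numbers had already been drawn before that point.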

temporary_seed(seed)

Context manager for temporary seed change.
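
A temporary seed change is typically built as save, reseed, restore. The sketch below does this for the stdlib `random` module only; `sketch_temporary_seed` is a hypothetical name, and the real method presumably covers every library the manager tracks.

```python
import random
from contextlib import contextmanager


@contextmanager
def sketch_temporary_seed(seed: int):
    """Reseed `random` inside the block, then put the old state back."""
    saved = random.getstate()
    random.seed(seed)
    try:
        yield
    finally:
        # Runs even if the block raises, so the outer stream is never lost.
        random.setstate(saved)


random.seed(42)
with sketch_temporary_seed(999):
    noise = random.random()
after = random.random()

random.seed(42)
print(after == random.random())  # True: the outer stream was not disturbed
```

The try/finally is the important part of the design: without it, an exception inside the block would leave the temporary seed in effect for the rest of the program.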

get_sklearn_random_state(name)

Get a random state for scikit-learn.

scikit-learn accepts an integer for its random_state parameter, so this method returns an integer rather than a generator object.

Parameters:

name (str) – Generator name

Returns:

Random state integer for sklearn

Return type:

int

Examples

>>> rng = RandomStateManager(42)
>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test = train_test_split(
...     X, test_size=0.2,
...     random_state=rng.get_sklearn_random_state("split")
... )
get_torch_generator(name)

Get or create a named PyTorch generator.

Parameters:

name (str) – Generator name

Returns:

PyTorch generator with deterministic seed

Return type:

torch.Generator

Examples

>>> import torch
>>> rng = RandomStateManager(42)
>>> gen = rng.get_torch_generator("model")
>>> torch.randn(5, 5, generator=gen)
get_generator(name)

Alias for get_np_generator for compatibility.

clear_cache(patterns=None)

Clear verification cache files.

Parameters:

patterns (str or list of str, optional) – Specific cache patterns to clear. If None, clears all.

Returns:

Number of cache files removed

Return type:

int

scitex.repro.get(verbose=False)

Get or create the global RandomStateManager instance.

Parameters:

verbose (bool, optional) – Whether to print status messages (default: False)

Returns:

Global instance

Return type:

RandomStateManager

Examples

>>> from scitex_repro import get
>>> rng = get()
>>> data = rng.get_np_generator("data").random(100)
scitex.repro.reset(seed=42, verbose=False)

Reset global RandomStateManager with new seed.

Parameters:
  • seed (int) – New seed value

  • verbose (bool, optional) – Whether to print status messages (default: False)

Returns:

New global instance

Return type:

RandomStateManager

Examples

>>> from scitex_repro import reset
>>> rng = reset(seed=123)
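
The relationship between get() and reset() follows the usual module-level singleton pattern, sketched below. `Manager` is a hypothetical stand-in that only stores a seed, and the module-level functions here are illustrative rather than the scitex source.

```python
class Manager:
    """Hypothetical stand-in for RandomStateManager; only stores the seed."""

    def __init__(self, seed: int = 42):
        self.seed = seed


_global_instance = None


def get() -> Manager:
    """Return the shared instance, creating it on first use."""
    global _global_instance
    if _global_instance is None:
        _global_instance = Manager(seed=42)
    return _global_instance


def reset(seed: int = 42) -> Manager:
    """Replace the shared instance with a new one built from `seed`."""
    global _global_instance
    _global_instance = Manager(seed=seed)
    return _global_instance


first = get()
print(first is get())   # True: repeated calls share one instance
fresh = reset(seed=123)
print(fresh is first)   # False: reset builds a new instance
print(get().seed)       # 123
```

In this sketch, a reference obtained before reset() keeps pointing at the old object, so code that wants the new seed should call get() again after a reset.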
scitex.repro.fix_seeds(seed=42, os=True, random=True, np=True, torch=True, tf=False, jax=False, verbose=False, **kwargs)

Deprecated: Use RandomStateManager instead.

This function maintains backward compatibility with the old fix_seeds API.