scitex.db API Reference
Database operations module for scitex.
PostgreSQL support is optional and requires psycopg2. All public
symbols are imported lazily via PEP 562 __getattr__ so that
import scitex_db stays under the §10 cold-start budget: Click
runs the CLI once per Tab press, and slow imports break tab-completion.
- class scitex.db.PostgreSQL(dbname=None, user=None, password=None, host='localhost', port=5432)[source]
Bases:
_BackupMixin, _BatchMixin, _ConnectionMixin, _ImportExportMixin, _IndexMixin, _MaintenanceMixin, _QueryMixin, _RowMixin, _SchemaMixin, _TableMixin, _TransactionMixin, _BlobMixin
- __call__(return_summary=False, print_summary=True, table_names=None, verbose=True, limit=5)[source]
Display or return database summary.
- property summary
Property to quickly access database summary.
- class scitex.db.SQLite3(db_path, use_temp=False, compress_by_default=False, autocommit=False)[source]
Bases:
_ArrayMixin, _ConnectionMixin, _QueryMixin, _TransactionMixin, _ColumnMixin, _TableMixin, _IndexMixin, _RowMixin, _BatchMixin, _BlobMixin, _ImportExportMixin, _MaintenanceMixin, _GitMixin
SQLite database manager with automatic metadata handling, numpy array storage, and compression.
This class provides a comprehensive interface for SQLite database operations with automatic compression, thread-safe operations, and specialized numpy array handling.
- Features:
Automatic compression for BLOB data (70-90% reduction)
Thread-safe operations with proper connection management
Metadata handling for BLOB columns
Batch processing support
Context manager support for proper resource cleanup
Examples
Basic usage with context manager (recommended):
>>> with SQLite3("data.db", compress_by_default=True) as db:
...     db.create_table("experiments", {"id": "INTEGER PRIMARY KEY", "data": "BLOB"})
...     data = np.random.random((1000, 100))
...     db.save_array("experiments", data, column="data", additional_columns={"id": 1})
Array storage and retrieval:
>>> with SQLite3("data.db") as db:
...     # Save numpy array
...     db.save_array(
...         table_name="measurements",
...         data=np.random.random((1000, 100)),
...         column="data",
...         additional_columns={"name": "experiment_1", "timestamp": 1234567890}
...     )
...     # Load array
...     loaded = db.load_array("measurements", "data", where="name = 'experiment_1'")
Generic object storage:
>>> with SQLite3("data.db") as db:
...     db.save_blob(
...         table_name="objects",
...         data={"weights": array, "params": {"lr": 0.001}},
...         key="model_v1"
...     )
...     loaded_obj = db.load_blob("objects", key="model_v1")
Notes
Always use a context manager (with statement) for proper resource cleanup
BLOB columns automatically get metadata columns: {column}_dtype, {column}_shape, {column}_compressed
Compression is enabled by default for arrays > 1KB
Thread-safe operations are supported
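To make the metadata-column convention in the notes above concrete, here is a minimal standard-library sketch. It is an assumption about how such a layout could work, not the scitex storage format: the helpers save_compressed and load_compressed, the 1024-byte threshold, and the use of array.array in place of numpy are all illustrative choices.

```python
import sqlite3
import zlib
from array import array


def save_compressed(conn, table, column, values, typecode="d", threshold=1024):
    """Store a flat numeric sequence as a BLOB plus three metadata columns."""
    raw = array(typecode, values).tobytes()
    compressed = len(raw) > threshold  # compress only payloads above 1 KB
    payload = zlib.compress(raw) if compressed else raw
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" ('
        f'id INTEGER PRIMARY KEY, "{column}" BLOB, '
        f'"{column}_dtype" TEXT, "{column}_shape" TEXT, '
        f'"{column}_compressed" INTEGER)'
    )
    cur = conn.execute(
        f'INSERT INTO "{table}" ("{column}", "{column}_dtype", '
        f'"{column}_shape", "{column}_compressed") VALUES (?, ?, ?, ?)',
        (payload, typecode, str((len(values),)), int(compressed)),
    )
    return cur.lastrowid


def load_compressed(conn, table, column, row_id):
    """Read the BLOB back, using the metadata columns to decode it."""
    payload, typecode, compressed = conn.execute(
        f'SELECT "{column}", "{column}_dtype", "{column}_compressed" '
        f'FROM "{table}" WHERE id = ?',
        (row_id,),
    ).fetchone()
    raw = zlib.decompress(payload) if compressed else payload
    out = array(typecode)
    out.frombytes(raw)
    return out.tolist()
```

Keeping the dtype, shape, and compression flag next to the BLOB lets a reader decode a row without any out-of-band information, which is presumably why the metadata columns are created automatically.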
- __init__(db_path, use_temp=False, compress_by_default=False, autocommit=False)[source]
Initialize SQLite database manager.
- Parameters:
db_path (str) – Path to the SQLite database file
use_temp (bool, optional) – Whether to use a temporary copy of the database, by default False
compress_by_default (bool, optional) – Whether to compress BLOB data by default when not explicitly specified, by default False
autocommit (bool, optional) – Whether to automatically commit transactions, by default False
Warning
- UserWarning
If not used with a context manager, warns about potential resource leaks
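What the autocommit flag changes can be sketched with the standard sqlite3 module. This is an illustration under assumptions, not the SQLite3 class itself: open_db, count_rows, and demo are hypothetical helpers, and mapping autocommit=True to isolation_level=None is one plausible way to implement the option.

```python
import os
import sqlite3
import tempfile


def open_db(path, autocommit=False):
    # isolation_level=None puts the stdlib driver in autocommit mode;
    # "DEFERRED" makes it open an implicit transaction before INSERT/UPDATE/DELETE.
    return sqlite3.connect(path, isolation_level=None if autocommit else "DEFERRED")


def count_rows(path):
    """Count rows in table t as seen by a separate connection."""
    reader = sqlite3.connect(path)
    try:
        return reader.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    finally:
        reader.close()


def demo(autocommit):
    """Return the row counts a second connection sees before and after commit()."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        writer = open_db(path, autocommit)
        writer.execute("CREATE TABLE t (x INTEGER)")
        writer.commit()
        writer.execute("INSERT INTO t VALUES (1)")
        before = count_rows(path)
        writer.commit()
        after = count_rows(path)
        writer.close()
        return before, after
```

With autocommit off, the inserted row stays invisible to other connections until commit() is called; with autocommit on, it is visible immediately, at the cost of losing the ability to roll back a group of statements.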
- __call__(return_summary=False, print_summary=True, table_names=None, verbose=True, limit=5)[source]
Display database summary information.
- Parameters:
return_summary (bool, optional) – Whether to return summary dict, by default False
print_summary (bool, optional) – Whether to print summary to console, by default True
table_names (Optional[List[str]], optional) – Specific table names to summarize, by default None (all tables)
verbose (bool, optional) – Whether to show detailed information, by default True
limit (int, optional) – Maximum number of rows to display per table, by default 5
- Returns:
Summary dictionary if return_summary=True, else None
- Return type:
dict or None
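A rough idea of what such a summary could contain is sketched below with the standard sqlite3 module. The function summarize and the shape of the returned dictionary are hypothetical; the actual keys returned by __call__ are not documented here.

```python
import sqlite3


def summarize(conn, table_names=None, limit=5):
    """Return {table: {"columns", "n_rows", "sample"}} for the given tables."""
    if table_names is None:
        # Default to every user table, skipping SQLite's internal ones.
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        ).fetchall()
        table_names = [row[0] for row in rows]

    summary = {}
    for name in table_names:
        quoted = '"' + name.replace('"', '""') + '"'
        cur = conn.execute(f"SELECT * FROM {quoted} LIMIT ?", (limit,))
        columns = [desc[0] for desc in cur.description]
        sample = cur.fetchall()  # at most `limit` rows
        n_rows = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        summary[name] = {"columns": columns, "n_rows": n_rows, "sample": sample}
    return summary
```

Here limit bounds only the sampled rows, while the row count still covers the whole table.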
- property summary
Quick access to database summary.
- scitex.db.batch_health_check(db_paths, verbose=False, fix_issues=False)[source]
Run a health check on multiple databases.
- scitex.db.check_health(db_path, verbose=True, fix_issues=False)[source]
Run a comprehensive health check on a SQLite database.
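The checks these functions perform are not listed here. As a hedged sketch of what a basic health check might involve, the following uses SQLite's built-in PRAGMA integrity_check; check_health_sketch and batch_health_check_sketch are hypothetical names, and the real functions may check more (or different) things.

```python
import sqlite3


def check_health_sketch(db_path):
    """Run PRAGMA integrity_check and report whether the file looks sound."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.DatabaseError as exc:
        # Raised, for example, when the file is not a SQLite database at all.
        return {"path": db_path, "ok": False, "messages": [str(exc)]}
    finally:
        conn.close()
    messages = [row[0] for row in rows]
    # SQLite reports a single row containing "ok" when no problems are found.
    return {"path": db_path, "ok": messages == ["ok"], "messages": messages}


def batch_health_check_sketch(db_paths):
    """Apply the single-file check to each path and collect the reports."""
    return {path: check_health_sketch(path) for path in db_paths}
```

Note that sqlite3.connect creates a missing file rather than failing, so a production check would likely verify that the path exists first.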
- scitex.db.delete_duplicates(*args, **kwargs)[source]
Delete duplicate entries from an SQLite database table.
Deprecated: This function is deprecated because it is SQLite3-specific. Use scitex.db._sqlite3.delete_sqlite3_duplicates() instead.
- scitex.db.delete_sqlite3_duplicates(lpath_db, table_name, columns='all', include_blob=False, chunk_size=10000, dry_run=True)[source]
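This entry carries no description, so the following is only a sketch of one common deduplication approach with a dry-run switch, written against the standard sqlite3 module. The helper delete_duplicates_sketch is hypothetical and ignores the include_blob and chunk_size options; it keeps the row with the smallest rowid in each group of identical values.

```python
import sqlite3


def _quote(identifier):
    """Quote an SQL identifier for safe interpolation."""
    return '"' + identifier.replace('"', '""') + '"'


def delete_duplicates_sketch(conn, table_name, columns, dry_run=True):
    """Count (and optionally delete) rows that repeat an earlier row's values."""
    table = _quote(table_name)
    cols = ", ".join(_quote(col) for col in columns)
    # Rows to keep: the first rowid seen for each distinct combination.
    keep = f"SELECT MIN(rowid) FROM {table} GROUP BY {cols}"
    n_duplicates = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE rowid NOT IN ({keep})"
    ).fetchone()[0]
    if not dry_run:
        conn.execute(f"DELETE FROM {table} WHERE rowid NOT IN ({keep})")
        conn.commit()
    return n_duplicates
```

Defaulting dry_run to True, as the documented signature does, means a first call only reports how many rows would be removed.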
- scitex.db.inspect(lpath_db, table_names=None, sample_size=5, skip_count=False, verbose=True)[source]
Optimized database inspection.
Example:
>>> inspect('path/to/database.db')
>>> inspect('path/to/database.db', ['table1'], skip_count=True)
- Returns:
List of inspection results