scitex.db API Reference

Database operations module for scitex.

PostgreSQL support is optional and requires psycopg2. All public symbols are imported lazily through a PEP 562 module-level __getattr__, so that import scitex_db stays within the §10 cold-start budget. This matters because Click re-runs the CLI on every Tab press, and slow imports break tab completion.

class scitex.db.PostgreSQL(dbname=None, user=None, password=None, host='localhost', port=5432)

Bases: _BackupMixin, _BatchMixin, _ConnectionMixin, _ImportExportMixin, _IndexMixin, _MaintenanceMixin, _QueryMixin, _RowMixin, _SchemaMixin, _TableMixin, _TransactionMixin, _BlobMixin

__call__(return_summary=False, print_summary=True, table_names=None, verbose=True, limit=5)

Display or return database summary.

property summary

Shortcut property for accessing the database summary.

class scitex.db.SQLite3(db_path, use_temp=False, compress_by_default=False, autocommit=False)

Bases: _ArrayMixin, _ConnectionMixin, _QueryMixin, _TransactionMixin, _ColumnMixin, _TableMixin, _IndexMixin, _RowMixin, _BatchMixin, _BlobMixin, _ImportExportMixin, _MaintenanceMixin, _GitMixin

SQLite database manager with automatic metadata handling, numpy array storage, and compression.

It wraps SQLite with configurable BLOB compression, thread-safe connection handling, and dedicated numpy array storage.

Features:
  • Automatic compression for BLOB data (typically a 70-90% size reduction; the actual ratio depends on the data)

  • Thread-safe operations with proper connection management

  • Metadata handling for BLOB columns

  • Batch processing support

  • Context manager support for proper resource cleanup

Examples

Basic usage with context manager (recommended):

>>> import numpy as np
>>> from scitex.db import SQLite3
>>> with SQLite3("data.db", compress_by_default=True) as db:
...     db.create_table("experiments", {"id": "INTEGER PRIMARY KEY", "data": "BLOB"})
...     data = np.random.random((1000, 100))
...     db.save_array("experiments", data, column="data", additional_columns={"id": 1})

Array storage and retrieval:

>>> with SQLite3("data.db") as db:
...     # Save numpy array
...     db.save_array(
...         table_name="measurements",
...         data=np.random.random((1000, 100)),
...         column="data",
...         additional_columns={"name": "experiment_1", "timestamp": 1234567890}
...     )
...     # Load array
...     loaded = db.load_array("measurements", "data", where="name = 'experiment_1'")

Generic object storage:

>>> with SQLite3("data.db") as db:
...     db.save_blob(
...         table_name="objects",
...         data={"weights": np.random.random((10, 10)), "params": {"lr": 0.001}},
...         key="model_v1"
...     )
...     loaded_obj = db.load_blob("objects", key="model_v1")

Notes

  • Always use the context manager (with statement) so that resources are cleaned up properly

  • BLOB columns automatically get metadata columns: {column}_dtype, {column}_shape, {column}_compressed

  • Compression is enabled by default for arrays > 1KB

  • Thread-safe operations are supported
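The metadata-column convention in the notes above can be illustrated with the standard sqlite3 and zlib modules plus numpy. This is a hypothetical sketch, not the scitex.db implementation: the helper names save_array_sketch and load_array_sketch, the zlib codec, the comma-separated shape encoding, and the exact 1 KB threshold test are all assumptions chosen to mirror the notes. Table and column names are interpolated directly, so they must be trusted identifiers.

```python
import sqlite3
import zlib

import numpy as np

# Hypothetical helpers; not the scitex.db implementation.
COMPRESS_THRESHOLD = 1024  # bytes; mirrors the "> 1KB" note (assumption)


def save_array_sketch(conn, table, column, arr, row_id):
    """Store arr as a BLOB plus {column}_dtype/_shape/_compressed metadata."""
    raw = arr.tobytes()
    compressed = len(raw) > COMPRESS_THRESHOLD
    blob = zlib.compress(raw) if compressed else raw
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ("
        f"id INTEGER PRIMARY KEY, {column} BLOB, "
        f"{column}_dtype TEXT, {column}_shape TEXT, {column}_compressed INTEGER)"
    )
    conn.execute(
        f"INSERT OR REPLACE INTO {table} "
        f"(id, {column}, {column}_dtype, {column}_shape, {column}_compressed) "
        "VALUES (?, ?, ?, ?, ?)",
        (row_id, blob, str(arr.dtype), ",".join(map(str, arr.shape)), int(compressed)),
    )


def load_array_sketch(conn, table, column, row_id):
    """Rebuild the array from the BLOB and its metadata columns."""
    blob, dtype, shape, compressed = conn.execute(
        f"SELECT {column}, {column}_dtype, {column}_shape, {column}_compressed "
        f"FROM {table} WHERE id = ?",
        (row_id,),
    ).fetchone()
    raw = zlib.decompress(blob) if compressed else blob
    dims = tuple(int(d) for d in shape.split(",")) if shape else ()
    return np.frombuffer(raw, dtype=dtype).reshape(dims)


conn = sqlite3.connect(":memory:")
data = np.random.random((100, 10))  # 8,000 bytes, above the threshold
save_array_sketch(conn, "experiments", "data", data, row_id=1)
restored = load_array_sketch(conn, "experiments", "data", 1)
```

Keeping dtype and shape in sibling columns lets the loader rebuild the array from raw bytes without pickling, which is the motivation the notes suggest for the {column}_* convention.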

__init__(db_path, use_temp=False, compress_by_default=False, autocommit=False)

Initialize SQLite database manager.

Parameters:
  • db_path (str) – Path to the SQLite database file

  • use_temp (bool, optional) – Whether to use a temporary copy of the database, by default False

  • compress_by_default (bool, optional) – Whether to compress BLOB data by default when not explicitly specified, by default False

  • autocommit (bool, optional) – Whether to automatically commit transactions, by default False

Warns:
  • UserWarning – Emitted when the instance is not used as a context manager, warning about potential resource leaks

__enter__()

Enter context manager.

__exit__(exc_type, exc_val, exc_tb)

Exit context manager and ensure proper cleanup.

__del__()

Destructor with context manager usage warning.
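The __enter__/__exit__/__del__ trio above follows a common Python pattern. The sketch below is hypothetical: the ManagedDB class and its conn attribute are illustrative, not scitex internals. It commits or rolls back and closes the connection in __exit__, and emits a UserWarning from __del__ only if the connection is still open, which happens when the object was never used in a with block.

```python
import sqlite3
import warnings


class ManagedDB:
    """Hypothetical wrapper showing the context-manager/destructor pattern."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Commit on success, roll back on error, always close.
        if exc_type is None:
            self.conn.commit()
        else:
            self.conn.rollback()
        self.conn.close()
        self.conn = None
        return False  # do not swallow exceptions

    def __del__(self):
        # An open connection here means __exit__ never ran.
        conn = getattr(self, "conn", None)
        if conn is not None:
            warnings.warn(
                "ManagedDB was not used as a context manager; "
                "closing the connection in __del__",
                UserWarning,
                stacklevel=2,
            )
            conn.close()
```

Relying on __del__ alone is fragile because its timing is not guaranteed outside CPython's reference counting, which is why the warning nudges callers toward the with statement.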

__call__(return_summary=False, print_summary=True, table_names=None, verbose=True, limit=5)

Display database summary information.

Parameters:
  • return_summary (bool, optional) – Whether to return summary dict, by default False

  • print_summary (bool, optional) – Whether to print summary to console, by default True

  • table_names (Optional[List[str]], optional) – Specific table names to summarize, by default None (all tables)

  • verbose (bool, optional) – Whether to show detailed information, by default True

  • limit (int, optional) – Maximum number of rows to display per table, by default 5

Returns:

Summary dictionary if return_summary=True, else None

Return type:

dict or None

property summary

Quick access to database summary.

scitex.db.batch_health_check(db_paths, verbose=False, fix_issues=False)

Run a health check on multiple databases.

Parameters:
  • db_paths (list of str) – List of database paths

  • verbose (bool, default False) – Print results for each database

  • fix_issues (bool, default False) – Attempt to fix issues

Returns:

Results for each database

Return type:

dict
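A hypothetical sketch of the batching pattern: run SQLite's built-in PRAGMA quick_check on each path and collect the outcome in a dict keyed by path. The function name, the per-database check, and the {"ok", "messages"} result layout are assumptions; the real function may well delegate to check_health instead. Each database is opened read-only so a mistyped path is reported rather than silently created.

```python
import sqlite3


def batch_health_check_sketch(db_paths, verbose=False):
    """Hypothetical: return {path: {"ok": bool, "messages": [...]}}."""
    results = {}
    for path in db_paths:
        try:
            # mode=ro fails on a missing file instead of creating an empty one.
            conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
            try:
                messages = [row[0] for row in conn.execute("PRAGMA quick_check")]
            finally:
                conn.close()
            ok = messages == ["ok"]
        except sqlite3.Error as exc:
            ok, messages = False, [str(exc)]
        results[path] = {"ok": ok, "messages": messages}
        if verbose:
            print(f"{path}: {'OK' if ok else 'FAILED'}")
    return results
```

Catching sqlite3.Error per path keeps one unreadable file from aborting the whole batch.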

scitex.db.check_health(db_path, verbose=True, fix_issues=False)

Run a comprehensive health check on an SQLite database.

Parameters:
  • db_path (str) – Path to database file

  • verbose (bool, default True) – Print detailed results

  • fix_issues (bool, default False) – Attempt to fix detected issues

Returns:

Health check results

Return type:

dict
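What a single-database health check might inspect can be sketched with standard SQLite PRAGMAs: integrity_check, foreign_key_check, page_count, and freelist_count. This is a hypothetical sketch; the report keys are invented, and treating fix_issues as "run VACUUM when free pages exist" is an assumption, since the reference does not say which issues the real function fixes. Note that VACUUM reclaims space but does not repair corruption.

```python
import sqlite3


def check_health_sketch(db_path, fix_issues=False):
    """Hypothetical health report built from SQLite PRAGMAs."""
    conn = sqlite3.connect(db_path)
    try:
        integrity = [row[0] for row in conn.execute("PRAGMA integrity_check")]
        fk_violations = conn.execute("PRAGMA foreign_key_check").fetchall()
        (page_count,) = conn.execute("PRAGMA page_count").fetchone()
        (free_pages,) = conn.execute("PRAGMA freelist_count").fetchone()
        report = {
            "integrity_ok": integrity == ["ok"],
            "integrity_messages": integrity,
            "foreign_key_violations": len(fk_violations),
            "page_count": page_count,
            "free_pages": free_pages,
            "fixed": [],
        }
        # Assumed "fix": reclaim unused pages. Corruption is not repaired here.
        if fix_issues and free_pages > 0:
            conn.execute("VACUUM")
            report["fixed"].append("vacuum")
        return report
    finally:
        conn.close()
```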

scitex.db.delete_duplicates(*args, **kwargs)

Delete duplicate entries from an SQLite database table.

Deprecated: this function is SQLite3-specific. Use scitex.db._sqlite3.delete_sqlite3_duplicates() instead.

scitex.db.delete_sqlite3_duplicates(lpath_db, table_name, columns='all', include_blob=False, chunk_size=10000, dry_run=True)

Return type:

Tuple[Optional[int], Optional[int]]
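The reference gives only the signature and return type for this function. As a hedged illustration, the sketch below keeps the first row (lowest rowid) of each duplicate group and interprets the returned tuple as (duplicates found, rows deleted), with rows deleted set to None on a dry run. That interpretation, the delete_duplicates_sketch name, the use of None instead of 'all' for "every column", and the rowid strategy are all assumptions; chunk_size and include_blob are not modeled, and WITHOUT ROWID tables are not supported.

```python
import sqlite3
from typing import List, Optional, Tuple


def _quote(identifier: str) -> str:
    """Quote an SQL identifier, doubling embedded double quotes."""
    return '"' + identifier.replace('"', '""') + '"'


def delete_duplicates_sketch(
    conn: sqlite3.Connection,
    table_name: str,
    columns: Optional[List[str]] = None,  # None means all columns (assumption)
    dry_run: bool = True,
) -> Tuple[Optional[int], Optional[int]]:
    table = _quote(table_name)
    if columns is None:
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    cols = ", ".join(_quote(c) for c in columns)
    # Any row that is not the lowest rowid in its group is a duplicate.
    keep = f"SELECT MIN(rowid) FROM {table} GROUP BY {cols}"
    (n_dupes,) = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE rowid NOT IN ({keep})"
    ).fetchone()
    if dry_run:
        return n_dupes, None
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(f"DELETE FROM {table} WHERE rowid NOT IN ({keep})")
    return n_dupes, cur.rowcount
```

A dry run defaulting to True, as in the documented signature, means the destructive DELETE only happens when the caller opts in explicitly.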

scitex.db.inspect(lpath_db, table_names=None, sample_size=5, skip_count=False, verbose=True)

Inspect the tables of an SQLite database, optimized for speed (row counting can be skipped for large tables).

Example:

>>> inspect('path/to/database.db')
>>> inspect('path/to/database.db', ['table1'], skip_count=True)

Parameters:
  • lpath_db (str) – Path to the SQLite database file

  • table_names (Optional[List[str]]) – List of table names to inspect (None for all)

  • sample_size (int) – Number of sample rows to retrieve

  • skip_count (bool) – Skip row counting for large tables (much faster)

  • verbose (bool) – Print inspection results

Returns:

List of inspection results

Return type:

List[Dict[str, Any]]
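The parameters above suggest a per-table loop: read the schema, optionally count rows, and fetch a few sample rows. The sketch below is hypothetical; the inspect_sketch name and the result-dict keys are assumptions, since the reference documents only List[Dict[str, Any]] as the return type, and printing (verbose) is omitted. It shows why skip_count helps: COUNT(*) scans the whole table, while a LIMIT query for samples stops early.

```python
import sqlite3
from typing import Any, Dict, List, Optional


def inspect_sketch(
    lpath_db: str,
    table_names: Optional[List[str]] = None,
    sample_size: int = 5,
    skip_count: bool = False,
) -> List[Dict[str, Any]]:
    """Hypothetical per-table report: schema, optional row count, samples."""
    conn = sqlite3.connect(lpath_db)
    try:
        if table_names is None:
            table_names = [
                r[0]
                for r in conn.execute(
                    "SELECT name FROM sqlite_master WHERE type = 'table' "
                    "AND name NOT LIKE 'sqlite_%' ORDER BY name"
                )
            ]
        results = []
        for name in table_names:
            quoted = '"' + name.replace('"', '""') + '"'
            # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
            columns = [
                {"name": c[1], "type": c[2], "pk": bool(c[5])}
                for c in conn.execute(f"PRAGMA table_info({quoted})")
            ]
            # COUNT(*) scans the whole table; skip_count avoids that cost.
            row_count = None
            if not skip_count:
                (row_count,) = conn.execute(
                    f"SELECT COUNT(*) FROM {quoted}"
                ).fetchone()
            sample = conn.execute(
                f"SELECT * FROM {quoted} LIMIT ?", (sample_size,)
            ).fetchall()
            results.append(
                {"table": name, "columns": columns,
                 "row_count": row_count, "sample": sample}
            )
        return results
    finally:
        conn.close()
```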