Stats Module (`stx.stats`)

23 statistical tests with effect sizes, confidence intervals, and publication-ready formatting.

Quick Reference

import scitex as stx
import numpy as np

g1 = np.random.normal(0, 1, 50)
g2 = np.random.normal(0.5, 1, 50)

# Run a test
result = stx.stats.test_ttest_ind(g1, g2)

# Result is a flat dict with all details
print(result["statistic"])     # t-statistic
print(result["pvalue"])        # p-value
print(result["effect_size"])   # Cohen's d
print(result["stars"])         # Significance stars

# Get as DataFrame or LaTeX
df = stx.stats.test_ttest_ind(g1, g2, return_as="dataframe")
tex = stx.stats.test_ttest_ind(g1, g2, return_as="latex")

Available Tests

Parametric

test_ttest_1samp(data, popmean) – One-sample t-test
test_ttest_ind(g1, g2) – Independent two-sample t-test
test_ttest_rel(g1, g2) – Paired t-test
test_anova(*groups) – One-way ANOVA
test_anova_2way(data, factor1, factor2) – Two-way ANOVA
test_anova_rm(data, groups) – Repeated measures ANOVA

Non-parametric

test_mannwhitneyu(g1, g2) – Mann-Whitney U test
test_wilcoxon(g1, g2) – Wilcoxon signed-rank test
test_kruskal(*groups) – Kruskal-Wallis H test
test_friedman(*groups) – Friedman test
test_brunner_munzel(g1, g2) – Brunner-Munzel test

Correlation

test_pearson(x, y) – Pearson correlation
test_spearman(x, y) – Spearman rank correlation
test_kendall(x, y) – Kendall tau correlation
test_theilsen(x, y) – Theil-Sen robust regression

Categorical

test_chi2(observed) – Chi-squared test
test_fisher(table) – Fisher’s exact test
test_mcnemar(table) – McNemar test
test_cochran_q(*groups) – Cochran’s Q test

Normality

test_shapiro(data) – Shapiro-Wilk test
test_ks_1samp(data) – One-sample Kolmogorov-Smirnov test
test_ks_2samp(x, y) – Two-sample Kolmogorov-Smirnov test
test_normality(*samples) – Multi-sample normality check

Seaborn-Style Data Parameter

All two-sample and one-sample tests accept an optional data parameter for DataFrame/CSV column resolution (like seaborn):

import pandas as pd

df = pd.read_csv("experiment.csv")

# Two-sample: column names as x/y
result = stx.stats.test_ttest_ind(x="before", y="after", data=df)

# One-sample: column name as x
result = stx.stats.test_shapiro(x="scores", data=df)

# Multi-group: value + group columns
result = stx.stats.test_anova(data=df, value_col="score", group_col="treatment")

# Also works with CSV path
result = stx.stats.test_ttest_ind(x="col1", y="col2", data="data.csv")

Test Recommendation

Not sure which test to use? Let SciTeX recommend:

ctx = stx.stats.StatContext(
    n_groups=2,
    sample_sizes=[30, 35],
    outcome_type="continuous",
    design="between",
    paired=False,
)
recommendations = stx.stats.recommend_tests(ctx)

Output Formats

Every test supports a return_as parameter:

"dict" (default) – Returns plain dict with all results
"dataframe" – Returns pandas DataFrame

Descriptive Statistics

stx.stats.describe(data)                    # mean, std, median, IQR, etc.
stx.stats.effect_sizes.cohens_d(g1, g2)     # Cohen's d
stx.stats.test_normality(g1, g2)            # Multi-sample normality
stx.stats.p_to_stars(0.003)                 # "**"
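
For intuition, Cohen's d for two independent samples is commonly defined as the mean difference divided by the pooled standard deviation. The sketch below illustrates that textbook formula with the standard library only. The helper name cohens_d_pooled is hypothetical, and the exact variant used by stx.stats.effect_sizes.cohens_d is not confirmed here.

```python
import math
import statistics


def cohens_d_pooled(a, b):
    """Textbook Cohen's d with a pooled standard deviation (illustrative helper)."""
    n_a, n_b = len(a), len(b)
    var_a = statistics.variance(a)  # sample variance (ddof=1)
    var_b = statistics.variance(b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


d = cohens_d_pooled([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(round(d, 3))  # -0.632
```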

Multiple Comparison Correction

corrected = stx.stats.correct_pvalues(
    [0.01, 0.03, 0.05, 0.001],
    method="fdr_bh",
)
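
The name "fdr_bh" conventionally denotes the Benjamini-Hochberg procedure. Assuming that is what correct_pvalues applies here, the following standard-library sketch shows how the adjusted p-values would be derived for the same input. The function benjamini_hochberg is illustrative, not part of the package.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.03, 0.05, 0.001])
print([round(p, 3) for p in adjusted])  # [0.02, 0.04, 0.05, 0.004]
```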

Post-hoc Tests

g3 = np.random.normal(1.0, 1, 50)  # third group for illustration

stx.stats.posthoc_test(
    [g1, g2, g3],
    group_names=["Control", "Treatment A", "Treatment B"],
    method="tukey",
)

API Reference

scitex-stats — Publication-ready statistical testing framework.

Functionalities

run_test(name, …) — single dispatcher across 23 tests (parametric, nonparametric, correlation, categorical, normality) returning a unified result dict (statistic, pvalue, effect_size, power, formatted, …).
recommend_tests(StatContext(…)) — design-driven test selection from number of groups, sample sizes, outcome type, paired vs between.
effect_sizes, power, correct, posthoc, descriptive, auto — submodules exposing the primitives behind run_test (Cohen’s d / Cliff’s delta / eta-sq / sample-size-ttest / Bonferroni / FDR / Tukey HSD / Dunn / …).
APA / Nature / LaTeX formatting via result["formatted"].

IO

Reads: numeric arrays (numpy.ndarray, pandas.DataFrame, pandas.Series, sequences); optional .env walk-up via scitex-config; runtime cache under $SCITEX_DIR/stats/runtime/.
Writes: nothing by default — pure functions returning result dicts. Caller persists via scitex_io.save(…) if desired.

Dependencies

Hard: numpy, scipy, pandas, scitex-dev, scitex-config, scitex-logging.
Optional ([plot]): matplotlib. ([mcp]): fastmcp. ([figrecipe]): figrecipe.

Standalone import:

import scitex_stats as ss
result = ss.run_test("ttest_ind", data=g1, data2=g2)
print(result["formatted"])  # APA-style summary

CLI: scitex-stats <command>. MCP: 10 tools for AI agents.

scitex.stats.run_test(test_name, data=None, data2=None, groups=None, alternative='two-sided', plot=False, popmean=0, return_as='dict', json_safe=True, **kwargs)

Run a statistical test by name and return a normalised result dict.

Parameters:

test_name (str) – Name of the test (e.g. "ttest_ind", "anova"). See available_tests() for the full list including aliases.
data (array-like, optional) – Primary data array.
data2 (array-like, optional) – Second data array (for two-sample / paired tests).
groups (list of array-like, optional) – List of group arrays (for ANOVA, Kruskal, etc.).
alternative (str, default "two-sided") – Alternative hypothesis for applicable tests.
plot (bool, default False) – Whether to generate a plot.
popmean (float, default 0) – Population mean for one-sample t-test.
return_as (str, default "dict") – Passed through to the underlying test function.
json_safe (bool, default True) – If True, apply to_json_safe() to the result.
**kwargs (Any) – Additional keyword arguments forwarded to the test function.

Returns:

Test result dictionary. When json_safe is True, all numpy scalars are converted to Python types and a formatted key is added.

Return type:

dict

Raises:

ValueError – If test_name is not recognised.
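
A name-based dispatcher of this kind can be pictured as a lookup table from names (and aliases) to callables. The sketch below is a generic illustration under that assumption. REGISTRY, _mean_diff, and run_by_name are hypothetical names and do not reflect how run_test is actually organised internally.

```python
def _mean_diff(data, data2):
    """Stand-in for a real test function (hypothetical)."""
    return {"statistic": sum(data) / len(data) - sum(data2) / len(data2)}


# Hypothetical registry: a canonical name and an alias map to the same callable.
REGISTRY = {"mean_diff": _mean_diff, "md": _mean_diff}


def run_by_name(test_name, **kwargs):
    """Look up a test by name and call it, rejecting unknown names."""
    try:
        func = REGISTRY[test_name]
    except KeyError:
        raise ValueError(
            f"Unknown test: {test_name!r}; choose from {sorted(REGISTRY)}"
        ) from None
    return func(**kwargs)


result = run_by_name("md", data=[2, 4], data2=[1, 1])
print(result["statistic"])  # 2.0
```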

scitex.stats.available_tests()

Return sorted list of canonical test names accepted by run_test.

Returns: Accepted test name strings (including aliases).
Return type: list of str

scitex.stats.describe(x, axis=-1, dim=None, keepdims=False, funcs=None, device=None, batch_size=-1)

Compute descriptive statistics.

Parameters:

x (array-like) – Input data (numpy array or torch tensor)
axis (int, default=-1) – Deprecated. Use dim instead
dim (int or tuple of ints, optional) – Dimension(s) along which to compute statistics
keepdims (bool, default=False) – Whether to keep reduced dimensions
funcs (list of str or "all", optional) – Statistical functions to compute. Clean names (mean, std, median, etc.) use nan-safe implementations. Legacy nan-prefixed names (nanmean, nanstd, etc.) are also accepted. If None, uses default: ["mean", "std", "kurtosis", "skewness", "q25", "median", "q75"].
device (optional) – Device for torch tensors (ignored for numpy)
batch_size (int, default=-1) – Batch size for processing (currently unused)

Returns:

Computed statistics stacked along last dimension and their names

Return type:

Tuple[ndarray or Tensor, List[str]]

scitex.stats.to_json_safe(result)

Convert a test result dict to JSON-serializable Python types.

Handles numpy scalar/array conversion, inf/nan to None, key aliases for common frontend expectations, and builds a formatted summary string.

Parameters: result (dict) – Raw test result dictionary from any stx.stats.test_*() function.
Returns: JSON-safe dict with consistent keys and a formatted summary.
Return type: dict

Examples

>>> import numpy as np
>>> raw = {"pvalue": np.float64(0.023), "statistic": np.float64(2.45),
...        "stat_symbol": "t", "effect_size": np.float64(0.85),
...        "effect_size_metric": "Cohen's d", "stars": "*"}
>>> safe = to_json_safe(raw)
>>> type(safe["pvalue"])
<class 'float'>
>>> "p_value" in safe
True
>>> "formatted" in safe
True
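
The inf/nan handling matters because strict JSON has no representation for those values. The following standard-library sketch shows the general idea of replacing non-finite floats with None before serialising. The sanitize helper is hypothetical and covers only this one aspect; it does not add key aliases or a formatted summary.

```python
import json
import math


def sanitize(value):
    """Recursively replace non-finite floats with None so the output is valid JSON."""
    if isinstance(value, dict):
        return {key: sanitize(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [sanitize(item) for item in value]
    if isinstance(value, float) and not math.isfinite(value):
        return None
    return value


raw = {"statistic": float("inf"), "pvalue": 0.023, "ci95": (float("nan"), 1.8)}
print(json.dumps(sanitize(raw)))
# {"statistic": null, "pvalue": 0.023, "ci95": [null, 1.8]}
```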

class scitex.stats.Stats(analyses=<factory>, software=<factory>, schema_name='fsb.stats', schema_version='1.1.0')

Complete statistics specification for a bundle.

Stored in stats/stats.json.

analyses: List[Analysis]

software: Dict[str, str]

schema_name: str = 'fsb.stats'

schema_version: str = '1.1.0'

to_dict()

Return type: Dict[str, Any]

to_json(indent=2)

Return type: str

classmethod from_dict(data)

Return type: Stats

classmethod from_json(json_str)

Return type: Stats

scitex.stats.test_result_to_stats(result)

Convert a test-result dict to the canonical Stats schema.

Parameters:

result (dict) –

Test result dictionary. Supports both formats:

Legacy flat format:

{"name": "Control vs Treatment", "method": "t-test",
 "p_value": 0.003, "effect_size": 1.21, "ci95": [0.5, 1.8]}

New nested format (from test functions):

{"method": {"name": "t-test", "variant": "independent"},
 "results": {"statistic": 2.5, "statistic_name": "t",
             "p_value": 0.01}}

Returns:

Stats object suitable for bundle storage.

Return type:

Stats

Raises:

ImportError – If scitex-io is not installed.

scitex.stats.save_stats(comparisons, path, metadata=None, as_zip=False)

Save statistical results as a SciTeX bundle (kind="stats").

Parameters:

comparisons (list of dict, or Stats) – List of comparison-result dicts (each converted via test_result_to_stats()), or an already-built Stats object.
path (str or Path) – Output bundle path (directory, or .zip when as_zip).
metadata (dict, optional) – Currently unused placeholder kept for API stability.
as_zip (bool, optional) – If True, save as a ZIP archive (.zip suffix enforced).

Returns:

Path to the saved bundle.

Return type:

Path

Raises:

ImportError – If scitex-io is not installed.

scitex.stats.load_stats(path)

Load a stats bundle into a flat, plot-friendly dict.

Parameters: path (str or Path) – Path to the bundle.
Returns: {"comparisons": [...], "metadata": {...}} where each comparison is a flat dict (name, method, p_value, effect_size, ci95, formatted).
Return type: dict
Raises: ImportError – If scitex-io is not installed.

scitex.stats.to_figrecipe(stats_result)

Convert scitex.stats result(s) to figrecipe format.

Parameters: stats_result (dict or list of dict) – Statistical result(s) from scitex.stats functions.
Returns: Figrecipe-compatible format with ‘comparisons’ list.
Return type: dict

scitex.stats.annotate(ax, stats, positions=None, style='stars', **kwargs)

Add statistical annotations to a plot.

Parameters:

ax (Axes or AxisWrapper) – The axes to annotate.
stats (dict or list of dict) – Statistical results (auto-converted if needed).
positions (dict, optional) – Group name to x position mapping.
style (str) – ‘stars’, ‘p_value’, or ‘both’.

Returns:

Created matplotlib artist objects.

Return type:

list

scitex.stats.load_and_annotate(ax, path, positions=None, style='stars', **kwargs)

Load stats from bundle file and annotate plot.

Parameters:

ax (Axes or AxisWrapper) – The axes to annotate.
path (str) – Path to .statsz or .zip bundle.
positions (dict, optional) – Group name to x position mapping.
style (str) – ‘stars’, ‘p_value’, or ‘both’.

Returns:

Created matplotlib artist objects.

Return type:

list

class scitex.stats.StatContext(n_groups, sample_sizes, outcome_type, design, paired=None, has_control_group=False, n_factors=1, normality_ok=None, variance_homogeneity_ok=None, missing_allowed=False, group_names=None, control_group_name=None)

Statistical context for determining which tests are applicable.

This dataclass captures all the information needed to decide which statistical tests can be applied to the current data/figure context. It is used by check_applicable() to filter the test registry.

Parameters:

n_groups (int) – Number of groups/levels to compare (e.g., 2 for A vs B).
sample_sizes (list of int) – Sample sizes per group in the same order as the groups.
outcome_type (OutcomeType) – Type of outcome variable:
    - "continuous": numeric, interval/ratio scale
    - "ordinal": ordered categories, ranks
    - "binary": 0/1 or yes/no
    - "categorical": nominal with >= 2 categories
design (DesignType) – Overall experimental design:
    - "between": independent groups
    - "within": repeated measures / paired
    - "mixed": mixed design (some within, some between)
paired (bool or None) – Whether the comparison is explicitly paired. If None, inferred from design.
has_control_group (bool) – Whether a control group is identifiable (for Dunnett etc.).
n_factors (int) – Number of factors; 1 for one-way, 2 for two-way, etc.
normality_ok (bool or None) – Result of normality check if available. True = normal, False = non-normal, None = unknown.
variance_homogeneity_ok (bool or None) – Result of homogeneity test (e.g., Levene). True = homogeneous, False = heteroscedastic, None = unknown.
missing_allowed (bool) – Whether missing data is allowed for the chosen method.
group_names (list of str, optional) – Names of the groups for display purposes.
control_group_name (str, optional) – Name of the control group if has_control_group is True.

Examples

>>> # Two-group independent comparison
>>> ctx = StatContext(
...     n_groups=2,
...     sample_sizes=[30, 32],
...     outcome_type="continuous",
...     design="between",
...     paired=False,
...     has_control_group=False,
...     n_factors=1
... )

>>> # Three-group repeated measures
>>> ctx = StatContext(
...     n_groups=3,
...     sample_sizes=[20, 20, 20],
...     outcome_type="continuous",
...     design="within",
...     paired=True,
...     has_control_group=True,
...     n_factors=1,
...     control_group_name="baseline"
... )

>>> # Record the outcome of assumption checks once they are known
>>> ctx.normality_ok = True
>>> ctx.variance_homogeneity_ok = True

n_groups: int

sample_sizes: List[int]

outcome_type: Literal['continuous', 'ordinal', 'binary', 'categorical']

design: Literal['between', 'within', 'mixed']

paired: bool | None = None

has_control_group: bool = False

n_factors: int = 1

normality_ok: bool | None = None

variance_homogeneity_ok: bool | None = None

missing_allowed: bool = False

group_names: List[str] | None = None

control_group_name: str | None = None

__post_init__(): Validate and set defaults.

property n_total: int: Total sample size across all groups.

property min_n_per_group: int: Minimum sample size per group.

property effective_paired: bool | None

Effective paired status, considering design.

Returns: True if paired/within, False if unpaired/between, None if unknown.
Return type: bool or None
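
One plausible reading of this rule is sketched below with a plain function. That an explicit paired flag takes precedence over design, and that "mixed" resolves to None, are assumptions made for illustration and are not confirmed by this reference.

```python
def effective_paired(paired, design):
    """Resolve paired status: an explicit flag wins, otherwise infer from design."""
    if paired is not None:
        return paired
    if design == "within":
        return True
    if design == "between":
        return False
    return None  # "mixed" or anything else: unknown


print(effective_paired(None, "within"))   # True
print(effective_paired(None, "between"))  # False
print(effective_paired(None, "mixed"))    # None
```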

to_dict()

Convert to dictionary for JSON serialization.

Return type: Dict[str, Any]

classmethod from_dict(data)

Create from dictionary.

Return type: StatContext

classmethod from_data(y, group, design='between', outcome_type=None, **kwargs)

Create StatContext from data arrays.

Parameters:

y (np.ndarray) – Outcome values.
group (np.ndarray) – Group labels for each observation.
design (DesignType) – Experimental design.
outcome_type (OutcomeType, optional) – Type of outcome. If None, inferred from data.
**kwargs – Additional arguments for StatContext.

Returns:

Context built from the data.

Return type:

StatContext

Examples

>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> group = np.array(['A', 'A', 'A', 'B', 'B', 'B'])
>>> ctx = StatContext.from_data(y, group)
>>> ctx.n_groups
2
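
The group-related fields can be derived from the label array alone. As a rough illustration of that step (not the actual from_data logic), the hypothetical helper below counts observations per label with the standard library.

```python
from collections import Counter


def summarize_groups(labels):
    """Count observations per group label, preserving first-seen order."""
    counts = Counter(labels)  # Counter keeps insertion order (Python 3.7+)
    return {
        "n_groups": len(counts),
        "group_names": list(counts),
        "sample_sizes": list(counts.values()),
    }


summary = summarize_groups(["A", "A", "A", "B", "B", "B"])
print(summary["n_groups"], summary["sample_sizes"])  # 2 [3, 3]
```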

class scitex.stats.TestRule(name, family, min_groups, max_groups, outcome_types, supports_paired, supports_unpaired, design_allowed, requires_control_group, min_n_total, min_n_per_group, needs_normality, needs_equal_variance, min_factors, max_factors, priority=0, description='')

Applicability rule for a specific statistical test.

Each TestRule defines the conditions under which a test is applicable. The check_applicable() function uses these rules to filter tests for a given StatContext.

Parameters:

name (str) – Internal name of the test (e.g., “ttest_ind”, “brunner_munzel”).
family (TestFamily) – High-level family of the test:
    - "parametric": t-test, ANOVA, etc.
    - "nonparametric": Mann-Whitney, Kruskal-Wallis, etc.
    - "categorical": Chi-square, Fisher’s exact, etc.
    - "correlation": Pearson, Spearman, etc.
    - "normality": Shapiro-Wilk, etc.
    - "effect_size": Cohen’s d, eta-squared, etc.
    - "posthoc": Tukey, Dunnett, etc.
    - "other": Other tests (Levene, etc.)
min_groups (int) – Minimum required number of groups.
max_groups (int or None) – Maximum allowed number of groups. None means no upper bound.
outcome_types (set of str) – Allowed outcome types for this test.
supports_paired (bool) – Whether the test supports paired/repeated measures.
supports_unpaired (bool) – Whether the test supports independent groups.
design_allowed (set of str) – Allowed designs, e.g., {“between”, “within”}.
requires_control_group (bool) – Whether a dedicated control group is required (e.g., Dunnett).
min_n_total (int or None) – Minimum total sample size. None means no constraint.
min_n_per_group (int or None) – Minimum sample size per group.
needs_normality (bool) – Whether test assumes normality (check normality_ok).
needs_equal_variance (bool) – Whether test assumes equal variances (check variance_homogeneity_ok).
min_factors (int or None) – Minimum number of factors.
max_factors (int or None) – Maximum number of factors.
priority (int) – Priority score for recommendation. Higher = more recommended. Brunner-Munzel has priority 110 as the recommended default for 2 groups.
description (str) – Human-readable description for tooltips.

Examples

>>> rule = TestRule(
...     name="ttest_ind",
...     family="parametric",
...     min_groups=2,
...     max_groups=2,
...     outcome_types={"continuous"},
...     supports_paired=False,
...     supports_unpaired=True,
...     design_allowed={"between"},
...     requires_control_group=False,
...     min_n_total=4,
...     min_n_per_group=2,
...     needs_normality=True,
...     needs_equal_variance=False,
...     min_factors=1,
...     max_factors=1,
...     priority=90,
...     description="Independent samples t-test (Welch)"
... )

name: str

family: Literal['parametric', 'nonparametric', 'categorical', 'correlation', 'normality', 'effect_size', 'posthoc', 'other']

min_groups: int

max_groups: int | None

outcome_types: Set[str]

supports_paired: bool

supports_unpaired: bool

design_allowed: Set[str]

requires_control_group: bool

min_n_total: int | None

min_n_per_group: int | None

needs_normality: bool

needs_equal_variance: bool

min_factors: int | None

max_factors: int | None

priority: int = 0

description: str = ''

class scitex.stats.StatStyle(id, label, target, stat_symbol_format=<factory>, p_format='p = {p:.3f}', alpha_thresholds=<factory>, effect_label_format=<factory>, n_format='n_{%s} = %d', decimal_places_p=3, decimal_places_stat=2, decimal_places_effect=2)

Style configuration for statistical reporting.

Defines how to format statistical results for a specific journal or output format.

Parameters:

id (str) – Unique identifier for this style.
label (str) – Human-readable label (e.g., “APA (LaTeX)”).
target (OutputTarget) – Output format: “latex”, “html”, or “plain”.
stat_symbol_format (dict) – Maps statistic symbols to their formatted versions.
p_format (str) – Format string for p-values.
alpha_thresholds (list of (float, str)) – P-value thresholds for stars.
effect_label_format (dict) – Maps effect size names to their formatted labels.
n_format (str) – Format string for sample sizes.
decimal_places_p (int) – Decimal places for p-values.
decimal_places_stat (int) – Decimal places for test statistics.
decimal_places_effect (int) – Decimal places for effect sizes.

id: str

label: str

target: Literal['latex', 'html', 'plain']

stat_symbol_format: Dict[str, str]

p_format: str = 'p = {p:.3f}'

alpha_thresholds: List[Tuple[float, str]]

effect_label_format: Dict[str, str]

n_format: str = 'n_{%s} = %d'

decimal_places_p: int = 3

decimal_places_stat: int = 2

decimal_places_effect: int = 2

format_stat(symbol, value, df=None)

Format a test statistic.

Parameters:

symbol (str) – Statistic symbol (e.g., “t”, “F”, “chi2”).
value (float) – Statistic value.
df (float, optional) – Degrees of freedom.

Returns:

Formatted statistic string.

Return type:

str

format_p(p_value)

Format a p-value.

Parameters: p_value (float) – P-value to format.
Returns: Formatted p-value string.
Return type: str

format_effect(name, value)

Format an effect size.

Parameters:

name (str) – Effect size name (e.g., “cohens_d_ind”).
value (float) – Effect size value.

Returns:

Formatted effect size string.

Return type:

str

format_n(group, n)

Format a sample size.

Parameters:

group (str) – Group name/label.
n (int) – Sample size.

Returns:

Formatted sample size string.

Return type:

str
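
The default p_format and n_format strings in the class signature use two different Python formatting syntaxes. Judging by that syntax alone, p_format looks like a str.format template and n_format like a %-style template. The sketch below applies them directly; whether format_p and format_n do exactly this internally is an assumption.

```python
p_format = "p = {p:.3f}"   # default p_format from the signature
n_format = "n_{%s} = %d"   # default n_format from the signature

p_text = p_format.format(p=0.0234)  # str.format-style template
n_text = n_format % ("A", 30)       # %-style template; braces stay literal

print(p_text)  # p = 0.023
print(n_text)  # n_{A} = 30
```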

p_to_stars(p_value)

Convert p-value to significance stars.

Parameters: p_value (float) – P-value.
Returns: Stars string ("***", "**", "*", or "ns").
Return type: str

scitex.stats.recommend_tests(ctx, top_k=3, families=None)

Recommend tests for the given context.

Returns test names sorted by priority. Brunner-Munzel is the recommended default for 2-group comparisons (priority 110).

Parameters:

ctx (StatContext) – Context inferred from figure/data.
top_k (int) – Number of top tests to return.
families (list of TestFamily or None) – Families to consider. If None, uses standard test families (parametric, nonparametric, categorical, correlation).

Returns:

test_names – Internal names of recommended tests, sorted by priority.

Return type:

list of str

Examples

>>> ctx = StatContext(
...     n_groups=2,
...     sample_sizes=[30, 32],
...     outcome_type="continuous",
...     design="between",
...     paired=False,
...     has_control_group=False,
...     n_factors=1
... )
>>> recommended = recommend_tests(ctx, top_k=3)
>>> "brunner_munzel" in recommended
True
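
Ranking by priority amounts to a descending sort followed by truncation to top_k. The sketch below illustrates only that step. The priorities for brunner_munzel (110) and ttest_ind (90) are taken from this reference, while the mannwhitneyu value and the top_tests helper are made up for illustration.

```python
# brunner_munzel (110) and ttest_ind (90) follow the values quoted in this reference;
# the mannwhitneyu priority is a made-up placeholder.
applicable = {"ttest_ind": 90, "brunner_munzel": 110, "mannwhitneyu": 80}


def top_tests(priorities, top_k=3):
    """Return test names sorted by descending priority, truncated to top_k."""
    ranked = sorted(priorities, key=priorities.get, reverse=True)
    return ranked[:top_k]


print(top_tests(applicable, top_k=2))  # ['brunner_munzel', 'ttest_ind']
```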

scitex.stats.check_applicable(rule, ctx)

Check whether a given statistical test is applicable to the context.

This function evaluates all conditions in the TestRule against the StatContext and returns both the result and human-readable reasons for any failures (suitable for tooltips).

Parameters:

rule (TestRule) – The rule definition for a specific test.
ctx (StatContext) – The context inferred from the figure and data.

Return type:

Tuple[bool, List[str]]

Returns:

ok (bool) – True if applicable, False otherwise.
reasons (list of str) – If not applicable, human-readable reasons for tooltips.

Examples

>>> from scitex_stats.auto import StatContext, TEST_RULES, check_applicable
>>> ctx = StatContext(
...     n_groups=2,
...     sample_sizes=[30, 32],
...     outcome_type="continuous",
...     design="between",
...     paired=False,
...     has_control_group=False,
...     n_factors=1
... )
>>> rule = TEST_RULES["ttest_ind"]
>>> ok, reasons = check_applicable(rule, ctx)
>>> ok
True

>>> ctx.normality_ok = False
>>> ok, reasons = check_applicable(rule, ctx)
>>> ok
False
>>> "normality" in reasons[0].lower()
True
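
The pattern of returning both a verdict and a list of reasons can be sketched with two reduced dataclasses. MiniContext, MiniRule, and check are hypothetical stand-ins covering only a few fields, and the reason strings are invented. Treating an unknown normality result (None) as acceptable is an assumption that matches the doctest above, where the rule passes before normality_ok is set.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MiniContext:
    n_groups: int
    normality_ok: Optional[bool] = None


@dataclass
class MiniRule:
    min_groups: int
    max_groups: Optional[int]
    needs_normality: bool


def check(rule: MiniRule, ctx: MiniContext) -> Tuple[bool, List[str]]:
    """Collect a human-readable reason for every violated condition."""
    reasons = []
    if ctx.n_groups < rule.min_groups:
        reasons.append(f"Requires at least {rule.min_groups} groups")
    if rule.max_groups is not None and ctx.n_groups > rule.max_groups:
        reasons.append(f"Allows at most {rule.max_groups} groups")
    # Unknown normality (None) is not treated as a failure in this sketch.
    if rule.needs_normality and ctx.normality_ok is False:
        reasons.append("Normality assumption violated")
    return (not reasons, reasons)


rule = MiniRule(min_groups=2, max_groups=2, needs_normality=True)
print(check(rule, MiniContext(n_groups=2)))  # (True, [])
print(check(rule, MiniContext(n_groups=2, normality_ok=False)))
# (False, ['Normality assumption violated'])
```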

scitex.stats.get_stat_style(style_id)

Look up a statistical reporting style by its ID.

Parameters: style_id (str) – Style identifier (e.g., “apa_latex”, “nature_html”).
Returns: The requested style, or APA LaTeX as fallback.
Return type: StatStyle

Examples

>>> style = get_stat_style("apa_latex")
>>> style.label
'APA (LaTeX)'

>>> style = get_stat_style("unknown")  # Falls back to APA
>>> style.id
'apa_latex'

scitex.stats.p_to_stars(p_value, style=None)

Convert p-value to significance stars.

Uses the alpha thresholds from the specified style.

Parameters:

p_value (float or None) – P-value to convert.
style (str or StatStyle, optional) – Style to use for thresholds.

Returns:

Stars string ("***", "**", "*", or "ns").

Return type:

str

Examples

>>> p_to_stars(0.001)
'***'
>>> p_to_stars(0.03)
'*'
>>> p_to_stars(0.10)
'ns'
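
A threshold table of (alpha, label) pairs, like the alpha_thresholds field of StatStyle, can be scanned in order to pick the stars. The sketch below assumes the conventional cut-offs 0.001, 0.01, and 0.05 with an inclusive comparison, chosen so that it reproduces the doctest outputs above. The real thresholds depend on the selected style, and stars_for is a hypothetical name.

```python
# Assumed thresholds, checked in order; a style's alpha_thresholds may differ.
THRESHOLDS = [(0.001, "***"), (0.01, "**"), (0.05, "*")]


def stars_for(p_value, thresholds=THRESHOLDS):
    """Return the label of the first threshold that p_value does not exceed."""
    for alpha, label in thresholds:
        if p_value <= alpha:
            return label
    return "ns"


print(stars_for(0.001), stars_for(0.003), stars_for(0.03), stars_for(0.10))
# *** ** * ns
```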

scitex.stats.test_ttest_ind(x, y, var_x='x', var_y='y', alternative='two-sided', equal_var=True, alpha=0.05, plot=False, ax=None, data=None, return_as='dict', verbose=False)

Perform independent samples t-test.

Parameters:

x (array or Series) – First sample
y (array or Series) – Second sample
var_x (str, default 'x') – Label for first sample
var_y (str, default 'y') – Label for second sample
alternative ({'two-sided', 'greater', 'less'}, default 'two-sided') – Alternative hypothesis:
    - 'two-sided': means are different
    - 'greater': mean of x is greater than mean of y
    - 'less': mean of x is less than mean of y
equal_var (bool, default True) – Assume equal population variances (Student’s t-test). If False, use Welch’s t-test.
alpha (float, default 0.05) – Significance level
plot (bool, default False) – Whether to generate visualization
ax (matplotlib.axes.Axes, optional) – Axes object to plot on. If None and plot=True, creates new figure. If provided, automatically enables plotting.
data (DataFrame, str, or None, optional) – DataFrame or CSV path. When provided, string values for x/y are resolved as column names (seaborn-style).
return_as ({'dict', 'dataframe'}, default 'dict') – Output format
verbose (bool, default False) – Whether to print test results

Returns:

results – Test results including:
    - test_method: Name of test performed
    - statistic: t-statistic value
    - pvalue: p-value
    - stars: Significance stars
    - significant: Whether null hypothesis is rejected
    - effect_size: Cohen’s d
    - power: Statistical power
    - n_x, n_y: Sample sizes
    - var_x, var_y: Variable labels
    - H0: Null hypothesis description

Return type:

dict or DataFrame

Notes

The independent samples t-test compares means of two independent groups.

Null hypothesis: μ_x = μ_y
Alternative (two-sided): μ_x ≠ μ_y

The t-statistic is computed as:

\[t = \frac{\bar{x} - \bar{y}}{s_p \sqrt{\frac{1}{n_x} + \frac{1}{n_y}}}\]

where $s_p$ is the pooled standard deviation.

For Welch’s t-test (unequal variances), the denominator uses separate variances and degrees of freedom are adjusted.
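
To make the pooled formula concrete, the following standard-library sketch evaluates it by hand for the two small samples used in the Examples. It illustrates the arithmetic only; it is not the function's implementation and omits the p-value.

```python
import math
import statistics

x = [1, 2, 3, 4, 5]
y = [2, 3, 4, 5, 6]
n_x, n_y = len(x), len(y)

# Pooled standard deviation s_p from the two sample variances (ddof=1).
s_p = math.sqrt(
    ((n_x - 1) * statistics.variance(x) + (n_y - 1) * statistics.variance(y))
    / (n_x + n_y - 2)
)
t_stat = (statistics.mean(x) - statistics.mean(y)) / (s_p * math.sqrt(1 / n_x + 1 / n_y))
df = n_x + n_y - 2  # degrees of freedom for the pooled (Student) test

print(round(t_stat, 3), df)  # -1.0 8
```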

Examples

>>> x = np.array([1, 2, 3, 4, 5])
>>> y = np.array([2, 3, 4, 5, 6])
>>> result = test_ttest_ind(x, y)
>>> result['pvalue']
0.346...

>>> # With auto-created figure
>>> result = test_ttest_ind(x, y, plot=True)

>>> # Plot on existing axes
>>> fig, ax = plt.subplots()
>>> result = test_ttest_ind(x, y, ax=ax)

>>> # As DataFrame
>>> df = test_ttest_ind(x, y, return_as='dataframe')
>>> df['stars'].iloc[0]
'ns'

scitex.stats.test_ttest_rel(x, y, var_x='before', var_y='after', alternative='two-sided', alpha=0.05, plot=False, ax=None, data=None, return_as='dict')

Perform paired samples t-test (related/dependent samples).

Parameters:

x (array or Series) – First sample (e.g., pre-test, baseline)
y (array or Series) – Second sample (e.g., post-test, follow-up) Must have same length as x
var_x (str, default 'before') – Label for first sample
var_y (str, default 'after') – Label for second sample
alternative ({'two-sided', 'greater', 'less'}, default 'two-sided') – Alternative hypothesis:
    - 'two-sided': means differ
    - 'greater': mean(x - y) > 0
    - 'less': mean(x - y) < 0
alpha (float, default 0.05) – Significance level
plot (bool, default False) – Whether to generate visualization
ax (matplotlib.axes.Axes, optional) – Axes object to plot on. If None and plot=True, creates new figure. If provided, automatically enables plotting.
data (DataFrame, str, or None, optional) – DataFrame or CSV path. When provided, string values for x/y are resolved as column names (seaborn-style).
return_as ({'dict', 'dataframe'}, default 'dict') – Output format

Returns:

results – Test results (same structure as test_ttest_ind)

Return type:

dict or DataFrame

Notes

The paired t-test compares means of matched observations (within-subjects).

When to use:
- Before-after measurements on same subjects
- Matched pairs (twins, siblings, matched controls)
- Repeated measures at two time points

Assumptions:
- Differences (x - y) are normally distributed
- Pairs are independent across subjects
- No assumption about equality of variances

The test statistic is:

\[t = \frac{\bar{d}}{s_d / \sqrt{n}}\]

where $\bar{d}$ is the mean difference and $s_d$ is the SD of the differences.

Effect size (Cohen’s d for paired samples):

\[d = \frac{\bar{d}}{s_d}\]

This measures the standardized change from baseline.
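
The two paired-sample formulas can be checked by hand with the standard library. The data below are chosen so that the differences vary, because a constant difference gives s_d = 0 and leaves t undefined. This is an arithmetic illustration, not the function's implementation.

```python
import math
import statistics

before = [10, 12, 15, 18, 20]
after = [12, 15, 17, 21, 22]

diffs = [b - a for b, a in zip(before, after)]  # x - y for each pair
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)  # sample SD of the differences (ddof=1)

t_stat = d_bar / (s_d / math.sqrt(len(diffs)))
cohens_d = d_bar / s_d

print(round(t_stat, 3), round(cohens_d, 3))  # -9.798 -4.382
```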

Examples

>>> before = np.array([10, 12, 15, 18, 20])
>>> after = np.array([12, 15, 17, 21, 22])
>>> result = test_ttest_rel(before, after)
>>> result['pvalue'] < 0.05
True

>>> # With visualization
>>> fig, ax = plt.subplots()
>>> result = test_ttest_rel(before, after, ax=ax)
>>> plt.show()

scitex.stats.test_ttest_1samp(x, popmean=0, var_x='sample', alternative='two-sided', alpha=0.05, plot=False, ax=None, data=None, return_as='dict')

Perform one-sample t-test.

Parameters:

x (array or Series) – Sample data
popmean (float, default 0) – Expected population mean (null hypothesis value)
var_x (str, default 'sample') – Label for sample
alternative ({'two-sided', 'greater', 'less'}, default 'two-sided') – Alternative hypothesis:
    - 'two-sided': mean ≠ popmean
    - 'greater': mean > popmean
    - 'less': mean < popmean
alpha (float, default 0.05) – Significance level
plot (bool, default False) – Whether to generate visualization
ax (matplotlib.axes.Axes, optional) – Axes object to plot on. If provided, plots visualization on given axes.
data (DataFrame, str, or None, optional) – DataFrame or CSV path. When provided, string value for x is resolved as a column name (seaborn-style).
return_as ({'dict', 'dataframe'}, default 'dict') – Output format

Returns:

results – Test results

Return type:

dict or DataFrame

Notes

The one-sample t-test compares sample mean to a known population mean.

When to use:
- Test if sample mean differs from theoretical/known value
- Compare observed data to standard/reference value
- Test if mean differs from zero (common in difference scores)

Assumptions:
- Data are normally distributed
- Observations are independent

The test statistic is:

\[t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}\]

where $\mu_0$ is the hypothesized population mean.

Effect size (Cohen’s d for one sample):

\[d = \frac{\bar{x} - \mu_0}{s}\]
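
As a hand check of the one-sample formulas, this standard-library sketch computes t and d for the sample [1, 2, 3, 4, 5] against a hypothesized mean of 0. It shows the arithmetic only and does not compute a p-value.

```python
import math
import statistics

x = [1, 2, 3, 4, 5]
mu_0 = 0  # hypothesized population mean

s = statistics.stdev(x)  # sample SD (ddof=1)
t_stat = (statistics.mean(x) - mu_0) / (s / math.sqrt(len(x)))
cohens_d = (statistics.mean(x) - mu_0) / s

print(round(t_stat, 3), round(cohens_d, 3))  # 4.243 1.897
```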

Examples

>>> # Test if sample mean differs from 0
>>> x = np.array([1, 2, 3, 4, 5])
>>> result = test_ttest_1samp(x, popmean=0)
>>> result['pvalue']
0.013...

>>> # Test if sample mean differs from 100
>>> scores = np.array([95, 98, 102, 105, 108])
>>> result = test_ttest_1samp(scores, popmean=100)

scitex.stats.test_anova(groups=None, var_names=None, alpha=0.05, check_assumptions=True, plot=False, ax=None, data=None, value_col=None, group_col=None, return_as='dict', decimals=3, verbose=False)

Perform one-way ANOVA for independent samples.

Parameters:

groups (list of arrays) – List of sample arrays for each group (minimum 2 groups)
var_names (list of str, optional) – Names for each group. If None, uses ‘Group 1’, ‘Group 2’, etc.
alpha (float, default 0.05) – Significance level
check_assumptions (bool, default True) – Whether to check normality and homogeneity assumptions
plot (bool, default False) – Whether to generate visualization
ax (matplotlib.axes.Axes, optional) – Axes object to plot on. If None and plot=True, creates new figure. If provided, automatically enables plotting.
data (DataFrame, str, or None, optional) – DataFrame or CSV path. When provided with value_col and group_col, groups are extracted automatically (seaborn-style).
value_col (str, optional) – Column containing measurement values (used with data=).
group_col (str, optional) – Column containing group labels (used with data=).
return_as ({'dict', 'dataframe'}, default 'dict') – Output format
decimals (int, default 3) – Number of decimal places for rounding
verbose (bool, default False) – Whether to print test results

Returns:

results – Test results including: - test_method: ‘One-way ANOVA’ - statistic: F-statistic value - pvalue: p-value - stars: Significance stars - significant: Whether null hypothesis is rejected - effect_size: Eta-squared (η²) - effect_size_metric: ‘eta-squared’ - effect_size_interpretation: Interpretation of eta-squared - n_groups: Number of groups - n_samples: Sample sizes for each group - df_between: Degrees of freedom between groups - df_within: Degrees of freedom within groups - var_names: Group labels - assumptions_met: Whether assumptions are satisfied - H0: Null hypothesis description

Return type:

dict or DataFrame

Notes

One-way ANOVA (Analysis of Variance) tests whether samples from different groups have the same population mean.

Null Hypothesis (H0): All groups have equal population means

Alternative Hypothesis (H1): At least one group mean differs

Assumptions:

1. Independence: Observations within and between groups are independent
2. Normality: Data in each group are normally distributed
   - Can be checked with test_shapiro()
   - Robust to moderate violations with large samples (n > 30 per group)
3. Homogeneity of variance: Groups have equal population variances
   - Can be checked with Levene’s test
   - If violated, consider Welch’s ANOVA or non-parametric alternative

When assumptions are violated:

- Non-normality: Use test_kruskal() (Kruskal-Wallis test)
- Unequal variances: Use Welch’s ANOVA (not yet implemented)
- Outliers present: Use test_kruskal() or remove outliers

F-Statistic:

\[F = \frac{MS_{between}}{MS_{within}} = \frac{SS_{between}/(k-1)}{SS_{within}/(N-k)}\]

Where: - k: Number of groups - N: Total sample size - SS: Sum of squares - MS: Mean square

Effect Size (Eta-squared):

\[\eta^2 = \frac{SS_{between}}{SS_{total}}\]

Interpretation: - η² < 0.01: negligible - η² < 0.06: small - η² < 0.14: medium - η² ≥ 0.14: large
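
As a worked illustration of the F-statistic and eta-squared formulas, the sketch below computes both from the sums of squares and cross-checks F against `scipy.stats.f_oneway`. It does not call `test_anova`; it only mirrors the formulas stated in these notes.

```python
import numpy as np
from scipy import stats

groups = [
    np.array([1, 2, 3, 4, 5], dtype=float),
    np.array([3, 4, 5, 6, 7], dtype=float),
    np.array([5, 6, 7, 8, 9], dtype=float),
]

k = len(groups)                              # number of groups
N = sum(g.size for g in groups)              # total sample size
grand_mean = np.concatenate(groups).mean()

# Between-group and within-group sums of squares
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_total = ss_between + ss_within

f_manual = (ss_between / (k - 1)) / (ss_within / (N - k))
eta_squared = ss_between / ss_total
p_manual = stats.f.sf(f_manual, k - 1, N - k)

# Cross-check against SciPy
f_scipy, p_scipy = stats.f_oneway(*groups)
print(f"F = {f_manual:.2f}, p = {p_manual:.4f}, eta^2 = {eta_squared:.3f}")
```

For these data SS_between = 40 and SS_within = 30, so F = 8 and η² = 40/70, which falls in the "large" band of the interpretation scale.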

Post-hoc tests: If significant, perform pairwise comparisons with correction: - test_ttest_ind() for all pairs (if assumptions met) - test_brunner_munzel() for all pairs (robust alternative) - correct_bonferroni() or correct_fdr() for multiple comparisons
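
The post-hoc procedure can be sketched as follows. This is an illustration under stated assumptions: it uses `scipy.stats.ttest_ind` and a hand-written Bonferroni adjustment instead of `test_ttest_ind()` and `correct_bonferroni()`, whose exact call signatures are not shown in this section.

```python
from itertools import combinations

import numpy as np
from scipy import stats

groups = {
    "Control": np.array([1, 2, 3, 4, 5], dtype=float),
    "Treatment 1": np.array([3, 4, 5, 6, 7], dtype=float),
    "Treatment 2": np.array([5, 6, 7, 8, 9], dtype=float),
}

# All unordered pairs of group labels
pairs = list(combinations(groups, 2))

# Raw p-values from independent two-sample t-tests
raw_p = [stats.ttest_ind(groups[a], groups[b]).pvalue for a, b in pairs]

# Bonferroni: multiply each p-value by the number of comparisons, cap at 1
adj_p = [min(p * len(pairs), 1.0) for p in raw_p]

for (a, b), p_raw, p_adj in zip(pairs, raw_p, adj_p):
    print(f"{a} vs {b}: p = {p_raw:.4f}, Bonferroni-adjusted p = {p_adj:.4f}")
```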

Examples

>>> # Three groups with different means
>>> group1 = np.array([1, 2, 3, 4, 5])
>>> group2 = np.array([3, 4, 5, 6, 7])
>>> group3 = np.array([5, 6, 7, 8, 9])
>>> result = test_anova([group1, group2, group3])
>>> result['significant']
True

>>> # With auto-created figure
>>> result = test_anova(
...     [group1, group2, group3],
...     var_names=['Control', 'Treatment 1', 'Treatment 2'],
...     plot=True
... )

>>> # Plot on existing axes
>>> import matplotlib.pyplot as plt
>>> fig, ax = plt.subplots()
>>> result = test_anova([group1, group2, group3], ax=ax)

>>> # Export results
>>> from scitex_stats.utils._normalizers import convert_results
>>> convert_results(result, return_as='excel', path='anova_results.xlsx')

scitex.stats.test_anova_rm(data, subject_col=None, condition_col=None, value_col=None, condition_names=None, alpha=0.05, correction='auto', check_sphericity=True, plot=False, ax=None, return_as='dict', decimals=3, verbose=False)[source]

Perform repeated measures ANOVA for within-subjects designs.

Parameters:

data (array or DataFrame) –
- If array: shape (n_subjects, n_conditions), wide format
- If DataFrame with subject_col/condition_col: long format
- If DataFrame without: wide format (rows=subjects, cols=conditions)
subject_col (str, optional) – Column name for subject IDs (long format)
condition_col (str, optional) – Column name for conditions (long format)
value_col (str, optional) – Column name for values (long format)
condition_names (list of str, optional) – Names for conditions (wide format)
alpha (float, default 0.05) – Significance level
correction ({'auto', 'none', 'gg'}, default 'auto') – Correction method: - ‘auto’: Apply GG correction if sphericity violated - ‘none’: No correction - ‘gg’: Always apply Greenhouse-Geisser correction
check_sphericity (bool, default True) – Whether to test sphericity assumption
plot (bool, default False) – Whether to generate profile plot
ax (matplotlib.axes.Axes, optional) – Axes object to plot on. If None and plot=True, creates new figure. If provided, automatically enables plotting.
return_as ({'dict', 'dataframe'}, default 'dict') – Output format
decimals (int, default 3) – Number of decimal places for rounding
verbose (bool, default False) – Whether to print test results

Return type:

Union[dict, DataFrame, Tuple]

Returns:

result (dict or DataFrame) – Test results including: - statistic: F-statistic - pvalue: p-value (possibly corrected) - df_effect: Degrees of freedom for effect - df_error: Degrees of freedom for error - effect_size: Partial eta-squared - sphericity_W: Mauchly’s W (if checked) - sphericity_pvalue: Sphericity test p-value - sphericity_met: Whether sphericity assumption met - epsilon_gg: Greenhouse-Geisser epsilon - correction_applied: Which correction was applied - significant: Whether to reject null hypothesis
If plot=True, returns tuple of (result, figure)

Notes

Repeated measures ANOVA tests whether the means differ across multiple conditions measured on the same subjects (within-subjects factor).

Null Hypothesis (H0): All condition means are equal

Assumptions:

1. Independence of subjects: Different subjects are independent
2. Normality: Differences between conditions are normally distributed
3. Sphericity: Variances of differences between all pairs of conditions are equal (tested with Mauchly’s test)

Sphericity: The sphericity assumption is unique to repeated measures ANOVA. If violated: - Greenhouse-Geisser correction: More conservative, use when ε < 0.75 - Huynh-Feldt correction: Less conservative (not implemented) - Multivariate approach: MANOVA (not implemented)

Greenhouse-Geisser Correction: Adjusts degrees of freedom by multiplying by epsilon (ε): - df_effect_adj = ε × df_effect - df_error_adj = ε × df_error

Effect Size (Partial η²):

\[\eta_p^2 = \frac{SS_{effect}}{SS_{effect} + SS_{error}}\]

Interpretation same as regular eta-squared: - < 0.01: negligible - < 0.06: small - < 0.14: medium - ≥ 0.14: large
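
The sketch below illustrates the quantities described in these notes for wide-format data: the F-statistic, partial η², and a Greenhouse-Geisser epsilon used to scale both degrees of freedom. The epsilon is computed with the standard double-centered covariance formula, which is an assumption about the method; the internals of `test_anova_rm` are not shown here.

```python
import numpy as np
from scipy import stats

# Wide format: rows are subjects, columns are conditions
data = np.array([
    [5.2, 6.1, 7.3, 6.8],
    [4.8, 5.9, 6.7, 6.2],
    [5.5, 6.4, 7.1, 7.0],
    [4.9, 5.7, 6.9, 6.5],
])
n, k = data.shape
grand = data.mean()

# Partition the total sum of squares
ss_cond = n * ((data.mean(axis=0) - grand) ** 2).sum()   # effect (conditions)
ss_subj = k * ((data.mean(axis=1) - grand) ** 2).sum()   # between subjects
ss_total = ((data - grand) ** 2).sum()
ss_error = ss_total - ss_cond - ss_subj                  # residual

df_effect, df_error = k - 1, (k - 1) * (n - 1)
f_stat = (ss_cond / df_effect) / (ss_error / df_error)
partial_eta_sq = ss_cond / (ss_cond + ss_error)

# Greenhouse-Geisser epsilon from the double-centered covariance matrix
cov = np.cov(data, rowvar=False)
center = np.eye(k) - np.ones((k, k)) / k
cov_c = center @ cov @ center
epsilon_gg = np.trace(cov_c) ** 2 / ((k - 1) * np.trace(cov_c @ cov_c))

# Uncorrected p-value, and p-value with both df multiplied by epsilon
p_uncorrected = stats.f.sf(f_stat, df_effect, df_error)
p_gg = stats.f.sf(f_stat, epsilon_gg * df_effect, epsilon_gg * df_error)
print(f"F = {f_stat:.2f}, eps = {epsilon_gg:.3f}, p = {p_uncorrected:.4g}, p_GG = {p_gg:.4g}")
```

Epsilon is bounded between 1/(k-1) and 1; a value of 1 means sphericity holds exactly and the correction leaves the degrees of freedom unchanged.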

Post-hoc tests: If significant, use pairwise t-tests with correction: - test_ttest_rel() for all pairs - correct_bonferroni() or correct_holm() for multiple comparisons

Examples

>>> import numpy as np
>>> from scitex_stats.tests.parametric import test_anova_rm
>>>
>>> # Wide format: subjects × conditions
>>> data = np.array([
...     [5.2, 6.1, 7.3, 6.8],  # Subject 1
...     [4.8, 5.9, 6.7, 6.2],  # Subject 2
...     [5.5, 6.4, 7.1, 7.0],  # Subject 3
...     [4.9, 5.7, 6.9, 6.5],  # Subject 4
... ])
>>>
>>> result, fig = test_anova_rm(  # plot=True returns (result, figure)
...     data,
...     condition_names=['Baseline', 'Week 1', 'Week 2', 'Week 3'],
...     plot=True
... )
>>>
>>> print(f"F = {result['statistic']:.2f}, p = {result['pvalue']:.4f}")
>>> print(f"Sphericity met: {result['sphericity_met']}")
>>> print(f"Partial η² = {result['effect_size']:.3f}")

See also

test_anova: One-way ANOVA for independent samples
test_friedman: Non-parametric alternative (no sphericity assumption)

scitex.stats.test_anova_2way(data, factor_a=None, factor_b=None, value=None, factor_a_name='Factor A', factor_b_name='Factor B', alpha=0.05, check_assumptions=True, plot=False, ax=None, return_as='dict', decimals=3, verbose=False)[source]

Perform two-way ANOVA for factorial designs.

Parameters:

data (DataFrame or array) –
- If DataFrame: requires factor_a, factor_b, value column names
- If array: 2D or 3D array (see factor_a, factor_b parameters)
factor_a (str or array, optional) –
- If str: column name for factor A in DataFrame
- If array: factor A levels for each observation
factor_b (str or array, optional) –
- If str: column name for factor B in DataFrame
- If array: factor B levels for each observation
value (str, optional) – Column name for dependent variable (required if data is DataFrame)
factor_a_name (str, default 'Factor A') – Name for factor A
factor_b_name (str, default 'Factor B') – Name for factor B
alpha (float, default 0.05) – Significance level
check_assumptions (bool, default True) – Whether to check normality and homogeneity assumptions
plot (bool, default False) – Whether to generate interaction plot
ax (matplotlib.axes.Axes, optional) – Axes object to plot on. If None and plot=True, creates new figure. If provided, automatically enables plotting.
return_as ({'dict', 'dataframe'}, default 'dict') – Output format
decimals (int, default 3) – Number of decimal places for rounding
verbose (bool, default False) – Whether to print test results

Return type:

Union[dict, DataFrame, Tuple]

Returns:

result (dict or DataFrame) – Test results including for each effect (A, B, interaction): - effect: Name of effect - statistic: F-statistic - pvalue: p-value - df_effect: Degrees of freedom for effect - df_error: Degrees of freedom for error - effect_size: Partial eta-squared - rejected: Whether to reject null hypothesis - significant: Same as rejected
If plot=True, returns tuple of (result, figure)

Notes

Two-way ANOVA tests the effects of two independent categorical variables (factors) on a continuous dependent variable, including their interaction.

Three Hypotheses Tested: 1. Main effect of Factor A: Marginal means of A levels differ 2. Main effect of Factor B: Marginal means of B levels differ 3. Interaction A×B: Effect of A depends on level of B (and vice versa)

Null Hypotheses: - H0_A: All marginal means of Factor A are equal - H0_B: All marginal means of Factor B are equal - H0_AB: No interaction between Factors A and B

Assumptions: 1. Independence: Observations are independent 2. Normality: Residuals are normally distributed within each cell 3. Homogeneity of variance: Equal variances across all cells

Sum of Squares Decomposition:

\[SS_{total} = SS_A + SS_B + SS_{AB} + SS_{error}\]

Where: - SS_A: Sum of squares for main effect A - SS_B: Sum of squares for main effect B - SS_AB: Sum of squares for interaction A×B - SS_error: Sum of squares for error (within cells)

F-statistics:

\[F_A = \frac{MS_A}{MS_{error}}, \quad F_B = \frac{MS_B}{MS_{error}}, \quad F_{AB} = \frac{MS_{AB}}{MS_{error}}\]

Effect Size (Partial η²):

\[\eta_p^2 = \frac{SS_{effect}}{SS_{effect} + SS_{error}}\]
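
The sum-of-squares decomposition, F-statistics, and partial η² can be illustrated for a balanced design (equal observations per cell). This is a sketch under that assumption only; unbalanced designs need a different sums-of-squares type, and how `test_anova_2way` handles them is not stated in this section.

```python
import numpy as np

rng = np.random.default_rng(42)
a_levels, b_levels, n_per_cell = 2, 2, 10

# cells[i, j, :] holds the observations for level i of A and level j of B
cell_means = np.array([[50.0, 55.0], [65.0, 75.0]])
noise = rng.normal(0, 10, size=(a_levels, b_levels, n_per_cell))
cells = cell_means[:, :, None] + noise

grand = cells.mean()
mean_a = cells.mean(axis=(1, 2))     # marginal means of factor A
mean_b = cells.mean(axis=(0, 2))     # marginal means of factor B
mean_ab = cells.mean(axis=2)         # cell means

ss_a = b_levels * n_per_cell * ((mean_a - grand) ** 2).sum()
ss_b = a_levels * n_per_cell * ((mean_b - grand) ** 2).sum()
ss_ab = n_per_cell * (
    (mean_ab - mean_a[:, None] - mean_b[None, :] + grand) ** 2
).sum()
ss_error = ((cells - mean_ab[:, :, None]) ** 2).sum()
ss_total = ((cells - grand) ** 2).sum()

df_a, df_b = a_levels - 1, b_levels - 1
df_ab = df_a * df_b
df_error = a_levels * b_levels * (n_per_cell - 1)
ms_error = ss_error / df_error

f_a = (ss_a / df_a) / ms_error
f_b = (ss_b / df_b) / ms_error
f_ab = (ss_ab / df_ab) / ms_error

# Partial eta-squared for each effect
partial_eta_sq = {
    "A": ss_a / (ss_a + ss_error),
    "B": ss_b / (ss_b + ss_error),
    "AxB": ss_ab / (ss_ab + ss_error),
}
print(f"F_A = {f_a:.2f}, F_B = {f_b:.2f}, F_AB = {f_ab:.2f}")
```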

Interpreting Results:

- Significant interaction: Main effects should be interpreted cautiously. Use simple effects analysis or interaction plots.
- Non-significant interaction: Main effects can be interpreted directly.

Post-hoc tests: If main effects are significant: - Pairwise comparisons with test_ttest_ind() - Apply corrections: correct_bonferroni(), correct_holm()

If interaction is significant: - Simple effects: test effect of A at each level of B - Pairwise comparisons within each level

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from scitex_stats.tests.parametric import test_anova_2way
>>>
>>> # Example: Drug (2 levels) × Gender (2 levels)
>>> np.random.seed(42)
>>> n_per_cell = 10
>>>
>>> data = pd.DataFrame({
...     'Drug': ['Placebo']*20 + ['Active']*20,
...     'Gender': (['Male']*10 + ['Female']*10) * 2,
...     'Score': np.concatenate([
...         np.random.normal(50, 10, 10),  # Placebo, Male
...         np.random.normal(55, 10, 10),  # Placebo, Female
...         np.random.normal(65, 10, 10),  # Active, Male
...         np.random.normal(75, 10, 10),  # Active, Female (interaction)
...     ])
... })
>>>
>>> result, fig = test_anova_2way(  # plot=True returns (result, figure)
...     data,
...     factor_a='Drug',
...     factor_b='Gender',
...     value='Score',
...     plot=True
... )
>>>
>>> for effect in result:
...     print(f"{effect['effect']}: F = {effect['statistic']:.2f}, p = {effect['pvalue']:.4f}")

See also

test_anova: One-way ANOVA
test_anova_rm: Repeated measures ANOVA

scitex.stats.test_brunner_munzel(x, y, var_x='x', var_y='y', alternative='two-sided', alpha=0.05, plot=False, ax=None, data=None, return_as='dict', verbose=False)[source]

Perform Brunner-Munzel test (non-parametric).

Parameters:

x (array or Series) – First sample
y (array or Series) – Second sample
var_x (str, default 'x') – Label for first sample
var_y (str, default 'y') – Label for second sample
alternative ({'two-sided', 'greater', 'less'}, default 'two-sided') – Alternative hypothesis: - ‘two-sided’: distributions differ - ‘greater’: x tends to be greater than y - ‘less’: x tends to be less than y
alpha (float, default 0.05) – Significance level
plot (bool, default False) – Whether to generate visualization
ax (matplotlib.axes.Axes, optional) – Axes object to plot on. If None and plot=True, creates new figure. If provided, automatically enables plotting.
data (DataFrame, str, or None, optional) – DataFrame or CSV path. When provided, string values for x/y are resolved as column names (seaborn-style).
return_as ({'dict', 'dataframe'}, default 'dict') – Output format
verbose (bool, default False) – Whether to print test results

Returns:

results – Test results including: - test_method: ‘Brunner-Munzel test’ - statistic_name: ‘W’ - statistic: W-statistic value - pvalue: p-value - stars: Significance stars - rejected: Whether null hypothesis is rejected - effect_size: P(X > Y) (primary effect size) - effect_size_metric: ‘P(X>Y)’ - effect_size_interpretation: Interpretation of P(X>Y) - effect_size_secondary: Cliff’s delta (secondary effect size) - effect_size_secondary_metric: “Cliff’s delta” - effect_size_secondary_interpretation: Interpretation of delta - n_x, n_y: Sample sizes - var_x, var_y: Variable labels - H0: Null hypothesis description

Return type:

dict or DataFrame

Notes

The Brunner-Munzel test is a non-parametric test for comparing two independent samples. It is more robust than the t-test when: - Distributions are non-normal - Variances are unequal - Sample sizes differ - Data contain outliers

Unlike Mann-Whitney U test, Brunner-Munzel does not assume equal variances and provides better control of Type I error rate.

The test statistic W is approximately t-distributed:

\[W = \frac{\hat{p} - 0.5}{\sqrt{\hat{\sigma}^2}}\]

where $\hat{p}$ is an estimate of P(X > Y).

Effect Sizes:

P(X > Y): Probability that a random value from X exceeds a random value from Y. Interpretation: - 0.50: No effect (chance) - 0.56: Small effect - 0.64: Medium effect - 0.71: Large effect
Cliff’s delta (δ): Ranges from -1 to 1, related to P(X>Y) by: δ = 2×P(X>Y) - 1. Interpretation: - |δ| < 0.147: Negligible - |δ| < 0.33: Small - |δ| < 0.474: Medium - |δ| ≥ 0.474: Large
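
Both effect sizes can be computed from all pairwise comparisons between the two samples. The sketch below assumes that ties contribute one half to P(X > Y), which is the convention under which the relation δ = 2×P(X>Y) - 1 holds when ties are present; how `test_brunner_munzel` treats ties is not stated in this section.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 5, 6])

# Compare every x value with every y value
diff = x[:, None] - y[None, :]
n_pairs = diff.size

p_greater = (diff > 0).sum() / n_pairs   # fraction of pairs with x > y
p_less = (diff < 0).sum() / n_pairs      # fraction of pairs with x < y
p_tie = (diff == 0).sum() / n_pairs      # fraction of tied pairs

p_xy = p_greater + 0.5 * p_tie           # P(X > Y), ties counted as one half
cliffs_delta = p_greater - p_less        # Cliff's delta

print(f"P(X > Y) = {p_xy:.2f}, Cliff's delta = {cliffs_delta:.2f}")
```

Here 6 of the 25 pairs have x > y, 15 have x < y, and 4 are tied, giving P(X > Y) = 0.32 and δ = -0.36.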

Examples

>>> x = np.array([1, 2, 3, 4, 5])
>>> y = np.array([2, 3, 4, 5, 6])
>>> result = test_brunner_munzel(x, y)
>>> result['pvalue']
0.349...
>>> result['effect_size']  # P(X > Y)
0.32
>>> result['effect_size_secondary']  # Cliff's delta
-0.36

>>> # With auto-created figure
>>> result = test_brunner_munzel(x, y, plot=True)

>>> # Plot on existing axes
>>> import matplotlib.pyplot as plt
>>> fig, ax = plt.subplots()
>>> result = test_brunner_munzel(x, y, ax=ax)

>>> # As DataFrame
>>> df = test_brunner_munzel(x, y, return_as='dataframe')

scitex.stats.test_wilcoxon(x, y, var_x='before', var_y='after', alternative='two-sided', alpha=0.05, plot=False, ax=None, data=None, return_as='dict', verbose=False)[source]

Perform Wilcoxon signed-rank test (non-parametric paired test).

Parameters:

x (array or Series) – First sample (e.g., pre-test, baseline)
y (array or Series) – Second sample (e.g., post-test, follow-up) Must have same length as x
var_x (str, default 'before') – Label for first sample
var_y (str, default 'after') – Label for second sample
alternative ({'two-sided', 'greater', 'less'}, default 'two-sided') – Alternative hypothesis: - ‘two-sided’: distributions differ - ‘greater’: x tends to be greater than y - ‘less’: x tends to be less than y
alpha (float, default 0.05) – Significance level
plot (bool, default False) – Whether to generate visualization
ax (matplotlib.axes.Axes, optional) – Axes object to plot on. If None and plot=True, creates new figure. If provided, automatically enables plotting.
data (DataFrame, str, or None, optional) – DataFrame or CSV path. When provided, string values for x/y are resolved as column names (seaborn-style).
return_as ({'dict', 'dataframe'}, default 'dict') – Output format
verbose (bool, default False) – Whether to print test results

Returns:

results – Test results including: - test_method: ‘Wilcoxon signed-rank test’ - statistic: W-statistic (smaller of the positive and negative rank sums, W = min(W+, W-)) - pvalue: p-value - stars: Significance stars - significant: Whether null hypothesis is rejected - effect_size: rank-biserial correlation - effect_size_metric: ‘rank-biserial correlation’ - n_pairs: number of pairs (excluding zero differences) - n_zeros: number of zero differences

Return type:

dict or DataFrame

Notes

The Wilcoxon signed-rank test is the non-parametric alternative to the paired t-test. It tests whether the median of differences is zero.

When to use: - Paired samples (before-after, matched pairs) - Data are not normally distributed - Ordinal data or continuous data with outliers - Robust alternative to paired t-test

Assumptions: - Paired observations - Differences are symmetric around the median - Ordinal or continuous data

How it works:

1. Compute differences: d = x - y
2. Remove zero differences
3. Rank absolute differences
4. Sum ranks of positive differences (W+)
5. Sum ranks of negative differences (W-)
6. Test statistic: W = min(W+, W-)

Effect size (rank-biserial correlation):

\[r = \frac{W_+ - W_-}{n(n+1)/2}\]

Ranges from -1 to 1: - r close to 1: x > y (large positive effect) - r close to 0: no difference - r close to -1: x < y (large negative effect)

Interpretation: - |r| < 0.1: negligible - |r| < 0.3: small - |r| < 0.5: medium - |r| ≥ 0.5: large
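
The six steps and the rank-biserial formula can be traced by hand. The sketch below uses `scipy.stats.rankdata` and made-up paired data without tied absolute differences; it does not call `test_wilcoxon`.

```python
import numpy as np
from scipy import stats

before = np.array([10.0, 12.0, 15.0, 18.0, 20.0, 23.0])
after = np.array([12.5, 11.0, 18.0, 22.0, 19.5, 29.0])

d = before - after                      # step 1: differences
d = d[d != 0]                           # step 2: drop zero differences
ranks = stats.rankdata(np.abs(d))       # step 3: rank absolute differences
w_plus = ranks[d > 0].sum()             # step 4: rank sum of positive differences
w_minus = ranks[d < 0].sum()            # step 5: rank sum of negative differences
w = min(w_plus, w_minus)                # step 6: test statistic

n = d.size                              # number of non-zero pairs
r = (w_plus - w_minus) / (n * (n + 1) / 2)   # rank-biserial correlation

# SciPy's two-sided statistic is also min(W+, W-)
w_scipy = stats.wilcoxon(before, after).statistic
print(f"W+ = {w_plus}, W- = {w_minus}, W = {w}, r = {r:.3f}")
```

With these data W+ = 3 and W- = 18, so r = -15/21: most pairs decrease from x to y in rank terms, which matches the "r close to -1: x < y" reading.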

Examples

>>> before = np.array([10, 12, 15, 18, 20])
>>> after = np.array([12, 14, 17, 20, 22])
>>> result = test_wilcoxon(before, after)
>>> result['pvalue']
0.062...

>>> # With visualization
>>> result, fig = test_wilcoxon(before, after, plot=True)

scitex.stats.test_kruskal(groups=None, var_names=None, alpha=0.05, plot=False, ax=None, data=None, value_col=None, group_col=None, return_as='dict', decimals=3, verbose=False)[source]

Perform Kruskal-Wallis H test for independent samples.

Parameters:

groups (list of arrays) – List of sample arrays for each group (minimum 2 groups)
var_names (list of str, optional) – Names for each group. If None, uses ‘Group 1’, ‘Group 2’, etc.
alpha (float, default 0.05) – Significance level
plot (bool, default False) – Whether to generate box plots
ax (matplotlib.axes.Axes, optional) – Axes object to plot on. If None and plot=True, creates new figure. If provided, automatically enables plotting.
data (DataFrame, str, or None, optional) – DataFrame or CSV path. When provided with value_col and group_col, groups are extracted automatically (seaborn-style).
value_col (str, optional) – Column containing measurement values (used with data=).
group_col (str, optional) – Column containing group labels (used with data=).
return_as ({'dict', 'dataframe'}, default 'dict') – Output format
decimals (int, default 3) – Number of decimal places for rounding
verbose (bool, default False) – Whether to print test results

Returns:

results – Test results including: - test_method: ‘Kruskal-Wallis H test’ - statistic: H-statistic value - pvalue: p-value - stars: Significance stars - significant: Whether null hypothesis is rejected - effect_size: Epsilon-squared (ε²) - effect_size_metric: ‘epsilon-squared’ - effect_size_interpretation: Interpretation of epsilon-squared - n_groups: Number of groups - n_samples: Sample sizes for each group - var_names: Group labels - H0: Null hypothesis description

Return type:

dict or DataFrame

Notes

The Kruskal-Wallis H test is a non-parametric alternative to one-way ANOVA. It tests whether samples originate from the same distribution by comparing the ranks of observations across groups.

Null Hypothesis (H0): All groups have the same population median (more precisely: all groups have identical distribution functions)

Assumptions: - Independent observations within and between groups - Ordinal or continuous data - Similar distribution shapes across groups (for median interpretation)

Advantages over ANOVA: - No normality assumption required - Robust to outliers - Works with ordinal data - More powerful than ANOVA for heavy-tailed distributions

When to use: - Comparing 3+ independent groups - Data violate normality (use test_shapiro to check) - Presence of outliers - Ordinal data (e.g., Likert scales)

Test Statistic H:

\[H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)\]

Where: - k: Number of groups - N: Total sample size - R_i: Sum of ranks for group i - n_i: Sample size of group i

Effect Size (Epsilon-squared):

\[\epsilon^2 = \frac{H - k + 1}{N - k}\]

Interpretation (similar to eta-squared): - ε² < 0.01: negligible - ε² < 0.06: small - ε² < 0.14: medium - ε² ≥ 0.14: large
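
The H statistic and the effect size formula as stated in these notes can be reproduced from pooled ranks. The sketch uses made-up data without ties so that the hand-computed H matches `scipy.stats.kruskal`, which applies a tie correction when ties exist.

```python
import numpy as np
from scipy import stats

groups = [
    np.array([1.0, 2.5, 3.1, 4.2, 5.0]),
    np.array([3.3, 4.4, 5.5, 6.1, 7.2]),
    np.array([5.8, 6.6, 7.7, 8.3, 9.4]),
]

k = len(groups)
sizes = np.array([g.size for g in groups])
N = sizes.sum()

# Rank all observations together, then split the ranks back into groups
ranks = stats.rankdata(np.concatenate(groups))
rank_groups = np.split(ranks, np.cumsum(sizes)[:-1])
rank_sums = np.array([rg.sum() for rg in rank_groups])

h_manual = 12.0 / (N * (N + 1)) * (rank_sums ** 2 / sizes).sum() - 3 * (N + 1)
p_manual = stats.chi2.sf(h_manual, df=k - 1)

# Effect size, using the formula given in the notes above
epsilon_sq = (h_manual - k + 1) / (N - k)

h_scipy, p_scipy = stats.kruskal(*groups)
print(f"H = {h_manual:.3f}, p = {p_manual:.4f}, effect size = {epsilon_sq:.3f}")
```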

Post-hoc tests: If significant, use pairwise comparisons with correction: - test_brunner_munzel() for all pairs - correct_bonferroni() or correct_fdr() for multiple comparisons

Tied ranks: Handled automatically by scipy.stats.kruskal()

Examples

>>> # Three groups with different medians
>>> group1 = np.array([1, 2, 3, 4, 5])
>>> group2 = np.array([3, 4, 5, 6, 7])
>>> group3 = np.array([5, 6, 7, 8, 9])
>>> result = test_kruskal([group1, group2, group3])
>>> result['significant']
True

>>> # With custom names and plot
>>> result, fig = test_kruskal(
...     [group1, group2, group3],
...     var_names=['Control', 'Treatment 1', 'Treatment 2'],
...     plot=True
... )

>>> # Export results
>>> from scitex_stats.utils._normalizers import convert_results
>>> convert_results(result, return_as='excel', path='kruskal_results.xlsx')

scitex.stats.test_mannwhitneyu(x, y, var_x='x', var_y='y', alternative='two-sided', alpha=0.05, plot=False, ax=None, data=None, return_as='dict', decimals=3, verbose=False)[source]

Perform Mann-Whitney U test (Wilcoxon rank-sum test).

Parameters:

x (array or Series) – First independent sample
y (array or Series) – Second independent sample
var_x (str, default 'x') – Label for first sample
var_y (str, default 'y') – Label for second sample
alternative ({'two-sided', 'less', 'greater'}, default 'two-sided') – Alternative hypothesis
alpha (float, default 0.05) – Significance level
plot (bool, default False) – Whether to generate visualization
ax (matplotlib.axes.Axes, optional) – Axes object to plot on. If None and plot=True, creates new figure. If provided, automatically enables plotting.
data (DataFrame, str, or None, optional) – DataFrame or CSV path. When provided, string values for x/y are resolved as column names (seaborn-style).
return_as ({'dict', 'dataframe'}, default 'dict') – Output format
decimals (int, default 3) – Number of decimal places for rounding
verbose (bool, default False) – Whether to print test results

Returns:

results – Test results including: - test_method: ‘Mann-Whitney U test’ - statistic: U-statistic value - pvalue: p-value - stars: Significance stars - significant: Whether null hypothesis is rejected - effect_size: Rank-biserial correlation - effect_size_metric: ‘rank-biserial correlation’ - effect_size_interpretation: Interpretation - n_x, n_y: Sample sizes - var_x, var_y: Variable labels - H0: Null hypothesis description

Return type:

dict or DataFrame

Notes

The Mann-Whitney U test (also known as Wilcoxon rank-sum test) is a non-parametric test for comparing two independent samples.

Null Hypothesis (H0): The two samples come from distributions with equal medians (more precisely: P(X > Y) = 0.5)

Test Statistic U:

\[U = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1\]

Where: - n_1, n_2: Sample sizes - R_1: Sum of ranks for sample 1

Effect Size (Rank-biserial correlation):

\[r = 1 - \frac{2U}{n_1 n_2}\]

Or equivalently:

\[r = \frac{2(\bar{R}_1 - \bar{R}_2)}{n_1 + n_2}\]

Interpretation: - |r| < 0.1: negligible - |r| < 0.3: small - |r| < 0.5: medium - |r| ≥ 0.5: large
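
The U statistic and both forms of the rank-biserial correlation given above can be verified numerically. The sketch uses made-up data without ties and follows the formulas in these notes literally; note that with this definition U counts pairs in which y exceeds x, so it equals n_1 n_2 minus the statistic that `scipy.stats.mannwhitneyu` reports for the first sample.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.5, 4.5, 5.5])
y = np.array([3.0, 4.0, 5.0, 6.0, 7.0])
n1, n2 = x.size, y.size

# Rank the pooled sample and take the rank sums per group
ranks = stats.rankdata(np.concatenate([x, y]))
r1_sum = ranks[:n1].sum()
r2_sum = ranks[n1:].sum()

# U as defined in the notes
u = n1 * n2 + n1 * (n1 + 1) / 2 - r1_sum

# Rank-biserial correlation, both forms
r_from_u = 1 - 2 * u / (n1 * n2)
r_from_ranks = 2 * (r1_sum / n1 - r2_sum / n2) / (n1 + n2)

u_scipy = stats.mannwhitneyu(x, y, alternative="two-sided").statistic
print(f"U = {u}, r = {r_from_u:.2f}")
```

Here R_1 = 21, so U = 19 and r = -0.52 under both forms; the negative sign indicates that x tends to be smaller than y.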

Advantages: - No normality assumption required - Robust to outliers - Works with ordinal data - More powerful than t-test for non-normal data

When to use: - Comparing two independent groups - Data violate normality - Presence of outliers - Ordinal data (e.g., Likert scales) - Small sample sizes

Comparison with other tests: - vs t-test: More robust, less powerful when assumptions met - vs Brunner-Munzel: MWU assumes identical shape, BM does not - vs KS test: MWU tests location, KS tests entire distribution

Note on relationship to Brunner-Munzel: Mann-Whitney U assumes samples have the same distribution shape (differing only in location). For more robust analysis without this assumption, use test_brunner_munzel() instead.

Examples

>>> # Basic usage
>>> x = np.array([1, 2, 3, 4, 5])
>>> y = np.array([3, 4, 5, 6, 7])
>>> result = test_mannwhitneyu(x, y)
>>> result['significant']
False

>>> # With auto-created figure
>>> result = test_mannwhitneyu(x, y, plot=True)

>>> # Plot on existing axes
>>> import matplotlib.pyplot as plt
>>> fig, ax = plt.subplots()
>>> result = test_mannwhitneyu(x, y, ax=ax)

>>> # With verbose output
>>> result = test_mannwhitneyu(x, y, verbose=True)

scitex.stats.test_friedman(data, subject_col=None, condition_col=None, value_col=None, condition_names=None, alpha=0.05, plot=False, ax=None, return_as='dict', decimals=3, verbose=False)[source]

Perform Friedman test for repeated measures (non-parametric).

Non-parametric alternative to repeated measures ANOVA. Tests whether distributions differ across 3+ related samples using ranks.

Parameters:

data (array or DataFrame) –
- If array: shape (n_subjects, n_conditions), wide format
- If DataFrame with subject_col/condition_col: long format
- If DataFrame without: wide format (rows=subjects, cols=conditions)
subject_col (str, optional) – Column name for subject IDs (long format)
condition_col (str, optional) – Column name for conditions (long format)
value_col (str, optional) – Column name for values (long format)
condition_names (list of str, optional) – Names for conditions (wide format)
alpha (float, default 0.05) – Significance level
plot (bool, default False) – Whether to generate visualization
ax (matplotlib.axes.Axes, optional) – Axes object to plot on. If None and plot=True, creates new figure. If provided, automatically enables plotting.
return_as ({'dict', 'dataframe'}, default 'dict') – Output format
decimals (int, default 3) – Number of decimal places for rounding
verbose (bool, default False) – Whether to print test results

Returns:

result – Test results including: - statistic: Chi-square statistic (Friedman’s χ²) - pvalue: p-value - df: Degrees of freedom (k - 1) - kendall_w: Kendall’s W (coefficient of concordance) - effect_size: Kendall’s W - effect_size_interpretation: interpretation - n_subjects: Number of subjects - n_conditions: Number of conditions - mean_ranks: Mean rank for each condition - significant: Whether to reject null hypothesis

Return type:

dict or DataFrame

Notes

The Friedman test is the non-parametric alternative to repeated measures ANOVA. It is used when: - Normality assumption is violated - Data are ordinal (e.g., Likert scales) - Sample sizes are small

Null Hypothesis (H0): All conditions have the same distribution

Alternative Hypothesis (H1): At least one condition differs

Procedure:

1. Rank observations within each subject (across conditions)
2. Compute sum of ranks for each condition
3. Calculate test statistic based on rank sums

Test Statistic:

\[\chi^2_F = \frac{12}{nk(k+1)} \sum_{j=1}^{k} R_j^2 - 3n(k+1)\]

Where: - n: Number of subjects - k: Number of conditions - R_j: Sum of ranks for condition j

Effect Size (Kendall’s W):

\[W = \frac{12 \sum_{j=1}^{k}(R_j - \bar{R})^2}{n^2(k^3 - k)}\]

Interpretation: - W < 0.1: negligible agreement - W < 0.3: weak agreement - W < 0.5: moderate agreement - W < 0.7: strong agreement - W ≥ 0.7: very strong agreement
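
The test statistic and Kendall's W can be computed directly from within-subject ranks. The sketch below uses pain-rating style data (four subjects, four time points) and cross-checks the statistic against `scipy.stats.friedmanchisquare`; it does not call `test_friedman`.

```python
import numpy as np
from scipy import stats

# Rows are subjects, columns are conditions
data = np.array([
    [7, 6, 5, 4],
    [8, 7, 6, 5],
    [6, 5, 4, 3],
    [9, 8, 7, 6],
], dtype=float)
n, k = data.shape

# Rank within each subject (across conditions), then sum per condition
ranks = np.apply_along_axis(stats.rankdata, 1, data)
rank_sums = ranks.sum(axis=0)

chi2_f = 12.0 / (n * k * (k + 1)) * (rank_sums ** 2).sum() - 3 * n * (k + 1)
p_manual = stats.chi2.sf(chi2_f, df=k - 1)

# Kendall's W from the spread of the rank sums
kendall_w = 12.0 * ((rank_sums - rank_sums.mean()) ** 2).sum() / (n ** 2 * (k ** 3 - k))

chi2_scipy, p_scipy = stats.friedmanchisquare(*data.T)
print(f"chi2 = {chi2_f:.2f}, p = {p_manual:.4f}, W = {kendall_w:.3f}")
```

Every subject ranks the conditions identically here, so the rank sums are 16, 12, 8, and 4, giving χ² = 12 and W = 1 (perfect agreement). In the absence of ties, W also equals χ² / (n(k - 1)).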

Assumptions: - Paired/repeated observations (same subjects) - At least ordinal scale data - 3+ conditions (for 2 conditions, use Wilcoxon signed-rank test)

Post-hoc tests: If significant: - Pairwise Wilcoxon signed-rank tests - Apply corrections: correct_bonferroni(), correct_holm()

Advantages: - No normality assumption - Robust to outliers - Works with ordinal data - No sphericity assumption

Disadvantages: - Less powerful than RM-ANOVA when assumptions are met - Requires at least ordinal data - Sensitive to ties

Examples

>>> import numpy as np
>>> from scitex_stats.tests.nonparametric import test_friedman
>>>
>>> # Example: Pain ratings (ordinal) across 4 time points
>>> data = np.array([
...     [7, 6, 5, 4],  # Subject 1
...     [8, 7, 6, 5],  # Subject 2
...     [6, 5, 4, 3],  # Subject 3
...     [9, 8, 7, 6],  # Subject 4
... ])
>>>
>>> result = test_friedman(
...     data,
...     condition_names=['Baseline', '1 week', '2 weeks', '3 weeks'],
...     plot=True
... )
>>>
>>> print(f"χ² = {result['statistic']:.2f}, p = {result['pvalue']:.4f}")
>>> print(f"Kendall's W = {result['kendall_w']:.3f}")

See also

test_anova_rm: Parametric alternative (repeated measures ANOVA)
test_wilcoxon: For 2 related samples
test_kruskal: For 3+ independent samples

scitex.stats.test_pearson(x, y, var_x='x', var_y='y', alternative='two-sided', alpha=0.05, plot=False, ax=None, data=None, return_as='dict', decimals=3, verbose=False)[source]

Perform Pearson correlation test.

Parameters:

x (array or Series) – First continuous variable
y (array or Series) – Second continuous variable (same length as x)
var_x (str, default 'x') – Label for first variable
var_y (str, default 'y') – Label for second variable
alternative ({'two-sided', 'less', 'greater'}, default 'two-sided') – Alternative hypothesis
alpha (float, default 0.05) – Significance level for confidence interval
plot (bool, default False) – Whether to generate scatter plot
ax (matplotlib.axes.Axes, optional) – Axes object to plot on. If None and plot=True, creates new figure. If provided, automatically enables plotting.
data (DataFrame, str, or None, optional) – DataFrame or CSV path. When provided, string values for x/y are resolved as column names (seaborn-style).
return_as ({'dict', 'dataframe'}, default 'dict') – Output format
decimals (int, default 3) – Number of decimal places for rounding
verbose (bool, default False) – Whether to print test results

Returns:

results – Test results including: - test_method: ‘Pearson correlation’ - statistic: Pearson correlation coefficient - pvalue: p-value - stars: Significance stars - significant: Whether null hypothesis is rejected - ci_lower, ci_upper: Confidence interval bounds - r_squared: Coefficient of determination - effect_size: Correlation coefficient (same as statistic) - effect_size_metric: ‘Pearson r’ - effect_size_interpretation: Interpretation - n: Sample size (after removing NaN pairs) - var_x, var_y: Variable labels - H0: Null hypothesis description

Return type:

dict or DataFrame

Notes

Pearson correlation coefficient measures the linear relationship between two continuous variables.

Null Hypothesis (H0): No linear correlation (ρ = 0)

Pearson’s r:

\[r = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum(x_i - \bar{x})^2 \sum(y_i - \bar{y})^2}}\]

Range: -1 ≤ r ≤ 1 - r = 1: Perfect positive linear relationship - r = 0: No linear relationship - r = -1: Perfect negative linear relationship

Coefficient of determination (R²):

\[R^2 = r^2\]

R² represents the proportion of variance in y explained by x.

Interpretation (Cohen, 1988): - |r| < 0.1: negligible - |r| < 0.3: small - |r| < 0.5: medium - |r| ≥ 0.5: large

Assumptions: 1. Linearity: Relationship between variables is linear 2. Normality: Both variables are normally distributed (for hypothesis testing) 3. Homoscedasticity: Variance is constant across the range 4. Independence: Observations are independent

When to use: - Assessing linear relationship between two continuous variables - Both variables approximately normally distributed - No major outliers present - Relationship appears linear on scatter plot

When NOT to use: - Non-linear relationships (consider transformation or Spearman) - Ordinal data (use Spearman) - Severe outliers present (use Spearman) - Non-normal distributions (use Spearman)

Confidence Interval: Computed using Fisher’s z-transformation:

\[z = 0.5 \ln\left(\frac{1+r}{1-r}\right)\]
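As an illustration of the transformation, the sketch below builds a confidence interval for r. It assumes the textbook standard error 1/√(n − 3) on the z scale and a normal critical value; whether `test_pearson` uses exactly these steps is not stated here, and the helper name `pearson_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def pearson_ci(r, n, alpha=0.05):
    """Confidence interval for r via Fisher's z (assumes SE = 1/sqrt(n - 3))."""
    z = math.atanh(r)                 # identical to 0.5 * ln((1 + r) / (1 - r))
    se = 1.0 / math.sqrt(n - 3)       # assumed standard error on the z scale
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower_z, upper_z = z - z_crit * se, z + z_crit * se
    # Back-transform the bounds to the correlation scale
    return math.tanh(lower_z), math.tanh(upper_z)


ci_lower, ci_upper = pearson_ci(r=0.5, n=50)
print(f"95% CI: [{ci_lower:.3f}, {ci_upper:.3f}]")
```

Because the interval is symmetric on the z scale and then back-transformed, it is asymmetric around r and always stays inside (-1, 1).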

Examples

>>> import numpy as np
>>> from scitex_stats.tests.correlation import test_pearson
>>>
>>> # Strong positive correlation
>>> x = np.array([1, 2, 3, 4, 5])
>>> y = np.array([2, 4, 5, 7, 8])
>>> result = test_pearson(x, y)
>>> result['statistic']
0.99...

>>> # With visualization
>>> result, fig = test_pearson(x, y, plot=True)

scitex.stats.test_spearman(x, y, var_x='x', var_y='y', alternative='two-sided', alpha=0.05, plot=False, ax=None, data=None, return_as='dict', decimals=3, verbose=False)[source]

Spearman’s rank correlation coefficient test.

Non-parametric measure of monotonic relationship between two variables. Uses rank-transformed data, more robust to outliers than Pearson.

Parameters:

x (array-like) – First variable
y (array-like) – Second variable (same length as x)
var_x (str, default 'x') – Name of first variable
var_y (str, default 'y') – Name of second variable
alternative ({'two-sided', 'less', 'greater'}, default 'two-sided') – Alternative hypothesis: - ‘two-sided’: ρ ≠ 0 - ‘less’: ρ < 0 - ‘greater’: ρ > 0
alpha (float, default 0.05) – Significance level
plot (bool, default False) – If True, create visualization with scatter plot of ranks
data (DataFrame, str, or None, optional) – DataFrame or CSV path. When provided, string values for x/y are resolved as column names (seaborn-style).
return_as ({'dict', 'dataframe'}, default 'dict') – Return format
decimals (int, default 3) – Number of decimal places for rounding

Returns:

result – Test results with:
- test_method: Name of test
- statistic: Spearman’s rho (ρ)
- pvalue: p-value
- alternative: Alternative hypothesis
- alpha: Significance level
- significant: Whether result is significant
- stars: Significance stars
- effect_size: Same as statistic (ρ)
- effect_size_metric: ‘rho’
- effect_size_interpretation: Interpretation
- rho_squared: Proportion of variance explained
- n: Sample size
- var_x: First variable name
- var_y: Second variable name

Return type:

dict or DataFrame or (dict, Figure)

Notes

Spearman’s ρ is the Pearson correlation of rank-transformed variables.
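That identity can be sketched without any dependencies: rank both variables (tied values share their average rank) and apply Pearson's formula to the ranks. The helper names below are hypothetical and only illustrate the definition; they are not the `scitex.stats` implementation.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson's r from the definition."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho as Pearson's r of the rank-transformed data."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]        # quadratic, but perfectly monotonic
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```

The ranks of x and y coincide, so ρ equals 1 even though the relationship is not linear, while Pearson's r stays below 1.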

Assumptions: - Observations are independent - Variables are at least ordinal

Interpretation (same as Pearson):
- |ρ| < 0.1: negligible
- |ρ| < 0.3: small
- |ρ| < 0.5: medium
- |ρ| ≥ 0.5: large

References

Spearman, C. (1904). The proof and measurement of association between two things. American Journal of Psychology, 15, 72-101.

Examples

>>> import numpy as np
>>> from scitex_stats.tests.correlation import test_spearman

>>> # Example 1: Perfect monotonic relationship
>>> x = np.array([1, 2, 3, 4, 5])
>>> y = np.array([1, 4, 9, 16, 25])  # Quadratic relationship
>>> result = test_spearman(x, y, var_x='x', var_y='y²', plot=True)
>>> print(result)

>>> # Example 2: Outlier-robust correlation
>>> x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])  # Outlier
>>> y = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
>>> result = test_spearman(x, y, plot=True)
>>> print(f"Spearman ρ = {result['statistic']:.3f}, p = {result['pvalue']:.4f}")

>>> # Example 3: Compare with Pearson
>>> from scitex_stats.tests.correlation import test_pearson
>>> x = np.random.exponential(scale=2, size=50)
>>> y = x + np.random.normal(0, 1, size=50)
>>> spearman_result = test_spearman(x, y)
>>> pearson_result = test_pearson(x, y)
>>> print(f"Spearman: ρ = {spearman_result['statistic']:.3f}")
>>> print(f"Pearson: r = {pearson_result['statistic']:.3f}")

>>> # Example 4: Ordinal data
>>> satisfaction = np.array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5])
>>> quality = np.array([2, 3, 4, 4, 5, 1, 2, 3, 5, 4])
>>> result = test_spearman(satisfaction, quality,
...                        var_x='Satisfaction', var_y='Quality',
...                        plot=True)

>>> # Example 5: One-tailed test
>>> x = np.arange(20)
>>> y = x + np.random.normal(0, 2, size=20)
>>> result = test_spearman(x, y, alternative='greater')
>>> print(f"One-tailed p-value: {result['pvalue']:.4f}")

>>> # Example 6: Non-linear monotonic relationship
>>> x = np.linspace(0, 10, 50)
>>> y = np.log(x + 1) + np.random.normal(0, 0.1, size=50)
>>> result = test_spearman(x, y, var_x='x', var_y='log(x+1)', plot=True)

>>> # Example 7: Export to various formats
>>> result = test_spearman(x, y, return_as='dataframe')
>>> convert_results(result, return_as='latex', path='spearman.tex')
>>> convert_results(result, return_as='csv', path='spearman.csv')

scitex.stats.test_kendall(x, y, var_x='x', var_y='y', alternative='two-sided', variant='b', alpha=0.05, plot=False, ax=None, data=None, return_as='dict', decimals=3, verbose=False)[source]

Perform Kendall’s tau correlation test.

Parameters:

x (array or Series) – First variable
y (array or Series) – Second variable
var_x (str, default 'x') – Name for x variable
var_y (str, default 'y') – Name for y variable
alternative ({'two-sided', 'less', 'greater'}, default 'two-sided') – Alternative hypothesis: - ‘two-sided’: tau ≠ 0 - ‘less’: tau < 0 (negative association) - ‘greater’: tau > 0 (positive association)
variant ({'b', 'c'}, default 'b') – Tau variant: - ‘b’: tau-b (Kendall’s tau-b, accounts for ties) - ‘c’: tau-c (Stuart’s tau-c, for contingency tables)
alpha (float, default 0.05) – Significance level
plot (bool, default False) – Whether to generate scatter plot
ax (matplotlib.axes.Axes, optional) – Axes to plot on. If provided, plot is set to True
data (DataFrame, str, or None, optional) – DataFrame or CSV path. When provided, string values for x/y are resolved as column names (seaborn-style).
return_as ({'dict', 'dataframe'}, default 'dict') – Output format
decimals (int, default 3) – Number of decimal places for rounding
verbose (bool, default False) – If True, print test results to logger

Returns:

result – Test results including:
- test_method: Name of test
- statistic: Kendall’s tau coefficient
- pvalue: p-value
- tau_squared: tau² (proportion of variance explained)
- effect_size: tau (same as statistic)
- effect_size_interpretation: interpretation
- n: Sample size
- n_concordant: Number of concordant pairs
- n_discordant: Number of discordant pairs
- n_ties: Number of tied pairs
- significant: Whether to reject null hypothesis
- stars: Significance stars

Return type:

dict or DataFrame

Notes

Kendall’s tau is a non-parametric measure of monotonic association between two variables. It is based on concordant and discordant pairs.

Null Hypothesis (H0): No monotonic association (tau = 0)

Alternative Hypothesis (H1): Monotonic association exists

Concordant vs Discordant Pairs: For pairs (x_i, y_i) and (x_j, y_j):
- Concordant: (x_i < x_j and y_i < y_j) or (x_i > x_j and y_i > y_j)
- Discordant: (x_i < x_j and y_i > y_j) or (x_i > x_j and y_i < y_j)

Kendall’s tau-b (accounts for ties):

\[\tau_b = \frac{n_c - n_d}{\sqrt{(n_0 - n_1)(n_0 - n_2)}}\]

Where:
- n_c: Number of concordant pairs
- n_d: Number of discordant pairs
- n_0: n(n-1)/2 (total possible pairs)
- n_1: Sum of t_i(t_i-1)/2 for ties in x
- n_2: Sum of u_j(u_j-1)/2 for ties in y
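The tau-b formula can be evaluated by brute force over all pairs. The sketch below is a standard-library illustration of the definition, using the O(n²) pair enumeration; the function name `kendall_tau_b` is hypothetical and this is not the routine `test_kendall` calls.

```python
import math
from collections import Counter
from itertools import combinations


def kendall_tau_b(x, y):
    """Return (tau_b, n_concordant, n_discordant) by enumerating all pairs."""
    n_c = n_d = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        s = (xi - xj) * (yi - yj)
        if s > 0:
            n_c += 1          # both variables move in the same direction
        elif s < 0:
            n_d += 1          # the variables move in opposite directions
        # s == 0: the pair is tied in x and/or y and counts for neither
    n = len(x)
    n_0 = n * (n - 1) // 2
    n_1 = sum(t * (t - 1) // 2 for t in Counter(x).values())  # ties in x
    n_2 = sum(u * (u - 1) // 2 for u in Counter(y).values())  # ties in y
    tau = (n_c - n_d) / math.sqrt((n_0 - n_1) * (n_0 - n_2))
    return tau, n_c, n_d


x = [1, 2, 2, 3, 4, 4, 5, 6, 7]
y = [2, 3, 3, 5, 6, 6, 8, 9, 10]
tau, n_concordant, n_discordant = kendall_tau_b(x, y)
print(f"tau_b = {tau:.3f}, concordant = {n_concordant}, discordant = {n_discordant}")
```

With these data n_0 = 36 and two pairs are tied in both variables (n_1 = n_2 = 2); the remaining 34 pairs are all concordant, so the tie correction brings tau-b to exactly 1.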

Interpretation: - tau = 1: Perfect positive association - tau = 0: No association - tau = -1: Perfect negative association

Effect size interpretation (same as correlation):
- |tau| < 0.1: negligible
- |tau| < 0.3: small
- |tau| < 0.5: medium
- |tau| ≥ 0.5: large

Advantages over Spearman: - More robust to outliers - Better for small samples - Better interpretation (probability of concordance) - More accurate p-values with ties

Disadvantages: - Computationally more expensive (O(n²)) - Generally smaller magnitude than Spearman’s rho - Less intuitive interpretation than Pearson

When to use Kendall’s tau: - Small sample sizes (n < 30) - Data with many ties - Ordinal data - Non-normal data with outliers

Examples

>>> import numpy as np
>>> from scitex_stats.tests.correlation import test_kendall
>>>
>>> # Monotonic relationship with ties
>>> x = np.array([1, 2, 2, 3, 4, 4, 5, 6, 7])
>>> y = np.array([2, 3, 3, 5, 6, 6, 8, 9, 10])
>>>
>>> result = test_kendall(x, y, var_x='Treatment Dose', var_y='Response',
...                       plot=True)
>>> print(f"τ = {result['statistic']:.3f}, p = {result['pvalue']:.4f}")
>>> print(f"Concordant pairs: {result['n_concordant']}")

See also

test_spearman: Alternative rank correlation
test_pearson: Parametric correlation

scitex.stats.test_theilsen(x, y, var_x='x', var_y='y', data=None, return_as='dict', verbose=True)[source]

Theil-Sen robust regression estimator.

A robust non-parametric regression method that estimates the slope as the median of all pairwise slopes. Highly resistant to outliers (breakdown point of about 29.3%).

Parameters:

x (array-like) – Independent variable
y (array-like) – Dependent variable
var_x (str, default="x") – Name of independent variable (for reporting)
var_y (str, default="y") – Name of dependent variable (for reporting)
data (DataFrame, str, or None, optional) – DataFrame or CSV path. When provided, string values for x/y are resolved as column names (seaborn-style).
return_as (str, default="dict") – Format of return value: “dict” or “dataframe”
verbose (bool, default=True) – Whether to print results

Returns:

Dictionary or DataFrame containing:
- slope (float): Theil-Sen slope estimate (median of pairwise slopes)
- intercept (float): Intercept of the regression line
- low_slope (float): Lower bound of slope confidence interval
- high_slope (float): Upper bound of slope confidence interval
- var_x (str): Name of independent variable
- var_y (str): Name of dependent variable

Return type:

dict or pd.DataFrame

Notes

The Theil-Sen estimator:
- Is robust to outliers (up to ~29% outliers)
- Has no distributional assumptions
- Is asymptotically normal
- Has ~64% efficiency compared to OLS for normal data
- Computational complexity: O(n²)
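The "median of pairwise slopes" idea and its O(n²) cost are easy to see in a few lines. This is a standard-library sketch with a hypothetical helper name; the intercept rule used here (median of y − slope·x) is one common convention and is an assumption, since the convention used by `test_theilsen` is not documented above.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Slope = median of all pairwise slopes; intercept convention is assumed."""
    slopes = [
        (yj - yi) / (xj - xi)
        for (xi, yi), (xj, yj) in combinations(zip(x, y), 2)
        if xj != xi  # pairs with identical x have no defined slope
    ]
    slope = median(slopes)
    # Assumed convention: median of the offsets y - slope * x
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


x = [1, 2, 3, 4, 5]
y_outlier = [2, 4, 6, 8, 100]  # one extreme outlier
slope, intercept = theil_sen(x, y_outlier)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

Of the ten pairwise slopes, six equal 2 and only the four involving the outlier are large, so the median is unaffected by the outlier.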

Examples

>>> import numpy as np
>>> from scitex_stats.tests.correlation import test_theilsen
>>> x = np.array([1, 2, 3, 4, 5])
>>> y = np.array([2, 4, 6, 8, 10])
>>> result = test_theilsen(x, y, verbose=False)
>>> print(f"Slope: {result['slope']:.3f}")
Slope: 2.000

>>> # With outlier
>>> y_outlier = np.array([2, 4, 6, 8, 100])  # One extreme outlier
>>> result = test_theilsen(x, y_outlier, verbose=False)
>>> print(f"Robust slope: {result['slope']:.3f}")
Robust slope: 2.000

scitex.stats.test_chi2(observed, var_row=None, var_col=None, alpha=0.05, correction=True, plot=False, ax=None, return_as='dict', decimals=3, verbose=False)[source]

Chi-square test of independence for contingency tables.

Tests whether two categorical variables are independent.

Parameters:

observed (array-like or DataFrame) – Observed frequencies as a contingency table (rows × columns). If a DataFrame is passed, its row/column names are used as variable names.
var_row (str, optional) – Name of row variable (default: ‘row_variable’)
var_col (str, optional) – Name of column variable (default: ‘col_variable’)
alpha (float, default 0.05) – Significance level
correction (bool, default True) – Apply Yates’ continuity correction for 2×2 tables
plot (bool, default False) – If True, create mosaic plot visualization
ax (matplotlib.axes.Axes, optional) – Axes to plot on. If provided, plot is set to True
return_as ({'dict', 'dataframe'}, default 'dict') – Return format
decimals (int, default 3) – Number of decimal places for rounding
verbose (bool, default False) – If True, print test results to logger

Returns:

result – Test results with:
- test_method: Name of test
- statistic: Chi-square statistic (χ²)
- pvalue: p-value
- df: Degrees of freedom
- alpha: Significance level
- significant: Whether result is significant
- stars: Significance stars
- effect_size: Cramér’s V
- effect_size_metric: “Cramér’s V”
- effect_size_interpretation: Interpretation
- n: Total sample size
- expected_min: Minimum expected frequency
- var_row: Row variable name
- var_col: Column variable name

Return type:

dict or DataFrame

Notes

The chi-square test of independence evaluates:
- H₀: Two categorical variables are independent
- H₁: Two categorical variables are associated

Test statistic: χ² = Σ[(O - E)² / E], where O = observed frequencies and E = expected frequencies.

Assumptions:
1. Independence of observations
2. Expected frequencies ≥ 5 in at least 80% of cells
3. No expected frequencies < 1

For 2×2 tables with small expected frequencies, use Fisher’s exact test instead.

Cramér’s V measures strength of association (0 to 1): - 0 = no association - 1 = perfect association
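To make the statistic and the effect size concrete, the sketch below computes χ² = Σ[(O − E)² / E] and Cramér's V for a small table using only the standard library. It deliberately omits Yates' continuity correction and the p-value, and it uses the usual definition V = √(χ² / (n · (min(rows, cols) − 1))); the helper name is hypothetical and the exact variant used by `test_chi2` is not confirmed here.

```python
import math


def chi2_and_cramers_v(observed):
    """Uncorrected chi-square statistic and Cramér's V for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            chi2 += (o - e) ** 2 / e
    k = min(len(row_totals), len(col_totals))
    v = math.sqrt(chi2 / (n * (k - 1)))
    return chi2, v


observed = [[30, 10], [20, 40]]
chi2, cramers_v = chi2_and_cramers_v(observed)
print(f"chi2 = {chi2:.3f}, Cramér's V = {cramers_v:.3f}")
```

Here the expected counts are 20, 20, 30 and 30, giving χ² = 50/3 and V = √(1/6).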

References

Cramér, H. (1946). Mathematical Methods of Statistics. Princeton University Press.

Examples

>>> import numpy as np
>>> from scitex_stats.tests.categorical import test_chi2

>>> # Example 1: 2×2 contingency table (treatment × outcome)
>>> observed = np.array([[30, 10], [20, 40]])
>>> result = test_chi2(observed, var_row='Treatment', var_col='Outcome', plot=True)
>>> print(result)

>>> # Example 2: Using DataFrame
>>> import pandas as pd
>>> df = pd.DataFrame([[12, 8, 5], [15, 20, 10]],
...                   index=['Group A', 'Group B'],
...                   columns=['Low', 'Med', 'High'])
>>> result = test_chi2(df, plot=True)

>>> # Example 3: Test gender × preference association
>>> observed = np.array([
...     [20, 30, 15],  # Male: product A, B, C
...     [25, 20, 40]   # Female: product A, B, C
... ])
>>> result = test_chi2(observed, var_row='Gender', var_col='Product', plot=True)
>>> print(f"χ² = {result['statistic']:.2f}, p = {result['pvalue']:.4f}")
>>> print(f"Cramér's V = {result['effect_size']:.3f} ({result['effect_size_interpretation']})")

>>> # Example 4: Small expected frequencies warning
>>> observed = np.array([[2, 8], [3, 7]])  # Small counts
>>> result = test_chi2(observed)

>>> # Example 5: Export to various formats
>>> result = test_chi2(observed, return_as='dataframe')
>>> convert_results(result, return_as='latex', path='chi2_test.tex')

scitex.stats.test_fisher(observed, var_row=None, var_col=None, alternative='two-sided', alpha=0.05, plot=False, ax=None, return_as='dict', decimals=3, verbose=False)[source]

Fisher’s exact test for 2×2 contingency tables.

Tests association between two binary categorical variables. Exact test (no large-sample approximation required).

Parameters:

observed (array-like or DataFrame) – 2×2 contingency table as [[a, b], [c, d]]. If a DataFrame is passed, its row/column names are used as variable names.
var_row (str, optional) – Name of row variable (default: ‘row_variable’)
var_col (str, optional) – Name of column variable (default: ‘col_variable’)
alternative ({'two-sided', 'less', 'greater'}, default 'two-sided') – Alternative hypothesis: - ‘two-sided’: odds ratio ≠ 1 - ‘less’: odds ratio < 1 - ‘greater’: odds ratio > 1
alpha (float, default 0.05) – Significance level for confidence interval
plot (bool, default False) – If True, create visualization
ax (matplotlib.axes.Axes, optional) – Axes to plot on. If provided, plot is set to True
return_as ({'dict', 'dataframe'}, default 'dict') – Return format
decimals (int, default 3) – Number of decimal places for rounding
verbose (bool, default False) – If True, print test results to logger

Returns:

result – Test results with:
- test_method: Name of test
- statistic: Odds ratio
- pvalue: Exact p-value
- alternative: Alternative hypothesis
- alpha: Significance level
- significant: Whether result is significant
- stars: Significance stars
- effect_size: Odds ratio
- effect_size_metric: ‘Odds ratio’
- effect_size_interpretation: Interpretation
- ci_lower: Lower CI bound for odds ratio
- ci_upper: Upper CI bound for odds ratio
- n: Total sample size
- var_row: Row variable name
- var_col: Column variable name

Return type:

dict or DataFrame

Notes

Fisher’s exact test computes exact probability of observed table (and more extreme tables) under independence assumption.

H₀: Two binary variables are independent (OR = 1)
H₁: Variables are associated (OR ≠ 1)

Odds Ratio (OR): For table [[a, b], [c, d]], OR = (a × d) / (b × c)

Interpretation:
- OR = 1: No association
- OR > 1: Positive association
- OR < 1: Negative association
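The exact p-value and the odds ratio can be reproduced from the hypergeometric distribution with the standard library alone. In this sketch the two-sided p-value sums the probabilities of all tables (with the same margins) that are no more likely than the observed one, which is a common convention and an assumption about what `test_fisher` reports; the function name is hypothetical.

```python
from math import comb


def fisher_exact_2x2(table):
    """Sample odds ratio and two-sided exact p-value for [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    # With margins fixed, the top-left cell k follows a hypergeometric law.
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    weights = {k: comb(row1, k) * comb(row2, col1 - k) for k in range(k_min, k_max + 1)}
    total = comb(n, col1)
    # Two-sided: tables whose probability does not exceed the observed one
    p_two_sided = sum(w for w in weights.values() if w <= weights[a]) / total
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, p_two_sided


odds_ratio, p_value = fisher_exact_2x2([[8, 2], [1, 5]])
print(f"OR = {odds_ratio:.1f}, two-sided p = {p_value:.4f}")
```

Comparing integer weights rather than floating-point probabilities avoids rounding problems when two tables are exactly equally likely.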

When to use: - 2×2 contingency tables - Small sample sizes (any cell < 5) - Need exact p-value (not approximation)

Advantages over chi-square: - Exact test (valid for any sample size) - No minimum expected frequency requirement - More powerful for small samples

References

Fisher, R. A. (1922). On the interpretation of χ² from contingency tables, and the calculation of P. Journal of the Royal Statistical Society, 85(1), 87-94.

Examples

>>> import numpy as np
>>> from scitex_stats.tests.categorical import test_fisher

>>> # Example 1: Small 2×2 table (treatment × outcome)
>>> observed = [[8, 2], [1, 5]]
>>> result = test_fisher(observed, var_row='Treatment', var_col='Response', plot=True)
>>> print(result)

>>> # Example 2: Case-control study
>>> exposed_cases = 12
>>> unexposed_cases = 5
>>> exposed_controls = 8
>>> unexposed_controls = 20
>>> observed = [[exposed_cases, unexposed_cases],
...             [exposed_controls, unexposed_controls]]
>>> result = test_fisher(observed, var_row='Exposure', var_col='Disease')
>>> print(f"OR = {result['statistic']:.2f}, 95% CI [{result['ci_lower']:.2f}, {result['ci_upper']:.2f}]")
>>> print(f"p = {result['pvalue']:.4f}")

>>> # Example 3: One-tailed test (expect positive association)
>>> observed = [[10, 2], [3, 8]]
>>> result = test_fisher(observed, alternative='greater')
>>> print(f"One-tailed p = {result['pvalue']:.4f}")

>>> # Example 4: Using pandas DataFrame
>>> import pandas as pd
>>> df = pd.DataFrame([[15, 5], [3, 10]],
...                   index=['Group A', 'Group B'],
...                   columns=['Success', 'Failure'])
>>> result = test_fisher(df, plot=True)

>>> # Example 5: Compare with chi-square
>>> from scitex_stats.tests.categorical import test_chi2
>>> observed = [[5, 10], [10, 5]]
>>> fisher_result = test_fisher(observed)
>>> chi2_result = test_chi2(observed)
>>> print(f"Fisher's exact p = {fisher_result['pvalue']:.4f}")
>>> print(f"Chi-square p = {chi2_result['pvalue']:.4f}")

scitex.stats.test_mcnemar(observed, var_before=None, var_after=None, correction=True, alpha=0.05, plot=False, ax=None, return_as='dict', decimals=3, verbose=False)[source]

Perform McNemar’s test for paired categorical data.

Tests whether there is a significant change in proportions for paired binary data. Appropriate for before-after studies with binary outcomes.

Parameters:

observed (array-like, shape (2, 2)) – 2×2 contingency table [[a, b], [c, d]], where:
- a: both conditions negative (0,0)
- b: before negative, after positive (0,1)
- c: before positive, after negative (1,0)
- d: both conditions positive (1,1)
var_before (str, optional) – Name for before condition
var_after (str, optional) – Name for after condition
correction (bool, default True) – Whether to apply continuity correction (recommended for small samples)
alpha (float, default 0.05) – Significance level
plot (bool, default False) – Whether to generate visualization
ax (matplotlib.axes.Axes, optional) – Axes to plot on. If provided, plot is set to True
return_as ({'dict', 'dataframe'}, default 'dict') – Output format
decimals (int, default 3) – Number of decimal places for rounding
verbose (bool, default False) – If True, print test results to logger

Returns:

result – Test results including:
- test_method: Name of test
- statistic: χ² statistic
- pvalue: p-value
- df: degrees of freedom (always 1)
- b: count of (before=0, after=1)
- c: count of (before=1, after=0)
- odds_ratio: b / c
- effect_size: odds ratio
- effect_size_interpretation: interpretation
- significant: whether to reject null hypothesis
- stars: significance stars

Return type:

dict or DataFrame

Notes

McNemar’s test is used for paired nominal data, testing whether row and column marginal frequencies are equal (marginal homogeneity).

The test statistic is based on the discordant pairs (b and c):

\[\chi^2 = \frac{(b - c)^2}{b + c} \quad \text{(without correction)}\]

\[\chi^2 = \frac{(|b - c| - 1)^2}{b + c} \quad \text{(with correction)}\]

Null hypothesis: The marginal proportions are equal (no change)
Alternative: The marginal proportions differ (significant change)

Assumptions: - Paired data (matched observations) - Binary outcomes for both conditions - Large enough sample (b + c ≥ 10 recommended for chi-square approximation)

Effect size (Odds Ratio): OR = b / c
- OR = 1: no change
- OR > 1: increase (more transitions from 0→1 than 1→0)
- OR < 1: decrease (more transitions from 1→0 than 0→1)
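Since only the discordant counts b and c enter the statistic, the whole test fits in a few lines. The sketch below is an illustration with a hypothetical function name; it relies on the fact that, for one degree of freedom, the chi-square upper tail equals erfc(√(χ²/2)), so no SciPy call is needed.

```python
import math


def mcnemar(table, correction=True):
    """McNemar chi-square, p-value (df = 1) and odds ratio b / c."""
    (_, b), (c, _) = table  # only the discordant cells are used
    if correction:
        chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    else:
        chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of the chi-square distribution with 1 degree of freedom
    pvalue = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = b / c if c else float("inf")
    return chi2, pvalue, odds_ratio


observed = [[59, 6], [16, 19]]
chi2_corr, p_corr, odds = mcnemar(observed, correction=True)
chi2_raw, p_raw, _ = mcnemar(observed, correction=False)
print(f"corrected: chi2 = {chi2_corr:.3f}, p = {p_corr:.4f}")
print(f"uncorrected: chi2 = {chi2_raw:.3f}, p = {p_raw:.4f}")
print(f"odds ratio = {odds:.3f}")
```

With b = 6 and c = 16 the corrected statistic is 81/22 and the uncorrected one is 100/22, which shows how the continuity correction makes the test more conservative.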

Examples

>>> import numpy as np
>>> from scitex_stats.tests.categorical import test_mcnemar
>>>
>>> # Example: Treatment effectiveness (before/after)
>>> # Rows: before, Columns: after
>>> # [[no→no, no→yes],
>>> #  [yes→no, yes→yes]]
>>> observed = [[59, 6],   # 59 stayed negative, 6 improved
...             [16, 19]]  # 16 relapsed, 19 stayed positive
>>>
>>> result = test_mcnemar(observed, var_before='Before Treatment',
...                       var_after='After Treatment', plot=True)
>>> print(f"χ² = {result['statistic']:.2f}, p = {result['pvalue']:.4f}")
>>> print(f"Odds Ratio = {result['odds_ratio']:.2f}")

See also

test_chi2: For independent (unpaired) categorical data
test_fisher: For 2×2 tables with small expected frequencies

scitex.stats.test_cochran_q(data, subject_col=None, condition_col=None, value_col=None, condition_names=None, alpha=0.05, plot=False, return_as='dict', decimals=3)[source]

Perform Cochran’s Q test for binary repeated measures.

Extension of McNemar’s test to 3+ conditions. Tests whether proportions of successes differ across multiple related binary measurements.

Parameters:

data (array or DataFrame) –
- If array: shape (n_subjects, n_conditions), wide format with 0/1 values
- If DataFrame with subject_col/condition_col: long format
- If DataFrame without: wide format (rows=subjects, cols=conditions)
subject_col (str, optional) – Column name for subject IDs (long format)
condition_col (str, optional) – Column name for conditions (long format)
value_col (str, optional) – Column name for binary values (long format)
condition_names (list of str, optional) – Names for conditions (wide format)
alpha (float, default 0.05) – Significance level
plot (bool, default False) – Whether to generate visualization
return_as ({'dict', 'dataframe'}, default 'dict') – Output format
decimals (int, default 3) – Number of decimal places for rounding

Return type:

Union[dict, DataFrame, Tuple]

Returns:

result (dict or DataFrame) – Test results including:
- statistic: Cochran’s Q statistic
- pvalue: p-value
- df: Degrees of freedom (k - 1)
- effect_size: Kendall’s W
- effect_size_interpretation: interpretation
- n_subjects: Number of subjects
- n_conditions: Number of conditions
- proportions: Success proportion for each condition
- n_successes: Number of successes per condition
- significant: Whether to reject null hypothesis
- stars: Significance stars
If plot=True, returns tuple of (result, figure)

Notes

Cochran’s Q test is used for repeated binary measurements (dichotomous data) on the same subjects across 3+ conditions.

Null Hypothesis (H0): Proportions of successes are equal across conditions

Alternative Hypothesis (H1): At least one proportion differs

Test Statistic:

\[Q = \frac{(k-1)\left[k\sum_{j=1}^{k}G_j^2 - N^2\right]}{k\sum_{i=1}^{n}L_i - \sum_{i=1}^{n}L_i^2}\]

Where:
- k: Number of conditions
- n: Number of subjects
- G_j: Number of successes in condition j
- L_i: Number of successes for subject i (across conditions)
- N: Total number of successes

Q follows chi-square distribution with k-1 degrees of freedom.
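The Q formula translates directly into column sums (G_j) and row sums (L_i). The sketch below is a standard-library illustration with a hypothetical function name; it returns only the statistic and its degrees of freedom, leaving the chi-square p-value to a statistics library.

```python
def cochran_q(data):
    """Cochran's Q and degrees of freedom for a subjects x conditions 0/1 table."""
    k = len(data[0])                      # number of conditions
    G = [sum(col) for col in zip(*data)]  # successes per condition
    L = [sum(row) for row in data]        # successes per subject
    N = sum(G)                            # total number of successes
    numerator = (k - 1) * (k * sum(g * g for g in G) - N * N)
    denominator = k * sum(L) - sum(l * l for l in L)
    return numerator / denominator, k - 1


data = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
]
q_stat, df = cochran_q(data)
print(f"Q = {q_stat:.3f}, df = {df}")
```

For this table G = (1, 2, 4, 5), N = 12, ΣL_i = 12 and ΣL_i² = 34, so Q = 3·(4·46 − 144) / (4·12 − 34) = 120/14. A subject with all 0s or all 1s contributes nothing to the denominator, and if every subject is like that the denominator is zero and Q is undefined.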

Effect Size (Kendall’s W for binary):

\[W = \frac{\sum_{j=1}^{k}(G_j - \bar{G})^2}{n(k-1)k/12}\]

Interpretation: - W < 0.1: negligible - W < 0.3: small - W < 0.5: medium - W ≥ 0.5: large

Assumptions: - Binary outcomes (0/1, success/failure, yes/no) - Repeated measurements on same subjects - At least 3 conditions (for 2 conditions, use McNemar’s test)

Relation to other tests: - Extension of McNemar’s test (2 conditions → 3+ conditions) - Binary version of Friedman test - Can use Friedman test on same data (Q ≈ Friedman χ²)

Post-hoc tests: If significant: - Pairwise McNemar tests - Apply corrections: correct_bonferroni(), correct_holm()

Advantages: - Appropriate for binary repeated measures - No normality assumption - Accounts for within-subject correlation

Disadvantages: - Requires binary data - Sensitive to subjects with all 0s or all 1s - Less powerful than parametric alternatives if assumptions met

Examples

>>> import numpy as np
>>> from scitex_stats.tests.categorical import test_cochran_q
>>>
>>> # Example: Treatment success (0=fail, 1=success) across 4 visits
>>> data = np.array([
...     [0, 0, 1, 1],  # Subject 1: improved over time
...     [0, 1, 1, 1],  # Subject 2: improved
...     [0, 0, 0, 1],  # Subject 3: late improvement
...     [1, 1, 1, 1],  # Subject 4: always success
...     [0, 0, 1, 1],  # Subject 5: improved
... ])
>>>
>>> result = test_cochran_q(
...     data,
...     condition_names=['Visit 1', 'Visit 2', 'Visit 3', 'Visit 4'],
...     plot=True
... )
>>>
>>> print(f"Q = {result['statistic']:.2f}, p = {result['pvalue']:.4f}")
>>> print(f"Proportions: {result['proportions']}")

See also

test_mcnemar: For 2 binary conditions
test_friedman: Non-parametric repeated measures (non-binary)

scitex.stats.test_shapiro(x, var_x='x', alpha=0.05, plot=False, ax=None, data=None, return_as='dict', verbose=False)[source]

Perform Shapiro-Wilk test for normality.

Parameters:

x (array or Series) – Sample to test
var_x (str, default 'x') – Label for sample
alpha (float, default 0.05) – Significance level
plot (bool, default False) – Whether to generate Q-Q plot
ax (matplotlib.axes.Axes, optional) – Axes to plot on. If provided, plot is set to True
data (DataFrame, str, or None, optional) – DataFrame or CSV path. When provided, string value for x is resolved as a column name (seaborn-style).
return_as ({'dict', 'dataframe'}, default 'dict') – Output format
verbose (bool, default False) – If True, print test results to logger

Returns:

results – Test results including:
- test_method: ‘Shapiro-Wilk test’
- statistic: W-statistic value (0 to 1, closer to 1 = more normal)
- pvalue: p-value
- stars: Significance stars
- significant: Whether null hypothesis is rejected (True = not normal)
- normal: Whether data appears normal (True = normal)
- recommendation: Suggested statistical approach
- n: Sample size
- var_x: Variable label

Return type:

dict or DataFrame

Notes

The Shapiro-Wilk test tests the null hypothesis that data come from a normal distribution.

Null Hypothesis (H0): Data are normally distributed

Test Statistic W: Ranges from 0 to 1 - W close to 1: Data appear normal - W much less than 1: Data deviate from normality

p-value interpretation: - p > α (typically 0.05): Fail to reject H0, data appear normal - p ≤ α: Reject H0, data significantly deviate from normality

Important considerations: - Sensitive to sample size: with n > 50, may detect trivial deviations - Works best for 3 ≤ n ≤ 5000 - Should be combined with visual inspection (Q-Q plots) - Large samples: focus on Q-Q plots over p-values - Small samples: test may lack power to detect non-normality

Recommendations based on results: - Normal (p > 0.05): Use parametric tests (t-test, ANOVA, Pearson) - Non-normal (p ≤ 0.05): Use non-parametric tests (Brunner-Munzel, Wilcoxon, Spearman) - Borderline: Check Q-Q plot and consider robustness

Examples

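>>> import numpy as np
>>> from scitex.stats import test_shapiro
>>>
>>> # Normal data (True for most random draws)
>>> x = np.random.normal(0, 1, 100)
>>> result = test_shapiro(x)
>>> result['normal']
True

>>> # Non-normal data (False for most random draws)
>>> x = np.random.exponential(2, 100)
>>> result = test_shapiro(x)
>>> result['normal']
False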

>>> # With Q-Q plot
>>> result, fig = test_shapiro(x, plot=True)

scitex.stats.test_normality(*samples, var_names=None, alpha=0.05, warn=True)[source]

Check normality for multiple samples using Shapiro-Wilk test.

Parameters:

*samples (arrays) – Samples to check
var_names (list of str, optional) – Names for each sample
alpha (float, default 0.05) – Significance level
warn (bool, default True) – Whether to log warnings for non-normal data

Returns:

Dictionary with results for each sample: - ‘all_normal’: bool, True if all samples are normal - ‘results’: list of individual test results - ‘recommendation’: str, overall recommendation

Return type:

dict

Examples

>>> import numpy as np
>>> from scitex.stats import test_normality
>>> x = np.random.normal(0, 1, 50)
>>> y = np.random.exponential(2, 50)
>>> check = test_normality(x, y, var_names=['Normal', 'Exponential'])
>>> check['all_normal']
False
>>> check['recommendation']
'Some samples deviate from normality. Consider non-parametric tests.'

scitex.stats.test_ks_1samp(x, cdf='norm', args=(), var_x='x', alternative='two-sided', alpha=0.05, plot=False, ax=None, data=None, return_as='dict', decimals=3, verbose=False)[source]

Perform one-sample Kolmogorov-Smirnov test.

Parameters:

x (array or Series) – Sample to test
cdf (str or callable, default 'norm') – Reference distribution. Either: - String: ‘norm’, ‘uniform’, ‘expon’, etc. (scipy.stats distribution name) - Callable: CDF function
args (tuple, default ()) – Distribution parameters (e.g., (loc, scale) for normal)
var_x (str, default 'x') – Label for sample
alternative ({'two-sided', 'less', 'greater'}, default 'two-sided') – Alternative hypothesis
alpha (float, default 0.05) – Significance level
plot (bool, default False) – Whether to generate CDF comparison plot
data (DataFrame, str, or None, optional) – DataFrame or CSV path. When provided, string value for x is resolved as a column name (seaborn-style).
return_as ({'dict', 'dataframe'}, default 'dict') – Output format
decimals (int, default 3) – Number of decimal places for rounding

Return type:

Union[dict, DataFrame]

Returns:

results (dict or DataFrame) – Test results including:
- test_method: ‘Kolmogorov-Smirnov test (1-sample)’
- statistic_name: ‘D’
- statistic: KS D-statistic (maximum CDF difference)
- pvalue: p-value
- pstars: Significance stars
- rejected: Whether null hypothesis is rejected
- n_x: Sample size
- var_x: Variable label
- reference_distribution: Name of reference distribution
- H0: Null hypothesis description
fig (matplotlib.figure.Figure, optional) – Figure with CDF comparison (only if plot=True)

Notes

The one-sample Kolmogorov-Smirnov test compares the empirical cumulative distribution function (ECDF) of the sample against a reference CDF.

Null Hypothesis (H0): Data follow the specified distribution

Test Statistic D:

\[D = \sup_x |F_n(x) - F(x)|\]

Where:
- F_n(x): Empirical CDF of sample
- F(x): Reference CDF
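Because the empirical CDF is a step function, the supremum in D is attained at a sample point, just before or just after a jump. The following standard-library sketch (hypothetical helper name, statistic only, no p-value) evaluates D that way against an arbitrary reference CDF.

```python
from statistics import NormalDist


def ks_statistic(sample, cdf):
    """One-sample KS statistic D = sup |F_n(x) - F(x)| for a callable CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The ECDF jumps from i/n to (i + 1)/n at x; check both sides of the step
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


sample = [-1.2, -0.4, 0.1, 0.5, 1.3]
d_norm = ks_statistic(sample, NormalDist(0, 1).cdf)

# Against Uniform(0, 1), whose CDF on [0, 1] is the identity function
d_unif = ks_statistic([0.1, 0.5, 0.9], lambda x: x)
print(f"D vs N(0, 1) = {d_norm:.4f}, D vs U(0, 1) = {d_unif:.4f}")
```

Checking only the right-hand side of each jump would underestimate D whenever the largest gap occurs just before a sample point, which is the case for the normal example above (the maximum is F(-0.4) − 1/5).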

Advantages: - Distribution-free (no assumptions about data) - Can test against any continuous distribution - More general than Shapiro-Wilk (not limited to normality)

Disadvantages: - Less powerful than Shapiro-Wilk for normality testing - Sensitive to sample size (large n → high power, may detect trivial deviations) - Assumes continuous distribution (not suitable for discrete data)

When to use: - Testing goodness-of-fit to any continuous distribution - Comparing sample to theoretical distribution - When Shapiro-Wilk is not applicable (non-normal distributions) - Large sample sizes (n > 50)

Examples

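>>> import numpy as np
>>> from scitex.stats import test_ks_1samp
>>>
>>> # Test if data are normally distributed (False for most random draws)
>>> x = np.random.normal(0, 1, 100)
>>> result = test_ks_1samp(x, cdf='norm', args=(0, 1))
>>> result['rejected']
False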

>>> # Test if data are uniformly distributed
>>> x = np.random.uniform(0, 1, 100)
>>> result = test_ks_1samp(x, cdf='uniform', args=(0, 1))

scitex.stats.test_ks_2samp(x, y, var_x='x', var_y='y', alternative='two-sided', alpha=0.05, plot=False, ax=None, data=None, return_as='dict', decimals=3, verbose=False)[source]

Perform two-sample Kolmogorov-Smirnov test.

Parameters:

x (array or Series) – First sample to compare
y (array or Series) – Second sample to compare
var_x (str, default 'x') – Label for the first sample
var_y (str, default 'y') – Label for the second sample
alternative ({'two-sided', 'less', 'greater'}, default 'two-sided') – Alternative hypothesis
alpha (float, default 0.05) – Significance level
plot (bool, default False) – Whether to generate CDF comparison plot
data (DataFrame, str, or None, optional) – DataFrame or CSV path. When provided, string values for x/y are resolved as column names (seaborn-style).
return_as ({'dict', 'dataframe'}, default 'dict') – Output format
decimals (int, default 3) – Number of decimal places for rounding

Return type:

Union[dict, DataFrame]

Returns:

results (dict or DataFrame) – Test results including:
- test_method: ‘Kolmogorov-Smirnov test (2-sample)’
- statistic_name: ‘D’
- statistic: KS D-statistic
- pvalue: p-value
- pstars: Significance stars
- rejected: Whether null hypothesis is rejected
- n_x, n_y: Sample sizes
- var_x, var_y: Variable labels
- H0: Null hypothesis description
fig (matplotlib.figure.Figure, optional) – Figure with CDF comparison (only if plot=True)

Notes

The two-sample Kolmogorov-Smirnov test compares the ECDFs of two samples.

Null Hypothesis (H0): Both samples come from the same distribution

Test Statistic D:

\[D = \sup_x |F_{n_1}(x) - F_{n_2}(x)|\]

Where F_{n_1} and F_{n_2} are the empirical CDFs.
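
As a rough check on this definition, the two-sample statistic can be computed by evaluating both ECDFs at every pooled observation. This sketch uses SciPy and NumPy only; it is not taken from the scitex source, and the sample sizes and seed are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 80)
y = rng.normal(0.5, 1.0, 120)

# Evaluate both ECDFs at every pooled observation; the supremum of the
# difference between two step functions is attained at one of these points.
pooled = np.concatenate([x, y])
ecdf_x = np.searchsorted(np.sort(x), pooled, side="right") / x.size
ecdf_y = np.searchsorted(np.sort(y), pooled, side="right") / y.size
D_manual = np.max(np.abs(ecdf_x - ecdf_y))

# SciPy's two-sided two-sample KS statistic
D_scipy = stats.ks_2samp(x, y).statistic
print(D_manual, D_scipy)
```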

Advantages:
- Distribution-free (non-parametric)
- Tests entire distribution, not just location
- Can detect differences in location, scale, or shape

Disadvantages:
- Less powerful than t-test when assumptions are met
- Most sensitive to differences near the center of distributions
- Less sensitive to tail differences

When to use:
- Comparing two independent samples
- No assumptions about distribution shape
- Want to test overall distribution equality (not just means)
- Alternative to t-test when normality violated

Comparison with other tests:
- vs t-test: More robust, less powerful
- vs Mann-Whitney U: Tests different hypotheses (distribution vs median)
- vs Brunner-Munzel: KS tests full distribution, BM tests P(X>Y)
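
To make the contrast with Mann-Whitney U concrete, the sketch below draws two samples with the same center but different spread. It calls SciPy directly as a stand-in; the assumption is that the scitex wrappers report comparable statistics. The KS test is expected to flag the difference in scale, while a rank-sum test, which targets a shift in location, often does not.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(0, 1, 500)
y = rng.normal(0, 3, 500)  # same center, three times the spread

ks = stats.ks_2samp(x, y)
mw = stats.mannwhitneyu(x, y, alternative="two-sided")

print(f"KS:           D = {ks.statistic:.3f}, p = {ks.pvalue:.3g}")
print(f"Mann-Whitney: U = {mw.statistic:.0f}, p = {mw.pvalue:.3g}")
```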

Examples

>>> import numpy as np
>>> from scitex.stats import test_ks_2samp
>>> # Two samples from same distribution
>>> x = np.random.normal(0, 1, 100)
>>> y = np.random.normal(0, 1, 100)
>>> result = test_ks_2samp(x, y)
>>> result['rejected']
False

>>> # Two samples from different distributions
>>> x = np.random.normal(0, 1, 100)
>>> y = np.random.normal(2, 1, 100)
>>> result = test_ks_2samp(x, y)
>>> result['rejected']
True

scitex.stats.test_kendalls_w(data, *, subj_col=None, rater_col=None, score_col=None, use_abs=False, alpha=0.05, return_as='dict', decimals=3, verbose=False)[source]

Kendall’s coefficient of concordance W.

Parameters:

data (ndarray | DataFrame | dict) – 2-D matrix of scores (rows = subjects, cols = raters), or a long-format DataFrame with (subj, rater, score) triples (pass subj_col, rater_col, score_col to pivot).
subj_col (str, optional) – Name of the subject column in long-format input. Ignored if data is already a wide matrix.
rater_col (str, optional) – Name of the rater column in long-format input. Ignored if data is already a wide matrix.
score_col (str, optional) – Name of the score column in long-format input. Ignored if data is already a wide matrix.
use_abs (bool, default False) – If True, rank |score| instead of score. Useful when the sign of the score is irrelevant to the agreement question (e.g. ranking channels by |effect size|).
alpha (float, default 0.05) – Significance level (used only for the formatted summary).
return_as ({"dict", "dataframe"}, default "dict") – Output container.
decimals (int, default 3) – Rounding for the formatted summary.
verbose (bool, default False)
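
The relationship between the long and wide layouts can be sketched with pandas. The column names `subject`, `rater`, and `score` are hypothetical, and the assumption here is that passing subj_col, rater_col, and score_col has roughly the effect of this pivot; the library's internal handling may differ.

```python
import pandas as pd

# Long format: one (subject, rater, score) triple per row
long_df = pd.DataFrame(
    {
        "subject": ["s1", "s1", "s2", "s2", "s3", "s3"],
        "rater": ["A", "B", "A", "B", "A", "B"],
        "score": [4.0, 5.0, 2.0, 3.0, 5.0, 5.0],
    }
)

# Wide format: rows = subjects, columns = raters
wide = long_df.pivot(index="subject", columns="rater", values="score")
print(wide)
```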

Returns:

Keys: name, statistic, W, S, n, k, dof, chi2, pvalue, alpha, significant, formatted, sym, stars, effect_size, effect_size_label, interpretation.

Return type:

dict | DataFrame

Notes

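Each rater's scores are ranked across subjects, and W is computed from those within-rater ranks. Ties are handled by scipy.stats.rankdata (average ranks). With k raters and n subjects, where $R_i$ is the sum of the ranks given to subject i and $\bar{R}$ is the mean of the $R_i$: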

\[S = \sum_{i=1}^{n} \Big(R_i - \bar{R}\Big)^2, \qquad W = \frac{12 S}{k^2 (n^3 - n)}\]

The significance approximation uses Friedman’s chi-square statistic $\chi^2 = k(n-1) W$ with $n-1$ degrees of freedom. The approximation is reasonable for $n \ge 5$.
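
The formulas above can be sketched in a few lines of NumPy and SciPy. This is a minimal illustration without a tie correction, not the scitex implementation, and the helper name `kendalls_w_sketch` is hypothetical.

```python
import numpy as np
from scipy.stats import chi2, rankdata


def kendalls_w_sketch(scores):
    """Return (W, chi2 statistic, p-value) for an (n_subjects, k_raters) matrix."""
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    ranks = rankdata(scores, axis=0)      # rank subjects within each rater
    R = ranks.sum(axis=1)                 # rank sum per subject
    S = np.sum((R - R.mean()) ** 2)
    W = 12.0 * S / (k**2 * (n**3 - n))
    chi2_stat = k * (n - 1) * W           # Friedman chi-square approximation
    pvalue = chi2.sf(chi2_stat, df=n - 1)
    return W, chi2_stat, pvalue


# Three raters who order six subjects identically: perfect concordance
agree = np.column_stack([np.arange(6), 2 * np.arange(6), np.arange(6) + 10])
W_agree, chi2_agree, p_agree = kendalls_w_sketch(agree)

# Two raters with exactly reversed orderings: no concordance
oppose = np.column_stack([np.arange(5), -np.arange(5)])
W_oppose, _, _ = kendalls_w_sketch(oppose)

print(W_agree, W_oppose)
```

With identical orderings every rank sum is as spread out as possible, so W reaches its maximum of 1. With reversed orderings all rank sums are equal and W is 0.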

References

Kendall, M. G. & Babington Smith, B. (1939). The Problem of m Rankings. Annals of Mathematical Statistics, 10(3), 275-287.
Schmidt, F. (1997). Managing Project Risk and Uncertainty.
Legendre, P. (2005). Species associations: the Kendall coefficient of concordance revisited. Journal of Agricultural, Biological, and Environmental Statistics, 10(2), 226-245.

scitex.stats.test_icc(data, *, form='3,k', subj_col=None, rater_col=None, score_col=None, alpha=0.05, return_as='dict', decimals=3, verbose=False)[source]

Intraclass correlation (Shrout & Fleiss 1979).

Parameters:

data (ndarray | DataFrame | dict) – 2-D (n_subjects, k_raters) matrix of continuous scores, or a long-format DataFrame with (subj, rater, score) triples (pass subj_col, rater_col, score_col to pivot).
form ({"1,1", "2,1", "3,1", "1,k", "2,k", "3,k"}, default "3,k") – Which ICC form to surface at the top level (statistic, pvalue, sym, formatted). All six are always available in the result dict.
subj_col (str, optional) – Name of the subject column in long-format input; ignored if data is wide.
rater_col (str, optional) – Name of the rater column in long-format input; ignored if data is wide.
score_col (str, optional) – Name of the score column in long-format input; ignored if data is wide.
alpha (float, default 0.05)
return_as ({"dict", "dataframe"}, default "dict")
decimals (int, default 3)
verbose (bool, default False)

Returns:

Keys include ICC(1,1), ICC(2,1), ICC(3,1), ICC(1,k), ICC(2,k), ICC(3,k), plus the selected form’s statistic / pvalue / df1 / df2 / formatted at the top level.

Return type:

dict | DataFrame

Notes

Form selection table (Shrout & Fleiss 1979, McGraw & Wong 1996):

Form        Model   Type     Measure    Use case
ICC(1, k)   1-way   random   avg of k   raters interchangeable
ICC(2, k)   2-way   random   avg of k   generalise to population of raters
ICC(3, k)   2-way   mixed    avg of k   these k raters are the only ones of interest

Use form="3,k" (the default) when the k raters in your data are the only raters of interest (e.g. months of recording in a single patient). This is the most common choice for repeated measures.
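
For orientation, the two-way mixed forms can be derived from ANOVA mean squares as in Shrout & Fleiss (1979): ICC(3,1) = (BMS - EMS) / (BMS + (k - 1) EMS) and ICC(3,k) = (BMS - EMS) / BMS, where BMS is the between-subjects mean square and EMS the residual mean square. The sketch below applies these textbook formulas to an illustrative 6 x 4 matrix. The helper name `icc3_sketch` is hypothetical, and this is not necessarily how scitex computes the values.

```python
import numpy as np


def icc3_sketch(scores):
    """Return (ICC(3,1), ICC(3,k)) for an (n_subjects, k_raters) matrix."""
    Y = np.asarray(scores, dtype=float)
    n, k = Y.shape
    grand = Y.mean()

    # Two-way ANOVA decomposition without replication
    ss_subjects = k * np.sum((Y.mean(axis=1) - grand) ** 2)
    ss_raters = n * np.sum((Y.mean(axis=0) - grand) ** 2)
    ss_total = np.sum((Y - grand) ** 2)
    ss_error = ss_total - ss_subjects - ss_raters

    bms = ss_subjects / (n - 1)             # between-subjects mean square
    ems = ss_error / ((n - 1) * (k - 1))    # residual mean square

    icc31 = (bms - ems) / (bms + (k - 1) * ems)  # single-rater reliability
    icc3k = (bms - ems) / bms                    # reliability of the k-rater average
    return icc31, icc3k


# Illustrative data: 6 subjects (rows) scored by 4 raters (columns)
scores = np.array(
    [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
)
icc31, icc3k = icc3_sketch(scores)
print(f"ICC(3,1) = {icc31:.3f}, ICC(3,k) = {icc3k:.3f}")
```

The averaged form is higher than the single-rater form because averaging over k raters reduces the residual noise in each subject's score.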

References

Shrout, P. E. & Fleiss, J. L. (1979). Intraclass Correlations: Uses in Assessing Rater Reliability. Psychological Bulletin, 86(2), 420.
McGraw, K. O. & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1).
Koo, T. K. & Li, M. Y. (2016). A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. Journal of Chiropractic Medicine, 15(2), 155-163.
