scembed.scIBAggregator#

class scembed.scIBAggregator(entity, project, output_dir=None)#

Aggregator for WandB sweep results with scIB metrics visualization.

Retrieves runs from a WandB project, organizes them by integration method, and provides visualization capabilities using scIB metrics formatting [LButtnerC+22].

Parameters:
  • entity (str) – WandB entity name.

  • project (str) – WandB project name.

  • output_dir (str | Path | None (default: None)) – Directory for saving downloaded models and embeddings. If None, a temporary directory is created.
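A typical workflow can be sketched as follows. This is a minimal sketch built only from the signatures documented on this page. The helper name collect_best_runs and its aggregator_cls parameter are hypothetical, and the call order (fetch, then aggregate, then download) is an assumption rather than a documented requirement.

```python
def collect_best_runs(entity, project, output_dir=None, aggregator_cls=None):
    """Fetch a sweep, keep the best run per method, and download its artifacts."""
    if aggregator_cls is None:
        # Imported lazily so the sketch can be read and tested without scembed installed.
        from scembed import scIBAggregator as aggregator_cls

    agg = aggregator_cls(entity=entity, project=project, output_dir=output_dir)
    agg.fetch_runs()  # assumed to come first: it fills the internal storage
    agg.aggregate(sort_by="Total", min_max_scale_for_selection=True)
    agg.get_models_and_embeddings()  # uses the default artifact names
    return agg
```

Calling collect_best_runs("my-entity", "my-project") with placeholder names would return the aggregator, whose results attribute is populated by aggregate().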

Attributes table#

available_methods

List of available methods in the data.

Methods table#

aggregate([sort_by, min_max_scale_for_selection])

Aggregate best run per method into unified results structure.

fetch_runs()

Fetch runs from WandB and process into internal storage.

get_method_runs(method[, sort_by, min_max_scale])

Get all data for a specific method, optionally sorted by a metric.

get_models_and_embeddings([...])

Download models and embeddings for the best-performing run per method.

Attributes#

scIBAggregator.available_methods#

List of available methods in the data.

Methods#

scIBAggregator.aggregate(sort_by='Total', min_max_scale_for_selection=True)#

Aggregate best run per method into unified results structure.

Creates self.results with the same structure as individual method results:
  • configs – DataFrame with the best configuration per method.

  • scib_benchmarker – Benchmarker object with all best runs.

  • other_logs – DataFrame with other logs from the best run per method.

Parameters:
  • sort_by (str (default: 'Total')) – Metric used to select the best run per method. “Total” is the overall scIB score.

  • min_max_scale_for_selection (bool (default: True)) – Whether to use min-max scaled metrics when selecting the best run per method. If True, scaled metrics are used so that metrics with different ranges are compared fairly. If False, raw metrics are used. Note: reported metrics are always raw (non-scaled) values.

Return type:

None
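Why scaling can change which run is selected is easiest to see on a toy example. The sketch below is not the library's implementation: the run names, metric names, and values are made up, and the aggregate score is assumed to be a plain unweighted mean of the metrics (the actual “Total” score may be weighted differently).

```python
from statistics import mean

# Hypothetical raw metric values for three runs of one method.
runs = {
    "run_a": {"metric_x": 0.90, "metric_y": 0.10},
    "run_b": {"metric_x": 0.80, "metric_y": 0.15},
    "run_c": {"metric_x": 0.50, "metric_y": 0.12},
}


def min_max_scale(runs):
    """Rescale every metric to [0, 1] across runs."""
    metrics = list(next(iter(runs.values())))
    scaled = {name: {} for name in runs}
    for metric in metrics:
        values = [scores[metric] for scores in runs.values()]
        lo, hi = min(values), max(values)
        span = hi - lo
        for name, scores in runs.items():
            # A metric that is constant across runs carries no ranking signal.
            scaled[name][metric] = (scores[metric] - lo) / span if span else 0.0
    return scaled


def best_run(scores):
    """Pick the run with the highest unweighted mean over its metrics."""
    return max(scores, key=lambda name: mean(list(scores[name].values())))


best_raw = best_run(runs)                    # selection on raw values
best_scaled = best_run(min_max_scale(runs))  # selection on scaled values
```

With these made-up numbers, raw selection picks run_a because the wide-ranging metric_x dominates the mean, while scaled selection picks run_b because both metrics contribute equally. In either case the values reported for the chosen run would be the raw ones.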

scIBAggregator.fetch_runs()#

Fetch runs from WandB and process into internal storage.

Return type:

None

scIBAggregator.get_method_runs(method, sort_by='Total', min_max_scale=True)#

Get all data for a specific method, optionally sorted by a metric.

Parameters:
  • method (str) – Method name to retrieve data for.

  • sort_by (str (default: 'Total')) – Metric name to sort runs by. Must be a valid metric name (e.g., ‘Total’ or ‘Silhouette label’).

  • min_max_scale (bool (default: True)) – Whether to use min-max scaled metrics for sorting. If True, scaled metrics are used so that metrics with different ranges are compared fairly. If False, raw metrics are used.

Return type:

dict[str, DataFrame | Benchmarker]

Returns:

Dictionary with ‘configs’ (DataFrame), ‘scib_benchmarker’ (Benchmarker), and ‘other_logs’ (DataFrame). The ‘configs’ and ‘other_logs’ DataFrames are sorted by the specified metric in descending order (best runs first).
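Because ‘configs’ is sorted best-first, the top configuration of each method is simply its first row. The helper below is a hedged sketch: the name best_config_per_method is hypothetical, and it assumes the aggregator passed in has already fetched its runs.

```python
def best_config_per_method(aggregator, sort_by="Total"):
    """Return the top-ranked config row for every available method."""
    best = {}
    for method in aggregator.available_methods:
        runs = aggregator.get_method_runs(method, sort_by=sort_by, min_max_scale=True)
        # 'configs' is documented as sorted best-first, so the first row is the best run.
        best[method] = runs["configs"].head(1)
    return best
```

The result maps each method name to a one-row DataFrame, which can be inspected or concatenated for a side-by-side comparison.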

scIBAggregator.get_models_and_embeddings(model_artifact_name='trained_model', embedding_artifact_name='embedding')#

Download models and embeddings for the best-performing run per method.

Creates a folder structure in output_dir with models and embeddings organized by method. Models are only downloaded for methods that have them (GPU-based methods).

Parameters:
  • model_artifact_name (str (default: 'trained_model')) – Name of the model artifact in wandb.

  • embedding_artifact_name (str (default: 'embedding')) – Name of the embedding artifact in wandb.

Return type:

None