skripts.ML package

Submodules

skripts.ML.ML4com module

Methods for Machine Learning for community inference.

class skripts.ML.ML4com.SKL_Classifier(X, ys, cv, configuration_space: ConfigurationSpace, classifier, n_trials: int)[source]

Bases: object

Representation of a scikit-learn classifier for SMAC3.

train(config: Configuration, seed: int = 0) float64[source]

Train the classifier with given configuration.

Parameters:
  • config (ConfigSpace.Configuration) – Configuration

  • seed (int, optional) – Seed value to control randomness, defaults to 0

Returns:

Loss (1-accuracy)

Return type:

np.float64
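A minimal sketch of a target function of this shape, assuming the loss is one minus the mean cross-validated accuracy. The helper name train_sketch and the plain dict standing in for a ConfigSpace.Configuration are assumptions for illustration, not the module's own code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def train_sketch(classifier, X, y, cv, config: dict, seed: int = 0) -> np.float64:
    """Hypothetical stand-in for SKL_Classifier.train: returns 1 - mean accuracy."""
    # Build the estimator from the hyperparameters under evaluation.
    model = classifier(random_state=seed, **config)
    # Score the estimator on every fold of the cross-validation scheme.
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    return np.float64(1.0 - scores.mean())


# Synthetic data, used only to make the sketch runnable.
X, y = make_classification(n_samples=60, n_features=5, random_state=0)
loss = train_sketch(DecisionTreeClassifier, X, y, cv=3, config={"max_depth": 2})
```

Returning a loss rather than a score fits an optimizer that minimizes its objective, which is presumably why the documented return value is 1-accuracy.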

skripts.ML.ML4com.dict_permutations(dictionary: dict) List[dict][source]

Combine all value combinations in a dictionary into a list of dictionaries.

Parameters:

dictionary (dict) – Input dictionary

Returns:

All permutations of dictionary

Return type:

List[dict]
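The expansion described above can be sketched with itertools.product. This assumes every value in the input dictionary is an iterable of candidate settings; dict_permutations_sketch is a hypothetical name and the module's implementation may differ.

```python
from itertools import product
from typing import List


def dict_permutations_sketch(dictionary: dict) -> List[dict]:
    """Hypothetical re-implementation: one dict per combination of values."""
    keys = list(dictionary)
    # product(*values) yields every combination, taking one value per key.
    return [dict(zip(keys, combo)) for combo in product(*dictionary.values())]


grid = {"max_depth": [2, 4], "criterion": ["gini", "entropy"]}
combos = dict_permutations_sketch(grid)
```

With two keys of two values each, the result holds four dictionaries, one per combination.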

skripts.ML.ML4com.evaluate_model_sklearn(X, ys, labels, indir, data_source, algorithm_name, outdir, verbosity=0)[source]

Evaluate a model against the given hyperparameters for all organisms and save the resulting metrics and feature importances.

Parameters:
  • X (dataframe-like) – Data values

  • ys (dataframe-like) – True classes for one sample

  • labels (dataframe-like) – Labels

  • indir (path-like) – Input directory

  • data_source (str) – Data source

  • algorithm_name (str) – Algorithm name

  • outdir (path-like) – Output directory

  • verbosity (int, optional) – Level of verbosity, defaults to 0

skripts.ML.ML4com.extract_best_hyperparameters_from_incumbent(incumbent, configuration_space: ConfigurationSpace)[source]

Extract an optimized set of hyperparameters from the incumbent. Returns the default configuration if none was found.

Parameters:
  • incumbent (config(s)) – Incumbent configuration(s), e.g. as returned by tune_classifier

  • configuration_space (ConfigSpace.ConfigurationSpace) – Configuration Space

Returns:

Best hyperparameters

Return type:

config

skripts.ML.ML4com.extract_metrics(true_labels, prediction, scoring, run_label=None, cv_i=None, metrics_df: DataFrame = pd.DataFrame(columns=["Run", "Cross-Validation run", "Accuracy", "AUC", "TPR", "FPR", "Threshold", "Conf_Mat"])) DataFrame[source]

Extract metrics from machine learning. Metrics are TPR, FPR, Thresholds for ROC curve + AUC and accuracy with confusion matrix.

Parameters:
  • true_labels (array-like) – True labels

  • prediction (array-like) – Prediction

  • scoring (array-like) – Scoring

  • run_label (any, optional) – Label of run, defaults to None

  • cv_i (any (usually int), optional) – cross-validation instance, defaults to None

  • metrics_df (pd.DataFrame, optional) – Dataframe with all extracted metrics, defaults to pd.DataFrame(columns=["Run", "Cross-Validation run", "Accuracy", "AUC", "TPR", "FPR", "Threshold", "Conf_Mat"])

Returns:

Dataframe, filled with metrics

Return type:

pandas.DataFrame
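
The metrics named for extract_metrics can all be computed with scikit-learn. The sketch below assumes a binary problem in which scoring holds the predicted probability of the positive class; the toy arrays and the row dictionary are illustrative and not taken from the module.

```python
import numpy as np
from sklearn.metrics import accuracy_score, auc, confusion_matrix, roc_curve

true_labels = np.array([0, 0, 1, 1])
scoring = np.array([0.1, 0.4, 0.35, 0.8])   # assumed: probability of class 1
prediction = (scoring >= 0.5).astype(int)    # hard class predictions

# ROC curve points and the thresholds at which they were obtained.
fpr, tpr, thresholds = roc_curve(true_labels, scoring)

# One row of metrics, keyed like the columns of the default metrics_df.
row = {
    "Accuracy": accuracy_score(true_labels, prediction),
    "AUC": auc(fpr, tpr),
    "TPR": tpr,
    "FPR": fpr,
    "Threshold": thresholds,
    "Conf_Mat": confusion_matrix(true_labels, prediction),
}
```

Note that accuracy and the confusion matrix depend on the hard predictions, while the ROC curve and AUC depend only on the continuous scores.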

skripts.ML.ML4com.individual_layers_to_tuple(config) dict[source]

Change hidden_layer_sizes to tuple representation.

Parameters:

config (dict-like) – Configuration

Returns:

Config with changed hidden_layer_sizes as tuples

Return type:

dict
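
A sketch of one way such a conversion could work, assuming the configuration stores one entry per hidden layer. The key prefix n_neurons_layer_ and the name layers_to_tuple_sketch are hypothetical; the keys actually used by the module are not documented here.

```python
def layers_to_tuple_sketch(config: dict, prefix: str = "n_neurons_layer_") -> dict:
    """Hypothetical sketch: fold per-layer entries into hidden_layer_sizes."""
    # Keep every entry that is not a per-layer size.
    out = {k: v for k, v in config.items() if not k.startswith(prefix)}
    # Order the per-layer keys by their numeric suffix.
    layer_keys = sorted(
        (k for k in config if k.startswith(prefix)),
        key=lambda k: int(k[len(prefix):]),
    )
    if layer_keys:
        out["hidden_layer_sizes"] = tuple(config[k] for k in layer_keys)
    return out


flat = {"alpha": 0.001, "n_neurons_layer_1": 64, "n_neurons_layer_0": 128}
converted = layers_to_tuple_sketch(flat)
```

The tuple form matters because scikit-learn's MLPClassifier expects hidden_layer_sizes as a sequence with one entry per hidden layer.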

skripts.ML.ML4com.intersect_impute_on_left(df_base: DataFrame, df_right: DataFrame, imputation: str = 'zero') DataFrame[source]

Use all indices of the left DataFrame (df_base) and fill them with values from their intersection with df_right. Indices without a match are imputed with zero, the mean, or kNN, where k can be specified as a number (default: 5).

Parameters:
  • df_base (pandas.DataFrame) – Left dataframe

  • df_right (pandas.DataFrame) – Right dataframe

  • imputation (str, optional) – Imputation procedure [zero|mean|kNN], defaults to “zero”

Returns:

Merged dataframe with imputed values

Return type:

pandas.DataFrame
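
The alignment and imputation described above can be sketched with pandas reindex and fillna. This covers only the zero and mean options and returns only the aligned columns of df_right; intersect_impute_sketch is a hypothetical name, and how the module combines columns or applies kNN is not shown.

```python
import pandas as pd


def intersect_impute_sketch(df_base, df_right, imputation="zero"):
    """Hypothetical sketch: align df_right to df_base's index, then fill gaps."""
    # reindex keeps every index of df_base; rows missing in df_right become NaN.
    aligned = df_right.reindex(df_base.index)
    if imputation == "zero":
        return aligned.fillna(0.0)
    if imputation == "mean":
        # Column means are computed over the matched rows only.
        return aligned.fillna(aligned.mean())
    raise ValueError(f"unsupported imputation: {imputation}")


df_base = pd.DataFrame({"a": [1, 2, 3]}, index=["m1", "m2", "m3"])
df_right = pd.DataFrame({"s1": [10.0, 30.0]}, index=["m1", "m4"])
merged = intersect_impute_sketch(df_base, df_right, imputation="zero")
```

Index m4 exists only in df_right and is dropped, while m2 and m3 exist only in df_base and receive imputed values.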

skripts.ML.ML4com.join_df_metNames(df: DataFrame, grouper='peakID', include_mass=False) DataFrame[source]

Join the dataframe column metNames along the grouper column. Sets a common index for combining positively and negatively charged dataframes along their metabolite names.

Parameters:
  • df (pandas.DataFrame) – Input dataframe

  • grouper (str, optional) – Grouper column, defaults to “peakID”

  • include_mass (bool, optional) – Include mass column, defaults to False

Returns:

Combined dataframe

Return type:

pandas.DataFrame

skripts.ML.ML4com.nested_cross_validate_model_sklearn(X, ys, labels, classifier, configuration_space, n_trials, name, algorithm_name, outdir, fold: KFold | StratifiedKFold = KFold(n_splits=5, random_state=None, shuffle=False), inner_fold: int = 3, n_workers: int = 1, verbosity: int = 0)[source]

Cross-validate a model against the given hyperparameters for all organisms in a nested manner.

Parameters:
  • X (dataframe-like) – Data values

  • ys (dataframe-like) – True classes for one sample

  • labels (dataframe-like) – Labels

  • classifier (Classifier from sklearn) – Classifier

  • configuration_space (ConfigSpace.ConfigurationSpace) – Configuration Space

  • n_trials (int) – Number of trials

  • name (str) – Name of run

  • algorithm_name (str) – Algorithm name

  • outdir (path-like) – Output directory

  • fold (Union[KFold, StratifiedKFold], optional) – Outer fold, defaults to KFold()

  • inner_fold (int, optional) – Inner fold, defaults to 3

  • n_workers (int, optional) – Number of workers, defaults to 1

  • verbosity (int, optional) – Level of verbosity, defaults to 0

Returns:

Dataframes with metrics on different levels

Return type:

tuple[pandas.DataFrame]
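
The nested scheme can be sketched with scikit-learn alone. In this sketch GridSearchCV stands in for the SMAC3-based tuning, which is a deliberate substitution to keep the example self-contained; the synthetic data, the classifier, and the parameter grid are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=90, n_features=5, random_state=0)

# Inner loop: hyperparameter search on each outer training split (3 folds).
inner_search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [1, 2, 3]},
    cv=3,
    scoring="accuracy",
)

# Outer loop: performance estimate on data the search never saw (5 folds).
outer_fold = KFold(n_splits=5)
outer_scores = cross_val_score(inner_search, X, y, cv=outer_fold, scoring="accuracy")
```

Because tuning happens inside each outer training split, the outer scores are not biased by hyperparameter selection on the test data.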

skripts.ML.ML4com.plot_cv_confmat(ys, target_labels, accuracies, confusion_matrices, outdir, name)[source]

Plot heatmap of confusion matrix

Parameters:
  • ys (dataframe-like) – Targets

  • target_labels (array-like) – Target labels

  • accuracies (dataframe-like) – Accuracies

  • confusion_matrices (dataframe-like) – Confusion matrices

  • outdir (path-like) – Output directory

  • name (str) – Name of run

skripts.ML.ML4com.plot_decision_trees(model, feature_names, class_names, outdir, name)[source]

Plot decision trees of model

Parameters:
  • model (model-like) – Model

  • feature_names (array-like) – Feature names

  • class_names (array-like) – Class names

  • outdir (path-like) – Output directory

  • name (str) – Name of run

skripts.ML.ML4com.plot_metrics_df(metrics_df, organism_metrics_df, overall_metrics_df, algorithm_name, outdir, show=False)[source]

Plot the extracted metrics as a heatmap and a ROC curve with AUC

Parameters:
  • metrics_df (pandas.DataFrame) – Metrics dataframe

  • organism_metrics_df (pandas.DataFrame) – Metrics dataframe on organism level

  • overall_metrics_df (pandas.DataFrame) – Metrics dataframe on overall level

  • algorithm_name (str) – Algorithm name

  • outdir (path-like) – Output directory

  • show (bool, optional) – Show plot, defaults to False

skripts.ML.ML4com.tune_classifier(X, y, classifier, cv, configuration_space: ConfigurationSpace, n_workers: int, n_trials: int, name: str, algorithm_name: str, outdir, verbosity: int = 0)[source]

Perform hyperparameter tuning on an Sklearn classifier.

Parameters:
  • X (dataframe-like) – Data values

  • y (dataframe-like) – True classes for one sample

  • classifier (Classifier from sklearn) – Classifier

  • cv (cv-scheme) – Cross-validation scheme

  • configuration_space (ConfigSpace.ConfigurationSpace) – Configuration Space

  • n_workers (int) – Number of workers to work in parallel

  • n_trials (int) – Number of trials

  • name (str) – Name of run

  • algorithm_name (str) – Algorithm name

  • outdir (path-like) – Output directory

  • verbosity (int, optional) – Level of verbosity, defaults to 0

Returns:

Incumbent

Return type:

configs (list, single config)

skripts.ML.ML4com.tune_train_model_sklearn(X, ys, labels, classifier, configuration_space, n_workers, n_trials, source: str, name, algorithm_name, outdir, fold: KFold | StratifiedKFold = KFold(n_splits=5, random_state=None, shuffle=False), verbosity=0)[source]

Tune and train a model in sklearn.

Parameters:
  • X (dataframe-like) – Data values

  • ys (dataframe-like) – True classes for one sample

  • labels (dataframe-like) – Labels

  • classifier (Classifier from sklearn) – Classifier

  • configuration_space (ConfigSpace.ConfigurationSpace) – Configuration Space

  • n_workers (int) – Number of workers

  • n_trials (int) – Number of trials

  • source (str) – Source of data

  • name (str) – Name of run

  • algorithm_name (str) – Algorithm name

  • outdir (path-like) – Output directory

  • fold (Union[KFold, StratifiedKFold], optional) – Fold for cross validation during tuning, defaults to KFold()

  • verbosity (int, optional) – Level of verbosity, defaults to 0

skripts.ML.ML4com_run module

Module contents