skripts.ML package

Submodules

skripts.ML.ML4com module

Methods for Machine Learning for community inference.

class skripts.ML.ML4com.SKL_Classifier(X, ys, cv, configuration_space: ConfigurationSpace, classifier, n_trials: int)

Bases: object

Representation of a scikit-learn classifier for SMAC3.

train(config: Configuration, seed: int = 0) → float64

Train the classifier with given configuration.

Parameters:

config (ConfigSpace.Configuration) – Configuration
seed (int, optional) – Seed value to control randomness, defaults to 0

Returns:

Loss (1-accuracy)

Return type:

np.float64
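
The returned loss can be pictured as one minus the mean cross-validated accuracy. The following is a minimal sketch under stated assumptions, not the module's actual implementation: the class name LossSketch is hypothetical, the configuration is a plain dict rather than a ConfigSpace.Configuration, and scoring is delegated to scikit-learn's cross_val_score.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier


class LossSketch:
    """Hypothetical stand-in for SKL_Classifier (assumed behaviour)."""

    def __init__(self, X, y, cv, classifier):
        self.X, self.y, self.cv, self.classifier = X, y, cv, classifier

    def train(self, config: dict, seed: int = 0) -> np.float64:
        # Assumption: hyperparameters map directly onto constructor keywords.
        model = self.classifier(random_state=seed, **config)
        scores = cross_val_score(
            model, self.X, self.y, cv=self.cv, scoring="accuracy"
        )
        # The optimizer minimises, so report 1 - accuracy as the loss.
        return np.float64(1.0 - scores.mean())


# Toy data, only to make the sketch executable.
X, y = make_classification(n_samples=60, n_features=5, random_state=0)
cv = StratifiedKFold(n_splits=3)
loss = LossSketch(X, y, cv, DecisionTreeClassifier).train({"max_depth": 3}, seed=0)
```

Returning a loss rather than a score matters because SMAC3 minimises its target function.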

skripts.ML.ML4com.dict_permutations(dictionary: dict) → List[dict]

Combine all value combinations in a dictionary into a list of dictionaries.

Parameters:

dictionary (dict) – Input dictionary

Returns:

All permutations of dictionary

Return type:

List[dict]
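
As an illustration of what "all value combinations" means, here is a minimal sketch built on itertools.product. It assumes that every value in the input dictionary is an iterable of candidate settings; the function name dict_permutations_sketch is hypothetical and the real implementation may differ.

```python
from itertools import product
from typing import List


def dict_permutations_sketch(dictionary: dict) -> List[dict]:
    # Assumption: each value is an iterable of candidate settings.
    keys = list(dictionary)
    value_lists = (dictionary[key] for key in keys)
    # One output dictionary per element of the Cartesian product.
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


grid = {"max_depth": [2, 4], "criterion": ["gini", "entropy"]}
combos = dict_permutations_sketch(grid)
```

With two options for each of the two keys, combos holds four dictionaries, one per combination.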

skripts.ML.ML4com.evaluate_model_sklearn(X, ys, labels, indir, data_source, algorithm_name, outdir, verbosity=0)

Evaluate a model against the given hyperparameters for all organisms and save the resulting metrics and feature importances.

Parameters:

X (dataframe-like) – Data values
ys (dataframe-like) – True classes for one sample
labels (dataframe-like) – Labels
indir (path-like) – Input directory
data_source (str) – Data source
algorithm_name (str) – Algorithm name
outdir (path-like) – Output directory
verbosity (int, optional) – Level of verbosity, defaults to 0

skripts.ML.ML4com.extract_best_hyperparameters_from_incumbent(incumbent, configuration_space: ConfigurationSpace)

Extract an optimized set of hyperparameters from the incumbent. Returns the default configuration if none was found.

Parameters:

incumbent (config(s)) – Incumbent configuration(s) found during tuning
configuration_space (ConfigSpace.ConfigurationSpace) – Configuration Space

Returns:

Best hyperparameters

Return type:

config
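
The fallback to the default configuration could look roughly like the sketch below. It assumes that a multi-configuration incumbent arrives as a list and that the first entry is taken; both are assumptions, as are the names best_config_sketch and StubSpace. The only library call relied on is ConfigurationSpace.get_default_configuration(), which the stub imitates so the example runs without ConfigSpace installed.

```python
class StubSpace:
    """Hypothetical stand-in for ConfigSpace.ConfigurationSpace."""

    def get_default_configuration(self):
        return {"max_depth": 5}


def best_config_sketch(incumbent, configuration_space):
    # Assumption: several incumbents are passed as a list; use the first one.
    if isinstance(incumbent, list):
        incumbent = incumbent[0] if incumbent else None
    if incumbent is None:
        # Nothing was found, so fall back to the space's default.
        return configuration_space.get_default_configuration()
    return incumbent


space = StubSpace()
found = best_config_sketch([{"max_depth": 3}], space)
fallback = best_config_sketch([], space)
```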

skripts.ML.ML4com.extract_metrics(true_labels, prediction, scoring, run_label=None, cv_i=None, metrics_df: DataFrame = pd.DataFrame(columns=["Run", "Cross-Validation run", "Accuracy", "AUC", "TPR", "FPR", "Threshold", "Conf_Mat"])) → DataFrame

Extract metrics from machine learning. Metrics are TPR, FPR, Thresholds for ROC curve + AUC and accuracy with confusion matrix.

Parameters:

true_labels (array-like) – True labels
prediction (array-like) – Prediction
scoring (array-like) – Scoring
run_label (any, optional) – Label of run, defaults to None
cv_i (any (usually int), optional) – cross-validation instance, defaults to None
metrics_df (pd.DataFrame, optional) – Dataframe with all extracted metrics, defaults to pd.DataFrame(columns=["Run", "Cross-Validation run", "Accuracy", "AUC", "TPR", "FPR", "Threshold", "Conf_Mat"])

Returns:

Dataframe, filled with metrics

Return type:

pandas.DataFrame
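
The listed metrics map naturally onto functions in sklearn.metrics. The sketch below computes one row with the documented column names for a toy binary problem; it assumes that scoring holds the predicted probability of the positive class, and it is an illustration rather than the module's own code.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, auc, confusion_matrix, roc_curve

# Toy binary example (invented values, for illustration only).
true_labels = [0, 0, 1, 1]
prediction = [0, 1, 1, 1]
scoring = [0.1, 0.6, 0.7, 0.9]  # assumed: probability of the positive class

# ROC curve points and the thresholds at which they occur.
fpr, tpr, thresholds = roc_curve(true_labels, scoring)

row = {
    "Run": "demo",
    "Cross-Validation run": 0,
    "Accuracy": accuracy_score(true_labels, prediction),
    "AUC": auc(fpr, tpr),
    "TPR": tpr,
    "FPR": fpr,
    "Threshold": thresholds,
    "Conf_Mat": confusion_matrix(true_labels, prediction),
}
# One row per run and cross-validation fold; array values are stored as objects.
metrics_df = pd.DataFrame([row])
```

Rows produced this way for each fold can be concatenated with pd.concat to build up the full metrics table.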

skripts.ML.ML4com.individual_layers_to_tuple(config) → dict

Change hidden_layer_sizes to tuple representation.

Parameters:

config (dict-like) – Configuration

Returns:

Config with changed hidden_layer_sizes as tuples

Return type:

dict
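
scikit-learn's MLPClassifier expects hidden_layer_sizes as a tuple, whereas a tuner usually searches one integer per layer. The sketch below shows one way such a conversion could work. The per-layer key pattern "hidden_layer_size_<i>" and the function name layers_to_tuple_sketch are hypothetical; the keys actually used by the module are not documented here.

```python
def layers_to_tuple_sketch(config) -> dict:
    config = dict(config)  # work on a copy, leave the input untouched
    # Hypothetical key pattern: "hidden_layer_size_0", "hidden_layer_size_1", ...
    prefix = "hidden_layer_size_"
    layer_keys = sorted(
        (key for key in config if key.startswith(prefix)),
        key=lambda key: int(key[len(prefix):]),
    )
    if layer_keys:
        # Collapse the per-layer entries into a single tuple, ordered by layer.
        config["hidden_layer_sizes"] = tuple(config.pop(key) for key in layer_keys)
    return config


raw = {"alpha": 0.01, "hidden_layer_size_1": 32, "hidden_layer_size_0": 64}
converted = layers_to_tuple_sketch(raw)
```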

skripts.ML.ML4com.intersect_impute_on_left(df_base: DataFrame, df_right: DataFrame, imputation: str = 'zero') → DataFrame

Keep all indices of the left DataFrame (df_base) and fill them with the matching values from df_right. Indices without a match are imputed with zero, the mean, or kNN. For kNN, k can be specified as a number and defaults to 5.

Parameters:

df_base (pandas.DataFrame) – Left dataframe
df_right (pandas.DataFrame) – Right dataframe
imputation (str, optional) – Imputation procedure [zero|mean|kNN], defaults to “zero”

Returns:

Merged dataframe with imputed values

Return type:

pandas.DataFrame
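
One possible reading of this behaviour, sketched with pandas only: align df_right to the index of df_base, then fill the rows that had no match. The sketch covers the "zero" and "mean" options and assumes the mean is taken over the matched rows; the kNN option is omitted here (scikit-learn's KNNImputer would be a natural candidate). The name impute_on_left_sketch is hypothetical.

```python
import pandas as pd


def impute_on_left_sketch(df_base, df_right, imputation="zero"):
    # Keep every index of df_base; rows absent from df_right become NaN.
    aligned = df_right.reindex(df_base.index)
    if imputation == "zero":
        return aligned.fillna(0.0)
    if imputation == "mean":
        # Assumption: column means are computed over the matched rows only.
        return aligned.fillna(aligned.mean())
    raise ValueError(f"Unsupported imputation in this sketch: {imputation}")


df_base = pd.DataFrame({"label": [1, 0, 1]}, index=["a", "b", "c"])
df_right = pd.DataFrame({"x": [1.0, 3.0, 10.0]}, index=["a", "c", "d"])

zero_filled = impute_on_left_sketch(df_base, df_right, "zero")
mean_filled = impute_on_left_sketch(df_base, df_right, "mean")
```

Index "d" exists only in df_right and is dropped, while "b" exists only in df_base and receives the imputed value.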

skripts.ML.ML4com.join_df_metNames(df: DataFrame, grouper='peakID', include_mass=False) → DataFrame

Join the dataframe column metNames along the grouper column. Sets a common index for combining positively and negatively charged dataframes along their metabolite names.

Parameters:

df (pandas.DataFrame) – Input dataframe
grouper (str, optional) – Grouper column, defaults to “peakID”
include_mass (bool, optional) – Include mass column, defaults to False

Returns:

Combined dataframe

Return type:

pandas.DataFrame

skripts.ML.ML4com.nested_cross_validate_model_sklearn(X, ys, labels, classifier, configuration_space, n_trials, name, algorithm_name, outdir, fold: KFold | StratifiedKFold = KFold(n_splits=5, random_state=None, shuffle=False), inner_fold: int = 3, n_workers: int = 1, verbosity: int = 0)

Cross-validate a model against the given hyperparameters for all organisms in a nested manner.

Parameters:

X (dataframe-like) – Data values
ys (dataframe-like) – True classes for one sample
labels (dataframe-like) – Labels
classifier (Classifier from sklearn) – Classifier
configuration_space (ConfigSpace.ConfigurationSpace) – Configuration Space
n_trials (int) – Number of trials
name (str) – Name of run
algorithm_name (str) – Algorithm name
outdir (path-like) – Output directory
fold (Union[KFold, StratifiedKFold], optional) – Outer fold, defaults to KFold()
inner_fold (int, optional) – Inner fold, defaults to 3
n_workers (int, optional) – Number of workers, defaults to 1
verbosity (int, optional) – Level of verbosity, defaults to 0

Returns:

Dataframes with metrics on different levels

Return type:

tuple[pandas.DataFrame]
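
The nested scheme separates tuning from evaluation: hyperparameters are chosen on inner folds of each outer training split, and the tuned model is scored on the untouched outer test split. The sketch below illustrates that structure only. It swaps in scikit-learn's GridSearchCV for the SMAC3-based tuning, uses toy data, and handles a single target, so it should not be read as the module's implementation.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.tree import DecisionTreeClassifier

# Toy data, only to make the sketch executable.
X, y = make_classification(n_samples=90, n_features=6, random_state=0)

outer = KFold(n_splits=5)  # mirrors the documented default for fold
outer_scores = []
for train_idx, test_idx in outer.split(X):
    # Inner loop: GridSearchCV stands in for the SMAC3 tuning step
    # (cv=3 mirrors the documented default for inner_fold).
    search = GridSearchCV(
        DecisionTreeClassifier(random_state=0),
        param_grid={"max_depth": [2, 4]},
        cv=3,
    )
    search.fit(X[train_idx], y[train_idx])
    # Outer loop: score the refitted best model on data unseen during tuning.
    outer_scores.append(search.score(X[test_idx], y[test_idx]))
```

Because the outer test split never influences the hyperparameter choice, the outer scores give a less optimistic estimate than reusing the tuning folds for evaluation.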

skripts.ML.ML4com.plot_cv_confmat(ys, target_labels, accuracies, confusion_matrices, outdir, name)

Plot heatmap of confusion matrix

Parameters:

ys (dataframe-like) – Targets
target_labels (array-like) – Target labels
accuracies (dataframe-like) – Accuracies
confusion_matrices (dataframe-like) – Confusion matrices
outdir (path-like) – Output directory
name (str) – Name of run

skripts.ML.ML4com.plot_decision_trees(model, feature_names, class_names, outdir, name)

Plot decision trees of model

Parameters:

model (model-like) – Model
feature_names (array-like) – Feature names
class_names (array-like) – Class names
outdir (path-like) – Output directory
name (str) – Name of run

skripts.ML.ML4com.plot_metrics_df(metrics_df, organism_metrics_df, overall_metrics_df, algorithm_name, outdir, show=False)

Plot the extracted metrics as a heatmap and ROC AUC curve

Parameters:

metrics_df (pandas.DataFrame) – Metrics dataframe
organism_metrics_df (pandas.DataFrame) – Metrics dataframe on organism level
overall_metrics_df (pandas.DataFrame) – Metrics dataframe on overall level
algorithm_name (str) – Algorithm name
outdir (path-like) – Output directory
show (bool, optional) – Show plot, defaults to False

skripts.ML.ML4com.tune_classifier(X, y, classifier, cv, configuration_space: ConfigurationSpace, n_workers: int, n_trials: int, name: str, algorithm_name: str, outdir, verbosity: int = 0)

Perform hyperparameter tuning on an Sklearn classifier.

Parameters:

X (dataframe-like) – Data values
y (dataframe-like) – True classes for one sample
classifier (Classifier from sklearn) – Classifier
cv (cv-scheme) – Cross-validation scheme
configuration_space (ConfigSpace.ConfigurationSpace) – Configuration Space
n_workers (int) – Number of workers to work in parallel
n_trials (int) – Number of trials
name (str) – Name of run
algorithm_name (str) – Algorithm name
outdir (path-like) – Output directory
verbosity (int, optional) – Level of verbosity, defaults to 0

Returns:

Incumbent

Return type:

configs (list, single config)

skripts.ML.ML4com.tune_train_model_sklearn(X, ys, labels, classifier, configuration_space, n_workers, n_trials, source: str, name, algorithm_name, outdir, fold: KFold | StratifiedKFold = KFold(n_splits=5, random_state=None, shuffle=False), verbosity=0)

Tune and train a model in sklearn.

Parameters:

X (dataframe-like) – Data values
ys (dataframe-like) – True classes for one sample
labels (dataframe-like) – Labels
classifier (Classifier from sklearn) – Classifier
configuration_space (ConfigSpace.ConfigurationSpace) – Configuration Space
n_workers (int) – Number of workers
n_trials (int) – Number of trials
source (str) – Source of data
name (str) – Name of run
algorithm_name (str) – Algorithm name
outdir (path-like) – Output directory
fold (Union[KFold, StratifiedKFold], optional) – Fold for cross validation during tuning, defaults to KFold()
verbosity (int, optional) – Level of verbosity, defaults to 0

skripts.ML.ML4com_run module

Module contents