rna_code.data.feature_selection package

Submodules

rna_code.data.feature_selection.base_feature_selector module

Abstract class for feature selection

class rna_code.data.feature_selection.base_feature_selector.BaseFeatureSelector(threshold: float | None = None, n_features: int | None = None)

Bases: ABC

Abstract base class for feature selectors.

Parameters:
  • threshold (float | None, optional) – Selection threshold for given task, by default None

  • n_features (int | None, optional) – Number of features to select for given task, by default None

abstract select_features(data_array: ndarray, **kwargs) ndarray

Select features from data array according to self.threshold.

Parameters:

data_array (np.ndarray) – Data to select features from.

Returns:

Filtered data.

Return type:

np.ndarray
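A minimal sketch of how a concrete selector might implement this interface. The stand-in base class below only mirrors the documented signature so that the example runs without the package installed; `BaseFeatureSelectorSketch` and `VarianceSelector` are hypothetical names, and the variance criterion is chosen purely for illustration.

```python
from abc import ABC, abstractmethod
from typing import Optional

import numpy as np


class BaseFeatureSelectorSketch(ABC):
    """Stand-in mirroring the documented constructor and abstract method."""

    def __init__(self, threshold: Optional[float] = None,
                 n_features: Optional[int] = None):
        self.threshold = threshold
        self.n_features = n_features

    @abstractmethod
    def select_features(self, data_array: np.ndarray, **kwargs) -> np.ndarray:
        """Return the filtered data."""


class VarianceSelector(BaseFeatureSelectorSketch):
    """Hypothetical subclass: keep columns whose variance exceeds the threshold."""

    def select_features(self, data_array: np.ndarray, **kwargs) -> np.ndarray:
        variances = data_array.var(axis=0)
        # Treat a missing threshold as 0.0 (an assumption of this sketch).
        cutoff = 0.0 if self.threshold is None else self.threshold
        return data_array[:, variances > cutoff]


data = np.array([[1.0, 5.0], [1.0, 7.0], [1.0, 9.0]])
filtered = VarianceSelector(threshold=0.5).select_features(data)
```

Because `select_features` is abstract, the base class itself cannot be instantiated; only subclasses that override the method can.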

rna_code.data.feature_selection.expression_selector module

Expression level based selection module

class rna_code.data.feature_selection.expression_selector.ExpressionSelector(threshold: float | None = None, n_features: int | None = None)

Bases: BaseFeatureSelector

Feature selection based on expression threshold

Parameters:
  • threshold (float | None, optional) – Expression threshold, by default None

  • n_features (int | None, optional) – Number of features to select for given task, by default None

select_features(data_array) ndarray

Selects features based on gene expression levels, using the threshold set at construction.

Parameters:

data_array (numpy.ndarray) – The dataset to process.

Returns:

A list of boolean values indicating selected features.

Return type:

list
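As an illustration of threshold-based expression filtering, the sketch below keeps a feature when its mean expression across samples exceeds the threshold. The use of the mean as the summary statistic and the helper name `expression_mask` are assumptions; the actual criterion used by ExpressionSelector is not stated here.

```python
import numpy as np


def expression_mask(data_array: np.ndarray, threshold: float) -> np.ndarray:
    """Boolean mask over columns (features); rows are samples."""
    # Assumption: a gene counts as "expressed" if its mean level is above threshold.
    mean_expression = data_array.mean(axis=0)
    return mean_expression > threshold


data = np.array([
    [0.0, 5.0, 1.0],
    [0.2, 7.0, 3.0],
])
mask = expression_mask(data, threshold=1.0)
filtered = data[:, mask]  # keep only the selected columns
```

The boolean mask can be applied directly to the column axis of the original array, which is how a list of booleans such as the one returned here would typically be consumed.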

rna_code.data.feature_selection.laplacian_selector module

Module for Laplacian Score based feature selection

class rna_code.data.feature_selection.laplacian_selector.LaplacianSelector(threshold: float | None = None, n_features: int | None = None, k: int = 5)

Bases: BaseFeatureSelector

Feature selection based on Laplacian score.

Parameters:
  • threshold (float | None, optional) – Laplacian score threshold, by default None

  • n_features (int | None, optional) – Number of features to select for given task, by default None

  • k (int, optional) – Number of neighbors for the kNN algorithm, by default 5

laplacian_score(X)

Computes the Laplacian Score for each feature of the dataset, using the k set at construction for the kNN graph.

Parameters:

X (numpy.ndarray) – The dataset (samples x features).

Returns:

Array of Laplacian scores for each feature.

Return type:

numpy.ndarray
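For orientation, here is a NumPy-only sketch of the standard Laplacian Score computation on a kNN graph with heat-kernel weights. It is not taken from this module: the function name `laplacian_score_sketch`, the kernel width `t`, and the symmetrisation step are assumptions. In the standard formulation, lower scores indicate features that better preserve local structure.

```python
import numpy as np


def laplacian_score_sketch(X: np.ndarray, k: int = 5, t: float = 1.0) -> np.ndarray:
    """Laplacian Score per feature; X is (samples x features), k < n_samples."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # a sample is never its own neighbour

    # Heat-kernel weights on the k nearest neighbours of each sample.
    neighbours = np.argsort(d2, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = neighbours.ravel()
    S = np.zeros((n, n))
    S[rows, cols] = np.exp(-d2[rows, cols] / t)
    S = np.maximum(S, S.T)  # symmetrise the graph

    d = S.sum(axis=1)        # node degrees (diagonal of D)
    L = np.diag(d) - S       # graph Laplacian

    # Remove the degree-weighted mean from each feature.
    F = X - (d @ X) / d.sum()
    numerator = np.sum(F * (L @ F), axis=0)
    denominator = np.sum(F * F * d[:, None], axis=0)
    return numerator / denominator


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
scores = laplacian_score_sketch(X, k=5)
```

A constant feature would make the denominator zero, so real code would need to guard against that case.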

select_features(data_array)

Selects features based on Laplacian Score, using the threshold and k set at construction.

Parameters:

data_array (numpy.ndarray) – The dataset to process.

Returns:

A list of boolean values indicating selected features.

Return type:

list

rna_code.data.feature_selection.lasso_selector module

Module for Lasso Regression based feature selection

class rna_code.data.feature_selection.lasso_selector.LassoSelector(labels: list | Series, threshold: float = 0, sgdc_params: dict | None = None, class_balancing: Literal['match_smaller_sample', 'balanced', None] = None)

Bases: BaseFeatureSelector

Feature selection based on Lasso Regression Coefficient.

Parameters:
  • labels (list | Series) – Labels for regression

  • threshold (float) – Lasso coefficient threshold for selection, by default 0

  • n_features (int | None, optional) – Number of features to select for given task, by default None

  • sgdc_params (dict | None, optional) – Parameters for the SGDClassifier, by default None

  • class_balancing (Literal["match_smaller_sample", "balanced", None], optional) – Strategy for handling class imbalance, by default None

select_features(data_array)

Selects features using LASSO regression, with the labels, sgdc_params and class_balancing given to the constructor.

Parameters:

data_array (numpy.ndarray) – The dataset to process.

Returns:

A list of boolean values indicating selected features.

Return type:

list
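The following sketch shows one way L1-penalised coefficients can drive feature selection with scikit-learn's SGDClassifier. The parameter values, the synthetic data, and the rule "keep features whose absolute coefficient exceeds the threshold" are assumptions made for illustration, not a description of LassoSelector internals.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Synthetic data: only the first two features carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
labels = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Hypothetical parameter choice; the L1 penalty pushes weak coefficients to zero.
sgdc_params = {"penalty": "l1", "alpha": 0.01, "random_state": 0}
classifier = SGDClassifier(**sgdc_params).fit(X, labels)

threshold = 0.0
coefficients = np.abs(classifier.coef_).ravel()  # one coefficient per feature
mask = coefficients > threshold
filtered = X[:, mask]
```

With a threshold of 0, any feature whose coefficient survives the L1 penalty is kept; raising the threshold makes the selection stricter.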

rna_code.data.feature_selection.mad_selector module

Module for Median Absolute Deviation (MAD) based feature selection

class rna_code.data.feature_selection.mad_selector.MADSelector(threshold: float, ceiling: int = 150, n_features: int | None = None)

Bases: BaseFeatureSelector

Feature selection based on Median Absolute Deviation (MAD) threshold

Parameters:
  • threshold (float) – Minimum threshold for variables

  • n_features (int | None, optional) – Number of features to select for given task, by default None

  • ceiling (int, optional) – Maximum value to prevent outliers, by default 150

select_features(data_array: ndarray) list

Selects features based on Median Absolute Deviation (MAD).

Parameters:

data_array (numpy.ndarray) – The dataset to process.

Returns:

A list of boolean values indicating selected features.

Return type:

list
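A short sketch of MAD-based filtering, following the method description above (median absolute deviation per feature). How `ceiling` is applied is an assumption: here, features whose MAD reaches the ceiling are dropped as likely outlier-driven. The helper name `mad_mask` is hypothetical.

```python
import numpy as np


def mad_mask(data_array: np.ndarray, threshold: float, ceiling: float = 150) -> list:
    """Boolean list over columns (features); rows are samples."""
    median = np.median(data_array, axis=0)
    # Median of absolute deviations from the per-feature median.
    mad = np.median(np.abs(data_array - median), axis=0)
    # Assumption: keep features with threshold < MAD < ceiling.
    return ((mad > threshold) & (mad < ceiling)).tolist()


data = np.array([
    [1.0, 10.0, 0.0],
    [2.0, 10.0, 1000.0],
    [3.0, 10.0, 2000.0],
    [4.0, 10.0, 3000.0],
    [5.0, 10.0, 4000.0],
])
mask = mad_mask(data, threshold=0.5)
```

In this toy input the per-feature MADs are 1, 0 and 1000, so only the first column passes both bounds.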

Module contents