rna_code.data.feature_selection package

Submodules

rna_code.data.feature_selection.base_feature_selector module

Abstract class for feature selection

class rna_code.data.feature_selection.base_feature_selector.BaseFeatureSelector(threshold: float | None = None, n_features: int | None = None)

Bases: ABC

Abstract base class for all feature selectors.

Parameters:

threshold (float | None, optional) – Selection threshold for the given task, by default None
n_features (int | None, optional) – Number of features to select for the given task, by default None

abstract select_features(data_array: ndarray, **kwargs) → ndarray

Select features from the data array according to self.threshold.

Parameters:

data_array (np.ndarray) – Data to select features from.

Returns:

Filtered data.

Return type:

np.ndarray
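A minimal sketch of how a concrete selector might plug into this interface. The `BaseFeatureSelector` below is a stand-in that mirrors only the documented constructor and abstract method, not the library's source. `VarianceSelector` is a hypothetical subclass; it assumes samples are rows and features are columns, and that `select_features` returns a boolean mask over the features (as the subclasses below document) rather than the filtered data.

```python
from abc import ABC, abstractmethod
from typing import Optional

import numpy as np


class BaseFeatureSelector(ABC):
    """Stand-in mirroring the documented interface (not the library source)."""

    def __init__(self, threshold: Optional[float] = None,
                 n_features: Optional[int] = None):
        self.threshold = threshold
        self.n_features = n_features

    @abstractmethod
    def select_features(self, data_array: np.ndarray, **kwargs) -> np.ndarray:
        """Return a boolean mask over the features (columns) of data_array."""


class VarianceSelector(BaseFeatureSelector):
    """Hypothetical subclass: keep features whose variance exceeds self.threshold."""

    def select_features(self, data_array: np.ndarray, **kwargs) -> np.ndarray:
        # Assumed layout: rows are samples, columns are features.
        variances = np.asarray(data_array, dtype=float).var(axis=0)
        return variances > self.threshold


X = np.array([[1.0, 5.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 9.0, 0.1]])
mask = VarianceSelector(threshold=0.01).select_features(X)
print(mask.tolist())  # [False, True, False]
```

Because `select_features` is declared with `@abstractmethod`, instantiating the base class directly raises `TypeError`; only subclasses that implement it can be created.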

rna_code.data.feature_selection.expression_selector module

Expression-level-based feature selection module

class rna_code.data.feature_selection.expression_selector.ExpressionSelector(threshold: float | None = None, n_features: int | None = None)

Bases: BaseFeatureSelector

Feature selection based on an expression threshold

Parameters:

threshold (float | None, optional) – Expression threshold, by default None
n_features (int | None, optional) – Number of features to select for the given task, by default None

select_features(data_array) → ndarray

Selects features based on gene expression levels.

Parameters:

data_array (numpy.ndarray) – The dataset to process. The selection threshold is the threshold passed to the constructor.

Returns:

A list of boolean values indicating selected features.

Return type:

list
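A minimal sketch of expression-based selection. It assumes that a feature's expression level is its mean across samples (rows), that `threshold` keeps features whose mean exceeds it, and that `n_features`, when given, keeps the most highly expressed features instead. `expression_mask` is a hypothetical helper, not the class's actual implementation.

```python
from typing import List, Optional

import numpy as np


def expression_mask(data_array: np.ndarray,
                    threshold: Optional[float] = None,
                    n_features: Optional[int] = None) -> List[bool]:
    """Hypothetical helper: boolean mask over features (columns) by mean expression."""
    mean_expr = np.asarray(data_array, dtype=float).mean(axis=0)  # one value per feature
    if n_features is not None:
        # Assumed rule: keep the n_features most highly expressed features.
        top = np.argsort(mean_expr)[::-1][:n_features]
        mask = np.zeros(mean_expr.shape[0], dtype=bool)
        mask[top] = True
    elif threshold is not None:
        # Assumed rule: keep features whose mean expression exceeds the threshold.
        mask = mean_expr > threshold
    else:
        raise ValueError("Provide threshold or n_features")
    return mask.tolist()


X = np.array([[0.0, 10.0, 3.0],
              [0.0, 12.0, 5.0]])
print(expression_mask(X, threshold=1.0))  # [False, True, True]
print(expression_mask(X, n_features=1))   # [False, True, False]
```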

rna_code.data.feature_selection.laplacian_selector module

Laplacian-score-based feature selection module

class rna_code.data.feature_selection.laplacian_selector.LaplacianSelector(threshold: float | None = None, n_features: int | None = None, k: int = 5)

Bases: BaseFeatureSelector

Feature selection based on Laplacian score.

Parameters:

threshold (float | None, optional) – Laplacian score threshold, by default None
n_features (int | None, optional) – Number of features to select for the given task, by default None
k (int, optional) – Number of neighbors used to build the k-nearest-neighbor (kNN) graph, by default 5

laplacian_score(X)

Computes the Laplacian Score for each feature of the dataset.

Parameters:

X (numpy.ndarray) – The dataset (samples x features). The number of neighbors for the kNN graph is the k passed to the constructor.

Returns:

Array of Laplacian scores for each feature.

Return type:

numpy.ndarray

select_features(data_array)

Selects features based on Laplacian Score.

Parameters:

data_array (numpy.ndarray) – The dataset to process. The threshold and the number of neighbors k are the values passed to the constructor.

Returns:

A list of boolean values indicating selected features.

Return type:

list
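The Laplacian Score (He, Cai and Niyogi, 2005) measures how well a feature respects the local neighborhood structure of the samples; lower scores are better. The following is a minimal sketch of both steps, assuming 0/1 edge weights on a symmetrized kNN graph (the original method also allows heat-kernel weights) and assuming selection keeps features whose score is at or below `threshold`. `laplacian_score` and `laplacian_mask` here are hypothetical standalone functions, not the class's methods; the library's exact edge weighting and comparison direction are not documented above.

```python
import numpy as np


def laplacian_score(X: np.ndarray, k: int = 5) -> np.ndarray:
    """Laplacian Score per feature (column) of X, using a 0/1 kNN graph over samples (rows)."""
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]
    k = min(k, n_samples - 1)  # cannot have more neighbors than other samples
    # Pairwise squared Euclidean distances between samples.
    sq_norms = (X ** 2).sum(axis=1)
    dist2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(dist2, np.inf)  # a sample is not its own neighbor
    # 0/1 adjacency: link each sample to its k nearest neighbors, then symmetrize.
    neighbors = np.argsort(dist2, axis=1)[:, :k]
    S = np.zeros((n_samples, n_samples))
    S[np.repeat(np.arange(n_samples), k), neighbors.ravel()] = 1.0
    S = np.maximum(S, S.T)
    d = S.sum(axis=1)   # degrees (diagonal of D)
    L = np.diag(d) - S  # graph Laplacian L = D - S
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r]
        if f.max() == f.min():
            scores[r] = np.inf  # constant feature: no information, worst score
            continue
        f_tilde = f - (f @ d) / d.sum()  # subtract the degree-weighted mean
        scores[r] = (f_tilde @ L @ f_tilde) / (f_tilde @ (d * f_tilde))
    return scores


def laplacian_mask(X: np.ndarray, threshold: float, k: int = 5) -> list:
    """Hypothetical selection rule: keep features whose score is <= threshold."""
    return (laplacian_score(X, k) <= threshold).tolist()


rng = np.random.default_rng(0)
cluster = np.repeat([0.0, 10.0], 6)      # separates two groups of 6 samples
noise = rng.uniform(0.0, 1.0, size=12)   # varies inside each group
X = np.column_stack([cluster, noise, np.ones(12)])
print(laplacian_score(X, k=3))
```

In this example every kNN edge stays inside one group, so the group-indicator column is constant along every edge and scores essentially 0, the noise column scores higher, and the constant column gets an infinite score.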

rna_code.data.feature_selection.lasso_selector module

Lasso-regression-based feature selection module

class rna_code.data.feature_selection.lasso_selector.LassoSelector(labels: list | Series, threshold: float = 0, sgdc_params: dict | None = None, class_balancing: Literal['match_smaller_sample', 'balanced', None] = None)

Bases: BaseFeatureSelector

Feature selection based on Lasso regression coefficients.

Parameters:

labels (list | Series) – Sample labels used to fit the Lasso model
threshold (float, optional) – Lasso coefficient threshold for selection, by default 0
sgdc_params (dict | None, optional) – Parameters passed to the SGDClassifier (stochastic gradient descent), by default None
class_balancing (Literal["match_smaller_sample", "balanced", None], optional) – Strategy for handling class imbalance, by default None

select_features(data_array)

Selects features using Lasso regression.

Parameters:

data_array (numpy.ndarray) – The dataset to process. The labels, SGDClassifier parameters, and class-balancing method are the values passed to the constructor.

Returns:

A list of boolean values indicating selected features.

Return type:

list
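The documentation above refers to scikit-learn's `SGDClassifier`. To show the idea without that dependency, the following sketch swaps in a plain NumPy solver: full-batch proximal gradient descent (ISTA) on an L1-penalized logistic loss. That is the objective of L1-penalized logistic regression, one configuration `SGDClassifier` supports (`loss="log_loss"`, `penalty="l1"`), although the library's actual loss and settings are not stated here. Features whose absolute coefficient exceeds `threshold` (default 0, matching the documented default) are kept. The sketch assumes binary labels coded 0/1 and roughly standardized features, and does not model `class_balancing`; `lasso_mask`, `alpha`, `lr`, and `n_iter` are hypothetical names and values.

```python
from typing import List

import numpy as np


def lasso_mask(X: np.ndarray, y: np.ndarray, threshold: float = 0.0,
               alpha: float = 0.05, lr: float = 0.1, n_iter: int = 2000) -> List[bool]:
    """Hypothetical helper: L1-penalized logistic regression via ISTA, then |coef| > threshold."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)  # assumed binary labels coded 0/1
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(y = 1)
        residual = p - y
        # Gradient step on the mean log-loss.
        w = w - lr * (X.T @ residual) / n_samples
        b = b - lr * residual.mean()  # the intercept is not penalized
        # Proximal step for alpha * ||w||_1: soft-thresholding sets weak coefficients to exactly 0.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * alpha, 0.0)
    return (np.abs(w) > threshold).tolist()


y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
X = np.column_stack([
    2 * y - 1,                                        # informative feature
    np.array([1, -1, 1, -1, 1, -1, 1, -1], float),    # unrelated to y
    np.array([1, 1, -1, -1, 1, 1, -1, -1], float),    # unrelated to y
])
print(lasso_mask(X, y))  # [True, False, False]
```

The L1 penalty is what makes this a selection method: coefficients of uninformative features are driven to exactly zero rather than merely shrunk, so a threshold of 0 already separates them.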

rna_code.data.feature_selection.mad_selector module

Mean-absolute-deviation-based feature selection module

class rna_code.data.feature_selection.mad_selector.MADSelector(threshold: float, ceiling: int = 150, n_features: int | None = None)

Bases: BaseFeatureSelector

Feature selection based on a Mean Absolute Deviation threshold

Parameters:

threshold (float) – Minimum deviation a feature must reach to be selected (required)
ceiling (int, optional) – Maximum value, used to limit the influence of outliers, by default 150
n_features (int | None, optional) – Number of features to select for the given task, by default None

select_features(data_array: ndarray) → list

Selects features based on Median Absolute Deviation (MAD).

Parameters:

data_array (numpy.ndarray) – The dataset to process.

Returns:

A list of boolean values indicating selected features.

Return type:

list
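A minimal sketch of MAD-based selection. Following the `select_features` docstring, it uses the median absolute deviation (the class docstring says mean), computed per feature (column), and keeps features whose MAD is at least `threshold`. `mad_mask` is a hypothetical helper; it does not model `ceiling` or `n_features`, whose exact handling is not documented here.

```python
from typing import List

import numpy as np


def mad_mask(data_array: np.ndarray, threshold: float) -> List[bool]:
    """Hypothetical helper: keep features (columns) whose median absolute deviation >= threshold."""
    X = np.asarray(data_array, dtype=float)
    medians = np.median(X, axis=0)                # per-feature median
    mad = np.median(np.abs(X - medians), axis=0)  # per-feature MAD
    return (mad >= threshold).tolist()


X = np.array([[1.0, 1.0, 0.0],
              [1.0, 2.0, 10.0],
              [1.0, 3.0, 20.0],
              [1.0, 4.0, 30.0],
              [100.0, 5.0, 40.0]])
print(mad_mask(X, threshold=1.0))  # [False, True, True]
```

The first column has MAD 0 despite its single value of 100, which illustrates why a median-based spread is robust to isolated outliers; the other two columns have MADs of 1 and 10.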
