rna_code.data.feature_selection package
Submodules
rna_code.data.feature_selection.base_feature_selector module
Abstract class for feature selection
- class rna_code.data.feature_selection.base_feature_selector.BaseFeatureSelector(threshold: float | None = None, n_features: int | None = None)
Bases: ABC
Base abstract class for feature selectors.
- Parameters:
threshold (float | None, optional) – Selection threshold for given task, by default None
n_features (int | None, optional) – Number of features to select for given task, by default None
- abstract select_features(data_array: ndarray, **kwargs) → ndarray
Select features from data array according to self.threshold.
- Parameters:
data_array (np.ndarray) – Data to select features from.
- Returns:
Filtered data.
- Return type:
np.ndarray
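A minimal sketch of how a concrete selector could implement this interface. The `BaseFeatureSelector` defined here is a local stand-in that only mirrors the documented constructor and abstract method, and `VarianceSelector` is a hypothetical subclass invented for illustration; neither is taken from the package source.

```python
from __future__ import annotations

from abc import ABC, abstractmethod

import numpy as np


class BaseFeatureSelector(ABC):
    """Local stand-in mirroring the documented signature (not the package class)."""

    def __init__(self, threshold: float | None = None, n_features: int | None = None):
        self.threshold = threshold
        self.n_features = n_features

    @abstractmethod
    def select_features(self, data_array: np.ndarray, **kwargs) -> np.ndarray:
        """Return the filtered data."""


class VarianceSelector(BaseFeatureSelector):
    """Hypothetical subclass: keep columns whose variance exceeds self.threshold."""

    def select_features(self, data_array: np.ndarray, **kwargs) -> np.ndarray:
        keep = data_array.var(axis=0) > self.threshold
        return data_array[:, keep]


# Rows are samples, columns are features; the first column is constant.
data = np.array([[1.0, 5.0, 0.0],
                 [1.0, 7.0, 4.0],
                 [1.0, 9.0, 8.0]])
filtered = VarianceSelector(threshold=1.0).select_features(data)
```

Because `select_features` is abstract, the base class cannot be instantiated directly; only subclasses that override it can.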
rna_code.data.feature_selection.expression_selector module
Expression level based selection module
- class rna_code.data.feature_selection.expression_selector.ExpressionSelector(threshold: float | None = None, n_features: int | None = None)
Bases: BaseFeatureSelector
Feature selection based on expression threshold.
- Parameters:
threshold (float | None, optional) – Expression threshold, by default None
n_features (int | None, optional) – Number of features to select for given task, by default None
- select_features(data_array) → ndarray
Selects features based on gene expression levels.
- Parameters:
data_array (numpy.ndarray) – The dataset to process.
threshold (float) – The threshold for feature selection, taken from the value passed to the constructor.
- Returns:
A list of boolean values indicating selected features.
- Return type:
list
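The exact expression criterion is not stated above. As a rough sketch, assuming rows are samples, columns are genes, and a gene is kept when its mean expression exceeds the threshold, a boolean mask could be built as follows. `expression_mask` is a hypothetical helper, not a function of the package.

```python
import numpy as np


def expression_mask(data_array: np.ndarray, threshold: float) -> list:
    """Hypothetical helper: True for genes whose mean expression exceeds threshold."""
    mean_expression = data_array.mean(axis=0)  # one value per gene (column)
    return (mean_expression > threshold).tolist()


counts = np.array([[0.0, 12.0, 3.0],
                   [1.0, 8.0, 5.0],
                   [0.5, 10.0, 4.0]])
mask = expression_mask(counts, threshold=2.0)
# The mask can then be used to filter the columns of the dataset.
selected = counts[:, np.asarray(mask)]
```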
rna_code.data.feature_selection.laplacian_selector module
Module for Laplacian Score based feature selection
- class rna_code.data.feature_selection.laplacian_selector.LaplacianSelector(threshold: float | None = None, n_features: int | None = None, k: int = 5)
Bases: BaseFeatureSelector
Feature selection based on Laplacian score.
- Parameters:
threshold (float | None, optional) – Laplacian score threshold, by default None
n_features (int | None, optional) – Number of features to select for given task, by default None
k (int, optional) – Number of neighbors for the kNN algorithm, by default 5
- laplacian_score(X)
Computes the Laplacian Score for each feature of the dataset.
- Parameters:
X (numpy.ndarray) – The dataset (samples x features).
k (int) – Number of neighbors for the kNN graph, taken from the value passed to the constructor.
- Returns:
Array of Laplacian scores for each feature.
- Return type:
numpy.ndarray
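For orientation, the Laplacian score of a feature can be sketched in plain NumPy as below. This is not the package's implementation: it assumes a symmetrised kNN graph with binary edge weights (a heat-kernel weighting is another common choice), and `laplacian_score_sketch` is a hypothetical name. With this definition, lower scores indicate features that vary little between neighbouring samples.

```python
import numpy as np


def laplacian_score_sketch(X: np.ndarray, k: int = 5) -> np.ndarray:
    """Laplacian score per feature for X of shape (samples, features)."""
    n_samples = X.shape[0]
    # Pairwise squared Euclidean distances between samples.
    sq_norms = (X ** 2).sum(axis=1)
    dist2 = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(dist2, np.inf)  # a sample is not its own neighbour
    # Binary kNN adjacency (assumption), symmetrised so the graph is undirected.
    neighbours = np.argsort(dist2, axis=1)[:, :k]
    S = np.zeros((n_samples, n_samples))
    S[np.repeat(np.arange(n_samples), k), neighbours.ravel()] = 1.0
    S = np.maximum(S, S.T)
    degree = S.sum(axis=1)
    L = np.diag(degree) - S  # graph Laplacian
    # Remove the degree-weighted mean of every feature.
    Xc = X - (degree @ X) / degree.sum()
    numerator = (Xc * (L @ Xc)).sum(axis=0)
    denominator = (degree[:, None] * Xc ** 2).sum(axis=0)
    scores = np.full(X.shape[1], np.inf)  # constant features get +inf
    varying = denominator > 0
    scores[varying] = numerator[varying] / denominator[varying]
    return scores


# Two well-separated groups along feature 0; feature 1 alternates within each group.
X = np.array([[0.0, 0.0], [0.1, 5.0], [0.2, 0.0],
              [10.0, 5.0], [10.1, 0.0], [10.2, 5.0]])
scores = laplacian_score_sketch(X, k=2)
# Feature 0 follows the neighbourhood structure, so it gets the lower score.
```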
- select_features(data_array)
Selects features based on Laplacian Score.
- Parameters:
data_array (numpy.ndarray) – The dataset to process.
threshold (float) – The threshold for feature selection, taken from the value passed to the constructor.
k (int) – Number of neighbors for the kNN graph, taken from the value passed to the constructor.
verbose (int) – Controls the verbosity of the function.
- Returns:
A list of boolean values indicating selected features.
- Return type:
list
rna_code.data.feature_selection.lasso_selector module
Module for Lasso Regression based feature selection
- class rna_code.data.feature_selection.lasso_selector.LassoSelector(labels: list | Series, threshold: float = 0, sgdc_params: dict | None = None, class_balancing: Literal['match_smaller_sample', 'balanced', None] = None)
Bases: BaseFeatureSelector
Feature selection based on Lasso regression coefficient.
- Parameters:
labels (list | Series) – Labels for regression
threshold (float) – Lasso coefficient threshold for selection, by default 0
n_features (int | None, optional) – Number of features to select for given task, by default None
sgdc_params (dict | None, optional) – Parameters for stochastic gradient descent, by default None
class_balancing (Literal["match_smaller_sample", "balanced", None], optional) – How to solve class imbalance, by default None
- select_features(data_array)
Selects features using LASSO regression.
- Parameters:
data_array (numpy.ndarray) – The dataset to process.
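labels (numpy.ndarray) – The labels associated with the data, taken from the value passed to the constructor.
sgdc_params (dict, optional) – Parameters for the SGDClassifier, taken from the value passed to the constructor.
class_balancing (str, optional) – Method for class balancing, taken from the value passed to the constructor.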
- Returns:
A list of boolean values indicating selected features.
- Return type:
list
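As an illustration of L1-penalised selection with scikit-learn's `SGDClassifier`, the sketch below fits a linear classifier and keeps features whose absolute coefficient exceeds the threshold. The helper name `lasso_mask`, the default parameters, and the rule for turning coefficients into a mask are assumptions; class balancing is left out, and the package may proceed differently.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier


def lasso_mask(data_array, labels, threshold=0.0, sgdc_params=None) -> list:
    """Hypothetical helper: True where the L1-penalised coefficient is above threshold."""
    params = {"penalty": "l1", "random_state": 0}  # assumed defaults for this sketch
    params.update(sgdc_params or {})
    model = SGDClassifier(**params).fit(data_array, labels)
    # coef_ has one row for binary problems and one row per class otherwise;
    # take the largest magnitude per feature across rows.
    importance = np.abs(model.coef_).max(axis=0)
    return (importance > threshold).tolist()


rng = np.random.default_rng(0)
labels = np.array([0, 1] * 50)
data = rng.normal(size=(100, 4))
data[:, 0] += 3.0 * labels  # make the first feature informative
mask = lasso_mask(data, labels, threshold=0.0, sgdc_params={"alpha": 0.01})
```

A larger `alpha` strengthens the L1 penalty and drives more coefficients to zero, so fewer features pass the threshold.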
rna_code.data.feature_selection.mad_selector module
Module for median absolute deviation (MAD) based feature selection
- class rna_code.data.feature_selection.mad_selector.MADSelector(threshold: float, ceiling: int = 150, n_features: int | None = None)
Bases: BaseFeatureSelector
Feature selection based on Median Absolute Deviation threshold.
- Parameters:
threshold (float) – Minimum threshold for variables
n_features (int | None, optional) – Number of features to select for given task, by default None
ceiling (int, optional) – Maximum value to prevent outliers, by default 150
- select_features(data_array: ndarray) → list
Selects features based on Median Absolute Deviation (MAD).
- Parameters:
data_array (numpy.ndarray) – The dataset to process.
- Returns:
A list of boolean values indicating selected features.
- Return type:
list
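A small sketch of MAD-based selection, taking MAD to be the median absolute deviation as in the method description above. It assumes a feature is kept when its MAD lies strictly between `threshold` and `ceiling`; how the package applies the ceiling is not documented here, and `mad_mask` is a hypothetical helper.

```python
import numpy as np


def mad_mask(data_array: np.ndarray, threshold: float, ceiling: float = 150) -> list:
    """Hypothetical helper: True where threshold < MAD < ceiling for a column."""
    median = np.median(data_array, axis=0)
    mad = np.median(np.abs(data_array - median), axis=0)
    return ((mad > threshold) & (mad < ceiling)).tolist()


# Column 0 is constant, column 1 varies moderately, column 2 varies wildly.
data = np.array([[1.0, 10.0, 0.0],
                 [1.0, 20.0, 1000.0],
                 [1.0, 30.0, 2000.0]])
mask = mad_mask(data, threshold=1.0, ceiling=150)
# Per-column MADs are 0, 10 and 1000, so only the middle column is kept.
```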