rna_code.data.interface package
Submodules
rna_code.data.interface.BRCA_interface module
Interface with the BRCA dataset.
- class rna_code.data.interface.BRCA_interface.BRCAInterface(data_path: Path = PosixPath('/home/runner/work/biosequence_encoding/biosequence_encoding/rna_code/../data/BRCA'), metadata_path: Path = PosixPath('/home/runner/work/biosequence_encoding/biosequence_encoding/rna_code/../data/BRCA/metadata.cart.2023-09-22.json'))
Bases:
BaseInterface
Interface between the app and the file system for the BRCA dataset.
- Parameters:
data_path (Path, optional) – Path of the directory containing the data, by default BRCA_DATA_PATH
metadata_path (Path, optional) – Path of the metadata file, by default BRCA_METADATA_FILE
- property entry_names: list[str]
Get the entry names.
- Returns:
List containing the name for each observation
- Return type:
list[str]
- find_subtypes()
Find the subtype associated with each observation, based on the subtype file.
- load_patients()
Load patients based on precomputed entries.
- setup()
Perform all steps necessary to provide a dataset.
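The members documented above suggest a small workflow: build the interface from two paths, call setup(), then read entry_names. The sketch below illustrates that shape with toy stand-in classes rather than the real ones. The names ToyBaseInterface and ToyInterface, the in-memory data, the subtype labels, and the order of calls inside setup() are all assumptions made for illustration, not the package's implementation.

```python
from abc import ABC
from pathlib import Path


class ToyBaseInterface(ABC):
    """Hypothetical stand-in for BaseInterface; it only stores the two paths."""

    def __init__(self, data_path: Path, metadata_path: Path):
        self.data_path = data_path
        self.metadata_path = metadata_path


class ToyInterface(ToyBaseInterface):
    """Hypothetical dataset interface that keeps everything in memory."""

    def __init__(self, data_path: Path, metadata_path: Path):
        super().__init__(data_path, metadata_path)
        self.patients: dict[str, list[float]] = {}
        self.subtypes: dict[str, str] = {}

    @property
    def entry_names(self) -> list[str]:
        # One name per loaded observation.
        return list(self.patients)

    def load_patients(self) -> None:
        # A real interface would read files under data_path; these values are made up.
        self.patients = {"case_A": [1.0, 2.0], "case_B": [0.5, 4.0]}

    def find_subtypes(self) -> None:
        # A real interface would consult a subtype file; this lookup is made up.
        lookup = {"case_A": "LumA", "case_B": "Basal"}
        self.subtypes = {name: lookup[name] for name in self.entry_names}

    def setup(self) -> None:
        # Assumed order: load the observations, then attach their subtypes.
        self.load_patients()
        self.find_subtypes()


interface = ToyInterface(Path("data/toy"), Path("data/toy/metadata.json"))
interface.setup()
print(interface.entry_names)
```

With the real package, the equivalent call would presumably be BRCAInterface() with its default paths followed by setup(), though this page does not document which attributes hold the resulting dataset.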
rna_code.data.interface.base_interface module
Base class for interfacing app with file system.
- class rna_code.data.interface.base_interface.BaseInterface(data_path: Path, metadata_path: Path)
Bases:
ABC
Base class for interfacing the app with the file system.
- Parameters:
data_path (Path) – Data path
metadata_path (Path) – Metadata file path
- static get_gene_names_from_file(filename: str, header: int = 0, skiprows: List[int] | None = None) → DataFrame
Retrieve gene names from a specified file.
- Parameters:
filename (str) – Path to the file from which to read the names.
header (int, optional) – Row number to use as the header (column names). Defaults to 0.
skiprows (list of int, optional) – Rows to skip at the start of the file.
- Returns:
A DataFrame containing the names from the file.
- Return type:
pd.DataFrame
- Raises:
FileNotFoundError – If the specified file does not exist.
ParserError – If there is an error in parsing the file.
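The header and skiprows parameters and the listed exceptions match the behavior of pandas.read_csv, so the method may be a thin wrapper around it. The sketch below assumes exactly that, plus a tab-separated file. The helper name read_gene_names and the sample rows are hypothetical and are not taken from the package.

```python
import io

import pandas as pd


def read_gene_names(source, header=0, skiprows=None):
    """Hypothetical helper: read a tab-separated table of gene names."""
    # sep="\t" is an assumption; the documentation does not state the delimiter.
    return pd.read_csv(source, sep="\t", header=header, skiprows=skiprows)


# Toy file content: one comment line to skip, then a header row, then two genes.
raw = (
    "# gene-model: toy\n"
    "gene_id\tgene_name\n"
    "ENSG00000141510\tTP53\n"
    "ENSG00000012048\tBRCA1\n"
)

# skiprows=[0] drops the comment line; header=0 then refers to the first remaining row.
names = read_gene_names(io.StringIO(raw), header=0, skiprows=[0])
print(names)

# A missing path surfaces as FileNotFoundError, as the Raises section describes.
try:
    read_gene_names("no_such_file.tsv")
    missing = False
except FileNotFoundError:
    missing = True
```

Note that pandas counts header after skipped rows have been removed, which is why header=0 still selects the column-name row here.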
- static load_patient_data(filename: str, header: int = 0) → Series
Load patient data from a specified file.
- Parameters:
filename (str) – Path to the data file.
header (int, optional) – Row number to use as the header. Defaults to 0.
- Returns:
A pandas Series containing TPM values from the file.
- Return type:
pd.Series
- Raises:
FileNotFoundError – If the specified file does not exist.
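As a rough illustration of returning a Series of TPM values, the sketch below reads a small tab-separated table and selects one column. The helper name read_tpm, the column names gene_id and tpm_unstranded, the delimiter, and the sample values are all assumptions. The documentation above does not say which column the real method reads.

```python
import io

import pandas as pd


def read_tpm(source, header=0):
    """Hypothetical helper: return the TPM column of a per-patient table."""
    # Delimiter and column names are assumptions, not documented behavior.
    table = pd.read_csv(source, sep="\t", header=header, index_col="gene_id")
    return table["tpm_unstranded"]


# Toy per-patient file with made-up expression values.
raw = (
    "gene_id\tgene_name\ttpm_unstranded\n"
    "ENSG00000141510\tTP53\t12.5\n"
    "ENSG00000012048\tBRCA1\t3.0\n"
)

tpm = read_tpm(io.StringIO(raw))
print(tpm)
```

Indexing the Series by gene identifier keeps values aligned when several patients are later combined into one table, which is one reason to prefer it over a bare list.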
- static retrieve_position(names, drop_na=False)
Retrieve genomic positions for a list of gene names.
- Parameters:
names (pd.DataFrame) – DataFrame containing gene names.
drop_na (bool, optional) – Flag to drop NA values. Defaults to False.
verbose (int, optional) – Verbosity level.
- Returns:
DataFrame with retrieved genomic positions and symbols.
- Return type:
pd.DataFrame
rna_code.data.interface.cptac_3_interface module
Interface with the CPTAC-3 dataset.
- class rna_code.data.interface.cptac_3_interface.CPTAC3Interface(data_path: Path = PosixPath('/home/runner/work/biosequence_encoding/biosequence_encoding/rna_code/../data/CPTAC-3'), metadata_path: Path = PosixPath('/home/runner/work/biosequence_encoding/biosequence_encoding/rna_code/../data/CPTAC-3/metadata.repository.2024-11-07.json'))
Bases:
BaseInterface
Interface between the app and the file system for the CPTAC-3 dataset.
- Parameters:
data_path (Path, optional) – Path of the directory containing the data, by default CPTAC_3_DATA_PATH
metadata_path (Path, optional) – Path of the metadata file, by default CPTAC_3_METADATA_FILE
- property entry_names: list[str]
Get the entry names.
- Returns:
List containing the name for each observation
- Return type:
list[str]
- find_subtypes()
Find the subtype associated with each observation, based on the subtype file.
- load_patients()
Load patients based on precomputed entries.
- setup()
Perform all steps necessary to provide a dataset.