rna_code.data.interface package

Submodules

rna_code.data.interface.BRCA_interface module

Interface with the BRCA dataset.

class rna_code.data.interface.BRCA_interface.BRCAInterface(data_path: Path = PosixPath('/home/runner/work/biosequence_encoding/biosequence_encoding/rna_code/../data/BRCA'), metadata_path: Path = PosixPath('/home/runner/work/biosequence_encoding/biosequence_encoding/rna_code/../data/BRCA/metadata.cart.2023-09-22.json'))

Bases: BaseInterface

Interface between the app and the file system for the BRCA dataset.

Parameters:

data_path (Path, optional) – Path of the directory containing the data, by default BRCA_DATA_PATH
metadata_path (Path, optional) – Path of the metadata file, by default BRCA_METADATA_FILE

property entry_names: list[str]

Get the entry names.

Returns: List containing the name of each observation
Return type: list[str]

find_subtypes()

Find the subtype associated with each observation, based on the subtype file.

load_patients()

Load patients based on the precomputed entries.

setup()

Perform all the steps necessary to provide a dataset.
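The documentation only says that find_subtypes() works "based on subtype file". A minimal sketch of one way such a lookup could work, assuming the subtype file has already been read into a pandas DataFrame with hypothetical columns `patient` and `subtype` (neither name comes from this documentation) and that its identifiers match entry_names. `find_subtypes_sketch` is a hypothetical helper, not rna_code's implementation:

```python
import pandas as pd


def find_subtypes_sketch(
    entry_names: list[str],
    subtype_table: pd.DataFrame,
    id_column: str = "patient",       # hypothetical column name
    subtype_column: str = "subtype",  # hypothetical column name
) -> pd.Series:
    """Return one subtype label per entry name (hypothetical helper).

    Entries that do not appear in the subtype table get NaN.
    """
    # Index the labels by observation identifier (identifiers must be unique).
    lookup = subtype_table.set_index(id_column)[subtype_column]
    # Reorder to match entry_names; unknown identifiers become NaN.
    return lookup.reindex(entry_names)


table = pd.DataFrame({"patient": ["obs-1", "obs-2"], "subtype": ["subtype-A", "subtype-B"]})
subtypes = find_subtypes_sketch(["obs-2", "obs-1", "obs-3"], table)
```

Using `reindex` rather than a merge keeps the result in the same order as entry_names, so it can sit alongside the expression data without re-sorting.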

rna_code.data.interface.base_interface module

Base class for interfacing the app with the file system.

class rna_code.data.interface.base_interface.BaseInterface(data_path: Path, metadata_path: Path)

Bases: ABC

Base class for interfacing the app with the file system.

Parameters:

data_path (Path) – Path of the directory containing the data
metadata_path (Path) – Path of the metadata file
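BaseInterface derives from ABC, so each dataset gets its own subclass (BRCAInterface above, CPTAC3Interface below). Which members are abstract is not documented here; the sketch below assumes, for illustration only, that `entry_names` and `setup()` are, since both subclasses document them. `SketchBaseInterface` and `ToyInterface` are hypothetical names, not part of rna_code:

```python
from abc import ABC, abstractmethod
from pathlib import Path


class SketchBaseInterface(ABC):
    """Hypothetical stand-in for BaseInterface: stores paths, defers dataset logic."""

    def __init__(self, data_path: Path, metadata_path: Path) -> None:
        self.data_path = Path(data_path)
        self.metadata_path = Path(metadata_path)

    @property
    @abstractmethod
    def entry_names(self) -> list[str]:
        """Name of each observation, supplied by the dataset-specific subclass."""

    @abstractmethod
    def setup(self) -> None:
        """Run every step needed to make the dataset available."""


class ToyInterface(SketchBaseInterface):
    """Hypothetical dataset interface with a hard-coded list of entries."""

    def __init__(self, data_path: Path, metadata_path: Path) -> None:
        super().__init__(data_path, metadata_path)
        self._entries: list[str] = []

    @property
    def entry_names(self) -> list[str]:
        return list(self._entries)

    def setup(self) -> None:
        # A real interface would read metadata_path and the files under data_path.
        self._entries = ["obs-1", "obs-2"]


iface = ToyInterface(Path("data"), Path("data/metadata.json"))
iface.setup()
print(iface.entry_names)  # ['obs-1', 'obs-2']
```

Instantiating SketchBaseInterface directly raises TypeError because its abstract members are not implemented, which is what forces each dataset to provide its own interface.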

static get_gene_names_from_file(filename: str, header: int = 0, skiprows: List[int] | None = None) → DataFrame

Retrieve a list of gene names from a specified file.

Parameters:

filename (str) – Path to the file from which to read the names.
header (int, optional) – Row number to use as the header (column names). Defaults to 0.
skiprows (list of int, optional) – Rows to skip at the start of the file.

Returns:

A DataFrame containing the names from the file.

Return type:

pd.DataFrame

Raises:

FileNotFoundError – If the specified file does not exist.
ParserError – If there is an error in parsing the file.
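A minimal sketch of a reader with the same signature, assuming the file is tab-separated (the separator is not documented here) and that pandas `read_csv` does the parsing. `read_gene_names` is a hypothetical name, not the library's code:

```python
from typing import List, Optional

import pandas as pd


def read_gene_names(
    filename: str, header: int = 0, skiprows: Optional[List[int]] = None
) -> pd.DataFrame:
    """Read a table of gene names (sketch mirroring get_gene_names_from_file).

    Assumes a tab-separated file. pandas raises FileNotFoundError for a missing
    path and pandas.errors.ParserError for rows it cannot tokenize.
    """
    # skiprows drops the listed 0-based line numbers; header then counts from
    # the first remaining line.
    return pd.read_csv(filename, sep="\t", header=header, skiprows=skiprows)
```

For example, `read_gene_names(path, skiprows=[0])` ignores a leading comment line and uses the next line as column names.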

static load_patient_data(filename: str, header: int = 0) → Series

Load patient data from a specified file.

Parameters:

filename (str) – Path to the data file.
header (int, optional) – Row number to use as the header. Defaults to 0.

Returns:

A pandas Series containing TPM values from the file.

Return type:

pd.Series

Raises:

FileNotFoundError – If the specified file does not exist.
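The file layout expected by load_patient_data() is not described beyond "TPM values". A minimal sketch, assuming a tab-separated table with one row per gene, a hypothetical gene identifier column `gene_id` and a hypothetical TPM column `tpm_unstranded`. The `header` argument moves the header down when the file starts with extra lines (e.g. header=1 for one leading comment line). `load_tpm_series` is a hypothetical name:

```python
import pandas as pd


def load_tpm_series(
    filename: str,
    header: int = 0,
    gene_column: str = "gene_id",        # hypothetical column name
    tpm_column: str = "tpm_unstranded",  # hypothetical column name
) -> pd.Series:
    """Return TPM values indexed by gene identifier (sketch, not rna_code's code)."""
    # pd.read_csv raises FileNotFoundError if the path does not exist.
    table = pd.read_csv(filename, sep="\t", header=header)
    # Keep only the TPM column, labelled by gene.
    return table.set_index(gene_column)[tpm_column]
```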

static retrieve_position(names, drop_na=False)

Retrieve genomic positions for a list of gene names.

Parameters:

names (pd.DataFrame) – DataFrame containing gene names.
drop_na (bool, optional) – Flag to drop NA values. Defaults to False.

Returns:

DataFrame with retrieved genomic positions and symbols.

Return type:

pd.DataFrame
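How retrieve_position() obtains coordinates (a remote annotation service or a local file) is not stated here. The sketch below swaps in a local lookup: a left join of the requested names against an annotation table, with `drop_na` removing genes that found no match. The columns `symbol`, `chromosome`, `start` and `end`, the annotation table, and its placeholder coordinates are all assumptions for illustration:

```python
import pandas as pd


def retrieve_position_sketch(
    names: pd.DataFrame, annotation: pd.DataFrame, drop_na: bool = False
) -> pd.DataFrame:
    """Attach genomic positions to gene names via a local annotation table."""
    # Left join keeps every requested gene; unmatched genes get NaN positions.
    merged = names.merge(annotation, how="left", on="symbol")
    if drop_na:
        # Discard genes whose position could not be resolved.
        merged = merged.dropna(subset=["chromosome", "start", "end"]).reset_index(drop=True)
    return merged


names = pd.DataFrame({"symbol": ["GENE_A", "GENE_B"]})
annotation = pd.DataFrame(
    {"symbol": ["GENE_A"], "chromosome": ["chr1"], "start": [100], "end": [200]}
)
all_genes = retrieve_position_sketch(names, annotation)
resolved = retrieve_position_sketch(names, annotation, drop_na=True)
```

With drop_na=False both genes are returned and GENE_B has NaN coordinates; with drop_na=True only GENE_A remains.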

rna_code.data.interface.cptac_3_interface module

Interface with the CPTAC-3 dataset.

class rna_code.data.interface.cptac_3_interface.CPTAC3Interface(data_path: Path = PosixPath('/home/runner/work/biosequence_encoding/biosequence_encoding/rna_code/../data/CPTAC-3'), metadata_path: Path = PosixPath('/home/runner/work/biosequence_encoding/biosequence_encoding/rna_code/../data/CPTAC-3/metadata.repository.2024-11-07.json'))

Bases: BaseInterface

Interface between the app and the file system for the CPTAC-3 dataset.

Parameters:

data_path (Path, optional) – Path of the directory containing the data, by default CPTAC_3_DATA_PATH
metadata_path (Path, optional) – Path of the metadata file, by default CPTAC_3_METADATA_FILE

property entry_names: list[str]

Get the entry names.

Returns: List containing the name of each observation
Return type: list[str]

find_subtypes()

Find the subtype associated with each observation, based on the subtype file.

load_patients()

Load patients based on the precomputed entries.

setup()

Perform all the steps necessary to provide a dataset.
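load_patients() is documented only as loading patients "based on precomputed entries". One plausible shape, sketched under assumptions: each entry name is passed to a per-entry loader that returns a Series of values indexed by gene (something like load_patient_data()), and the Series are combined into a genes-by-entries table. `load_patients_sketch`, `load_one`, and the toy data are hypothetical:

```python
from typing import Callable

import pandas as pd


def load_patients_sketch(
    entry_names: list[str], load_one: Callable[[str], pd.Series]
) -> pd.DataFrame:
    """Build a genes-by-entries table from one Series per entry (hypothetical)."""
    # One Series per precomputed entry, keyed by the entry name.
    columns = {name: load_one(name) for name in entry_names}
    # Dict keys become column labels; genes missing from an entry become NaN.
    return pd.concat(columns, axis=1)


fake_files = {
    "obs-1": pd.Series({"G1": 1.0, "G2": 2.0}),
    "obs-2": pd.Series({"G1": 3.0, "G3": 4.0}),
}
table = load_patients_sketch(["obs-1", "obs-2"], fake_files.__getitem__)
```

Passing the loader as a callable keeps the sketch independent of where the files live, so the same assembly step would serve both the BRCA and CPTAC-3 interfaces.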
