dotools_py.utility.add_gene_metadata
- dotools_py.utility.add_gene_metadata(data, gene_key, species='mouse', add_gene_id=False)
Add gene metadata to AnnData or DataFrame.
The metadata is obtained from the GTF or the UniProt database. It includes the gene biotype (e.g., protein-coding, lncRNA), the ENSEMBL gene ID, and the subcellular location.
- Parameters:
- data
AnnData|DataFrame Annotated data matrix, or a pandas DataFrame containing, for example, results from a differential gene expression analysis.
- gene_key
str Name of the key with gene names. If an AnnData is provided, the name of the .var column with gene names. If the gene names are in var_names, specify var_names.
- species
Literal['mouse','human'] (default: 'mouse') The input species.
- add_gene_id
bool (default: False) Add gene ID (ENSEMBL ID) information.
- Return type:
AnnData|DataFrame
- Returns:
Returns a DataFrame or AnnData object. Three new columns will be set: biotype, locations and gene_id.
Examples
>>> import pandas as pd
>>> import dotools_py as do
>>> from dotools_py.utility import add_gene_metadata
>>> # AnnData input
>>> adata = do.dt.example_10x_processed()
>>> adata = add_gene_metadata(adata, "var_names", "human")
>>> adata.var[["biotype", "gene_id", "locations"]].head(5)
                   biotype          gene_id                locations
ATP2A1-AS1          lncRNA  ENSG00000260442  Unreview status Uniprot
STK17A      protein_coding  ENSG00000164543                  nucleus
C19orf18    protein_coding  ENSG00000177025                 membrane
TPP2        protein_coding  ENSG00000134900        nucleus,cytoplasm
MFSD1       protein_coding  ENSG00000118855       membrane,cytoplasm
>>>
>>> # DataFrame input
>>> df = pd.DataFrame(["Acta2", "Tagln", "Ptprc", "Vcam1"], columns=["genes"])
>>> df = add_gene_metadata(df, "genes")
>>> df.head()
   genes         biotype          locations             gene_id
0  Acta2  protein_coding          cytoplasm  ENSMUSG00000035783
1  Tagln  protein_coding          cytoplasm  ENSMUSG00000032085
2  Ptprc  protein_coding           membrane  ENSMUSG00000026395
3  Vcam1  protein_coding  secreted,membrane  ENSMUSG00000027962
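To make the role of gene_key concrete, the sketch below mimics the DataFrame case with plain pandas: metadata is joined onto a table by the column that holds the gene names. This is only an illustration and not the dotools_py implementation. The annotate helper, the lookup table (filled with the four mouse genes from the example above), and the toy log2fc values are all assumptions made for the demonstration. How the real function treats gene names it cannot match is not stated here; in this sketch they simply receive missing values.

```python
import pandas as pd

# Hypothetical lookup table. The values are copied from the mouse example above;
# the real function reads its annotation from the GTF / UniProt resources instead.
lookup = pd.DataFrame(
    {
        "genes": ["Acta2", "Tagln", "Ptprc", "Vcam1"],
        "biotype": ["protein_coding"] * 4,
        "locations": ["cytoplasm", "cytoplasm", "membrane", "secreted,membrane"],
        "gene_id": [
            "ENSMUSG00000035783",
            "ENSMUSG00000032085",
            "ENSMUSG00000026395",
            "ENSMUSG00000027962",
        ],
    }
)


def annotate(df, gene_key, table):
    """Left-join metadata onto df by gene name (illustrative helper, not part of dotools_py)."""
    # Rename the lookup's gene column so it matches the caller's gene_key column.
    renamed = table.rename(columns={"genes": gene_key})
    # A left join keeps every input row, in its original order.
    return df.merge(renamed, on=gene_key, how="left")


# Toy differential-expression table; the log2fc numbers are made up for the sketch.
de_results = pd.DataFrame(
    {"symbol": ["Ptprc", "Acta2", "NotAGene"], "log2fc": [1.5, -0.7, 0.1]}
)
annotated = annotate(de_results, "symbol", lookup)
print(annotated)
```

The point of the sketch is that gene_key only names where the gene symbols live, so the rest of the table is left untouched and the new columns are appended to it.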