Using assign inside a pandas pipeline

Gre*_*urm 8 python pandas

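Say I have the following DataFrame with raw input data and want to process it with a chain of pandas functions (a "pipeline"). In particular, I want to rename and drop columns, and add a further column derived from another column.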

    Gene stable ID  Gene name   Gene type   miRBase accession   miRBase ID
0   ENSG00000274494 MIR6832     miRNA       MI0022677           hsa-mir-6832
1   ENSG00000283386 MIR4659B    miRNA       MI0017291           hsa-mir-4659b
2   ENSG00000221456 MIR1202     miRNA       MI0006334           hsa-mir-1202
3   ENSG00000199102 MIR302C     miRNA       MI0000773           hsa-mir-302c

Currently I do the following (which works):

tmp_df = df.\
         drop("Gene type", axis=1).\
         rename(columns = {
            "Gene stable ID": "ENSG",
            "Gene name": "gene_name",
            "miRBase accession": "MI",
            "miRBase ID": "mirna_name"
         })

result = tmp_df.assign(species = tmp_df.mirna_name.str[:3])

Result:

    ENSG            gene_name   MI          mirna_name      species
0   ENSG00000274494 MIR6832     MI0022677   hsa-mir-6832    hsa
1   ENSG00000283386 MIR4659B    MI0017291   hsa-mir-4659b   hsa
2   ENSG00000221456 MIR1202     MI0006334   hsa-mir-1202    hsa
3   ENSG00000199102 MIR302C     MI0000773   hsa-mir-302c    hsa

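Is it possible to put the assign call directly into the "pipeline"? Having to create an extra temporary variable feels cumbersome, but without it I don't know how to refer to the renamed column ('mirna_name').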

All*_*len 5

You can use pipe:

tmp_df = df.\
         drop("Gene type", axis=1).\
         rename(columns = {
            "Gene stable ID": "ENSG",
            "Gene name": "gene_name",
            "miRBase accession": "MI",
            "miRBase ID": "mirna_name"
         }).\
         pipe(lambda x: x.assign(species = x.mirna_name.str[:3]))

tmp_df
Out[365]: 
              ENSG gene_name         MI     mirna_name species
0  ENSG00000274494   MIR6832  MI0022677   hsa-mir-6832     hsa
1  ENSG00000283386  MIR4659B  MI0017291  hsa-mir-4659b     hsa
2  ENSG00000221456   MIR1202  MI0006334   hsa-mir-1202     hsa
3  ENSG00000199102   MIR302C  MI0000773   hsa-mir-302c     hsa
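As a rough sketch of when `pipe` earns its place: it can also take a named function plus extra arguments, which may be handy if the step is reused elsewhere. The helper name `add_species` and its `source`/`width` parameters are made up for illustration, and the frame is a two-row copy of the question's data.

```python
import pandas as pd

# Two rows copied from the question's sample data.
df = pd.DataFrame({
    "Gene stable ID": ["ENSG00000274494", "ENSG00000283386"],
    "Gene name": ["MIR6832", "MIR4659B"],
    "Gene type": ["miRNA", "miRNA"],
    "miRBase accession": ["MI0022677", "MI0017291"],
    "miRBase ID": ["hsa-mir-6832", "hsa-mir-4659b"],
})


def add_species(frame, source="mirna_name", width=3):
    """Hypothetical helper: add a 'species' column from a name prefix."""
    return frame.assign(species=frame[source].str[:width])


result = (
    df.drop(columns="Gene type")
      .rename(columns={
          "Gene stable ID": "ENSG",
          "Gene name": "gene_name",
          "miRBase accession": "MI",
          "miRBase ID": "mirna_name",
      })
      # pipe passes the intermediate frame as the first argument and
      # forwards any further keyword arguments to the function.
      .pipe(add_species, width=3)
)
```

Wrapping the chain in parentheses instead of using trailing backslashes is only a style choice; the behaviour is the same.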

As @Tom pointed out, in this case you can also do this without pipe:

df.\
         drop("Gene type", axis=1).\
         rename(columns = {
            "Gene stable ID": "ENSG",
            "Gene name": "gene_name",
            "miRBase accession": "MI",
            "miRBase ID": "mirna_name"
         }).\
         assign(species = lambda x: x.mirna_name.str[:3])
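This works because `assign` accepts callables: each one is called with the frame as it exists at that point in the chain, so the renamed column is already visible. A minimal self-contained sketch, using two rows from the question; the extra `species_upper` column is not part of the original problem and is only there to show that a later keyword can refer to a column created earlier in the same `assign` call.

```python
import pandas as pd

# Two rows copied from the question's sample data.
df = pd.DataFrame({
    "Gene stable ID": ["ENSG00000274494", "ENSG00000283386"],
    "Gene name": ["MIR6832", "MIR4659B"],
    "Gene type": ["miRNA", "miRNA"],
    "miRBase accession": ["MI0022677", "MI0017291"],
    "miRBase ID": ["hsa-mir-6832", "hsa-mir-4659b"],
})

result = (
    df.drop(columns="Gene type")
      .rename(columns={"miRBase ID": "mirna_name"})
      .assign(
          # The lambda receives the already renamed frame.
          species=lambda d: d["mirna_name"].str[:3],
          # Illustrative only: uses the 'species' column defined just above.
          species_upper=lambda d: d["species"].str.upper(),
      )
)
```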

  • `.pipe` is not needed here. You can put the lambda inside assign, e.g. `.assign(species = lambda x: x.mirna_name.str[:3])` (2 upvotes)