我有一个字典,里面填充了我导入的两个文件的数据,但有些数据是以nan形式出现的.如何使用nan删除数据?
我的代码是:
import matplotlib.pyplot as plt
from pandas.lib import Timestamp
import numpy as np
from datetime import datetime
import pandas as pd
import collections
orangebook = pd.read_csv('C:\Users\WEGWEIS_JAKE\Desktop\Work Programs\Code Files\products2.txt',sep='~', parse_dates=['Approval_Date'])
specificdrugs=pd.read_csv('C:\Users\WEGWEIS_JAKE\Desktop\Work Programs\Code Files\Drugs.txt',sep=',')
"""This is a dictionary that collects data from the .txt file
This dictionary has a key,value pair for every generic name with its corresponding approval date """
drugdict={}
for d in specificdrugs['Generic Name']:
drugdict.dropna()
drugdict[d]=orangebook[orangebook.Ingredient==d.upper()]['Approval_Date'].min()
Run Code Online (Sandbox Code Playgroud)
我应该添加或删除此代码以确保字典中没有值为nan的键值对?
如何将pandas dataFrame转换为字典的稀疏字典,其中仅显示某些截止的索引.在下面的玩具示例中,我只想要值> 0的每列的索引
import pandas as pd
table1 = [['gene_a', -1 , 1], ['gene_b', 1, 1],['gene_c', 0, -1]]
df1 = pd.DataFrame(table)
df1.columns = ['gene','cell_1', 'cell_2']
df1 = df1.set_index('gene')
dfasdict = df1.to_dict(orient='dict')
Run Code Online (Sandbox Code Playgroud)
这给出了:
dfasdict = {'cell_1': {'gene_a': -1, 'gene_b': 0, 'gene_c': 0}, 'cell_2': {'gene_a': 1, 'gene_b': -1, 'gene_c': -1}}
但是所需的输出是稀疏字典,其中只显示小于零的值:
desired = {'cell_1': {'gene_a': -1}, 'cell_2': {'gene_b': -1, 'gene_c': -1}}
我可以dfasdict在创建之后进行一些处理以更改字典,但我想在同一步骤中进行转换,因为之后的处理涉及迭代非常大的字典.这有可能在熊猫中完成吗?