I need help with networkx or any other graph library in Python. I have a dictionary in which each key maps to several values:
{nan: array([nan, nan, nan, nan, nan, nan, nan], dtype=object),
'BBDD': array([nan, nan, nan, nan, nan, nan, nan], dtype=object),
'AAAD': array(['BBDD', nan, nan, nan, nan, nan, nan], dtype=object),
'AAFF': array(['AAAD', nan, nan, nan, nan, nan, nan], dtype=object),
'MMCC': array(['AAAD', nan, nan, nan, nan, nan, nan], dtype=object),
'KKLL': array(['AAFF', 'MMCC', 'AAAD', 'BBDD', nan, nan, nan], dtype=object),
'GGHH': array(['KKLL', 'NI4146', 'MMCC', nan, nan, nan, nan],dtype=object), ...}
My question is how to load this data into a graph where the keys are nodes and the values define the edges. What is the best way to iterate over the dict?
import networkx as nx
import matplotlib.pyplot as plt
g = nx.DiGraph()
g.add_nodes_from([1,2,3,4,5])
g.add_edge(1,2)
g.add_edge(4,2)
g.add_edge(3,5) …
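One possible way to build the graph from such a dictionary is sketched below. It assumes each key is a node and each non-NaN value is the target of an edge starting at that key (the direction is a guess, since the question does not state it), and it uses a shortened stand-in for the data. The helper `is_missing` is a hypothetical name introduced for this sketch.

```python
import math

import networkx as nx
import numpy as np

nan = float("nan")

# Shortened stand-in for the dictionary in the question.
data = {
    nan: np.array([nan, nan], dtype=object),
    "BBDD": np.array([nan, nan], dtype=object),
    "AAAD": np.array(["BBDD", nan], dtype=object),
    "AAFF": np.array(["AAAD", nan], dtype=object),
    "KKLL": np.array(["AAFF", "AAAD", "BBDD"], dtype=object),
}


def is_missing(x):
    """Return True for NaN placeholders (hypothetical helper)."""
    return isinstance(x, float) and math.isnan(x)


g = nx.DiGraph()
for key, values in data.items():
    if is_missing(key):
        continue  # skip the NaN key entirely
    g.add_node(key)  # keeps keys that have no valid values
    for value in values:
        if not is_missing(value):
            # Assumption: the edge points from the key to the value.
            g.add_edge(key, value)
```

Iterating with `data.items()` visits each key once, and `add_edge` creates any missing endpoint automatically, so values that never appear as keys still become nodes.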

I have a DataFrame with about 500,000 rows. I can see that it contains many duplicate rows, so how do I remove rows that are duplicated across all columns (about 80 of them), not just one?
df:
period_start_time id val1 val2 val3
06.13.2017 22:00:00 i53 32 2 10
06.13.2017 22:00:00 i32 32 2 10
06.13.2017 22:00:00 i32 4 2 8
06.13.2017 22:00:00 i32 4 2 8
06.13.2017 22:00:00 i32 4 2 8
06.13.2017 22:00:00 i20 7 7 22
06.13.2017 22:00:00 i20 7 7 22
Desired output:
period_start_time id val1 val2 val3
06.13.2017 22:00:00 i53 32 2 10
06.13.2017 22:00:00 i32 32 2 10
06.13.2017 22:00:00 i32 4 2 8
06.13.2017 22:00:00 i20 7 7 22
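A minimal sketch of one way to do this in pandas, assuming the data is already loaded into a DataFrame named `df` as in the question: `DataFrame.drop_duplicates()` with no `subset` argument compares every column. The sample frame below reproduces the rows shown above.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "period_start_time": ["06.13.2017 22:00:00"] * 7,
    "id": ["i53", "i32", "i32", "i32", "i32", "i20", "i20"],
    "val1": [32, 32, 4, 4, 4, 7, 7],
    "val2": [2, 2, 2, 2, 2, 7, 7],
    "val3": [10, 10, 8, 8, 8, 22, 22],
})

# With no `subset`, rows are duplicates only if ALL columns match.
# The first occurrence of each row is kept by default.
deduped = df.drop_duplicates().reset_index(drop=True)
print(deduped)
```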

I have a Time column of dtype object (non-null) that I cannot convert to timedelta or datetime.
Time msg
12:29:36.306000 Setup
12:29:36.507000 Alerting
12:29:38.207000 Service
12:29:39.194000 Setup
12:30:05.773000 Alerting
12:30:06.205000 Service
12:32:07.315000 Setup
12:32:17.194000 Service
12:32:26.889000 Setup
12:36:06.274000 Alerting
12:36:08.523000 Service
12:37:59.200000 Setup
12:47:10.652000 Alerting
12:47:43.921000 Setup
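When I run df.info(), the Time column is reported as non-null object, and I cannot convert it to timedelta or datetime (it is clear to me why that fails). So how can I find the difference (time delta) between consecutive messages, skipping it when the delta is under 5 seconds? Output: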
Time msg diff
12:29:36.306000 Setup
12:29:36.507000 Alerting
12:29:38.207000 Service
12:29:39.194000 Setup
12:30:05.773000 Alerting
12:30:06.205000 Service
12:32:07.315000 Setup
12:32:17.194000 Service
12:32:26.889000 Setup
12:36:06.274000 Alerting 6.30***
12:36:08.523000 Service
12:37:59.200000 Setup
12:47:10.652000 Alerting 11.02***
12:47:43.921000 Setup
I tried something like this:
df['diff'] = (df['Time'] - df['Time'].shift()).fillna(0) …
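A hedged sketch of one approach: it assumes the Time values are strings or `datetime.time` objects whose string form looks like `HH:MM:SS.ffffff`, and that "skip" means leaving the diff empty when the gap is under 5 seconds. It does not attempt to reproduce the exact figures in the output above, whose derivation the question does not explain.

```python
import pandas as pd

# First rows from the question, stored as plain strings (object dtype).
df = pd.DataFrame({
    "Time": ["12:29:36.306000", "12:29:36.507000", "12:29:38.207000",
             "12:29:39.194000", "12:30:05.773000"],
    "msg": ["Setup", "Alerting", "Service", "Setup", "Alerting"],
})

# Going through str makes the conversion work for datetime.time
# objects as well as for strings.
elapsed = pd.to_timedelta(df["Time"].astype(str))

# Difference to the previous row; the first row has no predecessor (NaT).
delta = elapsed.diff()

# Assumption: gaps shorter than 5 seconds are left empty (NaT).
df["diff"] = delta.where(delta >= pd.Timedelta(seconds=5))
print(df)
```

Subtracting the raw object column fails because neither strings nor `datetime.time` values support subtraction, which is why the values are converted to timedeltas first.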

Why doesn't this work? I have a dictionary nested inside a dictionary:
{'rrr-rrr/CCC-3/FFFF-1': {'ActiveSet': '0'},
'rrr-rrr/CCC-4/FFFF-1': {'ActiveSet': '1'},
...}
I need to drop the keys where CCC is 3 (CCC-3). This is what I tried:
my_dict = {k: v
for k, v in my_dict.iteritems()
if k.split('/')[1].split('-')[1]!= 3
}
This code raises no error, but nothing happens. I also tried creating a new key in the inner dictionary for the CCC number, but that doesn't work either. Desired output:
{'rrr-rrr/CCC-4/FFFF-1': {'ActiveSet': '1'},
...}
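A likely cause, shown as a sketch: `split` returns strings, so the attempt compares the string `'3'` with the integer `3`. That comparison is never equal, so every key passes the filter. Comparing against `'3'` gives the desired result. The example uses `items()` for Python 3; under Python 2.7 `iteritems()` plays the same role.

```python
my_dict = {
    "rrr-rrr/CCC-3/FFFF-1": {"ActiveSet": "0"},
    "rrr-rrr/CCC-4/FFFF-1": {"ActiveSet": "1"},
}

# "rrr-rrr/CCC-3/FFFF-1".split("/")[1] -> "CCC-3"
# "CCC-3".split("-")[1]                -> "3" (a string, not an int)
filtered = {
    k: v
    for k, v in my_dict.items()
    if k.split("/")[1].split("-")[1] != "3"
}
print(filtered)
```

Converting with `int(...) != 3` would work as well, as long as every key has a numeric CCC part.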

I have two DataFrames that I need to join on one column, keeping only the rows from the first DataFrame whose id also appears in the same column of the second DataFrame:
df1:
id a b
2 1 1
3 0.5 1
4 1 2
5 2 1
df2:
id c d
2 fs a
5 fa f
Desired output:
df:
id a b
2 1 1
5 2 1
I tried df1.join(df2("id"),"left"), but I get an error: 'DataFrame' object is not callable.
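The error comes from `df2("id")`, which calls the DataFrame as if it were a function. Column access uses square brackets instead. The sketch below shows the filtering idea in pandas, which is an assumption made to keep the example self-contained. If the frames are PySpark DataFrames, as the error suggests, the comparable operation should be a left semi join, `df1.join(df2, on="id", how="left_semi")`.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [2, 3, 4, 5],
    "a": [1, 0.5, 1, 2],
    "b": [1, 1, 2, 1],
})
df2 = pd.DataFrame({
    "id": [2, 5],
    "c": ["fs", "fa"],
    "d": ["a", "f"],
})

# Keep only the rows of df1 whose id occurs in df2; no columns
# from df2 are added, which matches the desired output.
result = df1[df1["id"].isin(df2["id"])].reset_index(drop=True)
print(result)
```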