您好我有以下数据帧.
Group Size
Short Small
Short Small
Moderate Medium
Moderate Small
Tall Large
Run Code Online (Sandbox Code Playgroud)
我想计算同一行在数据帧中出现的时间的频率.
Group Size Time
Short Small 2
Moderate Medium 1
Moderate Small 1
Tall Large 1
Run Code Online (Sandbox Code Playgroud) 我正在尝试在 mac 机器上安装fiona和。geopandas
我已经安装了GDAL槽brew install gdal --HEAD
如果我尝试安装fiona,我会得到以下信息:
sudo pip3 install fiona\nCollecting fiona\n Downloading Fiona-1.8.18.tar.gz (1.3 MB)\n |\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88| 1.3 MB 4.7 MB/s \n ERROR: Command errored out with exit status 1:\n command: /usr/local/opt/python@3.9/bin/python3.9 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/tmp/pip-install-sk0rqerp/fiona_e389978ed5a44635a5158a0ba3df8562/setup.py'"'"'; __file__='"'"'/private/tmp/pip-install-sk0rqerp/fiona_e389978ed5a44635a5158a0ba3df8562/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\\r\\n'"'"', '"'"'\\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /private/tmp/pip-pip-egg-info-bngdnoq7\n cwd: /private/tmp/pip-install-sk0rqerp/fiona_e389978ed5a44635a5158a0ba3df8562/\n Complete output (2 lines):\n Failed to get options via gdal-config: [Errno 2] No such file or directory: …Run Code Online (Sandbox Code Playgroud) 我有一个如下表
df
+------------------------------------+-----------------------+
|identifier |timestamp |
+------------------------------------+-----------------------+
|86311425-0890-40a5-8950-54cbaaa60815|2020-03-18 14:41:55 UTC|
|38e121a8-f21f-4d10-bb69-26eb045175b5|2020-03-13 15:19:21 UTC|
|1a69c9b0-283b-4b6d-89ac-66f987280c66|2020-03-16 12:59:51 UTC|
|c7b5c53f-bf40-498f-8302-4b3329322bc9|2020-03-18 22:05:06 UTC|
|0d3d807b-9b3a-466e-907c-c22402240730|2020-03-17 18:40:03 UTC|
+------------------------------------+-----------------------+
tmp.printSchema()
root
|-- identifier: string (nullable = true)
|-- timestamp: string (nullable = true)
Run Code Online (Sandbox Code Playgroud)
我想要一个只包含时间戳中的日期和小时的列。
我正在尝试以下操作:
from pyspark.sql.functions import hour
df = df.withColumn("hour", hour(col("timestamp")))
Run Code Online (Sandbox Code Playgroud)
但我得到以下结果
+--------------------+--------------------+----+
| identifier| timestamp|hour|
+--------------------+--------------------+----+
|321869c3-71e5-41d...|2020-03-19 03:34:...|null|
|226b8d50-2c6a-471...|2020-03-19 02:59:...|null|
|47818b7c-34b5-43c...|2020-03-19 01:41:...|null|
|f5ca5599-7252-49d...|2020-03-19 04:25:...|null|
|add2ae24-aa7b-4d3...|2020-03-19 01:50:...|null|
+--------------------+--------------------+----+
Run Code Online (Sandbox Code Playgroud)
虽然我想拥有
+--------------------+--------------------+-------------------+
| identifier| timestamp|hour |
+--------------------+--------------------+-------------------+
|321869c3-71e5-41d...|2020-03-19 03:00:...|2020-03-19 03:00:00|
|226b8d50-2c6a-471...|2020-03-19 02:59:...|2020-03-19 02:00:00|
|47818b7c-34b5-43c...|2020-03-19 01:41:...|2020-03-19 01:00:00| …Run Code Online (Sandbox Code Playgroud) 您好我有以下数据帧
df =
Record_ID Time
94704 2014-03-10 07:19:19.647342
94705 2014-03-10 07:21:44.479363
94706 2014-03-10 07:21:45.479581
94707 2014-03-10 07:21:54.481588
94708 2014-03-10 07:21:55.481804
Run Code Online (Sandbox Code Playgroud)
有可能有以下吗?
df1 =
Record_ID Time
94704 2014-03-10 07:19:19
94705 2014-03-10 07:21:44
94706 2014-03-10 07:21:45
94707 2014-03-10 07:21:54
94708 2014-03-10 07:21:55
Run Code Online (Sandbox Code Playgroud) 我将一个.csv文件作为数据框读取,如下所示:
import pandas as pd
df = pd.read_csv('myFile.csv')
df.head()
BoroName geometry
0 Brooklyn MULTIPOLYGON (((-73.97604935657381 40.63127590...
1 Queens MULTIPOLYGON (((-73.80379022888098 40.77561011...
2 Queens MULTIPOLYGON (((-73.8610972440186 40.763664477...
3 Queens MULTIPOLYGON (((-73.75725671509139 40.71813860...
4 Manhattan MULTIPOLYGON (((-73.94607828674226 40.82126321...
Run Code Online (Sandbox Code Playgroud)
我想将其转换为 geopandas 数据框。
import geopandas as gpd
crs = {'init': 'epsg:4326'}
gdf = gpd.GeoDataFrame(df, crs=crs).set_geometry('geometry')
Run Code Online (Sandbox Code Playgroud)
但我收到以下错误
TypeError: Input must be valid geometry objects: MULTIPOLYGON (((-73.97604935657381 40.631275905646774, -73.97716511994669 40.63074665412933,....
Run Code Online (Sandbox Code Playgroud) 我有一个如下所示的数据框
df =
a ID1 ID2 Proximity
0 0 900000498 NaN 0.000000
1 1 900000498 900004585 3.900000
2 2 900000498 900005562 3.900000
3 3 900000498 900008613 0.000000
4 4 900000498 900012333 0.000000
5 5 900000498 900019524 3.900000
6 6 900000498 900019877 0.000000
7 7 900000498 900020141 3.900000
8 8 900000498 900022133 3.900000
9 9 900000498 900022919 0.000000
Run Code Online (Sandbox Code Playgroud)
我想为给定的夫妇找到ID1-ID2相应的Proximity值。例如,给定[900000498, 900022133]我想要的输入作为输出3.900000
我正在使用以下过程在子图中绘制不同的图形。
fig = figure(figsize=(10,11))
subplots_adjust(wspace=0.5,hspace=0.2)
iplot = 330
for i in range(9):
iplot += 1
ax = fig.add_subplot(iplot)
## Comparison Mreal - Real
tmp = REAL[REAL.days==days[i]]
tmp = tmp.score
tmp = np.array(tmp)
tmp = tmp.reshape(len(xv), len(br))
im = plt.imshow(tmp, interpolation='nearest', cmap='gnuplot', vmin = 0, vmax = 1, extent=[0.05,0.5,1,0.05],
aspect=0.5)
xtmp = [0.05, 0.2, 0.3, 0.4, 0.5]
plt.xticks(xtmp)
ytmp = [0.05, 0.2, 0.4, 0.6, 0.8, 1.0]
plt.yticks(ytmp)
ax.grid(False)
divider = make_axes_locatable(plt.gca())
cax = divider.append_axes("right", "5%", pad="3%")
plt.colorbar(im, cax=cax)
plt.tight_layout()
Run Code Online (Sandbox Code Playgroud)
这就是我得到的:
但是,我希望所有子图都具有相同的颜色条,例如在图的右侧。
我有一个数据df1框,对应nodes于网络中的egelist 和value节点本身,如下所示:
df
node_i node_j value_i value_j
0 3 4 89 33
1 3 2 89 NaN
2 3 5 89 69
3 0 2 45 NaN
4 0 3 45 89
5 1 2 109 NaN
6 1 8 109 NaN
Run Code Online (Sandbox Code Playgroud)
我想添加一个w对应于value_jif是否有值的列.如果value_j是NaN我想设置w作为平均的相邻节点的值的i.在node_i仅具有NaN设置值的相邻节点的情况下w=1.
所以最终的数据框应该像愚弄:
df
node_i node_j value_i value_j w
0 3 4 89 33 33 …Run Code Online (Sandbox Code Playgroud) 我生成了一个图表networkx
import networkx as nx
s = 5
G = nx.grid_graph(dim=[s,s])
nodes = list(G.nodes)
edges = list(G.edges)
p = []
for i in range(0, s):
for j in range(0, s):
p.append([i,j])
for i in range(0, len(nodes)):
G.nodes[nodes[i]]['pos'] = p[i]
pos = {}
for i in range(0, len(nodes)):
pos[nodes[i]] = p[i]
nx.draw(G, pos)
Run Code Online (Sandbox Code Playgroud)
现在我想为0和之间的每个节点分配一个值4
from random import randint
val = []
for i in range(0, len(G.nodes())):
val.append(randint(0,4))
Run Code Online (Sandbox Code Playgroud)
我想为列表中的每个节点分配颜色并绘制如下val所示的内容
我有一些来自gps数据(Lat,Lon)和shapefile我正在考虑的区域的相对数据点.所以:
import pandas as pd
import shapefile
sf = shapefile.Reader('roadshape')
df = pd.read_csv('gpsdata.csv')
Run Code Online (Sandbox Code Playgroud)
现在df包含我正在分析的数据是这样的:
ID x y
3447 11.400427 48.816806
3448 11.400759 48.816772
3449 11.401424 48.816684
3450 11.401758 48.816631
3451 11.402090 48.816566
3452 11.402422 48.816490
Run Code Online (Sandbox Code Playgroud)
我想分配给每个相应的shapefile细分市场.我正在尝试执行以下操作.我正在考虑边界框,我想尝试查看数据指向数据栏的边界框.
dfs = pd.DataFrame()
shapes = sf.shapes()
X =list()
Y=list()
for i in range(0,len(shapes)):
X.append([shapes[i].bbox[0],shapes[i].bbox[2]])
Y.append([shapes[i].bbox[1],shapes[i].bbox[3]])
dfs['X'] = X
dfs['Y'] = Y
Run Code Online (Sandbox Code Playgroud)
现在我该如何查看我的积分是哪个bbox?哪里dfs是这样的
dfs =
X Y
0 [10.9467244189, 10.9704393002] [48.2671975178, 48.2697440003]
1 [11.5138847999, 11.5143541004] [48.6497096997, 48.6515363002] …Run Code Online (Sandbox Code Playgroud)