我有一些来自gps数据(Lat,Lon)和shapefile我正在考虑的区域的相对数据点.所以:
import pandas as pd
import shapefile
sf = shapefile.Reader('roadshape')
df = pd.read_csv('gpsdata.csv')
Run Code Online (Sandbox Code Playgroud)
现在df包含我正在分析的数据是这样的:
ID x y
3447 11.400427 48.816806
3448 11.400759 48.816772
3449 11.401424 48.816684
3450 11.401758 48.816631
3451 11.402090 48.816566
3452 11.402422 48.816490
Run Code Online (Sandbox Code Playgroud)
我想分配给每个相应的shapefile细分市场.我正在尝试执行以下操作.我正在考虑边界框,我想尝试查看数据指向数据栏的边界框.
dfs = pd.DataFrame()
shapes = sf.shapes()
X =list()
Y=list()
for i in range(0,len(shapes)):
X.append([shapes[i].bbox[0],shapes[i].bbox[2]])
Y.append([shapes[i].bbox[1],shapes[i].bbox[3]])
dfs['X'] = X
dfs['Y'] = Y
Run Code Online (Sandbox Code Playgroud)
现在我该如何查看我的积分是哪个bbox?哪里dfs是这样的
dfs =
X Y
0 [10.9467244189, 10.9704393002] [48.2671975178, 48.2697440003]
1 [11.5138847999, 11.5143541004] [48.6497096997, 48.6515363002] …Run Code Online (Sandbox Code Playgroud) 我有一个数据集,我想看看哪个是其后续的最佳分布。
在第一次尝试中,我尝试用 a 来拟合它rayleigh,所以
y, x = np.histogram(data, bins=45, normed=True)
param = rayleigh.fit(y) # distribution fitting
# fitted distribution
xx = linspace(0,45,1000)
pdf_fitted = rayleigh.pdf(xx,loc=param[0],scale=param[1])
pdf = rayleigh.pdf(xx,loc=0,scale=8.5)
fig,ax = plt.subplots(figsize=(7,5))
plot(xx,pdf,'r-', lw=5, alpha=0.6, label='rayleigh pdf')
plot(xx,pdf,'k-', label='Data')
plt.bar(x[1:], y)
ax.set_xlabel('Distance, '+r'$x [km]$',size = 15)
ax.set_ylabel('Frequency, '+r'$P(x)$',size=15)
ax.legend(loc='best', frameon=False)
Run Code Online (Sandbox Code Playgroud)
我试图对gamma发行版做同样的事情但没有成功
y, x = np.histogram(net1['distance'], bins=45, normed=True)
xx = linspace(0,45,1000)
ag,bg,cg = gamma.fit(y)
pdf_gamma = gamma.pdf(xx, ag, bg,cg)
fig,ax = plt.subplots(figsize=(7,5))
# fitted distribution
plot(xx,pdf_gamma,'r-', lw=5, alpha=0.6, …Run Code Online (Sandbox Code Playgroud) 我有一个 geopandas 数据框
geometry idx
0 POLYGON ((-74.25559 40.91553, -74.24559 40.915... 0
1 POLYGON ((-74.25559 40.90553, -74.24559 40.905... 1
2 POLYGON ((-74.25559 40.89553, -74.24559 40.895... 2
3 POLYGON ((-74.25559 40.88553, -74.24559 40.885... 3
4 POLYGON ((-74.25559 40.87553, -74.24559 40.875... 4
Run Code Online (Sandbox Code Playgroud)
在哪里
gridDF['geometry'][0]
<shapely.geometry.polygon.Polygon at 0x7fa4cc6ccc50>
Run Code Online (Sandbox Code Playgroud)
我想将列中的条目转换geometry为字符串。
我想在数字标记为列数据帧的两列之间的数据帧中添加一列.在以下数据框中,第一列对应于索引,而第一列对应于列的名称.
df
0 0 1 2 3 4 5
1 6 7 4 5 2 1
2 0 3 1 3 3 4
3 9 8 4 3 6 2
Run Code Online (Sandbox Code Playgroud)
我有tmp=[2,3,5]我想要的列之间放4和5,所以
df
0 0 1 2 3 4 5 6
1 6 7 4 5 2 2 1
2 0 3 1 3 3 3 4
3 9 8 4 3 6 5 2
Run Code Online (Sandbox Code Playgroud) 我有一个dataSwiss包含瑞士城市信息的数据框。我想用普通字母用重音符号替换字母。
这就是我正在做的:
dataSwiss['Municipality'] = dataSwiss['Municipality'].str.encode('utf-8')
dataSwiss['Municipality'] = dataSwiss['Municipality'].str.replace(u"é", "e")
Run Code Online (Sandbox Code Playgroud)
但我收到以下错误:
----> 2 dataSwiss['Municipality'] = dataSwiss['Municipality'].str.replace(u"é", "e")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
Run Code Online (Sandbox Code Playgroud)
数据看起来像:
dataSwiss.Municipality
0 Zürich
1 Zürich
2 Zürich
3 Zürich
4 Zürich
5 Zürich
6 Zürich
7 Zürich
Run Code Online (Sandbox Code Playgroud)
我找到了解决方案
s = dataSwiss['Municipality']
res = s.str.decode('utf-8')
res = res.str.replace(u"é", "e")
Run Code Online (Sandbox Code Playgroud) 我有一个数据框df,其中包含从一个人Name_Give到另一个人的交易信息,Name_Receive如下所示:
df
Name_Give Name_Receive Amount
0 John Tom 300
1 Eva Tom 700
2 Sarah Tom 100
3 John Tom 200
4 Tom Eva 700
5 John Eva 300
6 Carl Eva 250
Run Code Online (Sandbox Code Playgroud)
对于每个Name_Receive j我想计算香农熵,S_j = -sum_i p_i \log p_i其中p_i是数量除以用户数量的总和j
S_Tom = - (300/1300 * np.log(300/1300) + 700/1300 * np.log(700/1300) + 100/1300 * np.log(100/1300) + 200/1300 * np.log(200/1300))
S_Eva = - (700/1250 * np.log(700/1250) + 300/1250 * …Run Code Online (Sandbox Code Playgroud) 重新安装Anaconda后,我无法再在 Python\xc2\xa03 上导入 NumPy:
\nimport numpy as np\nRun Code Online (Sandbox Code Playgroud)\n输出:
\nimport numpy as np\nRun Code Online (Sandbox Code Playgroud)\n我努力了
\nModuleNotFoundError: No module named \'numpy\'\nRun Code Online (Sandbox Code Playgroud)\n我尝试再次安装它,但我得到:
\npip install numpy\nRun Code Online (Sandbox Code Playgroud)\n输出:
\npip install numpy\nRun Code Online (Sandbox Code Playgroud)\n我正在使用pysal空间回归。这就是我正在做的事情。我首先划分训练集和验证集
import pysal as ps
m_train = ps.model.spreg.GM_Lag(ytrain, xtrain, w=w_train, spat_diag=True)
Run Code Online (Sandbox Code Playgroud)
如何使用 的系数m_train在验证集上测试模型?
print(m_train.summary)
REGRESSION
----------
SUMMARY OF OUTPUT: SPATIAL TWO STAGE LEAST SQUARES
--------------------------------------------------
Data set : unknown
Weights matrix : unknown
Dependent Variable : dep_var Number of Observations: 138
Mean dependent var : 0.2200 Number of Variables : 12
S.D. dependent var : 2.2411 Degrees of Freedom : 126
Pseudo R-squared : 0.6609
Spatial Pseudo R-squared: omitted due to rho outside the boundary (-1, …Run Code Online (Sandbox Code Playgroud) 我需要拆分pyspark数据帧df并保存不同的块。
这就是我正在做的事情:我定义一个列id_tmp,然后根据该列分割数据框。
chunk = 10000
id1 = 0
id2 = chunk
df = df.withColumn('id_tmp', row_number().over(Window.orderBy(monotonically_increasing_id())) - 1)
c = df.count()
while id1 < c:
stop_df = df.filter( (tmp.id_tmp < id2) & (tmp.id_tmp >= id1))
stop_df.write.format('com.databricks.spark.csv').save('myFolder/')
id1+=chunk
id2+=chunk
Run Code Online (Sandbox Code Playgroud)
有没有更有效的方法而不定义列id_tmp
我有一个 geopandas 数据框,如下所示
gdf.head()
geometry raster_val diss
0 POLYGON ((2.50553 49.10338, 2.50553 49.09506, ... 33.299999 1
1 POLYGON ((2.50553 49.09506, 2.50553 49.08673, ... 33.790001 1
2 POLYGON ((2.52218 49.09506, 2.52218 49.08673, ... 34.000000 1
3 POLYGON ((2.14760 49.07009, 2.14760 49.06176, ... 33.180000 1
4 POLYGON ((2.02274 49.06176, 2.02274 49.05344, ... 33.459999 1
Run Code Online (Sandbox Code Playgroud)
POLYGONS我正在尝试根据专栏解散diss
gdf_diss = gdf.dissolve(by='diss')
Run Code Online (Sandbox Code Playgroud)
但我收到以下错误:
GEOSException: IllegalArgumentException: Argument must be Polygonal or LinearRing
Run Code Online (Sandbox Code Playgroud) python ×9
pandas ×4
geopandas ×2
apache-spark ×1
dataframe ×1
decode ×1
distribution ×1
entropy ×1
geolocation ×1
numpy ×1
pysal ×1
pyspark ×1
python-3.x ×1
scipy ×1
shapefile ×1
shapely ×1
statistics ×1
string ×1
unicode ×1