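I have an Excel file made up of several sheets, and I need to load each one as a separate DataFrame. Is there a function similar to pd.read_csv("") for this kind of task?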
PS: Because of the file size, I can't copy and paste the individual sheets out of Excel.
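The Excel counterpart of pd.read_csv is pd.read_excel, and passing sheet_name=None makes it return a dict that maps each sheet name to its own DataFrame. A minimal sketch is below; the helper names are hypothetical, and it assumes the openpyxl package is installed, since pandas uses it by default to read .xlsx files.

```python
import pandas as pd


def load_all_sheets(path):
    """Read every sheet of an Excel workbook into its own DataFrame.

    sheet_name=None tells pd.read_excel to return a dict of
    {sheet name: DataFrame} instead of reading only the first sheet.
    """
    return pd.read_excel(path, sheet_name=None)


def load_selected_sheets(path, names):
    """Alternative: open the workbook once and parse only the named sheets."""
    with pd.ExcelFile(path) as xls:
        # xls.sheet_names lists the sheets in workbook order.
        return {name: xls.parse(name) for name in names if name in xls.sheet_names}
```

Usage, with a hypothetical file name: sheets = load_all_sheets("workbook.xlsx"), then sheets["Sheet1"] is an ordinary DataFrame. Nothing has to be copied out of Excel by hand.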
I am trying to import a JSON file with the following function:
import pandas
sku = pandas.read_json('https://cws01.worldstores.co.uk/api/product.php?product_sku=125T:FT0111')
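However, I keep getting the following error:
ValueError: arrays must all be same length
How can I import it into a DataFrame correctly?
Here is the structure of the JSON: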
{
"id": "5",
"sku": "JOSH:BECO-BRN",
"last_updated": "2013-06-10 15:46:22",
...
"propertyType1": [
"manufacturer_colour"
],
"category": [
{
"category_id": "10",
"category_name": "All Products"
},
...
{
"category_id": "238",
"category_name": "All Sofas"
}
],
"root_categories": [
"516"
],
"url": "/p/Beco Suede Sofa Bed?product_id=5",
"item": [
"2"
],
"image_names": "[\"https:\\/\\/cdn.worldstores.co.uk\\/images\\/products\\/L\\/19\\/Beco_Suede_Sofa_Bed-1.jpg\",\"https:\\/\\/cdn.worldstores.co.uk\\/images\\/products\\/P\\/19\\/Beco_Suede_Sofa_Bed-1.jpg\",\"https:\\/\\/cdn.worldstores.co.uk\\/images\\/products\\/SP\\/19\\/Beco_Suede_Sofa_Bed-1.jpg\",\"https:\\/\\/cdn.worldstores.co.uk\\/images\\/products\\/SS\\/19\\/Beco_Suede_Sofa_Bed-1.jpg\",\"https:\\/\\/cdn.worldstores.co.uk\\/images\\/products\\/ST\\/19\\/Beco_Suede_Sofa_Bed-1.jpg\",\"https:\\/\\/cdn.worldstores.co.uk\\/images\\/products\\/WP\\/19\\/Beco_Suede_Sofa_Bed-1.jpg\",\"https:\\/\\/cdn.worldstores.co.uk\\/images\\/products\\/L\\/19\\/Beco_Suede_Sofa_Bed-2.jpg\",\"https:\\/\\/cdn.worldstores.co.uk\\/images\\/products\\/P\\/19\\/Beco_Suede_Sofa_Bed-2.jpg\",\"https:\\/\\/cdn.worldstores.co.uk\\/images\\/products\\/SP\\/19\\/Beco_Suede_Sofa_Bed-2.jpg\",\"https:\\/\\/cdn.worldstores.co.uk \\/images\\/products\\/SS\\/19\\/Beco_Suede_Sofa_Bed-2.jpg\",\"https:\\/\\/cdn.worldstores.co.uk\\/images\\/products\\/ST\\/19\\/Beco_Suede_Sofa_Bed-2.jpg\",\"https:\\/\\/cdn.worldstores.co.uk\\/images\\/products\\/WP\\/19\\/Beco_Suede_Sofa_Bed-2.jpg\"]"
}
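pd.read_json tries to turn the top-level object into columns, and here the list-valued keys have different lengths (category has several entries, root_categories has one), which is most likely what triggers the length error. One way around it is to load the record with the json module and flatten it with pd.json_normalize. In the sketch below the record is a hypothetical, trimmed copy of the structure above (the image URLs are placeholders), and fetch_record is a hypothetical helper that assumes the endpoint returns a single JSON object.

```python
import json
import urllib.request

import pandas as pd


def fetch_record(url):
    """Download and decode the product JSON (not called in this sketch)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


# Hypothetical trimmed record with the same shape as the API response above.
record = {
    "id": "5",
    "sku": "JOSH:BECO-BRN",
    "last_updated": "2013-06-10 15:46:22",
    "category": [
        {"category_id": "10", "category_name": "All Products"},
        {"category_id": "238", "category_name": "All Sofas"},
    ],
    "root_categories": ["516"],
    "image_names": '["https://cdn.example.com/a.jpg", "https://cdn.example.com/b.jpg"]',
}

# One row for the product: scalar fields become columns, lists stay as Python lists.
product = pd.json_normalize(record)

# One row per category, with the product's id and sku repeated on each row.
categories = pd.json_normalize(record, record_path="category", meta=["id", "sku"])

# image_names is itself a JSON-encoded string, so it needs its own json.loads.
images = pd.Series(json.loads(record["image_names"]), name="image_url")
```

Which of these frames is useful depends on the analysis; the point is to choose explicitly which nested list becomes the rows instead of letting read_json guess.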
How can I get the difference between these two timestamps in pandas, in milliseconds, as an integer?
My attempt so far:
import pandas as pd
from datetime import datetime
ts1 = datetime(2018, 1, 1, 22, 36, 9, 38000)
ts2 = datetime(2018, 1, 1, 22, 36, 7, 908000)
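df = pd.DataFrame({"ts1": [ts1], "ts2": [ts2]})
df['interval_ts'] = df['ts1'] - df['ts2']
# Note: .microseconds is only the sub-second part of each Timedelta, so this yields 130.0, not 1130
df['interval_ts'] = df['interval_ts'].apply(lambda x: x.microseconds / 1000)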
My expected output is 1130.
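The difference here is 1.130 s, and Timedelta.microseconds only holds the sub-second component (130000 µs), so the whole second is lost. Dividing the entire timedelta by one millisecond keeps it. A minimal sketch using the same two timestamps; the column names interval_ms and interval_ms_alt are just illustrative.

```python
from datetime import datetime

import pandas as pd

ts1 = datetime(2018, 1, 1, 22, 36, 9, 38000)
ts2 = datetime(2018, 1, 1, 22, 36, 7, 908000)
df = pd.DataFrame({"ts1": [ts1], "ts2": [ts2]})

delta = df["ts1"] - df["ts2"]  # timedelta64 Series

# Floor-divide by one millisecond: gives whole milliseconds as int64.
df["interval_ms"] = delta // pd.Timedelta(milliseconds=1)

# Alternative: total_seconds() covers the full duration (unlike .microseconds);
# round before casting to avoid float artefacts such as 1129.9999.
df["interval_ms_alt"] = (delta.dt.total_seconds() * 1000).round().astype("int64")
```

Both columns contain 1130 for this row. The floor-division form truncates any sub-millisecond remainder, while the total_seconds() form rounds it.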
I have a Counter produced by the following call:
from collections import Counter
Counter(df.email_address)
It returns each distinct email address together with the number of times it occurs:
Counter({nan: 1618, 'store@kiddicare.com': 265, 'testorders@worldstores.co.uk': 1})
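What I want is to use it as if it were a dictionary and build a pandas DataFrame from it with two columns: one for the email addresses and one for the corresponding counts.
I tried: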
dfr = repeaters.from_dict(repeaters, orient='index')
but I get the following error:
AttributeError: 'Counter' object has no attribute 'from_dict'
So apparently a Counter is not a dictionary, even though it looks like one. Any ideas on how to get it into a DataFrame?
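A Counter is in fact a dict subclass. The AttributeError arises because from_dict is a classmethod of pandas.DataFrame, not a method of the Counter, so it has to be called as pd.DataFrame.from_dict(...). A minimal sketch follows, with a hypothetical four-row column standing in for df.email_address; Series.value_counts is shown as a way to skip Counter entirely.

```python
from collections import Counter

import numpy as np
import pandas as pd

# Hypothetical stand-in for the real df.email_address column.
df = pd.DataFrame({"email_address": ["a@x.com", "b@y.com", "a@x.com", np.nan]})
repeaters = Counter(df.email_address)

# Call from_dict on DataFrame itself; orient="index" turns keys into the index.
dfr = pd.DataFrame.from_dict(repeaters, orient="index", columns=["count"])
# Move the email addresses out of the index into their own column.
dfr = dfr.rename_axis("email_address").reset_index()

# Alternative without Counter: value_counts (dropna=False keeps NaN, as Counter does).
counts = df["email_address"].value_counts(dropna=False)
```

dfr then has exactly the two requested columns, email_address and count. counts is a Series indexed by address, and counts.reset_index() turns it into a two-column frame as well.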
I have a datetime column in a pandas DataFrame. With this function:
data['yearMonth'] = data.ts_placed.map(lambda x: '{year}-{month}'.format(year=x.year,month=x.month))
I convert the datetime object from
2012-08-06 10:25:39
to
2012-8
What I need is to get it as
2012-08
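The month loses its leading zero because {month} formats the integer as-is. A two-digit format spec ({month:02d}) fixes the existing lambda, and Series.dt.strftime("%Y-%m") does the same thing in vectorised form. The sample ts_placed values below are hypothetical.

```python
import pandas as pd

# Hypothetical sample standing in for the real ts_placed column.
data = pd.DataFrame(
    {"ts_placed": pd.to_datetime(["2012-08-06 10:25:39", "2013-11-20 08:00:00"])}
)

# Option 1: keep the original lambda, but zero-pad the month to two digits.
data["yearMonth"] = data.ts_placed.map(
    lambda x: "{year}-{month:02d}".format(year=x.year, month=x.month)
)

# Option 2: vectorised formatting, with no Python-level lambda.
data["yearMonth2"] = data.ts_placed.dt.strftime("%Y-%m")
```

If the result is only used for grouping, data.ts_placed.dt.to_period("M") is another option: it gives monthly Period values (displayed as 2012-08) rather than strings.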
I have this pandas DataFrame:
import pandas as pd
df = pd.DataFrame({'B': ('A','B','C','D','E','F','G'), 'C': (1,3,5,6,8,2,5), 'D': (5,2,6,9,3,7,3)})
B C D
0 A 1 5
1 B 3 2
2 C 5 6
3 D 6 9
4 E 8 3
5 F 2 7
6 G 5 3
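Going from this wide layout to a long layout with columns B, description and value, where C and D become rows, is what DataFrame.melt does. A minimal sketch that rebuilds the same df:

```python
import pandas as pd

df = pd.DataFrame(
    {"B": list("ABCDEFG"), "C": (1, 3, 5, 6, 8, 2, 5), "D": (5, 2, 6, 9, 3, 7, 3)}
)

# id_vars stays as an identifier column; each value_vars column is stacked
# underneath the previous one, its name going to "description" and its
# cell values to "value".
long_df = df.melt(
    id_vars="B",
    value_vars=["C", "D"],
    var_name="description",
    value_name="value",
)
```

The result has 14 rows: the seven C rows (index 0 to 6) followed by the seven D rows (index 7 to 13).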
To make my calculations easier, I need to reshape it. A convenient format for me would be:
B description value
0 A C 1
1 B C 3
2 C C 5
3 D C 6
4 E C 8
5 F C 2
6 G C 5
7 A D 5
8 …
I am running Python code that has to interact between a Windows machine and a Linux machine.
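The code is launched on Windows, the computation is carried out by the server, and the results are returned to a folder on the Windows machine.
When I run the code on my Windows machine it works fine, but when it goes through the Linux server I get the following error message: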
line 25: syntax error near unexpected token `('
line 25: `db = MySQLdb.connect(host="192.168.1.18", # host
And the line of code is:
db = MySQLdb.connect(host="192.168.1.18", # host
What does this error message mean, and how can I fix it?
Thanks
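The "line 25: syntax error near unexpected token `('" format is how bash reports errors. That suggests the Linux side is running the file as a shell script rather than through Python, for example launching it as ./script.py without a shebang line, or via sh script.py. Two usual fixes: invoke the interpreter explicitly (python3 script.py), or make the first line a shebang and mark the file executable (chmod +x script.py). The sketch below shows such a header. The file name, the interpreter and connect helpers, and the assumption that python3 is on the server's PATH are all hypothetical; the MySQLdb import is deferred so the sketch loads without that driver.

```python
#!/usr/bin/env python3
# The shebang above must be the very first line of the file. When the file is
# run as ./script.py on Linux, it makes the OS start python3 instead of a shell.
# Save the file with Unix (LF) line endings: a Windows CRLF ending can leave a
# stray "\r" at the end of the shebang and stop it from working.
import sys


def interpreter() -> str:
    """Return the path of the running interpreter (a quick sanity check)."""
    return sys.executable


def connect(host: str = "192.168.1.18"):
    """Open the MySQL connection from the question (other arguments omitted)."""
    import MySQLdb  # deferred import: only needed when a connection is made

    return MySQLdb.connect(host=host)


if __name__ == "__main__":
    # If this line prints, the file is being executed by Python, not bash.
    print("Running under:", interpreter())
```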
I have the following df:
import numpy as np
import pandas as pd
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
s = pd.Series(np.random.randn(8), index=arrays)
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)
This looks like:
0 1 2 3
bar one -0.986089 -0.501170 1.635501 -0.789489
two 1.890491 -0.022640 -1.649097 0.984925
baz one -0.759930 -1.640487 -0.763909 -0.554997
two 1.636005 0.037158 0.567383 0.770314
foo one 0.709847 0.048332 -0.676660 1.059454
two 0.588063 0.568405 1.619102 0.393631
qux …
I have a column of postcodes in a pandas DataFrame. Sometimes there are 4 characters before the space and sometimes 3, e.g.
NE5 2NY
NE49 9PB
What regular expression would replace each of these strings with only the characters before the space?
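One regex that works for both lengths removes the first whitespace character and everything after it. Capturing the leading run of non-space characters with str.extract is an equivalent alternative. The column name postcode in this sketch is hypothetical.

```python
import pandas as pd

# Hypothetical column name; values taken from the examples above.
df = pd.DataFrame({"postcode": ["NE5 2NY", "NE49 9PB"]})

# Replace the first whitespace and everything after it with nothing.
df["outward"] = df["postcode"].str.replace(r"\s.*$", "", regex=True)

# Equivalent: capture the characters before the first space.
df["outward2"] = df["postcode"].str.extract(r"^(\S+)", expand=False)
```

A non-regex option is df["postcode"].str.split().str[0]. The regex versions make it explicit that only the part before the first space is kept, however many characters it has.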