I am trying to load Amazon review data (a JSON file) into a pandas DataFrame using pd.read_json(), and I get the following error: Unmatched ''"' when when decoding 'string'. I am using a Jupyter notebook.
Data format:
{"reviewerID": "AGL65XWV7MH3C", "asin": "B003FMUVKO", "reviewerName": "William B. Bebout \"Acknud\"", "helpful": [0, 1], "reviewText": "Too short. I would have rated it higher if it was long enough to hold my attention! It did have significant violence but not much else.", "overall": 3.0, "summary": "Short", "unixReviewTime": 1304985600, "reviewTime": "05 10, 2011"}
Python code:
import pandas as pd
data = pd.read_json('sample_data.json', lines=True)
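The question does not show which record triggers the error, so the cause is unknown here. One possible cause is a line in the file that is truncated or otherwise not valid JSON. A minimal sketch for locating such lines, assuming the file holds one JSON object per line (the helper name `find_bad_lines` is hypothetical):

```python
import json


def find_bad_lines(path):
    """Return (line_number, error_message) for every line that is not valid JSON."""
    bad = []
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                bad.append((lineno, str(exc)))
    return bad
```

Calling `find_bad_lines('sample_data.json')` returns an empty list when every line parses, which would suggest looking elsewhere for the cause.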
I am using Python's matplotlib library to plot graphs. My program detects outliers in the system, and I am building an interface for it. I need to show the graph in the browser when a button is clicked. Here is my PHP file:
<?php
if (isset($_POST['button']))
{
echo shell_exec("python3 /var/www/html/python/anomalies.py 2>&1");
}
?>
<html>
<body>
<center>
<form method="post">
<table>
<tr>
<td>
Enter CSV file path:
</td>
<td>
<input type="text" name="path">
</td>
</tr>
</table>
<p>
<button name="button">Identify Anomalies</button> …
Using a pandas DataFrame, I want to take each i-th row and divide it by the (i-1)-th row. I want to use vectorization (i.e., no for loop).
For example, if I have the following DataFrame:
1 10
2 20
8 160
32 480
I would end up with:
1 10
2 2
4 8
4 3
NB: the division uses the old table values, not the updated ones.
PS: sorry for the poor formatting!
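A minimal sketch of one vectorized approach, assuming the first row should be kept unchanged as in the example. The column names `a` and `b` are hypothetical, since the question shows no headers:

```python
import pandas as pd

# Sample data from the question; column names are assumed.
df = pd.DataFrame({"a": [1, 2, 8, 32], "b": [10, 20, 160, 480]})

# shift(1) moves every row down by one, so each row is divided by the
# original previous row. The first row has no predecessor and becomes NaN,
# so it is filled back in from the original frame.
ratio = (df / df.shift(1)).fillna(df)
print(ratio)
```

Because `shift` works on the original values, later rows are never divided by an already-updated row.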
I have a DataFrame like the following:
id timestamp name
1 2018-01-23 15:49:53 "aaa"
1 2018-01-23 15:54:56 "bbb"
1 2018-01-23 15:49:57 "bbb"
1 2018-01-23 15:49:54 "ccc"
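This is an example of one id group in my data. I have several groups of ids. What I want to do is collapse each group into a single row, with the names ordered chronologically by timestamp, for example like this: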
id name
1 aaa->ccc->bbb->bbb
The values in name are in chronological order, as they occur with the timestamp. Any pointers on this?
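A minimal sketch of one way to do this, assuming the timestamp column can be parsed as datetimes and the name values are plain strings without the surrounding quotes:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 1],
    "timestamp": pd.to_datetime([
        "2018-01-23 15:49:53",
        "2018-01-23 15:54:56",
        "2018-01-23 15:49:57",
        "2018-01-23 15:49:54",
    ]),
    "name": ["aaa", "bbb", "bbb", "ccc"],
})

# Sort chronologically within each id, then join the names of each group.
out = (
    df.sort_values(["id", "timestamp"])
      .groupby("id")["name"]
      .agg("->".join)
      .reset_index()
)
print(out)
```

Sorting before grouping matters here, because groupby keeps the rows of each group in the order they appear in the sorted frame.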
I have a time series dataset that looks like this:
Date Newspaper City1 City2 Region1Total City3 City4 Region2Total
2017-12-01 NewsPaper1 231563 8696 240259 21072 8998 30070
2017-12-01 NewsPaper2 173009 12180 185189 28910 5550 34460
2017-12-01 NewsPaper3 40511 4600 45111 5040 3330 8370
2017-12-01 NewsPaper4 37770 2980 40750 6520 1880 8400
2017-12-01 NewsPaper5 5176 900 6076 1790 5000 6790
2017-12-01 NewsPaper6 137650 8025 145675 25300 11000 36300
2017-12-01 Total 637547 38201 675748 91032 36558 127590
2018-01-01 NewsPaper1 231295 8391 239686 8790 21176 29966
2018-01-01 NewsPaper2 169937 12130 182067 7890 28850 …
Does anyone know how to write a for loop over columns, but starting from an arbitrary column? (The third one in this case.)
Let's say this is the DataFrame:
spice smice skice bike dike mike
1 23 35 34 34 56
135 34 23 21 56 34
231 12 67 21 62 75
I want to iterate over skice, bike, dike and mike only.
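A minimal sketch, assuming the columns are in the order shown and that "starting from the third column" means slicing the column index by position. The `visited` list and the per-column sum are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "spice": [1, 135, 231],
    "smice": [23, 34, 12],
    "skice": [35, 23, 67],
    "bike": [34, 21, 21],
    "dike": [34, 56, 62],
    "mike": [56, 34, 75],
})

visited = []
# df.columns[2:] skips the first two columns (positions 0 and 1).
for col in df.columns[2:]:
    visited.append(col)
    print(col, df[col].sum())
```

To start from a column by name rather than by position, `df.loc[:, "skice":]` selects that column and everything to its right.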
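I want to create multiple DataFrames whose names are the same as the values in one of the columns. I would like this code to work like this: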
import pandas as pd
data=pd.read_csv('athlete_events.csv')
Sports = data.Sport.unique()
for S in Sports:
name=str(S)
name=data.loc[data['Sport']==S]
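In the loop above, the second assignment to name simply overwrites the string, so no separately named DataFrames are created. A common alternative is to keep the per-sport frames in a dictionary keyed by the column value. A minimal sketch, using a small stand-in frame because athlete_events.csv is not available here (the sample rows and the Name column are made up for illustration):

```python
import pandas as pd

# Stand-in for pd.read_csv('athlete_events.csv'); values are illustrative only.
data = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Sport": ["Judo", "Rowing", "Judo", "Fencing"],
})

# One DataFrame per distinct Sport value, looked up by name.
frames = {sport: group for sport, group in data.groupby("Sport")}

print(frames["Judo"])
```

A dictionary avoids creating variables dynamically, and `frames.keys()` lists every sport that has a frame.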
I have a string that comes from an article with a few hundred sentences. I want to convert the string into a DataFrame, with each sentence as a row. For example,
data = 'This is a book, to which I found exciting. I bought it for my cousin. He likes it.'
I want it to become:
This is a book, to which I found exciting.
I bought it for my cousin.
He likes it.
As a Python newbie, this is what I tried:
import pandas as pd
from io import StringIO
data_csv = StringIO(data)
data_df = pd.read_csv(data_csv, sep = ".")
With the code above, all the sentences become column names. I actually want them as rows in a single column.
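A minimal sketch of one approach: split the string into a list of sentences first, then build the DataFrame from that list. It assumes every sentence ends with a period followed by whitespace, so it will mis-split on abbreviations such as "Dr."; the column name `sentence` is hypothetical:

```python
import re

import pandas as pd

data = 'This is a book, to which I found exciting. I bought it for my cousin. He likes it.'

# Split on whitespace that follows a period, keeping the period with its sentence.
sentences = [s.strip() for s in re.split(r"(?<=\.)\s+", data) if s.strip()]

data_df = pd.DataFrame({"sentence": sentences})
print(data_df)
```

For real article text, a dedicated sentence tokenizer would likely handle edge cases better than this regular expression.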
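I created a set of raincloud plot distributions that follow a specific color scheme (Set2 from seaborn). I want my countplot to match the colors of the listed groups (e.g., the male and female counts for the diet group in green, the m:f counts for mod-pa in pink, etc.). However, I cannot align the palette with both the x variable and the hue. It seems countplot only colors according to hue.
I have tried using set_color to change the color of specific bars, and I have also tried mapping colors based on a condition as shown below, but nothing seems to work.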
ax = sns.countplot(x="Group", hue="Sex", data=df)
ax[0].set_color('r')
TypeError: 'AxesSubplot' object does not support indexing
value=(df['Group']=='DIET') & (df['Sex']=='Female')
df['color']= np.where( value==True , "#9b59b6", "#3498db")
ax = sns.countplot(x="Group", hue="Sex", data=df, color=df['color'])
TypeError: 'Series' objects are mutable, thus they cannot be hashed
Full code:
import pandas as pd
import numpy as np
import seaborn as sns
df = pd.DataFrame({"Sex" : np.random.choice(["Male", "Female"], size=1310, p=[.65, .35]),
"Group" : np.random.choice(["DIET", "MOD-PA", "HIGH-PA"],size=1310)})
# Unique …
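A minimal sketch of one workaround that bypasses countplot and draws the bars with matplotlib directly, so each group gets its own color and the two sexes differ only in opacity. The mapping of Set2 colors to groups, the opacity values, and the group order are assumptions, not taken from the original raincloud plots:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Sex": rng.choice(["Male", "Female"], size=1310, p=[.65, .35]),
    "Group": rng.choice(["DIET", "MOD-PA", "HIGH-PA"], size=1310),
})

groups = ["DIET", "MOD-PA", "HIGH-PA"]  # assumed display order
counts = pd.crosstab(df["Group"], df["Sex"]).reindex(groups)
palette = plt.get_cmap("Set2").colors   # matplotlib's Set2 colormap colors

fig, ax = plt.subplots()
x = np.arange(len(groups))
width = 0.4
for j, sex in enumerate(counts.columns):
    # One color per group; the second sex is drawn semi-transparent.
    ax.bar(x + (j - 0.5) * width, counts[sex], width,
           color=palette[:len(groups)],
           alpha=1.0 if j == 0 else 0.5,
           label=sex)
ax.set_xticks(x)
ax.set_xticklabels(groups)
ax.set_ylabel("count")
ax.legend(title="Sex")
```

The legend swatches take the color of the first bar in each series, so a custom legend may be needed if the opacity cue alone is not clear enough.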