I have 2 data frames that look like this:
DF1:
Product, Region, ProductScore
AAA, R1,100
AAA, R2,100
BBB, R2,200
BBB, R3,200
DF2:
Region, RegionScore
R1,1
R2,2
How can I join these 2 into 1 data frame? The result should look like this:
Product, Region, ProductScore, RegionScore
AAA, R1,100,1
AAA, R2,100,2
BBB, R2,200,2
Thank you very much!
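The desired result keeps only the rows whose Region appears in both frames, which matches an inner join on the shared `Region` column. A minimal sketch, assuming the two frames are built exactly as shown in the question (the variable names `df1`, `df2`, and `result` are illustrative):

```python
import pandas as pd

# Rebuild the two frames from the question.
df1 = pd.DataFrame({
    "Product": ["AAA", "AAA", "BBB", "BBB"],
    "Region": ["R1", "R2", "R2", "R3"],
    "ProductScore": [100, 100, 200, 200],
})
df2 = pd.DataFrame({
    "Region": ["R1", "R2"],
    "RegionScore": [1, 2],
})

# Inner join on the common column: rows of df1 whose Region has no
# match in df2 (here R3) are dropped.
result = df1.merge(df2, on="Region", how="inner")
print(result)
```

If the unmatched `R3` row should be kept with a missing `RegionScore`, `how="left"` would be the option to try instead.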
Edit 1:
I used df.merge(df_new) and got this error message:
  File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 4071, in merge
    suffixes=suffixes, copy=copy)
  File "C:\Python34\lib\site-packages\pandas\tools\merge.py", line 37, in merge
    copy=copy)
  File "C:\Python34\lib\site-packages\pandas\tools\merge.py", line 183, in __init__
    self.join_names) = self._get_merge_keys()
  File "C:\Python34\lib\site-packages\pandas\tools\merge.py", line 318, in _get_merge_keys
    self._validate_specification()
  File "C:\Python34\lib\site-packages\pandas\tools\merge.py", line 409, in _validate_specification
    if not self.right.columns.is_unique:
AttributeError: …

I have a data frame that looks like this:
P Q L
1 2 3
2 3
4 5 6,7
The goal is to check whether there is any value in L and, if there is, to extract the values in columns P and L:
P L
1 3
4,6
4,7
Note that L may contain multiple values; when there is more than 1 value, I need two rows.
Below is my current script, which fails to produce the expected result.
df2 = []
ego
other
newrow = []
for item in data_DF.iterrows():
    if item[1]["L"] is not None:
        ego = item[1]['P']
        other = item[1]['L']
        newrow = ego + other + "\n"
        df2.append(newrow)
data_DF2 = pd.DataFrame(df2)
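One possible way to get one row per value in L, sketched under some assumptions: L is stored as comma-separated strings, empty cells are missing values (`None` or `NaN`), and the pandas version is 0.25 or later so that `DataFrame.explode` is available. The frame below is rebuilt by hand from the question, and the name `out` is illustrative.

```python
import pandas as pd

# Sample frame from the question; the second row has no value in L.
data_DF = pd.DataFrame({
    "P": [1, 2, 4],
    "Q": [2, 3, 5],
    "L": ["3", None, "6,7"],
})

out = (
    data_DF
    .dropna(subset=["L"])                           # keep rows that have a value in L
    .assign(L=lambda d: d["L"].str.split(","))      # "6,7" -> ["6", "7"]
    .explode("L")                                   # one row per list element
    [["P", "L"]]
    .reset_index(drop=True)
)
print(out)
```

Note that missing cells read by pandas are normally `NaN` rather than `None`, so an `is not None` test inside a loop would not skip them; `dropna` or `pd.notna` handles both.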
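
I have some Amazon review data that I have successfully converted from text format to CSV. The problem is that when I try to read it into a data frame with pandas, I get this error message: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 13: invalid start byte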
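I understand there must be some non-UTF-8 bytes in the raw review data. How can I remove the non-UTF-8 content and save the result to another CSV file?
Thanks!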
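A minimal sketch of one way to drop the undecodable bytes, using only the standard library: read the file as raw bytes, decode with `errors="ignore"`, and write the result back as UTF-8. The function name `strip_non_utf8` and the demo file names are hypothetical, and discarding bytes loses the affected characters, so this is only reasonable if those characters do not matter.

```python
import os
import tempfile


def strip_non_utf8(src_path, dst_path):
    """Copy src_path to dst_path, silently dropping bytes that are not valid UTF-8."""
    with open(src_path, "rb") as src:
        raw = src.read()
    text = raw.decode("utf-8", errors="ignore")
    # newline="" leaves the existing line endings untouched.
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Demo on a throwaway file that contains the offending byte 0xf8.
tmp_dir = tempfile.mkdtemp()
dirty = os.path.join(tmp_dir, "dirty.csv")
clean = os.path.join(tmp_dir, "clean.csv")
with open(dirty, "wb") as fh:
    fh.write(b"id,text\n1,caf\xf8\n")

strip_non_utf8(dirty, clean)
with open(clean, encoding="utf-8", newline="") as fh:
    cleaned_text = fh.read()
print(repr(cleaned_text))
```

If the characters should be preserved rather than removed, another option is to tell pandas which encoding the file really uses, for example `pd.read_csv(path, encoding="latin-1")`, where byte 0xf8 decodes to "ø". Whether that is the right encoding for this data is an assumption to verify.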
EDIT1: Here is the code I used to convert the text to CSV:
import csv
import string
INPUT_FILE_NAME = "small-movies.txt"
OUTPUT_FILE_NAME = "small-movies1.csv"
header = [
    "product/productId",
    "review/userId",
    "review/profileName",
    "review/helpfulness",
    "review/score",
    "review/time",
    "review/summary",
    "review/text"]
f = open(INPUT_FILE_NAME,encoding="utf-8")
outfile = open(OUTPUT_FILE_NAME,"w")
outfile.write(",".join(header) + "\n")
currentLine = []
for line in f:
    line = line.strip()
    #need to remove the , so that the comment review text won't be in many columns
    line = line.replace(',','')
    if line == "":
        outfile.write(",".join(currentLine))
        outfile.write("\n")
        currentLine = []
        continue
    parts = line.split(":",1)
    currentLine.append(parts[1]) …
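The conversion script above opens the output file without an `encoding` argument, so Python falls back to the platform default, which on Windows is often not UTF-8. That is one plausible source of the 0xf8 byte, though the truncated script does not confirm it. A sketch of the same blank-line-separated conversion using `csv.writer`, which quotes fields containing commas so they no longer need to be stripped. The helper name `records_to_csv`, the two-column header, and the sample lines are hypothetical, and unlike the original it trims whitespace around each value.

```python
import csv
import io


def records_to_csv(lines, out, header):
    """Write blank-line-separated "key: value" records to out as CSV rows."""
    writer = csv.writer(out)
    writer.writerow(header)
    current = []
    for line in lines:
        line = line.strip()
        if line == "":
            if current:                      # end of one record
                writer.writerow(current)
                current = []
            continue
        parts = line.split(":", 1)
        if len(parts) == 2:                  # skip lines without a "key: value" shape
            current.append(parts[1].strip())
    if current:                              # last record without a trailing blank line
        writer.writerow(current)


# Demo with an in-memory buffer; commas inside the text are preserved.
sample = [
    "product/productId: B001",
    "review/text: Great, really great",
    "",
    "product/productId: B002",
    "review/text: Fine",
]
buf = io.StringIO()
records_to_csv(sample, buf, ["product/productId", "review/text"])
rows = list(csv.reader(buf.getvalue().splitlines()))
print(rows)
```

For a real file, the output would be opened as `open(OUTPUT_FILE_NAME, "w", encoding="utf-8", newline="")` so that the CSV is written as UTF-8 and the `csv` module controls the line endings.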