Posts by Jas*_*ine

How to do a left join with Pandas

I have two data frames that look like this:

DF1:

Product, Region, ProductScore
AAA, R1,100
AAA, R2,100
BBB, R2,200
BBB, R3,200

DF2:

Region, RegionScore
R1,1
R2,2

How can I join these two into a single data frame? The result should look like this:

Product, Region, ProductScore, RegionScore
AAA, R1,100,1
AAA, R2,100,2
BBB, R2,200,2

Thanks a lot!

Edit 1:

I used df.merge(df_new) and got this error message:

  File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 4071, in merge
    suffixes=suffixes, copy=copy)
  File "C:\Python34\lib\site-packages\pandas\tools\merge.py", line 37, in merge
    copy=copy)
  File "C:\Python34\lib\site-packages\pandas\tools\merge.py", line 183, in __init__
    self.join_names) = self._get_merge_keys()
  File "C:\Python34\lib\site-packages\pandas\tools\merge.py", line 318, in _get_merge_keys
    self._validate_specification()
  File "C:\Python34\lib\site-packages\pandas\tools\merge.py", line 409, in _validate_specification
    if not self.right.columns.is_unique:
AttributeError: …
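
A minimal sketch of the join asked about above, assuming both inputs are real pandas DataFrames that share a Region column (the frames below are rebuilt by hand from the sample data, and the names df1, df2, left and inner are illustrative). Note that the expected result in the question drops the BBB, R3 row, which is what an inner join does; a left join keeps that row and fills RegionScore with NaN.

```python
import pandas as pd

# Sample data rebuilt from the question (illustrative names).
df1 = pd.DataFrame({
    "Product": ["AAA", "AAA", "BBB", "BBB"],
    "Region": ["R1", "R2", "R2", "R3"],
    "ProductScore": [100, 100, 200, 200],
})
df2 = pd.DataFrame({"Region": ["R1", "R2"], "RegionScore": [1, 2]})

# Left join: keep every row of df1; regions missing from df2 get NaN.
left = df1.merge(df2, on="Region", how="left")

# Inner join: keep only rows whose Region appears in both frames.
inner = df1.merge(df2, on="Region", how="inner")

print(left)
print(inner)
```

Passing on="Region" explicitly makes the join key visible instead of relying on pandas to infer the common column.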

python pandas

Score: 6 | Answers: 1 | Views: 8853

How to iterate over a DataFrame and generate a new DataFrame

I have a data frame that looks like this:

P Q L
1 2 3
2 3 
4 5 6,7

The goal is to check whether there is any value in L and, if so, extract the values from the P and L columns:

P L
1 3
4 6
4 7

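Note that L can hold multiple values; when it holds more than one, I need one row per value (two rows in the example above).

Below is my current script, which fails to produce the expected result.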

df2 = []
ego
other
newrow = []

for item in data_DF.iterrows():
    if item[1]["L"] is not None:
        ego = item[1]['P']
        other = item[1]['L']
        newrow = ego + other + "\n"
        df2.append(newrow)

data_DF2 = pd.DataFrame(df2)
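
One possible approach, sketched under assumptions: L is stored as comma-separated strings, empty cells are missing values (NaN/None), and pandas is version 0.25 or later so that DataFrame.explode is available. The sample frame is rebuilt by hand from the question, and the name result is illustrative. Two things in the script above would likely get in the way: the bare ego and other lines raise a NameError, and `is not None` does not filter out NaN, which is how pandas usually represents an empty cell.

```python
import pandas as pd

# Sample data rebuilt from the question; L holds comma-separated strings.
data_DF = pd.DataFrame({
    "P": [1, 2, 4],
    "Q": [2, 3, 5],
    "L": ["3", None, "6,7"],
})

result = (
    data_DF.dropna(subset=["L"])                 # drop rows with no value in L
    .assign(L=lambda d: d["L"].str.split(","))   # "6,7" becomes ["6", "7"]
    .explode("L")                                # one row per list element
    .loc[:, ["P", "L"]]                          # keep only the P and L columns
    .reset_index(drop=True)
)

print(result)
```

The values in L stay as strings here; convert them with astype(int) if numbers are needed.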

python pandas

Score: 5 | Answers: 1 | Views: 4845

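How to remove non-UTF-8 characters and save as a CSV file in Python

I have some Amazon review data that I successfully converted from text format to CSV. The problem now is that when I try to read it into a data frame with pandas, I get this error message: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 13: invalid start byte

I understand that there must be some non-UTF-8 bytes in the raw review data. How can I remove them and save the result to another CSV file?

Thanks!

EDIT1: Here is the code I use to convert the text to CSV: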

import csv
import string
INPUT_FILE_NAME = "small-movies.txt"
OUTPUT_FILE_NAME = "small-movies1.csv"
header = [
    "product/productId",
    "review/userId",
    "review/profileName",
    "review/helpfulness",
    "review/score",
    "review/time",
    "review/summary",
    "review/text"]
f = open(INPUT_FILE_NAME,encoding="utf-8")

outfile = open(OUTPUT_FILE_NAME,"w")

outfile.write(",".join(header) + "\n")
currentLine = []
for line in f:

   line = line.strip()  
   # need to remove the commas so that the review text won't be split across many columns
   line = line.replace(',','')

   if line == "":
      outfile.write(",".join(currentLine))
      outfile.write("\n")
      currentLine = []
      continue
   parts = line.split(":",1)
   currentLine.append(parts[1]) …
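
A minimal sketch of one way to drop invalid bytes, assuming it is acceptable to lose them: read the file as raw bytes, decode with errors="ignore", and write the text back out as UTF-8. The helper strip_non_utf8 and the file names dirty.csv and clean.csv are hypothetical, used only for this demo. A possible cause worth checking first is that the script above opens the output file without an explicit encoding, so Python falls back to the platform default, which may not be UTF-8.

```python
import os
import tempfile


def strip_non_utf8(src_path, dst_path):
    """Copy src_path to dst_path, dropping bytes that are not valid UTF-8."""
    with open(src_path, "rb") as src:
        raw = src.read()
    # errors="ignore" silently discards byte sequences that cannot be decoded.
    text = raw.decode("utf-8", errors="ignore")
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Demo with a throwaway file that contains one invalid byte (0xf8).
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "dirty.csv")
dst = os.path.join(tmp_dir, "clean.csv")
with open(src, "wb") as f:
    f.write(b"review/summary,review/text\ngood,caf\xf8 latte\n")

strip_non_utf8(src, dst)

with open(dst, encoding="utf-8") as f:
    cleaned = f.read()
print(cleaned)
```

If the characters should be kept rather than dropped, and the file turns out to be Latin-1 or a similar single-byte encoding, passing that encoding to pd.read_csv (for example encoding="latin-1") is an alternative to stripping.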

python encoding utf-8

Score: 1 | Answers: 1 | Views: 5584

Tag statistics

python ×3

pandas ×2

encoding ×1

utf-8 ×1