Posts by Jas*_*ine

How to do a left join with Pandas

I have two data frames that look like this. DF1:

Product, Region, ProductScore
AAA, R1,100
AAA, R2,100
BBB, R2,200
BBB, R3,200

DF2:

Region, RegionScore
R1,1
R2,2

How can I join these two into a single data frame? The result should look like this:

Product, Region, ProductScore, RegionScore
AAA, R1,100,1
AAA, R2,100,2
BBB, R2,200,2

Thanks a lot!

Edit 1:

I used df.merge(df_new) and got this error message:

  File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 4071, in merge
    suffixes=suffixes, copy=copy)
  File "C:\Python34\lib\site-packages\pandas\tools\merge.py", line 37, in merge
    copy=copy)
  File "C:\Python34\lib\site-packages\pandas\tools\merge.py", line 183, in __init__
    self.join_names) = self._get_merge_keys()
  File "C:\Python34\lib\site-packages\pandas\tools\merge.py", line 318, in _get_merge_keys
    self._validate_specification()
  File "C:\Python34\lib\site-packages\pandas\tools\merge.py", line 409, in _validate_specification
    if not self.right.columns.is_unique:
AttributeError: …
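
A minimal sketch of the join, assuming both objects really are pandas DataFrames with the columns shown above (the frames are rebuilt by hand here). The expected output in the post drops the BBB/R3 row, which is what an inner join on Region produces; a left join would keep that row with a missing RegionScore, so both are shown. The traceback is truncated, so its cause cannot be confirmed. The failing line reads `self.right.columns`, so one possibility (an assumption only) is that the object passed to `merge` was not a DataFrame.

```python
import pandas as pd

# Rebuilt by hand from the tables in the post.
df1 = pd.DataFrame({
    "Product": ["AAA", "AAA", "BBB", "BBB"],
    "Region": ["R1", "R2", "R2", "R3"],
    "ProductScore": [100, 100, 200, 200],
})
df2 = pd.DataFrame({
    "Region": ["R1", "R2"],
    "RegionScore": [1, 2],
})

# Inner join: keeps only rows whose Region appears in both frames,
# so BBB/R3 is dropped, as in the expected output of the post.
inner = df1.merge(df2, on="Region", how="inner")

# Left join: keeps every row of df1; BBB/R3 gets NaN for RegionScore.
left = df1.merge(df2, on="Region", how="left")

print(inner)
print(left)
```

Passing `on="Region"` explicitly avoids relying on pandas to infer the shared column, and `how=` makes the choice between inner and left join visible.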


python pandas

Jas*_*ine

2018 07-20

6
votes

1
answer

8853
views

How to iterate over a DataFrame and generate a new DataFrame

I have a data frame that looks like this:


The goal is to check whether L contains any values and, if so, extract the values from the P and L columns:

P L
1 3
4 6
4 7

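Note that L may contain multiple values. When there is more than one value, I need a separate row for each (two rows in the example above).

Below is my current script, which fails to produce the expected result.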

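import pandas as pd

df2 = []
ego = None
other = None
newrow = []

# data_DF is the data frame described above
for item in data_DF.iterrows():
    # iterrows() yields (index, row) tuples, so item[1] is the row
    if item[1]["L"] is not None:
        ego = item[1]['P']
        other = item[1]['L']
        newrow = ego + other + "\n"
        df2.append(newrow)

data_DF2 = pd.DataFrame(df2)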
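
One possible way to get one row per value without an explicit loop is sketched below. The original frame is not shown in the post, so the input here is hypothetical: it assumes L holds comma-separated strings such as "6,7" and is missing (None) where there is no value. If L is stored differently, for example as lists, the `str.split` step would need to change.

```python
import pandas as pd

# Hypothetical input: the real frame is not shown in the post.
data_DF = pd.DataFrame({
    "P": [1, 2, 4],
    "L": ["3", None, "6,7"],
})

# Keep only rows where L has a value.
has_l = data_DF.dropna(subset=["L"])

# Split each comma-separated string into a list,
# then emit one output row per list element.
data_DF2 = (
    has_l.assign(L=has_l["L"].str.split(","))
         .explode("L")
         .reset_index(drop=True)[["P", "L"]]
)

# The split pieces are strings; convert them back to integers.
data_DF2["L"] = data_DF2["L"].astype(int)

print(data_DF2)
```

`DataFrame.explode` requires pandas 0.25 or later. Building the result from columns also avoids concatenating P and L into a single string, which is what `ego + other + "\n"` does in the script above.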


python pandas

Jas*_*ine

2019 09-26

5
votes

1
answer

4845
views

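How to remove non-UTF-8 characters and save as a CSV file in Python

I have some Amazon review data that I successfully converted from text format to CSV. The problem now is that when I try to read it into a data frame with pandas, I get this error message: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 13: invalid start byte

I understand that the raw review data must contain some non-UTF-8 characters. How can I remove the non-UTF-8 characters and save the result to another CSV file?

Thanks!

EDIT1: Here is my code for converting the text to CSV: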

import csv
import string
INPUT_FILE_NAME = "small-movies.txt"
OUTPUT_FILE_NAME = "small-movies1.csv"
header = [
    "product/productId",
    "review/userId",
    "review/profileName",
    "review/helpfulness",
    "review/score",
    "review/time",
    "review/summary",
    "review/text"]
f = open(INPUT_FILE_NAME,encoding="utf-8")

outfile = open(OUTPUT_FILE_NAME,"w")

outfile.write(",".join(header) + "\n")
currentLine = []
for line in f:

   line = line.strip()
   # need to remove the commas so that the review text won't be split across many columns
   line = line.replace(',','')

   if line == "":
      outfile.write(",".join(currentLine))
      outfile.write("\n")
      currentLine = []
      continue
   parts = line.split(":",1)
   currentLine.append(parts[1]) …
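
A minimal sketch of one way to drop undecodable bytes and write a clean copy, assuming that silently discarding them is acceptable. The helper name `rewrite_as_utf8` and the demo file names are made up for illustration; the demo writes a small throwaway file containing the byte 0xf8 instead of using the real review data.

```python
import os
import tempfile


def rewrite_as_utf8(src_path, dst_path):
    """Copy src_path to dst_path, dropping bytes that are not valid UTF-8."""
    # errors="ignore" makes the decoder skip undecodable bytes such as 0xf8.
    # newline="" leaves line endings untouched on both sides.
    with open(src_path, "r", encoding="utf-8", errors="ignore", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)


# Demo on a throwaway file with one invalid byte (0xf8).
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "dirty.csv")
dst = os.path.join(tmp_dir, "clean.csv")
with open(src, "wb") as f:
    f.write(b"review/summary,review/text\nGood,S\xf8ren liked it\n")

rewrite_as_utf8(src, dst)

with open(dst, "rb") as f:
    cleaned = f.read()
print(cleaned)
```

Dropping bytes loses characters. If the file is really in a single-byte encoding, reading it with `pd.read_csv(path, encoding="latin-1")` may preserve them instead. Note also that the converter above opens the output file without an `encoding` argument, so Python writes it in the platform's default encoding, which is often not UTF-8 on Windows. Passing `encoding="utf-8"` there may avoid the problem at the source, though that is an assumption about the setup.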


python encoding utf-8

Jas*_*ine

2015 09-23

1
vote

1
answer

5584
views