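So my dataset has location information for n dates. The problem is that each date is actually a separate column header. For example, the CSV looks like: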
location name Jan-2010 Feb-2010 March-2010
A "test" 12 20 30
B "foo" 18 20 25
What I want is for it to look like:
location name Date Value
A "test" Jan-2010 12
A "test" Feb-2010 20
A "test" March-2010 30
B "foo" Jan-2010 18
B "foo" Feb-2010 20
B "foo" March-2010 25
The problem is that I don't know how many date columns there are (though I do know they always start right after name).
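One possible way to get that shape is `pandas.melt`, which turns every column that is not listed as an identifier into a (Date, Value) pair, so the number of date columns never has to be known. This is only a sketch: the in-memory CSV, the comma delimiter, and the names `csv_text`, `wide`, and `tidy` are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for the CSV from the question
# (comma-delimited here; the real file's delimiter is not shown).
csv_text = """location,name,Jan-2010,Feb-2010,March-2010
A,test,12,20,30
B,foo,18,20,25
"""
wide = pd.read_csv(io.StringIO(csv_text))

# Everything that is not an identifier column is treated as a date column.
id_cols = ["location", "name"]
tidy = wide.melt(id_vars=id_cols, var_name="Date", value_name="Value")

# melt stacks the data one date column at a time; a stable sort groups the
# rows by location while keeping the dates in their original column order.
tidy = tidy.sort_values("location", kind="mergesort").reset_index(drop=True)
print(tidy)
```

Because the date columns are found by exclusion, this only works if `location` and `name` really are the only non-date columns.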
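
I'm trying to do some data work in Python pandas and am having trouble writing out my results. I read my data in as a CSV file, and exporting each script's output as its own CSV file works fine. Recently, though, I tried exporting everything into a single Excel file with multiple sheets, and some of the sheets give me an error:
"'utf8' codec can't decode byte 0xe9 in position 1: invalid continuation byte"
I don't know how to even start finding the characters that might be causing problems with the Excel export. I'm not sure why it exports to CSV just fine, though :(
The relevant lines: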
import pandas as pd
from pandas import ExcelWriter
data = pd.read_csv(input) #input is the path to the source CSV
writer = ExcelWriter(output) #output is just the filename
fundraisers.to_excel(writer, "fundraisers")
locations.to_excel(writer, "locations") #error
locations.to_csv(outputcsv) #works
writer.save()
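As a starting point for tracking down the offending characters, one option is to scan the source CSV in binary mode and report every line that is not valid UTF-8. This is a sketch under assumptions: the helper `find_non_utf8_lines`, the sample bytes, and the guess that the data is Latin-1 (where byte 0xe9 is "é") are all hypothetical and not taken from the question.

```python
import os
import tempfile


def find_non_utf8_lines(path):
    """Return (line_number, raw_bytes) for every line that is not valid UTF-8."""
    bad = []
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                bad.append((number, raw))
    return bad


# Hypothetical sample file: the last line stores "é" as the single
# Latin-1 byte 0xe9, which is not valid UTF-8 on its own.
sample = b"location,name\nA,test\nB,Caf\xe9\n"
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

bad_lines = find_non_utf8_lines(sample_path)
for number, raw in bad_lines:
    # Decoding as Latin-1 never raises, so it is used here for inspection only.
    print(number, raw.decode("latin-1").rstrip())

os.remove(sample_path)
```

If the flagged bytes do turn out to be Latin-1, passing `encoding="latin-1"` to `pd.read_csv` is one thing to try, so that the text is decoded correctly before it reaches the Excel writer.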
Printing the head of the dataframe:
Event ID Constituent ID Email Address First Name \ Last Name
f 1 A A 1
F 4 L R C
M 1 1 A D
F 4 A A G
M 2 0 R G
M 3 O O H
M 2 T E H
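M 2 A A H …

I'm trying to use PyInstaller to build an exe from my Python code so it is easy to distribute. Every time I try to run pyinstaller.py I get the error "[Errno 22] invalid mode ('rb') or filename: ''"
I've seen a few other posts about this saying the problem is usually caused by hardcoded file paths used to read data, but all of my file paths are handled through variables, and the user is asked for the file location.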
File "pyinstaller.py", line 18, in <module>
run()
File "\PyInstaller-2.1\PyInstaller-2.1\PyInstaller\main.py", line 88, in run
run_build(opts, spec_file, pyi_config)
File "\PyInstaller-2.1\PyInstaller-2.1\PyInstaller\main.py", line 46, in run_build
PyInstaller.build.main(pyi_config, spec_file, **opts.__dict__)
File "\PyInstaller-2.1\PyInstaller-2.1\PyInstaller\build.py", line 1924, in main
build(specfile, kw.get('distpath'), kw.get('workpath'), kw.get('clean_build'))
File "\PyInstaller-2.1\PyInstaller-2.1\PyInstaller\build.py", line 1873, in build
execfile(spec)
File "\PyInstaller-2.1\PyInstaller-2.1\guimain\guimain.spec", line 17, in <module>
console=True )
File "\PyInstaller-2.1\PyInstaller-2.1\PyInstaller\build.py", line 1170, in __init__
strip_binaries=self.strip, upx_binaries=self.upx,
File "\PyInstaller-2.1\PyInstaller-2.1\PyInstaller\build.py", line 1008, in __init__
self.__postinit__()
File "\PyInstaller-2.1\PyInstaller-2.1\PyInstaller\build.py", line 309, in __postinit__
self.assemble()
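File "\PyInstaller-2.1\PyInstaller-2.1\PyInstaller\build.py", line 1050, in …

I currently have 2 dataframes, one for donors and one for fundraisers. Ideally, what I want to find out is whether any fundraisers also donated and, if so, copy some information into my fundraiser dataset (donor name, email, and their first donation). The problems with my data are:
1) I need to match by name and email, but users may have slightly different names (e.g. Kat and Kathy).
2) There are duplicate names for both donors and fundraisers.
2a) With donors I can get unique name/email combinations, since I only care about the first donation date.
2b) With fundraisers, though, I need to keep both rows without losing data such as the date.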
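The matching described above could be sketched roughly as follows: reduce donors to their first donation per name/email pair, left-merge onto fundraisers by email so no fundraiser row is lost, then score name similarity. Note the assumptions: the sample frames are trimmed, the standard-library `difflib.SequenceMatcher` is swapped in for fuzzywuzzy, and the `THRESHOLD` value and the column names `donor_name`, `first_donation`, `similarity`, and `is_donor` are hypothetical.

```python
import difflib

import pandas as pd

# Trimmed, hypothetical sample data in the same shape as the question's frames.
donors = pd.DataFrame({
    "name": ["John Doe", "John Doe", "Kat test"],
    "Email": ["a@a.ca", "a@a.ca", "d@d.ca"],
    "Date": pd.to_datetime(["27/03/2013", "01/03/2013", "27/03/2013"], dayfirst=True),
})
fundraisers = pd.DataFrame({
    "name": ["John Doe", "John Doe", "Kathy test", "Tes Ester"],
    "Email": ["a@a.ca", "a@a.ca", "d@d.ca", "asdf@asdf.ca"],
})

# One row per donor name/email pair, keeping the earliest donation date.
first_gift = (
    donors.groupby(["name", "Email"], as_index=False)["Date"].min()
    .rename(columns={"name": "donor_name", "Date": "first_donation"})
)

# A left merge on Email keeps every fundraiser row, duplicates included.
merged = fundraisers.merge(first_gift, on="Email", how="left")


def name_similarity(row):
    """Similarity in [0, 1] between fundraiser and donor names; 0 if no donor."""
    if pd.isna(row["donor_name"]):
        return 0.0
    matcher = difflib.SequenceMatcher(None, row["name"].lower(), row["donor_name"].lower())
    return matcher.ratio()


merged["similarity"] = merged.apply(name_similarity, axis=1)

THRESHOLD = 0.6  # hypothetical cut-off; it would need tuning on real data
merged["is_donor"] = merged["similarity"] >= THRESHOLD
print(merged)
```

One limitation of this sketch: if the same email appears under several different donor names, the merge produces one row per fundraiser/donor pair, so those cases would need to be resolved before or after scoring.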
My sample code so far:
import pandas as pd
import datetime
from fuzzywuzzy import fuzz
import difflib
donors = pd.DataFrame({"name": pd.Series(["John Doe","John Doe","Tom Smith","Jane Doe","Jane Doe","Kat test"]), "Email": pd.Series(['a@a.ca','a@a.ca','b@b.ca','c@c.ca','something@a.ca','d@d.ca']),"Date": (["27/03/2013 10:00:00 AM","1/03/2013 10:39:00 AM","2/03/2013 10:39:00 AM","3/03/2013 10:39:00 AM","4/03/2013 10:39:00 AM","27/03/2013 10:39:00 AM"])})
fundraisers = pd.DataFrame({"name": pd.Series(["John Doe","John Doe","Kathy test","Tes Ester", "Jane Doe"]),"Email": pd.Series(['a@a.ca','a@a.ca','d@d.ca','asdf@asdf.ca','something@a.ca']),"Date": pd.Series(["2/03/2013 10:39:00 AM","27/03/2013 11:39:00 AM","3/03/2013 10:39:00 AM","4/03/2013 10:40:00 AM","27/03/2013 10:39:00 AM"])})
donors["Date"] = pd.to_datetime(donors["Date"], dayfirst=True)
fundraisers["Date"] = pd.to_datetime(fundraisers["Date"], dayfirst=True)
donors["code"] = donors.apply(lambda row: str(row['name'])+' '+str(row['Email']), axis=1)
idx = donors.groupby('code')["Date"].transform(min) == …

So I have some CSV files I'm trying to work with, but some of them have multiple columns with the same name.
For example, I could have a CSV like this:
ID Name a a a b b
1 test1 1 NaN NaN "a" NaN
2 test2 NaN 2 NaN "a" NaN
3 test3 2 3 NaN NaN "b"
4 test4 NaN NaN 4 NaN "b"
Loading it into pandas gives me this:
ID Name a a.1 a.2 b b.1
1 test1 1 NaN NaN "a" NaN
2 test2 NaN 2 NaN "a" NaN
3 test3 2 3 NaN NaN "b"
4 test4 NaN NaN 4 NaN "b"
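What I want to do is merge those same-named columns into 1 column (keeping the values separate if there are multiple values). My ideal output would be this: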
ID Name a b
1 test1 …
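A rough sketch of one way to do this merge: group the loaded columns by their name with the `.1`, `.2` suffix removed, then join the non-null values of each group row by row. The helpers `base_name` and `join_values`, the `", "` separator, and the conversion of merged values to strings are assumptions, since the desired output is cut off above.

```python
import re

import numpy as np
import pandas as pd

# The frame as pandas loads it: repeated headers get ".1", ".2" suffixes.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Name": ["test1", "test2", "test3", "test4"],
    "a": [1, np.nan, 2, np.nan],
    "a.1": [np.nan, 2, 3, np.nan],
    "a.2": [np.nan, np.nan, np.nan, 4],
    "b": ["a", "a", np.nan, np.nan],
    "b.1": [np.nan, np.nan, "b", "b"],
})


def base_name(column):
    """Strip the '.<number>' suffix pandas appends to repeated headers."""
    return re.sub(r"\.\d+$", "", column)


def join_values(row):
    """Join the non-null values of one row, preserving column order."""
    values = [format(v, "g") if isinstance(v, float) else str(v) for v in row.dropna()]
    return ", ".join(values) if values else np.nan


# Map each base name to the loaded columns that share it.
groups = {}
for column in df.columns:
    groups.setdefault(base_name(column), []).append(column)

# Columns without duplicates are copied as-is; duplicates are joined per row.
merged = pd.DataFrame({
    name: df[cols[0]] if len(cols) == 1 else df[cols].apply(join_values, axis=1)
    for name, cols in groups.items()
})
print(merged)
```

Note that a header which genuinely ends in a dot and digits (say, a hypothetical `v1.2`) would be folded into another column by `base_name`, so the suffix rule is only safe when such names cannot occur.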