Suppose I have two lists that look like this:
L1=['Smith, John, 2008, 12, 10, Male', 'Bates, John, 2006, 1, Male', 'Johnson, John, 2009, 1, 28, Male', 'James, John, 2008, 3, Male']
L2=['Smith, Joy, 2008, 12, 10, Female', 'Smith, Kevin, 2008, 12, 10, Male', 'Smith, Matt, 2008, 12, 10, Male', 'Smith, Carol, 2000, 12, 11, Female', 'Smith, Sue, 2000, 12, 11, Female', 'Johnson, Alex, 2008, 3, Male', 'Johnson, Emma, 2008, 3, Female', 'James, Peter, 2008, 3, Male', 'James, Chelsea, 2008, 3, Female']
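What I want to do is compare the date of each person in a family (same last name) with the date of the 'John' in that family. The dates range from year, month, and day, to year and month only, to year only. I want to find the difference between John's date and each family member's date at the most specific level available (if one date has all three parts and the other has only month and year, then only the difference in months and years can be found). This is what I have tried so far. It does not work: it does not use the right names and dates (it only gives each John one sibling), and the way it calculates the time between dates is confusing and wrong: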
import re

for line in L1:
    type=line.split(',')
    if len(type)>=1:
        family=type[0]
        if len(type)==6:
            yearA=type[2]
            monthA=type[3]
            dayA=type[4]
            sex=type[5]
            print('%s, John Published in %s, %s, %s, %s' %(family, yearA, monthA, dayA, sex))
        elif len(type)==5:
            yearA=type[2]
            monthA=type[3]
            sex=type[4]
            print('%s, John Published in %s, %s, %s' %(family, yearA, monthA, sex))
        elif len(type)==4:
            yearA=type[2]
            sex=type[3]
            print('%s, John Published in %s, %s' %(family, yearA, sex))
    for line in L2:
        if re.search(family, line):
            word=line.split(',')
            name=word[1]
            if len(word)==6:
                yearB=word[2]
                monthB=word[3]
                dayB=word[4]
                sex=word[5]
            elif len(word)==5:
                yearB=word[2]
                monthB=word[3]
                sex=word[4]
            elif len(word)==4:
                yearB=word[2]
                sex=word[3]
            if dayA and dayB:
                yeardiff= int(yearA)-int(yearB)
                monthdiff=int(monthA)-int(monthB)
                daydiff=int(dayA)-int(dayB)
                print('%s, %s Published %s year(s), %s month(s), %s day(s) before/after John, %s' %(family, name, yeardiff, monthdiff, daydiff, sex))
            elif not dayA and not dayB and monthA and monthB:
                yeardiff= int(yearA)-int(yearB)
                monthdiff=int(monthA)-int(monthB)
                print('%s, %s Published %s year(s), %s month(s), before/after John, %s' %(family, name, yeardiff, monthdiff, sex))
            elif not monthA and not monthB and yearA and yearB:
                yeardiff= int(yearA)-int(yearB)
                print('%s, %s Published %s year(s), before/after John, %s' %(family, name, yeardiff, sex))
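I would like to end up with something like this. If possible, the program should distinguish whether each sibling came before or after John, and print months and days only when they are present in both of the dates being compared: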
Smith, John Published in 2008, 12, 10, Male
Smith, Joy Published _ year(s) _month(s) _day(s) before/after John, Female
Smith, Kevin Published _ year(s) _month(s) _day(s) before/after John, Male
Smith, Matt Published _ year(s) _month(s) _day(s) before/after John, Male
Smith, Carol Published _ year(s) _month(s) _day(s) before/after John, Female
Smith, Sue Published _ year(s) _month(s) _day(s) before/after John, Female
Bates, John Published in 2006, 1, Male
Johnson, John Published in 2009, 1, 28, Male
Johnson, Alex Published _ year(s) _month(s) _day(s) before/after John, Male
Johnson, Emma Published _ year(s) _month(s) _day(s) before/after John, Female
James, John Published in 2008, 3, Male
James, Peter Published _ year(s) _month(s) _day(s) before/after John, Male
James, Chelsea Published _ year(s) _month(s) _day(s) before/after John, Female
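As Joe Kington suggested, the dateutil module is very useful here. In particular, it can tell you the difference between two dates in terms of years, months, and days. (Doing the calculation yourself would mean accounting for leap years and so on. It is better to use a well-tested module than to reinvent this wheel.)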
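
As a minimal sketch of what that buys you, the snippet below compares plain `datetime.date` subtraction with `dateutil.relativedelta.relativedelta`. It assumes the third-party `python-dateutil` package is installed; the variable names are illustrative only, and the two dates are taken from the sample data in the question.

```python
import datetime as dt

# Third-party dependency (assumed installed): python-dateutil
from dateutil.relativedelta import relativedelta

john = dt.date(2008, 12, 10)
carol = dt.date(2000, 12, 11)

# Plain subtraction yields a timedelta, which only counts days.
day_gap = (john - carol).days
print(day_gap)  # 2921

# relativedelta splits the same gap into calendar years, months and days.
rd = relativedelta(john, carol)
print(rd.years, rd.months, rd.days)  # 7 11 29
```

Turning 2921 days back into "7 years, 11 months, 29 days" by hand is exactly the leap-year bookkeeping the module handles for you.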
This problem lends itself to classes.
Let's create a Person class to keep track of a person's name, gender, and publication date:
class Person(object):
    def __init__(self,lastname,firstname,gender=None,year=None,month=None,day=None):
        self.lastname=lastname
        self.firstname=firstname
        self.ymd=VagueDate(year,month,day)
        self.gender=gender
The publication dates may have missing data, so let's create a special class to handle dates with missing parts:
class VagueDate(object):
    def __init__(self,year=None,month=None,day=None):
        self.year=year
        self.month=month
        self.day=day
    def __sub__(self,other):
        d1=self.asdate()
        d2=other.asdate()
        rd=relativedelta.relativedelta(d1,d2)
        years=rd.years
        months=rd.months if self.month and other.month else None
        days=rd.days if self.day and other.day else None
        return VagueDateDelta(years,months,days)
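
The `__sub__` method above calls `asdate()`, which is defined only in the full script later in this answer. As a standalone sketch of that conversion, the hypothetical function `as_date` below (the name is mine, not part of the answer) fills a missing month or day with 1, which is the same arbitrary choice the full script makes:

```python
import datetime as dt


def as_date(year, month=None, day=None):
    """Hypothetical helper: treat a missing month or day as 1."""
    return dt.date(year, month or 1, day or 1)


print(as_date(2008, 12, 10))  # 2008-12-10
print(as_date(2008, 3))       # 2008-03-01
print(as_date(2006))          # 2006-01-01
```

Because the filled-in values are made up, `__sub__` discards the month or day component of the difference whenever either date lacked it.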
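The datetime module defines datetime.datetime objects and uses datetime.timedelta objects to represent the difference between two datetime.datetime objects. Similarly, let's define a VagueDateDelta to represent the difference between two VagueDates: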
class VagueDateDelta(object):
    def __init__(self,years=None,months=None,days=None):
        self.years=years
        self.months=months
        self.days=days
    def __str__(self):
        if self.days is not None and self.months is not None:
            return '{s.years} years, {s.months} months, {s.days} days'.format(s=self)
        elif self.months is not None:
            return '{s.years} years, {s.months} months'.format(s=self)
        else:
            return '{s.years} years'.format(s=self)
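Now that we have built ourselves some handy tools, solving the problem is not hard.
The first step is to parse the list of strings and convert them into Person objects: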
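def parse_person(text):
    # List comprehensions instead of map() so the result can be indexed
    # on Python 3 as well as Python 2.
    data=[part.strip() for part in text.split(',')]
    lastname=data[0]
    firstname=data[1]
    gender=data[-1]
    ymd=[int(part) for part in data[2:-1]]
    return Person(lastname,firstname,gender,*ymd)
johns=[parse_person(text) for text in L1]
peeps=[parse_person(text) for text in L2]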
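
To see how the slicing copes with records of different lengths, here is a small sketch that mirrors the parsing logic but returns plain values instead of a Person. The function name `split_record` is hypothetical and exists only for this illustration.

```python
def split_record(text):
    """Hypothetical helper mirroring parse_person, returning plain values."""
    data = [part.strip() for part in text.split(',')]
    lastname, firstname, gender = data[0], data[1], data[-1]
    # Whatever sits between the first name and the gender is the date:
    # one, two or three integers.
    ymd = [int(part) for part in data[2:-1]]
    return lastname, firstname, gender, ymd


print(split_record('Smith, John, 2008, 12, 10, Male'))
# ('Smith', 'John', 'Male', [2008, 12, 10])
print(split_record('Bates, John, 2006, 1, Male'))
# ('Bates', 'John', 'Male', [2006, 1])
```

Unpacking the list with `*ymd` then fills `year`, `month`, and `day` from left to right, leaving any missing trailing parts at their default of `None`.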
Next we reorganize peeps into a dictionary of family members, keyed by last name:
family=collections.defaultdict(list)
for person in peeps:
    family[person.lastname].append(person)
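
A minimal sketch of the same grouping pattern, using made-up `(lastname, firstname)` tuples in place of Person objects:

```python
import collections

# Illustrative records only; the real script stores Person objects.
records = [('Smith', 'Joy'), ('Johnson', 'Alex'), ('Smith', 'Kevin')]

family = collections.defaultdict(list)
for lastname, firstname in records:
    family[lastname].append(firstname)

print(family['Smith'])  # ['Joy', 'Kevin']
print(family['Bates'])  # []
```

Looking up a last name with no entries returns an empty list rather than raising `KeyError`, which is why a John without relatives (Bates in the sample data) needs no special handling.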
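Finally, you just loop over the family members of each john in johns, compare the publication dates, and report the results.
The full script might look something like this: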
import datetime as dt
import dateutil.relativedelta as relativedelta
import pprint
import collections

class VagueDateDelta(object):
    def __init__(self,years=None,months=None,days=None):
        self.years=years
        self.months=months
        self.days=days
    def __str__(self):
        if self.days is not None and self.months is not None:
            return '{s.years} years, {s.months} months, {s.days} days'.format(s=self)
        elif self.months is not None:
            return '{s.years} years, {s.months} months'.format(s=self)
        else:
            return '{s.years} years'.format(s=self)

class VagueDate(object):
    def __init__(self,year=None,month=None,day=None):
        self.year=year
        self.month=month
        self.day=day
    def __sub__(self,other):
        d1=self.asdate()
        d2=other.asdate()
        rd=relativedelta.relativedelta(d1,d2)
        years=rd.years
        months=rd.months if self.month and other.month else None
        days=rd.days if self.day and other.day else None
        return VagueDateDelta(years,months,days)
    def asdate(self):
        # You've got to make some kind of arbitrary decision when comparing
        # vague dates. Here I make the arbitrary decision that missing info
        # will be treated like 1s for the purpose of calculating differences.
        return dt.date(self.year,self.month or 1,self.day or 1)
    def __str__(self):
        if self.day is not None and self.month is not None:
            return '{s.year}, {s.month}, {s.day}'.format(s=self)
        elif self.month is not None:
            return '{s.year}, {s.month}'.format(s=self)
        else:
            return '{s.year}'.format(s=self)

class Person(object):
    def __init__(self,lastname,firstname,gender=None,year=None,month=None,day=None):
        self.lastname=lastname
        self.firstname=firstname
        self.ymd=VagueDate(year,month,day)
        self.gender=gender
    def age_diff(self,other):
        return self.ymd-other.ymd
    def __str__(self):
        fmt='{s.lastname}, {s.firstname} ({s.gender}) ({d.year},{d.month},{d.day})'
        return fmt.format(s=self,d=self.ymd)
    __repr__=__str__
    def __lt__(self,other):
        d1=self.ymd.asdate()
        d2=other.ymd.asdate()
        return d1<d2

def parse_person(text):
    # List comprehensions instead of map() so the result can be indexed
    # on Python 3 as well as Python 2.
    data=[part.strip() for part in text.split(',')]
    lastname=data[0]
    firstname=data[1]
    gender=data[-1]
    ymd=[int(part) for part in data[2:-1]]
    return Person(lastname,firstname,gender,*ymd)

def main():
    L1=['Smith, John, 2008, 12, 10, Male', 'Bates, John, 2006, 1, Male',
        'Johnson, John, 2009, 1, 28, Male', 'James, John, 2008, 3, Male']
    L2=['Smith, Joy, 2008, 12, 10, Female', 'Smith, Kevin, 2008, 12, 10, Male',
        'Smith, Matt, 2008, 12, 10, Male', 'Smith, Carol, 2000, 12, 11, Female',
        'Smith, Sue, 2000, 12, 11, Female', 'Johnson, Alex, 2008, 3, Male',
        'Johnson, Emma, 2008, 3, Female', 'James, Peter, 2008, 3, Male',
        'James, Chelsea, 2008, 3, Female']
    johns=[parse_person(text) for text in L1]
    peeps=[parse_person(text) for text in L2]
    print(pprint.pformat(johns))
    print('')  # blank line; a bare print only works on Python 2
    print(pprint.pformat(peeps))
    print('')
    family=collections.defaultdict(list)
    for person in peeps:
        family[person.lastname].append(person)
    # print(family)
    pub_fmt='{j.lastname}, {j.firstname} Published in {j.ymd}, {j.gender}'
    rel_fmt=' {r.lastname}, {r.firstname} Published {d} {ba} John, {r.gender}'
    for john in johns:
        print(pub_fmt.format(j=john))
        for relative in family[john.lastname]:
            diff=john.ymd-relative.ymd
            ba='before' if relative<john else 'after'
            print(rel_fmt.format(
                r=relative,
                d=diff,
                ba=ba,
            ))

if __name__=='__main__':
    main()
yields
[Smith, John (Male) (2008,12,10),
Bates, John (Male) (2006,1,None),
Johnson, John (Male) (2009,1,28),
James, John (Male) (2008,3,None)]
[Smith, Joy (Female) (2008,12,10),
Smith, Kevin (Male) (2008,12,10),
Smith, Matt (Male) (2008,12,10),
Smith, Carol (Female) (2000,12,11),
Smith, Sue (Female) (2000,12,11),
Johnson, Alex (Male) (2008,3,None),
Johnson, Emma (Female) (2008,3,None),
James, Peter (Male) (2008,3,None),
James, Chelsea (Female) (2008,3,None)]
Smith, John Published in 2008, 12, 10, Male
Smith, Joy Published 0 years, 0 months, 0 days after John, Female
Smith, Kevin Published 0 years, 0 months, 0 days after John, Male
Smith, Matt Published 0 years, 0 months, 0 days after John, Male
Smith, Carol Published 7 years, 11 months, 29 days before John, Female
Smith, Sue Published 7 years, 11 months, 29 days before John, Female
Bates, John Published in 2006, 1, Male
Johnson, John Published in 2009, 1, 28, Male
Johnson, Alex Published 0 years, 10 months before John, Male
Johnson, Emma Published 0 years, 10 months before John, Female
James, John Published in 2008, 3, Male
James, Peter Published 0 years, 0 months after John, Male
James, Chelsea Published 0 years, 0 months after John, Female
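
Note that relatives published on the same date as John are reported as "after", because the script only tests `relative<john`. If you want to report ties separately, one possible refinement is sketched below. The function name `relation` and the wording of the tie case are my own assumptions, not part of the script above.

```python
import datetime as dt


def relation(relative_date, john_date):
    """Hypothetical helper: word describing when the relative was published."""
    if relative_date < john_date:
        return 'before'
    if relative_date > john_date:
        return 'after'
    return 'at the same time as'


john = dt.date(2008, 12, 10)
print(relation(dt.date(2000, 12, 11), john))  # before
print(relation(dt.date(2008, 12, 10), john))  # at the same time as
print(relation(dt.date(2009, 1, 5), john))    # after
```

In the script, the dates to compare would come from `asdate()`. Also keep in mind that `john.ymd-relative.ymd` produces negative components when the relative is later than John; that case does not occur in the sample data, so you may want to subtract the earlier date from the later one before formatting.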