The original dataset is:
# (numbersofrating,title,avg_rating)
newRDD =[(3,'monster',4),(4,'minions 3D',5),....]
I want to select the top N avg_ratings in newRDD. I used the code below, and it raises an error.
selectnewRDD = (newRDD.map(x, key =lambda x: x[2]).sortBy(......))
TypeError: map() takes no keyword arguments
The expected data should be:
# (numbersofrating,title,avg_rating)
selectnewRDD =[(4,'minions 3D',5),(3,'monster',4)....]
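The TypeError suggests that `map` was called with a `key=` keyword it does not accept; the key function belongs to the sorting step, not to `map`. A minimal sketch of the idea follows, using a plain Python list as a stand-in for the RDD so it can run without Spark. The helper name `top_n_by_avg_rating` and the variable `ratings` are hypothetical, and only the two tuples shown in the question are used. The commented PySpark lines at the end are one likely way to do the same on a real RDD, assuming `newRDD` is an actual RDD.

```python
# Local stand-in for the RDD contents. The question elides the remaining
# rows, so only the two tuples it shows are used here.
ratings = [(3, 'monster', 4), (4, 'minions 3D', 5)]


def top_n_by_avg_rating(rows, n):
    """Return the n rows with the highest avg_rating (index 2), best first."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]


top_rows = top_n_by_avg_rating(ratings, 2)
print(top_rows)

# On a PySpark RDD the same key function would go to the RDD methods, e.g.:
#   newRDD.sortBy(lambda x: x[2], ascending=False).take(N)
#   newRDD.takeOrdered(N, key=lambda x: -x[2])
```

Both PySpark variants return an ordinary Python list to the driver, so N should be small enough to fit in memory.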
An error occurs when I try to split:
l =[u'this is friday', u'holiday begin']
split_l =l.split()
print(split_l)
The error is:
Traceback (most recent call last):
  File "C:\Users\spotify_track2.py", line 19, in <module>
    split_l =l.split()
AttributeError: 'list' object has no attribute 'split'
So I don't know how to handle this kind of error.
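The traceback points at the cause: `split()` is a method of strings, and `l` is a list of strings. One possible fix, sketched below, is to call `split()` on each element. Whether a nested list or a single flat list of words is wanted is not stated in the question, so both are shown; the name `words` is introduced here only for illustration.

```python
l = [u'this is friday', u'holiday begin']

# split() is a string method, so call it on each element, not on the list.
split_l = [s.split() for s in l]
print(split_l)

# Alternative: flatten everything into one list of words.
words = [word for s in l for word in s.split()]
print(words)
```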
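I have a problem creating a new variable. I have several variables named A, B, C, D, E, F, G. All of them are 0/1 binary variables. I want to create a new variable that indicates whether any 3 or more of these variables equal 1.
For example,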
new_variable =0;
if ANY 3 or more variables(A,B,C,D,E,F,G) =1 then new_variable =1;
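Because every variable is 0 or 1, "any 3 or more equal 1" is the same as "the sum is at least 3". The pseudocode above does not name its language, so the sketch below expresses that idea in Python purely as an illustration. The function name `at_least_three_set` and the sample values for A through G are hypothetical.

```python
def at_least_three_set(*flags):
    """Return 1 if three or more of the 0/1 flags equal 1, otherwise 0."""
    return int(sum(flags) >= 3)


# Hypothetical sample values for the seven binary variables.
A, B, C, D, E, F, G = 1, 0, 1, 0, 0, 1, 0

new_variable = at_least_three_set(A, B, C, D, E, F, G)
print(new_variable)
```

The same sum-and-compare test carries over to most data tools that can add columns row by row.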