I have a data frame "df1":
adj response
beautiful ["She's a beautiful girl/woman, and also a good teacher."]
good ["She's a beautiful girl/woman, and also a good teacher."]
hideous ["This city is hideous, let's move to the countryside."]
This is the list of objects:
object=["girl","teacher","city","countryside","woman"]
Code:
df1['response_split']=df1['response'].str.split(",")
After the split, the data frame looks like this:
adj response_split
beautiful ["She's a beautiful girl/woman", " and also a good teacher."]
good ["She's a beautiful girl/woman", " and also a good teacher."]
hideous ["This city is hideous", " let's move to the countryside."]
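
The split step can be reproduced as a small self-contained sketch. It assumes that each value in response is a plain string rather than a one-element list as the printed table suggests; on actual list values, `.str.split` would not produce the fragments shown above.

```python
import pandas as pd

# Assumption: each response is a plain string, not a one-element list.
df1 = pd.DataFrame({
    "adj": ["beautiful", "good", "hideous"],
    "response": [
        "She's a beautiful girl/woman, and also a good teacher.",
        "She's a beautiful girl/woman, and also a good teacher.",
        "This city is hideous, let's move to the countryside.",
    ],
})

# Split each response on commas; every cell becomes a list of fragments.
df1["response_split"] = df1["response"].str.split(",")

print(df1[["adj", "response_split"]])
```

Note that the fragments after the first keep their leading space, exactly as in the table above.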
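I want to add another column, "response_object": where the adj is found in a fragment of the response, the column should hold the matching object(s) from the list object. Expected result:
adj response_split response_object …

So, currently I am trying to figure out how to build a movie recommender system from the MovieLens dataset (https://grouplens.org/datasets/movielens/100k/). I read some instructions from a tutorial.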
library(dplyr)
library(recommenderlab)
library(magrittr)
data <- read.table("u.data", header = FALSE, stringsAsFactors = TRUE)
head(data)
V1 V2 V3 V4
1 196 242 3 881250949
2 186 302 3 891717742
3 22 377 1 878887116
4 244 51 2 880606923
5 166 346 1 886397596
6 298 474 4 884182806
Note: V1 is the user ID, V2 is the item ID, and V3 is the rating.
Now I need to convert this format into a ratingMatrix, and the result should look like this:
1 2 3 4 5 6 7 8 9 10
1 5 3 4 3 3 …
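
The reshaping asked for here, from long (user, item, rating) rows to a wide user-by-item matrix, can be illustrated with a pandas pivot. This is only a sketch of the idea in Python, not the recommenderlab API, and it uses just the six rows printed by head(data), so the result is much sparser than the matrix shown above. The name rating_matrix is chosen for illustration.

```python
import pandas as pd

# The six rows shown by head(data): V1 = user id, V2 = item id,
# V3 = rating; V4 is kept as printed and not used below.
data = pd.DataFrame(
    [
        (196, 242, 3, 881250949),
        (186, 302, 3, 891717742),
        (22, 377, 1, 878887116),
        (244, 51, 2, 880606923),
        (166, 346, 1, 886397596),
        (298, 474, 4, 884182806),
    ],
    columns=["V1", "V2", "V3", "V4"],
)

# One row per user, one column per item; unrated pairs become NaN.
rating_matrix = data.pivot(index="V1", columns="V2", values="V3")

print(rating_matrix)
```

`pivot` raises an error if the same user/item pair appears more than once, so on the full file it is worth checking for duplicates first.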