Sar*_*ang 6 python dataframe pandas
I have a pandas DataFrame as follows.
activity User_Id \
0 VIEWED MOVIE 158d292ec18a49
1 VIEWED MOVIE 158d292ec18a49
2 VIEWED MOVIE 158d292ec18a49
3 VIEWED MOVIE 158d292ec18a49
4 VIEWED MOVIE 158e00978d7a6c
Media_Title Media_Type User_Rating
0 20th Asian Athletics Championship-2013 Held At... NA
1 Tu Majha Saangaati NA
2 Home Cooking NA
3 Mix Dil Se NA
4 Value, Virtues, Ethics & Morality NA
I am trying to run an SQL query against it with the sqldf function from the pandasql package, as follows.
distinct_activity_user = pandasql.sqldf(" select User_Id from pmm_activity", locals())
The error I get is:
OperationalError: (sqlite3.OperationalError) too many SQL variables [SQL: 'INSERT INTO pmm_activity (activity, "User_Id", "Media_Title", "Media_Type", "User_Rating") VALUES
This may be a problem related to spaces in the column names. I ran into it when I tried the data you provided. Here is an example using sqlite3 that may solve your problem:
import sqlite3 as sql
import pandas as pd
file = "..../movie.csv"
df = pd.read_csv(file, sep=";", dtype='unicode' )
This is what the DataFrame looks like:
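# Write the DataFrame to a table named "movie" in a local SQLite database
conn = sql.connect('movie2.db')
df.to_sql('movie', conn)
# Quote the column name exactly as stored, including its trailing space
Movie = pd.read_sql('SELECT distinct "User_Id " FROM movie', conn)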
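If stray whitespace in the column names is indeed the cause, another option is to strip it before loading the data into SQLite, so the query can use the plain name. The following is only a minimal sketch under that assumption: the sample rows are hypothetical stand-ins modeled on the question's data, the trailing space in "User_Id " is added deliberately to mimic the problem, and an in-memory database replaces movie2.db.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data; the trailing space in "User_Id " is deliberate
# and mimics a column name that carries stray whitespace.
df = pd.DataFrame(
    {
        "activity": ["VIEWED MOVIE"] * 5,
        "User_Id ": ["158d292ec18a49"] * 4 + ["158e00978d7a6c"],
    }
)

# Strip leading and trailing whitespace from every column name.
df.columns = df.columns.str.strip()

# Load the cleaned frame into an in-memory SQLite database (assumption:
# a throwaway database is enough for illustration).
conn = sqlite3.connect(":memory:")
df.to_sql("movie", conn, index=False)

# The column can now be referenced without the trailing space.
distinct_users = pd.read_sql('SELECT DISTINCT "User_Id" FROM movie', conn)
conn.close()

user_ids = sorted(distinct_users["User_Id"])
print(user_ids)
```

Cleaning the names once at load time means later queries do not have to remember which columns carry extra spaces.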