Hi,
I'm new to Python and I want to import some data from an Oracle database into Python (a pandas DataFrame) using this simple query:
SELECT *
FROM TRANSACTION
WHERE DIA_DAT >= to_date('15.02.28 00:00:00', 'YY.MM.DD HH24:MI:SS')
  AND (locations <> 'PUERTO RICO'
       OR locations <> 'JAPAN')
  AND CITY = 'LONDON'
What I did:
import cx_Oracle
import pandas as pd  # pd.read_sql is used below
ip = 'XX.XX.X.XXX'
port = YYYY
SID = 'DW'
dsn_tns = cx_Oracle.makedsn(ip, port, SID)
connection = cx_Oracle.connect('BA', 'PASSWORD', dsn_tns)
df_ora = pd.read_sql('SELECT* FROM TRANSACTION WHERE DIA_DAT>=to_date('15.02.28 00:00:00', 'YY.MM.DD HH24:MI:SS') AND (locations <> 'PUERTO RICO' OR locations <> 'JAPAN') AND CITY='LONDON'', con=connection)
But I got this error:
SyntaxError: invalid syntax
What am I doing wrong?
Thanks
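The SyntaxError here is pure Python quoting, not Oracle: the SQL string is delimited with single quotes and also contains single quotes, so the literal ends early. A minimal sketch of a fix, reusing the connection from above; note also that `locations <> 'PUERTO RICO' OR locations <> 'JAPAN'` is always true, so `NOT IN` is very likely what was meant:

# A hedged fix: triple-quote the SQL so the single quotes inside it no
# longer terminate the Python string literal.
query = """
SELECT *
FROM TRANSACTION
WHERE DIA_DAT >= to_date('15.02.28 00:00:00', 'YY.MM.DD HH24:MI:SS')
  AND locations NOT IN ('PUERTO RICO', 'JAPAN')
  AND CITY = 'LONDON'
"""
df_ora = pd.read_sql(query, con=connection)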
Hi,
I have a DataFrame in a SparkContext with 400k rows and 3 columns. The driver has 143.5 GB of storage memory:
16/03/21 19:52:35 INFO BlockManagerMasterEndpoint: Registering block manager localhost:55613 with 143.5 GB RAM, BlockManagerId(driver, localhost, 55613)
16/03/21 19:52:35 INFO BlockManagerMaster: Registered BlockManager
I want to return the contents of this DataFrame as a pandas DataFrame.
I did:
df_users = UserDistinct.toPandas()
But I got this error:
16/03/21 20:01:08 ERROR Executor: Exception in task 7.0 in stage 6.0 (TID 439)
java.lang.OutOfMemoryError
at java.io.ByteArrayOutputStream.hugeCapacity(Unknown Source)
at java.io.ByteArrayOutputStream.grow(Unknown Source)
at java.io.ByteArrayOutputStream.ensureCapacity(Unknown Source)
at java.io.ByteArrayOutputStream.write(Unknown Source)
at java.io.ObjectOutputStream$BlockDataOutputStream.drain(Unknown Source)
at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(Unknown Source)
at java.io.ObjectOutputStream.writeObject0(Unknown Source)
at java.io.ObjectOutputStream.writeObject(Unknown Source)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:239)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at …
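A hedged workaround for the OutOfMemoryError above: toPandas() serializes the whole result on the driver in one pass, which is exactly what overflows in ByteArrayOutputStream. Streaming the rows partition by partition keeps the peak much lower. This sketch assumes Spark 2.0+, where DataFrame.toLocalIterator() exists; on 1.x, UserDistinct.rdd.toLocalIterator() does the same job.

import pandas as pd

# Pull rows to the driver one partition at a time instead of one giant
# serialized blob, then assemble the pandas DataFrame locally.
rows = [row.asDict() for row in UserDistinct.toLocalIterator()]
df_users = pd.DataFrame(rows, columns=UserDistinct.columns)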
I have this PySpark DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([
    ["aa@gmail.com",2,3], ["aa@gmail.com",5,5],
    ["bb@gmail.com",8,2], ["cc@gmail.com",9,3]
]), columns=['user','movie','rating'])
sparkdf = sqlContext.createDataFrame(df, samplingRatio=0.1)
user movie rating
aa@gmail.com 2 3
aa@gmail.com 5 5
bb@gmail.com 8 2
cc@gmail.com 9 3
I need to add a new column that ranks by user.
I want this output:
user movie rating Rank
aa@gmail.com 2 3 1
aa@gmail.com 5 5 1
bb@gmail.com 8 2 2
cc@gmail.com 9 3 3
How can I do this?
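One way to get this, sketched under the assumption that the rank should simply number the distinct users in sorted order (which matches the desired output), is a dense_rank window ordered by user:

from pyspark.sql import Window
from pyspark.sql.functions import dense_rank

# dense_rank over an ordering by user gives every row of the same user the
# same number: aa@gmail.com -> 1, bb@gmail.com -> 2, cc@gmail.com -> 3.
w = Window.orderBy("user")
ranked = sparkdf.withColumn("Rank", dense_rank().over(w))
ranked.show()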
Hi,
I'm new to Spark and I'm trying to use ML recommendation.
My code:
from pyspark.ml.recommendation import ALS  # import the snippet relies on

df = sqlContext.createDataFrame(
[(0, 0, 4.0), (0, 1, 2.0), (1, 1, 3.0), (1, 2, 4.0), (2, 1, 1.0), (2, 2, 5.0)],
["user", "item", "rating"])
als = ALS(rank=10, maxIter=5)
model = als.fit(df)
model.userFactors.orderBy("id").collect()
How can I get 2 recommendations for every user, across all movies?
Thanks for your time.
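A hedged sketch, assuming Spark 2.2+ where the fitted ALSModel exposes recommendForAllUsers; on older versions you would instead score a user × item cross join with model.transform and keep the top rows per user.

# Top-2 items per user, straight from the fitted model (Spark 2.2+).
top2 = model.recommendForAllUsers(2)
top2.show(truncate=False)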
I tried to install rpy2, but I got the error below.
I read online that the problem is the environment variables, but I already have C:\Rtools\bin and C:\Program Files\R\R-3.2.2\bin in my system PATH variable.
What am I doing wrong?
The error:
C:\Users\rmalveslocal>pip install rpy2
Collecting rpy2
Downloading rpy2-2.7.6.tar.gz (177kB)
100% |################################| 180kB 1.3MB/s
Complete output from command python setup.py egg_info:
R version 3.2.2 (2015-08-14) -- "Fire Safety"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3. …
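A hedged diagnostic rather than a fix: rpy2's setup.py has to find R at build time, so it is worth checking, from the same Python that runs pip, whether R is really on the PATH that Python sees and whether R_HOME is set.

import os, shutil

# shutil.which returns None if R\bin is not on this Python's PATH;
# rpy2's build also honors R_HOME when it is set.
print(shutil.which("R"))
print(os.environ.get("R_HOME"))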
Hi,
I have run Spark (in the Spyder IDE) many times before. Today I got this error (and the code is the same):
from py4j.java_gateway import JavaGateway
gateway = JavaGateway()
os.environ['SPARK_HOME']="C:/Apache/spark-1.6.0"
os.environ['JAVA_HOME']="C:/Program Files/Java/jre1.8.0_71"
sys.path.append("C:/Apache/spark-1.6.0/python/")
os.environ['HADOOP_HOME']="C:/Apache/spark-1.6.0/winutils/"
from pyspark import SparkContext
from pyspark import SparkConf
conf = SparkConf()
The system cannot find the path specified.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Apache\spark-1.6.0\python\pyspark\conf.py", line 104, in __init__
SparkContext._ensure_initialized()
File "C:\Apache\spark-1.6.0\python\pyspark\context.py", line 245, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway()
File "C:\Apache\spark-1.6.0\python\pyspark\java_gateway.py", line 94, in launch_gateway
raise Exception("Java gateway process exited before sending the driver its port number") …
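A hedged first check: "Java gateway process exited before sending the driver its port number" together with "The system cannot find the path specified" usually means spark-submit could not even start, often because one of the hard-coded paths above no longer exists (for example, jre1.8.0_71 removed by a Java auto-update). Verifying the directories costs nothing:

import os

# Confirm that every directory the script hard-codes actually exists.
for var in ("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME"):
    path = os.environ.get(var)
    print(var, "->", path, "exists:", bool(path) and os.path.isdir(path))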
This is probably a very simple question, but I haven't found a solution and I've already spent a few hours on it.
I have an HTML file, "teste.html", and I use Jinja2 to modify my HTML:
from jinja2 import Environment, FileSystemLoader  # imports the snippet relies on

env = Environment(loader=FileSystemLoader('.'))
template = env.get_template("teste.html")
template_vars = {"Client" : Name}
html_out = template.render(template_vars)
type(html_out)
This ends with html_out as a string:
type(html_out)
Out[60]: str
Now I want to save this html_out to an HTML document.
How can I do this?
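A minimal sketch: html_out is an ordinary str, so saving it is a plain file write. "output.html" is an assumed file name; utf-8 avoids surprises with non-ASCII template content.

# Write the rendered template to disk; "output.html" is a placeholder name.
with open("output.html", "w", encoding="utf-8") as f:
    f.write(html_out)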