I ran sudo pip install -U nltk, as the NLTK documentation suggests. However, I got the following output:
Collecting nltk
Downloading nltk-3.0.5.tar.gz (1.0MB)
100% |████████████████████████████████| 1.0MB 516kB/s
Collecting six>=1.9.0 (from nltk)
Downloading six-1.9.0-py2.py3-none-any.whl
Installing collected packages: six, nltk
Found existing installation: six 1.4.1
DEPRECATION: Uninstalling a distutils installed project (six) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
Uninstalling six-1.4.1:
Exception:
Traceback (most recent call last):
File "/Library/Python/2.7/site-packages/pip/basecommand.py", line 211, in …
I created a table:
create 'tablename', 'columnfamily1'
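Is it possible to add another column family, 'columnfamily2', now? If so, how?
My understanding was that GROUP does not work with multiple tuples, which is why we use COGROUP in Pig. However, when I checked today, the GROUP command worked for me. I am using PIG-0.12.0. My commands and their output are below.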
grunt> grpvar = GROUP C by $2, B by $2;
grunt> cogrpvar = COGROUP C by $2, B by $2;
grunt> describe grpvar;
grpvar: {group: chararray,C: {(pid: int,pname: chararray,drug: chararray,gender: chararray,tot_amt: int)},B: {(pid: int,pname: chararray,drug: chararray,gender: chararray,tot_amt: int)}}
grunt> describe cogrpvar;
cogrpvar: {group: chararray,C: {(pid: int,pname: chararray,drug: chararray,gender: chararray,tot_amt: int)},B: {(pid: int,pname: chararray,drug: chararray,gender: chararray,tot_amt: int)}}
Is GROUP expected to work like this? What is the difference between GROUP and COGROUP?
src = Folder1/Folder2/file1
(Edit: Folder1 also contains other files and folders.)
dst = Folder3
After copying the file, I want:
Folder3/Folder1/Folder2/file1
I think shutil.copy does not recreate the folders, and shutil.copytree only works on folders. (Edit: if there were no other files, I could simply copy the folder.)
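One way to get the layout described above is to create the missing parent folders first and then copy the single file. The following is a minimal sketch, not a definitive answer: copy_with_parents is a hypothetical helper name, it assumes src is a relative path, and it assumes Python 3.3 or later (the exist_ok argument and the return value of shutil.copy are not available in Python 2.7).

```python
import os
import shutil


def copy_with_parents(src, dst_root):
    """Copy the file `src` to `dst_root`, recreating src's folder path below it.

    Hypothetical helper; assumes `src` is a relative path such as
    "Folder1/Folder2/file1".
    """
    target = os.path.join(dst_root, src)
    # Create dst_root/Folder1/Folder2 if it does not exist yet.
    os.makedirs(os.path.dirname(target), exist_ok=True)
    # Copy only this one file; siblings in Folder1 are left alone.
    return shutil.copy(src, target)
```

Calling copy_with_parents("Folder1/Folder2/file1", "Folder3") should produce Folder3/Folder1/Folder2/file1 without touching the other contents of Folder1.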
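The column uses the delimiter several times within a single row, so split is not that straightforward.
In this case, only the first occurrence of the delimiter should be considered when splitting.
This is what I am doing at the moment.
However, I feel there should be a better solution?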
from pyspark.sql.functions import col, expr, split

testdf = spark.createDataFrame([("Dog", "meat,bread,milk"), ("Cat", "mouse,fish")], ["Animal", "Food"])
testdf.show()
+------+---------------+
|Animal|           Food|
+------+---------------+
|   Dog|meat,bread,milk|
|   Cat|     mouse,fish|
+------+---------------+
testdf.withColumn("Food1", split(col("Food"), ",").getItem(0))\
.withColumn("Food2",expr("regexp_replace(Food, Food1, '')"))\
.withColumn("Food2",expr("substring(Food2, 2)")).show()
+------+---------------+-----+----------+
|Animal|           Food|Food1|     Food2|
+------+---------------+-----+----------+
|   Dog|meat,bread,milk| meat|bread,milk|
|   Cat|     mouse,fish|mouse|      fish|
+------+---------------+-----+----------+
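The intended behaviour, splitting only at the first delimiter, can be illustrated in plain Python outside Spark. This is a sketch of the semantics only, not of the Spark code: split_first is a hypothetical helper, and the rows are the sample data from the question. In PySpark 3.0 and later, pyspark.sql.functions.split accepts an optional limit argument, so split(col("Food"), ",", 2) may be worth trying in place of the regexp_replace step; that variant is untested here.

```python
def split_first(value, sep=","):
    """Split `value` at the first occurrence of `sep` only.

    Hypothetical helper; returns (head, tail), with an empty tail
    when `sep` does not occur in `value`.
    """
    head, _, tail = value.partition(sep)
    return head, tail


# Sample rows from the question: (Animal, Food)
rows = [("Dog", "meat,bread,milk"), ("Cat", "mouse,fish")]

# Append Food1 and Food2 to each row.
result = [(animal, food) + split_first(food) for animal, food in rows]
```

Here result holds the same four columns as the table above: Animal, Food, Food1 and Food2.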