I want to replace curly quotes:
str = '“I don’t know what you mean by ‘glory,’ ” Alice said.';
Using:
str.replace(/['"]/g,'');
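Why doesn't it work? How can I do this?

I am trying to install the Python client driver for HiveServer2: https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-PythonClientDriver
The installation instructions say: "A Python client driver for HiveServer2 is available at https://github.com/BradRuderman/pyhs2. It includes all the required packages such as SASL and Thrift wrappers."
However, pip install pyhs2 fails while compiling SASL (see below). I have Hadoop 2.2.0 installed and working on localhost. Please help me install the Python client.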
[root@localhost /]# pip install pyhs2
Requirement already satisfied (use --upgrade to upgrade): pyhs2 in /usr/lib/python2.6/site-packages
Downloading/unpacking sasl (from pyhs2)
Downloading sasl-0.1.3.tar.gz
Running setup.py (path:/tmp/pip_build_root/sasl/setup.py) egg_info for package sasl
Downloading/unpacking thrift (from pyhs2)
Downloading thrift-0.9.1.tar.gz
Running setup.py (path:/tmp/pip_build_root/thrift/setup.py) egg_info for package thrift
Installing collected packages: sasl, thrift
Running setup.py install for sasl
building '_saslwrapper' extension
gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 …

When building Sqoop2:
mvn package -Pbinary
I get an error:
"A required class was missing while executing org.apache.maven.plugins:maven-site-plugin:3.0-beta-3:site: org/sonatype/aether/graph/DependencyFilter"
How can I build Sqoop2?
I am running:
Apache Maven 3.2.1
Java version: 1.7.0_51
CentOS 6.5, kernel 2.6.32-431.5.1.el6.x86_64
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-site-plugin:3.0-beta-3:site (packaging-documentation) on project sqoop-docs: Execution packaging-documentation of goal org.apache.maven.plugins:maven-site-plugin:3.0-beta-3:site failed: A required class was missing while executing org.apache.maven.plugins:maven-site-plugin:3.0-beta-3:site: org/sonatype/aether/graph/DependencyFilter
[ERROR] -----------------------------------------------------
[ERROR] realm = plugin>org.apache.maven.plugins:maven-site-plugin:3.0-beta-3
[ERROR] strategy = org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy
[ERROR] urls[0] = file:/home/dk/.m2/repository/org/apache/maven/plugins/maven-site-plugin/3.0-beta-3/maven-site-plugin-3.0-beta-3.jar
[ERROR] urls[1] = file:/home/dk/.m2/repository/org/apache/maven/reporting/maven-reporting-api/3.0/maven-reporting-api-3.0.jar
[ERROR] urls[2] = file:/home/dk/.m2/repository/org/codehaus/plexus/plexus-interpolation/1.14/plexus-interpolation-1.14.jar
[ERROR] urls[3] = file:/home/dk/.m2/repository/org/sonatype/sisu/sisu-inject-bean/1.4.2/sisu-inject-bean-1.4.2.jar
[ERROR] urls[4] = file:/home/dk/.m2/repository/org/sonatype/sisu/sisu-guice/2.1.7/sisu-guice-2.1.7-noaop.jar
[ERROR] urls[5] = file:/home/dk/.m2/repository/org/codehaus/plexus/plexus-component-annotations/1.5.5/plexus-component-annotations-1.5.5.jar
[ERROR] urls[6] = file:/home/dk/.m2/repository/org/sonatype/plexus/plexus-sec-dispatcher/1.3/plexus-sec-dispatcher-1.3.jar
[ERROR] urls[7] = …

I am running CentOS 6.5, kernel 2.6.32-431.5.1.el6.x86_64 #1 SMP. I am trying to install Rattle, a data mining tool for the R programming language. Rattle is installed from the R shell. Although I have the latest GTK installed, I get configure: error: GTK version 2.8.0 required when trying to install Rattle (see below). How can I fix this?
R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)
...
> library(rattle)
Rattle: A free graphical interface for data mining with R.
Version 3.0.4 r177 Copyright (c) 2006-2014 Togaware Pty Ltd.
Type 'rattle()' to shake, rattle, and roll your data.
> rattle()
The package 'RGtk2' is required to display the Rattle GUI. It does not
appear …

Note: this question is not the same as the answered question "Pandas: sample each group after groupby".
I am trying to figure out how to use pandas.DataFrame.sample or any other function to balance this data:
df['class'].value_counts()
c1 9170
c2 5266
c3 4523
c4 2193
c5 1956
c6 1896
c7 1580
c8 1407
c9 1324
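I need a random sample of each class (c1, c2, .. c9) where the sample size equals the size of the class with the fewest instances. In this example, the sample size should be the size of class c9 = 1324.
Is there a simple way to do this with pandas?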
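One possible approach, sketched under the assumption that the data sits in a DataFrame with a `class` column (a small made-up frame is used here): read the smallest class size from `value_counts()` and draw that many rows from each group with `DataFrame.sample`. The names `n_min` and `balanced` and the fixed `random_state` are illustrative choices, not something given in the question.

```python
import pandas as pd

# Small illustrative frame with a 'class' column (assumed layout).
df = pd.DataFrame({
    'class': ['c1', 'c2', 'c1', 'c1', 'c2', 'c1', 'c1', 'c2', 'c3', 'c3'],
    'val':   [1, 2, 1, 1, 2, 1, 1, 2, 3, 3],
})

# Size of the smallest class; every class is downsampled to this size.
n_min = df['class'].value_counts().min()

# Draw n_min rows (without replacement) from each class and stitch the pieces together.
balanced = pd.concat(
    [group.sample(n=n_min, random_state=0) for _, group in df.groupby('class')]
)

print(balanced['class'].value_counts())
```

Every class ends up with exactly `n_min` rows. Dropping `random_state` gives a different random draw on each run; on recent pandas versions `df.groupby('class').sample(n=n_min)` should be a shorter equivalent.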
Update
To clarify my question, in the table above:
c1 9170
c2 5266
c3 4523
...
the numbers are the instance counts of classes c1, c2, c3, ..., so the actual data looks like this:
c1 'foo'
c2 'bar'
c1 'foo-2'
c1 'foo-145'
c1 'xxx-07'
c2 'zzz'
...
and so on
Update 2
To clarify further:
d = {'class':['c1','c2','c1','c1','c2','c1','c1','c2','c3','c3'],
'val': [1,2,1,1,2,1,1,2,3,3]
}
df = pd.DataFrame(d)
class val
0 c1 1
1 c2 2
2 c1 1
3 c1 1
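4 c2 2 …

When assembled in Eclipse, my MapReduce job runs fine, with all available Hadoop and Hive jars included in the Eclipse project as dependencies. (These are the jars that ship with a single-node, local Hadoop installation.)
However, when I try to run the same program assembled with a Maven project (see below), I get: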
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
This exception occurs when the program is assembled with the following Maven project:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.bigdata.hadoop</groupId>
<artifactId>FieldCounts</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>
<name>FieldCounts</name>
<url>http://maven.apache.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-jobclient</artifactId>
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.hive.hcatalog</groupId>
<artifactId>hcatalog-core</artifactId>
<version>0.12.0</version>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>16.0.1</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>2.3.2</version>
<configuration>
<source>${jdk.version}</source>
<target>${jdk.version}</target>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<executions>
<execution> …

For some reason, I get an error message when trying to specify the F1 score as a metric for a Keras model:
model.compile(optimizer='adam', loss='mse', metrics=['accuracy', 'f1_score'])
I get this error:
ValueError: Unknown metric function:f1_score
This happens even after providing the 'f1_score' function in the same file where I call 'model.compile':
def f1_score(y_true, y_pred):
    # Count positive samples.
    c1 = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    c2 = K.sum(K.round(K.clip(y_pred, 0, 1)))
    c3 = K.sum(K.round(K.clip(y_true, 0, 1)))
    # If there are no true samples, fix the F1 score at 0.
    if c3 == 0:
        return 0
    # How many selected items are relevant?
    precision = c1 / c2
    # How many relevant items are selected?
    recall = c1 / c3
    # Calculate …

I need a tokenizer that, given a string with arbitrary whitespace between words, creates an array of words with no empty substrings.
For example, given a string:
" I dont know what you mean by glory Alice said."
I use:
str2.split(" ")
This also returns empty substrings:
["", "I", "dont", "know", "what", "you", "mean", "by", "glory", "", "Alice", "said."]
How can I filter the empty strings out of the array?
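
The following is an example of items rated 1, 2 or 3 stars.
I am trying to count all combinations of item ratings (stars) per month.
In the following example, item 10 was rated in month 1 and has two ratings equal to 1, one rating equal to 2 and one rating equal to 3.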
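The per-month, per-item star counts described above can be sketched with `pd.crosstab`. This is one possible approach rather than the only one; the frame below repeats the question's sample data so the block runs on its own, and the `star_<n>_cnt` column names follow the naming used in the question's desired output.

```python
import pandas as pd

# Sample ratings: one row per (month, item, star) rating event.
inp = pd.DataFrame({
    'month': [1, 1, 1, 1, 1, 2, 2, 2],
    'item':  [10, 10, 10, 10, 20, 20, 20, 20],
    'star':  [1, 2, 1, 3, 3, 2, 2, 3],
})

# Cross-tabulate (month, item) pairs against star values;
# combinations that never occur are filled with 0.
counts = pd.crosstab([inp['month'], inp['item']], inp['star'])

# Rename the star columns and turn the (month, item) index back into columns.
counts.columns = [f'star_{s}_cnt' for s in counts.columns]
counts = counts.reset_index()

print(counts)
```

A `groupby(['month', 'item', 'star']).size().unstack(fill_value=0)` chain should give the same table; `crosstab` just keeps it to a single call.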
inp = pd.DataFrame({'month':[1,1,1,1,1,2,2,2],
                    'item':[10,10,10,10,20,20,20,20],
                    'star':[1,2,1,3,3,2,2,3]}
                  )

   month  item  star
0      1    10     1
1      1    10     2
2      1    10     1
3      1    10     3
4      1    20     3
5      2    20     2
6      2    20     2
7      2    20     3
For the input frame given above, the output should be:
   month  item  star_1_cnt  star_2_cnt  star_3_cnt
0      1    10           2           1           1
1      1    20           0           0           1
2      2    20           0           2           1
I tried to start solving the problem with the following code,
whose result still needs to be converted into the desired output format and which gives the wrong answer:
1 20 3 (1, …

When trying to count the rows in a data frame that have a similar "kind":
import pandas as pd
items = [('aaa','aaa text 1'), ('aaa','aaa text 2'), ('aaa','aaa text 3'),
('bb', 'bb text 1'), ('bb', 'bb text 2'), ('bb', 'bb text 3'),
('bb', 'bb text 4'),
('cccc','cccc text 1'), ('cccc','cccc text 2'),
('dd', 'dd text 1'),
('e', 'e text 1'),
('fff', 'fff text 1'),
]
df = pd.DataFrame(items, columns=['kind', 'msg'])
df
kind msg
0 aaa aaa text 1
1 aaa aaa text 2
2 aaa aaa text 3
3 bb bb text 1
4 …