Suppose I have a text file with the following contents:
Hello what is up. ^M
^M
What are you doing?
I want to remove each ^M and join the line with the one that follows it, so that my output looks like:
Hello what is up. What are you doing?
How can I do this in Python? Alternatively, if there is a way to do it with Unix commands, please let me know.
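One possible approach, sketched under the assumption that each ^M in the file is a carriage return ("\r") followed by a newline: removing every "\r\n" pair merges those lines with the text that follows them. The function name `join_carriage_return_lines` is hypothetical, chosen only for this illustration.

```python
def join_carriage_return_lines(text):
    # Assumption: ^M is "\r" and is always followed by "\n".
    # Dropping each "\r\n" pair joins that line with whatever comes next.
    return text.replace("\r\n", "")


# In-memory stand-in for the file contents shown in the question.
sample = "Hello what is up. \r\n\r\nWhat are you doing?"
print(join_carriage_return_lines(sample))
```

When reading a real file in Python 3, opening it with `open(path, newline="")` keeps the "\r" characters intact instead of translating them, so the replacement above can see them.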
My text looks like this:
list1 = ["My name is xyz", "My name is pqr", "I work in abc"]
The above is the training set for clustering text with k-means.
list2 = ["My name is xyz", "I work in abc"]
The above is my test set.
I built a vectorizer and a model as follows:
vectorizer = TfidfVectorizer(min_df = 0, max_df=0.5, stop_words = "english", charset_error = "ignore", ngram_range = (1,3))
vectorized = vectorizer.fit_transform(list1)
km=KMeans(n_clusters=2, init='k-means++', n_init=10, max_iter=1000, tol=0.0001, precompute_distances=True, verbose=0, random_state=None, copy_x=True, n_jobs=1)
km.fit(vectorized)
If I try to predict the clusters for the test set "list2":
km.predict(list2)
I get the following error:
ValueError: Incorrect number of features. Got 2 features, expected 5
I was told to use a Pipeline to solve this, so I wrote the following code:
pipe = Pipeline([('vect', vectorizer), ('vectorized', …
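The code above is cut off, so the following is only a minimal sketch of how a Pipeline could chain the vectorizer and the clustering step, not a reconstruction of the original. The step name "km" and `random_state=0` are assumptions made for this illustration, and the `min_df`, `charset_error`, `precompute_distances` and `n_jobs` arguments are left out because recent scikit-learn releases may reject them.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

list1 = ["My name is xyz", "My name is pqr", "I work in abc"]  # training set
list2 = ["My name is xyz", "I work in abc"]                    # test set

pipe = Pipeline([
    # Turns raw strings into TF-IDF vectors.
    ("vect", TfidfVectorizer(max_df=0.5, stop_words="english", ngram_range=(1, 3))),
    # "km" is a hypothetical step name; random_state=0 is only for repeatability.
    ("km", KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0)),
])

pipe.fit(list1)               # fits the vectorizer, then k-means on its output
labels = pipe.predict(list2)  # reuses the fitted vocabulary for the test strings
print(labels)
```

The error in the question comes from passing raw strings to `km.predict`. Without a Pipeline, `km.predict(vectorizer.transform(list2))` should give the model input with the same feature layout it was trained on.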
Suppose I have a DataFrame that looks like this:
in:
mydata = [{'subid' : 'B14-111', 'age': 75, 'fdg':1.78},
{'subid' : 'B14-112', 'age': 22, 'fdg':1.56},]
df = pd.DataFrame(mydata)
out:
age fdg subid
0 75 1.78 B14-111
1 22 1.56 B14-112
I want to split the DataFrame into two separate DataFrames based on the "age" column, as follows:
out:
df1:
age fdg subid
0 75 1.78 B14-111
df2:
age fdg subid
1 22 1.56 B14-112
How can I do this?
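A minimal sketch using boolean indexing, assuming the split criterion is an age threshold. The question does not state the rule, so the cutoff of 50 below is purely hypothetical.

```python
import pandas as pd

mydata = [{'subid': 'B14-111', 'age': 75, 'fdg': 1.78},
          {'subid': 'B14-112', 'age': 22, 'fdg': 1.56}]
df = pd.DataFrame(mydata)

age_cutoff = 50                 # hypothetical threshold, not given in the question
is_older = df['age'] > age_cutoff

df1 = df[is_older]              # rows above the cutoff
df2 = df[~is_older]             # all remaining rows
print(df1)
print(df2)
```

Each result keeps the original index labels (0 and 1 here), which matches the desired output shown above.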
So I have a queue:
q = Queue.Queue()
I put some items in it.
items = ["First", "Second"]
for val in items:
    q.put(val)
I spawn 15 threads.
for i in range(15):
    tname = 't-%s' % i
    t = my_thread(some_func, q, tname)
    t.start()
q.join()
The my_thread class looks like this:
class my_thread(threading.Thread):
    def __init__(self, some_func, q_, name=''):
        threading.Thread.__init__(self)
        self.func = some_func
        self.process_q = q_
        self.name = name
        self.prefix = name
    def run(self):
        stime = time.time()
        logging.info('%s thread starting at : %s' % (threading.currentThread().getName(), time.ctime(stime)))
        while True:
            if self.process_q.empty():
                break
            queue_item = self.process_q.get()
            self.name = self.prefix + '-' + …
I am new to R, and I have a graph object created from a data frame object "allTog", as follows:
library(igraph)
df.g <- graph.data.frame(d = allTog, directed = TRUE)
plot(df.g, vertex.label = V(df.g)$name)

The allTog data frame is given by:
allTog <- data.frame(
source = c("chamber", "chamber", "chamber", "chamber", "chamber",
"check", "check", "issue", "issue", "issue"),
target = c("check", "issue", "leak", "process", "found", "power",
"customer", "customer", "wafer", "replaced")
)
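The columns "row.names" and "values" have no meaning here.
How can I traverse from each root node (in this case "chamber") to every leaf node and get the paths, that is, the names of all the nodes (vertices) along the way? I am looking for a general solution, because my root node can change with every run of the code. For example, in the next run the root node could be "issue".
My desired output is: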
chamber->check->power
chamber->issue->replaced
chamber->process
chamber->issue->customer
and so on...
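The traversal itself can be sketched as a depth-first search. The sketch below is written in plain Python rather than R/igraph, purely to illustrate the logic; it assumes the graph has no cycles, and the function name `root_to_leaf_paths` is hypothetical.

```python
def root_to_leaf_paths(edges, root):
    # Build an adjacency map: node -> list of child nodes, in edge order.
    children = {}
    for source, target in edges:
        children.setdefault(source, []).append(target)

    paths = []

    def walk(node, path):
        kids = children.get(node, [])
        if not kids:
            # No outgoing edges: this node is a leaf, so record the path.
            paths.append("->".join(path))
            return
        for kid in kids:
            walk(kid, path + [kid])

    walk(root, [root])
    return paths


# Same edges as the allTog data frame (source, target).
edges = [
    ("chamber", "check"), ("chamber", "issue"), ("chamber", "leak"),
    ("chamber", "process"), ("chamber", "found"), ("check", "power"),
    ("check", "customer"), ("issue", "customer"), ("issue", "wafer"),
    ("issue", "replaced"),
]

for path in root_to_leaf_paths(edges, "chamber"):
    print(path)
```

Passing a different root, for example "issue", returns only the paths that start from that node.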
I have the following list:
l = [["a", "done"], ["c", "not done"]]
If the second element of a sublist is "done", I want to remove that sublist. So the output should be:
l = [["c", "not done"]]
Obviously the following does not work:
for i in range(len(l)):
    if l[i][1] == "done":
        l.pop(0)
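One common way to avoid mutating a list while looping over it is to build a new list that keeps only the wanted sublists. A minimal sketch, assuming every sublist has at least two elements:

```python
l = [["a", "done"], ["c", "not done"]]

# Keep only the sublists whose second element is not "done".
l = [item for item in l if item[1] != "done"]
print(l)
```

If other variables refer to the same list object and should see the change, assigning to the slice `l[:]` instead of `l` updates the list in place.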
OK, I have two lists in Python:
a = ['bad', 'horrible']
b = ['bad', 'good']
I am using the set operator to compare the two lists and print the result when the two sets have a word in common.
print set(a) & set (b)
This gives the output:
set(['bad'])
Is there a way to remove the keyword 'set' from the output?
I would like the output to look like:
['bad']
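A minimal sketch: converting the intersection back to a list before printing drops the set wrapper from the output. The variable name `common` is introduced here only for illustration.

```python
a = ['bad', 'horrible']
b = ['bad', 'good']

# The intersection is a set; list() turns it into a plain list for printing.
common = list(set(a) & set(b))
print(common)
```

Sets are unordered, so when more than one word is shared the order of the resulting list is not guaranteed; wrapping the intersection in `sorted()` would give a stable order.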