Tag: pipeline

comments_list = []
comments = response.xpath(somexpath)
# Collect every extracted comment, then store the list on the item.
for x in comments.extract():
    comments_list.append(x)
ScrapyItem['comments'] = comments_list


python pipeline scrapy

Nin*_*ina

2015 09-23

12
Score

3
Answers

7429
Views

I see it, but I don't believe it. Legal names in R, pipe operations, and the dot

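While trying to understand the base R "Bizarro pipe" described in the Win Vector blog, I confirmed that simple examples produce pipe-like behavior in R with no packages installed. For example: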

> 2  ->.; exp(.)
[1] 7.389056


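I see the dot used as an operator in plyr and magrittr. I spent hours searching base R for every synonym of a dot operator I could think of, with every help tool I know; I even ran some absurd regex searches. Finally, in desperation, I tried this: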

> . <- 27
> .
[1] 27


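So far, I have been unable to refute that a bare dot, without even a name to call its own, is a valid variable name in R. Still, I am hoping this is merely a side effect of some more sensible behavior that is documented somewhere.

Is it? And if so, where?

I concede that when it first appeared in the Win Vector blog, the author presented it as a joke.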

pipeline r pipe naming-conventions operators

and*_*ewH

lucky-day

12
Score

1
Answers

211
Views

How to combine features with outputs of different dimensions using scikit-learn

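I am using scikit-learn's Pipeline and FeatureUnion to extract features from different inputs. Each sample (instance) in my dataset refers to documents of different lengths. My goal is to compute the top tfidf for each document independently, but I keep getting this error message:

ValueError: blocks[0,:] has incompatible row dimensions. Got blocks[0,1].shape[0] == 1, expected 2000.

2000 is the size of the training data. This is the main code: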

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC

# ItemSelector, book_content_count and my_stopword_list are defined elsewhere.
book_summary = Pipeline([
    ('selector', ItemSelector(key='book')),
    ('tfidf', TfidfVectorizer(analyzer='word', ngram_range=(1, 3), min_df=1, lowercase=True, stop_words=my_stopword_list, sublinear_tf=True))
])

book_contents = Pipeline([('selector3', book_content_count())])

ppl = Pipeline([
    ('feats', FeatureUnion([
        ('book_summary', book_summary),
        ('book_contents', book_contents)])),
    ('clf', SVC(kernel='linear', class_weight='balanced'))  # classifier with cross fold 5
])


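I wrote two classes to handle each pipeline feature. My problem is with the book_contents pipeline, which processes each sample and returns the TF-IDF matrix for each book independently.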

class book_content_count():
    def count_contents2(self, bookid):
        book = open('C:/TheCorpus/' + str(int(bookid)) + '_book.csv', 'r')
        book_data = pd.read_csv(book, header=0, delimiter=',', encoding='latin1', error_bad_lines=False, dtype=str)
        # Flatten the text column of the CSV just read into a single string.
        corpus = (str([book_data['text']]).strip('[]'))
        return corpus

    def transform(self, data_dict, y=None):
        data_dict['bookid']  # from here take the name
        text = data_dict['bookid'].apply(self.count_contents2)
        vec_pipe= Pipeline([('vec', TfidfVectorizer(min_df = 1,lowercase …
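
One way to read the error: FeatureUnion stacks the outputs of its transformers side by side, so every branch has to return exactly one row per input sample. The message says the second block had 1 row where 2000 were expected. Below is a minimal sketch of a union whose branches both keep one row per sample. `ColumnSelector`, the column names and the toy data are assumptions made for illustration; they are not the asker's code.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline


class ColumnSelector(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for ItemSelector: picks one DataFrame column."""

    def __init__(self, key):
        self.key = key

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Returns one entry per input row, so the row count is preserved.
        return X[self.key]


# Toy data (made up): one row per book.
data = pd.DataFrame({
    "book": ["a short summary", "another summary text", "third summary"],
    "content": ["full text one", "full text two words", "full text three"],
})

union = FeatureUnion([
    ("summary", Pipeline([
        ("sel", ColumnSelector("book")),
        ("tfidf", TfidfVectorizer()),
    ])),
    ("content", Pipeline([
        ("sel", ColumnSelector("content")),
        ("tfidf", TfidfVectorizer()),
    ])),
])

# Both branches yield a matrix with len(data) rows, so stacking works.
features = union.fit_transform(data)
print(features.shape)
```

Each branch here returns a matrix with `len(data)` rows, so the horizontal stack succeeds. A transformer that collapses all books into a single row would fail at this step.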


pipeline numpy python-3.x scikit-learn neuraxle

Abr*_*ial

2019 10-13

12
Score

1
Answers

798
Views

How to delete an Azure Pipeline cache without changing the cache key

I have a task that creates a cache:

- task: Cache@2
  inputs:
    key: 'sonarCache'
    path: $(SONAR_CACHE)
    cacheHitVar: CACHE_RESTORED
  displayName: Cache Sonar packages


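However, the cache is corrupted. So how can I run this pipeline while telling it to ignore any existing cache?

For some reason, I cannot change the cache key sonarCache.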

pipeline azure azure-devops azure-pipelines azure-pipelines-yaml

TSR*_*TSR

lucky-day

12
Score

1
Answers

6484
Views

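MSBuild target PipelinePreDeployCopyAllFilesToOneFolder cannot be found when deploying

Deploying a web application project from VS2010 RTM causes an error in MSBuild. It complains that the PipelinePreDeployCopyAllFilesToOneFolder target cannot be found.

Is there a way to diagnose this further?

Thanks.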

msbuild pipeline visual-studio-2010

msa*_*ara

2010 06-06

11
Score

2
Answers

10k
Views

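Background

I am working on "modernizing" an existing PHP-driven website. The site started out as a static site with a handful of PHP methods. It now has a mobile web app, multiple models, and a lot of dynamic content. Over time, however, the structure of the application itself has not changed much from when it was a largely static site, so it is now riddled with include files, has no separation of application and presentation logic, and so on. It is a mess to work on. So, as we prepare for an upcoming upgrade to a growing ecosystem, I am reorganizing everything and redeveloping much of the pre-existing functionality. To start, I am recoding everything to fit an MVC architecture. Although I am working in PHP, most of my background is in Ruby and Node, hence my question:

The actual question

I really like Rails' asset pipeline, and since the site I am currently working on (see the background above) has about 10 different stylesheets and even more JavaScript files, I would really like to implement some kind of asset manager as I convert the site to an MVC setup.

I found Assetic, which seems quite interesting, but I do not quite understand the best way to integrate it into a templating system (I am not using any pre-built templating such as Twig, which is all I can find reference material for) or to have it serve assets dynamically.

I also found something called Pipe: https://github.com/CHH/pipe, which looks like a very close port of Sprockets, but I could not get it running properly.

Are there any applications that implement Assetic (or Pipe), or some other asset packager that does not depend on an existing template engine such as Twig, that I could take a look at?

Really, I am looking for something that can minify/combine multiple JS and CSS files and then cache them.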

php pipeline assets

Con*_*ham

2013 09-10

11
Score

1
Answers

4757
Views

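Stop a PowerShell pipeline, ensuring end is called

What I am trying to do is get a function to stop the pipeline input when a time limit is reached. I created a test function as follows: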

function Test-PipelineStuff
{
    [cmdletbinding()]
    Param(
        [Parameter(ValueFromPipeLIne=$true)][int]$Foo,
        [Parameter(ValueFromPipeLIne=$true)][int]$MaxMins
    )

    begin { 
        "THE START" 
        $StartTime = Get-Date
        $StopTime = (get-date).AddMinutes($MaxMins)
        "Stop time is: $StopTime"
    } 

    process 
    {  
        $currTime = Get-Date
        if( $currTime -lt $StopTime ){
            "Processing $Foo"            
        }
        else{
            continue;
        }
    }

    end { "THE END" }
}


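This definitely stops the pipeline, but it never calls my "end {}" block, which is crucial in this case. Does anyone know why my "end {}" block is not called when I use "continue" to stop the pipeline? The behavior seems to be the same if I throw a PipelineStoppedException.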

powershell pipeline powershell-4.0

cam*_*.rw

2015 06-04

11
Score

1
Answers

764
Views

Metrics after using a classifier in a pipeline

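I am continuing my investigation of pipelines. My goal is to perform every step of machine learning using only a pipeline. That would be more flexible and would make it easier to adapt my pipeline to other use cases. So here is what I did:

Step 1: Fill NaN values
Step 2: Convert categorical values to numbers
Step 3: Classifier
Step 4: GridSearch
Step 5: Add metrics (failed)

Here is my code: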

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_selection import SelectKBest
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
from sklearn.metrics import f1_score


class FillNa(BaseEstimator, TransformerMixin):

    def transform(self, x, y=None):
        non_numerics_columns = x.columns.difference(
            x._get_numeric_data().columns)
        for column in x.columns:
            if column in non_numerics_columns:
                x.loc[:, column] = x.loc[:, …
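
Metric functions such as f1_score and confusion_matrix are plain functions, not transformers or estimators, so they cannot be added as a pipeline step. A common pattern is to fit the search and then score its predictions afterwards. Below is a minimal sketch of steps 1 to 5 under that assumption. The toy DataFrame, the column names and the parameter grid are made up for illustration, and SimpleImputer and OneHotEncoder are swapped in for the custom FillNa and LabelEncoder steps.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Tiny made-up dataset, for illustration only.
df = pd.DataFrame({
    "age": [22, 35, np.nan, 41, 29, 53, 38, np.nan, 47, 31, 26, 60],
    "city": ["a", "b", "a", "b", "b", "a", "b", "a", "a", "b", "a", "b"],
    "label": [0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1],
})
X, y = df[["age", "city"]], df["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Steps 1 and 2: fill NaN in the numeric column, encode the categorical one.
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# Step 3: classifier as the final pipeline step.
pipe = Pipeline([
    ("prep", preprocess),
    ("clf", RandomForestClassifier(random_state=0)),
])

# Step 4: grid search over the whole pipeline.
search = GridSearchCV(pipe, {"clf__n_estimators": [10, 50]}, cv=2)
search.fit(X_train, y_train)

# Step 5: metrics are computed outside the pipeline, from its predictions.
y_pred = search.predict(X_test)
f1 = f1_score(y_test, y_pred, zero_division=0)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(search.best_params_, f1)
print(cm)
```

Keeping the metrics outside the pipeline means the same fitted `search` object can be scored with any number of metric functions without refitting.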


python pipeline machine-learning scikit-learn grid-search

Jer*_*uez

2018 08-25

11
Score

1
Answers

742
Views