Posts by met*_*rsk

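When should I use classes in Python?

I have been programming in Python for about two years, mostly data work (pandas, mpl, numpy), plus automation scripts and small web apps. I'm trying to become a better programmer and deepen my Python knowledge, and one thing that bothers me is that I have never used a class (apart from copying random Flask code for small web apps). I generally understand what classes are, but I can't see why I would need one instead of a simple function.

To make my question more specific: I write a lot of automated reports, which always involve pulling data from multiple data sources (mongo, sql, postgres, apis), doing a lot or a little data munging and formatting, writing the data to csv/excel/html, and sending it out by email. The scripts range from ~250 to ~600 lines. Is there any reason for me to use classes for this, and why?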
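For illustration only, here is one possible shape such a report script could take if it were grouped into a class. The names `Report`, `extract`, `transform` and `to_csv` are hypothetical and do not come from the question; this is a sketch of how shared state (the rows) can travel with the steps that act on it, not a claim that a class is required.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Report:
    """Hypothetical container that keeps a report's rows next to its steps."""
    name: str
    rows: list = field(default_factory=list)

    def extract(self, sources):
        # Each source is any callable returning a list of dict rows.
        for source in sources:
            self.rows.extend(source())
        return self

    def transform(self, func):
        # Apply one formatting step to every row.
        self.rows = [func(row) for row in self.rows]
        return self

    def to_csv(self):
        # Render the rows as CSV text; an empty report yields an empty string.
        if not self.rows:
            return ""
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=list(self.rows[0]))
        writer.writeheader()
        writer.writerows(self.rows)
        return buffer.getvalue()


def fake_source():
    # Stand-in for a database or API call.
    return [{"user": "john", "views": 3}, {"user": "jane", "views": 5}]


report = Report("daily").extract([fake_source])
report.transform(lambda row: {**row, "user": row["user"].title()})
csv_text = report.to_csv()
```

The same steps work equally well as plain functions; the class mainly pays off when several reports share the same steps but differ in sources or formatting.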

python oop

met*_*rsk

2015 10-12

124
votes

4
answers

50k
views

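How to use wildcards to `cp` a group of files with the AWS CLI

I'm having trouble using * in the AWS CLI to select a subset of files from a certain bucket.

Adding * to the path like this does not seem to work: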

aws s3 cp s3://data/2016-08* .
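The AWS CLI does not expand shell-style wildcards inside an S3 URI. `aws s3 cp` instead takes `--recursive` together with `--exclude` and `--include` filters, where later filters take precedence over earlier ones. The sketch below builds such a command as an argument list and mimics the filter order on a list of made-up key names, so the selection logic can be checked without touching a bucket. The function `select_keys` and the sample keys are hypothetical.

```python
from fnmatch import fnmatchcase

# Equivalent CLI invocation, expressed as an argument list (e.g. for subprocess.run).
command = [
    "aws", "s3", "cp", "s3://data/", ".",
    "--recursive",
    "--exclude", "*",          # start by excluding everything
    "--include", "2016-08*",   # then re-include the wanted keys
]


def select_keys(keys, filters):
    """Apply (action, pattern) filters in order; a later match overrides an earlier one."""
    selected = []
    for key in keys:
        included = True  # with no matching filter, a key is included
        for action, pattern in filters:
            if fnmatchcase(key, pattern):
                included = action == "include"
        if included:
            selected.append(key)
    return selected


sample_keys = ["2016-08-01.csv", "2016-08-15.csv", "2016-09-01.csv"]
chosen = select_keys(sample_keys, [("exclude", "*"), ("include", "2016-08*")])
```

Swapping the two filters would exclude everything, because the trailing `--exclude "*"` would override the earlier include.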

amazon-s3 amazon-web-services aws-cli

met*_*rsk

2016 12-13

79
votes

3
answers

40k
views

Pandas groupby with bin counts

I have a DataFrame that looks like this:

+----------+---------+-------+
| username | post_id | views |
+----------+---------+-------+
| john     |       1 |     3 |
| john     |       2 |    23 |
| john     |       3 |    44 |
| john     |       4 |    82 |
| jane     |       7 |     5 |
| jane     |       8 |    25 |
| jane     |       9 |    46 |
| jane     |      10 |    56 |
+----------+---------+-------+

I would like to transform it to count the views that fall into certain bins:

+------+------+-------+-------+--------+
|      | 1-10 | 11-25 | 25-50 | 51-100 |
+------+------+-------+-------+--------+ …
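A minimal sketch of one way to get per-user bin counts, using `pd.cut` to label each row and `pd.crosstab` to count. The bin edges and the label `26-50` are assumptions read off the column headers above (the desired output is truncated in the question), so adjust them to the real requirements.

```python
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "username": ["john"] * 4 + ["jane"] * 4,
    "post_id": [1, 2, 3, 4, 7, 8, 9, 10],
    "views": [3, 23, 44, 82, 5, 25, 46, 56],
})

# Assumed edges: intervals are (0, 10], (10, 25], (25, 50], (50, 100].
bins = [0, 10, 25, 50, 100]
labels = ["1-10", "11-25", "26-50", "51-100"]

# Label every post with its bin, then count posts per user and bin.
df["bin"] = pd.cut(df["views"], bins=bins, labels=labels)
counts = pd.crosstab(df["username"], df["bin"])
```

`df.groupby(["username", "bin"]).size().unstack()` is an alternative that keeps the computation inside `groupby`.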

python dataframe pandas pandas-groupby

met*_*rsk

2017 10-20

26
votes

1
answer

20k
views

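Get lines and paragraphs, not symbols, from Google Vision API OCR on a PDF

I am trying to use the PDF/TIFF document text detection that the Google Cloud Vision API now supports. Using their sample code, I am able to submit a PDF and receive a JSON object with the extracted text. My problem is that the JSON file saved to GCS only contains bounding boxes and text for "symbols", i.e. each character in each word. This makes the JSON object unwieldy and very hard to work with. I would like to get the text and bounding boxes for "LINES", "PARAGRAPHS" and "BLOCKS", but I can't find a way to do that through the AsyncAnnotateFileRequest() method.

The sample code is as follows: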

from google.cloud import vision


def async_detect_document(gcs_source_uri, gcs_destination_uri):
    """OCR with PDF/TIFF as source files on GCS"""
    # Supported mime_types are: 'application/pdf' and 'image/tiff'
    mime_type = 'application/pdf'

    # How many pages should be grouped into each json output file.
    batch_size = 2

    client = vision.ImageAnnotatorClient()

    feature = vision.types.Feature(
        type=vision.enums.Feature.Type.DOCUMENT_TEXT_DETECTION)

    gcs_source = vision.types.GcsSource(uri=gcs_source_uri)
    input_config = vision.types.InputConfig(
        gcs_source=gcs_source, mime_type=mime_type)

    gcs_destination = vision.types.GcsDestination(uri=gcs_destination_uri)
    output_config = vision.types.OutputConfig(
        gcs_destination=gcs_destination, batch_size=batch_size)

    async_request = vision.types.AsyncAnnotateFileRequest(
        features=[feature], input_config=input_config,
        output_config=output_config)

    operation = client.async_batch_annotate_files(
        requests=[async_request])

    print('Waiting for the operation …
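As a rough sketch of post-processing the saved output, the function below rebuilds paragraph text from symbols. It assumes each response in the JSON file carries a `fullTextAnnotation` nested as pages, blocks, paragraphs, words, symbols, with a `boundingBox` on each paragraph. Those field names are assumptions about the JSON layout and should be checked against a real output file. Words are joined with single spaces, which ignores any detected line breaks or hyphenation.

```python
def paragraph_texts(response):
    """Yield (text, bounding_box) for each paragraph in one response dict."""
    annotation = response.get("fullTextAnnotation", {})
    for page in annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                # A word is the concatenation of its symbols' text.
                words = [
                    "".join(symbol.get("text", "") for symbol in word.get("symbols", []))
                    for word in paragraph.get("words", [])
                ]
                yield " ".join(words), paragraph.get("boundingBox")


# Hand-written stand-in for one entry of the output file's response list.
sample_response = {
    "fullTextAnnotation": {
        "pages": [{
            "blocks": [{
                "paragraphs": [{
                    "boundingBox": {"normalizedVertices": []},
                    "words": [
                        {"symbols": [{"text": "H"}, {"text": "i"}]},
                        {"symbols": [{"text": "y"}, {"text": "o"}, {"text": "u"}]},
                    ],
                }],
            }],
        }],
    },
}

paragraphs = list(paragraph_texts(sample_response))
```

The same loop can be stopped one level higher to collect block-level text instead.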

python google-cloud-platform google-cloud-vision

met*_*rsk

2018 09-02

18
votes

1
answer

3283
views

Why am I getting "undefined" in the console?

Here is my code:

var textArray = ['#text1', '#text2', '#text3', '#text4',
'#text5', '#text6', '#text7', '#text8']

$('#capture').click(function() {
    for (var i in textArray) {
      console.log($(i).offset());
    }
});

I'm not sure why I'm getting undefined in the console. I feel like I'm missing something very simple.

javascript jquery

met*_*rsk

lucky-day

12
votes

1
answer

445
views

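Using two different Python distributions

I currently have Continuum Analytics' Python distribution (called Anaconda) downloaded and in use on my computer. My problem is that I want to use virtualenv for a Flask project, and Anaconda flashes a warning that virtualenv is not supported. Is there any way to run two distributions, stock Python and Anaconda, on the same computer?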

python software-distribution anaconda

met*_*rsk

lucky-day

11
votes

1
answer

6426
views

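Python won't write to file

I'm trying to write a pretty-printed email to a .txt file so that I can get a better look at what I want to parse.

Here is that part of my code: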

result, data = mail.uid('search', None, "(FROM 'tiffany@e.tiffany.com')") # search and return uids instead
latest_email_uid = data[0].split()[-1]
result, data = mail.uid('fetch', latest_email_uid, '(RFC822)')
raw_email = data[0][1]

html = raw_email
soup = BS(html)
pretty_email = soup.prettify('utf-8')

f = open("da_email.txt", "w")
f.write(pretty_email)
f.close
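One detail in the snippet above that may or may not be the cause: `f.close` without parentheses only references the method and never calls it, so buffered data is not guaranteed to be flushed at that point. A minimal sketch using a context manager follows; the helper name `save_pretty_email` is hypothetical, and it returns the absolute path so it is easy to confirm which directory the file actually landed in.

```python
import os


def save_pretty_email(pretty_email, path="da_email.txt"):
    """Write text or UTF-8 bytes to path and return the file's absolute path."""
    if isinstance(pretty_email, bytes):
        # prettify('utf-8') yields encoded bytes on Python 3, so write in binary mode.
        with open(path, "wb") as f:
            f.write(pretty_email)
    else:
        with open(path, "w", encoding="utf-8") as f:
            f.write(pretty_email)
    # The file is flushed and closed once the with block exits.
    return os.path.abspath(path)


written_to = save_pretty_email(b"<html>\n</html>", "da_email_example.txt")
```

Printing `written_to` shows where the file went, which helps when the script's working directory is not the one being inspected.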

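I'm not getting any errors, but the data is not being written to the file. I know the data is stored properly in the pretty_email variable because I can print it out in the console.

Any ideas?

My updated code, which still doesn't work: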

result, data = mail.uid('search', None, "(FROM 'tiffany@e.tiffany.com')") # search and return uids instead
latest_email_uid = data[0].split()[-1]
result, data = mail.uid('fetch', latest_email_uid, '(RFC822)')
raw_email = data[0][1]

html = raw_email
soup = BS(html)
pretty_email = soup.prettify('utf-8')

with …

python io parsing file beautifulsoup

met*_*rsk

2013 10-11

10
votes

1
answer

30k
views

Automating standard Jupyter/IPython notebook imports

For at least 99% of my Jupyter/IPython notebooks I use the following imports:

import pandas as pd
from pandas.io.json import json_normalize
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np

from bson import json_util, ObjectId
import json

from datetime import datetime, timedelta
import pytz

pd.set_option('max_columns', 50)

mpl.style.use('ggplot')

%pylab inline

Has anyone found any kind of solution that would let me automate this or create some sort of macro?
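One commonly used mechanism is IPython's startup directory: `.py` and `.ipy` files placed in a profile's `startup` folder run automatically whenever a kernel for that profile starts. The sketch below writes such a file; the file name `00-common-imports.py`, the helper `install_startup_script`, and the shortened import list are illustrative choices, not part of the question.

```python
from pathlib import Path

# Hypothetical file name; startup files run in lexicographic order.
STARTUP_FILE = "00-common-imports.py"

COMMON_IMPORTS = """\
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
"""


def install_startup_script(startup_dir, source=COMMON_IMPORTS):
    """Write the import block into an IPython startup directory and return its path."""
    startup_dir = Path(startup_dir).expanduser()
    startup_dir.mkdir(parents=True, exist_ok=True)
    target = startup_dir / STARTUP_FILE
    target.write_text(source, encoding="utf-8")
    return target


# Typical location for the default profile (not executed here):
# install_startup_script("~/.ipython/profile_default/startup")
```

Magics such as `%pylab inline` are not valid Python, so they would need to go in a `.ipy` startup file instead of a `.py` one.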

ipython ipython-notebook jupyter

met*_*rsk

lucky-day

9
votes

2
answers

6979
views

Pivoting a Pandas DataFrame with duplicate index values

I have a DataFrame containing a row for each user who joined my site and made a purchase.

+---+-----+--------------------+---------+--------+-----+
|   | uid |        msg         |  _time  | gender | age |
+---+-----+--------------------+---------+--------+-----+
| 0 |   1 | confirmed_settings | 1/29/15 | M      |  37 |
| 1 |   1 | sale               | 4/13/15 | M      |  37 |
| 2 |   3 | confirmed_settings | 4/19/15 | M      |  35 |
| 3 |   4 | confirmed_settings | 2/21/15 | M      |  21 |
| 4 |   5 | confirmed_settings | 3/28/15 | M      |  18 | …

python pandas

met*_*rsk

2015 04-29

8
votes

2
answers

20k
views