Posts by fma*_*ues

How can I extract tables from a PDF with Python?

I have thousands of PDF files that consist only of tables, with the following structure:

However, even though the structure is fairly regular, I can't read the tables without losing that structure.

I tried PyPDF2, but the extracted data comes out completely scrambled.

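import PyPDF2

# Open the PDF in binary mode (the file name must be a string literal).
with open('pdf_file.pdf', 'rb') as pdfFileObj:
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)  # legacy PyPDF2 1.x API
    pageObj = pdfReader.getPage(0)  # first page only
    text = pageObj.extractText()  # extract while the file is still open

print(text)  # whole page as one string; the table layout is lost
print(text.split('\n')[0])
print(text.split('/')[0])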

I also tried Tabula, but it only reads the headers (not the contents of the tables).

from tabula import read_pdf

# Option 1: JSON output; reads all the headers but none of the content
pdfFile1 = read_pdf('pdf_file.pdf', output_format='json')
# Option 2: one DataFrame per table; reads only the first header and a few lines of content
pdfFile2 = read_pdf('pdf_file.pdf', multiple_tables=True)

Any ideas?
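If the text is extracted with its layout preserved (for example with Poppler's `pdftotext -layout`), table columns typically come out separated by runs of spaces, and the rows can be rebuilt by splitting on those gaps. The following is a minimal sketch of that post-processing step only, assuming that cells never contain two consecutive spaces; the helper names `split_layout_rows` and `rows_to_csv` and the sample rows are hypothetical, not taken from the actual files.

```python
import csv
import io
import re
from typing import List

# Assumption: columns are separated by two or more whitespace characters,
# while a single space belongs to the cell value (e.g. "Fund A").
COLUMN_GAP = re.compile(r"\s{2,}")


def split_layout_rows(text: str) -> List[List[str]]:
    """Turn layout-preserved text into a list of rows, each a list of cells."""
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines between rows or pages
        rows.append(COLUMN_GAP.split(stripped))
    return rows


def rows_to_csv(rows: List[List[str]]) -> str:
    """Serialize the rows as CSV text so each table keeps its column structure."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


# Hypothetical sample of layout-preserved text.
sample = """Name        Date          Value
Fund A      2019-05-08    10.5
Fund B      2019-05-09    8.25
"""

rows = split_layout_rows(sample)
print(rows[1])  # ['Fund A', '2019-05-08', '10.5']
print(rows_to_csv(rows), end="")
```

Empty cells and cells that wrap onto a second line break this heuristic, so it only suits simple, regularly spaced tables.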

python pdf

fma*_*ues

2019-05-08

Score: 7 | Answers: 1 | Views: 40k

Selenium downloads a different captcha image than the one shown in the browser

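I'm trying to download a captcha image with Selenium, but the image I get is different from the one displayed in the browser. If I download the image again without changing anything in the browser, I get yet another image.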

Any ideas?

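import urllib.request  # urlretrieve lives in the urllib.request submodule

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("http://sistemas.cvm.gov.br/?fundosreg")

# The form is inside the frame named "Main"; switch to it first.
driver.switch_to.frame("Main")

# Locate the captcha <img> and download the file its src points to.
img = driver.find_element(By.XPATH, ".//*[@id='trRandom3']/td[2]/img")
src = img.get_attribute('src')
urllib.request.urlretrieve(src, "captcha.jpeg")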
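A likely cause is that `urlretrieve` sends a fresh HTTP request that does not carry the browser's session cookies, so the server treats it as a new visitor and generates a new captcha. One way to check this is to copy the cookies returned by `driver.get_cookies()` (a list of dicts with `name` and `value` keys) into the download request. The sketch below only builds that request; the helper names are hypothetical. If the server regenerates the captcha on every request even within one session, this still will not match, and saving the element already rendered in the browser with `img.screenshot("captcha.png")` avoids the second request altogether.

```python
import urllib.request
from typing import Any, Dict, List, Optional


def build_cookie_header(cookies: List[Dict[str, Any]]) -> str:
    """Join Selenium-style cookie dicts into a single Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def make_captcha_request(src: str, cookies: List[Dict[str, Any]],
                         user_agent: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for the captcha URL that reuses the browser session."""
    headers = {"Cookie": build_cookie_header(cookies)}
    if user_agent:
        # Assumption: some servers may also tie the session to the User-Agent.
        headers["User-Agent"] = user_agent
    return urllib.request.Request(src, headers=headers)


# Usage with a live driver (not executed here):
#   req = make_captcha_request(src, driver.get_cookies(),
#                              driver.execute_script("return navigator.userAgent"))
#   with urllib.request.urlopen(req) as resp, open("captcha.jpeg", "wb") as f:
#       f.write(resp.read())
```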

python selenium

fma*_*ues

2019-01-23

Score: 5 | Answers: 3 | Views: 7331

Calculating monthly returns with data.frames in R

I want to calculate the monthly returns of a list of securities over a period of time. My data has the following structure:

date   name  value
"2014-01-31"   a    10.0
"2014-02-28"   a    11.1
"2014-03-31"   a    12.1
"2014-04-30"   a    11.9
"2014-05-31"   a    11.5
"2014-06-30"   a    11.88
"2014-01-31"   b    6.0
"2014-02-28"   b    8.5
"2014-03-31"   b    8.2
"2014-04-30"   b    8.8
"2014-05-31"   b    8.3
"2014-06-30"   b    8.9

The code I tried:

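library(plyr)  # provides ddply()

database$date <- as.Date(database$date)
# Simple return between the first and second value of each group
monthlyReturn <- function(df) { (df$value[2] - df$value[1]) / df$value[1] }
mon.returns <- ddply(database, .(name, date), monthlyReturn)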

However, the output in the "monthlyReturn" column is zero.

Any ideas?
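The intended quantity is the simple monthly return of each security, r_t = (P_t - P_{t-1}) / P_{t-1}, where P_t is `value` in month t, computed within each `name` in date order. In the posted `ddply` call the data are split by both `name` and `date`, so every group holds a single row and `df$value[2]` does not exist; the lag has to be taken within groups split by `name` alone. The sketch below shows that per-security calculation in Python with only the standard library, using the sample data from the question; the function name `monthly_returns` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (date, name, value) rows copied from the sample data above.
DATA: List[Tuple[str, str, float]] = [
    ("2014-01-31", "a", 10.0), ("2014-02-28", "a", 11.1),
    ("2014-03-31", "a", 12.1), ("2014-04-30", "a", 11.9),
    ("2014-05-31", "a", 11.5), ("2014-06-30", "a", 11.88),
    ("2014-01-31", "b", 6.0), ("2014-02-28", "b", 8.5),
    ("2014-03-31", "b", 8.2), ("2014-04-30", "b", 8.8),
    ("2014-05-31", "b", 8.3), ("2014-06-30", "b", 8.9),
]


def monthly_returns(
    rows: List[Tuple[str, str, float]]
) -> Dict[str, List[Tuple[str, Optional[float]]]]:
    """Return, per name, (date, return) pairs; the first month has no return."""
    by_name = defaultdict(list)
    for date, name, value in rows:
        by_name[name].append((date, value))

    result = {}
    for name, series in by_name.items():
        series.sort()  # ISO-format dates sort chronologically as strings
        returns = [(series[0][0], None)]  # no previous price for the first month
        for (_, prev), (date, curr) in zip(series, series[1:]):
            returns.append((date, (curr - prev) / prev))
        result[name] = returns
    return result


for name, series in monthly_returns(DATA).items():
    for date, r in series:
        print(name, date, "NA" if r is None else f"{r:.4f}")
```

Grouping by security and ordering by date before differencing is the key design choice; the first month of each security has no prior price, so it is reported as missing rather than zero.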

r time-series financial plyr dataframe

fma*_*ues

lucky-day

Score: 0 | Answers: 1 | Views: 1384