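In Python 3, I have a PDF file "Ativos_Fevereiro_2018_servidores_rj.pdf" with 6,041 pages. I am on a computer running Ubuntu.
At the top of each page there are two lines of text. Below them is a table with a header and two columns. Each table has 36 rows, fewer on the last page.
At the end of each page, after the table, there is one more line of text.
I want to create a CSV from this PDF, taking only the tables on the pages and ignoring the text before and after them.
At first I tried tabula-py, but it generates an empty file: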
from tabula import convert_into
convert_into("Ativos_Fevereiro_2018_servidores_rj.pdf", "test_s.csv", output_format="csv")
Please, does anyone know whether this can be done with tabula-py?
Or is there another way to convert this kind of PDF to CSV?
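One thing that may be worth checking: as far as I know, tabula-py reads only the first page unless a `pages` argument is passed. A minimal sketch, assuming tabula-py and Java are installed; the wrapper name `pdf_tables_to_csv` is hypothetical, and whether this alone fixes the empty file depends on the PDF itself:

```python
def pdf_tables_to_csv(pdf_path, csv_path, pages="all"):
    """Write every table tabula finds on the requested pages to one CSV file."""
    # Imported lazily so this module still loads where tabula-py is missing.
    from tabula import convert_into

    # pages="all" asks tabula to process the whole document, not just page 1.
    convert_into(pdf_path, csv_path, output_format="csv", pages=pages)
```

It would be called as `pdf_tables_to_csv("Ativos_Fevereiro_2018_servidores_rj.pdf", "test_s.csv")`. The lines of text above and below each table may still need to be filtered out of the result afterwards.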
In Python 3 and pandas I have a dataframe with a column cpf with codes
candidatos_2014.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 26245 entries, 0 to 1063
Data columns (total 7 columns):
uf 26245 non-null object
cargo 26245 non-null object
nome_completo 26245 non-null object
cpf 26245 non-null object
nome_urna 26245 non-null object
partido_eleicao 26245 non-null object
situacao 26245 non-null object
dtypes: object(7)
memory usage: 1.6+ MB
The codes are numbers like these: "00229379273", "84274662268", "09681949153", "53135636534"...
I saved as CSV
candidatos_2014.to_csv('candidatos_2014.csv')
I use Ubuntu and LibreOffice. But when I open the file, cpf …
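The question is cut off here, so the exact symptom is not known. A minimal sketch, assuming the concern is that codes such as "00229379273" lose their leading zeros when a spreadsheet guesses a numeric type. It uses only the standard `csv` module to show that the file itself keeps the digits when the values are strings; the sample name and the helper `write_quoted_csv` are hypothetical:

```python
import csv
import io

# Two codes taken from the question; the names are placeholders.
rows = [
    {"nome_completo": "EXEMPLO A", "cpf": "00229379273"},
    {"nome_completo": "EXEMPLO B", "cpf": "09681949153"},
]


def write_quoted_csv(records, fieldnames, handle):
    """Write records with every field quoted, so codes stay visibly textual."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames, quoting=csv.QUOTE_ALL)
    writer.writeheader()
    writer.writerows(records)


buffer = io.StringIO()
write_quoted_csv(rows, ["nome_completo", "cpf"], buffer)
csv_text = buffer.getvalue()

# Reading the text back as plain strings keeps the leading zeros intact.
read_back = list(csv.DictReader(io.StringIO(csv_text)))
print(read_back[0]["cpf"])
```

If the raw file already contains the zeros, the remaining step is on the spreadsheet side: the column has to be imported as text rather than as a number. pandas' `to_csv` also accepts a `quoting` argument that takes the same `csv` constants.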
In Python 3 and pandas, I have this dataframe:
df_projetos_api_final.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 93631 entries, 1 to 93667
Data columns (total 21 columns):
AnoMateria 93631 non-null object
CodigoMateria 93631 non-null object
DescricaoIdentificacaoMateria 93631 non-null object
DescricaoSubtipoMateria 93631 non-null object
IndicadorTramitando 93631 non-null object
NomeCasaIdentificacaoMateria 93631 non-null object
NumeroMateria 93631 non-null object
ApelidoMateria 891 non-null object
DataApresentacao 93631 non-null object
DataLeitura 54213 non-null object
EmentaMateria 93631 non-null object
ExplicacaoEmentaMateria 9461 non-null object
IndicadorComplementar 93631 non-null object
DescricaoNatureza 54352 non-null object
NomeAutor 93100 non-null object
IndicadorOutrosAutores …
In Python 3, I wrote a program to extract posts and likes from Twitter:
import tweepy
import pandas as pd
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
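This function receives a profile's classification (used only to organize the database) and the profile's name. It builds a list of dictionaries and returns it: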
def linhadotempo(posicao, valor):
    tela = api.user_timeline(valor)
    bolha = []
    for status in tela:
        dicionario = {"nome": valor, "posicionamento": posicao, "posts_links": status.text, "curtidas": status.favorite_count}
        bolha.append(dicionario)
    return bolha
A list of Twitter profile names and their classification, then converted into a dataframe:
data = {
'nome': ['jeanwyllys_real', 'lucianagenro', 'jairbolsonaro', 'MBLivre'],
'posicionamento': ['esquerda', 'esquerda', 'direita', 'direita']
}
perfis = pd.DataFrame(data, columns=['nome','posicionamento'])
perfis.reset_index()
index nome posicionamento
0 …
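The rest of the question is cut off, so what follows is only a guess at the next step. Since `linhadotempo` returns one list of dictionaries per profile, a sketch of combining the results for several profiles into a single flat list might look like this. The names `collect_rows` and `fake_linhadotempo` are hypothetical; the stand-in function avoids any network call and simply mimics the shape of the real return value:

```python
from itertools import chain


def collect_rows(profiles, fetch):
    """Call fetch(posicao, nome) for each profile and flatten the results."""
    return list(
        chain.from_iterable(fetch(posicao, nome) for nome, posicao in profiles)
    )


def fake_linhadotempo(posicao, valor):
    # Stand-in for linhadotempo: same keys, invented post text and like counts.
    return [
        {"nome": valor, "posicionamento": posicao,
         "posts_links": f"post {i}", "curtidas": i}
        for i in range(2)
    ]


profiles = [("jeanwyllys_real", "esquerda"), ("jairbolsonaro", "direita")]
rows = collect_rows(profiles, fake_linhadotempo)
print(len(rows))
```

A flat list of dictionaries like `rows` can be passed straight to `pd.DataFrame(rows)`, which gives one row per post.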
In Python 3 and pandas, I have this dataframe:
gastos_anuais.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 2 columns):
ano 5 non-null int64
valor_pago 5 non-null float64
dtypes: float64(1), int64(1)
memory usage: 280.0 bytes
gastos_anuais.reset_index()
index ano valor_pago
0 0 2014 13,082,008,854.37
1 3 2017 9,412,069,205.73
2 2 2016 7,617,420,559.22
3 1 2015 7,470,391,492.24
4 4 2018 7,099,199,179.11
I made a point plot:
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
sns.pointplot(x='ano', y='valor_pago', data=gastos_anuais)
plt.xticks(rotation=65)
plt.grid(True, linestyle="--")
plt.title("Gastos Destinados pelo Governo Federal (2014-2018)\n")
plt.xlabel("Anos")
plt.ylabel("Em bilhões …
Please, I am using Google Colab and Python 3.
I am running into a VersionConflict problem with fastprogress. I have this code:
!curl -s https://course.fast.ai/setup/colab | bash
import warnings
warnings.filterwarnings('ignore')
from fastai.vision import *
from fastai.metrics import error_rate
import fastai
print(f'fastai: {fastai.__version__}')
print(f'cuda: {torch.cuda.is_available()}')
---------------------------------------------------------------------------
VersionConflict Traceback (most recent call last)
<ipython-input-17-01736c3668f8> in <module>()
1 import warnings
2 warnings.filterwarnings('ignore')
----> 3 from fastai.vision import *
4 from fastai.metrics import error_rate
5 import fastai
7 frames
/usr/local/lib/python3.6/dist-packages/pkg_resources/__init__.py in resolve(self, requirements, env, installer, replace_conflicting, extras)
789 # Oops, the "best" so far conflicts with …
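The traceback is truncated above, so the conflicting requirement itself is not visible. A small sketch for checking which versions are actually installed, assuming Python 3.8 or newer where `importlib.metadata` is available (the traceback shows Python 3.6, where `!pip show fastprogress` in a notebook cell is an alternative). The helper name `installed_versions` is hypothetical:

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["fastai", "fastprogress"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Comparing these versions with the requirement named in the full error message should show which package needs to be upgraded or pinned.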
Please, in Python 3 and sendgrid, I need to send an email to several addresses as Bcc. I have these addresses in a list. I am trying with Personalization like this:
import os
import json
from sendgrid import SendGridAPIClient
from sendgrid.helpers.mail import Mail, Personalization, From, To, Cc, Bcc
recips = ['email1@gmail.com', 'email2@gmail.com', 'email2@gmail.com']
new_email = Mail(from_email='emailsender@gmail.com',
                 to_emails='one_valid_email@gmail.com',
                 subject="email subject",
                 html_content="Hi<br><br>This is a test")
personalization = Personalization()
for bcc_addr in recips:
    personalization.add_bcc(Bcc(bcc_addr))
new_email.add_personalization(personalization)
try:
    sg = SendGridAPIClient('API_KEY')
    response = sg.send(new_email)
    print(response.status_code)
    print(response.body)
    print(response.headers)
except Exception as e:
    print(e.to_dict)
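In tests with real email addresses, I get the error HTTP Error 400: Bad Request, with this dictionary: {'errors': [{'message': 'The to array is required for all personalization objects, and must have at least one email object with a valid email address.', 'field': 'personalizations.0.to', 'help': 'http://sendgrid.com/docs/API_Reference/Web_API_v3/Mail/errors.html#message.personalizations.to'}]}
Please, does anyone know why?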
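The error points at `personalizations.0.to`: the Personalization built in the loop carries only Bcc entries and no `to` recipient. A minimal sketch of the request body that the v3 Mail Send endpoint expects, built as a plain dictionary so it runs without the sendgrid package. The function name `build_bcc_payload` is hypothetical, and dropping repeated addresses is an assumption, based on my understanding that SendGrid rejects duplicates within one personalization:

```python
def build_bcc_payload(sender, visible_to, bcc_list, subject, html):
    """Build a v3 Mail Send body with one visible recipient and unique Bcc."""
    seen = {visible_to.lower()}
    bcc = []
    for addr in bcc_list:
        key = addr.lower()
        if key not in seen:  # skip repeats and the visible recipient
            seen.add(key)
            bcc.append({"email": addr})

    # Every personalization needs its own "to" list alongside any "bcc".
    personalization = {"to": [{"email": visible_to}]}
    if bcc:
        personalization["bcc"] = bcc

    return {
        "personalizations": [personalization],
        "from": {"email": sender},
        "subject": subject,
        "content": [{"type": "text/html", "value": html}],
    }


recips = ['email1@gmail.com', 'email2@gmail.com', 'email2@gmail.com']
payload = build_bcc_payload('emailsender@gmail.com', 'one_valid_email@gmail.com',
                            recips, "email subject", "Hi<br><br>This is a test")
print(payload["personalizations"])
```

With the helper classes from the question, the equivalent step would presumably be calling `personalization.add_to(To('one_valid_email@gmail.com'))` before `new_email.add_personalization(personalization)`.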
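In Python 3 and pandas, I loaded several TXT files. They have no header and share the same structure: 46 columns, with the same kind of information in each column. An example of three cases: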
candidatos1 = pd.read_csv("candidatos_2014/consulta_cand_2014_AC.txt",sep=';', header=None, encoding = 'latin_1')
candidatos1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 621 entries, 0 to 620
Data columns (total 46 columns):
0 621 non-null object
1 621 non-null object
2 621 non-null int64
3 621 non-null int64
4 621 non-null object
5 621 non-null object
6 621 non-null object
7 621 non-null object
8 621 non-null int64
9 621 non-null object
10 621 non-null object
11 621 non-null …