How can I extract text from a PDF file in Python?
I tried the following:
import sys
import pyPdf  # legacy Python 2 library

def convertPdf2String(path):
    # Concatenate the text extracted from every page.
    content = ""
    pdf = pyPdf.PdfFileReader(file(path, "rb"))
    for i in range(0, pdf.getNumPages()):
        content += pdf.getPage(i).extractText() + " \n"
    # Replace non-breaking spaces and collapse runs of whitespace.
    content = " ".join(content.replace(u"\xa0", u" ").strip().split())
    return content

f = open('a.txt','w+')
f.write(convertPdf2String(sys.argv[1]).encode("ascii","xmlcharrefreplace"))
f.close()
But instead of readable text, the result looks like this:
728;~˚!""˘˙˝˛˛˛˛〜˘˛˙"˘"〜#$˙˚%&˘˛〜'˙% ˝˛˙~~'#$%&('%$&))$ $ +%#, - .+ &&˝())˝) ˝+ ,, - ./ 012)(˝)*˝+, - 3˙/ 0245)6#57 + 82,55)6#57 +,+ 2,+ /!#!!&˘˘1"%˘20˛˛307%4!˘"6˛ ˝˝&/&4"9%6%4%4&5˘2)˘˘˛%:6(
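For comparison, here is a minimal Python 3 sketch of the same extraction loop. It assumes the third-party pypdf package (the successor to pyPdf) is installed; the function names are hypothetical. Garbled output like the sample above may come from fonts in the PDF that carry no usable character mapping, in which case switching libraries may not help.

```python
def normalize_whitespace(text):
    """Replace non-breaking spaces and collapse runs of whitespace."""
    return " ".join(text.replace("\xa0", " ").split())


def extract_pdf_text(path):
    """Return the text of every page in the PDF at `path` as one string.

    Assumption: pypdf is installed. It is imported lazily so that the
    whitespace helper above stays usable without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() can return an empty result for image-only pages.
    pages = [page.extract_text() or "" for page in reader.pages]
    return normalize_whitespace("\n".join(pages))


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(extract_pdf_text(sys.argv[1]))
```

Keeping the whitespace clean-up separate from the extraction makes it possible to check that step without a PDF at hand.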
I have the following table:
CREATE TABLE Bable
(
    id int identity primary key,
    name varchar(20),
    about varchar(30)
);
INSERT INTO Bable (name,about) VALUES
    ('??? Name Firm 1','texttexttexttext'),
    ('??? Name Firm 2','texttexttexttext'),
    ('??? Name Firm 3','texttexttexttext'),
    ('??? Name Firm 4','texttexttexttext'),
    ('??? Name Firm 5','texttexttexttext'),
    ('??? Name Firm $1','texttexttexttext'),
    ('??? Name Firm $2','texttexttexttext'),
    ('??? Name Firm $3','texttexttexttext'),
    ('??? Name Firm 6','texttexttexttext'),
    ('??? Name Firm 7','texttexttexttext')
I can write a query like this:
SELECT * FROM Bable WHERE about = 'texttexttexttext'
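How can I change this query to return sorted results, so that rows whose name contains "$" come first, followed by rows whose name does not, with each group sorted by name in ascending order?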
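One possible approach is to sort on a CASE expression first and on name second. The sketch below tries the idea with Python's built-in sqlite3 module rather than SQL Server, so the table definition is simplified and the sample names are shortened; the same ORDER BY clause is expected to carry over to T-SQL, where "$" is not a LIKE wildcard either.

```python
import sqlite3


def fetch_sorted(rows):
    """Load (name, about) rows into an in-memory table and return the names
    with "$" names first, each group sorted by name ascending."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Bable (id INTEGER PRIMARY KEY, name TEXT, about TEXT)"
    )
    conn.executemany("INSERT INTO Bable (name, about) VALUES (?, ?)", rows)
    query = """
        SELECT name
        FROM Bable
        WHERE about = 'texttexttexttext'
        ORDER BY CASE WHEN name LIKE '%$%' THEN 0 ELSE 1 END,
                 name ASC
    """
    names = [row[0] for row in conn.execute(query)]
    conn.close()
    return names


# Shortened sample data, for illustration only.
sample = [
    ("Name Firm 2", "texttexttexttext"),
    ("Name Firm $2", "texttexttexttext"),
    ("Name Firm 1", "texttexttexttext"),
    ("Name Firm $1", "texttexttexttext"),
]
print(fetch_sorted(sample))
```

The CASE expression maps every row to 0 or 1, so the "$" group sorts ahead of the rest, and the second sort key orders the names inside each group.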
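join works, but I want to keep the double quotes. join gives me this
[ben,linda,john] but I want this
["ben", "linda", "john"]
This is crazy. I have spent more than two hours trying to solve this. I want to pass a list as a string variable. Why can't Terraform just accept my list as a string? Why is this so hard?
So I have
name = ["ben", "linda", "john"]
and I want to pass it to a variable used in Terraform
var.name
Why can't Terraform accept this?
I get an error saying it expected a string, and after searching everywhere I cannot find a solution online.
I have been able to get
[ ben,linda,john ] using join(",", var.name), but I want ["ben", "linda", "john"]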
$ terraform --version
Terraform v0.12.18
+ provider.aws v2.42.0
+ provider.template v2.1.2
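The difference between the two outputs is the difference between joining the elements and serializing the list. A small Python sketch of the distinction (the helper names are hypothetical): joining drops the quotes because it only concatenates the element values, whereas JSON encoding keeps them. Terraform 0.12 has a built-in jsonencode function that plays a similar role, although whether it fits this configuration is an assumption.

```python
import json

names = ["ben", "linda", "john"]


def join_plain(items):
    """Concatenate the element values; the quotes are not part of the values."""
    return "[" + ",".join(items) + "]"


def encode_quoted(items):
    """Serialize the list as JSON, which keeps each string quoted."""
    return json.dumps(items)


print(join_plain(names))     # [ben,linda,john]
print(encode_quoted(names))  # ["ben", "linda", "john"]
```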
I am trying to stream AWS CloudWatch logs to ES through Kinesis Firehose. The Terraform code below gives an error. Any suggestions? The error is:
resource "aws_s3_bucket" "bucket" {
  bucket = "cw-kinesis-es-bucket"
  acl    = "private"
}

resource "aws_iam_role" "firehose_role" {
  name = "firehose_test_role"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "firehose.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
}

resource "aws_elasticsearch_domain" "es" {
  domain_name           = "firehose-es-test"
  elasticsearch_version = "1.5"

  cluster_config {
    instance_type = "t2.micro.elasticsearch"
  }

  ebs_options {
    ebs_enabled = true
    volume_size = 10
  }

  advanced_options {
    "rest.action.multi.allow_explicit_index" = "true"
  }

  access_policies …
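Is there a better way to optimize the code below, so that I don't have to look up the availability zones again and again and can do it in one go instead? Since the region is a variable, I cannot hard-code the availability zones. Can you help? I want my public subnets to be /24.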
provider "aws" {
  region = var.region
}

resource "aws_vpc" "app_vpc" {
  cidr_block           = var.vpc_cidr
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = {
    Name = var.vpc_name
  }
}

data "aws_availability_zones" "available" {
  state = "available"
}

#provision public subnet
resource "aws_subnet" "public_subnet_01" {
  vpc_id            = aws_vpc.app_vpc.id
  cidr_block        = var.public_subnet_01
  availability_zone = data.aws_availability_zones.available.names[0]

  tags = {
    Name = "public_subnet_01"
  }

  depends_on = [aws_vpc_dhcp_options_association.dns_resolver]
}

resource "aws_subnet" "public_subnet_02" {
  vpc_id            = aws_vpc.app_vpc.id
  cidr_block        = var.public_subnet_02
  availability_zone = data.aws_availability_zones.available.names[1]
  tags = …

I have two files. domain.com/test2.php:
<div id="testDiv"></div>
<script src="http://domain.com/packages/jquery.js"></script>
<script>$("#testDiv").load("http://domain.com/test3.php", {var1:1, var2:2});</script>
and domain.com/test3.php:
<b>var1: <?php echo $var1; ?> , var2: <?php echo $var2; ?></b>
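In this case, domain.com/test2.php outputs
var1: 1 , var2: 2, as one would expect. But now suppose I want to create a test2.php on a subdomain. To avoid cross-domain scripting problems, I add this extra line at the start of sub.domain.com/test2.php: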
<script>document.domain = "domain.com";</script>
This extra line stops the cross-domain error from appearing, but the file no longer outputs var1: 1 , var2: 2. Why is that, and how can I fix it?
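I recently moved the variable values in my Terraform code into terraform.tfvars. I now get an error that is caused by the way I declare list and map variables. The code that raises the error is copied below: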
image_id = var.web_amis[var.region]
This is how I specify these variables in terraform.tfvars:
web_amis = ["ami-0dacb0c129b49f529", "ami-00068cd7555f543d5", ]
This is the error I get:
Error: Invalid index
on autoscaling.tf line 3, in resource "aws_launch_configuration" "web_lc":
3: image_id = var.web_amis[var.region]
|----------------
| var.region is "us-east-2"
| var.web_amis is tuple with 2 elements
The given key does not identify an element in this collection value: a number
is required.
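The error message says that web_amis is a sequence (a tuple with 2 elements) and is being indexed with a string. The Python sketch below reproduces the same mismatch with a list and a dict. The pairing of regions with AMI IDs in the map is an assumption made only for illustration, as is the "us-east-1" key; the question does not say which AMI belongs to which region.

```python
region = "us-east-2"

# The value as given in terraform.tfvars: a sequence, indexable only by number.
web_amis_list = ["ami-0dacb0c129b49f529", "ami-00068cd7555f543d5"]

# Hypothetical map form: keys are region names (pairing assumed, not from the question).
web_amis_map = {
    "us-east-1": "ami-0dacb0c129b49f529",
    "us-east-2": "ami-00068cd7555f543d5",
}


def lookup(collection, key):
    """Return collection[key], or None when the key type or value does not fit."""
    try:
        return collection[key]
    except (TypeError, KeyError, IndexError):
        return None


print(lookup(web_amis_list, region))  # None: a list needs a numeric index
print(lookup(web_amis_list, 1))       # ami-00068cd7555f543d5
print(lookup(web_amis_map, region))   # ami-00068cd7555f543d5
```

In Terraform terms, this suggests either indexing the list with a number or declaring web_amis as a map keyed by region name.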
I have a Python package that includes a configuration file and a man page. To install these, I have the following line in my setup.py (as described at http://docs.python.org/2/distutils/setupscript.html#installing-additional-files):
data_files = [('/etc/foo', ['foo.conf']), ('/usr/share/man/man1', ['foo.1'])]
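This works fine when I install the software as root with python setup.py install, but of course it fails in a virtualenv, because the user is not allowed to write to /etc and /usr/share/man.
What is the best practice for solving this? Check for VIRTUAL_ENV in the current environment and simply not install these files? The software will look for foo.conf in the local directory, so that should be fine. The user would miss the man page, but there is no sane way to install it anyway, since man will not look anywhere near the virtualenv.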
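The option considered above (skipping the system-wide files inside a virtualenv) could be sketched as follows. This is not an established best practice, only an illustration of the idea. Note the swap: instead of reading the VIRTUAL_ENV environment variable, the sketch compares interpreter prefixes, which also works when the environment was not activated in the shell. The helper names are hypothetical.

```python
import sys

# The system-wide targets from the setup.py line in the question.
SYSTEM_DATA_FILES = [
    ("/etc/foo", ["foo.conf"]),
    ("/usr/share/man/man1", ["foo.1"]),
]


def in_virtualenv():
    """Best-effort check for running inside a virtual environment.

    Older virtualenv releases set sys.real_prefix; venv and newer
    virtualenv releases make sys.base_prefix differ from sys.prefix.
    """
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base_prefix != sys.prefix


def select_data_files(inside_venv):
    """Return the data_files list to hand to setup()."""
    return [] if inside_venv else list(SYSTEM_DATA_FILES)


data_files = select_data_files(in_virtualenv())
print(data_files)
```

The resulting list would then be passed as data_files=data_files in the setup() call.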
Is it possible to echo using ||, so that it uses the first variable that evaluates to true?
For example,
$a = false;
$b = 'b';
echo $a || $b || 'neither'; // evaluates to 1 ?
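In PHP, || yields a boolean, which is why the echo above prints 1. The "first truthy value" behaviour being asked for is what PHP's short ternary ?: (available since PHP 5.3) provides, and it is also how Python's or operator works. The sketch below shows that semantics in Python, purely as an illustration; first_truthy is a hypothetical helper.

```python
def first_truthy(*values, default=None):
    """Return the first value that evaluates to true, else `default`."""
    for value in values:
        if value:
            return value
    return default


a = False
b = "b"

# `or` returns the first truthy operand itself rather than a boolean.
print(a or b or "neither")                    # b
print(first_truthy(a, b, default="neither"))  # b
```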