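I'm currently using Tika to extract text from files uploaded to my Rails app running on AWS Elastic Beanstalk (64bit Amazon Linux 2016.03 v2.1.2 running Ruby 2.2). I'd also like to index scanned images, so I need to install Tesseract.
I was able to get it working by installing it from source, but that adds 10 minutes to my deployment on a new instance. Is there a faster way?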
.ebextensions/02-tesseract.config
packages:
  yum:
    autoconf: []
    automake: []
    libtool: []
    libpng-devel: []
    libtiff-devel: []
    zlib-devel: []
container_commands:
  01-command:
    command: mkdir -p install
    cwd: /home/ec2-user
  02-command:
    command: cp .ebextensions/scripts/install_tesseract.sh /home/ec2-user/install/
  03-command:
    command: bash install/install_tesseract.sh
    cwd: /home/ec2-user
.ebextensions/scripts/install_tesseract.sh
#!/usr/bin/env bash
cd_to_install () {
  cd /home/ec2-user/install
}
cd_to () {
  cd "/home/ec2-user/install/$1"
}
# Only run the install when tesseract is not already on PATH
if ! [ -x "$(command -v tesseract)" ]; then
  # Add `/usr/local/bin` to PATH
  echo 'pathmunge /usr/local/bin' > /etc/profile.d/usr_local.sh
  chmod +x …
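For anyone reading along, the `if ! [ -x "$(command -v tesseract)" ]` check at the top of the script is what lets the install be skipped when a `tesseract` binary is already present. Below is a minimal standalone sketch of just that guard. The helper name `needs_install` and the echoed messages are hypothetical and not part of my actual script; the build step is only a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the original script): succeeds when the
# named command is NOT available as an executable on PATH.
needs_install () {
  ! [ -x "$(command -v "$1")" ]
}

if needs_install tesseract; then
  # Placeholder: the slow build-from-source steps would go here.
  echo "tesseract missing: running install"
else
  echo "tesseract already installed: skipping install"
fi
```

The same check works for any other binary the script builds, so each expensive step can be guarded separately.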