Question by mon*_*zok (score 5). Tags: tesseract, amazon-web-services, epel, amazon-elastic-beanstalk
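I'm currently using Tika to extract text from files uploaded to my Rails app, which runs on AWS Elastic Beanstalk (64bit Amazon Linux 2016.03 v2.1.2 running Ruby 2.2). I'd also like to index scanned images, so I need to install Tesseract.

I got it working by installing it from source, but that adds 10 minutes to every deployment on a new instance. Is there a faster way?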
.ebextensions/02-tesseract.config

```yaml
packages:
  yum:
    autoconf: []
    automake: []
    libtool: []
    libpng-devel: []
    libtiff-devel: []
    zlib-devel: []

container_commands:
  01-command:
    command: mkdir -p install
    cwd: /home/ec2-user
  02-command:
    command: cp .ebextensions/scripts/install_tesseract.sh /home/ec2-user/install/
  03-command:
    command: bash install/install_tesseract.sh
    cwd: /home/ec2-user
```
.ebextensions/scripts/install_tesseract.sh

```bash
#!/usr/bin/env bash

cd_to_install () {
  cd /home/ec2-user/install
}

cd_to () {
  cd /home/ec2-user/install/$1
}

if ! [ -x "$(command -v tesseract)" ]; then
  # Add `usr/local/bin` to PATH
  echo 'pathmunge /usr/local/bin' > /etc/profile.d/usr_local.sh
  chmod +x /etc/profile.d/usr_local.sh

  # Install leptonica
  cd_to_install
  wget http://www.leptonica.org/source/leptonica-1.73.tar.gz
  tar -zxvf leptonica-1.73.tar.gz
  cd_to leptonica-1.73
  ./configure
  make
  make install
  rm -rf /home/ec2-user/install/leptonica-1.73.tar.gz
  rm -rf /home/ec2-user/install/leptonica-1.73

  # Install tesseract ~ the jewel of Odin's treasure room
  cd_to_install
  wget https://github.com/tesseract-ocr/tesseract/archive/3.04.01.tar.gz
  tar -zxvf 3.04.01.tar.gz
  cd_to tesseract-3.04.01
  ./autogen.sh
  ./configure
  make
  make install
  ldconfig
  rm -rf /home/ec2-user/install/3.04.01.tar.gz
  rm -rf /home/ec2-user/install/tesseract-3.04.01

  # Install tessdata
  cd_to_install
  wget https://github.com/tesseract-ocr/tessdata/archive/3.04.00.tar.gz
  tar -zxvf 3.04.00.tar.gz
  cp /home/ec2-user/install/tessdata-3.04.00/eng.* /usr/local/share/tessdata/
  rm -rf /home/ec2-user/install/3.04.00.tar.gz
  rm -rf /home/ec2-user/install/tessdata-3.04.00
fi
```
Answer by mon*_*zok (score 13)

Short answer
.ebextensions/02-tesseract.config

```yaml
commands:
  01-libwebp:
    command: "yum --enablerepo=epel --disablerepo=amzn-main -y install libwebp"
  02-tesseract:
    command: "yum --enablerepo=epel -y install tesseract"
```
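Long answer

I'm not familiar with package managers outside Ubuntu, or with ebextensions, so it took some digging before I found that the stable EPEL repo has precompiled Tesseract binaries that install on Amazon Linux.

The first hurdle was figuring out how to use the EPEL repo. The easiest way is the `--enablerepo` option of the `yum` command.

That gets us here: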
```bash
yum --enablerepo=epel install tesseract
```
Next, I had to resolve this dependency error:

```
[root@ip-10-0-1-193 ec2-user]# yum install --enablerepo=epel tesseract
Loaded plugins: priorities, update-motd, upgrade-helper
951 packages excluded due to repository priority protections
Resolving Dependencies
--> Running transaction check
---> Package tesseract.x86_64 0:3.04.00-3.el6 will be installed
--> Processing Dependency: liblept.so.4()(64bit) for package: tesseract-3.04.00-3.el6.x86_64
--> Running transaction check
---> Package leptonica.x86_64 0:1.72-2.el6 will be installed
--> Processing Dependency: libwebp.so.5()(64bit) for package: leptonica-1.72-2.el6.x86_64
--> Finished Dependency Resolution
Error: Package: leptonica-1.72-2.el6.x86_64 (epel)
           Requires: libwebp.so.5()(64bit)
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest
```
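I found the solution here:

> Just adding the epel repo didn't solve it, because packages in the amzn-main repository seem to override packages in the epel repository. If the libwebp package from the amzn-main repo is excluded, it should work.

The Tesseract install itself still needs some dependencies from the amzn-main repo, so amzn-main can't be disabled for the whole install. That's why I first install libwebp on its own with `--disablerepo=amzn-main`: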
```bash
yum --enablerepo=epel --disablerepo=amzn-main install libwebp
yum --enablerepo=epel install tesseract
```
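The two commands can also be wrapped in a small script that reuses the `command -v tesseract` guard from the question's build script, so yum is skipped when a `tesseract` binary is already on `PATH`. This is a minimal sketch, assuming it runs as root on Amazon Linux with the EPEL repo definition present; the function names `run` and `install_tesseract_from_epel` and the `DRY_RUN=1` switch (print the yum commands instead of running them) are hypothetical additions.

```shell
#!/usr/bin/env bash
# Sketch: install Tesseract from EPEL, skipping yum if tesseract is already on PATH.
# DRY_RUN=1 is a hypothetical switch: print each yum command instead of running it.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

install_tesseract_from_epel() {
  # Same guard as the from-source script: nothing to do if tesseract exists.
  if command -v tesseract >/dev/null 2>&1; then
    echo "tesseract already installed: $(command -v tesseract)"
    return 0
  fi
  # libwebp from EPEL only, so the amzn-main copy is not picked instead.
  run yum --enablerepo=epel --disablerepo=amzn-main -y install libwebp || return 1
  # tesseract itself (leptonica comes in as a dependency) still needs amzn-main.
  run yum --enablerepo=epel -y install tesseract || return 1
}
```

Source the file and call `install_tesseract_from_epel`; with `DRY_RUN=1` set it only prints the two yum commands, which makes it easy to check what a deployment would run.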
Finally, here's how to run these yum commands, options included, on Elastic Beanstalk using `commands`:
.ebextensions/02-tesseract.config

```yaml
commands:
  01-libwebp:
    command: "yum --enablerepo=epel --disablerepo=amzn-main -y install libwebp"
  02-tesseract:
    command: "yum --enablerepo=epel -y install tesseract"
```
Fortunately, this also turns out to be the simplest way to install Tesseract on Elastic Beanstalk!
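The from-source script in the question copied the `eng.*` traineddata files by hand, so after switching to the EPEL package it may be worth confirming that English language data is visible to Tesseract. A minimal sketch, assuming `tesseract --list-langs` prints a header line followed by one language code per line (some 3.x builds write this list to stderr, hence the `2>&1`); `langs_contain` and `check_tesseract_lang` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: check that tesseract can see a given language pack (default: eng).
# Assumes `tesseract --list-langs` lists one language code per line.

# Succeeds if the language list read on stdin contains exactly the line "$1".
langs_contain() {
  grep -qx -- "$1"
}

check_tesseract_lang() {
  local lang="${1:-eng}"
  if ! command -v tesseract >/dev/null 2>&1; then
    echo "tesseract not found on PATH" >&2
    return 1
  fi
  # Some 3.x releases print the list on stderr, so merge it into stdout.
  if tesseract --list-langs 2>&1 | langs_contain "$lang"; then
    echo "tesseract has language data for: $lang"
  else
    echo "no language data for: $lang" >&2
    return 1
  fi
}
```

Running `check_tesseract_lang eng` on an instance after deployment (for example over `eb ssh`) would show whether an extra language package or a manual tessdata copy is still needed.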