Gib*_*rno (score 5) · Tags: mysql, myisam, full-text-search, my.cnf
I asked a question at https://serverfault.com/questions/353888/mysql-full-text-search-cause-high-usage-cpu and some users suggested asking it here.
We built a news website. Every day we import tens of thousands of records from a web API.
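To provide accurate search, our tables use MyISAM with a full-text index on (title, content, date). We are testing the site on a GoDaddy VDS with 2GB of RAM and 30GB of disk space (no swap, because the VDS does not allow creating swap). The CPU is an Intel(R) Xeon(R) CPU L5609 @ 1.87GHz.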
Running ./mysqltuner.pl gives us these results:
-------- General Statistics --------------------------------------------------
[--] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.5.20
[OK] Operating on 32-bit architecture with less than 2GB RAM
-------- Storage Engine Statistics -------------------------------------------
[--] Status: -Archive -BDB -Federated +InnoDB -ISAM -NDBCluster
[--] Data in MyISAM tables: 396M (Tables: 39)
[--] Data in InnoDB tables: 208K (Tables: 8)
[!!] Total fragmented tables: 9
-------- Security Recommendations -------------------------------------------
[!!] User '@ip-XX-XX-XX-XX.ip.secureserver.net'
[!!] User '@localhost'
-------- Performance Metrics -------------------------------------------------
[--] Up for: 17h 27m 58s (1M q [20.253 qps], 31K conn, TX: 513M, RX: 303M)
[--] Reads / Writes: 61% / 39%
[--] Total buffers: 168.0M global + 2.7M per thread (151 max threads)
[OK] Maximum possible memory usage: 573.8M (28% of installed RAM)
[OK] Slow queries: 0% (56/1M)
[!!] Highest connection usage: 100% (152/151)
[OK] Key buffer size / total MyISAM indexes: 8.0M/162.5M
[OK] Key buffer hit rate: 100.0% (2B cached / 882K reads)
[!!] Query cache is disabled
[OK] Sorts requiring temporary tables: 0% (0 temp sorts / 17K sorts)
[!!] Temporary tables created on disk: 49% (32K on disk / 64K total)
[!!] Thread cache is disabled
[!!] Table cache hit rate: 0% (400 open / 298K opened)
[OK] Open file limit used: 41% (421/1K)
[!!] Table locks acquired immediately: 77%
[OK] InnoDB data size / buffer pool: 208.0K/128.0M
-------- Recommendations -----------------------------------------------------
General recommendations:
Run OPTIMIZE TABLE to defragment tables for better performance
MySQL started within last 24 hours - recommendations may be inaccurate
Enable the slow query log to troubleshoot bad queries
Reduce or eliminate persistent connections to reduce connection usage
When making adjustments, make tmp_table_size/max_heap_table_size equal
Reduce your SELECT DISTINCT queries without LIMIT clauses
Set thread_cache_size to 4 as a starting value
Increase table_cache gradually to avoid file descriptor limits
Optimize queries and/or use InnoDB to reduce lock wait
Variables to adjust:
max_connections (> 151)
wait_timeout (< 28800)
interactive_timeout (< 28800)
query_cache_size (>= 8M)
tmp_table_size (> 16M)
max_heap_table_size (> 16M)
thread_cache_size (start at 4)
table_cache (> 400)
Here is our my.cnf:
[mysqld]
port = 3306
socket = /tmp/mysql.sock
skip-external-locking
key_buffer_size = 256M
max_allowed_packet = 16M
max_connections = 1024
wait_timeout = 5
table_open_cache = 512
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 2M
myisam_sort_buffer_size = 128M
thread_cache_size = 8
query_cache_size= 256M
# Try number of CPU's*2 for thread_concurrency
thread_concurrency = 8
ft_min_word_len = 2
read_rnd_buffer_size=2M
tmp_table_size=128M
I'm not sure how to tune my.cnf based on the results returned by ./mysqltuner.pl.
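The memory lines in that report are plain arithmetic, and redoing the sum shows what the posted my.cnf would mean on a 2GB machine. The shell sketch below assumes mysqltuner counts key_buffer_size, query_cache_size, the smaller of tmp_table_size and max_heap_table_size, and the InnoDB buffers once, plus sort_buffer_size, read_buffer_size, read_rnd_buffer_size, thread_stack and join_buffer_size once per allowed connection. The InnoDB, thread_stack and join_buffer_size figures are assumed MySQL 5.5 32-bit defaults, and max_memory_kb is a hypothetical helper name. With 5.5 defaults the sum reproduces the report's 168.0M + 2.7M × 151 ≈ 573.8M, which hints (an inference, not something the post states) that the server was not running with the my.cnf shown, since the report also lists an 8.0M key buffer and 151 max connections.

```shell
#!/bin/sh
# Sketch of mysqltuner's "maximum possible memory" sum for MySQL 5.5:
#   global buffers + per-thread buffers * max_connections
# All sizes are in KB. The InnoDB sizes, thread_stack and join_buffer_size
# are ASSUMED 5.5 (32-bit) defaults, not values from the posted my.cnf.
M=1024
innodb=$(( (128 + 8 + 8) * M ))  # buffer pool + additional mem pool + log buffer
thread_stack=192
join_buffer=128

# Args: key_buffer query_cache tmp_table max_heap sort read read_rnd max_conn
max_memory_kb() {
  tmp=$3
  # In-memory temporary tables are limited by the smaller of the two settings.
  if [ "$4" -lt "$tmp" ]; then tmp=$4; fi
  global=$(( $1 + $2 + tmp + innodb ))
  per_thread=$(( $5 + $6 + $7 + thread_stack + join_buffer ))
  echo $(( global + per_thread * $8 ))
}

# Compiled-in defaults, which reproduce the report's 168.0M + 2.7M x 151
defaults=$(max_memory_kb $((8 * M)) 0 $((16 * M)) $((16 * M)) $((2 * M)) 128 256 151)

# The posted my.cnf (max_heap_table_size is not set there, so it stays 16M)
posted=$(max_memory_kb $((256 * M)) $((256 * M)) $((128 * M)) $((16 * M)) \
  $((2 * M)) $((2 * M)) $((2 * M)) 1024)

echo "defaults: $((defaults / M))M, posted my.cnf: $((posted / M))M"
# prints: defaults: 573M, posted my.cnf: 7136M
```

With the posted values, 1024 connections at roughly 6.3M each account for about 6.3GB of the ~7GB worst case, so the per-thread term dominates, not any single global buffer.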
Answer by Rol*_*DBA (score 10)
I have an interesting surprise for you.
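The only optimizations you can make for FULLTEXT indexes are not at the my.cnf level. It is all about two things: stopwords and query refactoring.
There are 543 stopwords that you may or may not want filtered out of a FULLTEXT index. The stopword list is built in at compile time, but you can override it with your own list, as follows.
First, create the stopword list. I usually make the English articles the only stopwords: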
echo "a" > /var/lib/mysql/stopwords.txt
echo "an" >> /var/lib/mysql/stopwords.txt
echo "the" >> /var/lib/mysql/stopwords.txt
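Next, add these options to /etc/my.cnf. Setting ft_min_word_len=1 also allows 1-letter, 2-letter, and 3-letter words to be indexed (the default minimum is 4):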
[mysqld]
ft_min_word_len=1
ft_stopword_file=/var/lib/mysql/stopwords.txt
Finally, restart MySQL:
service mysql restart
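If you have any tables with FULLTEXT indexes, you must then drop those FULLTEXT indexes and recreate them, so that they are rebuilt with the new stopword list and minimum word length.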
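To see what the two settings change, here is a rough shell model of the filtering step. It is a sketch under simplifying assumptions, not MySQL's parser: it lowercases the text, splits on anything that is not an ASCII letter or digit, and drops words shorter than ft_min_word_len or listed in the stopword file; ft_max_word_len and MySQL's real word-character rules are ignored. The function name indexed_words and the sample sentence are hypothetical.

```shell
#!/bin/sh
# Rough model, NOT MySQL's real parser: a MyISAM FULLTEXT index skips words
# shorter than ft_min_word_len and words in the stopword file. Here a "word"
# is any run of ASCII letters/digits, lowercased; ft_max_word_len is ignored.
indexed_words() {
  # $1 = text, $2 = ft_min_word_len, $3 = stopword file (one word per line)
  printf '%s\n' "$1" | tr 'A-Z' 'a-z' | tr -cs 'a-z0-9' '\n' |
    awk -v min="$2" -v stopfile="$3" '
      BEGIN { while ((getline w < stopfile) > 0) stop[w] = 1 }
      length($0) >= min && !($0 in stop)' | paste -sd ' ' -
}

stopfile=$(mktemp)
printf '%s\n' a an the > "$stopfile"   # same three articles as the echo commands

indexed_words "The UN and a CPU spike" 4 "$stopfile"   # prints: spike
indexed_words "The UN and a CPU spike" 1 "$stopfile"   # prints: un and cpu spike
rm -f "$stopfile"
```

In this model, with the default minimum of 4 a search for "UN" or "CPU" could never match because those words never reach the index; with ft_min_word_len=1 and only the articles as stopwords, they are kept.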
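Here is a little-known fact about MySQL queries that use FULLTEXT indexes: sometimes the MySQL Query Optimizer stops using the FULLTEXT index entirely and performs a full table scan instead.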
Here is an example:
use test
drop table if exists ft_test;
create table ft_test
(
id int not null auto_increment,
txt text,
primary key (id),
FULLTEXT (txt)
) ENGINE=MyISAM;
insert into ft_test (txt) values
('mount camaroon'),('mount camaron'),('mount camnaroon'),
('mount cameroon'),('mount cemeroon'),('mount camnaroon'),
('mount camraon'),('mount camaraon'),('mount camaran'),
('mount camnaraon'),('mount cameroan'),('mount cemeroan'),
('mount camnaraon'),('munt camraon'),('munt camaraon'),
('munt camaran'),('munt camnaraon'),('munt cameroan'),
('munt cemeroan'),('munt camnaraon'),('mount camraan');
select * from ft_test WHERE MATCH(txt) AGAINST ("+mount +cameroon" IN BOOLEAN MODE);
Here is the sample data being loaded:
mysql> use test
Database changed
mysql> drop table if exists ft_test;
Query OK, 0 rows affected (0.00 sec)
mysql> create table ft_test
-> (
-> id int not null auto_increment,
-> txt text,
-> primary key (id),
-> FULLTEXT (txt)
-> ) ENGINE=MyISAM;
Query OK, 0 rows affected (0.03 sec)
mysql> insert into ft_test (txt) values
-> ('mount camaroon'),('mount camaron'),('mount camnaroon'),
-> ('mount cameroon'),('mount cemeroon'),('mount camnaroon'),
-> ('mount camraon'),('mount camaraon'),('mount camaran'),
-> ('mount camnaraon'),('mount cameroan'),('mount cemeroan'),
-> ('mount camnaraon'),('munt camraon'),('munt camaraon'),
-> ('munt camaran'),('munt camnaraon'),('munt cameroan'),
-> ('munt cemeroan'),('munt camnaraon'),('mount camraan');
Query OK, 21 rows affected (0.00 sec)
Records: 21 Duplicates: 0 Warnings: 0
mysql>
Here is a sample query and its EXPLAIN plan:
mysql> select * from ft_test WHERE MATCH(txt) AGAINST ("cameroon" IN BOOLEAN MODE);
+----+----------------+
| id | txt |
+----+----------------+
| 4 | mount cameroon |
+----+----------------+
1 row in set (0.00 sec)
mysql> explain select * from ft_test WHERE MATCH(txt) AGAINST ("cameroon" IN BOOLEAN MODE)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: ft_test
type: fulltext
possible_keys: txt
key: txt
key_len: 0
ref:
rows: 1
Extra: Using where
1 row in set (0.00 sec)
mysql>
OK, good: the FULLTEXT index is used.
Now, let's change the query slightly:
mysql> select * from ft_test WHERE MATCH(txt) AGAINST ("cameroon" IN BOOLEAN MODE) = 1;
+----+----------------+
| id | txt |
+----+----------------+
| 4 | mount cameroon |
+----+----------------+
1 row in set (0.00 sec)
mysql> explain select * from ft_test WHERE MATCH(txt) AGAINST ("cameroon" IN BOOLEAN MODE) = 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: ft_test
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 21
Extra: Using where
1 row in set (0.00 sec)
mysql>
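OMG, what happened to the FULLTEXT index? The MySQL Query Optimizer basically snubbed it: once the MATCH() result is compared with 1, EXPLAIN shows type: ALL and key: NULL, with all 21 rows examined. If you are doing a JOIN against the ft_test table, who knows what will happen to the rest of the query once the optimizer does the same thing with the FULLTEXT search in the WHERE clause.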
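The solution is to refactor the query: isolate the FULLTEXT search so that it only collects the keys, then LEFT JOIN those keys back to the original table.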
Example:
SELECT B.*
FROM (SELECT id from ft_test
WHERE MATCH(txt) AGAINST ("+cameroon" IN BOOLEAN MODE)) A
LEFT JOIN ft_test B USING (id);
Here is the result of this query and its EXPLAIN plan:
mysql> SELECT B.*
-> FROM (SELECT id from ft_test
-> WHERE MATCH(txt) AGAINST ("+cameroon" IN BOOLEAN MODE)) A
-> LEFT JOIN ft_test B USING (id);
+----+----------------+
| id | txt |
+----+----------------+
| 4 | mount cameroon |
+----+----------------+
1 row in set (0.00 sec)
mysql> explain SELECT B.*
-> FROM (SELECT id from ft_test
-> WHERE MATCH(txt) AGAINST ("+cameroon" IN BOOLEAN MODE)) A
-> LEFT JOIN ft_test B USING (id)\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: <derived2>
type: system
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1
Extra:
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: B
type: const
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: const
rows: 1
Extra:
*************************** 3. row ***************************
id: 2
select_type: DERIVED
table: ft_test
type: fulltext
possible_keys: txt
key: txt
key_len: 0
ref:
rows: 1
Extra: Using where
3 rows in set (0.00 sec)
mysql>
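Note that in the DERIVED part of the EXPLAIN plan (the row with select_type: DERIVED, which feeds <derived2>), the FULLTEXT index is indeed used (type: fulltext, key: txt).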
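You must get into the habit of deciding how many stopwords your database will have, creating the stopword list, configuring it, and then creating or recreating all FULLTEXT indexes. You must also get into the habit of refactoring FULLTEXT search queries so that the MySQL Query Optimizer does not produce a bad EXPLAIN plan or nullify index usage for the rest of the query participating in the EXPLAIN plan.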
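For the second habit, it can help to search application code for the pattern shown above, where a MATCH() ... AGAINST() result is compared with something. The grep sketch below is a crude heuristic, not a MySQL tool: it only sees comparisons on the same line, a ")" inside the search string hides a hit, and it flags every comparison operator even though only = 1 is demonstrated above, so treat each hit as a query to check with EXPLAIN rather than a confirmed problem. flag_match_comparisons is a hypothetical name.

```shell
#!/bin/sh
# Heuristic (an assumption, not a MySQL tool): flag lines where the result of
# MATCH() ... AGAINST(...) is followed by a comparison operator, the shape that
# gave type: ALL in the EXPLAIN above. Same-line matches only.
flag_match_comparisons() {
  grep -niE 'AGAINST[[:space:]]*\([^)]*\)[[:space:]]*(=|<>|!=|<|>)' "$@"
}

sql=$(mktemp)
cat > "$sql" <<'EOF'
select * from ft_test WHERE MATCH(txt) AGAINST ("cameroon" IN BOOLEAN MODE);
select * from ft_test WHERE MATCH(txt) AGAINST ("cameroon" IN BOOLEAN MODE) = 1;
EOF
flag_match_comparisons "$sql"   # reports only line 2
rm -f "$sql"
```

The refactored derived-table query passes the check, since the MATCH() there stands alone in the inner WHERE clause.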
Views: 9780