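I'm wondering whether anyone could take a look at a performance optimization problem:
I have an Ubuntu 12.04 server running on VMware 5.1 with 32 GB of RAM and 8 cores (CPU scheduling is not an issue, since the VM is practically the only one on the host). The hardware is an IBM blade with 2x E5-2660 CPUs.
I'm running MySQL 5.5 and have a table that looks like this: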
ochrange | CREATE TABLE `ochrange` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`rangestart` int(8) NOT NULL,
`rangeend` int(8) NOT NULL,
`rangelength` int(11) NOT NULL DEFAULT '1',
`networkoperator` varchar(6) COLLATE latin1_danish_ci NOT NULL,
`serviceoperator` varchar(6) COLLATE latin1_danish_ci NOT NULL,
`numbertype` varchar(6) COLLATE latin1_danish_ci NOT NULL,
`lastupdate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`lastupdateFile` varchar(255) COLLATE latin1_danish_ci NOT NULL,
PRIMARY KEY (`id`),
KEY `rangestart_2` (`rangestart`,`rangeend`),
KEY `rangelength` (`rangelength`)
) ENGINE=MyISAM AUTO_INCREMENT=189138 DEFAULT CHARSET=latin1 COLLATE=latin1_danish_ci |
The table contains 187,500 rows.
I'm running queries like this:
SELECT `networkoperator`,`numbertype`
FROM `och`.`ochrange`
WHERE '20972128'
BETWEEN `rangestart` AND `rangeend`
ORDER BY `rangelength` ASC LIMIT 1;
mysql> EXPLAIN SELECT `networkoperator`,`numbertype`
-> FROM `och`.`ochrange`
-> WHERE '20972128'
-> BETWEEN `rangestart` AND `rangeend`
-> ORDER BY `rangelength` ASC LIMIT 1;
+----+-------------+----------+-------+---------------+-------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+-------------+---------+------+------+-------------+
| 1 | SIMPLE | ochrange | index | rangestart_2 | rangelength | 4 | NULL | 46 | Using where |
+----+-------------+----------+-------+---------------+-------------+---------+------+------+-------------+
1 row in set (0.00 sec)
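There are no other queries in the slow log, and the number of other queries is minimal.
I can run about 60 qps of these before my CPU maxes out, at which point the server load is around 150 and I'm using 21,000 MHz on the VMware host.
I have no IO wait (0.5%) and memory usage seems fine.
The query cache is disabled, because identical selects never occur within a short period of time.
Does anyone have any suggestions on how to get more qps?
Here are my server variables: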
Variable_name: auto_increment_increment
Value: 1
Variable_name: auto_increment_offset
Value: 1
Variable_name: autocommit
Value: ON
Variable_name: automatic_sp_privileges
Value: ON
Variable_name: back_log
Value: 50
Variable_name: basedir
Value: /usr
Variable_name: big_tables
Value: OFF
Variable_name: binlog_cache_size
Value: 32768
Variable_name: binlog_direct_non_transactional_updates
Value: OFF
Variable_name: binlog_format
Value: STATEMENT
Variable_name: binlog_stmt_cache_size
Value: 32768
Variable_name: bulk_insert_buffer_size
Value: 8388608
Variable_name: character_set_client
Value: utf8
Variable_name: character_set_connection
Value: utf8
Variable_name: character_set_database
Value: latin1
Variable_name: character_set_filesystem
Value: binary
Variable_name: character_set_results
Value: utf8
Variable_name: character_set_server
Value: latin1
Variable_name: character_set_system
Value: utf8
Variable_name: character_sets_dir
Value: /usr/share/mysql/charsets/
Variable_name: collation_connection
Value: utf8_general_ci
Variable_name: collation_database
Value: latin1_swedish_ci
Variable_name: collation_server
Value: latin1_swedish_ci
Variable_name: completion_type
Value: NO_CHAIN
Variable_name: concurrent_insert
Value: AUTO
Variable_name: connect_timeout
Value: 10
Variable_name: datadir
Value: /var/lib/mysql/
Variable_name: date_format
Value: %Y-%m-%d
Variable_name: datetime_format
Value: %Y-%m-%d %H:%i:%s
Variable_name: default_storage_engine
Value: InnoDB
Variable_name: default_week_format
Value: 0
Variable_name: delay_key_write
Value: ON
Variable_name: delayed_insert_limit
Value: 100
Variable_name: delayed_insert_timeout
Value: 300
Variable_name: delayed_queue_size
Value: 1000
Variable_name: div_precision_increment
Value: 4
Variable_name: engine_condition_pushdown
Value: ON
Variable_name: error_count
Value: 0
Variable_name: event_scheduler
Value: OFF
Variable_name: expire_logs_days
Value: 7
Variable_name: external_user
Value:
Variable_name: flush
Value: OFF
Variable_name: flush_time
Value: 0
Variable_name: foreign_key_checks
Value: ON
Variable_name: ft_boolean_syntax
Value: + -><()~*:""&|
Variable_name: ft_max_word_len
Value: 84
Variable_name: ft_min_word_len
Value: 4
Variable_name: ft_query_expansion_limit
Value: 20
Variable_name: ft_stopword_file
Value: (built-in)
Variable_name: general_log
Value: OFF
Variable_name: general_log_file
Value: /var/lib/mysql/db-nrlookup.log
Variable_name: group_concat_max_len
Value: 1024
Variable_name: have_compress
Value: YES
Variable_name: have_crypt
Value: YES
Variable_name: have_csv
Value: YES
Variable_name: have_dynamic_loading
Value: YES
Variable_name: have_geometry
Value: YES
Variable_name: have_innodb
Value: YES
Variable_name: have_ndbcluster
Value: NO
Variable_name: have_openssl
Value: DISABLED
Variable_name: have_partitioning
Value: YES
Variable_name: have_profiling
Value: YES
Variable_name: have_query_cache
Value: YES
Variable_name: have_rtree_keys
Value: YES
Variable_name: have_ssl
Value: DISABLED
Variable_name: have_symlink
Value: YES
Variable_name: hostname
Value: db-nrlookup
Variable_name: identity
Value: 0
Variable_name: ignore_builtin_innodb
Value: OFF
Variable_name: init_connect
Value:
Variable_name: init_file
Value:
Variable_name: init_slave
Value:
Variable_name: innodb_adaptive_flushing
Value: ON
Variable_name: innodb_adaptive_hash_index
Value: ON
Variable_name: innodb_additional_mem_pool_size
Value: 8388608
Variable_name: innodb_autoextend_increment
Value: 8
Variable_name: innodb_autoinc_lock_mode
Value: 1
Variable_name: innodb_buffer_pool_instances
Value: 1
Variable_name: innodb_buffer_pool_size
Value: 134217728
Variable_name: innodb_change_buffering
Value: all
Variable_name: innodb_checksums
Value: ON
Variable_name: innodb_commit_concurrency
Value: 0
Variable_name: innodb_concurrency_tickets
Value: 500
Variable_name: innodb_data_file_path
Value: ibdata1:10M:autoextend
Variable_name: innodb_data_home_dir
Value:
Variable_name: innodb_doublewrite
Value: ON
Variable_name: innodb_fast_shutdown
Value: 1
Variable_name: innodb_file_format
Value: Antelope
Variable_name: innodb_file_format_check
Value: ON
Variable_name: innodb_file_format_max
Value: Antelope
Variable_name: innodb_file_per_table
Value: OFF
Variable_name: innodb_flush_log_at_trx_commit
Value: 1
Variable_name: innodb_flush_method
Value:
Variable_name: innodb_force_load_corrupted
Value: OFF
Variable_name: innodb_force_recovery
Value: 0
Variable_name: innodb_io_capacity
Value: 200
Variable_name: innodb_large_prefix
Value: OFF
Variable_name: innodb_lock_wait_timeout
Value: 50
Variable_name: innodb_locks_unsafe_for_binlog
Value: OFF
Variable_name: innodb_log_buffer_size
Value: 8388608
Variable_name: innodb_log_file_size
Value: 5242880
Variable_name: innodb_log_files_in_group
Value: 2
Variable_name: innodb_log_group_home_dir
Value: ./
Variable_name: innodb_max_dirty_pages_pct
Value: 75
Variable_name: innodb_max_purge_lag
Value: 0
Variable_name: innodb_mirrored_log_groups
Value: 1
Variable_name: innodb_old_blocks_pct
Value: 37
Variable_name: innodb_old_blocks_time
Value: 0
Variable_name: innodb_open_files
Value: 300
Variable_name: innodb_print_all_deadlocks
Value: OFF
Variable_name: innodb_purge_batch_size
Value: 20
Variable_name: innodb_purge_threads
Value: 0
Variable_name: innodb_random_read_ahead
Value: OFF
Variable_name: innodb_read_ahead_threshold
Value: 56
Variable_name: innodb_read_io_threads
Value: 4
Variable_name: innodb_replication_delay
Value: 0
Variable_name: innodb_rollback_on_timeout
Value: OFF
Variable_name: innodb_rollback_segments
Value: 128
Variable_name: innodb_spin_wait_delay
Value: 6
Variable_name: innodb_stats_method
Value: nulls_equal
Variable_name: innodb_stats_on_metadata
Value: ON
Variable_name: innodb_stats_sample_pages
Value: 8
Variable_name: innodb_strict_mode
Value: OFF
Variable_name: innodb_support_xa
Value: ON
Variable_name: innodb_sync_spin_loops
Value: 30
Variable_name: innodb_table_locks
Value: ON
Variable_name: innodb_thread_concurrency
Value: 0
Variable_name: innodb_thread_sleep_delay
Value: 10000
Variable_name: innodb_use_native_aio
Value: OFF
Variable_name: innodb_use_sys_malloc
Value: ON
Variable_name: innodb_version
Value: 5.5.34
Variable_name: innodb_write_io_threads
Value: 4
Variable_name: insert_id
Value: 0
Variable_name: interactive_timeout
Value: 28800
Variable_name: join_buffer_size
Value: 131072
Variable_name: keep_files_on_create
Value: OFF
Variable_name: key_buffer_size
Value: 8589934592
Variable_name: key_cache_age_threshold
Value: 300
Variable_name: key_cache_block_size
Value: 1024
Variable_name: key_cache_division_limit
Value: 100
Variable_name: large_files_support
Value: ON
Variable_name: large_page_size
Value: 0
Variable_name: large_pages
Value: OFF
Variable_name: last_insert_id
Value: 0
Variable_name: lc_messages
Value: en_US
Variable_name: lc_messages_dir
Value: /usr/share/mysql/
Variable_name: lc_time_names
Value: en_US
Variable_name: license
Value: GPL
Variable_name: local_infile
Value: ON
Variable_name: lock_wait_timeout
Value: 31536000
Variable_name: locked_in_memory
Value: OFF
Variable_name: log
Value: OFF
Variable_name: log_bin
Value: ON
Variable_name: log_bin_trust_function_creators
Value: OFF
Variable_name: log_error
Value: /var/log/mysql/error.log
Variable_name: log_output
Value: FILE
Variable_name: log_queries_not_using_indexes
Value: OFF
Variable_name: log_slave_updates
Value: OFF
Variable_name: log_slow_queries
Value: ON
Variable_name: log_warnings
Value: 1
Variable_name: long_query_time
Value: 10.000000
Variable_name: low_priority_updates
Value: OFF
Variable_name: lower_case_file_system
Value: OFF
Variable_name: lower_case_table_names
Value: 0
Variable_name: max_allowed_packet
Value: 134217728
Variable_name: max_binlog_cache_size
Value: 18446744073709547520
Variable_name: max_binlog_size
Value: 209715200
Variable_name: max_binlog_stmt_cache_size
Value: 18446744073709547520
Variable_name: max_connect_errors
Value: 10
Variable_name: max_connections
Value: 8000
Variable_name: max_delayed_threads
Value: 20
Variable_name: max_error_count
Value: 64
Variable_name: max_heap_table_size
Value: 16777216
Variable_name: max_insert_delayed_threads
Value: 20
Variable_name: max_join_size
Value: 18446744073709551615
Variable_name: max_length_for_sort_data
Value: 1024
Variable_name: max_long_data_size
Value: 134217728
Variable_name: max_prepared_stmt_count
Value: 16382
Variable_name: max_relay_log_size
Value: 0
Variable_name: max_seeks_for_key
Value: 18446744073709551615
Variable_name: max_sort_length
Value: 1024
Variable_name: max_sp_recursion_depth
Value: 0
Variable_name: max_tmp_tables
Value: 32
Variable_name: max_user_connections
Value: 0
Variable_name: max_write_lock_count
Value: 18446744073709551615
Variable_name: metadata_locks_cache_size
Value: 1024
Variable_name: min_examined_row_limit
Value: 0
Variable_name: multi_range_count
Value: 256
Variable_name: myisam_data_pointer_size
Value: 6
Variable_name: myisam_max_sort_file_size
Value: 9223372036853727232
Variable_name: myisam_mmap_size
Value: 18446744073709551615
Variable_name: myisam_recover_options
Value: BACKUP
Variable_name: myisam_repair_threads
Value: 1
Variable_name: myisam_sort_buffer_size
Value: 8388608
Variable_name: myisam_stats_method
Value: nulls_unequal
Variable_name: myisam_use_mmap
Value: OFF
Variable_name: net_buffer_length
Value: 16384
Variable_name: net_read_timeout
Value: 30
Variable_name: net_retry_count
Value: 10
Variable_name: net_write_timeout
Value: 60
Variable_name: new
Value: OFF
Variable_name: old
Value: OFF
Variable_name: old_alter_table
Value: OFF
Variable_name: old_passwords
Value: ON
Variable_name: open_files_limit
Value: 40000
Variable_name: optimizer_prune_level
Value: 1
Variable_name: optimizer_search_depth
Value: 62
Variable_name: optimizer_switch
Value: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on
Variable_name: performance_schema
Value: OFF
Variable_name: performance_schema_events_waits_history_long_size
Value: 10000
Variable_name: performance_schema_events_waits_history_size
Value: 10
Variable_name: performance_schema_max_cond_classes
Value: 80
Variable_name: performance_schema_max_cond_instances
Value: 1000
Variable_name: performance_schema_max_file_classes
Value: 50
Variable_name: performance_schema_max_file_handles
Value: 32768
Variable_name: performance_schema_max_file_instances
Value: 10000
Variable_name: performance_schema_max_mutex_classes
Value: 200
Variable_name: performance_schema_max_mutex_instances
Value: 1000000
Variable_name: performance_schema_max_rwlock_classes
Value: 30
Variable_name: performance_schema_max_rwlock_instances
Value: 1000000
Variable_name: performance_schema_max_table_handles
Value: 100000
Variable_name: performance_schema_max_table_instances
Value: 50000
Variable_name: performance_schema_max_thread_classes
Value: 50
Variable_name: performance_schema_max_thread_instances
Value: 1000
Variable_name: pid_file
Value: /var/run/mysqld/mysqld.pid
Variable_name: plugin_dir
Value: /usr/lib/mysql/plugin/
Variable_name: port
Value: 3306
Variable_name: preload_buffer_size
Value: 32768
Variable_name: profiling
Value: OFF
Variable_name: profiling_history_size
Value: 15
Variable_name: protocol_version
Value: 10
Variable_name: proxy_user
Value:
Variable_name: pseudo_slave_mode
Value: OFF
Variable_name: pseudo_thread_id
Value: 30736
Variable_name: query_alloc_block_size
Value: 8192
Variable_name: query_cache_limit
Value: 8388608
Variable_name: query_cache_min_res_unit
Value: 4096
Variable_name: query_cache_size
Value: 0
Variable_name: query_cache_type
Value: ON
Variable_name: query_cache_wlock_invalidate
Value: OFF
Variable_name: query_prealloc_size
Value: 8192
Variable_name: rand_seed1
Value: 0
Variable_name: rand_seed2
Value: 0
Variable_name: range_alloc_block_size
Value: 4096
Variable_name: read_buffer_size
Value: 131072
Variable_name: read_only
Value: OFF
Variable_name: read_rnd_buffer_size
Value: 262144
Variable_name: relay_log
Value:
Variable_name: relay_log_index
Value:
Variable_name: relay_log_info_file
Value: relay-log.info
Variable_name: relay_log_purge
Value: ON
Variable_name: relay_log_recovery
Value: OFF
Variable_name: relay_log_space_limit
Value: 0
Variable_name: report_host
Value:
Variable_name: report_password
Value:
Variable_name: report_port
Value: 3306
Variable_name: report_user
Value:
Variable_name: rpl_recovery_rank
Value: 0
Variable_name: secure_auth
Value: OFF
Variable_name: secure_file_priv
Value:
Variable_name: server_id
Value: 7
Variable_name: skip_external_locking
Value: ON
Variable_name: skip_name_resolve
Value: OFF
Variable_name: skip_networking
Value: OFF
Variable_name: skip_show_database
Value: OFF
Variable_name: slave_compressed_protocol
Value: OFF
Variable_name: slave_exec_mode
Value: STRICT
Variable_name: slave_load_tmpdir
Value: /tmp
Variable_name: slave_max_allowed_packet
Value: 1073741824
Variable_name: slave_net_timeout
Value: 3600
Variable_name: slave_skip_errors
Value: OFF
Variable_name: slave_transaction_retries
Value: 10
Variable_name: slave_type_conversions
Value:
Variable_name: slow_launch_time
Value: 2
Variable_name: slow_query_log
Value: ON
Variable_name: slow_query_log_file
Value: /var/lib/mysql/db-nrlookup-slow.log
Variable_name: socket
Value: /var/run/mysqld/mysqld.sock
Variable_name: sort_buffer_size
Value: 2097152
Variable_name: sql_auto_is_null
Value: OFF
Variable_name: sql_big_selects
Value: ON
Variable_name: sql_big_tables
Value: OFF
Variable_name: sql_buffer_result
Value: OFF
Variable_name: sql_log_bin
Value: ON
Variable_name: sql_log_off
Value: OFF
Variable_name: sql_low_priority_updates
Value: OFF
Variable_name: sql_max_join_size
Value: 18446744073709551615
Variable_name: sql_mode
Value:
Variable_name: sql_notes
Value: ON
Variable_name: sql_quote_show_create
Value: ON
Variable_name: sql_safe_updates
Value: OFF
Variable_name: sql_select_limit
Value: 18446744073709551615
Variable_name: sql_slave_skip_counter
Value: 0
Variable_name: sql_warnings
Value: OFF
Variable_name: ssl_ca
Value:
Variable_name: ssl_capath
Value:
Variable_name: ssl_cert
Value:
Variable_name: ssl_cipher
Value:
Variable_name: ssl_key
Value:
Variable_name: storage_engine
Value: InnoDB
Variable_name: stored_program_cache
Value: 256
Variable_name: sync_binlog
Value: 0
Variable_name: sync_frm
Value: ON
Variable_name: sync_master_info
Value: 0
Variable_name: sync_relay_log
Value: 0
Variable_name: sync_relay_log_info
Value: 0
Variable_name: system_time_zone
Value: CET
Variable_name: table_definition_cache
Value: 400
Variable_name: table_open_cache
Value: 4096
Variable_name: thread_cache_size
Value: 64
Variable_name: thread_concurrency
Value: 10
Variable_name: thread_handling
Value: one-thread-per-connection
Variable_name: thread_stack
Value: 196608
Variable_name: time_format
Value: %H:%i:%s
Variable_name: time_zone
Value: SYSTEM
Variable_name: timed_mutexes
Value: OFF
Variable_name: timestamp
Value: 1385625067
Variable_name: tmp_table_size
V
Mic*_*bot 16
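Based on the query plan you posted from the EXPLAIN output, you might find it hard to believe this account of how the server is actually handling the query... but it does explain why performance would be poor.
Because you're asking for the results to be ordered by `rangelength`, and because a B-tree index on (`rangestart`,`rangeend`) is not well suited to resolving an "x between y and z" expression, the optimizer has decided to use the index on `rangelength` to determine the order in which it reads rows. It will read every row, through the entire table if necessary (type = index), in ascending order of the index on rangelength (key = rangelength), until it finds the first row that matches the WHERE clause (Extra = Using where). Since the rows are being read in the desired order, the server can stop after the first matching row... so I'd expect this query to show a lot of variability in performance, depending on how much of the table or index has to be scanned to resolve any particular value.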
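One way to see that variability for yourself is to look at the handler counters around a single lookup. This is only a sketch: it assumes your account has the RELOAD privilege (needed for FLUSH STATUS) and that nothing else runs in the session between the statements.

```sql
-- Reset this session's status counters so the numbers below
-- reflect only the query that follows.
FLUSH STATUS;

SELECT networkoperator, numbertype
FROM och.ochrange
WHERE 20972128 BETWEEN rangestart AND rangeend
ORDER BY rangelength ASC
LIMIT 1;

-- Handler_read_first / Handler_read_next count entries read in index order.
-- With the plan shown in the EXPLAIN output, a large Handler_read_next would
-- mean many rows were walked in rangelength order before the first match.
SHOW SESSION STATUS LIKE 'Handler_read%';
```

Repeating this with a few different numbers should show whether the counters swing widely from one value to the next.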
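There are two approaches to improving this.
Option 1: The first suggestion is to add an index containing all three of the values you're sorting and selecting on... though not for the usual reason, because the query isn't going to use it quite that way.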
ALTER TABLE ochrange ADD KEY(rangelength,rangeend,rangestart);
This is far from an ideal index for this query, but it has three advantages over the index you have now:
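All of the values of interest in the WHERE clause are found in the index, so the optimizer should be able to qualify or disqualify rows based on an index scan, without having to read the table data, and may be able to do more.
A very important note on point 3: I'm not saying that this index will be used to find the matching rows, because it can't fully be used for that. However, it should at least be usable more efficiently than the current plan, because it contains the values we need for filtering, and because it may also allow out-of-range values of rangeend to be eliminated quickly and, among the remaining values, may allow out-of-range values of rangestart to be eliminated as well.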
I'd also suggest rewriting the WHERE clause in a less obscure but logically equivalent form, which may make things a little easier for the optimizer:
WHERE 20972128
BETWEEN `rangestart` AND `rangeend`
ORDER BY `rangelength` ASC LIMIT 1;
...becomes this:
WHERE rangestart <= 20972128
AND rangeend >= 20972128
ORDER BY rangelength ASC LIMIT 1;
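The (rangestart, rangeend) index seems at first glance like it would be more useful, but a two-column B-tree is not well suited to finding values that lie between a lower and an upper bound like this.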
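A residential telephone directory is an apt analogy for a two-column index on (last_name, first_name), and it illustrates why such an index doesn't provide as much benefit as it appears to.
In such a directory, given the last name "Smith" and the first name "John", it's easy to find everyone named Smith, and easy to find the first name John among the Smiths. However, using the directory's ordering to find everyone whose first name is John, regardless of last name, is essentially impossible.
What we're asking of the index in this query, whether written the original way or the way I suggested, is to find all rows where rangestart <= 20972128 and the accompanying rangeend in the same row is >= 20972128. This is like trying to look up, in the phone book, everyone whose last name is Smith or any other last name that appears before Smith in the directory, and then, among those people, finding the ones whose first name is John or any other name lexically (alphabetically) "greater than" (after) John. The task would be tedious, and our only consolation is that we wouldn't have to check any pages after Smith in the directory, but we would have to check every entry on every preceding page before we found what we were looking for.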
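To get a feel for how much of the (rangestart, rangeend) index a lookup would have to walk, you could compare the two counts below. This is just an illustrative sketch using the table and value from the question; the column aliases are made up for readability.

```sql
-- Entries the leading column alone can narrow down to:
-- everything "on or before the Smith page" of the directory.
SELECT COUNT(*) AS candidates_by_rangestart
FROM och.ochrange
WHERE rangestart <= 20972128;

-- Entries that actually satisfy both bounds.
SELECT COUNT(*) AS actual_matches
FROM och.ochrange
WHERE rangestart <= 20972128
  AND rangeend   >= 20972128;
```

If the first number is a large fraction of the table and the second is tiny, that gap is roughly the work the two-column B-tree cannot avoid.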
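Nonetheless, option #1, adding a new index, seems worth a try. After testing with that index, it would also be worth adding another index on (rangelength,rangestart,rangeend) to see which one the optimizer prefers to use. Hopefully it will use one of them, and depending on the data in the table and the values in the query, it may or may not alternate between them.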
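A possible way to run that comparison is sketched below. The index name idx_len_start_end is my own invention for illustration; the index added earlier without a name gets a server-generated name, which SHOW INDEX will reveal.

```sql
-- Second candidate index, with an explicit (hypothetical) name.
ALTER TABLE ochrange
  ADD KEY idx_len_start_end (rangelength, rangestart, rangeend);

-- List the actual index names, including any auto-generated ones.
SHOW INDEX FROM ochrange;

-- Let the optimizer choose freely and see which key it reports.
EXPLAIN SELECT networkoperator, numbertype
FROM och.ochrange
WHERE rangestart <= 20972128
  AND rangeend   >= 20972128
ORDER BY rangelength ASC
LIMIT 1;

-- Force one candidate so the two can be timed against each other.
EXPLAIN SELECT networkoperator, numbertype
FROM och.ochrange FORCE INDEX (idx_len_start_end)
WHERE rangestart <= 20972128
  AND rangeend   >= 20972128
ORDER BY rangelength ASC
LIMIT 1;
```

EXPLAIN only shows the chosen plan, so it is still worth timing the real queries with several different numbers before dropping whichever index loses.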
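Option #2 will seem decidedly "outside the box" to some, but it's the solution I use for finding the particular block of IP addresses (IPv4 addresses are essentially INT UNSIGNED, with low/high boundaries) in which a specific IP address lies, for geolocation purposes. I've taken some grief for suggesting this technique on Stack Overflow before, but I can only conclude that the people objecting were simply "thinking small", because I really can't see a reason why this isn't a good solution. The subject I'm referring to is spatial indexes. I think the objections I've encountered were based on the assumption that MySQL's spatial extensions were originally intended for manipulating geospatial data... but there's no good reason to restrict their use to latitude and longitude.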
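Spatial indexes in MySQL are implemented as R-trees.
The key idea of the data structure is to group nearby objects and represent them with their minimum bounding rectangle in the next higher level of the tree; the "R" in R-tree is for rectangle. Since all objects lie within this bounding rectangle, a query that does not intersect the bounding rectangle also cannot intersect any of the contained objects. At the leaf level, each rectangle describes a single object; at higher levels the aggregation of an increasing number of objects. This can also be seen as an increasingly coarse approximation of the data set.
We're trying to find where in a "space" of values a particular value lies, so it makes sense to take advantage of an index structure designed to resolve which "spaces" a particular object fits into... a spatial index.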
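Technically, the rangestart/rangeend continuum is a one-dimensional space, since it consists of points in ranges that all lie along a single continuous line, although I personally find it easier to explain if each (rangestart, rangeend) pair is illustrated as a box, running from (min,min) to (max,min) to (max,max) to (min,max) and back to (min,min). From this example it's easy to see that if we have an index structure that can quickly determine the set of boxes in which our particular point in space does or doesn't lie, then we can quickly traverse that index to find the right place. In this case, we have to find the right set of boxes and then find the smallest of those boxes (assuming my guess about the nature of the `rangelength` contents is correct) that contains our little point (which is actually a "box"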
Rather than repeat work that has already been done, I'll refer you to Jeremy Cole's write-up on the subject:
http://blog.jcole.us/2007/11/24/on-efficiently-geo-referencing-ips-with-maxmind-geoip-and-mysql-gis/
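My approach is slightly different, but the principles are all there, and once you understand what's going on here, I suspect you'll find this a very good fit for what you're trying to do, and that your approach may end up differing slightly from both of ours.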
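As a rough sketch of the box idea applied to the table in the question (not the exact method from the linked write-up, and not tested against your data): the column name range_box and index name range_box_sp are hypothetical, and padding each box by 0.5 is my own assumption, made so that a single-number range still forms a real box and an integer point never sits exactly on an edge.

```sql
-- Add the geometry column as NULLable first so it can be populated.
ALTER TABLE ochrange ADD COLUMN range_box GEOMETRY NULL;

-- Build one box per row: (min,min) (max,min) (max,max) (min,max) (min,min),
-- with min = rangestart - 0.5 and max = rangeend + 0.5.
UPDATE ochrange
SET range_box = GeomFromText(CONCAT(
      'POLYGON((',
      rangestart - 0.5, ' ', rangestart - 0.5, ',',
      rangeend   + 0.5, ' ', rangestart - 0.5, ',',
      rangeend   + 0.5, ' ', rangeend   + 0.5, ',',
      rangestart - 0.5, ' ', rangeend   + 0.5, ',',
      rangestart - 0.5, ' ', rangestart - 0.5,
      '))')),
    -- Assigning the column to itself keeps ON UPDATE CURRENT_TIMESTAMP
    -- from rewriting lastupdate on every row.
    lastupdate = lastupdate;

-- A SPATIAL index requires a NOT NULL column (and, in MySQL 5.5, MyISAM).
ALTER TABLE ochrange
  MODIFY range_box GEOMETRY NOT NULL,
  ADD SPATIAL KEY range_box_sp (range_box);
```

Whatever loads new rows into the table would also have to populate range_box, since the column ends up NOT NULL.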
One last point about spatial indexes, though. Here is an example of my query, where the spatial column is called `node_polygon` and is of type GEOMETRY:
SELECT ...
FROM geo_block b
WHERE MBRContains(b.node_polygon,POINT(in_ip_unsigned,in_ip_unsigned))
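I mention this query structure because it illustrates an important point. When you use a column as an argument to a function in the WHERE clause, it is almost always bad design, because it prevents an index from being used to resolve the expression and leads to a full table scan or something similar.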
WHERE YEAR(birthday) = 1973; # bad
WHERE birthday >= '1973-01-01' AND birthday < '1974-01-01'; # good
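The former expression has to evaluate YEAR() against the `birthday` column in every row, whereas the latter expression can take advantage of an index on `birthday` and do a range scan.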
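Spatial indexes are different, because the optimizer understands the MBRContains() and MBRWithin() functions to mean that the column and the constant should be evaluated against the range identified by the constant within the spatial index. These "functions" are rare examples of functions that still allow the optimizer to realize it knows a better way to resolve the query than evaluating the function against every row.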
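In my application, I don't need to sort, because the table is constrained so that no two entries can touch or overlap: any given IP address fits either in exactly one block or in no block at all. In your case, you may still need to order by range length. What I would try, depending on whether you build your geometries as lines or as boxes, is testing performance when ordering by an appropriate geometry function such as Area() or GLength(), which compare the sizes of the ranges directly, rather than using the rangelength column. I don't know which will perform better, the geometry functions or ordering by the existing rangelength column.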
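For the table in the question, the two orderings to compare might look like the sketch below. It assumes a hypothetical GEOMETRY column named range_box holding one box per (rangestart, rangeend) pair with a SPATIAL index on it; that column is not part of the original schema.

```sql
-- Variant A: spatial lookup, ordered by the existing rangelength column.
SELECT networkoperator, numbertype
FROM och.ochrange
WHERE MBRContains(range_box, POINT(20972128, 20972128))
ORDER BY rangelength ASC
LIMIT 1;

-- Variant B: spatial lookup, ordered by the size of the box itself.
-- For square boxes the area grows with the range length, so the
-- smallest area should correspond to the shortest range.
SELECT networkoperator, numbertype
FROM och.ochrange
WHERE MBRContains(range_box, POINT(20972128, 20972128))
ORDER BY Area(range_box) ASC
LIMIT 1;
```

Running EXPLAIN on each should show whether the spatial key is actually chosen; if only a handful of boxes contain the point, the sort in either variant operates on very few rows.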
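As a side note, as pointed out in the comments, you also shouldn't quote the integer you're using in the WHERE clause, since it's being matched against integer columns, and by quoting it you cause an implicit conversion of one thing to the other (either the literal is cast to an integer, or every value of rangestart/rangeend is cast to a string) for the comparison. The server is probably doing the right thing and converting the string to an integer, but it's better to query with the same data type as the one you're matching against.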
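Putting the rewritten WHERE clause and the unquoted literal together, the full statement would look something like this (same table and value as in the question; nothing else is changed):

```sql
-- Integer literal compared directly against the integer columns,
-- so no string-to-number conversion is involved.
SELECT networkoperator, numbertype
FROM och.ochrange
WHERE rangestart <= 20972128
  AND rangeend   >= 20972128
ORDER BY rangelength ASC
LIMIT 1;
```

If the number arrives from application code as a string, casting it to an integer there, or binding it as an integer parameter, should have the same effect.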