快速算法在一组范围中快速找到一个数字所属的范围?

本文介绍了快速算法在一组范围中快速找到一个数字所属的范围?的处理方法,对大家解决问题具有一定的参考价值

问题描述

我有几个数字范围.这些范围不重叠 - 因为它们不重叠,所以逻辑结果是任何时候都不能有多个范围的一部分.每个范围都是连续的(单个范围内没有空洞,因此范围 8 到 16 将真正包含 8 到 16 之间的所有数字),但两个范围之间可能存在空洞(例如,范围从 64 开始到 128,下一个范围从 256 开始到 384),所以一些数字可能根本不属于任何范围(在这个例子中,数字 129 到 255 不属于任何范围).

I have several number ranges. Those ranges are not overlapping - as they are not overlapping, the logical consequence is that no number can be part of more than one range at any time. Each range is continuously (there are no holes within a single range, so a range 8 to 16 will really contain all numbers between 8 and 16), but there can be holes between two ranges (e.g. range starts at 64 and goes to 128, next range starts at 256 and goes to 384), so some numbers may not belong to any range at all (numbers 129 to 255 would not belong to any range in this example).

我得到一个数字,需要知道该数字属于哪个范围...如果它属于任何范围.否则我需要知道它不属于任何范围.当然速度很重要;我不能简单地检查 O(n) 的所有范围,因为可能有数千个范围.

I'm getting a number and need to know to which range the number belongs to... if it belongs to any range at all. Otherwise I need to know that it does not belong to any range. Of course speed is important; I can not simply check all the ranges which would be O(n), as there might be thousands of ranges.

一个简单的解决方案是将所有数字保存在一个已排序的数组中并对其进行二分查找.那至少会给我 O(log n).当然,二进制搜索必须稍微修改,因为它必须始终检查范围的最小和最大数字.如果要查找的数字介于两者之间,则我们找到了正确的范围,否则我们必须搜索低于或高于当前范围的范围.如果最后只剩下一个范围并且数字不在该范围内,则该数字根本不在范围内,我们可以返回未找到"结果.

A simple solution was keeping all numbers in a sorted array and run a binary search on it. That would give me at least O(log n). Of course the binary search must be somewhat modified as it must always check against the smallest and biggest number of a range. If the number to look for is in between, we have found the correct range, otherwise we must search ranges below or above the current one. If there is only one range left in the end and the number is not within that range, the number is within no range at all and we can return a "not found" result.

范围也可以以某种树结构链接在一起.这基本上就像一个带有二分搜索的排序列表.优点是修改树比排序数组更快(添加/删除范围),但与我们浪费一些额外时间来保持树平衡不同,随着时间的推移,树可能会变得非常不平衡,这将导致搜索比对排序数组的二分搜索慢得多.

Ranges could also be chained together in some kind of tree structure. This is basically like a sorted list with binary search. The advantage is that it will be faster to modify a tree than a sorted array (adding/removing range), but unlike we waste some extra time on keeping the tree balanced, the tree might get very unbalanced over the time and that will lead to much slower searches than a binary search on a sorted array.

人们可以争论哪种解决方案更好或更坏,因为在实践中搜索和修改操作的数量几乎是平衡的(每秒执行的搜索和添加/删除操作的数量相同).

One can argue which solution is better or worse as in practice the number of searches and modification operations will be almost balanced (there will be an equal number of searches and add/remove operations performed per second).

对于此类问题,是否有比排序列表或树更好的数据结构?也许在最好的情况下比 O(log n) 更好,在最坏的情况下比 O(log n) 更好?

Is there maybe a better data structure than a sorted list or a tree for this kind of problem? Maybe one that could be even better than O(log n) in best case and O(log n) in worst case?

一些可能对这里有帮助的附加信息如下:所有范围始终以 2 的幂的倍数开始和结束.它们总是以相同的 2 次幂开始和结束(例如,它们都以 4 的倍数或 8 的倍数或 16 的倍数等开始/结束).2 的幂在运行时不能改变.在添加第一个范围之前,必须设置 2 的幂,并且所有添加的范围必须以该值的倍数开始/结束,直到应用程序终止.我认为这可以用于优化,就好像它们都以例如的倍数开始8、我可以忽略所有比较操作的前3位,如果有的话,其他位会告诉我范围.

Some additional information that might help here is the following: All ranges always start and end at multiple of a power of two. They always all start and end at the same power of two (e.g. they all start/end at a multiple of 4 or at a multiple of 8 or at a multiple of 16 and so on). The power of two cannot change during run time. Before the first range is added, the power of two must be set and all ranges ever added must start/end at a multiple of this value till the application terminates. I think this can be used for optimization, as if they all start at a multiple of e.g. 8, I can ignore the first 3 bits for all comparison operations, the other bits alone will tell me the range if any.

我阅读了部分树和范围树.这些是问题的最佳解决方案吗?可能有更好的解决方案吗?这个问题听起来类似于 malloc 实现必须做的事情(例如,每个被释放的内存块都属于一个可用内存范围,而 malloc 实现必须找出哪个),那么这些通常如何解决问题?

I read about section and ranges trees. Are these optimal solutions to the problem? Are there possibly better solutions? The problem sounds similar to what a malloc implementation must do (e.g. every free'd memory block belongs to a range of available memory and the malloc implementation must find out to which one), so how do those commonly solve the issue?

推荐答案

在运行各种基准测试后,我得出的结论是,只有树状结构可以在这里工作.排序列表当然显示了良好的查找性能 - O(log n) - 但它显示了可怕的更新性能(与树相比,插入和删除的速度慢了 10 倍以上!).

After running various benchmarks, I came to the conclusion that only a tree like structure can work here. A sorted list shows of course good lookup performance - O(log n) - but it shows horribly update performance (inserts and removals are slower by more than the factor 10 compared to trees!).

平衡二叉树也有 O(log n) 的查找性能,但是更新速度要快得多,也在 O(log n) 左右,而排序列表更像是 O(n) 的更新 (O(log n)n) 找到插入的位置或要删除的元素,但最多必须在列表中移动 n 个元素,这是 O(n)).

A balanced binary tree also has O(log n) lookup performance, however it is much faster to update, also around O(log n), while a sorted list is more like O(n) for updates (O(log n) to find the position for insert or the element to delete, but then up to n elements must be moved within the list and this is O(n)).

我实现了 AVL 树、红黑树、Treap、AA 树和 B 树的各种变体(B 在这里表示拜耳树,而不是二叉树).结果:拜耳树几乎从来没有赢过.它们的查找很好,但它们的更新性能很差(因为在 B 树的每个节点中,您再次拥有一个排序列表!).拜耳树仅在读取/写入节点是非常缓慢的操作的情况下(例如,当节点直接从/向硬盘读取或写入时)是优越的 - 因为 B 树必须读取/写入的节点比其他任何节点少得多树,所以在这种情况下它会赢.但是,如果我们将这棵树留在内存中,那么它就没有机会对抗其他树,对不起所有 B-Tree 粉丝.

I implemented an AVL tree, a red-black tree, a Treap, an AA-Tree and various variations of B-Trees (B means Bayer Tree here, not Binary). Result: Bayer trees almost never win. Their lookup is good, but their update performance is bad (as within each node of a B-Tree you have a sorted list again!). Bayer trees are only superior in cases where reading/writing a node is a very slow operation (e.g. when the nodes are directly read or written from/to hard disk) - as a B-Tree must read/write much less nodes than any other tree, so in such a case it will win. If we are having the tree in memory though, it stands no chance against other trees, sorry for all the B-Tree fans out there.

Treap 最容易实现(不到其他平衡树所需的代码行数的一半,仅是非平衡树所需代码的两倍)并且显示了良好的查找和更新平均性能......但我们可以做得比那更好.

A Treap was easiest to implement (less than half the lines of code you need for other balanced trees, only twice the code you need for an unbalanced tree) and shows good average performance for lookups and updates... but we can do better than that.

AA-Tree 显示出惊人的良好查找性能 - 我不知道为什么.它们有时会击败所有其他树(不是很远,但仍然足以不巧合)......并且移除性能还可以,但是除非我太愚蠢而无法正确实现它们,否则插入性能非常糟糕(它执行每次插入的树旋转比任何其他树都要多——即使是 B 树也有更快的插入性能).

An AA-Tree shows amazing good lookup performance - I have no idea why. They sometimes beat all other trees (not by far, but still enough to not be coincident)... and the removal performance is okay, however unless I'm too stupid to implement them correctly, the insert performance is really bad (it performs much more tree rotations on every insert than any other tree - even B-Trees have faster insert performance).

这给我们留下了两个经典,AVL 和 RB-Tree.它们都非常相似,但经过数小时的基准测试,有一点很清楚:AVL 树肯定比 RB 树具有更好的查找性能.差异并不大,但在所有基准测试中,它们有 2/3 会赢得查找测试.不足为奇,毕竟 AVL Trees 比 RB-Trees 更严格平衡,所以在大多数情况下它们更接近最优二叉树.我们在这里不是在谈论巨大的差异,这总是一场势均力敌的比赛.

This leaves us with two classics, AVL and RB-Tree. They are both pretty similar but after hours of benchmarking, one thing is clear: AVL Trees definitely have better lookup performance than RB-Trees. The difference is not gigantic, but in 2/3 out of all benchmarks they will win the lookup test. Not too surprising, after all AVL Trees are more strictly balanced than RB-Trees, so they are closer to the optimal binary tree in most cases. We are not talking about a huge difference here, it is always a close race.

另一方面,RB Trees 在几乎所有测试运行中都击败了 AVL Trees,这并不是一场势均力敌的比赛.和以前一样,这是意料之中的.与 AVL 树相比,RB 树不那么严格平衡,在插入时执行的树旋转要少得多.

On the other hand RB Trees beat AVL Trees for inserts in almost all test runs and that is not such a close race. As before, that is expected. Being less strictly balanced RB Trees perform much less tree rotations on inserts compared to AVL Trees.

删除节点怎么样?这里似乎在很大程度上取决于节点的数量.对于小节点数(小于一百万),RB Trees 再次拥有 AVL Trees;差异甚至比插入还要大.相当出乎意料的是,一旦节点数量增长到超过一百万个节点,AVL 树似乎迎头赶上,并且与 RB 树的差异缩小,直到它们或多或少同样快.不过,这可能是系统的影响.它可能与进程的内存使用或 CPU 缓存等有关.对 RB 树的负面影响比对 AVL 树的负面影响更大的东西,因此 AVL 树可以迎头赶上.对于查找(AVL 通常更快,无论有多少节点)和插入(RB 通常更快,无论有多少节点),没有观察到相同的效果.

How about removal of nodes? Here it seems to depend a lot on the number of nodes. For small node numbers (everything less than half a million) RB Trees again own AVL Trees; the difference is even bigger than for inserts. Rather unexpected is that once the node number grows beyond a million nodes AVL Trees seems to catch up and the difference to RB Trees shrinks until they are more or less equally fast. This could be an effect of the system, though. It could have to do with memory usage of the process or CPU caching or the like. Something that has a more negative effect on RB Trees than it has on AVL Trees and thus AVL Trees can catch up. The same effect is not observed for lookups (AVL usually faster, regardless how many nodes) and inserts (RB usually faster, regardless how many nodes).

结论:
我认为我能得到的最快的是使用 RB-Trees 时,因为查找次数只会略高于插入和删除的次数,无论 AVL 的查找速度有多快,整体性能都会受到更差的插入的影响/删除性能.

Conclusion:
I think the fastest I can get is when using RB-Trees, since the number of lookups will only be somewhat higher than the number of inserts and deletions and no matter how fast AVL is on lookups, the overall performance will suffer from their worse insert/deletion performance.

也就是说,除非这里有人想出一个更好的数据结构来大量拥有 RB Trees ;-)

That is, unless anyone here may come up with a much better data structure that will own RB Trees big time ;-)

这篇关于快速算法在一组范围中快速找到一个数字所属的范围?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,WP2

admin_action_{$_REQUEST[‘action’]}

do_action( "admin_action_{$_REQUEST[‘action’]}" )动作钩子::在发送“Action”请求变量时激发。Action Hook: Fires when an ‘action’ request variable is sent.目录锚点:#说明#源码说明(Description)钩子名称的动态部分$_REQUEST['action']引用从GET或POST请求派生的操作。源码(Source)更新版本源码位置使用被使用2.6.0 wp-admin/admin.php:...

日期:2020-09-02 17:44:16 浏览:1127

admin_footer-{$GLOBALS[‘hook_suffix’]}

do_action( "admin_footer-{$GLOBALS[‘hook_suffix’]}", string $hook_suffix )操作挂钩:在默认页脚脚本之后打印脚本或数据。Action Hook: Print scripts or data after the default footer scripts.目录锚点:#说明#参数#源码说明(Description)钩子名的动态部分,$GLOBALS['hook_suffix']引用当前页的全局钩子后缀。参数(Parameters)参数类...

日期:2020-09-02 17:44:20 浏览:1032

customize_save_{$this->id_data[‘base’]}

do_action( "customize_save_{$this->id_data[‘base’]}", WP_Customize_Setting $this )动作钩子::在调用WP_Customize_Setting::save()方法时激发。Action Hook: Fires when the WP_Customize_Setting::save() method is called.目录锚点:#说明#参数#源码说明(Description)钩子名称的动态部分,$this->id_data...

日期:2020-08-15 15:47:24 浏览:775

customize_value_{$this->id_data[‘base’]}

apply_filters( "customize_value_{$this->id_data[‘base’]}", mixed $default )过滤器::过滤未作为主题模式或选项处理的自定义设置值。Filter Hook: Filter a Customize setting value not handled as a theme_mod or option.目录锚点:#说明#参数#源码说明(Description)钩子名称的动态部分,$this->id_date['base'],指的是设置...

日期:2020-08-15 15:47:24 浏览:866

get_comment_author_url

过滤钩子:过滤评论作者的URL。Filter Hook: Filters the comment author’s URL.目录锚点:#源码源码(Source)更新版本源码位置使用被使用 wp-includes/comment-template.php:32610...

日期:2020-08-10 23:06:14 浏览:903

network_admin_edit_{$_GET[‘action’]}

do_action( "network_admin_edit_{$_GET[‘action’]}" )操作挂钩:启动请求的处理程序操作。Action Hook: Fires the requested handler action.目录锚点:#说明#源码说明(Description)钩子名称的动态部分$u GET['action']引用请求的操作的名称。源码(Source)更新版本源码位置使用被使用3.1.0 wp-admin/network/edit.php:3600...

日期:2020-08-02 09:56:09 浏览:848

network_sites_updated_message_{$_GET[‘updated’]}

apply_filters( "network_sites_updated_message_{$_GET[‘updated’]}", string $msg )筛选器挂钩:在网络管理中筛选特定的非默认站点更新消息。Filter Hook: Filters a specific, non-default site-updated message in the Network admin.目录锚点:#说明#参数#源码说明(Description)钩子名称的动态部分$_GET['updated']引用了非默认的...

日期:2020-08-02 09:56:03 浏览:834

pre_wp_is_site_initialized

过滤器::过滤在访问数据库之前是否初始化站点的检查。Filter Hook: Filters the check for whether a site is initialized before the database is accessed.目录锚点:#源码源码(Source)更新版本源码位置使用被使用 wp-includes/ms-site.php:93910...

日期:2020-07-29 10:15:38 浏览:809

WordPress 的SEO 教学:如何在网站中加入关键字(Meta Keywords)与Meta 描述(Meta Description)?

你想在WordPress 中添加关键字和meta 描述吗?关键字和meta 描述使你能够提高网站的SEO。在本文中,我们将向你展示如何在WordPress 中正确添加关键字和meta 描述。为什么要在WordPress 中添加关键字和Meta 描述?关键字和说明让搜寻引擎更了解您的帖子和页面的内容。关键词是人们寻找您发布的内容时,可能会搜索的重要词语或片语。而Meta Description则是对你的页面和文章的简要描述。如果你想要了解更多关于中继标签的资讯,可以参考Google的说明。Meta 关键字和描...

日期:2020-10-03 21:18:25 浏览:1619

谷歌的SEO是什么

SEO (Search Engine Optimization)中文是搜寻引擎最佳化,意思近于「关键字自然排序」、「网站排名优化」。简言之,SEO是以搜索引擎(如Google、Bing)为曝光媒体的行销手法。例如搜寻「wordpress教学」,会看到本站的「WordPress教学:12个课程…」排行Google第一:关键字:wordpress教学、wordpress课程…若搜寻「网站架设」,则会看到另一个网页排名第1:关键字:网站架设、架站…以上两个网页,每月从搜寻引擎导入自然流量,达2万4千:每月「有机搜...

日期:2020-10-30 17:23:57 浏览:1262