“Ω(n log n)屏障"的规则是什么?排序算法?

本文介绍了“Ω(n log n)屏障"的规则是什么?排序算法?的处理方法，对大家解决问题具有一定的参考价值

问题描述

我写了一个简单的程序，在 O(n) 中排序.它的内存效率非常低，但这不是重点.

I wrote a simple program that sorts in O(n). It is highly memory inefficient, but that's not the point.

它使用HashMap背后的原理进行排序:

It uses the principle behind a HashMap for sorting:

public class NLogNBreak {
    public static class LinkedListBack {
        public LinkedListBack(int val){
            first = new Node();
            first.val = val;
        }
        public Node first = null;
        public void insert(int i){
            Node n = new Node();
            n.val = i;
            n.next = first;
            first = n;
        }
    }

    private static class Node {
        public Node next = null;
        public int val;
    }

    //max > in[i] > 0
    public static LinkedListBack[] sorted(int[] in, int max){
        LinkedListBack[] ar = new LinkedListBack[max + 1];
        for (int i = 0; i < in.length; i++) {
            int val = in[i];
            if(ar[val] == null){
                ar[val] = new LinkedListBack(val);
            } else {
                ar[val].insert(val);
            }
        }
        return ar;
    }
}

那么，即使它以一种时髦的格式返回结果，这也算是一种 O(n) 吗?

So does this count as a sort of O(n), even though it returns the result in a funky format?

推荐答案

直接回答你的问题:

您的排序算法在技术上不是 O(n) 而是 O(n + max)，因为您需要创建一个大小为 max 的数组，这需要 O(max) 时间.
这不是问题；事实上，这是一个著名的排序算法的特例，它打破了Ω(n log n) 障碍.

那么这个 Ω(n log n) 障碍是什么?它从何而来?你如何打破它?

So what is this Ω(n log n) barrier? Where does it come from? And how do you break it?

Ω(n log n) 屏障是任何基于比较的排序算法的平均案例速度的信息理论下界.如果您被允许应用于数组元素以区分它们的唯一操作是执行某种比较，那么您的排序算法在平均情况下不会比 Ω(n log n) 做得更好.

The Ω(n log n) barrier is the information-theoretical lower bound on the average-case speed of any comparison-based sorting algorithm. If the only operations you are permitted to apply to array elements to distinguish them is to perform some sort of comparison, then your sorting algorithm can't do better than Ω(n log n) in the average-case.

要理解为什么会这样，让我们考虑一下算法在执行过程中任意时刻的状态.随着算法的运行，它可以获得一些关于输入元素排序方式的信息.假设如果算法有一些关于输入元素原始排序的信息 X，那么算法处于状态 X.

To understand why this is, let's think about the state of the algorithm at any point during its execution. As the algorithm is running, it can gain some amount of information about the way that the input elements were ordered. Let's say that if the algorithm has some set of information X about the original ordering of the input elements, then the algorithm is in state X.

Ω(n log n) 参数(以及几个相关的参数，我将在后面讨论)的关键是算法必须能够根据输入是.现在让我们假设排序算法的输入是一个包含 n 个不同值的数组.因为除了它们的排序方式之外，该算法无法告诉任何关于这些元素的信息，所以排序的值是什么并不重要.重要的是这 n 个元素相对于彼此的相对顺序.

The crux of the Ω(n log n) argument (and several related arguments, as I'll discuss later) is that the algorithm has to have the ability to get into a large number of different states based on what the input is. Let's assume for now that the input to the sorting algorithm is an array of n distinct values. Because the algorithm can't tell anything about those elements other than the way that they're ordered, it doesn't really matter what the values being sorted are. All that matters is the relative ordering of those n elements relative to one another.

现在是关键步骤 - 让我们假设有 f(n) 种独特的方式对 n 个输入元素进行排序，并假设我们的排序算法无法进入至少 f(n) 种不同的状态.如果是这种情况，则数组中的元素必须有两种不同的排序，算法总是将它们组合在一起形成相同的状态.如果发生这种情况，则排序算法不可能正确地对两个输入数组进行正确排序.这背后的原因是，因为该算法对两个数组的处理相同，所以它用来重新排序第一个数组的元素的任何步骤都将与它用来重新排序第二个数组的元素的步骤相同.由于这两个数组不相同，因此在两种情况之一中必须至少有一个元素不合适.因此，我们知道排序算法必须能够进入 f(n) 个不同的状态.

Now for the key step - let's suppose that there are f(n) unique ways of ordering the n input elements and suppose that our sorting algorithm can't get into at least f(n) different states. If this is the case, then there has to be two different orderings of the elements in the array that the algorithm always groups together into the same state. If this happens, then the sorting algorithm can't possibly correctly sort both of the two input arrays correctly. The reasoning behind this is that because the algorithm treats the two arrays identically, whatever steps it uses to reorder the elements of the first array will be the same as the steps it uses to reorder the elements of the second array. Since the two arrays aren't the same, there has to be at least one element that will be out of place in one of the two cases. Consequently, we know that the sorting algorithm has to be able to get into f(n) different states.

但是算法如何进入这些不同的状态呢?好吧，让我们考虑一下.最初，该算法根本没有关于元素排序的信息.当它进行第一次比较时(例如，在元素 A[i] 和 A[j] 之间)，该算法可以进入两种状态之一 - A[i] <;A[j] 和 A[i] > A[j].更一般地说，算法进行的每一次比较，在最好的情况下，都可以根据比较的结果将算法置于两个新状态之一.因此，我们可以想象一个大的二叉树结构来描述算法可以处于的状态——每个状态有多达两个子节点，根据所进行的比较结果描述算法进入的状态.如果我们采用从树根到叶子的任何路径，我们就会得到算法对特定输入进行的一系列比较.为了尽快排序，我们希望尽可能减少比较次数，因此我们希望这个树结构具有尽可能小的高度.

But how can the algorithm get into these different states? Well, let's think about this. Initially, the algorithm has no information at all about the ordering of the elements. When it makes its first comparison (say, between elements A[i] and A[j]), the algorithm can get into one of two states - one where A[i] < A[j] and one where A[i] > A[j]. More generally, every comparison that the algorithm makes can, in the best case, put the algorithm into one of two new states based on the result of the comparison. We can therefore think of a large binary tree structure describing the states that the algorithm can be in - each state has up to two children describing what state the algorithm gets into based on the result of the comparison that's made. If we take any path from the root of the tree down to a leaf, we get the series of comparisons that end up getting made by the algorithm on a particular input. In order to sort as quickly as possible, we want to make the fewest number of comparisons possible, and so we want this tree structure to have the smallest height possible.

现在，我们知道两件事.首先，我们可以将算法可以进入的所有状态视为二叉树.其次，二叉树必须至少有 f(n) 个不同的节点.鉴于此，我们可以构建的最小二叉树的高度必须至少为 Ω(log f(n)).这意味着如果有 f(n) 种不同的排列数组元素的可能方式，我们必须至少进行 Ω(log f(n)) 次平均比较，否则我们就不能'进入足够多的不同状态.

Now, we know two things. First, we can think of all of the states the algorithm can get into as a binary tree. Second, that binary tree has to have at least f(n) different nodes in it. Given this, the smallest possible binary tree we can build has to have height at least Ω(log f(n)). This means that if there are f(n) different possible ways of ordering the array elements, we have to make at least Ω(log f(n)) comparisons on average, since otherwise we can't get into enough differing states.

总结你无法击败Ω(n log n)的证明，注意如果数组中有n个不同的元素，那么就有n！对元素进行排序的不同可能方式.使用 Stirling 近似，我们得到了 log n！= Ω(n log n)，因此我们必须在平均情况下至少进行 Ω(n log n) 次比较才能对输入序列进行排序.

To conclude the proof that you can't beat Ω(n log n), note that if the array has n distinct elements in it, then there are n! different possible ways of ordering the elements. using Stirling's approximation, we have that log n! = Ω(n log n), and so we have to make at least Ω(n log n) comparisons in the average case to sort the input sequence.

在我们刚才看到的内容中，我们看到如果您有 n 个完全不同的数组元素，则无法使用比 Ω(n log n) 更快的比较排序对它们进行排序.然而，这个起始假设不一定有效.我们想要排序的许多数组中可能有重复的元素.例如，假设我想对仅由 0 和 1 组成的数组进行排序，例如这里的数组:

In what we just saw above, we saw that if you have n array elements that are all distinct, you cannot sort them with a comparison sort any faster than Ω(n log n). However, this starting assumption isn't necessarily valid. Many arrays that we'd like to sort may have duplicated elements in them. For example, suppose that I want to sort arrays that are composed solely of zeros and ones, such as this array here:

 0 1 0 1 1 1 0 0 1 1 1

在这种情况下，不是有 n 个！不同的零数组和长度为 n 的数组.实际上，它们只有 2ⁿ 个.从我们上面的结果来看，这意味着我们应该能够使用纯粹基于比较的排序算法在 Ω(log 2ⁿ) = Ω(n) 时间内进行排序.事实上，我们完全可以做到这一点；这是我们如何做的草图:

In this case, it is not true that there are n! different arrays of zeros and ones of length n. In fact, there are only 2ⁿ of them. From our result above, this means that we should be able to sort in Ω(log 2ⁿ) = Ω(n) time using a purely comparison-based sorting algorithm. In fact, we absolutely can do this; here's a sketch of how we'd do it:

看第一个元素.
将小于第一个元素的所有元素复制到名为less"的数组中
将所有与第一个元素相等的元素复制到一个名为equal"的数组中
将所有大于第一个元素的元素复制到一个名为greater"的数组中
按照较小、相等、较大的顺序将所有三个数组连接在一起.

要看看这是否有效，如果 0 是我们的第一个元素，那么less"数组将为空，equal"数组将包含所有零，greater"数组将包含所有 0.连接它们然后将所有的零放在所有的 1 之前.否则，如果 1 是我们的第一个元素，那么 less 数组将保存零，equal 数组将保存 1，greater数组将为空.因此，根据需要，它们的连接是全零后跟全 1.

To see that this works, if 0 is our first element, then the 'less' array will be empty, the 'equal' array will have all the zeros, and the 'greater' array will have all the ones. Concatenating them then puts all the zeros before all the ones. Otherwise, if 1 is our first element, then the less array will hold the zeros, the equal array will hold the ones, and the greater array will be empty. Their concatenation is thus all zeros followed by all ones, as required.

在实践中，您不会使用此算法(您将使用计数排序，如下所述)，但它表明您确实可以使用基于比较的算法击败 Ω(n log n)，如果算法的可能输入数量很少.

In practice, you wouldn't use this algorithm (you'd use a counting sort, as described below), but it shows that you can indeed beat Ω(n log n) with a comparison-based algorithm if the number of possible inputs to the algorithm is small.

众所周知，一些基于比较的排序算法可以非常快速地处理具有多个重复值的输入.例如，众所周知，Quicksort with a special partitioning step利用输入数组中的重复元素.

Some comparison-based sorting algorithms are known to work very quickly on inputs that have multiple duplicated values. For example, it is known that Quicksort with a special partitioning step can take advantage of duplicated elements in the input array.

所有这些讨论都假设我们在讨论基于比较的排序，其中唯一允许对数组元素进行的操作是比较.然而，如果我们更多地了解我们将要排序的元素，并且可以对这些元素进行简单比较之外的操作，那么上述界限都不再成立.我们打破了导致我们构建算法所有状态的二叉树的初始假设，因此没有理由怀疑这些界限仍然成立.

All of this discussion has assumed that we're talking about comparison-based sorting, where the only permitted operation on array elements is a comparison. However, if we know more about what elements we're going to be sorting and can perform operations on those elements beyond simple comparisons, then none of the above bounds hold any more. We're breaking the starting assumptions that led us to construct a binary tree of all the states of the algorithm, and so there's no reason to suspect that those bounds will still hold.

例如，如果您知道输入值是从只有 |U| 的 Universe 中提取的中的元素，然后您可以使用巧妙的算法在 O(n + |U|) 时间内进行排序.从创建 |U| 开始不同的buckets，我们可以将原始数组中的元素放入其中.然后，遍历数组并将所有数组元素分配到相应的桶中.最后，访问每个存储桶，从包含最小元素副本的存储桶开始，以包含最大元素副本的存储桶结束，然后将您找到的所有值连接在一起.例如，让我们看看如何对由 1 - 5 组成的数组进行排序.如果我们有这个起始数组:

For example, if you know that the input values are drawn from a universe that only has |U| elements in it, then you can sort in O(n + |U|) time using a clever algorithm. Start off by creating |U| different buckets into which we can place the elements from the original array. Then, iterate across the array and distribute all of the array elements into the corresponding bucket. Finally, visit each of the buckets, starting with the bucket holding copies of the smallest element and end with the bucket containing copies of the largest element, then concatenate together all of the values you find. For example, let's see how to sort arrays consisting of the values 1 - 5. If we have this starting array:

1 3 4 5 2 3 2 1 4 3 5

然后我们可以像这样将这些元素放入桶中:

Then we can put those elements into buckets like this:

Bucket     1  2  3  4  5
           -------------
           1  2  3  4  5
           1  2  3  4  5
                 3

遍历存储桶并将它们的值连接在一起会产生以下结果:

Iterating across the buckets and concatenating their values together yields this:

1 1 2 2 3 3 3 4 4 5 5

果然是我们原始数组的排序版本！这里的运行时间是 O(n) 时间将原始数组元素分配到存储桶中，然后 O(n + |U|) 时间遍历所有存储桶将元素重新组合在一起.请注意，如果 |U|= O(n)，这在 O(n) 时间内运行，打破了 Ω(n log n) 排序障碍.

which, sure enough, is a sorted version of our original array! The runtime here is O(n) time to go and distribute the original array elements into the buckets, then O(n + |U|) time to iterate across all the buckets putting the elements back together. Notice that if |U| = O(n), this runs in O(n) time, breaking the Ω(n log n) sorting barrier.

如果您正在对整数进行排序，那么使用基数排序可以做得比这更好，它以 O(n lg |U|) 运行.如果您正在处理原始 ints，lg |U|通常是 32 或 64，所以这是非常快的.如果你愿意实现一个特别棘手的数据结构，你可以使用 van Emde Boas Tree在 O(n lg lg U) 时间内对从 0 到 U - 1 的整数进行排序，再次利用整数由可以在块中操作的位组组成的事实.

If you are sorting integers, you can do much better than this by using radix sort, which runs in O(n lg |U|). If you're dealing with primitive ints, lg |U| is usually 32 or 64, so this is extremely fast. If you're willing to implement a particularly tricky data structure, you can use a van Emde Boas Tree to sort integers from 0 to U - 1 in time O(n lg lg U), again by exploiting the fact that integers consist of groups of bits that can be manipulated in blocks.

同样，如果你知道你的元素是字符串，你可以通过构建一个 trie 来快速排序.a> 从字符串中取出，然后遍历 trie 以重建所有字符串.或者，您可以将字符串视为以大基数书写的数字(例如，ASCII 文本的基数为 128)，然后使用上述整数排序算法之一.

Similarly, if you know that your elements are strings, you can sort very quickly by building a trie out of the strings, then iterating across the trie to rebuild all the strings. Alternatively, you could consider the strings as numbers written in a large base (say, base 128 for ASCII text) and then use one of the integer sorting algorithms from above.

在上述每一种情况下，您能够突破信息论障碍的原因是您打破了障碍的初始假设，即您只能应用比较.如果您可以将输入元素视为数字、字符串或其他任何可以揭示更多结构的元素，那么一切都将结束，您可以非常高效地进行排序.

In each of these cases, the reason that you can beat the information-theoretic barrier is that you're breaking the barrier's starting assumption, namely that you can only apply comparisons. If you can treat the input elements as numbers, or as strings, or as anything else that reveals more structure, all bets are off and you can sort extremely efficiently.

希望这有帮助！

这篇关于“Ω(n log n)屏障"的规则是什么?排序算法?的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，WP2

“Ω(n log n)屏障"的规则是什么?排序算法?

问题描述

推荐答案

admin_action_{$_REQUEST[‘action’]}

admin_footer-{$GLOBALS[‘hook_suffix’]}

customize_save_{$this->id_data[‘base’]}

customize_value_{$this->id_data[‘base’]}

get_comment_author_url

network_admin_edit_{$_GET[‘action’]}

network_sites_updated_message_{$_GET[‘updated’]}

pre_wp_is_site_initialized

WordPress 的SEO 教学：如何在网站中加入关键字（Meta Keywords）与Meta 描述（Meta Description）？

谷歌的SEO是什么