Problem description
I often want to loop over a long array or column of a dataframe, and for each item, see if it is a member of another array. Rather than doing
giant_list = ["a", "c", "j"]
good_letters = ["a", "b"]
isin = falses(size(giant_list,1))
for i = 1:size(giant_list, 1)
    isin[i] = giant_list[i] in good_letters
end
Is there a vectorized (doubly vectorized?) way to do this in Julia? By analogy with the basic dotted operators, I want to write something like
isin = giant_list .in good_letters
I realize this may not be possible, but I wanted to make sure I wasn't missing something. I know I could probably use DefaultDict from DataStructures to do something similar, but I don't know of anything in Base.
Recommended answer
The indexin function does something similar to what you want:
indexin(a, b)
Returns a vector containing the highest index in b for each value in a that is a member of b. The output vector contains 0 wherever a is not a member of b.
Since you want a boolean for each element of giant_list (instead of the index in good_letters), you can simply do:
julia> indexin(giant_list, good_letters) .> 0
3-element BitArray{1}:
  true
 false
 false
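The docstring and output above match the Julia version the answer was written for. On Julia 1.0 and later, indexin returns nothing (not 0) for values of a that are not in b, so comparing against 0 no longer works there. A minimal sketch of the same idea, assuming Julia 1.0 or later:

```julia
giant_list = ["a", "c", "j"]
good_letters = ["a", "b"]

# On Julia 1.0+, indexin yields an index for members and `nothing` otherwise.
idx = indexin(giant_list, good_letters)

# Turn the indices into a membership mask by testing against `nothing`.
isin = [i !== nothing for i in idx]
```

Here isin has one Bool per element of giant_list, as in the original loop.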
The implementation of indexin is very straightforward, and it points the way to how you might optimize this if you don't care about the indices in b:
function vectorin(a, b)
    bset = Set(b)
    [i in bset for i in a]
end
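As a usage sketch, vectorin can be applied to the arrays from the question, and the resulting mask can be used to filter giant_list. The variable names mask and selected are illustrative only:

```julia
# Same function as in the answer: build a Set once so each lookup is cheap.
function vectorin(a, b)
    bset = Set(b)
    return [i in bset for i in a]
end

giant_list = ["a", "c", "j"]
good_letters = ["a", "b"]

# One Bool per element of giant_list.
mask = vectorin(giant_list, good_letters)

# Logical indexing keeps only the elements that are in good_letters.
selected = giant_list[mask]
```

Building the Set costs one pass over b; after that, each membership test is a hash lookup instead of a scan over the whole array, which is what makes this worthwhile when b is large.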
Only a limited set of names may be used as infix operators, so a function such as vectorin cannot be called with infix syntax.
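On more recent Julia versions, something close to the syntax in the question is available through broadcasting. The following is a sketch assuming Julia 1.0 or later, where dot-call syntax works for any function and Ref protects an argument from being broadcast over:

```julia
giant_list = ["a", "c", "j"]
good_letters = ["a", "b"]

# Broadcast `in` over giant_list only; Ref(...) makes good_letters act as a
# single value instead of being iterated element by element.
isin_call = in.(giant_list, Ref(good_letters))

# The infix operator ∈ (typed \in<TAB> in the REPL) can be dotted as well.
isin_infix = giant_list .∈ Ref(good_letters)

# For a large good_letters, wrapping a Set gives hash lookups per element.
isin_set = in.(giant_list, Ref(Set(good_letters)))
```

All three produce the same mask; the Set variant mirrors the optimization in vectorin.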