Problem description
I have a MongoDB collection. When I run:
db.bill.find({})
I get:
{
"_id" : ObjectId("55695ea145e8a960bef8b87a"),
"name" : "ABC. Net",
"code" : "1-98tfv",
"abbreviation" : "ABC",
"bill_codes" : [ 190215, 44124, 190215, 147708 ],
"customer_name" : "abc"
}
I need an operation that removes the duplicate values from bill_codes. The result should be:
{
"_id" : ObjectId("55695ea145e8a960bef8b87a"),
"name" : "ABC. Net",
"code" : "1-98tfv",
"abbreviation" : "ABC",
"bill_codes" : [ 190215, 44124, 147708 ],
"customer_name" : "abc"
}
How can I achieve this in MongoDB?
Recommended answer
You can do this using the aggregation framework as follows:
collection.aggregate([
    { "$project": {
        "name": 1,
        "code": 1,
        "abbreviation": 1,
        "bill_codes": { "$setUnion": [ "$bill_codes", [] ] }
    }}
])
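The $setUnion operator is a "set" operator, so its result keeps only the "unique" items. Taking the union of the array with an empty array therefore removes the duplicates. Note that set operators do not guarantee the order of the elements in the output array.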
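To make the set semantics concrete, here is a small plain-Python sketch of what a union with an empty array does to the sample array from the question. It only mimics the idea with Python's built-in set and is not how the server implements $setUnion; the variable names are illustrative.

```python
# Sample array from the question
bill_codes = [190215, 44124, 190215, 147708]

# Union with an empty set: duplicates collapse because a set holds each value once
unique_codes = list(set(bill_codes) | set())

# The order of a set is not guaranteed, so sort before displaying or comparing
print(sorted(unique_codes))
```

As with $setUnion, the result should be treated as unordered.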
If you are still using a MongoDB version older than 2.6, you have to do this with $unwind and $addToSet instead:
collection.aggregate([
    { "$unwind": "$bill_codes" },
    { "$group": {
        "_id": "$_id",
        "name": { "$first": "$name" },
        "code": { "$first": "$code" },
        "abbreviation": { "$first": "$abbreviation" },
        "bill_codes": { "$addToSet": "$bill_codes" }
    }}
])
It's not as efficient, but these operators have been supported since version 2.2.
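For readers unfamiliar with these two stages, the following plain-Python sketch imitates what the pipeline does: one row per array element, then regrouping by _id while keeping only distinct values. The helper names unwind and group_add_to_set are hypothetical, the other fields carried by $first are left out for brevity, and the real $addToSet does not guarantee element order.

```python
def unwind(docs, field):
    """Yield one copy of each document per array element, like $unwind."""
    for doc in docs:
        for value in doc[field]:
            yield {**doc, field: value}


def group_add_to_set(rows, field):
    """Regroup rows by _id and collect distinct values, like $group with $addToSet."""
    grouped = {}
    for row in rows:
        entry = grouped.setdefault(row["_id"], {"_id": row["_id"], field: []})
        if row[field] not in entry[field]:
            entry[field].append(row[field])
    return list(grouped.values())


# Illustrative document; the _id value is made up for the sketch
docs = [{"_id": 1, "bill_codes": [190215, 44124, 190215, 147708]}]
result = group_add_to_set(unwind(docs, "bill_codes"), "bill_codes")
print(result)
```

The sketch also hints at why this route is slower: every array element becomes its own intermediate row before the rows are grouped back together.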
Of course, if you actually want to modify the documents in your collection permanently, you can expand on this and process an update for each document. You can retrieve a "cursor" from .aggregate() and iterate over it, basically following this shell example:
db.collection.aggregate([
    { "$project": {
        "bill_codes": { "$setUnion": [ "$bill_codes", [] ] },
        "same": { "$eq": [
            { "$size": "$bill_codes" },
            { "$size": { "$setUnion": [ "$bill_codes", [] ] } }
        ]}
    }},
    { "$match": { "same": false } }
]).forEach(function(doc) {
    db.collection.update(
        { "_id": doc._id },
        { "$set": { "bill_codes": doc.bill_codes } }
    )
})
Earlier versions are a little more involved:
db.collection.aggregate([
    { "$unwind": "$bill_codes" },
    { "$group": {
        "_id": {
            "_id": "$_id",
            "bill_code": "$bill_codes"
        },
        "origSize": { "$sum": 1 }
    }},
    { "$group": {
        "_id": "$_id._id",
        "bill_codes": { "$push": "$_id.bill_code" },
        "origSize": { "$sum": "$origSize" },
        "newSize": { "$sum": 1 }
    }},
    { "$project": {
        "bill_codes": 1,
        "same": { "$eq": [ "$origSize", "$newSize" ] }
    }},
    { "$match": { "same": false } }
]).forEach(function(doc) {
    db.collection.update(
        { "_id": doc._id },
        { "$set": { "bill_codes": doc.bill_codes } }
    )
})
The added operations compare the length of the "de-duplicated" array with the length of the original array, so that only the documents that actually had duplicates removed are returned for update processing.
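The same length comparison can be sketched on the client side in plain Python. The function plan_updates below is a hypothetical helper, not part of any driver: it takes already-fetched documents and returns filter/update pairs only for those whose array shrinks after de-duplication, which is the role the "same" flag plays in the pipelines above.

```python
def plan_updates(docs):
    """Return (filter, update) pairs only for documents that contain duplicates."""
    updates = []
    for doc in docs:
        original = doc["bill_codes"]
        deduped = list(set(original))
        # Same idea as the "same" flag: skip documents whose length is unchanged
        if len(deduped) != len(original):
            updates.append(
                ({"_id": doc["_id"]}, {"$set": {"bill_codes": deduped}})
            )
    return updates


# Illustrative documents; the _id values are made up for the sketch
docs = [
    {"_id": 1, "bill_codes": [190215, 44124, 190215, 147708]},
    {"_id": 2, "bill_codes": [1, 2, 3]},
]
updates = plan_updates(docs)
print(len(updates))
```

Each pair could then be handed to whatever update call your driver provides. Unlike the aggregation versions, this still reads every document over the network and only saves the unnecessary writes.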
A note for Python users belongs here as well. If you don't care about "identifying" the documents that contain duplicate array entries and are prepared to "blast" the whole collection with updates, then just use Python's built-in set() in the client code to remove the duplicates:
for doc in collection.find():
    # update_one replaces the deprecated update(), which PyMongo 4 removed
    collection.update_one(
        { "_id": doc["_id"] },
        { "$set": { "bill_codes": list(set(doc["bill_codes"])) } }
    )
So that's quite simple, and the choice depends on which is the greater evil: the cost of finding the documents with duplicates, or the cost of updating every document whether it needs it or not.
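One caveat with the set() approach is that a Python set does not preserve the original order of the array. If the order of bill_codes matters, a possible alternative is dict.fromkeys, which keeps the first occurrence of each value because dictionaries preserve insertion order (guaranteed since Python 3.7). The helper name dedupe_keep_order is illustrative.

```python
def dedupe_keep_order(values):
    """Drop repeated values while keeping the first occurrence of each."""
    return list(dict.fromkeys(values))


print(dedupe_keep_order([190215, 44124, 190215, 147708]))
# prints [190215, 44124, 147708]
```

This could stand in for list(set(...)) in the loop above when the stored order needs to survive the update.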
That at least covers the techniques.