Question
I am using an XML text reader on an XML file that may contain characters that are invalid for the reader. My initial thought was to create my own version of the stream reader and clean out the bad characters, but it is severely slowing down my program.
public class ClensingStream : StreamReader
{
    private static char[] badChars = { '\x00', '\x09', '\x0A', '\x10' };
    //snip
    public override int Read(char[] buffer, int index, int count)
    {
        var tmp = base.Read(buffer, index, count);
        for (int i = 0; i < buffer.Length; ++i)
        {
            //check the element in the buffer to see if it is one of the bad characters.
            if(badChars.Contains(buffer[i]))
                buffer[i] = ' ';
        }
        return tmp;
    }
}
According to my profiler, the code spends 88% of its time in if(badChars.Contains(buffer[i])). What is the correct way to do this so that I am not causing horrible slowness?
Recommended answer
The reason it spends so much time on that line is that the Contains method loops through the array to look for the character, and it does so for every element of the buffer.
Put the characters in a HashSet instead:
private static HashSet<char> badChars =
    new HashSet<char>(new char[] { '\x00', '\x09', '\x0A', '\x10' });
The code that checks whether the set contains the character looks the same as when looking in the array, but it uses the hash code of the character to find it instead of looping through all the items in the array.
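As a minimal sketch of how the HashSet could fit into the reader from the question: the class name CleansingStreamReader and its single constructor are assumptions made for illustration, not the asker's actual code. The sketch also scans only the characters that the current call actually read (from index to index + read) rather than the whole buffer, which is a change from the question's loop.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical class name; a sketch of the question's reader using a HashSet.
public class CleansingStreamReader : StreamReader
{
    // Same four characters as the question's badChars array.
    private static readonly HashSet<char> BadChars =
        new HashSet<char> { '\x00', '\x09', '\x0A', '\x10' };

    public CleansingStreamReader(Stream stream) : base(stream) { }

    public override int Read(char[] buffer, int index, int count)
    {
        int read = base.Read(buffer, index, count);

        // Only look at the characters this call wrote into the buffer.
        for (int i = index; i < index + read; ++i)
        {
            if (BadChars.Contains(buffer[i]))
                buffer[i] = ' ';
        }
        return read;
    }
}
```

This covers only the Read(char[], int, int) overload. If the consumer calls other members such as Read() or ReadLine(), those would need the same treatment.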
Alternatively, you could put the characters in a switch, so that the compiler generates an efficient comparison:
switch (buffer[i]) {
    case '\x00':
    case '\x09':
    case '\x0A':
    case '\x10': buffer[i] = ' '; break;
}
If you have more characters (five or six, IIRC), the compiler will actually create a hash table to look up the cases, so that would be similar to using a HashSet.
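For illustration, the switch can be wrapped in a small helper so the cleaning loop stays readable. This is a rough sketch: BufferCleaner, IsBadChar, and Clean are hypothetical names, and the case labels simply mirror the characters listed in the question.

```csharp
public static class BufferCleaner
{
    // Hypothetical helper; returns true for the characters the question treats as bad.
    private static bool IsBadChar(char c)
    {
        switch (c)
        {
            case '\x00':
            case '\x09':
            case '\x0A':
            case '\x10':
                return true;
            default:
                return false;
        }
    }

    // Replaces bad characters with spaces in buffer[index .. index + count - 1].
    public static void Clean(char[] buffer, int index, int count)
    {
        for (int i = index; i < index + count; ++i)
        {
            if (IsBadChar(buffer[i]))
                buffer[i] = ' ';
        }
    }
}
```

Inside the overridden Read, this could be called as BufferCleaner.Clean(buffer, index, tmp), where tmp is the count returned by base.Read.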