Mic*_*l B 5 c# arrays clr performance sqlclr
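I have a task to split multiple rows of varbinary(8000) columns from a database table on the binary literal 0x0. However, the delimiter may change, so I'd like to keep it variable. I want to do this quickly using SQLCLR as a streaming table-valued function. I know my strings will always be at least a few thousand bytes long.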
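Edit: I have updated my algorithm to avoid the nastiness of unrolling the inner loop. It is still hard to convince the CLR to make the right choices about register allocation. It would be great if there were an easy way to convince the CLR that j and i are really the same thing; instead it does really dumb things. It would also be nice to optimize the first-pass loop, but you can't use goto to jump into a loop.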
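I decided to adapt a 64-bit implementation of the C function memchr. Basically, instead of scanning one byte at a time and comparing, it uses some bit twiddling to scan 8 bytes at a time. For reference, Array.IndexOf<Byte> does something similar with a 4-byte scan to find a single match; I just want to keep going after each match. A few things to note: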
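Memory pressure is a very real problem in SQLCLR functions. String.Split is out because it allocates a lot of memory up front, which I really want to avoid. It also works on UCS-2 strings, which would require me to convert my ASCII strings to Unicode strings and therefore have my data treated as a LOB data type on return. (SqlChars/SqlString can only return 4000 bytes before being converted to a LOB type.)
I want to stream. Another reason to avoid String.Split is that it does all of its work at once, creating a lot of memory pressure. On input with lots of delimiters, pure T-SQL approaches will start to beat it.
I want to keep it "safe", so it is all managed code. There seems to be a big penalty in the safety checks.
Buffer.BlockCopy is really fast, and paying that cost once up front seems better than constantly paying the cost of BitConverter. It is still cheaper than converting my input to a string and holding on to that reference.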
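As a minimal sketch of the up-front copy mentioned in the last note: the bytes are copied once into a ulong[] whose size is rounded up to a whole number of 8-byte words, so that later scans can read full words without running past the end. The names PaddedWords and ToWords are made up for this illustration and are not part of the code in the question.

```csharp
using System;

static class PaddedWords
{
    // Copies the input into a ulong[] rounded up to a multiple of 8 bytes.
    // Freshly allocated arrays are zero-initialized, so any unused bytes in
    // the last word stay 0x00.
    public static ulong[] ToWords(byte[] bytes)
    {
        var words = new ulong[(bytes.Length + 7) / 8];
        // Buffer.BlockCopy counts in bytes, not in array elements.
        Buffer.BlockCopy(bytes, 0, words, 0, bytes.Length);
        return words;
    }
}
```

Each word then holds the same eight bytes that BitConverter.ToUInt64 would read at the corresponding offset, but the conversion cost is paid only once.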
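The code is quite fast, but it seems I am paying for quite a few bounds checks in the initial loop and in the critical section when I find a match. As a result, on input with lots of delimiters I tend to lose to a simpler C# enumerator that just does byte comparisons.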
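For comparison, the kind of simple byte-comparison enumerator referred to above might look roughly like the sketch below. It is an assumption about what such a baseline looks like, not code from the question; SimpleSplitter and Split are hypothetical names. It streams one item at a time through yield return, and it mirrors the behavior of the code in the question in that a trailing delimiter does not produce a final empty item.

```csharp
using System;
using System.Collections.Generic;

static class SimpleSplitter
{
    // Yields the byte ranges between delimiters, one at a time.
    public static IEnumerable<byte[]> Split(byte[] bytes, byte delimiter)
    {
        int start = 0;
        for (int i = 0; i < bytes.Length; i++)
        {
            if (bytes[i] != delimiter) continue;

            var item = new byte[i - start];
            Buffer.BlockCopy(bytes, start, item, 0, item.Length);
            yield return item;
            start = i + 1;
        }

        // Whatever follows the last delimiter, if anything.
        if (start < bytes.Length)
        {
            var tail = new byte[bytes.Length - start];
            Buffer.BlockCopy(bytes, start, tail, 0, tail.Length);
            yield return tail;
        }
    }
}
```

Because nothing is allocated until the caller asks for the next item, only one item buffer at a time is produced, which is the streaming property the notes above ask for.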
Here is my code:
using System;
using System.Collections;

class SplitBytesEnumeratorA : IEnumerator
{
    // Fields
    private readonly byte[] _bytes;
    private readonly ulong[] _longs;
    private readonly ulong _comparer;
    private readonly Record _record = new Record();
    private int _start;
    private readonly int _length;

    // Methods
    internal SplitBytesEnumeratorA(byte[] bytes, byte delimiter)
    {
        this._bytes = bytes;
        this._length = bytes.Length;
        // we do this so that we can avoid a spillover scan near the end.
        // in unsafe implementation this would be dangerous as we potentially
        // will be reading more bytes than we should.
        this._longs = new ulong[(_length + 7) / 8];
        Buffer.BlockCopy(bytes, 0, _longs, 0, _length);
        var c = (((ulong)delimiter << 8) + (ulong)delimiter);
        c = (c << 16) + c;
        // comparer is now 8 copies of the original delimiter.
        c |= (c << 32);
        this._comparer = c;
    }

    public bool MoveNext()
    {
        if (this._start >= this._length) return false;
        int i = this._start;
        var longs = this._longs;
        var comparer = this._comparer;
        var record = this._record;
        record.id++;
        // handle the case where start is not divisible by eight.
        for (; (i & 7) != 0; i++)
        {
            if (i == _length || _bytes[i] == (comparer & 0xFF))
            {
                record.item = new byte[(i - _start)];
                Buffer.BlockCopy(_bytes, _start, record.item, 0, i - _start);
                _start = i + 1;
                return true;
            }
        }
        // main loop. We crawl the array 8 bytes at a time.
        for (int j = i / 8; j < longs.Length; j++)
        {
            ulong t1 = longs[j];
            unchecked
            {
                t1 ^= comparer;
                ulong t2 = (t1 - 0x0101010101010101) & ~t1;
                if ((t2 & 0x8080808080808080) != 0)
                {
                    i = j * 8;
                    // make every case 3 comparison instead of n. Potentially better.
                    // This is an unrolled binary search.
                    if ((t2 & 0x80808080) == 0)
                    {
                        i += 4;
                        t2 >>= 32;
                    }
                    if ((t2 & 0x8080) == 0)
                    {
                        i += 2;
                        t2 >>= 16;
                    }
                    if ((t2 & 0x80) == 0)
                    {
                        i++;
                    }
                    record.item = new byte[(i - _start)];
                    // improve cache locality by not switching collections.
                    Buffer.BlockCopy(longs, _start, record.item, 0, i - _start);
                    _start = i + 1;
                    return true;
                }
            }
            // no matches found increment by 8
        }
        // no matches left. Let's return the remaining buffer.
        record.item = new byte[(_length - _start)];
        Buffer.BlockCopy(longs, _start, record.item, 0, (_length - _start));
        _start = _bytes.Length;
        return true;
    }

    void IEnumerator.Reset()
    {
        throw new NotImplementedException();
    }

    public object Current
    {
        get
        {
            return this._record;
        }
    }
}

// We use a class to avoid boxing.
class Record
{
    internal int id;
    internal byte[] item;
}
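The bit trick in the main loop can be checked in isolation. The following is a small standalone sketch of the same idea, assuming a little-endian machine so that lane 0 of a word is the first byte in memory; ByteScan, Broadcast and FirstMatch are names invented for this illustration.

```csharp
using System;

static class ByteScan
{
    const ulong Ones  = 0x0101010101010101UL;
    const ulong Highs = 0x8080808080808080UL;

    // Repeats the delimiter in all eight byte lanes of a 64-bit word.
    public static ulong Broadcast(byte delimiter)
    {
        return delimiter * Ones;
    }

    // Returns the lane index (0..7, counting from the least significant byte)
    // of the first byte in 'word' that equals the broadcast delimiter,
    // or -1 if no lane matches.
    public static int FirstMatch(ulong word, ulong pattern)
    {
        unchecked
        {
            ulong x = word ^ pattern;               // matching lanes become 0x00
            ulong flags = (x - Ones) & ~x & Highs;  // nonzero iff some lane is 0x00
            if (flags == 0) return -1;

            // Lanes above the first match can be flagged spuriously because of
            // borrow propagation, so only the lowest flagged lane is trusted.
            // Three tests narrow it down: low half, low quarter, low byte.
            int index = 0;
            if ((flags & 0x80808080UL) == 0) { index += 4; flags >>= 32; }
            if ((flags & 0x8080UL) == 0) { index += 2; flags >>= 16; }
            if ((flags & 0x80UL) == 0) { index += 1; }
            return index;
        }
    }
}
```

For example, ByteScan.FirstMatch(0x1122330055667788UL, ByteScan.Broadcast(0)) returns 4, because the only zero byte sits in lane 4.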
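Thinking outside the box, have you considered converting the string to XML and using XQuery to split it?
For example, you could pass in the delimiter and (air code):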
DECLARE @xml as xml
DECLARE @str as varchar(max)
SET @str = (SELECT CAST(t.YourBinaryColumn AS varchar(max)) FROM [tableName] t)
SET @xml = cast(('<X>'+replace(@str,@delimiter,'</X><X>')+'</X>') as xml)
This converts the binary value to a string and replaces each delimiter with XML tags. Then:
SELECT N.value('.', 'varchar(10)') as value FROM @xml.nodes('X') as T(N)
will fetch the individual "elements", i.e. the data between each occurrence of the delimiter.
Maybe this idea is useful as is, or as a catalyst you can build on.