String immutability in C#

n53*_*535 25 .net c# pointers immutability

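I was curious how the StringBuilder class is implemented internally, so I decided to look at Mono's source code and compare it with Reflector's disassembly of Microsoft's implementation. Essentially, Microsoft's implementation uses a char[] to store the string representation internally, plus a handful of unsafe methods to manipulate it. That is straightforward and raised no questions. But I was confused when I found that Mono uses a string inside StringBuilder: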

private int _length;
private string _str;

My first thought was: "What a pointless StringBuilder." But then I found out that a string can be mutated using pointers:

public StringBuilder Append (string value) 
{
     // ...
     String.CharCopy (_str, _length, value, 0, value.Length);
}

internal static unsafe void CharCopy (char *dest, char *src, int count) 
{
    // ...
    ((short*)dest) [0] = ((short*)src) [0]; dest++; src++;
}    

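I used to program in C/C++ a little, so I can't say this code confuses me much, but I thought strings were completely immutable (i.e. there is absolutely no way to mutate them). So the actual questions are:

  • Can I create a completely immutable type?
  • Is there any reason to use such code (unsafe code that mutates an immutable type) apart from performance concerns?
  • Are strings then inherently thread-safe or not?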

Eri*_*ert 43

Can I create a completely immutable type?

You can create a type where the CLR enforces immutability on it. You can then use "unsafe" to turn off the CLR enforcement mechanisms. That's why "unsafe" is called "unsafe" - because it turns off the safety system. In unsafe code every single byte of memory in the process can be writable if you try hard enough, including both the immutable bytes and the code in the CLR which enforces immutability.

You can also use Reflection to break immutability. Both Reflection and unsafe code require an extremely high level of trust to be granted.
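As a minimal sketch of the Reflection point, the following uses a hypothetical `Celsius` class (not from the question) whose only field is `readonly`. It assumes the code runs with enough trust for private reflection, and that the runtime permits `FieldInfo.SetValue` on an instance `readonly` field, which is my understanding of current .NET behavior for instance (not static) fields.

```csharp
using System;
using System.Reflection;

// Hypothetical immutable type: the only field is readonly and there are no setters.
public sealed class Celsius
{
    private readonly double _degrees;

    public Celsius(double degrees) { _degrees = degrees; }

    public double Degrees => _degrees;
}

public static class ReflectionDemo
{
    // Overwrites the private readonly field through private reflection
    // and returns the value observed through the public property afterwards.
    public static double OverwriteReadonlyField(Celsius target, double newValue)
    {
        FieldInfo field = typeof(Celsius).GetField(
            "_degrees", BindingFlags.Instance | BindingFlags.NonPublic)
            ?? throw new MissingFieldException(nameof(Celsius), "_degrees");

        // The compiler would reject "target._degrees = newValue";
        // reflection bypasses that compile-time check.
        field.SetValue(target, newValue);
        return target.Degrees;
    }
}
```

Calling `ReflectionDemo.OverwriteReadonlyField(new Celsius(20.0), 99.0)` should hand back the overwritten value, which is the sense in which the immutability guarantee only holds for code that plays by the rules.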

Is there any reason to use such code apart from performance concerns?

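Sure, there are plenty of reasons to use immutable data structures. Immutable data structures rock. Some good reasons to use immutable data structures:

  • immutable data structures are easier to reason about than mutable data structures. When you ask "is this list empty?" and you get an answer, you know that the answer is true not just now, but forever. With a mutable data structure you cannot actually ask "is this list empty?" All you can ask is "is this list empty right now?" and the answer then logically answers the question "was this list empty at some point in the recent past?"

The fact that the answer to a question about an immutable type stays true forever has security implications. Suppose you have code like this: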

void Frob(Bar bar)
{
    if (!IsSafe(bar)) throw something;
    DoSomethingDangerous(bar);
}

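If Bar is a mutable type then there is a race condition here; bar could be made unsafe on another thread after the check but before something dangerous happens. If Bar is an immutable type then the answer to the question stays the same forever, which is much safer. (Imagine if you could mutate a string containing a path after the security check but before the file was opened, for example.)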
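The check-then-use race can be made deterministic for illustration. In the sketch below, `MutableBar`, `ImmutableBar`, the `IsSafe` rule, and the `interleaved` callback are all hypothetical stand-ins: the callback plays the part of whatever another thread might do between the check and the dangerous step.

```csharp
using System;

// Hypothetical mutable argument type: Path can be reassigned at any time.
public sealed class MutableBar
{
    public string Path { get; set; } = "";
}

// Hypothetical immutable counterpart: Path is fixed at construction.
public sealed class ImmutableBar
{
    public ImmutableBar(string path) { Path = path; }

    public string Path { get; }
}

public static class FrobDemo
{
    // Stand-in safety rule (an assumption): reject paths that climb out of the directory.
    public static bool IsSafe(string path) => !path.Contains("..");

    // Returns the path that the "dangerous" step would actually act on.
    public static string FrobMutable(MutableBar bar, Action interleaved)
    {
        if (!IsSafe(bar.Path)) throw new ArgumentException("unsafe path", nameof(bar));
        interleaved();   // simulates another thread running after the check
        return bar.Path; // may no longer be the value that was checked
    }

    public static string FrobImmutable(ImmutableBar bar, Action interleaved)
    {
        if (!IsSafe(bar.Path)) throw new ArgumentException("unsafe path", nameof(bar));
        interleaved();   // nothing here can change bar.Path
        return bar.Path; // always the value that was checked
    }
}
```

With `var bar = new MutableBar { Path = "reports/q1.txt" };`, the call `FrobDemo.FrobMutable(bar, () => bar.Path = "../secrets.txt")` passes the check and then acts on the unchecked path. The immutable version offers no setter for the callback to use, so the checked value and the used value cannot diverge short of unsafe code or reflection.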

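  • methods which take immutable data structures as their arguments, return them as their results, and perform no side effects are called "pure methods". Pure methods can be memoized, which trades increased memory use for increased speed, often enormously increased speed.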

  • immutable data structures can often be used on multiple threads simultaneously without locking. Locking is there to prevent creation of inconsistent state of an object in the face of a mutation, but immutable objects don't have mutations. (Some so-called immutable data structures are logically immutable but actually do mutations inside themselves; imagine for example a lookup table which does not change its contents, but does reorganize its internal structure if it can deduce what the next query is likely to be. Such a data structure would not be automatically threadsafe.)

  • immutable data structures that efficiently re-use their internal parts when a new structure is built from an old one make it easy to "take a snapshot" of the state of a program without wasting lots of memory. That makes undo-redo operations trivial to implement. It makes it easier to write debugging tools that can show you how you got to a particular program state.

  • and so on.
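To illustrate the memoization bullet above, here is a small sketch. `Memoizer.Memoize` is a hypothetical helper, not a framework API, and the dictionary-backed cache is itself mutable and not thread-safe, so this version assumes single-threaded use.

```csharp
using System;
using System.Collections.Generic;

public static class Memoizer
{
    // Wraps a pure function in a cache keyed by its argument.
    // This is only sound when the argument is immutable and the function has no
    // side effects; otherwise a cached result could become stale.
    public static Func<TArg, TResult> Memoize<TArg, TResult>(Func<TArg, TResult> pure)
    {
        var cache = new Dictionary<TArg, TResult>();
        return arg =>
        {
            if (cache.TryGetValue(arg, out var cached)) return cached;
            var result = pure(arg);
            cache[arg] = result; // spend memory now to skip the computation later
            return result;
        };
    }
}

public static class MemoizeDemo
{
    // Returns how many times the underlying function actually ran
    // when the memoized wrapper is called twice with the same string.
    public static int CountUnderlyingCalls()
    {
        int calls = 0; // instrumentation only; a real pure function would not count calls
        Func<string, int> length = s => { calls++; return s.Length; };
        Func<string, int> memoized = Memoizer.Memoize(length);

        memoized("immutable");
        memoized("immutable"); // served from the cache
        return calls;
    }
}
```

Strings make good cache keys here precisely because they are immutable: a key cannot change its hash code or contents after it has been stored in the dictionary.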

Are strings then inherently thread-safe or not?

If everyone plays by the rules, they are. If someone uses unsafe code or private reflection then there is no rule enforcement anymore. You have to trust that if someone is using high-privilege code then they are doing so correctly and not mutating a string. Use your power to run unsafe code only for good; with great power comes great responsibility.

So do I need to use locks or not?

That is a strange question. Remember, locks are co-operative. Locks only work if everyone accessing a particular object agrees upon the locking strategy that must be used.

You have to use locks if the agreed-upon locking strategy for accessing a particular object in a particular storage location is to use locks. If that isn't the agreed-upon locking strategy then using locks is pointless; you're carefully locking and unlocking the front door while someone else is walking in the open back door.
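One common way to make the locking strategy "agreed upon" is to keep both the state and the lock private, so that every access is forced through the same door. The `Account` class below is a hypothetical sketch of that idea, not something from the question.

```csharp
using System.Threading.Tasks;

public sealed class Account
{
    // Private lock object: no outside code can read or write _balance
    // without going through members that take this lock.
    private readonly object _gate = new object();
    private long _balance;

    public void Deposit(long amount)
    {
        lock (_gate) { _balance += amount; }
    }

    public long Balance
    {
        get { lock (_gate) { return _balance; } }
    }
}

public static class LockDemo
{
    // Performs 10,000 concurrent deposits of 1 and returns the final balance.
    public static long Run()
    {
        var account = new Account();
        Parallel.For(0, 10000, _ => account.Deposit(1));
        return account.Balance;
    }
}
```

Because every reader and writer uses `_gate`, no deposit is lost. If `_balance` were also reachable through a path that skipped the lock, the `lock` statements in `Deposit` would protect nothing, which is the open back door described above.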

If you have a string which you know is being mutated by unsafe code, and you don't want to see inconsistent partial mutations, and the code which is doing the unsafe mutation documents that it takes out a particular lock during that mutation, then yes, you need to use locks when accessing that string. But this situation is very rare; ideally no one would use unsafe code to manipulate a string accessible by other code on another thread, because doing so is an incredibly bad idea. That's why we require that code that does so is fully trusted. And that's why we require that the C# source code for such a function wave a big red flag that says "this code is unsafe, review it carefully!"