Tags: ruby, string, garbage-collection
Why are unused_variable_2 and unused_variable_3 garbage collected, but not unused_variable_1?
# leaky_boat.rb

require "memprof"

class Boat
  def initialize(string)
    unused_variable1 = string[0...100]
    puts unused_variable1.object_id
    @string = string
    puts @string.object_id
  end
end

class Rocket
  def initialize(string)
    unused_variable_2 = string.dup
    puts unused_variable_2.object_id
    unused_variable_3 = String.new(string)
    puts unused_variable_3.object_id
    @string = string
    puts @string.object_id
  end
end

Memprof.start
text = "a" * 100
object_id_message = "Object ids of unused_variable_1, @string, unused_variable_2, unused_variable_3, and another @string"
before_gc_message = "Before GC"
after_gc_message = "After GC"
puts object_id_message
boat = Boat.new(text)
rocket = Rocket.new(text)
puts before_gc_message
Memprof.stats
ObjectSpace.garbage_collect
puts after_gc_message
Memprof.stats
Memprof.stop
Running the program:
$ uname -a
Linux [redacted] 3.2.0-25-generic #40-Ubuntu SMP Wed May 23 20:30:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
$ ruby --version # Have to use Ruby 1.8 - memprof doesn't work on 1.9
ruby 1.8.7 (2011-06-30 patchlevel 352) [x86_64-linux]
$ ruby -rubygems leaky_boat.rb
Object ids of unused_variable_1, @string, unused_variable_2, unused_variable_3, and another @string
70178323299180
70178323299320
70178323299100
70178323299060
70178323299320
Before GC
2 leaky_boat.rb:6:String
2 leaky_boat.rb:26:String
1 leaky_boat.rb:9:String
1 leaky_boat.rb:7:String
1 leaky_boat.rb:32:Rocket
1 leaky_boat.rb:31:Boat
1 leaky_boat.rb:29:String
1 leaky_boat.rb:28:String
1 leaky_boat.rb:27:String
1 leaky_boat.rb:20:String
1 leaky_boat.rb:18:String
1 leaky_boat.rb:17:String
1 leaky_boat.rb:16:String
1 leaky_boat.rb:15:String
After GC
1 leaky_boat.rb:6:String
1 leaky_boat.rb:32:Rocket
1 leaky_boat.rb:31:Boat
1 leaky_boat.rb:29:String
1 leaky_boat.rb:28:String
1 leaky_boat.rb:27:String
1 leaky_boat.rb:26:String
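This happens because the substring implementation for strings in your version of Ruby has a special case that saves a memory allocation: it applies when the substring is the tail of the source string and the string is too long for its value to be stored inside the base object structure.
If you trace through the code, you will see that the range subscript string[0...100] goes through this clause in rb_str_substr. Since text is exactly 100 characters long, the range covers the whole string and therefore ends at the tail. The new string is allocated via str_new3, which allocates a new object structure (hence the different object_id) but sets the string value's ptr field to point into the source object's extended storage, and sets the ELTS_SHARED flag to indicate that the new object shares its storage with another object.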
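The difference between "a new object structure" and "new storage" can be seen from plain Ruby. The following is a minimal sketch, not the code from the question: the method name `substring_identity_demo` is hypothetical, and it only checks what is observable from Ruby code (identity, content, and that appending to the substring leaves the source untouched). Whether the interpreter shares storage internally is not visible here.

```ruby
# Sketch: a substring is always a distinct object, even if the
# interpreter lets it share storage with its source internally.
# `substring_identity_demo` is a hypothetical helper name.
def substring_identity_demo
  text = "a" * 100      # same source string as in the question
  tail = text[0...100]  # covers the whole string, so it ends at the tail

  result = {
    :same_object  => tail.equal?(text),                 # identity comparison
    :same_ids     => tail.object_id == text.object_id,  # distinct object structures
    :same_content => tail == text                       # content comparison
  }

  # Appending to the substring must not change the source string:
  # Ruby gives the substring its own storage before modifying it.
  tail << "b"
  result[:source_length_after_append] = text.length
  result[:tail_length_after_append]   = tail.length
  result
end

p substring_identity_demo
```

The two strings compare equal by content but are different objects, which matches the different object ids printed in the question's output.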
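In your code, the source string (not the substring) is what gets assigned to the instance variable @string, and it is still a live reference when garbage collection runs. Since the storage allocated for the original string is shared and still has a live reference, it cannot be collected.
In Ruby trunk, this optimization of sharing storage for compatible tail substrings still appears to exist.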
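On a newer Ruby, one rough way to probe for shared storage is `ObjectSpace.memsize_of` from the `objspace` standard library. It is not available on the 1.8.7 interpreter used in the question. The sketch below is an assumption-laden illustration: `storage_report` is a hypothetical name, the sizes are arbitrary, and the reported numbers are implementation-specific, so treat them as hints only. Newer interpreters may share storage in more cases than the ones described here.

```ruby
require "objspace"  # stdlib extension that provides ObjectSpace.memsize_of

# Sketch: report how many bytes each string object accounts for.
# A tail substring that reports far fewer bytes than its length
# suggests it does not own a separate buffer.
# `storage_report` is a hypothetical helper name.
def storage_report
  text = "a" * 100_000
  tail = text[50_000..-1]  # tail substring: a candidate for shared storage

  report = {
    :text => ObjectSpace.memsize_of(text),
    :tail => ObjectSpace.memsize_of(tail)
  }

  # Modifying the substring forces it to get storage of its own.
  tail << "x"
  report[:tail_after_append] = ObjectSpace.memsize_of(tail)
  report
end

storage_report.each { |name, bytes| puts "#{name}: #{bytes} bytes" }
```

If `tail` reports a small size before the append and a much larger one afterwards, that is consistent with the substring sharing the source's buffer until it is modified.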
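The other two variables, unused_variable_2 and unused_variable_3, do not have this extended storage sharing problem because they are set through mechanisms that ensure distinct storage, so they are garbage collected as expected once their references go out of scope.
String#dup runs rb_str_replace (bound via initialize_copy), which replaces the contents of the new string with a copy of the source string's contents and ensures that the storage is not shared.
String.new(source_str) runs rb_str_init, which similarly uses rb_str_replace on the supplied initial value to ensure distinct storage.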
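The two copying paths can be exercised side by side. This is a minimal sketch with a hypothetical method name, `independent_copies_demo`. It only shows the behaviour visible from Ruby code: both calls return new objects whose later modification does not affect the source. It does not show how the interpreter manages the underlying storage.

```ruby
# Sketch: String#dup and String.new both produce objects that can be
# modified independently of the source string.
# `independent_copies_demo` is a hypothetical helper name.
def independent_copies_demo
  source = "a" * 100
  duped  = source.dup          # copy made via initialize_copy
  built  = String.new(source)  # copy made via String's initializer

  duped << "!"   # grows only the dup
  built << "??"  # grows only the String.new copy

  {
    :distinct_objects => !duped.equal?(source) &&
                         !built.equal?(source) &&
                         !duped.equal?(built),
    :source_length => source.length,
    :duped_length  => duped.length,
    :built_length  => built.length
  }
end

p independent_copies_demo
```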