This blog is part of our Ruby 2.4 series.
Ruby has lstrip and rstrip methods, which remove leading and trailing whitespace respectively from a string.
Ruby also has a strip method, which combines lstrip and rstrip and removes both leading and trailing whitespace from a string.
```ruby
" Hello World ".lstrip #=> "Hello World "
" Hello World ".rstrip #=> " Hello World"
" Hello World ".strip  #=> "Hello World"
```
Prior to Ruby 2.4, the rstrip method was optimized for performance, but lstrip and strip were somehow missed. In Ruby 2.4, String#lstrip and String#strip have also been optimized, so they get the same performance benefit as String#rstrip.
Let's run the following snippet in Ruby 2.3 and Ruby 2.4 to benchmark and compare the performance.
```ruby
require 'benchmark/ips'

Benchmark.ips do |bench|
  str1 = " " * 10_000_000 + "hello world" + " " * 10_000_000
  str2 = str1.dup
  str3 = str1.dup

  bench.report('String#lstrip') do
    str1.lstrip
  end

  bench.report('String#rstrip') do
    str2.rstrip
  end

  bench.report('String#strip') do
    str3.strip
  end
end
```
Result for Ruby 2.3
```
Warming up --------------------------------------
       String#lstrip     1.000  i/100ms
       String#rstrip     8.000  i/100ms
        String#strip     1.000  i/100ms
Calculating -------------------------------------
       String#lstrip     10.989  (± 0.0%) i/s -     55.000  in   5.010903s
       String#rstrip     92.514  (± 5.4%) i/s -    464.000  in   5.032208s
        String#strip     10.170  (± 0.0%) i/s -     51.000  in   5.022118s
```
Result for Ruby 2.4
```
Warming up --------------------------------------
       String#lstrip    14.000  i/100ms
       String#rstrip     8.000  i/100ms
        String#strip     6.000  i/100ms
Calculating -------------------------------------
       String#lstrip    143.424  (± 4.2%) i/s -    728.000  in   5.085311s
       String#rstrip     89.150  (± 5.6%) i/s -    448.000  in   5.041301s
        String#strip     67.834  (± 4.4%) i/s -    342.000  in   5.051584s
```
From the above results, we can see that in Ruby 2.4, String#lstrip is around 13x faster (143.4 vs 11.0 i/s) while String#strip is around 6.7x faster (67.8 vs 10.2 i/s). String#rstrip, as expected, has nearly the same performance, as it was already optimized in previous versions.
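The speedups can be recomputed directly from the iterations-per-second figures in the two result listings. The snippet below is just that arithmetic; the constant and method names (`IPS_RUBY_23`, `IPS_RUBY_24`, `speedups`) are made up for this illustration.

```ruby
# Iterations per second copied from the single-byte benchmark results above.
IPS_RUBY_23 = { lstrip: 10.989,  rstrip: 92.514, strip: 10.170 }
IPS_RUBY_24 = { lstrip: 143.424, rstrip: 89.150, strip: 67.834 }

# Returns a hash of method name => (after / before), rounded to two decimals.
def speedups(before, after)
  before.each_with_object({}) do |(name, ips), ratios|
    ratios[name] = (after.fetch(name) / ips).round(2)
  end
end

speedups(IPS_RUBY_23, IPS_RUBY_24).each do |name, ratio|
  puts format("String#%-6s %.2fx", name, ratio)
end
```

A ratio close to 1.0 for rstrip means it is effectively unchanged between the two versions; the run-to-run error reported by the benchmark is around 5%.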
Performance remains the same for multi-byte strings
Strings can contain single-byte or multi-byte characters.
For example, Lé Hello World is a multi-byte string because it contains é, which is a multi-byte character.
```ruby
'e'.bytesize #=> 1
'é'.bytesize #=> 2
```
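To see the difference at the level of a whole string, we can compare character count with byte count. This is a small sketch assuming UTF-8 strings; `byte_profile` is a hypothetical helper written for this post, and é is spelled as `\u00E9` so the example does not depend on how the file is saved.

```ruby
# Summarises how many characters and bytes a string has,
# and whether it consists only of ASCII characters.
def byte_profile(str)
  { chars: str.length, bytes: str.bytesize, ascii_only: str.ascii_only? }
end

p byte_profile("hello world")         # 11 characters, 11 bytes, ASCII only
p byte_profile("L\u00E9 hello world") # 14 characters, 15 bytes, not ASCII only
```

A single é is enough to make the character count and the byte count disagree, even if the rest of the string is plain ASCII.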
Let's run the same benchmark with the string Lé hello world instead of hello world.
Result for Ruby 2.3
```
Warming up --------------------------------------
       String#lstrip     1.000  i/100ms
       String#rstrip     1.000  i/100ms
        String#strip     1.000  i/100ms
Calculating -------------------------------------
       String#lstrip     11.147  (± 9.0%) i/s -     56.000  in   5.034363s
       String#rstrip      8.693  (± 0.0%) i/s -     44.000  in   5.075011s
        String#strip      5.020  (± 0.0%) i/s -     26.000  in   5.183517s
```
Result for Ruby 2.4
```
Warming up --------------------------------------
       String#lstrip     1.000  i/100ms
       String#rstrip     1.000  i/100ms
        String#strip     1.000  i/100ms
Calculating -------------------------------------
       String#lstrip     10.691  (± 0.0%) i/s -     54.000  in   5.055101s
       String#rstrip      9.524  (± 0.0%) i/s -     48.000  in   5.052678s
        String#strip      4.860  (± 0.0%) i/s -     25.000  in   5.152804s
```
As we can see, the performance for multi-byte strings is almost the same across Ruby 2.3 and Ruby 2.4.
Explanation
The optimization is related to how strings are scanned to detect whitespace. Checking for whitespace in a multi-byte string carries additional overhead. So the patch adds an initial check for whether the string is a single-byte string and, if so, processes it separately on a faster path.
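The idea can be illustrated in plain Ruby. This is only a sketch of the two-path approach, not the actual patch, which lives in the interpreter's C source. It assumes `ascii_only?` as a stand-in for the interpreter's internal single-byte check, uses a simplified whitespace set that ignores the null character, and the names `lstrip_sketch` and `WHITESPACE_BYTES` are made up for this example.

```ruby
# Illustrative only: not how Ruby implements lstrip internally.
# Simplified whitespace set (space, tab, newline, vertical tab, form feed, CR).
WHITESPACE_BYTES = " \t\n\v\f\r".bytes.freeze

def lstrip_sketch(str)
  if str.ascii_only?
    # Fast path: every character is exactly one byte, so we can scan
    # raw bytes by index without decoding characters.
    i = 0
    i += 1 while i < str.bytesize && WHITESPACE_BYTES.include?(str.getbyte(i))
    str.byteslice(i, str.bytesize - i)
  else
    # General path: walk the string character by character, which has
    # to respect the encoding and is therefore more expensive.
    chars = str.chars
    first = chars.index { |c| !(c.bytesize == 1 && WHITESPACE_BYTES.include?(c.ord)) }
    first ||= chars.length
    chars[first..-1].join
  end
end

lstrip_sketch("   hello world  ")         # takes the single-byte path
lstrip_sketch("   L\u00E9 hello world  ") # takes the character-by-character path
```

Both paths return the same result as the built-in lstrip for these inputs; the difference is only in how much work is done per character, which matches the benchmark where a single é in the string removed the speedup.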
In most cases, strings are single-byte, so the performance improvement should be visible and helpful.