March 14, 2017
This blog is part of our Ruby 2.4 series.
Ruby has lstrip and rstrip methods, which remove leading and trailing whitespace, respectively, from a string.
Ruby also has a strip method, which combines lstrip and rstrip and removes both leading and trailing whitespace from a string.
" Hello World ".lstrip #=> "Hello World "
" Hello World ".rstrip #=> " Hello World"
" Hello World ".strip #=> "Hello World"
Prior to Ruby 2.4, the rstrip method was optimized for performance, but lstrip and strip were missed. In Ruby 2.4, String#lstrip and String#strip have been optimized as well, so they get the same performance benefit as String#rstrip.
Let's run the following snippet in Ruby 2.3 and Ruby 2.4 to benchmark and compare performance.
# Requires the benchmark-ips gem.
require 'benchmark/ips'

Benchmark.ips do |bench|
  # 10 million spaces of padding on each side of the text
  str1 = " " * 10_000_000 + "hello world" + " " * 10_000_000
  str2 = str1.dup
  str3 = str1.dup

  bench.report('String#lstrip') do
    str1.lstrip
  end

  bench.report('String#rstrip') do
    str2.rstrip
  end

  bench.report('String#strip') do
    str3.strip
  end
end
Result for Ruby 2.3:
Warming up --------------------------------------
String#lstrip 1.000 i/100ms
String#rstrip 8.000 i/100ms
String#strip 1.000 i/100ms
Calculating -------------------------------------
String#lstrip 10.989 (± 0.0%) i/s - 55.000 in 5.010903s
String#rstrip 92.514 (± 5.4%) i/s - 464.000 in 5.032208s
String#strip 10.170 (± 0.0%) i/s - 51.000 in 5.022118s
Result for Ruby 2.4:
Warming up --------------------------------------
String#lstrip 14.000 i/100ms
String#rstrip 8.000 i/100ms
String#strip 6.000 i/100ms
Calculating -------------------------------------
String#lstrip 143.424 (± 4.2%) i/s - 728.000 in 5.085311s
String#rstrip 89.150 (± 5.6%) i/s - 448.000 in 5.041301s
String#strip 67.834 (± 4.4%) i/s - 342.000 in 5.051584s
From the above results, we can see that in Ruby 2.4, String#lstrip is around 13x faster (143.4 vs. 11.0 i/s), while String#strip is around 6.7x faster (67.8 vs. 10.2 i/s). String#rstrip, as expected, has nearly the same performance, as it was already optimized in previous versions.
Strings can contain single-byte or multi-byte characters.
For example, Lé Hello World is a multi-byte string because it contains é, which is a multi-byte character.
'e'.bytesize #=> 1
'é'.bytesize #=> 2
Let's run the same benchmark with the string Lé hello world instead of hello world.
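Only the input string needs to change for this run. A minimal sketch of that setup, assuming the padding stays the same as in the earlier snippet (padded and the two constant names are hypothetical, introduced here only for illustration):

```ruby
# Hypothetical helper (not in the original snippet): wraps a core string in
# the same number of spaces on each side, 10 million by default.
def padded(core, padding = 10_000_000)
  " " * padding + core + " " * padding
end

SINGLE_BYTE_INPUT = padded("hello world")
MULTI_BYTE_INPUT  = padded("L\u00E9 hello world") # "\u00E9" is é

puts SINGLE_BYTE_INPUT.ascii_only? #=> true
puts MULTI_BYTE_INPUT.ascii_only?  #=> false
```

A single é is enough to make the whole 20-million-character string multi-byte, even though almost all of it is plain spaces.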
Result for Ruby 2.3:
Warming up --------------------------------------
String#lstrip 1.000 i/100ms
String#rstrip 1.000 i/100ms
String#strip 1.000 i/100ms
Calculating -------------------------------------
String#lstrip 11.147 (± 9.0%) i/s - 56.000 in 5.034363s
String#rstrip 8.693 (± 0.0%) i/s - 44.000 in 5.075011s
String#strip 5.020 (± 0.0%) i/s - 26.000 in 5.183517s
Result for Ruby 2.4:
Warming up --------------------------------------
String#lstrip 1.000 i/100ms
String#rstrip 1.000 i/100ms
String#strip 1.000 i/100ms
Calculating -------------------------------------
String#lstrip 10.691 (± 0.0%) i/s - 54.000 in 5.055101s
String#rstrip 9.524 (± 0.0%) i/s - 48.000 in 5.052678s
String#strip 4.860 (± 0.0%) i/s - 25.000 in 5.152804s
As we can see, the performance for multi-byte strings is almost the same across Ruby 2.3 and Ruby 2.4.
The optimization concerns how strings are scanned to detect whitespace. Checking for whitespace in a multi-byte string carries additional overhead, so the patch adds an initial check for whether the string is a single-byte string and, if so, processes it separately.
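To make the idea concrete, here is a rough Ruby sketch of the two code paths. It is illustrative only: the actual change is in Ruby's C implementation, and this sketch makes several assumptions. sketch_lstrip and the SKETCH_ constants are hypothetical names, ascii_only? stands in for the interpreter's internal single-byte check, and the whitespace set is simplified.

```ruby
# Simplified whitespace set, assumed for this sketch only.
SKETCH_WHITESPACE = [" ", "\t", "\n", "\v", "\f", "\r"].freeze
SKETCH_WHITESPACE_BYTES = SKETCH_WHITESPACE.map(&:ord).freeze

# Hypothetical helper, not a Ruby API: mimics the shape of the optimization.
def sketch_lstrip(str)
  if str.ascii_only?
    # Fast path: one byte per character, so walk raw bytes without decoding.
    offset = 0
    offset += 1 while offset < str.bytesize &&
                      SKETCH_WHITESPACE_BYTES.include?(str.getbyte(offset))
    str.byteslice(offset, str.bytesize - offset)
  else
    # General path: decode one character at a time, as multi-byte strings require.
    str.each_char.drop_while { |ch| SKETCH_WHITESPACE.include?(ch) }.join
  end
end

puts sketch_lstrip("  hello world  ").inspect
puts sketch_lstrip("  L\u00E9 hello world  ").inspect
```

Both branches return the same kind of result; the difference is only in how much work each leading character costs, which is why the single-byte benchmark improves while the multi-byte one does not.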
In most cases, strings are single-byte, so the performance improvement should be visible and helpful.