Ruby 2.4 optimized lstrip & strip for ASCII strings

Chirag Shah

March 14, 2017

This blog post is part of our Ruby 2.4 series.

Ruby has lstrip and rstrip methods, which remove leading and trailing whitespace, respectively, from a string.

Ruby also has a strip method, which combines lstrip and rstrip and removes both leading and trailing whitespace from a string.


"    Hello World    ".lstrip    #=> "Hello World    "
"    Hello World    ".rstrip    #=> "    Hello World"
"    Hello World    ".strip     #=> "Hello World"
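
These methods are not limited to spaces. They also remove other ASCII whitespace such as tabs and newlines, and they return a new string instead of changing the receiver. A short illustration (the variable names are made up for this example):

```ruby
padded = "\t\n  Hello World \r\n"

left  = padded.lstrip   #=> "Hello World \r\n"
right = padded.rstrip   #=> "\t\n  Hello World"
both  = padded.strip    #=> "Hello World"

# The receiver is unchanged. The bang variants (lstrip!, rstrip!, strip!)
# modify the string in place instead.
padded                  #=> "\t\n  Hello World \r\n"
```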

Prior to Ruby 2.4, the rstrip method was optimized for performance, but lstrip and strip were not. In Ruby 2.4, String#lstrip and String#strip have been optimized as well, giving them the performance benefit that String#rstrip already had.

Let's run the following snippet in Ruby 2.3 and Ruby 2.4 to benchmark the three methods and compare their performance.


require 'benchmark/ips'

Benchmark.ips do |bench|
  str1 = " " * 10_000_000 + "hello world" + " " * 10_000_000
  str2 = str1.dup
  str3 = str1.dup

  bench.report('String#lstrip') do
    str1.lstrip
  end

  bench.report('String#rstrip') do
    str2.rstrip
  end

  bench.report('String#strip') do
    str3.strip
  end
end

Result for Ruby 2.3


Warming up --------------------------------------
       String#lstrip     1.000  i/100ms
       String#rstrip     8.000  i/100ms
        String#strip     1.000  i/100ms
Calculating -------------------------------------
       String#lstrip     10.989  (± 0.0%) i/s -     55.000  in   5.010903s
       String#rstrip     92.514  (± 5.4%) i/s -    464.000  in   5.032208s
        String#strip     10.170  (± 0.0%) i/s -     51.000  in   5.022118s

Result for Ruby 2.4


Warming up --------------------------------------
       String#lstrip    14.000  i/100ms
       String#rstrip     8.000  i/100ms
        String#strip     6.000  i/100ms
Calculating -------------------------------------
       String#lstrip    143.424  (± 4.2%) i/s -    728.000  in   5.085311s
       String#rstrip     89.150  (± 5.6%) i/s -    448.000  in   5.041301s
        String#strip     67.834  (± 4.4%) i/s -    342.000  in   5.051584s

From the above results, we can see that in Ruby 2.4, String#lstrip is around 13x faster (10.989 i/s to 143.424 i/s), while String#strip is around 6.7x faster (10.170 i/s to 67.834 i/s). String#rstrip, as expected, has nearly the same performance, as it was already optimized in previous versions.
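
The speedup for each method can be worked out directly from the i/s columns of the two result tables. The snippet below is only a convenience calculation; the hash names are made up for this example, and the figures are copied from the results above.

```ruby
# Iterations per second, copied from the Ruby 2.3 and Ruby 2.4 results above.
ruby_23 = { lstrip: 10.989, rstrip: 92.514, strip: 10.170 }
ruby_24 = { lstrip: 143.424, rstrip: 89.150, strip: 67.834 }

# Ratio of Ruby 2.4 throughput to Ruby 2.3 throughput for each method.
speedups = ruby_23.map { |name, ips| [name, (ruby_24[name] / ips).round(2)] }.to_h

speedups.each { |name, ratio| puts "String##{name}: #{ratio}x" }
```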

Performance remains the same for multi-byte strings

Strings can contain single-byte or multi-byte characters.

For example, Lé Hello World is a multi-byte string because it contains é, which is a multi-byte character.

'e'.bytesize        #=> 1
'é'.bytesize        #=> 2
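
One way to tell the two kinds of strings apart from Ruby code is String#ascii_only?, together with comparing length (characters) against bytesize (bytes). The example below assumes the default UTF-8 source encoding; the variable names are made up for illustration.

```ruby
ascii = "hello world"
multi = "Lé hello world"

ascii.ascii_only?   #=> true
multi.ascii_only?   #=> false

ascii.length        #=> 11
ascii.bytesize      #=> 11 (one byte per character)
multi.length        #=> 14
multi.bytesize      #=> 15 (é takes two bytes in UTF-8)
```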

Let's run the same benchmark with the string Lé hello world instead of hello world.
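
The modified script is not shown in this post; presumably only the text between the padding changes. The following is a rough sketch of how such a comparison could be set up, not the script that produced the numbers below. It assumes a monotonic-clock timer in place of benchmark-ips and a smaller padding so that it runs quickly, and the helper names (padded, time_call) are hypothetical.

```ruby
# Hypothetical helper: surrounds +core+ with +padding+ spaces on each side,
# mirroring how the benchmark above builds its input.
def padded(core, padding)
  " " * padding + core + " " * padding
end

# Hypothetical helper: times a single call of +method_name+ on +str+, in seconds.
def time_call(str, method_name)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  str.public_send(method_name)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

PADDING = 1_000_000 # smaller than the 10_000_000 used above, to keep the run short

inputs = {
  "ASCII"      => padded("hello world", PADDING),
  "multi-byte" => padded("Lé hello world", PADDING),
}

inputs.each do |label, str|
  %i[lstrip rstrip strip].each do |name|
    printf("%-10s String#%-6s %.6fs\n", label, name, time_call(str, name))
  end
end
```

A single timed call is much noisier than the iterations-per-second figures that benchmark-ips reports, so this is only suitable for a quick sanity check.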

Result for Ruby 2.3


Warming up --------------------------------------
       String#lstrip     1.000  i/100ms
       String#rstrip     1.000  i/100ms
        String#strip     1.000  i/100ms
Calculating -------------------------------------
       String#lstrip     11.147  (± 9.0%) i/s -     56.000  in   5.034363s
       String#rstrip      8.693  (± 0.0%) i/s -     44.000  in   5.075011s
        String#strip      5.020  (± 0.0%) i/s -     26.000  in   5.183517s

Result for Ruby 2.4


Warming up --------------------------------------
       String#lstrip     1.000  i/100ms
       String#rstrip     1.000  i/100ms
        String#strip     1.000  i/100ms
Calculating -------------------------------------
       String#lstrip     10.691  (± 0.0%) i/s -     54.000  in   5.055101s
       String#rstrip      9.524  (± 0.0%) i/s -     48.000  in   5.052678s
        String#strip      4.860  (± 0.0%) i/s -     25.000  in   5.152804s

As we can see, the performance for multi-byte strings is almost the same across Ruby 2.3 and Ruby 2.4.

Explanation

The optimization relates to how strings are scanned to detect whitespace. Checking for whitespace in a multi-byte string incurs additional overhead. So the patch adds an initial check for whether the string is a single-byte string and, if so, processes it separately.

In most cases, strings are single-byte, so the performance improvement will be visible and helpful.
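
To make the idea concrete, here is a conceptual model of the two code paths written in plain Ruby. The actual change lives in the interpreter's C source, so this is only a sketch: the method names and the whitespace table are made up for illustration, and ascii_only? is an assumed stand-in for whatever single-byte check the interpreter performs.

```ruby
# Byte values treated as whitespace in this sketch: \t \n \v \f \r and space.
ASCII_WHITESPACE = [9, 10, 11, 12, 13, 32].freeze

# Returns how many leading bytes of +str+ are whitespace.
def leading_whitespace_bytesize(str)
  offset = 0
  if str.ascii_only?
    # Fast path: every character is exactly one byte, so raw bytes can be
    # compared directly without decoding characters.
    offset += 1 while offset < str.bytesize && ASCII_WHITESPACE.include?(str.getbyte(offset))
  else
    # General path: decode one character at a time and advance by its width.
    str.each_char do |char|
      break unless char.bytesize == 1 && ASCII_WHITESPACE.include?(char.ord)
      offset += char.bytesize
    end
  end
  offset
end

# Illustrative lstrip built on top of the offset calculation.
def sketch_lstrip(str)
  str.byteslice(leading_whitespace_bytesize(str)..-1)
end

sketch_lstrip("  \t hello ")     #=> "hello "
sketch_lstrip(" \n Lé hello ")   #=> "Lé hello "
```

Both branches produce the same result; the difference is that the fast path avoids per-character decoding, which is where the extra overhead for multi-byte strings comes from.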
