Ruby 2.4 optimized lstrip & strip for ASCII strings

Chirag Shah

March 14, 2017

This blog is part of our Ruby 2.4 series.

Ruby has lstrip and rstrip methods, which remove leading and trailing whitespace, respectively, from a string.

Ruby also has a strip method, which combines lstrip and rstrip and removes both leading and trailing whitespace from a string.

"    Hello World    ".lstrip    #=> "Hello World    "
"    Hello World    ".rstrip    #=> "    Hello World"
"    Hello World    ".strip     #=> "Hello World"

Prior to Ruby 2.4, the rstrip method was optimized for performance, but lstrip and strip were not. In Ruby 2.4, String#lstrip and String#strip have been optimized as well, bringing them in line with String#rstrip.

Let's run the following snippet in Ruby 2.3 and Ruby 2.4 to benchmark and compare the performance.

require 'benchmark/ips'

Benchmark.ips do |bench|
  str1 = " " * 10_000_000 + "hello world" + " " * 10_000_000
  str2 = str1.dup
  str3 = str1.dup

  bench.report('String#lstrip') do
    str1.lstrip
  end

  bench.report('String#rstrip') do
    str2.rstrip
  end

  bench.report('String#strip') do
    str3.strip
  end
end

Result for Ruby 2.3

Warming up --------------------------------------
       String#lstrip     1.000  i/100ms
       String#rstrip     8.000  i/100ms
        String#strip     1.000  i/100ms
Calculating -------------------------------------
       String#lstrip     10.989  (± 0.0%) i/s -     55.000  in   5.010903s
       String#rstrip     92.514  (± 5.4%) i/s -    464.000  in   5.032208s
        String#strip     10.170  (± 0.0%) i/s -     51.000  in   5.022118s

Result for Ruby 2.4

Warming up --------------------------------------
       String#lstrip    14.000  i/100ms
       String#rstrip     8.000  i/100ms
        String#strip     6.000  i/100ms
Calculating -------------------------------------
       String#lstrip    143.424  (± 4.2%) i/s -    728.000  in   5.085311s
       String#rstrip     89.150  (± 5.6%) i/s -    448.000  in   5.041301s
        String#strip     67.834  (± 4.4%) i/s -    342.000  in   5.051584s

From the above results, we can see that in Ruby 2.4, String#lstrip is around 13x faster (10.989 to 143.424 i/s), while String#strip is nearly 7x faster (10.170 to 67.834 i/s). String#rstrip, as expected, performs about the same, since it was already optimized in previous versions.
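The speedups can be worked out directly from the iterations-per-second figures reported in the two result listings. The snippet below is only a small sketch of that arithmetic; the constant names are made up for illustration, and the numbers are copied from the benchmark output above.

```ruby
# Iterations per second, copied from the benchmark results above.
IPS_RUBY_23 = { lstrip: 10.989, rstrip: 92.514, strip: 10.170 }.freeze
IPS_RUBY_24 = { lstrip: 143.424, rstrip: 89.150, strip: 67.834 }.freeze

# Speedup = throughput on Ruby 2.4 divided by throughput on Ruby 2.3.
SPEEDUP = IPS_RUBY_24.map { |name, ips|
  [name, (ips / IPS_RUBY_23[name]).round(1)]
}.to_h

SPEEDUP.each { |name, ratio| puts "String##{name}: #{ratio}x" }
# String#lstrip: 13.1x
# String#rstrip: 1.0x
# String#strip: 6.7x
```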

Performance remains the same for multi-byte strings

Strings can contain single-byte or multi-byte characters.

For example, Lé Hello World is a multi-byte string because it contains é, which is a multi-byte character.

'e'.bytesize        #=> 1
'é'.bytesize        #=> 2
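To make the distinction concrete for whole strings, here is a minimal sketch that compares character count and byte count. The helper name `describe_string` is hypothetical; `ascii_only?`, `length`, and `bytesize` are standard String methods. It assumes the source file is UTF-8, which is Ruby's default.

```ruby
# Hypothetical helper: summarizes how a string is laid out in memory.
def describe_string(str)
  {
    ascii_only: str.ascii_only?, # true when every character is 7-bit ASCII
    chars: str.length,           # number of characters
    bytes: str.bytesize          # number of bytes
  }
end

p describe_string("hello world")
# {:ascii_only=>true, :chars=>11, :bytes=>11} on Ruby 2.x (newer versions print it slightly differently)

p describe_string("Lé hello world")
# ascii_only is false, and bytes (15) exceeds chars (14) because é takes two bytes
```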

Let's run the same benchmark with the string Lé hello world instead of hello world.

Result for Ruby 2.3

Warming up --------------------------------------
       String#lstrip     1.000  i/100ms
       String#rstrip     1.000  i/100ms
        String#strip     1.000  i/100ms
Calculating -------------------------------------
       String#lstrip     11.147  (± 9.0%) i/s -     56.000  in   5.034363s
       String#rstrip      8.693  (± 0.0%) i/s -     44.000  in   5.075011s
        String#strip      5.020  (± 0.0%) i/s -     26.000  in   5.183517s

Result for Ruby 2.4

Warming up --------------------------------------
       String#lstrip     1.000  i/100ms
       String#rstrip     1.000  i/100ms
        String#strip     1.000  i/100ms
Calculating -------------------------------------
       String#lstrip     10.691  (± 0.0%) i/s -     54.000  in   5.055101s
       String#rstrip      9.524  (± 0.0%) i/s -     48.000  in   5.052678s
        String#strip      4.860  (± 0.0%) i/s -     25.000  in   5.152804s

As we can see, the performance for multi-byte strings is almost the same across Ruby 2.3 and Ruby 2.4.

Explanation

The optimization concerns how strings are scanned for whitespace. Checking for whitespace in a multi-byte string carries additional overhead, so the patch adds an initial check for whether the string is a single-byte string and, if so, processes it on a separate, faster path.

Since strings are single-byte in most cases, the performance improvement should be visible and helpful in practice.
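The idea of a separate single-byte path can be sketched in plain Ruby. This is an illustration only, not the actual patch: the real change lives in Ruby's C implementation. The method name `lstrip_sketch` and the constant `WHITESPACE_BYTES` are hypothetical, `ascii_only?` stands in for the interpreter's internal single-byte check (an assumption), and the whitespace set is simplified to tab, line feed, vertical tab, form feed, carriage return, and space.

```ruby
# Simplified whitespace set (assumption): \t \n \v \f \r and space.
WHITESPACE_BYTES = [0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20].freeze

# Hypothetical sketch of an lstrip with a single-byte fast path.
def lstrip_sketch(str)
  offset = 0
  if str.ascii_only?
    # Fast path: one character is one byte, so raw bytes can be scanned
    # without decoding characters.
    while offset < str.bytesize && WHITESPACE_BYTES.include?(str.getbyte(offset))
      offset += 1
    end
  else
    # Slow path: decode one character at a time, since a character may
    # span several bytes.
    str.each_char do |ch|
      break unless ch.bytesize == 1 && WHITESPACE_BYTES.include?(ch.getbyte(0))
      offset += ch.bytesize
    end
  end
  # Keep everything from the first non-whitespace byte onward.
  str.byteslice(offset, str.bytesize - offset)
end

puts lstrip_sketch("    Hello World    ").inspect     # "Hello World    "
puts lstrip_sketch(" \t Lé Hello World").inspect      # "Lé Hello World"
```

Both branches return the same result for the same input; the difference is only in how much work is done per character, which is where the benchmark gap for ASCII strings comes from.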
