Ruby 2.6 adds String#split with block

Taha Husain

July 17, 2018

This blog is part of our Ruby 2.6 series.

Before Ruby 2.6, String#split always returned an array of the split strings.

In Ruby 2.6, a block can be passed to String#split. Each split string is yielded to the block, so no array of results is created, which saves memory.

We will define a method is_fruit? to show how to use split with a block.

def is_fruit?(value)
  %w(apple mango banana watermelon grapes guava lychee).include?(value)
end

The input is a comma-separated string containing the names of vegetables and fruits. The goal is to pick out the fruit names from the input string and store them in an array.

String#split
input_str = "apple, mango, potato, banana, cabbage, watermelon, grapes"

splitted_values = input_str.split(", ")
# => ["apple", "mango", "potato", "banana", "cabbage", "watermelon", "grapes"]

fruits = splitted_values.select { |value| is_fruit?(value) }
# => ["apple", "mango", "banana", "watermelon", "grapes"]

With plain split, an intermediate array is created that contains the names of both fruits and vegetables.

String#split with a block
fruits = []

input_str = "apple, mango, potato, banana, cabbage, watermelon, grapes"

input_str.split(", ") { |value| fruits << value if is_fruit?(value) }
# => "apple, mango, potato, banana, cabbage, watermelon, grapes"

fruits
# => ["apple", "mango", "banana", "watermelon", "grapes"]

When a block is passed to split, it returns the string on which split was called and does not create an array. Each split string is yielded to the block, which in our case pushes the fruit names into a separate array.
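The return value and the optional limit argument can be checked with a short sketch like the one below. The variable names (input, seen, limited) are only illustrative and are not part of the original example.

```ruby
input = "apple, potato, banana"

# Collect an upcased copy of every split string via the block.
seen = []
result = input.split(", ") { |value| seen << value.upcase }

# With a block, split returns the receiver itself, not a new array.
puts result.equal?(input)
p seen

# The optional limit argument still applies in block form:
# at most two pieces are yielded here.
limited = []
input.split(", ", 2) { |value| limited << value }
p limited
```

Because the block form returns the original string, anything you want to keep has to be collected inside the block.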

Update

Benchmark

We created a large random string to benchmark the performance of split and split with a block.

require 'securerandom'

# Build a string of 100,000 random 10-character words separated by spaces.
test_string = ''

100_000.times do
  test_string += SecureRandom.alphanumeric(10)
  test_string += ' '
end

require 'benchmark'

Benchmark.bmbm do |bench|

  # Plain split builds an intermediate array, then select filters it.
  bench.report('split') do
    arr = test_string.split(' ')
    str_starts_with_a = arr.select { |str| str.start_with?('a') }
  end

  # Split with a block filters each word as it is yielded.
  bench.report('split with block') do
    str_starts_with_a = []
    test_string.split(' ') { |str| str_starts_with_a << str if str.start_with?('a') }
  end

end

Results

Rehearsal ----------------------------------------------------
split              0.023764   0.000911   0.024675 (  0.024686)
split with block   0.012892   0.000553   0.013445 (  0.013486)
------------------------------------------- total: 0.038120sec

                       user     system      total        real
split              0.024107   0.000487   0.024594 (  0.024622)
split with block   0.010613   0.000334   0.010947 (  0.010991)

We ran a second round of benchmarks using benchmark/ips, which reports iterations per second.

require 'benchmark/ips'
Benchmark.ips do |bench|

  bench.report('split') do
    splitted_arr = test_string.split(' ')
    str_starts_with_a = splitted_arr.select { |str| str.start_with?('a') }
  end

  bench.report('split with block') do
    str_starts_with_a = []
    test_string.split(' ') { |str| str_starts_with_a << str if str.start_with?('a') }
  end

  bench.compare!
end

Results

Warming up --------------------------------------
               split     4.000  i/100ms
    split with block    10.000  i/100ms
Calculating -------------------------------------
               split     46.906  (± 2.1%) i/s -    236.000  in   5.033343s
    split with block    107.301  (± 1.9%) i/s -    540.000  in   5.033614s

Comparison:
    split with block:      107.3 i/s
               split:       46.9 i/s - 2.29x  slower

This benchmark shows that, for this input, split with a block is roughly 2.3 times faster than split followed by select.
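To get a rough sense of what the block form avoids allocating, one option is to measure the intermediate array itself. The sketch below is an illustration under stated assumptions, not part of the original benchmark: it uses a hypothetical input string, and it relies on ObjectSpace.memsize_of from the objspace standard library, whose numbers are implementation-specific hints.

```ruby
require 'objspace'

# Hypothetical input: 100_000 short words separated by spaces.
text = Array.new(100_000) { |i| "w#{i}" }.join(' ')

# Array form: every substring is collected before any of them is used.
intermediate = text.split(' ')

# Size of the array object itself (not the strings it points to).
array_bytes = ObjectSpace.memsize_of(intermediate)

# Block form: substrings are yielded one at a time, no result array is built.
count = 0
text.split(' ') { count += 1 }

puts "substrings counted with a block: #{count}"
puts "bytes held by the intermediate array itself: #{array_bytes}"
```

Both forms still allocate one string per substring. The saving comes from skipping the array that would otherwise hold all of them at once.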

Here are the relevant commit and discussion for this change.

The Chinese version of this blog is available here.
