This blog post is part of our Ruby 2.6 series.
Before Ruby 2.6, String#split always returned an array of the split strings.
In Ruby 2.6, a block can be passed to String#split. split then yields each split string to the block, which can operate on it. This avoids creating an intermediate array and is therefore more memory efficient.
To illustrate how to use split with a block, we will define a helper method, is_fruit?.
```ruby
def is_fruit?(value)
  %w(apple mango banana watermelon grapes guava lychee).include?(value)
end
```
The input is a comma-separated string of vegetable and fruit names. The goal is to extract the fruit names from the input string and store them in an array.
String#split
```ruby
input_str = "apple, mango, potato, banana, cabbage, watermelon, grapes"

splitted_values = input_str.split(", ")
# => ["apple", "mango", "potato", "banana", "cabbage", "watermelon", "grapes"]

fruits = splitted_values.select { |value| is_fruit?(value) }
# => ["apple", "mango", "banana", "watermelon", "grapes"]
```
Calling split without a block creates an intermediate array that contains both the fruit and the vegetable names.
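To get a rough sense of what that intermediate array costs, the sketch below measures it with `ObjectSpace.memsize_of` from Ruby's `objspace` extension. The input (100,000 hypothetical words) and the variable names are assumptions made for illustration, and `memsize_of` only reports an approximation for a single object: it counts the array's buffer of element references, not the substrings themselves, which both forms of split allocate anyway.

```ruby
require 'objspace'

# Hypothetical input for illustration: 100_000 space-separated words.
input = Array.new(100_000) { |i| "word#{i}" }.join(' ')

# Without a block, split builds one Array holding every piece.
pieces = input.split(' ')

# Approximate size in bytes of that single Array object. This includes its
# buffer of element references but not the Strings they point to.
array_bytes = ObjectSpace.memsize_of(pieces)

# With a block, each piece is yielded as soon as it is cut out,
# so no Array holding all of the pieces is built.
count = 0
input.split(' ') { |_piece| count += 1 }

puts "pieces yielded: #{count}"
puts "intermediate Array: about #{array_bytes} bytes"
```

On a 64-bit Ruby each element reference takes 8 bytes, so the array alone grows with the number of pieces; the block form skips it entirely.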
String#split with a block
```ruby
fruits = []

input_str = "apple, mango, potato, banana, cabbage, watermelon, grapes"

input_str.split(", ") { |value| fruits << value if is_fruit?(value) }
# => "apple, mango, potato, banana, cabbage, watermelon, grapes"

fruits
# => ["apple", "mango", "banana", "watermelon", "grapes"]
```
When a block is passed to split, it returns the string on which split was called and does not create an array. String#split yields each split string to the block, which in our case pushes the fruit names into a separate array.
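A small sketch makes the return value concrete: the block receives each substring in order, and split hands back the very same String object it was called on, not a copy. The names `seen` and `result` are illustrative only.

```ruby
input_str = "apple, mango, potato"
seen = []

# With a block, split yields each substring in order...
result = input_str.split(", ") { |value| seen << value }

# ...and returns the receiver itself rather than a new Array.
p seen                      # prints ["apple", "mango", "potato"]
p result.equal?(input_str)  # prints true
```

Because the return value is the original string, any collected results have to live in a variable outside the block, as `fruits` does above.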
Update
Benchmark
We created a large random string to benchmark the performance of split and split with a block.
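```ruby
require 'securerandom'

# Build ~1.1 MB of space-separated, random 10-character tokens.
# Appending with << mutates one String instead of allocating a new one on every +=.
test_string = String.new

100_000.times do
  test_string << SecureRandom.alphanumeric(10)
  test_string << ' '
end
```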
```ruby
require 'benchmark'

Benchmark.bmbm do |bench|
  bench.report('split') do
    arr = test_string.split(' ')
    str_starts_with_a = arr.select { |str| str.start_with?('a') }
  end

  bench.report('split with block') do
    str_starts_with_a = []
    test_string.split(' ') { |str| str_starts_with_a << str if str.start_with?('a') }
  end
end
```
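Before comparing timings, it can help to confirm that the two report blocks compute the same list, so the benchmark compares equivalent work. This sketch uses a smaller random input of the same shape as `test_string`; the names `sample`, `via_select` and `via_block` are illustrative.

```ruby
require 'securerandom'

# A smaller random input of the same shape as test_string.
sample = String.new
1_000.times do
  sample << SecureRandom.alphanumeric(10) << ' '
end

# Variant 1: split into an Array, then filter it.
via_select = sample.split(' ').select { |str| str.start_with?('a') }

# Variant 2: split with a block, collecting matches as they are yielded.
via_block = []
sample.split(' ') { |str| via_block << str if str.start_with?('a') }

puts via_select == via_block  # prints true
```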
Results
```
Rehearsal ----------------------------------------------------
split              0.023764   0.000911   0.024675 (  0.024686)
split with block   0.012892   0.000553   0.013445 (  0.013486)
------------------------------------------- total: 0.038120sec

                       user     system      total        real
split              0.024107   0.000487   0.024594 (  0.024622)
split with block   0.010613   0.000334   0.010947 (  0.010991)
```
We ran another round of benchmarks using benchmark/ips.
```ruby
require 'benchmark/ips'

Benchmark.ips do |bench|
  bench.report('split') do
    splitted_arr = test_string.split(' ')
    str_starts_with_a = splitted_arr.select { |str| str.start_with?('a') }
  end

  bench.report('split with block') do
    str_starts_with_a = []
    test_string.split(' ') { |str| str_starts_with_a << str if str.start_with?('a') }
  end

  bench.compare!
end
```
Results
```
Warming up --------------------------------------
               split     4.000  i/100ms
    split with block    10.000  i/100ms
Calculating -------------------------------------
               split     46.906  (± 2.1%) i/s -    236.000  in   5.033343s
    split with block    107.301  (± 1.9%) i/s -    540.000  in   5.033614s

Comparison:
    split with block:      107.3 i/s
               split:       46.9 i/s - 2.29x  slower
```
On this input, split with a block was about 2.3 times faster than split followed by select.
Here is the relevant commit and discussion for this change.
The Chinese version of this blog post is available here.