February 1, 2013
Please read Journey into Rails routing for background on Rails routing.
Let's say that the route definition looks like this.
/page/:id(/:action)(.:format)
The task at hand is to develop a new programming language that understands
the rules of route definitions. Since this language deals with routes,
let's call it Poutes. Well, Pout sounds better, so let's roll with that.
rexical is a gem that provides a scanner generator. Notice that rexical
is not a scanner itself; it generates a scanner from the rules you give it.
Let's give it a try.
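To make the split between rules and scanner concrete, here is a minimal plain-Ruby sketch. It is not how rexical works internally (rexical writes Ruby source code out to a file); build_scanner is a hypothetical helper that turns an ordered list of rules into a scanner at runtime, using Ruby's standard StringScanner.

```ruby
require 'strscan'

# Hypothetical "generator": takes ordered [regexp, type] rules and
# returns a lambda that scans a string with them.
def build_scanner(rules)
  lambda do |input|
    ss = StringScanner.new(input)
    tokens = []
    until ss.eos?
      # The first rule whose regexp matches at the current position wins.
      regexp, type = rules.find { |re, _| ss.check(re) }
      raise ArgumentError, "can not match: '#{ss.rest}'" unless regexp
      tokens << [type, ss.scan(regexp)]
    end
    tokens
  end
end

# The rules are data; the lambda is the scanner generated from them.
number_or_word = build_scanner([[/\d+/, :NUMBER], [/[a-zA-Z]+/, :WORD]])
p number_or_word.call("abc123")  # => [[:WORD, "abc"], [:NUMBER, "123"]]
```

rexical does the same job ahead of time: the rules live in a .rex file and the scanner is written out as a .rb file.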
Create a folder called pout_language and in that folder create a file called
pout_scanner.rex. Notice that the extension of the file is .rex.
class PoutScanner
end
Before we proceed any further, let's compile to make sure it works.
$ gem install rexical
$ rex pout_scanner.rex -o pout_scanner.rb
$ ls
pout_scanner.rb pout_scanner.rex
When installing, do not run gem install rex. The gem we want is called
rexical, not rex.
Now it's time to add rules to our pout_scanner.rex file.
Let's develop a scanner that can tell integers and strings apart.
class PoutScanner
rule
\d+ { puts "Detected number" }
[a-zA-Z]+ { puts "Detected string" }
end
Regenerate the scanner.
$ rex pout_scanner.rex -o pout_scanner.rb
Now let's put the scanner to the test. Create pout.rb.
require './pout_scanner.rb'
class Pout
@scanner = PoutScanner.new
@scanner.tokenize("123")
end
You will get the error undefined method `tokenize' for
#<PoutScanner:0x007f9630837980> (NoMethodError).
To fix this error, open pout_scanner.rex and add an inner section like this.
class PoutScanner
rule
\d+ { puts "Detected number" }
[a-zA-Z]+ { puts "Detected string" }
inner
def tokenize(code)
scan_setup(code)
tokens = []
while token = next_token
tokens << token
end
tokens
end
end
Regenerate the scanner by executing rex pout_scanner.rex -o pout_scanner.rb.
Now let's run pout.rb.
$ ruby pout.rb
Detected number
This time we get some output.
Now let's test a string.
require './pout_scanner.rb'
class Pout
@scanner = PoutScanner.new
@scanner.tokenize("hello")
end
$ ruby pout.rb
Detected string
So the scanner correctly distinguishes strings from integers. We are going to
add a lot more tests, so let's create a test file instead of repeatedly
editing pout.rb.
This is our pout_test.rb file.
require 'test/unit'
require './pout_scanner'
class PoutTest < Test::Unit::TestCase
def setup
@scanner = PoutScanner.new
end
def test_standalone_string
assert_equal [[:STRING, 'hello']], @scanner.tokenize("hello")
end
end
And this is our Rakefile.
require 'rake'
require 'rake/testtask'
task :generate_scanner do
`rex pout_scanner.rex -o pout_scanner.rb`
end
task :default => [:generate_scanner, :test_units]
desc "Run basic tests"
Rake::TestTask.new("test_units") { |t|
t.pattern = '*_test.rb'
t.verbose = true
t.warning = true
}
Let's also change pout_scanner.rex so that each rule returns an array instead
of calling puts. The array holds the token type and its value.
class PoutScanner
rule
\d+ { [:INTEGER, text.to_i] }
[a-zA-Z]+ { [:STRING, text] }
inner
def tokenize(code)
scan_setup(code)
tokens = []
while token = next_token
tokens << token
end
tokens
end
end
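The tokenize loop in the inner section works because next_token returns nil once the input is used up, which ends the while loop. The sketch below imitates that contract in plain Ruby with StringScanner so it runs without rexical. MiniScanner is a hypothetical stand-in; its scan_setup and next_token only mirror the names of the methods in the generated class, not their exact code.

```ruby
require 'strscan'

# Hypothetical stand-in for the generated PoutScanner. scan_setup and
# next_token mirror the names rexical uses; the bodies are a simplified sketch.
class MiniScanner
  # The generated class defines its own ScanError; this one only imitates it.
  class ScanError < StandardError; end

  def scan_setup(code)
    @ss = StringScanner.new(code)
  end

  # Returns the next token, or nil once the input is used up.
  def next_token
    return nil if @ss.eos?

    if (text = @ss.scan(/\d+/))
      [:INTEGER, text.to_i]
    elsif (text = @ss.scan(/[a-zA-Z]+/))
      [:STRING, text]
    else
      raise ScanError, "can not match: '#{@ss.rest}'"
    end
  end

  # Same loop as the inner section: collect tokens until next_token returns nil.
  def tokenize(code)
    scan_setup(code)
    tokens = []
    while (token = next_token)
      tokens << token
    end
    tokens
  end
end

p MiniScanner.new.tokenize("hello123")  # => [[:STRING, "hello"], [:INTEGER, 123]]
```

Feeding it "hello 123" raises MiniScanner::ScanError at the space, because none of the rules match whitespace.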
With this setup in place, all we need to do is write tests and run rake.
I added the following test and it passed.
def test_standalone_integer
assert_equal [[:INTEGER, 123]], @scanner.tokenize("123")
end
However, the following test failed.
def test_string_and_integer
assert_equal [[:STRING, 'hello'], [:INTEGER, 123]], @scanner.tokenize("hello 123")
end
The test fails with the following message:
1) Error:
test_string_and_integer(PoutTest):
PoutScanner::ScanError: can not match: ' 123'
Notice the space before 123 in the error message: the scanner does not know how to handle whitespace. Let's fix that.
Here are the updated rules. The new \s+ rule has no action, because we do not want anything to happen when whitespace is detected; the generated scanner simply skips it. Now the test passes.
class PoutScanner
rule
\s+
\d+ { [:INTEGER, text.to_i] }
[a-zA-Z]+ { [:STRING, text] }
inner
def tokenize(code)
scan_setup(code)
tokens = []
while token = next_token
tokens << token
end
tokens
end
end
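In the generated scanner, a rule with no action consumes the matched text and produces no token. Here is a plain-Ruby sketch of the same idea, assuming StringScanner#skip as a stand-in for the action-less \s+ rule. tokenize_skipping_spaces is a hypothetical standalone method, not the generated code.

```ruby
require 'strscan'

# Standalone sketch: whitespace is consumed but never becomes a token.
def tokenize_skipping_spaces(code)
  ss = StringScanner.new(code)
  tokens = []
  until ss.eos?
    if ss.skip(/\s+/)
      next # like the action-less \s+ rule: consume and emit nothing
    elsif (text = ss.scan(/\d+/))
      tokens << [:INTEGER, text.to_i]
    elsif (text = ss.scan(/[a-zA-Z]+/))
      tokens << [:STRING, text]
    else
      raise ArgumentError, "can not match: '#{ss.rest}'"
    end
  end
  tokens
end

p tokenize_skipping_spaces("hello 123")  # => [[:STRING, "hello"], [:INTEGER, 123]]
```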
Now that we have some background on how scanning works, let's get back to the
business at hand. The task is to properly scan a routing statement like
/page/:id(/:action)(.:format)
The simplest route is just /. Let's write a test and then a rule for it.
require 'test/unit'
require './pout_scanner'
class PoutTest < Test::Unit::TestCase
def setup
@scanner = PoutScanner.new
end
def test_just_slash
assert_equal [[:SLASH, '/']], @scanner.tokenize("/")
end
end
And here is the .rex file.
class PoutScanner
rule
\/ { [:SLASH, text] }
inner
def tokenize(code)
scan_setup(code)
tokens = []
while token = next_token
tokens << token
end
tokens
end
end
Here is the test for /page.
def test_slash_and_literal
assert_equal [[:SLASH, '/'], [:LITERAL, 'page']] , @scanner.tokenize("/page")
end
And here is the rule that was added.
[a-zA-Z]+ { [:LITERAL, text] }
Here is the test for /:page.
def test_slash_and_symbol
assert_equal [[:SLASH, '/'], [:SYMBOL, ':page']] , @scanner.tokenize("/:page")
end
And here are the rules.
rule
\/ { [:SLASH, text] }
\:[a-zA-Z]+ { [:SYMBOL, text] }
[a-zA-Z]+ { [:LITERAL, text] }
Here is the test for /(:page).
def test_symbol_with_paran
assert_equal [[[:SLASH, '/'], [:LPAREN, '('], [:SYMBOL, ':page'], [:RPAREN, ')']]] , @scanner.tokenize("/(:page)")
end
And here is the new rule
\/\(\:[a-z]+\) { [ [:SLASH, '/'], [:LPAREN, '('], [:SYMBOL, text[2..-2]], [:RPAREN, ')']] }
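This rule has to be listed before the plain \/ rule: the generated scanner tries rules in the order they are written and uses the first one that matches, so with \/ first it would consume the slash and then fail on (. The sketch below reproduces that first-match behavior in plain Ruby. scan_with and the *_RULE constants are hypothetical names for this illustration, not rexical output.

```ruby
require 'strscan'

# Hypothetical helper: try each [regexp, action] rule in order and use
# the first one that matches at the current position.
def scan_with(rules, input)
  ss = StringScanner.new(input)
  tokens = []
  until ss.eos?
    regexp, action = rules.find { |re, _| ss.check(re) }
    raise ArgumentError, "can not match: '#{ss.rest}'" unless regexp
    tokens << action.call(ss.scan(regexp))
  end
  tokens
end

# Same regexps and actions as the .rex rules in this post.
GROUPED_RULE = [/\/\(:[a-z]+\)/,
                ->(t) { [[:SLASH, '/'], [:LPAREN, '('], [:SYMBOL, t[2..-2]], [:RPAREN, ')']] }]
SLASH_RULE   = [/\//, ->(t) { [:SLASH, t] }]
SYMBOL_RULE  = [/:[a-zA-Z]+/, ->(t) { [:SYMBOL, t] }]
LITERAL_RULE = [/[a-zA-Z]+/, ->(t) { [:LITERAL, t] }]

GROUPED_FIRST = [GROUPED_RULE, SLASH_RULE, SYMBOL_RULE, LITERAL_RULE]
SLASH_FIRST   = [SLASH_RULE, GROUPED_RULE, SYMBOL_RULE, LITERAL_RULE]

p scan_with(GROUPED_FIRST, "/(:page)")
# => [[[:SLASH, "/"], [:LPAREN, "("], [:SYMBOL, ":page"], [:RPAREN, ")"]]]
begin
  scan_with(SLASH_FIRST, "/(:page)")
rescue ArgumentError => e
  puts e.message # => can not match: '(:page)'
end
```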
We'll stop here and look at the final set of files.
This is the Rakefile.
require 'rake'
require 'rake/testtask'
task :generate_scanner do
`rex pout_scanner.rex -o pout_scanner.rb`
end
task :default => [:generate_scanner, :test_units]
desc "Run basic tests"
Rake::TestTask.new("test_units") { |t|
t.pattern = '*_test.rb'
t.verbose = true
t.warning = true
}
This is pout_scanner.rex.
class PoutScanner
rule
\/\(\:[a-z]+\) { [ [:SLASH, '/'], [:LPAREN, '('], [:SYMBOL, text[2..-2]], [:RPAREN, ')']] }
\/ { [:SLASH, text] }
\:[a-zA-Z]+ { [:SYMBOL, text] }
[a-zA-Z]+ { [:LITERAL, text] }
inner
def tokenize(code)
scan_setup(code)
tokens = []
while token = next_token
tokens << token
end
tokens
end
end
This is pout_test.rb.
require 'test/unit'
require './pout_scanner'
class PoutTest < Test::Unit::TestCase
def setup
@scanner = PoutScanner.new
end
def test_just_slash
assert_equal [[:SLASH, '/']] , @scanner.tokenize("/")
end
def test_slash_and_literal
assert_equal [[:SLASH, '/'], [:LITERAL, 'page']] , @scanner.tokenize("/page")
end
def test_slash_and_symbol
assert_equal [[:SLASH, '/'], [:SYMBOL, ':page']] , @scanner.tokenize("/:page")
end
def test_symbol_with_paran
assert_equal [[[:SLASH, '/'], [:LPAREN, '('], [:SYMBOL, ':page'], [:RPAREN, ')']]] , @scanner.tokenize("/(:page)")
end
end
Here we used rex to generate the scanner. Now take a look at the generated
pout_scanner.rb and study the code. It is only 91 lines long.
If you look at the code, it is clear that scanning is not that hard. You can
hand-roll a scanner without using a tool like rex, and that's exactly what
Aaron Patterson did in Journey: he hand-rolled the scanner.
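As a rough idea of what hand-rolling could look like, here is a small scanner for the full route from the start of this post, written directly against Ruby's StringScanner. It is a sketch in the spirit of Journey's hand-rolled scanner, not a copy of Journey's code. Two choices are assumptions: parentheses get their own LPAREN/RPAREN tokens, so the output stays a flat list instead of the nested array the .rex rule produces, and the . before format gets a hypothetical DOT token.

```ruby
require 'strscan'

# Hand-rolled sketch of a route scanner (not Journey's actual code).
class RouteScanner
  def tokenize(route)
    @ss = StringScanner.new(route)
    tokens = []
    while (token = next_token)
      tokens << token
    end
    tokens
  end

  private

  # Rules are tried top to bottom; the first match wins.
  def next_token
    return nil if @ss.eos?

    case
    when (text = @ss.scan(/\//))   then [:SLASH, text]
    when (text = @ss.scan(/\(/))   then [:LPAREN, text]
    when (text = @ss.scan(/\)/))   then [:RPAREN, text]
    when (text = @ss.scan(/\./))   then [:DOT, text] # DOT is an assumed token name
    when (text = @ss.scan(/:\w+/)) then [:SYMBOL, text]
    when (text = @ss.scan(/\w+/))  then [:LITERAL, text]
    else
      raise ArgumentError, "can not match: '#{@ss.rest}'"
    end
  end
end

p RouteScanner.new.tokenize("/page/:id(/:action)(.:format)")
# => [[:SLASH, "/"], [:LITERAL, "page"], [:SLASH, "/"], [:SYMBOL, ":id"],
#     [:LPAREN, "("], [:SLASH, "/"], [:SYMBOL, ":action"], [:RPAREN, ")"],
#     [:LPAREN, "("], [:DOT, "."], [:SYMBOL, ":format"], [:RPAREN, ")"]]
```

A flat token list like this is usually easier for a parser to consume than tokens nested inside other tokens.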
In this post we saw how to use rex to build a scanner that reads our routing
statements. In the next post we'll see how to parse the routing statement and
how to find the matching route for a given URL.