A no-go fantasy: writing Go in Ruby with Ruby Next
Ruby is awesome. We love its readability, flexibility, and developer-centric nature. Still, here at Mars, we also love Go because it has its own sense of simplicity and magic, too. Simply put, Go holds advantages over Ruby in some aspects: it’s faster, statically typed, and comes with cool concurrency primitives out of the box. This being said, some intrepid readers might find themselves grappling with a seemingly reasonable question: if Go is so darn good, why don’t we just write everything with it?
After all, sometimes Ruby just isn’t enough to handle the task at hand, and there have been times when we’ve had to rewrite parts of some applications with something a bit faster. Case in point: imgproxy was born in Go while we were working on a Ruby project for eBay.
So, why not just use Go? The serious answer to that question is…
…well, actually, who needs the serious answer? What’s the point in yet another article covering when to use Go versus when to use Ruby? Let’s just toss that idea out! Instead, let’s approach this article through the lens of fantasy and pretend that—because of (possibly apocalyptic) circumstances beyond our control—we need to write literally everything in Go!
Assessing the situation
So, let’s say all of our projects have already been written in Ruby, but in this fantasy world, we’ll need to abandon that language. How can we “let it Go?” (“It” being Ruby, of course). There are a few options available here:
- A complete rewrite. We could use Go and simply take the time to rewrite the entire project. But, well, this would probably be ill-advised. Let’s look at some other options.
- Write new microservices. Of course, new microservices would allow you to start new small projects using Go, but what about all the legacy Ruby code you’d be missing out on?
- The perfect fantasy solution… let’s write Go in Ruby!
Our goal is to take a new class, written in Go, and have it work right in the middle of the old Ruby codebase. That way, when it’s actually time to migrate, we’ll already be 80% ready to go. To test this idea, let’s take the “Hello World” example from A Tour of Go and change the file extension so that main.go becomes main.go.rb:
# main.go.rb
package main
import "fmt"
func main() {
fmt.Println("Hello, 世界")
}
To start, let’s just try running it:
$ ruby main.go.rb
main.go.rb:1:in `<main>': undefined local variable or method `main' for main:Object (NameError)
Well, unfortunately, it didn’t work. This means we’ll have to handwrite some code—a rare case with Ruby, but it does happen from time to time.
Implementing Go packages in Ruby
First, we are going to implement the package method. Here is how Ruby sees package main:
package main #=> package(main())
package foo #=> package(foo())
Ruby calls the main method and passes the result to the package method. But main and foo are undefined… so there is only one way to make it work.
Let’s add method_missing, which will always return the name of a method:
class << self
  def method_missing(name, *_args)
    name
  end
end
This method would be just perfect for actual, real-world production code, no? 😏
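To see the trick in isolation, here is a minimal sketch that applies the same method_missing idea to a throwaway object instead of the top-level main. The Sandbox class and its package method are hypothetical stand-ins used only for this illustration:

```ruby
# Hypothetical sandbox: the same trick, but scoped to one object
# so the real top-level `main` stays untouched.
class Sandbox
  # Any unknown bare word becomes a symbol with its own name.
  def method_missing(name, *_args)
    name
  end

  # Stand-in for the real `package` method: just echo the argument.
  def package(pkg)
    pkg
  end
end

sandbox = Sandbox.new

# `package main` is parsed as `package(main())`;
# `main` is undefined, so it is routed through method_missing.
result = sandbox.instance_eval { package main }
p result # => :main
```

Scoping the hook to a single object keeps typos elsewhere in a program from silently turning into symbols.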
Now, let’s go over what the package method does. In Go, packages limit the visibility of variables, functions, and so on. So, packages are essentially a namespace, and they’re reminiscent of a Ruby Module:
package foo
# =>
module GoRuby
module Foo
# defined functions
end
end
So, we have to take the name of the package and declare a module with that name. Let’s do this inside a GoRuby module to avoid cluttering up the global namespace:
class << self
  def package(pkg)
    mod_name = pkg.to_s.capitalize
    unless GoRuby.const_defined?(mod_name)
      GoRuby.module_eval("#{mod_name} = Module.new { extend self }")
    end
    @__go_package__ = GoRuby.const_get(mod_name)
  end
end
If the module mod_name doesn’t exist yet, we use module_eval to define a new module. Finally, we’ll save the resulting module into the @__go_package__ variable (we’ll need it later). That’s it for package. Let’s move on to the import method!
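The same lookup-or-create step can also be written without evaluating a string. The sketch below is one possible variant, not the article's implementation: go_package is a hypothetical helper name, and it uses const_set plus a non-inherited const_defined? check so that a top-level constant with the same name is not mistaken for a package.

```ruby
module GoRuby; end

# Hypothetical helper: find or create GoRuby::<Name> for a package name.
def go_package(pkg)
  mod_name = pkg.to_s.capitalize

  # `false` limits the lookup to GoRuby itself (no Object fallback).
  unless GoRuby.const_defined?(mod_name, false)
    GoRuby.const_set(mod_name, Module.new { extend self })
  end

  GoRuby.const_get(mod_name, false)
end

main_pkg = go_package(:main)
p main_pkg                            # => GoRuby::Main
p go_package(:main).equal?(main_pkg)  # => true
```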
Making Go imports work
In Go, we import a package and then call it by name to access its methods. For example, fmt.Println("Hello, 世界"). Let’s demonstrate what we need to do to implement this:
import "fmt"
# =>
def fmt
GoRuby::Fmt
end
It seems simple enough:
class << self
  def import(pkg)
    mod_name = pkg.split('_').collect(&:capitalize).join # String#camelize from ActiveSupport
    raise "unknown package #{pkg}" unless GoRuby.const_defined?(mod_name)
    define_method(pkg) { GoRuby.const_get(mod_name) }
  end
end
If mod_name is not defined, we raise an exception. Otherwise, we create a method with define_method.
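As a small self-contained illustration of the import step, the sketch below attaches the accessor to a single object with define_singleton_method. The Scope class and the Sprint function are hypothetical names introduced only for this example:

```ruby
module GoRuby
  module Fmt
    # Hypothetical stand-in for a package function.
    def self.Sprint(*attrs)
      attrs.join(" ")
    end
  end
end

# Hypothetical scope object that owns the imported package accessors.
class Scope
  def import(pkg)
    mod_name = pkg.split("_").map(&:capitalize).join
    unless GoRuby.const_defined?(mod_name, false)
      raise ArgumentError, "unknown package #{pkg}"
    end

    mod = GoRuby.const_get(mod_name, false)
    # Define `fmt` (or any other package name) on this object only.
    define_singleton_method(pkg) { mod }
  end
end

scope = Scope.new
scope.import "fmt"
p scope.fmt                          # => GoRuby::Fmt
p scope.fmt.Sprint("Hello,", "世界") # => "Hello, 世界"
```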
We’ve dealt with imports. Let’s keep moving on and tackle function declarations next.
Dealing with function declarations
Here’s a little quiz: where will the block be passed?
func main() {
# some actions
}
# => Where will the block be passed?
# 1 block goes to func
func(main()) {
# some actions
}
# 2 block goes to main
func(main() {
# some actions
})
If there were a RuboCop in the room, it would have already started sounding the alert. 🚨 Without parentheses, Ruby will pass the block to the main method, which doesn’t exist. And that’s where method_missing comes into play!
class << self
  def method_missing(name, *_args, &block)
    if block
      [name, block.to_proc]
    else
      name
    end
  end
end
If a block is received, we return an array with the method name as the first element and the block as the second. And here’s how we implement the func method:
class << self
  def func(attrs)
    current_package = @__go_package__
    name, block = attrs
    if current_package.respond_to? name
      raise "#{name} already defined for package #{current_package}"
    end
    current_package.module_eval { define_method(name, block) }
  end
end
This time, there’s no magic: we define a method with a name and a block from method_missing in the currently active module (@__go_package__). Now the only thing left is implementing the Go standard library.
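The answer to the quiz can be checked with a small probe. This is a sketch under stated assumptions: the Probe class and its log are hypothetical and exist only to record which method actually received the block.

```ruby
# Hypothetical probe that records who received a block.
class Probe
  attr_reader :log

  def initialize
    @log = []
  end

  # Unknown methods (like `main`) land here.
  def method_missing(name, *_args, &block)
    @log << [name, !block.nil?]
    block ? [name, block] : name
  end

  # Stand-in for the real `func`: record the call and pass the args through.
  def func(attrs, &block)
    @log << [:func, !block.nil?]
    attrs
  end
end

probe = Probe.new
result = probe.instance_eval do
  func main() { 42 }
end
name, body = result

p probe.log # => [[:main, true], [:func, false]]
p name      # => :main
p body.call # => 42
```

The braces bind to the nearest call, main(), so func only ever sees the [name, block] pair returned by method_missing.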
For now, we have one method of outputting a string to stdout—that’s enough for us. We’ll just leave it as is! Take a look:
module GoRuby
  module Fmt
    class << self
      def Println(*attrs)
        str = "#{attrs.join(' ')}\n"
        $stdout << str
        [str.bytesize, nil]
      end
    end
  end
end
Go Ruby, Go!
Well, it seems like this should take care of everything! Let’s require our new library and run main.go.rb:
$ ruby -r './go_ruby.rb' main.go.rb
Well, we didn’t get any errors, but we don’t see Hello, 世界 either. The main() function from the main package is the entry point of executable Go programs, and we have not implemented that in our library yet. Fortunately, Ruby has a callback method called at_exit, so we’ll use it:
at_exit do
GoRuby::Main.main
end
And now let’s run the code again (here’s a gist for you to follow along at home):
$ ruby -r './go_ruby.rb' main.go.rb
Hello, 世界
You know how Go-developers have an obsession with build time? Well, there’s 0 build time here 😉 Awesome, it’s even better than Go.
Let’s dress up things a little further, shall we?
Go deeper
What about the := operator? That should be easy to implement, right? We expect a “no method ':='” error to be raised. To solve it, we’ll simply define the method on Object, and that will be it:
$ ruby -e 'a := "Hello, 世界"'
-e:1: syntax error, unexpected '=', expecting literal content or terminator or tSTRING_DBEG or tSTRING_DVAR
a := "Hello, 世界"
We were half right. There is indeed an error, but a slightly different one. It’s happening as a result of the parser. Here is what happens:
First, our code goes to the lexer. It splits the text into tokens, then the array of tokens goes to the parser, and it raises an error because it’s an illegal operation to put an equal sign after a colon. How can we fix this? Let’s explore some realistic options:
- Persuade the Ruby team to add :=. You could spend a couple of years mastering Japanese, gain the trust of the core team, find a way to attend one of their meetings, and simply propose to add :=.
- Fork Ruby. You can make and maintain your fork of Ruby. Most people have Docker anyway, so businesses won’t even notice you made the swap.
- Wave our hands in the air and transpile like we just don’t care. We could write Go and transpile it in Ruby! If only we had something like Babel for JavaScript/TypeScript, but for Ruby instead… 🤔
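The parse failure described above can also be reproduced from inside Ruby, which makes it clear that no method lookup ever happens. This is a minimal sketch for MRI (RubyVM is MRI-specific); the exact error message differs between Ruby versions, so it is printed rather than asserted:

```ruby
require "ripper"

source = 'a := "Hello"'

# Ripper.sexp returns nil when the source cannot be parsed.
tree = Ripper.sexp(source)
puts "parsed: #{!tree.nil?}"

# The same failure surfaces as a SyntaxError at compile time,
# long before any method_missing hook could run.
compiled =
  begin
    RubyVM::InstructionSequence.compile(source)
    true
  rescue SyntaxError => e
    puts e.message.lines.first
    false
  end
```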
When DSL is not enough
Well, actually, we already have a tool to transpile our code: Ruby Next, which was written by the one and only Vladimir Dementyev, so why not use it?
To transpile our code, Ruby Next hijacks the code loading process, parsing it with its own updated lexer and parser. Next, the resulting AST nodes are modified with Ruby Next rewriters. A rewriter marks the AST nodes that require attention and modifies them accordingly. Finally, Ruby Next rewrites the code with unparser, taking the marked AST nodes into account.
But enough talk! Let’s actually formulate an actionable plan to bring := to Ruby:
- First, we’ll modify the Ruby Next lexer.
- Likewise, we’ll modify the Ruby Next parser.
- Next, we’ll write the Ruby Next rewriter.
- Finally, we’ll be able to actually use := inside our code.
Rolling up our sleeves: modifying the lexer
So, let’s start with the lexer. We want the lexer to treat := as a single token and the parser to return the same AST node that is returned when we use the plain = operator. After that, we’ll be able to read our main.go.rb file, get the AST, and rewrite the original code before executing it with Ruby.
The lexer in the parser gem was written using Ragel. The Ragel State Machine Compiler is a finite-state machine compiler and a parser generator. It receives some code with regular expressions and various logical rules and outputs an optimized state machine in the target language, e.g., C++, Java, Ruby, Go, etc.
By the way, our lexer ends up with 2.5 thousand lines. And if we run Ragel, we’ll get an optimized Ruby class with almost 24 thousand lines:
$ wc -l lib/parser/lexer.rl
2556 lib/parser/lexer.rl
$ ragel -F1 -R lib/parser/lexer.rl -o lib/parser/lexer.rb
$ wc -l lib/parser/lexer.rb
23780 lib/parser/lexer.rb
Let’s just make a quick diversion and take a brisk look through the source code!
Just kidding! Instead, let’s write our own simple lexer for arithmetic operations.
Creating a simple lexer
We want to create a simple lexer for arithmetic operations. In this case, a string will go into the lexer and it should be transformed into an array of tokens:
We’ll start by defining our Ruby class; let’s call it Lexer:
class Lexer
def initialize
@data = nil # array of input symbols
@ts = nil # token start index
@te = nil # token end index
@eof = nil # EOF index
@tokens = [] # resulting array of tokens
end
end
Ragel will work in the context of this class object, and it assumes that data, ts, and te are defined. To go to the final state, we need the EOF index eof. tokens is an array of tokens that we will return to the user.
Let’s move on to the state machine. The Ragel code is located between %%{ ... }%%:
class Lexer
%%{ # fix highlighting %
# name state machine
machine ultimate_math_machine;
# tell Ragel how to access variables defined earlier
access @;
variable eof @eof;
# regexp-like rules
number = ('-'?[0-9]+('.'[0-9]+)?);
# main rule for the state machine
main := |*
# when number is passed, print indices of parsed number
number => { puts "number [#{@ts},#{@te}]" };
# any is a predefined Ragel state machine for any symbol,
# just ignore everything for now
any;
*|;
}%% # fix highlighting %
end
In the code above, we defined a state machine called ultimate_math_machine, and we’ve told Ragel how to access the variables we defined earlier. Next, we defined a regexp-like rule for numbers. Finally, we declared the state machine itself.
If we come across a number, we execute the Ruby code inside braces. In our case, it prints the token type and its indices. Also, for now, we’ll use the predefined state machine any to skip over all other symbols.
Now, the only thing left is to add the Lexer#run method to prepare input data, initialize the Ragel state machine, and execute it:
class Lexer
def run(input)
@data = input.unpack("c*")
@eof = @data.size
%%{ # fix highlighting %
write data;
write init;
write exec;
}%% # fix highlighting %
end
end
We unpack the input string and calculate the EOF index. Then write data and write init initialize the Ragel state machine, and write exec runs it. It’s time to compile and run this simple machine:
$ ragel -R lexer.rl -o lexer.rb
$ ruby -r './lexer.rb' -e 'Lexer.new.run("40 + 2")'
number [0,2]
number [5,6]
It works! Now we need to populate the @tokens array with our tokens and add some operator rules. Here is the Ruby portion of the code:
class Lexer
# list of all symbols in our calculator with token names
PUNCTUATION = {
'+' => :tPLUS, '-' => :tMINUS,
'*' => :tSTAR, '/' => :tDIVIDE,
'(' => :tLPAREN, ')' => :tRPAREN
}
def run(input)
@data = input.unpack("c*") if input.is_a?(String)
@eof = input.length
%%{ # fix highlighting %
write data;
write init;
write exec;
}%% # fix highlighting %
# return tokens as a result
@tokens
end
# rebuild substring from input array and current indices
def current_token
@data[@ts...@te].pack("c*")
end
# push current token to the resulting array
def emit(type, tok = current_token)
@tokens.push([type, tok])
end
# use passed hash `table` to define type of the token and call `emit`
def emit_table(table)
token = current_token
emit(table[token], token)
end
end
Here we have a new Lexer#emit method, which adds a token to the resulting @tokens array, and a fancy Lexer#emit_table(table) method, which will use the PUNCTUATION hash to define the token type and then add it to the resulting array. Also, we will return @tokens at the end of the Lexer#run method. It’s time to trick out our state machine block:
class Lexer
%%{ # fix highlighting %
machine ultimate_math_machine;
access @;
variable eof @eof;
# regexp-like rules
number = ('-'?[0-9]+('.'[0-9]+)?);
operator = "+" | "-" | "/" | "*";
paren = "(" | ")";
main := |*
# when number is passed, call emit with token type :tNUMBER
number => { emit(:tNUMBER) };
# when an operator or a parenthesis is passed,
# call emit_table to use PUNCTUATION to choose token
operator | paren => { emit_table(PUNCTUATION) };
# space is a predefined Ragel state machine for whitespaces
space;
*|;
}%% # fix highlighting %
end
When the state machine encounters a number, we call the Lexer#emit method. When it encounters an operator or a parenthesis, we call Lexer#emit_table. Also, we replaced the any state machine with space so that only whitespace is skipped. Here is a full gist of our lexer. Let’s compile it and run it once again!
$ ragel -R lexer.rl -o lexer.rb
$ ruby -r './lexer.rb' -e 'p Lexer.new.run("2 + (8 * 5)")'
[[:tNUMBER, "2"], [:tPLUS, "+"], [:tLPAREN, "("], [:tNUMBER, "8"], [:tSTAR, "*"], [:tNUMBER, "5"], [:tRPAREN, ")"]]
There are our tokens, awesome!
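For readers without Ragel installed, roughly the same token stream can be produced in plain Ruby. The sketch below swaps Ragel for StringScanner from the standard library, so it is a comparison point rather than how the parser gem works; the tokenize helper is a hypothetical name.

```ruby
require "strscan"

# Same token names as in the Ragel-based lexer.
PUNCTUATION = {
  "+" => :tPLUS,   "-" => :tMINUS,
  "*" => :tSTAR,   "/" => :tDIVIDE,
  "(" => :tLPAREN, ")" => :tRPAREN
}.freeze

NUMBER = /-?[0-9]+(\.[0-9]+)?/

# Hypothetical helper: turn an arithmetic string into [type, text] pairs.
def tokenize(input)
  scanner = StringScanner.new(input)
  tokens = []

  until scanner.eos?
    if scanner.skip(/\s+/)
      next # whitespace is ignored, like the `space` rule
    elsif (number = scanner.scan(NUMBER))
      tokens << [:tNUMBER, number]
    elsif (punct = scanner.scan(%r{[-+*/()]}))
      tokens << [PUNCTUATION.fetch(punct), punct]
    else
      raise ArgumentError, "unexpected character #{scanner.getch.inspect}"
    end
  end

  tokens
end

p tokenize("2 + (8 * 5)")
# => [[:tNUMBER, "2"], [:tPLUS, "+"], [:tLPAREN, "("], [:tNUMBER, "8"], [:tSTAR, "*"], [:tNUMBER, "5"], [:tRPAREN, ")"]]
```

Unlike the Ragel version, this one raises on unknown characters instead of relying on the generated state machine's error state.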
Ruby Next lexer
Now we are ready to tweak the lexer from the parser gem. Let’s start by adding a new token to the PUNCTUATION hash:
PUNCTUATION = {
'=' => :tEQL, '&' => :tAMPER2, '|' => :tPIPE,
':=' => :tGOEQL, # other tokens
}
Add := to the punctuation_end rule:
# A list of all punctuation except punctuation_begin.
punctuation_end = ',' | '=' | ':=' | '->' | '(' | '[' |
']' | '::' | '?' | ':' | '.' | '..' | '...' ;
Finally, add := to one of the state machines, expr_fname, just before the colon:
'::'
=> { fhold; fhold; fgoto expr_end; };
':='
=> { fhold; fhold; fgoto expr_end; };
':'
=> { fhold; fgoto expr_beg; };
fhold and fgoto are Ragel statements: fhold steps the current position back by one character, and fgoto jumps to another state machine.
Our journey through the lexer is almost complete. Let’s check it out by writing a test:
# test/ruby-next/test_lexer.rb
def test_go_eql
setup_lexer "next"
assert_scanned(
'foo := 42',
:tIDENTIFIER, 'foo', [0, 3],
:tGOEQL, ':=', [4, 6],
:tINTEGER, 42, [7, 9]
)
end
After running it, everything is working correctly, so let’s push onward to the parser!
Hey, more pushing? This is a lot of pushing! Well, this is a hardcore technical article on the Evil Martians blog—what did you expect? No worries, we’re almost there—so hang on! 💪
Keep rolling: modifying the parser
The parser inside the parser gem is generated with Racc (not Rake, and not Rack): YACC in Ruby, for Ruby. Just like Ragel, Racc takes a file as an input and compiles it into a Ruby parser class.
For Racc to work, we must create a grammar file with a rules block and a parser class with the #next_token method defined. To understand how the parser gem works, we will write our own parser from scratch. Let’s take the output from our lexer, pass it to the parser, and get AST nodes as an output.
By the way, if you’re wondering why we need both a lexer and a parser, imagine this math problem: 2 + (1 + 7) * 5. The lexer doesn’t know anything with regards to parentheses and operator priorities. The lexer just returns a stream of tokens; the parser is responsible for grouping AST nodes.
Creating a simple parser
Let’s start writing our parser by defining the MathParser class:
class MathParser
# Token types from our lexer
token tPLUS tMINUS tSTAR
tDIVIDE tLPAREN tRPAREN
tNUMBER
# operator precedence
prechigh
left tSTAR tDIVIDE
left tPLUS tMINUS
preclow
rule
# exp is one of the other rules
exp: operation
| paren
| number
# return :number node
number: tNUMBER { result = [:number, val[0]] }
# return result between parentheses
paren: tLPAREN exp tRPAREN { result = val[1] }
# return :send node for all operations
operation: exp tPLUS exp { result = [:send, val[0], val[1].to_sym, val[2]] }
| exp tMINUS exp { result = [:send, val[0], val[1].to_sym, val[2]] }
| exp tSTAR exp { result = [:send, val[0], val[1].to_sym, val[2]] }
| exp tDIVIDE exp { result = [:send, val[0], val[1].to_sym, val[2]] }
end
We listed the token types from our lexer in the token block and defined operator precedence in the prechigh block. Finally, in the rule block, we defined the following rules:
- number: when the parser gets a number, we assign an AST node to the special variable result.
- paren: when the parser gets an expression inside parentheses, we return the AST node of that expression.
- operation: when the parser gets a binary operation, we return a :send AST node with the operator and the resulting AST nodes of the two expressions.
Below the MathParser class, we can define two special blocks, ---- header and ---- inner:
# class MathParser ... end
---- header
require_relative "./lexer.rb"
---- inner
def parse(arg)
@tokens = Lexer.new.run(arg)
do_parse
end
def next_token
@tokens.shift
end
Within the ---- header block, we can define imports. Here, we’ve imported our lexer.
And in the ---- inner block, we can define the methods of the parser class. The main method MathParser#parse gets tokens from the lexer and calls do_parse to start parsing. The MathParser#next_token method fetches tokens from the array one by one. Here is a full gist of our parser.
Let’s build and run it! Like so:
$ racc parser.y -o parser.rb
$ ruby -r './parser.rb' -e 'pp MathParser.new.parse("5 * (4 + 3) + 2");'
[:send,
[:send, [:number, "5"], :*, [:send, [:number, "4"], :+, [:number, "3"]]],
:+,
[:number, "2"]]
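To make the shape of this AST more tangible, here is a minimal sketch of a tree-walking evaluator for it. The evaluate helper is a hypothetical addition, not part of the parser above; it assumes exactly the two node types shown in the output, :number and :send.

```ruby
# Hypothetical evaluator for the [:number, text] and
# [:send, lhs, operator, rhs] nodes produced by MathParser.
def evaluate(node)
  type, *children = node

  case type
  when :number
    Float(children.first)
  when :send
    lhs, operator, rhs = children
    # The operator is a symbol such as :+ or :*, so dispatch it directly.
    evaluate(lhs).public_send(operator, evaluate(rhs))
  else
    raise ArgumentError, "unknown node type: #{type.inspect}"
  end
end

# The AST printed above for "5 * (4 + 3) + 2".
ast = [:send,
       [:send, [:number, "5"], :*, [:send, [:number, "4"], :+, [:number, "3"]]],
       :+,
       [:number, "2"]]

p evaluate(ast) # => 37.0
```

The nesting of the :send nodes already encodes precedence and parentheses, which is why the evaluator needs no precedence rules of its own.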
Ruby Next parser
Great, we wrote our parser, compiled it to a Ruby class, and now we can finally move on to the parser from the parser gem!
Let’s take a look at how a common assignment from the gem works:
arg: lhs tEQL arg_rhs
{
result = @builder.assign(val[0], val[1], val[2])
}
#...
When the parser gets a tEQL token, it calls the Parser::Builders::Default#assign method:
def assign(lhs, eql_t, rhs)
  (lhs << rhs).updated(
    nil, nil,
    location: lhs.loc
      .with_operator(loc(eql_t))
      .with_expression(join_exprs(lhs, rhs))
  )
end
Upon closer inspection, it’s clear that the eql_t
token is used here only to calculate the operator location in the input text. What does this mean? We can simply reuse this method with our new token, and it will do the work for us!
Let’s add our new token:
token kCLASS kMODULE kDEF kUNDEF kBEGIN kRESCUE kENSURE kEND kIF kUNLESS
# ...
tRATIONAL tIMAGINARY tLABEL_END tANDDOT tMETHREF tBDOT2 tBDOT3
tGOEQL
Next, we’ll locate the rules for assignment with the tEQL token and copy them, replacing the token with tGOEQL:
command_asgn: lhs tEQL command_rhs
                {
                  result = @builder.assign(val[0], val[1], val[2])
                }
            | lhs tGOEQL command_rhs
                {
                  # same builder method: the token is only used for locations
                  result = @builder.assign(val[0], val[1], val[2])
                }
#...
arg: lhs tEQL arg_rhs
       {
         result = @builder.assign(val[0], val[1], val[2])
       }
   | lhs tGOEQL arg_rhs
       {
         # same builder method: the token is only used for locations
         result = @builder.assign(val[0], val[1], val[2])
       }
#...
That’s that! Now let’s add a test:
# test/ruby-next/test_parser.rb
def test_go_eql
assert_parses(
s(:lvasgn, :foo, s(:int, 42)),
%q{foo := 42},
%q{ ^^ operator
|~~~~~~~~~ expression},
SINCE_NEXT
)
end
After running the test, we see everything is working properly, and the parser gem is now ready to handle :=.
What’s next?
Coming next, Ruby Next
Ruby Next has multiple modes. In transpiler mode, we feed the files into Ruby Next and get rewritten code as output (yep, just like with Racc and Ragel!). In runtime mode, Ruby Next simply patches the files on the go.
It doesn’t matter which mode we choose because, in any case, the source code will be run through the lexer, the parser, and finally through the available Ruby Next rewriters.
Let’s look at our case—the only way our new AST node differs from the common assignment AST node is the operator:
So let’s write a Ruby Next rewriter to replace it!
To catch our AST node, we need to define a method called "on_#{ast_type}". In our case, the AST node type is lvasgn, so the method name will be #on_lvasgn:
# lib/ruby-next/language/rewriters/go_assign.rb
module RubyNext
  module Language
    module Rewriters
      class GoAssign < Base
        NAME = "go-assign".freeze
        SYNTAX_PROBE = "foo := 42".freeze
        MIN_SUPPORTED_VERSION = Gem::Version.new(RubyNext::NEXT_VERSION)

        def on_lvasgn(node)
          # loc.operator is a source range (or nil, e.g. inside a
          # multiple assignment), so compare its text, not the range
          operator = node.loc.operator
          # skip if there is no operator or it is already '='
          return if operator.nil? || operator.source == "="
          # mark ast-node requiring rewriting
          context.track! self
          # change the operator to '='
          replace(operator, "=")
        end
      end
    end
  end
end
Inside the on_lvasgn method, we check the node’s operator, and if it’s not =, we mark the node as requiring a rewrite and change the operator from := to =.
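The core of such a rewrite is replacing a source range with new text. The sketch below shows that idea on a plain string. It is only an illustration: Replacement and apply_replacements are hypothetical names, and a regular expression stands in for the operator ranges that the real rewriter takes from AST node locations.

```ruby
# Hypothetical value object: which character range to replace, and with what.
Replacement = Struct.new(:range, :text)

# Apply replacements from the end of the source backwards,
# so earlier offsets stay valid while the string changes length.
def apply_replacements(source, replacements)
  result = source.dup
  replacements.sort_by { |r| -r.range.begin }.each do |r|
    result[r.range] = r.text
  end
  result
end

source = "s := \"Hello\"\nn := 42\n"

# Stand-in for AST locations: find every ':=' with a regexp.
replacements = []
source.scan(/:=/) do
  match = Regexp.last_match
  replacements << Replacement.new(match.begin(0)...match.end(0), "=")
end

rewritten = apply_replacements(source, replacements)
puts rewritten
# s = "Hello"
# n = 42
```

A regexp would also rewrite := inside string literals, which is exactly why the real rewriter works from parsed node locations instead.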
Next, we need to register our new rewriter by adding it to RubyNext::Language.rewriters. Let’s register it as a proposed feature by modifying lib/ruby-next/language/proposed.rb:
# lib/ruby-next/language/proposed.rb
# ...
require "ruby-next/language/rewriters/go_assign"
RubyNext::Language.rewriters << RubyNext::Language::Rewriters::GoAssign
And with that, here’s our Go code:
# main.go.rb
package main
import "fmt"
func main() {
s := "Hello, 世界"
fmt.Println(s)
}
Finally, let’s require uby-next.rb (so that the -r flag reads as -ruby-next) and run it:
$ ruby -ruby-next -r './go_ruby.rb' main.go.rb
Hello, 世界
Hooray! Let’s take a second to celebrate our awesome accomplishment with an emoji! Unfortunately, there isn’t one which encapsulates exactly what we’ve done here, so this cup with a straw will have to be enough this time: 🥤
A serious conclusion
Ok, Go, perhaps we got a little sidetracked with you and writing this Ruby Go-berg Machine experiment of ours. But let’s go back, the original question was: if Go is so darned good, why do we still use Ruby at all?
Well, in this article, we used Ruby’s DSL superpowers to replicate the functionality of Go. Then we ran into a problem, and to solve it, we learned how ruby-next uses the ruby-next-parser gem. We wrote our own lexer and parser. And finally, we modified ruby-next-parser and added a new rewriter to ruby-next.
So, could you do all this beautiful madness with the same ease with Go? Yeah, no way.
Ruby is so powerful that we can do literally everything with it—for example, waste everyone’s time by using it to run Go 🤪—while in Go itself, we would need to write our own implementation of Ruby VM to run Ruby, or use C-bindings and mruby, and it wouldn’t be as easy as writing Ruby DSL, or tweaking a couple of lines in the parser.
That being said, there’s really no practical reason to write Go in Ruby, but guess what? That doesn’t matter! That’s because you now have a great new trick in your toolkit: you can modify Ruby exactly how you want. So, do it! Play around, test your implementations in the real world, and try proposing your best ideas to the community by opening an issue in Ruby Tracker. With great power comes great Ruby ability! (Or something like that).