October 11, 2012
Solr is an open source search platform from Apache. It has a very powerful full-text search capability among other things.
Solr is written in Java and runs as a standalone search server inside a servlet container such as Tomcat. When you are working on a Ruby on Rails application you do not want to maintain a Tomcat server. This is where websolr comes into the picture. Websolr manages the index, and the Rails application interacts with the index using a gem called sunspot_rails.
# Gemfile
gem 'sunspot_rails', '= 1.3.3' # search feature
Here I am interested in searching products.
class Product < ActiveRecord::Base
  searchable do
    text :name, boost: 1.5
    text :description
  end
end
rails g sunspot_rails:install
The above command creates the config/sunspot.yml file. By default this file looks like the following.
production:
  solr:
    hostname: localhost
    port: 8983
    log_level: WARNING
development:
  solr:
    hostname: localhost
    port: 8982
    log_level: INFO
test:
  solr:
    hostname: localhost
    port: 8981
    log_level: WARNING
By default, after every single web request sunspot updates Solr with the changes that took place during that request. This is not desirable. To turn that off, set the auto_commit_after_request option to false in the config/sunspot.yml file.
I would also change the log_level for development to DEBUG. The revised config/sunspot.yml file would look like this.
production:
  solr:
    hostname: localhost
    port: 8983
    log_level: WARNING
  auto_commit_after_request: false
development:
  solr:
    hostname: localhost
    port: 8980
    log_level: DEBUG
  auto_commit_after_request: false
test:
  solr:
    hostname: localhost
    port: 8981
    log_level: DEBUG
  auto_commit_after_request: false
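To check which settings a given environment ends up with, the file can be parsed with Ruby's standard YAML library. The following is only a minimal sketch and not how sunspot_rails itself loads the file: the inline string stands in for part of config/sunspot.yml, and `solr_settings` is a hypothetical helper introduced for illustration.

```ruby
require 'yaml'

# Inline copy of two environments from config/sunspot.yml (stand-in for the file).
SUNSPOT_YML = <<~CONFIG
  production:
    solr:
      hostname: localhost
      port: 8983
      log_level: WARNING
    auto_commit_after_request: false
  test:
    solr:
      hostname: localhost
      port: 8981
      log_level: DEBUG
    auto_commit_after_request: false
CONFIG

# Hypothetical helper: collect the settings for one environment.
def solr_settings(yaml_text, env)
  config = YAML.safe_load(yaml_text).fetch(env)
  {
    hostname: config['solr']['hostname'],
    port: config['solr']['port'],
    log_level: config['solr']['log_level'],
    # Treat a missing key as "auto commit is on" (an assumption of this sketch).
    auto_commit: config.fetch('auto_commit_after_request', true)
  }
end

settings = solr_settings(SUNSPOT_YML, 'test')
puts settings.inspect
```

Note that auto_commit_after_request sits at the environment level, next to the solr key, rather than inside it.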
In the above case, any time I create, update or destroy a product, Solr commands are issued as part of the after_save callback. Since after_save callbacks are part of the ActiveRecord transaction, this slows down the create, update and destroy operations. I would like all of this work to happen in the background.
Here is how I handled it.
class Product < ActiveRecord::Base
  searchable do
    text :name, boost: 1.5
    text :description
  end

  handle_asynchronously :solr_index, queue: 'indexing', priority: 50
  handle_asynchronously :solr_index!, queue: 'indexing', priority: 50
  handle_asynchronously :remove_from_index, queue: 'indexing', priority: 50
end
In the above case I used Delayed Job but you can use any background job processing tool.
In Delayed Job, the higher the priority value, the lower the priority. By bumping the priority value to 50, I'm making sure that emails and other background jobs are processed before the Solr work is taken up.
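The effect of the priority value can be illustrated with a plain Ruby sketch. This is not Delayed Job's implementation: `Job` and `next_job` are hypothetical names, and the sketch only mimics the rule that the job with the lowest priority number is picked first.

```ruby
# Hypothetical stand-in for a queued job; not a Delayed Job class.
Job = Struct.new(:name, :priority)

# Pick the job a worker would run next: the lowest priority number wins.
def next_job(jobs)
  jobs.min_by(&:priority)
end

jobs = [
  Job.new('reindex product', 50),
  Job.new('send welcome email', 0),
  Job.new('generate report', 10)
]

# Full processing order, from first to last.
order = jobs.sort_by(&:priority).map(&:name)
puts order.inspect
# => ["send welcome email", "generate report", "reindex product"]
```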
remove_from_index
In the above case the call to remove_from_index has been deferred to Delayed Job. However, the record has already been destroyed. So when Delayed Job takes up the work it first tries to retrieve the record. The record is missing, and the background job fails.
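The failure can be reproduced in miniature with plain Ruby. In this sketch the `STORE` hash stands in for the products table, and `DeferredCall` is a hypothetical job that, like the deferred call described above, keeps only the record's id and looks the record up again when it runs. None of these names come from Delayed Job or sunspot.

```ruby
# Hypothetical in-memory stand-in for the products table.
STORE = { 1 => { 'id' => 1, 'name' => 'Lamp' } }

class RecordNotFound < StandardError; end

# Hypothetical job that remembers only the id and reloads the record at run time.
DeferredCall = Struct.new(:record_id) do
  def perform
    record = STORE[record_id]
    raise RecordNotFound, "no product with id #{record_id}" if record.nil?
    "removed #{record['name']} from index"
  end
end

job = DeferredCall.new(1) # enqueued while the product is being destroyed
STORE.delete(1)           # the row is gone before the worker picks up the job

result =
  begin
    job.perform
  rescue RecordNotFound => e
    "job failed: #{e.message}"
  end
puts result
```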
Here is how we solved this problem.
class Product < ActiveRecord::Base
  searchable do
    text :name, boost: 1.5
    text :description
  end

  handle_asynchronously :solr_index, queue: 'indexing', priority: 50
  handle_asynchronously :solr_index!, queue: 'indexing', priority: 50

  def remove_from_index_with_delayed
    Delayed::Job.enqueue RemoveIndexJob.new(record_class: self.class.to_s, attributes: self.attributes), queue: 'indexing', priority: 50
  end
  alias_method_chain :remove_from_index, :delayed
end
Add another worker named remove_index.rb.
class RemoveIndexJob < Struct.new(:options)
  def perform
    return if options.nil?
    options.symbolize_keys!
    record = options[:record_class].constantize.new options[:attributes].except("id")
    record.id = options[:attributes]["id"]
    record.remove_from_index_without_delayed
  end
end
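The idea behind RemoveIndexJob is to carry a snapshot of the attributes instead of a reference to a row that no longer exists. Here is a plain Ruby sketch of that pattern, runnable without Rails. `FakeProduct`, `SnapshotRemoveJob` and the `INDEX` array are hypothetical stand-ins for the ActiveRecord model, the job and the Solr index.

```ruby
# Hypothetical stand-in for the Solr index: a list of indexed ids.
INDEX = [1, 2]

# Hypothetical model with only what the sketch needs.
class FakeProduct
  attr_accessor :id, :name

  def initialize(attrs = {})
    attrs.each { |key, value| public_send("#{key}=", value) }
  end

  def remove_from_index
    INDEX.delete(id)
  end
end

# Job that rebuilds an unsaved instance from the snapshot; no database lookup.
SnapshotRemoveJob = Struct.new(:options) do
  def perform
    return if options.nil?
    attrs = options[:attributes]
    klass = Object.const_get(options[:record_class])
    record = klass.new(attrs.reject { |key, _| key == 'id' })
    record.id = attrs['id']
    record.remove_from_index
  end
end

# Explicit braces keep the hash a single positional argument on newer Rubies.
job = SnapshotRemoveJob.new({ record_class: 'FakeProduct',
                              attributes: { 'id' => 1, 'name' => 'Lamp' } })
job.perform
puts INDEX.inspect
```

Because the job carries everything it needs, it no longer matters that the original row was deleted before the worker ran.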
From the websolr documentation it was not clear that the sunspot gem first looks for an environment variable called WEBSOLR_URL. If that environment variable has a value, sunspot assumes that the Solr index is at that URL. If no value is found, it assumes that it is dealing with a local Solr instance.
So if you are using websolr, make sure that your application has the WEBSOLR_URL environment variable properly configured in the staging and production environments.
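The lookup order described above can be sketched as follows. This is an illustration of the fallback behaviour, not sunspot's actual source: `solr_url` is a hypothetical helper, the `/solr` path and the example host are assumptions, and the environment is passed in as a hash so the sketch runs without touching the real `ENV`.

```ruby
# Hypothetical helper mirroring the described precedence:
# WEBSOLR_URL wins; otherwise fall back to a local Solr instance.
def solr_url(env, hostname = 'localhost', port = 8983)
  url = env['WEBSOLR_URL']
  return url unless url.nil? || url.empty?

  # Assumed URL shape for a local instance.
  "http://#{hostname}:#{port}/solr"
end

local  = solr_url({})
hosted = solr_url({ 'WEBSOLR_URL' => 'http://index.example.com/solr/abc123' })

puts local
puts hosted
```

In a real application you would pass `ENV` instead of a literal hash, which makes it easy to spot a staging or production environment where the variable was never set.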