February 5, 2019
We have a client that uses multi-tenant database where each database holds data for each of their customers. Whenever a new customer is added, a service dynamically creates a new database. In order to seed this new database we were tasked to implement a feature to copy data from existing "demo" database.
The "demo" database is actually a live client where sales team does demo. This ensures that the data that is copied is fresh and not stale.
We implemented a solution where we simply listed all the tables in namespace and used activerecord-import
to copy the table data.
We used activerecord-import
gem to keep code agnostic of underlying database as we used different databases in development from production.
Production is "SQL Server" and development database is "PostgreSQL".
Why this project ended up having different database in development and in production
is worthy of a separate blog.
When we started using the above mentioned strategy then we quickly ran into a problem. Inserts for some tables were failing.
insert or update on table "dependent_table" violates foreign key constraint "fk_rails"
Detail: Key (column)=(1) is not present in table "main_table".
The issue was we had foreign key constraints on some tables and "dependent" table was being processed before the "main" table.
So initially we thought of simply hard coding the sequence in which to process the tables. It means if any new table is added then we will have to update the service to include the newly added table. So we needed a way to identify the foreign key dependencies and determine the sequence to copy the tables at runtime. To resolve this issue, we thought of using Topological Sorting.
To get started we need the list of dependencies of "main" and "dependent" tables. In Postgresql, this sql query fetches the table dependencies.
SELECT
tc.table_name AS dependent_table,
ccu.table_name AS main_table
FROM
information_schema.table_constraints AS tc
JOIN information_schema.key_column_usage AS kcu
ON tc.constraint_name = kcu.constraint_name
AND tc.table_schema = kcu.table_schema
JOIN information_schema.constraint_column_usage AS ccu
ON ccu.constraint_name = tc.constraint_name
AND ccu.table_schema = tc.table_schema
WHERE constraint_type = 'FOREIGN KEY'
and (tc.table_name like 'namespace_%' or ccu.table_name like 'namespace_%');
=> dependent_table | main_table
-----------------------------------
dependent_table1 | main_table1
dependent_table2 | main_table2
The above query fetches all the dependencies for only the tables have namespace or the tables we are interested in.
The output of above query was [[dependent_table1, main_table1], [dependent_table2, main_table2]]
.
Ruby has a TSort
module that for implementing topological sorts.
So we needed to run the topological sort on the dependencies. So we inserted the dependencies into a hash and included the TSort
functionality into the hash. Following is the way to include the TSort
module into hash by subclassing the Hash
.
require "tsort"
class TsortableHash < Hash
include TSort
alias tsort_each_node each_key
def tsort_each_child(node, &block)
fetch(node).each(&block)
end
end
# Borrowed from https://www.viget.com/articles/dependency-sorting-in-ruby-with-tsort/
Then we simply added all the tables to dependency hash, as below
tables_to_sort = ["dependent_table1", "dependent_table2", "main_table1"]
dependency_graph = tables_to_sort.inject(TsortableHash.new) {|hash, table| hash[table] = []; hash }
table_dependency_map = fetch_table_dependencies_from_database
=> [["dependent_table1", "main_table1"], ["dependent_table2", "main_table2"]]
# Add missing tables to dependency graph
table_dependency_map.flatten.each {|table| dependency_graph[table] ||= [] }
table_dependency_map.each {|constraint| dependency_graph[constraint[0]] << constraint[1] }
dependency_graph.tsort
=> ["main_table1", "dependent_table1", "main_table2", "dependent_table2"]
The output above, is the dependency resolved sequence of tables.
Topological sorting is pretty useful in situations where we need to resolve dependencies and Ruby provides a really helpful tool TSort
to implement it without going into implementation details. Although I did spend time in understanding the underlying algorithm for fun.
If this blog was helpful, check out our full blog archive.