We have a client that uses multi-tenant database where each database holds data for each of their customers. Whenever a new customer is added, a service dynamically creates a new database. In order to seed this new database we were tasked to implement a feature to copy data from existing "demo" database.
The "demo" database is actually a live client where sales team does demo. This ensures that the data that is copied is fresh and not stale.
We implemented a solution where we simply listed all the tables in namespace and used activerecord-import to copy the table data. We used activerecord-import gem to keep code agnostic of underlying database as we used different databases in development from production. Production is "SQL Server" and development database is "PostgreSQL". Why this project ended up having different database in development and in production is worthy of a separate blog.
When we started using the above mentioned strategy then we quickly ran into a problem. Inserts for some tables were failing.
1insert or update on table "dependent_table" violates foreign key constraint "fk_rails" 2Detail: Key (column)=(1) is not present in table "main_table".
The issue was we had foreign key constraints on some tables and "dependent" table was being processed before the "main" table.
So initially we thought of simply hard coding the sequence in which to process the tables. It means if any new table is added then we will have to update the service to include the newly added table. So we needed a way to identify the foreign key dependencies and determine the sequence to copy the tables at runtime. To resolve this issue, we thought of using Topological Sorting.
Topological Sorting
To get started we need the list of dependencies of "main" and "dependent" tables. In Postgresql, this sql query fetches the table dependencies.
1SELECT 2 tc.table_name AS dependent_table, 3 ccu.table_name AS main_table 4FROM 5 information_schema.table_constraints AS tc 6 JOIN information_schema.key_column_usage AS kcu 7 ON tc.constraint_name = kcu.constraint_name 8 AND tc.table_schema = kcu.table_schema 9 JOIN information_schema.constraint_column_usage AS ccu 10 ON ccu.constraint_name = tc.constraint_name 11 AND ccu.table_schema = tc.table_schema 12WHERE constraint_type = 'FOREIGN KEY' 13and (tc.table_name like 'namespace_%' or ccu.table_name like 'namespace_%'); 14 15=> dependent_table | main_table 16----------------------------------- 17 dependent_table1 | main_table1 18 dependent_table2 | main_table2
The above query fetches all the dependencies for only the tables have namespace or the tables we are interested in. The output of above query was [[dependent_table1, main_table1], [dependent_table2, main_table2]].
Ruby has a TSort module that for implementing topological sorts. So we needed to run the topological sort on the dependencies. So we inserted the dependencies into a hash and included the TSort functionality into the hash. Following is the way to include the TSort module into hash by subclassing the Hash.
1require "tsort" 2 3class TsortableHash < Hash 4 include TSort 5 6 alias tsort_each_node each_key 7 8 def tsort_each_child(node, &block) 9 fetch(node).each(&block) 10 end 11end 12# Borrowed from https://www.viget.com/articles/dependency-sorting-in-ruby-with-tsort/
Then we simply added all the tables to dependency hash, as below
1tables_to_sort = ["dependent_table1", "dependent_table2", "main_table1"] 2dependency_graph = tables_to_sort.inject(TsortableHash.new) {|hash, table| hash[table] = []; hash } 3 4table_dependency_map = fetch_table_dependencies_from_database 5=> [["dependent_table1", "main_table1"], ["dependent_table2", "main_table2"]] 6 7# Add missing tables to dependency graph 8table_dependency_map.flatten.each {|table| dependency_graph[table] ||= [] } 9 10table_dependency_map.each {|constraint| dependency_graph[constraint[0]] << constraint[1] } 11 12dependency_graph.tsort 13=> ["main_table1", "dependent_table1", "main_table2", "dependent_table2"]
The output above, is the dependency resolved sequence of tables.
Topological sorting is pretty useful in situations where we need to resolve dependencies and Ruby provides a really helpful tool TSort to implement it without going into implementation details. Although I did spend time in understanding the underlying algorithm for fun.