A multi-tenant application uses a single codebase and database to serve multiple accounts as if each account has its own, isolated instance of the project. Gyroscope uses strategies such as gsguard and vendor portals to conveniently and securely implement multi-tenant experiences.
There are times when the shared data needs to be isolated. For example, when an account is closed, all the associated data and file may be removed. A more complex scenario involves multiple databases sharing the same user registry. Gyroscope supports a federated signon mode so that a user can login in from any regional domain and seamlessly land on the correct server.
Moving an account from one database to another can be tricky. This is because the ID space of a database is not preserved by another. For example, a User "101" on Server A has an Invoice "51". The invoice contains items that are ID'ed "4001" and "4002". Thanks to user sharding and federated signon, we know for certain that only Server A has User 101 related records. Now we must move all of User 101's records from Server A to Server B. But there might already be an Invoice 51 on Server B, that's owned by another user. If we can somehow change 51 to, say, 72, to avoid an ID collision, then then invoice item 4001 must also point to the new Invoice ID. The invoice items have their own ID space to worry about.
At the time of writing, we are building some experimental tools for migrating multi-tenanted data. The tools will be made available in the /utils folder; they are run from the command line.
A typical migration flow is to run gsexport first on the source server. This will generate portable files in a designated folder. The folder can then be archived and transported to a target server, where gsimport is run.
All the tools rely on a definition file gsexport_config.php. This file describes how each database table connects to each other.
Here's an example configuration:
$tables=array(
'gss'=>array(
'type'=>'root',
'files'=>array(
'logo'=>'../../protected/clogos/#.gif'
)
),
'users'=>array(
'type'=>'base',
'mnt'=>'users',
'mntid'=>'userid'
),
'templatetypes'=>array(
'type'=>'base',
'mnt'=>'templatetypes',
'mntid'=>'templatetypeid',
'defrefs'=>array(
'activetemplateid'=>array(
'reftable'=>'templates',
'refkey'=>'templateid'
)
)
),
'templates'=>array(
'type'=>'1-n',
'mnt'=>'templates',
'mntid'=>'templateid',
'chaintables'=>array('templatetypes','templates'),
'chainlinks'=>array('templatetypeid-templatetypeid'),
'refs'=>array(
'templatetypeid'=>array(
'reftable'=>'templatetypes',
'refkey'=>'templatetypeid'
)
)
),
'actionlog'=>array(
'type'=>'base',
'mnt'=>'actionlog',
'mntid'=>'alogid',
'refs'=>array(
'userid'=>array('reftable'=>'users','refkey'=>'userid'),
'recid'=>array('func'=>'actionlog')
)
)
);
Base vs. 1-N
Not counting the special "root" table for gss, there are mainly two types of tables. A base table has a "gsid" column. And a 1-n table relies on a parent table to resolve the gs ownership.
The gsexport tool scans all the tables and builds a master index file that contains all the IDs. The "mnt" field, which stands for "mount", is the name space for the object. In an earlier version, we've attempted a segmented mount structure, so that the "many" part in a one-to-many table can be nested inside another object. This turned out to be unnecessary; a flat structure works better. We can use the database table name for "mnt", and the primary key of the table for "mntid".
The entry for users table bears the simplest form:
'users'=>array(
'type'=>'base',
'mnt'=>'users',
'mntid'=>'userid'
)
1-N Chain Tables and Chain Links
In Gyroscope, the "templates" table does not have a gsid. Records in templates are linked to the templatetypes table:
'templatetypes'=>array(
'type'=>'base',
'mnt'=>'templatetypes',
'mntid'=>'templatetypeid'
),
'templates'=>array(
'type'=>'1-n',
'mnt'=>'templates',
'mntid'=>'templateid',
'chaintables'=>array('templatetypes','templates'),
'chainlinks'=>array('templatetypeid-templatetypeid')
)
The chaintables and chainlinks use a syntax that is similar to gsguard. The chain descriptors help gsexport locate gs-specific data entries.
References
We still have to tell gsexport that certain IDs in the templates table will change value upon import. For example, when a templatetype record is imported, a new ID will be issued. The "mntid", or the primary key field, is dropped from the import, leaving it to the database to assign an auto_increment value. How would a record in the templates table know about this value?
'templates'=>array(
'type'=>'1-n',
'mnt'=>'templates',
'mntid'=>'templateid',
'chaintables'=>array('templatetypes','templates'),
'chainlinks'=>array('templatetypeid-templatetypeid'),
'refs'=>array(
'templatetypeid'=>array(
'reftable'=>'templatetypes',
'refkey'=>'templatetypeid'
)
)
)
The "refs" directive contains a list of foreign references. For example, the "templatetypeid" field is linked to the "templatetypes" table, on a field that is also called "templatetypeid".
Here's another example. A table that records the pairing of users:
'pairs'=>array(
'type'=>'1-n',
'mnt'=>'pairs',
'mntid'=>'pairid',
'chaintables'=>array('users','pairs'),
'chainlinks'=>array('userid-mainid'),
'refs'=>array(
'mainid'=>array(
'reftable'=>'users',
'refkey'=>'userid'
),
'subid'=>array(
'reftable'=>'users',
'refkey'=>'userid',
)
)
During exporting, the referred tables will be checked. For example, if "pairs" is processed before "users", an error will be thrown.
Ghost References
References are resolved during import. However, if the target record is missing, the reference also cannot be resolved. Consider this case: User 52 is paired with User 76. But User 76 is deleted from the database. During import, User 52 is given a new ID 216. We simply cannot declare that 216 is linked to 76 because of 76 could be a different record in the new database. We also cannot mark the pair as 216-0 because the numeric value of the original 76 could be used for grouping, among other use cases.
gsimport.php has a convenient function get_newid($reftable, $srcid). The function attempts to obtain a new ID using the conventional method. If a missing reference, or a "ghost reference", is detected, a new record will be created to claim the name space. Subsequent lookups will reuse the same temporary identity. All the mock records are deleted from the database during the final stage of importing.
Deferred References
The templates table references the templatetypes table. The structure of templatetypes must be declared first. However, templatetypes table has a reference "activetemplateid" that points back to the templates table. This would require the templates table to be declared first. How do we handle circular references?
We know that the record templatetypes must be created before a record in templates can be created. The newly issued templateid is then retroactively assigned to templatetypes.activetemplateid. This means that we should declare templatetypes first, and resolve the back-reference at a later stage by using the "defrefs" directive:
'templatetypes'=>array(
'type'=>'base',
'mnt'=>'templatetypes',
'mntid'=>'templatetypeid',
'defrefs'=>array(
'activetemplateid'=>array(
'reftable'=>'templates',
'refkey'=>'templateid'
)
)
)
Deferred references are resolved in two stages during import. First, a value of 0 will be used on the deferred field. The record on the main table is created; its new ID is paired with its old ID. When a sub record is parsed, the old ID, identified by "refs" is replaced with the new ID in memory. Once a new sub record is created, the new sub ID is used to update the main record through a database query.
Custom Resolvers
The table space of an external reference could be dynamic. Look at the "recid" field in the actionlog table. If the "rectype" column has a value "users", then "recid" is interpreted as a User ID. In this case, we can't simply use a static "reftable".
'actionlog'=>array(
'type'=>'base',
'mnt'=>'actionlog',
'mntid'=>'alogid',
'refs'=>array(
'userid'=>array('reftable'=>'users','refkey'=>'userid'),
'recid'=>array('func'=>'actionlog')
)
)
Note that the "refkey" is not exactly necessary because of the "pid" mapping in gsexport. But it might be helpful to leave it in while we finalize the implementation.
The actionlog's resolver is defined in gsexport_config.php:
function resolve_actionlog($rec){
$reftable=null;
$rectype=$rec['rectype'];
switch ($rectype){
case '': case 'anything_else': break; //ignore
case 'reauth': case 'users': case 'user": $reftable='users'; break;
case 'templatetype': $reftable='templatetypes'; break;
default: echo "unknown rectype [$rectype]\r\n"; print_r($rec); die();
}
return $reftable;
}
Complex Custom Resolvers
In addition to the standard custom resolver function that simply uses custom logic to return a table space, a complex resolver can transform any object containing number number of references. For example, an invoice record that contains an JSON array:
raw_items: [{product_id:123, qty: 12}, product_id:24, qty:1}]
In gsexport_config, we need to declare the field:
'orders'=>array(
'type'=>'base',
'mnt'=>'orders',
'mntid'=>'orderid',
'xrefs'=>array('raw_items')
)
Then we implement the extended resolver:
function xresolve_raw_items($pid, $rec){
$raw=$rec['xrefs']['raw_items']['data'];
$objs=json_decode($raw,1);
foreach ($objs as $objid=>$obj){
$objs[$objid]['product_id']=get_newid('products',$obj['product_id']);
}
return json_encode($objs);
}
Files
Either the base or 1-n type support the "files" directive. A common practice for storing user files is to link the record ID with the file name. The logical filename is stored in the table. In a multi-tenant setup, each GS instance has its own customizable logo. For example, GS_1234 has a 1234.gif in the protected/clogos folder.
'gss'=>array(
'type'=>'root',
'files'=>array(
'logo'=>'../../protected/clogos/#.gif'
)
)
The # sign in the file name will be replaced with the primary key, which is identified by "mntid".
It is also possible to use any field values to construct the file name. For example:
'invoice'=>'invoice-#-%%title%%.pdf'
The "key" of a record avoids filename collision. For example, the same User ID may have many derived files such as profile_123.png, profile_123_small.png.
Exported Files
By default, a sub directory is created under the "export" folder. For example, GS_123 files will be in the export/123 directory.
Every table that's defined in gsexport_config will have its own file. The master index is "toc.txt". Both database and reference index files are in JSON format. The .txt extension makes it easier to view the files on Windows.
User-specific files are in the "files" sub folder. For example: export/123/files/
Vendor Portal Namespace and Renaming
In a vendor-pivoted Gyroscope instance, a "gsid" is called a vendor specific name. For example, in the "main" Gyroscope app for a car dealership where there is a table of automakers, the automakerid could be used as the conceptual "gsid", except that it is called a "automakerid".
We can set the following values to reflect this global change:
$defgskey='automakers';
$defgstable='automakerid';
Sometimes the "gsid" of a particular table is not called a "gsid". We can also make a localized change in table settings:
$tests{
'type'=>'1-n',
'mnt'=>'tests',
'mntid'=>'testid',
'gskey'=>'testgsid',
...
}
Next Steps
The migration tools will be tested on a few internal projects before becoming generally available. On the immediate road map, we have:
removal tool - powered by the same gsexport_config, this tool removes all the records and files for a specific GS instance. This tool can be used in conjunction with gsimport to repeatedly import from the same data source.
config generator - gsexport performs a preliminary check and ensures that every table in the database is accounted for. A type "ignore" may be used to skip an irrelevant table. The config generator may help accelerate this process by leveraging the help of gsextract - another experimental tool that was intended for data migration.