GusSchemaSubclassing

Understanding subclassing in the GUS Schema

The GUS schema uses single-level subclassing. That is, it supports a superclass-subclass relationship in which the subclasses may not themselves have subclasses. For example, DoTS.NASequence is the superclass of DoTS.GenomicSequence, DoTS.VirtualSequence and others.

The subclassing is useful in the schema for the usual reasons: it makes the concepts clearer, and it eliminates redundant columns across tables, which in turn improves maintainability.

The subclassing is also useful in the GusObjectLayer. In GusPlugins, which use the GUS objects, it provides the usual benefit in a programming context: code written to operate on the superclass works for all the subclasses.

How it is implemented

The superclass and subclasses are all views on an "implementation" table. The implementation table name is formed by adding the suffix "Imp" to the superclass name. In the case of NASequence, the implementation table is NASequenceImp. In the object layer the Imp tables are hidden, as they should be: all that is available is the superclass and the subclasses. In the relational schema this hiding is not so easy to do, so the Imp tables are exposed. They are also exposed in the schema browser (http://www.gusdb.org/cgi-bin/schemaBrowser). You can always work out the name of the superclass, though, because it is the Imp table name minus the "Imp" suffix.

The Imp table stores all the data for the superclass and the subclasses. Each row belongs to the superclass and to exactly one subclass. The name of that subclass is stored in a special Imp table column called subclass_view.
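The view arrangement described above can be illustrated with a small, self-contained sketch. It uses Python's built-in sqlite3 module rather than the database GUS actually runs on, drops the DoTS schema prefix, and invents the column names (name, chromosome_order, assembly_label) purely for illustration. Only the table and view names and the subclass_view column come from the text.

```python
import sqlite3

# In-memory database, for illustration only.
conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Implementation table: holds the rows of the superclass and all subclasses.
CREATE TABLE NASequenceImp (
    na_sequence_id INTEGER PRIMARY KEY,  -- hypothetical key column
    subclass_view  TEXT NOT NULL,        -- which subclass the row belongs to
    name           TEXT,                 -- hypothetical shared column
    int1           INTEGER,              -- generic column
    string1        TEXT                  -- generic column
);

-- Superclass view: every row, shared columns only.
CREATE VIEW NASequence AS
    SELECT na_sequence_id, subclass_view, name
    FROM NASequenceImp;

-- Subclass views: only their own rows, generic columns given real names.
CREATE VIEW GenomicSequence AS
    SELECT na_sequence_id, subclass_view, name, int1 AS chromosome_order
    FROM NASequenceImp
    WHERE subclass_view = 'GenomicSequence';

CREATE VIEW VirtualSequence AS
    SELECT na_sequence_id, subclass_view, name, string1 AS assembly_label
    FROM NASequenceImp
    WHERE subclass_view = 'VirtualSequence';
""")

conn.executemany(
    "INSERT INTO NASequenceImp (subclass_view, name, int1, string1) "
    "VALUES (?, ?, ?, ?)",
    [
        ("GenomicSequence", "chr1 fragment", 1, None),
        ("VirtualSequence", "virtual contig", None, "v1"),
    ],
)

# A query against the superclass view sees rows from every subclass.
all_rows = conn.execute(
    "SELECT name FROM NASequence ORDER BY na_sequence_id"
).fetchall()

# A query against a subclass view sees only that subclass's rows.
genomic_rows = conn.execute(
    "SELECT name, chromosome_order FROM GenomicSequence"
).fetchall()
virtual_rows = conn.execute(
    "SELECT name, assembly_label FROM VirtualSequence"
).fetchall()
```

Here all_rows contains both sequences, while genomic_rows and virtual_rows each contain one, with the generic column presented under its subclass-specific name.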

The relationships between an Imp table, its superclass and its subclasses are defined in the files which define the GUS schema, and are stored in the Core.TableInfo table.

Conceptually, the tables and views are created in the following order:

    • define the subclasses
    • derive the superclass from the subclasses
        • It contains exactly the columns that all subclasses share
    • derive the Imp table from the superclass and the subclasses.
        • It contains all the superclass columns.
        • It contains generic columns, named for the datatype stored (e.g., int1, int2, float1, date1). These columns hold the data that is not shared by all the subclasses. If the union of all the subclass columns contains three int columns, then the Imp table will have three int columns: int1, int2, int3. The same logic applies to all the datatypes. This approach allows the Imp table to represent the union of all the subclass columns while minimizing the width of the Imp table.
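
The derivation steps above can be sketched in a few lines of Python. This is not the GUS schema tooling, only an illustration that takes the description at face value: the superclass is the set of columns every subclass shares, and each remaining column in the union gets one generic slot named after its datatype. Whether real GUS reuses a generic slot across different subclasses is not stated here, so the sketch does not assume it. The subclass column names and the helper functions derive_superclass and derive_imp are hypothetical.

```python
from collections import Counter

# Hypothetical subclass definitions: column name -> datatype.
subclasses = {
    "GenomicSequence": {
        "na_sequence_id": "int",
        "name": "string",
        "chromosome_order": "int",
        "gc_fraction": "float",
    },
    "VirtualSequence": {
        "na_sequence_id": "int",
        "name": "string",
        "piece_count": "int",
        "gap_count": "int",
        "build_date": "date",
    },
}


def derive_superclass(subclasses):
    """Return the columns (name -> datatype) that every subclass shares."""
    column_sets = [set(cols.items()) for cols in subclasses.values()]
    shared = set.intersection(*column_sets)
    # Preserve the column order of the first subclass.
    first = next(iter(subclasses.values()))
    return {n: t for n, t in first.items() if (n, t) in shared}


def derive_imp(subclasses):
    """Return the Imp column list and the subclass-column -> generic-slot map."""
    superclass = derive_superclass(subclasses)
    imp_columns = list(superclass) + ["subclass_view"]
    slot_for = {}        # (column name, datatype) -> generic column name
    counts = Counter()   # number of generic slots handed out per datatype
    for cols in subclasses.values():
        for name, dtype in cols.items():
            if name in superclass or (name, dtype) in slot_for:
                continue  # shared column, or already given a slot
            counts[dtype] += 1
            slot = f"{dtype}{counts[dtype]}"
            slot_for[(name, dtype)] = slot
            imp_columns.append(slot)
    return imp_columns, slot_for


superclass_columns = derive_superclass(subclasses)
imp_columns, slot_for = derive_imp(subclasses)
```

With these made-up definitions the union of non-shared columns contains three int columns, one float and one date, so the Imp column list ends up with int1, int2, int3, float1 and date1 alongside the shared columns and subclass_view.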