GUS Plugins
GUS plugins are Perl programs that use the GUS Plugin API to load data into GUS. You may use plugins that are bundled with the GUS distribution or you may write your own.
Supported versus Community plugins
The GUS distribution comes with two types of plugins:
- Supported plugins are confirmed to work and to meet the Plugin Standard described below.
- Community plugins have not been confirmed to either work or meet the standard.
The Plugins API
GUS plugins are subclasses of GUS::PluginMgr::Plugin.pm. The public subroutines in Plugin.pm (private ones begin with an underscore) constitute the Plugin API. GUS also provides Perl objects for each table and view in the GUS schema. These are also part of the API.
Part of the Plugin API is an easy-to-use command line argument declaration mechanism, which lets the plugin author specify, document and access command line arguments in one place.
The Plugins Standard
Plugin naming
Plugin names begin with one of four verbs:
- insert if the plugin inserts only
- delete if the plugin deletes only
- update if the plugin updates only
- load if the plugin does any two or more of insert, delete or update
Plugin names are concise
- for example, a plugin named InsertNewSequences is not concise because Insert and New are redundant
Plugin names are precise
- for example, a plugin named InsertData is too general. The name should reflect the type of data inserted
Plugin names are accurate
- for example, a plugin named InsertExternalSequences is inaccurate if it can also insert internally generated sequences. A better name would be InsertSequences.
- If a Plugin expects exactly one file type, that file type should be in the name. For example, InsertFastaSequences
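The verb rule above can be sketched as a small helper. This is only an illustration: choosePluginVerb is a hypothetical subroutine, not part of the Plugin API, and it assumes the verb is capitalized in the plugin name, as in the examples above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Plugin API): choose the leading verb
# of a plugin name from the kinds of operation the plugin performs.
sub choosePluginVerb {
  my (@operations) = @_;

  my %verbForOperation = (insert => 'Insert', delete => 'Delete', update => 'Update');
  my %distinctOperations;
  foreach my $operation (@operations) {
    die "unknown operation '$operation'\n" unless $verbForOperation{$operation};
    $distinctOperations{$operation} = 1;
  }

  my @operationNames = keys(%distinctOperations);
  die "a plugin must perform at least one operation\n" unless @operationNames;

  # Two or more kinds of operation always map to Load.
  return 'Load' if scalar(@operationNames) > 1;
  return $verbForOperation{$operationNames[0]};
}

print choosePluginVerb('insert') . "FastaSequences\n";
print choosePluginVerb('insert', 'update') . "Sequences\n";
```

The two calls print InsertFastaSequences and LoadSequences.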
GUS primary keys
- Plugins never directly use (hard-code) GUS primary keys, either in the body of the code or for command line argument values. Instead they use semantically meaningful alternate keys.
Application specific tables
- some sites augment GUS with their own application specific tables. These are not permitted in supported plugins.
Command Line Args
- the name of the argument should be concise and precise
- the Plugin API provides a means for you to declare arguments of different types, such as integers, strings and files. Use the most appropriate type. For example, don't use a string for a file argument.
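To illustrate why typed arguments matter, here is a minimal sketch that uses Perl's core Getopt::Long module rather than the Plugin API's own declaration mechanism, which is not shown here. The argument names (sequenceFile, maxSequences, externalDatabaseName) are made up for the example. Getopt::Long has no file type, so the sketch checks the file by hand, which is the kind of work a dedicated file argument type is meant to save.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);
use File::Temp qw(tempfile);

# Illustration only: typed command line arguments with core Getopt::Long.
# All argument names here are hypothetical.
sub parseArguments {
  my ($commandLine) = @_;

  my %arguments;
  GetOptionsFromArray(
    $commandLine,
    \%arguments,
    'sequenceFile=s',          # a path; existence is checked below
    'maxSequences=i',          # integer; non-numeric values are rejected
    'externalDatabaseName=s',  # plain string
  ) or die "could not parse command line arguments\n";

  die "--sequenceFile is required\n" unless defined($arguments{sequenceFile});
  die "file '$arguments{sequenceFile}' does not exist\n"
    unless -e $arguments{sequenceFile};

  return \%arguments;
}

# Use a temporary file so the example runs anywhere.
my ($fileHandle, $fileName) = tempfile(UNLINK => 1);
my $arguments = parseArguments(['--sequenceFile', $fileName, '--maxSequences', '100']);
print "will read at most $arguments->{maxSequences} sequences\n";
```

Declaring maxSequences as an integer means a value such as "many" is rejected when the arguments are parsed, before the plugin does any work.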
Documentation
The Plugin API provides a means for you to document the plugin and its arguments. Be thorough in your documentation.
Programming Style
syntax
- use C-like and Java-like syntax. Avoid idioms that are specific to Perl and obscure to readers who know other languages.
- use $self to refer to the object itself
- declare method arguments using this syntax: my ($self, $sequence, $length) = @_;. Do not use shift.
- indenting must be spaces not tabs. Two or four spaces are acceptable
- use "camel caps" (eg, sequenceLength) for variable names and method names, not underscores (sequence_length).
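The syntax rules above might look like this in practice. SequenceSummary is a hypothetical class written only for this illustration. It does not use any GUS classes.

```perl
package SequenceSummary;   # hypothetical class, for illustration only

use strict;
use warnings;

sub new {
  my ($class, $minimumLength) = @_;

  my $self = { minimumLength => $minimumLength };
  return bless($self, $class);
}

# Arguments are unpacked in a single list assignment; shift is not used.
sub isLongEnough {
  my ($self, $sequence, $length) = @_;

  return $length >= $self->{minimumLength};
}

# Camel caps for names, two-space indents, plain loops and conditionals.
sub countLongSequences {
  my ($self, $sequences) = @_;

  my $longSequenceCount = 0;
  foreach my $sequence (@$sequences) {
    my $sequenceLength = length($sequence);
    if ($self->isLongEnough($sequence, $sequenceLength)) {
      $longSequenceCount++;
    }
  }
  return $longSequenceCount;
}

package main;

my $sequenceSummary = SequenceSummary->new(4);
print $sequenceSummary->countLongSequences(['ACGT', 'AC', 'ACGTACGT']), "\n";
```

With a minimum length of 4, the example counts two of the three sequences.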
logic
- no method should ever be longer than one screen. If it is, refactor part of it into its own method.
- never repeat code. Repeated code must be in a method.
- the run() method, which is the main method of the plugin, should be very concise and provide at a glance the structure of the plugin. A good practice, when reasonable, is for the run method to call high level methods that return the objects to be submitted to the database, and then to submit them. This way, a reader of the run() method will know just what is being written to the database, which is the main purpose of a plugin.
- some methods in the API are marked as deprecated. Do not use them. They are for backward compatibility only.
- variables should be named after the type of data they hold. For example, a good name would be $sequence.
- do not abbreviate variable names or method names. They should be self-explanatory. A bad variable name would be $s. A bad method name would be "process" (what is being processed?). Don't "save keystrokes" with short names. If being self-explanatory requires being a long name, then use a long name.
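As a sketch of the run() guideline above, the hypothetical plugin below builds the objects to be written in one high-level method and submits them in another. It is a stand-alone illustration that does not subclass GUS::PluginMgr::Plugin or touch a database. The class name, the method names other than run(), and the in-memory "submit" are all assumptions made for the example.

```perl
package InsertFastaSequences;   # hypothetical plugin; no GUS classes are used

use strict;
use warnings;

sub new {
  my ($class, $fastaText) = @_;

  return bless({ fastaText => $fastaText, submittedSequences => [] }, $class);
}

# run() shows the structure at a glance: make the objects, then submit them.
sub run {
  my ($self) = @_;

  my $sequences = $self->makeSequences();
  $self->submitSequences($sequences);
  return "Inserted " . scalar(@$sequences) . " sequences";
}

# Parses the FASTA text into one hash per sequence record.
sub makeSequences {
  my ($self) = @_;

  my @sequences;
  foreach my $record (split(/^>/m, $self->{fastaText})) {
    next unless $record =~ /\S/;
    my ($sourceId, @residueLines) = split(/\n/, $record);
    push(@sequences, { sourceId => $sourceId, sequence => join('', @residueLines) });
  }
  return \@sequences;
}

# Stand-in for writing to the database: records what would be submitted.
sub submitSequences {
  my ($self, $sequences) = @_;

  push(@{$self->{submittedSequences}}, @$sequences);
}

package main;

my $plugin = InsertFastaSequences->new(">seq1\nACGT\nAC\n>seq2\nGGCC\n");
print $plugin->run(), "\n";
```

A reader of run() sees immediately that sequences are the only thing written, without reading the parsing code.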
Application specific controlled vocabularies
A controlled vocabulary (CV) is a restricted set of terms that are allowed values for a data type. They may be simple lists or they may be complex trees, graphs or ontologies. In GUS the CVs fall into two categories: standard CVs such as the Gene Ontology, and small application specific CVs such as ReviewStatus.
The complete list of application specific CVs in the GUS 3.5 schema is:
- DoTS.BlatAlignmentQuality
- DoTS.GOAssociationInstanceLOE
- DoTS.GeneCategory
- DoTS.GeneInstanceCategory
- DoTS.InteractionType
- DoTS.MotifRejectionReason
- DoTS.ProteinCategory
- DoTS.ProteinInstanceCategory
- DoTS.ProteinProteinCategory
- DoTS.ProteinPropertyType
- DoTS.RNACategory
- DoTS.RNAInstanceCategory
- DoTS.RNARNACategory
- DoTS.RepeatType
- SRes.BibRefType
- SRes.ReviewStatus
Acquiring a standard CV typically involves downloading files from the CV provider and running a plugin to load it.
Application specific CVs are handled by the plugin that will use the CV. For example, a plugin that inserts bibliographic references will use the SRes.BibRefType CV. It is these plugins that are responsible for making sure that the CV they want to use is in the database.
Plugins that use CVs fall into two categories:
- those that hard code the CV.
- those that do not hard code the CV but instead get it from the input.
In the first case, the plugin hard codes the CV in the Perl code.
In the second case, the plugin hard codes only a default. It also offers an optional command line argument that takes a file containing the CV. If the user of the plugin determines that the input uses a different CV than the default, the user provides such a file.
In both cases, the plugin reads the table in GUS that contains the CV and compares it to the CV it expects to use. If the expected vocab is not found, the plugin updates the table.
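The compare-and-update step can be sketched as follows. This is a minimal illustration that keeps the "table" as an in-memory list. The subroutine names and the vocabulary terms are made up, and a real plugin would read and write the CV table through the GUS objects instead.

```perl
use strict;
use warnings;

# Returns the expected terms that are absent from the stored terms.
sub findMissingTerms {
  my ($expectedTerms, $storedTerms) = @_;

  my %isStored;
  foreach my $storedTerm (@$storedTerms) {
    $isStored{$storedTerm} = 1;
  }

  my @missingTerms;
  foreach my $expectedTerm (@$expectedTerms) {
    push(@missingTerms, $expectedTerm) unless $isStored{$expectedTerm};
  }
  return \@missingTerms;
}

# Stand-in for updating the CV table: appends the missing terms to the
# in-memory list and returns the terms that were added.
sub updateControlledVocabulary {
  my ($expectedTerms, $storedTerms) = @_;

  my $missingTerms = findMissingTerms($expectedTerms, $storedTerms);
  push(@$storedTerms, @$missingTerms);
  return $missingTerms;
}

# Hypothetical default vocabulary, hard coded in the plugin.
my @defaultReviewStatuses = ('unreviewed', 'reviewed', 'rejected');
# Hypothetical current contents of the CV table.
my @storedReviewStatuses = ('reviewed');

my $addedTerms = updateControlledVocabulary(\@defaultReviewStatuses, \@storedReviewStatuses);
print "added: @$addedTerms\n";
```

Running the update a second time adds nothing, so a plugin can perform this check on every run.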
Assigning an External Database Release Id
GUS is a data warehouse, so it is very common for plugins to load data from another source into GUS. Whether the source is external or in-house, tracking its origin is often required. The tables in GUS that handle this are SRes.ExternalDatabase and SRes.ExternalDatabaseRelease. The former describes the database, e.g., PFam, and the latter describes the particular release of the database that is being loaded, e.g., 1.0.0. The data loaded will have a foreign key to the database release, which in turn has a foreign key to the database.
In order to create that relationship, the plugin must know the primary key of the external database release. To accomplish this, the plugin takes as command line arguments the name of the database and its release. It does not take the primary key of the external database release (that violates the plugin standard). The plugin passes that information to the API subroutine getExtDbRlsId($dbName, $dbVersion).
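The lookup can be pictured with the stand-alone sketch below. ExternalDatabaseLookup is a hypothetical stand-in for the plugin base class. Its getExtDbRlsId has the same name and arguments as the API subroutine, but it searches an in-memory list of made-up rows rather than the GUS tables, and dying when no release matches is an assumption of this sketch, not documented API behavior.

```perl
package ExternalDatabaseLookup;   # hypothetical stand-in for the plugin base class

use strict;
use warnings;

sub new {
  my ($class, $releases) = @_;

  return bless({ releases => $releases }, $class);
}

# Stand-in lookup: resolves a database name and version to the primary key
# of the matching release. Dying on no match is an assumption of the sketch.
sub getExtDbRlsId {
  my ($self, $dbName, $dbVersion) = @_;

  foreach my $release (@{$self->{releases}}) {
    if ($release->{name} eq $dbName && $release->{version} eq $dbVersion) {
      return $release->{externalDatabaseReleaseId};
    }
  }
  die "no external database release for '$dbName' version '$dbVersion'\n";
}

package main;

# Made-up rows; the primary key value is arbitrary.
my $lookup = ExternalDatabaseLookup->new([
  { name => 'PFam', version => '1.0.0', externalDatabaseReleaseId => 17 },
]);

# The name and version come from the command line, never the key itself.
my $externalDatabaseReleaseId = $lookup->getExtDbRlsId('PFam', '1.0.0');
print "release id: $externalDatabaseReleaseId\n";
```

Because the plugin is given only the name and version, the same command line works against any GUS instance, whatever primary key that instance happens to have assigned.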
If the plugin is inserting the dataset, as opposed to updating it, first create new entries for the database and the release by running these plugins:
- GUS::Common::Plugin::InsertExternalDatabase
- GUS::Common::Plugin::InsertExternalDatabaseRls




