GUS Plugins

GUS plugins are Perl programs that use the GUS Plugin API to load data into GUS. You may use plugins that are bundled with the GUS distribution or you may write your own.

Supported versus Community plugins

The GUS distribution comes with two types of plugins: supported and community.

The Plugins API

GUS plugins are subclasses of GUS::PluginMgr::Plugin. The public subroutines in Plugin.pm constitute the Plugin API (private subroutines begin with an underscore). GUS also provides a Perl object for each table and view in the GUS schema; these objects are also part of the API.

Part of the Plugin API is an easy-to-use mechanism for declaring command line arguments. With it, the plugin author can specify, document and access command line arguments.
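To make the idea of declaring arguments concrete, here is a minimal sketch that does not use the Plugin API at all. It substitutes Perl's core Getopt::Long module to show one declaration list driving both parsing and usage text. The argument names (extDbName, extDbRlsVer, cvFile) and the helper subroutines are hypothetical, and the real mechanism in Plugin.pm may look quite different.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical declarations (not the GUS Plugin API): each argument carries
# its name, Getopt::Long type, description and whether it is required.
my @arg_declarations = (
    { name => 'extDbName',   type => 's', descr => 'name of the external database', reqd => 1 },
    { name => 'extDbRlsVer', type => 's', descr => 'version of the database release', reqd => 1 },
    { name => 'cvFile',      type => 's', descr => 'file containing a CV',            reqd => 0 },
);

# Builds usage text from the declarations, so the documentation cannot
# drift away from what the parser accepts.
sub usage_text {
    return join '', map {
        sprintf("  --%-12s %s%s\n", $_->{name}, $_->{descr}, $_->{reqd} ? ' (required)' : '')
    } @arg_declarations;
}

# Parses an argument list against the declarations and returns a hash ref
# of values. Dies with the usage text on unknown or missing arguments.
sub parse_args {
    my (@argv) = @_;
    my %values;
    my @specs = map { join('=', $_->{name}, $_->{type}) } @arg_declarations;
    GetOptionsFromArray(\@argv, \%values, @specs)
        or die "Bad arguments\n" . usage_text();
    for my $decl (grep { $_->{reqd} } @arg_declarations) {
        die "Missing --$decl->{name}\n" . usage_text()
            unless defined $values{ $decl->{name} };
    }
    return \%values;
}

my $args = parse_args('--extDbName', 'PFam', '--extDbRlsVer', '1.0.0');
print "Loading $args->{extDbName} release $args->{extDbRlsVer}\n";
```

Keeping the description next to the declaration is the design point: an argument cannot be added without also being documented.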

The Plugins Standard

Plugin naming

  • Plugin names begin with one of four verbs:

    • insert if the plugin inserts only
    • delete if the plugin deletes only
    • update if the plugin updates only
    • load if the plugin does any two or more of insert, delete or update
  • Plugin names are concise

    • for example, a plugin named InsertNewSequences is not concise because Insert and New are redundant
  • Plugin names are precise

    • for example, a plugin named InsertData is too general. The name should reflect the type of data inserted.
  • Plugin names are accurate

    • for example, a plugin named InsertExternalSequences is inaccurate if it can also insert internally generated sequences. A better name would be InsertSequences.
  • If a plugin expects exactly one file type, that file type should be in the name, for example InsertFastaSequences.
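The verb rule above is mechanical enough to sketch in code. The two helpers below are hypothetical and not part of the GUS Plugin API. One extracts the leading verb from a plugin name, and the other picks the verb the standard calls for, given the operations a plugin performs. The capitalized verbs follow the example names in the list (InsertFastaSequences and so on).

```perl
use strict;
use warnings;

# Hypothetical helper: returns the leading verb (lowercased) if the name
# starts with one of the four allowed verbs followed by a capitalized word,
# otherwise undef.
sub leading_verb {
    my ($plugin_name) = @_;
    return $plugin_name =~ /^(Insert|Delete|Update|Load)[A-Z]/ ? lc($1) : undef;
}

# Hypothetical helper: given the operations a plugin performs (any of
# 'insert', 'delete', 'update'), returns the verb its name should start with.
sub expected_verb {
    my (@operations) = @_;
    my %seen = map { $_ => 1 } @operations;
    my @distinct = sort keys %seen;
    return undef unless @distinct;
    # One kind of operation keeps its own verb; two or more become 'load'.
    return @distinct == 1 ? $distinct[0] : 'load';
}

for my $name (qw(InsertFastaSequences LoadGeneOntology ImportSequences)) {
    my $verb = leading_verb($name);
    printf "%-22s %s\n", $name, defined $verb ? "ok ($verb)" : "no standard verb";
}
```

The conciseness, precision and accuracy rules are judgment calls and are not checked here.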

GUS primary keys

Application specific tables

Command Line Args

Documentation

The Plugin API provides a means for you to document the plugin and its arguments. Be thorough in your documentation.

Programming Style

Application specific controlled vocabularies

A controlled vocabulary (CV) is a restricted set of terms that are the allowed values for a data type. CVs may be simple lists or they may be complex trees, graphs or ontologies. In GUS the CVs fall into two categories: standard CVs such as the Gene Ontology, and small application specific CVs such as ReviewStatus.

The complete list of application specific CVs in the GUS 3.5 schema is:

Acquiring a standard CV typically involves downloading files from the CV provider and running a plugin to load it.

Application specific CVs are handled by the plugin that will use the CV. For example, a plugin that inserts bibliographic references will use the SRes.BibRefType CV. It is these plugins that are responsible for making sure that the CV they want to use is in the database.

Plugins that use CVs fall into two categories:

  1. those that hard code the CV
  2. those that do not hard code the CV but instead get it from the input

In case 1, the plugin hard codes the CV in the Perl code.

In case 2, the plugin hard codes only a default. It also offers an optional command line argument that takes a file containing the CV. If the user of the plugin determines that the input has a different CV than the default, the user provides such a file.

In both cases, the plugin reads the table in GUS that contains the CV and compares it to the CV it expects to use. If the expected vocabulary is not found, the plugin updates the table to include it.
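The compare-then-update step can be sketched as follows. This is a minimal illustration under stated assumptions: the CV table is represented by a plain array rather than by GUS table objects, the default terms are invented for the example, and the subroutine names are hypothetical.

```perl
use strict;
use warnings;

# Returns the expected terms that are absent from the table, in input order
# and without duplicates.
# $expected_terms: array ref of terms the plugin wants to use.
# $table_terms:    array ref of terms already stored (a stand-in for reading
#                  the CV table; a real plugin would go through the GUS API).
sub missing_terms {
    my ($expected_terms, $table_terms) = @_;
    my %in_table = map { $_ => 1 } @$table_terms;
    my %reported;
    return grep { !$in_table{$_} && !$reported{$_}++ } @$expected_terms;
}

# Case 2 from the text: a hard-coded default that a user-supplied CV overrides.
# These terms are hypothetical, chosen only for illustration.
my @default_cv = ('journal article', 'book', 'thesis');

sub expected_cv {
    my ($user_terms) = @_;    # array ref parsed from the optional CV file, or undef
    return $user_terms && @$user_terms ? @$user_terms : @default_cv;
}

my @table  = ('book');                                    # pretend CV table contents
my @to_add = missing_terms([ expected_cv(undef) ], \@table);
push @table, @to_add;                                     # stand-in for updating the table
print "added: @to_add\n";
```

A case 1 plugin would skip expected_cv and pass its hard-coded list straight to missing_terms.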

Assigning an External Database Release Id

GUS is a data warehouse, so it is very common for plugins to load data from another source into GUS. Whether the source is external or in-house, tracking its origin is often required. The tables in GUS that handle this are SRes.ExternalDatabase and SRes.ExternalDatabaseRelease. The former describes the database, e.g., PFam, and the latter describes the particular release of the database that is being loaded, e.g., 1.0.0. The data loaded will have a foreign key to the database release, which in turn has a foreign key to the database.

In order to create that relationship, the plugin must know the primary key of the external database release. To accomplish this, the plugin takes as command line arguments the name of the database and its release. It does not take the primary key of the external database release (that violates the plugin standard). The plugin passes that information to the API subroutine getExtDbRlsId($dbName, $dbVersion).
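The sketch below illustrates the kind of two-step lookup that resolving a database name and version to a release primary key implies. It is not the implementation of getExtDbRlsId. The two tables are in-memory stand-ins, the column names and id values are assumptions, and dying on a miss is a choice made for this example only.

```perl
use strict;
use warnings;

# In-memory stand-ins for SRes.ExternalDatabase and SRes.ExternalDatabaseRelease.
# Column names and id values are assumptions for illustration.
my @external_database = (
    { external_database_id => 10, name => 'PFam' },
);
my @external_database_release = (
    { external_database_release_id => 100, external_database_id => 10, version => '1.0.0' },
);

# Hypothetical stand-in for the lookup: resolve the name to a database row,
# then resolve (database, version) to a release row and return its primary key.
sub lookup_ext_db_rls_id {
    my ($db_name, $db_version) = @_;

    my ($db) = grep { $_->{name} eq $db_name } @external_database
        or die "No external database named '$db_name'\n";

    my ($release) = grep {
        $_->{external_database_id} == $db->{external_database_id}
            && $_->{version} eq $db_version
    } @external_database_release
        or die "No release '$db_version' for database '$db_name'\n";

    return $release->{external_database_release_id};
}

my $ext_db_rls_id = lookup_ext_db_rls_id('PFam', '1.0.0');
print "external_database_release_id = $ext_db_rls_id\n";
```

In a real plugin, the name and version would come from command line arguments and the call would be to getExtDbRlsId($dbName, $dbVersion); the returned id is then used as the foreign key on each loaded row.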

If the plugin is inserting the dataset as opposed to updating it, create new entries for the database and the release by using the plugins