GUS plugins are Perl programs that load data into GUS. They are written using the Plugin API (see the section called “The Plugin API”). You may use plugins that are bundled with the GUS distribution, or you may write your own.
The standard GUS practice is to use only plugins, not straight SQL or bulk loading, to load the database. The reason is that plugins:
track the data that is loaded
copy any updated or deleted rows to "version" tables that store a history of the changes
are known programs that can be scrutinized and used again
have a standard documentation process so that they are easily understood
use the Plugin API and so are easier to write than regular scripts.
The distribution of GUS comes with two types of plugins:
Supported plugins:
are confirmed to work
are portable
are useful to sites other than the site that developed the plugin
meet the Plugin Standard described below
Community plugins:
are contributed by the staff at CBIL and any other plugin developers
have not been reviewed with respect to the criteria for being supported
When you begin writing your plugin, use an existing supported plugin as a guideline or template. Supported plugins are found in $PROJECT_HOME/GUS/Supported/plugin/perl.
GUS plugins are subclasses of GUS::PluginMgr::Plugin. The public subroutines in Plugin.pm (private ones begin with an underscore) constitute the Plugin API. GUS also provides Perl objects for each table and view in the GUS schema; these objects are also part of the API (see the section called “The Plugin API”).
All plugins must declare their package using Perl's package statement. The package name of a plugin is derived as follows:
ProjectName::ComponentName::Plugin::PluginName
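The naming rule can be expressed as a couple of lines of Perl. This is a minimal sketch: plugin_package_name and is_plugin_package_name are hypothetical helpers invented for illustration, not part of the Plugin API. The second helper simply checks for exactly four parts with the literal word Plugin in third place, which catches the common mistake of leaving that part out.

```perl
use strict;
use warnings;

# Hypothetical helpers (not part of the Plugin API) that encode the
# ProjectName::ComponentName::Plugin::PluginName naming rule.
sub plugin_package_name {
    my ($project, $component, $plugin) = @_;
    return join '::', $project, $component, 'Plugin', $plugin;
}

sub is_plugin_package_name {
    my ($name) = @_;
    # Exactly four parts, with the literal word "Plugin" in third place.
    return $name =~ /^\w+::\w+::Plugin::\w+$/ ? 1 : 0;
}

my $name = plugin_package_name('GUS', 'Supported', 'SubmitRow');
print "$name\n";                                                  # GUS::Supported::Plugin::SubmitRow
print is_plugin_package_name('GUS::Supported::SubmitRow'), "\n";  # 0: the "Plugin" part is missing
```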
Plugins must also declare that they are subclasses of Plugin.pm, using Perl's @ISA array. The first lines of a plugin will look like this:
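package GUS::Supported::Plugin::SubmitRow;
@ISA = qw(GUS::PluginMgr::Plugin);   # set before "use strict", so no "our" is needed

use strict;
use GUS::PluginMgr::Plugin;          # loads the superclass and its argument-declaration subs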
Plugins are objects and so must have a constructor. This constructor is the new() method.
The new() method has exactly two tasks to accomplish: constructing the object (and returning it), and initializing it. Construction of the object follows standard Perl practice. Initialization is handled by the Plugin.pm superclass method initialize(); see the section called “The Plugin API” for details about that method.
Example 1.1. A Sample new() method
sub new {
my ($class) = @_;
my $self = {};
bless($self,$class);
$self->initialize({
requiredDbVersion => 3.5,
cvsRevision => '$Revision: 3009 $',
name => ref($self),
argsDeclaration => $argsDeclaration,
documentation => $documentation
});
return $self;
}
The $Revision: 3009 $ string is a CVS or Subversion keyword. When the plugin is checked into source control, the repository substitutes the file's revision into that keyword. The keyword must be in single quotes to prevent Perl from interpolating $Revision as a variable.
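The effect of the single quotes can be checked in a few lines of Perl. In this sketch the string stays literal, so the substituted revision number can later be pulled out with a regular expression. The helper revision_number is hypothetical and only illustrates the keyword's format; it is not something the Plugin API provides.

```perl
use strict;
use warnings;

# Single quotes keep the keyword literal. In double quotes, Perl would try
# to interpolate a variable named $Revision (a compile error under strict).
my $cvsRevision = '$Revision: 3009 $';

# Hypothetical helper (not part of the Plugin API): extract the number the
# repository substituted into the keyword; undef if it was never expanded.
sub revision_number {
    my ($keyword) = @_;
    return $keyword =~ /\$Revision:\s*(\d+)\s*\$/ ? $1 : undef;
}

print revision_number($cvsRevision), "\n";   # prints 3009
```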
If you follow the pattern used by supported plugins, you will only ever need to change one line in the new() method. As you can see, initialize() takes one argument: a reference to a hash that contains a set of parameter values. The one you will need to change is requiredDbVersion. As the GUS schema evolves, you will need to review your plugin to make sure it is compatible with the latest version of GUS, and upgrade it if it is not. When it is compatible with the new version of GUS, update requiredDbVersion to that version of GUS.
In the example above (Example 1.1, “A Sample new() method”), the line
argsDeclaration => $argsDeclaration,
provides the initialize() method with a reference to an array, $argsDeclaration, that declares the command-line arguments the plugin will offer. When you look at a supported plugin, you will see the $argsDeclaration variable being set like this:
Example 1.2. Defining Command Line Arguments
my $argsDeclaration = [
tableNameArg({name => 'tablename',
descr => 'Table to submit to, eg, Core::UserInfo',
reqd => 1,
constraintFunc=> undef,
isList =>0,
}),
stringArg({name => 'attrlist',
descr => 'List of attributes to update (comma delimited)',
reqd => 1,
constraintFunc => undef,
isList => 1,
}),
enumArg({name => 'type',
descr => 'Dimension of attributes (comma delimited)',
reqd => 1,
constraintFunc => undef,
enum => "one, two, three",
isList => 1,
}),
fileArg({name => 'matrixFile',
descr => 'File containing weight matrix',
reqd => 1,
constraintFunc=> \&checkFileFormat,
mustExist=>0,
isList=>0,
}),
];
If you look carefully at the list above, you will notice that each element of it is a call to a method such as stringArg(). These are methods of Plugin.pm, and each returns an object of a subclass of GUS::PluginMgr::Args::Arg; stringArg(), for example, returns a GUS::PluginMgr::Args::StringArg.
All you really need to know is that there is a set of methods available for declaring your command-line arguments. That is, the argsDeclaration parameter of the initialize() method expects a reference to an array of Arg objects. You can learn about them in detail in the Plugin API (see the section called “The Plugin API”).
The Arg objects are very powerful. They parse the command line, validate the input, handle list values, deal with optional arguments and default values, and provide for documentation of the arguments. An Arg object validates its input in two ways. First, it applies its standard validation. For example, a FileArg confirms that the input is a file, and throws an error otherwise. Second, if you provide a constraintFunc, the Arg runs that as well, throwing an error if the argument value violates the constraints.
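The two validation stages can be pictured with a toy model. This is a sketch, not the GUS::PluginMgr::Args::FileArg code: validate_file_arg and check_nonempty are hypothetical names, the real FileArg's behavior also depends on flags such as mustExist, and the convention that a constraint function returns an error message (or undef when the value is acceptable) is an assumption made only for this illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Toy model only; not the real FileArg implementation.
sub validate_file_arg {
    my ($decl, $value) = @_;

    # Stage 1: the argument type's standard validation (here: is it a file?).
    die "$decl->{name}: '$value' is not a file\n" unless -f $value;

    # Stage 2: the optional constraintFunc supplied by the plugin author.
    # Assumed convention: it returns an error message, or undef if OK.
    if (my $func = $decl->{constraintFunc}) {
        my $err = $func->($value);
        die "$decl->{name}: $err\n" if defined $err;
    }
    return $value;
}

# Hypothetical constraint function: rejects empty files.
sub check_nonempty {
    my ($file) = @_;
    return -s $file ? undef : "'$file' is empty";
}

my $decl = { name => 'matrixFile', constraintFunc => \&check_nonempty };

my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "0.1 0.2 0.7\n";
close $fh;

print validate_file_arg($decl, $path) eq $path ? "accepted\n" : "rejected\n";
print eval { validate_file_arg($decl, '/no/such/file'); 1 } ? "accepted\n" : "rejected: $@";
```

Keeping the type check separate from the plugin-specific check mirrors the division of labor described above: the Arg class knows what a file is, while only the plugin knows what a valid weight-matrix file looks like.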
In a way that parallels the declaration of command-line arguments, the initialize() method also expects a reference to a hash that provides standardized fields documenting the plugin (Example 1.1, “A Sample new() method”):
documentation => $documentation,
Here is a code snippet that demonstrates the standard way
$documentation is set:
Example 1.3. Defining Plugin Documentation
my $purposeBrief = <<PURPOSE_BRIEF;
Load blast results from a condensed file format into the DoTS.Similarity table.
PURPOSE_BRIEF
my $purpose = <<PLUGIN_PURPOSE;
Load a set of BLAST similarities from a file in the form generated by the blastSimilarity command.
PLUGIN_PURPOSE
my $tablesAffected =
[ ['DoTS::Similarity', 'One row per similarity to a subject'],
['DoTS::SimilaritySpan', 'One row per similarity span (HSP)'],
];
my $tablesDependedOn =
[
];
my $howToRestart = <<PLUGIN_RESTART;
Use the restartAlgInvs argument to provide a list of algorithm_invocation_ids that represent
previous runs of loading these similarities. The algorithm_invocation_id of a run of this
plugin is logged to stderr. If you don't have that information for a previous run or runs,
you will have to poke around in the Core.AlgorithmInvocation table and others to find your
runs and their algorithm_invocation_ids.
PLUGIN_RESTART
my $failureCases = <<PLUGIN_FAILURE_CASES;
PLUGIN_FAILURE_CASES
my $notes = <<PLUGIN_NOTES;
The definition lines of the sequences involved in the BLAST (both query and subject) must
begin with the na_sequence_ids of those sequences. The standard way to achieve that is to
first load the sequences into GUS, using the InsertFastaSequences plugin, and then to
extract them into a file with the dumpSequencesFromTable.pl command. That command places
the na_sequence_id of the sequence as the first thing in the definition line.
PLUGIN_NOTES
my $documentation = { purpose=>$purpose,
purposeBrief=>$purposeBrief,
tablesAffected=>$tablesAffected,
tablesDependedOn=>$tablesDependedOn,
howToRestart=>$howToRestart,
failureCases=>$failureCases,
notes=>$notes
};
When you look at this example, you will see that several variables, such as $purposeBrief and $tablesAffected, are being set. They are used as values in the hash referenced by $documentation, which is in turn passed as a value to the initialize() method.
You will also notice that Perl's here-document (HEREDOC) syntax is used. The setting of each variable begins with, e.g., <<PLUGIN_PURPOSE and ends with, e.g., PLUGIN_PURPOSE, which must appear on a line by itself. This is Perl's way of allowing you to create paragraph-style strings without worrying about quoting or metacharacters such as \n.
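One detail is worth keeping in mind: with an unquoted terminator such as <<PLUGIN_PURPOSE, the here-document body behaves like a double-quoted string, so any $ or @ in the text is interpolated. The sketch below contrasts that with a single-quoted terminator, which keeps the text literal. The variable $table and the sample sentences are made up for illustration.

```perl
use strict;
use warnings;

my $table = 'DoTS::Similarity';

# Unquoted terminator: the body behaves like a double-quoted string,
# so $table is interpolated.
my $purpose = <<PLUGIN_PURPOSE;
Load BLAST similarities into $table.
PLUGIN_PURPOSE

# Single-quoted terminator: the body behaves like a single-quoted string,
# so $ and @ are kept literally.
my $notes = <<'PLUGIN_NOTES';
Text such as $HOME or user@example.org is left untouched.
PLUGIN_NOTES

print $purpose, $notes;
```

If your documentation contains a literal dollar sign or an e-mail address, the single-quoted form (or escaping each $ and @ with a backslash) avoids surprises.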
The documentation is shown to the user when he or she uses the help flag, or when he or she makes a command-line error.
The documentation is formatted using Perl's documentation generation facility, pod. This means that you can include simple pod formatting codes in your documentation to, say, emphasize a word. Run the command perldoc perlpod for more information.
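To preview what pod formatting codes do, you can run a documentation string through a standard pod formatter. This sketch uses the core Pod::Text module. It assumes only that the help text is pod; exactly how ga renders it is not shown here. B<...> marks bold text and C<...> marks code.

```perl
use strict;
use warnings;
use Pod::Text;

# A notes string written with pod formatting codes.
my $notes = <<'PLUGIN_NOTES';
The definition lines B<must> begin with the na_sequence_id; use the
C<InsertFastaSequences> plugin to load the sequences first.
PLUGIN_NOTES

# Render it as plain text, the way a standard pod formatter would.
my $rendered = '';
my $parser   = Pod::Text->new;
$parser->output_string(\$rendered);
$parser->parse_string_document("=pod\n\n$notes\n=cut\n");

print $rendered;
```

In plain-text output, Pod::Text shows bold as *must* and code in quotes; other formatters (for example, a terminal pager or HTML) render the same codes differently.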
Plugins are run by a command called ga (which stands for "GUS application"). ga constructs the plugin (by calling its new() method) and then runs the plugin by calling its run() method.
The purpose of the run() method is to show, at a glance, the structure of the plugin. It should be very concise and under no circumstances longer than one screen. A good practice, when reasonable, is for the run() method to call high-level methods that return the objects to be submitted to the database, and then to submit them in the run() method itself. This way, a reader of the run() method will know just what is being written to the database, which is the main purpose of a plugin.
The run() method is expected to return a string describing the result of running the plugin. An example would be "inserted 3432 sequences".
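The recommended shape of run() can be sketched without a database. In this self-contained sketch, FakeRow is a hypothetical stand-in for a GUS object (real plugins use the generated GUS object classes, whose API is not reproduced here), and MyLoader and makeSequences are invented names. The point is the structure: a high-level helper builds the objects, run() submits them, and run() returns a short summary string.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a GUS object; only records that submit() ran.
package FakeRow;
sub new    { my ($class, %attrs) = @_; return bless {%attrs}, $class }
sub submit { my ($self) = @_; $self->{submitted} = 1; return 1 }

# Hypothetical plugin showing the recommended shape of run().
package MyLoader;
sub new { return bless {}, shift }

sub run {
    my ($self) = @_;

    # The helper builds the objects; run() does the writing, so a reader
    # sees at a glance what goes into the database.
    my @sequences = $self->makeSequences();
    $_->submit() foreach @sequences;

    # Return a short description of what was done.
    return "inserted " . scalar(@sequences) . " sequences";
}

sub makeSequences {
    my ($self) = @_;
    return map { FakeRow->new(sequence => $_) } qw(ACGT GGCC TTAA);
}

package main;
my $result = MyLoader->new->run();
print "$result\n";   # prints "inserted 3 sequences"
```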
The pointer cache is a somewhat infamous component of the GUS object layer. It is a memory management facility that was designed to work around poor garbage collection in Perl (as of 2000). Whether it is still needed is another matter; for now, it remains part of the object layer. The pointer cache is a way for the plugin to reuse objects that have been allocated but are no longer in active use. Because Perl was not properly garbage-collecting objects when they were no longer referred to, the memory footprint of plugins was growing very large.
As a plugin developer, what you need to know is that at points in your code where you no longer need any of the GUS objects you have created (typically at the bottom of your outermost loop), you should call the Plugin.pm method undefPointerCache(). This method clears out the cache.
If the default capacity (10000 objects) is not enough to hold all the objects you create in one cycle through your logic, you can increase it with the Plugin.pm method setPointerCacheSize().
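Where the two calls go in a typical loop can be sketched as follows. Only the method names undefPointerCache() and setPointerCacheSize() come from the text above. StubPlugin is a hypothetical placeholder for GUS::PluginMgr::Plugin whose methods merely count calls, so the example runs without GUS installed. MyLoader and the cache size of 50000 are likewise illustrative.

```perl
use strict;
use warnings;

# Hypothetical stand-in for GUS::PluginMgr::Plugin; the bodies are
# placeholders that record calls instead of managing real objects.
package StubPlugin;
sub new                 { return bless { cacheSize => 10000, clears => 0 }, shift }
sub setPointerCacheSize { my ($self, $n) = @_; $self->{cacheSize} = $n }
sub undefPointerCache   { my ($self) = @_; $self->{clears}++ }

package MyLoader;
our @ISA = ('StubPlugin');

sub run {
    my ($self, @batches) = @_;

    # Only needed if one pass through the loop creates more than 10000 objects.
    $self->setPointerCacheSize(50000);

    my $count = 0;
    foreach my $batch (@batches) {
        # ... build and submit the GUS objects for this batch ...
        $count += scalar @$batch;

        # Bottom of the outermost loop: none of this batch's objects are
        # needed any more, so clear the cache.
        $self->undefPointerCache();
    }
    return "processed $count records";
}

package main;
my $plugin = MyLoader->new;
my $msg    = $plugin->run([1, 2, 3], [4, 5]);
print "$msg, cache cleared $plugin->{clears} times\n";   # processed 5 records, cache cleared 2 times
```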