Gus Wdk 1.4
WDK Version: 1.4 (07/03/05)

Go to the [GUS WDK Home | http://www.gusdb.org/wdk] for general project information. Go to the [download site |http://www.cbil.upenn.edu/downloads/GUS-WDK] for Release Notes and downloads. Please also see the [WDK Model Javadoc |http://www.gusdb.org/wdk/modelJavaDoc/index.html]. For the JSTL API to use in JSP pages, focus on the jspwrap package.

!! Table of Contents
[#Introduction] <br><br>
[#Background] <br><br>
[#Overview of the WDK Design]
* [#The end user's perspective: Questions, Summaries, Records]
** [#Questions]
** [#Summaries]
** [#Records]
* [#MVC: Model-View-Controller Architecture]
** [#The Model]
** [#The View]
** [#The Controller]

[#Installing the WDK]
* [#System requirements]
* [#Understanding the install targets]
* [#Using the GUS install system]
* [#Downloading]
* [#Installing the database driver]
* [#Installing the Toy model]
** [#Understanding the cache]
** [#Configuring the model]
** [#Creating and managing the cache]
** [#Setting up the toy database]
** [#Running the Model regression test]
* [#Installing the Toy View]
** [#Configuring Tomcat]
** [#Building the WDK on your site]
** [#Testing the toy website]

[#Creating your project]
* [#Using WDKToySite as a template]
* [#Editing the template]
* [#Building your project]
* [#Storing your project in CVS]

[#Creating a model]
* [#The Model XML File]
** [#Sets and names]
** [#References]
** [#Queries in the model]
** [#Parameters in the model]
** [#Records in the model]
** [#Questions in the model]
** [#The detailed specification for a WDK model file]
* [#The model properties file]
* [#Testing on the command line]
** [#wdkXml]
** [#wdkQuery]
** [#wdkRecord]
** [#wdkSummary]
* [#The model sanity test]
** [#Creating a sanity test]
** [#Running a sanity test]

[#Configuring and customizing the view] <br><br>
[#Upcoming features]
----
!!#[Introduction]
The GUS WDK is designed to accelerate the creation of "data mining" websites. It can work on any relational database system and on any schema.
You use the WDK to
* define a coarse-grained data model (based on the tables already in your existing database) that specifies what questions the user can ask of the database and what kinds of answers he or she can get
* define a view of that data model using ~JavaServer Pages

For the more technically minded, the data model that you define in the WDK can be thought of as "a configurable Data Transfer Object (DTO) layer". A DTO is a single object that brings together data that may come from many detailed entities in the database. The database is typically normalized, so there are many tables with deep relationships that together form the structure of the data. It is considered good practice to provide a web site or other high-level data consumers with objects at a coarser granularity. For example, a Gene DTO may bring together data from many tables that contain information relating to a Gene. From this perspective, the WDK model offers an abstraction for requesting DTOs (questions) and for the DTOs themselves (records).

!!#[Background]
The WDK evolved out of long-standing development efforts at the [Computational Biology and Informatics Lab|http://www.cbil.upenn.edu] (CBIL) at the [Penn Center for Bioinformatics|http://www.pcbi.upenn.edu] (PCBI), University of Pennsylvania. The predecessor to the GUS WDK, the WDK-Classic, has been in use since 2001, and is the web framework used by three well-recognized sites: http://plasmodb.org, http://www.allgenes.org and http://www.cbil.upenn.edu/epcondb.

We are rewriting the WDK in collaboration with the [Pathogen Sequencing Unit |http://www.sanger.ac.uk/Teams/Pathogen] (PSU) at the Wellcome Trust Sanger Institute (and YOU, if you would like to join the effort). The new WDK will:
* be GUS-independent. In other words, _you can use the GUS WDK on top of any database system_. (See http://www.gusdb.org to learn about the GUS genomics infrastructure system.)
* expand the functionality of the WDK-Classic
* use current web technologies:
** RELAX NG compliant XML to define the functionality of the site
** ~JavaServer Pages (and custom tags) to define the look of the site
** Struts to control the site

The GUS WDK is a free and open source project.

*Note*: In this document we use [PlasmoDB.org|http://plasmodb.org] to demonstrate our points. PlasmoDB.org still uses the WDK-Classic, and much of the functionality on that site will only become available in future releases of the GUS WDK. (See [#Upcoming features].)

*Soon*: we will provide a link to a real site using the new WDK (e.g., ApiDoTS), and to the WDK code that drives it.

!!#[Overview of the WDK Design]
The objectives of the WDK are to
* make it easy for you to
** offer lots of canned questions
** display and manage question results
** upgrade your site as your schema and data evolve
* offer data mining tools such as:
** set operations on results
** sortable results
** query history
** data set upload
** customizable data set download

!#[The end user's perspective: Questions, Summaries, Records]
The WDK uses a Question-Summary-Record paradigm to organize a web site. End users are provided with a set of _questions_ to choose from. They run a particular question by specifying values for the question's _parameters_. The result is a _summary_ of the entities found by the question. Each element in the summary offers a link to the full _record_ for the entity. A tour of questions, summaries and records will serve to orient you for the following discussions.

_#[Questions]_ <br><br>
In the WDK we call the inquiries that users pose _questions_ as opposed to _queries_. This is because in the WDK questions return a set of _records_, while queries are commonly understood in database systems to return tables (columns and rows). As we describe below, the WDK does use queries, but they are hidden from users.
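To make the distinction between queries and questions concrete, here is a minimal Java sketch. It is an illustration only, not WDK code: the Query, EntityRecord and ask names are hypothetical, and the data is made up. A query returns rows of named columns; a question runs an ID query behind the scenes and hands back one record per entity ID.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only: none of these types are WDK classes.
class QuestionSketch {
    // A query: parameter values in, a table (rows of named columns) out.
    interface Query {
        List<Map<String, String>> run(Map<String, String> params);
    }

    // A record: the coarse-grained description of one entity.
    static final class EntityRecord {
        final String primaryKey;
        final Map<String, String> attributes;

        EntityRecord(String primaryKey, Map<String, String> attributes) {
            this.primaryKey = primaryKey;
            this.attributes = attributes;
        }
    }

    // A question runs a hidden query that yields entity IDs, then builds one
    // record per ID. The caller sees records, never the underlying table.
    static List<EntityRecord> ask(Query idQuery, String idColumn,
                                  Function<String, EntityRecord> recordBuilder,
                                  Map<String, String> params) {
        List<EntityRecord> records = new ArrayList<>();
        for (Map<String, String> row : idQuery.run(params)) {
            records.add(recordBuilder.apply(row.get(idColumn)));
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up data standing in for a database query.
        Query genesByTaxon = params -> List.of(
                Map.of("gene_id", "gene-A"),
                Map.of("gene_id", "gene-B"));
        Function<String, EntityRecord> geneRecord =
                id -> new EntityRecord(id, Map.of("organism", "example organism"));

        for (EntityRecord r : ask(genesByTaxon, "gene_id", geneRecord, Map.of("taxon", "1"))) {
            System.out.println(r.primaryKey + " " + r.attributes);
        }
    }
}
```

In the real WDK these pieces are declared in the model XML file rather than written in Java (see [#Creating a model]).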
The [home page of Plasmodb.org|http://plasmodb.org] presents about 30 canned questions to the user. Some can be run directly from the home page, but most are run from a dialogue page that the user reaches by selecting a question from a pulldown list. Our discussion here starts from those pulldown lists. The WDK lets you define questions and put them into different lists. This way, you can offer your users one or more pulldowns, each containing a list of questions to choose from. When the user chooses a question, he or she is brought to a question page. On PlasmoDB.org, if the user selects the "Transmembrane domain" question from the "gene sequence features" list, he or she will arrive at the [Transmembrane domain question page|http://plasmodb.org/plasmodb/servlet/sv?page=geneQueries&showForm=1&query=genesByTMDomain]. On this page the user fills in parameter values and submits the question.

_#[Summaries]_
Once the question is run, the user is brought to a page showing a summary of the result. Here is the [summary for the query we ran on PlasmoDB|http://plasmodb.org/plasmodb/servlet/sv?page=geneQueries&query=genesByTMDomain&rowsPerPage=20&pm1=and%20paf.prediction_algorithm_id%20=%20306&pm2=5&pm3=5&pm4=and%20gf.prediction_algorithm_id%20=%207898&pm0=in%20('1','2','3','4','5','6','7','8','9','10','11','12','13','14','unmapped')]. The summary displays a row for each entity that matched the question (the GUS WDK will allow more flexible displays of summaries in future releases). The first column is a link to a detailed record for the entity.

_#[Records]_
When the user clicks on the link to a record, the record page appears. In our example, the user will go to the [gene record|http://plasmodb.org/plasmodb/servlet/sv?page=gene&source_id=PFE0685w].
The record presents detailed information about the entity, including:
* its organism, chromosome, name, function and source of annotation
* links to additional pages showing more details
* tables of information (e.g., Notes and Gene Ontology Assignments)
* graphics illustrating details of the record

!#[MVC: Model-View-Controller Architecture]
The WDK uses the Model-View-Controller architectural pattern. This design allows us to cleanly separate _what_ we are describing (the Model) from _how_ we display it (the View).

_#[The Model]_
The WDK model describes the questions, summaries and records that will appear on the website. The model is defined in an XML file that you will create (you may borrow heavily from the model files supplied with the release). Creating a model is described in the section [#Creating a model]. Because you define the model separately from the view, you can use command line tools, described in [#Testing on the command line], to test the model. This frees your testing from the difficult environment of a web application server (Tomcat).

_#[The View]_
The WDK view creates a website that displays the model. It does so by using ~JavaServer Pages (JSP). The JSP pages have access to the questions and records in the model and display them. The view also defines custom JSP tags that help you create JSP pages. In the current release of the WDK, only a _default_ view of the model is supported. After you set up Tomcat, install the WDK and define a model, you will immediately have a working web site. As described below, you then configure it to give it a "branded" look. In upcoming releases the view will offer much more powerful means for you to customize your site.

_#[The Controller]_
The WDK controller is internal to the WDK. It conforms to the standard Model 2 architecture.

!!#[Installing the WDK]
Installing the WDK is a multistep process. Expect it to take on the order of an hour to complete.
!#[Understanding the install targets]
There are two installation targets:
* $GUS_HOME - the location that you install the model to when testing it
* the web application server - the location that you install the whole WDK to when it runs as a website

To get started, you will install and test the WDK model facilities. For this, you can set your GUS_HOME to be anywhere, such as a gushome directory in your home directory. Once these facilities are in place, you will be able to design your own model against the data in your database. You will be able to do significant development and testing of your model without running in the trickier context of a web application.

After you are satisfied with your model, it will be time to move to the web application server. To do so, set up a GUS_HOME that is in the same file system as your web application directories. (You will be creating symbolic links from the web application directory into GUS_HOME.) Then follow the instructions below for installing the webapp.

!#[System requirements]
* Operating system: the WDK has been tested on Linux
* Application server: the WDK has been tested on Tomcat 5.0. _It does not run under Tomcat 4.x._
* Database: the WDK has been tested on Oracle (versions 9i and 10g) and PostgreSQL

!#[Using the GUS install system]
The WDK uses the GUS install system. Follow the [GUS Installer instructions|GusInstaller] to set it up.

!#[Downloading]
Get the latest release of the WDK from [here|http://www.cbil.upenn.edu/downloads/GUS-WDK]. Unpack it into the $PROJECT_HOME you set up for the GUS install system. Then install the WDK into $GUS_HOME:
<verbatim>
build WDKToySite install -append
</verbatim>

!#[Installing the database driver]
The current release supports Oracle and PostgreSQL. The PostgreSQL JDBC driver is included in the distribution, but, for licensing reasons, _the Oracle JDBC driver is not included in the distribution_.
To include it:
* get a copy from your system administrator or visit the [Oracle downloads site|http://www.oracle.com/technology/software/tech/java/sqlj_jdbc/index.html]
* copy it into $GUS_HOME/lib/java
* (the build system provides an "IMPORTANT REMINDER" to alert you to this)

!#[Installing the Toy model]
The WDK release includes a Toy Site. You will need to install and experiment with it before you are ready to build your own site. The Toy Site has a model and a view (just like yours will). We discuss the model first.

After you have installed the WDK into $GUS_HOME, there are three additional steps to installing the Toy model: configure the model, create a query cache and create the toy database.

_#[Understanding the cache]_
Before you configure the model and create the query cache, it will help to understand the purpose of the cache. The WDK model stores query results in a "cache." The cache is in your database. (In the section [#Configuring the model] you will configure the model to tell it where to create the cache tables.) The main table is called "~QueryInstance" by default (though you can name it whatever you want). Each row in that table represents the running of a query. The row stores the name of the query and the parameter values it was run with. It also stores the name of a result table that holds the actual result. If the same query is requested again with exactly the same parameter values (regardless of which user requests it), the result is fetched from the cache, avoiding the expense of running the query all over again.

When you define your model (see [#The Model XML File]), you will designate which queries to cache, based on your expectations of how your system will be used. By default, queries that are used in Questions are cached; you can turn this off for a query by setting its isCacheable attribute to false. Queries that are used in Records are not cached.
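The caching behavior described above can be sketched in Java as follows. This is a hypothetical in-memory model of the idea, not the WDK's implementation: the real cache lives in database tables (~QueryInstance plus one result table per cached run), and the class, method and query names here are made up. What it shows is that a cache entry is identified by the query name together with the exact parameter values, so a repeated request is served without running the query again.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical in-memory sketch of the query cache idea; not WDK code.
class QueryCacheSketch {
    // cache key -> name of the table holding that run's result
    private final Map<String, String> queryInstance = new HashMap<>();
    // result table name -> cached rows
    private final Map<String, List<String>> resultTables = new HashMap<>();
    private int nextTableId = 1;
    int executions = 0; // how many times a query was actually run

    // Same query name + same parameter values => same key, regardless of the
    // order in which the parameters were supplied or which user asked.
    static String cacheKey(String queryName, Map<String, String> params) {
        return queryName + new TreeMap<>(params);
    }

    List<String> run(String queryName, Map<String, String> params,
                     Function<Map<String, String>, List<String>> executor) {
        String key = cacheKey(queryName, params);
        String table = queryInstance.get(key);
        if (table != null) {
            return resultTables.get(table); // cache hit: query not re-run
        }
        executions++;
        List<String> rows = new ArrayList<>(executor.apply(params));
        table = "result_" + nextTableId++;
        resultTables.put(table, rows);
        queryInstance.put(key, table);
        return rows;
    }

    // Rough analogue of "wdkCache -reset": forget every cached result.
    void reset() {
        queryInstance.clear();
        resultTables.clear();
    }

    public static void main(String[] args) {
        QueryCacheSketch cache = new QueryCacheSketch();
        Function<Map<String, String>, List<String>> fakeQuery =
                p -> List.of("gene-" + p.get("taxon"));
        cache.run("GeneQueries.GenesByTaxon", Map.of("taxon", "5833"), fakeQuery);
        cache.run("GeneQueries.GenesByTaxon", Map.of("taxon", "5833"), fakeQuery);
        System.out.println(cache.executions); // 1: the second call hit the cache
    }
}
```

Because a cache entry is looked up only by query name and parameter values, a cached result knows nothing about the query's definition; that is why changing a cached query's definition calls for a cache reset.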
*NOTE:* Whenever you change the definition of a cached query in your model, _you must reset the cache (see [#Creating and managing the cache])_.

_#[Configuring the model]_
To configure the Toy model, edit its configuration file:
<pre>
$GUS_HOME/config/toyModel-config.xml
</pre>
*NOTE*: some of the properties you set control the database connection pooling in the WDK. The WDK is delivered with the [Jakarta Commons DBCP|http://jakarta.apache.org/commons/dbcp/] connection pooler.

The properties you set in the file are:
login | your login for the database
password | your password for the database
connectionUrl | the information describing how to connect to your database
queryInstanceTable | the name of the table that the cache system should use to store information about queries that have been run
maxQueryParams | the maximum number of parameters that a query is allowed to have. This controls the number of columns created in the cache's queryInstanceTable, which has columns named param1, param2, etc. to hold parameter values. If you choose 20 (suggested), the table will be able to hold 20 parameter values per query (which is probably more than enough). This means that you may not define any queries in your model that have more than 20 parameters (see [#The Model XML File]).
platformClass | a Java class that provides RDBMS-specific functionality. The WDK comes with org.gusdb.gus.wdk.model.implementation.Oracle for Oracle and org.gusdb.gus.wdk.model.implementation.PostgreSQL for ~PostgreSQL; ~MySQL is not yet supported.
initialSize | connection pool: the number of connections that are created on startup.
maxActive | connection pool: the maximum number of active connections that can be allocated from this pool at the same time, or zero for no limit.
maxIdle | connection pool: the maximum number of active connections that can remain idle in the pool, without extra ones being released, or zero for no limit.
minIdle | connection pool: the minimum number of active connections that can remain idle in the pool, without extra ones being created, or zero to create none.
maxWait | connection pool: the maximum number of milliseconds that the pool will wait (when there are no available connections) for a connection to be returned before throwing an exception, or -1 to wait indefinitely.

_#[Creating and managing the cache]_
After you have edited the model config file, create an empty cache by using the wdkCache command. Here is its usage:
<verbatim>
% wdkCache
usage: wdkCache -model model_name -new|-reset|-drop

Create, reset or drop a query cache. The name of the cache table is found in the Model config file (the table is placed in the schema owned by login). Resetting the cache drops all results tables and deletes all rows from the cache table. Dropping the cache first resets it then drops the cache table and sequence.

Options:
 -model <model>  the name of the model. This is used to find the Model config file ($GUS_HOME/config/model-name-config.xml)
 -drop           drop the query cache
 -new            create a new query cache
 -reset          reset the query cache
</verbatim>
To set up the Toy model's cache, use this command:
<verbatim>
% wdkCache -model toyModel -new
</verbatim>

_#[Setting up the toy database]_ <br><br>
The WDK release includes a toy database. It is packaged as a set of files in $GUS_HOME/data/WDK/~ToyModel/testTables. The model regression test makes use of the toy database (as does the toy website). When you run the regression test, it can optionally create the toy database in your RDBMS, loading the data from the files into tables. If you want to manage the toy database yourself, use the wdkTestDb command:
<verbatim>
% wdkTestDb
Create a toy database to use in testing the WDK. (The database is created from files included in the WDK distribution.)

usage: wdkTestDb -model model_name [-create | -drop]

Options:
 -model <model>  The name of the model.
                 This is used to find the Model config file ($GUS_HOME/config/model_name-config.xml)
</verbatim>
Use this command to create the toy database:
<verbatim>
% wdkTestDb -model toyModel -create
</verbatim>
<br><br>

_#[Running the Model regression test]_ <br><br>
The regression test runs a set of WDK commands and compares their output to a previously generated, manually validated expected result. If the output matches, the test passes; otherwise it fails. The test ensures that the behavior of the codebase stays consistent when behind-the-scenes changes are made. The WDK release provides a standard regression test to use with the toy model; if this test fails when run after installing the toy model, something is seriously wrong. The test can be found at <br><br>
<verbatim>
$GUS_HOME/data/WDKToyModel/regressionTest/sampleRegressionTestCommands
</verbatim>
<br>
The test uses data in the toy database. Set the --commandListFile flag to point to this file when running the wdkRegressionTest command: <br><br>
<pre>
usage: Runs wdk executables from provided file and compares output to an expected result for testing purposes

wdkRegressionTest
  --configFile                (database configuration file to use with wdk model)
  --loadNewDatabase!          (flag to create database from provided test flat files; set this flag when running the regression test for the first time and when data in the files have changed)
  --createNewExpectedResult!  (overwrite existing regression test expected result)
  --outputDirectory           (all results of test including error files will be placed in this directory)
  --commandListFile           (file that contains lists of commands to run in test)
  --verbose                   (prints out names of shell commands being executed)
</pre>
<br><br>
In typical usage, you should run the regression test provided with the WDK release. If you need to create your own regression test, create a separate command list file. The command list file contains one or more tests.
Each test consists of three lines; the next test, if any, begins on the fourth line. The expected input for a test is as follows: <br><br>
name="~[name of test]"<br>
command="~[command to run with parameters conforming to the normal usage of the command, EXCEPT any global parameters (configFile) passed to the wdkRegressionTest command]"<br>
critical="~[true or false; if true, the regression test will immediately exit on failure, without running any more of the provided tests]" <br>

!#[Installing the Toy View]
_#[Configuring Tomcat]_
The WDK uses a slightly unconventional installation strategy. It installs a complete, working GUS_HOME in the same file system as the web application directory, and uses symbolic links from the web application directory into it. This provides a working GUS_HOME in which you can test your model using the extensive command line tools available for the model. Because the model is linked in, you know for sure that the model you are testing is the same model that is running on your site.

The complication with this strategy is that you must configure Tomcat to allow symlinks. Tomcat does not allow them by default, because they can open a security hole. We believe that as long as nobody on your team makes symlinks to inappropriate places, this is not a security problem. However, if you do not want to use links, replace them with direct copies.

To configure Tomcat to allow symlinks:
* do not use symlinks that cross file system boundaries
* do not run Tomcat in its optional security mode
* in Tomcat's $TOMCAT_HOME/conf/server.xml file, the <Context> element for the web application must have allowLinking="true", like this (substituting values specific to your site):
<pre>
<Host ...>
  <Context path="/wdktoysite" docBase="/www/wdktoysite/webapp"
           debug="1" privileged="false" allowLinking="true">
    <Logger className="org.apache.catalina.logger.~FileLogger"
            prefix="wdktoysite-log." suffix=".txt" timestamp="true"
            directory="/www/wdktoysite/logs/" />
  </Context>
</Host>
</pre>

_#[Building the WDK on your site]_
Here are the required steps:
* if you do not already have a GUS_HOME in the same file system as your web application directory, create an empty directory named, say, gushome.
* set the GUS_HOME environment variable to that directory
* set up a "webapp" directory for your web application where Tomcat will find it. You must also create a WEB-INF directory within your webapp directory.
* create a property file your_web_prop_file with the following information:
<pre>
webappTargetDir=your_webapp_directory
</pre>
* run the build command
<pre>
build WDKToySite webinstall -append -webPropFile your_web_prop_file
cd your_webapp_directory
$ ls
error.jsp       images/    misc/         questionSets.jsp  summary.jsp
error.user.jsp  index.jsp  question.jsp  record.jsp        WEB-INF/
</pre>
* if you haven't already done so, copy your Oracle JDBC .jar file into $GUS_HOME/lib/java

_#[Testing the toy website]_
The WDK is now installed on your site. Bring up the toy WDK site to test it. Here are the steps:
* if you haven't already done so, edit $GUS_HOME/config/toyModel-config.xml to reflect your site.
* set up the header and footer files and the web.xml file:
<pre>
cd your_webapp_directory
cd WEB-INF
cp web.xml.toy web.xml
cd tags/site
cp footer.tag.toy footer.tag
cp header.tag.toy header.tag
</pre>
* edit your_webapp_directory/WEB-INF/web.xml to provide a correct logging directory
* work with your system administrator to start up the application in Tomcat

!!#[Creating your project]
Now that you have installed and tested the Toy Site, you are ready to create your own WDK-based project. The first step is to set up a directory structure that is compatible with CVS and the GUS installer. This way you will be able to keep your project safe and also use the GUS installer to install your project along with the WDK.

!#[Using WDKToySite as a template]
To get started, use WDKToySite as a template.
You will need to choose the name of your site (referred to below as TheNameOfMySite). You will also need to choose the name of your model (referred to below as myModelName). Start out by doing this:
<verbatim>
% cd $PROJECT_HOME
# init your project with WDKToySite
% cp -r WDKToySite TheNameOfMySite
% cd TheNameOfMySite
# if your copy of the WDK came from CVS, remove the CVS dirs
% rm -r `find . -name CVS`
% cd Model
# remove unneeded data/ dir
% rm -r data
% cd config
# rename config files so they use your model name
% mv toyModel.xml myModelName.xml
% mv toyModel.prop myModelName.prop
% mv toyModel-config.xml myModelName-config.xml
% mv toyModel-sanity.xml myModelName-sanity.xml
# remove unneeded regression test file
% rm regressionTestModelConfig.xml
% cd $PROJECT_HOME/TheNameOfMySite/Site/webapp/WEB-INF
# move the toy's web.xml file to a template file for your project
% mv web.xml.toy web.xml.template
% cd tags/site
# move the toy's header and footer into files for your project
% mv header.tag.toy header.tag
% mv footer.tag.toy footer.tag
</verbatim>

!#[Editing the template]
Now your project's directory structure is close to what you need. To finish it off, you need to:
* modify myModelName-config.xml as described above in [#Configuring the model]
* rewrite myModelName.xml and myModelName.prop as described below in [#Creating a model]
* rewrite myModelName-sanity.xml as described below in [#Creating a sanity test]
* configure your view as described below in [#Configuring and customizing the view]
* edit the file $PROJECT_HOME/TheNameOfMySite/build.xml and replace "~WDKToySite" with "~TheNameOfMySite". This will enable you to install your project using the GUS installer.

!#[Building your project]
To build your project, follow the directions in either [#Downloading] (to install the Model alone, without the view) or [#Building the WDK on your site] (to install the project into a web site).
In both cases, substitute ~TheNameOfMySite for ~WDKToySite in the build command to build _your_ project instead of the Toy Site.

!#[Storing your project in CVS]
It is recommended that you store your project in CVS. To do so:
<verbatim>
% cd $PROJECT_HOME
% mv TheNameOfMySite TheNameOfMySite.bak
% cd TheNameOfMySite.bak
% cvs -d TheNameOfMyRepository import -m "Start of project" TheNameOfMySite dontcare start
% cd ..
% cvs -d TheNameOfMyRepository co TheNameOfMySite
</verbatim>

!!#[Creating a model]
The first step in using the WDK (after installing it) is to create a model. Because the WDK is schema independent, we cannot provide you with a model that is ready to go. We do supply a model for a sample site against a sample database to give you ideas. But the basic task of deciding what questions, summaries and records you want on your site must be done by you.

*NOTE:* Whenever you change the definition of a cached query in your model, _you must reset the cache (see [#Creating and managing the cache])_.

!#[The Model XML File]
The model XML file defines the model. In it you specify all the details of the questions, summaries and records the end users will see. The WDK reads the file and creates your site from it. The model XML file must be named and located as follows:
<verbatim>
$PROJECT_HOME/TheNameOfMySite/Model/config/myModelName.xml
</verbatim>
Your installation contains a sample model XML file that will prove very useful in orienting you:
<pre>
$PROJECT_HOME/WDKToySite/Model/config/toyModel.xml
</pre>
In the model you define these kinds of objects:
<query> | A query is a request for tabular data from a data source. The result is a list of rows, each with a set of columns. You define queries in the model for many purposes. As described below, some are used in questions, some in records and some in parameters.
<parameter> | A value that must be specified for a query to run. Queries may have zero, one or many parameters.
<recordClass> | A template for creating records of a particular type of entity, such as a Gene. The type of entity a record describes is determined by the type of ID the record is parameterized by: a Gene record expects a Gene ID, an RNA record expects an RNA ID. (More details on this below.)
<question> | A pairing of a query (one that returns a set of entity IDs) and a record class. The record class is used to describe each of the entities returned.

_#[Sets and names]_
The top level of the model contains a number of sets. You create these first, and then add to them the objects described in the previous section. Here are the types of sets:
<querySet> | A set of queries.
<parameterSet> | A set of parameters.
<recordClassSet> | A set of record classes.
<questionSet> | A set of questions. A question set can be marked "internal" for use in nested records and nested record lists (see below).

You may have more than one set of each type. An object may be in only one set. Each set has a name. All sets must have unique names, regardless of the type of the set. The elements of a set, e.g., queries or records, must have names that are unique within the set (but not across sets). In this way, any element of any set can be uniquely identified by its *two-part name*: _setname.elementname_.

It is recommended that you divide your model into convenient sets. For example, queries that perform a similar function go well together in a set. Or, all Gene records might go together.

_#[References]_
Some of the objects you define in the model refer to other objects you have defined in the model. For example, a question refers to a query and to a record class. The reference uses the unique two-part name of the referent.

_#[Queries in the model]_
Queries are defined inside a query set.
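The naming rules in [#Sets and names] and [#References] amount to a small registry, sketched below in Java. This is a hypothetical illustration, not WDK code, and the set and element names (GeneQueries, GeneQuestions, GenesByTaxon) are made up. It enforces the rules stated there: set names are unique across all set types, element names are unique within a set, and a reference is resolved through its two-part name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the model's naming rules; not WDK code.
class ModelNameRegistry {
    private final Map<String, String> setTypes = new HashMap<>();         // set name -> set type
    private final Map<String, Set<String>> setElements = new HashMap<>(); // set name -> element names

    // Set names must be unique regardless of set type.
    void addSet(String setType, String setName) {
        if (setTypes.containsKey(setName)) {
            throw new IllegalArgumentException("duplicate set name: " + setName);
        }
        setTypes.put(setName, setType);
        setElements.put(setName, new HashSet<>());
    }

    // Element names must be unique within their set (but may repeat across sets).
    void addElement(String setName, String elementName) {
        Set<String> elements = setElements.get(setName);
        if (elements == null) {
            throw new IllegalArgumentException("no such set: " + setName);
        }
        if (!elements.add(elementName)) {
            throw new IllegalArgumentException(
                    "duplicate element " + elementName + " in set " + setName);
        }
    }

    // Resolves a reference such as "GeneQueries.GenesByTaxon" and returns the
    // type of the set that holds the referent.
    String resolve(String twoPartName) {
        String[] parts = twoPartName.split("\\.");
        if (parts.length != 2) {
            throw new IllegalArgumentException("not a two-part name: " + twoPartName);
        }
        Set<String> elements = setElements.get(parts[0]);
        if (elements == null || !elements.contains(parts[1])) {
            throw new IllegalArgumentException("unresolved reference: " + twoPartName);
        }
        return setTypes.get(parts[0]);
    }

    public static void main(String[] args) {
        ModelNameRegistry model = new ModelNameRegistry();
        model.addSet("querySet", "GeneQueries");
        model.addSet("questionSet", "GeneQuestions");
        model.addElement("GeneQueries", "GenesByTaxon");
        model.addElement("GeneQuestions", "GenesByTaxon"); // allowed: different set
        System.out.println(model.resolve("GeneQueries.GenesByTaxon"));   // querySet
        System.out.println(model.resolve("GeneQuestions.GenesByTaxon")); // questionSet
    }
}
```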
There are different types of queries to accommodate different types of data sources:
<sqlQuery> | issued to a relational database
<flatFileQuery> | issued against flat files (see [#Upcoming features])
<processQuery> | runs a process and returns a result (see [#Upcoming features])

A query has these constituents:
<paramRef> | References to parameters used by the query. Not only must each query have a unique two-part name, but each query must also have a unique element name (the second part of the two-part name).
<column> | Each <column> describes a column that is part of the result. A column also has the attribute _truncateToRef_, which, when set to an integer value, restricts the displayed value of the column to that many characters. This is useful for long sequences and descriptions that would otherwise disrupt a neat table layout. The default truncate value is 100.
<sql> | If the query is an <sqlQuery>, the SQL statement to run. Embed parameter values in the SQL by using the parameter's name (the second part of its two-part name) surrounded by '$$', e.g., '$$taxon$$'. *NOTES:* (1) the columns in the select clause must match the columns declared in the <column>s. (2) all columns in the where clause must have a prefix indicating which table they come from, even if the column name is unique across the tables used.

The model maintains the order of the query result for later use. As discussed in the following sections, queries are used for different purposes: some are used by records, some by questions and some by parameters.

_#[Parameters in the model]_
Parameters are defined inside a parameter set. There are different types of parameters (more are on the way):
<stringParameter> | A parameter whose value is a string.
<flatVocabParameter> | A parameter that provides a list of choices. It does so by issuing a query to find the list of choices. The query must return two columns, "term" and "internal". The terms are shown to the user.
The internal value is the value that is embedded into the query that uses this parameter. Set the multiChoice attribute to "true" to allow the user to choose more than one element from the list.

_#[Records in the model]_
A record provides information about an entity in the database. The entity must have a primary key, which is used in queries to retrieve information about the entity.

In the model, you define a <recordClass>. It is a template for creating records of a certain type. A record class has the following attributes:
idPrefix | The value that will be prepended to the primary key of each record
type | The type of this recordClass, for display purposes
attributeOrdering | comma-separated list of attributes that determines the order in which the columns of the contained records are displayed. Attributes not specified in this ordering are appended after the ordered columns, in no guaranteed order. Note that the primary key of the record is always the first column displayed.

A recordClass also contains one or more of each of these kinds of elements:
<attributesQuery> | a reference to a query that returns a single row of information about the record's entity. _This query must return exactly one row or an error will be thrown._ Use an attributes query to return information such as the entity's name, location, organism or any other attribute that the entity has only one of. Each value returned by this query is considered an "attribute" of the record. The "name" of the attribute is taken from the name of its column. An attributes query must take exactly one parameter, and its name must be "primaryKey". It will be passed the primary key of the record that contains the attributes query.
<tableQuery> | a reference to a query that returns zero, one or more rows of information about the record's entity. Use a table query to return tabular information about the entity, such as the entity's Gene Ontology functions.
A table query must take exactly one parameter, and its name must be "primaryKey." It will be passed the primary key of the record which contains the attributes query. <textAttribute> | a block of text that all records of this type will have. This may be a description of the kind of entity the record represents. The text can have the values of other attributes (including other text attributes) embedded in it. To do that, embed a attribute's name surrounded by '$$', for example: "This Gene is from organism $$organism$$." <linkAttribute> | an attribute that is a hyperlink. The <linkAttribute> has a visible="" attribute that describes how the link should be displayed. It also has a <url> element that describes what URL should be used. If your URL contains funny characters, then enclose it in <![CDATA[ your url here... ]]>. Embed into the visible and url text values from other attributes by using $$attribute_name$$ as described in <textAttribute>. It is recommended that you place all URLs in the model properties file described in [#The model properties file]. Doing so places all URLs in one file so that they are easy to maintain (URLs are notorious for changing!). If you use the $$ mechanism to embed values in an URL that is stored in the properties file, you must escape each $ with \\. Also, you may want to hide the values that you embed in the URL from the user because they are internal identifiers. In that case, set the isInternal="true" attribute of the attribute or column that is supplying the internal value. <nestedRecord> | a record inside of this record. It is returned by an internal question that use values from the containing record as parameters and return one record to be embedded here. <nestedRecordList> | a list of records. Same as above except that the internal question may return more than one record. _#[Questions in the model]_ A question is "a query that returns records." You define a question by pairing an "id query" and a record class. 
The id query defines the question asked, and returns a list of IDs of entities that match the query. The record class defines the type of record returned for each entity. A question contains these attributes: queryRef | a reference to a query that will return a single column containing the primary keys of a set of entities. recordClassRef | a reference to a record that will describe each element of the query's result. The primary key that the record expects must match the type of primary keys returned by the query. displayName | the name of the question for display purposes summaryAttributesRef | comma-separated list of attributes that will be displayed in the question's summary. If this property is not set, the summary will display all attributes belonging to the question's query. And, a question contains these elements: description | a helpful one sentence or so description of the question. help | a lengthy description of the question that provides help about it. _#[The detailed specification for a WDK model file]_ The WDK model file is validated against a ["schema file" |http://www.gusdb.org/wdk/wdkModel.rng]" You can find the file in your installation at <pre> $GUS_HOME/lib/rng/wdkModel.rng </pre> The schema file provides the exact syntactic specification for what is allowed in a WDK model file. It is written in the [RELAX NG (RNG)|http://relaxng.org] format. !#[The model properties file] The WDK model uses a properties file so that you can configure your model with values that may change over time. You should always use the properties file for changeable values, rather than "hard-coding" them in the model XML file. 
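The model XML file refers to a property by surrounding its name with '@' (for example "@ourDataVersion@"). As a rough illustration of that substitution, here is a minimal Java sketch. It assumes the properties file can be parsed with `java.util.Properties` and that a reference is a simple `@name@` token made of word characters. The class `ModelPropertySubstitution` and its methods are hypothetical and are not part of the WDK API. The sketch also does not model the `$` escaping rule described for URLs.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not the WDK's own code): resolves @name@ references
 * in model XML text against values loaded from a properties file.
 */
final class ModelPropertySubstitution {

    // An @name@ reference; the name is assumed to consist of word characters.
    private static final Pattern REFERENCE = Pattern.compile("@(\\w+)@");

    private ModelPropertySubstitution() {
    }

    /** Parses properties-file text such as "ourDataVersion=1.20". */
    static Properties load(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // A StringReader does not fail, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Replaces every @name@ in modelXml; an undefined name is an error. */
    static String substitute(String modelXml, Properties props) {
        Matcher matcher = REFERENCE.matcher(modelXml);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String name = matcher.group(1);
            String value = props.getProperty(name);
            if (value == null) {
                throw new IllegalArgumentException("Undefined model property: " + name);
            }
            // quoteReplacement keeps any '$' or '\' in the value literal.
            matcher.appendReplacement(result, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(result);
        return result.toString();
    }
}
```

In this sketch, an undefined property causes a failure instead of leaving the `@name@` token in place. That is a design choice made for illustration, and the WDK's own behavior may differ.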
The properties file is in the standard "properties file" format, for example:
<pre>
ourDataVersion=1.20
ourProjectId=22
TAXON_URL=http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?name=\\$\\$taxon_name\\$\\$
</pre>
You may embed any property defined in the properties file in your model XML file by surrounding the name of the property with '@', for example "@ourDataVersion@."

!#[Testing on the command line]
Test your WDK model on the command line before you deploy it to a website.

_#[wdkXml]_
Use the wdkXml command to parse your XML file. It will detect any syntactic problems, such as malformed XML, XML that does not conform to the schema file, or invalid references.
<pre>
% wdkXml
usage: wdkXml -model model_name
Parse and print out a WDK Model xml file.
Options:
 -model <model>   the name of the model. This is used to find the Model XML file ($GUS_HOME/config/model_name.xml) the Model property file ($GUS_HOME/config/model_name.prop) and the Model config file ($GUS_HOME/config/model_name-config.xml)
</pre>
Here is an example using the toy model:
<pre>
% wdkXml -model toyModel
</pre>

_#[wdkQuery]_
Use the wdkQuery command to run any query in your model XML file.
<pre>
% wdkQuery
usage: wdkQuery -model model_name -query full_query_name [-dontCache] [-returnTable | -returnSize | -rows start end] [-params param_1_name param_1_value,...]
Run a query found in a WDK Model xml file. If run without -params, displays the parameters for the specified query
Options:
 -model <model>     the name of the model. This is used to find the Model XML file ($GUS_HOME/config/model_name.xml) the Model property file ($GUS_HOME/config/model_name.prop) and the Model config file ($GUS_HOME/config/model_name-config.xml)
 -dontCache         Do not use the cache for this query (even if it is cache enabled).
 -params <params>   space delimited list of param_name param_value ....
 -query <query>     The full name (set.element) of the query to run.
 -returnSize        For pageable queries only: return the total size of the result.
 -returnTable       Place the result in a table and return the name of the table.
 -rows              For pageable queries only: provide the start and end rows to return.
</pre>
Here is an example using the toy model:
<verbatim>
% wdkQuery -model toyModel -query RnaIds.ByDbESTLib -params NumEstLibs 6 AssemblyConsistency 80
</verbatim>

_#[wdkRecord]_
Use the wdkRecord command to test a record.
<pre>
% wdkRecord
usage: wdkRecord -model model_name -record full_record_name -primaryKey primary_key
Print a record found in a WDK Model xml file.
Options:
 -model <model>             the name of the model. This is used to find the Model XML file ($GUS_HOME/config/model_name.xml) the Model property file ($GUS_HOME/config/model_name.prop) and the Model config file ($GUS_HOME/config/model_name-config.xml)
 -primaryKey <primaryKey>   The primary key of the record to find.
 -record <record>           The full name (set.element) of the record to print.
</pre>
Here is an example using the toy model:
<verbatim>
% wdkRecord -model toyModel -record RnaRecordClasses.RnaRecordClass -primaryKey 92484673
</verbatim>

_#[wdkSummary]_
Use the wdkSummary command to test a summary.
<pre>
% wdkSummary
usage: wdkSummary -model model_name -question full_question_name
Print a summary found in a WDK Model xml file.
Options:
 -model <model>         the name of the model. This is used to find the Model XML file ($GUS_HOME/config/model_name.xml) the Model property file ($GUS_HOME/config/model_name.prop) and the Model config file ($GUS_HOME/config/model_name-config.xml)
 -params <params>       space delimited list of param_name param_value ....
 -question <question>   The full name (set.element) of the question to run.
 -rows                  the start and end pairs of the summary rows to return
</pre>
Here is an example using the toy model:
<verbatim>
% wdkSummary -model toyModel -question RnaQuestions.ByNumSeqs -rows 1 20 -params NumSeqs 10 ApiTaxon "Neospora caninum"
</verbatim>

!#[The model sanity test]
The model sanity test exercises all the queries and records in a model to make sure they run correctly and produce output within an expected range. You should always run a sanity test before you deploy your model (and typically run it very often as you develop a model). <br><br>

_#[Creating a sanity test]_ <br><br>
A sanity test is an XML file consisting of a number of individual tests, one or more for each record and query in the model. You must include in the file at least one test for each query and record in the model, or the sanity test will fail. You can provide more tests if you think they will be useful.

A sanity test contains two kinds of elements:
<sanityQuery> | A test of a query in the model
<sanityRecord> | A test of a record in the model

A <sanityQuery> has the following attributes:
queryRef | The name of the query, formatted as "querySetName.queryName."
minOutputLength | The minimum number of rows the query is expected to return when run with the given parameters. This value must be at least one (see [#Sanity query restriction] below).
maxOutputLength | The maximum number of rows the query is expected to return when run with the given parameters.

A <sanityQuery> contains one or more <sanityParam> elements. A <sanityParam> has these attributes:
name | The name of the parameter required by the query
value | The value to be used for the parameter in the sanity test

A <sanityRecord> has the following attributes:
twoPartName | The name of the record, formatted as "recordSetName.recordName"
primaryKey | The primary key of the entity that the record represents

_#[Running a sanity test]_ <br><br>
The sanity test will print the results of each provided test.
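The rules stated above for a sanity test are that every query and record needs at least one test, and that a query's row count must fall between minOutputLength and maxOutputLength, with minOutputLength at least one. They can be sketched in Java as follows. The class `SanityRules` and its methods are hypothetical illustrations and are not the wdkSanityTest implementation. The sketch also assumes that the minimum and maximum are inclusive bounds.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

/**
 * Hypothetical sketch (not the wdkSanityTest implementation) of two rules
 * that a model sanity test enforces.
 */
final class SanityRules {

    private SanityRules() {
    }

    /**
     * Returns true if a query that produced rowCount rows passes a
     * sanityQuery with the given minOutputLength and maxOutputLength
     * (both treated as inclusive bounds).
     */
    static boolean queryPasses(int minOutputLength, int maxOutputLength, int rowCount) {
        if (minOutputLength < 1) {
            // Sanity query restriction: a tested query must return at least one row.
            throw new IllegalArgumentException("minOutputLength must be at least 1");
        }
        if (maxOutputLength < minOutputLength) {
            throw new IllegalArgumentException("maxOutputLength is smaller than minOutputLength");
        }
        return rowCount >= minOutputLength && rowCount <= maxOutputLength;
    }

    /**
     * Returns, sorted, the two-part names of queries or records in the model
     * that no sanity test covers; a non-empty result means the test fails.
     */
    static List<String> untested(Collection<String> modelNames, Collection<String> testedNames) {
        TreeSet<String> missing = new TreeSet<>(modelNames);
        missing.removeAll(testedNames);
        return new ArrayList<>(missing);
    }
}
```

The coverage rule is kept separate from the per-query range check because a missing test is a property of the whole sanity file. A row count that falls out of range, by contrast, concerns one query run with one set of <sanityParam> values.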
When a test fails, you will be notified and given a command to run to debug the failed query or record individually.
<pre>
usage: wdkSanityTest -model model_name -verbose
Run a test on all queries and records in a wdk model, using a provided sanity model, to ensure that the course of development hasn't dramatically affected wdk functionality.
Options:
 -model <model>   the name of the model. This is used to find the Model XML file ($GUS_HOME/config/model_name.xml), the Model property file ($GUS_HOME/config/model_name.prop), the Sanity Test file ($GUS_HOME/config/model_name-sanity.xml) and the Model config file ($GUS_HOME/config/model_name-config.xml)
 -verbose         Print out more information while running test.
</pre>
Here is an example using the toy model:
<pre>
% wdkSanityTest -model toyModel
</pre>

_#[Sanity query restriction]_ <br><br>
Each query run by a sanity test must return at least one row, so you must provide parameters that fulfill this requirement. This ensures that the query is capable of producing output. When a query is run outside of the sanity test, it may return zero rows.

!!#[Configuring and customizing the view]
The WDK provides a "default" view that will display any model you create. It also allows you to brand the default view, as described here:
Logo | Provide a logo in your_webapp_directory/WEB-INF/images
Stylesheet | Provide the file your_webapp_directory/WEB-INF/misc/style.css
Header and footer | Provide a header.jsp and footer.jsp in your_webapp_directory/WEB-INF/tags/site. Make sure they point to your logo.
Model | Edit your_webapp_directory/WEB-INF/web.xml to use the model file you created
View | Edit your_webapp_directory/index.jsp to forward either to the "showQuestionSetsFlat.do" action, which shows questionSets as a flattened list of questions, or to the "showQuestionSets.do" action, which shows the structured questionSet view.

If you are not satisfied with the default view for one or more of the <recordClass>s you have defined in your model, you may provide custom pages. To do so, place a .jsp file named after the "full name" of the record class in the directory
<verbatim>
$PROJECT_HOME/TheNameOfMySite/Site/webapp/custom_view
</verbatim>
The full name of a record class combines the recordClassSet name and the recordClass name: "recordClassSet.recordClass." The file name is that full name followed by ".jsp". For example, you will find the following file in the WDKToySite:
<verbatim>
$PROJECT_HOME/WDKToySite/Site/webapp/custom_view/EstRecordClasses.EstRecordClass.jsp
</verbatim>

!!#[Upcoming features]
!Model
_Query context_
The query context stores parameter values to apply universally to all queries. The canonical example of this is "taxon." The context is applied to parameters that belong to the user's queries and also to lower-level queries, such as those used by controlled vocabulary parameters. This way, controlled vocabularies reflect the context. The model will make a query context available for reading and writing.

_Hints in records_
A design goal of the WDK is that it supply a generic view that can display a model with minimal configuration. More advanced WDK users will write particular views (in .jsp) for particular record types. In order to support the generic view, the model needs to provide some display hints in Records:
* attribute hints
** attribute is "summary" (display in a Summary)
*** this hint can be overridden by a hint in a Summary (add/delete particular attributes from the list of summary attributes)
** attribute is "internal" (don't display)
** attribute ordering
** subrecord membership (only display in a certain subrecord)
* table hints
** table ordering
** subrecord membership (only display in a certain subrecord)

_Sample values for parameters_
We would like query dialogues to show sample values for parameters. This means that Parameter will gain a getSampleValue() property. The tricky part is how to set that value. We have been thinking that a model XML file could be reusable across projects and/or sites. This suggests that sample values should probably be set in the model properties file. (This solution ignores the complication that a given parameter may be reused by many queries, and so might need context-specific sample values.)

!View
_Customization Hooks_
We will provide hooks for plugging in custom pages for summaries (per site and per record class), questions and questionSets.

!Controller
_Customization Hooks_
See above. More sophisticated customization, such as configurable/declarative page navigation control flow, is also planned.