This is a work-in-progress. It only covers part of the schema so far.
3.5 Categories
Mapping
- FingerprintClone
- FingerprintCloneContig
- FingerprintCloneMarker
- FingerprintContig
- FingerprintMap
- RHMap
- RHMarker
- RHMapMarker
- OpticalMapAlignment
- OpticalMapAlignmentSpan
- OpticalMapFragment
- EPCR
- EndSequencePairMap
- PlasmoMap
Clones
- EST
- Library
- Clone
- CloneSet
- CloneInSet
- AnatomyLibrary
Controlled Vocab
- AnatomyLOE
AA Features
- TranslatedAAFeatSeg
NA Features
- ScaffoldGapFeature
NA Sequence
- NASequence
- VirtualSequence
- SequencePiece
- ExternalNASequence
- GenomicSequence
- Assembly
- SplicedNASequence
- NAEntry
- DbRefNASequence
AA Sequence
- AASequence
- ExternalAASequence
- TranslatedAASequence
- TrivialTranslation
- MotifAASequence
- NRDBEntry
- PfamEntry
- DBRefPfamEntry
Similarity
- Similarity
- SimilaritySpan
- BlatAlignmentQuality
- BlatAlignment
- ConsistentAlignment (remove this empty table?)
Misc Applications
- GeneFeatureSeqOverlap
- AAMotifGoTermRule
- AAMotifGOTermRuleSet
Assembly
- AssemblyAnatomyPercent
- AssemblySequence
- AssemblySequenceSNP
- AssemblySNP
General
4.0 Categories (approximate)
Mapping
- FingerprintClone
- FingerprintCloneContig
- FingerprintCloneMarker
- FingerprintContig
- FingerprintMap
- RHMap
- RHMarker
- RHMapMarker
- OpticalMapAlignment
- OpticalMapAlignmentSpan
- OpticalMapFragment
- EPCR
- EndSequencePairMap
Clones
- Library
- Clone
- CloneSet
- CloneSetMember (renamed from CloneInSet)
- LibraryAnatomy (renamed from AnatomyLibrary)
- TranscriptConsensusAnatomyPercent
Controlled Vocab
- AnatomyLOE
- SimilarityQuality
AA Features
- TranslatedAAFeatSeg - this should possibly be made an AA Feature (a sub-feature of TranslatedAAFeature)
NA Features
- ScaffoldGapFeature
Sequence
- Sequence
- NASequence
- TranscriptConsensusNASequence (renaming of Assembly)
- ESTNASequence (new subclass of NASequence, replacing EST)
- AASequence
- MotifAASequence
- PfamEntry?
- NRDBEntry?
Post-its
- DbXRef
- Evidence
- Comment
- PostIt
- AnalysisAlgorithm
- NewIdentity (renaming of MergeSplit)
Misc Applications
- GeneFeatureSeqOverlap
- AAMotifGoTermRule
- AAMotifGOTermRuleSet
Similarity
- Similarity
- SimilaritySpan
Assembly
- AssemblyPiece (merging of AssemblySequence and SequencePiece)
- AssemblySNP
- AssemblyPieceSNP
Discussion points
- What is the fate of PfamEntry and NRDBEntry?
Proposed 4.0 modifications
- Review PlasmoMap table
Soft links
- should we use controlled vocabs to indicate what tables are allowed?
one idea: the PostIt table
Post-its are tables with soft links. So far these are:
- dbxref
- evidence
- comment
- analysis algorithm (a link to the algorithm invocation that records a run of an analysis algorithm)
- any table whose rows might reasonably have a post-it gets a new overhead column called post_it_id
- post_it_id is null if the row has no post-its
- otherwise it points to a row in the PostIt table
- PostIt has a column for each type of post-it in the schema.
- the columns are of type int and track how many post-its of each type the pointing row has
- these are effectively reference counts, maintained by some kind of process.
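The PostIt idea above can be sketched with an in-memory SQLite database. This is only an illustration under assumptions: the count column names (dbxref_count etc.), the helper functions, and the use of NASequence as the annotated table are all hypothetical, and the "process" that maintains the counts is played here by a plain function call.

```python
import sqlite3

# Post-it types from the list above; the *_count column names are hypothetical.
POST_IT_TYPES = ("dbxref", "evidence", "comment", "analysis_algorithm")


def create_schema(conn):
    """Create a toy PostIt table and one annotated table (NASequence)."""
    count_cols = ", ".join(
        f"{t}_count INTEGER NOT NULL DEFAULT 0" for t in POST_IT_TYPES
    )
    conn.execute(f"CREATE TABLE PostIt (post_it_id INTEGER PRIMARY KEY, {count_cols})")
    conn.execute(
        "CREATE TABLE NASequence ("
        "na_sequence_id INTEGER PRIMARY KEY, "
        # post_it_id stays NULL while the row has no post-its
        "post_it_id INTEGER REFERENCES PostIt(post_it_id))"
    )


def add_post_it(conn, na_sequence_id, post_it_type):
    """Bump the reference count for one post-it type on a NASequence row."""
    if post_it_type not in POST_IT_TYPES:  # also guards the SQL built below
        raise ValueError(f"unknown post-it type: {post_it_type}")
    row = conn.execute(
        "SELECT post_it_id FROM NASequence WHERE na_sequence_id = ?",
        (na_sequence_id,),
    ).fetchone()
    if row is None:
        raise KeyError(na_sequence_id)
    post_it_id = row[0]
    if post_it_id is None:
        # First post-it for this row: create its PostIt row and link to it.
        post_it_id = conn.execute("INSERT INTO PostIt DEFAULT VALUES").lastrowid
        conn.execute(
            "UPDATE NASequence SET post_it_id = ? WHERE na_sequence_id = ?",
            (post_it_id, na_sequence_id),
        )
    conn.execute(
        f"UPDATE PostIt SET {post_it_type}_count = {post_it_type}_count + 1 "
        "WHERE post_it_id = ?",
        (post_it_id,),
    )
    return post_it_id


def post_it_counts(conn, na_sequence_id):
    """Return {type: count} for a row, or None if it has no post-its."""
    cols = ", ".join(f"p.{t}_count" for t in POST_IT_TYPES)
    row = conn.execute(
        f"SELECT {cols} FROM NASequence s "
        "JOIN PostIt p ON p.post_it_id = s.post_it_id "
        "WHERE s.na_sequence_id = ?",
        (na_sequence_id,),
    ).fetchone()
    return dict(zip(POST_IT_TYPES, row)) if row else None
```

A client can check post_it_id (or the counts) first and only query the soft-link tables whose count is non-zero.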
Dbxrefs
- why do we have specific linking tables, eg DbRefNASequence, instead of one big linking table?
- why do we have in-line links, eg source_id, external_database_release_id?
- in other words, should all links be handled uniformly?
- review how chado solves this problem
if we are going to have dedicated linking tables for DbRef:
- they should all be named consistently
- like this: NASequenceDbRef (ie, they belong in NASequence's category)
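The naming rule above is mechanical enough to script. A minimal sketch, assuming the only change needed is moving a leading "DbRef" (in any casing) to the end of the table name; the function name is hypothetical.

```python
def consistent_dbref_name(table_name):
    """Move a leading 'DbRef' (any casing) to the end of the table name."""
    prefix = "dbref"
    if table_name.lower().startswith(prefix) and len(table_name) > len(prefix):
        return table_name[len(prefix):] + "DbRef"
    # Already consistent, or not a DbRef linking table: leave unchanged.
    return table_name


for old in ("DbRefNASequence", "DBRefPfamEntry"):
    print(old, "->", consistent_dbref_name(old))
```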
Similarity
- GeneTrapAssembly table should be replaced with
- GeneTrapSequence, a subclass of NASequence
- a Similarity between GeneTrapSequence and an AssemblySequence
- Review why we don't have a unified treatment of similarity. In particular, why do we have BlatAlignment as a separate type of similarity? The basic fact is that things are similar. The technology-specific details can be relegated to subclasses. This will make Similarity even bigger than it already is, so we will need an optimization to deal with this.
- Similarity ideally should have links to a Sequence superclass, but we can't do that because we have only one level of subclassing
- the Similarity table will be big, and could be partitioned on types of similarity and/or types of participating sequences.
- integrate BlatAlignment. Its attributes block size, q_start, t_start hold span info that would go into SimilaritySpan
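To illustrate how the BLAT block attributes could be unpacked into span rows, here is a minimal sketch. It assumes the three attributes are stored as comma-separated lists of equal length (as in BLAT's PSL output), that blocks are ungapped with the same length on query and target, and that coordinates are 0-based starts with exclusive ends. The Span field names are hypothetical, not the actual SimilaritySpan columns.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    """One ungapped aligned block (hypothetical stand-in for a SimilaritySpan row)."""
    query_start: int
    query_end: int
    target_start: int
    target_end: int


def _parse_int_list(text: str) -> List[int]:
    # Tolerates a trailing comma, e.g. "10,5,".
    return [int(x) for x in text.split(",") if x.strip()]


def blat_blocks_to_spans(block_sizes: str, q_starts: str, t_starts: str) -> List[Span]:
    """Turn parallel block-size / start lists into one Span per block."""
    sizes = _parse_int_list(block_sizes)
    qs = _parse_int_list(q_starts)
    ts = _parse_int_list(t_starts)
    if not (len(sizes) == len(qs) == len(ts)):
        raise ValueError("block sizes, q starts and t starts must have equal length")
    return [Span(q, q + n, t, t + n) for n, q, t in zip(sizes, qs, ts)]


spans = blat_blocks_to_spans("10,5,", "0,20,", "100,150,")
```

Each resulting Span would become one row linked to the parent Similarity, which is what lets the BLAT-specific columns be dropped.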
Sequence
- Lose NAEntry table; merge its values into NASequence
- Lose GenomicSequence (lose sequencing_center_contact_id; it belongs, if anywhere, in ExternalDatabase)
- Does VirtualSequence need the sequence clob? If it does, then it doesn't need its own subclass. Add a bit to NASequence to indicate that it has pieces. Call that bit "has_assembly_pieces".
- lose crc32_value from ExternalAASequence
- lose all subclasses of AASequence except MotifAASequence
- use TranslatedAAFeatSeg to describe translation (shouldn't this be a Feature?)
- lose EST table. Create instead the ESTNASequence subclass of NASequence
Sequence types
we need to discuss what the dimensions of sequence type are:
- genomic, genetrap, transcript
- single/double stranded
- DNA RNA AA
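One way to make the discussion concrete is to treat each dimension as an independent enumeration and look at the combinations. This is a sketch only; the class and member names are hypothetical, and the three dimensions are simply the ones listed above.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Origin(Enum):
    GENOMIC = "genomic"
    GENETRAP = "genetrap"
    TRANSCRIPT = "transcript"


class Strandedness(Enum):
    SINGLE = "single"
    DOUBLE = "double"


class Molecule(Enum):
    DNA = "DNA"
    RNA = "RNA"
    AA = "AA"


@dataclass(frozen=True)
class SequenceType:
    """A sequence type as one value along each dimension."""
    origin: Origin
    strandedness: Strandedness
    molecule: Molecule


def all_sequence_types():
    """Enumerate every combination of the three dimensions."""
    return [
        SequenceType(o, s, m) for o, s, m in product(Origin, Strandedness, Molecule)
    ]


print(len(all_sequence_types()), "combinations")
```

Not every combination is meaningful (for example, double-stranded AA), so if the dimensions are modeled independently, the valid combinations would still need to be constrained somewhere.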
Controlled Vocabs
- should simple controlled vocabs, eg SimilarityQuality, be in their own table, with hard links to them?
- if so, which category should they go in, CV or the relevant category, if there is one?
Assembly
Other topics
Schemas
- return to the question of what Schemas we should have after we finalize the Categories.
- introduce application-specific schemas for, eg, GO prediction
Documentation
- we need a simple technology with which to edit the documentation of the tables.
- Trish will work with Congzhou to explore if ontology tools would be able to give us multi-dimensional ways to document the schema (eg, dependency graph, category graph, use case graph)
we need to upgrade the schema browser to
- display category
- sort by category
- limit by category
- limit by keyword (searched against any combo of: table name, category, table doc, attribute doc)
- linking tables are named like so: table1table2. they should go into the category that table1 belongs to
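The schema browser upgrades listed above (sort by category, limit by category, limit by keyword over a chosen combination of fields) could be prototyped roughly as follows. Everything here is an assumption for illustration: the TableDoc record, the browse function, and the sample documentation strings are made up and do not reflect the actual browser code or table docs.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Sequence


@dataclass
class TableDoc:
    """Hypothetical record holding what the browser knows about one table."""
    name: str
    category: str
    doc: str = ""
    attribute_docs: Dict[str, str] = field(default_factory=dict)


# Fields a keyword may be searched against.
SEARCHABLE = ("name", "category", "doc", "attribute_docs")


def _field_text(table: TableDoc, field_name: str) -> str:
    value = getattr(table, field_name)
    if isinstance(value, dict):
        return " ".join(value.values())  # all attribute docs as one string
    return value


def browse(
    tables: Iterable[TableDoc],
    category: Optional[str] = None,
    keyword: Optional[str] = None,
    search_in: Sequence[str] = SEARCHABLE,
) -> List[TableDoc]:
    """Limit by category and/or keyword, then sort by category and name."""
    hits = []
    for t in tables:
        if category is not None and t.category != category:
            continue
        if keyword is not None:
            needle = keyword.lower()
            if not any(needle in _field_text(t, f).lower() for f in search_in):
                continue
        hits.append(t)
    return sorted(hits, key=lambda t: (t.category, t.name))


# Made-up sample docs, for illustration only.
sample = [
    TableDoc("SimilaritySpan", "Similarity", "one aligned span"),
    TableDoc("Clone", "Clones", "a clone", {"clone_id": "primary key"}),
    TableDoc("Similarity", "Similarity"),
]
for t in browse(sample, category="Similarity"):
    print(t.category, t.name)
```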





