Figure 2 | Source Code for Biology and Medicine

Figure 2

From: HitKeeper, a generic software package for hit list management


Schematic representation of the sequence and motif pipelines. Several successive versions of a given source database usually coexist at different stages of a pipeline. The databases are processed by three scripts that run concurrently, much like system daemons. HKLoader watches the source data files for changes (using their date/time stamps); it parses and converts the raw data, detects redundancy, and transfers the "clean" data into the SQL database. HKUpdater updates the hit list: once a motif database enters the prepare state, the new motifs are computed against the sequences in the current state; similarly, once a sequence database enters the prepare state, the new sequences are computed against the motifs in the current state. The two computational tasks, sequences-vs-motifs and motifs-vs-sequences, are never executed simultaneously, which keeps the two pipelines synchronized. Once the calculations are done, HKPublisher deploys the databases to external computing elements (e.g. a BLAST server), and the database flagged as ready is promoted to current ("in production"): all subsequent queries are applied to this database. Previous versions can be kept as archives or deleted to reclaim space.
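HKLoader's change detection and redundancy filtering can be illustrated with a minimal Python sketch. It assumes FASTA-formatted source files, an SQLite database standing in for the SQL back end, periodic polling of modification times, and redundancy defined as identical residue strings (detected via an MD5 digest). The function and table names (`changed_files`, `parse_fasta`, `load_file`, `sequence`) are hypothetical and are not taken from HitKeeper's code.

```python
import hashlib
import os
import sqlite3


def changed_files(paths, last_seen):
    """Return the paths whose modification time differs from the one stored
    in last_seen (path -> mtime), updating last_seen in place. Assumed to be
    called periodically, like a polling daemon."""
    changed = []
    for path in paths:
        mtime = os.path.getmtime(path)
        if last_seen.get(path) != mtime:
            last_seen[path] = mtime
            changed.append(path)
    return changed


def parse_fasta(text):
    """Yield (identifier, sequence) pairs; residues are upper-cased."""
    ident, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if ident is not None:
                yield ident, "".join(chunks)
            ident, chunks = line[1:].strip(), []
        elif line and ident is not None:
            chunks.append(line.upper())
    if ident is not None:
        yield ident, "".join(chunks)


def load_file(conn, path):
    """Parse one source file and store only non-redundant sequences.
    Returns the number of newly stored sequences."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequence ("
        "digest TEXT PRIMARY KEY, ident TEXT, residues TEXT)"
    )
    with open(path) as handle:
        records = list(parse_fasta(handle.read()))
    inserted = 0
    for ident, residues in records:
        # Hypothetical redundancy rule: identical residues share a digest.
        digest = hashlib.md5(residues.encode()).hexdigest()
        seen = conn.execute(
            "SELECT 1 FROM sequence WHERE digest = ?", (digest,)
        ).fetchone()
        if seen is None:
            conn.execute(
                "INSERT INTO sequence VALUES (?, ?, ?)",
                (digest, ident, residues),
            )
            inserted += 1
    conn.commit()
    return inserted
```

A driver loop would call `changed_files` on a timer and pass each returned path to `load_file`; reloading an unchanged file stores nothing new, because every sequence is already present under its digest.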

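The version lifecycle (prepare, ready, current, archive) and the rule that the two computations never overlap can be modelled as a small state machine guarded by a single lock. This is a hypothetical sketch, not HitKeeper's implementation: `Pipeline`, `update`, `publish`, and the `compare` callback are illustrative names, states are kept in memory rather than in the SQL database, and deployment to external servers is left out.

```python
import threading
from enum import Enum


class State(Enum):
    PREPARE = "prepare"
    READY = "ready"
    CURRENT = "current"
    ARCHIVE = "archive"


class Pipeline:
    """Toy model of the motif and sequence pipelines (hypothetical names).
    versions maps (kind, version) -> State; kind is "motif" or "sequence"."""

    def __init__(self):
        self.versions = {}
        self.hits = []
        # One lock shared by both directions, so sequences-vs-motifs and
        # motifs-vs-sequences never run at the same time.
        self._compute_lock = threading.Lock()

    def _require(self, kind, version, state):
        if self.versions.get((kind, version)) is not state:
            raise ValueError(f"{kind} {version} is not in state {state.value}")

    def add(self, kind, version):
        """A freshly loaded database version enters the prepare state."""
        self.versions[(kind, version)] = State.PREPARE

    def current(self, kind):
        return [v for (k, v), s in self.versions.items()
                if k == kind and s is State.CURRENT]

    def update(self, kind, version, compare):
        """HKUpdater step: compare a prepare-state database against the
        current databases of the other kind, then flag it as ready."""
        self._require(kind, version, State.PREPARE)
        other = "sequence" if kind == "motif" else "motif"
        with self._compute_lock:
            for target in self.current(other):
                self.hits.extend(compare(kind, version, other, target))
        self.versions[(kind, version)] = State.READY

    def publish(self, kind, version, keep_archive=True):
        """HKPublisher step: promote a ready database to current and either
        archive or delete the previous current version."""
        self._require(kind, version, State.READY)
        for old in self.current(kind):
            if keep_archive:
                self.versions[(kind, old)] = State.ARCHIVE
            else:
                del self.versions[(kind, old)]
        self.versions[(kind, version)] = State.CURRENT
```

Because both directions of `update` acquire the same lock, the two computations are serialized even when they are called from different threads.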