Installation guide

There are several minor versions of Roddy, which can be downloaded and installed in the same directory. Minor versions mark changes in the Roddy API, which may lead to incompatibilities between Roddy and Roddy plugins. The installation differs slightly between versions, so each version is described separately below.

Roddy uses Groovy, which is slow to start. To speed things up, Roddy 2.4 and later supports GroovyServ. Roddy tries to install GroovyServ on its own. If that fails, you can still set it up yourself, and if it still does not work, you can disable it.

Prerequisites

To install and run Roddy the following programs need to be installed on your computer:

  • zip / unzip
  • git
  • bash
  • lockfile (part of the procmail mail-processing package, v3.22)

As Roddy is Linux-based, you will find all of them in your distribution's package manager.
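Before you continue, you can check that these programs are available. The following is a minimal Bash sketch; check_prerequisites is a hypothetical helper of our own, not part of Roddy.

```bash
# Hypothetical helper (not part of Roddy): print one line per program that is
# not found on the PATH. Returns 0 if all programs are present, 1 otherwise.
check_prerequisites() {
    local missing=0 prog
    for prog in "$@"; do
        if ! command -v "$prog" >/dev/null 2>&1; then
            echo "missing: $prog"
            missing=1
        fi
    done
    return "$missing"
}

# The programs listed above; lockfile is provided by the procmail package.
REQUIRED_PROGRAMS=(zip unzip git bash lockfile)
```

For example, check_prerequisites "${REQUIRED_PROGRAMS[@]}" prints nothing and succeeds if everything is installed.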

Roddy 2.2

Roddy 2.2 will not be supported in the future. Releases are only available for legacy plugins.

Roddy 2.3

  1. Clone the repository and check out the desired tag.
  2. The next step depends on your role:
     • If you intend to use Roddy but do not want to develop plugins or Roddy itself:
        • Download any JRE v1.8.* (OpenJDK and SunJDK were tested). Also download Groovy 2.4.* [1]
        • Open the dist folder in the Roddy directory.
        • Create a folder named runtimeDevel.
        • Unzip / untar both archives in runtimeDevel.
     • If you want to develop Roddy or Roddy plugins:
        • Download any JDK v1.8.* (OpenJDK and SunJDK were tested). Also download Groovy 2.4.* [1]
        • Open the dist folder in the Roddy directory.
        • Create a folder named runtimeDevel.
        • Unzip / untar both archives in runtimeDevel.
  3. Optionally, unpack one or more of the release zips in the dist/bin/ directory.
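The unpacking in step 2 can also be scripted. A minimal Bash sketch, assuming the downloaded Java and Groovy archives end in .zip, .tar.gz or .tgz; unpack_runtime is a hypothetical helper, not part of Roddy.

```bash
# Hypothetical helper: unpack the downloaded Java and Groovy archives into
# <roddy_dir>/dist/runtimeDevel, as described in step 2.
unpack_runtime() {
    local roddy_dir=$1; shift
    local target="$roddy_dir/dist/runtimeDevel"
    mkdir -p "$target" || return 1
    local archive
    for archive in "$@"; do
        case "$archive" in
            *.zip)          unzip -q "$archive" -d "$target" ;;
            *.tar.gz|*.tgz) tar -xzf "$archive" -C "$target" ;;
            *) echo "unsupported archive: $archive" >&2; return 1 ;;
        esac || return 1
    done
}
```

Usage, with placeholder archive names: unpack_runtime ~/Roddy ~/Downloads/[jdk archive] ~/Downloads/[groovy archive]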

Please see Roddy version mix for information about how to mix different versions of Roddy in the same directory.

Roddy 2.4

Roddy version 2.4 is installed in the same way as 2.3. In addition, there will be an automatic downloader for JRE / JDK and Groovy. If you want to use this, you can skip the download steps.

[1] If you cannot find the necessary Groovy version, you can also download it from the Maven Groovy repository.

Roddy version mix

Different Roddy versions can be co-installed in the same installation folder. We currently do not offer prepackaged zip files, but you can easily assemble the version mix you need.

  1. Install Roddy as described above.
  2. Switch to the desired release tag.
  3. Run Roddy with the pack option:
./roddy.sh pack
  4. Switch back to master / develop, or repeat steps 1 - 3 for additional Roddy versions.

If you take a look into your dist folder now, you’ll see a new zip file and a folder with the proper version numbers.
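When you need several versions, switching tags and packing can be looped. A minimal Bash sketch: pack_versions and run_cmd are hypothetical helpers, the tags you pass must exist in your clone, and with DRY_RUN=1 the commands are only printed instead of executed.

```bash
# Hypothetical helper: for each release tag, check it out and run
# "./roddy.sh pack", then switch back to the given branch.
# Call it from inside the Roddy repository.
pack_versions() {
    local return_branch=$1; shift
    local tag
    for tag in "$@"; do
        run_cmd git checkout "$tag" || return 1
        run_cmd ./roddy.sh pack || return 1
    done
    run_cmd git checkout "$return_branch"
}

# Execute a command, or only print it when DRY_RUN=1.
run_cmd() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

It is a good idea to review the commands first, e.g. DRY_RUN=1 pack_versions develop [tag1] [tag2], where the tags are placeholders.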

Setup GroovyServ

As explained above, GroovyServ significantly decreases the startup time of Groovy applications, and Roddy will try to download and set it up automatically. If that fails, or if you want to set it up yourself, do the following in your Roddy directory:

mkdir -p dist/runtime
cd dist/runtime

# Download the GroovyServ binary zip archive from the GroovyServ download site
# into this folder, then unzip it and delete the archive.

unzip groovyserv*.zip
rm groovyserv*.zip

# Finally, add the Groovy and Java binary folders to your PATH environment
# variable, e.g. in your ~/.bashrc file.

That’s it. To disable GroovyServ, create an empty file named gservforbidden in dist/runtime:

mkdir -p dist/runtime
cd dist/runtime
touch gservforbidden

If you create the file, Roddy will not use GroovyServ.
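If you want to switch GroovyServ off and on again from a script, you only need to create or remove this marker file. A minimal sketch; set_groovyserv is a hypothetical helper, and only the file name gservforbidden and its location come from Roddy.

```bash
# Hypothetical helper: enable or disable GroovyServ for a Roddy installation
# by removing or creating the marker file dist/runtime/gservforbidden.
set_groovyserv() {
    local roddy_dir=$1 mode=$2
    local marker="$roddy_dir/dist/runtime/gservforbidden"
    case "$mode" in
        off) mkdir -p "${marker%/*}" && touch "$marker" ;;
        on)  rm -f "$marker" ;;
        *)   echo "usage: set_groovyserv <roddy_dir> on|off" >&2; return 1 ;;
    esac
}
```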

Note

This setup was tested using GroovyServ 1.1.0!

Test your installation

Change to the Roddy directory and run:

./roddy.sh

If everything is properly done, Roddy will print its help screen.

Example workflow

If you want to try out Roddy, you can download our example workflow. The workflow is wrapped inside a Docker container, so you can use it to test some of Roddy's functionality in a controlled environment. The workflow itself performs somatic small indel calling. It is based on Platypus, accepts paired control and tumor BAM files, and writes its output in VCF format.

Installation

Make sure you have a running Docker environment. Then open the de.NBI / HD-HuB ownCloud repository.

Download the Docker images:

  • The base image for our example: roddybaseimage.tar.gz
  • The workflow image itself: roddyplatypus.tar.gz

and import them into your Docker environment.

Also download:

  • The workflow dependencies: PlatypusIndelCallingWorkflowDependencies.tar.gz
  • The scripts to run the workflow: PlatypusIndelCallingBundle.tar.gz

Unpack the scripts archive; this creates the bundle directory. Unpack the dependencies archive and move the folder dependenciesPlatypusIndel/ into the bundle directory. Create a working directory and give it open access rights, e.g. with chmod 777.
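These steps can be scripted roughly as follows. This is a sketch under assumptions: import_images and setup_bundle are hypothetical helpers; the scripts archive is assumed to contain a single top-level bundle directory, whose name is read from the archive; and the dependencies archive is assumed to contain dependenciesPlatypusIndel/ at its top level, so extracting it directly into the bundle is equivalent to unpacking and moving it. import_images needs a running Docker daemon and the two image files in the current directory.

```bash
# Import the two Docker images (requires a running Docker daemon).
import_images() {
    docker load -i roddybaseimage.tar.gz && docker load -i roddyplatypus.tar.gz
}

# Hypothetical helper: unpack the scripts bundle into <dest> (default: .),
# add the dependencies to it and create an open working directory.
# Prints the path of the bundle directory.
setup_bundle() {
    local scripts_tgz=$1 deps_tgz=$2 workdir=$3 dest=${4:-.}
    local bundle
    # Bundle directory name = first path component of the first archive entry.
    bundle=$(tar -tzf "$scripts_tgz" | awk -F/ 'NR == 1 { print $1 }')
    [ -n "$bundle" ] || return 1
    tar -xzf "$scripts_tgz" -C "$dest" || return 1
    # Assumes dependenciesPlatypusIndel/ is at the top level of the archive.
    tar -xzf "$deps_tgz" -C "$dest/$bundle" || return 1
    [ -d "$dest/$bundle/dependenciesPlatypusIndel" ] || return 1
    mkdir -p "$workdir" && chmod 777 "$workdir" || return 1
    echo "$dest/$bundle"
}
```

Usage, with a placeholder working directory: setup_bundle PlatypusIndelCallingBundle.tar.gz PlatypusIndelCallingWorkflowDependencies.tar.gz ~/platypusWork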

Now you only need the files you want to analyse. For this example, you need a control and a tumor BAM file plus their index files. The BAM files need to be aligned with BWA (we used versions >= 0.7.8) against hs37d5, and duplicate marking should be turned on.
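Before starting a run that takes hours, it is worth checking that both BAM files have their index files. A minimal sketch; check_bam_indices is a hypothetical helper, and it assumes the common naming conventions [name].bam.bai or [name].bai for the index.

```bash
# Hypothetical helper: succeed only if every given BAM file exists and has an
# index next to it, named either <name>.bam.bai or <name>.bai.
check_bam_indices() {
    local bam rc=0
    for bam in "$@"; do
        if [ ! -f "$bam" ]; then
            echo "missing BAM: $bam"; rc=1
        elif [ ! -f "$bam.bai" ] && [ ! -f "${bam%.bam}.bai" ]; then
            echo "missing index for: $bam"; rc=1
        fi
    done
    return "$rc"
}
```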

Example usage

The Docker container uses a slightly simplified Roddy syntax. Change to the extracted bundle directory, where you will find the roddy.sh script.

You can call the script in the following way:

bash roddy.sh (mode) (dataset id) (control bam) (tumor bam) (work directory)

So to just run the example:

bash roddy.sh run TEST [PATH_TO_YOUR_CONTROL] [PATH_TO_YOUR_TUMOR] [PATH_TO_YOUR_WORKING_DIR]

If everything is set up properly, the Roddy Docker container will now start and run the workflow. The workflow will take several hours to finish, so make sure to run it in e.g. a screen session.

Users guide

Walkthrough

This guide shows you how to set up Roddy so that it starts and is ready to run an analysis for a project. A sample NGS workflow is available and is used in the examples.

For a short overview of Roddy usage, see the Cheat sheet.

If you do not already have a running installation, please see Installation guide for instructions to install Roddy.

After installing Roddy, please head to the Roddy folder and run the Roddy start script:

bash roddy.sh

If everything is fine, Roddy will start and print its help screen:

Roddy is supposed to be a rapid development and management platform for cluster based workflows.
The current supported ways of execution are:
 - job submission using qsub with PBS or SGE
 - monolithic, direct execution of jobs via Roddy
 - submission or execution on the local machine or via SSH

  To support you with your workflows, Roddy offers you several options:

  help
        Shows a list of available configuration files in all configured paths.

[...]

    --usePluginVersion=(...,...)    - Supply a list of used plugins and versions.


Roddy version 2.2.78 build at Fri Sep 18 13:55:26 CEST 2015

Now you can go on and prepare the configuration for your project.

Setup Roddy Configuration

You need two types of configurations:

  1. Application configuration file (by default applicationProperties.ini): An ini-formatted file that configures properties of the Roddy application, the batch processing system (PBS, SGE, etc.), default paths for configurations and plugins, etc.
  2. An XML configuration file for your project with all parameters of the workflow that you want to use.

Application ini file

Roddy uses an ini file to control the application behaviour. The ini file defines several things:

  • Which job system you use
  • How you connect to the processing system
  • Where Roddy shall search for plugins and configuration files

By default, Roddy uses the ini file located at $HOME/.roddy/applicationProperties.ini, but you can select any other file with the --useconfig command-line option.

The ini files are explained in detail in Application properties files. Here you’ll see a brief overview:

[COMMON]
useRoddyVersion=current                     # Use the most current version for tests

[DIRECTORIES]
configurationDirectories=[FOLDER_WITH_CONFIGURATION_FILES]
pluginDirectories=[FOLDER_WITH_PLUGINS]

[COMMANDS]
jobManagerClass=de.dkfz.roddy.execution.jobs.direct.synchronousexecution.DirectSynchronousExecutionJobManager
#jobManagerClass=de.dkfz.roddy.execution.jobs.cluster.pbs.PBSJobManager
#jobManagerClass=de.dkfz.roddy.execution.jobs.cluster.sge.SGEJobManager
#jobManagerClass=de.dkfz.roddy.execution.jobs.cluster.slurm.SlurmJobManager
#jobManagerClass=de.dkfz.roddy.execution.jobs.cluster.lsf.rest.LSFRestJobManager
commandFactoryUpdateInterval=300
commandLogTruncate=80                       # Truncate logged commands to this length. If <= 0, then no truncation.

[COMMANDLINE]
executionServiceUser=USERNAME
executionServiceClass=de.dkfz.roddy.execution.io.LocalExecutionService
#executionServiceClass=de.dkfz.roddy.execution.io.SSHExecutionService
executionServiceHost=[YOURHOST]
executionServiceAuth=keyfile
#executionServiceAuth=password
executionServicePasswd=
executionServiceStorePassword=false
executionServiceUseCompression=false
fileSystemInfoProviderClass=de.dkfz.roddy.execution.io.fs.FileSystemInfoProvider

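The file is divided into several sections, mainly to keep things organized:

  • COMMON is for setting up general things
  • DIRECTORIES sets the search paths for configuration files and plugins
  • COMMANDS selects the job manager (the processing backend) and configures command handling, e.g. log truncation
  • COMMANDLINE sets up the command line interface, including how Roddy connects to the execution host (locally or via SSH)

We try to list every possible option in the ini file, so you should only need to select what you need and fill in the missing parts.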

Usually, you just need to change the following settings:

  • jobManagerClass - Selects the cluster system backend
  • CLI.executionServiceClass - Selects, if you want to access your system via SSH or directly
  • CLI.executionServiceAuth - keyfile or password?
  • CLI.executionServiceHost - The host, if you select SSH
  • CLI.executionServicePasswd - The password for your system, if using SSH and no keyfiles
  • CLI.executionServiceStorePassword - Set to true to store the password. Note that the password is then stored in plain text!
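To double-check which value one of these settings currently has, a small awk-based reader is enough. In the example file, the CLI.* keys live in the [COMMANDLINE] section. This is a sketch that assumes the simple key=value layout shown above with optional trailing # comments; ini_get is a hypothetical helper, not Roddy's own ini parser, and it would also cut a value that legitimately contains #.

```bash
# Hypothetical helper: print the value of <key> in [<section>] of an ini file.
# A trailing "# comment" and surrounding whitespace are stripped.
ini_get() {
    local file=$1 section=$2 key=$3
    awk -v section="[$section]" -v key="$key" '
        /^\[.*\]$/ { in_section = ($0 == section); next }
        in_section && index($0, key "=") == 1 {
            value = substr($0, length(key) + 2)
            sub(/[ \t]*#.*$/, "", value)
            sub(/^[ \t]+/, "", value); sub(/[ \t]+$/, "", value)
            print value
            exit
        }
    ' "$file"
}
```

For example, ini_get ~/.roddy/applicationProperties.ini COMMANDLINE executionServiceClass prints the selected execution service. Commented-out lines such as #jobManagerClass=... are ignored.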

You can keep the above settings for future use, as they are unlikely to change often. The settings you will probably change more often are:

  • configurationDirectories - A comma-separated list of the directories where you keep your project XML files
  • pluginDirectories - A comma-separated list of the directories where your plugins are stored. Note that the folder dist/plugins in the Roddy base directory, which contains the PluginBase and DefaultPlugin, is always imported. You do not need to set it.

You can either copy the content from above or you can also use Roddy to help you with the setup. This will be explained later on.
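If you prefer to write the file from a script, the following sketch writes a minimal ini for local, direct execution, using only keys and class names from the example above. The helper name write_app_ini and the choice of local execution are our assumptions: the SSH-only keys (host, password) are left empty, and for a cluster you would switch jobManagerClass and the execution service as described above.

```bash
# Hypothetical helper: write a minimal applicationProperties.ini for local,
# direct execution. Keys and class names are taken from the example above.
write_app_ini() {
    local target=$1 config_dirs=$2 plugin_dirs=$3
    mkdir -p "$(dirname "$target")" || return 1
    cat > "$target" <<EOF
[COMMON]
useRoddyVersion=current

[DIRECTORIES]
configurationDirectories=$config_dirs
pluginDirectories=$plugin_dirs

[COMMANDS]
jobManagerClass=de.dkfz.roddy.execution.jobs.direct.synchronousexecution.DirectSynchronousExecutionJobManager
commandFactoryUpdateInterval=300
commandLogTruncate=80

[COMMANDLINE]
executionServiceUser=${USER:-$(id -un)}
executionServiceClass=de.dkfz.roddy.execution.io.LocalExecutionService
executionServiceHost=
executionServiceAuth=keyfile
executionServicePasswd=
executionServiceStorePassword=false
executionServiceUseCompression=false
fileSystemInfoProviderClass=de.dkfz.roddy.execution.io.fs.FileSystemInfoProvider
EOF
}
```

Point Roddy at the resulting file with the --useconfig option.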

Project configuration files

All workflow-specific settings are stored in XML files.

The configuration files are multi-level, which means you can:

  • Import configuration files into other configuration files
  • Define several levels of configurations and subconfigurations in one file

<configuration configurationType='project'
         name='TestProject'
         description='A very small project configuration for some workflow tests.'
         imports="baseProject"
         usedresourcessize="m">
    <availableAnalyses>
        <analysis id='testWorkflow' configuration='TestAnalysis' useplugin="DefaultPlugin:current"/>
        <analysis id='qualityControl' configuration='QualityControlAnalysis' useplugin="QualityControlPlugin:1.0.10"/>
    </availableAnalyses>
    <configurationvalues>
        <cvalue name='inputBaseDirectory' value='$USERHOME/roddyTests/${projectName}/data' type='path'/>
        <cvalue name='outputBaseDirectory' value='$USERHOME/roddyTests/${projectName}/results' type='path'/>
    </configurationvalues>
    <subconfigurations>
        <configuration name="verysmall" usedresourcessize="xs" inheritAnalyses="true" />
    </subconfigurations>
</configuration>

As a user, you normally only need to create a project-specific file like the one above. Roddy also offers a command to help you set it up.

Configuration files contain several sections in which Roddy lets you define things like configuration values, tools and even filenames. You probably won’t need most of that now, so we’ll concentrate on a very basic project configuration like the one above. A detailed guide is available in XML configuration files. Focus on the configuration values part, as this is the part you will probably need most.

Uhhh, ok, so what is in the above example?

Good that you ask! First, you’ll find the configuration header in standard XML format. If the file is a project configuration file, its file name must start with the prefix “projects”; otherwise, Roddy will not recognize it as a project configuration. (Besides project configurations, you could e.g. also create a file containing basic settings for your working environment, like commonly used binaries and reference files.)

<configuration configurationType='project'
                     name='TestProject'
                     description='A very small project configuration for some workflow tests.'
                     imports="baseProject"
                     usedresourcessize="m">

The header of the configuration must contain:

  • The configurationType (in this case “project”)
  • A name, which must not contain “.” or spaces

It may contain:

  • A description
  • Imports for other configuration files. imports can hold a comma-separated list of other configuration ids / names
  • A switch for the size of the data you are dealing with (usedresourcessize). In the analysis configuration, every tool can have different levels of resources in memory, CPU and walltime. This option in the project XML lets you select a project-wide resource requirement level for the size of the input data expected in the project. Allowed values are t, xs, s, m, l and xl; the default is “l”.
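These naming rules are easy to violate by accident. The following minimal sketch checks a project XML file before you use it: the “projects” file name prefix, configurationType='project', and a name without “.” or spaces. check_project_xml is a hypothetical helper; it works on the text with grep and sed rather than with an XML parser, and it takes the first name attribute in the file as the header name, which holds for the layout used in this guide.

```bash
# Hypothetical pre-flight check for a project XML file (not part of Roddy).
# Text-based: the first name='...' attribute is taken as the header name.
check_project_xml() {
    local file=$1 base name
    base=$(basename "$file")
    case "$base" in
        projects*) ;;
        *) echo "file name must start with 'projects': $base"; return 1 ;;
    esac
    if ! grep -q "configurationType=['\"]project['\"]" "$file"; then
        echo "no configurationType='project' in $base"; return 1
    fi
    name=$(sed -n "s/.*[[:space:]]name=['\"]\([^'\"]*\)['\"].*/\1/p" "$file" | awk 'NR == 1')
    case "$name" in
        "")        echo "no name attribute found in $base"; return 1 ;;
        *.*|*" "*) echo "invalid configuration name: '$name'"; return 1 ;;
    esac
    echo "ok: $name"
}
```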

Directly after the header, you will find a list of the imported workflows for your project.

<availableAnalyses>
    <analysis id='testWorkflow' configuration='TestAnalysis' useplugin="DefaultPlugin:current"/>
    <analysis id='qualityControl' configuration='QualityControlAnalysis' useplugin="QualityControlPlugin:1.0.10"/>
</availableAnalyses>

Each line can enable a workflow / analysis for your project. To make such a line work, you need to set:

  • id an arbitrary name that identifies the workflow in your project. This name will be used to call the workflow from the command line.
  • configuration to identify the original analysis configuration id that is defined in the analysis XML in the plugin. You can also import an analysis several times with a different id value.
  • finally, useplugin selects the plugin and the plugin version in which the analysis is searched. This parameter is optional.

The corresponding configuration files are automatically searched for in your plugins. The active plugins are retrieved from the plugin directories set in your application ini file.
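Since the id is what you type on the command line, it can be handy to list the selectors a project file enables. A minimal sketch; list_analysis_selectors is a hypothetical helper, and it assumes one analysis element per line with id as its first attribute, as in the example above. It prints selectors in the [project]@[analysis] form used by the commands later in this guide.

```bash
# Hypothetical helper: print "<project>@<analysis id>" for every
# <analysis id='...'> element in a project XML file.
list_analysis_selectors() {
    local project=$1 file=$2
    sed -n "s/.*<analysis[[:space:]][[:space:]]*id=['\"]\([^'\"]*\)['\"].*/\1/p" "$file" |
        awk -v project="$project" '{ print project "@" $0 }'
}
```

For the example configuration, list_analysis_selectors TestProject [project XML file] prints TestProject@testWorkflow and TestProject@qualityControl.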

Next comes the part where you set the project's input and output folders.

<configurationvalues>
    <cvalue name='inputBaseDirectory' value='$USERHOME/roddyTests/${projectName}/data' type='path'/>
    <cvalue name='outputBaseDirectory' value='$USERHOME/roddyTests/${projectName}/results' type='path'/>
</configurationvalues>

In most cases, you are done at this point.

Analysis-specific configuration

Occasionally, you may want to set specific parameters only for selected analyses. In this case you can add subconfigurations:

<subconfigurations>
    <configuration name="verysmall" usedresourcessize="xs" inheritAnalyses="true" />
</subconfigurations>

Subconfigurations are defined exactly like the main configuration and can contain the same sections. Each value you define in a subconfiguration overrides the corresponding value of the parent configuration. Subconfigurations can be nested and affect all configurations nested within them.

Built-in configuration creation / updates

Use Roddy to create an initial project configuration

Roddy can help you to create an initial project configuration with one command.

bash roddy.sh prepareprojectconfig create [targetprojectfolder] --useRoddyVersion=current

The command will:

  1. Create a target folder structure like [targetprojectfolder]/roddyProject/versions/version_[current date]_[current time]
  2. Copy a default ini file to the target folder [targetprojectfolder]/applicationProperties.ini
  3. Copy a default project XML to the target folder [targetprojectfolder]/project.xml

You can now adapt both the ini file and the XML file to your needs. Do not forget to add the freshly created folder to the configuration directories (configurationDirectories) in the ini file! Please see the explanation above to decide which settings are appropriate for your system.

To use the ini file, you can call Roddy in the following way:

bash roddy.sh --useconfig=[targetprojectfolder]/applicationProperties.ini

Use Roddy to update an existing project configuration to a new version

Sometimes it is helpful to keep several versions of project configuration files. This ensures that you can always go back to an old version of your configuration. To support this, you can call Roddy in the following way:

bash roddy.sh prepareprojectconfig update [targetprojectfolder]

Roddy will then search the latest existing project configuration version and create a new folder with a copy in it.

So after you call Roddy, you’ll find e.g.:

  • [targetprojectfolder]/roddyProject/versions/version_20150719_111328 and
  • [targetprojectfolder]/roddyProject/versions/version_20150925_134527

The new folder will contain a copy of the contents of the old folder. You can call Roddy afterwards with the new ini file.
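To illustrate what the update does to the folder layout, here is a minimal sketch that mimics it: find the newest version_* folder and copy it to a new folder named after the current date and time. This is not Roddy's implementation; new_config_version is a hypothetical helper, and it relies on the version_YYYYMMDD_HHMMSS naming shown above, which sorts chronologically as plain text.

```bash
# Hypothetical sketch mimicking "prepareprojectconfig update" (not Roddy's code):
# copy the newest version_YYYYMMDD_HHMMSS folder below
# <targetprojectfolder>/roddyProject/versions to a new, time-stamped folder
# and print the path of the new folder.
new_config_version() {
    local versions="$1/roddyProject/versions" latest new d
    # With this naming scheme, byte-wise sorting is chronological.
    latest=$(for d in "$versions"/version_*; do [ -d "$d" ] && printf '%s\n' "$d"; done |
             LC_ALL=C sort | tail -n 1)
    if [ -z "$latest" ]; then
        echo "no existing version folder in $versions" >&2
        return 1
    fi
    new="$versions/version_$(date +%Y%m%d_%H%M%S)"
    if [ -e "$new" ]; then
        echo "$new already exists" >&2
        return 1
    fi
    cp -R "$latest" "$new" && printf '%s\n' "$new"
}
```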

IMPORTANT: Roddy does not update the configurationDirectories option in the new applicationProperties.ini. As of now, you need to manually adapt the configuration directories in the ini file!
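One way to adapt the option is a small sed call. This sketch assumes the ini layout shown earlier, with configurationDirectories=... on a single line; set_config_dirs is a hypothetical helper. It writes to a temporary file first, so it behaves the same with GNU and BSD sed, and then copies the result back to keep the file's permissions.

```bash
# Hypothetical helper: set configurationDirectories in an ini file to the given
# comma-separated list. The value must not contain "|" or "&", which sed
# would interpret in the replacement.
set_config_dirs() {
    local ini=$1 dirs=$2 tmp
    tmp=$(mktemp) || return 1
    sed "s|^configurationDirectories=.*|configurationDirectories=$dirs|" "$ini" > "$tmp" &&
        cat "$tmp" > "$ini"
    local rc=$?
    rm -f "$tmp"
    return "$rc"
}
```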

Check if things are set up properly

With configurations of complex workflows, it can become tedious and error-prone to ensure that everything is configured correctly. If you work with multiple projects, the first thing to check is whether the correct configuration files are used. To check that you did everything right, Roddy offers several options:

bash roddy.sh showconfigpaths  --useconfig=[pathOfIniFile]

This shows all available configuration files in your configured paths. Note that this won’t list analysis XML files, as these are loaded at a later stage, when Roddy knows which plugins are loaded.

With the following command you can check, whether you set the right paths and if all your files are available:

bash roddy.sh listdatasets [project]@[analysis] --useconfig=[pathOfIniFile]

Note

Roddy supports parsing metadata such as dataset identifiers from paths but additionally has a MetadataTable facility that simplifies metadata input via a table. Some workflows may also be implemented to get the metadata from dedicated configuration values. Therefore, whether this command works may depend on the specific workflow and may require additional command-line parameters or configuration values. Still it can be extremely useful to get a list of all findable datasets.

If everything is set properly and you use the right configuration and analysis, Roddy will search the input and output folders set in your project configuration file and display a list of all datasets found. Roddy searches both folders and combines the results, so you will not get duplicates. If you see the list of your datasets, you could run your analysis now, but there are a few more things you can check first.

bash roddy.sh printruntimeconfig [project]@[analysis] [pid] --useconfig=[pathOfIniFile]

If you run a workflow for the first time, it might make sense to check the generated runtime configuration file before you start a process. The above command will do that for the pid set by you. Is everything right? Good, then you can go on and start a process. If not, you need to check your configuration files.

Run a project

There is one more thing you can do before starting a process: You can call Roddy with testrun:

bash roddy.sh testrun [project]@[analysis] [pattern]/[ALL] --useconfig=[pathOfIniFile]

testrun does nearly the same thing as run, except that it does not start cluster jobs. Instead, it lists all the jobs that would be executed. Please take a close look at the output for all the jobs. testrun and all the other run commands are triggered with a dataset id pattern, which is explained next.

About the dataset patterns: Roddy selects and lists datasets like e.g. ls does. This means you can use all sorts of wildcards and patterns. Valid patterns are e.g. H063, *-A*, ???3- and so on. But keep in mind that wildcards may already be expanded by the shell (Bash is always good for surprises). testrun will help you find out whether the patterns you use are working. Also note that a plain * won’t work, at least in Bash. If you want to run all datasets, use the dataset selector [ALL].
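The following demonstrates the shell expansion problem. In a directory that happens to contain files matching the pattern, Bash expands the unquoted pattern before the called program ever sees it. show_args and demo_pattern_quoting are hypothetical helpers; show_args only prints the arguments it receives and stands in for roddy.sh.

```bash
# Stand-in for roddy.sh: print each argument it receives on its own line.
show_args() { printf '%s\n' "$@"; }

# Run the comparison in a scratch directory containing two matching files.
demo_pattern_quoting() {
    local dir
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        touch H006-A1 H007-A2             # files that match *-A*
        echo "unquoted:"; show_args *-A*  # Bash expands the pattern
        echo "quoted:";   show_args '*-A*' # the pattern is passed unchanged
    )
    rm -rf "$dir"
}
```

Quoting the pattern, e.g. '*-A*', keeps it intact. The same applies to [ALL]: Bash treats it as a bracket expression, so unquoted it would be replaced by a file named A or L if one exists in the current directory.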

Now let’s look at an example for a job output:

0x789C44FF73F: fastqc [ -l walltime=1000:00:00]
  pid                       : H006-1
  PID                       : H006-1
  CONFIG_FILE               : [ exDir]/runtimeConfig.sh
  ANALYSIS_DIR              : /home/heinold/temp/roddyLocalTest/testproject
  TOOLSDIR                  : [ exDir]/analysisTools/qcPipeline
  TOOL_ID                   : fastqc
  RAW_SEQ                   : [ inDir]/control/paired/run120918_SN7001149_0101_AC16PKACXX/sequence/1_B_GCCAAT_L002_R1_complete_filtered.fastq.gz
  FILENAME_FASTQC           : [outDir]/fastx_qc/control_run120918_SN7001149_0101_AC16PKACXX_1_B_GCCAAT_L002_R1_sequence_fastqc.zip
  RODDY_PARENT_JOBS         : parameterArray=()

This is the output for a job that calls fastqc on a fastq file; for simplicity, the job is also named fastqc. The first item is a fake job id, which is used in test runs. If you call run instead of testrun, it is replaced by the job identifier produced by your processing backend (PBS, SGE, etc.). The job id is followed by the resource settings specific to your configured processing backend; here, it is the walltime setting for a PBS system. The next lines are the parameters passed to the job. Some parameters are set for every job, including pid/PID (“patient id”, i.e. the “dataset”), CONFIG_FILE and ANALYSIS_DIR. Abbreviations like [exDir] or [inDir] are explained in the header of the testrun output and make it more readable. Other parameters, e.g. FILENAME_FASTQC, are job-specific. In this case, there is a fastq file as job input and a zip file containing the job output. Filenames are based on rules that are normally defined in the analysis configuration files.

Let’s see: showconfigpaths worked, listdatasets worked, printruntimeconfig worked and so did testrun. What’s left? Right: run!

Let’s start and run something.

bash roddy.sh run [project]@[analysis] [pattern]/[ALL] --useconfig=[pathOfIniFile]

Instead of printing the testrun output, Roddy will now try to run the jobs on your processing backend. If all jobs fail, your settings might be wrong. If only some fail, there might be problems with the backend. Roddy will also try to tell you what sort of problems there are, but this won’t work in every case.


Finally, you started something. Now all you have to do is wait until your process finishes. Roddy offers several commands to help you keep track of your progress.

Process tracking, Debugging and Rerunning a process

Sometimes, it can be nice to know if a process is still running or if there were faulty jobs and sometimes you just want to restart a process. Roddy has what you need: checkworkflowstatus, testrerun and rerun.

bash roddy.sh checkworkflowstatus [project]@[analysis] [pattern]/[ALL] --useconfig=[pathOfIniFile]

checkworkflowstatus will create a table listing your selection of datasets and their states:

[outDir]: /home/heinold/temp/roddyLocalTest/testproject/rpp
Dataset       State     #    OK   ERR  User      Folder / Message
A100          UNSTARTED 0    0    0    Not executed (or the Roddy log files were deleted).
A200          UNSTARTED 0    0    0    Not executed (or the Roddy log files were deleted).
stds          OK        3    3    0    testuser   /home/testuser/temp/roddyLocalTest/testproject...

The table has several columns:

  • Dataset is self-explanatory and shows which dataset the line refers to
  • State is the state of the last execution of a dataset
  • # is the number of started jobs for a process
  • OK is the number of successful jobs
  • ERR is the number of faulty jobs
  • User is the user who started the last process
  • Folder / Message is the execution store folder for the process, or a status message

You can e.g. grep the output for states, folders and other things. If there are erroneous jobs, you now have the information to look for those jobs. The next section shows you how to do this. For now, we’ll consider the jobs as failed for technical reasons and show you how to restart them.
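For example, to list all datasets that are not in state OK, you can filter the table with awk. A minimal sketch, assuming the column layout shown above with the header line starting with “Dataset State”; datasets_not_ok is a hypothetical helper that reads the table from standard input.

```bash
# Hypothetical filter for checkworkflowstatus output on stdin: after the
# "Dataset State ..." header line, print dataset and state of every row whose
# State column is not OK.
datasets_not_ok() {
    awk '
        $1 == "Dataset" && $2 == "State" { in_table = 1; next }
        in_table && NF >= 2 && $2 != "OK" { print $1, $2 }
    '
}
```

Pipe the output of bash roddy.sh checkworkflowstatus [project]@[analysis] [ALL] --useconfig=[pathOfIniFile] into datasets_not_ok. For the table above, it prints A100 UNSTARTED and A200 UNSTARTED.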

Roddy's rerun option tries to start only the jobs that need to be run. For this, it creates a list of all the output files it knows of and compares them with the existing files on disk. No consistency checks are performed, so files with a size of zero also count as existing. If a job has failed, all of its descendants are automatically marked as failed. The same applies when a job is started anew. What the workflow does then is the responsibility of the workflow's author. testrerun and rerun work like testrun and run, except that only the necessary jobs are listed or started.
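The file comparison can be illustrated in a few lines. This is only a sketch of the idea, not Roddy's code: missing_outputs is a hypothetical helper that reads expected output paths (one per line) and prints those that do not exist. Following the description above, it performs no consistency check, so an empty file counts as present (test -e, not test -s).

```bash
# Sketch of the rerun idea: print every expected output file that is missing.
# Zero-size files count as existing, as described above.
missing_outputs() {
    local f
    while IFS= read -r f; do
        [ -n "$f" ] || continue
        [ -e "$f" ] || printf '%s\n' "$f"
    done
}
```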

Import list for different workflows:

Please consider using only one analysis import per project XML file, if you set configuration variables. Configuration values for different workflows might have the same name, which could lead to misconfigured workflows. If you do not want to create a new file, you can still use subconfigurations for the different workflows.

<!-- Roddy 2.2.x -->
<analysis id='snvCalling' configuration='snvCallingAnalysis' useplugin="COWorkflows:1.0.132-4" />
<analysis id='indelCalling' configuration='indelCallingAnalysis'  useplugin="COWorkflows:1.0.132-4" />
<analysis id='copyNumberEstimation' configuration='copyNumberEstimationAnalysis' useplugin="CopyNumberEstimationWorkflow:1.0.189" />
<analysis id='delly' configuration='dellyAnalysis' useplugin="DellyWorkflow:0.1.12"/>

<!-- Roddy 2.3.x -->
<analysis id='WES' configuration='exomeAnalysis' useplugin="AlignmentAndQCWorkflows:1.1.39" />
<analysis id='WGS' configuration='qcAnalysis' useplugin="AlignmentAndQCWorkflows:1.1.39" />
<analysis id='postMergeQC' configuration='postMergeQCAnalysis' useplugin="AlignmentAndQCWorkflows:1.1.39"/>
<analysis id='postMergeExomeQC' configuration='postMergeExomeQCAnalysis' useplugin="AlignmentAndQCWorkflows:1.1.39"/>

<!-- Unreleased or Beta -->
<analysis id='rdw' configuration='snvRecurrenceDetectionAnalysis' useplugin="SNVRecurrenceDetectionWorkflow"/>
<analysis id='WGBS' configuration='bisulfiteCoreAnalysis' useplugin="AlignmentAndQCWorkflows:1.1.39"/>

Cheat sheet

This page is for those of you who are in a hurry or just need a refresher on Roddy usage. It mostly lists useful commands, without big explanations. For those, open the Walkthrough.

Where?

/icgc/ngs_share/ngsPipelines/RoddyStable/roddy.sh

Create a new project

bash roddy.sh prepareprojectconfig create [targetprojectfolder]

Open the applicationProperties.ini and:

  • Change the cluster settings
  • Add the COProjectConfigurations path you need

Open the XML file and:

  • Change the project id in the header
  • Add the analyses you need (see the last part of the Users guide)
  • Add / change the values you need (e.g. the I/O directories)

Test

bash roddy.sh listdatasets [project]@[analysis] --useconfig=[yourinifile]

Testrun / Run

bash roddy.sh testrun [project]@[analysis] [id] --useconfig=[yourinifile]
bash roddy.sh run [project]@[analysis] [id] --useconfig=[yourinifile]

Command line options

Roddy has a wide range of run modes and options which will be explained here. The run modes are basically divided into user options and extended developer options. You can view all options by running Roddy without any parameters.

User options

If you do not intend to develop Roddy or Roddy plugins, you can stop reading after this part.

help
  • Shows the help screen with all run modes and options.
printappconfig [--useconfig={file}]
  • Prints the currently loaded application properties ini file.
showconfigpaths [--useconfig={file}]
  • Shows a list of available configuration files in all configured paths.
showfeaturetoggles
  • Shows a list of available feature toggles.
prepareprojectconfig
  • Creates or updates a project XML file and an application properties ini file.
plugininfo [--useconfig={file}]
  • Shows details about the available plugins.
printpluginreadme (configuration@analysis) [--useconfig={file}]
  • Prints the readme file of the currently selected workflow.
printanalysisxml (configuration@analysis) [--useconfig={file}]
  • Prints the analysis XML file of the currently selected workflow.
validateconfig (configuration@analysis) [--useconfig={file}]
  • Tries to find errors in the specified configuration and shows them.
listworkflows [filter word] [--shortlist] [--useconfig={file}]
  • Shows a list of available configurations and analyses. If a filter word is specified, the whole configuration tree is only printed if at least one configuration id in the tree contains the word.
listdatasets (configuration@analysis) [--useconfig={file}]
  • Lists the available datasets for a configuration.
printruntimeconfig (configuration@analysis) [--useconfig={file}] [--extendedlist] [--showentrysources]
  • Basically calls testrun, but prints the converted / prepared runtime configuration script content. --extendedlist shows all stored values (also e.g. tool entries) and works only in combination with --showentrysources. --showentrysources shows the source file of each entry in addition to its value.
testrun (configuration@analysis) [pid_0,..,pid_n] [--useconfig={file}]
  • Lists the jobs that run would execute for the given datasets, without starting them.
testrerun (configuration@analysis) [pid_0,..,pid_n] [--useconfig={file}]
  • Lists the jobs that rerun would start for the given datasets, without starting them.
run (configuration@analysis) [pid_0,..,pid_n] [--waitforjobs] [--useconfig={file}]
  • Runs a workflow with the configured job manager. Does not check whether the workflow is already running on the cluster.
rerun (configuration@analysis) [pid_0,..,pid_n] [--waitforjobs] [--useconfig={file}]
  • Reruns a workflow, starting only the parts which did not produce valid files. Does not check whether the workflow is already running on the cluster.
cleanup (configuration@analysis) [pid_0,..,pid_n] [--useconfig={file}]
  • Calls a workflow's cleanup method or a configured cleanup script to clean output files (i.e. remove them or set their file size to zero). Aborts the running jobs of a workflow for a pid.
checkworkflowstatus (configuration@analysis) [pid_0,..,pid_n] [--detailed] [--useconfig={file}]
  • Shows a generic overview of all datasets for a configuration. If some datasets are selected, a more detailed output is generated. If --detailed is set, information about all started jobs and their status is shown.

Advanced developer options

compile
  • Compiles the Roddy library / application.
pack
  • Creates a copy of the current version and puts the version number into the file name.
compileplugin (plugin ID) [--useconfig={file}]
  • Compiles a plugin.
packplugin (plugin ID) [--useconfig={file}]
  • Packages the compiled plugin in dist/plugins and creates a version number for it. Please note that you can indeed overwrite the contents of a zip file if you do not update / compile the plugin jar!


Common additional options

--useconfig={file}, --c={file}
  • Use {file} as the application configuration. The file is searched for in this order: full path, .roddy folder, Roddy directory.
--verbositylevel={1,3,5}
  • Set how much Roddy prints to the console: 1 is the default, 3 is more, 5 is a lot.
--v
  • Set verbosity to 3.
--vv
  • Set verbosity to 5.
--useiodir={fileIn},[fileOut]
  • Use fileIn/fileOut as the base input and output directories for your project. If fileOut is not specified, fileIn is used for that as well.
--usemetadatatable={file},[format]
  • Tell Roddy to use an input table to load metadata, input data and the available datasets. format can be tsv, csv or excel.
--waitforjobs
  • Let Roddy wait for all submitted jobs to finish.
--disabletrackonlyuserjobs
  • By default, Roddy only tracks jobs of the current user. This switch tells Roddy to track all jobs.
--disablestrictfilechecks
  • Tell Roddy to ignore missing files. By default, Roddy checks whether all necessary files exist.
--ignoreconfigurationerrors
  • Tell Roddy to ignore configuration errors. By default, Roddy exits if configuration errors are detected.
--ignorecvalueduplicates
  • Tell Roddy to ignore duplicate configuration values within the same configuration value block. By default, Roddy exits if duplicates are found.
--forcenativepluginconversion
  • Tell Roddy to overwrite any existing converted native plugin. By default, Roddy prevents this.
--forcekeepexecutiondirectory
  • Tell Roddy to keep execution directories. By default, Roddy deletes them if no jobs were executed in a run.

--useRoddyVersion=(version no), --rv=(version no)
  • Use a specific Roddy version.
--usePluginVersion=(…,…)
  • Supply a list of used plugins and versions.
--configurationDirectories={path},…
  • Supply a list of configuration directories.
--pluginDirectories={path},…
  • Supply a list of plugin directories.

Developer options

Configuration topics

Application properties files

To successfully manage a workflow, Roddy needs to know about several things:

  • The Batch system you’re running on.
  • The user credentials for e.g. SSH and connection settings.
  • The directories for configuration files and plugins.
  • And, if you want, some debug settings.

Let’s have a brief look at it:

[COMMON]
useRoddyVersion=current                     # Use the most current version for tests

[DIRECTORIES]
configurationDirectories=[FOLDER_WITH_CONFIGURATION_FILES]
pluginDirectories=[FOLDER_WITH_PLUGINS]

[JOB_PROCESSING]
jobManagerClass=de.dkfz.roddy.execution.jobs.direct.synchronousexecution.DirectSynchronousExecutionJobManager
#jobManagerClass=de.dkfz.roddy.execution.jobs.cluster.pbs.PBSJobManager
#jobManagerClass=de.dkfz.roddy.execution.jobs.cluster.sge.SGEJobManager
#jobManagerClass=de.dkfz.roddy.execution.jobs.cluster.slurm.SlurmJobManager
#jobManagerClass=de.dkfz.roddy.execution.jobs.cluster.lsf.rest.LSFRestJobManager
commandFactoryUpdateInterval=300
commandLogTruncate=80                       # Truncate logged commands to this length. If <= 0, then no truncation.

[COMMANDLINE]
executionServiceUser=USERNAME
executionServiceClass=de.dkfz.roddy.execution.io.LocalExecutionService
#executionServiceClass=de.dkfz.roddy.execution.io.SSHExecutionService
executionServiceHost=[YOURHOST]
executionServiceAuth=keyfile
#executionServiceAuth=password
executionServicePasswd=
executionServiceStorePassword=false
executionServiceUseCompression=false
fileSystemInfoProviderClass=de.dkfz.roddy.execution.io.fs.FileSystemInfoProvider

The file is divided into several sections, but this is mainly to keep things organized; you can arrange the file as you like. Briefly explained:

  • COMMON is for setting up general things
  • DIRECTORIES holds the configuration and plugin directories
  • JOB_PROCESSING selects the job manager (the cluster system backend) and related settings
  • COMMANDLINE is to set up the command line interface

We try to keep every possible option in the ini file, so you should be able to just select what you need and fill in the missing parts.

Usually, you just need to change the following settings:

  • jobManagerClass - Selects the cluster system backend
  • CLI.executionServiceClass - Selects whether you want to access your system via SSH or directly
  • CLI.executionServiceAuth - keyfile or password?
  • CLI.executionServiceHost - The host, if you select SSH
  • CLI.executionServicePasswd - The password for your system, if you use SSH without keyfiles
  • CLI.executionServiceStorePassword - Set this to true if you want to store the password. Note that the password is stored in plain text!

You might want to remember or note down the above options for future use, as they are unlikely to change often. The more important settings for you might be:

  • configurationDirectories - A comma-separated list of the directories where you keep your project XML files
  • pluginDirectories - A comma-separated list of the directories where your plugins are stored. Note that the folder dist/plugins in the Roddy base directory, which contains the PluginBase and DefaultPlugin, is always imported. You do not need to add it.

You can either copy the content from above or you can also use Roddy to help you with the setup. This will be explained later on.
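The rules for the CLI settings listed above can be checked before a first run. The following Python sketch is not part of Roddy: check_cli_settings is a hypothetical helper, and it assumes the file can be read with Python's configparser (treating # after whitespace as an inline comment, as in the example file). It only encodes what is stated above: SSH needs a host, SSH without keyfiles needs a password, and a stored password ends up in plain text.

```python
import configparser

SSH_SERVICE = "de.dkfz.roddy.execution.io.SSHExecutionService"


def check_cli_settings(ini_text: str) -> list:
    """Hypothetical helper: report problems in the [COMMANDLINE] section."""
    # '#' preceded by whitespace starts an inline comment, as in the example file.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.optionxform = str  # keep camelCase keys such as executionServiceHost
    parser.read_string(ini_text)
    cli = parser["COMMANDLINE"]

    problems = []
    if cli.get("executionServiceClass") == SSH_SERVICE:
        host = cli.get("executionServiceHost", "")
        if not host or host.startswith("["):  # empty or still the [YOURHOST] placeholder
            problems.append("executionServiceHost must be set when using SSH")
        if cli.get("executionServiceAuth") == "password" and not cli.get("executionServicePasswd"):
            problems.append("executionServicePasswd is empty but password authentication is selected")
    if cli.getboolean("executionServiceStorePassword", fallback=False):
        problems.append("the password will be stored in plain text")
    return problems


example = """
[COMMANDLINE]
executionServiceClass=de.dkfz.roddy.execution.io.SSHExecutionService
executionServiceHost=[YOURHOST]
executionServiceAuth=password
executionServicePasswd=
executionServiceStorePassword=false
"""
for problem in check_cli_settings(example):
    print(problem)
```

Because the sketch only reads the file, it can be run against a real application properties file without side effects.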

Configuration files

Roddy currently supports two different types of configuration files:

  • XML based, which allows you to use all configuration features
  • Bash based, which only allows a reduced set of configuration features

Normally, Roddy workflows and projects are configured with XML files. This document gives you all the details you need to know about those special files. Don't be afraid of messing things up in configuration files: Roddy checks at least part (not all) of each file when it is loaded and reports structural errors as well as it can.

Types of files

Roddy configuration files exist in three flavours:

  • Project configuration files
  • Workflow or analysis configuration files
  • Generic configuration files.

All file types may contain the same kinds of content, though analysis configuration files will normally look different from, e.g., project configuration files. The main differences between the types are their position in the configuration inheritance tree, their filename and their header.

Filenames

Roddy imposes some filename conventions to identify XML files when they are loaded from disk:

  • Project configuration files look like projects*[yourfilename]*.xml
  • Workflow configuration files use the pattern analysis*[yourfilename]*.xml

Generic configuration files do not follow any pattern. You can name them as you like, as long as the name does not match one of the above patterns.
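To make the naming convention concrete, here is a small Python sketch that sorts filenames into the three flavours. It is not Roddy's loader; classify_config_file is a hypothetical name, and the sketch assumes the conventions boil down to the glob patterns projects*.xml and analysis*.xml, matched case-sensitively.

```python
from fnmatch import fnmatchcase


def classify_config_file(filename: str) -> str:
    """Hypothetical helper: guess the configuration file type from its name."""
    # Assumption: the two conventions reduce to these glob patterns.
    if fnmatchcase(filename, "projects*.xml"):
        return "project"
    if fnmatchcase(filename, "analysis*.xml"):
        return "analysis"
    return "generic"  # any other name


for name in ["projectsMyStudy.xml", "analysisAlignment.xml", "commonDefaults.xml"]:
    print(name, "->", classify_config_file(name))
```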

Inheritance structure

Configurations and configuration files can be linked in several ways:

  1. Subconfigurations extend their parent configuration(s)
  2. Configuration files can import other configurations. This is only possible on the top level of a configuration file; a subconfiguration cannot do this
  3. Analysis configuration files can be imported as an analysis import by a project configuration or subconfiguration
  4. An analysis can be imported by a project, but not vice versa

Bash configuration files

Bash configuration files are, compared to XML files, very lightweight. They offer only a subset of configuration options (namely configuration values and analysis imports) and are ideally used for small project or generic configurations.

#name aConfig
#imports anotherConfig
#description aConfig
#usedresourcessize m
#analysis A,aAnalysis,TestPlugin:current
#analysis B,bAnalysis,TestPlugin:current
#analysis C,aAnalysis,TestPlugin:current

outputBaseDirectory=/data/michael/temp/roddyLocalTest/testproject/rpp
preventJobExecution=false
UNZIPTOOL=gunzip
ZIPTOOL_OPTIONS="-c"
sampleDirectory=/data/michael/temp/roddyLocalTest/testproject/vbp/A100/${sample}/${SEQUENCER_PROTOCOL}*

As you can see in the example, a Bash configuration needs a header and a body.

#name aConfig
#imports anotherConfig
#description aConfig
#usedresourcessize m
#analysis A,aAnalysis,TestPlugin:current
#analysis B,bAnalysis,TestPlugin:current
#analysis C,aAnalysis,TestPlugin:current

The header must contain the name of the configuration and may contain imports, a description, the usedresourcessize attribute and several analysis tags. The analysis tags need to be set like [id],[analysis config id],[plugin name]:[plugin version]. Please see XML configuration files for a detailed description of the tags and attributes.

After the header comes the configuration values section.

outputBaseDirectory=/data/michael/temp/roddyLocalTest/testproject/rpp
preventJobExecution=false
UNZIPTOOL=gunzip
ZIPTOOL_OPTIONS="-c"
sampleDirectory=/data/michael/temp/roddyLocalTest/testproject/vbp/A100/${sample}/${SEQUENCER_PROTOCOL}*

The syntax for configuration values is the regular Bash syntax for variables. Of course, you can also use comments.
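The header/body split can be illustrated with a short Python parser for the example above. This is a sketch under assumptions, not Roddy's implementation: parse_bash_config and BashConfig are hypothetical names, only the header keys described above are recognised (other # lines are treated as comments), and values are only unquoted, not expanded.

```python
import shlex
from dataclasses import dataclass, field

# Header keys described above; any other '#' line is treated as a comment.
HEADER_KEYS = {"name", "imports", "description", "usedresourcessize"}


@dataclass
class BashConfig:
    header: dict = field(default_factory=dict)
    analyses: list = field(default_factory=list)  # (id, analysis config id, plugin, version)
    values: dict = field(default_factory=dict)


def parse_bash_config(text: str) -> BashConfig:
    """Hypothetical parser for the header/body layout shown above."""
    cfg = BashConfig()
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            key, _, rest = line[1:].partition(" ")
            if key == "analysis":
                analysis_id, config_id, plugin = rest.split(",")
                plugin_name, _, version = plugin.partition(":")
                cfg.analyses.append((analysis_id, config_id, plugin_name, version))
            elif key in HEADER_KEYS:
                cfg.header[key] = rest
            continue
        name, _, value = line.partition("=")
        # Remove shell quoting only; ${...} references are left unexpanded.
        cfg.values[name] = " ".join(shlex.split(value))
    return cfg


EXAMPLE = """\
#name aConfig
#imports anotherConfig
#description aConfig
#usedresourcessize m
#analysis A,aAnalysis,TestPlugin:current
#analysis B,bAnalysis,TestPlugin:current
#analysis C,aAnalysis,TestPlugin:current

outputBaseDirectory=/data/michael/temp/roddyLocalTest/testproject/rpp
preventJobExecution=false
UNZIPTOOL=gunzip
ZIPTOOL_OPTIONS="-c"
sampleDirectory=/data/michael/temp/roddyLocalTest/testproject/vbp/A100/${sample}/${SEQUENCER_PROTOCOL}*
"""

cfg = parse_bash_config(EXAMPLE)
print(cfg.header["name"], cfg.analyses[1], cfg.values["ZIPTOOL_OPTIONS"])
```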

XML configuration files

Structure / Sections

Each configuration file is built according to the following pattern:

<configuration name='test' description='Example.' >
      <availableAnalyses />
      <configurationvalues />
      <processingTools />
      <filenames />
      <enumerations />
      <subconfigurations />
</configuration>

However, keep in mind that not every section makes sense for every type of XML file. E.g. availableAnalyses only makes sense in project XML files, whereas filenames and processing tools will most likely only be used within analysis XML files.

Configuration values

Configuration values are what you will probably change most of the time. When Roddy executes a workflow, it creates a shell script in which all configuration values are stored. This script can then be imported by workflow scripts.

Types of values

Special values

For future releases of Roddy and also for better readability of XML files, Roddy offers “special” variables like:

Run flags which look like runPostProcessing, runFlagstats, runScript

and

Binaries which look like BWA_BINARY, MBUFFER_BINARY, PYTHON_BINARY and so on.

Run flags are always considered to be boolean and are, e.g., used smartly in Brawl based workflows. Binary variables are (or, in future versions, are supposed to be) checked on workflow validation and startup. If you want to swap a binary quickly or set a fixed binary for your scripts, it is also wise to store it in a configuration value.
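Since the special values are recognised purely by their names, the conventions can be sketched in Python. The regular expressions, the strict true/false rule for run flags and the PATH lookup via shutil.which are assumptions for illustration; Roddy's actual checks may differ.

```python
import re
import shutil

RUN_FLAG = re.compile(r"^run[A-Z]\w*$")      # runPostProcessing, runFlagstats, runScript
BINARY = re.compile(r"^[A-Z0-9_]+_BINARY$")  # BWA_BINARY, MBUFFER_BINARY, PYTHON_BINARY


def classify_value(name: str) -> str:
    """Classify a configuration value name as 'runflag', 'binary' or 'plain'."""
    if RUN_FLAG.match(name):
        return "runflag"
    if BINARY.match(name):
        return "binary"
    return "plain"


def run_flag_enabled(value: str) -> bool:
    """Run flags are boolean; this sketch accepts only 'true' or 'false'."""
    if value not in ("true", "false"):
        raise ValueError(f"run flag must be true or false, got {value!r}")
    return value == "true"


def missing_binaries(values: dict) -> list:
    """Names of *_BINARY values that cannot be found (hypothetical startup check)."""
    return [name for name, path in values.items()
            if classify_value(name) == "binary" and shutil.which(path) is None]


print(classify_value("runFlagstats"), classify_value("BWA_BINARY"), classify_value("sample"))
```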

Tool entries and filename patterns

Note

Because of the importance and complexity of both entry types, they are covered in their own section Tools and filenames.

These sections look like this:

<processingTools>
    <tool name='compressionDetection' value='determineFileCompressor.sh' basepath='roddyTools'/>
    <tool name='createLockFiles' value='createLockFiles.sh' basepath='roddyTools'/>
    <tool name='streamBuffer' value='streamBuffer.sh' basepath='roddyTools'/>
    <tool name='wrapinScript' value='wrapInScript.sh' basepath='roddyTools'/>
    <tool name='nativeWorkflowScriptWrapper' value='nativeWorkflowScriptWrapper.sh' basepath='roddyTools'/>
</processingTools>
<filenames package='de.dkfz.roddy.knowledge.examples' filestagesbase='de.dkfz.roddy.knowledge.examples.SimpleFileStage'>
    <filename class='SimpleTestTextFile' onMethod='test1' pattern='${testOutputDirectory}/test_method_1.txt'/>
    <filename class='SimpleTestTextFile' onMethod='test2' pattern='${outputAnalysisBaseDirectory}/${testAOutputDirectory}/test_method_2.txt'/>
    <filename class='SimpleTestTextFile' onMethod='test3' pattern='${testInnerOutputDirectory}/test_method_3.txt'/>

    <filename class='FileWithChildren' onMethod='SimpleTestTextFile.testFWChildren' pattern='${testOutputDirectory}/filewithchildren.txt'/>
    <filename class='SimpleTestTextFile' onMethod='SimpleTestTextFile.testFWChildren' pattern='${testOutputDirectory}/test_method_child0.txt'/>
    <filename class='SimpleTestTextFile' onMethod='SimpleTestTextFile.testFWChildren' selectiontag="file1" pattern='${testOutputDirectory}/test_method_child1.txt'/>
</filenames>

They contain a list of the included workflow tools with their resource definitions, and patterns to create filenames based on different rules.

Tool entry names are automatically converted to configuration variables. For this to work, you need to write the tool id in camel case notation (camelCase). Roddy will then convert the id, e.g., to TOOL_CAMEL_CASE. For the above example, you'd get TOOL_COMPRESSION_DETECTION from compressionDetection, as well as TOOL_WRAPIN_SCRIPT, TOOL_CREATE_LOCK_FILES, TOOL_STREAM_BUFFER and finally TOOL_NATIVE_WORKFLOW_SCRIPT_WRAPPER.
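The conversion rule can be written as a single regular expression substitution. This Python sketch (the function name is hypothetical) reproduces the names listed above; it assumes plain camelCase ids, so a run of capitals such as BAM would be split letter by letter, and Roddy's handling of that case is not covered here.

```python
import re


def tool_variable_name(tool_id: str) -> str:
    """Convert a camelCase tool id to its TOOL_... configuration variable name."""
    # Insert '_' before every upper-case letter that is not the first character.
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", tool_id)
    return "TOOL_" + snake.upper()


for tool in ["compressionDetection", "createLockFiles", "streamBuffer",
             "wrapinScript", "nativeWorkflowScriptWrapper"]:
    print(tool, "->", tool_variable_name(tool))
```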

The following list is taken from an old configuration file and has not been reworked. However, it shows many of the possibilities for filename patterns:

<!-- Filenames are always stored in the pid's output folder -->
      <!-- Different variables can be used:
          - ${sourcefile}, use the name and the path of the file from which the new name is derived
          - ${sourcefileAtomic}, use the atomic name of which the file is derived
          - ${sourcefileAtomicPrefix,delimiter=".."}, use the atomic name's prefix (without a file ending like .txt/.paired.bam...)
                                                      of the file from which the new name is derived. Set the delimiter option to define the delimiter; the default is "_".
                                                      The delimiter has to be placed inside "" as this is used to find the delimiter!
          - ${sourcepath}, use the path in which the source file is stored
          - ${outputbasepath}, use the output path of the pid
          - ${[nameofdir]OutputDirectory}

          NOTICE: If you use options for a variable, you are NOT allowed to use "," inside them! It is used to recognize options.

          - ${pid}
          - ${sample}
          - ${run}
          - ${lane}
          - ${laneindex}
          - You can put in configuration values to do this use:
            ${cvalue,name=[name of the value],default=".."} where default is optional.
          - ${fileStageID} use the id String of the file's stage to build up the name.
          -->
      <!-- A filename can be derived from another file, use derivedFrom='shortClassName/longClassName'
           A filename can also be specified for a level, use fileStage='PID/SAMPLE/RUN/LANE/INDEXEDLANE', refer to BaseFile.FileStage
           A filename can be specified for all levels; the name is then built up with the ${fileStageID} value
           A filename can be created using the file's called method's name
           A filename can be created using the used tool's name
           -->
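To show how placeholders and options fit together, here is a simplified Python resolver. It is a sketch, not Roddy's implementation: resolve_pattern is a hypothetical name, pattern variables such as ${pid} and configuration values share one dictionary, and derived-file variables like ${sourcefileAtomicPrefix} are not handled.

```python
import re

PLACEHOLDER = re.compile(r"\$\{([^}]*)\}")


def resolve_pattern(pattern: str, values: dict) -> str:
    """Hypothetical resolver for ${name} and ${cvalue,name=...,default="..."}."""

    def substitute(match):
        # ',' separates the variable name from its options, hence the NOTICE above.
        name, *options = match.group(1).split(",")
        opts = {}
        for option in options:
            key, _, value = option.partition("=")
            opts[key.strip()] = value.strip().strip('"')
        if name == "cvalue":
            if opts["name"] in values:
                return str(values[opts["name"]])
            if "default" in opts:
                return opts["default"]
            raise KeyError(f"configuration value {opts['name']!r} is not set")
        if name not in values:
            raise KeyError(f"unknown pattern variable {name!r}")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, pattern)


values = {"outputbasepath": "/data/out", "pid": "P01", "sample": "tumor"}
print(resolve_pattern('${outputbasepath}/${pid}_${sample}_${cvalue,name=bamSuffix,default="merged"}.bam', values))
```

Raising an error for unknown variables, rather than leaving them in place, is a design choice of this sketch so that a misspelled variable is noticed before any file is written.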

Special: Autofilenames and Autofiletypes

Just to mention it (it is also covered in detail in the full guide), Roddy supports a kind of autofilenames and autofiletypes. This means that, if you just want to get things running, you can specify a tool with input and output files. If no filename patterns and file classes exist, Roddy will take care of this for you. However, autofilenames are not the nicest thing to have, so you should go on and create rules where needed.

Enumerations

Enumerations are there to specify data types and validators for configuration values.

<enumeration name='cvalueType' description='various types of configuration values' extends="">
  <value id='path' valueTag="de.dkfz.roddy.config.validation.FileSystemValidator" description="Value type is a file system path (fully or with wildcards like ~, *)"/>
  <value id='bashArray' valueTag="de.dkfz.roddy.config.validation.BashValidator" description="A bash array."/>
  <value id='boolean' valueTag="de.dkfz.roddy.config.validation.DefaultValidator" description="A boolean value containing true or false."/>
  <value id='integer' valueTag="de.dkfz.roddy.config.validation.DefaultValidator" description="A positive or negative integer value."/>
  <value id='float' valueTag="de.dkfz.roddy.config.validation.DefaultValidator" description="A single precision floating point value."/>
  <value id='double' valueTag="de.dkfz.roddy.config.validation.DefaultValidator" description="A double precision floating point value."/>
  <value id='string' valueTag="de.dkfz.roddy.config.validation.DefaultValidator" description="The default type, if no type is set. The value will be stored unchecked."/>
</enumeration>

Looking at the default configuration value type configuration, you can see e.g. that path objects are validated with the FileSystemValidator class.
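The idea of mapping a value type to a validator can be sketched in Python. The checks below are simplified stand-ins keyed by the enumeration ids; they are assumptions, not the behaviour of DefaultValidator, BashValidator or FileSystemValidator, and the path type is left out because it would need file system access.

```python
import re

NUMBER = r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?"

# Simplified stand-ins keyed by the enumeration ids above (not Roddy's validator classes).
CHECKS = {
    "boolean": lambda v: v in ("true", "false"),
    "integer": lambda v: re.fullmatch(r"[+-]?\d+", v) is not None,
    "float": lambda v: re.fullmatch(NUMBER, v) is not None,
    "double": lambda v: re.fullmatch(NUMBER, v) is not None,
    "bashArray": lambda v: v.startswith("(") and v.endswith(")"),  # assumed (a b c) syntax
    "string": lambda v: True,  # stored unchecked
}


def is_valid(value: str, value_type: str = "string") -> bool:
    """'string' is used when no type is given; unknown types raise KeyError."""
    return CHECKS[value_type](value)


for value, value_type in [("-3", "integer"), ("3.5", "integer"), ("yes", "boolean"), ("(a b c)", "bashArray")]:
    print(value_type, value, is_valid(value, value_type))
```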

Tools and filenames

The whole workflow structure in Roddy is built around files and filenames. Files are used to create dependencies between steps in the workflow and files also enable Roddy to rerun a workflow based on created files.

As Roddy strictly separates code and configuration, filenames are configured. Of course you are allowed to make exceptions for e.g. initial files but the standard is to create rules for filenames.

So how do you tie things up?

Filename patterns are used to define a single or a range of names for a file class.

File classes are used as input and output parameters for tool entries. Filename patterns are automatically applied to output files!

Tool entries tell Roddy how a script or a binary is called: which files and parameters go in, which files come out and which resources will be used by jobs running this tool.

A complex tool entry will be shown at the end of this document.

Note

In our experience, it works well to create a workflow and its tools step by step:

  1. Create a tool entry, define an initial resource set and i/o parameters.
  2. Integrate the call into your workflow.
  3. Set up filename patterns for the tool's output files.
  4. Test the new tool with testrun and testrerun.
  5. Repeat the steps for the next tool.

Occasionally it might still be wise to remove the output data and test the whole workflow again.

Important

Remember that Roddy does not feature job monitoring. The job structure, file names and patterns must be well known before the workflow starts!

Tool entries

<tool name='testScript' value='testScriptSleep.sh' basepath='roddyTests'>
  <resourcesets>
      <rset size="l" memory="1" cores="1" nodes="1" walltime="5"/>
  </resourcesets>
  <input type="file" typeof="SimpleTestTextFile" scriptparameter="FILENAME_IN"/>
  <output type="file" typeof="SimpleTestTextFile" scriptparameter="FILENAME_OUT"/>
</tool>

Each tool entry has a header:

<tool name='testScript' value='testScriptSleep.sh' basepath='roddyTests'>
  • The value of the name attribute is used to call or manage the tool in a workflow. Before a workflow starts, the names of all tools are converted to configuration values so that you have easy access to them from your scripts. As explained in the configuration section, a tool name is converted from camel case notation to all-caps notation, using underscores as word separators. In addition, TOOL_ is used as a prefix. So the tool name testScript becomes TOOL_TEST_SCRIPT in your job.
  • The value of the value attribute holds the name of the executed script or binary.
  • The value of the basepath attribute points to the tool's folder in the plugin's analysisTools folder.
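Because a tool entry is plain XML, its parts are easy to inspect programmatically. The following Python sketch (a hypothetical helper using the standard xml.etree.ElementTree module) summarises the example entry above; the analysisTools/<basepath>/<value> path layout is an assumption derived from the description of basepath.

```python
import xml.etree.ElementTree as ET

TOOL_XML = """
<tool name='testScript' value='testScriptSleep.sh' basepath='roddyTests'>
  <resourcesets>
      <rset size="l" memory="1" cores="1" nodes="1" walltime="5"/>
  </resourcesets>
  <input type="file" typeof="SimpleTestTextFile" scriptparameter="FILENAME_IN"/>
  <output type="file" typeof="SimpleTestTextFile" scriptparameter="FILENAME_OUT"/>
</tool>
"""


def summarize_tool(xml_text: str) -> dict:
    """Hypothetical helper: collect the header, resource sets and i/o of one tool entry."""
    tool = ET.fromstring(xml_text.strip())
    return {
        "name": tool.get("name"),
        # Assumed layout inside the plugin: analysisTools/<basepath>/<value>
        "script": f"analysisTools/{tool.get('basepath')}/{tool.get('value')}",
        "resources": {rset.get("size"): dict(rset.attrib) for rset in tool.iter("rset")},
        "inputs": [(i.get("scriptparameter"), i.get("typeof")) for i in tool.findall("input")],
        "outputs": [(o.get("scriptparameter"), o.get("typeof")) for o in tool.findall("output")],
    }


summary = summarize_tool(TOOL_XML)
print(summary["name"], summary["script"], summary["inputs"], summary["outputs"])
```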

Important

You can, but do not have to, add resource sets and input and output parameters to a tool entry. If you omit resource sets, the job will run with default resource settings, which are explained below. If you omit input and output parameters, you need to take care of the job call yourself; normally, Roddy takes care of this for you. If you create a native workflow and omit the output parameters, you will lose the rerun feature! Omitting all these parameters might sometimes make sense when you just want easy access to a tool in your analysisTools folder.

Resource sets

Each tool can have several resource sets.

<rset size="l" memory="1" cores="1" nodes="1" walltime="5"/>
  • The attribute size can be one of t, xs, s, m, l, xl and allows you to define resource sets for different cases, from extra small to extra large. t is a special case and can be used for test resources.

  • Currently, Roddy (or BatchEuphoria) can be used to request the resources memory, cores, nodes and walltime. You can set values in different formats:

    • The default for memory is 1GB. Valid strings for it are for example:
      • 1 (which is 1 GB)
      • 1m/g/t
      • 0.5(m/g/t) which would be 500MB
    • The default cores value is 1. Other values are natural numbers in [1; n]
    • The default nodes value is 1. Other values are natural numbers in [1; n]
    • The default walltime is 1 hour. Other values are for example:
      • 00:10:00 which would be 10 minutes.
      • 24:00:00 would be aligned to 01:00:00:00 which is one day. All other values will be aligned as well.
      • 1h, 1d, 1h50m … or other values in human readable format.

    Note

    The default size for resource sets used by Roddy is l
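The accepted formats can be illustrated with two small Python parsers. This is a sketch, not BatchEuphoria's parser: the function names are hypothetical, it assumes binary multiples (1 GB = 1024 MB) and case-insensitive units, and it deliberately rejects bare numbers such as walltime="5" from the examples, because their unit is not stated here.

```python
import re
from datetime import timedelta

# Assumption: binary multiples for the conversion (1 GB = 1024 MB).
MEMORY_UNITS_GB = {"m": 1 / 1024, "g": 1.0, "t": 1024.0}
DURATION_PART = re.compile(r"(\d+)([dhms])")
DURATION_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def parse_memory_gb(text: str = "1") -> float:
    """'1' -> 1 GB, '512m' -> 0.5 GB, '0.5' -> 0.5 GB; the default is 1 GB."""
    text = text.strip().lower()
    if text and text[-1] in MEMORY_UNITS_GB:
        return float(text[:-1]) * MEMORY_UNITS_GB[text[-1]]
    return float(text)  # no unit given: gigabytes


def parse_walltime(text: str = "1h") -> timedelta:
    """'00:10:00', '01:00:00:00', '1h50m' ...; the default is one hour."""
    text = text.strip().lower()
    if ":" in text:
        fields = [int(f) for f in text.split(":")]
        if len(fields) == 3:
            fields.insert(0, 0)  # no day field given
        if len(fields) != 4:
            raise ValueError(f"expected [DD:]HH:MM:SS, got {text!r}")
        days, hours, minutes, seconds = fields
        # timedelta normalises overflow, so 24:00:00 becomes one day.
        return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
    parts = DURATION_PART.findall(text)
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported walltime {text!r}")
    total = timedelta()
    for number, unit in parts:
        total += timedelta(**{DURATION_UNITS[unit]: int(number)})
    return total


print(parse_memory_gb("512m"), parse_walltime("24:00:00"), parse_walltime("1h50m"))
```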

Input types

A tool can have different input objects:

  • Values, like strings or numbers:

    <input type="string" setby="callingCode" scriptparameter="SAMPLE"/>
    
    • The type attribute tells Roddy that a string is expected.
    • The setby attribute tells Roddy that the parameter will be set by the developer in the call of the job. Currently, only callingCode is valid.
    • The scriptparameter value tells Roddy that a parameter with this name is passed to the job.
  • Single file objects like:

    <input type="file" typeof="de.dkfz.b080.co.files.LaneFile" scriptparameter="RAW_SEQUENCE_FILE" />
    <input type="file" typeof="BasicBamFile" scriptparameter="RAW_SEQUENCE_FILE" />
    
    • The type attribute tells Roddy that a file object is expected as input.
    • The typeof value tells Roddy the expected type of an input value. This check is done within the job call. If the type of the input object does not match, Roddy will fail. You're allowed to omit the package structure. Roddy will try to find the class in its core code and in the plugin classes. If more than one class matches, Roddy will fail and tell you that this happened.

    Important

    You are allowed to put in a non-existent class! If Roddy cannot find the class, it will create a synthetic class during runtime. This way, you can skip code creation and keep your code lean. You are allowed to use this class like any other class. However, you are not able to use the class directly in your Java code.

    • Like above, the scriptparameter value tells Roddy that a parameter with this name is passed to the job.
  • File groups:

    File groups are collections of file objects. By default, file groups are designed to store files of the same type.

    <input type="filegroup" typeof="de.dkfz.b080.co.files.BamFileGroup" scriptparameter="INPUT_FILES" passas="array"/>
    <input type="filegroup" typeof="GenericFileGroup" scriptparameter="INPUT_FILES2" passas="array"/>
    
    • Set the type to filegroup if you want to use it.
    • typeof behaves nearly the same as for file input definitions. However, here you need to put in a file group class. If you do not need a specialized or named file group, you can use the GenericFileGroup class.
    • TODO: classOfContainedFiles
    • The passas attribute defines, how the files in the file group are passed to your job. Allowed values are:
      • parameters which will tell Roddy to create a parameter for each file in the group.
      • array which will tell Roddy to pass the files as an array in a single string.
    • The scriptparameter behaves nearly like the one for files. If you set array, the parameter name will be used like it is. If you set parameters it will be used as a prefix and the .

Important

The order of the input parameters matters when you pass parameters to a job. Roddy will check this and fail if:

  • the number of input parameters does not match
  • the type of input parameters does not match
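A simplified Python model of this positional check might look like this. The classes are stand-ins for Roddy's file classes, and check_inputs is a hypothetical helper; it only illustrates the two failure cases listed above.

```python
class BaseFile:
    """Stand-in for Roddy's file base class (illustration only)."""


class BasicBamFile(BaseFile):
    pass


def check_inputs(declared: list, given: list) -> None:
    """Hypothetical check: fail on a wrong parameter count or a wrong type at any position."""
    if len(declared) != len(given):
        raise ValueError(f"expected {len(declared)} input parameters, got {len(given)}")
    for position, (expected_type, value) in enumerate(zip(declared, given)):
        if not isinstance(value, expected_type):
            raise TypeError(f"input {position}: expected {expected_type.__name__}, "
                            f"got {type(value).__name__}")


# Declared like <input type="file" typeof="BasicBamFile" .../> followed by <input type="string" .../>
declared_inputs = [BasicBamFile, str]
check_inputs(declared_inputs, [BasicBamFile(), "tumor"])  # correct order: passes
try:
    check_inputs(declared_inputs, ["tumor", BasicBamFile()])  # swapped order
except TypeError as error:
    print(error)
```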

Output types

The output of a Roddy job is always a file or a group of files. Moreover, you are only allowed to have one top-level output object in the XML description, but this object might be one that holds other objects, like the file groups mentioned above.

If your tool does not create output files, you can omit those entries. However, it might still be wise to create some sort of checkpoint for the tool so that Roddy's rerun feature works properly. The syntax for output objects is quite similar to the syntax for input objects, so we'll skip explanations for known attributes. Valid output objects are:

  • Single file objects:

    The single output file syntax is the same as for input files. Just change the tag name to output.

    <output type="file" typeof="de.dkfz.b080.co.files.BamFile" scriptparameter="FILENAME" />
    
  • Files with children:

    Files with children are a bit special. They are necessary if you want to create a file which has some children. The main difference to single files is that you need to create a class file! Then, for each file you want as a child, you need to create the field and the set / get accessors. We use this feature only in a handful of cases.

    <output type="file" typeof="BasicBamFile" scriptparameter="FILENAME">
      <output type="file" variable="indexFile" typeof="BamIndexFile" scriptparameter="FILENAME_INDEX"/>
    </output>
    

    The example shows an output entry with one child. You can add more children, if you need.

    The variable attribute tells Roddy which field in the parent class is used to store the created child.

  • Tuples of files:

    Tuples of files are the easiest way to create collections of file objects. It does not matter which types the files have.

    <output type="tuple">
      <output type="file" typeof="BasicBamFile" scriptparameter="FILENAME_BAM"/>
      <output type="file" typeof="BamIndexFile" scriptparameter="FILENAME_INDEX"/>
    </output>
    

    Call in Java code

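    // Call with output tuple
    Tuple2 fileTuple = (Tuple2) call("testScriptWithMultiOut", someFile);

    // Access output tuple children (value0 and value1 follow the order of the output entries)
    BasicBamFile bamFile = (BasicBamFile) fileTuple.value0;
    BamIndexFile indexFile = (BamIndexFile) fileTuple.value1;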
    
  • File groups:

    Output file groups offer a lot more options than input file groups. This

    <output type="filegroup" typeof="GenericFileGroup">
      <output type="file" typeof="" scriptparameter="BAM1"/>
      <output type="file" typeof="" scriptparameter="BAM2"/>
      <output type="file" typeof="" scriptparameter="BAM3"/>
    </output>
    

Filename patterns

Filenames in Roddy are rule based. They are defined in the filenames section in your XML file.

<filenames package='de.dkfz.roddy.knowledge.examples' filestagesbase='de.dkfz.roddy.knowledge.examples.SimpleFileStage'>
  <filename class='SimpleTestTextFile' onTool='testScript' pattern='${testOutputDirectory}/test_onScript_1.txt'/>
  <filename class='SimpleMultiOutFile' onTool="testScriptWithMultiOut" selectiontag="mout1" pattern="${testOutputDirectory}/test_mout_a.txt" />
  <filename class='SimpleMultiOutFile' onTool="testScriptWithMultiOut" selectiontag="mout2" pattern="${testOutputDirectory}/test_mout_b.txt" />
  <filename class='SimpleMultiOutFile' onTool="testScriptWithMultiOut" selectiontag="mout3" pattern="${testOutputDirectory}/test_mout_c.txt" />
  <filename class='SimpleMultiOutFile' onTool="testScriptWithMultiOut" selectiontag="mout4" pattern="${testOutputDirectory}/test_mout_d.txt" />
</filenames>

There are several types of triggers for patterns available. Patterns are always linked to a particular class. By applying the selectiontag attribute to some of the trigger types, you gain more fine-grained control over pattern selection if you define output objects of the same class multiple times in a tool.

onScriptParameter trigger

This trigger type links the pattern to the scriptparameter attribute of an output object. Valid trigger values are:

  • [parameter name] - where parameter name is linked to the scriptparameter attribute. The trigger is valid for all tools.
  • :[parameter name] - behaves like above.
  • [ANY]:[parameter name] - behaves like above. This is the long form and [ANY] is meant to make the syntax more readable.
  • [tool id]:[parameter name] - behaves like above, except that tool id restricts the trigger to exactly one tool.

This trigger type will NOT accept the selectiontag attribute.
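The four accepted forms can be normalised to a pair of (tool id, parameter name). In this Python sketch the helper names are hypothetical, and rejecting bracketed tool ids other than [ANY] is an assumption based on the [AffY] entry marked as an error in the examples further down.

```python
def parse_on_script_parameter(trigger: str):
    """Return (tool_id or None, parameter name); None means 'valid for all tools'."""
    if ":" not in trigger:
        return None, trigger                      # [parameter name]
    tool, parameter = trigger.split(":", 1)
    if tool in ("", "[ANY]"):
        return None, parameter                    # :[parameter name] or [ANY]:[parameter name]
    if tool.startswith("[") or tool.endswith("]"):
        raise ValueError(f"invalid tool id {tool!r} in {trigger!r}")  # assumed rule
    return tool, parameter                        # [tool id]:[parameter name]


def trigger_matches(trigger: str, tool_id: str, scriptparameter: str) -> bool:
    """True if the trigger applies to this tool's output parameter."""
    tool, parameter = parse_on_script_parameter(trigger)
    return parameter == scriptparameter and tool in (None, tool_id)


print(parse_on_script_parameter("[ANY]:BAM_INDEX_FILE4"))
print(trigger_matches("testScript:BAM_INDEX_FILE", "otherTool", "BAM_INDEX_FILE"))
```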

onMethod trigger

This trigger links the pattern to a method name or to a class and a method name. Roddy will search all called methods using the current thread's stack trace. The search stops as soon as the execute method is reached. Valid values are:

  • [methodName] - by specifying only a method name, the pattern will be used for any called method with this name.
  • [simple class name].[methodName] - this will accept all methods in classes with the given class name. The class package will be ignored.
  • [full class name].[methodName] - by setting the class and the package, this pattern will only be applied with a full match.

This trigger type will accept the selectiontag attribute.
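The three forms can be illustrated with a Python sketch that searches a list of (class, method) frames, innermost call first, and stops at execute. The frame list stands in for the JVM stack trace Roddy inspects; on_method_matches and the class names in the usage example are hypothetical.

```python
def on_method_matches(trigger: str, stack: list) -> bool:
    """stack: (fully qualified class name, method name) pairs, innermost call first."""
    if "." in trigger:
        class_part, method = trigger.rsplit(".", 1)
    else:
        class_part, method = None, trigger  # [methodName]
    for qualified_class, called_method in stack:
        if called_method == method:
            simple_class = qualified_class.rsplit(".", 1)[-1]
            # [simple class name].[methodName] or [full class name].[methodName]
            if class_part is None or class_part in (simple_class, qualified_class):
                return True
        if called_method == "execute":
            break  # the search ends once the execute method is reached
    return False


# Hypothetical call stack while a filename is being created.
stack = [
    ("de.dkfz.roddy.knowledge.files.BaseFile", "getFilename"),
    ("de.dkfz.roddy.knowledge.examples.SimpleTestTextFile", "testFWChildren"),
    ("org.example.MyWorkflow", "execute"),
    ("org.example.Runner", "test1"),
]
for trigger in ["getFilename", "BaseFile.getFilename", "SimpleTestTextFile.testFWChildren", "test1"]:
    print(trigger, on_method_matches(trigger, stack))
```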

onToolID trigger

This trigger will link the pattern to a tool call. If this tool is called and outputs a file of the given class then this pattern might be used.

This trigger type will accept the selectiontag attribute.

derivedfrom trigger

In some cases the name of a new file depends on the name of a parent file, e.g. a Bam Index file depends on a Bam file like DATASET_TIMESTAMP.merged.bam -> DATASET_TIMESTAMP.merged.bam.bai.

This trigger type will accept the selectiontag attribute.

generic

To be done… we hardly use it.

Important

Filename patterns are evaluated in a specific order!

  1. First by the trigger type:
     • onScriptParameter -> onMethod -> onToolID -> derivedFrom -> generic
  2. Then by the order in the configuration. First come, first served!

Examples of filename entries for the different triggers:

<filename class='TestFileWithParent' derivedFrom='TestParentFile' pattern='/tmp/onderivedFile'/>
<filename class='TestFileWithParentArr' derivedFrom='TestParentFile[2]' pattern='/tmp/onderivedFile'/>
<filename class='TestFileOnMethod' onMethod='de.dkfz.roddy.knowledge.files.BaseFile.getFilename' pattern='/tmp/onMethod'/>
<filename class='TestFileOnMethod' onMethod='BaseFile.getFilename' pattern='/tmp/onMethodwithClassName'/>
<filename class='TestFileOnMethod' onMethod='getFilename' pattern='/tmp/onMethod'/>
<filename class='TestFileOnTool' onTool='testScript' pattern='/tmp/onTool'/>
<filename class='FileWithFileStage' fileStage="GENERIC" pattern='/tmp/filestage'/>
<filename class='TestOnScriptParameter' onScriptParameter='testScript:BAM_INDEX_FILE' pattern='/tmp/onScript' />
<filename class='TestOnScriptParameter' onScriptParameter='BAM_INDEX_FILE2' pattern='/tmp/onScript' />
<filename class='TestOnScriptParameter' onScriptParameter=':BAM_INDEX_FILE3' pattern='/tmp/onScript' />
<filename class='TestOnScriptParameter' onScriptParameter='[ANY]:BAM_INDEX_FILE4' pattern='/tmp/onScript' />
<filename class='TestOnScriptParameter' onScriptParameter='[AffY]:BAM_INDEX_FILE5' pattern='/tmp/onScript' /> <!-- Error! -->
<filename onScriptParameter='testScript:BAM_INDEX_FILE6' pattern='/tmp/onScript' />

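The selection order described above can be modelled in a few lines of Python, assuming the candidates have already been filtered to those whose trigger fits the current call. FilenamePattern and select_pattern are hypothetical names; a stable sort keeps the configuration order within one trigger type.

```python
from dataclasses import dataclass

# Trigger types in evaluation order, as listed above.
TRIGGER_ORDER = ["onScriptParameter", "onMethod", "onToolID", "derivedFrom", "generic"]


@dataclass
class FilenamePattern:
    file_class: str
    trigger_type: str
    pattern: str


def select_pattern(candidates: list, file_class: str):
    """Return the first applicable pattern for file_class, or None."""
    matching = [p for p in candidates if p.file_class == file_class]
    # sorted() is stable: patterns of the same trigger type keep their configuration order.
    ordered = sorted(matching, key=lambda p: TRIGGER_ORDER.index(p.trigger_type))
    return ordered[0] if ordered else None


candidates = [
    FilenamePattern("TestFile", "derivedFrom", "/tmp/derived"),
    FilenamePattern("TestFile", "onToolID", "/tmp/tool_first"),
    FilenamePattern("TestFile", "onToolID", "/tmp/tool_second"),
    FilenamePattern("OtherFile", "onScriptParameter", "/tmp/other"),
]
print(select_pattern(candidates, "TestFile").pattern)
```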
Automatic filenames

Synthetic classes

Synthetic classes are a mechanism which allows you to use Roddy's built-in type checking system without the need to create class files. Synthetic classes are automatically created during runtime in the following cases:

  • A filename pattern requires a specific non-existent class.

  • A tool i/o parameter needs a specific non-existent class.

  • Programmatically, if you request Roddy to load a non-existent class with the LibrariesFactory:

    LibrariesFactory.getInstance().loadRealOrSyntheticClass(String classOfFileObject, String baseClassOfFileObject)
    LibrariesFactory.getInstance().loadRealOrSyntheticClass(String classOfFileObject, Class<FileObject> constructorClass)
    LibrariesFactory.getInstance().forceLoadSyntheticClassOrFail(String classOfFileObject, Class<FileObject> constructorClass = BaseFile.class)
    LibrariesFactory.getInstance().generateSyntheticFileClassWithParentClass(String syntheticClassName, String constructorClassName, GroovyClassLoader classLoader = null)
    

    Or via the ClassLoaderHelper:

    LibrariesFactory.getInstance().getClassLoaderHelper().loadRealOrSyntheticClass(String classOfFileObject, String baseClassOfFileObject)
    LibrariesFactory.getInstance().getClassLoaderHelper().loadRealOrSyntheticClass(String classOfFileObject, Class<FileObject> constructorClass)
    LibrariesFactory.getInstance().getClassLoaderHelper().generateSyntheticFileClassWithParentClass(String syntheticClassName, String constructorClassName, GroovyClassLoader classLoader = null)
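Conceptually, the loadRealOrSyntheticClass methods first look for a real class and fall back to a synthetic one only if none exists. The Java sketch below models that fallback with plain reflection. Generating bytecode is out of scope here, so the synthetic result is represented by a hypothetical TypeRef descriptor that records the requested name and its parent class. Caching synthetic types per name is an assumption of the sketch, not a statement about Roddy's implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of "load the real class, else fall back to a synthetic one". */
final class RealOrSyntheticResolver {
    /** A resolved type: either an existing class or a synthetic name derived from a parent class. */
    static final class TypeRef {
        final String name;
        final Class<?> parent;
        final boolean synthetic;

        TypeRef(String name, Class<?> parent, boolean synthetic) {
            this.name = name;
            this.parent = parent;
            this.synthetic = synthetic;
        }
    }

    // Synthetic types are created once per name and reused afterwards (assumption of this sketch)
    private final Map<String, TypeRef> syntheticTypes = new HashMap<>();

    TypeRef resolve(String className, Class<?> baseClass) {
        try {
            Class<?> real = Class.forName(className);
            if (!baseClass.isAssignableFrom(real)) {
                throw new IllegalArgumentException(className + " is not a subtype of " + baseClass.getName());
            }
            return new TypeRef(real.getName(), real.getSuperclass(), false);
        } catch (ClassNotFoundException e) {
            // No real class available: describe a synthetic subclass of baseClass instead
            return syntheticTypes.computeIfAbsent(className, name -> new TypeRef(name, baseClass, true));
        }
    }

    public static void main(String[] args) {
        RealOrSyntheticResolver resolver = new RealOrSyntheticResolver();
        TypeRef existing = resolver.resolve("java.util.ArrayList", java.util.List.class);
        TypeRef synthetic = resolver.resolve("SimpleTestTextFile", Object.class);
        System.out.println(existing.name + " synthetic=" + existing.synthetic);
        System.out.println(synthetic.name + " extends " + synthetic.parent.getName() + " synthetic=" + synthetic.synthetic);
    }
}
```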
    

Example tool entry and filename patterns


Overriding tool entries

Sometimes the initial tool specification does not fit your needs. In that case, you can always override the existing tool entry. There are two ways to do this: override only the resource sets, or redefine the whole tool.

To override the whole tool, simply redefine it. Keep in mind that you will probably have to match the number of input and output parameters, possibly even their types, and that you must put the new tool definition at the proper level of your configuration file hierarchy.

<tool name='testScript' value='testScriptSleep.sh' basepath='roddyTests'>
  <resourcesets>
      <rset size="l" memory="1" cores="1" nodes="1" walltime="5"/>
  </resourcesets>
  <input type="file" typeof="SimpleTestTextFile" scriptparameter="FILENAME_IN"/>
  <output type="file" typeof="SimpleTestTextFile" scriptparameter="FILENAME_OUT"/>
</tool>

If you only need to adapt the resources, you can use the overrideresourcesets="true" attribute:

<tool name='testScript' value='testScriptSleep.sh' basepath='roddyTests' overrideresourcesets="true">
  <resourcesets>
      <rset size="l" memory="1" cores="1" nodes="1" walltime="5"/>
  </resourcesets>
</tool>

The input and output entries are inherited, and your tool is set up with the new resources. Be aware that all of the old resource set entries become void!
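The two ways of overriding can be read as a merge rule: a full redefinition replaces the inherited tool entry, while overrideresourcesets="true" keeps the inherited input and output entries and replaces all resource sets. The following Java sketch expresses that rule. ToolEntry and merge are hypothetical stand-ins for Roddy's internal tool model and only carry the fields relevant here.

```java
import java.util.List;

/** Hypothetical, simplified tool entry; Roddy's internal tool model is richer. */
final class ToolEntry {
    final String name;
    final List<String> resourceSets; // e.g. "l" for <rset size="l" .../>
    final List<String> inputs;       // script parameters of <input> entries
    final List<String> outputs;      // script parameters of <output> entries
    final boolean overrideResourceSets;

    ToolEntry(String name, List<String> resourceSets, List<String> inputs,
              List<String> outputs, boolean overrideResourceSets) {
        this.name = name;
        this.resourceSets = resourceSets;
        this.inputs = inputs;
        this.outputs = outputs;
        this.overrideResourceSets = overrideResourceSets;
    }

    /** Applies a tool entry from a higher configuration level on top of the inherited one. */
    static ToolEntry merge(ToolEntry inherited, ToolEntry override) {
        if (!override.overrideResourceSets) {
            // Full redefinition: the new entry replaces the inherited one completely
            return override;
        }
        // overrideresourcesets="true": keep the inherited I/O, replace ALL resource sets
        return new ToolEntry(inherited.name, override.resourceSets,
                inherited.inputs, inherited.outputs, false);
    }

    public static void main(String[] args) {
        ToolEntry base = new ToolEntry("testScript", List.of("s"), List.of("FILENAME_IN"), List.of("FILENAME_OUT"), false);
        ToolEntry rsetsOnly = new ToolEntry("testScript", List.of("l"), List.of(), List.of(), true);
        ToolEntry merged = merge(base, rsetsOnly);
        System.out.println(merged.resourceSets + " " + merged.inputs + " " + merged.outputs); // [l] [FILENAME_IN] [FILENAME_OUT]
    }
}
```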

Runtime configuration files

These files are freshly created for each workflow start.

Developing Roddy

Application structure

Developers guide

Code guidelines

Roddy has no strict development or code style. Here, we collect topics and settings that we consider important.

Code format

We mainly use IntelliJ IDEA with its default code formatting settings.

Collections as return types

By default, we do not return a copy (neither shallow nor deep) of a Collection object. Be careful not to modify a returned collection unless you actually intend to change the contents of the object it belongs to.
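The following Java sketch illustrates what this convention means for callers: the getter hands out the internal list itself, so modifying the returned collection modifies the owning object. FileListOwner is a hypothetical example class. Where a read-only result is preferable, Collections.unmodifiableList provides a view without copying.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical class following the "return the collection itself, not a copy" convention. */
final class FileListOwner {
    private final List<String> files = new ArrayList<>();

    void addFile(String file) {
        files.add(file);
    }

    /** Returns the internal list itself: any change made by the caller changes this object. */
    List<String> getFiles() {
        return files;
    }

    /** Alternative: a read-only view that reflects this object's list but rejects modifications. */
    List<String> getFilesReadOnly() {
        return Collections.unmodifiableList(files);
    }

    public static void main(String[] args) {
        FileListOwner owner = new FileListOwner();
        owner.addFile("a.txt");
        owner.getFiles().add("b.txt");               // modifies owner as well
        System.out.println(owner.getFiles().size()); // prints 2
    }
}
```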

Keep it clean and simple

We do not really enforce rules, but we try to keep things simple and readable.

  • If a code block is not readable, try to make a method out of it.
  • Reduce size and complexity of methods.
  • Your code should be self-explanatory. If it is not, try to make it so.

We know that our codebase has a lot of issues, but we listen to every suggestion for improvement and constantly try to make things better.

Development model

For development, we follow the standard git flow: feature branches are merged into the develop branch, which is merged into the master branch upon release. We are currently discussing whether to remove the develop branch. Roddy's versioning system makes it easy to go back to previous versions.

Settings for Groovy classes

We will not accept Groovy classes without the @CompileStatic annotation.

Roddy versioning scheme

Roddy version numbers consist of three entries: [major].[minor].[build]. These are added to the repository for releases.

The [major] entry is used to mark huge changes in the Roddy core functions. Backward compatibility is most likely not guaranteed, and Roddy will not execute plugins built with a different [major] version.

The [minor] entry marks smaller changes which might affect your plugin. Backward compatibility might be affected, and Roddy will warn you when a plugin was built with another [minor] version. Only decrease this value when you increase the [major] version.

The [build] number is automatically increased when Roddy is packed or compiled. You should only lower the build number if you increase either the [major] or [minor] version.

The combination of [major].[minor] can be seen as the API level of Roddy. For a “full API level”, the plugin versions of “PluginBase” and “DefaultPlugin” need to be considered as well.

If we have to maintain an old plugin version with bugfixes or feature backports for specific projects in production, we extend the tag to a full branch called “ReleaseBranch_$major.$minor.$build” and tag the subversions with a “-$revision” suffix.
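Taken together, the scheme implies a simple compatibility rule: a different [major] version is incompatible, a different [minor] version leads to a warning, and the [build] number does not matter. The Java sketch below encodes this reading of the rules. ApiCompatibility and check are hypothetical names, and Roddy's actual check may differ in detail.

```java
/** Hypothetical sketch of the compatibility rule implied by the [major].[minor].[build] scheme. */
final class ApiCompatibility {
    enum Result { COMPATIBLE, WARN, INCOMPATIBLE }

    /** Compares the [major] and [minor] parts; any [build] part is ignored. */
    static Result check(String roddyVersion, String pluginApiVersion) {
        int[] roddy = majorMinor(roddyVersion);
        int[] plugin = majorMinor(pluginApiVersion);
        if (roddy[0] != plugin[0]) {
            return Result.INCOMPATIBLE; // different [major]: the plugin is not executed
        }
        if (roddy[1] != plugin[1]) {
            return Result.WARN;         // different [minor]: Roddy warns
        }
        return Result.COMPATIBLE;
    }

    private static int[] majorMinor(String version) {
        String[] parts = version.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected [major].[minor]: " + version);
        }
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    public static void main(String[] args) {
        System.out.println(check("2.4.120", "2.4")); // COMPATIBLE
        System.out.println(check("2.4.120", "2.3")); // WARN
        System.out.println(check("3.0.5", "2.4"));   // INCOMPATIBLE
    }
}
```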

Below, you will find how things are (or are supposed to be) handled in git.

How to get started

Have you already checked out the Installation guide? If not, please do so and do not forget to use the developer settings instead of the user settings.

Repository Structure

/
roddy.sh                                          Top-level script
./RoddyCore                                       The core project
    buildversion.txt                              Current buildversion
    Java/Groovy sources
dist/
    bin/
        current/
        $major.$minor.$build
    plugins/
    plugins_R$major.$minor
    runtimeDevel
        groovy-$major.$minor.$build
        jdk, jre, jdk_$major.$minor._$revision

Compiling Roddy

Currently, compilation and packaging are implemented in the top-level roddy.sh script, which itself calls a number of scripts in the dist/bin/current/helperScripts directory. In the long run, we will probably replace this with a Gradle-based implementation.

Compiling Roddy is easy:

bash roddy.sh compile

This compiles a new “current” version.

Packing Roddy

Similar to compile, Roddy has a pack option:

bash roddy.sh pack

This packs the current version into a directory called $major.$minor.$build.

Plugin development

Plugin developers guide

This page should give you the knowledge you need to start your own Roddy-based workflows.

Initially, you should at least read the “Where to start” section. Afterwards, you can decide whether you want to develop JVM, Brawl or native plugins.

Please read the Installation guide if you do not have a readily installed version of Roddy.

If you just need a quick start or a short refresher, read the Workflow development primer.

Select the workflow type

Before you create a new workflow, you have to decide which type of workflow you want to create:

  1. Java / Groovy or other JVM based plugins. We will call them JVM plugins.
  2. Brawl
  3. Bash or other native workflows like e.g. Python or Perl based

and whether you want to create a new plugin or extend an existing one. Of course, you can mix workflow types within a plugin at a later stage.

We are discussing whether we will support CWL-based workflows.

Common plugin setup

[Figure: plugin overview]

Roddy plugins normally follow a strict structure. Full native plugins are an exception, but since these special plugins get converted to the default structure, all plugins are eventually organized this way.

The plugin folder name is built up as follows:

PluginName_1.0.111-1

Note

The standard Roddy versioning scheme, [major].[minor].[build], also applies to plugins. Plugin versions extend it with a revision to [major].[minor].[build]-[revision].

where:

  • PluginName is the name of the plugin
  • _1.0.111 is the version of the plugin. This is not necessarily the same as the entry in the buildversion.txt file. If you omit this entry, the plugin version defaults to current!
  • -1 is the revision of the plugin. If you only have minor changes, you can increase the revision number of the new plugin, and Roddy is able to select the revised plugin instead of the former revision. You can omit this entry, and Roddy will set the revision to -0 internally. Please be aware:
    • The revision is only valid if you set the version! It is not valid for plugins marked as current.
    • You are also not allowed to set current as the plugin version!
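These naming rules can be checked mechanically. The following Java sketch parses a plugin folder name into name, version and revision and applies the rules listed above: no version means current, no revision means 0, a revision requires a version, and current must not be set explicitly. PluginFolderName is a hypothetical class, and restricting plugin names to word characters (letters, digits, underscore) is an assumption of the sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for plugin folder names such as PluginName_1.0.111-1. */
final class PluginFolderName {
    // <name>[_<major>.<minor>.<build>][-<revision>]; "current" is matched only to reject it
    private static final Pattern FOLDER =
            Pattern.compile("^(\\w+?)(?:_(\\d+\\.\\d+\\.\\d+|current))?(?:-(\\d+))?$");

    final String name;
    final String version; // "current" if no version is given
    final int revision;   // 0 if no revision is given

    private PluginFolderName(String name, String version, int revision) {
        this.name = name;
        this.version = version;
        this.revision = revision;
    }

    static PluginFolderName parse(String folderName) {
        Matcher m = FOLDER.matcher(folderName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Invalid plugin folder name: " + folderName);
        }
        String version = m.group(2);
        String revision = m.group(3);
        if ("current".equals(version)) {
            throw new IllegalArgumentException("current must not be set as the version: " + folderName);
        }
        if (version == null && revision != null) {
            throw new IllegalArgumentException("A revision is only valid with a version: " + folderName);
        }
        return new PluginFolderName(m.group(1),
                version == null ? "current" : version,
                revision == null ? 0 : Integer.parseInt(revision));
    }

    @Override
    public String toString() {
        return name + " " + version + " revision " + revision;
    }

    public static void main(String[] args) {
        System.out.println(parse("PluginName_1.0.111-1")); // PluginName 1.0.111 revision 1
        System.out.println(parse("PluginName"));           // PluginName current revision 0
    }
}
```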

Every plugin consists of a few main components plus the files for the contained workflows.

  1. The buildversion.txt file contains the build number of the plugin. This number is increased when you pack or compile the plugin. The file contains exactly two lines:

    Major.Minor
    Build
    

    e.g.

    1.0
    182
    
  2. The buildinfo.txt file contains information about:

  • The Roddy API level, which is e.g. 2.3 or 2.4
  • The Java version API level
  • The Groovy API level

Furthermore, it contains information about dependencies on other plugins and compatibility entries.

One example:

dependson=PluginBase:1.0.29
dependson=COWorkflows:1.1.20-1
dependson=DefaultPlugin:1.0.34
JDKVersion=1.8
GroovyVersion=2.4
RoddyAPIVersion=2.4

This plugin depends on three other plugins with specific versions. For development, it is possible to set current as the version number. The plugin also depends on JDK version 1.8.*/8.*, Groovy version 2.4.* and Roddy version 2.4.*. If you do not develop a Java-based plugin, you can omit JDKVersion and GroovyVersion.
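Both build files are simple line-based text files. The Java sketch below reads them as described above: buildversion.txt as exactly two lines (Major.Minor, then the build number) and buildinfo.txt as key=value lines, where dependson may occur several times. PluginBuildFiles and its methods, including incrementBuild, are hypothetical helpers, not Roddy's own implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reader for the contents of buildversion.txt and buildinfo.txt. */
final class PluginBuildFiles {

    /** buildversion.txt: line 1 is "Major.Minor", line 2 the build number; returns "Major.Minor.Build". */
    static String parseBuildVersion(List<String> lines) {
        if (lines.size() != 2) {
            throw new IllegalArgumentException("buildversion.txt must contain exactly two lines");
        }
        return lines.get(0).trim() + "." + Integer.parseInt(lines.get(1).trim());
    }

    /** Returns the lines with the build number increased by one, as happens on pack or compile. */
    static List<String> incrementBuild(List<String> lines) {
        parseBuildVersion(lines); // validates the format
        int next = Integer.parseInt(lines.get(1).trim()) + 1;
        return Arrays.asList(lines.get(0).trim(), String.valueOf(next));
    }

    /** buildinfo.txt: key=value lines; keys such as dependson may occur several times. */
    static Map<String, List<String>> parseBuildInfo(List<String> lines) {
        Map<String, List<String>> entries = new LinkedHashMap<>();
        for (String line : lines) {
            String entry = line.trim();
            if (entry.isEmpty()) {
                continue;
            }
            int eq = entry.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Not a key=value line: " + line);
            }
            entries.computeIfAbsent(entry.substring(0, eq), key -> new ArrayList<>())
                   .add(entry.substring(eq + 1));
        }
        return entries;
    }

    public static void main(String[] args) {
        System.out.println(parseBuildVersion(Arrays.asList("1.0", "182"))); // 1.0.182
        Map<String, List<String>> info = parseBuildInfo(Arrays.asList(
                "dependson=PluginBase:1.0.29", "dependson=DefaultPlugin:1.0.34", "RoddyAPIVersion=2.4"));
        System.out.println(info.get("dependson")); // [PluginBase:1.0.29, DefaultPlugin:1.0.34]
    }
}
```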

  3. The resources directory, which contains:
  • The analysisTools directory, which is populated with several tool folders, e.g.

    13:45 $ ll analysisTools/
    total 8
    ... 4096 Jun 26 13:47 roddyNativeTools
    ... 4096 Jul 13 16:20 roddyTools
    

    The names of the tool folders will be used as the basepath entry for tool entries in your workflow configuration file.

  • The configurationFiles directory which contains one or more configuration files. Workflow configuration files need the prefix analysis, e.g. analysisTest.xml.

  • If you use Brawl workflows, you will store your Brawl files inside the folder brawlWorkflows.

  4. The src folder, e.g. for Java classes. Of course, you are free to organize your code in your own way; we tend to keep it like this.
  5. The jar file, which is named after the plugin. The jar file is only needed if you create Java-based workflows.

Important

The build* files and the analysisTools and configurationFiles folders are mandatory! If you do not create them, the plugin will not be loaded by Roddy.
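The mandatory parts can be verified before Roddy tries to load a plugin. The following Java sketch checks a plugin folder for the two build files and the two required resources subfolders. PluginLayoutCheck is a hypothetical helper, and Roddy's own loader may check additional details.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical check for the mandatory parts of a plugin folder. */
final class PluginLayoutCheck {
    private static final String[] REQUIRED_FILES = {"buildversion.txt", "buildinfo.txt"};
    private static final String[] REQUIRED_DIRS = {"resources/analysisTools", "resources/configurationFiles"};

    /** Returns the mandatory entries that are missing; an empty list means the layout is complete. */
    static List<String> missingEntries(Path pluginDir) {
        List<String> missing = new ArrayList<>();
        for (String file : REQUIRED_FILES) {
            if (!Files.isRegularFile(pluginDir.resolve(file))) {
                missing.add(file);
            }
        }
        for (String dir : REQUIRED_DIRS) {
            if (!Files.isDirectory(pluginDir.resolve(dir))) {
                missing.add(dir);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Build an incomplete plugin folder in a temporary directory
        Path plugin = Files.createTempDirectory("NewPlugin");
        Files.createDirectories(plugin.resolve("resources/analysisTools"));
        Files.write(plugin.resolve("buildversion.txt"), "1.0\n0\n".getBytes());
        System.out.println(missingEntries(plugin)); // [buildinfo.txt, resources/configurationFiles]
    }
}
```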

Populating your plugin

Now it is time to populate your plugin with files, configuration files and resources. The common settings are explained in this document; plugin-specific settings are explained separately.

As noted before, you need to create at least a plugin folder with a valid name, the buildinfo and the buildversion text files and both subfolders in resources.

Important

JVM workflows offer the most extensive access to the Roddy API, so Roddy's API concepts are explained in the description of JVM workflows. However, you are allowed to mix workflow types in a plugin.

Let Roddy help you

Call Roddy like this:

bash roddy.sh createnewworkflow PluginID[:dependencyPlugin] [native|brawl:]WorkflowID
  • Set PluginID to either an existing or a new plugin.
  • Optionally, set dependencyPlugin to a parent plugin.
  • Select whether you want a Java (default), a native (Bash) or a Brawl workflow by using the corresponding prefix.
  • Finally, set the workflow's name with WorkflowID.

So e.g. create a Java workflow called FirstWorkflow in a plugin called NewPlugin:

bash roddy.sh createnewworkflow NewPlugin FirstWorkflow

or e.g. create a Brawl workflow called SecondWorkflow in another plugin and set it to depend on NewPlugin:

bash roddy.sh createnewworkflow AnotherPlugin:NewPlugin brawl:SecondWorkflow

Oh, I have something new now… but where is it?

Good question! That depends entirely on your application ini file and the plugin directories configured in it. So look up the file and check all configured plugin directories.

Workflow development primer

Following the instructions on this page, you should be able to set up and run a basic workflow within ten minutes. At the end of this page, you will find all commands in one code block. This guide assumes that you are developing for Roddy version 2.4.x and that you will create a JVM-based workflow.

1. Set up a plugins folder

The plugins folder is the folder where you will store your (self-created) plugins.

mkdir ~/RoddyPlugins

2. Prepare the plugin folder

Now create a folder in which you will store your new plugin.

cd ~/RoddyPlugins
mkdir NewPlugin
cd NewPlugin

3. Create the first files and folders

This will create the basic structure which is necessary for your plugin. See the Plugin developers guide for more information about plugin structures. We are

mkdir -p resources/analysisTools/workflowTools
mkdir -p resources/configurationFiles

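# buildversion.txt: Major.Minor on the first line, the build number on the second
echo 0.0 > buildversion.txt
echo 0 >> buildversion.txt

# buildinfo.txt: plugin dependencies and API levels (">" creates the file, ">>" appends)
echo "dependson=PluginBase:1.0.29" > buildinfo.txt
echo "dependson=DefaultPlugin:1.0.34" >> buildinfo.txt
echo "RoddyAPIVersion=2.4" >> buildinfo.txt
echo "JDKVersion=1.8" >> buildinfo.txt
echo "GroovyVersion=2.4" >> buildinfo.txt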

4. Create the src folder and the initial Java package

We use our own package structure for this example; change it as you need. You need the src structure if you want to compile the plugin using Roddy.

mkdir -p src/de/dkfz/roddy/newplugin
cd src/de/dkfz/roddy/newplugin

In this directory, create the file NewPlugin.java and put in the following code.

package de.dkfz.roddy.newplugin;

import de.dkfz.roddy.plugins.BasePlugin;

public class NewPlugin extends BasePlugin {
    public static final String CURRENT_VERSION_STRING = "0.0.0";
    public static final String CURRENT_VERSION_BUILD_DATE = "NotBuildYet";

    @Override
    public String getVersionInfo() {
        return "Roddy plugin: " + this.getClass().getName() + ", V " + CURRENT_VERSION_STRING + " built at " + CURRENT_VERSION_BUILD_DATE;
    }
}

There you are. The next step is…

5. Create a workflow class

In this directory, create the file NewWorkflow.java and put in the following code.

package de.dkfz.roddy.newplugin;

import de.dkfz.roddy.core.ExecutionContext;
import de.dkfz.roddy.core.Workflow;

public class NewWorkflow extends Workflow {
    @Override
    public boolean execute(ExecutionContext context) {
        return true;
    }
}

6. Create your analysis XML file

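The next step is the creation of your analysis XML file, which makes the workflow available to Roddy. If the XML file is set up properly, you can import the analysis in your project configuration or call it in configuration-free mode.

cd ~/RoddyPlugins/NewPlugin/resources/configurationFiles

Create the analysis XML file in this directory. As described in the Plugin developers guide, its name needs the prefix analysis, e.g. analysisNewPlugin.xml. Put in the following content: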
<configuration name='newAnalysis' description=''
           configurationType='analysis'
           class='de.dkfz.roddy.core.Analysis'
           workflowClass='de.dkfz.roddy.newplugin.NewWorkflow'
           runtimeServiceClass="de.dkfz.roddy.core.RuntimeService"
           listOfUsedTools="testScript" usedToolFolders="workflowTools">
  <configurationvalues>
    <cvalue name="firstValue" value="FillIt" type="string" />
    <cvalue name="testOutputDirectory" value="${outputAnalysisBaseDirectory}/testfiles" type="path"/>
  </configurationvalues>
  <processingTools>
    <tool name='testScript' value='testScriptSleep.sh' basepath='workflowTools'>
      <resourcesets>
        <rset size="l" memory="1" cores="1" nodes="1" walltime="5"/>
      </resourcesets>
      <input type="file" typeof="SimpleTestTextFile" scriptparameter="FILENAME_IN"/>
      <output type="file" typeof="SimpleTestTextFile" scriptparameter="FILENAME_OUT"/>
    </tool>
  </processingTools>
  <filenames package='de.dkfz.roddy.knowledge.examples' filestagesbase='de.dkfz.roddy.knowledge.examples.SimpleFileStage'>
    <filename class='SimpleTestTextFile' onTool='testScript' pattern='${testOutputDirectory}/test_onScript_1.txt'/>
  </filenames>
</configuration>

There you are. You now have a tool which you can call from your workflow.

7. Extend the workflow

Open up the workflow class again and change the execute method so that it calls the tool “testScript”. For that to work, you need to load one SimpleTestTextFile.

@Override
public boolean execute(ExecutionContext context) {
    // Load an existing file from storage as the input of the workflow
    SimpleTestTextFile textFile = (SimpleTestTextFile) loadSourceFile("/tmp/someTextFile.txt");
    // Call the tool "testScript"; its output is declared as a SimpleTestTextFile in the analysis XML
    SimpleTestTextFile result = (SimpleTestTextFile) call("testScript", textFile);
    return true;
}

Successful Roddy workflows return true. If you detect an error, you can return false or throw an exception. Only one thing is missing before you can try out your new workflow.

8. Create the first script

cd ~/RoddyPlugins/NewPlugin/resources/analysisTools/workflowTools

# The first line creates the file, all further lines are appended with >>
echo 'source ${CONFIG_FILE}' > testScriptSleep.sh
echo '' >> testScriptSleep.sh
echo 'sleep 10' >> testScriptSleep.sh
echo 'cat $FILENAME_IN > $FILENAME_OUT' >> testScriptSleep.sh

chmod 770 testScriptSleep.sh

9. Create a new properties file for Roddy

There is a skeleton application properties file in your Roddy folder. Copy the file [RODDY]/dist/bin/current/helperScripts/skeletonAppProperties.ini to a location of your choice. Open it and add the folder ~/RoddyPlugins to the pluginDirectories entry. Also change the jobManager class to DirectSynchronousExecutedJobManager: comment out the currently active jobManager line and uncomment the one for the new job manager.

10. Last steps

The last step is to compile the plugin and run the workflow. We stick to the configuration-free mode here; project configuration files are explained in Configuration files. If you use a project configuration file, put it in a directory of your choice (e.g. where you put your ini file in the previous step).

[RODDY_DIRECTORY]/roddy.sh compileplugin NewPlugin --c=[YOUR_INI_FILE]

If you stuck to the example code, everything should be fine now and you can run the workflow.

[RODDY_DIRECTORY]/roddy.sh testrun NewPlugin_current:test --c=[YOUR_INI_FILE]

Command code block

JVM plugins

Java- or Groovy-based plugins are the default plugin type for Roddy, as both languages provide a lot of checks when the plugin is built, e.g. for variable type errors and misspelled variables. Brawl-based workflows are converted to Groovy workflows at runtime. Here, we focus on the development of a new, empty plugin. All you need is the basic setup described in the Plugin developers guide. The code shown here can be found in the TestPluginWithJarFile plugin.

Note

There are some basic and test workflows available in the Roddy distribution folder. You can always take a look at them, if you need some examples.

Initial workflow

To start the development, you need to set up a package structure, a class which extends the Workflow class, and an initial analysis configuration file.

Here comes the Java workflow class:

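package de.dkfz.roddy.knowledge.examples;

import de.dkfz.roddy.core.ExecutionContext;
import de.dkfz.roddy.core.Workflow;

public class SimpleWorkflow extends Workflow {
  @Override
  public boolean execute(ExecutionContext context) {
    // Nothing to do yet; returning true marks the run as successful
    return true;
  }
}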

What you can see is a workflow class which overrides the execute method from Workflow. There are other methods which you can override or use:

  • checkExecutability - which returns a boolean value and

And here is the initial XML file:

<configuration name='testAnalysis' description=''
 configurationType='analysis'
 class='de.dkfz.roddy.core.Analysis'
 workflowClass='de.dkfz.roddy.knowledge.examples.SimpleWorkflow'
 runtimeServiceClass="de.dkfz.roddy.knowledge.examples.SimpleRuntimeService"
 listOfUsedTools=""
 usedToolFolders="devel"
 cleanupScript="cleanupScript">

</configuration>

What you have to do here is to set:

  • The name attribute -> This is used as the analysis identifier.
  • The workflowClass attribute -> This is the workflow class which we created above.
  • And finally the runtimeServiceClass -> This class

That’s it! This workflow could already be run, though it would not produce any files.

Now, let’s extend the workflow to call a tool. First, we need to get some files from storage to work with. Roddy works with explicitly defined dependencies: job dependencies are created automatically when an output file of one job is used as an input to another job. Initially, we do not have any files, so we need to get at least one from storage.

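package de.dkfz.roddy.knowledge.examples;

import de.dkfz.roddy.core.ExecutionContext;
import de.dkfz.roddy.core.Workflow;
import de.dkfz.roddy.execution.io.fs.FileSystemAccessProvider;
import de.dkfz.roddy.knowledge.files.BaseFile;

class SimpleWorkflow extends Workflow {

  // Returns the initial text file in the output folder of the dataset, creating it if necessary
  BaseFile createInitialTextFile(ExecutionContext ec) {
      BaseFile tf = BaseFile.constructSourceFile(
          new File(ec.runtimeService.getOutputFolderForDataSetAndAnalysis(ec.getDataSet(), ec.getAnalysis()).getAbsolutePath(),
            "textBase.txt"),
          ec,
          new SimpleFileStageSettings(ec.getDataSet(), "100", "R001"),
          null)
      // Create an empty file with default access rights if it does not exist yet
      if (!FileSystemAccessProvider.getInstance().checkFile(tf.getPath()))
          FileSystemAccessProvider.getInstance().createFileWithDefaultAccessRights(true, tf.getPath(), ec, true)
      return tf
  }

  @Override
  public boolean execute(ExecutionContext context) {
      BaseFile initialTextFile = createInitialTextFile(context)
      return true
  }
}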
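
A more elaborate workflow, which obtains the initial file from its runtime service and then chains several tool calls, looks like this:

package de.dkfz.roddy.knowledge.examples;

import de.dkfz.roddy.core.ExecutionContext;
import de.dkfz.roddy.core.Workflow;
import de.dkfz.roddy.knowledge.files.Tuple4;

/**
 * Example workflow: each call uses the output of the previous one as input,
 * so Roddy creates the job dependencies automatically.
 */
public class TestWorkflow extends Workflow {
    @Override
    public boolean execute(ExecutionContext context) {
        SimpleRuntimeService srs = (SimpleRuntimeService) context.getRuntimeService();
        // Get (or create) the initial file in storage
        SimpleTestTextFile initialTextFile = srs.createInitialTextFile(context);
        // test1(), test2() and test3() are methods of the example file class; each result feeds the next call
        SimpleTestTextFile textFile1 = initialTextFile.test1();
        FileWithChildren fileWithChildren = initialTextFile.testFWChildren();
        SimpleTestTextFile textFile2 = textFile1.test2();
        SimpleTestTextFile textFile3 = textFile2.test3();
        // A tool with multiple outputs returns a tuple of files
        Tuple4 mout = (Tuple4) call("testScriptWithMultiOut", textFile3);
        return true;
    }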
}
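The workflows above never declare job dependencies explicitly; as described earlier, a job depends on whichever job produced one of its input files. The Java sketch below models only that idea, with a hypothetical Job class and file paths as plain strings. It is not Roddy's job or file API.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model: a job depends on the jobs that produced its input files. */
final class DependencySketch {
    static final class Job {
        final String tool;
        final Set<Job> dependencies = new LinkedHashSet<>();

        Job(String tool) {
            this.tool = tool;
        }
    }

    // Maps each output file path to the job that creates it
    private final Map<String, Job> producerOf = new HashMap<>();

    /** Creates a job; inputs without a producer are treated as source files loaded from storage. */
    Job submit(String tool, List<String> inputs, List<String> outputs) {
        Job job = new Job(tool);
        for (String input : inputs) {
            Job producer = producerOf.get(input);
            if (producer != null) {
                job.dependencies.add(producer);
            }
        }
        for (String output : outputs) {
            producerOf.put(output, job);
        }
        return job;
    }

    public static void main(String[] args) {
        DependencySketch sketch = new DependencySketch();
        Job test1 = sketch.submit("test1", List.of("textBase.txt"), List.of("t1.txt"));
        Job test2 = sketch.submit("test2", List.of("t1.txt"), List.of("t2.txt"));
        System.out.println(test1.dependencies.size() + " " + test2.dependencies.contains(test1)); // 0 true
    }
}
```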

Brawl plugins

Native plugins

A How-To to Roddy Job scripts

Structure

Tool IDs

Job environment

Support for Modules / environments

The Roddy WMS

What is Roddy

Roddy is a framework for the development and management of script-based workflows on a batch processing cluster.

You can find the Roddy source code and its releases on our GitHub project site

Key Features

Roddy has several key features which make it a good choice to be used as a base for workflows:

  • Multi-Level configuration system
  • Modular application design
  • A variety of supported workflows
  • Access to several cluster backends
  • Callable stand-alone or embeddable in other applications
  • Different versions of plugins/workflows and the Roddy core application are handled in a single installation
  • Only a few dependencies and no database necessary for the Roddy core application
  • Various execution modes to help users get their work done faster

The multi-layer configuration system and the handling of plugin versions make Roddy particularly well suited for multi-user, multi-project environments.

Where to start?

Take a look at the example workflow package: Example workflow

Do you want to use it to run existing workflows? Then head over to the Users guide

Do you want to develop it? See the Developers guide

Do you want to develop workflows with it? Open up the Plugin developers guide

Do you have questions? Please visit the F.A.Q. section in our GitHub Wiki

License and associated projects

Roddy is offered under an MIT-based license.

We extracted two possibly helpful open source libraries, again under MIT license:

  • RoddyToolLib is a Java / Groovy library which provides several tools used in BatchEuphoria and Roddy. See the project description for more information.
  • BatchEuphoria is a Java / Groovy library designed to offer easy access to cluster systems. Currently supported are PBS, SGE and LSF REST.