
Please note that all SIEpedia articles address specific issues or questions raised by IAC users, so they do not attempt to be rigorous or exhaustive, and may or may not be useful or applicable in different or more general contexts.

How to run IDL jobs with Condor unencumbered by IDL licenses

A question we are often asked is whether IDL jobs can be run with Condor. Until now, our use of IDL with Condor was limited by the number of IDL licenses available at any given time (which meant that you could perhaps run 50-60 jobs simultaneously). Then we discovered the IDL Virtual Machine, which lets you run an IDL "executable" file (a SAVE file) without needing a license. Most of you probably know the steps needed to create a SAVE file, but if in doubt, see here for an example of how to create one.

The problem is that the IDL Virtual Machine is meant to be run interactively, on a machine with an X server running, and Condor is not particularly well suited to this. But with a little ingenuity it can be done. We have written a small script that takes care of all the details; overall it works without problems, and we can now submit hundreds of IDL jobs simultaneously to our Condor pool! Read on for all the details...

Submitting a job to Condor at the IAC (for the impatient)

If you are at the IAC and you just want to submit an IDL job to Condor, but don't care about how it all works, this section is for you. All you will need to do is:

  • Modify your IDL program so that it takes one argument (an integer from 0 to N-1, where N is the number of jobs you want to submit with Condor) and acts according to that argument. A sample IDL program illustrating this could be:
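PRO SUBS

; Arguments given after -args on the command line, as an array of strings
args = command_line_args()
; Convert the first argument (the Condor job number) to an integer
n = fix(args[0])

print, 'Original argument   ', args[0]
print, 'Modified   ', n*2

print, 'Wasting ', args[0], ' seconds'
wait, n

print, 'I (IDL) have finished...'
END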
  • Create a SAVE file from it. To do this, follow the steps described here.
  • Verify that it works with the IDL Virtual Machine outside Condor (the IDL Virtual Machine will show you a splash screen, where you will have to press the "Click to Continue" button; it will then proceed with the execution of the program):
[angel@vil ~]$ idl -vm=subs.sav -args 10
IDL Version 7.1 (linux x86_64 m64). (c) 2009, ITT Visual Information Solutions

Original argument   10
Modified         20
Wasting 10 seconds
I (IDL) have finished...
[angel@vil ~]$
  • Write the Condor submit file. If you are new to Condor, you might need to look at our Condor Primer (in Spanish) or the Condor manual. In the following example you will need to modify:
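    • The arguments line, which has 4 items: the first is the path to the SAVE file; the second is the argument to pass to it (in the example, $(Process), the Condor job number); the third is 1 if you use a left-handed mouse, and 0 otherwise; and the fourth is 1 if you want verbose messages for debugging, and 0 otherwise.
    • NOTE: leave the line "next_job_start_delay = 1" as it is. It makes Condor wait one second between starting consecutive jobs, which the script relies on to keep jobs starting on the same multicore machine from picking the same virtual X server number.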
Executable   = /home/condor/SIE/idlvm_with_condor.sh
Universe     = vanilla
arguments    = /home/angel/subs.sav $(Process) 1 1
output       = $(Process).out
error        = $(Process).error
Log          = idl-vm.log
Notification = error
next_job_start_delay = 1

queue 100
  • Submit it to Condor and go for a cup of coffee while the programs are executed...
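Once all the jobs have left the queue (i.e., condor_q no longer shows them), it can be useful to check that every job actually reached the end of the IDL program. The following is a minimal sketch, not part of the SIE tools: a hypothetical Bash function check_outputs that assumes the submit file above (output = $(Process).out), the sample SUBS program (whose last printed line is "I (IDL) have finished..."), and that it is run in the directory where the .out files were written.

```shell
#!/bin/bash
## Hypothetical helper (not part of the SIE tools): report the Condor jobs
## whose output file does not contain the last line printed by SUBS.
## Assumes "output = $(Process).out" in the submit file and that it is run
## in the directory where the .out files were written.
check_outputs ()
{
    local njobs=$1
    local p
    local missing=0
    for ((p = 0; p < njobs; p++)); do
        # -q: no output, -s: no error if the file is missing, -F: fixed string
        if ! grep -qsF 'I (IDL) have finished' "$p.out"; then
            echo "Job $p did not finish (see $p.out and $p.error)"
            missing=$((missing + 1))
        fi
    done
    echo "$missing of $njobs jobs did not finish"
    # Return success only if every job finished
    [ "$missing" -eq 0 ]
}
```

With the example submit file (queue 100) you would call it as check_outputs 100; the listed job numbers are the values of $(Process), so their .error files are the first place to look.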

How is it all done?

All the real work of avoiding having to press the "Click to Continue" button in every virtual machine is done by the (alpha-version) script idlvm_with_condor.sh. This script uses Xvfb to create a virtual X11 server in which the IDL splash screen is drawn (without anything being shown on a real screen), and xte, from the xautomation package, to press the button automatically. The script has to take care of two important things: how to create several virtual X servers on multicore machines without them conflicting with each other, and how to cleanly kill all processes when Condor wants to reclaim the machine for its owner before the IDL code has finished. The script is still a work in progress (some things could probably be done more efficiently), but in its present form it seems to work pretty well (let me know if you have any trouble with it). The script is:

#!/bin/bash

###### Script to run an IDL executable file (a SAVE file) in the IDL Virtual Machine
###### with Condor.

###### Written by Angel de Vicente - 2009/10/26

###### Usage:
###### /home/condor/SIE/idlvm_with_condor.sh idl_prog argument zurdo verbose
######    idl_prog: path to the SAVE file
######    argument: argument passed to the IDL program (via -args)
######    zurdo:    1 for a left-handed mouse ("zurdo" is Spanish for left-handed), 0 otherwise
######    verbose:  1 to print progress and debugging messages, 0 otherwise
###### Example:
###### /home/condor/SIE/idlvm_with_condor.sh /home/angelv/test.sav 10 1 1
###### will run test.sav with argument 10, press the button as a left-handed
###### person, and print progress and debugging messages.

XVFB_BIN="/home/condor/SIE/Xvfb"
XTE_BIN="/home/condor/SIE/xte"

## Keep the script arguments in named variables: inside a function (such as
## cleanup below) $1..$4 refer to the function's own arguments, not the script's
SAVE_FILE="$1"
IDL_ARG="$2"
ZURDO="$3"
VERBOSE="$4"

## This allows for job control inside the script (%1 will be Xvfb, %2 IDL)
set -o monitor

## Mouse button used to press "Click to Continue" (3 for left-handed users)
if [ "$ZURDO" -eq 1 ]; then
    mousebutton=3
else
    mousebutton=1
fi

if [ "$VERBOSE" -eq 1 ]; then
    echo "Running on machine $(uname -a)"
fi

## When we do a condor_rm or when the job is evicted, Condor sends a SIGTERM to
## the executable (i.e., this script), so we catch that signal and then kill the
## virtual X server and the IDL Virtual Machine
trap cleanup SIGINT SIGTERM SIGTSTP

function cleanup ()
{
    kill %2
    if [ "$VERBOSE" -eq 1 ]; then
        echo "IDL Terminated"
    fi

    sleep 1

    kill %1
    if [ "$VERBOSE" -eq 1 ]; then
        echo "Xvfb killed"
    fi

    exit
}


## Find free server number

## A cheap way of preventing two Condor processes on the same (multicore) machine
## from racing and ending up with the same server number is to sleep a random
## number of seconds before looking for a free server number.
## NOT ROBUST ENOUGH AND A BIT WASTEFUL. SHOULD FIND A BETTER WAY OF DOING THIS
##
## We comment this out, assuming the Condor submit file has next_job_start_delay = 1
#RANGE=10
#number=$RANDOM
#let "number %= $RANGE"
#if [ "$VERBOSE" -eq 1 ]; then
#echo "Sleeping $number seconds"
#fi
#sleep $number

## Find the first free number (1-9): an X server running on display :<n>
## holds the lock file /tmp/.X<n>-lock
i=1
while [ -f /tmp/.X$i-lock ]; do
    i=$((i + 1))

    if [ $i -eq 10 ]; then
        ## Displays :1 to :9 are all taken: wait, then start again from 1
        i=1
        if [ "$VERBOSE" -eq 1 ]; then
            echo "No servers available under 10. Waiting 5 minutes..."
        fi
        sleep 300
    fi
done


"$XVFB_BIN" :$i -screen 0 487x299x16 &
sleep 5
export DISPLAY=":$i.0"
if [ "$VERBOSE" -eq 1 ]; then
    echo "Virtual X Server $i created"
fi

idl -vm="$SAVE_FILE" -args "$IDL_ARG" &
sleep 10
if [ "$VERBOSE" -eq 1 ]; then
    echo "IDL Virtual Machine started"
fi

## Move the pointer over the "Click to Continue" button and click it
"$XTE_BIN" 'mousemove 394 235'
"$XTE_BIN" "mouseclick $mousebutton"
if [ "$VERBOSE" -eq 1 ]; then
    echo "Click to continue pressed"
fi


if [ "$VERBOSE" -eq 1 ]; then
    echo "Waiting for IDL"
fi
wait %2

if [ "$VERBOSE" -eq 1 ]; then
    echo "IDL Finished"
fi

sleep 2

kill %1
if [ "$VERBOSE" -eq 1 ]; then
    echo "Xvfb killed"
fi
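As the comments in the script admit, choosing the display number by looking for /tmp/.X<n>-lock files is not race-free: two jobs starting on the same machine can both see a display as free in the short window before either Xvfb has created its lock file, which is why the submit file relies on next_job_start_delay. One possible alternative, sketched below under our own assumptions and not used by idlvm_with_condor.sh, is to reserve the number with mkdir, which atomically either creates a directory or fails because it already exists. The lock directory names (/tmp/idlvm-display-<n>) and the functions reserve_display and release_display are hypothetical. Newer X servers also provide a -displayfd option that lets the server pick a free display itself, but the Xvfb binary used here may not support it.

```shell
#!/bin/bash
## Hypothetical helpers, not part of idlvm_with_condor.sh: atomically reserve
## a display number by creating a directory. mkdir either creates the
## directory or fails if it already exists, so two jobs starting at the same
## moment on one machine cannot both get the same number.
## LOCK_PREFIX is an assumed naming scheme, chosen so as not to clash with the
## /tmp/.X<n>-lock files created by the X servers themselves.
LOCK_PREFIX="${LOCK_PREFIX:-/tmp/idlvm-display}"

## Print the first reserved display number (1-9); fail if none is free
reserve_display ()
{
    local n
    for n in 1 2 3 4 5 6 7 8 9; do
        # Skip displays already used by a running X server
        [ -e "/tmp/.X$n-lock" ] && continue
        if mkdir "$LOCK_PREFIX-$n" 2>/dev/null; then
            echo "$n"
            return 0
        fi
    done
    return 1
}

## Give back a display number obtained from reserve_display
release_display ()
{
    rmdir "$LOCK_PREFIX-$1"
}
```

In the script, the while loop could then be replaced by something like i=$(reserve_display) (sleeping and retrying if it fails), with release_display $i added to cleanup and to the end of the script. A lock directory left behind by a job killed with SIGKILL would have to be removed by hand.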

Comments or suggestions to improve this guide can be sent to

Section: HOWTOs

Page last modified on November 17, 2009, at 11:08 AM