
Please note that all the SIEpedia's articles address specific issues or questions raised by IAC users, so they do not attempt to be rigorous or exhaustive, and may or may not be useful or applicable in different or more general contexts.

HTCondor submit files (description and examples)

Introduction

To execute your application with HTCondor, you have to specify some parameters such as the name of your executable, its arguments, inputs and outputs, requirements, etc. This information is written in plain text using submit commands in a file called the HTCondor Submit Description File, or simply the submit file. Once that file contains all the needed information, you submit it to HTCondor using condor_submit in your terminal (for instance, condor_submit myjobs.sub); it will then be processed and your jobs will be added to the queue to be executed.

Submit files changed considerably with the release of versions 8.4.X (the first version, 8.4.0, was released in Sept 2015; since Feb 2017 we are using versions 8.6.X). Some operations were impossible or highly painful in previous versions (such as dealing with an undetermined number of files with arbitrary names, declaring variables and macros and performing operations with them, including submission commands from other files, adding conditional statements, etc.). To work around that, many researchers developed external scripts (perl, python, bash, etc.) to dynamically create description files and submit them, which in most cases resulted in complex submissions and less efficient executions, not to mention that it usually took considerable work to adapt those scripts when the application, arguments and/or IO files changed.

With the addition of new, powerful and flexible commands, most of those problems have been solved, so there should be no need to use external scripts, and we highly recommend you always use an HTCondor submit description file instead of developing scripts in other languages. If you did that in the past, please consider migrating your old scripts; we will give you support if you find any problems.

In this section you will find templates and examples of HTCondor Submit Description Files. Use them as a reference to create your own submit files, and contact us if you have any doubt or issue.

Caution!: Before submitting your real jobs, always perform some simple tests to make sure that both your submit file and your program work properly: if you are going to submit hundreds of jobs and each job takes several hours to finish, first try with just a few jobs and change the input data so they finish in minutes. Then check the results to see if everything went fine before submitting the real jobs. We also recommend you use condor_submit -dry-run to debug your jobs and make sure they will work as expected (see the useful commands page). Bear in mind that submitting untested files and/or jobs may waste time and resources if they fail, and your priority in subsequent submissions will also be lower.

Creating a Submit File

As in many other languages, HTCondor submit files allow the use of comments, variables, macros, commands, etc. Here we will describe the most common ones; you can check the official documentation for complete and detailed information about submit files and the submission process.

Comments

HTCondor uses the # symbol for comments. Everything found after that symbol will be ignored. Please do not mix commands and comments in the same line, since it may produce errors. We recommend you always write commands and comments on different lines.

 # This is a valid comment
 A = 4    # This may produce errors when expanding A, do not use comments and anything else in the same line!

Variables and macros

There are many predefined variables and macros in HTCondor that you can use, and you can define your own ones.

  • To define a variable, just choose a valid name (names are case-insensitive) and assign a value to it, like N = 4 or Name = "example"
  • To get the value of a variable, use the syntax $(varName); both the $ symbol and the parentheses () are mandatory.
  • You can do basic operations with variables, like B = $(A) + 1, etc. (since version 8.4.0 it is not necessary to use the old and complex syntax $$([...]) for operations). To get the expression evaluated, you may need to use function macros like $INT(B), $REAL(B), etc.
  • There are several special automatic variables defined by HTCondor that will help you when creating your submit file. The most useful one is $(Process) or $(ProcId), which contains the process ID of each job (if you submit N jobs, the value of $(Process) will be 0 for the first job and N-1 for the last job). This variable acts like an iteration counter and you can use it to specify different inputs, outputs, arguments, ... for each job. There are some other automatic variables, like $(Cluster) or $(ClusterId), which store the ID of each submission, $(Item), $(ItemIndex), $(Step), $(Row), etc. (see Example1 for further information).
  • There are several pre-defined Function Macros. Their syntax is $FunctName(varName) and they can perform some operations on the variable varName, like evaluating expressions and type conversions, selecting a value from a list according to an index, getting random numbers, string operations, filename processing, setting environment variables, etc. Before creating your own macros, check whether HTCondor already has a pre-defined Function Macro for the same purpose.
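As a minimal sketch of these features (the variable names and values here are hypothetical), a submit file could define, operate on and expand variables like this:

 A    = 4
 # B holds the expression; $INT(B) evaluates it
 B    = $(A) + 1
 Name = "test"

 arguments = "-n $INT(B) -tag $(Name).$(Process)"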

Submit commands

You will need to add several HTCondor submit commands to your submit file in order to specify which executable you want to run and where it is located, its arguments if any, input files, which result files will be generated, etc. HTCondor offers almost 200 different submit description file commands to cover many different scenarios, but in most situations you will only need to specify a few of them (usually about 10-15). Here we will present the most common ones (commands are case-insensitive):

  1. Mandatory commands:
    • executable: specifies where your executable is located (you can use an absolute path, or a path relative to the directory where you perform the submission or to another directory specified with initialdir). You should specify only the executable, not other things like arguments, etc.; there are specific commands for those. HTCondor will automatically copy the executable file from your machine to any machine where your job will be executed, so you do not need to worry about that.
    • queue: this command will send your job(s) to the queue, so it should be the last command in your submit file. In previous versions of HTCondor it was quite limited, only allowing the number of jobs as an argument, but since version 8.4.0 this command is very powerful and flexible, and you can use it to specify variables, iterations over other commands, files to be processed, lists of arguments, etc. (see the complete syntax and examples).
  2. Highly recommended commands:
    • output: it will copy the standard output (stdout) printed on the screen of the remote machines when executing your program to the local file you specify here. Since all the jobs will use the same name, the filename should include some variable part that changes from job to job to avoid overwriting the same file, like $(Process) (and also $(Cluster) if you do not want different submissions to ruin your output files). Even if your program does not print any useful results on screen, we strongly recommend you save the screen output to check whether there were errors, debug them if any, etc.
    • error: the same as previous command, but for standard error output (stderr).
    • log: it will save a log of your submission that can later be analysed with HTCondor tools. This is very useful to find and fix the problem when something goes wrong with your job(s). The log should be the same for all jobs submitted in the same cluster, so you should not use $(Process) in the filename (but including $(Cluster) is recommended).
    • universe: there are several runtime environments in HTCondor called universes; we will mostly use the one named vanilla, since it is the easiest one. This is the default universe, so if you omit this command, your jobs will also go to the vanilla universe.
  3. Useful commands when working with inputs and outputs (arguments, files, keyboard, etc.):
    • arguments: it is used to specify options and flags for your executable file, as when running it on the command line.
    • should_transfer_files: assign YES to it in order to activate the HTCondor file transfer system (needed when working with files).
    • when_to_transfer_output: it will usually have the value ON_EXIT, to copy output files only when your job has finished, avoiding the copy of temporary or incomplete files if your job fails or is moved to another machine.
    • transfer_input_files: it is used to specify where the needed input files are located. We can use a comma-separated list of files (with absolute or relative paths, as mentioned for the executable command). Local paths will be ignored, and HTCondor will copy all files to the root directory of a virtual location on the remote machine (your executable will also be copied to the same place, so input files will be in the same directory as it). If you specify a directory in this command, you can choose whether to copy only the contents of the directory (add a slash "/" at the end, for instance myInputDir/) or the directory itself and its contents (do not add a slash).
    • transfer_output_files: a comma-separated list of result files to be copied back to our machine. If this command is omitted, HTCondor will automatically copy all files that have been created or modified on the remote machine. Sometimes omitting this command is useful, but at other times our program creates many temporary or useless files and we only want to get the ones we specify with this command.
More commands for input/output files:
  • transfer_output_remaps: it changes the name of the output files when copying them to your machine. This is useful when your executable generates result file(s) with the same name in every job, so changing the filename to include a variable part (like $(Process), and maybe also $(Cluster)) will avoid overwriting them.
  • initialdir: this command is used to specify the base directory for input and output files, instead of the directory the submission was performed from. If this command includes a variable part (like $(Process)), you can use it to specify a different base directory for each job.
  • input: if your program needs some data from the keyboard, you can specify a file or a comma-separated list of files containing it (each end of line in the file will behave like pressing the Enter key, as when using stdin redirection on the command line with <). As with other similar commands, you can use absolute or relative paths.
  • transfer_executable: its value is True by default, but if it is set to False, HTCondor will not copy the executable file to the remote machine(s). This is useful when the executable is a system command or a program that is installed on all machines, so there is no need to copy it.
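Putting the input/output commands above together, a minimal sketch (with hypothetical file and program names) could look like this:

 executable              = myprogram
 arguments               = "-i data.in -o data.out"
 transfer_input_files    = /path/to/data.in
 transfer_output_files   = data.out
 transfer_output_remaps  = "data.out=data.$(Cluster).$(Process).out"
 should_transfer_files   = YES
 when_to_transfer_output = ON_EXIT

Here transfer_output_remaps renames the single result file so that different jobs and submissions do not overwrite each other.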
  4. Other useful commands:
    • request_memory, request_disk: if your program needs a certain amount of total RAM memory or free disk space, you can use these commands to ensure that your jobs will only be executed on machines with at least the requested memory/free disk space [HowTo]
    • requirements: this is a very useful command if your program has any special needs. With it you can specify that your job can only be executed on some machines (or that some machines cannot run your program), according to a wide set of parameters (machine name, operating system and version, and many more) [HowTo]
    • rank: you can specify some values or a combination of them (total memory, free disk space, MIPS, etc.) and HTCondor will choose the best machines for your jobs according to your specification, where the higher the value, the better (this command is used to specify preferences, not requirements) [HowTo]
    • getenv: if set to True, all your environment variables will be copied at submission time and will be available when your program is executed on the remote machines (if you do not use this command or it is set to False, your jobs will have no environment variables). This is useful when running programs that need a special environment, like python, etc. [HowTo]
    • nice_user: if set to True, your jobs will be executed as a fake user with very low priority, which can be very useful when the queue is (almost) empty, so you can run your jobs without spending your real user priority (you can activate and deactivate this feature while your jobs are being executed, so you can begin running your jobs as the nice user if the queue is empty and change to your normal user when the queue has many other jobs, or vice versa) [HowTo]
    • concurrency_limits: you can limit the maximum number of your jobs that can be executed at the same time. You should use this command if your program needs licences and only a few are available (like IDL, see also this alternative), or if for any reason you cannot use the HTCondor file transfer system and all your jobs access the same shared resource (/scratch, /net/nas, etc.), in order to prevent too many concurrent accesses from stressing the network [HowTo]
    • include: since HTCondor v8.4.0, it is possible to include externally defined submit commands using the syntax include : <myfile>. You can even include the output of external scripts that will be executed at submission time, by adding a pipe symbol after the file: include : <myscript.sh> |
More useful commands:
  • environment: this command will allow you to set/unset/change any environment variable(s) [HowTo]
  • priority: if some of your jobs/clusters are more important than others and you want to execute them first, you can use the priority command to assign them a priority (the higher the value, the higher the priority). This command only has an effect on your own jobs; it is not related to user priority [HowTo]
  • job_machine_attrs, job_machine_attrs_history_length: use these commands to reduce the effects of black holes in HTCondor, which can cause many of your jobs to fail in a short time [HowTo]
  • noop_job: you specify a condition, and those jobs that evaluate it to true will not be executed. This is useful when some of your jobs failed and you want to repeat only the failing jobs, not all of them [HowTo]
  • +PreCmd, +PreArguments, +PostCmd, +PostArguments: these commands allow you to run scripts before and/or after your executable. That is useful to prepare, convert, decompress, etc. your inputs and outputs if needed, or to debug your executions [HowTo]
  • notify_user, notification: use these commands if you want to receive a notification (an email) when your jobs begin, fail and/or finish [HowTo]
  • if ... elif ... else ... endif: since HTCondor version 8.4.0, a limited conditional semantic is available. You can use it to specify different commands or options depending on defined/undefined variables, the HTCondor version, etc.
  • on_exit_hold, on_exit_remove, periodic_hold, periodic_remove, periodic_release, etc.: you can modify the default behaviour of your jobs and their associated status. These commands can be used in a wide range of circumstances. For instance, you can force jobs that have been running for more than X minutes or hours to be deleted or put on hold (this prevents failing jobs from running forever, since they will be stopped or deleted if they run for much longer than expected), or the opposite: hold jobs that finish in an abnormally short time, to check later what happened. You can also periodically release your held jobs to run them on other machines, if for any reason your jobs work fine on some machines but fail on others [HowTo]
  • deferral_time, deferral_window, deferral_prep_time: you can force your jobs to begin at a given date and time. That is useful when the input data is not ready at submission time and your jobs have to wait until a certain moment [HowTo]
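As an illustration of some of these commands, the following sketch (the values are hypothetical; adjust them to your own jobs) requests resources and holds jobs that run for too long:

 request_memory = 2 GB
 request_disk   = 1 GB

 # Hold jobs that have been running for more than 2 hours (7200 s)
 periodic_hold  = (JobStatus == 2) && ((time() - EnteredCurrentStatus) > 7200)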

Templates and examples

Here you can find basic templates of submit files; you can use them as a starting point and then make the customizations needed for your executions. Check the examples in the following sections for details and explanations.

Common Template

 ######################################################
 # HTCondor Submit Description File. COMMON TEMPLATE   
 # Next commands should be added to all your submit files   
 ######################################################
 if !defined FNAME
   FNAME = condor_exec
 endif
 ID      = $(Cluster).$(Process)

 output  = $(FNAME).$(ID).out
 error   = $(FNAME).$(ID).err
 log     = $(FNAME).$(Cluster).log

 universe                = vanilla
 should_transfer_files   = YES
 when_to_transfer_output = ON_EXIT

Explanation:

Let's analyse the common template:

  1. First block:
    • Here we define some variables that will be used later. The first of them is FNAME: with the if !defined condition we first check whether that variable is already defined (if so, we use the previous value). This variable contains the base name for the files where HTCondor will save the information displayed on the screen (stdout and stderr) and the log file. It is convenient to give a common name to the files generated by HTCondor, so we can later identify and manage them together. Since all jobs will use the name specified there, we have to include a variable part that is different in each job, in order to avoid overwriting the files. We recommend you use a combination of $(Process) (it contains the process ID, which is different for each job) and $(Cluster) (it contains the cluster ID, which is different for each submission), as we have done when defining $(ID). In this way, different jobs and different submissions will use different filenames and none of them will be overwritten.
  2. Second block:
    • With the output command we make HTCondor write to the specified file all the screen output (stdout) generated by each job. We have used the variables $(FNAME) and $(ID) defined above.
    • With the error command we manage stderr in the same way we did with output.
    • Then we have also specified an HTCondor log file with the log command. You should not use $(Process) in the filename of the log, since all jobs should share the same log.
  3. Third block:
    • universe: there are several runtime environments in HTCondor called universes; we will mostly use the one named vanilla, since it is the easiest one. This is the default universe, so if you omit this command, your jobs will also go to the vanilla universe.
    • The should_transfer_files=YES and when_to_transfer_output=ON_EXIT commands are used to specify that input files have to be copied to the remote machines and that output files must be copied back to your machine only when your program has finished. Although these commands are only needed when working with files, we recommend you always use them unless you are totally sure you can omit them.

Examples when working with input/output files and arguments

Most times you will want to run applications that deal with input and/or output files. Commonly, the input files will be located on your local machine, but since your application will be executed on other machines, your input files need to be copied there, and the result files copied back to your computer once your program is done. HTCondor has some commands to do both operations automatically in an easy way, so you do not need to worry about the file transfers: you just need to specify where your files are and HTCondor will copy them.

Note: All these examples begin by defining a specific variable FNAME that contains the base name of the files that HTCondor will generate to save the stdout, stderr and log. Next, the common template explained above will be included using the include command (we assume that the common template filename is condor_common.tmpl).

Example A (arbitrary filenames)

  • Process all input files with extension .in in a given directory with the following program:
    ./myprogram -i inputFile -o outputFile
 # Including Common Template
 FNAME = exampleA
 include : /path/to/condor_common.tmpl

 transfer_input_files    = $(mydata)
 transfer_output_files   = $Fn(mydata).out


 executable    = myprogram
 arguments     = "-i $Fnx(mydata) -o $Fn(mydata).out"

 queue mydata matching files /path/to/inputs/*.in

Explanation:

We use transfer_input_files to specify where the needed input files are located. We could use a comma-separated list of files, but since we do not know the names of the files, we use the variable mydata to specify them. That variable is defined in the last line, with the queue command: there, we choose to process all files in /path/to/inputs with extension .in. When submitting, HTCondor will check that directory and automatically create a job for each .in file found there, assigning the complete filename to mydata (in this way, each job will work on a different file). We have used the matching files clause to specify that we only want files matching the condition, but we could also select only directories (matching dirs) or both (just matching).

With transfer_output_files we set the name of the output file, which is the same as the input file but with the .out extension. To remove the old extension we use the $Fn macro, one of the new Fpdnxq Function Macros available since version 8.4.0, used to operate on the filename and extract the path, the name without extension, the extension, etc.

Then we use executable to specify the name of the executable (it can be a system command, your own application, a script, etc.). We can use an absolute path or one relative to the directory where we perform the submission. This executable will be copied to all remote machines automatically. Finally, arguments is used to specify the options for the program. We have to employ the Fpdnxq macros again: first Fnx to remove the original path (the file will be copied to the root of a virtual location where HTCondor runs the executable on the remote machine), and then Fn to remove the path and change the extension of the output file.
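For instance, assuming a hypothetical input file /path/to/inputs/star1.in, the macros of Example A would expand to:

 transfer_input_files  = /path/to/inputs/star1.in
 transfer_output_files = star1.out
 arguments             = "-i star1.in -o star1.out"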

Example B (based on ProcessID, old system before HTCondor v8.4.0)

  • Process 50 input files with consecutive names (from data0.in to data49.in) using the same program as in the previous example
 # Including Common Template
 FNAME = exampleB
 include : /path/to/condor_common.tmpl

 transfer_input_files    = /path/to/inputs/data$(Process).in
 transfer_output_files   = data$(Process).out

 N             = 50
 executable    = myprogram
 arguments     = "-i data$(Process).in -o data$(Process).out"

 queue $(N)

Explanation:

The transfer_input_files command allows a comma-separated list of files or directories that will be copied to the remote machine. Local paths will be ignored, and HTCondor will copy all files to the root directory of a virtual location on the remote machine (your executable will also be copied to the same place, so input files will be in the same directory as it). If you specify a directory in this command, you can choose whether to copy only the contents of the directory (add a slash "/" at the end, for instance myInputDir/) or the directory itself and its contents (do not add a slash). In this case, each job will process a different input file, and since they have consecutive names beginning at 0, we use the HTCondor macro $(Process) to build the proper name, since the process ID will be 0 for the first job and N-1 for the last job.

With transfer_output_files we specify a comma-separated list of result files to be copied back to our machine. In this case, we specify just one file, with the same name as the input file, but with .out extension.

Then we define the variable N to specify the number of jobs to be executed. Our program is set using the executable command, and with the arguments command we specify all the needed options (here, the names of the input and output files with the corresponding flags).

At the end, we send all jobs to the queue with queue command, specifying how many jobs we want (we have used the variable N).

Example C (lists of files and arguments written in submit file)

  • Process all arbitrary files and arguments of a given list. Executable is myprogram and it needs an input file with extension .dat and some arguments. Results will be printed on screen (stdout).
 # Including Common Template
 FNAME = exampleC
 include : /path/to/condor_common.tmpl

 executable    = myprogram

 queue transfer_input_files,arguments from (
   xray434.dat, -d 345 -p f034
   sunf37.dat,  -d 2   -p f302
   light67.dat, -d 62  -p f473
 ) 

Explanation:

We use the flexibility of the queue command to assign values from a list to several commands. We must specify which files must be transferred and which arguments each file needs. We therefore specify the transfer_input_files and arguments commands using the from option, and then add a list of file,arguments pairs.

At submission time, HTCondor will iterate over the list and expand the assignments. For instance, our jobs will have the following values:

  1. transfer_input_files = xray434.dat, arguments = -d 345 -p f034
  2. transfer_input_files = sunf37.dat, arguments = -d 2 -p f302
  3. transfer_input_files = light67.dat, arguments = -d 62 -p f473

When using this format you can specify as many commands as needed, separated by commas, between queue and from, but check that each line in the list has the right number of elements, also separated by commas.

Writing the list of items in the submit file can be a little tedious, but it can easily be generated in an external file using scripts. Then you can directly specify the file. For instance, suppose you have all the items in a file named data.lst; then you can use the following queue command:

 queue transfer_input_files,arguments from /path/to/data.lst

Example D (lists of files and arguments in external file)

  • Process arbitrary files and arguments stored in the file data.lst (process only lines from 28 to 43, both inclusive, with step 5). The executable is myprogram, as in the previous example, but this time it saves the result in a file named output.out.
 # Including Common Template
 FNAME = exampleD
 include : /path/to/condor_common.tmpl

 transfer_output_files  = output.out
 line                   = $(Row)+1
 transfer_output_remaps = "output.out=output$INT(line).out"

 executable    = myprogram

 queue transfer_input_files,arguments from [27:43:5] data.lst

Explanation:

This example is similar to the previous one, but this time the list of input files and arguments is written in a file with the following format:

 input_file1,args1
 input_file2,args2
 input_file3,args3
 ...

To illustrate the slice feature, we have been asked to process only the items (lines) from 28 to 43 with step 5 (that is, lines 28, 33, 38 and 43), which could be useful when we want to run only certain experiments. The syntax for slices is very easy, the same as in Python: [init:end:step]. Since the first index is 0 but the first line is line 1, line 28 has index 27, so init should be 27. The last index we want is 42 (line 43), and since the end limit is exclusive (as in Python), we have to add 1 and specify 43 as end. So we specify the slice using [27:43:5] in the queue command, between the from clause and the file.

We have to be careful with the results. Our program writes them to a file named output.out. We cannot keep all the files with the same name because they would overwrite each other, so we need to use transfer_output_remaps to change the names when copying from the remote machines to ours. We could add the $(Process) variable to the new name, so all of them would be different, but then it could be a little complicated to identify each result. Instead, we use another of the automatic variables, called $(Row). It stores the number of the row in the list that is being processed, which is almost the line number: since $(Row) begins at 0, we need to add 1 to get the line number. We do that in the variable $(line). Then HTCondor will process rows 27, 32, 37 and 42, and our output files will be output28.out, output33.out, output38.out and output43.out.

Example E (stdin, initialdir, external scripts and lists)

  • Our program myprogram works with stdin (the keyboard is used to specify input data). We have written that input data in 4 files (dataFeH.in, dataOFe.in, dataOH.in and dataHe.in) and there is a set of 3 different experiments in directories dir000, dir001 and dir002. Output files will be generated with the same name as the inputs and extension .out (use the -o argument) and they must be located in the same directory as the respective input file. The program also needs all *.tbl files located in /path/to/tables.
 # Including Common Template
 FNAME = exampleE
 include : /path/to/condor_common.tmpl

 N            = 3
 input        = data$(met).in
 initialdir   = /path/to/dir$INT(Step,%03d)
 include      : input_tables.sh |
 transfer_output_files = data$(met).out

 executable   = myprogram
 arguments    = "-o data$(met).out"

 queue $(N) met in FeH, OFe, OH, He

Explanation:

The key to this example is the queue command in the last line. We are using the in clause to specify a list of values. HTCondor will create a job for each element in the list and the current value will be assigned to the variable met that we have declared (this variable is optional; you can omit it and use the automatic variable Item). We have 3 sets of experiments, so we need to go over the list 3 times; that is why we have defined N = 3 and we use $(N) in the queue command. In the end, HTCondor will execute 12 jobs (3 runs * 4 elements in the list): we use the automatic variable $(Step) to get the number of the current run (0, 1 or 2) and $(met) (or $(Item) if we omit the variable) to get the value of the current element in the list.

The input command is used to specify a file that will be used as stdin, using the variable $(met) to get the proper filename. That variable is also used when building the names of the output files (transfer_output_files command) and the arguments (arguments command).

We use initialdir to specify a base directory that changes with the current job, using the automatic variable $(Step). HTCondor will use this directory as the base for relative paths, so it will affect the input and output files, including the stdout, stderr and log files created by HTCondor (see the common template). We use $INT(Step,%03d) to get a 3-digit number (000, 001 and 002) to build the proper path for each experiment; HTCondor will then go to the right directory to get the input files and later place the respective output files there.
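For instance, the job with $(Step) = 1 that processes the list element OFe would expand (omitting the included commands) to:

 input                 = dataOFe.in
 initialdir            = /path/to/dir001
 transfer_output_files = dataOFe.out
 arguments             = "-o dataOFe.out"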

The last thing we have to solve is the problem with the required input files (all *.tbl files located in /path/to/tables). HTCondor does not allow globbing in transfer_input_files, but instead we can use the new feature of including external files with the include command. This command not only includes other files, but also invokes them if it ends with a pipe |. We can then easily write an external script that gets the list of needed files with the linux command ls and the options -m (commas are used to separate elements) and -w (used to specify the width of the screen before adding a new line; since we need all elements on the same line, we should specify a number big enough). In this case, our external script input_tables.sh is the following:

 #!/bin/bash
 echo "transfer_input_files = `ls -w 400 -m /path/to/tables/*.tbl`"

Example F (loops)

  • Execute each iteration of a 3-level nested loop using myprogram -dim1 i -dim2 j -dim3 k with the following ranges: i:[0,20), j:[0,15) and k:[0,35). Output will be written on screen; no input files are needed.
 # Including Common Template
 FNAME = exampleF
 include : /path/to/condor_common.tmpl

 MAX_I = 20
 MAX_J = 15
 MAX_K = 35

 N = $(MAX_I) * $(MAX_J) * $(MAX_K)

 I = ( $(Process) / ($(MAX_K)  * $(MAX_J)))
 J = (($(Process) /  $(MAX_K)) % $(MAX_J))
 K = ( $(Process) %  $(MAX_K))

 executable = myprogram
 arguments  = "-dim1 $INT(I) -dim2 $INT(J) -dim3 $INT(K)"

 queue $(N) 

Explanation:

In this example we only need to simulate 3 nested loops using a 1-level loop (we will use $(Process) as the main loop counter). The nested loops are equivalent to the following code, and HTCondor will create a job for each iteration:

 for (i = 0; i < MAX_I; i++)
   for (j = 0; j < MAX_J; j++)
     for (k = 0; k < MAX_K; k++)
       ./myprogram  -dim1 i -dim2 j -dim3 k

Then we only need to set the limits (MAX_I, MAX_J, MAX_K), the total number of iterations (N = $(MAX_I) * $(MAX_J) * $(MAX_K)) and use some maths to get the values of I, J and K according to the value of $(Process), as we have done above (just a few multiplications, integer divisions and remainders are needed).

For a 2-level loop, you can use the next code:

 I = ($(Process) / $(MAX_J))
 J = ($(Process) % $(MAX_J))
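The 3-level index arithmetic from Example F can be checked outside HTCondor with a quick shell sketch (the sample Process value is arbitrary):

```shell
#!/bin/bash
# Check the 3-level index arithmetic used above with plain shell integers.
MAX_J=15; MAX_K=35
process=1234                         # a sample value of $(Process)
i=$(( process / (MAX_K * MAX_J) ))   # same formula as I in the submit file
j=$(( (process / MAX_K) % MAX_J ))   # same formula as J
k=$(( process % MAX_K ))             # same formula as K
echo "$i $j $k"                      # -> 2 5 9
# Round trip: i*MAX_J*MAX_K + j*MAX_K + k recovers the original counter.
echo $(( i * MAX_J * MAX_K + j * MAX_K + k ))   # -> 1234
```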

Example G: This example shows the use of several useful commands for specific conditions. It is also a summary of the HOWTOs; you can find further details and explanations about the submit commands there.

  1. Execute myprogram with argument "-run <N>", where N goes from 0 to 99 by default.
  2. BLOCK A: Execute only on machines with at least 4GB of RAM and 2GB of free disk space. The more memory and the faster the calculations, the better (we can use KFLOPS to choose the machines that are faster doing floating point operations, but since memory and kflops have different units, we need to weight them, for instance, multiplying memory by 200).
  3. BLOCK B: Execute only on machines with Linux Fedora 21 or higher, and avoid executing on cata, miel and those with a hostname beginning with the letter m or d.
  4. BLOCK C: We need to run the script processData.sh before the job (argument: -decompress) and after it (argument: -compress) to prepare our data.
  5. BLOCK D: Our executable needs the environment variables, and the variable OUT has to be set to the job's argument.
  6. BLOCK E: Avoid black holes (machines where your jobs do not execute correctly and, since the jobs finish quickly there, end up receiving most of them).
  7. BLOCK F: Get an email notification when there are errors in the job. If the job finishes in less than 5 minutes or takes more than 2 hours, there was a problem: hold it to check later what happened.
  8. BLOCK G: Our program needs licenses, so we cannot run more than 20 jobs at the same time. Execute jobs as a nice user to save priority, since there are no other jobs waiting at this moment.
 # Including Common Template
 FNAME = exampleG
 include : /path/to/condor_common.tmpl

 if !defined N
   N = 100
 endif

 #BLOCK A
 request_memory   = 4 GB
 request_disk     = 2 GB
 rank             = (200 * Memory) + KFLOPS

 #BLOCK B
 letter           = substr(toLower(Target.Machine),0,1)
 requirements     = (UtsnameSysname == "Linux") 
         && (OpSysName == "Fedora") && (OpSysMajorVer >= 21) 
         && !stringListMember(UtsnameNodename, "cata,miel")
         && !stringListMember($(letter), "m,d")


 #BLOCK C
 transfer_input_files = processData.sh
 +PreCmd             = "processData.sh"
 +PreArguments       = "-decompress"
 +PostCmd            = "processData.sh"
 +PostArguments      = "-compress"

 # ...
 # ...

 #BLOCK D
 getenv              = True
 environment         = "OUT=$(Process)"

 #BLOCK E
 job_machine_attrs = Machine  
 job_machine_attrs_history_length = 5           
 requirements = $(requirements) 
       && (target.machine =!= MachineAttrMachine1)  
       && (target.machine =!= MachineAttrMachine2)

 #BLOCK F
 notify_user       = myuser@iac.es
 notification      = Error

 on_exit_hold = ((CurrentTime - JobStartDate) < (5 * 60))
 periodic_hold = ((JobStatus == 2) 
          && (time() - EnteredCurrentStatus) > (2 * $(HOUR)))

 #BLOCK G
 concurrency_limits = myuser$(Cluster):50
 nice_user = True

 executable = myprogram
 arguments  = "-run $(Process)"

 queue $(N) 

IMPORTANT: Although your program could use shared locations (/net/XXXX/scratch, /net/nasX, etc.) to read/write files from any machine, so that there is no need to copy files, we highly recommend that you always use the HTCondor file transfer system: files will be accessed locally on the remote machines, avoiding network congestion. Bear in mind that HTCondor can execute hundreds of your jobs at the same time, and if all of them access the same shared location concurrently, the network could experience huge stress and fail. If for any reason you cannot copy files and you have to use shared locations (you are using huge files of several GB, etc.), then contact us before submitting, so we can adapt your jobs in order to avoid network congestion.

Submit file HowTo

NOTE: Submit File HOWTOs have been moved to their own page: HTCondor(4): Submit File (HowTo)

OLD Examples

This section presents several examples of submit files, from very basic examples to more complex ones, step by step. These examples were created for previous versions of HTCondor and since version 8.4.0 there are easier and more flexible ways to get the same results in most cases. However, we have left these old examples here since they may help you, but bear in mind that they may be obsolete.

  • Example 1. Our first submit file: executable and arguments
  • Example 2. Adding simple inputs and outputs: stdin, stdout and stderr
  • Example 3. Simple examples including input and output files
  • Example 4. A more complex example, step by step
  • Example 5. Working with more complex loops and macros

These examples will cover the most common cases based on our experience with IAC's users. If you want complete documentation, you can run man condor_submit in your shell, visit the condor_submit page in the reference manual and/or the Submitting a Job section. Some more examples of submit description files are also available at the HTCondor site.

Example 1. Our first submit file: executable and arguments ^ Top

The first thing you have to specify is the executable of the application to run and its arguments, and then launch the jobs. For that purpose we will use the executable, arguments and queue commands, respectively (note that commands are case insensitive). If your application is located in a private directory that is not accessible to other users and/or from other machines, then you need to add the should_transfer_files command and HTCondor will copy your application to the machines where it will run.

In our first example we have developed an application called "myprogram" located in the same directory where we are going to do the submission. We want to run it with 2 different sets of arguments -c -v 453 and -g 212. Then our submit file will be the following one:

 universe = vanilla
 should_transfer_files  = YES

 executable = myprogram

 arguments  = "-c -v 453"
 queue

 arguments  = "-g 212"
 queue

We will explain here why we use each of these commands:

  • universe: there are several runtime environments in HTCondor; we will mostly use the one named vanilla since it is the easiest one. This is the default universe, so if you omit this command, your jobs will also go to the vanilla universe.
  • should_transfer_files: use it with value YES to specify that your files are not accessible and should be copied to the remote machines
  • executable: Specify the name and path of your executable. The path can be absolute or relative (to the directory in which the condor_submit command is run). HTCondor will copy the executable to each machine where your job(s) will be run.
  • arguments: Specify the parameters of your application. There is an old syntax, but it is recommended to use the new one, enclosed in double quote marks. If you need to specify complex arguments including single or double quote marks, check the new syntax in the argument list in the HTCondor documentation.
  • queue: Place one job into the HTCondor queue, or N if you use queue <N>.


Save this file (for example, call it myprogram.submit) and do the submission in the same directory where your program is located:

 [...]$ condor_submit myprogram.submit

That is all, your jobs will be added into the HTCondor queue, you can check it running condor_q.

Example 2. Adding simple inputs and outputs: stdin, stdout and stderr ^ Top

Now we will deal with inputs and outputs. Let's configure three HTCondor jobs to print "Hello World!" and the ID of each job. We will use the OS command echo, so the output will be printed to stdout (the screen); but since we cannot access the screen of the machines where the jobs run, we need a way to save these outputs to files. Of course, each job should write to a different file, and it may be interesting to store them in a separate directory, for instance an existing one called output_files. We may also want to see any errors (from stderr) and save a log file. The resulting HTCondor submit file could be the next one:

 # First block
 N = 3

 universe               = vanilla
 should_transfer_files  = YES
 initialdir             = /path/to/output_files 

 input   =
 output  = echo_example.$(Cluster).$(Process).out
 error   = echo_example.$(Cluster).$(Process).err                                                                                     
 log     = echo_example.$(Cluster).log                                                                       

 # Second block
 executable          = /bin/echo
 transfer_executable = False
 arguments           = "Hello World, I am job: $(Process)!"

 queue $(N)

Let's analyze this example:

  1. First block:
    • The first line contains a macro declaration, N = 3, so from that point on we can use that macro by writing $(N) (you must use parentheses, $N is NOT valid).
    • should_transfer_files = YES command is used to specify that files should be copied to/from the remote machines.
    • Then with initialdir we specify the path to input and output files (not the executable), it can be an absolute path or relative (to the directory in which the condor_submit command is run). If your files are in the same directory where you are doing the submission, then you do not need to use this command.
    • The input command is empty since we do not need it in this example. But if you run your program in this way: myprogram < data.in, then you should add the command input = data.in.
    • With output command we force HTCondor to write in the specified file all the screen output (stdout). Note that to avoid all jobs writing in the same file, we have used the $(Cluster) macro (it is an ID of each submission) and the $(Process) macro (it is an ID given to each job, from 0 to N-1).
    • With error command we manage stderr in the same way we did with output.
    • Then we have also specified a log file with log command.
  2. Second block:
    • We specify the name of your application using executable command (we set it to /bin/echo).
    • Since the executable is an OS command available in each machine, it is not needed that HTCondor makes a copy to each machine, so we have used transfer_executable = False to avoid that.
    • The arguments command specifies the arguments of your program. We have used the predefined $(Process) macro so each job will print its own ID. This macro can also be used as a counter or loop variable in your arguments.
    • At the end we send N jobs to the queue using queue <N> command.

If we save the submit file with the name echo.submit and send it to the queue using condor_submit echo.submit (let's suppose it gets Cluster ID 325), the result should be something like the following, assuming we are located in the directory where we did the submission:

 ./echo.submit
 /path/to/output_files/echo_example.325.0.out   # (content: Hello World, I am job: 0!)
 /path/to/output_files/echo_example.325.1.out   # (content: Hello World, I am job: 1!)
 /path/to/output_files/echo_example.325.2.out   # (content: Hello World, I am job: 2!)
 /path/to/output_files/echo_example.325.0.err   # (content: Empty if no errors)
 /path/to/output_files/echo_example.325.1.err   # (content: Empty if no errors)
 /path/to/output_files/echo_example.325.2.err   # (content: Empty if no errors)
 /path/to/output_files/echo_example.325.log     # (content: Info about jobs execution)    


HTCondor is mainly designed to run batch programs, which usually have no interaction with users, but if your program needs any input from stdin (i.e. the keyboard), you can specify it by writing all the inputs in a file and then using the input command to indicate that file, with the same syntax as the output command.
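The effect of the input command is just a stdin redirection on the remote machine; a minimal shell sketch of the equivalence (the input file and the read-based "program" are made up for illustration):

```shell
# What 'input = name.in' does for a job, expressed as a shell redirection.
printf 'Alice\n' > /tmp/name.in                            # hypothetical stdin file
bash -c 'read name; echo "Hello $name!"' < /tmp/name.in    # -> Hello Alice!
```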

Example 3. Simple examples including input and output files ^ Top

Now we know how to specify standard inputs and outputs, let's see how we can deal with input and output files. We will study two different situations to see how we can solve each one, depending on whether our executable accepts arguments for input/output files or not.

Example 3A. We can specify our input/output files as arguments ^ Top

Suppose that we have developed an application called myprogram that needs two arguments, the first one is the name of the input file and the second one is the name of the output file that will be generated. We usually run this application in the following way:

 ./myprogram /path/to/input/data.in data.out

We have 300 different input data files named data0.in, data1.in, data2.in, ..., data299.in and we want to use HTCondor to execute them (each job will process a different input file). Then we just need to write the next submit file to execute jobs in HTCondor:

 N     = 300
 ID    = $(Cluster).$(Process)
 FNAME = example3A

 output  = $(FNAME).$(ID).out
 error   = $(FNAME).$(ID).err                                                                                     
 log     = $(FNAME).$(Cluster).log                                                                       

 universe                = vanilla
 should_transfer_files   = YES
 when_to_transfer_output = ON_EXIT

 transfer_input_files    = /path/to/input/data$(Process).in
 transfer_output_files   = data$(Process).out

 executable  = myprogram
 arguments   = "data$(Process).in data$(Process).out"

 queue $(N)

This submit file is similar to the previous examples. We have defined some useful macros (ID and FNAME) to avoid writing the same text several times, and we have also used some new commands, like transfer_input_files to specify input files and transfer_output_files for the output files (if you need to specify several input and/or output files, use a comma-separated list). Remember we have to activate the HTCondor file-copying mechanism using the should_transfer_files command, and we have also used when_to_transfer_output to tell HTCondor that it should only copy the output files when our program has finished. If you do not use the transfer_output_files command, then HTCondor will copy all generated or modified files located in the same directory where your application was executed (see this FAQ for more info).

You do not need to deal with copying files, HTCondor will copy the input files from the specified location on your machine to the same directory where your program will be executed on the remote machine (that is why we have used no path for the input file in the arguments command, since that file will be in the same place as the executable). Once your program is finished, HTCondor will copy the output file from the remote machine to yours and it will be located in the same directory where you did the submission (remember you can change this behaviour with initialdir command).

In this example we have supposed that the input files have convenient names, containing a known pattern that includes a consecutive number from 0 to N-1. This is the easiest situation, and although it is not strictly necessary to rename your input files, we recommend you change the filenames to make it much easier to specify them using HTCondor commands. There are several simple ways to rename your files, like using the rename Linux command, a bash script, etc. For instance, if your input files have different names, but all of them have the .in extension, then the next simple bash script will do the work, renaming all of them so the result will be data0.in, data1.in, data2.in, ..., data299.in following alphabetical order (you can modify it to use your own criteria, save the equivalence between old and new names, etc.):

 #!/bin/bash

 n=0
 cd /path/to/input/
 for file in *.in  
 do 
   mv $file data$n.in 
   n=$((n+1))  
 done

Example 3B. We cannot specify arguments ^ Top

Sometimes our executable does not accept arguments and it needs to find some specific files. For instance, suppose that our application myprogram needs to find an input file called data.in in the same directory where it will be executed, and then it will produce an output file called data.out, also in the same directory. Again, we will assume that we have all our input files in /path/to/input/, so we have to prepare them. Since all the files must have the same name, we cannot use the same directory, so we are going to create directories named input0, input1, input2, ..., input299, where each of these directories will contain the pertinent data.in file. To do that, we can use a bash script like the next one:

 #!/bin/bash

 n=0
 cd /path/to/input/
 for file in *.in  
 do 
   mkdir input$n
   mv  $file input$n/data.in 
   echo "$file -> input$n/data.in" >> file_map.txt
   n=$((n+1))  
 done

The previous script simply creates a new directory for each input file and moves the file into it, renaming it as data.in. We have also added an extra line to create a file called file_map.txt containing the original and the new name and location of each file, which could be useful later to identify the outputs. Now we need to write the submit file:

 N     = 300
 ID    = $(Cluster).$(Process)
 fname = example3B

 output  = $(fname).$(ID).out
 error   = $(fname).$(ID).err                                                                                     
 log     = $(fname).$(Cluster).log                                                                       

 universe                = vanilla
 should_transfer_files   = YES
 when_to_transfer_output = ON_EXIT

 transfer_input_files    = /path/to/input/input$(Process)/data.in
 transfer_output_files   = data.out
 transfer_output_remaps  = "data.out=data$(ID).out"

 executable  = myprogram
 arguments   = ""

 queue $(N)

We have introduced a few changes in the submit file. Now we use transfer_input_files to choose the proper data.in file according to the directory of each job. Output files will be copied to the same directory where the submission is done, and since all of them will have the same name, we use the transfer_output_remaps command to avoid them being overwritten: with that command we rename all the output files to include the ID.

Sometimes we want the output files to be located in the same directory as the related input file. Then, since the output files will be in different directories, there is no need to change their names. In these situations, we can remove the transfer_output_remaps command and use instead the initialdir command to specify that HTCondor should use a different directory for both input and output files in each execution (this will not affect the executable file):

 initialdir              = /path/to/input/input$(Process)
 transfer_input_files    = data.in
 transfer_output_files   = data.out



Note: Using known patterns and consecutive numbers as file names makes it very easy to specify input and output files in HTCondor, and you only need simple Linux commands and/or bash scripts to rename these files (always keep a backup of your original files!). However, there are other ways to work with HTCondor if for any reason you do not want to or cannot change the names of your files.

Also remember that if you specify directories with transfer_input_files and transfer_output_files and they finish with a slash ("/"), HTCondor will copy the content of the directories, but not the directory itself. That can be used to copy input or output files without knowing their names; we only need to place them in a pertinent directory structure, using a bash script like the one presented in example 3B (but without changing the names of the files). Also, if your application is able to read the names of the files from stdin, you can write those names in another file with a known pattern and then specify that file using the HTCondor input command.

You can also add to your submit file some more commands that can be very useful when dealing with input and output files. For instance, the preCmd and postCmd commands allow you to run scripts or shell commands before and after executing your program, respectively, so you can use them to rename or relocate your input and output files, or for any other operation you may need. You can find more information about these commands in the Submit File (HowTo) section.

Example 4. A more complex example, step by step ^ Top

This example should be enough to run HTCondor jobs in most common situations. In this example, assume that we have an application called myprogram that accepts two arguments: the first one is the input file to be processed, where each line is a set of values that can be independently computed. The second argument is the name of the output file that will be created with the results.

In our example, we have a huge input file called data.in with several thousand lines, and it takes quite a long time to be computed (several days), so we will use HTCondor to reduce this amount of time. What we are going to do is split the huge input file into N smaller files with names data0.in, data1.in, ..., data(N-1).in and create a HTCondor job to process each one.

The first step is to decide how many files we will create. Since each file will be a HTCondor job, this is a critical step; we have to make our decision according to the next criteria:

  1. We should create a relatively large number of jobs, at least a few hundred of them. If we split our input into just 2 files, there will be only 2 jobs to be executed by HTCondor, so the maximum speedup we could get is 2 (our results will be ready in half the time compared to a normal serial execution). But if we generate 100 jobs, then we could get a time reduction factor of 100x, or 500x if we generate 500 jobs... Of course, this is always a theoretical limit and it is almost impossible to reach it (all jobs have several overheads, there will probably be more users running jobs with HTCondor, the number of idle machines is always changing, your jobs could be evicted and restarted later, etc.), but generating a large number of jobs will increase your chances of getting your results in less time. If you are wondering how much speedup you can get: on average HTCondor has around 350 idle slots at working hours, but at night or during weekends there can be peaks of about 600 idle slots. Anyway, you can generate as many jobs as you want, even several thousand of them; HTCondor will manage them and run your jobs as slots become idle. A large number of short jobs can be more efficient than a low number of long ones, but also bear in mind that transferring input and output files consumes resources and time: if your jobs need HTCondor to transfer many or large files to/from the remote machines, then you may need to significantly reduce the number of jobs to avoid overloading the network and also to decrease the total time consumed by those file transfers.
  2. Most times the number of jobs has to be chosen according to an estimation of the time a job needs to be processed. We should not create jobs that only last a few seconds or minutes, because executing a job has an overhead (communications, creating the execution environment, transferring files, etc.), so if your job is too short, this overhead could take more time than executing your program. On the other hand, if your jobs need several hours to finish, it is likely they will be suspended/killed and restarted from the beginning many times, so the final consumed time could be really high. There is no fixed rule about the duration of your jobs and sometimes you cannot choose it... But if you can choose, a job that needs from 10 to 30 minutes to be done should be fine (the bigger the files you need to transfer, the longer the jobs should be, to reduce the total number of jobs and, therefore, the amount of file transfers). When possible, avoid large jobs that need more than one hour to be processed, unless heavy file transfers are involved (if the files are really big, consider using a shared location like scratch instead of copying them to all remote machines, and then add a limit to the number of concurrent jobs).

For instance, our original data.in file has 97564 lines and we will try to follow these recommendations when splitting it. Before choosing the number of jobs, we need to run some tests to get an estimation of how much time our program needs to process different inputs. For example, suppose we have already done those tests and, on average, our program needs about 4 seconds per line, so it can process 250 lines in around 17 minutes. If we split our huge file into smaller ones of 250 lines each, then we will have 391 files. That means 391 jobs will be generated, which is a good amount. Since we just need to transfer one input file and one output file per job, and their sizes will be just a few KB, this time we do not need to worry about the overhead of file transfers. If we are really lucky and HTCondor is able to immediately execute all our jobs at the same time, then we could get our results in about 17 minutes. It is almost certain that this will not happen and we may need to wait some more minutes or hours, but we will still get our results much faster than a serial execution, which needs 97564 * 4 seconds, almost 5 days.
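These back-of-the-envelope numbers can be checked with shell arithmetic (values taken from the example above; integer division truncates, hence the rounding notes in the comments):

```shell
# Verify the job-count and timing estimates used above.
lines=97564; per_file=250; sec_per_line=4
jobs=$(( (lines + per_file - 1) / per_file ))       # ceiling division
echo "$jobs jobs"                                   # -> 391 jobs
echo "$(( per_file * sec_per_line / 60 )) min/job"  # -> 16 min/job (around 17)
echo "$(( lines * sec_per_line / 86400 )) days"     # -> 4 days serial (almost 5)
```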

So, we have finally chosen N = 391. The next step is to split our file, which can easily be done with Linux commands like split or awk. For example, see the next command:

  awk '{filename = "A" int((NR-1)/B) "C"; print >> filename}' D

where A: prefix of the output files, B: number of lines per file, C: postfix of the output files and D: input file. When used, this command will split the input file (D) into files containing B lines each, named A0C, A1C, A2C, ...

Then we will use that command in the next way: A = data, B = 250, C = .in and D = data.in

  [...]$ awk '{filename = "data" int((NR-1)/250) ".in"; print >> filename}' data.in

After executing the previous command, we will have 391 files of 250 lines each (except the last one), from data0.in to data390.in, which means we are going to execute 391 jobs. Then, we will also name our output files in the same way: data0.out, data1.out, ..., data390.out. At this point we are ready to create our submission file; we only need to specify what the executable is, the arguments, the inputs and outputs, and where to find them.
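The same awk command can be tried at small scale first, for instance splitting a 10-line file into chunks of 4 lines (temporary files, just for checking):

```shell
#!/bin/bash
# Small-scale dry run of the splitting command: 10 lines, 4 per chunk.
cd "$(mktemp -d)"
seq 1 10 > data.in
awk '{filename = "data" int((NR-1)/4) ".in"; print >> filename}' data.in
ls data[0-9].in          # data0.in data1.in data2.in
wc -l < data2.in         # -> 2 (the last chunk keeps the remainder)
```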

If for any reason you want to include a header in each of the resulting files, you can use the next command:

  [...]$ sed -i '1iWrite your header here...' data*.in
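A quick check of that sed command on a hypothetical two-line file:

```shell
# sed '1i...' inserts the given text before line 1 of each file (GNU sed).
cd "$(mktemp -d)"
printf 'line1\nline2\n' > data0.in
sed -i '1i# my header' data0.in
head -n 1 data0.in              # -> # my header
wc -l < data0.in                # -> 3
```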

To process all files we need to change the arguments in each execution. We could do that explicitly, writing the proper arguments and queue commands N times in the submit file, but this is an awful way to solve the problem. A much simpler (and more elegant) way is to use a loop, from 0 to 390 (N - 1), to generate all the arguments. To simulate this loop, we could try to write a script (for instance, a bash script) that generates N submit files where each one has the correct arguments, but again this is not the best solution: managing 391 HTCondor submit files is bothersome and, even worse, efficiency will be reduced, since every time you do a submission HTCondor creates a Cluster for that execution, which involves an overhead. We should try to create only one cluster with N jobs rather than N clusters with only one job each. To solve this problem, HTCondor offers us a simple way to process this loop: we can use the $(Process) macro, so each job will have a different value from 0 to N-1. Then, the HTCondor submit file should be similar to the following one:

 # Set number of jobs to execute
 N    = 391

 ID = $(Cluster).$(Process) 
 output  = myprogram.$(ID).out
 error   = myprogram.$(ID).err
 log     = myprogram.$(Cluster).log

 universe                = vanilla
 should_transfer_files   = YES
 when_to_transfer_output = ON_EXIT
 transfer_input_files    = data$(Process).in
 transfer_output_files   = data$(Process).out

 executable    = myprogram
 arguments     = "data$(Process).in  data$(Process).out"

 queue $(N)

The final submit file shown above is very simple and easy to understand. The first blocks were explained in the previous examples; we have just defined a new macro called ID to make some commands shorter. Then, the should_transfer_files command is again used to force the file transfers, and we have added a when_to_transfer_output command to tell HTCondor that the files should be transferred after completion.

The key to this example is the transfer_input_files and transfer_output_files commands. With these two commands we tell HTCondor which files have to be copied to the remote machine before executing the program, and which files have to be copied back as results to the machine where the submission was done. Before queueing the jobs, we use the arguments command to specify the names of the input file (first argument) and the output file (second argument).

And that is all: HTCondor will expand the $(Process) macro in every job, so it will copy the file data0.in to the remote machine where job number 0 will be executed with arguments "data0.in data0.out" and, afterwards, will copy data0.out back to the submit machine, and so on with all the remaining jobs up to N - 1.

Some remarks to this example:

  • NOTE 1: We are supposing that our inputs and outputs are not in a shared directory, so they will not be accessible from the other machines where your jobs will run. It might be possible to solve this by changing your application to use shared locations, like those in /net/<your_machine>/scratch/..., but this solution is highly discouraged, especially if you are using big files (or many of them) and your application is constantly accessing them to perform read/write operations. If you do so, a large number of concurrent accesses may produce locks and a considerable slowdown in the performance of your computer and others'. To avoid that, it is a much better idea to copy your input files to the target machine where your job will run and then bring the results back to your machine. You do not need to take care of this copying process, HTCondor will do all the work for you; the only thing you need to do is use the HTCondor commands transfer_input_files and transfer_output_files to specify where the files and directories to be copied are located. If you cannot avoid intensive access to files located in shared resources like scratch, then consider the possibility of limiting your concurrent running jobs.
  • NOTE 2: We are assuming here that all inputs and outputs are located in the same directory where the submission will be done. If that is not true, we can specify an absolute or relative path (to the submission directory) in the transfer_input_files command, or use the initialdir command as explained in the previous example, affecting both input and output files. Remember that when using transfer_input_files or transfer_output_files you can also specify a directory to be copied to the remote machine. If you specify a long path, HTCondor will not create it all, just the last level (if you want to copy only the content and not the directory itself, add a slash at the end of the directory). For instance, suppose that the data_inputs directory only contains a file called data1.dat:
Command                                                        Result in exec dir @ remote machine
transfer_input_files = /path/to/inputs/data_inputs/data1.dat   data1.dat
transfer_input_files = /path/to/inputs/data_inputs             data_inputs (directory and its content)
transfer_input_files = /path/to/inputs/data_inputs/            data1.dat (content only, no directory)
Please, check the next example for more details, or the HTCondor documentation about transferring files. If you have doubts about where your input files will be located on the remote machine, it can be useful to submit a job with tree as the executable to see where files and directories are placed at execution time.
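A hedged sketch of such a throwaway submit file (assuming tree is installed at /usr/bin/tree on the execute machines, and reusing the hypothetical input path from this note):

```
 # One-job submit file whose only purpose is to print the directory
 # layout of the sandbox on the execute machine (paths are hypothetical)
 universe                = vanilla
 executable              = /usr/bin/tree
 transfer_executable     = False
 should_transfer_files   = YES
 when_to_transfer_output = ON_EXIT
 transfer_input_files    = /path/to/inputs/data_inputs
 output                  = layout.out
 error                   = layout.err
 log                     = layout.log
 queue
```

After the job finishes, layout.out will show how the transferred files and directories were laid out in the job's scratch directory.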
  • NOTE 3: Another assumption is that we can pass arguments to our executable. That is not always true: it could happen that the executable expects to find files with predefined names, for example data.in as input, and that it will generate data.out as output. If we cannot change this behaviour (for instance, because we do not have access to the source code), we need to make some small modifications. The first step is to change our awk script for splitting files so that it places every resulting file in a different directory (dataXX/), but with the same name (data.in); our inputs will then be located in data0/data.in, data1/data.in, ..., data390/data.in. Then we add the following commands to the submit file (these lines should be placed before the queue command):
   Initialdir  = data$(Process)
   arguments   = ""
With the Initialdir command we are specifying that HTCondor has to search for the inputs in that directory (a different one for each job), and output files will also be placed there. For instance, the job with ID 34 will transfer the input file located in data34/data.in and, after the execution, it will place the output file in data34/data.out.
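The splitting step mentioned above can be sketched in a few lines of awk (a hedged sketch: the input name data.in, the dataXX/ directory pattern and the chunk size of 100 lines are the hypothetical values used in this example):

```shell
# Split data.in into per-job directories data0/data.in, data1/data.in, ...
# Every chunk keeps the fixed name data.in that the executable expects.
awk -v lines=100 '
  (NR - 1) % lines == 0 { dir = "data" int((NR - 1) / lines); system("mkdir -p " dir) }
  { print > (dir "/data.in") }
' data.in
```

For very large numbers of chunks you may need to close() each file inside awk to avoid hitting the open-file limit.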
But we may want to have all our output files in the same directory in order to process them together. That can be achieved by removing the Initialdir command and using the following commands in our submit file:
    transfer_input_files    = data$(Process)/data.in
    transfer_output_files   = data.out
    transfer_output_remaps  = "data.out=data$(Process).out"
    arguments               = ""
With the new transfer_input_files command we specify that every data.in has to be copied from the proper directory. Then we use transfer_output_files to copy back the output file; but since all the output files will have the same name, we need to use transfer_output_remaps to change the name and avoid all jobs overwriting the same file, so they will be renamed to data0.out, data1.out, ..., data390.out (this command ONLY works with files, NOT with directories). Finally, we do not specify any arguments, since the names of the files are those expected by the executable.
Additionally, you can use the +PreCmd and/or +PostCmd commands to run shell commands/scripts/programs before and/or after your main executable, so you can use these commands to rename or move your input and output files. See the Submit File (HowTo) section for more information.
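As a hedged sketch of that idea (the wrapper script name rename.sh and the input file pattern are hypothetical; the script must be transferred along with the inputs and be executable):

```
 # Run a small script in the job sandbox before the main executable,
 # e.g. to rename the transferred input to the fixed name the program
 # expects. rename.sh could simply contain:  mv input_*.dat data.in
 transfer_input_files = input_$(Process).dat, rename.sh
 +PreCmd              = "rename.sh"
```

Note that the pre-command runs inside the job's scratch directory on the execute machine, so it can only see the files that were transferred there.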
  • NOTE 4: If we want to change the number of lines per file, we do not need to change the submit file. For instance, if we now want files with 350 lines, then after running the awk command we will have 279 input files and N = 279. We can use the same submit file and change the value of N at submission time using the -append option, which allows us to change the value of existing macros or define new ones:
   [...]$ condor_submit myprogram.submit -append 'N = 279'

Example 5. Working with more complex loops and macros

After studying simple loops where we directly use the $(Process) macro from 0 to N-1, we will now see some more complex situations where we need to perform operations with macros. Assume that we have developed an application called myprogram that needs the following inputs:

  1. We have to specify the arguments -init XX -end YY:
    • First job (ID: 0): -init 0 -end 99
    • Second job (ID: 1): -init 100 -end 199
    • ...
    • Last job (ID: N-1): -init [(N-1)*100] -end [(N*100)-1]
  2. The application expects to find the following files and directories located in the same directory where it will run, although right now they are in different locations:
    1. a common file (it does not depend on the arguments) called data.in located in /path/to/inputs/data.in
    2. all files located inside /path/to/inputs/data_inputs directory
    3. a specific directory called specific-XXX/ (where XXX is the value of the -init argument) located in /path/to/inputs/specific-XXX/
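The argument pattern in point 1 above can be sanity-checked with a few lines of plain shell (the step of 100 and the job IDs are the ones from this example):

```shell
# For job ID p: init = p*100 and end = (p+1)*100 - 1,
# so job 0 covers 0-99, job 1 covers 100-199, and so on.
STEP=100
for p in 0 1 49; do
  init=$(( p * STEP ))
  end=$(( (p + 1) * STEP - 1 ))
  echo "job $p: -init $init -end $end"
done
# prints:
# job 0: -init 0 -end 99
# job 1: -init 100 -end 199
# job 49: -init 4900 -end 4999
```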

With these inputs, our program will produce the following outputs in the same directory where it was executed:

  1. A file called data.out
  2. A directory called data_outputs-XXX (where XXX is the value of the -init argument) with many files inside

We will present the HTCondor submit file for this situation and discuss it right after:

 # Set number of jobs to execute
 N  = 50     
 ID = $(Cluster).$(Process) 

 output  = myprogram.$(ID).out
 error   = myprogram.$(ID).err
 log     = myprogram.$(Cluster).log
 should_transfer_files   = YES
 when_to_transfer_output = ON_EXIT
 universe                = vanilla

 # Step in arguments 
 STEP = 100   
 init = $$([$(Process) * $(STEP)])
 end  = $$([(($(Process) + 1) * $(STEP)) -1])

 BDIR                    = /path/to/inputs
 +TransferInput          = "$(BDIR)/data.in, $(BDIR)/data_inputs/, $(BDIR)/specific-$(init)"
 +TransferOutput         = "data.out, data_outputs-$(init)"
 transfer_output_remaps  = "data.out=data-$(init).out"

 executable    = myprogram
 arguments     = "-init $(init) -end $(end)"

 queue $(N)


Let's skip the first and second blocks, since we have explained those commands in previous examples (we have just set N = 50 in this example; we can change this value when submitting if we use the -append option). In the third block we have used a special syntax $$([...]) to define the macros init and end. With this syntax we specify that we want to evaluate the macro, allowing arithmetic operators like *, /, +, -, %, etc. If you need complex macros, there are a number of operators, predefined functions, etc. (for instance, eval() could be very helpful, as could other functions to manipulate strings, lists, ...) and also other predefined macros (see http://research.cs.wisc.edu/htcondor/manual/v8.6/3_5Configuration_Macros.html#SECTION00451800000000000000) that you can use to generate random numbers, randomly choose one value among several of them, etc.
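For instance (a hedged sketch; the macro names below are documented submit-time macros, but check the manual linked above for the exact syntax in your HTCondor version, and the arguments shown are hypothetical):

```
 # Expanded at submit time: a random integer in a range and a random
 # choice from a list (hypothetical parameter names for myprogram)
 seed      = $RANDOM_INTEGER(0, 10000)
 mode      = $RANDOM_CHOICE(fast, accurate)
 arguments = "-seed $(seed) -mode $(mode)"
```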

Most HTCondor commands will use the resulting value when expanding these macros, but unfortunately that does not work for all commands. For instance, the transfer_input_files and transfer_output_files commands do a simple expansion, but do not evaluate the operations, so instead of getting the directory specific-100 you will get something like specific-$$([$(Process) * $(STEP)]). To avoid that, we have to use other commands that correctly expand complex macros and have similar functionality: in this case, +TransferInput and +TransferOutput do the same with a similar syntax (they expect strings, so you have to use quotes). We have also defined a simple macro BDIR to avoid writing the path several times.

According to the commands written in the third and fourth blocks, the behaviour of this submit file will be as follows:

  • Inputs: when our program runs on another machine, it will find the following structure in its local directory: data.in (file), all the content of data_inputs (but NOT the directory itself) and specific-XXX (directory and its content).
  • Outputs: on the output side, once all jobs have finished, we should find on our machine the following structure in the same directory where we did the submission: one file called data-XXX.out and one directory called data_outputs-XXX for each job (where XXX is the value of each -init argument). Note that we would have a problem because our application always names its result file data.out, so all jobs would overwrite it at the destination. To avoid that, we use the transfer_output_remaps command to specify that the data.out file has to be renamed to data-XXX.out, and then all results will be copied to different files (this command ONLY works with files, NOT with directories).

Some more useful commands and info

If you have issues when creating submit files or running your jobs, please check the HOWTOs and FAQs pages, since there you may find some more examples, or visit the useful commands page. Much more information is available in the official documentation about HTCondor and the HOWTO recipes. If you need further support, just contact us.



Page last modified on September 19, 2017, at 01:23 PM