
Please note that all SIEpedia articles address specific issues or questions raised by IAC users, so they do not attempt to be rigorous or exhaustive, and may or may not be useful or applicable in different or more general contexts.

If you have no experience with HTCondor, we recommend that you contact us before running any job so we can give you a quick introduction (bear in mind that you will be using other users' computers and there are some basic guidelines that you must follow to avoid disturbing them).

The HTCondor infrastructure at the IAC has been recently expanded and improved, with about 100 new Linux desktop PCs financed by the Ministry of Economy and Competitiveness through FEDER funds, code IACA13-3E-2493.

Introduction

What is HTCondor?

Here at the IAC we have several Supercomputing resources that allow you to obtain your computational results in much less time and/or work with much more complex problems. One of them is Condor, now called HTCondor because it is a High Throughput Computing (HTC) system. The underlying idea is quite simple (and powerful): let's use idle machines to perform computations while their owners are away. So, in a nutshell, HTCondor is an application installed on our PCs that makes it possible to run a large number of your and other users' computations at the same time on different machines when they are not being used, achieving a better utilization of our resources. A more detailed overview of HTCondor is available in the official documentation and in the old pages about HTCondor at the IAC.

How can HTCondor help us?

HTCondor is very useful when you have an application that has to run a large number of times over different input data. For instance, suppose you have a program that carries out some calculations taking an image file as input. Let's say that the processing time is about one hour per image and you want to process 250 images. You can use your own machine to process all the images one by one and wait more than 10 days for the results, or you can use HTCondor to process each image on a different computer and hopefully get all the results in one hour, or maybe two or four, but certainly in less than 10 days. And HTCondor will do all the work for you: it will copy the input files to the remote machines, execute your program there with the different inputs and bring the results back to your machine when they are complete.
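The arithmetic behind this estimate can be sketched in a few lines of Python. This is only a back-of-the-envelope model: it assumes every job takes exactly one hour, it ignores file transfer and queueing time, and the function name `wall_clock_hours` and the slot counts used below are hypothetical values chosen for illustration.

```python
import math

def wall_clock_hours(n_jobs, hours_per_job, slots):
    """Idealised run time when jobs are processed in waves of `slots` at a time."""
    waves = math.ceil(n_jobs / slots)
    return waves * hours_per_job

# One machine, one job at a time: 250 hours, i.e. more than 10 days.
sequential = wall_clock_hours(250, 1.0, slots=1)
# Hypothetical case where 250 slots are idle: a single wave of jobs.
all_at_once = wall_clock_hours(250, 1.0, slots=250)
# Hypothetical case where only 100 slots are idle: three waves of jobs.
partial = wall_clock_hours(250, 1.0, slots=100)

print(sequential / 24, all_at_once, partial)
```

In practice the number of idle slots changes while the jobs run, so the real figure will fall somewhere between these idealised cases.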

HTCondor also has other powerful features. For instance, you can use it to create specific or periodic checkpoints of your C, C++ or Fortran programs. Then you can run your application normally (without using HTCondor) and, if something happens (your program crashes, the PC is unexpectedly halted, etc.), you will be able to restart the execution from any of the checkpoints, saving a lot of time. Please read this introduction and the other sections to discover how HTCondor can help you.

How powerful is HTCondor?

HTCondor uses the term "slot" for the unit that executes a job, typically a CPU, or a core if the CPU has several of them. Right now (Feb 2017) we have around 900 slots that might execute applications submitted via HTCondor. This means that every day more than 21000 hours could be available to run HTCondor jobs: more than 2 years of computation in just 24 hours! OK, this is the theoretical limit, reached only if no one were using their computers and all slots were idle... ;) The number of idle slots is always changing, but based on our experience a more realistic figure is around 400 slots during office hours and 650 at night and on weekends, so we still have more than one year of HTCondor computation time per day.
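The figures above follow from a simple multiplication, sketched here in Python. The slot counts are the approximate ones quoted in the text; the helper name `daily_capacity` is made up for this example, and a year is taken as 365 days.

```python
HOURS_PER_DAY = 24
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

def daily_capacity(slots):
    """Return (slot-hours per day, equivalent years of single-slot computing)."""
    hours = slots * HOURS_PER_DAY
    return hours, hours / HOURS_PER_YEAR

for label, slots in [("all slots", 900), ("office hours", 400), ("nights/weekends", 650)]:
    hours, years = daily_capacity(slots)
    print(f"{label}: {hours} h/day = {years:.2f} years")
```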

You can see the real-time HTCondor statistics here: http://nectarino (Pool Resource Stats shows the number of slots being used by their owners, the number used by HTCondor and the number that are idle, while Pool User Stats shows the number of HTCondor jobs and consumed hours per user). If you want more detailed information about which jobs have run on specific machines and when, check the stats at http://carlota:81/condor_stats/. You can also visit the Hall of Fame of HTCondor.

Which machines are running HTCondor?

HTCondor is already installed in most of the Desktop PCs running Linux that we have at IAC Headquarters in La Laguna, with a total number of more than 230 machines.

If you are concerned about hardware specifications, you should know that those machines are rather heterogeneous and that their availability and specifications change from time to time. At present (Sept 2014), most CPUs are Intel (with some AMD), running at 2.40 to 3.20 GHz. Each CPU typically has 2, 4 or 8 cores, although there are also more powerful machines with up to 32 cores per CPU. As for memory, the most common amount is 2 GB per slot, while some machines have from 3 to 8 GB per slot and a few have just 1 GB per slot.

On the other hand, software specifications are quite homogeneous and all machines run the same OS: Fedora Linux. Almost all machines run Fedora 21 (as of June 2017, there are still a few machines running older versions, and a handful with Fedora 25). The installed software should also be more or less the same on every machine (see the software supported by the SIE), which makes it easy to run almost every application on any machine (although the available software could be different on some machines that belong to the Instrumentation area).

If your application has special requirements regarding memory per slot, OS version, etc., you can rank and/or limit these parameters, as well as quite a large set of other ones. Please visit the FAQs page for more information and examples.
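As a rough illustration of limiting and ranking, the Python sketch below builds two lines that could go into a submit file. `requirements` and `rank` are standard HTCondor submit commands and `Memory` is the machine attribute giving a slot's memory in MB, but the helper name `machine_constraints` and the 4096 MB threshold are assumptions made up for this example; the FAQs page remains the reference for what is recommended at the IAC.

```python
def machine_constraints(min_memory_mb):
    """Build submit-file lines that restrict and rank candidate slots."""
    return [
        # Hard limit: only slots with at least this much memory may run the job.
        f"requirements = (Memory >= {min_memory_mb})",
        # Preference: among the matching slots, favour those with more memory.
        "rank = Memory",
    ]

constraint_lines = machine_constraints(4096)
print("\n".join(constraint_lines))
```

Keep in mind that a stricter limit leaves fewer matching slots, so jobs may wait longer in the queue.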

Who can use HTCondor? How does it work? Do I need to change my application?

If you have a computer account at the IAC and you can log in on a Linux desktop PC connected to the internal network, then you should be able to use HTCondor with no problems. Try the condor_version command to check whether HTCondor is installed, and please contact us if it is not or if you experience any issues.
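The same check can be scripted. The following Python sketch looks for the condor_version command on the PATH and, if found, prints its output; the function name `htcondor_version` is hypothetical, and nothing is assumed about the format of the version text.

```python
import shutil
import subprocess

def htcondor_version():
    """Return the output of `condor_version`, or None if the command is not on PATH."""
    path = shutil.which("condor_version")
    if path is None:
        return None
    result = subprocess.run([path], capture_output=True, text=True, check=False)
    return result.stdout.strip()

version = htcondor_version()
if version is None:
    print("HTCondor does not seem to be installed on this machine.")
else:
    print(version)
```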

HTCondor is a batch-processing system, so you only need to submit your jobs to the HTCondor queue and it will do all the work. The submission is done using an HTCondor script (a submit file) where you specify your executable, its arguments, inputs and outputs, etc. (visit the HTCondor submit files page to see some examples and recommendations). You do not need to prepare or compile your programs in any special way to run them, and almost all programming languages commonly used at the IAC should be suitable for HTCondor (shell scripts, Python, Perl, C, Fortran, IDL, etc.). Sometimes a few minor modifications may be needed to specify the arguments and the locations of inputs or outputs so that HTCondor can find them, but that should be all.
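To give an idea of what such a submit file contains, here is a minimal Python sketch that generates one for the 250-image example. The submit commands used (`executable`, `arguments`, `transfer_input_files`, `output`, `error`, `log`, `queue`) and the `$(Process)` macro are standard HTCondor ones, but the program name `process_image`, the file names and the helper `build_submit_file` are invented for illustration; see the submit files page for the settings actually recommended at the IAC.

```python
def build_submit_file(executable, n_jobs):
    """Return the text of a simple submit description file for `n_jobs` jobs."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        # $(Process) expands to 0, 1, ..., n_jobs-1, one value per queued job.
        "arguments = image_$(Process).fits",
        "transfer_input_files = image_$(Process).fits",
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        # Standard output and standard error of each job go to separate files.
        "output = job_$(Process).out",
        "error = job_$(Process).err",
        "log = jobs.log",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"

submit_text = build_submit_file("process_image", 250)
print(submit_text)
```

Saved to a file (say, process_images.submit, a hypothetical name), this text would then be sent to the queue with the condor_submit command.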

Once the submitted jobs are in the HTCondor queue, HTCondor uses its allocation algorithm to send your jobs to idle slots that satisfy your requirements and execute them there. Idle slots are those located on machines where there has been no keyboard or mouse activity for a long while and where the computer load is low enough to ensure that there is no interference with the owner's processes. While HTCondor is running its jobs, it also keeps checking that the owner is not using the machine. If HTCondor detects any activity on the computer (for instance, a key is pressed), it will suspend all its jobs and wait a little while to see whether the machine becomes idle again, in which case it resumes the jobs. If the owner keeps working, HTCondor will interrupt all jobs and send them to available slots on other idle machines. HTCondor will repeat this process until all jobs are done, sending notifications via email when they finish or if any errors show up.
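The suspend/resume/interrupt behaviour described above can be pictured as a small state machine. The Python sketch below is a toy model for illustration only, not HTCondor's actual policy code: the state names, the function `next_state` and the 10-minute waiting period are all assumptions, and the real thresholds are set in the pool configuration.

```python
def next_state(state, owner_active, suspended_minutes, grace_minutes=10):
    """Toy model of how a job on a slot reacts to owner activity.

    States: "running", "suspended" and "vacated" (the job leaves this
    machine and goes back to the queue to be matched with another idle slot).
    """
    if state == "running":
        # Any owner activity suspends the job immediately.
        return "suspended" if owner_active else "running"
    if state == "suspended":
        if not owner_active:
            return "running"      # the machine became idle again: resume
        if suspended_minutes >= grace_minutes:
            return "vacated"      # the owner kept working: move the job elsewhere
        return "suspended"        # still within the waiting period
    return state                  # "vacated" is final for this slot

# The owner presses a key, then keeps working for the whole waiting period.
first = next_state("running", owner_active=True, suspended_minutes=0)
second = next_state(first, owner_active=True, suspended_minutes=10)
print(first, second)
```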

I am using HTCondor, should I add an acknowledgement text in my publications?

Yes, you should mention it in the acknowledgments of your papers or any other publications where you have used HTCondor. Although there is no standard format, we suggest the following:

"This paper made use of the IAC Supercomputing facility HTCondor (http://research.cs.wisc.edu/htcondor/), partly financed by the Ministry of Economy and Competitiveness with FEDER funds, code IACA13-3E-2493."

If you have used any other IAC Supercomputing facilities (LaPalma, TeideHPC, etc.), please add them to the acknowledgments too:

LaPalma: "The author thankfully acknowledges the technical expertise and assistance provided by the Spanish Supercomputing Network (Red Española de Supercomputación), as well as the computer resources used: the LaPalma Supercomputer, located at the Instituto de Astrofísica de Canarias."

TeideHPC: "The author(s) wish to acknowledge the contribution of Teide High-Performance Computing facilities to the results of this research. TeideHPC facilities are provided by the Instituto Tecnológico y de Energías Renovables (ITER, SA). URL: http://teidehpc.iter.es/"

I need more information or have some problems, who can help me...?

If you need further information, please check the other pages about HTCondor at the SIEpedia: Useful Commands, Submit Files (description and examples), Submit Files (HowTo), FAQs, etc. HTCondor at SIEpedia is continuously updated, but we also have more documentation about older versions of HTCondor in the HTCondor section at the IAC (most of that information is still valid, but some may be obsolete and some links may be broken). For detailed and complete information, check the official documentation about HTCondor.

If you need help or are having any kind of issue related to HTCondor, the SIE gives direct support to IAC users who want to use it: we will not code your whole application, but we will help and advise you on how to get the most out of HTCondor: using its commands, creating submit files, modifying your application to run it with HTCondor (if needed), fixing common mistakes, etc. We also organize workshops about HTCondor for IAC users (the last one was on February 25th, 2014; slides are available), and we can organize a new workshop on demand if you and your colleagues need it: if the group is large enough (10 or 12 people), just contact us! Even if HTCondor is not suitable for your needs, we may help you to get your results in less time using other Supercomputing resources at the IAC, such as High Performance Computing (HPC) on the LaPalma Supercomputer.

Contact data: Antonio Dorta: adorta@iac.es | Phone ext.: 5278 | Office 1124 (end of corridor #1, 1st floor)


Page last modified on August 16, 2017, at 05:17 PM