
Supercomputing News

Here we publish the most relevant news and announcements about our supercomputing resources and the HTCondor queue management system.

07 August 2018: HTCondor updated to v8.6.12

HTCondor (https://research.cs.wisc.edu/htcondor/) has been updated to version 8.6.12.

15 June 2018: HTCondor updated to v8.6.11

HTCondor (https://research.cs.wisc.edu/htcondor/) has been updated to version 8.6.11.

21 March 2018: HTCondor updated to v8.6.10

HTCondor (https://research.cs.wisc.edu/htcondor/) has been updated to version 8.6.10.

11 January 2018: HTCondor updated to v8.6.9

HTCondor (https://research.cs.wisc.edu/htcondor/) has been updated to version 8.6.9.

17 November 2017: New supercomputing machines

Several new supercomputing PCs (aka "burros") have been added to the IAC pool. All relevant information and usage notes can be found on our supercomputing page.

10 July 2017: HTCondor updated to v8.6.4

HTCondor (https://research.cs.wisc.edu/htcondor/) has been updated to version 8.6.4.

10 May 2017: HTCondor updated to v8.6.3

HTCondor (https://research.cs.wisc.edu/htcondor/) has been updated to version 8.6.3.

25 April 2017: HTCondor updated to v8.6.2

HTCondor (https://research.cs.wisc.edu/htcondor/) has been updated to version 8.6.2.

13 March 2017: HTCondor updated to v8.6.1

HTCondor (https://research.cs.wisc.edu/htcondor/) has been updated to version 8.6.1.

13 December 2016: HTCondor updated to v8.4.10

HTCondor (https://research.cs.wisc.edu/htcondor/) has been updated to version 8.4.10.

18 July 2016: HTCondor updated to v8.4.8

HTCondor (https://research.cs.wisc.edu/htcondor/) has been updated to version 8.4.8.

25 April 2016: HTCondor updated to v8.4.6

HTCondor (https://research.cs.wisc.edu/htcondor/) has been updated to version 8.4.6.

05 February 2016: HTCondor updated to v8.4.4

HTCondor (https://research.cs.wisc.edu/htcondor/) has been updated to version 8.4.4.

21 September 2015: HTCondor updated to v8.4.0

HTCondor (https://research.cs.wisc.edu/htcondor/) has been updated to version 8.4.0, which includes many improvements and interesting new features. For a quick description of them, please read our entry in the September/October 2015 SIENews issue.

17 August 2015: HTCondor updated to v8.2.9

HTCondor (https://research.cs.wisc.edu/htcondor/) updated to version 8.2.9, which includes a fix to correctly interpret the "Load Average" with Linux kernels 4.x. For a general introduction to HTCondor, and some usage notes and tips, please see http://www.iac.es/sieinvens/siepedia/pmwiki.php?n=HOWTOs.Condor

19 June 2014: Access to Teide-HPC

IAC users have access to the Teide-HPC supercomputer (see the Teide-HPC website for details). To get an account in Teide-HPC, please contact us at . An Introduction Manual is available in our intranet.

20 February 2014: Condor Workshop Announcement

A workshop on Condor ("Let your programs fly!") will take place on 25 February 2014 at 9:30. For details about the aim and content of the workshop, please see the Condor Workshop entry in our SIE Courses webpage (text in Spanish).

(Addendum 27-Feb-2014) The slides and exercise worksheet are now available at the link above.

15 January 2013: Diodo retired from service

After the failure of several of its aging computing nodes, we decided to retire Diodo from service.

All users are encouraged to make use of the LaPalma supercomputer, which has recently been expanded and now boasts 1024 CPU cores and a total of 1 TB of RAM. The percentage of time reserved for IAC users is now 50%.

30 March 2011: Diodo ready again (after some changes)

Diodo is "open for business" again. I intend to do some more changes to Diodo, but those will have to wait (provisionally scheduled for the week starting April 25th), in order not to have Diodo unserviceable for too long.

The main changes to Diodo during this downtime were:

21 January 2011: Diodo to be upgraded

Diodo (http://diodo/, formerly Chimera) is showing signs of its age, so I'm going to reinstall it from scratch with a more modern version of its operating system (and libraries, compilers, etc.). A tentative schedule (if no complications arise) is:

So, it is IMPORTANT that you back up any data in Diodo that you want to keep BEFORE Sunday, February 13th. Otherwise, all data in ALL partitions (INCLUDING HOME DIRECTORIES IN DIODO) will be lost. Also, if you want to start using the new Diodo as soon as possible, please let me know what software/libraries/compilers you would like installed, so that I can prepare their installation, if possible, by March 1st.

20 January 2011: Condor Hall of Fame, second semester 2010

As we do every six months, it is time to publish the usage statistics of the supercomputing resources at the IAC for the second semester of 2010. In total, 814768.9 CPU hours were delivered during this period. By resource, Condor delivered 577898.8 CPU hours, LaPalma 218551.43 and Diodo 18318.67. Full details of the breakdown by user can be found at the SIE Forum for Condor, LaPalma and Diodo. If you want a piece of this pie and don't know how to start, just let us know.

05 July 2010: Condor Hall of Fame, first semester 2010

Tradition dictates that it is now time to publish the usage statistics of the supercomputing resources at the IAC for the first semester of 2010. In total, 914353.9 CPU hours were delivered during this period. By resource, Condor delivered 440990 CPU hours, LaPalma 282620 and Diodo 190743.90. Full details of the breakdown by user can be found at the SIE Forum for Condor, LaPalma and Diodo. If you want a piece of this pie and don't know how to start, just let us know.

08 July 2009: Condor Hall of Fame, first semester 2009

We have published the Condor usage statistics for the first semester of 2009 at http://venus/SIE/forum/viewtopic.php?f=8&t=38&p=686#p686

At the same time, just a reminder that due to the Condor license "Any academic report, publication, or other academic disclosure of results obtained with this Software will acknowledge this Software's use by an appropriate citation." (http://research.cs.wisc.edu/htcondor/license.html). A description of what "an appropriate citation" means can be found at https://lists.cs.wisc.edu/archive/htcondor-users/pre-2004-June/msg00542.shtml

07 July 2009: Chimera Hall of Fame, first semester 2009

We have published the Chimera usage statistics for the first semester of 2009 at http://venus/SIE/forum/viewtopic.php?f=8&t=154&p=687#p687

29 April 2009: Chimera 32-bit partition down

Due to a lack of sufficient air conditioning in the servers' room, some machines had to be turned off. Since the 32-bit partition in Chimera is quite old now, and not many people were using it, I'm afraid they had to go... They will remain switched off for the foreseeable future. But remember that if you have software compiled for 32 bits, in many cases it should run without changes on the 64-bit machines. If you have problems or any doubts, please do get in touch.

19 March 2009: Condor record: 415 CPUs available (and submitting to machines with a different architecture)

With dual- and quad-core workstations becoming the norm these days, we have revisited the list of machines available to Condor, and we have managed to set a new record, breaking the 400-CPU barrier. Roughly 75% are 64-bit CPUs, the rest being 32-bit ones. The details are:
INTEL/LINUX    102
X86_64/LINUX   313

The future is 64-bit, but while we still have 32-bit machines, you should know that the 64-bit CPUs can also run 32-bit code. This is important because with Condor it is very easy to do a "Heterogeneous Submit": for example, submit from a 32-bit machine, but ask Condor to execute the code on either a 32-bit or a 64-bit machine. This is not the default behaviour (the default is to execute on those machines with exactly the same architecture and operating system as the one from which you submit), so if you want to make use of this feature have a look at section 2.5.6 of the manual http://research.cs.wisc.edu/htcondor/manual/v7.0/2_5Submitting_Job.html or ask me, if in doubt.
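As a sketch of such a heterogeneous submit (the executable name is hypothetical; Arch and OpSys are standard Condor machine attributes, with values matching the pool statistics above), the submit description file could say:

```
# Hypothetical submit file sketch: accept both 32-bit and 64-bit Linux machines
executable   = my_code
requirements = ((Arch == "INTEL") || (Arch == "X86_64")) && (OpSys == "LINUX")
queue
```

Note that the single binary must then be one that both architectures can execute, i.e. a 32-bit build.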

25 October 2008: Data in /scratch partition in Chimera to be deleted automatically

From the various file systems/partitions available at Chimera, the /scratch partition (which is shared by all nodes in the cluster) is filling up very quickly, so it is time to set up a program that automatically deletes files that have not been accessed within a given period of time. Those of you who used Beoiac will remember that its /scratch partition worked the same way.

To give you enough time to recover any data that might be useful, this automatic deletion mechanism will not be activated until the 9th of November (in about 15 days). From that date on, all files in the /scratch partition that have not been accessed (read, modified, etc.) in the last 60 days will be automatically deleted (the check works file by file, so accessing a directory does not save its individual files). This time limit will be reconsidered at a later point if the measure proves ineffective or excessive.
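The policy can be illustrated with a short sketch (an illustration only, not the actual cleanup program; the function name and the dry-run behaviour are invented for the example): it walks the tree and selects files whose last access time (atime) is older than the cutoff, so directories themselves offer no protection.

```python
import os
import time

def purge_old_files(root, max_age_days=60, dry_run=True):
    """Return (and, if dry_run is False, delete) files under `root`
    whose last access time is older than `max_age_days` days.
    The check is per file: accessing a directory does not protect
    the files inside it."""
    cutoff = time.time() - max_age_days * 86400
    purged = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # atime is the last access (read or write) time of the file
            if os.stat(path).st_atime < cutoff:
                purged.append(path)
                if not dry_run:
                    os.remove(path)
    return purged
```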

You should consider the /scratch partition only as temporary storage, and back up all important data somewhere else outside the cluster.

Sorry for the inconvenience, but it is the only way to keep the cluster operational.

02 July 2008: Condor Hall of Fame, first semester 2008

We have published the Condor usage statistics for the first semester of 2008 at http://venus/SIE/forum/viewtopic.php?p=456#456

At the same time, just a reminder that due to the Condor license "Any academic report, publication, or other academic disclosure of results obtained with this Software will acknowledge this Software's use by an appropriate citation." (http://research.cs.wisc.edu/htcondor/license.html). A description of what "an appropriate citation" means can be found at https://lists.cs.wisc.edu/archive/htcondor-users/pre-2004-June/msg00542.shtml

01-July-2008: Chimera Hall of Fame, first semester 2008

We have published the Chimera usage statistics for the first semester of 2008 at http://venus/SIE/forum/viewtopic.php?p=457#457

As you can see in the post, the cluster has been used at roughly 73% of its capacity. This is indeed pretty good, given that some jobs need to occupy a full node but use only two of the four available CPUs (due to memory constraints), and that one of the nodes is often reserved for testing and thus not fully occupied.

17-April-2008: Parallel code development in Chimera

Until recently Chimera had a static reservation on one of the nodes that prevented long-running jobs from executing on that node during working hours. It was meant for code development and small tests, but since it was seldom used and meant that 4 CPUs sat idle most of the time, I have deleted this static reservation. So Chimera is close to its full 128 CPUs again (one of the nodes needs to be repaired, which should happen during the coming days).

If you need a cluster for tests, you can now use our new mini-cluster Xerenade. This mini-cluster has 16 AMD CPUs, and should be perfect for code development. This is not yet in full production, but if you would be interested in trying it out, please do get in touch.

Perhaps the news about the newly installed compilers has gone unnoticed, but we have recently installed on the IAC network the Portland Group C/C++ and Fortran compilers. For those of you developing parallel codes, it is perhaps interesting to know that these compilers support "High Performance Fortran (HPF)" (http://www.pgroup.com/doc/pghpf_ug/hpfug.htm).

28-March-2008: Condor version upgraded to 7.0.1

Our version of Condor at the IAC had become a bit old, and it was having some problems with multi-core PCs, so we have upgraded to the latest stable Condor version, 7.0.1, still warm from the oven (released February 27, 2008).

The upgrade went smoothly and without any problems, but if you find anything odd during the coming days, please let us know.

27-December-2007: Chimera Hall of Fame (2007)

As is now usual towards the end of each year, I have calculated the yearly usage of Chimera, which during 2007 (up to the 26th of December) delivered 447262.71 CPU hours. You can see all the details at the SIE Forum (http://venus/SIE/forum/viewtopic.php?t=154).

And by the way, if you missed our last SIEminar (Supercomputing resources at the IAC), you can see the slides at http://www.iac.es/sieinvens/SINFIN/Sie_Courses_PDFs/resources_supercomputing.pdf

Happy holidays, and let's try to make Chimera work even harder during 2008!

21-November-2007: ABINIT installed in Chimera

Just to let you know (in case it is of interest) that, following a user request, I have installed the ABINIT software in Chimera (chi64, the 64-bit machines). According to its main page (http://www.abinit.org/):
"ABINIT is a package whose main program allows one to find the total energy, charge density and electronic structure of systems made of electrons and nuclei (molecules and periodic solids) within Density Functional Theory (DFT), using pseudopotentials and a planewave basis. ABINIT also includes options to optimize the geometry according to the DFT forces and stresses, or to perform molecular dynamics simulations using these forces, or to generate dynamical matrices, Born effective charges, and dielectric tensors. Excited states can be computed within the Time-Dependent Density Functional Theory (for molecules), or within Many-Body Perturbation Theory (the GW approximation). In addition to the main ABINIT code, different utility programs are provided."

If you are interested in trying it out, do get in touch.

28-March-2007: Beoiac is gone, long live Chimera ...

As planned, Beoiac is gone; in its place we now have Chimera. It will have 32 nodes, but as of today only 30 have been configured. If you have used Beoiac before, much will be familiar, but there are new things to learn, like compiling for 64 or 32 bits, using PVFS, etc. Instructions (still very preliminary) on how to use the cluster are in the Cluster Documentation Page (website no longer operative).

Next week I will be away, so if you would like to use the cluster during that time and you don't have an account yet, try to get in touch with me tomorrow.

Overall the cluster is working OK, but there are still a number of things to be configured, so if you find any problems please get in touch. Also, remember that although the cluster is primarily for parallel codes, serial codes are also permitted, though they will have a very low priority in the queueing system.

31-August-2006: Re: the new 64-bit Beowulf

At last the air conditioning in the machine room has been installed and is working fine, so we got the green light to start the installation of the new 64-bit Beowulf. This installation is relatively complex, as there are many things to test and install before putting it into production. This is especially true since the transition from 32 to 64 bits adds a number of new challenges, so I cannot tell you when it will be ready for regular use, but I will keep you informed.

In order to make this new cluster a better experience for everyone, I would ask you two things:

  1. If you have a production code that is easy to compile (without too many dependencies), easy to run, and easy to test for performance, please get in touch. Otherwise I will tune the system with standard benchmarks, but there is no benchmark like your code, and tuning the system for it will make your code go faster.
  2. If you have any suggestions on how to make the cluster more usable (for example, by installing new libraries, changing scheduling policies, changing the quotas, etc.), please let everybody know, by posting a message in the SIE Forum at http://venus/SIE/forum/viewtopic.php?t=122

26-April-2006: Condor Code of Conduct

Condor is now beyond its testing phase, as it has proven very stable and useful, but in order to avoid affecting other users, we have written a small Code of Conduct to which you should stick when using it. I include a copy below.

Condor is a terrific tool for performing parametric studies and other types of jobs that can run simultaneously and independently on a number of machines. Nevertheless, under certain circumstances, if you are not careful you can bring the network to a crawl. To avoid these situations, please stick to this simple code of conduct:

  1. Submit jobs only from your machine or from a machine whose owner you have contacted and who is aware of the extra load you will put on it. No submissions from public machines, sorry! (For each running Condor job there is a process running on the submitting machine, plus lots of network connections, so the submitting machine pays a big toll, which it is not fair to pass on to someone else unawares.)
  2. If you plan to run I/O-intensive code (i.e. code that reads or writes very large files to disk, or small ones very often), get in touch with me first. Depending on how I/O-intensive your code is, it might not be worth using Condor, or I might be able to advise you on how best to do it. Hopefully your Condor submission will run faster if we take this into account.
  3. Test your submission. Don't go nuts and submit 10000 jobs without first making sure the whole thing works with a smaller subset. Start small, verify that things are going OK, check the logs to see that the jobs can access all the necessary files, etc., and only when you are satisfied that things are working go for the big submission.
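For instance, a first test run could queue just a handful of jobs before scaling up (the executable and file names here are hypothetical):

```
# Hypothetical submit file: a small 10-job test before the big submission
executable = my_analysis
arguments  = $(Process)
output     = out.$(Process)
error      = err.$(Process)
log        = test_run.log
queue 10
```

Once the logs look clean, changing the last line to `queue 10000` scales the same description up.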

Please stick to these basic rules so that we can avoid Condor affecting other users' work.

26-October-2005: Change in scheduling policies of cluster

Until now the priority policies in the cluster took into consideration a number of parameters, but not how much each user had used the cluster during the preceding days, as the cluster was not heavily used and this didn't seem necessary.

As you might have noticed, this has changed, and the queueing time can now be large. Thus, to ensure fairness amongst all users, a new "fairshare" parameter is now taken into consideration when calculating job priorities. Basically, the less you have used the cluster during the preceding days (currently 7 days), the greater this fairshare component will be, giving your jobs an advantage over those of users who have used the cluster recently. This should make the use of the cluster fairer for everyone.
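As an illustration only (the scheduler's real formula is internal to the queueing system, and the function name and scale factor below are invented), the fairshare component behaves roughly like this: the smaller your share of the recent usage window, the bigger the priority boost.

```python
def fairshare_boost(recent_cpu_hours, window_capacity_hours, scale=1000.0):
    """Illustrative fairshare term: users who consumed little of the
    cluster over the look-back window (e.g. 7 days) get a larger
    priority boost than users who consumed a lot."""
    used_fraction = min(recent_cpu_hours, window_capacity_hours) / window_capacity_hours
    return scale * (1.0 - used_fraction)
```

For example, for a 128-CPU cluster over a 7-day window the capacity is 128 * 24 * 7 = 21504 CPU hours, so a user with no recent usage gets the full boost while a user who consumed the whole window gets none.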

As always, don't hesitate to let me know any suggestions/comments/etc.

26-August-2005: New feature to improve efficiency of long jobs with Condor

As you know, Condor is great for short jobs, but when running long jobs the efficiency can decrease due to evictions. One type of eviction happens when your Condor job is running on a machine and the "owner" comes back to his/her workstation. If your job had been running for only 20 minutes it is not a big deal, but if your job was about to complete after running for 10 hours, then the efficiency suffers a bit.

You could avoid sending long jobs to Condor by splitting the execution into smaller parts. If this is not possible, then you could consider submitting your jobs to the Standard Universe. From the Condor manual: "Jobs submitted to the standard universe may produce checkpoints. A checkpoint can then be used to start up and continue execution of a partially completed job." More info about the Standard Universe at:
http://research.cs.wisc.edu/htcondor/manual/v6.8/2_4Road_map_Running.html#SECTION00341100000000000000

If the standard universe is not possible either, then you could make use of the brand new feature I have just implemented. The idea is to submit your jobs to those workstations on which the owner has had very little activity in the past, thus (assuming that past behaviour is a predictor of future behaviour) reducing the risk of evictions. So, how does it work? Very simple: you just add to your submit file

"Rank = owner_inactivity"

owner_inactivity is a value which is increased by one every 15 minutes if the machine is not being used by its "owner". If you are curious about these values for all the machines in the Condor pool, you can run the command:

condor_status -format "%d " owner_inactivity -format "%s \n" Machine -sort owner_inactivity

which will print all the machines with their corresponding owner_inactivity values in ascending order (right now the values are still very small because I have just started the feature, but you will see them grow).

If you want to mix this rank expression with another rank expression, check the examples at:
http://research.cs.wisc.edu/htcondor/manual/v6.8/2_5Submitting_Job.html#SECTION00352300000000000000
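A sketch of such a mixed expression (the weight is arbitrary, chosen here just so that inactivity dominates; KFlops is a standard Condor machine attribute reporting floating-point speed):

```
# Hypothetical mixed rank: prefer long-idle machines, break ties by machine speed
Rank = (1000000 * owner_inactivity) + KFlops
```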

With this feature, your jobs will try to run first on machines that have been unused by their owners for a long time, which should improve your chances of avoiding eviction. As always, let me know if you find any issues with this.