Supercomputing at the IAC

Why Supercomputing?

Supercomputing is a general term that encompasses any high-speed computational process, and whose definition changes as new computing methods are developed. There is an extensive literature on this topic, but on this page we focus on the advantages that IAC researchers can achieve by using Supercomputing. If you are interested in theoretical aspects, the Wikipedia article about Supercomputing gives you a good overview and many references and links to further information.

The main reason to use Supercomputing is to get your computational results in less time. That time could be reduced by a factor of 1.5, 2, 5, 10, 100, 1000, ...: the limit depends on the restrictions of your problem and program, the Supercomputing techniques you apply and the resources available to compute it. Even if time is not a limiting factor in your computations, Supercomputing may allow you to tackle much bigger problems than before in the same amount of time.

Teide-HPC and LaPalma Supercomputers (Parallel Computing)

Parallel Computing is the simultaneous execution of the same task (split up and specially adapted) on multiple processors in order to obtain faster results (see also: Parallel Computing by Wikipedia). If you have a problem that requires a huge number of calculations, but some of those operations are independent and could be performed at the same time, then you should consider using Parallel Computing to get your results in less time. Algorithms with large loops whose iterations have no (or few) dependencies among them (like simulations of galaxies) are good candidates for parallelization. Once you have your parallel code, a Supercomputer is needed to run it.

Researchers at the IAC have access to two Supercomputers, Teide-HPC and LaPalma:

1. Teide-HPC (Teide High Performance Computing) is a supercomputer located at the Instituto Tecnológico de Energías Renovables S.A. (ITER). It is the second most powerful supercomputer in Spain and appears in the 169th position (June 2014) of the Top500 list of the most powerful computers in the world. It is composed of 1,100 Fujitsu computer servers, with a total of 17,800 computing cores (featuring the latest Intel Sandy Bridge processors, which provide not only high performance but also great energy efficiency), 36 TB of memory, a high-performance network and a parallel NetApp storage system. According to Top500, it has a theoretical peak performance of 340.8 TFLOPs, with a maximal achieved LINPACK performance of 274.0 TFLOPs.

IAC users need an account to be able to connect and run their programs on Teide-HPC. Please send an email to support (res_support@iac.es) to request an account or to resolve any issue related to Teide-HPC. We have also prepared documentation about this machine for IAC users; it is available at the SIEwiki on venus.

2. LaPalma, in its third version LaPalma3, belongs to the IAC and is one of thirteen nodes located on Spanish territory that are linked together to form the Spanish Supercomputing Network (RES). The LaPalma node was previously part of MareNostrum, one of the most powerful computers in Europe. In its present state (March 2018), LaPalma has a peak performance of 83.85 TFLOPS, with 4,032 Intel Xeon E5-2670 cores and 2 GB of RAM per core. The total disk space is 346 TB, and the Lustre parallel filesystem is available. LaPalma has a fast InfiniBand network (40 Gb/s) for internal communication (both computation and the storage system). A large set of scientific programs and libraries is already installed, and it is possible to install new software packages on demand if they are compatible and widely used. 50% of the computation time of LaPalma is assigned to the RES, and the other 50% (4,554,547 hours per four-month period) is available to IAC researchers, who can apply for it at any time, although it is advisable to apply during the official calls to get higher priority.

Please visit the following links for more information (some of them are only available from the internal IAC network):

HTCondor (Distributed Computing)

Distributed computing is the process of running a single computational task on more than one computer (see also: Distributed Computing by Wikipedia). For instance, suppose we need to reduce some data using an application we have developed. We have a very large number of data sets to reduce, and the processing time is several hours per set. If we process all sets one by one on our machine, it may take several weeks to get all the results... Now imagine we had hundreds of machines where we could run our program on different sets of data at the same time: we could get all the results in just a few hours! The HTCondor software makes this possible, and it will do all the work for you: HTCondor copies the input files to the remote machines, executes your program there with different data and brings the results back to your machine when they are complete.

At the IAC the HTCondor system -a High-Throughput Computing (HTC) system- is installed on most of the Linux desktop PCs, allowing us to run our applications (shell scripts, astronomical software, our own programs written in C, Fortran, Python, IDL, Matlab, etc.) on other computers when they are not being used by their owners. At this time (Apr. 2014), more than 230 PCs with about 720 cores are ready to execute other users' programs when they are idle, i.e. you could get the equivalent of one month of serial execution in just one hour! (This is the theoretical maximum, as not all HTCondor slots are always idle; a more realistic estimate is an average of between 350 and 500 idle slots.)
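As an illustration (the program and file names below are hypothetical, not an IAC-specific recipe), a task like the data-reduction example above is described to HTCondor in a small submit file, which queues one job per data set:

```
# reduce.sub - one HTCondor job per data set (names are illustrative)
executable              = reduce_data
arguments               = dataset_$(Process).fits
transfer_input_files    = dataset_$(Process).fits
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
output                  = reduce_$(Process).out
error                   = reduce_$(Process).err
log                     = reduce.log
# Queue 100 jobs; HTCondor expands $(Process) to 0..99
queue 100
```

The file would be submitted with condor_submit reduce.sub, and the queue monitored with condor_q; HTCondor then handles the file transfers and scheduling described above.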

Please visit the following links for more information (some of them are only available when connected to the internal IAC network):

"Burros" (High Performance Linux PCs)

Users who need to run CPU- or memory-intensive jobs that are unsuitable for their own PCs or for other IAC supercomputing resources (like LaPalma, TeideHPC, the HTCondor system, etc.) can access any of several High Performance Linux PCs. These machines are also suitable for developing, debugging and testing parallel applications before submitting them to other Supercomputers. They are open to any user and do not require advance reservation, but please follow these simple rules:

1. Before running any job on them, please check their load (with uptime or htop): if it is higher than the number of cores, wait a bit till it goes down before launching your application. Also check that the load does not exceed the number of cores after your program starts.
2. If you are testing your parallel codes, check how many cores are being used and don't take up all the cores.
3. These machines should only be used for developing or testing your parallel programs: if you need to run a parallel application for hours or days on a large number of cores, there are better alternatives, such as TeideHPC or LaPalma (please contact us).
4. Some of these machines have a huge amount of disk space (about 20 TB). Don't abuse it! There are no backups of your data on any of these machines, so don't use them as a storage system. Do not forget to delete your data, or move it to another location, once your runs are done, to make room for other users.
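The load check in rule 1 can be done interactively with uptime or htop, or programmatically; for example, in Python (a sketch using only the standard library, on a Linux machine):

```python
import os

# 1-, 5- and 15-minute load averages, as reported by uptime
load1, load5, load15 = os.getloadavg()
cores = os.cpu_count()

# Rule of thumb from rule 1: only launch if the load is below the core count.
if load1 < cores:
    print("Load is below the number of cores: OK to launch.")
else:
    print("Machine is busy: wait until the load goes down.")
```

The same comparison can be repeated after your program starts, to confirm it is not overloading the machine.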

Some of these machines are listed here:

Sorry, this list is only available when connecting from IAC's intranet. Contact SIE for further details...

Other HPC Resources

There are more resources available in other institutions that can be accessed by researchers at IAC:

• Universidad de La Laguna (ULL): Thanks to the close relationship between the IAC and the ULL, our researchers can have access to some of the ULL's resources:
• SAII: The "Servicio de Apoyo Informático a la Investigación (SAII)" is a service that supports researchers with computing issues. They have several Supercomputers that may be used by researchers at the IAC. Please contact this service to get your account (http://www.saii.ull.es).
• GCAP: The "Grupo de Computación de Altas Prestaciones (GCAP)" of La Universidad de La Laguna (http://cap.pcg.ull.es) is open to collaborations in HPC topics. They have several GPUs that could be used and other resources. If you are interested, please, contact Antonio Dorta (adorta@iac.es) who belongs to that group.
• Red Española de Supercomputación (RES): The RES is an alliance of eight institutions and their Supercomputers distributed throughout Spain, which have worked together since 2006 to offer a High Performance Computing service to the scientific community. The IAC belongs to the RES and LaPalma is one of the available Supercomputers, but we must mention others that are among the most powerful in Spain and Europe, like MareNostrum III. You can apply for access to these Supercomputers, usually with three deadlines every year.
• Partnership for Advanced Computing in Europe (PRACE): At a higher level than RES, you can find PRACE. It consists of 25 member countries whose representative organizations create a pan-European Supercomputing infrastructure, providing access to world class computing and data management resources and services for large-scale scientific and engineering applications at the highest performance level.
• There are some individual institutions that own Supercomputers and it might be possible to get access to them under certain conditions. Since those conditions change from time to time, it is not easy to list all these institutions, but some examples are: ITER (Tenerife), CIEMAT (Madrid), CESGA (Galicia), CESCA (Catalonia), EPCC (Edinburgh), etc.
• Most Supercomputing centres and networks also offer training on HPC topics, with many courses, education and training programmes, seminars, schools, PhD opportunities, etc. There are also projects and programmes that support the mobility of researchers, so they can visit HPC centres for a period of time to receive HPC training and gain access to the facilities. Some examples are the RES Education and Training Programs, the PRACE Training Events, the HPC-Europa Mobility Programme, etc.