HOWTOs

LaPalma3 (1): Introduction

Please note that all the SIEpedia's articles address specific issues or questions raised by IAC users, so they do not attempt to be rigorous or exhaustive, and may or may not be useful or applicable in different or more general contexts.

Introduction

This user's guide for the LaPalma Supercomputer (v3) is intended to provide the minimum amount of information a new user needs on this system. As such, it assumes that the user is familiar with many of the standard aspects of supercomputing, such as the Unix operating system.

We hope you will find here most of the information you need to use our computing resources: from applications and libraries to technical documentation about LaPalma, how to include references in publications, and so on. Please read this document carefully and, if any doubt arises, do not hesitate to contact our support group at res_support@iac.es

System Overview

LaPalma comprises 252 IBM dx360 M4 compute nodes. Each node has 16 cores (Intel E5-2670) at 2.6 GHz running Linux, with 32 GB of RAM (2 GB per core) and 500 GB of local disk storage. Two Bull R423 servers are connected to a pair of NetApp E5600 storage systems, providing a total of 346 TB of disk storage accessible from every node through the Lustre parallel file system. The networks that interconnect the LaPalma nodes are:

  • Infiniband Network: High-bandwidth network used for communication between parallel application processes.
  • Gigabit Network: Ethernet network used by the nodes to mount their root file system remotely from the servers; it is also the network over which Lustre operates.

File Systems

IMPORTANT: It is your responsibility as a user of the LaPalma system to backup all your critical data. NO backup of user data will be done in any of the filesystems of LaPalma.

Each user has several areas of disk space for storing files. These areas may have size or time limits, so please read this section carefully to learn the usage policy of each of these filesystems.

There are 3 different types of storage available inside a node:

  1. Root filesystem: It is the filesystem where the operating system resides
  2. Lustre filesystems: Lustre is a distributed networked filesystem which can be accessed from all the nodes
  3. Local hard drive: Every node has an internal hard drive

Root Filesystem

The root file system, where the operating system is stored, does not reside on the node: it is an NFS filesystem mounted from a Network Attached Storage (NAS) server.

As this is a remote filesystem, only operating system data should reside on it. The use of /tmp for temporary user data is NOT permitted. The local hard drive can be used for this purpose, as described later.

Furthermore, the environment variable $TMPDIR is already configured so that well-behaved applications store their temporary files on the local hard drive.
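As a minimal sketch, a job script can honour $TMPDIR explicitly instead of writing to /tmp; the directory and file names below are illustrative, not part of any LaPalma configuration:

```shell
# Use $TMPDIR (pointed at the node's local disk on LaPalma) for scratch data.
WORKDIR="${TMPDIR}/myjob.$$"          # unique per-process working directory
mkdir -p "$WORKDIR"
echo "intermediate data" > "$WORKDIR/partial.dat"
# ... run your application here, writing its temporary files to $WORKDIR ...
rm -rf "$WORKDIR"                     # tidy up before the job ends
```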

Lustre Filesystem

Lustre is an open-source, parallel file system that provides fast, reliable data access from all nodes of the cluster to a global filesystem, with remarkable scalability and performance. Lustre allows parallel applications simultaneous access to a set of files (even a single file) from any node that has the Lustre file system mounted, while providing a high level of control over all file system operations. These filesystems are the recommended ones for most jobs, because Lustre provides high-performance I/O by "striping" blocks of data from individual files across multiple disks on multiple storage devices and reading/writing these blocks in parallel. In addition, Lustre can read or write large blocks of data in a single I/O operation, thereby minimizing overhead.
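Striping can be inspected and tuned with the standard Lustre `lfs` client tool. The following is a sketch only (the path is illustrative, and the stripe count/size that suits your workload depends on your file sizes and access pattern):

```shell
# Show the current stripe layout of a file or directory:
lfs getstripe /storage/scratch/$USER/striped_dir

# Make new files created in the directory stripe across 4 storage
# targets (OSTs) with a 1 MiB stripe size, so large sequential
# reads/writes are spread over several devices in parallel:
lfs setstripe -c 4 -S 1M /storage/scratch/$USER/striped_dir
```

Striping wide helps large shared files; small files are usually better left with the default layout.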

Even though there is only one Lustre filesystem mounted on LaPalma, there are different locations for different purposes:

  • /storage/home: This location holds the home directories of all the users. When you log into LaPalma you start in your home directory by default. Every user has their own home directory to store executables, their own source code and their personal data.
  • /storage/projects: In addition to the home directory, there is a directory in /storage/projects for each group of users of LaPalma. For instance, the group iac01 will have a /storage/projects/iac01 directory ready to use. This space is intended to store data that needs to be shared between the users of the same group or project. All the users of the same project share their common /storage/projects space, and it is the responsibility of each project manager to determine and coordinate the best use of this space, and how it is distributed or shared between its users.
  • /storage/scratch: Each LaPalma user has a directory under /storage/scratch; you must use this space to store temporary files of your jobs during their execution.

The previous three locations share the same quota, which limits the amount of data that each group can save. Since /storage/home, /storage/projects and /storage/scratch are in the same filesystem, the assigned quota is the sum of the "Disk Projects" and "Disk Scratch" amounts established by the access committee.

The quota and the current usage of space can be checked with the lfs quota command:

  usertest@login1:~> lfs quota -hg <GROUP> /storage

For example, if your group has been granted the following resources: Disk Projects: 1000 GB and Disk Scratch: 500 GB, the command will report the sum of the two values:

  usertest@login1:~> lfs quota -hg usergroup /storage
  Disk quotas for grp usergroup (gid 123):
  Filesystem   used     quota        limit        grace  files   quota   limit    grace
  /storage/    500G     1.5T         1.5T         -      700     100000  100000   -

The number of files is limited as well. By default the file quota is set to 100000 files.

If you need more disk space or a higher file limit, the person responsible for your project must submit a request, specifying the extra space required and the reasons why it is needed. The request can be sent by email or through any other contact channel to the user support team.

/storage/apps: This location holds the applications and libraries that have already been installed on LaPalma. Take a look at its directories or to XXXXXX to see the applications available for general use. Before installing any application needed by your project, first check whether it is already installed on the system. If an application you need is not on the system, ask our user support team to install it. If it is a general application with no restrictions on its use, it will be installed in a public directory under /storage/apps, so that all users on LaPalma can make use of it. If the application needs some type of license and its use must be restricted, a private directory under /storage/apps will be created, so that only the authorized LaPalma users can make use of it.

All applications installed on /storage/apps will be installed, controlled and supervised by the user support team. This does not mean that users cannot help with this task; both can work together to get the best result. The user support team can contribute its wide experience in compiling and optimizing applications on the LaPalma platform, and the users can contribute their knowledge of the application to be installed. General applications that have been modified in some way from their normal behaviour by project users for their own study, and which may therefore not be suitable for general use, must be installed under /storage/projects or /storage/home (depending on the usage scope of the application), but not under /storage/apps.

Local Hard Drive

Every node has a local hard drive that can be used as local scratch space to store temporary files during the execution of your jobs. This space is mounted on the /scratch directory. The amount of space in the /scratch filesystem varies from node to node (depending on the total amount of disk space available). Data stored on these local hard drives at the compute nodes is not accessible from the login nodes. Local hard drive data is not removed automatically, so each job should remove its data when it finishes.
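A typical pattern is sketched below: work in a job-private directory under /scratch, copy back only the results you want to keep, and clean up before exiting. The application name `myapp` and the destination path are hypothetical placeholders:

```shell
# Job-script fragment using the node-local /scratch disk (illustrative names).
SCRATCHDIR=/scratch/$USER/job_$$      # private directory for this job
mkdir -p "$SCRATCHDIR"
myapp --workdir "$SCRATCHDIR"         # hypothetical app writing temp files locally
cp "$SCRATCHDIR"/results.dat /storage/projects/mygroup/   # keep only what matters
rm -rf "$SCRATCHDIR"                  # local data is NOT cleaned up automatically
```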

Acknowledgments

Please add the following text to your publications if they are based on results obtained on LaPalma:

  The author thankfully acknowledges the technical expertise and assistance provided by the 
  Spanish Supercomputing Network (Red Española de Supercomputación), as well as the computer 
  resources used: the LaPalma Supercomputer, located at the Instituto de Astrofísica de Canarias.

Where can I get support if I have any issue?

  • Most Unix commands and tools are documented in manual pages. To read the manual page of a command, type:

    usertest@login1:~> man command-name

which displays information about that command on the standard output. If you don't know the exact name of the command you want but you know the subject matter, you can use the -k flag. For example:

    usertest@login1:~> man -k compiler

This will print out a list of all commands whose man-page description includes the word 'compiler'. Then you can execute the exact man command line for the command you were looking for. To learn more about the man command itself, you can also type:

    usertest@login1:~> man man

  • If you need help or further information, please contact us by sending an email to res_support@iac.es