Performing Calculations in a Linux Environment

Login

  Currently, two machines are available in the computer center of the MLU theoretical physics department for interactive work and job submission: cluster1 and cluster3. You can connect to these machines using the SSH protocol. For this purpose an SSH client, such as PuTTY or SSH Secure Shell, must be installed on your PC. To work with graphical applications such as emacs, xmgrace, etc., you also need an X11 server, such as Xming, running on your computer. Do not forget to enable X11 forwarding in the SSH client.

Detailed information about the CPU and memory configuration of these machines can be found in the files /proc/cpuinfo and /proc/meminfo.
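The two files can be read with the usual text tools. The following is a minimal sketch, assuming a Linux machine where /proc/cpuinfo lists one "processor" line per logical CPU and /proc/meminfo contains a MemTotal line; the variable names are chosen only for this illustration.

```shell
# Count logical CPUs: one line starting with "processor" per CPU (assumed layout).
ncpu=$(grep -c '^processor' /proc/cpuinfo)

# Total memory in kB, taken from the second field of the MemTotal line.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)

echo "CPUs: $ncpu, memory: $mem_kb kB"
```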

Editing files

  There are two basic editors in the Linux environment, vi and emacs. The vi editor opens in the terminal window and does not require X11. At the same time, its capabilities are rather limited. Emacs is one of the main tools for writing programs. It has editing modes for most programming languages, including make, LaTeX and shell scripts, and supports syntax highlighting. Many of its default shortcuts also work in the bash shell.

How to avoid programming

  Very often we need to prepare our data for plotting. Basic operations such as rescaling, algebraic operations on a set of numbers, extraction of columns from multicolumn files, etc., can be performed without writing a Fortran or C program for each specific task. The awk programming language does this very efficiently. A version of awk is a standard feature of nearly every modern Unix-like operating system. Below we consider two examples.

Example 1: Absolute value
Assume that the file xy.dat contains three columns: the energies and the corresponding real and imaginary parts of some function. We would like to create a file (z.dat) with the energies and the absolute values of this function. This can be done as follows:
awk '{print $1, sqrt($2^2+$3^2)}' xy.dat >z.dat
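To try the command without real data, one can generate a small input file first. The sketch below works in a temporary directory, and the numbers in xy.dat are made up for illustration only.

```shell
# Work in a scratch directory so that no existing files are overwritten.
cd "$(mktemp -d)"

# Sample input (made-up values): energy, real part, imaginary part.
cat > xy.dat <<'EOF'
0.0 3.0 4.0
0.5 0.0 -2.0
1.0 6.0 8.0
EOF

# Column 1 is copied unchanged; columns 2 and 3 are combined into the absolute value.
awk '{print $1, sqrt($2^2+$3^2)}' xy.dat > z.dat

cat z.dat
# prints:
# 0.0 5
# 0.5 2
# 1.0 10
```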

Example 2: Data extraction

Assume we have a large output file (out.dat) that, among other things, contains lines starting with the keyword Energy followed by a numerical value. We would like to extract these values:

awk '/Energy/{print $2}' out.dat
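Note that the pattern /Energy/ matches the keyword anywhere in a line. The sketch below uses a made-up out.dat to show the difference between this pattern and a stricter test on the first field; the file contents and the name energies.dat are assumptions for illustration.

```shell
# Work in a scratch directory so that no existing files are overwritten.
cd "$(mktemp -d)"

# Sample output file (made-up contents).
cat > out.dat <<'EOF'
Iteration 1
Energy -1.250
Iteration 2
Energy -1.375
Total Energy reached
EOF

# Matches "Energy" anywhere in the line, so the last line is selected too
# and its second field (the word "Energy") is printed as well.
awk '/Energy/{print $2}' out.dat

# Comparing the first field selects only lines that start with the keyword.
awk '$1 == "Energy" {print $2}' out.dat > energies.dat
cat energies.dat
```

If the keyword never occurs elsewhere in your output, the shorter pattern form is sufficient.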

Organizing files

  It is good practice to organize the files in your home directory similarly to the Linux filesystem. Keep all your programs and important data in your home directory; it is backed up. Keep intermediate data in /lustre/username; it is not backed up, but it is fast and has plenty of space.

It is preferable to keep all source files of a project in a single directory. This makes it easier to compile and maintain your programs. The proper tool for that is make. Below we explain the syntax of a Makefile adapted to our environment.


Declare possible file extensions that make must understand
.SUFFIXES: .f90 .f .F .c .o

Describe the rules for compilation. The lines below instruct make how a Fortran file (with extension f90) should be compiled, i.e., transformed into an object file.
The -c flag requests that only compilation is performed (no linking).
$< is an automatic make variable that denotes the actual name of the f90 file to be compiled according to the rule (the first dependency of the rule). Note that in an actual Makefile the indented command line must begin with a tab character, not spaces.
.f90.o:
    $(FC) $(FFLAGS) -c $<

Variables that denote names of compilers, paths to the libraries, etc., are set up as follows:
FC=ifort
FFLAGS =-O3
MKLPATH9.0=/opt/cluster/intel/mkl9.0/lib/em64t
LIBS9.0=-L$(MKLPATH9.0) -lmkl_lapack -lmkl_em64t -lmkl
ifort is the name of the Intel Fortran compiler,
-O3 is a parameter that sets the compiler optimization level,
MKLPATH9.0 denotes the path to the MKL mathematical library,
LIBS9.0 contains the flags passed to the linker to tell it which libraries should be used.

It is good programming practice to split a project into several files. In its extreme form this technique means that each function or subroutine resides in a separate file. Although this is a matter of personal taste, one should be aware that this approach has numerous pitfalls. One must not forget to include additional options for the linker in order to perform inter-procedural optimization. Additionally, in Fortran it requires creating an interface block for each program unit. That is why it is advisable to combine data and functions that perform one specific task into a module. The make program must know which files belong to which project. Therefore, a list of object files has to be created in the following form:
OBJ1=mymodule.o test1.o
OBJ2=test2.o
OBJ3=test3.o 
In the last section of the Makefile we must specify the rules for linking:
test1.x: $(OBJ1)
    $(FC) -o $@ $(OBJ1) $(LFLAGS)
test2.x: $(OBJ2)
    $(FC) -o $@ $(OBJ2) $(LFLAGS)
testlib.x: $(OBJ3)
    $(FC) -o $@ $(OBJ3) $(LFLAGS) $(LIBS9.0)

Each rule takes two lines. The first line lists the dependencies. For example, test1.x can only be built once all $(OBJ1) files are available. The second line describes how the linking should actually be performed. $@ is another automatic make variable, denoting the target of the rule. LFLAGS is not set in the variable section above; make expands an undefined variable to an empty string, so define it only if you need extra linker flags. To link the testlib.x program one also needs to tell the linker which libraries to use and where to find them.
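Before compiling anything, it can be useful to check what make would do. The sketch below writes a reduced version of the Makefile (only the test1.x project) into a temporary directory and runs make with the -n option, which prints the commands without executing them, so ifort itself is not needed. It assumes that make is installed; the empty source files are placeholders created only for this test.

```shell
# Work in a scratch directory with empty placeholder source files.
cd "$(mktemp -d)"
touch mymodule.f90 test1.f90

# printf is used so that each command line starts with a real tab (\t).
{
  printf '.SUFFIXES: .f90 .o\n'
  printf 'FC=ifort\n'
  printf 'FFLAGS=-O3\n'
  printf 'OBJ1=mymodule.o test1.o\n'
  printf '.f90.o:\n'
  printf '\t$(FC) $(FFLAGS) -c $<\n'
  printf 'test1.x: $(OBJ1)\n'
  printf '\t$(FC) -o $@ $(OBJ1) $(LFLAGS)\n'
} > Makefile

# Dry run: show the compile and link commands without running them.
out=$(make -n test1.x)
echo "$out"
```

The dry run should list one compile command per object file followed by the link command. Once the output looks right, run make test1.x without -n to build the program.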

Fortran and C compilers

 Our computer cluster comprises machines of different architectures. The full configuration of each node in the cluster can be obtained with the command

qhost
or, for more detail,
qstat -F
If you are overwhelmed with information, use grep or egrep to filter out only the essential parts. For example, the following command shows the node name, architecture, number of processors and total memory:
qstat -F | egrep "@node|arch|num_proc|mem_total"
Despite the variety of species inhabiting this zoo, there are just a few essential distinctions: 32 vs. 64 bit architecture and AMD vs. Intel. The Intel C and Fortran compilers can be used on all machines. However, programs compiled for the 64 bit architecture will not run on 32 bit machines. Different versions of the software are also installed on the cluster; their syntax, however, is almost identical. Therefore, use the native Intel reference manuals for the C and Fortran compilers as well as for the MKL library.

Performance

 There are numerous books and web pages where you can find advice on how to write efficient code. Your work, however, will be most efficient if you do not program at all! This can be achieved by using highly optimized standard libraries. A spectacular illustration can be found on the Intel web page. However, keep in mind that not every function of a vendor library such as MKL will be available on machines of a different architecture. To write programs that are easy to migrate, use only those functions that are contained in free program packages, such as ATLAS for the basic linear algebra subprograms (BLAS), or LAPACK.

Use the 64 bit architecture if your code needs more than 2 GB of memory.
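To find out which kind of machine you are currently logged in to, two standard commands are usually enough. This is a minimal sketch; it assumes that the getconf utility is available, as it is on typical Linux installations.

```shell
# Machine hardware name as reported by the kernel, e.g. x86_64 or i686.
arch=$(uname -m)

# Width of a C long in bits in the current environment (32 or 64).
bits=$(getconf LONG_BIT)

echo "architecture: $arch, word size: $bits bit"
```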

Setting the environment

 To make a large variety of tools available, the Environment Modules package is installed on the cluster. Use
module list
in order to see your current configuration. To see all available configurations use:
module avail
You can use module load, module unload, module purge, etc. to set up the environment that satisfies your needs.
 

Submit a job

 The purpose of the cluster1 and cluster3 machines is interactive work and submission of calculations to the queuing system. Do not perform calculations directly on these machines! To manipulate and observe jobs in the queue, use the q-commands. Typically a job can be submitted as follows:

qsub sge_openmpi64.x
where sge_openmpi64.x is the name of the script to be executed. All system settings can be made in the script.
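A minimal sketch of what such a job script might look like is given below, assuming Grid Engine style directives (lines starting with #$). The job name test1 and the body of the script are hypothetical; the actual sge_openmpi64.x script used on the cluster may differ, and in particular the settings for parallel jobs are not shown here.

```shell
#!/bin/sh
# Hypothetical job script. Lines starting with "#$" are read by qsub
# as options; for the shell they are ordinary comments.
#$ -S /bin/sh
#$ -N test1
#$ -cwd
#$ -j y
# -S: shell used to run the job, -N: job name,
# -cwd: start in the directory of submission, -j y: merge stderr into stdout.

# JOB_ID and NSLOTS are set by the queuing system; the defaults after ":-"
# are used when the script is run outside the queue.
msg="Job ${JOB_ID:-unknown} runs on $(uname -n) with ${NSLOTS:-1} slot(s)"
echo "$msg"

# Start the actual calculation here, for example: ./test1.x > test1.out
```

Because the directives are plain comments for the shell, the script can also be run directly for a quick syntax check before it is submitted.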



Graphic programs under Linux (free software)

    blender
    gimp
    gnuplot (2D and 3D plotting)
    xfig
    xmgrace
    xv