Batch deployment
Dynpart is currently available in the INDIGO-1 repository (CentOS only) at . The INDIGO-2 release will be available very soon.
Install the LSF side package:
You must have the EPEL repository enabled:
$ yum install epel-release
Then you have to enable the INDIGO-DataCloud package repositories. See the full instructions . Briefly, download the repo file from and place it in your /etc/yum.repos.d folder.
Finally, install the LSF-side dynpart package:
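A minimal sketch, assuming the package is simply named dynpart (use the actual package name published in the INDIGO repository):

$ yum install dynpart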
To update from the INDIGO-1 release, update the Dynpart package:
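For example, again assuming the package name dynpart:

$ yum clean all
$ yum update dynpart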
On the LSF master, installing this package creates and deploys the following directories and files:
IMPORTANT NOTE: create this link according to the variable $LSF_SERVERDIR, which depends on the LSF installation. Check whether the link exists and, if it does not, create it:
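For example, assuming elim.dynp was deployed under /usr/share/dynpart (an illustrative path; adapt it to wherever the package actually installed the script):

$ ls -l $LSF_SERVERDIR/elim.dynp
$ ln -s /usr/share/dynpart/elim.dynp $LSF_SERVERDIR/elim.dynp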
The list of running jobs on each batch host can be obtained in two alternative ways:
Compiling the C program:
The mcjobs_r.c C program queries LSF through its APIs to retrieve the list of running jobs on each host. A pre-compiled binary cannot be distributed due to licensing constraints, so it must be compiled locally. The following is an example compile command for LSF 9.1; please adapt it to your specific setup.
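A sketch of such a compile command, assuming LSF 9.1 installed under /usr/share/lsf (the include and library paths are illustrative and vary with version and platform):

$ gcc -o mcjobs_r mcjobs_r.c \
      -I/usr/share/lsf/9.1/include \
      -L/usr/share/lsf/9.1/linux2.6-glibc2.3-x86_64/lib \
      -lbat -llsf -lm -lnsl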
Python script:
An alternative to compiling mcjobs_r.c is the bjobs_r.py script, which produces the same result. It uses the batch command 'bjobs' to retrieve the number of running jobs on a given host.
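The same count can also be obtained interactively with bjobs itself; for example (an illustrative one-liner, not the bjobs_r.py script):

$ bjobs -u all -r -m <hostname> 2>/dev/null | tail -n +2 | wc -l

Here -u all selects all users, -r restricts the output to running jobs, -m limits it to the given host, and the final pipeline strips the header line and counts the remaining entries.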
In the /usr/share/lsf/conf/lsf.cluster.<clustername> file, check the Host section. There, specify usage of the dynp elim on each WN participating in the dynamic partitioning. The following is an example Host section:
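A sketch of such a Host section, with hypothetical host names (the dynp resource is attached only to the WNs taking part in the partitioning):

Begin Host
HOSTNAME   model   type   server   r1m   mem   swp   RESOURCES
lsf9test   !       !      1        ()    ()    ()    ()
wn001      !       !      1        ()    ()    ()    (dynp)
wn002      !       !      1        ()    ()    ()    (dynp)
End Host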
Define the dynp External Load Index in the Resource section of lsf.shared:
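For example (the interval and description are illustrative; what matters is the resource name dynp):

Begin Resource
RESOURCENAME   TYPE      INTERVAL   INCREASING   DESCRIPTION
dynp           Numeric   60         N            (dynamic partition load index)
End Resource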
Declare use of the custom ESUB method. Add the following in lsf.conf:
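A minimal sketch, using the standard lsf.conf parameter for mandatory esub methods (the value dynp makes LSF run esub.dynp from $LSF_SERVERDIR on every submission):

LSB_ESUB_METHOD="dynp"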
Note: the provided esub.dynp assumes that no other esub method is in place. If one is, you must adapt it to your specific case.
Verify that the LSF configuration is OK using the command:
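For example (lsadmin ckconfig checks the LIM configuration, badmin ckconfig the batch configuration):

[root@lsf9test ~]# lsadmin ckconfig -v
[root@lsf9test ~]# badmin ckconfig -v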
If everything is OK (no errors found), reconfigure and restart the LIM on all nodes in the cluster:
[root@lsf9test ~]# lsadmin reconfig
Checking configuration files ... No errors found.
Restart only the master candidate hosts? [y/n] n
Do you really want to restart LIMs on all hosts? [y/n] y
Restart LIM on ...... done
Restart LIM on ...... done
Restart LIM on ...... done
Restart LIM on ...... done
Note: you can also restart the LIM manually on only a subset of nodes, if needed. For example, if you configure dynp for additional nodes in lsf.cluster.<clustername> and want to make them partition aware, you can restrict limrestart to those nodes only.
Next, restart the Master Batch Daemon
[root@lsf9test ~]# badmin mbdrestart
Shortly afterwards, the output of the lsload -l command will display the new External Load Index dynp after the header line.
After some time (limrestart takes several minutes to take effect, even on a small cluster) the value 1 should be reported by each node configured for dynp; other cluster members will display a dash.
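An illustrative lsload -l output with hypothetical host names (most of the standard columns are omitted for brevity):

[root@lsf9test ~]# lsload -l
HOST_NAME   status   ...   dynp
wn001       ok       ...   1.0
wn002       ok       ...   1.0
lsf9test    ok       ...   -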
elim.dynp
This is a custom External Load Information Manager, specific to LSF, created to enable the implementation of the functionalities described above in conformance with the LSF guidelines. It assumes that the batch system side has been properly configured.
submitter_demo.py
A demo script that keeps submitting jobs to a specified queue.
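Conceptually, such a submitter amounts to something like the following loop (an illustration of the idea only, not the actual submitter_demo.py script; the queue name and job are placeholders):

$ while true; do bsub -q <queue_name> sleep 60; sleep 5; done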
Make sure you have added the INDIGO package repository to your package sources. The package repository can be found at the