| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| taskmanager-0.9.2.tar.gz | 2012-07-11 | 370.0 kB | |
| README | 2012-07-11 | 7.1 kB | |
TASKMANAGER
===========
TaskManager manages calculation jobs in a computer cluster environment.
TaskManager is open source infrastructure software for distributing and managing calculation jobs
in a Unix computer cluster environment. It was designed to control the utilization of a set of
hosts even if you are not the administrator of the system. The hosts are embedded in a Unix
environment, and the users' home directories are mounted on each host. The hosts may have different
numbers of CPUs/cores and different kernels. Keep in mind that users can still log into any host
and run calculations on it directly. However, they should submit calculation jobs to the cluster
through the TaskManager to avoid overloading the hosts. Jobs under the control of the TaskManager
are executed on a host of the cluster with the rights of the respective user, so that the
executing jobs have permission to access the user's files.
The TaskManager package consists of several servers (TaskDispatcher, TaskManagerServer and
InfoServer) and several clients that communicate with these servers. The main server,
TaskDispatcher, is responsible for receiving jobs from users, storing detailed information about
each job, sending jobs to vacant computers in the cluster, and controlling their execution. Each
user starts a TaskManagerServer in the background with their own Unix permissions, so it has the
rights to access user-specific data in the file system. Finally, an InfoServer is started on every
host in the cluster to gather information about that computer. The servers and clients communicate
over Secure Sockets Layer (SSL) connections authenticated with certificates, which are generated by
you or the TaskManager admin.
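To illustrate what "sending jobs to vacant computers" can mean, here is a minimal Python 3 sketch.
It is not the TaskDispatcher implementation: the Host class, the dispatch function, the rule of one
job per core and the choice of the host with the most free cores are all assumptions made for this
example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical description of one cluster host."""
    name: str
    cores: int
    running: list = field(default_factory=list)

    def vacant_slots(self):
        # Assumption: one job occupies exactly one core.
        return self.cores - len(self.running)


def dispatch(pending, hosts):
    """Move queued jobs onto hosts with free cores.

    Returns a list of (job, host name) pairs. Jobs that cannot be
    placed stay in the pending queue.
    """
    assignments = []
    while pending:
        # Assumption: prefer the host with the most free cores.
        host = max(hosts, key=Host.vacant_slots, default=None)
        if host is None or host.vacant_slots() <= 0:
            break  # no vacant host left; remaining jobs keep waiting
        job = pending.popleft()
        host.running.append(job)
        assignments.append((job, host.name))
    return assignments
```

With a two-core and a one-core host, three of four queued jobs would be placed and the fourth would
remain pending until a core becomes free.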
See https://sourceforge.net/apps/mediawiki/tmpackage/index.php?title=Main_Page for more information.
System requirements
===================
- Unix system with a Python installation
- One or more computers with one or more cores
- Passwordless ssh login to each computer
- Access to the home file system over ssh
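The passwordless ssh requirement can be checked before installation. The following Python 3 sketch
is not part of the TaskManager package; the function names are made up for this example. It relies
on the OpenSSH options BatchMode and ConnectTimeout, so that ssh fails instead of prompting for a
password.

```python
import subprocess


def ssh_check_command(host):
    """Build an ssh command line that never prompts for a password."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]


def can_login_without_password(host):
    """Return True if 'ssh host true' succeeds without any user interaction."""
    try:
        result = subprocess.run(
            ssh_check_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh is not installed or the host did not answer in time.
        return False
    return result.returncode == 0
```

Calling can_login_without_password for every host you plan to list in the cluster configuration
shows which hosts still need an ssh key set up.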
In addition to the servers and clients, the TaskManager package contains a Zope product that
provides a web interface. Using it requires a Zope web server.
Installation & Configuration
============================
To install the TaskManager, follow these configuration steps:
1. Copy the entire TaskManager directory to a directory of your choice, e.g., /usr/local/opt
2. (optional) Create frozen versions of several Python programs with freeze. Modify and execute the script
scripts/mkDist.sh
For each program, all necessary Python libraries are copied to a single directory and a binary is
created. The Python libraries then no longer have to be loaded over the intranet, which makes
execution faster. Modify the wrapper scripts in the directory bin/ so that they invoke the compiled
versions.
3. Set the permissions of the wrapper scripts in bin/ so that every user can execute them.
4. Create a certificate for the user taskdispatcher, who represents the TaskDispatcher. Modify and
execute the script
cd scripts; ./createCertificate.sh taskdispatcher <YOUREMAIL>; cd -
5. Create a certificate for each user
cd scripts; ./createCertificate.sh <USER> <USEREMAIL>; cd -
For each user, the following files are created in etc/certs
<USER>.key            private key of the user
<USER>.csr            certificate request file (not required)
<USER>.crt            self-signed certificate of the user
ca_certs.<USER>.crt   certificates of the taskdispatcher and of the user
6. Each user has to create the directory .taskManager in their home directory and copy the
following files to that directory:
<USER>.key
<USER>.crt
ca_certs.<USER>.crt
7. Add all users who are allowed to use the TaskManager, together with their certificate files, to
etc/users
8. Create the file that contains the certificates of all users
cd scripts; ./createAuthorizedCertsFile.sh; cd -
9. Assign users to groups in
etc/groups
10. Configure the computer cluster by providing information about each computer in the cluster in
etc/ComputerCluster.config
11. Start the TaskDispatcher
cd Server
python TaskDispatcher.py -e ../var/TaskDispatcherError.log -p 101010
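Step 6 can be scripted for each user. The Python 3 sketch below is not shipped with the package;
the function names are hypothetical, and the restrictive permissions on the directory and the
private key are an assumption of this example rather than a documented requirement.

```python
import shutil
from pathlib import Path


def user_cert_files(user):
    """File names listed in step 6 for the given user."""
    return [f"{user}.key", f"{user}.crt", f"ca_certs.{user}.crt"]


def install_user_certs(user, certs_dir, home_dir):
    """Copy the user's certificate files from certs_dir to <home_dir>/.taskManager."""
    target = Path(home_dir) / ".taskManager"
    # Assumption: only the owner needs access to this directory.
    target.mkdir(mode=0o700, exist_ok=True)
    copied = []
    for name in user_cert_files(user):
        dest = target / name
        shutil.copy2(Path(certs_dir) / name, dest)
        if name.endswith(".key"):
            # Assumption: keep the private key readable by the owner only.
            dest.chmod(0o600)
        copied.append(dest)
    return copied
```

For example, install_user_certs("alice", "etc/certs", Path.home()) would be run by the user alice
from the TaskManager directory after step 5 has created her files.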
Several Commands
================
Get help
bin/hRunJob -h
Get the status of the TaskManagerServer and the TaskDispatcher
bin/hRunJob -s
Connect directly to the TaskDispatcher and get help
bin/hSend localhost 101010 "help"
Activate a computer (requires the permissions given in etc/groups)
bin/hSend localhost 101010 "activatehost:localhost"
Send a job to the cluster
bin/hRunJob "sleep 10"
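The commands above can also be driven from a script. The following Python 3 sketch only builds and
runs the same command lines shown in this section; the helper names and the bin_dir parameter are
assumptions of this example, and no options beyond those listed above are used.

```python
import subprocess


def hsend_command(host, port, message, bin_dir="bin"):
    """Command line for sending a message directly to the TaskDispatcher."""
    return [f"{bin_dir}/hSend", host, str(port), message]


def hrunjob_command(job, bin_dir="bin"):
    """Command line for submitting a job; the job is passed as one argument."""
    return [f"{bin_dir}/hRunJob", job]


def run(argv):
    """Run a command and return its completed process with captured text output."""
    return subprocess.run(argv, capture_output=True, text=True)
```

For example, run(hrunjob_command("sleep 10")) corresponds to the last command above, and the
returned object exposes returncode, stdout and stderr for further checks.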
Files and directories
=====================
|-- Client
|   |-- hListen.py
|   |-- hRun.py
|   |-- hRunJob.py
|   `-- hSend.py
|-- README
|-- Server
|   |-- InfoServer.py
|   |-- TaskDispatcher.py
|   `-- daemon.py
|-- UserServer
|   |-- TMS.py
|   `-- daemon.py
|-- ZopeProduct
|   `-- ZPTaskManager
|       |-- ZPTaskManagerSite.py
|       |-- __init__.py
|       |-- css ... directory with css files
|       |-- js ... directory with js files
|       |-- lib
|       |   `-- hSocket.py
|       |-- pics ... directory with pic files
|       |-- refresh.txt
|       `-- zpt
|           |-- AddComputer.zpt
|           |-- ChangeComputer.zpt
|           |-- ChangeTaskDispatcher.zpt
|           |-- ChangeTaskDispatcherProperties.zpt
|           |-- HostInfoSite.zpt
|           |-- JobInformation.zpt
|           |-- ProcSite.zpt
|           |-- ShowJobs.zpt
|           |-- ShowPendingJobs.zpt
|           |-- TaskDispatcherLog.zpt
|           |-- TaskDispatcherOutput.zpt
|           |-- TopSite.zpt
|           |-- ZPTaskManager.zpt
|           |-- ZPTaskManagerAdmin.zpt
|           |-- ZPTaskManagerFinishedJobs.zpt
|           `-- addZPTaskManagerSiteForm.zpt
|-- bin
|   |-- TMS
|   |-- hRun
|   |-- hRunJob
|   `-- hSend
|-- etc
|   |-- ComputerCluster.config
|   |-- TaskDispatcher.info
|   |-- certs
|   |-- groups
|   `-- users
|-- lib
|   |-- hSocket.py
|   `-- hTMSConnection.py
|-- scripts
|   |-- createAuthorizedCertsFile.sh
|   |-- createCertificate.sh
|   |-- createCertificates.txt
|   |-- mkDist.sh
|   `-- startTaskDispatcher.csh
`-- var
Changelog
=========
Since 0.9.2
-----------
bugfix: added missing files
bugfix: check for vacant hosts in TaskDispatcher
usability: modified Zope admin page (dynamic visualization of loads of each host)
system: added option in TaskDispatcher/hRunJob for each user to independently exclude hosts from the cluster temporarily
system: extracted TMSConnection class as library
system: improved handling of jobs where ssh connection has failed
system: improved output of TaskDispatcher in terminal
system: switched to ssl socket communication in Zope product
system: general code cleanup
Since 0.9.1
-----------
bugfix: error handling
bugfix: correct termination of TMS
system: added wrapper scripts for executables
system: added several new commands to TaskDispatcher
system: general code cleanup
Since 0.9
---------
initial import