TASKMANAGER
===========

TaskManager manages calculation jobs in a computer cluster environment

TaskManager is open source infrastructure software for distributing and managing calculation jobs
in a Unix computer cluster environment. It was designed to control the utilization of a set of
hosts even if you are not the administrator of the system. The hosts are embedded in a Unix
environment, and the users' home directories are mounted on each host. The hosts may have different
numbers of CPUs/cores and different kernels. Keep in mind that users can still log into any host
and run calculations on it directly. However, they should submit calculation jobs to the cluster
through the TaskManager to avoid overloading the hosts. Jobs under the control of the TaskManager
are executed on a host of the cluster with the rights of the respective user, which ensures that
the running jobs have permission to access the user's files.

The TaskManager package consists of several servers (TaskDispatcher, TaskManagerServer, and
InfoServer) and several clients that communicate with these servers. The main server,
TaskDispatcher, is responsible for receiving jobs from users, storing detailed information about
each job, sending jobs to vacant computers in the cluster, and controlling their execution. Each
user runs their own TaskManagerServer in the background with their Unix permissions, so it has the
rights to access user-specific data in the file system. Finally, an InfoServer runs on every host
in the cluster to gather information about that computer. The servers and clients communicate over
Secure Sockets Layer (SSL) connections authenticated with certificates, which are generated by you
or the TaskManager admin.
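
As a rough illustration of the dispatching idea described above (send a job to a vacant computer,
given that hosts have different numbers of cores), here is a minimal Python 3 sketch. It is not
taken from the TaskDispatcher source: the Host class, its fields, and pick_vacant_host are
hypothetical names, and the rule "a host is vacant if it is active and has at least one free core"
is an assumption made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    """Hypothetical record for one cluster host (not the real TaskDispatcher data model)."""
    name: str
    cores: int            # number of CPUs/cores on this host
    running_jobs: int     # jobs currently assigned to this host
    active: bool = True   # inactive hosts receive no jobs

    @property
    def free_cores(self) -> int:
        return max(self.cores - self.running_jobs, 0)


def pick_vacant_host(hosts: List[Host]) -> Optional[Host]:
    """Return the active host with the most free cores, or None if every host is busy."""
    candidates = [h for h in hosts if h.active and h.free_cores > 0]
    if not candidates:
        return None  # assumption: the job stays pending until a host becomes vacant
    return max(candidates, key=lambda h: h.free_cores)


if __name__ == "__main__":
    cluster = [
        Host("node1", cores=4, running_jobs=4),                 # fully loaded
        Host("node2", cores=8, running_jobs=5),                 # three free cores
        Host("node3", cores=2, running_jobs=0, active=False),   # deactivated
    ]
    chosen = pick_vacant_host(cluster)
    print(chosen.name if chosen else "no vacant host")
```

Choosing the host with the most free cores is only one possible policy; the real TaskDispatcher
may use a different one.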

See https://sourceforge.net/apps/mediawiki/tmpackage/index.php?title=Main_Page for more info


System requirements
===================

- Unix system with a Python installation
- One or more computers with one or more cores each
- Login to each computer via ssh without typing a password
- Access to the home file system over ssh

In addition to the servers and clients, the TaskManager package contains a Zope product that
provides a web interface. Using it requires a Zope web server.
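
The passwordless ssh requirement can be checked before installation. The following Python 3 sketch
is not part of the TaskManager package; the function names are made up for this example. It relies
on the standard OpenSSH options BatchMode and ConnectTimeout, so that ssh fails instead of prompting
for a password.

```python
import subprocess
from typing import List


def ssh_check_command(host: str, timeout_s: int = 5) -> List[str]:
    """Build an ssh command that fails instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                 # never ask for a password or passphrase
        "-o", f"ConnectTimeout={timeout_s}",   # give up quickly on unreachable hosts
        host,
        "true",                                # trivial remote command
    ]


def can_login_without_password(host: str) -> bool:
    """Return True if `ssh <host> true` succeeds without any interaction."""
    try:
        result = subprocess.run(
            ssh_check_command(host),
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # ssh is not installed or the connection hung
    return result.returncode == 0
```

Calling can_login_without_password once for every host you plan to list in the cluster
configuration shows which hosts still need ssh keys set up.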


Installation & Configuration
============================

To install the TaskManager, follow these configuration steps:

1. Copy the entire TaskManager directory to a directory of your choice, e.g., /usr/local/opt

2. (optional) Create frozen versions of several Python programs with freeze. Modify and execute the script

      scripts/mkDist.sh

For each program, all necessary Python libraries are copied to a single directory and a binary is
created. Python libraries then no longer need to be loaded over the intranet, which makes
execution faster. Modify the wrapper scripts in the directory bin/ so that they invoke the
respective compiled versions.

3. Set the permissions of the wrapper scripts in bin/ so that every user can execute them.

4. Create a certificate for the user taskdispatcher, who represents the TaskDispatcher. Modify and
execute the script

      cd scripts; ./createCertificate.sh taskdispatcher <YOUREMAIL>; cd -

5. Create a certificate for each user

      cd scripts; ./createCertificate.sh <USER> <USEREMAIL>; cd -

For each user the following files are created in etc/certs

      <USER>.key   private key of user
      <USER>.csr   certificate request file (not necessary)
      <USER>.crt   self signed certificate of user
      ca_certs.<USER>.crt   certificates of taskdispatcher and of the user

6. Each user has to create the directory .taskManager in their home directory and copy the
following files to that directory:

      <USER>.key
      <USER>.crt
      ca_certs.<USER>.crt

7. Add all users who are allowed to use the TaskManager, together with their certificate files, to

      etc/users

8. Create the file that includes the certificates of all users

      cd scripts; ./createAuthorizedCertsFile.sh; cd -

9. Assign users to groups in

      etc/groups

10. Configure the computer cluster by providing information about each computer in the cluster in

      etc/ComputerCluster.config

11. Start the TaskDispatcher

      cd Server
      python TaskDispatcher.py -e ../var/TaskDispatcherError.log -p 101010
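
To make step 6 concrete, the following Python 3 sketch checks that the three per-user files are
present in a directory such as ~/.taskManager and shows how such files are typically loaded with
Python's ssl module. It is not taken from the TaskManager clients: the function names are
hypothetical, and the real clients may configure SSL differently.

```python
import os
import ssl
from typing import Dict, List


def user_cert_paths(user: str, directory: str) -> Dict[str, str]:
    """Paths of the three per-user files listed in step 6."""
    return {
        "key": os.path.join(directory, f"{user}.key"),
        "cert": os.path.join(directory, f"{user}.crt"),
        "ca": os.path.join(directory, f"ca_certs.{user}.crt"),
    }


def missing_cert_files(user: str, directory: str) -> List[str]:
    """Return the expected files that do not exist in `directory`."""
    paths = user_cert_paths(user, directory)
    return [p for p in paths.values() if not os.path.isfile(p)]


def make_client_context(user: str, directory: str) -> ssl.SSLContext:
    """Build a client-side SSL context from the user's files.

    Assumption: this only illustrates the usual role of each file;
    it is not the configuration used by the TaskManager clients.
    """
    paths = user_cert_paths(user, directory)
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Assumption: the self-signed certificates do not carry host names.
    context.check_hostname = False
    # Trust the certificates of taskdispatcher and of the user.
    context.load_verify_locations(cafile=paths["ca"])
    # Present the user's own certificate and private key to the server.
    context.load_cert_chain(certfile=paths["cert"], keyfile=paths["key"])
    return context
```

A user could call missing_cert_files with their login name and os.path.expanduser("~/.taskManager")
and only build the context when the returned list is empty.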


Several Commands
================

Get help

      bin/hRunJob -h

Get status of TaskManagerServer and TaskDispatcher

      bin/hRunJob -s

Connect directly to TaskDispatcher and get help

      bin/hSend localhost 101010 "help"

Activate a computer (requires the permissions given in etc/groups)

      bin/hSend localhost 101010 "activatehost:localhost"

Send job to cluster

      bin/hRunJob "sleep 10"
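
Several jobs can be submitted from a script by calling the wrapper repeatedly. The Python 3 sketch
below is not part of the package; hrunjob_command and submit_jobs are hypothetical helpers. It uses
only the invocation shown above (bin/hRunJob followed by the job as one quoted argument). Whether
hRunJob returns immediately or waits for the job, and what its exit code means, is not stated in
this README, so the returned codes should be treated as an assumption.

```python
import subprocess
from typing import List


def hrunjob_command(job: str, hrunjob: str = "bin/hRunJob") -> List[str]:
    """Build the argument list for `bin/hRunJob "<job>"`.

    The job string is passed as a single argument, as in the example above.
    """
    return [hrunjob, job]


def submit_jobs(jobs: List[str], hrunjob: str = "bin/hRunJob") -> List[int]:
    """Submit each job in turn and collect the exit codes of the wrapper."""
    return [subprocess.call(hrunjob_command(job, hrunjob)) for job in jobs]
```

For example, submit_jobs(["sleep 10", "sleep 20"]) run from the TaskManager directory would invoke
bin/hRunJob twice, once per job.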


Files and directories
=====================

|-- Client
|   |-- hListen.py
|   |-- hRun.py
|   |-- hRunJob.py
|   `-- hSend.py
|-- README
|-- Server
|   |-- InfoServer.py
|   |-- TaskDispatcher.py
|   `-- daemon.py
|-- UserServer
|   |-- TMS.py
|   `-- daemon.py
|-- ZopeProduct
|   `-- ZPTaskManager
|       |-- ZPTaskManagerSite.py
|       |-- __init__.py
|       |-- css				... directory with css files
|       |-- js				... directory with js files
|       |-- lib
|       |   `-- hSocket.py
|       |-- pics			... directory with pic files
|       |-- refresh.txt
|       `-- zpt
|           |-- AddComputer.zpt
|           |-- ChangeComputer.zpt
|           |-- ChangeTaskDispatcher.zpt
|           |-- ChangeTaskDispatcherProperties.zpt
|           |-- HostInfoSite.zpt
|           |-- JobInformation.zpt
|           |-- ProcSite.zpt
|           |-- ShowJobs.zpt
|           |-- ShowPendingJobs.zpt
|           |-- TaskDispatcherLog.zpt
|           |-- TaskDispatcherOutput.zpt
|           |-- TopSite.zpt
|           |-- ZPTaskManager.zpt
|           |-- ZPTaskManagerAdmin.zpt
|           |-- ZPTaskManagerFinishedJobs.zpt
|           `-- addZPTaskManagerSiteForm.zpt
|-- bin
|   |-- TMS
|   |-- hRun
|   |-- hRunJob
|   `-- hSend
|-- etc
|   |-- ComputerCluster.config
|   |-- TaskDispatcher.info
|   |-- certs
|   |-- groups
|   `-- users
|-- lib
|   |-- hSocket.py
|   `-- hTMSConnection.py
|-- scripts
|   |-- createAuthorizedCertsFile.sh
|   |-- createCertificate.sh
|   |-- createCertificates.txt
|   |-- mkDist.sh
|   `-- startTaskDispatcher.csh
`-- var



Changelog
=========

Since 0.9.2
-----------
bugfix: added missing files
bugfix: check for vacant hosts in TaskDispatcher
usability: modified Zope admin page (dynamic visualization of loads of each host)
system: added possibility in TaskDispatcher/hRunJob to exclude hosts from cluster temporarily by each user independently
system: extracted TMSConnection class as library
system: improved handling of jobs where ssh connection has failed
system: improved output of TaskDispatcher in terminal
system: switched to ssl socket communication in Zope product
system: general code cleanup

Since 0.9.1
-----------
bugfix: error handling
bugfix: correct termination of TMS
system: added wrapper scripts for executables
system: added several new commands to TaskDispatcher
system: general code cleanup

Since 0.9
---------
initial import
Source: README, updated 2012-07-11