Home

Matt Massie

Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters, and it supports clusters of up to 2000 nodes.

Ganglia can scale to handle clusters with thousands of nodes


Project Admins:


Discussion

  • char tao

    char tao - 2013-03-08

    cygwin 1.7 + ganglia 3.5

    gmond.c:160: error: parse error before '*' token
    gmond.c:160: warning: type defaults to `int' in declaration of `hosts_mutex'
    gmond.c:160: warning: data definition has no type or storage class
    gmond.c: In function `Ganglia_host_get':
    gmond.c:1029: warning: implicit declaration of function `apr_thread_mutex_create'
    gmond.c:1029: error: `APR_THREAD_MUTEX_DEFAULT' undeclared (first use in this function)
    gmond.c:1029: error: (Each undeclared identifier is reported only once
    gmond.c:1029: error: for each function it appears in.)
    gmond.c:1055: warning: implicit declaration of function `apr_thread_mutex_lock'
    gmond.c:1057: warning: implicit declaration of function `apr_thread_mutex_unlock'
    gmond.c: In function `tcp_listener':
    gmond.c:3056: warning: implicit declaration of function `apr_thread_exit'
    gmond.c: In function `main':
    gmond.c:3174: error: `APR_THREAD_MUTEX_DEFAULT' undeclared (first use in this function)

     

    Last edit: char tao 2013-03-08
  • Ayman

    Ayman - 2013-11-21

    I am using Ganglia to monitor a large infrastructure with more than 300 nodes, but the central machine, which collects data from those nodes through gmetad, has a very high CPU load due to heavy I/O. I tried putting the RRD files on a ramdisk; that helped, but the load is still about 9. Does anyone have a solution for this? Please help, as this is causing me real trouble.

    Thanks

     
  • char tao

    char tao - 2013-11-26

    rrdcached is a daemon that receives updates to existing RRD files,
    accumulates them and, if enough have been received or a defined time has
    passed, writes the updates to the RRD file. A flush command may be used to
    force writing of values to disk, so that graphing facilities and similar
    can work with up-to-date data.
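
To make the accumulate-then-write idea concrete, here is a minimal sketch in Python. It is a toy model, not rrdcached's actual implementation: the `UpdateBuffer` class, its `max_pending` and `max_age` parameters, and the `write_fn` callback are all hypothetical names chosen for illustration.

```python
import time


class UpdateBuffer:
    """Toy model of write coalescing; not rrdcached's real code."""

    def __init__(self, write_fn, max_pending=3, max_age=300.0,
                 clock=time.monotonic):
        self.write_fn = write_fn        # called as write_fn(path, values)
        self.max_pending = max_pending  # flush after this many updates
        self.max_age = max_age          # or after this many seconds
        self.clock = clock
        self.pending = {}               # path -> (first_seen, [values])

    def update(self, path, value):
        """Queue one update; write only when a threshold is reached."""
        first_seen, values = self.pending.setdefault(path, (self.clock(), []))
        values.append(value)
        too_many = len(values) >= self.max_pending
        too_old = self.clock() - first_seen >= self.max_age
        if too_many or too_old:
            self.flush(path)

    def flush(self, path):
        """Force pending updates for one file out, like a flush command."""
        entry = self.pending.pop(path, None)
        if entry is not None:
            self.write_fn(path, entry[1])


# Record each "disk write" instead of touching a real RRD file.
writes = []
buf = UpdateBuffer(lambda path, values: writes.append((path, list(values))))
for v in (1, 2, 3, 4):
    buf.update("load_one.rrd", v)
buf.flush("load_one.rrd")
print(writes)
```

In this sketch four updates to the same file result in two writes instead of four, which is the effect that reduces I/O on a busy gmetad host.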

     
  • Charlotte9

    Charlotte9 - 2015-08-05

    Great software, really couldn't ask more from it

     
  • jigli

    jigli - 2016-04-01

    Hello:
    I am using ganglia-3.7.1 on AIX 7.1 on IBM POWER6. When I run configure, I get these error messages:
    Checking for confuse
    checking for cfg_parse in -lconfuse... no
    Trying harder including gettext
    checking for cfg_parse in -lconfuse... no
    Trying harder including iconv
    checking for cfg_parse in -lconfuse... no
    libconfuse not found

    But I have installed libconfuse-2.7-1 and libconfuse-devel-2.7-1
    Please help me!

     
    • char tao

      char tao - 2016-04-14

      Try passing --with-libconfuse=/usr/lib to configure, or setting LDFLAGS="-L /usr/lib".

       
  • Jagannath Nagare

    I have to monitor file descriptors (fds) for particular processes. I made the changes below to the Ganglia configuration, but I am getting a blank graph. When I run the same script from the console, I get output.

    I added this file under /usr/lib64/ganglia/pythonmodules/ as procfds.py:

    import os
    import subprocess
    import time

    _refresh_rate = 30  # Refresh rate of the metric data, in seconds

    _conns = {'process_fds': 0}

    def TCP_Connections(name):
        '''Return the number of open file descriptors of the monitored process.'''
        # Read the PID of the monitored process from its PID file.
        with open('/var/run/computenode.pid', 'rt') as pid_file:
            pid = pid_file.readline().strip()
        # Count the entries in /proc/<pid>/fd, one per open descriptor.
        process = subprocess.Popen("ls /proc/" + pid + "/fd | wc -l",
                                   stdout=subprocess.PIPE, shell=True)
        lines = process.communicate()[0].strip()
        _conns['process_fds'] = int(lines)
        return _conns[name]

    # Metric descriptions
    _descriptors = [{
        'name': 'process_fds',
        'call_back': TCP_Connections,
        'time_max': 20,
        'value_type': 'uint',
        'units': '',
        'slope': 'both',
        'format': '%u',
        'description': 'Total number of file descriptors',
        'groups': 'procstat'
    }]

    def metric_init(params):
        '''Initialize the process file descriptor module and return the
        metric definition dictionary object for each metric.'''
        global _refresh_rate

        if 'RefreshRate' in params:
            _refresh_rate = int(params['RefreshRate'])
        # Return the metric descriptions to gmond
        return _descriptors

    def metric_cleanup():
        '''Clean up the metric module.'''
        pass

    if __name__ == '__main__':
        # Console test: poll the metric every 5 seconds until interrupted.
        params = {'RefreshRate': '20'}
        metric_init(params)
        while True:
            try:
                for d in _descriptors:
                    v = d['call_back'](d['name'])
                    print('value for %s is %u' % (d['name'], v))
                time.sleep(5)
            except KeyboardInterrupt:
                os._exit(1)

    Configuration file:

    [root@mtl-nes-qa3-cn1 conf.d]# cat proc_fds.pyconf
    modules {
      module {
        name = 'proc_fds'
        language = 'python'
      }
    }

    collection_group {
      collect_every = 30
      time_threshold = 30

      metric {
        name = "process_fds"
        value_threshold = "256.0"
        title = "process_fds"
      }
    }

    Please let me know why the graph is blank, and also why setting value_threshold does not change its 0.0/0.5 range.
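
When a metric works from the console but not under gmond, it can help to make the callback report failures explicitly instead of relying on a shell pipeline. The sketch below counts descriptors with `os.listdir` and returns `None` when the directory cannot be read, for example because the process is gone or because the calling user lacks permission. The `count_fds` helper and its `proc_root` parameter are hypothetical names for illustration, and nothing in the thread establishes that permissions are the cause of the blank graph here.

```python
import os


def count_fds(pid, proc_root="/proc"):
    """Return the number of entries in <proc_root>/<pid>/fd, or None if unreadable."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    try:
        return len(os.listdir(fd_dir))
    except OSError:
        # Missing process, or not enough permission to read its fd directory.
        return None


if __name__ == "__main__":
    # On Linux this prints the descriptor count of the current process.
    print(count_fds(os.getpid()))
```

Running the same check as the user that gmond runs as would show whether that user can read the target process's fd directory.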

     
  • Juan

    Juan - 2017-07-17

    Hello,
    I'm using ganglia-3.7.2, and a security scan reports the following issue:
    Description:
    Unix-based systems support variable settings to control access to files. World writable files are the least secure. See the chmod(2) man page for more information.

    Rationale:
    Data in world-writable files can be modified and compromised by any user on the system. World writable files may also indicate an incorrectly written script or program that could potentially be the cause of a larger compromise to the system's integrity.

    Remediation:
    Removing write access for the "other" category ( chmod o-w <filename> ) is advisable, but always consult relevant vendor documentation to avoid breaking any application dependencies on a given file.

    Assessment:
    Ensure no world writable files exist

    Script: sce/world_writable_files.sh
    Standard Output:
    World-Writable file /var/log/sdc_alarm.log
    World-Writable file /var/lib/ganglia/dwoo/compiled/templates/default/host_extra.tpl.d17.php
    World-Writable file /var/lib/ganglia/dwoo/compiled/templates/default/cluster_host_metric_graphs.tpl.d17.php
    World-Writable file /var/lib/ganglia/dwoo/compiled/templates/default/host_view.tpl.d17.php
    World-Writable file /var/lib/ganglia/dwoo/compiled/templates/default/cluster_overview.tpl.d17.php
    World-Writable file /var/lib/ganglia/dwoo/compiled/templates/default/footer.tpl.d17.php
    World-Writable file /var/lib/ganglia/dwoo/compiled/templates/default/metric_group_view.tpl.d17.php
    World-Writable file /var/lib/ganglia/dwoo/compiled/templates/default/host_overview.tpl.d17.php
    World-Writable file /var/lib/ganglia/dwoo/compiled/templates/default/cluster_extra.tpl.d17.php
    World-Writable file /var/lib/ganglia/dwoo/compiled/templates/default/cluster_view.tpl.d17.php
    World-Writable file /var/lib/ganglia/dwoo/compiled/templates/default/header.tpl.d17.php
    Standard Error:
    No Standard Error was produced

    I found that these files are generated automatically when a user accesses the gweb GUI. Is there a resolution or recommendation for this security scan finding? Please help. Thanks.
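
The remediation quoted in the report (`chmod o-w <filename>`) can be sketched in Python as follows. This is only an illustration: the helper names `find_world_writable` and `remove_other_write` are hypothetical, the directory is taken from the scan output, and the thread does not confirm that gweb keeps working after the mode change. Since the files are generated when the GUI is accessed, newly compiled templates may reappear with the original mode.

```python
import os
import stat


def find_world_writable(root):
    """Yield regular files under root that have the 'other' write bit set."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            if stat.S_ISREG(mode) and mode & stat.S_IWOTH:
                yield path


def remove_other_write(path):
    """Clear the 'other' write bit, the equivalent of chmod o-w."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~stat.S_IWOTH)


if __name__ == "__main__":
    # Directory from the scan output; this only lists files, it changes nothing.
    for path in find_world_writable("/var/lib/ganglia/dwoo/compiled"):
        print("World-writable file", path)
```

Listing first and calling `remove_other_write` only on reviewed paths keeps the change in line with the report's advice to check application dependencies before tightening permissions.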

     
  • Silviu Chiric

    Silviu Chiric - 2018-02-06

    Dear all
    do you have metrics available in Ganglia for Spark memory variables such as:

    Input, Storage Memory, Shuffle Read and Shuffle Write

    for the active tasks of the driver and executors for each application_xxxxxx

    and for the cluster as a whole, please? I attach the memory settings that we currently get only in a static manner from the Spark Job History, as an example of what we want to have in Ganglia:

    Please help. Thanks.

     
