This project is a Zenoss extension (ZenPack) that allows for monitoring of OpenStack. This means that you can monitor the flavors, images and servers from a user or consumer perspective. OpenStack Compute v1.1 (Cactus) is known to be supported. Specifically this means that Rackspace's CloudServers can be monitored. In the future it is likely that support for monitoring OpenStack Storage (Swift) will be added.

OpenStack is a global collaboration of developers and cloud computing technologists producing the ubiquitous open source cloud computing platform for public and private clouds. The project aims to deliver solutions for all types of clouds by being simple to implement, massively scalable, and feature rich. The technology consists of a series of interrelated projects delivering various components for a cloud infrastructure solution.

Once the OpenStack ZenPack is installed you can begin monitoring by going to the infrastructure screen and clicking the normal button for adding devices. You'll find a new option labeled "Add OpenStack." Choose that option and you'll be presented with a dialog asking for the following inputs.

1. Username - Same username used to log in to the OpenStack web interface
2. API Key - Can be found by going to "Your Account/API Access"
3. Project ID - This can be left blank if you don't know what it is
4. Auth URL - For Rackspace this would be https://auth.api.rackspacecloud.com/v1.0
5. Region Name - This can be left blank if you don't know what it is

Once you click Add, Zenoss will contact the OpenStack API and discover servers, images and flavors. Once it is complete you'll find a new device in the OpenStack device class with the same name as the hostname or IP you entered into the dialog. Click into this new device to see everything that was discovered.

The following types of elements are discovered.

* Servers
* Images
* Flavors

The following metrics are collected.

* Total Servers and Servers by State
  o States: Active, Build, Rebuild, Suspended, Queue Resize, Prep Resize, Resize, Verify Resize, Password, Rescue, Reboot, Hard Reboot, Delete IP, Unknown, Other
* Total Images and Images by State
  o States: Active, Saving, Preparing, Queued, Failed, Unknown, Other
* Total Flavors

Status monitoring is performed on servers and images with the following mapping of state to Zenoss event severity.

Servers State to Severity Mapping:

* Reboot, Hard Reboot, Build, Rebuild, Rescue, Unknown == Critical
* Resize == Error
* Prep Resize, Delete IP == Warning
* Suspended, Queue Resize, Verify Resize, Password == Info
* Active == Clear

Images State to Severity Mapping:

* Failed, Unknown == Critical
* Queued, Saving, Preparing == Info
* Active == Clear

If you are also using Zenoss to monitor the guest operating system running within the server, Zenoss will present the graphs for that operating system when the graphs option is chosen for the OpenStack server.
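
The state-to-severity mappings listed above can be read as simple lookup tables. The following is a minimal sketch in Python that only restates those mappings; the names SERVER_SEVERITY, IMAGE_SEVERITY and severity_for are hypothetical and are not taken from the ZenPack itself. States that the text does not map (such as Other) return None here, which is a choice made for this sketch.

```python
# Lookup tables restating the documented state-to-severity mappings.
# All names in this block are illustrative, not ZenPack identifiers.
SERVER_SEVERITY = {
    "Reboot": "Critical",
    "Hard Reboot": "Critical",
    "Build": "Critical",
    "Rebuild": "Critical",
    "Rescue": "Critical",
    "Unknown": "Critical",
    "Resize": "Error",
    "Prep Resize": "Warning",
    "Delete IP": "Warning",
    "Suspended": "Info",
    "Queue Resize": "Info",
    "Verify Resize": "Info",
    "Password": "Info",
    "Active": "Clear",
}

IMAGE_SEVERITY = {
    "Failed": "Critical",
    "Unknown": "Critical",
    "Queued": "Info",
    "Saving": "Info",
    "Preparing": "Info",
    "Active": "Clear",
}


def severity_for(kind, state):
    """Return the severity for a server or image state, or None if unmapped."""
    table = {"server": SERVER_SEVERITY, "image": IMAGE_SEVERITY}[kind]
    return table.get(state)


print(severity_for("server", "Resize"))
print(severity_for("image", "Saving"))
```

A table like this makes it easy to check which states would page someone (Critical, Error) and which are informational only.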
This ZenPack allows for monitoring of OpenStack from a service provider perspective. This means that in addition to the user-oriented components supported in the regular OpenStack ZenPack (instances, flavors, images), the underlying OpenStack servers and software are monitored.

Once the OpenStack ZenPack is installed you can begin monitoring by going to the infrastructure screen and clicking the normal button for adding devices. You'll find a new option labeled "Add OpenStack Endpoint (Infrastructure)." Choose that option and you'll be presented with a dialog asking for the following inputs.

* Device To Create - Name to use for this device in Zenoss. This should not be an actual hostname, since that name will be used when the host is registered as a Linux device.
* Auth URL - A Keystone URL, such as http://<hostname>:5000/v2.0/
* Username, Password / API Key, Project/Tenant ID - *Administrative* credentials for your OpenStack environment.
* Region Name - Choose the correct region from the dropdown. You may only choose one, so each region you wish to manage must be registered as a separate endpoint in Zenoss.
* Ceilometer URL - Will auto-populate based on the other selections.

Once you click Add, Zenoss will contact the OpenStack API and discover servers, images and flavors. Once it is complete you'll find a new device in the OpenStack device class with the same name as the hostname or IP you entered into the dialog. Click into this new device to see everything that was discovered.

The following types of elements are discovered.

* Tenants
* Instances (Servers)
* vNICs
* Images
* Flavors
* Nova API Endpoints
* Regions
* Availability Zones
* Hosts
* Nova Services (processes supporting nova servers)
* Hypervisors

The following component level metrics are collected.

* Instances
  o CPU Utilization (percent)
  o Disk Requests (requests/sec)
  o Disk IO Rate (bytes/sec)
* vNICs
  o Network Packet Rate (packets/sec)
  o Network Throughput (bytes/sec)
* Hosts (Zenoss Linux OS monitoring)
  o Load Average (processes)
  o CPU Utilization (percent)
  o Free Memory (bytes)
  o Free Swap (bytes)
  o IO (sectors/sec)
* Nova Services (Zenoss Process monitoring)
  o CPU Utilization (percent)
  o Memory Utilization (bytes)
  o Process Count (processes)

The following device level metrics are collected.

* Flavors
  o Total (count)
* Images
  o Total (count)
  o Total count per image state (count)
* Servers
  o Total (count)
  o Total count per server state (count)
* Queues
  o Event (count)
  o Performance (count)
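
Because the dialog accepts only one region per endpoint, a multi-region deployment needs one endpoint device per region. The sketch below shows one way to plan those entries in Python before entering them in the dialog. It is only an illustration: EndpointSpec, endpoints_for_regions, the example URL, the credentials and the region names are all hypothetical, and nothing here calls a Zenoss or OpenStack API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointSpec:
    """Hypothetical container for the dialog inputs; not a ZenPack class."""
    device_name: str
    auth_url: str
    username: str
    api_key: str
    project_id: str
    region_name: str


def endpoints_for_regions(prefix, auth_url, username, api_key, project_id, regions):
    """Return one EndpointSpec per region, since each endpoint covers one region."""
    specs = []
    for region in regions:
        specs.append(EndpointSpec(
            # A label rather than a real hostname, as the dialog recommends.
            device_name=f"{prefix}-{region}",
            auth_url=auth_url,
            username=username,
            api_key=api_key,
            project_id=project_id,
            region_name=region,
        ))
    return specs


# Example values are placeholders, not real endpoints or credentials.
specs = endpoints_for_regions(
    "openstack",
    "http://keystone.example.com:5000/v2.0/",
    "admin",
    "secret",
    "admin",
    ["RegionOne", "RegionTwo"],
)
for spec in specs:
    print(spec.device_name, spec.region_name)
```

Deriving the device name from a prefix and the region keeps the names unique and avoids reusing a real hostname.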
All monitoring is performed through the optional swift-recon API endpoint that can be enabled on all of your Swift object servers. Before using this ZenPack you must install and configure swift-recon on your Swift object servers.

Usage

Installing the ZenPack will add the following objects to your Zenoss system.

Configuration Properties

* zSwiftObjectServerPort: Listening port of swift-object-server. Defaults to 6000.

Monitoring Templates

* SwiftObjectServer in /Devices

Process Classes

* OpenStack/Swift
  o swift-account-auditor
  o swift-account-reaper
  o swift-account-replicator
  o swift-account-server
  o swift-container-auditor
  o swift-container-replicator
  o swift-container-server
  o swift-container-sync
  o swift-container-updater
  o swift-object-auditor
  o swift-object-replicator
  o swift-object-server
  o swift-object-updater
  o swift-proxy-server

Event Classes

* /Status/Swift
* /Perf/Swift

The zSwiftObjectServerPort property is used by the SwiftObjectServer monitoring template to control which port it will attempt to find the recon API on. Normally the default of 6000/tcp will work unless you've chosen a different port for your swift-object-server process.

By default the SwiftObjectServer monitoring template will not be bound to any devices. To make use of it you will need to either bind it directly to your Swift object server devices, or put your object servers into their own device class and bind the template to that device class. Typically this will be under either /Server/Linux or /Server/SSH/Linux so you get normal operating system monitoring in addition to the Swift-specific monitoring.

Swift Metrics

Assuming you have swift-recon and Zenoss set up properly, you can expect to see the following extra graphs on your Swift object servers.

Swift Object Server - Async Pending

Trend of asynchronous pending tasks. When a Swift proxy server updates an object it attempts to synchronously update the object's container with the new object information. There is a three second timeout on this task and if it can't be completed in that time, it will be put into an asynchronous pending bucket to be executed later. By trending and thresholding on how many tasks are pending you can get an early read on cluster performance problems. By default a maximum threshold of 10 is set on this metric and will raise a warning severity event in the /Perf/Swift event class when it is breached.

Swift Object Server - Disks

Trend of total and unmounted disks on the storage node. Swift's mechanism for detecting failing or failed drives and taking them offline is to unmount them. By proactively monitoring for unmounted disks and replacing them you can keep your cluster healthy. By default a maximum threshold of 0 is set on unmounted disks and will raise a warning severity event in the /Status/Swift event class.

Swift Object Server - Quarantine

Trend of accounts, containers and objects that have been quarantined. Swift has an auditor process that will find corrupt items and move them into a quarantine area so good objects will be replicated back into their place. Sudden increases in quarantined items can indicate hardware problems on storage nodes. Additionally, quarantine is not automatically pruned and can result in some storage nodes filling up their disks at a faster rate than others and running out of space. By default a maximum threshold of 100 is set individually on quarantined accounts, containers and objects. A warning event will be raised in the /Status/Swift event class if it is breached.

Swift Object Server - Replication Time

Trend of replication time. Swift has a replicator process that cycles continually. If a single replication cycle takes more than 30 minutes it can reduce the resiliency of the cluster. By default a maximum threshold of 30 minutes is set on replication time and will raise a warning severity event in the /Perf/Swift event class when breached.

Swift Object Server - Load Averages

Trend of 1, 5 and 15 minute operating system load average. Additionally the 15 minute load average divided by total disks is calculated. A perfectly efficient storage node will run at a load average of 1.0 per disk. By default a maximum threshold of 2.0 is set on 15 minute load average divided by total disks and will raise a warning severity event in the /Perf/Swift event class when breached.

Swift Object Server - Process Churn

Trend of processes created per second. High process churn can indicate a broken process being unnecessarily restarted. By default a maximum threshold of 100 processes per second is set and will raise a warning severity event in the /Perf/Swift event class when breached.

Swift Object Server - Disk Usages

Trend of maximum, average and minimum disk usage for all disks in the storage node. These are the primary storage capacity metrics within a cluster. Depending on the size of each individual disk, weights and the skew of stored object sizes, an entire cluster can exceed capacity if a single disk runs out of capacity. By default a maximum threshold is set on the maximum usage metric. It will raise a warning severity event in the /Status/Swift event class when breached.

Swift Object Server - Disk Sizes

Trend of maximum, average and minimum disk sizes for all disks in the storage node. Ideally all disks in a storage node will be the same size unless weights are closely managed. No default thresholds are set on these metrics.

Swift Object Server - Processes

Trend of total and running processes. No default thresholds are set on these metrics.

Process Monitoring

All Swift processes will be discovered and monitored based on the process classes listed above. If one of the processes is found to not be running on a node where it should be, an error severity event will be raised in the /Status/OSProcess event class. Each of the individual Swift processes will also be monitored for its CPU and memory utilization.
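
The default thresholds described under Swift Metrics can be summarized as a small table of maximum values and event classes. The Python sketch below restates them and shows the load-per-disk calculation (15 minute load average divided by total disks). It is an illustration only: the metric keys and the function names load_per_disk and breached are hypothetical and do not come from the ZenPack or from swift-recon. The disk usage threshold is left out because the text does not state its value.

```python
# Default maximum thresholds as described in the text: (metric key, maximum, event class).
# Metric keys are made up for this sketch.
DEFAULT_THRESHOLDS = [
    ("async_pending", 10, "/Perf/Swift"),
    ("unmounted_disks", 0, "/Status/Swift"),
    ("quarantined_accounts", 100, "/Status/Swift"),
    ("quarantined_containers", 100, "/Status/Swift"),
    ("quarantined_objects", 100, "/Status/Swift"),
    ("replication_minutes", 30, "/Perf/Swift"),
    ("load15_per_disk", 2.0, "/Perf/Swift"),
    ("process_churn_per_sec", 100, "/Perf/Swift"),
]


def load_per_disk(load15, total_disks):
    """Return the 15 minute load average divided by total disks, or None if no disks."""
    if total_disks <= 0:
        return None
    return load15 / total_disks


def breached(metrics):
    """Return (metric key, event class, severity) for each value above its maximum."""
    events = []
    for key, maximum, event_class in DEFAULT_THRESHOLDS:
        value = metrics.get(key)
        if value is not None and value > maximum:
            events.append((key, event_class, "Warning"))
    return events


# Sample values are invented for illustration.
sample = {
    "async_pending": 12,
    "unmounted_disks": 0,
    "replication_minutes": 45,
    "load15_per_disk": load_per_disk(18.0, 12),
}
for event in breached(sample):
    print(event)
```

In this sample, a load average of 18.0 across 12 disks gives 1.5 per disk, which stays under the 2.0 maximum, while the async pending and replication time values both exceed their defaults.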