You are here: Home / LBN / Up2date / Big Data / BastionLinux 19 / morgoth-0.2.0-0.0.1.git368693b.lbn19.x86_64

morgoth-0.2.0-0.0.1.git368693b.lbn19.x86_64

Package Attributes
RPM  morgoth-0.2.0-0.0.1.git368693b.lbn19.x86_64.rpm Architecture  x86_64 Size  6756049 Created  2019/09/30 06:59:42 UTC
Package Specification
Summary Metric anomaly detection
Group Unspecified
License APL 2.0
Home Page http://github.com/nathanielc/morgoth
Description

Morgoth is a framework for flexible anomaly detection algorithms packaged to be used with Kapacitor

Morgoth provides a framework via the for implementing the smaller pieces of an anomaly detection problem.

The basic framework is that Morgoth maintains a dictionary of normal behaviors and compares new windows of data to the normal dictionary. If the new window of data is not found in the dictionary then it is considered anomalous.

Morgoth uses algorithms, called fingerprinters, to compare windows of data to determine if they are similar. The Lossy Counting Algorithm(LCA) is used to maintain the dictionary of normal windows. The LCA is a space efficient algorithm that can account for drift in the normal dictionary, more on LCA below.

Morgoth uses a consensus model where each fingerprinter votes for whether it thinks the current window is anomalous. If the total votes percentage is greater than a consensus threshold then the window is considered anomalous.

Fingerprinters

A fingerprinter is a method that can determine if a window of data is similar to a previous window of data. In effect the fingerprinters take fingerprints of the incoming data and can compare fingerprints of new data to see if they match. These fingerprinting algorithms provide the core of Morgoth as they are the means by which Morgoth determines if a new window of data is new or something already observed.

An example fingerprinting algorithm is a sigma algorithm that computes the mean and standard deviation of a window and store them as the fingerprint for the window. When a new window arrives it compares the fingerprint (mean, stddev) of the new window to the previous window. If the windows are too far apart then they are not considered at match.

By defining several fingerprinting algorithms Morgoth can decide whether new data is anomalous or normal.

Lossy Counting Algorithm

The LCA counts frequent items in a stream of data. It is lossy because to conserve space it will drop less frequent items. The result is that the algorithm will find frequent items but may loose track of less frequent items. More on the specific mathematical properties of the algorithm can be found below.

There are two parameters to the algorithm, error tolerance (e) and minimum support (m). First e is in the range [0, 1] and is an error bound, interpreted as a percentage value. For example given and e = 0.01 (1%), items less the 1% frequent in the data set can be dropped. Decreasing e will require more space but will keep track of less frequent items. Increasing e will require less space but will loose track of less frequent items. Second m is in the range [0, 1] and is a minimum support such that items that are considered frequent have at least m% frequency. For example if m = 0.05 (5%) then if an item has a support less than 5% it is not considered frequent, aka normal. The minimum support becomes the threshold for when items are considered anomalous.

Notice that m > e, this is so that we reduce the number of false positives. For example say we set e = 5% and m = 5%. If a normal behavior X, has a true frequency of 6% than based on variations in the true frequency, X might fall below 5% for a small interval and be dropped. This will cause X's frequency to be underestimated, which will cause it to be flagged as an anomaly, triggering a false positive. By setting e < m we have a buffer to help mitigate creating false positives.

Properties

The Lossy Counting algorithm has three properties:

there are no false negatives, false positives are guaranteed to have a frequency of at least (m - e)N, the frequency of an item can underestimated by at most eN,

where N is the number of items encountered.

The space requirements for the algorithm are at most (1 / e) log(eN). It has also been show that if the item with low frequency are uniformly random than the space requirements are no more than 7 / e. This means that as Morgoth continues to processes windows of data its memory usage will grow as the log of the number of windows and can reach a stable upper bound.

Requires
rpmlib(PayloadFilesHavePrefix)  
rpmlib(FileDigests)  
rpmlib(PayloadIsXz)  
rpmlib(CompressedFileNames)  
Provides
morgoth
morgoth(x86-64)

Document Actions