F-FADE: Frequency Factorization for Anomaly Detection in Edge Streams

F-FADE is a new approach for detection of anomalies in edge streams, which uses a novel frequency-factorization technique to efficiently model the time-evolving distributions of frequencies of interactions between node-pairs. The anomalies are then determined based on the likelihood of the observed frequency of each incoming interaction. F-FADE is able to handle in an online streaming setting with a broad variety of anomalies with temporal and structural changes, while requiring only constant memory. Our experiments on one synthetic and six real-world dynamic networks show that F-FADE achieves state of the art performance and may detect anomalies that previous methods are unable to find.

Motivation

By definition, anomalies are caused by rare events and it might be impossible to find sufficient number of training labels for supervised methods. We thus focus here on unsupervised anomaly detection approaches. Most current approaches are snapshot-based where edge streams are aggregated into network snapshots over time. These approaches can detect anomalies only after an entire snapshot has been collected, which can introduce a significant and often prohibitive time lag. Instead, we want anomalies in edge streams to be reported in an online, streaming fashion as soon as anomalous interactions happen. However, this streaming approach to anomaly detection in edge streams introduces two major challenges.

First, the approach must be able to handle temporal and structural changes simultaneously. We illustrate this point by showing different patterns of anomalies in dynamic networks in Fig. 1. In case of pattern (iii), where an external node u starts interacting with a node from a different group (from a yellow node d to a red node g), we need to know that the nodes belong to different groups (a structural change) as well as the time between the interactions (a temporal change). The interaction is more likely anomalous, if the time is short due to the rapid switch to another group. Similarly, we need both temporal and structural changes in case of pattern (v), where $u$ interacts with many nodes from the same group. All pair-wise interactions between u and other individual nodes can be regular, yet when viewed together, these interactions can show an anomalous increase in the group interaction frequency.

Second, the approach should be time and memory efficient as a large number of interactions can take place in a short time, which significantly constrains the available time for analysis of each edge. This constraint is especially limiting in the case of streaming approaches that aim to digest both temporal and structural changes simultaneously and require significant time and space resources to compute. For example, an aggregation of timestamps for estimating time-evolving distributions of interactions must keep track of the network structural information, but a straightforward approach leads to an increasingly large memory cost as interactions between new pairs of nodes are encountered. Although snapshot-based approaches, by utilizing substantially more time and space, are able to analyze the entanglement of structural and temporal information, it remains an open problem of how to do that in stream-based approaches.


In summary, we want an efficient, unsupervised method for detecting anomalies in edge streams, where the method works in a streaming manner and is able to take advantage of both temporal and structural information in order to detect a wide range of anomalous interaction patterns.

The formal definition of F-FADE and relevant theoretical analysis and empirical evaluation can be found in our paper.

Code

A reference implementation of F-FADE in Python is available on GitHub.

Datasets

The datasets used by F-FADE project are included in the code repository.

Contributors

The following people contributed to F-FADE project:
Yen-Yu Chang
Pan Li
Rok Sosic
M. H. Afifi
Marco Schweighauser
Jure Leskovec

References