SNAP Library , Developer Reference  2013-01-07 14:03:36
SNAP, a general purpose, high performance system for analysis and manipulation of large networks
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Defines
TLocClust Class Reference

#include <ncp.h>

Collaboration diagram for TLocClust:

List of all members.

Public Member Functions

 TLocClust (const PUNGraph &GraphPt, const double &AlphaVal)
int Len () const
 Returns the support of the approximate random walk, the number of nodes with non-zero PageRank score.
int GetRndWalkSup () const
 Returns the support of the approximate random walk, the number of nodes with non-zero PageRank score.
int GetNId (const int &NodeN) const
 Returns the ID of the NodeN-th node in the sweep vector.
int GetVol (const int &Nodes) const
 Returns the volume of the set of first NodeN nodes in the sweep vector.
int GetCut (const int &Nodes) const
 Returns the size of the cut of the first Nodes nodes in the sweep vector.
double GetPhi (const int &ValId) const
 Returns the conductance of the cut separating the first Nodes nodes in the sweep vector from the rest of the graph.
int BestCut () const
 Index K of the cut of the minimum conductance around the seed node.
int BestCutNodes () const
 Number of nodes inside the 'best' (minimum conductance) cut.
int GetCutEdges () const
 Number of edges in the 'best' (minimum conductance) cut.
int GetCutVol () const
 Volume of the 'best' (minimum conductance) cut.
double GetCutPhi () const
 Conductance of the 'best' (minimum conductance) cut.
int ApproxPageRank (const int &SeedNode, const double &Eps)
 Computes Approximate PageRank from the seed node SeedNId and with tolerance Eps.
void SupportSweep ()
 After the function ApproxPageRank() has been run the SupportSweep() computes the volume, cut size, node ids, conductance vectors.
void FindBestCut (const int &SeedNode, const int &ClustSz, const double &MinSizeFrac=0.2)
 Finds minimum conductance cut in the graph around the seed node.
void PlotVolDistr (const TStr &OutFNm, TStr Desc=TStr()) const
 Plots the cluster volume vs. cluster size K (cluster is composed of nodes NIdV[1...K]).
void PlotCutDistr (const TStr &OutFNm, TStr Desc=TStr()) const
 Plots the cluster cut size vs. cluster size K (cluster is composed of nodes NIdV[1...K]).
void PlotPhiDistr (const TStr &OutFNm, TStr Desc=TStr()) const
 Plots the cluster conductance vs. cluster size K (cluster is composed of nodes NIdV[1...K]).
void SavePajek (const TStr &OutFNm) const
 Saves the network in the Pajek format so it can be visualized. Red node represents the seed and color the cluster membership.

Static Public Member Functions

static void DrawWhiskers (const PUNGraph &Graph, TStr FNmPref, const int &PlotN)
 Draws the 'whiskers' of the graph. Whiskers are small sub-graphs that are attached to the rest of the graph via a single edge.
static void GetCutStat (const PUNGraph &Graph, const TIntV &NIdV, int &Vol, int &Cut, double &Phi, int GraphEdges=-1)
 For a given Graph and a set of nodes NIdV the function returns the Volume, CutSize and the Conductance of the cut.
static void GetCutStat (const PUNGraph &Graph, const TIntSet &NIdSet, int &Vol, int &Cut, double &Phi, int GraphEdges=-1)
 For a given Graph and a set of nodes NIdV the function returns the Volume, CutSize and the Conductance of the cut.
static void PlotNCP (const PUNGraph &Graph, const TStr &FNm, const TStr Desc="", const bool &BagOfWhiskers=true, const bool &RewireNet=false, const int &KMin=10, const int &KMax=Mega(100), const int &Coverage=10, const bool &SaveTxtStat=false, const bool &PlotBoltzman=false)

Static Public Attributes

static bool Verbose = true

Private Member Functions

const TIntVGetNIdV () const
 Vector of node IDs sorted in the random walk order.
const TIntVGetVolV () const
 Volume (sum of the degrees) of first K-nodes in the node-id vector (GetNIdV()).
const TIntVGetCutV () const
 Edges cut to separate the first K-nodes in the node-id vector (GetNIdV()) from the rest of the graph.
const TFltVGetPhiV () const
 Conducatance of the cut separating first K-nodes in the node-id vector (GetNIdV()) from the rest of the graph.

Private Attributes

PUNGraph Graph
int Nodes
int Edges2
TIntFltH ProbH
TIntFltH ResH
TIntQ NodeQ
double Alpha
int SeedNId
TIntV NIdV
TIntV VolV
TIntV CutV
TFltV PhiV
int BestCutIdx

Friends

class TLocClustStat

Detailed Description

Local Spectral Clustering algorithm. The code implements the PageRank Nibble local clustering algorithm of Andersen, Chung and Lang. Given a single starting seed node, the algorithm will then find the clusters around that node. This is achieved by the algorithm finding the approximate personalized PageRank score of every node with respect to the Seed node. Nodes are then ordered by the PageRank score and the idea is then that by 'sweeping' the vector of PageRank scores one can find communities around the chosen seed node. The idea is to try out K = 1...N/2 and then for a set of {node_1 ... node_K} test the value of the conductance (Phi). If the conductance at certain value of K achieves a local minima, then we found a good cut in the graph. This method is also used for computing the Network Community Profile plots. See: Local Graph Partitioning using PageRank Vectors by R. Andersen, F. Chung and K. Lang URL: http://www.math.ucsd.edu/~fan/wp/localpartition.pdf

Definition at line 24 of file ncp.h.


Constructor & Destructor Documentation

TLocClust::TLocClust ( const PUNGraph GraphPt,
const double &  AlphaVal 
) [inline]

Definition at line 48 of file ncp.h.

                                                             :
    Graph(GraphPt), Nodes(GraphPt->GetNodes()), Edges2(2*GraphPt->GetEdges()), Alpha(AlphaVal) { }

Member Function Documentation

int TLocClust::ApproxPageRank ( const int &  SeedNode,
const double &  Eps 
)

Computes Approximate PageRank from the seed node SeedNId and with tolerance Eps.

The algorithm basically sets the PageRank scores of nodes with score <Eps to zero. So the lower the value of Eps the longer the algorithm will run.

Definition at line 9 of file ncp.cpp.

References THash< TKey, TDat, THashFunc >::AddDat(), Alpha, THash< TKey, TDat, THashFunc >::Clr(), TQQueue< TVal >::Clr(), TQQueue< TVal >::Empty(), THash< TKey, TDat, THashFunc >::GetDat(), TUNGraph::GetNI(), TUNGraph::GetNodes(), TUNGraph::TNodeI::GetOutDeg(), TUNGraph::TNodeI::GetOutNId(), TExeTm::GetSecs(), TExeTm::GetStr(), Graph, THash< TKey, TDat, THashFunc >::Len(), TQQueue< TVal >::Len(), Mega, NodeQ, TQQueue< TVal >::Pop(), ProbH, TQQueue< TVal >::Push(), ResH, TQQueue< TVal >::Top(), and TFlt::Val.

Referenced by FindBestCut().

                                                                    {
  for (int i = 0; i < ProbH.Len(); i++) { ProbH[i]=0.0; }
  for (int i = 0; i < ResH.Len(); i++) { ResH[i]=0.0; }
  ProbH.Clr(false, -1, false);
  ResH.Clr(false, -1, false);
  ResH.AddDat(SeedNode, 1.0);
  int iter = 0;
  double OldRes = 0.0;
  NodeQ.Clr(false);
  NodeQ.Push(SeedNode);
  TExeTm ExeTm;
  while (! NodeQ.Empty()) {
    const int NId = NodeQ.Top(); NodeQ.Pop();
    const TUNGraph::TNodeI& Node = Graph->GetNI(NId);
    const int NIdDeg = Node.GetOutDeg();
    const double PushVal = ResH.GetDat(NId) - 0.5*Eps*NIdDeg;
    const double PutVal = (1.0-Alpha) * PushVal / double(NIdDeg);
    ProbH.AddDat(NId) += Alpha*PushVal;
    ResH.AddDat(NId) = 0.5 * Eps * NIdDeg;
    for (int e = 0; e < NIdDeg; e++) {
      const int DstNId = Node.GetOutNId(e);
      const int DstDeg = Graph->GetNI(DstNId).GetOutDeg();
      double& ResVal = ResH.AddDat(DstNId).Val;
      OldRes = ResVal;
      ResVal += PutVal;
      if (ResVal >= Eps*DstDeg && OldRes < Eps*DstDeg) {
        NodeQ.Push(DstNId); }
    }
    iter++;
    if (iter % Mega(1) == 0) { 
      printf(" %d[%s]", NodeQ.Len(), ExeTm.GetStr());
      if (iter/1000 > Graph->GetNodes() || ExeTm.GetSecs() > 4*3600) { // more than 2 hours
        printf("Too many iterations! Stop to save time.\n");
        return iter; }
    }
  }
  // check that the residuals are sufficiently small
  /*for (int i =0; i < ProbH.Len(); i++) {
    const int Deg = Graph->GetNI(ResH.GetKey(i)).GetOutDeg();
    IAssert(ResH[i] < Eps*Deg); } //*/
  return iter;
}

Here is the call graph for this function:

Here is the caller graph for this function:

int TLocClust::BestCut ( ) const [inline]

Index K of the cut of the minimum conductance around the seed node.

This means that the set of GetNId(0)...GetNId(K) forms the best cut around the seed node.

Definition at line 73 of file ncp.h.

References BestCutIdx.

Referenced by GetCutEdges(), GetCutPhi(), GetCutVol(), and SavePajek().

{ return BestCutIdx; }

Here is the caller graph for this function:

int TLocClust::BestCutNodes ( ) const [inline]

Number of nodes inside the 'best' (minimum conductance) cut.

Definition at line 75 of file ncp.h.

References BestCutIdx.

Referenced by TLocClustStat::Run(), and SavePajek().

{ return BestCutIdx+1; }

Here is the caller graph for this function:

void TLocClust::DrawWhiskers ( const PUNGraph Graph,
TStr  FNmPref,
const int &  PlotN = 10 
) [static]

Draws the 'whiskers' of the graph. Whiskers are small sub-graphs that are attached to the rest of the graph via a single edge.

Definition at line 172 of file ncp.cpp.

References THash< TKey, TDat, THashFunc >::AddDat(), TUNGraph::BegNI(), TStr::CStr(), TSnap::DrawGViz(), TUNGraph::EndNI(), TStr::Fmt(), TSnap::Get1CnCom(), TUNGraph::GetEdges(), TUNGraph::GetNI(), TUNGraph::GetNodes(), TSnap::GetSubGraph(), gpsLog10XY, gvlNeato, Len(), TVec< TVal >::Len(), TMath::Mn(), NIdV, TGnuPlot::PlotValCntH(), and TVec< TVal >::Sort().

                                                                                     {
  TCnComV CnComV;  TSnap::Get1CnCom(Graph, CnComV);
  CnComV.Sort(false);
  // plot size distribution
  { TIntH SzCntH;
  for (int i = 0; i < CnComV.Len(); i++) { SzCntH.AddDat(CnComV[i].Len()) += 1; }
  TGnuPlot::PlotValCntH(SzCntH, "whiskSz."+FNmPref, TStr::Fmt("%s. G(%d, %d)", FNmPref.CStr(), Graph->GetNodes(),
    Graph->GetEdges()), "Whisker size (Maximal component connected with a bridge edge)", "Count", gpsLog10XY, false); }
  // draw them
  int BrNodeId = -1;
  for (int c = 0; c < TMath::Mn(CnComV.Len(), PlotN); c++) {
    const PUNGraph BrClust = TSnap::GetSubGraph(Graph, CnComV[c].NIdV);
    for (TUNGraph::TNodeI NI = BrClust->BegNI(); NI < BrClust->EndNI(); NI++) {
      if (NI.GetOutDeg() != Graph->GetNI(NI.GetId()).GetOutDeg()) { BrNodeId=NI.GetId(); break; } }
    TIntStrH ClrH;  ClrH.AddDat(BrNodeId, "red");
    TSnap::DrawGViz(BrClust, gvlNeato, TStr::Fmt("whisk-%s-%02d.gif", FNmPref.CStr(), c+1),
      TStr::Fmt("Bridge node id: %d, n=%d, e=%d", BrNodeId, BrClust->GetNodes(), BrClust->GetEdges()), false, ClrH);
  }
}

Here is the call graph for this function:

void TLocClust::FindBestCut ( const int &  SeedNode,
const int &  ClustSz,
const double &  MinSizeFrac = 0.2 
)

Finds minimum conductance cut in the graph around the seed node.

Function first computes the ApproxPageRank(), initializes the SupportSweep() and then find the minimum conductance cluster. Parameter ClustSz controls the expected cluster size and is used to determine the tolerance (Eps) of the approximate PageRank calculation.

Definition at line 81 of file ncp.cpp.

References TVec< TVal >::Add(), ApproxPageRank(), BestCutIdx, TVec< TVal >::Clr(), THash< TKey, TDat, THashFunc >::GetKey(), TUNGraph::GetNI(), Graph, THash< TKey, TDat, THashFunc >::Len(), TVec< TVal >::Len(), TFlt::Mx, NIdV, PhiV, ProbH, SeedNId, THash< TKey, TDat, THashFunc >::SortByDat(), and SupportSweep().

Referenced by TLocClustStat::Run().

                                                                                              {
  double MaxPhi = TFlt::Mx;
  BestCutIdx = -1;
  SeedNId = SeedNode;
  // calculate pagerank and cut sets
  ApproxPageRank(SeedNId, 1.0/double(ClustSz));
  for (int i = 0; i < ProbH.Len(); i++) {
    ProbH[i] /= Graph->GetNI(ProbH.GetKey(i)).GetOutDeg(); }
  ProbH.SortByDat(false);
  SupportSweep();
  // find best cut
  NIdV.Clr(false);
  for (int i = 0; i < PhiV.Len(); i++) {
    const double Phi = PhiV[i];
    NIdV.Add(ProbH.GetKey(i));
    if (Phi < MaxPhi) { MaxPhi = Phi;  BestCutIdx = i; }
  }
}

Here is the call graph for this function:

Here is the caller graph for this function:

int TLocClust::GetCut ( const int &  Nodes) const [inline]

Returns the size of the cut of the first Nodes nodes in the sweep vector.

Size of the cut is the number of edges pointing between the first Nodes nodes and the remainder of the graph.

Definition at line 64 of file ncp.h.

References CutV, and Nodes.

Referenced by GetCutEdges(), and TLocClustStat::Run().

{ return CutV[Nodes]; }

Here is the caller graph for this function:

int TLocClust::GetCutEdges ( ) const [inline]

Number of edges in the 'best' (minimum conductance) cut.

Definition at line 77 of file ncp.h.

References BestCut(), and GetCut().

{ return GetCut(BestCut()); }

Here is the call graph for this function:

double TLocClust::GetCutPhi ( ) const [inline]

Conductance of the 'best' (minimum conductance) cut.

Definition at line 81 of file ncp.h.

References BestCut(), and GetPhi().

Referenced by TLocClustStat::Run().

{ return GetPhi(BestCut()); }

Here is the call graph for this function:

Here is the caller graph for this function:

void TLocClust::GetCutStat ( const PUNGraph Graph,
const TIntV NIdV,
int &  Vol,
int &  Cut,
double &  Phi,
int  GraphEdges = -1 
) [static]

For a given Graph and a set of nodes NIdV the function returns the Volume, CutSize and the Conductance of the cut.

Definition at line 192 of file ncp.cpp.

References THashSet< TKey, THashFunc >::AddKey(), and TVec< TVal >::Len().

Referenced by TLocClustStat::AddCut().

                                                                                                                    {
  TIntSet NIdSet(NIdV.Len());
  for (int i = 0; i < NIdV.Len(); i++) { NIdSet.AddKey(NIdV[i]); }
  GetCutStat(Graph, NIdSet, Vol, Cut, Phi, GraphEdges);
}

Here is the call graph for this function:

Here is the caller graph for this function:

void TLocClust::GetCutStat ( const PUNGraph Graph,
const TIntSet NIdSet,
int &  Vol,
int &  Cut,
double &  Phi,
int  GraphEdges = -1 
) [static]

For a given Graph and a set of nodes NIdV the function returns the Volume, CutSize and the Conductance of the cut.

Definition at line 198 of file ncp.cpp.

References Edges2, TUNGraph::GetEdges(), TUNGraph::GetNI(), TUNGraph::TNodeI::GetOutDeg(), TUNGraph::TNodeI::GetOutNId(), THashSet< TKey, THashFunc >::IsKey(), TUNGraph::IsNode(), and THashSet< TKey, THashFunc >::Len().

                                                                                                                        {
  const int Edges2 = GraphEdges>=0 ? 2*GraphEdges : Graph->GetEdges();
  Vol=0;  Cut=0; Phi=0.0;
  for (int i = 0; i < NIdSet.Len(); i++) {
    if (! Graph->IsNode(NIdSet[i])) { continue; }
    TUNGraph::TNodeI NI = Graph->GetNI(NIdSet[i]);
    for (int e = 0; e < NI.GetOutDeg(); e++) {
      if (! NIdSet.IsKey(NI.GetOutNId(e))) { Cut += 1; }
    }
    Vol += NI.GetOutDeg();
  }
  // get conductance
  if (Vol != Edges2) {
    if (2*Vol > Edges2) { Phi = Cut / double(Edges2-Vol); }
    else if (Vol == 0) { Phi = 0.0; }
    else { Phi = Cut / double(Vol); }
  } else {
    if (Vol == Edges2) { Phi = 1.0; }
  }
}

Here is the call graph for this function:

const TIntV& TLocClust::GetCutV ( ) const [inline, private]

Edges cut to separate the first K-nodes in the node-id vector (GetNIdV()) from the rest of the graph.

Definition at line 44 of file ncp.h.

References CutV.

{ return CutV; } 
int TLocClust::GetCutVol ( ) const [inline]

Volume of the 'best' (minimum conductance) cut.

Definition at line 79 of file ncp.h.

References BestCut(), and GetVol().

Referenced by TLocClustStat::Run().

{ return GetVol(BestCut()); }

Here is the call graph for this function:

Here is the caller graph for this function:

int TLocClust::GetNId ( const int &  NodeN) const [inline]

Returns the ID of the NodeN-th node in the sweep vector.

Definition at line 56 of file ncp.h.

References NIdV.

{ return NIdV[NodeN]; }
const TIntV& TLocClust::GetNIdV ( ) const [inline, private]

Vector of node IDs sorted in the random walk order.

Definition at line 40 of file ncp.h.

References NIdV.

Referenced by TLocClustStat::Run().

{ return NIdV; }

Here is the caller graph for this function:

double TLocClust::GetPhi ( const int &  ValId) const [inline]

Returns the conductance of the cut separating the first Nodes nodes in the sweep vector from the rest of the graph.

Conductance is the ration Cut/Volume. The lower the conductance the 'better' the cluster (higher volume, less edges cut).

Definition at line 68 of file ncp.h.

References PhiV.

Referenced by GetCutPhi(), and TLocClustStat::Run().

{ return PhiV[ValId]; }

Here is the caller graph for this function:

const TFltV& TLocClust::GetPhiV ( ) const [inline, private]

Conducatance of the cut separating first K-nodes in the node-id vector (GetNIdV()) from the rest of the graph.

Definition at line 46 of file ncp.h.

References PhiV.

Referenced by TLocClustStat::Run().

{ return PhiV; } 

Here is the caller graph for this function:

int TLocClust::GetRndWalkSup ( ) const [inline]

Returns the support of the approximate random walk, the number of nodes with non-zero PageRank score.

Definition at line 53 of file ncp.h.

References TVec< TVal >::Len(), and VolV.

Referenced by Len().

{ return VolV.Len(); }

Here is the call graph for this function:

Here is the caller graph for this function:

int TLocClust::GetVol ( const int &  Nodes) const [inline]

Returns the volume of the set of first NodeN nodes in the sweep vector.

Volume is defined as the sum of the degrees of the first Nodes nodes. Or in other words volume = 2* edges inside the set + the edges pointing outside the set.

Definition at line 60 of file ncp.h.

References Nodes, and VolV.

Referenced by GetCutVol(), and TLocClustStat::Run().

{ return VolV[Nodes]; }

Here is the caller graph for this function:

const TIntV& TLocClust::GetVolV ( ) const [inline, private]

Volume (sum of the degrees) of first K-nodes in the node-id vector (GetNIdV()).

Definition at line 42 of file ncp.h.

References VolV.

{ return VolV; } 
int TLocClust::Len ( ) const [inline]

Returns the support of the approximate random walk, the number of nodes with non-zero PageRank score.

Definition at line 51 of file ncp.h.

References GetRndWalkSup().

Referenced by DrawWhiskers(), and TLocClustStat::Run().

{ return GetRndWalkSup(); }

Here is the call graph for this function:

Here is the caller graph for this function:

void TLocClust::PlotCutDistr ( const TStr OutFNm,
TStr  Desc = TStr() 
) const

Plots the cluster cut size vs. cluster size K (cluster is composed of nodes NIdV[1...K]).

Definition at line 113 of file ncp.cpp.

References TVec< TVal >::Add(), TGnuPlot::AddPlot(), TStr::CStr(), CutV, TStr::Empty(), TStr::Fmt(), gpwLinesPoints, TVec< TVal >::Len(), TLocClustStat::MakeExpBins(), TGnuPlot::SavePng(), SeedNId, and TGnuPlot::SetXYLabel().

                                                                {
  TFltPrV RankValV(CutV.Len(), 0);
  for (int i = 0; i < CutV.Len(); i++) {
    RankValV.Add(TFltPr(i+1, CutV[i].Val)); }
  if (Desc.Empty()) { Desc = OutFNm; }
  TFltPrV NewV;  TLocClustStat::MakeExpBins(RankValV, NewV);
  TGnuPlot GP(OutFNm+"-CUT", TStr::Fmt("CUT SIZE. Seed node %d.", SeedNId, Desc.CStr()));
  GP.AddPlot(NewV, gpwLinesPoints, ""); //, "pointtype 6 pointsize 1.5"
  GP.SetXYLabel("Rank", "Cut size");
  //GP.SetScale(gpsLog10X);
  GP.SavePng();
}

Here is the call graph for this function:

void TLocClust::PlotNCP ( const PUNGraph Graph,
const TStr FNm,
const TStr  Desc = "",
const bool &  BagOfWhiskers = true,
const bool &  RewireNet = false,
const int &  KMin = 10,
const int &  KMax = Mega(100),
const int &  Coverage = 10,
const bool &  SaveTxtStat = false,
const bool &  PlotBoltzman = false 
) [static]

Plots the Network Community Profile (NCP) of a given graph Graph. The NCP plot of a network captures the global community structure of the network. The NCP plot of a network captures the global community structure of the network. Refer to 'Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters by J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney. Internet Mathematics 6(1) 29--123, 2009' for the explanation of how to read these plots. URL: http://arxiv.org/abs/0810.1355

Definition at line 219 of file ncp.cpp.

References TLocClustStat::AddBagOfWhiskers(), Alpha, TSnap::GenRewire(), TLocClustStat::ImposeNCP(), TLocClustStat::PlotBoltzmanCurve(), TInt::Rnd, TLocClustStat::Run(), and TLocClustStat::SaveTxtInfo().

                                                                                                                                                                                                                                           {
  const double Alpha = 0.001, KFac = 1.5, SizeFrac = 0.001;
  //const int KMin = 2, KMax = Mega(100), Coverage = 10;
  TLocClustStat ClusStat1(Alpha, KMin, KMax, KFac, Coverage, SizeFrac);
  ClusStat1.Run(Graph, false, PlotBoltzman, SaveTxtStat);
  if (BagOfWhiskers) { ClusStat1.AddBagOfWhiskers(); }
  TLocClustStat ClusStat2(Alpha, KMin, KMax, KFac, Coverage, SizeFrac);
  ClusStat1.ImposeNCP(ClusStat2, FNm, Desc, "ORIGINAL", "REWIRED"); // plot before rewiring
  if (SaveTxtStat) { // for every piece size store modularity
    ClusStat1.SaveTxtInfo(FNm, Desc, false);
  }
  if (PlotBoltzman) {
    ClusStat1.PlotBoltzmanCurve(FNm, Desc);
  }
  if (RewireNet) {
    ClusStat2.Run(TSnap::GenRewire(Graph, 50, TInt::Rnd), false, false, false);
    if (BagOfWhiskers) { ClusStat2.AddBagOfWhiskers(); }
    ClusStat1.ImposeNCP(ClusStat2, FNm, Desc, "ORIGINAL", "REWIRED"); // if rewire, plot again
  }
}

Here is the call graph for this function:

void TLocClust::PlotPhiDistr ( const TStr OutFNm,
TStr  Desc = TStr() 
) const

Plots the cluster conductance vs. cluster size K (cluster is composed of nodes NIdV[1...K]).

Definition at line 126 of file ncp.cpp.

References TVec< TVal >::Add(), TGnuPlot::AddPlot(), TStr::CStr(), TStr::Empty(), TStr::Fmt(), gpwLinesPoints, TVec< TVal >::Len(), TLocClustStat::MakeExpBins(), PhiV, TGnuPlot::SavePng(), SeedNId, and TGnuPlot::SetXYLabel().

                                                                {
  TFltPrV RankValV(PhiV.Len(), 0);
  for (int i = 0; i < PhiV.Len(); i++) {
    RankValV.Add(TFltPr(i+1, PhiV[i])); }
  if (Desc.Empty()) { Desc = OutFNm; }
  TFltPrV NewV;  TLocClustStat::MakeExpBins(RankValV, NewV);
  TGnuPlot GP(OutFNm+"-PHI", TStr::Fmt("CONDUCTANCE (Cut size / Volume). Seed node %d.", SeedNId, Desc.CStr()));
  GP.AddPlot(NewV, gpwLinesPoints, ""); //, "pointtype 6 pointsize 1.5"
  GP.SetXYLabel("Rank", "Conductance (Cut size / Volume)");
  //GP.SetScale(gpsLog10X);
  GP.SavePng();
}

Here is the call graph for this function:

void TLocClust::PlotVolDistr ( const TStr OutFNm,
TStr  Desc = TStr() 
) const

Plots the cluster volume vs. cluster size K (cluster is composed of nodes NIdV[1...K]).

Definition at line 100 of file ncp.cpp.

References TVec< TVal >::Add(), TGnuPlot::AddPlot(), TStr::CStr(), TStr::Empty(), TStr::Fmt(), gpwLinesPoints, TVec< TVal >::Len(), TLocClustStat::MakeExpBins(), TGnuPlot::SavePng(), SeedNId, TGnuPlot::SetXYLabel(), and VolV.

                                                                {
  TFltPrV RankValV(VolV.Len(), 0);
  for (int i = 0; i < VolV.Len(); i++) {
    RankValV.Add(TFltPr(i+1, VolV[i].Val)); }
  if (Desc.Empty()) { Desc = OutFNm; }
  TFltPrV NewV;  TLocClustStat::MakeExpBins(RankValV, NewV);
  TGnuPlot GP(OutFNm+"-VOL", TStr::Fmt("VOLUME. Seed node %d.", SeedNId, Desc.CStr()));
  GP.AddPlot(NewV, gpwLinesPoints, ""); //, "pointtype 6 pointsize 1.5"
  GP.SetXYLabel("Rank", "Volume");
  //GP.SetScale(gpsLog10X);
  GP.SavePng();
}

Here is the call graph for this function:

void TLocClust::SavePajek ( const TStr OutFNm) const

Saves the network in the Pajek format so it can be visualized. Red node represents the seed and color the cluster membership.

Definition at line 139 of file ncp.cpp.

References TUNGraph::BegNI(), BestCut(), BestCutNodes(), TStr::CStr(), TUNGraph::EndNI(), TStr::Fmt(), TUNGraph::GetEdges(), TUNGraph::GetNodes(), TVec< TVal >::GetV(), Graph, NIdV, and SeedNId.

                                                  {
  FILE *F = fopen(TStr::Fmt("%s.net", OutFNm.CStr()).CStr(), "wt");
  TIntH NIdToIdH(Graph->GetNodes(), true);
  TIntH ClustSet(BestCut()+1);
  const int BucketSz = BestCutNodes()/ 4;
  TStrV ClrV = TStrV::GetV("Black", "Gray80", "Gray60", "Gray40", "Gray20", "RedViolet");
  for (int a = 0; a < BestCutNodes(); a++) {
    ClustSet.AddDat(NIdV[a], (a+1)/BucketSz);
  }
  fprintf(F, "*Vertices %d\n", Graph->GetNodes());
  int i = 0;
  for (TUNGraph::TNodeI NI = Graph->BegNI(); NI < Graph->EndNI(); NI++) {
    const int NId = NI.GetId();
    if (NId == SeedNId) {
      fprintf(F, "%d  \"%d\" diamond x_fact 2 y_fact 2 ic Green fos 10\n", i+1, NId); }
    else if (ClustSet.IsKey(NId)) {
      //fprintf(F, "%d  \"%d\" box x_fact 1 y_fact 1 ic Red fos 10\n", i+1, NId); }
      fprintf(F, "%d  \"%d\" box x_fact 2 y_fact 2 ic %s fos 10\n", i+1, NId, ClrV[ClustSet.GetDat(NId)].CStr()); }
    else {
      fprintf(F, "%d  \"%d\" ellipse x_fact 2 y_fact 2 ic Blue fos 10\n", i+1, NId); }
    NIdToIdH.AddDat(NId, i+1);
    i++;
  }
  fprintf(F, "*Arcs %d\n", Graph->GetEdges()); // arcs are directed, edges are undirected
  for (TUNGraph::TNodeI NI = Graph->BegNI(); NI < Graph->EndNI(); NI++) {
    const int NId = NIdToIdH.GetDat(NI.GetId());
    for (int e = 0; e < NI.GetOutDeg(); e++) {
      fprintf(F, "%d %d %g c Tan\n", NId, NIdToIdH.GetDat(NI.GetOutNId(e)).Val, 1.0);
    }
  }
  fclose(F);
}

Here is the call graph for this function:

After the function ApproxPageRank() has been run the SupportSweep() computes the volume, cut size, node ids, conductance vectors.

Definition at line 52 of file ncp.cpp.

References TVec< TVal >::Add(), TVec< TVal >::Clr(), CutV, Edges2, THash< TKey, TDat, THashFunc >::Empty(), THash< TKey, TDat, THashFunc >::GetKey(), THash< TKey, TDat, THashFunc >::GetKeyId(), TUNGraph::GetNI(), TUNGraph::TNodeI::GetOutDeg(), TUNGraph::TNodeI::GetOutNId(), Graph, IAssert, THash< TKey, TDat, THashFunc >::Len(), PhiV, ProbH, and VolV.

Referenced by FindBestCut().

                             {
  TExeTm ExeTm;
  VolV.Clr(false);  CutV.Clr(false);  PhiV.Clr(false);
  if (ProbH.Empty()) { return; }
  const int TopNId = ProbH.GetKey(0);
  const int TopNIdDeg = Graph->GetNI(TopNId).GetOutDeg();
  int Vol = TopNIdDeg, Cut = TopNIdDeg;
  double Phi = Cut/double(Vol);
  VolV.Add(Vol);  CutV.Add(Cut);  PhiV.Add(1.0);
  for (int i = 1; i < ProbH.Len(); i++) {
    const int NId = ProbH.GetKey(i);
    const TUNGraph::TNodeI& Node = Graph->GetNI(NId);
    const int OutDeg = Node.GetOutDeg();
    int CutSz = OutDeg; // edges outside
    for (int e = 0; e < OutDeg; e++) {
      const int Rank = ProbH.GetKeyId(Node.GetOutNId(e));
      if ( Rank > -1 && Rank < i) { CutSz -= 2;  }
    }
    Vol += OutDeg;  Cut += CutSz;
    if (Vol < Edges2) {
      if (2*Vol > Edges2) { Phi = Cut / double(Edges2-Vol); }
      else { Phi = Cut / double(Vol); }
    } else {
      Phi = 1.0;
    }
    IAssert((Phi+1e-6) >= double(1.0)/double(i*(i+1)+1)); // conductance is worse than the best possible
    VolV.Add(Vol);  CutV.Add(Cut);  PhiV.Add(Phi);
  }}

Here is the call graph for this function:

Here is the caller graph for this function:


Friends And Related Function Documentation

friend class TLocClustStat [friend]

Definition at line 120 of file ncp.h.


Member Data Documentation

double TLocClust::Alpha [private]

Definition at line 32 of file ncp.h.

Referenced by ApproxPageRank(), and PlotNCP().

int TLocClust::BestCutIdx [private]

Definition at line 37 of file ncp.h.

Referenced by BestCut(), BestCutNodes(), and FindBestCut().

Definition at line 35 of file ncp.h.

Referenced by GetCut(), GetCutV(), PlotCutDistr(), and SupportSweep().

int TLocClust::Edges2 [private]

Definition at line 29 of file ncp.h.

Referenced by GetCutStat(), and SupportSweep().

Definition at line 28 of file ncp.h.

Referenced by ApproxPageRank(), FindBestCut(), SavePajek(), and SupportSweep().

Definition at line 35 of file ncp.h.

Referenced by DrawWhiskers(), FindBestCut(), GetNId(), GetNIdV(), and SavePajek().

Definition at line 31 of file ncp.h.

Referenced by ApproxPageRank().

int TLocClust::Nodes [private]

Definition at line 29 of file ncp.h.

Referenced by GetCut(), and GetVol().

Definition at line 36 of file ncp.h.

Referenced by FindBestCut(), GetPhi(), GetPhiV(), PlotPhiDistr(), and SupportSweep().

Definition at line 30 of file ncp.h.

Referenced by ApproxPageRank(), FindBestCut(), and SupportSweep().

Definition at line 30 of file ncp.h.

Referenced by ApproxPageRank().

int TLocClust::SeedNId [private]

Definition at line 33 of file ncp.h.

Referenced by FindBestCut(), PlotCutDistr(), PlotPhiDistr(), PlotVolDistr(), and SavePajek().

bool TLocClust::Verbose = true [static]

Definition at line 26 of file ncp.h.

Referenced by TLocClustStat::Run().

Definition at line 35 of file ncp.h.

Referenced by GetRndWalkSup(), GetVol(), GetVolV(), PlotVolDistr(), and SupportSweep().


The documentation for this class was generated from the following files: