SNAP Library 2.2, Developer Reference  2014-03-11 19:15:55
SNAP, a general purpose, high performance system for analysis and manipulation of large networks
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Defines
TAGMFit Class Reference

#include <agmfit.h>

Collaboration diagram for TAGMFit:

List of all members.

Public Member Functions

 TAGMFit ()
 ~TAGMFit ()
 TAGMFit (const PUNGraph &GraphPt, const TVec< TIntV > &CmtyVVPt, const int RndSeed=0)
 COMMENT. Use to describribe parameters.
 TAGMFit (const PUNGraph &GraphPt, const int InitComs, const int RndSeed=0)
 COMMENT. Use to describribe parameters.
 TAGMFit (const PUNGraph &GraphPt, const TVec< TIntV > &CmtyVVPt, const TRnd &RndPt)
 COMMENT. Use to describribe parameters.
void Save (TSOut &SOut)
void Load (TSIn &SIn, const int &RndSeed=0)
void RandomInitCmtyVV (const int InitComs, const double ComSzAlpha=1.3, const double MemAlpha=1.8, const int MinComSz=8, const int MaxComSz=200, const int MinMem=1, const int MaxMem=10)
 Randomly initialize bipartite community affiliation graph.
void AddBaseCmty ()
 Add Epsilon community (base community which includes all nodes) into community affiliation graph. This means that we will later fit the value of epsilon.
double Likelihood ()
 COMMENT.
double Likelihood (const TFltV &NewLambdaV)
double Likelihood (const TFltV &NewLambdaV, double &LEdges, double &LNoEdges)
void SetRegCoef (const double Val)
void GetEdgeJointCom ()
 For each (u, v) in edges, precompute C_uv (the set of communities nodes u and v share).
void NeighborComInit (const int InitComs)
 Initialize node community memberships using best neighborhood communities (see D. Gleich et al. KDD'12).
void GradLogLForLambda (TFltV &GradV)
int MLEGradAscentGivenCAG (const double &Thres=0.001, const int &MaxIter=10000, const TStr PlotNm=TStr())
 Gradient descent for p_c while keeping the community affiliation graph (CAG) fixed.
void SetDefaultPNoCom ()
 Set Epsilon (the probability that two nodes sharing no communities connect) to the default value.
void SetPNoCom (const double &Epsilon)
 Set Epsilon (the probability that two nodes sharing no communities connect) to the default value.
double GetPNoCom ()
double CalcPNoComByCmtyVV (const int &SamplePairs=-1)
 Compute the empirical edge probability between a pair of nodes who share no community (epsilon), based on current community affiliations.
void GetNewtonStep (TFltVV &HVV, TFltV &GradV, TFltV &DeltaLV)
double SelectLambdaSum (const TIntSet &ComK)
 Compute sum of lambda_c (which is log (1 - p_c)) over C_uv (ComK). The function is used to compute edge probability P_uv.
double SelectLambdaSum (const TFltV &NewLambdaV, const TIntSet &ComK)
 COMMENT.
void RandomInit (const int &MaxK)
 COMMENT.
void RunMCMC (const int &MaxIter, const int &EvalLambdaIter, const TStr &PlotFPrx=TStr())
 Main procedure for fitting the AGM to a given graph using MCMC.
void SampleTransition (int &NID, int &JoinCID, int &LeaveCID, double &DeltaL)
 Sample MMCM transitions: Choose among (join, leave, switch), and then sample (NID, CID).
void InitNodeData ()
 COMMENT.
void LeaveCom (const int &NID, const int &CID)
 After MCMC, NID leaves community CID.
void JoinCom (const int &NID, const int &JoinCID)
int RemoveEmptyCom ()
 Remove all communities with no members.
double SeekLeave (const int &UID, const int &CID)
 Compute the change in likelihood (Delta) if node UID leaves community CID.
double SeekJoin (const int &UID, const int &CID)
 Compute the change in likelihood (Delta) if node UID joins community CID.
double SeekSwitch (const int &UID, const int &CurCID, const int &NewCID)
double GetStepSizeByLineSearchForLambda (const TFltV &DeltaV, const TFltV &GradV, const double &Alpha, const double &Beta)
 Step size search for updating P_c (which is parametarized by regularization parameter lambda).
void SetLambdaV (const TFltV &LambdaPt)
 COMMENT.
void GetLambdaV (TFltV &OutV)
 COMMENT.
void GetQV (TFltV &OutV)
 Returns QV, a vector of (1 - p_c) for each community c.
void GetCmtyVV (TVec< TIntV > &CmtyVV, const double QMax=2.0)
 Get communities whose p_c is higher than 1 - QMax.
void GetCmtyVV (TVec< TIntV > &CmtyVV, TFltV &QV, const double QMax=2.0)
 COMMENT.
void SetCmtyVV (const TVec< TIntV > &CmtyVV)
 COMMENT.
void PrintSummary ()
 COMMENT.

Private Attributes

PUNGraph G
 Graph to fit.
TVec< TIntSetCIDNSetV
 Community ID -> Member Node ID Sets.
THash< TIntPr, TIntSetEdgeComVH
 Edge -> Shared Community ID Set.
THash< TInt, TIntSetNIDComVH
 Node ID -> Communitie IDs the node belongs to.
TIntV ComEdgesV
 The number of edges in each community.
TFlt PNoCom
 Probability of edge when two nodes share no community (epsilon in the paper).
TFltV LambdaV
 Parametrization of P_c (edge probability in community c), P_c = 1 - exp(-lambda).
TRnd Rnd
THash< TIntPr, TFltNIDCIDPrH
 <Node ID, Community ID> pairs (for sampling MCMC moves).
THash< TIntPr, TIntNIDCIDPrS
 <Node ID, Community ID> pairs (for sampling MCMC moves).
TFlt MinLambda
 Minimum value of regularization parameter lambda (default = 1e-5).
TFlt MaxLambda
 Maximum value of regularization parameter lambda (default = 10).
TFlt RegCoef
 Regularization parameter when we fit for P_c (for finding # communities).
TInt BaseCID
 ID of the Epsilon-community (in case we fit P_c of the epsilon community). We do not fit for the Epsilon-community in general.

Detailed Description

Fitting the Affilialiton Graph Model (AGM).

Definition at line 8 of file agmfit.h.


Constructor & Destructor Documentation

TAGMFit::TAGMFit ( ) [inline]

Definition at line 26 of file agmfit.h.

{ }
TAGMFit::~TAGMFit ( ) [inline]

Definition at line 27 of file agmfit.h.

{ }
TAGMFit::TAGMFit ( const PUNGraph GraphPt,
const TVec< TIntV > &  CmtyVVPt,
const int  RndSeed = 0 
) [inline]

COMMENT. Use to describribe parameters.

Definition at line 29 of file agmfit.h.

References SetCmtyVV().

: G(GraphPt), PNoCom(0.0), Rnd(RndSeed), MinLambda(0.00001), MaxLambda(10.0), RegCoef(0), BaseCID(-1) { SetCmtyVV(CmtyVVPt);  }

Here is the call graph for this function:

TAGMFit::TAGMFit ( const PUNGraph GraphPt,
const int  InitComs,
const int  RndSeed = 0 
) [inline]

COMMENT. Use to describribe parameters.

Definition at line 31 of file agmfit.h.

References NeighborComInit().

: G(GraphPt), PNoCom(0.0), Rnd(RndSeed), MinLambda(0.00001), MaxLambda(10.0), RegCoef(0), BaseCID(-1) { NeighborComInit(InitComs); }//RandomInitCmtyVV(InitComs);  }

Here is the call graph for this function:

TAGMFit::TAGMFit ( const PUNGraph GraphPt,
const TVec< TIntV > &  CmtyVVPt,
const TRnd RndPt 
) [inline]

COMMENT. Use to describribe parameters.

Definition at line 33 of file agmfit.h.

References SetCmtyVV().

: G(GraphPt), PNoCom(0.0), Rnd(RndPt), MinLambda(0.00001), MaxLambda(10.0), RegCoef(0), BaseCID(-1) { SetCmtyVV(CmtyVVPt); }

Here is the call graph for this function:


Member Function Documentation

Add Epsilon community (base community which includes all nodes) into community affiliation graph. This means that we will later fit the value of epsilon.

Definition at line 251 of file agmfit.cpp.

References TVec< TVal, TSizeTy >::Add(), BaseCID, CIDNSetV, G, GetCmtyVV(), TUNGraph::GetNIdV(), IAssert, InitNodeData(), TVec< TVal, TSizeTy >::Len(), PNoCom, and SetCmtyVV().

                          {
  TVec<TIntV> CmtyVV;
  GetCmtyVV(CmtyVV);
  TIntV TmpV = CmtyVV[0];
  CmtyVV.Add(TmpV);
  G->GetNIdV(CmtyVV[0]);
  IAssert(CIDNSetV.Len() + 1 == CmtyVV.Len());
  SetCmtyVV(CmtyVV);
  InitNodeData();
  BaseCID = 0;
  PNoCom = 0.0;
}

Here is the call graph for this function:

double TAGMFit::CalcPNoComByCmtyVV ( const int &  SamplePairs = -1)

Compute the empirical edge probability between a pair of nodes who share no community (epsilon), based on current community affiliations.

Definition at line 632 of file agmfit.cpp.

References G, THash< TKey, TDat, THashFunc >::GetDat(), TAGMUtil::GetIntersection(), TUNGraph::GetNIdV(), TUNGraph::GetNodes(), TUInt64::GetStr(), TUNGraph::IsEdge(), TVec< TVal, TSizeTy >::Len(), THashSet< TKey, THashFunc >::Len(), NIDComVH, PNoCom, and TFlt::Val.

                                                         {
  TIntV NIdV;
  G->GetNIdV(NIdV);
  uint64 PairNoCom = 0, EdgesNoCom = 0;
  for (int u = 0; u < NIdV.Len(); u++) {
    for (int v = u + 1; v < NIdV.Len(); v++) {
      int SrcNID = NIdV[u], DstNID = NIdV[v];
      TIntSet JointCom;
      TAGMUtil::GetIntersection(NIDComVH.GetDat(SrcNID),NIDComVH.GetDat(DstNID),JointCom);
      if(JointCom.Len() == 0) {
        PairNoCom++;
        if (G->IsEdge(SrcNID, DstNID)) { EdgesNoCom++; }
        if (SamplePairs > 0 && PairNoCom >= (uint64) SamplePairs) { break; }
      }
    }
    if (SamplePairs > 0 && PairNoCom >= (uint64) SamplePairs) { break; }
  }
  double DefaultVal = 1.0 / (double)G->GetNodes() / (double)G->GetNodes();
  if (EdgesNoCom > 0) {
    PNoCom = (double) EdgesNoCom / (double) PairNoCom;
  } else {
    PNoCom = DefaultVal;
  }
  printf("%s / %s edges without joint com detected (PNoCom = %f)\n", TUInt64::GetStr(EdgesNoCom).CStr(), TUInt64::GetStr(PairNoCom).CStr(), PNoCom.Val);
  return PNoCom;
}

Here is the call graph for this function:

void TAGMFit::GetCmtyVV ( TVec< TIntV > &  CmtyVV,
const double  QMax = 2.0 
)

Get communities whose p_c is higher than 1 - QMax.

Definition at line 554 of file agmfit.cpp.

Referenced by AddBaseCmty(), and TAGMUtil::FindComsByAGM().

                                                              {
  TFltV TmpQV;
  GetCmtyVV(CmtyVV, TmpQV, QMax);
}

Here is the caller graph for this function:

void TAGMFit::GetCmtyVV ( TVec< TIntV > &  CmtyVV,
TFltV QV,
const double  QMax = 2.0 
)

COMMENT.

Definition at line 559 of file agmfit.cpp.

References TVec< TVal, TSizeTy >::Add(), THash< TKey, TDat, THashFunc >::AddDat(), BaseCID, CIDNSetV, G, TVec< TVal, TSizeTy >::Gen(), TUNGraph::GetNodes(), IAssert, LambdaV, TVec< TVal, TSizeTy >::Len(), and MinLambda.

                                                                         {
  CmtyVV.Gen(CIDNSetV.Len(), 0);
  QV.Gen(CIDNSetV.Len(), 0);
  TIntFltH CIDLambdaH(CIDNSetV.Len());
  for (int c = 0; c < CIDNSetV.Len(); c++) {
    CIDLambdaH.AddDat(c, LambdaV[c]);
  }
  CIDLambdaH.SortByDat(false);
  for (int c = 0; c < CIDNSetV.Len(); c++) {
    int CID = CIDLambdaH.GetKey(c);
    IAssert(LambdaV[CID] >= MinLambda);
    double Q = exp( - (double) LambdaV[CID]);
    if (Q > QMax) { continue; }
    TIntV CmtyV;
    CIDNSetV[CID].GetKeyV(CmtyV);
    if (CmtyV.Len() == 0) { continue; }
    if (CID == BaseCID) { //if the community is the base community(epsilon community), discard
      IAssert(CmtyV.Len() == G->GetNodes());
    } else {
      CmtyVV.Add(CmtyV);
      QV.Add(Q);
    }
  }
}

Here is the call graph for this function:

For each (u, v) in edges, precompute C_uv (the set of communities nodes u and v share).

Definition at line 50 of file agmfit.cpp.

References THash< TKey, TDat, THashFunc >::AddDat(), TUNGraph::BegNI(), CIDNSetV, ComEdgesV, EdgeComVH, TUNGraph::EndNI(), G, THash< TKey, TDat, THashFunc >::Gen(), TVec< TVal, TSizeTy >::Gen(), THash< TKey, TDat, THashFunc >::GetDat(), TUNGraph::GetEdges(), TAGMUtil::GetIntersection(), IAssert, THash< TKey, TDat, THashFunc >::IsKey(), THash< TKey, TDat, THashFunc >::Len(), TVec< TVal, TSizeTy >::Len(), THashSet< TKey, THashFunc >::Len(), and NIDComVH.

Referenced by InitNodeData().

                              {
  ComEdgesV.Gen(CIDNSetV.Len());
  EdgeComVH.Gen(G->GetEdges());
  for (TUNGraph::TNodeI SrcNI = G->BegNI(); SrcNI < G->EndNI(); SrcNI++) {
    int SrcNID = SrcNI.GetId();
    for (int v = 0; v < SrcNI.GetDeg(); v++) {
      int DstNID = SrcNI.GetNbrNId(v);
      if (SrcNID >= DstNID) { continue; }
      TIntSet JointCom;
      IAssert(NIDComVH.IsKey(SrcNID));
      IAssert(NIDComVH.IsKey(DstNID));
      TAGMUtil::GetIntersection(NIDComVH.GetDat(SrcNID), NIDComVH.GetDat(DstNID), JointCom);
      EdgeComVH.AddDat(TIntPr(SrcNID,DstNID),JointCom);
      for (int k = 0; k < JointCom.Len(); k++) {
        ComEdgesV[JointCom[k]]++;
      }
    }
  }
  IAssert(EdgeComVH.Len() == G->GetEdges());
}

Here is the call graph for this function:

Here is the caller graph for this function:

void TAGMFit::GetLambdaV ( TFltV OutV) [inline]

COMMENT.

Definition at line 93 of file agmfit.h.

References LambdaV.

{OutV = LambdaV;}
void TAGMFit::GetNewtonStep ( TFltVV HVV,
TFltV GradV,
TFltV DeltaLV 
)
double TAGMFit::GetPNoCom ( ) [inline]

Definition at line 58 of file agmfit.h.

References PNoCom.

{ return PNoCom; }
void TAGMFit::GetQV ( TFltV OutV)

Returns QV, a vector of (1 - p_c) for each community c.

Definition at line 451 of file agmfit.cpp.

References TVec< TVal, TSizeTy >::Gen(), LambdaV, and TVec< TVal, TSizeTy >::Len().

                               {
  OutV.Gen(LambdaV.Len());
  for (int i = 0; i < LambdaV.Len(); i++) {
    OutV[i] = exp(- LambdaV[i]);
  }
}

Here is the call graph for this function:

double TAGMFit::GetStepSizeByLineSearchForLambda ( const TFltV DeltaV,
const TFltV GradV,
const double &  Alpha,
const double &  Beta 
)

Step size search for updating P_c (which is parametarized by regularization parameter lambda).

Definition at line 109 of file agmfit.cpp.

References TLinAlg::DotProduct(), IAssert, LambdaV, TVec< TVal, TSizeTy >::Len(), Likelihood(), MaxLambda, and MinLambda.

Referenced by MLEGradAscentGivenCAG().

                                                                                                                                 {
  double StepSize = 1.0;
  double InitLikelihood = Likelihood();
  IAssert(LambdaV.Len() == DeltaV.Len());
  TFltV NewLambdaV(LambdaV.Len());
  for (int iter = 0; ; iter++) {
    for (int i = 0; i < LambdaV.Len(); i++) {
      NewLambdaV[i] = LambdaV[i] + StepSize * DeltaV[i];
      if (NewLambdaV[i] < MinLambda) { NewLambdaV[i] = MinLambda; }
      if (NewLambdaV[i] > MaxLambda) { NewLambdaV[i] = MaxLambda; }
    }
    if (Likelihood(NewLambdaV) < InitLikelihood + Alpha * StepSize * TLinAlg::DotProduct(GradV, DeltaV)) {
      StepSize *= Beta;
    } else {
      break;
    }
  }
  return StepSize;
}

Here is the call graph for this function:

Here is the caller graph for this function:

void TAGMFit::GradLogLForLambda ( TFltV GradV)

Definition at line 595 of file agmfit.cpp.

References THashSet< TKey, THashFunc >::BegI(), CIDNSetV, ComEdgesV, EdgeComVH, THashSet< TKey, THashFunc >::EndI(), TVec< TVal, TSizeTy >::Gen(), LambdaV, THash< TKey, TDat, THashFunc >::Len(), TVec< TVal, TSizeTy >::Len(), THashSet< TKey, THashFunc >::Len(), PNoCom, RegCoef, and SelectLambdaSum().

Referenced by MLEGradAscentGivenCAG().

                                            {
  GradV.Gen(LambdaV.Len());
  TFltV SumEdgeProbsV(LambdaV.Len());
  for (int e = 0; e < EdgeComVH.Len(); e++) {
    TIntSet& JointCom = EdgeComVH[e];
    double LambdaSum = SelectLambdaSum(JointCom);
    double Puv = 1 - exp(- LambdaSum);
    if (JointCom.Len() == 0) {  Puv = PNoCom;  }
    for (TIntSet::TIter SI = JointCom.BegI(); SI < JointCom.EndI(); SI++) {
      SumEdgeProbsV[SI.GetKey()] += (1 - Puv) / Puv;
    }
  }
  for (int k = 0; k < LambdaV.Len(); k++) {
    int MaxEk = CIDNSetV[k].Len() * (CIDNSetV[k].Len() - 1) / 2;
    int NotEdgesInCom = MaxEk - ComEdgesV[k];
    GradV[k] = SumEdgeProbsV[k] - (double) NotEdgesInCom;
    if (LambdaV[k] > 0.0 && RegCoef > 0.0) { //if regularization exists
      GradV[k] -= RegCoef;
    }
  }
}

Here is the call graph for this function:

Here is the caller graph for this function:

COMMENT.

Definition at line 264 of file agmfit.cpp.

References THash< TKey, TDat, THashFunc >::AddDat(), THash< TKey, TDat, THashFunc >::AddKey(), TUNGraph::BegNI(), CIDNSetV, ComEdgesV, TSnap::DelSelfEdges(), TVec< TVal, TSizeTy >::EndI(), TUNGraph::EndNI(), G, THash< TKey, TDat, THashFunc >::Gen(), TVec< TVal, TSizeTy >::Gen(), GetEdgeJointCom(), TAGMUtil::GetNodeMembership(), TUNGraph::GetNodes(), LambdaV, TVec< TVal, TSizeTy >::Len(), MaxLambda, MinLambda, NIDCIDPrS, and NIDComVH.

Referenced by AddBaseCmty(), NeighborComInit(), RandomInit(), RunMCMC(), and SetCmtyVV().

                           {
  TSnap::DelSelfEdges(G);
  NIDComVH.Gen(G->GetNodes());
  for (TUNGraph::TNodeI NI = G->BegNI(); NI < G->EndNI(); NI++) {
    NIDComVH.AddDat(NI.GetId());
  }
  TAGMUtil::GetNodeMembership(NIDComVH, CIDNSetV);
  GetEdgeJointCom();
  LambdaV.Gen(CIDNSetV.Len());
  for (int c = 0; c < CIDNSetV.Len(); c++) {
    int MaxE = (CIDNSetV[c].Len()) * (CIDNSetV[c].Len() - 1) / 2;
    if (MaxE < 2) {
      LambdaV[c] = MaxLambda;
    }
    else{
      LambdaV[c] = -log((double) (MaxE - ComEdgesV[c]) / MaxE);
    }
    if (LambdaV[c] > MaxLambda) {  LambdaV[c] = MaxLambda;  }
    if (LambdaV[c] < MinLambda) {  LambdaV[c] = MinLambda;  }
  }
  NIDCIDPrS.Gen(G->GetNodes() * 10);
  for (int c = 0; c < CIDNSetV.Len(); c++) {
    for (TIntSet::TIter SI = CIDNSetV[c].BegI(); SI < CIDNSetV[c].EndI(); SI++) {
      NIDCIDPrS.AddKey(TIntPr(SI.GetKey(), c));
    }
  }
}

Here is the call graph for this function:

Here is the caller graph for this function:

void TAGMFit::JoinCom ( const int &  NID,
const int &  JoinCID 
)

Definition at line 309 of file agmfit.cpp.

References THash< TKey, TDat, THashFunc >::AddKey(), THashSet< TKey, THashFunc >::AddKey(), CIDNSetV, ComEdgesV, EdgeComVH, G, THash< TKey, TDat, THashFunc >::GetDat(), TUNGraph::TNodeI::GetDeg(), TUNGraph::TNodeI::GetNbrNId(), TUNGraph::GetNI(), THashSet< TKey, THashFunc >::IsKey(), TMath::Mn(), TMath::Mx(), NIDCIDPrS, and NIDComVH.

Referenced by RunMCMC().

                                                        {
  TUNGraph::TNodeI NI = G->GetNI(NID);
  for (int e = 0; e < NI.GetDeg(); e++) {
    int VID = NI.GetNbrNId(e);
    if (NIDComVH.GetDat(VID).IsKey(JoinCID)) {
      TIntPr SrcDstNIDPr = TIntPr(TMath::Mn(NID,VID), TMath::Mx(NID,VID));
      EdgeComVH.GetDat(SrcDstNIDPr).AddKey(JoinCID);
      ComEdgesV[JoinCID]++;
    }
  }
  CIDNSetV[JoinCID].AddKey(NID);
  NIDComVH.GetDat(NID).AddKey(JoinCID);
  NIDCIDPrS.AddKey(TIntPr(NID, JoinCID));
}

Here is the call graph for this function:

Here is the caller graph for this function:

void TAGMFit::LeaveCom ( const int &  NID,
const int &  CID 
)

After MCMC, NID leaves community CID.

Definition at line 293 of file agmfit.cpp.

References CIDNSetV, ComEdgesV, THash< TKey, TDat, THashFunc >::DelKey(), THashSet< TKey, THashFunc >::DelKey(), EdgeComVH, G, THash< TKey, TDat, THashFunc >::GetDat(), TUNGraph::TNodeI::GetDeg(), TUNGraph::TNodeI::GetNbrNId(), TUNGraph::GetNI(), THashSet< TKey, THashFunc >::IsKey(), TMath::Mn(), TMath::Mx(), NIDCIDPrS, and NIDComVH.

Referenced by RunMCMC().

                                                     {
  TUNGraph::TNodeI NI = G->GetNI(NID);
  for (int e = 0; e < NI.GetDeg(); e++) {
    int VID = NI.GetNbrNId(e);
    if (NIDComVH.GetDat(VID).IsKey(CID)) {
      TIntPr SrcDstNIDPr = TIntPr(TMath::Mn(NID,VID), TMath::Mx(NID,VID));
      EdgeComVH.GetDat(SrcDstNIDPr).DelKey(CID);
      ComEdgesV[CID]--;
    }
  }
  CIDNSetV[CID].DelKey(NID);
  NIDComVH.GetDat(NID).DelKey(CID);
  NIDCIDPrS.DelKey(TIntPr(NID, CID));
}

Here is the call graph for this function:

Here is the caller graph for this function:

double TAGMFit::Likelihood ( )

COMMENT.

Definition at line 104 of file agmfit.cpp.

References LambdaV.

Referenced by TAGMUtil::FindComsByAGM(), GetStepSizeByLineSearchForLambda(), MLEGradAscentGivenCAG(), PrintSummary(), and RunMCMC().

                           { 
  return Likelihood(LambdaV); 
}

Here is the caller graph for this function:

double TAGMFit::Likelihood ( const TFltV NewLambdaV) [inline]

Definition at line 43 of file agmfit.h.

References Likelihood().

Referenced by Likelihood().

{ double Tmp1, Tmp2; return Likelihood(NewLambdaV, Tmp1, Tmp2); }

Here is the call graph for this function:

Here is the caller graph for this function:

double TAGMFit::Likelihood ( const TFltV NewLambdaV,
double &  LEdges,
double &  LNoEdges 
)

Definition at line 76 of file agmfit.cpp.

References CIDNSetV, ComEdgesV, EdgeComVH, IAssert, THash< TKey, TDat, THashFunc >::Len(), TVec< TVal, TSizeTy >::Len(), THashSet< TKey, THashFunc >::Len(), TFlt::Mn, PNoCom, RegCoef, SelectLambdaSum(), and TLinAlg::SumVec().

                                                                                    {
  IAssert(CIDNSetV.Len() == NewLambdaV.Len());
  IAssert(ComEdgesV.Len() == CIDNSetV.Len());
  LEdges = 0.0; LNoEdges = 0.0;
  for (int e = 0; e < EdgeComVH.Len(); e++) {
    TIntSet& JointCom = EdgeComVH[e];
    double LambdaSum = SelectLambdaSum(NewLambdaV, JointCom);
    double Puv = 1 - exp(- LambdaSum);
    if (JointCom.Len() == 0) {  Puv = PNoCom;  }
    IAssert(! _isnan(log(Puv)));
    LEdges += log(Puv);
  }
  for (int k = 0; k < NewLambdaV.Len(); k++) {
    int MaxEk = CIDNSetV[k].Len() * (CIDNSetV[k].Len() - 1) / 2;
    int NotEdgesInCom = MaxEk - ComEdgesV[k];
    if(NotEdgesInCom > 0) {
      if (LNoEdges >= TFlt::Mn + double(NotEdgesInCom) * NewLambdaV[k]) { 
        LNoEdges -= double(NotEdgesInCom) * NewLambdaV[k];
      }
    }
  }
  double LReg = 0.0;
  if (RegCoef > 0.0) {
    LReg = - RegCoef * TLinAlg::SumVec(NewLambdaV);
  }
  return LEdges + LNoEdges + LReg;
}

Here is the call graph for this function:

void TAGMFit::Load ( TSIn SIn,
const int &  RndSeed = 0 
)
int TAGMFit::MLEGradAscentGivenCAG ( const double &  Thres = 0.001,
const int &  MaxIter = 10000,
const TStr  PlotNm = TStr() 
)

Gradient descent for p_c while keeping the community affiliation graph (CAG) fixed.

Definition at line 130 of file agmfit.cpp.

References TVec< TVal, TSizeTy >::Add(), TStr::Empty(), G, TUNGraph::GetEdges(), GetStepSizeByLineSearchForLambda(), TExeTm::GetTmStr(), GradLogLForLambda(), Kilo, LambdaV, TVec< TVal, TSizeTy >::Len(), Likelihood(), MaxLambda, MinLambda, TLinAlg::Norm(), and TGnuPlot::PlotValV().

Referenced by TAGMUtil::FindComsByAGM(), and RunMCMC().

                                                                                             {
  int Edges = G->GetEdges();
  TExeTm ExeTm;
  TFltV GradV(LambdaV.Len());
  int iter = 0;
  TIntFltPrV IterLV, IterGradNormV;
  double GradCutOff = 1000;
  for (iter = 0; iter < MaxIter; iter++) {
    GradLogLForLambda(GradV);    //if gradient is going out of the boundary, cut off
    for (int i = 0; i < LambdaV.Len(); i++) {
      if (GradV[i] < -GradCutOff) { GradV[i] = -GradCutOff; }
      if (GradV[i] > GradCutOff) { GradV[i] = GradCutOff; }
      if (LambdaV[i] <= MinLambda && GradV[i] < 0) { GradV[i] = 0.0; }
      if (LambdaV[i] >= MaxLambda && GradV[i] > 0) { GradV[i] = 0.0; }
    }
    double Alpha = 0.15, Beta = 0.2;
    if (Edges > Kilo(100)) { Alpha = 0.00015; Beta = 0.3;}
    double LearnRate = GetStepSizeByLineSearchForLambda(GradV, GradV, Alpha, Beta);
    if (TLinAlg::Norm(GradV) < Thres) { break; }
    for (int i = 0; i < LambdaV.Len(); i++) {
      double Change = LearnRate * GradV[i];
      LambdaV[i] += Change;
      if(LambdaV[i] < MinLambda) { LambdaV[i] = MinLambda;}
      if(LambdaV[i] > MaxLambda) { LambdaV[i] = MaxLambda;}
    }
    if (! PlotNm.Empty()) {
      double L = Likelihood();
      IterLV.Add(TIntFltPr(iter, L));
      IterGradNormV.Add(TIntFltPr(iter, TLinAlg::Norm(GradV)));
    }
  }
  if (! PlotNm.Empty()) {
    TGnuPlot::PlotValV(IterLV, PlotNm + ".likelihood_Q");
    TGnuPlot::PlotValV(IterGradNormV, PlotNm + ".gradnorm_Q");
    printf("MLE for Lambda completed with %d iterations(%s)\n",iter,ExeTm.GetTmStr());
  }
  return iter;
}

Here is the call graph for this function:

Here is the caller graph for this function:

void TAGMFit::NeighborComInit ( const int  InitComs)

Initialize node community memberships using best neighborhood communities (see D. Gleich et al. KDD'12).

Definition at line 186 of file agmfit.cpp.

References CIDNSetV, G, TVec< TVal, TSizeTy >::Gen(), TAGMUtil::GetConductance(), TUNGraph::TNodeI::GetDeg(), TUNGraph::GetEdges(), TUNGraph::TNodeI::GetId(), TAGMUtil::GetNbhCom(), TUNGraph::TNodeI::GetNbrNId(), TUNGraph::GetNI(), TUNGraph::GetNIdV(), TUNGraph::GetNodes(), TUNGraph::GetRndNI(), TExeTm::GetTmStr(), IAssert, InitNodeData(), TVec< TVal, TSizeTy >::Len(), and SetDefaultPNoCom().

Referenced by TAGMFit().

                                                {
  CIDNSetV.Gen(InitComs);
  const int Edges = G->GetEdges();
  TFltIntPrV NIdPhiV(G->GetNodes(), 0);
  TIntSet InvalidNIDS(G->GetNodes());
  TIntV ChosenNIDV(InitComs, 0); //FOR DEBUG
  TExeTm RunTm;
  //compute conductance of neighborhood community
  TIntV NIdV;
  G->GetNIdV(NIdV);
  for (int u = 0; u < NIdV.Len(); u++) {
    TIntSet NBCmty(G->GetNI(NIdV[u]).GetDeg() + 1);
    double Phi;
    if (G->GetNI(NIdV[u]).GetDeg() < 5) { //do not include nodes with too few degree
      Phi = 1.0; 
    } else {
      TAGMUtil::GetNbhCom(G, NIdV[u], NBCmty);
      IAssert(NBCmty.Len() == G->GetNI(NIdV[u]).GetDeg() + 1);
      Phi = TAGMUtil::GetConductance(G, NBCmty, Edges);
    }
    NIdPhiV.Add(TFltIntPr(Phi, NIdV[u]));
  }
  NIdPhiV.Sort(true);
  printf("conductance computation completed [%s]\n", RunTm.GetTmStr());
  fflush(stdout);
  //choose nodes with local minimum in conductance
  int CurCID = 0;
  for (int ui = 0; ui < NIdPhiV.Len(); ui++) {
    int UID = NIdPhiV[ui].Val2;
    fflush(stdout);
    if (InvalidNIDS.IsKey(UID)) { continue; }
    ChosenNIDV.Add(UID); //FOR DEBUG
    //add the node and its neighbors to the current community
    CIDNSetV[CurCID].AddKey(UID);
    TUNGraph::TNodeI NI = G->GetNI(UID);
    fflush(stdout);
    for (int e = 0; e < NI.GetDeg(); e++) {
      CIDNSetV[CurCID].AddKey(NI.GetNbrNId(e));
    }
    //exclude its neighbors from the next considerations
    for (int e = 0; e < NI.GetDeg(); e++) {
      InvalidNIDS.AddKey(NI.GetNbrNId(e));
    }
    CurCID++;
    fflush(stdout);
    if (CurCID >= InitComs) { break;  }
  }
  if (InitComs > CurCID) {
    printf("%d communities needed to fill randomly\n", InitComs - CurCID);
  }
  //assign a member to zero-member community (if any)
  for (int c = 0; c < CIDNSetV.Len(); c++) {
    if (CIDNSetV[c].Len() == 0) {
      int ComSz = 10;
      for (int u = 0; u < ComSz; u++) {
        int UID = G->GetRndNI().GetId();
        CIDNSetV[c].AddKey(UID);
      }
    }
  }
  InitNodeData();
  SetDefaultPNoCom();
}

Here is the call graph for this function:

Here is the caller graph for this function:

COMMENT.

Definition at line 659 of file agmfit.cpp.

References THash< TKey, TDat, THashFunc >::AddDat(), CIDNSetV, ComEdgesV, LambdaV, THash< TKey, TDat, THashFunc >::Len(), TVec< TVal, TSizeTy >::Len(), Likelihood(), NIDCIDPrS, PNoCom, and TFlt::Val.

                           {
  TIntFltH CIDLambdaH(CIDNSetV.Len());
  for (int c = 0; c < CIDNSetV.Len(); c++) {
    CIDLambdaH.AddDat(c, LambdaV[c]);
  }
  CIDLambdaH.SortByDat(false);
  int Coms = 0;
  for (int i = 0; i < LambdaV.Len(); i++) {
    int CID = CIDLambdaH.GetKey(i);
    if (LambdaV[CID] <= 0.0001) { continue; }
    printf("P_c : %.3f Com Sz: %d, Total Edges inside: %d \n", 1.0 - exp(- LambdaV[CID]), CIDNSetV[CID].Len(), (int) ComEdgesV[CID]);
    Coms++;
  }
  printf("%d Communities, Total Memberships = %d, Likelihood = %.2f, Epsilon = %f\n", Coms, NIDCIDPrS.Len(), Likelihood(), PNoCom.Val);
}

Here is the call graph for this function:

void TAGMFit::RandomInit ( const int &  MaxK)

COMMENT.

Definition at line 169 of file agmfit.cpp.

References TVec< TVal, TSizeTy >::Add(), THashSet< TKey, THashFunc >::AddKey(), CIDNSetV, TVec< TVal, TSizeTy >::Clr(), G, TUNGraph::TNodeI::GetDeg(), TUNGraph::TNodeI::GetId(), TUNGraph::TNodeI::GetNbrNId(), TUNGraph::GetNI(), TRnd::GetUniDevInt(), InitNodeData(), TVec< TVal, TSizeTy >::Last(), Rnd, and SetDefaultPNoCom().

                                        {
  CIDNSetV.Clr();
  for (int c = 0; c < MaxK; c++) {
    CIDNSetV.Add();
    int NC = Rnd.GetUniDevInt(G -> GetNodes());
    TUNGraph::TNodeI NI = G -> GetRndNI();
    CIDNSetV.Last().AddKey(NI.GetId());
    for (int v = 0; v < NC; v++) {
      NI = G->GetNI(NI.GetNbrNId(Rnd.GetUniDevInt(NI.GetDeg())));
      CIDNSetV.Last().AddKey(NI.GetId());
    }
  }
  InitNodeData();
  SetDefaultPNoCom();
}

Here is the call graph for this function:

void TAGMFit::RandomInitCmtyVV ( const int  InitComs,
const double  ComSzAlpha = 1.3,
const double  MemAlpha = 1.8,
const int  MinComSz = 8,
const int  MaxComSz = 200,
const int  MinMem = 1,
const int  MaxMem = 10 
)

Randomly initialize bipartite community affiliation graph.

Definition at line 42 of file agmfit.cpp.

References G, TAGMUtil::GenCmtyVVFromPL(), TUNGraph::GetNodes(), Rnd, and SetCmtyVV().

                                                                                                                                                                             {
  TVec<TIntV> InitCmtyVV(InitComs, 0);
  TAGMUtil::GenCmtyVVFromPL(InitCmtyVV, G, G->GetNodes(), InitComs, ComSzAlpha, MemAlpha, MinComSz, MaxComSz,
      MinMem, MaxMem, Rnd);
  SetCmtyVV(InitCmtyVV);
}

Here is the call graph for this function:

Remove all communities with no members.

Definition at line 459 of file agmfit.cpp.

References CIDNSetV, TVec< TVal, TSizeTy >::DelLast(), LambdaV, TVec< TVal, TSizeTy >::Last(), and TVec< TVal, TSizeTy >::Len().

                            {
  int DelCnt = 0;
  for (int c = 0; c < CIDNSetV.Len(); c++) {
    if (CIDNSetV[c].Len() == 0) {
      CIDNSetV[c] = CIDNSetV.Last();
      CIDNSetV.DelLast();
      LambdaV[c] = LambdaV.Last();
      LambdaV.DelLast();
      DelCnt++;
      c--;
    }
  }
  return DelCnt;
}

Here is the call graph for this function:

void TAGMFit::RunMCMC ( const int &  MaxIter,
const int &  EvalLambdaIter,
const TStr PlotFPrx = TStr() 
)

Main procedure for fitting the AGM to a given graph using MCMC.

MCMC fitting.

Definition at line 366 of file agmfit.cpp.

References TVec< TVal, TSizeTy >::Add(), TGnuPlot::AddPlot(), BaseCID, CIDNSetV, TStr::Fmt(), G, TUNGraph::GetEdges(), TUNGraph::GetNodes(), TExeTm::GetTmStr(), TRnd::GetUniDev(), gpwLinesPoints, InitNodeData(), JoinCom(), LambdaV, LeaveCom(), TStr::Len(), TVec< TVal, TSizeTy >::Len(), Likelihood(), MLEGradAscentGivenCAG(), TMath::Mx(), Rnd, SampleTransition(), TGnuPlot::SavePng(), TGnuPlot::SetDataPlotFNm(), TGnuPlot::SetTitle(), and TExeTm::Tick().

Referenced by TAGMUtil::FindComsByAGM().

                                                                                         {
  TExeTm IterTm, TotalTm;
  double PrevL = Likelihood(), DeltaL = 0;
  double BestL = PrevL;
  printf("initial likelihood = %f\n",PrevL);
  TIntFltPrV IterTrueLV, IterJoinV, IterLeaveV, IterAcceptV, IterSwitchV, IterLBV;
  TIntPrV IterTotMemV;
  TIntV IterV;
  TFltV BestLV;
  TVec<TIntSet> BestCmtySetV;
  int SwitchCnt = 0, LeaveCnt = 0, JoinCnt = 0, AcceptCnt = 0, ProbBinSz;
  int Nodes = G->GetNodes(), Edges = G->GetEdges();
  TExeTm PlotTm;
  ProbBinSz = TMath::Mx(1000, G->GetNodes() / 10); //bin to compute probabilities
  IterLBV.Add(TIntFltPr(1, BestL));

  for (int iter = 0; iter < MaxIter; iter++) {
    IterTm.Tick();
    int NID = -1;
    int JoinCID = -1, LeaveCID = -1;
    SampleTransition(NID, JoinCID, LeaveCID, DeltaL); //sample a move
    double OptL = PrevL;
    if (DeltaL > 0 || Rnd.GetUniDev() < exp(DeltaL)) { //if it is accepted
      IterTm.Tick();
      if (LeaveCID > -1 && LeaveCID != BaseCID) { LeaveCom(NID, LeaveCID); }
      if (JoinCID > -1 && JoinCID != BaseCID) { JoinCom(NID, JoinCID); }
      if (LeaveCID > -1 && JoinCID > -1 && JoinCID != BaseCID && LeaveCID != BaseCID) { SwitchCnt++; }
      else if (LeaveCID > -1  && LeaveCID != BaseCID) { LeaveCnt++;}
      else if (JoinCID > -1 && JoinCID != BaseCID) { JoinCnt++;}
      AcceptCnt++;
      if ((iter + 1) % EvalLambdaIter == 0) {
        IterTm.Tick();
        MLEGradAscentGivenCAG(0.01, 3);
        OptL = Likelihood();
      }
      else{
        OptL = PrevL + DeltaL;
      }
      if (BestL <= OptL && CIDNSetV.Len() > 0) {
        BestCmtySetV = CIDNSetV;
        BestLV = LambdaV;
        BestL = OptL;
      }
    }
    if (iter > 0 && (iter % ProbBinSz == 0) && PlotFPrx.Len() > 0) { 
      IterLBV.Add(TIntFltPr(iter, OptL));
      IterSwitchV.Add(TIntFltPr(iter, (double) SwitchCnt / (double) AcceptCnt));
      IterLeaveV.Add(TIntFltPr(iter, (double) LeaveCnt / (double) AcceptCnt));
      IterJoinV.Add(TIntFltPr(iter, (double) JoinCnt / (double) AcceptCnt));
      IterAcceptV.Add(TIntFltPr(iter, (double) AcceptCnt / (double) ProbBinSz));
      SwitchCnt = JoinCnt = LeaveCnt = AcceptCnt = 0;
    }
    PrevL = OptL;
    if ((iter + 1) % 10000 == 0) {
      printf("\r%d iterations completed [%.2f]", iter, (double) iter / (double) MaxIter);
    }
  }
  
  // plot the likelihood and acceptance probabilities if the plot file name is given
  if (PlotFPrx.Len() > 0) {
    TGnuPlot GP1;
    GP1.AddPlot(IterLBV, gpwLinesPoints, "likelihood");
    GP1.SetDataPlotFNm(PlotFPrx + ".likelihood.tab", PlotFPrx + ".likelihood.plt");
    TStr TitleStr = TStr::Fmt(" N:%d E:%d", Nodes, Edges);
    GP1.SetTitle(PlotFPrx + ".likelihood" + TitleStr);
    GP1.SavePng(PlotFPrx + ".likelihood.png");

    TGnuPlot GP2;
    GP2.AddPlot(IterSwitchV, gpwLinesPoints, "Switch");
    GP2.AddPlot(IterLeaveV, gpwLinesPoints, "Leave");
    GP2.AddPlot(IterJoinV, gpwLinesPoints, "Join");
    GP2.AddPlot(IterAcceptV, gpwLinesPoints, "Accept");
    GP2.SetTitle(PlotFPrx + ".transition");
    GP2.SetDataPlotFNm(PlotFPrx + "transition_prob.tab", PlotFPrx + "transition_prob.plt");
    GP2.SavePng(PlotFPrx + "transition_prob.png");
  }
  CIDNSetV = BestCmtySetV;
  LambdaV = BestLV;

  InitNodeData();
  MLEGradAscentGivenCAG(0.001, 100);
  printf("\nMCMC completed (best likelihood: %.2f) [%s]\n", BestL, TotalTm.GetTmStr());
}

Here is the call graph for this function:

Here is the caller graph for this function:

void TAGMFit::SampleTransition ( int &  NID,
int &  JoinCID,
int &  LeaveCID,
double &  DeltaL 
)

Sample MMCM transitions: Choose among (join, leave, switch), and then sample (NID, CID).

Definition at line 325 of file agmfit.cpp.

References BaseCID, CIDNSetV, G, THash< TKey, TDat, THashFunc >::GetDat(), THash< TKey, TDat, THashFunc >::GetKey(), TUNGraph::GetNodes(), THash< TKey, TDat, THashFunc >::GetRndKeyId(), TUNGraph::GetRndNId(), TRnd::GetUniDevInt(), THash< TKey, TDat, THashFunc >::IsKey(), THash< TKey, TDat, THashFunc >::Len(), TVec< TVal, TSizeTy >::Len(), THashSet< TKey, THashFunc >::Len(), TFlt::Mn, NIDCIDPrS, NIDComVH, Rnd, SeekJoin(), SeekLeave(), SeekSwitch(), TPair< TVal1, TVal2 >::Val1, and TPair< TVal1, TVal2 >::Val2.

Referenced by RunMCMC().

                                                                                    {
  int Option = Rnd.GetUniDevInt(3); //0:Join 1:Leave 2:Switch
  if (NIDCIDPrS.Len() <= 1) {    Option = 0;  } //if there is only one node membership, only join is possible.
  int TryCnt = 0;
  const int MaxTryCnt = G->GetNodes();
  DeltaL = TFlt::Mn;
  if (Option == 0) {
    do {
      JoinCID = Rnd.GetUniDevInt(CIDNSetV.Len());
      NID = G->GetRndNId();
    } while (TryCnt++ < MaxTryCnt && NIDCIDPrS.IsKey(TIntPr(NID, JoinCID)));
    if (TryCnt < MaxTryCnt) { //if successfully find a move
      DeltaL = SeekJoin(NID, JoinCID);
    }
  }
  else if (Option == 1) {
    do {
      TIntPr NIDCIDPr = NIDCIDPrS.GetKey(NIDCIDPrS.GetRndKeyId(Rnd));
      NID = NIDCIDPr.Val1;
      LeaveCID = NIDCIDPr.Val2;
    } while (TryCnt++ < MaxTryCnt && LeaveCID == BaseCID);
    if (TryCnt < MaxTryCnt) {//if successfully find a move
      DeltaL = SeekLeave(NID, LeaveCID);
    }
  }
  else{
    do {
      TIntPr NIDCIDPr = NIDCIDPrS.GetKey(NIDCIDPrS.GetRndKeyId(Rnd));
      NID = NIDCIDPr.Val1;
      LeaveCID = NIDCIDPr.Val2;
    } while (TryCnt++ < MaxTryCnt && (NIDComVH.GetDat(NID).Len() == CIDNSetV.Len() || LeaveCID == BaseCID));
    do {
      JoinCID = Rnd.GetUniDevInt(CIDNSetV.Len());
    } while (TryCnt++ < G->GetNodes() && NIDCIDPrS.IsKey(TIntPr(NID, JoinCID)));
    if (TryCnt < MaxTryCnt) {//if successfully find a move
      DeltaL = SeekSwitch(NID, LeaveCID, JoinCID);
    }
  }
}

Here is the call graph for this function:

Here is the caller graph for this function:

void TAGMFit::Save ( TSOut SOut)

Definition at line 8 of file agmfit.cpp.

References BaseCID, CIDNSetV, ComEdgesV, EdgeComVH, G, LambdaV, MaxLambda, MinLambda, NIDCIDPrH, NIDCIDPrS, NIDComVH, PNoCom, RegCoef, THash< TKey, TDat, THashFunc >::Save(), TUNGraph::Save(), TVec< TVal, TSizeTy >::Save(), TInt::Save(), and TFlt::Save().

                              {
  G->Save(SOut);
  CIDNSetV.Save(SOut);
  EdgeComVH.Save(SOut);
  NIDComVH.Save(SOut);
  ComEdgesV.Save(SOut);
  PNoCom.Save(SOut);
  LambdaV.Save(SOut);
  NIDCIDPrH.Save(SOut);
  NIDCIDPrS.Save(SOut);
  MinLambda.Save(SOut);
  MaxLambda.Save(SOut);
  RegCoef.Save(SOut);
  BaseCID.Save(SOut);
}

Here is the call graph for this function:

double TAGMFit::SeekJoin ( const int &  UID,
const int &  CID 
)

Compute the change in likelihood (Delta) if node UID joins community CID.

Definition at line 502 of file agmfit.cpp.

References CIDNSetV, EdgeComVH, G, THash< TKey, TDat, THashFunc >::GetDat(), TUNGraph::TNodeI::GetDeg(), TUNGraph::TNodeI::GetNbrNId(), TUNGraph::GetNI(), IAssert, THashSet< TKey, THashFunc >::IsKey(), LambdaV, TVec< TVal, TSizeTy >::Len(), THashSet< TKey, THashFunc >::Len(), TMath::Mn(), TMath::Mx(), NIDComVH, PNoCom, and SelectLambdaSum().

Referenced by SampleTransition(), and SeekSwitch().

                                                       {
  IAssert(! CIDNSetV[CID].IsKey(UID));
  double Delta = 0.0;
  TUNGraph::TNodeI NI = G->GetNI(UID);
  int NbhsInC = 0;
  for (int e = 0; e < NI.GetDeg(); e++) {
    const int VID = NI.GetNbrNId(e);
    if (! NIDComVH.GetDat(VID).IsKey(CID)) { continue; }
    TIntPr SrcDstNIDPr(TMath::Mn(UID,VID), TMath::Mx(UID,VID));
    TIntSet& JointCom = EdgeComVH.GetDat(SrcDstNIDPr);
    double CurPuv, NewPuv, LambdaSum = SelectLambdaSum(JointCom);
    CurPuv = 1 - exp(- LambdaSum);
    if (JointCom.Len() == 0) { CurPuv = PNoCom; }
    NewPuv = 1 - exp(- LambdaSum - LambdaV[CID]);
    Delta += (log(NewPuv) - log(CurPuv));
    IAssert(!_isnan(Delta));
    NbhsInC++;
  }
  Delta -= LambdaV[CID] * (CIDNSetV[CID].Len() - NbhsInC);
  return Delta;
}

Here is the call graph for this function:

Here is the caller graph for this function:

double TAGMFit::SeekLeave ( const int &  UID,
const int &  CID 
)

Compute the change in likelihood (Delta) if node UID leaves community CID.

Definition at line 475 of file agmfit.cpp.

References CIDNSetV, EdgeComVH, G, THash< TKey, TDat, THashFunc >::GetDat(), TUNGraph::TNodeI::GetDeg(), TUNGraph::TNodeI::GetNbrNId(), TUNGraph::GetNI(), IAssert, THashSet< TKey, THashFunc >::IsKey(), TUNGraph::IsNode(), LambdaV, TVec< TVal, TSizeTy >::Len(), THashSet< TKey, THashFunc >::Len(), TMath::Mn(), TMath::Mx(), NIDComVH, PNoCom, and SelectLambdaSum().

Referenced by SampleTransition(), and SeekSwitch().

                                                        {
  IAssert(CIDNSetV[CID].IsKey(UID));
  IAssert(G->IsNode(UID));
  double Delta = 0.0;
  TUNGraph::TNodeI NI = G->GetNI(UID);
  int NbhsInC = 0;
  for (int e = 0; e < NI.GetDeg(); e++) {
    const int VID = NI.GetNbrNId(e);
    if (! NIDComVH.GetDat(VID).IsKey(CID)) { continue; }
    TIntPr SrcDstNIDPr(TMath::Mn(UID,VID), TMath::Mx(UID,VID));
    TIntSet& JointCom = EdgeComVH.GetDat(SrcDstNIDPr);
    double CurPuv, NewPuv, LambdaSum = SelectLambdaSum(JointCom);
    CurPuv = 1 - exp(- LambdaSum);
    NewPuv = 1 - exp(- LambdaSum + LambdaV[CID]);
    IAssert(JointCom.Len() > 0);
    if (JointCom.Len() == 1) {
      NewPuv = PNoCom;
    }
    Delta += (log(NewPuv) - log(CurPuv));
    IAssert(!_isnan(Delta));
    NbhsInC++;
  }
  Delta += LambdaV[CID] * (CIDNSetV[CID].Len() - 1 - NbhsInC);
  return Delta;
}

Here is the call graph for this function:

Here is the caller graph for this function:

double TAGMFit::SeekSwitch ( const int &  UID,
const int &  CurCID,
const int &  NewCID 
)

Definition at line 525 of file agmfit.cpp.

References CIDNSetV, EdgeComVH, G, THash< TKey, TDat, THashFunc >::GetDat(), TUNGraph::TNodeI::GetDeg(), TUNGraph::TNodeI::GetNbrNId(), TUNGraph::GetNI(), IAssert, THashSet< TKey, THashFunc >::IsKey(), LambdaV, THashSet< TKey, THashFunc >::Len(), TMath::Mn(), TMath::Mx(), NIDComVH, PNoCom, SeekJoin(), SeekLeave(), SelectLambdaSum(), and TFlt::Val.

Referenced by SampleTransition().

                                                                               {
  IAssert(! CIDNSetV[NewCID].IsKey(UID));
  IAssert(CIDNSetV[CurCID].IsKey(UID));
  double Delta = SeekJoin(UID, NewCID) + SeekLeave(UID, CurCID);
  //correct only for intersection between new com and current com
  TUNGraph::TNodeI NI = G->GetNI(UID);
  for (int e = 0; e < NI.GetDeg(); e++) {
    const int VID = NI.GetNbrNId(e);
    if (! NIDComVH.GetDat(VID).IsKey(CurCID) || ! NIDComVH.GetDat(VID).IsKey(NewCID)) {continue;}
    TIntPr SrcDstNIDPr(TMath::Mn(UID,VID), TMath::Mx(UID,VID));
    TIntSet& JointCom = EdgeComVH.GetDat(SrcDstNIDPr);
    double CurPuv, NewPuvAfterJoin, NewPuvAfterLeave, NewPuvAfterSwitch, LambdaSum = SelectLambdaSum(JointCom);
    CurPuv = 1 - exp(- LambdaSum);
    NewPuvAfterLeave = 1 - exp(- LambdaSum + LambdaV[CurCID]);
    NewPuvAfterJoin = 1 - exp(- LambdaSum - LambdaV[NewCID]);
    NewPuvAfterSwitch = 1 - exp(- LambdaSum - LambdaV[NewCID] + LambdaV[CurCID]);
    if (JointCom.Len() == 1 || NewPuvAfterLeave == 0.0) {
      NewPuvAfterLeave = PNoCom;
    }
    Delta += (log(NewPuvAfterSwitch) + log(CurPuv) - log(NewPuvAfterLeave) - log(NewPuvAfterJoin));
    if (_isnan(Delta)) {
      printf("NS:%f C:%f NL:%f NJ:%f PNoCom:%f", NewPuvAfterSwitch, CurPuv, NewPuvAfterLeave, NewPuvAfterJoin, PNoCom.Val);
    }
    IAssert(!_isnan(Delta));
  }
  return Delta;
}

Here is the call graph for this function:

Here is the caller graph for this function:

double TAGMFit::SelectLambdaSum ( const TIntSet ComK)

Compute sum of lambda_c (which is log (1 - p_c)) over C_uv (ComK). The function is used to compute edge probability P_uv.

Definition at line 618 of file agmfit.cpp.

References LambdaV.

Referenced by GradLogLForLambda(), Likelihood(), SeekJoin(), SeekLeave(), and SeekSwitch().

                                                   { 
  return SelectLambdaSum(LambdaV, ComK); 
}

Here is the caller graph for this function:

double TAGMFit::SelectLambdaSum ( const TFltV NewLambdaV,
const TIntSet ComK 
)

COMMENT.

Definition at line 622 of file agmfit.cpp.

References THashSet< TKey, THashFunc >::BegI(), THashSet< TKey, THashFunc >::EndI(), and IAssert.

                                                                            {
  double Result = 0.0;
  for (TIntSet::TIter SI = ComK.BegI(); SI < ComK.EndI(); SI++) {
    IAssert(NewLambdaV[SI.GetKey()] >= 0);
    Result += NewLambdaV[SI.GetKey()];
  }
  return Result;
}

Here is the call graph for this function:

void TAGMFit::SetCmtyVV ( const TVec< TIntV > &  CmtyVV)

COMMENT.

Definition at line 584 of file agmfit.cpp.

References CIDNSetV, G, TVec< TVal, TSizeTy >::Gen(), IAssert, InitNodeData(), TUNGraph::IsNode(), TVec< TVal, TSizeTy >::Len(), and SetDefaultPNoCom().

Referenced by AddBaseCmty(), RandomInitCmtyVV(), and TAGMFit().

                                                 {
  CIDNSetV.Gen(CmtyVV.Len());
  for (int c = 0; c < CIDNSetV.Len(); c++) {
    CIDNSetV[c].AddKeyV(CmtyVV[c]);
    for (int j = 0; j < CmtyVV[c].Len(); j++) { IAssert(G->IsNode(CmtyVV[c][j])); } // check whether the member nodes exist in the graph
  }
  InitNodeData();
  SetDefaultPNoCom();
}

Here is the call graph for this function:

Here is the caller graph for this function:

Set Epsilon (the probability that two nodes sharing no communities connect) to the default value.

Definition at line 72 of file agmfit.cpp.

References G, TUNGraph::GetNodes(), and PNoCom.

Referenced by NeighborComInit(), RandomInit(), and SetCmtyVV().

                               {
  PNoCom = 1.0 / (double) G->GetNodes() / (double) G->GetNodes();
}

Here is the call graph for this function:

Here is the caller graph for this function:

void TAGMFit::SetLambdaV ( const TFltV LambdaPt) [inline]

COMMENT.

Definition at line 91 of file agmfit.h.

References LambdaV.

{LambdaV = LambdaPt;}
void TAGMFit::SetPNoCom ( const double &  Epsilon) [inline]

Set Epsilon (the probability that two nodes sharing no communities connect) to the default value.

Definition at line 57 of file agmfit.h.

References BaseCID, and PNoCom.

Referenced by TAGMUtil::FindComsByAGM().

{ if (BaseCID == -1 && Epsilon > 0.0) { PNoCom = Epsilon; } }

Here is the caller graph for this function:

void TAGMFit::SetRegCoef ( const double  Val) [inline]

Definition at line 45 of file agmfit.h.

References RegCoef.

Referenced by TAGMUtil::FindComsByAGM().

{ RegCoef = Val; }

Here is the caller graph for this function:


Member Data Documentation

ID of the Epsilon-community (in case we fit P_c of the epsilon community). We do not fit for the Epsilon-community in general.

Definition at line 23 of file agmfit.h.

Referenced by AddBaseCmty(), GetCmtyVV(), Load(), RunMCMC(), SampleTransition(), Save(), and SetPNoCom().

The number of edges in each community.

Definition at line 14 of file agmfit.h.

Referenced by GetEdgeJointCom(), GradLogLForLambda(), InitNodeData(), JoinCom(), LeaveCom(), Likelihood(), Load(), PrintSummary(), and Save().

Edge -> Shared Community ID Set.

Definition at line 12 of file agmfit.h.

Referenced by GetEdgeJointCom(), GradLogLForLambda(), JoinCom(), LeaveCom(), Likelihood(), Load(), Save(), SeekJoin(), SeekLeave(), and SeekSwitch().

Maximum value of regularization parameter lambda (default = 10).

Definition at line 21 of file agmfit.h.

Referenced by GetStepSizeByLineSearchForLambda(), InitNodeData(), Load(), MLEGradAscentGivenCAG(), and Save().

Minimum value of regularization parameter lambda (default = 1e-5).

Definition at line 20 of file agmfit.h.

Referenced by GetCmtyVV(), GetStepSizeByLineSearchForLambda(), InitNodeData(), Load(), MLEGradAscentGivenCAG(), and Save().

<Node ID, Community ID> pairs (for sampling MCMC moves).

Definition at line 18 of file agmfit.h.

Referenced by Load(), and Save().

<Node ID, Community ID> pairs (for sampling MCMC moves).

Definition at line 19 of file agmfit.h.

Referenced by InitNodeData(), JoinCom(), LeaveCom(), Load(), PrintSummary(), SampleTransition(), and Save().

Node ID -> Communitie IDs the node belongs to.

Definition at line 13 of file agmfit.h.

Referenced by CalcPNoComByCmtyVV(), GetEdgeJointCom(), InitNodeData(), JoinCom(), LeaveCom(), Load(), SampleTransition(), Save(), SeekJoin(), SeekLeave(), and SeekSwitch().

TFlt TAGMFit::PNoCom [private]

Probability of edge when two nodes share no community (epsilon in the paper).

Definition at line 15 of file agmfit.h.

Referenced by AddBaseCmty(), CalcPNoComByCmtyVV(), GetPNoCom(), GradLogLForLambda(), Likelihood(), Load(), PrintSummary(), Save(), SeekJoin(), SeekLeave(), SeekSwitch(), SetDefaultPNoCom(), and SetPNoCom().

Regularization parameter when we fit for P_c (for finding # communities).

Definition at line 22 of file agmfit.h.

Referenced by GradLogLForLambda(), Likelihood(), Load(), Save(), and SetRegCoef().

TRnd TAGMFit::Rnd [private]

Definition at line 17 of file agmfit.h.

Referenced by Load(), RandomInit(), RandomInitCmtyVV(), RunMCMC(), and SampleTransition().


The documentation for this class was generated from the following files: