SNAP Library 2.0, Developer Reference  2013-05-13 16:33:57
SNAP, a general purpose, high performance system for analysis and manipulation of large networks
TSsParser Class Reference

#include <ss.h>

Public Member Functions

 TSsParser (const TStr &FNm, const TSsFmt _SsFmt=ssfTabSep, const bool &_SkipLeadBlanks=false, const bool &_SkipCmt=true, const bool &_SkipEmptyFld=false)
 Constructor.
 TSsParser (const TStr &FNm, const char &Separator, const bool &_SkipLeadBlanks=false, const bool &_SkipCmt=true, const bool &_SkipEmptyFld=false)
 Constructor.
 ~TSsParser ()
bool Next ()
 Loads next line from the input file.
bool NextSlow ()
 Loads next line from the input file (older, slow implementation - deprecated).
int Len () const
 Returns the number of fields in the current line.
int GetFlds () const
 Returns the number of fields in the current line.
uint64 GetLineNo () const
 Returns the line number of the current line.
bool IsCmt () const
 Checks whether the current line is a comment (starts with '#').
bool Eof () const
 Checks for end of file.
const TChA & GetLnStr () const
 Returns the current line.
void ToLc ()
 Transforms the current line to lower case.
const char * GetFld (const int &FldN) const
 Returns the contents of the field at index FldN.
char * GetFld (const int &FldN)
 Returns the contents of the field at index FldN.
const char * operator[] (const int &FldN) const
 Returns the contents of the field at index FldN.
char * operator[] (const int &FldN)
 Returns the contents of the field at index FldN.
bool GetInt (const int &FldN, int &Val) const
 If field FldN is an integer, its value is stored in Val and the function returns true; otherwise it returns false.
int GetInt (const int &FldN) const
 Returns the value of field FldN, which is assumed to be an integer. If the field is not an integer, an exception is thrown.
bool IsInt (const int &FldN) const
 Checks whether field FldN is an integer.
bool GetFlt (const int &FldN, double &Val) const
 If field FldN is a float, its value is stored in Val and the function returns true; otherwise it returns false.
bool IsFlt (const int &FldN) const
 Checks whether field FldN is a float.
double GetFlt (const int &FldN) const
 Returns the value of field FldN, which is assumed to be a floating point number. If the field is not a floating point number, an exception is thrown.
const char * DumpStr () const

Static Public Member Functions

static PSsParser New (const TStr &FNm, const TSsFmt SsFmt)

Private Member Functions

 UndefDefaultCopyAssign (TSsParser)

Private Attributes

TCRef CRef
TSsFmt SsFmt
 Separator type.
bool SkipLeadBlanks
 Ignore leading whitespace characters in a line.
bool SkipCmt
 Skip comments (lines starting with #).
bool SkipEmptyFld
 Skip empty fields (i.e., multiple consecutive separators are considered as one).
uint64 LineCnt
 Number of processed lines so far.
char SplitCh
 Separator character (if one of the non-standard separators is used).
TChA LineStr
 Current line.
TVec< char * > FldV
 Pointers to fields of the current line.
PSIn FInPt
 Pointer to the input file stream.

Friends

class TPt< TSsParser >

Detailed Description

Definition at line 72 of file ss.h.


Constructor & Destructor Documentation

TSsParser::TSsParser ( const TStr &  FNm,
const TSsFmt  _SsFmt = ssfTabSep,
const bool &  _SkipLeadBlanks = false,
const bool &  _SkipCmt = true,
const bool &  _SkipEmptyFld = false 
)

Constructor.

Parameters:
FNm              Input filename. Can be a text file or a compressed file.
_SsFmt           Spreadsheet separator format. Each line is split into fields, with field boundaries defined by _SsFmt.
_SkipLeadBlanks  If true, leading whitespace in the line is ignored.
_SkipCmt         If true, lines starting with '#' are treated as comments and skipped.
_SkipEmptyFld    If true, empty fields (consecutive occurrences of the separator) are ignored.

Definition at line 351 of file ss.cpp.

References FailR, FInPt, TStr::GetFExt(), TZipIn::IsZipExt(), New(), SplitCh, ssfCommaSep, SsFmt, ssfSemicolonSep, ssfSpaceSep, ssfTabSep, ssfVBar, and ssfWhiteSep.

 : SsFmt(_SsFmt),
 SkipLeadBlanks(_SkipLeadBlanks), SkipCmt(_SkipCmt), SkipEmptyFld(_SkipEmptyFld), LineCnt(0), /*Bf(NULL),*/ SplitCh('\t'), LineStr(), FldV(), FInPt(NULL) {
  if (TZipIn::IsZipExt(FNm.GetFExt())) { FInPt = TZipIn::New(FNm); }
  else { FInPt = TFIn::New(FNm); }
  //Bf = new char [BfLen];
  switch(SsFmt) {
    case ssfTabSep : SplitCh = '\t'; break;
    case ssfCommaSep : SplitCh = ','; break;
    case ssfSemicolonSep : SplitCh = ';'; break;
    case ssfVBar : SplitCh = '|'; break;
    case ssfSpaceSep : SplitCh = ' '; break;
    case ssfWhiteSep: SplitCh = ' '; break;
    default: FailR("Unknown separator character.");
  }
}
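
The switch above can be read as a plain lookup from format to split character. The following is a minimal standalone sketch of that mapping; `SsFmtSketch` and `SplitChFor` are hypothetical names introduced here for illustration and are not part of SNAP.

```cpp
#include <stdexcept>

// Local stand-in for TSsFmt (an assumption): only the enumerators that the
// constructor listing switches on are modelled here.
enum class SsFmtSketch { TabSep, CommaSep, SemicolonSep, VBar, SpaceSep, WhiteSep };

// Hypothetical helper: returns the character that the constructor's switch
// assigns to SplitCh for the given format.
char SplitChFor(SsFmtSketch Fmt) {
  switch (Fmt) {
    case SsFmtSketch::TabSep:       return '\t';
    case SsFmtSketch::CommaSep:     return ',';
    case SsFmtSketch::SemicolonSep: return ';';
    case SsFmtSketch::VBar:         return '|';
    case SsFmtSketch::SpaceSep:     return ' ';
    case SsFmtSketch::WhiteSep:     return ' ';  // Next() tests for any whitespace in this mode
  }
  throw std::invalid_argument("Unknown separator format.");
}
```

Note that the space and whitespace formats share the same split character; the difference only shows up in Next(), which checks TCh::IsWs() instead of SplitCh when the format is ssfWhiteSep.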

TSsParser::TSsParser ( const TStr &  FNm,
const char &  Separator,
const bool &  _SkipLeadBlanks = false,
const bool &  _SkipCmt = true,
const bool &  _SkipEmptyFld = false 
)

Constructor.

Parameters:
FNm              Input filename. Can be a text file or a compressed file.
Separator        Spreadsheet separator character. Each line is split into fields, with the Separator character as the field boundary.
_SkipLeadBlanks  If true, leading whitespace in the line is ignored.
_SkipCmt         If true, lines starting with '#' are treated as comments and skipped.
_SkipEmptyFld    If true, empty fields (consecutive occurrences of the separator) are ignored.

Definition at line 367 of file ss.cpp.

References FInPt, TStr::GetFExt(), TZipIn::IsZipExt(), New(), and SplitCh.

 : SsFmt(ssfSpaceSep),
 SkipLeadBlanks(_SkipLeadBlanks), SkipCmt(_SkipCmt), SkipEmptyFld(_SkipEmptyFld), LineCnt(0), /*Bf(NULL),*/ SplitCh('\t'), LineStr(), FldV(), FInPt(NULL) {
  if (TZipIn::IsZipExt(FNm.GetFExt())) { FInPt = TZipIn::New(FNm); }
  else { FInPt = TFIn::New(FNm); }
  SplitCh = Separator;
}

TSsParser::~TSsParser ( )

Definition at line 374 of file ss.cpp.

{
  //if (Bf != NULL) { delete [] Bf; }
}

Member Function Documentation

const char * TSsParser::DumpStr ( ) const

Definition at line 484 of file ss.cpp.

References TChA::Clr(), TChA::CStr(), FldV, TStr::Fmt(), and TVec< TVal, TSizeTy >::Len().

{
  static TChA ChA(10*1024);
  ChA.Clr();
  for (int i = 0; i < FldV.Len(); i++) {
    ChA += TStr::Fmt("  %d: '%s'\n", i, FldV[i]);
  }
  return ChA.CStr();
}
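
Because DumpStr() returns a pointer into a function-local static buffer, the text is overwritten by the next call. The sketch below reproduces that pattern outside SNAP so the consequence is easy to see; `DumpFields` is a hypothetical stand-in, and it uses std::string where the listing uses TChA.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for DumpStr(): formats each field as "  i: 'text'\n"
// into a static buffer and returns a pointer into that buffer.
const char* DumpFields(const std::vector<const char*>& FldV) {
  static std::string Buf;   // shared by all calls, like the static TChA above
  Buf.clear();
  for (std::size_t i = 0; i < FldV.size(); i++) {
    Buf += "  " + std::to_string(i) + ": '" + FldV[i] + "'\n";
  }
  return Buf.c_str();       // valid only until the next call
}
```

A caller that needs the text after a later call should copy it first, for example into a std::string. The shared buffer also suggests the function should not be called from several threads at once.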

bool TSsParser::Eof ( ) const [inline]

Checks for end of file.

Definition at line 122 of file ss.h.

Referenced by TSnap::LoadPajek().

{ return FInPt->Eof(); }

const char* TSsParser::GetFld ( const int &  FldN) const [inline]

Returns the contents of the field at index FldN.

Definition at line 129 of file ss.h.

Referenced by GetFlt(), GetInt(), and TTimeNENet::LoadEdgeTm().

{ return FldV[FldN]; }

char* TSsParser::GetFld ( const int &  FldN) [inline]

Returns the contents of the field at index FldN.

Definition at line 131 of file ss.h.

{ return FldV[FldN]; }

int TSsParser::GetFlds ( ) const [inline]

Returns the number of fields in the current line.

Definition at line 116 of file ss.h.

Referenced by TAGMUtil::LoadCmtyVV(), and TNcpGraphsBase::TNcpGraphsBase().

{ return Len(); }

bool TSsParser::GetFlt ( const int &  FldN,
double &  Val 
) const

If field FldN is a float, its value is stored in Val and the function returns true; otherwise it returns false.

Definition at line 462 of file ss.cpp.

References GetFld(), TCh::IsNum(), and TCh::IsWs().

Referenced by TNcpGraphsBase::TNcpGraphsBase().

{
  // parsing format {ws} [+/-] +{d} ([.]{d}) ([E|e] [+/-] +{d})
  const char *c = GetFld(FldN);
  while (TCh::IsWs(*c)) { c++; }
  if (*c=='+' || *c=='-') { c++; }
  if (! TCh::IsNum(*c) && *c!='.') { return false; }
  while (TCh::IsNum(*c)) { c++; }
  if (*c == '.') {
    c++;
    while (TCh::IsNum(*c)) { c++; }
  }
  if (*c=='e' || *c == 'E') {
    c++;
    if (*c == '+' || *c == '-' ) { c++; }
    if (! TCh::IsNum(*c)) { return false; }
    while (TCh::IsNum(*c)) { c++; }
  }
  if (*c != 0) { return false; }
  Val = atof(GetFld(FldN));
  return true;
}
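
The grammar checked above (optional whitespace, optional sign, digits with an optional fraction, optional exponent, then end of string) can be exercised without SNAP. The following is a minimal sketch, assuming std::isspace and std::isdigit as approximations of TCh::IsWs() and TCh::IsNum(); `ParseFlt` is a hypothetical free function, not a SNAP call.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>

// Hypothetical free function mirroring the checks in the listing above.
// Val is written only when the whole field matches the grammar.
bool ParseFlt(const char* Fld, double& Val) {
  assert(Fld != nullptr);
  auto IsNum = [](char Ch) { return std::isdigit(static_cast<unsigned char>(Ch)) != 0; };
  const char* c = Fld;
  while (std::isspace(static_cast<unsigned char>(*c))) { c++; }  // leading whitespace
  if (*c == '+' || *c == '-') { c++; }                           // optional sign
  if (!IsNum(*c) && *c != '.') { return false; }
  while (IsNum(*c)) { c++; }                                     // integer part
  if (*c == '.') {                                               // optional fraction
    c++;
    while (IsNum(*c)) { c++; }
  }
  if (*c == 'e' || *c == 'E') {                                  // optional exponent
    c++;
    if (*c == '+' || *c == '-') { c++; }
    if (!IsNum(*c)) { return false; }
    while (IsNum(*c)) { c++; }
  }
  if (*c != 0) { return false; }                                 // trailing characters
  Val = std::atof(Fld);
  return true;
}
```

In this sketch a field such as "1.5e3" is accepted, while "1.5x" and "1e" are rejected and leave Val unchanged. Validating before calling atof matters because atof itself reports no errors.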

double TSsParser::GetFlt ( const int &  FldN) const [inline]

Returns the value of field FldN, which is assumed to be a floating point number. If the field is not a floating point number, an exception is thrown.

Definition at line 148 of file ss.h.

References IAssert.

{
    double Val=0.0; IAssert(GetFlt(FldN, Val)); return Val; }

bool TSsParser::GetInt ( const int &  FldN,
int &  Val 
) const

If field FldN is an integer, its value is stored in Val and the function returns true; otherwise it returns false.

Definition at line 443 of file ss.cpp.

References GetFld(), TCh::GetNum(), TCh::IsNum(), and TCh::IsWs().

Referenced by TAGMUtil::LoadCmtyVV(), TSnap::LoadConnList(), TSnap::LoadEdgeList(), TTimeNENet::LoadFlickr(), and TSnap::LoadPajek().

{
  // parsing format {ws} [+/-] +{ddd}
  int _Val = -1;
  bool Minus=false;
  const char *c = GetFld(FldN);
  while (TCh::IsWs(*c)) { c++; }
  if (*c=='-') { Minus=true; c++; }
  if (! TCh::IsNum(*c)) { return false; }
  _Val = TCh::GetNum(*c);  c++;
  while (TCh::IsNum(*c)){ 
    _Val = 10 * _Val + TCh::GetNum(*c); 
    c++; 
  }
  if (Minus) { _Val = -_Val; }
  if (*c != 0) { return false; }
  Val = _Val;
  return true;
}
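
The same digit-by-digit accumulation can be tried in isolation. The following is a minimal sketch, with `ParseInt` as a hypothetical free function and std::isspace and std::isdigit standing in for TCh::IsWs() and TCh::IsNum(). Like the listing, it tests only for a leading '-', so a leading '+' is rejected. Unlike the listing, it accumulates in a wider type and rejects values outside the int range; that guard is an addition made for this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <climits>

// Hypothetical free function following the steps of the listing above.
// Val is written only when the whole field is a valid integer.
bool ParseInt(const char* Fld, int& Val) {
  assert(Fld != nullptr);
  const char* c = Fld;
  while (std::isspace(static_cast<unsigned char>(*c))) { c++; }  // leading whitespace
  bool Minus = false;
  if (*c == '-') { Minus = true; c++; }
  if (!std::isdigit(static_cast<unsigned char>(*c))) { return false; }
  long long Acc = 0;
  while (std::isdigit(static_cast<unsigned char>(*c))) {
    Acc = 10 * Acc + (*c - '0');
    // Range guard (an assumption of this sketch, not in the listing).
    if (Acc > static_cast<long long>(INT_MAX) + 1) { return false; }
    c++;
  }
  if (*c != 0) { return false; }                                 // trailing characters
  if (Minus) { Acc = -Acc; }
  if (Acc > INT_MAX) { return false; }
  Val = static_cast<int>(Acc);
  return true;
}
```

With this sketch, "42", "-7" and "  12" parse successfully, while "12a", "+5" and the empty string are rejected.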

int TSsParser::GetInt ( const int &  FldN) const [inline]

Returns the value of field FldN, which is assumed to be an integer. If the field is not an integer, an exception is thrown.

Definition at line 139 of file ss.h.

References TStr::Fmt(), and IAssertR.

{
    int Val=0; IAssertR(GetInt(FldN, Val), TStr::Fmt("Field %d not INT.\n%s", FldN, DumpStr()).CStr()); return Val; }

uint64 TSsParser::GetLineNo ( ) const [inline]

Returns the line number of the current line.

Definition at line 118 of file ss.h.

Referenced by TTimeNENet::LoadFlickr().

{ return LineCnt; }

const TChA& TSsParser::GetLnStr ( ) const [inline]

Returns the current line.

Definition at line 124 of file ss.h.

{ return LineStr; }

bool TSsParser::IsCmt ( ) const [inline]

Checks whether the current line is a comment (starts with '#').

Definition at line 120 of file ss.h.

Referenced by TTimeNENet::LoadEdgeTm().

{ return Len()>0 && GetFld(0)[0] == '#'; }

bool TSsParser::IsFlt ( const int &  FldN) const [inline]

Checks whether field FldN is a float.

Definition at line 146 of file ss.h.

Referenced by TNcpGraphsBase::TNcpGraphsBase().

{ double v; return GetFlt(FldN, v); }

bool TSsParser::IsInt ( const int &  FldN) const [inline]

Checks whether field FldN is an integer.

Definition at line 142 of file ss.h.

Referenced by TAGMUtil::LoadCmtyVV(), TSnap::LoadConnList(), and TSnap::LoadPajek().

{ int v; return GetInt(FldN, v); }

int TSsParser::Len ( ) const [inline]

Returns the number of fields in the current line.

Definition at line 114 of file ss.h.

Referenced by TSnap::LoadConnList(), TSnap::LoadConnListStr(), TTimeNENet::LoadEdgeTm(), and TSnap::LoadPajek().

{ return FldV.Len(); }

static PSsParser TSsParser::New ( const TStr &  FNm,
const TSsFmt  SsFmt 
) [inline, static]

Definition at line 102 of file ss.h.

Referenced by TSsParser().

{ return new TSsParser(FNm, SsFmt); }

bool TSsParser::Next ( )

Loads next line from the input file.

If end of file is reached, return value is false.

Definition at line 410 of file ss.cpp.

References TVec< TVal, TSizeTy >::Add(), TChA::Clr(), TVec< TVal, TSizeTy >::Clr(), TChA::CStr(), TVec< TVal, TSizeTy >::DelLast(), TChA::Empty(), TVec< TVal, TSizeTy >::Empty(), FInPt, FldV, TSIn::GetNextLnBf(), TCh::IsWs(), TVec< TVal, TSizeTy >::Last(), LineCnt, LineStr, SkipCmt, SkipEmptyFld, SkipLeadBlanks, SplitCh, SsFmt, and ssfWhiteSep.

Referenced by TAGMUtil::LoadCmtyVV(), TSnap::LoadConnList(), TSnap::LoadConnListStr(), TSnap::LoadEdgeList(), TSnap::LoadEdgeListStr(), TAGMUtil::LoadEdgeListStr(), TTimeNENet::LoadEdgeTm(), TTimeNENet::LoadFlickr(), TSnap::LoadPajek(), and TNcpGraphsBase::TNcpGraphsBase().

{ // split on SplitCh
  FldV.Clr(false);
  LineStr.Clr();
  FldV.Clr();
  LineCnt++;
  if (! FInPt->GetNextLnBf(LineStr)) { return false; }
  if (SkipCmt && !LineStr.Empty() && LineStr[0]=='#') { return Next(); }

  char* cur = LineStr.CStr();
  if (SkipLeadBlanks) { // skip leading blanks
    while (*cur && TCh::IsWs(*cur)) { cur++; }
  }
  char *last = cur;
  while (*cur) {
    if (SsFmt == ssfWhiteSep) { while (*cur && ! TCh::IsWs(*cur)) { cur++; } } 
    else { while (*cur && *cur!=SplitCh) { cur++; } }
    if (*cur == 0) { break; }
    *cur = 0;  cur++;
    FldV.Add(last);  last = cur;
    if (SkipEmptyFld && strlen(FldV.Last())==0) { FldV.DelLast(); } // skip empty fields
  }
  FldV.Add(last);  // add last field
  if (SkipEmptyFld && FldV.Empty()) { return Next(); } // skip empty lines
  return true; 
}

bool TSsParser::NextSlow ( )

Loads next line from the input file (older, slow implementation - deprecated).

If end of file is reached, return value is false. This function is deprecated, use Next instead.

Definition at line 382 of file ss.cpp.

References TVec< TVal, TSizeTy >::Add(), TChA::Clr(), TVec< TVal, TSizeTy >::Clr(), TChA::CStr(), TVec< TVal, TSizeTy >::DelLast(), TChA::Empty(), TVec< TVal, TSizeTy >::Empty(), FInPt, FldV, TSIn::GetNextLn(), TCh::IsWs(), TVec< TVal, TSizeTy >::Last(), LineCnt, LineStr, SkipCmt, SkipEmptyFld, SkipLeadBlanks, SplitCh, SsFmt, and ssfWhiteSep.

{ // split on SplitCh
  FldV.Clr(false);
  LineStr.Clr();
  FldV.Clr();
  LineCnt++;
  if (! FInPt->GetNextLn(LineStr)) { return false; }
  if (SkipCmt && !LineStr.Empty() && LineStr[0]=='#') { return NextSlow(); }

  char* cur = LineStr.CStr();
  if (SkipLeadBlanks) { // skip leading blanks
    while (*cur && TCh::IsWs(*cur)) { cur++; }
  }
  char *last = cur;
  while (*cur) {
    if (SsFmt == ssfWhiteSep) { while (*cur && ! TCh::IsWs(*cur)) { cur++; } } 
    else { while (*cur && *cur!=SplitCh) { cur++; } }
    if (*cur == 0) { break; }
    *cur = 0;  cur++;
    FldV.Add(last);  last = cur;
    if (SkipEmptyFld && strlen(FldV.Last())==0) { FldV.DelLast(); } // skip empty fields
  }
  FldV.Add(last);  // add last field
  if (SkipEmptyFld && FldV.Empty()) { return NextSlow(); } // skip empty lines
  return true; 
}

const char* TSsParser::operator[] ( const int &  FldN) const [inline]

Returns the contents of the field at index FldN.

Definition at line 133 of file ss.h.

{ return FldV[FldN]; }

char* TSsParser::operator[] ( const int &  FldN) [inline]

Returns the contents of the field at index FldN.

Definition at line 135 of file ss.h.

{ return FldV[FldN]; }

void TSsParser::ToLc ( )

Transforms the current line to lower case.

Definition at line 436 of file ss.cpp.

References FldV, and TVec< TVal, TSizeTy >::Len().

Referenced by TSnap::LoadPajek().

{
  for (int f = 0; f < FldV.Len(); f++) {
    for (char *c = FldV[f]; *c; c++) {
      *c = tolower(*c); }
  }
}


Friends And Related Function Documentation

friend class TPt< TSsParser > [friend]

Definition at line 72 of file ss.h.


Member Data Documentation

TCRef TSsParser::CRef [private]

Definition at line 72 of file ss.h.

PSIn TSsParser::FInPt [private]

Pointer to the input file stream.

Definition at line 82 of file ss.h.

Referenced by Next(), NextSlow(), and TSsParser().

TVec<char*> TSsParser::FldV [private]

Pointers to fields of the current line.

Definition at line 81 of file ss.h.

Referenced by DumpStr(), Next(), NextSlow(), and ToLc().

uint64 TSsParser::LineCnt [private]

Number of processed lines so far.

Definition at line 78 of file ss.h.

Referenced by Next(), and NextSlow().

TChA TSsParser::LineStr [private]

Current line.

Definition at line 80 of file ss.h.

Referenced by Next(), and NextSlow().

bool TSsParser::SkipCmt [private]

Skip comments (lines starting with #).

Definition at line 76 of file ss.h.

Referenced by Next(), and NextSlow().

bool TSsParser::SkipEmptyFld [private]

Skip empty fields (i.e., multiple consecutive separators are considered as one).

Definition at line 77 of file ss.h.

Referenced by Next(), and NextSlow().

bool TSsParser::SkipLeadBlanks [private]

Ignore leading whitespace characters in a line.

Definition at line 75 of file ss.h.

Referenced by Next(), and NextSlow().

char TSsParser::SplitCh [private]

Separator character (if one of the non-standard separators is used).

Definition at line 79 of file ss.h.

Referenced by Next(), NextSlow(), and TSsParser().

TSsFmt TSsParser::SsFmt [private]

Separator type.

Definition at line 74 of file ss.h.

Referenced by Next(), NextSlow(), and TSsParser().


The documentation for this class was generated from the following files: