Tables

Tables in SNAP are represented by the class TTable.

TTable is designed to provide fast performance at scale and to handle datasets containing hundreds of millions of rows. Tables can be saved to and loaded from disk in a binary format using the provided methods; loading from and saving to binary is orders of magnitude faster than using a text representation of the table.

A TTable can store integers, floats and strings in its entries. For performance reasons, strings are mapped to a unique integer, and the TTable stores only the integer which maps to the string. Each TTable object has an associated TTableContext which stores the mapping from integers to strings and back, and can be used when the string corresponding to an integer needs to be retrieved. (Note: many TTable objects can share the same context; this is often useful, for example, to ensure that equivalent strings in different tables are treated as equivalent in SNAP.)
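
The string-to-integer mapping described above can be pictured with a small pure-Python model. This is only a conceptual sketch, not SNAP's implementation: the class StringPool and its methods add_str and get_str are hypothetical names, chosen to mirror the AddStr() and GetStr() methods of TTableContext documented further below.

```python
class StringPool:
    """Toy model of a shared string context (hypothetical, not SNAP code)."""

    def __init__(self):
        self._ids = {}      # string -> integer id
        self._strings = []  # integer id -> string

    def add_str(self, key):
        # Return the existing id if the string was seen before.
        if key not in self._ids:
            self._ids[key] = len(self._strings)
            self._strings.append(key)
        return self._ids[key]

    def get_str(self, key_id):
        return self._strings[key_id]


pool = StringPool()
# Two "tables" sharing one pool store only integers in their string columns.
authors_a = [pool.add_str(s) for s in ["alice", "bob", "alice"]]
authors_b = [pool.add_str(s) for s in ["bob", "carol"]]
```

Because both lists were built from the same pool, the string "bob" has the same integer in each, which is the property that makes sharing a context between tables useful.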

A TTable object consists of multiple columns, each column being an integer, string or float. This is defined by the table’s Schema. A schema is simply a vector of pairs of TStr and TAttrType. (Note: TAttrType represents the type of the column. Currently supported values are snap.atInt, snap.atFlt and snap.atStr.) Each entry in the schema has the name of the column, and the attribute type.
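
As a rough illustration of what a schema expresses, the following pure-Python sketch models a schema as a list of (name, type) pairs and checks rows against it. It does not use SNAP; the constants AT_INT, AT_FLT and AT_STR and the helper row_matches_schema are assumptions made for this example, standing in for snap.atInt, snap.atFlt and snap.atStr.

```python
# Stand-ins for snap.atInt, snap.atFlt and snap.atStr (assumption for this sketch).
AT_INT, AT_FLT, AT_STR = int, float, str

schema = [("Col1", AT_INT), ("Col2", AT_FLT), ("Col3", AT_STR)]


def row_matches_schema(row, schema):
    """Check that a row has one entry per column, each of the declared type."""
    if len(row) != len(schema):
        return False
    return all(isinstance(value, col_type)
               for value, (_, col_type) in zip(row, schema))


ok = row_matches_schema((1, 2.5, "x"), schema)
bad = row_matches_schema((1, "oops", "x"), schema)
```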

After the schema and the columns are defined, the data can be stored in rows, with each row containing an entry for each column. It is possible to iterate over the data by row, using the TRowIterator class (see documentation below for details).

TTable also provides functionality for doing joins (using the Join() method), groupings (using the Aggregate() method), selection and projection (using the Select() and Project() methods), as well as sorting (using the Order() method).

In order to quickly retrieve elements by value, TTable allows the user to construct indexes on a column using RequestIndexInt(), RequestIndexFlt() and RequestIndexStrMap(). Note that no indexes are created by default; they exist only if these functions are explicitly called.
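
To see why an index helps with repeated lookups, here is a minimal pure-Python sketch of a hash-based column index next to a full scan. It is a conceptual model under the assumption that a column is a plain list; build_index and rows_by_scan are hypothetical helpers, not SNAP functions.

```python
from collections import defaultdict


def build_index(column):
    """Map each distinct value to the list of row indices holding it."""
    index = defaultdict(list)
    for row_idx, value in enumerate(column):
        index[value].append(row_idx)
    return index


def rows_by_scan(column, value):
    # Fallback when no index exists: look at every row.
    return [i for i, v in enumerate(column) if v == value]


col = [7, 3, 7, 9, 3, 7]
index = build_index(col)   # built once, reused for many lookups
hits = index.get(7, [])
```

Both approaches return the same row indices; the index pays a one-time construction cost so that each later lookup avoids touching every row.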

TTable can be loaded from a text-file in spreadsheet (tab-separated or comma-separated) format using the static LoadSS() method.

Tables can be converted to SNAP graph classes using the provided ToNetwork() functions.

The tutorial provides extensive documentation on the use of table methods and functions in the section about Tables. The code snippets below additionally highlight some of the common operations using TTable objects. The reference descriptions of methods and functions used are documented in more detail below.

The following code snippet shows how to load a TTable object from a tab-separated file containing one integer, one float and two string columns, and then save the object to disk in binary format:

import snap

context = snap.TTableContext()
filename = "/path/to/input.tsv"

schema = snap.Schema()
schema.Add(snap.TStrTAttrPr("Col1", snap.atInt))
schema.Add(snap.TStrTAttrPr("Col2", snap.atFlt))
schema.Add(snap.TStrTAttrPr("Col3", snap.atStr))
schema.Add(snap.TStrTAttrPr("Col4", snap.atStr))

table = snap.TTable.LoadSS(schema, filename, context, "\t", snap.TBool(False))

outfile = "/path/to/output.bin"
FOut = snap.TFOut(outfile)
table.Save(FOut)
FOut.Flush()

The saved table can now be loaded from binary using:

import snap
context = snap.TTableContext()

outfile = "/path/to/output.bin"
FIn = snap.TFIn(outfile)
table = snap.TTable.Load(FIn, context)

Note that loading from and saving to binary is over ten times faster than using the raw text file.

Next, we present a slightly more involved example. Let’s say we have an authorship table for academic papers, PapAuthT, where each row has a PaperID and an AuthorID. (Thus, if paper P1 was written by authors A1, A2 and A3, and paper P2 by author A2, we would have four rows in our TTable, with data (P1, A1), (P1, A2), (P1, A3) and (P2, A2).) Further, let’s say we have the citation count of each paper in a separate table, PapCitT, which has columns PaperID and CitCount. Assuming that these tables have already been loaded into TTable objects with appropriate schema, the following code shows how to perform various useful operations on these tables:

# Assuming that PapAuthT and PapCitT are already loaded into TTable objects with columns as described above.

# First, let's say we want to count the number of papers written by an author. We use Aggregate
# with the operation, snap.aaCount.

# This counts the number of elements with a particular value of the attributes in GroupBy
# (namely, AuthorID), and puts the count in a new column called "CountAuthPapers".
# Note that for the aggregation operation snap.aaCount, the third argument is irrelevant.
GroupBy = snap.TStrV()
GroupBy.Add("AuthorID")
PapAuthT.Aggregate(GroupBy, snap.aaCount, "AuthorID", "CountAuthPapers", snap.TBool(False))

# To keep only one row for each author, we can use the TTable.Unique() method as PapAuthT.Unique("AuthorID")
# which will remove all rows with duplicate values of AuthorID.

# Next, let's say we want to compute the total number of citations each author has.
# This is the sum of the citations of all the papers the author wrote.
# However, the citation info is in PapCitT. Hence, we must join it to this table now.

# Joins these two tables, merging rows which have the same PaperID in both.
# Now, each row has a PaperID, AuthorID and a CitCount
PapAuthCitJoinT = PapAuthT.Join("PaperID", PapCitT, "PaperID")

# We now aggregate the citation counts by author, summing them all up to get the
# total number of citations.
GroupBy = snap.TStrV()
GroupBy.Add("AuthorID")
PapAuthCitJoinT.Aggregate(GroupBy, snap.aaSum, "CitCount", "TotalAuthCits", snap.TBool(False))

# Now, we have the total number of citations by each author in a new column
# TotalAuthCits. We can now keep just the relevant columns, and drop duplicate rows
# with the same author ID.

ProjectCols = snap.TStrV()
ProjectCols.Add("AuthorID")
ProjectCols.Add("TotalAuthCits")
AuthCitT = PapAuthCitJoinT.Project(ProjectCols)
AuthCitT.Unique("AuthorID")

# We can also sort the authors in decreasing order of total citations.
OrderBy = snap.TStrV() # The TTable.Order method sorts using the values of
                       # the columns in OrderBy, in lexicographic order.
OrderBy.Add("TotalAuthCits")
AuthCitT.Order(OrderBy, "", snap.TBool(False), snap.TBool(False))

TTable

class TTable
class TTable(Context)
class TTable(S, Context)
class TTable(SIn, Context)
class TTable(H, Col1, Col2, Context, IsStrKeys=False)
class TTable(Table, const TIntV& RowIds)
class TTable(Table)

Returns a new table. If no parameters are provided, an empty table is returned. If S and Context are provided, the table is initialized with the provided Schema and TTableContext. If SIn is provided, the table is read from the binary stream. If H, a THash with TInt keys and either TInt or TFlt values, is given, the TTable is constructed from the hash table. If IsStrKeys is True, then the TInt keys in H refer to strings in the Context. Col1 provides the name for the keys in H in the schema for the table and Col2 does the same for the values. If Table is provided, the contents of Table are copied into the current table. If RowIds is given, then only those particular rows are copied.

Below is a list of functions supported by the TTable class:

AddDstNodeAttr(Attr)

Adds column with name Attr to be used as the destination node attribute of the graph.

AddDstNodeAttr(Attrs)

Adds columns with the names specified in Attrs, a TStrV, to be used as destination node attributes of the graph.

AddEdgeAttr(Attr)

Adds column with name Attr to be used as graph edge attribute.

AddEdgeAttr(Attrs)

Adds columns, with names provided in Attrs, to be used as graph edge attributes.

AddNodeAttr(Attr)

Adds column with name Attr to be used as node attribute (both source and destination).

AddNodeAttr(Attrs)

Adds columns, with names provided in Attrs, to be used as node attribute (both source and destination).

AddSrcNodeAttr(Attr)

Adds column with name Attr to be used as the source node attribute of the graph.

AddSrcNodeAttr(Attrs)

Adds columns with the names specified in Attrs, a TStrV, to be used as source node attributes of the graph.

Aggregate(GroupByAttrs, AggOp, ValAttr, ResAttr, Ordered=True)

Aggregates values over one attribute, ValAttr, after grouping with respect to a list of attributes given in GroupByAttrs. Results are stored in a new attribute with name ResAttr. Ordered indicates whether to treat grouping key as ordered (true) or unordered. AggOp gives the aggregation policy. It must be one of aaSum, aaCount, aaMin, aaMax, aaFirst, aaLast, aaMean, or aaMedian.
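
The grouping-then-aggregating behavior can be modeled in a few lines of plain Python. This sketch assumes, following the authorship example earlier in this document, that the aggregate is written into a new column on every row of its group; the function aggregate and the dict-per-row representation are illustrative choices, not SNAP's data structures.

```python
from collections import defaultdict


def aggregate(rows, group_by, val_attr, res_attr, op):
    """Add res_attr to every row, holding op() over the row's group."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in group_by)
        groups[key].append(row[val_attr])
    for row in rows:
        key = tuple(row[a] for a in group_by)
        row[res_attr] = op(groups[key])
    return rows


rows = [
    {"AuthorID": "A1", "CitCount": 10},
    {"AuthorID": "A2", "CitCount": 10},
    {"AuthorID": "A2", "CitCount": 4},
]
aggregate(rows, ["AuthorID"], "CitCount", "TotalAuthCits", sum)    # like aaSum
aggregate(rows, ["AuthorID"], "CitCount", "CountAuthPapers", len)  # like aaCount
```

Since every row of a group carries the same aggregate, a later de-duplication step on the grouping attribute leaves one row per group.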

AggregateCols(AggrAttrs, AggOp, ResAttr)

For each row in the table, aggregates values over a list of attributes given by AggrAttrs. Results are stored in a new attribute ResAttr. AggOp gives the aggregation policy. It must be one of aaSum, aaCount, aaMin, aaMax, aaFirst, aaLast, aaMean, or aaMedian.

BegRI()

Gets an iterator to the first valid row of the table. Returns a TRowIterator.

BegRIWR()

Gets an iterator to remove the first valid row. Returns a TRowIteratorWithRemove.

Classify(Predicate, LabelAttr, PositiveLabel, NegativeLabel)

Adds a label attribute, LabelAttr, with positive labels, a TInt given by PositiveLabel, on rows selected according to the TPredicate Predicate, and negative labels, a TInt given by NegativeLabel, on the rest.

ClassifyAtomic(Attr1, Attr2, Cmp, LabelAttr, PositiveLabel, NegativeLabel)

Adds an integer label attribute, LabelAttr, with positive labels, given by PositiveLabel, on selected rows and negative labels, given by NegativeLabel, on the rest. Rows are selected using the atomic compare operator of type TPredComp, Cmp, over Attr1 and Attr2. Cmp must be one of LT, LTE, EQ, NEQ, GTE, GT, SUBSTR, or SUPERSTR.

ColAdd(Attr1, Attr2, ResAttr=TStr(""))
ColAdd(Attr1, Table, Attr2, ResAttr=TStr(""), AddToFirstTable)
ColAdd(Attr1, Value, ResAttr=TStr(""), FloatCast)

Performs the operation Attr1 + Attr2, where Attr1 and Attr2 are attributes which can belong to the same or different tables. Could also perform Attr1 + Value, depending on the function prototype. The result is stored in a new attribute, ResAttr. If ResAttr = “”, the result is stored instead in the column corresponding to Attr1. If FloatCast, a TBool, is set to true, then values in Int columns are cast to Flt values. AddToFirstTable is a flag specifying whether to add ResAttr to the table corresponding to the caller (true), or to the table Table. NOTE: This operation does not work on String columns.
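
The ResAttr convention used by the column arithmetic methods can be illustrated with a short pure-Python sketch: an empty ResAttr overwrites the first column, while a non-empty one creates a new column. The function col_add and the dict-per-row layout are assumptions made for this example; the FloatCast and AddToFirstTable options are left out.

```python
def col_add(rows, attr1, attr2, res_attr=""):
    """Store attr1 + attr2 in res_attr, or in attr1 when res_attr is empty."""
    target = res_attr or attr1
    for row in rows:
        row[target] = row[attr1] + row[attr2]
    return rows


new_col = col_add([{"a": 1, "b": 2}], "a", "b", "c")   # result goes to "c"
in_place = col_add([{"a": 1, "b": 2}], "a", "b")       # result overwrites "a"
```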

ColConcat(Attr1, Attr2, Separator, ResAttr=TStr(""))
ColConcat(Attr1, Table, Attr2, Separator, ResAttr=TStr(""), AddToFirstTable)

Concatenates the two columns given by Attr1 and Attr2, separated by Separator. Table specifies the TTable Attr2 comes from. The result is stored in a new column, ResAttr. If ResAttr = “”, the result is stored instead in the column corresponding to Attr1. AddToFirstTable is a flag specifying whether to add ResAttr to the table corresponding to the caller (true), or to the table Table. NOTE: This operation only works on String columns.

ColConcatConst(Attr, Value, Separator, ResAttr=TStr(""))

Concatenates values for column Attr with the given string value Value, separated by Separator. Result is stored in a new column ResAttr. If ResAttr = “”, the result is stored instead in the column corresponding to Attr. NOTE: This operation only works on String columns.

ColDiv(Attr1, Attr2, ResAttr=TStr(""))
ColDiv(Attr1, Table, Attr2, ResAttr, AddToFirstTable)
ColDiv(Attr1, Value, ResAttr=TStr(""), FloatCast)

Performs the operation Attr1 / Attr2, where Attr1 and Attr2 are attributes which can belong to the same or different tables. Could also perform Attr1 / Value, depending on the function prototype. The result is stored in a new attribute, ResAttr. If ResAttr = “”, the result is stored instead in the column corresponding to Attr1. If FloatCast, a TBool, is set to true, then values in Int columns are cast to Flt values. AddToFirstTable is a flag specifying whether to add ResAttr to the table corresponding to the caller (true), or to the table Table. NOTE: This operation does not work on String columns.

ColMax(Attr1, Attr2, ResAttr=TStr(""))

Performs the operation MAX (Attr1, Attr2), where Attr1 and Attr2 are attributes in a table. The result is stored in a new column ResAttr. If ResAttr = “”, the result is stored instead in the column corresponding to Attr1. NOTE: This operation does not work on String columns.

ColMin(Attr1, Attr2, ResAttr=TStr(""))

Performs the operation MIN (Attr1, Attr2), where Attr1 and Attr2 are attributes in a table. The result is stored in a new column ResAttr. If ResAttr = “”, the result is stored instead in the column corresponding to Attr1. NOTE: This operation does not work on String columns.

ColMod(Attr1, Attr2, ResAttr)
ColMod(Attr1, Table, Attr2, ResAttr, AddToFirstTable)
ColMod(Attr1, Value, ResAttr, FloatCast)

Performs the operation Attr1 % Attr2, where Attr1 and Attr2 are attributes which can belong to the same or different tables. Could also perform Attr1 % Value, depending on the function prototype. The result is stored in a new attribute, ResAttr. If ResAttr = “”, the result is stored instead in the column corresponding to Attr1. If FloatCast, a TBool, is set to true, then values in Int columns are cast to Flt values. AddToFirstTable is a flag specifying whether to add ResAttr to the table corresponding to the caller (true), or to the table Table. NOTE: This operation does not work on String or float columns.

ColMul(Attr1, Attr2, ResAttr)
ColMul(Attr1, Table, Attr2, ResAttr, AddToFirstTable)
ColMul(Attr1, Value, ResAttr, FloatCast)

Performs the operation Attr1 * Attr2, where Attr1 and Attr2 are attributes which can belong to the same or different tables. Could also perform Attr1 * Value, depending on the function prototype. The result is stored in a new attribute, ResAttr. If ResAttr = “”, the result is stored instead in the column corresponding to Attr1. If FloatCast, a TBool, is set to true, then values in Int columns are cast to Flt values. AddToFirstTable is a flag specifying whether to add ResAttr to the table corresponding to the caller (true), or to the table Table. NOTE: This operation does not work on String columns.

ColSub(Attr1, Attr2, ResAttr)
ColSub(Attr1, Table, Attr2, ResAttr, AddToFirstTable)
ColSub(Attr1, Value, ResAttr, FloatCast)

Performs the operation Attr1 - Attr2, where Attr1 and Attr2 are attributes which can belong to the same or different tables. Could also perform Attr1 - Value, depending on the function prototype. The result is stored in a new attribute, ResAttr. If ResAttr = “”, the result is stored instead in the column corresponding to Attr1. If FloatCast, a TBool, is set to true, then values in Int columns are cast to Flt values. AddToFirstTable is a flag specifying whether to add ResAttr to the table corresponding to the caller (true), or to the table Table. NOTE: This operation does not work on String columns.

Count(Attr, ResAttr)

For each row of the table, counts number of rows in the table sharing the same value as it for a given attribute Attr, a TStr. The result is stored in a new attribute, ResAttr.

EndRI()

Gets an iterator to the last valid row of the table. Returns a TRowIterator.

EndRIWR()

Gets an iterator to remove the last valid row. Returns a TRowIteratorWithRemove.

GetColType(Attr)

Gets type of an attribute Attr. Returns a TAttrType object representing attribute type.

GetDstCol()

Returns the name, a TStr, of the column representing destination nodes in the graph.

GetDstNodeFltAttrV()

Returns the names of the Flt columns, in a TStrV, corresponding to attributes of the destination nodes.

GetDstNodeIntAttrV()

Returns the names of the Int columns, in a TStrV, corresponding to attributes of the destination nodes.

GetDstNodeStrAttrV()

Returns the names of the Str columns, in a TStrV, corresponding to attributes of the destination nodes.

GetEdgeFltAttrV()

Returns the names of the Flt columns, in a TStrV, corresponding to edge attributes.

GetEdgeIntAttrV()

Returns the names of the Int columns, in a TStrV, corresponding to edge attributes.

GetEdgeStrAttrV()

Returns the names of the Str columns, in a TStrV, corresponding to edge attributes.

GetEdgeTable(Network, Context)

Extracts edge TTable from the TNEANet Network, using the TTableContext Context. Returns the resulting PTable.

GetEdgeTablePN(Network, Context)

Extracts edge TTable from the PNGraphMP Network, using the TTableContext Context. Returns the resulting PTable. NOTE: Defined only if OpenMP present.

GetFltNodePropertyTable(Network, Property, NodeAttrName, NodeAttrType, PropertyAttrName, Context)

Extracts node and edge property TTables from a THash. Network is of type TNEANet, Property is a TIntFltH, NodeAttrName and PropertyAttrName are TStrs, NodeAttrType is a TAttrType, and Context is a TTableContext. Returns a PTable object.

GetFltVal(Attr, RowIdx)

Gets the value of float attribute with name Attr at row RowIdx.

GetFltValAtRowIdx(ColIdx, RowIdx)

Gets the value of the float column at index ColIdx at row RowIdx.

GetIntVal(Attr, RowIdx)

Gets the value of integer attribute with name Attr at row RowIdx.

GetIntValAtRowIdx(ColIdx, RowIdx)

Gets the value of the integer column at index ColIdx at row RowIdx.

GetMP()

Returns the value of the static variable TTable::UseMP, which controls whether to use multi-threading. TTable::UseMP is 1 by default (meaning algorithms are multi-threaded by default if the OpenMP library is present).

GetMapHitsIterator(GraphSeq, Context, MaxIter=20)

Computes a sequence of Hits tables for a graph sequence GraphSeq, a TVec. A TTableIterator is returned.

GetMapPageRank(GraphSeq, Context, C=0.85, Eps=1e-4, MaxIter=100)

Computes a sequence of PageRank tables for a graph sequence GraphSeq, a TVec. A TTableIterator is returned.

GetNodeTable(Network, Context)

Extracts node TTable from TNEANet Network, using TTableContext Context.

GetNumRows()

Returns total number of rows in the table. Count could include rows which have been deleted previously.

GetNumValidRows()

Returns total number of valid rows in the table.

GetSchema()

Returns the schema of the table. Return type is Schema.

GetSrcCol()

Returns the name of the column representing source nodes in the graph.

GetSrcNodeFltAttrV()

Returns the names of the Flt columns corresponding to attributes of the source nodes. Return type is TStrV.

GetSrcNodeIntAttrV()

Returns the names of the Int columns corresponding to attributes of the source nodes. Return type is TStrV.

GetSrcNodeStrAttrV()

Returns the names of the Str columns corresponding to attributes of the source nodes. Return type is TStrV.

GetStrVal(Attr, RowIdx)

Gets the value of string attribute with name Attr at row RowIdx.

Group(GroupByAttrs, GroupAttrName, Ordered=True)

Groups rows according to the attributes specified by GroupByAttrs, a TStrV. Result is stored in a new column of the table with name GroupAttrName.

Intersection(PTable)

Returns a new table containing rows present in the current table that are also present in PTable, which is of type PTable.

Join(Attr1, PTable, Attr2)

Performs an equi-join on the current table and another table, PTable over attributes Attr1 in the current table and Attr2 in PTable.
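
An equi-join pairs up rows whose join attributes are equal. The following pure-Python hash join is a conceptual sketch only: equi_join is a hypothetical helper, rows are modeled as dicts, and the way columns are named in the joined table is an assumption of this example that may differ from what SNAP produces.

```python
from collections import defaultdict


def equi_join(left, attr1, right, attr2):
    """Hash join: pair every left row with each right row of equal key."""
    by_key = defaultdict(list)
    for rrow in right:
        by_key[rrow[attr2]].append(rrow)
    joined = []
    for lrow in left:
        for rrow in by_key.get(lrow[attr1], []):
            merged = dict(rrow)
            merged.update(lrow)   # on a name clash, the left value is kept
            joined.append(merged)
    return joined


pap_auth = [{"PaperID": "P1", "AuthorID": "A1"},
            {"PaperID": "P1", "AuthorID": "A2"},
            {"PaperID": "P2", "AuthorID": "A2"}]
pap_cit = [{"PaperID": "P1", "CitCount": 10},
           {"PaperID": "P2", "CitCount": 4}]
result = equi_join(pap_auth, "PaperID", pap_cit, "PaperID")
```

Each authorship row picks up the citation count of its paper, which is the shape needed before summing citations per author.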

Load(SIn, Context)

Loads table from the input stream SIn using TTableContext Context. Returns a PTable.

LoadSS(Schema, InFNm, Context, Separator='\t', HasTitleLine=False)

Loads a table from a spreadsheet file (TSV, CSV, etc.). Schema is a Schema object, InFNm provides the input file name, Context is a TTableContext, Separator is the field separator character in the input file, and HasTitleLine indicates whether the first line is a title line with the names of the columns (without a # preceding it). If HasTitleLine is True, then Schema is validated against it.

Minus(PTable)

Returns a new table containing rows present in the current table which are not present in another table given by PTable.

Order(OrderByAttrs, ResAttr, ResetRankFlag=False, Asc=True)

Orders the rows according to the values in OrderByAttrs (a TStrV). Results are stored in new column with name ResAttr. If Asc is True, rows are ordered in ascending lexicographic order.

Project(ProjectAttrs)

Returns a table with only the attributes in ProjectAttrs, a TStrV.

ProjectInPlace(ProjectAttrs)

Modifies the current table to keep only the attributes specified in ProjectAttrs.

ReadFltCol(Attr, Result)

Reads values of an entire float column given by Attr into the TFltV Result.

ReadIntCol(Attr, Result)

Reads values of an entire int column given by Attr into the TIntV Result.

ReadStrCol(Attr, Result)

Reads values of an entire string column given by Attr into the TStrV Result.

Rename(Attr, NewAttr)

Renames an attribute with name Attr to new name NewAttr in a table.

SaveBin(OutFNm)

Saves table schema and content into a binary file with name OutFNm.

SaveSS(OutFNm)

Saves table schema and content into a TSV file with name OutFNm.

Select(Predicate, SelectedRows, Remove=True)

Selects rows that satisfy a given Predicate, of type TPredicate. The selected row indices are stored in SelectedRows, a TIntV. If Remove is True, rows that do not match the predicate are removed.

SelectAtomic(Attr1, Attr2, Cmp, SelectedRows, Remove=True)

Selects rows which satisfy an atomic compare operation, Cmp, of type TPredComp. Cmp must be one of LT, LTE, EQ, NEQ, GTE, GT, SUBSTR, or SUPERSTR. The selected row indices are stored in SelectedRows, a TIntV. If Remove is True, rows that do not match the predicate are removed.

SelectAtomicFltConst(Attr, Val, Cmp, SelectedTable)

Selects rows where the value of a float attribute, Attr, satisfies an atomic comparison, Cmp, with a primitive type Val. Cmp must be one of LT, LTE, EQ, NEQ, GTE, GT, SUBSTR, or SUPERSTR. The selected rows are added to the PTable SelectedTable.

SelectAtomicIntConst(Attr, Val, Cmp, SelectedTable)

Selects rows where the value of an int attribute, Attr, satisfies an atomic comparison, Cmp, with a primitive type Val. Cmp must be one of LT, LTE, EQ, NEQ, GTE, GT, SUBSTR, or SUPERSTR. The selected rows are added to the PTable SelectedTable.

SelectAtomicStrConst(Attr, Val, Cmp, SelectedTable)

Selects rows where the value of a string attribute, Attr, satisfies an atomic comparison, Cmp, with a primitive type Val. Cmp must be one of LT, LTE, EQ, NEQ, GTE, GT, SUBSTR, or SUPERSTR. The selected rows are added to the PTable SelectedTable.

SelectFirstNRows(N)

Modifies the table in place so that only its first N rows are retained.

SelfJoin(Attr)

Performs a self-join on the table on the attribute Attr. Returns a new table.

SelfSimJoin(Attrs, DistColAttr, SimType, Threshold)

Performs a self sim-join on a table. Performs join if the distance between two rows is less than the specified float threshold Threshold. SimType should be one of L1Norm, L2Norm, Jaccard, and Haversine. Attrs gives the list of attributes for computing the distance between rows. DistColAttr is the name of the attribute representing the distance between rows in the new table. A new PTable is returned.
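
A similarity self-join can be sketched as a pairwise distance check. The pure-Python example below uses the L1 norm and is a conceptual model only: self_sim_join and l1_distance are hypothetical helpers, the quadratic loop is chosen for clarity, and reporting each unordered pair once is an assumption of this sketch rather than documented SNAP behavior.

```python
def l1_distance(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def self_sim_join(rows, attrs, threshold):
    """Return (i, j, distance) for row pairs closer than threshold."""
    pairs = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            u = [rows[i][a] for a in attrs]
            v = [rows[j][a] for a in attrs]
            d = l1_distance(u, v)
            if d < threshold:
                pairs.append((i, j, d))
    return pairs


points = [{"x": 0.0, "y": 0.0}, {"x": 1.0, "y": 0.5}, {"x": 5.0, "y": 5.0}]
close = self_sim_join(points, ["x", "y"], threshold=2.0)
```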

SetCommonNodeAttrs(SrcAttr, DstAttr, CommonAttr)

Sets the columns to be used as both source and destination node attributes. All input parameters should be strings.

SetDstCol(Attr)

Sets the column representing destination nodes in the graph.

SetMP(Value)

Sets the value of the static variable TTable::UseMP to Value, an integer.

SetSrcCol(Attr)

Sets the column representing source nodes in the graph.

SimJoin(Attr1, Table, Attr2, DistColAttr, SimType, Threshold)

Performs SimJoin on the current table and Table. Performs join if the distance between two rows is less than the specified float threshold Threshold. SimType should be one of L1Norm, L2Norm, Jaccard, and Haversine. Attr1 and Attr2 give the attributes, in the current table and in Table respectively, used for computing the distance between rows. DistColAttr is the name of the attribute representing the distance between rows in the new table. A new PTable is returned.

SpliceByGroup(GroupByAttrs, Ordered)

Splices table into subtables according to the result of a grouping statement. GroupByAttrs is a TStrV, an attribute vector grouping should be performed with respect to. Ordered is a flag specifying whether to treat the grouping key as ordered or unordered.

StoreFltCol(ColName, ColVals)

Adds an entire float column to the table. ColName gives the column name and ColVals is a TFltV giving the vector of column values.

StoreIntCol(ColName, ColVals)

Adds an entire int column to the table. ColName gives the column name and ColVals is a TIntV giving the vector of column values.

StoreStrCol(ColName, ColVals)

Adds an entire string column to the table. ColName gives the column name and ColVals is a TStrV giving the vector of column values.

TableFromHashMap(HashMap, Attr1, Attr2, Context)

Returns a table constructed from the given hash map HashMap of type TIntH or TIntFltH. Attr1 is the name of the attribute corresponding to the first column and Attr2 for the second column.

ToGraphSequence(SplitAttr, AggrPolicy, WindowSize, JumpSize, StartVal, EndVal)

Returns a sequence of graphs created from the table, where partitioning is based on values of column with name SplitAttr and windows are specified by JumpSize and WindowSize. AggrPolicy is a TAttrAggr indicating the policy for aggregating node attribute values when a node appears in multiple rows of the table. It must be one of aaSum, aaCount, aaMin, aaMax, aaFirst, aaLast, aaMean, or aaMedian. WindowSize gives the partition size, and JumpSize gives the spacing of the partitions. Only values of SplitAttr between StartVal and EndVal, inclusive, are considered.
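
The interplay of WindowSize and JumpSize is easiest to see by listing the value ranges they produce. The sketch below is plain Python and rests on explicit assumptions: windows start at StartVal, advance by JumpSize, cover WindowSize consecutive values, and are clipped at EndVal. The exact boundary handling in SNAP is not specified in this document, so treat the helper windows as illustrative only.

```python
def windows(start_val, end_val, window_size, jump_size):
    """Yield inclusive (low, high) value ranges, one per partition."""
    low = start_val
    while low <= end_val:
        yield (low, min(low + window_size - 1, end_val))
        low += jump_size


parts = list(windows(start_val=0, end_val=9, window_size=4, jump_size=3))
```

With a JumpSize smaller than WindowSize, as here, consecutive partitions overlap; with equal values they tile the range without overlap.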

ToVarGraphSequence(SplitAttr, AggrPolicy, SplitIntervals)

Returns a sequence of graphs created from the table, where partitioning is based on values of column SplitAttr and intervals specified by SplitIntervals. SplitIntervals is a TIntPrV that gives the start and end SplitAttr attribute values for each partition of the table. AggrPolicy is a TAttrAggr indicating the policy for aggregating node attribute values when a node appears in multiple rows of the table.

ToGraphPerGroup(GroupAttr, AggrPolicy)

Returns a sequence of graphs created from the table, where partitioning is based on the group mappings specified by values of attribute GroupAttr. AggrPolicy is the policy for aggregating node attribute values. It must be one of aaSum, aaCount, aaMin, aaMax, aaFirst, aaLast, aaMean, or aaMedian.

ToGraphSequenceIterator(SplitAttr, AggrPolicy, WindowSize, JumpSize, StartVal, EndVal)

Similar to ToGraphSequence, but instead of returning the sequence of graphs, returns the first graph in the sequence. To iterate over the sequence, use TTable::NextGraphIterator and TTable::IsLastGraphOfSequence.

Calls to TTable::NextGraphIterator() will generate graphs one at a time. This is beneficial when the entire graph sequence cannot fit in memory.

ToVarGraphSequenceIterator(SplitAttr, AggrPolicy, SplitIntervals)

Similar to ToVarGraphSequence, but instead of returning the sequence of graphs, returns the first graph in the sequence. To iterate over the sequence, use TTable::NextGraphIterator and TTable::IsLastGraphOfSequence.

Calls to TTable::NextGraphIterator() will generate graphs one at a time. This is beneficial when the entire graph sequence cannot fit in memory.

ToGraphPerGroupIterator(GroupAttr, AggrPolicy)

Similar to ToGraphPerGroup, but instead of returning the entire sequence of graphs, returns the first graph in the sequence. To iterate over the sequence, use TTable::NextGraphIterator and TTable::IsLastGraphOfSequence.

Calls to TTable::NextGraphIterator() will generate graphs one at a time. This is beneficial when the entire graph sequence cannot fit in memory.

NextGraphIterator()

Returns the next graph, a PNEANet object, in the sequence defined by one of the TTable::ToGraph*Iterator functions. Calls to this function must be preceded by a single call to one of the above TTable::ToGraph*Iterator functions.

IsLastGraphOfSequence()

Checks if the graph sequence defined by one of the TTable::ToGraph*Iterator functions has been completely iterated over. Calls to this function must be preceded by a single call to one of the above TTable::ToGraph*Iterator functions.

Union(PTable)

Returns a new table containing rows present in either one of the current table and the passed table. Duplicate rows across tables may not be preserved.

UnionAll(PTable)

Returns a new table containing rows present in either one of the current table and the passed table, PTable. Duplicate rows across tables are preserved.
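
The difference between Union and UnionAll comes down to how duplicate rows are treated. The following pure-Python sketch models rows as tuples; union and union_all are hypothetical helpers, and keeping the first occurrence of each duplicate is an assumption of this example.

```python
def union_all(a, b):
    # Every row from both tables, duplicates included.
    return a + b


def union(a, b):
    # Rows from either table, each distinct row kept once (first occurrence).
    seen, out = set(), []
    for row in a + b:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


t1 = [("P1", "A1"), ("P1", "A2")]
t2 = [("P1", "A2"), ("P2", "A2")]
all_rows = union_all(t1, t2)
distinct_rows = union(t1, t2)
```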

Unique(Attrs, Ordered=True)

Removes rows with duplicate values across the given attributes in Attrs. If Ordered is True, values across attributes are treated as an ordered pair.
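
The effect of the Ordered flag can be shown with a small pure-Python model. This is a sketch under stated assumptions: unique is a hypothetical helper, rows are dicts, and the first row of each duplicate group is the one retained, which this document does not specify for SNAP.

```python
def unique(rows, attrs, ordered=True):
    """Keep the first row for each distinct combination of attrs."""
    seen = set()
    kept = []
    for row in rows:
        values = [row[a] for a in attrs]
        # Ordered: (1, 2) and (2, 1) differ. Unordered: they are the same key.
        key = tuple(values) if ordered else tuple(sorted(values))
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


edges = [{"Src": 1, "Dst": 2}, {"Src": 2, "Dst": 1}, {"Src": 1, "Dst": 2}]
ordered_rows = unique(edges, ["Src", "Dst"], ordered=True)
unordered_rows = unique(edges, ["Src", "Dst"], ordered=False)
```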

GetIntRowIdxByVal(const TStr& ColName, const TInt& Val)

Gets a vector containing the indices of rows containing Val in int column ColName. Uses an index if it has been requested explicitly; else, it loops over all the rows. Be sure to request an index using RequestIndexInt() first if you will call this multiple times.

GetStrRowIdxByMap(const TStr& ColName, const TInt& Map)

Gets a vector containing the indices of rows containing the integer Map (which maps to a string) in str column ColName. Uses an index if it has been requested explicitly; else, it loops over all the rows. Be sure to request an index using RequestIndexStrMap() first if you will call this multiple times.

GetFltRowIdxByVal(const TStr& ColName, const TFlt& Val)

Gets a vector containing the indices of rows containing Val in flt column ColName. Uses an index if it has been requested explicitly; else, it loops over all the rows. Be sure to request an index using RequestIndexFlt() first if you will call this multiple times.

RequestIndexInt(const TStr& ColName)

Creates a hash-based index for int column ColName, so that the rows containing a particular value can be retrieved efficiently. Used by GetIntRowIdxByVal()

RequestIndexFlt(const TStr& ColName)

Creates a hash-based index for float column ColName, so that the rows containing a particular value can be retrieved efficiently. Used by GetFltRowIdxByVal()

RequestIndexStrMap(const TStr& ColName)

Creates a hash-based index for string column ColName, using the integer mappings, so that the rows containing a particular value can be retrieved efficiently. Used by GetStrRowIdxByMap()

TAtomicPredicate

class TAtomicPredicate
class TAtomicPredicate(Typ, IsCnst, Cmp, L, R)
class TAtomicPredicate(Typ, IsCnst, Cmp, L, R, ICnst, FCnst, SCnst)

Returns a new atomic predicate, for encapsulating common operations. Typ provides the type of the predicate variables, IsCnst is a flag indicating if this atomic node represents a constant value, Cmp is one of LT, LTE, EQ, NEQ, GTE, GT, SUBSTR, or SUPERSTR, L and R are strings giving the left and right variable of the comparison op, and ICnst, FCnst, and SCnst give the int, float, and str constant value to use if the object is a constant of the respective type.

TPredicateNode

class TPredicateNode
class TPredicateNode(A)
class TPredicateNode(Opr)
class TPredicateNode(P)

Returns a new predicate node, which represents a binary predicate operation on two predicate nodes. Specify A, a TAtomicPredicate, if this is a leaf node, Opr, one of AND, NOT, NOP, or OR, for logical operation predicate internal nodes, or P, another TPredicateNode, for the copy constructor.

Below is a list of functions supported by the TPredicateNode class:

AddLeftChild(TPredicateNode* Child)

Adds Child as the left child of the given node. Child is a pointer to a TPredicateNode.

AddRightChild(TPredicateNode* Child)

Adds Child as the right child of the given node. Child is a pointer to a TPredicateNode.

GetVariables(Variables)

Adds the variables that appear in the predicate tree rooted at this node to Variables. Variables is a TStrV.

TPredicate

class TPredicate
class TPredicate(R)
class TPredicate(Pred)

Returns a new predicate, for encapsulating comparison operations. If R, a pointer to a TPredicateNode, is provided, it constructs a predicate with the given root node. If Pred, another TPredicate, is supplied, the copy constructor is called.

Below is a list of functions supported by the TPredicate class:

SetIntVal(VarName, VarVal)

Sets int variable with name VarName to value VarVal.

SetFltVal(VarName, VarVal)

Sets float variable with name VarName to value VarVal.

SetStrVal(VarName, VarVal)

Sets string variable with name VarName to value VarVal.

Eval()

Returns the result of evaluating the current predicate.

EvalAtomicPredicate(Atom)

Evaluates the given atomic predicate Atom.

GetVariables(Variables)

Adds the variables that appear in the given predicate to Variables. Variables is a TStrV.
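How atomic predicates, predicate nodes, and variable bindings fit together can be sketched in plain Python. This is an illustration of the tree structure only, not the SNAP classes: Atom, Node, and the env dictionary are hypothetical stand-ins, and only the six ordering comparisons are modeled (SUBSTR, SUPERSTR, and NOP are left out).

```python
import operator

CMP = {"LT": operator.lt, "LTE": operator.le, "EQ": operator.eq,
       "NEQ": operator.ne, "GTE": operator.ge, "GT": operator.gt}

class Atom:
    """Leaf comparison: variable `left` against variable `right` or a constant."""
    def __init__(self, cmp, left, right=None, const=None):
        self.cmp, self.left, self.right, self.const = cmp, left, right, const

    def eval(self, env):
        rhs = self.const if self.right is None else env[self.right]
        return CMP[self.cmp](env[self.left], rhs)

class Node:
    """Either a leaf wrapping an Atom or an internal AND / OR / NOT node."""
    def __init__(self, atom=None, op=None):
        self.atom, self.op = atom, op
        self.left = self.right = None

    def eval(self, env):
        if self.atom is not None:
            return self.atom.eval(env)
        if self.op == "NOT":                  # NOT uses only the left child
            return not self.left.eval(env)
        l, r = self.left.eval(env), self.right.eval(env)
        return (l and r) if self.op == "AND" else (l or r)

    def variables(self, out):
        """Append every variable name used in this subtree to `out`."""
        if self.atom is not None:
            out.append(self.atom.left)
            if self.atom.right is not None:
                out.append(self.atom.right)
            return
        for child in (self.left, self.right):
            if child is not None:
                child.variables(out)

# (Age >= 18) AND (Score < 0.5)
root = Node(op="AND")
root.left = Node(atom=Atom("GTE", "Age", const=18))
root.right = Node(atom=Atom("LT", "Score", const=0.5))

env = {"Age": 30, "Score": 0.25}   # plays the role of SetIntVal / SetFltVal
result = root.eval(env)
env["Age"] = 12                    # rebind a variable, then evaluate again
result_minor = root.eval(env)

names = []
root.variables(names)
```

Rebinding variables and evaluating again, as done for Age here, mirrors the pattern of setting values for each row before calling Eval().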

TTableContext

class TTableContext
class TTableContext(SIn)

Returns a context object. A TTableContext provides the execution context for a TTable. The context is loaded in binary from SIn, if it is provided.

The context is primarily used to handle strings. It maps each string in the table to a unique integer. To support fast operations, the TTable objects store only the corresponding integer for all strings. When a program needs to retrieve the string value, it does so by using the methods provided in the table’s TTableContext.

Below is a list of functions supported by the TTableContext class:

Load(SIn)

Loads context in binary from SIn.

Save(SOut)

Saves context in binary to SOut.

AddStr(Key)

Adds string Key to the context and returns its KeyId.

GetStr(KeyId)

Returns the string key for the given KeyId.
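The string-to-integer mapping can be pictured with a small pure-Python sketch. StringPool and its methods are hypothetical stand-ins for the AddStr() and GetStr() behavior described above, and the sequential ids starting at 0 are an assumption of the sketch, not a statement about the ids SNAP assigns.

```python
class StringPool:
    """Assigns each distinct string one integer id and maps ids back to strings."""
    def __init__(self):
        self._ids = {}    # string -> id
        self._strs = []   # id -> string

    def add_str(self, key):
        # Adding an already-known string returns its existing id.
        if key not in self._ids:
            self._ids[key] = len(self._strs)
            self._strs.append(key)
        return self._ids[key]

    def get_str(self, key_id):
        return self._strs[key_id]

ctx = StringPool()

# A string column stores only the ids.
city_col = [ctx.add_str(s) for s in ["Paris", "Oslo", "Paris"]]

# A second table sharing the same pool sees the same id for "Oslo".
other_col = [ctx.add_str("Oslo")]
same_city = other_col[0] == city_col[1]
```

Sharing one pool between tables is what makes equal strings compare equal by id, so comparisons on string columns reduce to integer comparisons.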

TPrimitive

class TPrimitive
class TPrimitive(Val)
class TPrimitive(Prim)

Returns a new primitive, a wrapper around primitive types. If Val is provided, the primitive is initialized with it; Val can be an int, float, or string. Providing Prim, another TPrimitive, copies its contents.

Below is a list of functions supported by the TPrimitive class:

GetInt()

Returns the int value of the primitive. If the primitive does not represent an int, returns -1.

GetFlt()

Returns the float value of the primitive. If the primitive does not represent a float, returns -1.

GetStr()

Returns the string value of the primitive. If the primitive does not represent a string, returns the empty string.

GetType()

Returns the type of this primitive.
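The fallback values described above can be sketched in plain Python. Primitive is a hypothetical stand-in, not the SNAP class, and the type labels returned by get_type are invented for the illustration.

```python
class Primitive:
    """Wraps one int, float, or str and returns a fallback for the other types."""
    def __init__(self, val=None):
        self.val = val

    def get_type(self):
        if isinstance(self.val, bool):
            return "undefined"   # bool is not one of the three wrapped types
        for label, typ in (("int", int), ("flt", float), ("str", str)):
            if isinstance(self.val, typ):
                return label
        return "undefined"

    def get_int(self):
        return self.val if self.get_type() == "int" else -1

    def get_flt(self):
        return self.val if self.get_type() == "flt" else -1.0

    def get_str(self):
        return self.val if self.get_type() == "str" else ""

p = Primitive(7)
q = Primitive("seven")
```

Because -1 and the empty string are also legitimate stored values, checking the type first is the safer way to tell a real value from a fallback.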

TTableRow

class TTableRow

Returns a row object for a TTable.

Below is a list of functions supported by the TTableRow class:

AddInt(Val)

Adds int attribute to this row.

AddFlt(Val)

Adds float attribute to this row.

AddStr(Val)

Adds string attribute to this row.

GetIntVals()

Gets a vector of all the int attributes of this row.

GetFltVals()

Gets a vector of all the float attributes of this row.

GetStrVals()

Gets a vector of all the string attributes of this row.

TRowIterator

class TRowIterator

Returns a new row iterator for TTable. Normally, these objects are not created directly, but obtained by calling a TTable method that returns a row iterator, such as BegRI().

Below is a list of functions supported by the TRowIterator class:

Next()

Increments the iterator.

GetRowIdx()

Gets the id of the row pointed to by this iterator.

GetIntAttr(ColIdx)

Returns the value of integer attribute specified by the integer column index for the current row.

GetFltAttr(ColIdx)

Returns the value of float attribute specified by the integer column index for the current row.

GetStrAttr(ColIdx)

Returns the value of string attribute specified by the integer column index for the current row.

GetStrMapById(ColIdx)

Returns the integer mapping of a string attribute value specified by the string column index for the current row.

GetIntAttr(Col)

Returns value of the integer attribute specified by attribute name for the current row.

GetFltAttr(Col)

Returns value of the float attribute specified by attribute name for the current row.

GetStrAttr(Col)

Returns value of the string attribute specified by attribute name for the current row.

GetStrMapByName(Col)

Returns the integer mapping of string attribute specified by attribute name for the current row.

CompareAtomicConst(ColIdx, Val, Cmp)

Compares value in column ColIdx with given primitive Val. Cmp must be one of LT, LTE, EQ, NEQ, GTE, GT, SUBSTR, or SUPERSTR.

CompareAtomicConstTStr(ColIdx, Val, Cmp)

Compares value in column ColIdx with given TStr Val. Cmp must be one of LT, LTE, EQ, NEQ, GTE, GT, SUBSTR, or SUPERSTR.

TRowIteratorWithRemove

class TRowIteratorWithRemove

Returns a new row iterator for TTable that allows logical row removal while iterating. Normally, these objects are not created directly, but obtained by calling a TTable method that returns such an iterator, such as BegRIWR().

Below is a list of functions supported by the TRowIteratorWithRemove class:

Next()

Increments the iterator.

GetRowIdx()

Gets the id of the row pointed to by this iterator.

GetNextRowIdx()

Gets the id of the next row.

GetNextIntAttr(ColIdx)

Returns the value of integer attribute specified by the integer column index for the next row.

GetNextFltAttr(ColIdx)

Returns the value of float attribute specified by the integer column index for the next row.

GetNextStrAttr(ColIdx)

Returns the value of string attribute specified by the integer column index for the next row.

GetNextIntAttr(Col)

Returns value of the integer attribute specified by attribute name for the next row.

GetNextFltAttr(Col)

Returns value of the float attribute specified by attribute name for the next row.

GetNextStrAttr(Col)

Returns value of the string attribute specified by attribute name for the next row.

IsFirst()

Checks whether iterator points to first valid row of the table.

RemoveNext()

Removes the next row.

CompareAtomicConst(ColIdx, Val, Cmp)

Compares value in column ColIdx with given primitive Val. Cmp must be one of LT, LTE, EQ, NEQ, GTE, GT, SUBSTR, or SUPERSTR.
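The look-ahead style of this iterator (inspect the next row, then either remove it or step onto it) can be sketched in plain Python. The sketch assumes that logical removal means unlinking a row from a chain of next-row links while leaving its data in place; Rows, remove_where, and valid_values are hypothetical names, and none of this is SNAP code.

```python
LAST = -1  # sentinel: no further row

class Rows:
    """Rows chained by next-row links; removal only rewires the links."""
    def __init__(self, values):
        self.values = list(values)
        n = len(self.values)
        self.next = [i + 1 if i + 1 < n else LAST for i in range(n)]
        self.first = 0 if n else LAST

def remove_where(rows, should_remove):
    """Walk the chain, always inspecting the row after the current one."""
    curr = None  # None means "positioned before the first valid row"
    while True:
        nxt = rows.first if curr is None else rows.next[curr]
        if nxt == LAST:
            break
        if should_remove(rows.values[nxt]):
            # Unlink nxt: its data stays stored but is no longer reachable.
            if curr is None:
                rows.first = rows.next[nxt]
            else:
                rows.next[curr] = rows.next[nxt]
        else:
            curr = nxt  # keep the row and advance onto it

def valid_values(rows):
    """Collect the values of the rows still reachable through the links."""
    out, i = [], rows.first
    while i != LAST:
        out.append(rows.values[i])
        i = rows.next[i]
    return out

table = Rows([5, 12, 7, 20, 3])
remove_where(table, lambda v: v > 10)
remaining = valid_values(table)
```

Looking one row ahead is what makes removal cheap in this model: the iterator still holds the predecessor whose link must be rewired.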

TTableIterator

class TTableIterator

Returns a new iterator over a vector of PTable. Normally, these objects are not created directly, but obtained by calling a TTable method that returns a table iterator, such as GetMapPageRank().

Below is a list of functions supported by the TTableIterator class:

Next()

Returns the next table in the sequence and updates the iterator.

HasNext()

Checks whether the iterator has reached the end of the sequence.