Octane v1.01.20 - The Open Compression Toolkit for C++ | http://octane.sourceforge.net/ |
#include <substringparser.hpp>
Each symbol can hold an arbitrary string of characters and is annotated with a 'weight', which stores the frequency of that substring as counted during construction of the symbol table. Each symbol also stores a numerical index, its symbol id (its position in the symbol vector); parsers interact with other components of statistical coding by symbol id #.
Definition at line 67 of file substringparser.hpp.
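As a rough illustration of the data each symbol carries, the sketch below models a symbol with a string value, a weight, and a vector position. `SketchSymbol` and `assign_symbol_ids` are hypothetical names, not part of Octane, and an unsigned integer is assumed in place of `TSubStrParserWeight`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for SubstringSymbol; not the Octane class itself.
struct SketchSymbol {
    std::string value;     // text of the symbol
    unsigned long weight;  // frequency seen during training (assumed type)
    int symbolvectorpos;   // symbol id: index in the symbol vector, -1 until assigned

    explicit SketchSymbol(const std::string &invalue, unsigned long inweight = 0)
        : value(invalue), weight(inweight), symbolvectorpos(-1) {}
};

// Give every symbol an id equal to its position in the vector.
inline void assign_symbol_ids(std::vector<SketchSymbol> &symbols) {
    for (std::size_t i = 0; i < symbols.size(); ++i) {
        symbols[i].symbolvectorpos = static_cast<int>(i);
    }
}
```

Other components would then refer to a symbol only by its `symbolvectorpos`, never by its text.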
Public Member Functions | |
SubstringSymbol (const std::string &invalue, TSubStrParserWeight inweight=0) | |
bool | operator> (const SubstringSymbol *&a) const |
The comparison operator used to order the priority queue. | |
TSubStrParserWeight | get_weight () const |
Returns the current weight (frequency) of the symbol. | |
void | increment_weight (int increment) |
Increment weight/frequency of symbol. | |
void | set_weight (TSubStrParserWeight val) |
Set weight of symbol. | |
std::string * | get_valuep () |
Get pointer to string value of symbol. | |
std::string & | get_value () |
Get string value of symbol (i.e. the text of the symbol). | |
int | get_valuelen () |
Get length of symbol string. | |
void | set_value (const std::string &invalue) |
Set value of symbol string. | |
int | get_symbolvectorpos () |
Return the symbol id # (the position of the symbol in the symbol vector). | |
void | set_symbolvectorpos (int inpos) |
Set the symbol id #. | |
bool | dontprune () |
Returns true if this is a "primitive" protected symbol and should not be pruned. | |
unsigned int | get_memoryused () const |
Returns the actual memory used by this symbol (uses string.capacity). | |
double | get_cost () |
Returns the estimated cost of the symbol. The parser may evaluate the quality of its symbol set by estimating the "cost" of the symbol set, even in the absence of any specified coding and compression strategy. It calculates the cost of each symbol as the inverse of its frequency; lower is better. | |
Protected Attributes | |
TSubStrParserWeight | weight |
Frequency of the symbol, as calculated during a 'training' phase. | |
std::string | value |
Value represented by the symbol (i.e. the character, word, or substring). | |
int | symbolvectorpos |
Each symbol keeps track of its position in the symbol vector. | |
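The cost estimate described for get_cost() can be sketched as follows. This is a minimal illustration that assumes the cost is exactly 1/weight and that a zero weight maps to an infinite cost. The source only says "inverse of frequency", so the zero-weight handling and the name `sketch_symbol_cost` are assumptions, not Octane's code.

```cpp
#include <limits>

// Assumed cost model: cost = 1 / weight, so frequent symbols are cheap.
// Zero-weight handling is an assumption; the source does not specify it.
inline double sketch_symbol_cost(unsigned long weight) {
    if (weight == 0) {
        return std::numeric_limits<double>::infinity();
    }
    return 1.0 / static_cast<double>(weight);
}
```

Under this model a symbol seen 100 times costs less than one seen 10 times, which matches the "lower is better" reading above.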
Returns true if this is a "primitive" protected symbol and should not be pruned. Used for things like insisting that the ASCII characters are never pruned from the symbol set, even when they do not occur during training.
Definition at line 106 of file substringparser.hpp.
References value.
00106 {if (value.length()<=1) return true; else return false;}
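To show how a pruning pass might honor this check, here is a hedged sketch. `PruneSymbol`, `prune_symbols`, and the `min_weight` threshold are hypothetical; only the length test mirrors the definition above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical symbol type; only dontprune() mirrors the documented check.
struct PruneSymbol {
    std::string value;
    unsigned long weight;
    bool dontprune() const { return value.length() <= 1; }
};

// Remove symbols rarer than min_weight, but always keep protected primitives.
inline void prune_symbols(std::vector<PruneSymbol> &symbols, unsigned long min_weight) {
    symbols.erase(
        std::remove_if(symbols.begin(), symbols.end(),
                       [min_weight](const PruneSymbol &s) {
                           return !s.dontprune() && s.weight < min_weight;
                       }),
        symbols.end());
}
```

With a threshold of 5, a single-character symbol of weight 0 survives while a two-character symbol of weight 1 is dropped. Note that the length test also protects the empty string.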