Octane v1.01.20 - The Open Compression Toolkit for C++ | http://octane.sourceforge.net/
#include <parser.hpp>
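Inheritance diagram for OctaneParser (image not reproduced): OctaneParser derives from OctaneClass; BitParser, SampleParser, and SubstringParser derive from OctaneParser.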
OctaneParser is the abstract base class for parsers. Parsers are commonly just character-based, returning the ASCII character for each symbol parsed, and therefore require no training. Other parsers are more sophisticated: they scan input streams and build a list of all words, of common words, and so on.
Definition at line 48 of file parser.hpp.
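A character-based parser of the kind described above can be modeled in a few lines. The sketch below is a simplified, self-contained stand-in rather than Octane's actual code: `ByteSource` is a hypothetical substitute for `bitreader` (whose API is not documented on this page), `ParserSketch` mirrors only the pure-virtual part of the interface, and reserving symbol 256 as an end-of-stream marker is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for Octane's bitreader: a byte cursor over a string.
struct ByteSource {
    explicit ByteSource(std::string d) : data(std::move(d)) {}
    bool AtEnd() const { return pos >= data.size(); }
    unsigned char Next() {
        assert(!AtEnd());
        return static_cast<unsigned char>(data[pos++]);
    }
    std::string data;
    std::size_t pos = 0;
};

// Simplified mirror of the pure-virtual part of the OctaneParser interface.
class ParserSketch {
public:
    virtual ~ParserSketch() = default;
    virtual int GetSymbolCount() = 0;
    virtual bool ParseNextSymbolFromInput(ByteSource &from, int &symbolnum) = 0;
    virtual std::string LookupSymbolText(int symbolnum) = 0;
};

// Character-based parser: symbol number == byte value, plus one extra
// end-of-stream symbol (an assumption of this sketch). No training needed.
class CharParserSketch : public ParserSketch {
public:
    static constexpr int kEndOfStream = 256;

    int GetSymbolCount() override { return 257; }  // 256 bytes + end-of-stream

    bool ParseNextSymbolFromInput(ByteSource &from, int &symbolnum) override {
        // Once the input is exhausted, keep reporting the end-of-stream symbol.
        symbolnum = from.AtEnd() ? kEndOfStream : from.Next();
        return true;
    }

    std::string LookupSymbolText(int symbolnum) override {
        if (symbolnum == kEndOfStream) return std::string();  // marker has no text
        return std::string(1, static_cast<char>(symbolnum));
    }
};
```

Because each byte maps directly to a symbol number, nothing has to be learned from the data, which is why such a parser can rely on a training step that simply returns true.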
Public Member Functions
OctaneParser ()
    Constructor.
virtual ~OctaneParser ()
    Destructor.
virtual std::string GetName ()
    Provide a unique name for the object, used in some cases to automatically register the object with a manager.
virtual std::string GetDescription ()
    Optionally provide a longer (maybe 20-60 characters) description.
virtual std::string GetHelpInformation ()
    Optionally provide more information about the object on request for help.
virtual bool ResetState ()
virtual bool CreateSymbolSetUsingStream (bitreader &from)
    Process (train on) an input stream to update or create a symbol set from it.
virtual bool PrepareForParsing ()
    Prepare for parsing mode; must be called after CreateSymbolSetUsingStream() and before parsing begins.
virtual bool IsReadyToParse ()
    Are we ready to parse, i.e., has the symbol set been built?
virtual bool RewindAnyBufferedInput (bitreader &from)
    Let go of any buffered stream. This is an unusual function that the compressor can call if it wants to hand off the input stream to a new parser, or otherwise access the input bitstream from just after the last symbol parsed.
virtual void SynchronizeStateForNewStream ()
    Synchronize state for a new stream. This is always called before beginning a new parsing stream and should be used to reset the parser into a predictable initial state.
virtual int GetSymbolCount () = 0
    Get a count of the number of symbols stored in the parser.
virtual bool ParseNextSymbolFromInput (bitreader &from, int &symbolnum) = 0
    Parse the next symbol from the input and set symbolnum to its symbol number.
virtual bool WriteSymbolText (bitwriter &to, int symbolnum, bool &isendofstreamsymbol) = 0
    Write the text of symbol symbolnum to the output; isendofstreamsymbol is set to indicate whether symbolnum is the end-of-stream symbol.
virtual string LookupSymbolText (int symbolnum) = 0
    Helper function to return the text string of a symbol number.

Static Protected Member Functions
static bool IsNonWordCharacter (unsigned char c)
    Static helper function to decide whether c is a character found in words or should be treated as a non-word character.
virtual std::string OctaneParser::GetName ()
Provide a unique name for the object, used in some cases to automatically register the object with a manager.
Reimplemented from OctaneClass. Definition at line 58 of file parser.hpp.
{return "OctaneParser";}
virtual std::string OctaneParser::GetDescription ()
Optionally provide a longer (maybe 20-60 characters) description.
Reimplemented from OctaneClass. Definition at line 59 of file parser.hpp.
{return "Base Parser Class";}
virtual std::string OctaneParser::GetHelpInformation ()
Optionally provide more information about the object on request for help.
Reimplemented from OctaneClass. Definition at line 60 of file parser.hpp. Referenced by OctaneCompressor_Statistical::GetHelpInformation().
{ return ""; }
virtual bool OctaneParser::CreateSymbolSetUsingStream (bitreader &from)
Process (train on) an input stream to update or create a symbol set from it.
Reimplemented in BitParser, SampleParser, and SubstringParser. Definition at line 68 of file parser.hpp. Referenced by OctaneCompressor_Statistical::DoProtectedCreateSymbolsAndModelsUsingStream().
{return true;};
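A word-based parser would do real work in this step. The following is a minimal sketch under stated assumptions, not the implementation used by BitParser, SampleParser, or SubstringParser: `BuildWordSymbolSet` and its `minCount` threshold are hypothetical, the input is a `std::string` rather than a `bitreader`, and keeping all 256 single bytes as fallback symbols (so any input stays parseable) is a design choice of the sketch. `IsNonWordChar` reproduces the classification of `IsNonWordCharacter()` under the default "C" locale.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Same classification as IsNonWordCharacter() under the default "C" locale:
// letters, digits and the apostrophe are word characters, everything else is not.
static bool IsNonWordChar(unsigned char c) {
    return !(c == '\'' || std::isalnum(c));
}

// Hypothetical training step: every byte value is a symbol (so any input can be
// parsed), and each multi-character word seen at least minCount times is added.
std::vector<std::string> BuildWordSymbolSet(const std::string &input, int minCount) {
    assert(minCount >= 1);
    std::map<std::string, int> counts;  // word -> occurrences, kept in sorted order
    std::string word;
    for (unsigned char c : input) {
        if (IsNonWordChar(c)) {
            if (!word.empty()) { ++counts[word]; word.clear(); }
        } else {
            word.push_back(static_cast<char>(c));
        }
    }
    if (!word.empty()) ++counts[word];  // flush a trailing word

    std::vector<std::string> symbols;
    for (int b = 0; b < 256; ++b)
        symbols.push_back(std::string(1, static_cast<char>(b)));
    for (const auto &entry : counts)
        if (entry.second >= minCount && entry.first.size() > 1)
            symbols.push_back(entry.first);
    return symbols;
}
```

For the input "the cat and the dog and the end" with `minCount` 2, the sketch yields the 256 byte symbols followed by "and" and "the".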
virtual bool OctaneParser::PrepareForParsing ()
Prepare for parsing mode; must be called after CreateSymbolSetUsingStream() and before parsing begins.
Definition at line 71 of file parser.hpp. Referenced by OctaneCompressor_Statistical::PrepareForCompression().
{return true;};
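The required ordering (train, prepare, then synchronize and parse) can be expressed as a small state machine. This is a hypothetical sketch for a parser that needs training; the class name, the flags, and the use of `std::string` input are assumptions, and the real base class simply returns true from both `CreateSymbolSetUsingStream()` and `PrepareForParsing()`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch of the call order; method names mirror OctaneParser,
// but the bodies and the std::string input are illustrative assumptions.
class LifecycleSketch {
public:
    // Training: a real parser would build its symbol set from the input here.
    bool CreateSymbolSetUsingStream(const std::string & /*from*/) {
        symbolSetBuilt_ = true;
        prepared_ = false;  // the symbol set changed, so preparation must be redone
        return true;
    }
    // Must come after training and before any parsing.
    bool PrepareForParsing() {
        if (!symbolSetBuilt_) return false;
        prepared_ = true;
        return true;
    }
    bool IsReadyToParse() const { return prepared_; }
    // Called before each new stream; resets per-stream state.
    void SynchronizeStateForNewStream() { position_ = 0; }
    // Refuses to parse until prepared; returns false at end of input.
    bool ParseNextSymbolFromInput(const std::string &from, int &symbolnum) {
        if (!IsReadyToParse() || position_ >= from.size()) return false;
        symbolnum = static_cast<unsigned char>(from[position_++]);
        return true;
    }

private:
    bool symbolSetBuilt_ = false;
    bool prepared_ = false;
    std::size_t position_ = 0;
};
```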
virtual bool OctaneParser::RewindAnyBufferedInput (bitreader &from)
Let go of any buffered stream. This is an unusual function that the compressor can call if it wants to hand off the input stream to a new parser, or otherwise access the input bitstream from just after the last symbol parsed. It is necessary because some parsers read ahead in the input buffer and must therefore rewind the bitstream. Reimplemented in BitParser, SampleParser, and SubstringParser. Definition at line 77 of file parser.hpp.
{return true;};
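The read-ahead problem that motivates `RewindAnyBufferedInput()` can be illustrated with a parser that pulls input in fixed-size blocks. In this sketch `Cursor` is a hypothetical stand-in for `bitreader` and the block size is arbitrary; it only shows why the shared position must be moved back to just after the last symbol returned before another component reads from it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical input cursor standing in for Octane's bitreader.
struct Cursor {
    explicit Cursor(std::string d) : data(std::move(d)) {}
    std::string data;
    std::size_t pos = 0;
};

// Read-ahead parser: it copies a whole block into a private buffer, so the
// shared cursor runs ahead of the last symbol actually returned.
class ReadAheadParserSketch {
public:
    explicit ReadAheadParserSketch(std::size_t blockSize) : blockSize_(blockSize) {}

    bool ParseNextSymbolFromInput(Cursor &from, int &symbolnum) {
        if (bufPos_ == buffer_.size()) {  // buffer drained: refill from the cursor
            buffer_ = from.data.substr(from.pos, blockSize_);
            from.pos += buffer_.size();
            bufPos_ = 0;
            if (buffer_.empty()) return false;  // no input left
        }
        symbolnum = static_cast<unsigned char>(buffer_[bufPos_++]);
        return true;
    }

    // Give back the bytes that were read but not yet parsed, leaving the
    // cursor just after the last symbol returned.
    bool RewindAnyBufferedInput(Cursor &from) {
        from.pos -= buffer_.size() - bufPos_;
        buffer_.clear();
        bufPos_ = 0;
        return true;
    }

private:
    std::size_t blockSize_;
    std::string buffer_;
    std::size_t bufPos_ = 0;
};
```

After one symbol has been parsed with a block size of 4, the cursor sits at position 4; rewinding moves it back to position 1, directly after that symbol.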
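virtual bool OctaneParser::ParseNextSymbolFromInput (bitreader &from, int &symbolnum) (pure virtual)
Parse the next symbol from the input and set symbolnum to its symbol number.
Implemented in BitParser, SampleParser, and SubstringParser. Referenced by SymbolWeightVector::CountSymbolFrequencies() and OctaneCompressor_Statistical::DoProtectedCompress().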
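The compression-side pattern of calling `ParseNextSymbolFromInput()` repeatedly, for example to gather symbol statistics, looks roughly like this. `TallySymbols`, `CharParser`, and `Cursor` are hypothetical names for this sketch, not `SymbolWeightVector::CountSymbolFrequencies()` itself, and treating a false return as end of input is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical input cursor standing in for Octane's bitreader.
struct Cursor {
    explicit Cursor(std::string d) : data(std::move(d)) {}
    std::string data;
    std::size_t pos = 0;
};

// Minimal character parser: symbol number == byte value.
// Assumption: ParseNextSymbolFromInput returns false once input is exhausted.
struct CharParser {
    int GetSymbolCount() const { return 256; }
    bool ParseNextSymbolFromInput(Cursor &from, int &symbolnum) {
        if (from.pos >= from.data.size()) return false;
        symbolnum = static_cast<unsigned char>(from.data[from.pos++]);
        return true;
    }
};

// Count how often each symbol occurs, the kind of pass a statistical
// compressor makes before building its model.
template <typename Parser>
std::vector<long> TallySymbols(Parser &parser, Cursor &from) {
    std::vector<long> counts(static_cast<std::size_t>(parser.GetSymbolCount()), 0L);
    int symbol = 0;
    while (parser.ParseNextSymbolFromInput(from, symbol)) {
        assert(symbol >= 0 && static_cast<std::size_t>(symbol) < counts.size());
        ++counts[static_cast<std::size_t>(symbol)];
    }
    return counts;
}
```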
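virtual bool OctaneParser::WriteSymbolText (bitwriter &to, int symbolnum, bool &isendofstreamsymbol) (pure virtual)
Write the text of symbol symbolnum to the output; isendofstreamsymbol is set to indicate whether symbolnum is the end-of-stream symbol.
Implemented in BitParser, SampleParser, and SubstringParser. Referenced by OctaneCompressor_Statistical::DoProtectedDecompress().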
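On the decompression side the flow is reversed: each decoded symbol number is turned back into text. The sketch assumes, based on its name and reference type, that `isendofstreamsymbol` is an output flag telling the caller to stop; `Sink`, `WordTableSketch`, and `DecodeSymbols` are hypothetical, and numbering the end-of-stream symbol just past the last word is a choice made only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical output sink standing in for Octane's bitwriter.
struct Sink {
    std::string out;
};

// Word-level symbol table; symbol number words.size() means end-of-stream.
class WordTableSketch {
public:
    explicit WordTableSketch(std::vector<std::string> words) : words_(std::move(words)) {}

    int EndOfStreamSymbol() const { return static_cast<int>(words_.size()); }

    std::string LookupSymbolText(int symbolnum) const {
        if (symbolnum == EndOfStreamSymbol()) return std::string();
        return words_.at(static_cast<std::size_t>(symbolnum));  // throws if out of range
    }

    // Append the symbol's text, and report whether it was the end marker.
    bool WriteSymbolText(Sink &to, int symbolnum, bool &isendofstreamsymbol) const {
        isendofstreamsymbol = (symbolnum == EndOfStreamSymbol());
        if (!isendofstreamsymbol) to.out += LookupSymbolText(symbolnum);
        return true;
    }

private:
    std::vector<std::string> words_;
};

// Decoder loop: emit symbol text until the end-of-stream symbol appears.
std::string DecodeSymbols(const WordTableSketch &table, const std::vector<int> &symbols) {
    Sink sink;
    for (int s : symbols) {
        bool eos = false;
        table.WriteSymbolText(sink, s, eos);
        if (eos) break;  // anything after the marker is ignored
    }
    return sink.out;
}
```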
virtual string OctaneParser::LookupSymbolText (int symbolnum) (pure virtual)
Helper function to return the text string of a symbol number.
Implemented in BitParser, SampleParser, and SubstringParser.
static bool OctaneParser::IsNonWordCharacter (unsigned char c)
Static helper function to decide whether c is a character found in words or should be treated as a non-word character. Used by some parsers to differentiate between word characters and separators. Definition at line 17 of file parser.cpp.
bool OctaneParser::IsNonWordCharacter (unsigned char c)
{
    // return true if c is a non-word character, i.e. a delimiter between words
    // ATTN: this is a pretty inefficient function; we could use a static lookup table if we wanted to do this fast
    if (c == 39)
    {
        // apostrophe
        return false;
    }
    else if (c < 48)
    {
        // punctuation and nonprintables
        return true;
    }
    else if (c < 58)
    {
        // digits
        return false;
    }
    else if (c < 65)
    {
        // punctuation
        return true;
    }
    else if (c < 91)
    {
        // uppercase
        return false;
    }
    else if (c < 97)
    {
        // punctuation
        return true;
    }
    else if (c < 123)
    {
        // lowercase
        return false;
    }
    else if (c < 154)
    {
        // punctuation and nonprintables
        return true;
    }
    else
    {
        // ATTN: we've got some non-English characters here which might form characters in other languages
        // for now we treat them as word separators
        return true;
    }
}
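The ATTN comment suggests replacing the chain of comparisons with a static lookup table. One way to do that, sketched here with hypothetical function names, is to build a 256-entry table once from the original rules and index it directly; the condensed `IsNonWordCharacterBranches` gives the same result as the listing above for every byte value.

```cpp
#include <array>
#include <cassert>

// Condensed form of the original rules: only the apostrophe, digits,
// and ASCII letters are word characters; every other byte is a separator.
bool IsNonWordCharacterBranches(unsigned char c) {
    if (c == 39) return false;             // apostrophe
    if (c >= 48 && c < 58) return false;   // digits
    if (c >= 65 && c < 91) return false;   // uppercase
    if (c >= 97 && c < 123) return false;  // lowercase
    return true;                           // punctuation, nonprintables, bytes >= 123
}

// Table-driven version: the 256 answers are computed once, on first use.
const std::array<bool, 256> &NonWordTable() {
    static const std::array<bool, 256> table = [] {
        std::array<bool, 256> t{};
        for (int c = 0; c < 256; ++c)
            t[c] = IsNonWordCharacterBranches(static_cast<unsigned char>(c));
        return t;
    }();
    return table;
}

// Each lookup is a single array access.
inline bool IsNonWordCharacterFast(unsigned char c) {
    return NonWordTable()[c];
}
```

The table is built on first use (function-local statics are initialized exactly once, and thread-safely, since C++11), so every later call costs a single indexed load.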