| Octane v1.01.20 - The Open Compression Toolkit for C++ | http://octane.sourceforge.net/ |
#include <parser.hpp>
Parsers are commonly just character-based, returning the ASCII character for each symbol parsed, and therefore require no training. Some parsers are more sophisticated: they scan input streams and build a list of all words, or of common words, and so on.
Definition at line 48 of file parser.hpp.
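To illustrate the simplest case described above, the following sketch shows a character-based parser in which every byte value is its own symbol number, so no training is required. It is self-contained and does not include Octane's headers: ByteReader, ByteWriter, and CharParserSketch are hypothetical stand-ins (for bitreader, bitwriter, and an OctaneParser subclass), and using symbol number 256 as the end-of-stream symbol is an assumption of this example, not something the library documents.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for Octane's bitreader: reads whole bytes from a string.
struct ByteReader {
    std::string data;
    std::size_t pos = 0;
    bool ReadByte(unsigned char &c) {
        if (pos >= data.size()) return false;
        c = static_cast<unsigned char>(data[pos++]);
        return true;
    }
};

// Hypothetical stand-in for Octane's bitwriter: appends bytes to a string.
struct ByteWriter {
    std::string data;
    void WriteByte(unsigned char c) { data.push_back(static_cast<char>(c)); }
};

// Character-based parser sketch: symbol number == byte value (0..255),
// plus one extra symbol (256) that this example uses to mark end of stream.
class CharParserSketch {
public:
    enum { kEndOfStream = 256 };  // assumption of this sketch

    int GetSymbolCount() { return 257; }

    // Delivers one symbol per byte, then the end-of-stream symbol once;
    // returns false after the end-of-stream symbol has been delivered.
    bool ParseNextSymbolFromInput(ByteReader &from, int &symbolnum) {
        if (sentEnd_) return false;
        unsigned char c;
        if (from.ReadByte(c)) {
            symbolnum = c;
        } else {
            symbolnum = kEndOfStream;
            sentEnd_ = true;
        }
        return true;
    }

    // Writes the byte for a symbol; the end-of-stream symbol writes nothing.
    bool WriteSymbolText(ByteWriter &to, int symbolnum, bool &isendofstreamsymbol) {
        isendofstreamsymbol = (symbolnum == kEndOfStream);
        if (symbolnum < 0 || symbolnum > kEndOfStream) return false;
        if (!isendofstreamsymbol) to.WriteByte(static_cast<unsigned char>(symbolnum));
        return true;
    }

    std::string LookupSymbolText(int symbolnum) {
        if (symbolnum < 0 || symbolnum >= kEndOfStream) return std::string();
        return std::string(1, static_cast<char>(symbolnum));
    }

    void SynchronizeStateForNewStream() { sentEnd_ = false; }

private:
    bool sentEnd_ = false;
};

// Round-trips text through the parser: parse to symbol numbers, write them back.
std::string RoundTrip(const std::string &text) {
    CharParserSketch parser;
    ByteReader in;
    in.data = text;
    ByteWriter out;
    parser.SynchronizeStateForNewStream();
    int sym = 0;
    bool isEnd = false;
    while (!isEnd && parser.ParseNextSymbolFromInput(in, sym)) {
        parser.WriteSymbolText(out, sym, isEnd);
    }
    return out.data;
}
```

In this sketch the parse and write paths are exact inverses, which is the property a compressor relies on when it parses symbols during compression and writes their text back during decompression.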
Public Member Functions | |
| OctaneParser () | |
| constructor | |
| virtual | ~OctaneParser () |
| destructor | |
| virtual std::string | GetName () |
| provide a unique name for the coder, used in some cases to automatically register the object with a manager | |
| virtual std::string | GetDescription () |
| optionally provide a longer (maybe 20-60 characters) description | |
| virtual std::string | GetHelpInformation () |
| optionally provide more information about the object on request for help | |
| virtual bool | ResetState () |
| virtual bool | CreateSymbolSetUsingStream (bitreader &from) |
| Process (train on) an input stream to update/create a symbol set from it. | |
| virtual bool | PrepareForParsing () |
| Prepare for parsing mode; must be called after a CreateSymbol call, and before parsing begins. | |
| virtual bool | IsReadyToParse () |
| Report whether the parser is ready to parse, i.e. whether the symbol set has been built. | |
| virtual bool | RewindAnyBufferedInput (bitreader &from) |
| Let go of any buffered stream - this is an odd function that can be called by the compressor if it wants to hand off the input stream to a new parser or otherwise access the input bitstream from after the last symbol parsed. | |
| virtual void | SynchronizeStateForNewStream () |
| Synchronize state for a new stream - this will always be called before beginning a new parsing stream, and should be used to reset the parser to a predictable initial state. | |
| virtual int | GetSymbolCount ()=0 |
| Get a count of the number of symbols stored in the parser. | |
| virtual bool | ParseNextSymbolFromInput (bitreader &from, int &symbolnum)=0 |
| Parse the next symbol from the input and return its #. | |
| virtual bool | WriteSymbolText (bitwriter &to, int symbolnum, bool &isendofstreamsymbol)=0 |
| Write the text of symbol number symbolnum to the output, reporting through isendofstreamsymbol whether it is the end-of-stream symbol. | |
| virtual string | LookupSymbolText (int symbolnum)=0 |
| Helper function to return the text string of a symbol number. | |
Static Protected Member Functions | |
| bool | IsNonWordCharacter (unsigned char c) |
| Static helper function to decide whether c is a character found in words, or should be treated as a non-word character. | |

virtual std::string OctaneParser::GetName ()
provide a unique name for the coder, used in some cases to automatically register the object with a manager
Reimplemented from OctaneClass. Definition at line 58 of file parser.hpp.
{return "OctaneParser";}

virtual std::string OctaneParser::GetDescription ()
optionally provide a longer (maybe 20-60 characters) description
Reimplemented from OctaneClass. Definition at line 59 of file parser.hpp.
{return "Base Parser Class";}

virtual std::string OctaneParser::GetHelpInformation ()
optionally provide more information about the object on request for help
Reimplemented from OctaneClass. Definition at line 60 of file parser.hpp. Referenced by OctaneCompressor_Statistical::GetHelpInformation().
{ return ""; }

virtual bool OctaneParser::CreateSymbolSetUsingStream (bitreader &from)
Process (train on) an input stream to update/create a symbol set from it.
Reimplemented in BitParser, SampleParser, and SubstringParser. Definition at line 68 of file parser.hpp. Referenced by OctaneCompressor_Statistical::DoProtectedCreateSymbolsAndModelsUsingStream().
{return true;};

virtual bool OctaneParser::PrepareForParsing ()
Prepare for parsing mode; must be called after a CreateSymbol call, and before parsing begins.
Definition at line 71 of file parser.hpp. Referenced by OctaneCompressor_Statistical::PrepareForCompression().
{return true;};

virtual bool OctaneParser::RewindAnyBufferedInput (bitreader &from)
Let go of any buffered stream - this is an odd function that can be called by the compressor if it wants to hand off the input stream to a new parser or otherwise access the input bitstream from after the last symbol parsed. It is necessary because some parsers can read ahead in the input buffer, and so must rewind the bitstream.
Reimplemented in BitParser, SampleParser, and SubstringParser. Definition at line 77 of file parser.hpp.
{return true;};
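The need for rewinding can be sketched as follows. This is a minimal, self-contained illustration and not the code of BitParser, SampleParser, or SubstringParser: SeekableReader and ReadAheadParserSketch are hypothetical names, the reader works on whole bytes rather than bits, and the chunk size of 8 is arbitrary. The parser reads ahead in chunks, so after a few symbols the reader position is past the last symbol actually parsed; rewinding moves it back by the number of unconsumed bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical seekable reader, standing in for Octane's bitreader.
struct SeekableReader {
    std::string data;
    std::size_t pos = 0;
    // Reads up to maxBytes and returns them (fewer at end of input).
    std::string Read(std::size_t maxBytes) {
        std::string chunk = data.substr(pos, maxBytes);
        pos += chunk.size();
        return chunk;
    }
    void SeekBack(std::size_t bytes) { pos -= bytes; }
};

// Parser sketch that buffers input in chunks and hands out one byte per symbol.
class ReadAheadParserSketch {
public:
    bool ParseNextSymbolFromInput(SeekableReader &from, int &symbolnum) {
        if (next_ >= buffer_.size()) {
            buffer_ = from.Read(kChunkSize);  // read ahead
            next_ = 0;
            if (buffer_.empty()) return false;  // no more input
        }
        symbolnum = static_cast<unsigned char>(buffer_[next_++]);
        return true;
    }

    // Gives back the bytes that were read ahead but not yet parsed.
    bool RewindAnyBufferedInput(SeekableReader &from) {
        from.SeekBack(buffer_.size() - next_);
        buffer_.clear();
        next_ = 0;
        return true;
    }

private:
    enum { kChunkSize = 8 };  // arbitrary read-ahead size for this sketch
    std::string buffer_;
    std::size_t next_ = 0;
};

// Parses `count` symbols, rewinds, and returns the reader position afterwards.
std::size_t PositionAfterRewind(const std::string &text, int count) {
    SeekableReader in;
    in.data = text;
    ReadAheadParserSketch parser;
    int sym = 0;
    for (int i = 0; i < count; ++i) parser.ParseNextSymbolFromInput(in, sym);
    parser.RewindAnyBufferedInput(in);
    return in.pos;
}
```

After the rewind, the reader position equals the number of symbols parsed, so another consumer of the stream can continue exactly where this parser stopped.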

virtual bool OctaneParser::ParseNextSymbolFromInput (bitreader &from, int &symbolnum)=0
Parse the next symbol from the input and return its #.
Implemented in BitParser, SampleParser, and SubstringParser. Referenced by SymbolWeightVector::CountSymbolFrequencies() and OctaneCompressor_Statistical::DoProtectedCompress().

virtual bool OctaneParser::WriteSymbolText (bitwriter &to, int symbolnum, bool &isendofstreamsymbol)=0
Write the text of symbol number symbolnum to the output, reporting through isendofstreamsymbol whether it is the end-of-stream symbol.
Implemented in BitParser, SampleParser, and SubstringParser. Referenced by OctaneCompressor_Statistical::DoProtectedDecompress().

virtual string OctaneParser::LookupSymbolText (int symbolnum)=0
Helper function to return the text string of a symbol number.
Implemented in BitParser, SampleParser, and SubstringParser.

static bool OctaneParser::IsNonWordCharacter (unsigned char c)
Static helper function to decide whether c is a character found in words, or should be treated as a non-word character. Used by some parsers to differentiate between word characters and separators.
Definition at line 17 of file parser.cpp.
bool OctaneParser::IsNonWordCharacter(unsigned char c)
{
    // return true if c is a non-word character, i.e. a delimiter between words
    // ATTN: this is a pretty inefficient function; we could use a static lookup table if we wanted to do this fast
    if (c==39)
    {
        // apostrophe
        return false;
    }
    else if (c<48)
    {
        // punctuation and nonprintables
        return true;
    }
    else if (c<58)
    {
        // digits
        return false;
    }
    else if (c<65)
    {
        // punctuation
        return true;
    }
    else if (c<91)
    {
        // uppercase
        return false;
    }
    else if (c<97)
    {
        // punctuation
        return true;
    }
    else if (c<123)
    {
        // lowercase
        return false;
    }
    else if (c<154)
    {
        // punctuation and nonprintables
        return true;
    }
    else
    {
        // ATTN: we've got some non-english characters here which might form characters in other languages
        // for now we treat as word separators
        return true;
    }
}
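The ATTN comment in the listing above mentions that a static lookup table could make this test faster. One possible way to do that is sketched below; it is not part of Octane, and the names IsNonWordCharacterBranches, BuildNonWordTable, IsNonWordCharacterFast, and SplitWords are hypothetical. The table is filled once from the same branch logic, so both versions classify every byte value identically. SplitWords then shows how a word-based parser might use the test to separate words from delimiters.

```cpp
#include <array>
#include <string>
#include <vector>

// Branch-based classification, mirroring the listing above.
inline bool IsNonWordCharacterBranches(unsigned char c) {
    if (c == 39) return false;   // apostrophe counts as a word character
    if (c < 48) return true;     // punctuation and nonprintables
    if (c < 58) return false;    // digits
    if (c < 65) return true;     // punctuation
    if (c < 91) return false;    // uppercase
    if (c < 97) return true;     // punctuation
    if (c < 123) return false;   // lowercase
    return true;                 // 123 and above: treated as separators
}

// Builds the 256-entry table from the branch logic.
inline std::array<bool, 256> BuildNonWordTable() {
    std::array<bool, 256> table{};
    for (int c = 0; c < 256; ++c)
        table[c] = IsNonWordCharacterBranches(static_cast<unsigned char>(c));
    return table;
}

// Table-based version: one array lookup per call after the first.
inline bool IsNonWordCharacterFast(unsigned char c) {
    static const std::array<bool, 256> table = BuildNonWordTable();
    return table[c];
}

// Splits text into words, using non-word characters as delimiters.
inline std::vector<std::string> SplitWords(const std::string &text) {
    std::vector<std::string> words;
    std::string current;
    for (char ch : text) {
        if (IsNonWordCharacterFast(static_cast<unsigned char>(ch))) {
            if (!current.empty()) {
                words.push_back(current);
                current.clear();
            }
        } else {
            current.push_back(ch);
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}
```

Because the apostrophe is classified as a word character, a contraction such as "don't" stays a single word, while spaces, commas, and other punctuation end the current word.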