Octane v1.01.20 - The Open Compression Toolkit for C++ http://octane.sourceforge.net/

OctaneParser Class Reference
[Parsers]

#include <parser.hpp>

Inheritance diagram for OctaneParser: OctaneParser derives from OctaneClass; BitParser, SampleParser, and SubstringParser derive from OctaneParser.

Detailed Description

The Base Parser class is responsible for dividing the input stream into a series of numerical symbols, and for converting symbol numbers to their symbol texts during decompression.

Parsers are commonly just character-based, returning the ASCII character code for each symbol parsed, and therefore require no training. Some parsers are more sophisticated, scanning the input stream and building a list of all words (or of the most common words) to use as symbols.

Definition at line 48 of file parser.hpp.
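To make the two directions concrete, here is a minimal sketch of the character-based case. It is not Octane code: std::string and std::vector stand in for the bitreader/bitwriter streams, and the names ParseTextToSymbols, SymbolsToText, and kEndOfStreamSymbol (fixed at 256, one past the byte range) are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical numbering: bytes use symbol numbers 0..255; 256 marks end-of-stream.
const int kEndOfStreamSymbol = 256;

// Compression side: divide the input into numerical symbols,
// finishing with the unique end-of-stream symbol.
std::vector<int> ParseTextToSymbols(const std::string &input) {
    std::vector<int> symbols;
    symbols.reserve(input.size() + 1);
    for (unsigned char c : input)
        symbols.push_back(c);  // character-based: symbol number == byte value
    symbols.push_back(kEndOfStreamSymbol);
    return symbols;
}

// Decompression side: convert symbol numbers back to their texts,
// stopping at the end-of-stream symbol.
std::string SymbolsToText(const std::vector<int> &symbols) {
    std::string text;
    for (int symbolnum : symbols) {
        if (symbolnum == kEndOfStreamSymbol)
            break;
        assert(symbolnum >= 0 && symbolnum < 256);
        text.push_back(static_cast<char>(symbolnum));
    }
    return text;
}
```

Because every byte maps directly to a symbol number, nothing has to be learned from the input first, which is why character-based parsers need no training.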

Public Member Functions

 OctaneParser ()
 constructor

virtual ~OctaneParser ()
 destructor

virtual std::string GetName ()
 provide a unique name for the object, used in some cases to automatically register it with a manager

virtual std::string GetDescription ()
 optionally provide a longer (maybe 20-60 characters) description

virtual std::string GetHelpInformation ()
 optionally provide more information about the object on request for help

virtual bool ResetState ()
virtual bool CreateSymbolSetUsingStream (bitreader &from)
 Process (train on) an input stream to update/create a symbol set from it.

virtual bool PrepareForParsing ()
 Prepare for parsing mode; must be called after CreateSymbolSetUsingStream() and before parsing begins.

virtual bool IsReadyToParse ()
 are we ready to parse? i.e. has the symbol set been built?

virtual bool RewindAnyBufferedInput (bitreader &from)
 Let go of any buffered stream; this is an unusual function that the compressor can call if it wants to hand off the input stream to a new parser, or otherwise access the input bitstream from the point just after the last symbol parsed.

virtual void SynchronizeStateForNewStream ()
 Synchronize state for a new stream; this will always be called before beginning a new parsing stream, and should be used to reset the parser to a predictable initial state.

virtual int GetSymbolCount ()=0
 Get a count of the number of symbols stored in the parser.

virtual bool ParseNextSymbolFromInput (bitreader &from, int &symbolnum)=0
 Parse the next symbol from the input and return its symbol number.

virtual bool WriteSymbolText (bitwriter &to, int symbolnum, bool &isendofstreamsymbol)=0
 Write the text of symbol symbolnum to the output stream, and set isendofstreamsymbol to true if it is the end-of-stream symbol.

virtual string LookupSymbolText (int symbolnum)=0
 Helper function to return the text string of a symbol number.


Static Protected Member Functions

bool IsNonWordCharacter (unsigned char c)
 Static helper function to decide whether c is a character found in words, or should be treated as a non-word character.


Member Function Documentation

virtual std::string OctaneParser::GetName ( ) [inline, virtual]

provide a unique name for the object, used in some cases to automatically register it with a manager

Returns:
a short *unique* name

Reimplemented from OctaneClass.

Definition at line 58 of file parser.hpp.

virtual std::string GetName() {return "OctaneParser";}

virtual std::string OctaneParser::GetDescription ( ) [inline, virtual]

optionally provide a longer (maybe 20-60 characters) description

Returns:
a one line description

Reimplemented from OctaneClass.

Definition at line 59 of file parser.hpp.

virtual std::string GetDescription() {return "Base Parser Class";}

virtual std::string OctaneParser::GetHelpInformation ( ) [inline, virtual]

optionally provide more information about the object on request for help

Returns:
a long string (may contain multiple newlines)

Reimplemented from OctaneClass.

Definition at line 60 of file parser.hpp.

Referenced by OctaneCompressor_Statistical::GetHelpInformation().

virtual std::string GetHelpInformation() { return ""; }

virtual bool OctaneParser::CreateSymbolSetUsingStream ( bitreader & from ) [inline, virtual]

Process (train on) an input stream to update/create a symbol set from it.

Returns:
true on success

Reimplemented in BitParser, SampleParser, and SubstringParser.

Definition at line 68 of file parser.hpp.

Referenced by OctaneCompressor_Statistical::DoProtectedCreateSymbolsAndModelsUsingStream().

virtual bool CreateSymbolSetUsingStream(bitreader &from) {return true;}
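The base implementation does nothing and returns true; trainable parsers override it. As a hedged illustration of the word-based training mentioned in the class description, the sketch below builds a word symbol set from an in-memory string. WordSymbolSet, TrainOn, and IsWordByte are hypothetical names, a std::string stands in for Octane's bitreader, and reserving symbol 0 (with empty text) for end-of-stream is an assumption. IsWordByte mirrors the character ranges of IsNonWordCharacter listed further down this page.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Same ranges as OctaneParser::IsNonWordCharacter: apostrophe, digits,
// and ASCII letters are word characters; every other byte separates words.
bool IsWordByte(unsigned char c) {
    return c == '\'' || (c >= '0' && c <= '9') ||
           (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

// Hypothetical symbol set: symbol 0 is end-of-stream with empty text.
struct WordSymbolSet {
    std::vector<std::string> texts{""};
    std::map<std::string, int> ids;

    int AddOrFind(const std::string &token) {
        assert(!token.empty());  // empty text is reserved for end-of-stream
        auto it = ids.find(token);
        if (it != ids.end()) return it->second;
        int id = static_cast<int>(texts.size());
        texts.push_back(token);
        ids[token] = id;
        return id;
    }

    // Training pass: each maximal run of word characters becomes one symbol,
    // and each non-word character becomes a symbol of its own.
    void TrainOn(const std::string &input) {
        std::size_t i = 0;
        while (i < input.size()) {
            std::size_t j = i + 1;
            if (IsWordByte(static_cast<unsigned char>(input[i]))) {
                while (j < input.size() && IsWordByte(static_cast<unsigned char>(input[j])))
                    ++j;
            }
            AddOrFind(input.substr(i, j - i));
            i = j;
        }
    }
};
```

A real parser would also need some way to handle tokens that never appeared during training; that is omitted here.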

virtual bool OctaneParser::PrepareForParsing ( ) [inline, virtual]

Prepare for parsing mode; must be called after CreateSymbolSetUsingStream() and before parsing begins.

Returns:
true on success

Definition at line 71 of file parser.hpp.

Referenced by OctaneCompressor_Statistical::PrepareForCompression().

virtual bool PrepareForParsing() {return true;}
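The base versions of CreateSymbolSetUsingStream and PrepareForParsing simply return true, so enforcing the ordering rule is left to derived classes. The sketch below shows one hypothetical way a parser could track it; ParserLifecycle and its members are illustrative only, and making IsReadyToParse also require the preparation step is an assumption that goes beyond the documented "has the symbol set been built" check.

```cpp
#include <cassert>

// Hypothetical bookkeeping that enforces the documented call order:
// train (CreateSymbolSetUsingStream) -> PrepareForParsing -> parse.
class ParserLifecycle {
public:
    bool CreateSymbolSet() {
        symbolSetBuilt_ = true;
        prepared_ = false;  // a retrained symbol set must be prepared again
        return true;
    }
    bool PrepareForParsing() {
        if (!symbolSetBuilt_) return false;  // must follow a training call
        prepared_ = true;
        return true;
    }
    bool IsReadyToParse() const { return symbolSetBuilt_ && prepared_; }

private:
    bool symbolSetBuilt_ = false;
    bool prepared_ = false;
};
```

Resetting the prepared flag on retraining is a design choice of this sketch: any derived state computed during preparation would be stale once the symbol set changes.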

virtual bool OctaneParser::RewindAnyBufferedInput ( bitreader & from ) [inline, virtual]

Let go of any buffered stream; this is an unusual function that the compressor can call if it wants to hand off the input stream to a new parser, or otherwise access the input bitstream from the point just after the last symbol parsed.

It is necessary because some parsers read ahead in the input buffer, and so must rewind the bitstream.

Reimplemented in BitParser, SampleParser, and SubstringParser.

Definition at line 77 of file parser.hpp.

virtual bool RewindAnyBufferedInput(bitreader &from) {return true;}
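A minimal sketch of why the rewind is needed, under stated assumptions: ByteCursor is a hypothetical stand-in for a bitreader (a byte string plus a read position), ReadAheadParser pulls input in 4-byte chunks, and end-of-stream handling is left out for brevity. After one symbol has been parsed, the cursor sits past bytes that were buffered but never returned, and RewindAnyBufferedInput moves it back to just after the last symbol parsed.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical stand-in for a bitreader: a byte string plus a read position.
struct ByteCursor {
    std::string data;
    std::size_t pos = 0;
};

// Hypothetical parser that reads ahead in chunks, so the cursor can run
// past the last symbol it has actually returned.
class ReadAheadParser {
public:
    bool ParseNextSymbolFromInput(ByteCursor &from, int &symbolnum) {
        if (buffer_.empty()) {
            // Read ahead up to kChunk bytes at once.
            for (std::size_t n = 0; n < kChunk && from.pos < from.data.size(); ++n)
                buffer_.push_back(static_cast<unsigned char>(from.data[from.pos++]));
            if (buffer_.empty()) return false;  // input exhausted (end-of-stream symbol omitted)
        }
        symbolnum = buffer_.front();
        buffer_.pop_front();
        return true;
    }

    // Give back the bytes that were read ahead but never returned as symbols,
    // leaving the cursor just after the last symbol parsed.
    bool RewindAnyBufferedInput(ByteCursor &from) {
        assert(from.pos >= buffer_.size());
        from.pos -= buffer_.size();
        buffer_.clear();
        return true;
    }

private:
    static constexpr std::size_t kChunk = 4;
    std::deque<unsigned char> buffer_;
};
```

Without the rewind, a second parser handed the same cursor would silently skip the three buffered bytes.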

virtual bool OctaneParser::ParseNextSymbolFromInput ( bitreader & from, int & symbolnum ) [pure virtual]

Parse the next symbol from the input and return its symbol number.

Note:
The end-of-stream situation must be handled specially:

When a parser encounters the end of a stream, it *MUST* return a symbol signifying end of stream.

This end of stream symbol must be a unique symbol from the symbol set.

Returns:
false on the call *after* the one that returned the end-of-stream symbol.

Implemented in BitParser, SampleParser, and SubstringParser.

Referenced by SymbolWeightVector::CountSymbolFrequencies(), and OctaneCompressor_Statistical::DoProtectedCompress().
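The end-of-stream rule can be illustrated with a small in-memory parser. StringCharParser, CollectSymbols, and the choice of 256 as the end-of-stream symbol number are hypothetical; the point is the call sequence: ordinary symbols, then exactly one end-of-stream symbol, then false on every later call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical character parser over an in-memory string illustrating the
// end-of-stream contract: bytes map to 0..255, end-of-stream is 256.
class StringCharParser {
public:
    static constexpr int kEndOfStream = 256;

    explicit StringCharParser(std::string input) : input_(std::move(input)) {}

    bool ParseNextSymbolFromInput(int &symbolnum) {
        if (pos_ < input_.size()) {
            symbolnum = static_cast<unsigned char>(input_[pos_++]);
            return true;
        }
        if (!sentEndOfStream_) {
            // First call at end of input: return the unique end-of-stream symbol.
            sentEndOfStream_ = true;
            symbolnum = kEndOfStream;
            return true;
        }
        return false;  // every call after the end-of-stream symbol fails
    }

private:
    std::string input_;
    std::size_t pos_ = 0;
    bool sentEndOfStream_ = false;
};

// Typical consumer loop: collect symbols until the parser returns false.
std::vector<int> CollectSymbols(StringCharParser &parser) {
    std::vector<int> symbols;
    int symbolnum = 0;
    while (parser.ParseNextSymbolFromInput(symbolnum))
        symbols.push_back(symbolnum);
    return symbols;
}
```

Since the loop stops only when the parser returns false, the end-of-stream symbol itself is still collected, so a compressor sees it and can encode it like any other symbol.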

virtual bool OctaneParser::WriteSymbolText ( bitwriter & to, int symbolnum, bool & isendofstreamsymbol ) [pure virtual]

Write the text of symbol symbolnum to the output stream, and set isendofstreamsymbol to true if it is the end-of-stream symbol.

Returns:
false *after* end of stream (i.e. first response at end of stream should be the end-of-stream symbol).

Implemented in BitParser, SampleParser, and SubstringParser.

Referenced by OctaneCompressor_Statistical::DoProtectedDecompress().

virtual string OctaneParser::LookupSymbolText ( int symbolnum ) [pure virtual]

Helper function to return the text string of a symbol number.

Returns:
a string with the text of the symbol
Note:
the end-of-stream symbol should have the empty string as its text.

Implemented in BitParser, SampleParser, and SubstringParser.
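The empty-string convention lets WriteSymbolText be layered on LookupSymbolText. The sketch below is a hypothetical table-driven parser, not one of the Octane parsers: a std::string stands in for the bitwriter, symbol 0 is assumed to be end-of-stream, and the return-value convention (false only for an unknown symbol number) is an assumption. The convention only works if no ordinary symbol has empty text.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical table-driven parser; texts[0] is the end-of-stream symbol ("").
class TableParser {
public:
    explicit TableParser(std::vector<std::string> texts) : texts_(std::move(texts)) {}

    // Return the text of a symbol; the empty string marks end-of-stream.
    std::string LookupSymbolText(int symbolnum) const {
        assert(symbolnum >= 0 && symbolnum < static_cast<int>(texts_.size()));
        return texts_[symbolnum];
    }

    // Append the symbol's text to `to` (standing in for a bitwriter)
    // and flag the end-of-stream symbol.
    bool WriteSymbolText(std::string &to, int symbolnum, bool &isendofstreamsymbol) const {
        if (symbolnum < 0 || symbolnum >= static_cast<int>(texts_.size()))
            return false;  // unknown symbol number
        const std::string text = LookupSymbolText(symbolnum);
        isendofstreamsymbol = text.empty();
        to += text;
        return true;
    }

private:
    std::vector<std::string> texts_;
};

// Decompression loop: write symbol texts until the end-of-stream symbol appears.
std::string DecodeSymbols(const TableParser &parser, const std::vector<int> &symbols) {
    std::string out;
    for (int symbolnum : symbols) {
        bool isendofstream = false;
        if (!parser.WriteSymbolText(out, symbolnum, isendofstream) || isendofstream)
            break;
    }
    return out;
}
```

Any symbols after the end-of-stream symbol are ignored by the loop, matching the idea that nothing meaningful follows it.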

bool OctaneParser::IsNonWordCharacter ( unsigned char c ) [static, protected]

Static helper function to decide whether c is a character found in words, or should be treated as a non-word character.

Used by some parsers to distinguish word characters from separators.

Definition at line 17 of file parser.cpp.

bool OctaneParser::IsNonWordCharacter(unsigned char c)
{
        // return true if c is a non-word character, i.e. a delimiter between words
        // ATTN: this is a pretty inefficient function; we could use a static lookup table if we wanted to do this fast
        if (c==39)
                {
                // apostrophe
                return false;
                }
        else if (c<48)
                {
                // punctuation and nonprintables
                return true;
                }
        else if (c<58)
                {
                // digits
                return false;
                }
        else if (c<65)
                {
                // punctuation
                return true;
                }
        else if (c<91)
                {
                // uppercase
                return false;
                }
        else if (c<97)
                {
                // punctuation
                return true;
                }
        else if (c<123)
                {
                // lowercase
                return false;
                }
        else if (c<154)
                {
                // punctuation and nonprintables
                return true;
                }
        else
                {
                // ATTN: we've got some non-English characters here which might be word characters in other languages
                // for now we treat them as word separators
                return true;
                }
}
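The ATTN comment at the top of this function suggests a static lookup table as a faster alternative. A minimal sketch of that idea follows; IsNonWordCharacterFast and BuildNonWordTable are hypothetical names, and the character literals assume an ASCII execution character set, as the numeric codes in the listing already do.

```cpp
#include <array>
#include <cassert>

// Build a 256-entry table; entry c is true when byte c is a non-word
// character under the same ranges as OctaneParser::IsNonWordCharacter.
std::array<bool, 256> BuildNonWordTable() {
    std::array<bool, 256> table{};
    for (int c = 0; c < 256; ++c) {
        bool isWord = c == '\'' || (c >= '0' && c <= '9') ||
                      (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
        table[c] = !isWord;
    }
    return table;
}

bool IsNonWordCharacterFast(unsigned char c) {
    // Function-local static: built once, on the first call.
    static const std::array<bool, 256> table = BuildNonWordTable();
    return table[c];
}
```

Because the table is a function-local static, it is initialized once on first use; each later call is a single indexed load instead of a chain of comparisons.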


The documentation for this class was generated from the following files: parser.hpp and parser.cpp.
Generated on 20 May 2004 by doxygen 1.3.3