Parsing Engine

danbikel.parser
Class ProbabilityStructure

java.lang.Object
  extended by danbikel.parser.ProbabilityStructure
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
BrokenLexPriorModelStructure, BrokenModWordModelStructure, BrokenTopLexModelStructure, GapModelStructure1, HeadModelStructure1, LexPriorModelStructure1, ModNonterminalModelStructure1, ModNonterminalModelStructure2, ModNonterminalModelStructure3, ModNonterminalModelStructure4, ModNonterminalModelStructure6, ModNonterminalModelStructure7, ModNonterminalModelStructure8, ModNonterminalModelStructure9, ModWordModelStructure1, ModWordModelStructure2, ModWordModelStructure3, ModWordModelStructure4, ModWordModelStructure5, ModWordModelStructure6, ModWordModelStructure7, ModWordModelStructure8, ModWordModelStructure9, NonterminalPriorModelStructure1, SubcatModelStructure1, SubcatModelStructure2, TagModelStructure1, TagModelStructure2, TopLexModelStructure1, TopNonterminalModelStructure1

public abstract class ProbabilityStructure
extends Object
implements Serializable

Abstract class to represent the probability structure (the entire set of back-off levels, including the top level) for the estimation of a particular parameter class in the overall parsing model (using "class" in the statistical, non-Java sense of the word). Providing this abstract structure is intended to facilitate experimentation with differing smoothing or back-off schemes. Various data members are provided to enable efficient construction of SexpEvent objects to represent events in the back-off scheme, but any class that implements the Event interface may be used to record events in a concrete subclass of this class.

Design note: The probability estimates of a Model object using a ProbabilityStructure will be somewhat unpredictable if the history contexts at the back-off levels do not represent supersets of one another. That is, the history context at back-off level i + 1 must be a superset of the context at back-off level i.

Concurrency note: A separate ProbabilityStructure object needs to be constructed for each thread that needs to use its facilities, to avoid concurrent access and modification of its data members (which are intended to improve efficiency and are thus not designed for concurrent access via synchronized blocks).
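
The concurrency note can be illustrated with a minimal sketch in which each worker thread receives its own copy of a prototype object. The nested Structure class is a hypothetical stand-in for a concrete ProbabilityStructure subclass, not part of the danbikel.parser package; only the idea of calling copy() once per thread is taken from this page.

```java
import java.util.ArrayList;
import java.util.List;

final class PerThreadStructureSketch {
    /** Hypothetical stand-in for a concrete ProbabilityStructure subclass. */
    static final class Structure {
        // Scratch storage, analogous to the reusable data members of this class.
        final double[] estimates = new double[3];

        /** Returns a fresh instance with freshly created scratch storage. */
        Structure copy() {
            return new Structure();
        }
    }

    /** Starts nThreads workers, each writing only to its own copy of the prototype. */
    static List<Structure> runWorkers(Structure prototype, int nThreads) {
        List<Structure> perThread = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < nThreads; i++) {
            final Structure own = prototype.copy();
            final int id = i;
            perThread.add(own);
            Thread t = new Thread(() -> own.estimates[0] = id);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return perThread;
    }

    public static void main(String[] args) {
        List<Structure> copies = runWorkers(new Structure(), 4);
        System.out.println(copies.size() + " independent copies");
    }
}
```

Because no two threads touch the same scratch arrays, no synchronization on the structure itself is needed in this arrangement.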

See Also:
Model, JointModel, Trainer, Serialized Form

Field Summary
protected  Object additionalData
          Handle onto additional data object for this probability structure, whose value is null if no other data is required for the concrete probability structure.
protected static String defaultModelClassName
          The value of the Settings.defaultModelClass setting.
protected static Constructor defaultModelConstructor
          The constructor of the class specified by the Settings.defaultModelClass setting, taking a single ProbabilityStructure as its only argument.
protected  boolean doPruning
          Indicates whether certain events/distributions of low or no utility should be pruned from the model using this probability structure.
 double[] estimates
          An array used only during the computation of top-level probabilities, used to store the ML estimates of all the levels of back-off.
protected  SexpList futureList
          Deprecated. Ever since the Event and MutableEvent interfaces were re-worked to include methods to add and iterate over event components and the SexpEvent class was retrofitted to these new specifications, this object became superfluous, as SexpEvent objects can now be efficiently constructed directly, by using the SexpEvent.add(Object) method.
protected  MutableEvent[] futures
          A reusable SexpEvent array to represent futures; the array will be initialized to have the size of numLevels().
protected  MutableEvent[] futuresWithSubcats
          A reusable SexpSubcatEvent array to represent futures; the array will be initialized to have the size of numLevels().
protected  MutableEvent[] histories
          A reusable SexpEvent array to represent history contexts; the array will be initialized to have the size of numLevels().
protected  MutableEvent[] historiesWithSubcats
          A reusable SexpSubcatEvent array to represent histories; the array will be initialized to have the size of numLevels().
protected  SexpList historyList
          Deprecated. Ever since the Event and MutableEvent interfaces were re-worked to include methods to add and iterate over event components and the SexpEvent class was retrofitted to these new specifications, this object became superfluous, as SexpEvent objects can now be efficiently constructed directly, by using the SexpEvent.add(Object) method.
 double[] lambdas
          An array used only during the computation of top-level probabilities, used to store the lambdas calculated at all the levels of back-off.
 double prevHistCount
          A temporary value used in the computation of top-level probabilities, used in the computation of lambdas.
protected  int topLevelCacheSize
          The size of the cache that models of this probability structure should use for events containing maximal context.
 Transition[] transitions
          A reusable Transition array to store transitions.
 
Constructor Summary
protected ProbabilityStructure()
          Usually called implicitly, this constructor initializes the internal, reusable historyList to have an initial capacity of the return value of maxEventComponents.
 
Method Summary
 int cacheSize(int level)
          Returns the recommended cache size for the specified back-off level of the model that uses this probability structure.
abstract  ProbabilityStructure copy()
          Returns a deep copy of this object.
protected  String defaultSmoothingParamsFilename()
          Returns a default name of the smoothing parameters file, which is the value of getClass().getName() + ".smoothingParams".
 boolean doCleanup()
          Indicates whether the Model class needs to invoke its cleanup method at the end of its deriveCounts method.
protected  boolean dontAddNewParameters()
          Indicates whether this probability structure's associated Model object should not add new parameters when deriving counts by consulting the smoothing parameters from smoothingParametersFile().
 boolean doPruning()
          Returns whether models using this probability structure should prune parameters.
 Object getAdditionalData()
          Returns the value of the additionalData member.
abstract  Event getFuture(TrainerEvent trainerEvent, int backOffLevel)
          Extracts the future for the specified level of back-off from the specified trainer event.
abstract  Event getHistory(TrainerEvent trainerEvent, int backOffLevel)
          Extracts the history context for the specified back-off level from the specified trainer event.
protected  int getTopLevelCacheSize()
          This method converts the value of the setting named getClass().getName() + ".topLevelCacheSize" to an integer and returns it.
 Transition getTransition(TrainerEvent trainerEvent, int backOffLevel)
          Returns the reusable transition object for the specified back-off level, with its history set to the result of calling getHistory(trainerEvent, backOffLevel) and its future the result of getFuture(trainerEvent, backOffLevel).
 ProbabilityStructure[] jointModel()
          Returns an array of other ProbabilityStructure objects for use in a JointModel instance, or null if this probability structure should not be composed with a JointModel instance.
 double lambdaFudge(int backOffLevel)
          Returns the "fudge factor" for the lambda computation for backOffLevel.
 double lambdaFudgeTerm(int backOffLevel)
          Returns the "fudge term" for the lambda computation for backOffLevel.
 double lambdaPenalty(int backOffLevel)
          Returns the smoothing value to be used with back-off levels whose histories never occurred in training, meaning that 1 minus this value will be the total probability mass for the smoothed estimate at the specified back-off level (resulting in a degenerate model unless this value is zero).
protected  int maxEventComponents()
          Allows subclasses to specify the maximum number of event components, so that the constructor of this class may pre-allocate space in its internal, reusable MutableEvent objects (used for efficient event construction).
 Model newModel()
          Returns a newly-constructed Model object for this probability structure.
abstract  int numLevels()
          Returns the number of back-off levels.
 int priorLevel()
          Returns the level that corresponds to the prior for that which is being predicted (the future); if there is no such level, this method returns -1 (the default implementation returns -1).
 boolean removeFuture(int backOffLevel, Event future)
          Indicates that Model.cleanup(), which is invoked at the end of Model.deriveCounts(CountsTable,danbikel.util.Filter,double,danbikel.util.FlexibleMap), can safely remove the specified event from the Model object's internal counts tables, as the event is not applicable to any of the probabilities for which the model will produce an estimate.
 boolean removeHistory(int backOffLevel, Event history)
          Indicates that Model.cleanup(), which is invoked at the end of Model.deriveCounts, can safely remove the specified event from the Model object's internal counts tables, as the event is not applicable to any of the probabilities for which the model will produce an estimate.
 boolean removeTransition(int backOffLevel, Transition transition)
          Returns true if the specified transition contains either a history or future for which removeHistory(int,Event) or removeFuture(int,Event) returns true, respectively.
protected  boolean saveSmoothingParameters()
          Indicates that this probability structure's associated Model object should save the smoothing parameters to the file named by smoothingParametersFile() when precomputing probabilities during training.
 void setAdditionalData(Object data)
          Sets the value of the additionalData member.
 String smoothingParametersFile()
          Returns the name of the smoothing parameters file, either to be created if saveSmoothingParameters() returns true, or read from and used if either dontAddNewParameters() or useSmoothingParameters() returns true.
protected  boolean useSmoothingParameters()
          Indicates whether this probability structure's associated Model object should use the smoothing parameters contained in the file smoothingParametersFile() when deriving counts and precomputing probabilities.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

topLevelCacheSize

protected transient int topLevelCacheSize
The size of the cache that models of this probability structure should use for events containing maximal context.

See Also:
cacheSize(int)

doPruning

protected transient boolean doPruning
Indicates whether certain events/distributions of low or no utility should be pruned from the model using this probability structure. This variable will be set to the boolean value of the setting getClass().getName() + ".doPruning". For example, if there is a concrete subclass of this class named com.pkg.Foo, and if it is desirable to have pruning performed for models using the com.pkg.Foo probability structure, then the settings file should include the line com.pkg.Foo.doPruning=true.
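
The per-class setting lookup described above can be sketched as follows. This is only an illustration: java.util.Properties stands in for the parser's Settings class, and the method takes the class name as a parameter instead of calling getClass().getName(), both of which are assumptions made to keep the example self-contained.

```java
import java.util.Properties;

final class PruningSettingSketch {
    /**
     * Reads the setting "<className>.doPruning" as a boolean.
     * A missing key yields false, since Boolean.parseBoolean(null) is false.
     */
    static boolean doPruning(Properties settings, String className) {
        return Boolean.parseBoolean(settings.getProperty(className + ".doPruning"));
    }

    public static void main(String[] args) {
        Properties settings = new Properties();
        // Equivalent to the settings-file line com.pkg.Foo.doPruning=true
        settings.setProperty("com.pkg.Foo.doPruning", "true");
        System.out.println(doPruning(settings, "com.pkg.Foo"));
    }
}
```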


defaultModelClassName

protected static String defaultModelClassName
The value of the Settings.defaultModelClass setting.


defaultModelConstructor

protected static Constructor defaultModelConstructor
The constructor of the class specified by the Settings.defaultModelClass setting, taking a single ProbabilityStructure as its only argument.


historyList

protected SexpList historyList
Deprecated. Ever since the Event and MutableEvent interfaces were re-worked to include methods to add and iterate over event components and the SexpEvent class was retrofitted to these new specifications, this object became superfluous, as SexpEvent objects can now be efficiently constructed directly, by using the SexpEvent.add(Object) method.
A reusable list to enable efficient construction of SexpEvent objects of various sizes to represent history contexts.

See Also:
SexpEvent.add(Object), histories, historiesWithSubcats

futureList

protected SexpList futureList
Deprecated. Ever since the Event and MutableEvent interfaces were re-worked to include methods to add and iterate over event components and the SexpEvent class was retrofitted to these new specifications, this object became superfluous, as SexpEvent objects can now be efficiently constructed directly, by using the SexpEvent.add(Object) method.
A reusable list to enable efficient construction of SexpEvent objects of various sizes to represent futures.

See Also:
SexpEvent.add(Object), futures, futuresWithSubcats

histories

protected MutableEvent[] histories
A reusable SexpEvent array to represent history contexts; the array will be initialized to have the size of numLevels(). These objects may be used as the return values of getHistory(TrainerEvent,int).

See Also:
getHistory(TrainerEvent,int)

futures

protected MutableEvent[] futures
A reusable SexpEvent array to represent futures; the array will be initialized to have the size of numLevels(). These objects may be used as the return values of getFuture(TrainerEvent,int).

See Also:
getFuture(TrainerEvent,int)

historiesWithSubcats

protected MutableEvent[] historiesWithSubcats
A reusable SexpSubcatEvent array to represent histories; the array will be initialized to have the size of numLevels(). These objects may be used as the return values of getHistory(TrainerEvent,int).

See Also:
getHistory(TrainerEvent,int)

futuresWithSubcats

protected MutableEvent[] futuresWithSubcats
A reusable SexpSubcatEvent array to represent futures; the array will be initialized to have the size of numLevels(). These objects may be used as the return values of getFuture(TrainerEvent,int).

See Also:
getFuture(TrainerEvent,int)

transitions

public Transition[] transitions
A reusable Transition array to store transitions. The Transition objects in this array may be used as the return values of getTransition(TrainerEvent,int).


estimates

public double[] estimates
An array used only during the computation of top-level probabilities, used to store the ML estimates of all the levels of back-off.

See Also:
Model.estimateLogProb(int,TrainerEvent)

lambdas

public double[] lambdas
An array used only during the computation of top-level probabilities, used to store the lambdas calculated at all the levels of back-off.

See Also:
Model.estimateLogProb(int,TrainerEvent)
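
To show how per-level arrays such as estimates and lambdas are typically combined, here is a sketch of nested linear interpolation over back-off levels, with index 0 as the most specific level. This page does not state the formula used by Model.estimateLogProb, so the recursion below (and the choice to start from zero below the last level) is an assumption, not the library's documented behavior.

```java
final class BackOffInterpolationSketch {
    /**
     * Combines per-level maximum-likelihood estimates into one smoothed
     * probability: p_i = lambda_i * e_i + (1 - lambda_i) * p_(i+1),
     * evaluated from the last (least specific) level up to level 0.
     * Assumption: the value "below" the last level is 0.0.
     */
    static double interpolate(double[] estimates, double[] lambdas) {
        if (estimates.length != lambdas.length) {
            throw new IllegalArgumentException("arrays must have one entry per level");
        }
        double prob = 0.0;
        for (int level = estimates.length - 1; level >= 0; level--) {
            prob = lambdas[level] * estimates[level] + (1.0 - lambdas[level]) * prob;
        }
        return prob;
    }

    public static void main(String[] args) {
        double[] estimates = {0.5, 0.25, 0.125};
        double[] lambdas = {0.8, 0.5, 1.0};
        System.out.println(interpolate(estimates, lambdas));
    }
}
```

Under this assumed scheme, a last-level lambda below 1.0 leaves some probability mass unassigned, which is consistent with the remark under lambdaPenalty that a non-zero penalty results in a degenerate model.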

prevHistCount

public double prevHistCount
A temporary value used in the computation of top-level probabilities, used in the computation of lambdas.

See Also:
Model.estimateLogProb(int,TrainerEvent)

additionalData

protected Object additionalData
Handle onto additional data object for this probability structure, whose value is null if no other data is required for the concrete probability structure.

Constructor Detail

ProbabilityStructure

protected ProbabilityStructure()
Usually called implicitly, this constructor initializes the internal, reusable historyList to have an initial capacity of the return value of maxEventComponents.

See Also:
historyList, futureList, maxEventComponents()
Method Detail

getTopLevelCacheSize

protected int getTopLevelCacheSize()
This method converts the value of the setting named getClass().getName() + ".topLevelCacheSize" to an integer and returns it. This method is used within the constructor of this abstract class to set the value of the topLevelCacheSize data member. Subclasses should override this method if such a setting may not be available or if a different mechanism for determining the top-level cache size is desired.

See Also:
Settings.get(String)

doPruning

public boolean doPruning()
Returns whether models using this probability structure should prune parameters.

Returns:
whether models using this probability structure should prune parameters.
See Also:
doPruning

defaultSmoothingParamsFilename

protected String defaultSmoothingParamsFilename()
Returns a default name of the smoothing parameters file, which is the value of getClass().getName() + ".smoothingParams".

Returns:
a default name of the smoothing parameters file, which is the value of getClass().getName() + ".smoothingParams"
See Also:
smoothingParametersFile()

smoothingParametersFile

public String smoothingParametersFile()
Returns the name of the smoothing parameters file, either to be created if saveSmoothingParameters() returns true, or read from and used if either dontAddNewParameters() or useSmoothingParameters() returns true.

The name of the smoothing file returned by this method is the value of the setting getClass().getName() + ".smoothingParametersFile", or the value returned by defaultSmoothingParamsFilename() if this property is not set.

Returns:
the name of the smoothing parameters file, either to be created or read from and used in a training run
See Also:
saveSmoothingParameters(), dontAddNewParameters(), useSmoothingParameters()
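
The fallback from the explicit setting to the default file name can be sketched as below. As an assumption for illustration, java.util.Properties replaces the parser's Settings class and the class name is passed in explicitly; the two key suffixes are the ones given in the descriptions of smoothingParametersFile() and defaultSmoothingParamsFilename().

```java
import java.util.Properties;

final class SmoothingFileSketch {
    /** Default name: the class name followed by ".smoothingParams". */
    static String defaultSmoothingParamsFilename(String className) {
        return className + ".smoothingParams";
    }

    /** Uses "<className>.smoothingParametersFile" if set, else the default name. */
    static String smoothingParametersFile(Properties settings, String className) {
        String name = settings.getProperty(className + ".smoothingParametersFile");
        return name != null ? name : defaultSmoothingParamsFilename(className);
    }

    public static void main(String[] args) {
        Properties settings = new Properties();
        System.out.println(smoothingParametersFile(settings, "com.pkg.Foo"));
        settings.setProperty("com.pkg.Foo.smoothingParametersFile", "foo.params");
        System.out.println(smoothingParametersFile(settings, "com.pkg.Foo"));
    }
}
```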

saveSmoothingParameters

protected boolean saveSmoothingParameters()
Indicates that this probability structure's associated Model object should save the smoothing parameters to the file named by smoothingParametersFile() when precomputing probabilities during training. If the Settings.precomputeProbs setting is false then the value of this property is ignored.

The default implementation here gets the boolean value of the setting getClass().getName() + ".saveSmoothingParameters", as determined by Boolean.valueOf(String).

Returns:
whether or not this probability structure's associated Model object should save the smoothing parameters to the file named by smoothingParametersFile()
See Also:
smoothingParametersFile()

dontAddNewParameters

protected boolean dontAddNewParameters()
Indicates whether this probability structure's associated Model object should not add new parameters when deriving counts by consulting the smoothing parameters from smoothingParametersFile(). Specifically, for each history context derived from a TrainerEvent, a derived count will only be added for that history context if it has a non-zero smoothing parameter, as determined by the information contained in smoothingParametersFile(). Effectively, when this method returns true, it indicates to use the smoothing parameters contained in smoothingParametersFile() only to determine which histories have non-zero smoothing values. If useSmoothingParameters() returns true, then all the smoothing parameters contained in smoothingParametersFile() will be used, meaning that no new parameters will be added, making the return value of this method irrelevant (because it will implicitly be true).

The default implementation here gets the boolean value of the setting getClass().getName() + ".dontAddNewParameters", as determined by Boolean.valueOf(String).

Returns:
whether this probability structure's associated Model object should not add new parameters when deriving counts by consulting the smoothing parameters from smoothingParametersFile()
See Also:
smoothingParametersFile(), useSmoothingParameters()
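
The filtering behavior described for dontAddNewParameters() can be sketched as follows. Histories are represented as plain strings and the saved smoothing parameters as a map, both hypothetical simplifications; treating a history that is absent from the saved parameters as having a zero smoothing value is also an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DontAddNewParametersSketch {
    /**
     * Keeps a derived count only if its history has a non-zero saved
     * smoothing parameter (absent histories are treated as zero here).
     */
    static Map<String, Double> filterCounts(Map<String, Double> derivedCounts,
                                            Map<String, Double> savedLambdas) {
        Map<String, Double> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : derivedCounts.entrySet()) {
            Double lambda = savedLambdas.get(e.getKey());
            if (lambda != null && lambda != 0.0) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Double> counts = new LinkedHashMap<>();
        counts.put("(VP VBD)", 12.0);
        counts.put("(NP NN)", 3.0);
        Map<String, Double> lambdas = new LinkedHashMap<>();
        lambdas.put("(VP VBD)", 0.7);
        lambdas.put("(NP NN)", 0.0);
        System.out.println(filterCounts(counts, lambdas).keySet());
    }
}
```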

useSmoothingParameters

protected boolean useSmoothingParameters()
Indicates whether this probability structure's associated Model object should use the smoothing parameters contained in the file smoothingParametersFile() when deriving counts and precomputing probabilities. Note that when this method returns true, no new parameters will be added to the model when deriving counts, thus making the return value of dontAddNewParameters() irrelevant.

The default implementation here gets the boolean value of the setting getClass().getName() + ".useSmoothingParameters", as determined by Boolean.valueOf(String).

Returns:
whether this probability structure's associated Model object should use the smoothing parameters contained in the file smoothingParametersFile() when deriving counts and precomputing probabilities
See Also:
smoothingParametersFile(), dontAddNewParameters()

maxEventComponents

protected int maxEventComponents()
Allows subclasses to specify the maximum number of event components, so that the constructor of this class may pre-allocate space in its internal, reusable MutableEvent objects (used for efficient event construction). The default implementation simply returns 1.

Returns:
1 (subclasses should override this method)
See Also:
MutableEvent.ensureCapacity(int)

newModel

public Model newModel()
Returns a newly-constructed Model object for this probability structure. The default implementation here returns an instance of Model. If a concrete ProbabilityStructure class overrides jointModel(), it should use this method to return an instance of a class that is suitable for handling multiple ProbabilityStructure objects, such as JointModel.

See Also:
jointModel(), Model, JointModel

jointModel

public ProbabilityStructure[] jointModel()
Returns an array of other ProbabilityStructure objects for use in a JointModel instance, or null if this probability structure should not be composed with a JointModel instance. This default implementation returns null.

Returns:
an array of other ProbabilityStructure objects, or null if this probability structure should not be composed with a JointModel instance
See Also:
JointModel

numLevels

public abstract int numLevels()
Returns the number of back-off levels.


priorLevel

public int priorLevel()
Returns the level that corresponds to the prior for that which is being predicted (the future); if there is no such level, this method returns -1 (the default implementation returns -1).


lambdaFudge

public double lambdaFudge(int backOffLevel)
Returns the "fudge factor" for the lambda computation for backOffLevel. The default implementation returns 5.0.

Parameters:
backOffLevel - the back-off level for which to return a "fudge factor"

lambdaFudgeTerm

public double lambdaFudgeTerm(int backOffLevel)
Returns the "fudge term" for the lambda computation for backOffLevel. The default implementation returns 0.0.


lambdaPenalty

public double lambdaPenalty(int backOffLevel)
Returns the smoothing value to be used with back-off levels whose histories never occurred in training, meaning that 1 minus this value will be the total probability mass for the smoothed estimate at the specified back-off level (resulting in a degenerate model unless this value is zero). From another perspective, this method returns the confidence that the raw maximum-likelihood estimate for this back-off level should be zero given that this history was never seen during training.

By default this method returns 0.0 for all back-off levels.


getHistory

public abstract Event getHistory(TrainerEvent trainerEvent,
                                 int backOffLevel)
Extracts the history context for the specified back-off level from the specified trainer event.

Parameters:
trainerEvent - the event for which a history context is desired for the specified back-off level
backOffLevel - the back-off level for which to get a history context from the specified trainer event
Returns:
an Event object that represents the history context for the specified back-off level

getFuture

public abstract Event getFuture(TrainerEvent trainerEvent,
                                int backOffLevel)
Extracts the future for the specified level of back-off from the specified trainer event. Typically, futures remain the same regardless of back-off level.

Parameters:
trainerEvent - the event from which a future is to be extracted
backOffLevel - the back-off level for which to get the future event
Returns:
an Event object that represents the future for the specified back-off level

getTransition

public Transition getTransition(TrainerEvent trainerEvent,
                                int backOffLevel)
Returns the reusable transition object for the specified back-off level, with its history set to the result of calling getHistory(trainerEvent, backOffLevel) and its future the result of getFuture(trainerEvent, backOffLevel).

Parameters:
trainerEvent - the event from which a transition is to be extracted
backOffLevel - the back-off level for which to get the transition
Returns:
the reusable transition object containing the history and future of the specified back-off level
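
Because the returned transition object is reused, a caller that needs to retain a transition must copy it before the next call for the same level. The sketch below illustrates this with hypothetical stand-in types: events are plain strings, and the back-off scheme (level i drops the last i context items) is invented purely for the example.

```java
import java.util.Arrays;

final class ReusableTransitionSketch {
    /** Hypothetical mutable holder for a history/future pair. */
    static final class Transition {
        String history;
        String future;
    }

    // One reusable Transition per back-off level, allocated once.
    private final Transition[] transitions;

    ReusableTransitionSketch(int numLevels) {
        transitions = new Transition[numLevels];
        for (int i = 0; i < numLevels; i++) {
            transitions[i] = new Transition();
        }
    }

    /** Invented scheme: level i keeps all but the last i context items. */
    String getHistory(String[] context, int level) {
        return String.join(" ", Arrays.copyOfRange(context, 0, context.length - level));
    }

    /** The future stays the same regardless of back-off level. */
    String getFuture(String future, int level) {
        return future;
    }

    /** Fills in and returns the reusable object for the given level. */
    Transition getTransition(String[] context, String future, int level) {
        Transition t = transitions[level];
        t.history = getHistory(context, level);
        t.future = getFuture(future, level);
        return t;
    }

    public static void main(String[] args) {
        ReusableTransitionSketch s = new ReusableTransitionSketch(3);
        Transition t = s.getTransition(new String[] {"VP", "VBD", "left"}, "NP", 1);
        System.out.println(t.history + " -> " + t.future);
    }
}
```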

doCleanup

public boolean doCleanup()
Indicates whether the Model class needs to invoke its cleanup method at the end of its deriveCounts method. The default implementation here returns false.

See Also:
removeHistory(int,Event), removeFuture(int,Event), removeTransition(int,Transition), Model.deriveCounts(CountsTable,danbikel.util.Filter,double,danbikel.util.FlexibleMap), Model.cleanup()

removeHistory

public boolean removeHistory(int backOffLevel,
                             Event history)
Indicates that Model.cleanup(), which is invoked at the end of Model.deriveCounts, can safely remove the specified event from the Model object's internal counts tables, as the event is not applicable to any of the probabilities for which the model will produce an estimate.

The default implementation simply returns false.

See Also:
Model.deriveCounts(CountsTable,danbikel.util.Filter,double,danbikel.util.FlexibleMap), Model.cleanup()

removeFuture

public boolean removeFuture(int backOffLevel,
                            Event future)
Indicates that Model.cleanup(), which is invoked at the end of Model.deriveCounts(CountsTable,danbikel.util.Filter,double,danbikel.util.FlexibleMap), can safely remove the specified event from the Model object's internal counts tables, as the event is not applicable to any of the probabilities for which the model will produce an estimate.

The default implementation simply returns false.

See Also:
Model.deriveCounts(CountsTable,danbikel.util.Filter,double,danbikel.util.FlexibleMap), Model.cleanup()

removeTransition

public boolean removeTransition(int backOffLevel,
                                Transition transition)
Returns true if the specified transition contains either a history or future for which removeHistory(int,Event) or removeFuture(int,Event) returns true, respectively.

See Also:
Model.deriveCounts(CountsTable,danbikel.util.Filter,double,danbikel.util.FlexibleMap), Model.cleanup()
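
The relationship between removeTransition and the two per-event checks amounts to a logical OR, sketched below. The Transition class and the BiPredicate parameters are hypothetical stand-ins for the real Transition type and for the removeHistory(int,Event) and removeFuture(int,Event) methods.

```java
import java.util.function.BiPredicate;

final class RemoveTransitionSketch {
    /** Hypothetical immutable history/future pair, with events as strings. */
    static final class Transition {
        final String history;
        final String future;

        Transition(String history, String future) {
            this.history = history;
            this.future = future;
        }
    }

    /** True if either the history or the future is flagged for removal. */
    static boolean removeTransition(int backOffLevel, Transition t,
                                    BiPredicate<Integer, String> removeHistory,
                                    BiPredicate<Integer, String> removeFuture) {
        return removeHistory.test(backOffLevel, t.history)
            || removeFuture.test(backOffLevel, t.future);
    }

    public static void main(String[] args) {
        Transition t = new Transition("(VP VBD)", "NP");
        // Mirrors the documented defaults, which both return false.
        BiPredicate<Integer, String> never = (level, event) -> false;
        System.out.println(removeTransition(0, t, never, never));
    }
}
```

With both checks left at their default of false, no transition is ever flagged for removal.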

cacheSize

public int cacheSize(int level)
Returns the recommended cache size for the specified back-off level of the model that uses this probability structure. This default implementation simply returns topLevelCacheSize / 2^level.

See Also:
topLevelCacheSize
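
As a worked example of the default rule topLevelCacheSize / 2^level, the sketch below halves the recommended size at each successive back-off level. Truncating the result to an int is an assumption; this page does not say how the real implementation rounds.

```java
final class CacheSizeSketch {
    /** Recommended cache size: topLevelCacheSize / 2^level, truncated to an int. */
    static int cacheSize(int topLevelCacheSize, int level) {
        return (int) (topLevelCacheSize / Math.pow(2, level));
    }

    public static void main(String[] args) {
        for (int level = 0; level < 4; level++) {
            System.out.println("level " + level + ": " + cacheSize(1000, level));
        }
    }
}
```

For a top-level size of 1000, levels 0 through 3 give 1000, 500, 250 and 125.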

getAdditionalData

public Object getAdditionalData()
Returns the value of the additionalData member.

Returns:
the value of the additionalData member.

setAdditionalData

public void setAdditionalData(Object data)
Sets the value of the additionalData member.

Parameters:
data - an additional data object associated with this probability structure

copy

public abstract ProbabilityStructure copy()
Returns a deep copy of this object. Currently, all data members of ProbabilityStructure objects are used solely as temporary storage during certain method invocations; therefore, this copy method should simply return a new instance of the runtime type of this ProbabilityStructure object, with freshly-created data members that are not deep copies of the data members of this object. The general contract of the copy method is slightly violated here, but without undue harm, given the lack of persistent data of these types of objects. If a concrete subclass has specific requirements for its data members to be deeply copied, this method should be overridden.


Author: Dan Bikel.