Parsing Engine
java.lang.Object
  extended by danbikel.parser.ProbabilityStructure
public abstract class ProbabilityStructure
Abstract class to represent the probability structure (the entire set of back-off levels, including the top level) for the estimation of a particular parameter class in the overall parsing model, using "class" in the statistical, non-Java sense of the word. Providing this abstract structure is intended to facilitate experimentation with differing smoothing or back-off schemes.
Various data members are provided to enable efficient construction of SexpEvent objects to represent events in the back-off scheme, but any class that implements the Event interface may be used to record events in a concrete subclass of this class.
The behavior of a Model object that uses a ProbabilityStructure will be somewhat unpredictable if the history contexts at the back-off levels do not represent supersets of one another. That is, the history context at back-off level i + 1 must be a superset of the context at back-off level i.
Concurrency note: A separate ProbabilityStructure object needs to be constructed for each thread that uses its facilities, to avoid concurrent access to and modification of its data members. These members exist to improve efficiency and are therefore not designed for concurrent access via synchronized blocks.
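One way to honor the concurrency note is to give each thread its own instance. The sketch below assumes a hypothetical class ScratchStructure standing in for a concrete ProbabilityStructure subclass (the real class is not used) and relies on java.lang.ThreadLocal.withInitial to create one instance per thread on first access.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a concrete ProbabilityStructure subclass: like the
// real class, it owns reusable scratch arrays that are unsafe to share.
class ScratchStructure {
    final double[] estimates = new double[3];
    final double[] lambdas = new double[3];
}

class PerThreadStructures {
    // ThreadLocal.withInitial lazily creates one instance per thread.
    private static final ThreadLocal<ScratchStructure> STRUCTURE =
        ThreadLocal.withInitial(ScratchStructure::new);

    // Instance owned by the calling thread.
    static ScratchStructure current() {
        return STRUCTURE.get();
    }

    // Fetches the instance owned by a freshly started worker thread.
    static ScratchStructure fromNewThread() {
        AtomicReference<ScratchStructure> seen = new AtomicReference<>();
        Thread worker = new Thread(() -> seen.set(current()));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(current() == current());       // true: same thread
        System.out.println(current() == fromNewThread()); // false: another thread
    }
}
```

With the real class, a method reference such as prototype::copy might serve as the initial-value supplier, since copy() is documented to return a fresh instance of the same runtime type; that pairing is an assumption, not something this documentation prescribes.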
See Also: Model, JointModel, Trainer, Serialized Form

Field Summary | |
---|---|
protected Object | additionalData: Handle onto additional data object for this probability structure, whose value is null if no other data is required for the concrete probability structure. |
protected static String | defaultModelClassName: The value of the Settings.defaultModelClass setting. |
protected static Constructor | defaultModelConstructor: The constructor of the class specified by the Settings.defaultModelClass setting, taking a single ProbabilityStructure as its only argument. |
protected boolean | doPruning: Indicates whether certain events/distributions of low or no utility should be pruned from the model using this probability structure. |
double[] | estimates: An array used only during the computation of top-level probabilities, to store the ML estimates of all the levels of back-off. |
protected SexpList | futureList: Deprecated. Ever since the Event and MutableEvent interfaces were reworked to include methods to add and iterate over event components and the SexpEvent class was retrofitted to these new specifications, this object became superfluous, as SexpEvent objects can now be efficiently constructed directly, by using the SexpEvent.add(Object) method. |
protected MutableEvent[] | futures: A reusable SexpEvent array to represent futures; the array will be initialized to have the size of numLevels(). |
protected MutableEvent[] | futuresWithSubcats: A reusable SexpSubcatEvent array to represent futures; the array will be initialized to have the size of numLevels(). |
protected MutableEvent[] | histories: A reusable SexpEvent array to represent history contexts; the array will be initialized to have the size of numLevels(). |
protected MutableEvent[] | historiesWithSubcats: A reusable SexpSubcatEvent array to represent histories; the array will be initialized to have the size of numLevels(). |
protected SexpList | historyList: Deprecated. Ever since the Event and MutableEvent interfaces were reworked to include methods to add and iterate over event components and the SexpEvent class was retrofitted to these new specifications, this object became superfluous, as SexpEvent objects can now be efficiently constructed directly, by using the SexpEvent.add(Object) method. |
double[] | lambdas: An array used only during the computation of top-level probabilities, to store the lambdas calculated at all the levels of back-off. |
double | prevHistCount: A temporary value used in the computation of top-level probabilities, specifically in the computation of lambdas. |
protected int | topLevelCacheSize: The size of the cache that models of this probability structure should use for events containing maximal context. |
Transition[] | transitions: A reusable Transition array to store transitions. |
Constructor Summary | |
---|---|
protected | ProbabilityStructure(): Usually called implicitly, this constructor initializes the internal, reusable historyList to have an initial capacity of the return value of maxEventComponents(). |
Method Summary | |
---|---|
int | cacheSize(int level): Returns the recommended cache size for the specified back-off level of the model that uses this probability structure. |
abstract ProbabilityStructure | copy(): Returns a deep copy of this object. |
protected String | defaultSmoothingParamsFilename(): Returns a default name of the smoothing parameters file, which is the value of getClass().getName() + ".smoothingParams". |
boolean | doCleanup(): Indicates whether the Model class needs to invoke its cleanup method at the end of its deriveCounts method. |
protected boolean | dontAddNewParameters(): Indicates whether this probability structure's associated Model object should not add new parameters when deriving counts by consulting the smoothing parameters from smoothingParametersFile(). |
boolean | doPruning(): Returns whether models using this probability structure should prune parameters. |
Object | getAdditionalData(): Returns the value of the additionalData member. |
abstract Event | getFuture(TrainerEvent trainerEvent, int backOffLevel): Extracts the future for the specified level of back-off from the specified trainer event. |
abstract Event | getHistory(TrainerEvent trainerEvent, int backOffLevel): Extracts the history context for the specified back-off level from the specified trainer event. |
protected int | getTopLevelCacheSize(): Converts the value of the setting named getClass().getName() + ".topLevelCacheSize" to an integer and returns it. |
Transition | getTransition(TrainerEvent trainerEvent, int backOffLevel): Returns the reusable transition object for the specified back-off level, with its history set to the result of calling getHistory(trainerEvent, backOffLevel) and its future set to the result of getFuture(trainerEvent, backOffLevel). |
ProbabilityStructure[] | jointModel(): Returns an array of other ProbabilityStructure objects for use in a JointModel instance, or null if this probability structure should not be composed with a JointModel instance. |
double | lambdaFudge(int backOffLevel): Returns the "fudge factor" for the lambda computation for backOffLevel. |
double | lambdaFudgeTerm(int backOffLevel): Returns the "fudge term" for the lambda computation for backOffLevel. |
double | lambdaPenalty(int backOffLevel): Returns the smoothing value to be used with back-off levels whose histories never occurred in training, meaning that 1 minus this value will be the total probability mass for the smoothed estimate at the specified back-off level (resulting in a degenerate model unless this value is zero). |
protected int | maxEventComponents(): Allows subclasses to specify the maximum number of event components, so that the constructor of this class may pre-allocate space in its internal, reusable MutableEvent objects (used for efficient event construction). |
Model | newModel(): Returns a newly-constructed Model object for this probability structure. |
abstract int | numLevels(): Returns the number of back-off levels. |
int | priorLevel(): Returns the level that corresponds to the prior for that which is being predicted (the future); if there is no such level, this method returns -1 (the default implementation returns -1). |
boolean | removeFuture(int backOffLevel, Event future): Indicates whether Model.cleanup(), which is invoked at the end of Model.deriveCounts(CountsTable,danbikel.util.Filter,double,danbikel.util.FlexibleMap), can safely remove the specified event from the Model object's internal counts tables because the event is not applicable to any of the probabilities for which the model will produce an estimate. |
boolean | removeHistory(int backOffLevel, Event history): Indicates whether Model.cleanup(), which is invoked at the end of Model.deriveCounts, can safely remove the specified event from the Model object's internal counts tables because the event is not applicable to any of the probabilities for which the model will produce an estimate. |
boolean | removeTransition(int backOffLevel, Transition transition): Returns true if the specified transition contains either a history or future for which removeHistory(int,Event) or removeFuture(int,Event) returns true, respectively. |
protected boolean | saveSmoothingParameters(): Indicates whether this probability structure's associated Model object should save the smoothing parameters to the file named by smoothingParametersFile() when precomputing probabilities during training. |
void | setAdditionalData(Object data): Sets the value of the additionalData member. |
String | smoothingParametersFile(): Returns the name of the smoothing parameters file, either to be created if saveSmoothingParameters() returns true, or read from and used if either dontAddNewParameters() or useSmoothingParameters() returns true. |
protected boolean | useSmoothingParameters(): Indicates whether this probability structure's associated Model object should use the smoothing parameters contained in the file smoothingParametersFile() when deriving counts and precomputing probabilities. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
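protected transient int topLevelCacheSize
The size of the cache that models of this probability structure should use for events containing maximal context.
See Also: cacheSize(int)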
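protected transient boolean doPruning
Indicates whether certain events/distributions of low or no utility should be pruned from the model using this probability structure. The value of this member is determined by the setting named getClass().getName() + ".doPruning". For example, if there is a concrete subclass of this class named com.pkg.Foo, and if it is desirable to have pruning performed for models using the com.pkg.Foo probability structure, then the settings file should include the line com.pkg.Foo.doPruning=true.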
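The settings-key convention just described can be sketched as follows. This minimal sketch assumes java.util.Properties as a stand-in for the parser's Settings class; Foo is a hypothetical class in the default package, so its key is "Foo.doPruning" rather than "com.pkg.Foo.doPruning". Parsing with Boolean.valueOf(String) mirrors what this page states for the smoothing-related settings; that doPruning is parsed the same way is an assumption.

```java
import java.util.Properties;

// Hypothetical stand-in for a concrete probability structure; the real class
// reads its settings through danbikel.parser.Settings, not Properties.
class Foo {
    // Builds the per-class setting key from the runtime class name,
    // e.g. "Foo.doPruning" here, "com.pkg.Foo.doPruning" inside com.pkg.
    String settingKey(String suffix) {
        return getClass().getName() + suffix;
    }

    // Boolean.valueOf(String) is true only for "true" (ignoring case);
    // a missing setting (null) or any other text yields false.
    boolean doPruning(Properties settings) {
        return Boolean.valueOf(settings.getProperty(settingKey(".doPruning")));
    }
}
```

Because the key comes from getClass().getName(), each concrete subclass gets its own switch without any extra code.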
protected static String defaultModelClassName
The value of the Settings.defaultModelClass setting.
protected static Constructor defaultModelConstructor
The constructor of the class specified by the Settings.defaultModelClass setting, taking a single ProbabilityStructure as its only argument.
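How a static Constructor field like defaultModelConstructor might be obtained can be sketched with standard reflection (Class.forName plus Class.getConstructor). StubStructure, StubModel and ModelFactory are hypothetical stand-ins for ProbabilityStructure, the class named by Settings.defaultModelClass, and the lookup code; the parser's actual initialization may differ.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-in for ProbabilityStructure.
class StubStructure {}

// Hypothetical stand-in for the class named by Settings.defaultModelClass.
class StubModel {
    final StubStructure structure;

    public StubModel(StubStructure structure) {
        this.structure = structure;
    }
}

class ModelFactory {
    // Resolves the class by name and returns its public one-argument
    // constructor, suitable for caching in a static field.
    static Constructor<?> lookup(String className) {
        try {
            return Class.forName(className).getConstructor(StubStructure.class);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("no (StubStructure) constructor in " + className, e);
        }
    }

    // Builds a new model instance around the given structure.
    static Object newModel(Constructor<?> ctor, StubStructure s) {
        try {
            return ctor.newInstance(s);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Caching the Constructor once avoids repeating the by-name class lookup every time a model is created.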
protected SexpList historyList
Deprecated. Ever since the Event and MutableEvent interfaces were reworked to include methods to add and iterate over event components and the SexpEvent class was retrofitted to these new specifications, this object became superfluous, as SexpEvent objects can now be efficiently constructed directly, by using the SexpEvent.add(Object) method.
A reusable list for constructing SexpEvent objects of various sizes to represent history contexts.
See Also: SexpEvent.add(Object), histories, historiesWithSubcats
protected SexpList futureList
Deprecated. Ever since the Event and MutableEvent interfaces were reworked to include methods to add and iterate over event components and the SexpEvent class was retrofitted to these new specifications, this object became superfluous, as SexpEvent objects can now be efficiently constructed directly, by using the SexpEvent.add(Object) method.
A reusable list for constructing SexpEvent objects of various sizes to represent futures.
See Also: SexpEvent.add(Object), futures, futuresWithSubcats
protected MutableEvent[] histories
A reusable SexpEvent array to represent history contexts; the array will be initialized to have the size of numLevels(). These objects may be used as the return values of getHistory(TrainerEvent,int).
See Also: getHistory(TrainerEvent,int)
protected MutableEvent[] futures
A reusable SexpEvent array to represent futures; the array will be initialized to have the size of numLevels(). These objects may be used as the return values of getFuture(TrainerEvent,int).
See Also: getFuture(TrainerEvent,int)
protected MutableEvent[] historiesWithSubcats
A reusable SexpSubcatEvent array to represent histories; the array will be initialized to have the size of numLevels(). These objects may be used as the return values of getHistory(TrainerEvent,int).
See Also: getHistory(TrainerEvent,int)
protected MutableEvent[] futuresWithSubcats
A reusable SexpSubcatEvent array to represent futures; the array will be initialized to have the size of numLevels(). These objects may be used as the return values of getFuture(TrainerEvent,int).
See Also: getFuture(TrainerEvent,int)
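The reuse pattern behind histories, futures and their subcat variants can be sketched with hypothetical stand-ins: SimpleEvent plays the role of a MutableEvent such as SexpEvent, and ReusableHistories allocates one event per back-off level up front, then clears and refills it on each call. The component choices (a tag, plus a previous word at level 0) are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable event, a stand-in for SexpEvent/MutableEvent.
class SimpleEvent {
    private final List<Object> components = new ArrayList<>();

    SimpleEvent clear() { components.clear(); return this; }

    SimpleEvent add(Object c) { components.add(c); return this; }

    List<Object> components() { return components; }
}

// Hypothetical structure holding one reusable history per back-off level.
class ReusableHistories {
    private final SimpleEvent[] histories;

    ReusableHistories(int numLevels) {
        histories = new SimpleEvent[numLevels];
        for (int i = 0; i < numLevels; i++) {
            histories[i] = new SimpleEvent();
        }
    }

    // Refills and returns the same object on every call for a given level.
    SimpleEvent getHistory(String tag, String prevWord, int level) {
        SimpleEvent h = histories[level].clear();
        h.add(tag);
        if (level == 0) {
            h.add(prevWord); // level 0 keeps the richest context
        }
        return h;
    }
}
```

Because the same object comes back on every call for a given level, a caller that needs to retain a history must copy it first; that is the price of avoiding per-event allocation.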
public Transition[] transitions
A reusable Transition array to store transitions. The Transition objects in this array may be used as the return values of getTransition(TrainerEvent,int).
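public double[] estimates
An array used only during the computation of top-level probabilities, to store the ML estimates of all the levels of back-off.
See Also: Model.estimateLogProb(int,TrainerEvent)
public double[] lambdas
An array used only during the computation of top-level probabilities, to store the lambdas calculated at all the levels of back-off.
See Also: Model.estimateLogProb(int,TrainerEvent)
public double prevHistCount
A temporary value used in the computation of top-level probabilities, specifically in the computation of lambdas.
See Also: Model.estimateLogProb(int,TrainerEvent)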
protected Object additionalData
Handle onto additional data object for this probability structure, whose value is null if no other data is required for the concrete probability structure.
Constructor Detail |
---|
protected ProbabilityStructure()
Usually called implicitly, this constructor initializes the internal, reusable historyList to have an initial capacity of the return value of maxEventComponents().
See Also: historyList, futureList, maxEventComponents()
Method Detail |
---|
protected int getTopLevelCacheSize()
This method converts the value of the setting named getClass().getName() + ".topLevelCacheSize" to an integer and returns it. This method is used within the constructor of this abstract class to set the value of the topLevelCacheSize data member. Subclasses should override this method if such a setting may not be available or if a different mechanism for determining the top-level cache size is desired.
See Also: Settings.get(String)
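The override hook described above can be sketched as follows. Unlike the real getTopLevelCacheSize(), which takes no arguments and reads Settings, this hypothetical version takes a java.util.Properties argument so the example is self-contained; CacheSizedStructure, FixedCacheStructure and SettingCacheStructure are illustrative names.

```java
import java.util.Properties;

// Hypothetical base class mimicking the described pattern: its constructor
// calls an overridable method to initialize the top-level cache size.
abstract class CacheSizedStructure {
    protected final int topLevelCacheSize;

    protected CacheSizedStructure(Properties settings) {
        topLevelCacheSize = getTopLevelCacheSize(settings);
    }

    // Default: parse the per-class setting. Integer.parseInt throws
    // NumberFormatException if the setting is missing (null) or malformed.
    protected int getTopLevelCacheSize(Properties settings) {
        return Integer.parseInt(
            settings.getProperty(getClass().getName() + ".topLevelCacheSize"));
    }
}

// A subclass for which no such setting exists overrides the lookup.
class FixedCacheStructure extends CacheSizedStructure {
    FixedCacheStructure() {
        super(new Properties());
    }

    // Runs inside the superclass constructor, before this subclass's own
    // field initializers, so it returns a constant rather than a field.
    @Override
    protected int getTopLevelCacheSize(Properties settings) {
        return 4096;
    }
}

// A subclass that relies on the default, setting-based lookup.
class SettingCacheStructure extends CacheSizedStructure {
    SettingCacheStructure(Properties settings) {
        super(settings);
    }
}
```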
public boolean doPruning()
Returns whether models using this probability structure should prune parameters.
See Also: doPruning
protected String defaultSmoothingParamsFilename()
Returns a default name of the smoothing parameters file, which is the value of getClass().getName() + ".smoothingParams".
Returns: getClass().getName() + ".smoothingParams"
See Also: smoothingParametersFile()
public String smoothingParametersFile()
Returns the name of the smoothing parameters file, either to be created if saveSmoothingParameters() returns true, or read from and used if either dontAddNewParameters() or useSmoothingParameters() returns true. The name returned by this method is the value of the setting getClass().getName() + ".smoothingParametersFile", or the value returned by defaultSmoothingParamsFilename() if this setting is not present.
See Also: saveSmoothingParameters(), dontAddNewParameters(), useSmoothingParameters()
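The lookup order just described (explicit setting first, then defaultSmoothingParamsFilename()) maps naturally onto Properties.getProperty(String, String). This is a sketch assuming java.util.Properties in place of the parser's Settings class; SmoothingFileNames is a hypothetical class.

```java
import java.util.Properties;

// Hypothetical stand-in showing the documented name resolution.
class SmoothingFileNames {
    // Default name: runtime class name plus ".smoothingParams".
    String defaultSmoothingParamsFilename() {
        return getClass().getName() + ".smoothingParams";
    }

    // The explicit setting if present, otherwise the default name.
    String smoothingParametersFile(Properties settings) {
        return settings.getProperty(
            getClass().getName() + ".smoothingParametersFile",
            defaultSmoothingParamsFilename());
    }
}
```

Because both names are built from getClass().getName(), a subclass that inherits these methods automatically gets its own setting key and its own default file name.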
protected boolean saveSmoothingParameters()
Indicates whether this probability structure's associated Model object should save the smoothing parameters to the file named by smoothingParametersFile() when precomputing probabilities during training. If the Settings.precomputeProbs setting is false, the value of this property is ignored. The default implementation here gets the boolean value of the setting getClass().getName() + ".saveSoothingParameters", as determined by Boolean.valueOf(String).
Returns: whether this probability structure's associated Model object should save the smoothing parameters to the file named by smoothingParametersFile()
See Also: smoothingParametersFile()
protected boolean dontAddNewParameters()
Indicates whether this probability structure's associated Model object should not add new parameters when deriving counts by consulting the smoothing parameters from smoothingParametersFile(). Specifically, for each history context derived from a TrainerEvent, a derived count will only be added for that history context if it has a non-zero smoothing parameter, as determined by the information contained in smoothingParametersFile(). Effectively, when this method returns true, the smoothing parameters contained in smoothingParametersFile() are used only to determine which histories have non-zero smoothing values.
If useSmoothingParameters() returns true, then all the smoothing parameters contained in smoothingParametersFile() will be used, meaning that no new parameters will be added; this makes the return value of this method irrelevant, because it is then implicitly true.
The default implementation here gets the boolean value of the setting getClass().getName() + ".dontAddNewParameters", as determined by Boolean.valueOf(String).
Returns: whether this probability structure's associated Model object should not add new parameters when deriving counts by consulting the smoothing parameters from smoothingParametersFile()
See Also: smoothingParametersFile(), useSmoothingParameters()
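The interplay between dontAddNewParameters() and useSmoothingParameters() can be sketched as a filter over history contexts. This is an illustrative sketch, not the Model implementation: histories are plain strings, the map stands in for the contents of smoothingParametersFile(), and DerivedCountFilter is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of how the two flags might restrict derived counts.
class DerivedCountFilter {
    // useSmoothingParameters() implies that no new parameters are added,
    // so the effective "don't add" flag is true if either flag is true.
    static boolean effectiveDontAdd(boolean useSmoothingParams, boolean dontAddNewParams) {
        return useSmoothingParams || dontAddNewParams;
    }

    // Returns the histories that may receive a derived count; the map plays
    // the role of the smoothing file (history -> smoothing parameter).
    static List<String> historiesToCount(List<String> histories,
                                         Map<String, Double> smoothing,
                                         boolean dontAdd) {
        List<String> kept = new ArrayList<>();
        for (String h : histories) {
            Double lambda = smoothing.get(h);
            // Unrestricted: count every history. Restricted: only histories
            // with a non-zero smoothing parameter in the file.
            if (!dontAdd || (lambda != null && lambda != 0.0)) {
                kept.add(h);
            }
        }
        return kept;
    }
}
```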
protected boolean useSmoothingParameters()
Indicates whether this probability structure's associated Model object should use the smoothing parameters contained in the file smoothingParametersFile() when deriving counts and precomputing probabilities. Note that when this method returns true, no new parameters will be added to the model when deriving counts, thus making the return value of dontAddNewParameters() irrelevant.
The default implementation here gets the boolean value of the setting getClass().getName() + ".dontAddNewParameters", as determined by Boolean.valueOf(String).
Returns: whether this probability structure's associated Model object should use the smoothing parameters contained in the file smoothingParametersFile() when deriving counts and precomputing probabilities
See Also: smoothingParametersFile(), dontAddNewParameters()
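protected int maxEventComponents()
Allows subclasses to specify the maximum number of event components, so that the constructor of this class may pre-allocate space in its internal, reusable MutableEvent objects (used for efficient event construction). The default implementation simply returns 1.
See Also: MutableEvent.ensureCapacity(int)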
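The pre-allocation hook can be sketched with java.util.ArrayList.ensureCapacity standing in for MutableEvent.ensureCapacity(int). PreallocatingStructure and ThreeComponentStructure are hypothetical; the reservedCapacity field exists only so the example can report what was requested.

```java
import java.util.ArrayList;

// Hypothetical base class: the constructor sizes its reusable storage
// from the (possibly overridden) maxEventComponents() hook.
abstract class PreallocatingStructure {
    protected final ArrayList<Object> components = new ArrayList<>();
    protected final int reservedCapacity;

    protected PreallocatingStructure() {
        reservedCapacity = maxEventComponents();
        components.ensureCapacity(reservedCapacity);
    }

    // Documented default: a single event component.
    protected int maxEventComponents() {
        return 1;
    }
}

// A subclass whose events never exceed three components.
class ThreeComponentStructure extends PreallocatingStructure {
    @Override
    protected int maxEventComponents() {
        return 3;
    }
}
```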
public Model newModel()
Returns a newly-constructed Model object for this probability structure. The default implementation here returns an instance of Model. If a concrete ProbabilityStructure class overrides jointModel(), it should also override this method to return an instance of a class that is suitable for handling multiple ProbabilityStructure objects, such as JointModel.
See Also: jointModel(), Model, JointModel
public ProbabilityStructure[] jointModel()
Returns an array of other ProbabilityStructure objects for use in a JointModel instance, or null if this probability structure should not be composed with a JointModel instance. This default implementation returns null.
Returns: an array of other ProbabilityStructure objects, or null if this probability structure should not be composed with a JointModel instance
See Also: JointModel
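The pairing of jointModel() with newModel() can be sketched with hypothetical stand-ins: DemoStructure for ProbabilityStructure, DemoModel for Model and DemoJointModel for JointModel. The constructors and fields here are invented for illustration and do not reflect the real classes' signatures.

```java
// Hypothetical stand-in for Model.
class DemoModel {
    final DemoStructure structure;

    DemoModel(DemoStructure structure) {
        this.structure = structure;
    }
}

// Hypothetical stand-in for JointModel: also holds the other structures.
class DemoJointModel extends DemoModel {
    final DemoStructure[] others;

    DemoJointModel(DemoStructure structure, DemoStructure[] others) {
        super(structure);
        this.others = others;
    }
}

// Hypothetical stand-in for ProbabilityStructure with the documented defaults.
abstract class DemoStructure {
    DemoStructure[] jointModel() { return null; }          // not composed by default
    DemoModel newModel() { return new DemoModel(this); }  // plain model by default
}

class PlainStructure extends DemoStructure {}

class ComposedStructure extends DemoStructure {
    @Override
    DemoStructure[] jointModel() {
        return new DemoStructure[] { new PlainStructure() };
    }

    // Overridden together with jointModel() so that the model returned
    // can handle every structure in the composition.
    @Override
    DemoModel newModel() {
        return new DemoJointModel(this, jointModel());
    }
}
```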
public abstract int numLevels()
Returns the number of back-off levels.
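public int priorLevel()
Returns the level that corresponds to the prior for that which is being predicted (the future); if there is no such level, this method returns -1 (the default implementation returns -1).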
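public double lambdaFudge(int backOffLevel)
Returns the "fudge factor" for the lambda computation for backOffLevel. The default implementation returns 5.0.
Parameters:
backOffLevel - the back-off level for which to return a "fudge factor"
public double lambdaFudgeTerm(int backOffLevel)
Returns the "fudge term" for the lambda computation for backOffLevel. The default implementation returns 0.0.
public double lambdaPenalty(int backOffLevel)
Returns the smoothing value to be used with back-off levels whose histories never occurred in training, meaning that 1 minus this value will be the total probability mass for the smoothed estimate at the specified back-off level (resulting in a degenerate model unless this value is zero).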
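This page names the ingredients of the lambda computation (per-level ML estimates, a fudge factor defaulting to 5.0, a fudge term defaulting to 0.0, and a penalty for unseen histories) but not the formula that combines them. The sketch below assumes a Witten-Bell-style weight, lambda_i = c_i / (c_i + ft + ff * u_i), where c_i is the history count at level i, u_i the number of distinct futures seen with that history, ff the fudge factor and ft the fudge term, and it combines levels by recursive linear interpolation. Both choices are assumptions for illustration; Model.estimateLogProb(int,TrainerEvent) may differ in detail. BackOffInterpolation is a hypothetical name.

```java
// Hypothetical sketch of deleted interpolation across back-off levels,
// assuming Witten-Bell-style lambdas (not necessarily Model's formula).
final class BackOffInterpolation {
    private BackOffInterpolation() {}

    /**
     * Index 0 is the most specific level. futureCounts[i] = c(history, future),
     * historyCounts[i] = c(history), uniqueFutures[i] = distinct futures seen
     * with that history. The real class keeps estimates and lambdas in reusable
     * fields; this sketch allocates them locally for clarity.
     */
    static double estimate(double[] futureCounts, double[] historyCounts,
                           double[] uniqueFutures, double fudgeFactor,
                           double fudgeTerm, double penalty) {
        int n = historyCounts.length;
        double[] estimates = new double[n];
        double[] lambdas = new double[n];
        for (int i = 0; i < n; i++) {
            double c = historyCounts[i];
            if (c > 0) {
                estimates[i] = futureCounts[i] / c;  // ML estimate
                lambdas[i] = c / (c + fudgeTerm + fudgeFactor * uniqueFutures[i]);
            } else {
                // Unseen history: no estimate; a zero penalty hands all of
                // this level's mass to the coarser levels.
                estimates[i] = 0.0;
                lambdas[i] = penalty;
            }
        }
        // The coarsest level's estimate is used as-is; interpolate toward level 0.
        double p = estimates[n - 1];
        for (int i = n - 2; i >= 0; i--) {
            p = lambdas[i] * estimates[i] + (1.0 - lambdas[i]) * p;
        }
        return p;
    }
}
```

For example, with two levels, counts c(h, f) = 2 and c(h) = 4 with 2 distinct futures at level 0, and a level-1 estimate of 3/10, the sketch gives lambda_0 = 4/14 and p = (4/14)(0.5) + (10/14)(0.3) = 5/14.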
public abstract Event getHistory(TrainerEvent trainerEvent, int backOffLevel)
Extracts the history context for the specified back-off level from the specified trainer event.
Parameters:
trainerEvent - the event for which a history context is desired for the specified back-off level
backOffLevel - the back-off level for which to get a history context from the specified trainer event
Returns: an Event object that represents the history context for the specified back-off level
public abstract Event getFuture(TrainerEvent trainerEvent, int backOffLevel)
Extracts the future for the specified level of back-off from the specified trainer event.
Parameters:
trainerEvent - the event from which a future is to be extracted
backOffLevel - the back-off level for which to get the future event
Returns: an Event object that represents the future for the specified back-off level
public Transition getTransition(TrainerEvent trainerEvent, int backOffLevel)
Returns the reusable transition object for the specified back-off level, with its history set to the result of calling getHistory(trainerEvent, backOffLevel) and its future set to the result of getFuture(trainerEvent, backOffLevel).
Parameters:
trainerEvent - the event from which a transition is to be extracted
backOffLevel - the back-off level for which to get the transition
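To make the three extraction methods concrete, here is a toy two-level scheme built on hypothetical stand-ins (WordEvent for TrainerEvent, DemoTransition for Transition, lists of strings for events). A word is predicted from its tag and the previous word, and level 1 backs off by dropping the previous word. This is illustrative only and is not a structure defined by the parser.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for TrainerEvent: a word with its tag and the
// word that precedes it.
final class WordEvent {
    final String word;
    final String tag;
    final String prevWord;

    WordEvent(String word, String tag, String prevWord) {
        this.word = word;
        this.tag = tag;
        this.prevWord = prevWord;
    }
}

// Hypothetical stand-in for Transition: a mutable (history, future) pair.
final class DemoTransition {
    List<String> history;
    List<String> future;
}

// Toy structure: level 0 conditions on (tag, prevWord), level 1 on tag
// alone, so every event matching a level-0 context also matches the
// corresponding level-1 context.
final class WordStructure {
    static final int NUM_LEVELS = 2;
    private final DemoTransition[] transitions = new DemoTransition[NUM_LEVELS];

    WordStructure() {
        for (int i = 0; i < NUM_LEVELS; i++) {
            transitions[i] = new DemoTransition();
        }
    }

    List<String> getHistory(WordEvent e, int backOffLevel) {
        return backOffLevel == 0 ? Arrays.asList(e.tag, e.prevWord)
                                 : Arrays.asList(e.tag);
    }

    // The predicted future is the word itself at every level.
    List<String> getFuture(WordEvent e, int backOffLevel) {
        return Arrays.asList(e.word);
    }

    // Refills and returns the reusable transition for the given level.
    DemoTransition getTransition(WordEvent e, int backOffLevel) {
        DemoTransition t = transitions[backOffLevel];
        t.history = getHistory(e, backOffLevel);
        t.future = getFuture(e, backOffLevel);
        return t;
    }
}
```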
public boolean doCleanup()
Indicates whether the Model class needs to invoke its cleanup method at the end of its deriveCounts method. The default implementation here returns false.
See Also: removeHistory(int,Event), removeFuture(int,Event), removeTransition(int,Transition), Model.deriveCounts(CountsTable,danbikel.util.Filter,double,danbikel.util.FlexibleMap), Model.cleanup()
public boolean removeHistory(int backOffLevel, Event history)
Indicates whether Model.cleanup(), which is invoked at the end of Model.deriveCounts, can safely remove the specified event from the Model object's internal counts tables because the event is not applicable to any of the probabilities for which the model will produce an estimate. The default implementation simply returns false.
See Also: Model.deriveCounts(CountsTable,danbikel.util.Filter,double,danbikel.util.FlexibleMap), Model.cleanup()
public boolean removeFuture(int backOffLevel, Event future)
Indicates whether Model.cleanup(), which is invoked at the end of Model.deriveCounts(CountsTable,danbikel.util.Filter,double,danbikel.util.FlexibleMap), can safely remove the specified event from the Model object's internal counts tables because the event is not applicable to any of the probabilities for which the model will produce an estimate. The default implementation simply returns false.
See Also: Model.deriveCounts(CountsTable,danbikel.util.Filter,double,danbikel.util.FlexibleMap), Model.cleanup()
public boolean removeTransition(int backOffLevel, Transition transition)
Returns true if the specified transition contains either a history or future for which removeHistory(int,Event) or removeFuture(int,Event) returns true, respectively.
See Also: Model.deriveCounts(CountsTable,danbikel.util.Filter,double,danbikel.util.FlexibleMap), Model.cleanup()
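The cleanup protocol formed by doCleanup(), removeHistory, removeFuture and removeTransition can be sketched as follows. CleanupDemo is hypothetical: counts are keyed by "history|future" strings, the removal rule (drop histories starting with "junk") is arbitrary, and cleanup only loosely mirrors what Model.cleanup() is documented to do.

```java
import java.util.Map;

// Hypothetical sketch: a structure flags events no estimate needs, and a
// model-like owner drops matching entries from its counts table.
class CleanupDemo {
    // Stand-in for removeHistory(int,Event): drop histories marked "junk".
    static boolean removeHistory(int level, String history) {
        return history.startsWith("junk");
    }

    // Stand-in for removeFuture(int,Event): keep every future.
    static boolean removeFuture(int level, String future) {
        return false;
    }

    // As documented: true if either the history or the future is removable.
    static boolean removeTransition(int level, String history, String future) {
        return removeHistory(level, history) || removeFuture(level, future);
    }

    // Stand-in for Model.cleanup(): transition counts keyed by "history|future".
    static void cleanup(int level, Map<String, Integer> transitionCounts) {
        transitionCounts.keySet().removeIf(key -> {
            int bar = key.indexOf('|');
            return removeTransition(level, key.substring(0, bar), key.substring(bar + 1));
        });
    }
}
```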
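public int cacheSize(int level)
Returns the recommended cache size for the specified back-off level of the model that uses this probability structure. The default implementation returns topLevelCacheSize / 2^level.
See Also: topLevelCacheSize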
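The default sizing rule halves the cache at each successive back-off level. A sketch, assuming integer arithmetic; CacheSizes is a hypothetical helper, and whether the real method rounds the same way is not stated on this page.

```java
// Hypothetical sketch of topLevelCacheSize / 2^level using a right shift,
// which equals integer division by 2^level for non-negative sizes.
final class CacheSizes {
    private CacheSizes() {}

    static int cacheSize(int topLevelCacheSize, int level) {
        if (level < 0) {
            throw new IllegalArgumentException("negative level: " + level);
        }
        // Java masks int shift distances to 5 bits, so a shift of 32 would
        // wrap to 0; a non-negative int shifted by 31 or more is 0 anyway.
        return level >= 31 ? 0 : topLevelCacheSize >> level;
    }
}
```

For a top-level size of 1024, levels 0, 1 and 3 get 1024, 512 and 128 entries.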
public Object getAdditionalData()
Returns the value of the additionalData member.
Returns: the value of the additionalData member.
public void setAdditionalData(Object data)
Sets the value of the additionalData member.
Parameters:
data - an additional data object associated with this probability structure
public abstract ProbabilityStructure copy()
Returns a deep copy of this object. The data members of ProbabilityStructure objects are used solely as temporary storage during certain method invocations; therefore, this copy method should simply return a new instance of the runtime type of this ProbabilityStructure object, with freshly-created data members that are not deep copies of the data members of this object. The general contract of the copy method is slightly violated here, but without undue harm, given that objects of this type hold no persistent data. If a concrete subclass has specific requirements for its data members to be deeply copied, its implementation of this method should copy those members accordingly.
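The relaxed copy() contract can be sketched as follows: the method returns a new instance of the same runtime type with fresh scratch members, rather than a true deep copy. CopyableStructure and ConcreteStructure are hypothetical stand-ins; the covariant return type is a Java convenience, not something this documentation requires.

```java
// Hypothetical base class: scratch members hold no persistent state.
abstract class CopyableStructure {
    protected final double[] scratch = new double[4];

    abstract CopyableStructure copy();
}

class ConcreteStructure extends CopyableStructure {
    // Returns a new instance of the same runtime type; its scratch array
    // comes from the field initializer, so nothing is shared with this one.
    @Override
    ConcreteStructure copy() {
        return new ConcreteStructure();
    }
}
```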