|
Parsing Engine
danbikel.parser
Class ModelCollection

java.lang.Object
  danbikel.parser.ModelCollection
public class ModelCollection

Provides access to all Model objects and maps necessary for parsing. By bundling all of this information together, all of the objects necessary for parsing can be stored and retrieved simply by serializing and de-serializing this object to a Java object file. The types of output elements that are modeled are determined by the ProbabilityStructure objects around which Model objects are wrapped, via the method ProbabilityStructure.newModel(). This collection holds ten different Model objects, each modeling a different output element of this parser (nonterminal, word, subcategorization frame, etc.), because each wraps a different type of ProbabilityStructure object. The concrete types of ProbabilityStructure objects are determined by various run-time settings, as described in the documentation for Settings.globalModelStructureNumber. The other counts tables, maps and resources contained in this object are derived by the Trainer.
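The save/load cycle the description refers to is plain Java object serialization. A minimal sketch of that round trip, using a stand-in Serializable class so it is self-contained (the real code would write the ModelCollection instance itself, and the class and field names below are illustrative only):

```java
import java.io.*;

public class SerializationSketch {
  // Stand-in for the Serializable ModelCollection; illustrative only.
  static class StandInCollection implements Serializable {
    private static final long serialVersionUID = 1L;
    int numModels = 10; // the real collection holds ten Model objects
  }

  public static void main(String[] args) throws Exception {
    File f = File.createTempFile("models", ".obj");
    f.deleteOnExit();

    // Write the collection to a Java object file.
    try (ObjectOutputStream out =
             new ObjectOutputStream(new FileOutputStream(f))) {
      out.writeObject(new StandInCollection());
    }

    // Read it back; all bundled models and maps become available at once.
    try (ObjectInputStream in =
             new ObjectInputStream(new FileInputStream(f))) {
      StandInCollection mc = (StandInCollection) in.readObject();
      System.out.println(mc.numModels);
    }
  }
}
```

This is why a single object file suffices to configure the parser: one writeObject call by the trainer, one readObject call by the decoder.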
See Also:
Settings.globalModelStructureNumber, Settings.precomputeProbs, Settings.writeCanonicalEvents, Serialized Form

Field Summary | |
---|---|
protected static boolean |
callGCAfterReadingObject
Indicates whether to invoke System.gc() after
this object has been de-serialized from a stream. |
protected FlexibleMap |
canonicalEvents
The reflexive map used to canonicalize objects created when deriving counts for all models in this model collection. |
protected Model |
gapModel
The model for generating gaps. |
protected Model |
headModel
The model for generating a head nonterminal given its (lexicalized) parent. |
protected Map |
headToParentMap
A mapping from head labels to possible parent labels. |
protected Map |
leftSubcatMap
A mapping from left subcat-prediction conditioning contexts (typically parent and head nonterminal labels) to all possible subcat frames. |
protected Model |
leftSubcatModel
The model for generating subcats on the left side of the head child. |
protected Model |
lexPriorModel
The model for lexical priors. |
protected Model[] |
modelArr
An array containing all Model objects contained by this
model collection, set up by createModelArray() . |
protected Map |
modNonterminalMap
A mapping from the last level of back-off of modifying nonterminal conditioning contexts to all possible modifying nonterminals. |
protected Model |
modNonterminalModel
The model for generating partially-lexicalized nonterminals that modify the head child. |
protected Model |
modWordModel
The model for generating head words of lexicalized nonterminals that modify the head child. |
protected Symbol[] |
nonterminalArr
An array of all nonterminal labels, providing a mapping of unique integers (indices into this array) to nonterminal labels. |
protected Map |
nonterminalMap
A map from nonterminal labels ( Symbol objects) to
unique integers that are indices in the
nonterminal array. |
protected Model |
nonterminalPriorModel
The model for nonterminal priors. |
protected CountsTable |
nonterminals
A table that maps unlexicalized nonterminals to their counts in the training corpus. |
protected Map |
posMap
A mapping from lexical items to all of their possible parts of speech. |
protected Set |
prunedPreterms
The set of preterminals pruned during training. |
protected Set |
prunedPunctuation
The set of punctuation preterminals pruned during training. |
protected Map |
rightSubcatMap
A mapping from right subcat-prediction conditioning contexts (typically parent and head nonterminal labels) to all possible subcat frames. |
protected Model |
rightSubcatModel
The model for generating subcats on the right side of the head child. |
protected Map |
simpleModNonterminalMap
A map from unlexicalized parent-head-side triples to all possible partially-lexicalized modifying nonterminals. |
protected Model |
topLexModel
The model for generating the head word and part of speech of observed root nonterminals given the hidden +TOP+ nonterminal. |
protected Model |
topNonterminalModel
The model for generating observed root nonterminals given the hidden +TOP+ nonterminal. |
protected static boolean |
verbose
Indicates whether to output verbose messages to System.err . |
protected CountsTable |
vocabCounter
A table that maps observed words to their counts in the training corpus. |
protected CountsTable |
wordFeatureCounter
A table that maps observed word-feature vectors to their counts in the training corpus. |
Constructor Summary | |
---|---|
ModelCollection()
Constructs a new ModelCollection that initially contains
no data. |
Method Summary | |
---|---|
FlexibleMap |
canonicalEvents()
Returns the reflexive map used to canonicalize objects created when deriving counts for all models in this model collection. |
protected void |
createModelArray()
Populates the modelArr with the Model objects that
are contained in this model collection. |
Model |
gapModel()
Returns the gap-generation model. |
String |
getModelCacheStats()
Invokes Model.getCacheStats() on each Model contained in
this model collection, and returns the results as a single String . |
Symbol[] |
getNonterminalArr()
Returns the nonterminalArr member. |
Map |
getNonterminalMap()
Returns the nonterminalMap member. |
Model |
headModel()
Returns the head-generation model. |
Map |
headToParentMap()
Returns a mapping from head labels to possible parent labels. |
protected void |
internalReadObject(ObjectInputStream s)
Reads an instance of this class from the specified stream. |
protected void |
internalWriteObject(ObjectOutputStream s)
Writes this object to the specified stream. |
Map |
leftSubcatMap()
Returns a mapping from left subcat-prediction conditioning contexts (typically parent and head nonterminal labels) to all possible subcat frames. |
Model |
leftSubcatModel()
Returns the left subcat-generation model. |
Model |
lexPriorModel()
Returns the model for marginal probabilities of lexical elements (for the estimation of the joint event that is a fully lexicalized nonterminal) |
Iterator |
modelIterator()
Syntactic sugar for modelList().iterator() . |
List |
modelList()
Returns an unmodifiable list view of the Model objects contained
in this model collection. |
Map |
modNonterminalMap()
Returns a mapping from the last level of back-off of modifying nonterminal conditioning contexts to all possible modifying nonterminals. |
Model |
modNonterminalModel()
Returns the modifying nonterminal-generation model. |
Model |
modWordModel()
Returns the model that generates head words of modifying nonterminals. |
Model |
nonterminalPriorModel()
Returns the model for conditional probabilities of nonterminals given the lexical components (for the estimation of the joint event that is a fully lexicalized nonterminal) |
CountsTable |
nonterminals()
Returns a mapping of (unlexicalized) nonterminals to their counts in the training data. |
int |
numNonterminals()
Returns the number of unique (unlexicalized) nonterminals observed in the training data. |
Map |
posMap()
Returns a mapping from Symbol objects representing words to
SexpList objects that contain the set of their possible parts of speech
(a list of Symbol ). |
Set |
prunedPreterms()
Returns the set of preterminals pruned during training. |
Set |
prunedPunctuation()
Returns the set of punctuation preterminals pruned during training. |
Map |
rightSubcatMap()
Returns a mapping from right subcat-prediction conditioning contexts (typically parent and head nonterminal labels) to all possible subcat frames. |
Model |
rightSubcatModel()
Returns the right subcat-generation model. |
void |
set(Model lexPriorModel,
Model nonterminalPriorModel,
Model topNonterminalModel,
Model topLexModel,
Model headModel,
Model gapModel,
Model leftSubcatModel,
Model rightSubcatModel,
Model modNonterminalModel,
Model modWordModel,
CountsTable vocabCounter,
CountsTable wordFeatureCounter,
CountsTable nonterminals,
Map posMap,
Map headToParentMap,
Map leftSubcatMap,
Map rightSubcatMap,
Map modNonterminalMap,
Map simpleModNonterminalMap,
Set prunedPreterms,
Set prunedPunctuation,
FlexibleMap canonicalEvents)
Sets all the data members of this object. |
void |
shareCounts(boolean verbose)
In a dangerous but effective way, this method shares counts for a back-off level from one model with another model; in this case, the last level of back-off from the modWordModel is being shared (i.e., will be
used) as the last level of back-off for topLexModel , as the last
levels of both these models typically estimate
p(w | t). |
Map |
simpleModNonterminalMap()
Returns a map from unlexicalized parent-head-side triples to all possible partially-lexicalized modifying nonterminals. |
Model |
topLexModel()
Returns the head-word generation model for heads of entire sentences. |
Model |
topNonterminalModel()
Returns the head-generation model for heads whose parents are Training.topSym() . |
CountsTable |
vocabCounter()
Returns a mapping from Symbol objects representing words to
their count in the training data. |
CountsTable |
wordFeatureCounter()
Returns a mapping from Symbol objects that are word features
to their count in the training data. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected static final boolean verbose
Indicates whether to output verbose messages to System.err. The value of this constant is normally true (this is research software, after all).
protected static final boolean callGCAfterReadingObject
Indicates whether to invoke System.gc() after this object has been de-serialized from a stream.
protected transient Model[] modelArr
An array containing all Model objects contained by this model collection, set up by createModelArray().
protected transient Model lexPriorModel
protected transient Model nonterminalPriorModel
protected transient Model topNonterminalModel
protected transient Model topLexModel
protected transient Model headModel
protected transient Model gapModel
protected transient Model leftSubcatModel
protected transient Model rightSubcatModel
protected transient Model modNonterminalModel
protected transient Model modWordModel
protected transient CountsTable vocabCounter
protected transient CountsTable wordFeatureCounter
protected transient CountsTable nonterminals
protected transient Map posMap
protected transient Map headToParentMap
protected transient Map leftSubcatMap
protected transient Map rightSubcatMap
protected transient Map modNonterminalMap
protected transient Map simpleModNonterminalMap
protected transient Set prunedPreterms
protected transient Set prunedPunctuation
protected transient FlexibleMap canonicalEvents
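The "reflexive map" behind canonicalEvents is an interning idiom: a map whose keys and values are the same objects, used to replace equal-but-distinct instances with one canonical instance when deriving counts. A minimal sketch of the idea, with HashMap and String standing in for the parser's FlexibleMap and event objects (names here are illustrative, not the parser's API):

```java
import java.util.*;

public class CanonicalizeSketch {
  // Return the canonical instance equal to event, registering event
  // as canonical if no equal instance has been seen before.
  static <T> T canonicalize(Map<T, T> canonical, T event) {
    T existing = canonical.putIfAbsent(event, event);
    return existing != null ? existing : event;
  }

  public static void main(String[] args) {
    Map<String, String> canonicalEvents = new HashMap<>();
    String a = new String("NP-head");
    String b = new String("NP-head"); // equal, but a distinct instance
    String ca = canonicalize(canonicalEvents, a);
    String cb = canonicalize(canonicalEvents, b);
    // both resolve to one shared instance, saving memory
    System.out.println(ca == cb);
  }
}
```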
protected Map nonterminalMap
A map from nonterminal labels (Symbol objects) to unique integers that are indices in the nonterminal array.

protected Symbol[] nonterminalArr
An array of all nonterminal labels, providing the inverse of the mapping contained in nonterminalMap.
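The two fields above form a standard bidirectional index: the map assigns each label a dense integer id, and the array maps ids back to labels. A sketch of how such a pair can be built, with plain Strings standing in for Symbol objects (this illustrates the pattern, not the parser's actual construction code):

```java
import java.util.*;

public class NonterminalIndexSketch {
  public static void main(String[] args) {
    List<String> observed = Arrays.asList("NP", "VP", "PP", "NP");

    // Assign the next unused index the first time each label is seen.
    Map<String, Integer> nonterminalMap = new HashMap<>();
    for (String label : observed) {
      if (!nonterminalMap.containsKey(label)) {
        nonterminalMap.put(label, nonterminalMap.size());
      }
    }

    // Invert the map into an array indexed by those integers.
    String[] nonterminalArr = new String[nonterminalMap.size()];
    for (Map.Entry<String, Integer> e : nonterminalMap.entrySet()) {
      nonterminalArr[e.getValue()] = e.getKey();
    }

    // The two structures are inverses of each other.
    System.out.println(nonterminalArr[nonterminalMap.get("VP")]);
    System.out.println(nonterminalMap.size());
  }
}
```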
Constructor Detail |
---|

public ModelCollection()
Constructs a new ModelCollection that initially contains no data.
Method Detail |
---|
public void set(Model lexPriorModel, Model nonterminalPriorModel, Model topNonterminalModel, Model topLexModel, Model headModel, Model gapModel, Model leftSubcatModel, Model rightSubcatModel, Model modNonterminalModel, Model modWordModel, CountsTable vocabCounter, CountsTable wordFeatureCounter, CountsTable nonterminals, Map posMap, Map headToParentMap, Map leftSubcatMap, Map rightSubcatMap, Map modNonterminalMap, Map simpleModNonterminalMap, Set prunedPreterms, Set prunedPunctuation, FlexibleMap canonicalEvents)
Sets all the data members of this object.
Parameters:
lexPriorModel - the model for marginal probabilities of lexical elements (for the estimation of the joint event that is a fully lexicalized nonterminal)
nonterminalPriorModel - the model for conditional probabilities of nonterminals given the lexical components (for the estimation of the joint event that is a fully lexicalized nonterminal)
topNonterminalModel - the head-generation model for heads whose parents are Training.topSym()
topLexModel - the head-word generation model for heads of entire sentences
headModel - the head-generation model
gapModel - the gap-generation model
leftSubcatModel - the left subcat-generation model
rightSubcatModel - the right subcat-generation model
modNonterminalModel - the modifying nonterminal-generation model
modWordModel - the modifying word-generation model
vocabCounter - a table of counts of all "known" words of the training data
wordFeatureCounter - a table of counts of all word features ("unknown" words) of the training data
nonterminals - a table of counts of all nonterminals occurring in the training data
posMap - a mapping from lexical items to all of their possible parts of speech
headToParentMap - a mapping from head labels to possible parent labels
leftSubcatMap - a mapping from left subcat-prediction conditioning contexts (typically parent and head nonterminal labels) to all possible subcat frames
rightSubcatMap - a mapping from right subcat-prediction conditioning contexts (typically parent and head nonterminal labels) to all possible subcat frames
modNonterminalMap - a mapping from the last level of back-off of modifying nonterminal conditioning contexts to all possible modifying nonterminals
simpleModNonterminalMap - a mapping from parent-head-side triples to all possible partially-lexicalized modifying nonterminals
prunedPreterms - the set of preterminals pruned during training
prunedPunctuation - the set of punctuation preterminals pruned during training
canonicalEvents - the reflexive map used to canonicalize objects created when deriving counts for all models in this model collection

protected void createModelArray()
Populates the modelArr with the Model objects that are contained in this model collection.
public List modelList()
Returns an unmodifiable list view of the Model objects contained in this model collection.
Returns:
an unmodifiable list view of the Model objects contained in this model collection

public Iterator modelIterator()
Syntactic sugar for modelList().iterator().
See Also:
modelList()
public void shareCounts(boolean verbose)
In a dangerous but effective way, this method shares counts for a back-off level of one model with another model: the last level of back-off of modWordModel is shared (i.e., will be used) as the last level of back-off of topLexModel, as the last levels of both these models typically estimate p(w | t).
Parameters:
verbose - indicates whether to print a message to System.err
See Also:
Settings.trainerShareCounts
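What sharing a back-off level amounts to is two models holding a reference to the same counts table, so updates through one are visible through the other. A sketch of the idea, with HashMap standing in for the parser's internal counts tables (illustrative names; not the parser's actual implementation of shareCounts):

```java
import java.util.*;

public class ShareCountsSketch {
  // Stand-in for a Model with a counts table at its last back-off level.
  static class ModelSketch {
    Map<String, Integer> lastLevelCounts = new HashMap<>();
  }

  public static void main(String[] args) {
    ModelSketch modWordModel = new ModelSketch();
    ModelSketch topLexModel = new ModelSketch();

    // Both models estimate p(w | t) at their last back-off level, so let
    // them share one table. This is the "dangerous but effective" part:
    // neither model owns its table exclusively anymore.
    topLexModel.lastLevelCounts = modWordModel.lastLevelCounts;

    // A count added via one model is seen by the other.
    modWordModel.lastLevelCounts.merge("dog|NN", 1, Integer::sum);
    System.out.println(topLexModel.lastLevelCounts.get("dog|NN"));
  }
}
```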
public int numNonterminals()
Returns the number of unique (unlexicalized) nonterminals observed in the training data.

public Map getNonterminalMap()
Returns the nonterminalMap member.

public Symbol[] getNonterminalArr()
Returns the nonterminalArr member.
public Model lexPriorModel()
public Model nonterminalPriorModel()
public Model topNonterminalModel()
Returns the head-generation model for heads whose parents are Training.topSym().
public Model topLexModel()
public Model headModel()
public Model gapModel()
public Model leftSubcatModel()
public Model rightSubcatModel()
public Model modNonterminalModel()
public Model modWordModel()
public CountsTable vocabCounter()
Returns a mapping from Symbol objects representing words to their count in the training data.

public CountsTable wordFeatureCounter()
Returns a mapping from Symbol objects that are word features to their count in the training data.
public CountsTable nonterminals()
public Map posMap()
Returns a mapping from Symbol objects representing words to SexpList objects that contain the set of their possible parts of speech (a list of Symbol).
public Map headToParentMap()
public Map leftSubcatMap()
public Map rightSubcatMap()
public Map modNonterminalMap()
public Map simpleModNonterminalMap()
public Set prunedPreterms()
public Set prunedPunctuation()
public FlexibleMap canonicalEvents()
public String getModelCacheStats()
Invokes Model.getCacheStats() on each Model contained in this model collection, and returns the results as a single String.
Returns:
the Model.getCacheStats() strings for the models in this collection

protected void internalWriteObject(ObjectOutputStream s) throws IOException
Writes this object to the specified stream.
Parameters:
s - the stream to which to write this object
Throws:
IOException - if there is a problem writing to the specified stream

protected void internalReadObject(ObjectInputStream s) throws IOException, ClassNotFoundException
Reads an instance of this class from the specified stream.
Parameters:
s - the stream from which to read an instance of this class
Throws:
IOException - if there is a problem reading from the specified stream
ClassNotFoundException - if any of the concrete types that are in the specified stream cannot be found
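The transient fields listed above plus these internal read/write hooks suggest the standard Java idiom for serializing a class whose state needs custom handling: let defaultWriteObject handle the ordinary fields and write the transient state by hand, then restore it symmetrically on read. A self-contained sketch of that idiom (field and class names are illustrative, not the parser's):

```java
import java.io.*;

public class TransientIOSketch implements Serializable {
  private static final long serialVersionUID = 1L;
  // Rebuilt on read instead of being default-serialized.
  transient int[] derived;

  TransientIOSketch(int n) { derived = new int[]{n}; }

  private void writeObject(ObjectOutputStream s) throws IOException {
    s.defaultWriteObject();
    s.writeInt(derived[0]); // write transient state manually
  }

  private void readObject(ObjectInputStream s)
      throws IOException, ClassNotFoundException {
    s.defaultReadObject();
    derived = new int[]{s.readInt()}; // restore transient state
  }

  public static void main(String[] args) throws Exception {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
      out.writeObject(new TransientIOSketch(42));
    }
    try (ObjectInputStream in = new ObjectInputStream(
             new ByteArrayInputStream(buf.toByteArray()))) {
      TransientIOSketch t = (TransientIOSketch) in.readObject();
      System.out.println(t.derived[0]);
    }
  }
}
```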