Parsing Engine
java.lang.Object
  danbikel.parser.Decoder
public class Decoder
Provides the methods necessary to perform CKY parsing on input sentences.
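As background for readers new to the algorithm, the following is a minimal sketch of CKY recognition over a toy grammar in Chomsky normal form. It is not this class's implementation: `CkySketch`, its grammar encoding, and every method name below are hypothetical, and the decoder documented here additionally scores chart items with probabilities and prunes the chart.

```java
import java.util.*;

/** Toy CKY recognizer for a CNF grammar (hypothetical, for illustration only). */
class CkySketch {
  // Lexical rules: word -> nonterminals that can produce it.
  private final Map<String, Set<String>> lexicon = new HashMap<>();
  // Binary rules: "B C" -> nonterminals A such that A -> B C.
  private final Map<String, Set<String>> binary = new HashMap<>();

  void addLexical(String nt, String word) {
    lexicon.computeIfAbsent(word, k -> new HashSet<>()).add(nt);
  }

  void addBinary(String parent, String left, String right) {
    binary.computeIfAbsent(left + " " + right, k -> new HashSet<>()).add(parent);
  }

  /** Returns true if the start symbol derives the whole sentence. */
  boolean recognizes(List<String> words, String start) {
    int n = words.size();
    if (n == 0) return false;
    // chart.get(i).get(j) holds the nonterminals spanning words i..j inclusive.
    List<List<Set<String>>> chart = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      List<Set<String>> row = new ArrayList<>();
      for (int j = 0; j < n; j++) row.add(new HashSet<>());
      chart.add(row);
    }
    // Seed the chart with one-word spans.
    for (int i = 0; i < n; i++) {
      chart.get(i).get(i).addAll(
          lexicon.getOrDefault(words.get(i), Collections.emptySet()));
    }
    // Build longer spans from pairs of shorter, adjacent spans.
    for (int span = 2; span <= n; span++) {
      for (int s = 0; s + span - 1 < n; s++) {
        int e = s + span - 1;
        for (int split = s; split < e; split++) {
          for (String l : chart.get(s).get(split)) {
            for (String r : chart.get(split + 1).get(e)) {
              chart.get(s).get(e).addAll(
                  binary.getOrDefault(l + " " + r, Collections.emptySet()));
            }
          }
        }
      }
    }
    return chart.get(0).get(n - 1).contains(start);
  }
}
```

The three nested loops over span length, start index, and split point are the part that carries over to any CKY-style parser; the string-keyed rule tables are only a convenience for this sketch.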
Nested Class Summary | |
---|---|
protected static class |
Decoder.TimeoutException
Exception to be thrown when the maximum parse time has been reached. |
Field Summary | |
---|---|
protected Map |
canonicalPrevModLists
A reflexive map in which to store canonical versions of SexpList
objects that represent unlexicalized previous modifier lists. |
protected Map |
canonicalWords
A reflexive map of Word objects, for getting a canonical version. |
protected int |
cellLimit
The cell limit for the parsing chart (stored here for debugging). |
protected CKYChart |
chart
The parsing chart. |
protected boolean[] |
commaForPruning
A reusable array for storing which words are considered commas for the comma-pruning constraint. |
protected boolean[] |
conjForPruning
A reusable array for storing which words are considered conjunctions for the conjunction-pruning constraint. |
protected ConstraintSet |
constraints
Caches the ConstraintSet, if any, for the current sentence. |
protected List |
currItemsAdded
One of a pair of lists used by addUnariesAndStopProbs(int, int) . |
protected boolean |
dontPostProcess
Indicates whether to perform post-processing on a tree after parsing, that is, whether to invoke Training.postProcess(Sexp)
on the tree. |
protected boolean |
downcaseWords
The boolean value of the Settings.downcaseWords setting. |
protected Subcat |
emptySubcat
An instance of an empty subcat, for use when constructing lookup events. |
protected static PrintWriter |
err
A writer wrapped around System.err for error messages that might
contain encoding-specific characters. |
protected boolean |
findAtLeastOneSatisfyingConstraint
Caches the value of ConstraintSet.findAtLeastOneSatisfying() ,
if there are constraints for the current sentence; otherwise, this
data member will be set to false. |
protected boolean |
hardConstraints
The boolean to indicate whether to allow probability estimates equal to Constants.logOfZero and to allow other hard constraints (that
amount to implicit log of zero probability estimates). |
protected Map |
headToParentMap
A map from futures of the last back-off level of the head generation model to possible history contexts. |
protected int |
id
The id of the parsing client that is using this decoder. |
protected boolean |
isomorphicTreeConstraints
Caches whether or not the ConstraintSet for the current sentence requires a tree that is isomorphic to the tree of constraints. |
protected int |
kBest
The maximum number of top-scoring parses for the various parse methods to return. |
protected boolean |
keepAllWords
Cached value of Settings.keepAllWords , for efficiency and
convenience. |
protected static boolean |
LEFT
The value of Constants.LEFT cached for better readability. |
protected Map |
leftSubcatMap
A map from contexts of the last back-off level of the left subcat generation model to possible subcats. |
protected ProbabilityStructure |
leftSubcatPS
The left subcat generation model structure. |
protected int |
leftSubcatPSLastLevel
The last level of back-off in the left subcat generation model structure. |
protected static double |
logOfZero
The value of Constants.logOfZero cached for readability. |
protected static double |
logProbCertain
The value of Constants.logProbCertain cached for readability. |
protected HeadEvent |
lookupHeadEvent
A reusable HeadEvent object for look-ups in tables. |
protected ModifierEvent |
lookupLeftStopEvent
A reusable ModifierEvent object for look-ups in tables. |
protected ModifierEvent |
lookupModEvent
A reusable ModifierEvent object for look-ups in tables. |
protected PriorEvent |
lookupPriorEvent
A reusable PriorEvent object for look-ups in tables. |
protected ModifierEvent |
lookupRightStopEvent
A reusable ModifierEvent object for look-ups in tables. |
protected Subcat |
lookupSubcat
A (currently unused) reusable lookup object. |
protected Word |
lookupWord
A lookup Word object, for obtaining a canonical version. |
protected int |
maxParseTime
The maximum parse time (in effect when Settings.maxParseTime is greater than zero). |
protected double |
maxPruneFact
The maximum prune factor (for beam-widening). |
protected int |
maxSentLen
The maximum length of sentences to be parsed. |
protected Map |
modNonterminalMap
A map from contexts of the last back-off level of the modifying nonterminal generation model to possible modifying nonterminal labels. |
protected ProbabilityStructure |
modNonterminalPS
The modifying nonterminal generation model structure. |
protected int |
modNonterminalPSLastLevel
The last level of back-off in the modifying nonterminal generation model structure. |
protected Symbol[] |
nonterminals
An array of all nonterminals observed in training, that is initialized and filled in at construction time. |
protected int |
numPrevMods
The value of the setting Settings.numPrevMods . |
protected int |
numPrevWords
The value of the setting Settings.numPrevWords . |
protected SexpList |
originalSentence
The original sentence, before preprocessing. |
protected SexpList |
originalTags
The original tag list, before preprocessing. |
protected SexpList |
originalWords
The original sentence, but with words removed to match pre-processing. |
protected SexpList |
parentHeadSideLookupList
A reusable object used for constructing parent-head-side triples when employing the simpler of two methods for determining whether a particular modifier is possible in the context of a particular parent-head-side combination. |
protected SexpList |
partiallyLexedModLookupList
A reusable object used for constructing a partially-lexicalized modifier nonterminal when employing the simpler of two methods for determining whether a particular modifier is possible in the context of a particular parent-head-side combination. |
protected Map |
posMap
The map from vocabulary items to their possible parts of speech. |
protected Set |
posSet
The set of possible parts of speech, derived from posMap . |
protected Map |
posToExampleWordMap
A cache derived from posMap that is a map of (presumably
closed-class) parts of speech to random example words observed with
the part of speech from which they are mapped. |
protected List |
prevItemsAdded
One of a pair of lists used by addUnariesAndStopProbs(int, int) . |
protected SexpList |
prevModLookupList
A reusable object for constructing previous modifier lists for chart items. |
protected WordList |
prevModWordLeftLookupList
A reusable object for constructing previous left-modifier word lists for chart items. |
protected WordList |
prevModWordRightLookupList
A reusable object for constructing previous right-modifier word lists for chart items. |
protected Map |
prunedPretermsPosMap
A map of each word pruned during training to its set of part-of-speech tags observed with its pruned instances. |
protected Set |
prunedPretermsPosSet
The set of part-of-speech tags of words pruned during training. |
protected Map |
prunedPunctuationPosMap
A map of each punctuation word that was pruned during training to the set of its parts of speech observed with the pruned instances. |
protected double |
pruneFact
The prune factor for the parsing chart (stored here for debugging). |
protected double |
pruneFactIncrement
The prune factor increment used when doing beam-widening. |
protected boolean |
relaxConstraints
The value of Settings.decoderRelaxConstraintsAfterBeamWidening ,
cached here for readability and convenience. |
protected boolean |
restorePrunedWords
The value of the Settings.restorePrunedWords setting. |
protected static boolean |
RIGHT
The value of Constants.RIGHT cached for better readability. |
protected Map |
rightSubcatMap
A map from contexts of the last back-off level of the right subcat generation model to possible subcats. |
protected ProbabilityStructure |
rightSubcatPS
The right subcat generation model structure. |
protected int |
rightSubcatPSLastLevel
The last level of back-off in the right subcat generation model structure. |
protected SexpList |
sentence
The current sentence. |
protected int |
sentenceIdx
The current sentence index for this decoder (starts at 0). |
protected int |
sentLen
The length of the current sentence, cached here for convenience. |
protected DecoderServerRemote |
server
The server for this decoder. |
protected Map |
simpleModNonterminalMap
A map from unlexicalized parent-head-side triples to all possible partially-lexicalized modifying nonterminals. |
protected SexpList |
startList
A list containing only Training.startSym() , which is the
type of list that should be used when there are zero real previous
modifiers (to start the Markov modifier process). |
protected Symbol |
startSym
The value of Training.startSym() , cached here for efficiency
and convenience. |
protected Word |
startWord
The value of Training.startWord() , cached here for efficiency
and convenience. |
protected WordList |
startWordList
A list containing only Training.startWord() , which is the
type of list that should be used when there are zero real previous
modifiers (to start the Markov modifier process). |
protected List |
stopProbItemsToAdd
A temporary storage area used by addStopProbs(danbikel.parser.CKYItem, java.util.List) for storing
items to be added to the chart when iterating over a cell in the chart. |
protected Symbol |
stopSym
The value of Training.stopSym() , cached here for efficiency
and convenience. |
protected Word |
stopWord
The value of Training.stopWord() , cached here for efficiency
and convenience. |
protected boolean |
substituteWordsForClosedClassTags
The boolean value of the Settings.decoderSubstituteWordsForClosedClassTags setting. |
protected Time |
time
An object for keeping track of wall-clock time. |
protected SLNode |
tmpChildrenList
A reusable list node for use by getPrevMods(danbikel.parser.CKYItem, danbikel.util.SLNode) and getPrevModWords(danbikel.parser.CKYItem, danbikel.util.SLNode, boolean) . |
protected List |
topProbItemsToAdd
A temporary storage area used by addTopUnaries(int) for storing
items to be added to the chart when iterating over a cell in the chart. |
protected Symbol |
topSym
The value of Training.topSym() , cached here for efficiency
and convenience. |
protected List |
unaryItemsToAdd
A temporary storage area used by addUnaries(danbikel.parser.CKYItem, java.util.List) for storing
items to be added to the chart when iterating over a cell in the chart. |
protected boolean |
useCommaConstraint
The boolean value of Settings.decoderUseCommaConstraint . |
protected boolean |
useHeadToParentMap
The boolean value of the Settings.decoderUseHeadToParentMap
setting. |
protected boolean |
useLowFreqTags
The boolean value of the Settings.useLowFreqTags setting. |
protected boolean |
useOnlySuppliedTags
The boolean value of the Settings.decoderUseOnlySuppliedTags
setting. |
protected boolean |
useSimpleModNonterminalMap
The boolean value of the Settings.useSimpleModNonterminalMap
setting. |
protected Set |
wordSet
A reusable set for storing Word objects, used when seeding
the chart in initialize(danbikel.lisp.SexpList) . |
protected static Subcat[] |
zeroSubcatArr
An array of Subcat of length zero. |
Constructor Summary | |
---|---|
Decoder(int id,
DecoderServerRemote server)
Constructs a new decoder that will use the specified DecoderServer to get all information and probabilities
required for decoding (parsing). |
Method Summary | |
---|---|
protected List |
addStopProbs(CKYItem item,
List itemsAdded)
Adds stop probabilities to the specified item and adds these items to the chart; as a side effect, all items successfully added to the chart are also stored in the specified itemsAdded list. |
protected void |
addTopUnaries(int end)
Adds hidden root nonterminal probabilities. |
protected List |
addUnaries(CKYItem item,
List itemsAdded)
Finds all possible parent-head (or unary) productions using the root node of the specified chart item as the head, creates new items based on the specified item, multiplying in the parent-head probability. |
protected void |
addUnariesAndStopProbs(int start,
int end)
Finds all possible parent-head (or unary) productions using the root node of each existing chart item within the specified span as the head, creates new items based on these existing items, multiplying in the parent-head probability; then, using these new items, this method also creates additional new items in which stop probabilities have been multiplied; all new items are added to the chart. |
protected boolean |
commaConstraintViolation(int start,
int split,
int end)
There is a comma constraint violation if the word at the split point is a comma and there exists a word following end and that
word is not a comma and when it is not the case that the word at
end is not a conjunction. |
protected void |
complete(int start,
int end)
Constructs all possible items spanning the specified indices and adds them to the chart. |
protected void |
convertHeadToParentMap()
Converts the values of the read-only headToParentMap from Set objects to arrays of Symbol , that is, arrays of type
Symbol[] . |
protected void |
convertSubcatMap(Map subcatMap)
Helper method used by convertSubcatMaps() . |
protected void |
convertSubcatMaps()
This helper method, used by the constructor, converts the values of the subcat maps from Set objects (containing Subcat
objects) to Subcat arrays, that is, objects of type
Subcat[] . |
protected boolean |
derivationOrderOK(CKYItem modificand,
boolean modifySide)
Enforces that modificand receives all its right modifiers before receiving any left modifiers, by ensuring that right-modification only happens when a modificand has no left-children (this is both necessary and sufficient to enforce derivation order). |
protected Word |
getCanonicalWord(Word lookup)
Gets the canonical Word object for the specified object. |
protected Symbol |
getExampleWordForTag(Symbol tag)
Returns a known word that was observed with the specified part of speech tag. |
protected Subcat[] |
getPossibleSubcats(Map subcatMap,
HeadEvent headEvent,
ProbabilityStructure subcatPS,
int lastLevel)
Gets all possible Subcat s for the context contained in the
specified HeadEvent . |
protected SexpList |
getPrevMods(CKYItem item,
SLNode modChildren)
Creates a new previous-modifier list given the specified current list and the last modifier on a particular side. |
protected WordList |
getPrevModWords(CKYItem item,
SLNode modChildren,
boolean side)
Creates a new previous-modifier word list given the specified current list and the last modifier on a particular side. |
protected SexpList |
getTagSet(SexpList tags,
int wordIdx,
Symbol word,
boolean wordIsUnknown,
Symbol origWord,
HashSet tmpSet)
Gets the set of possible part-of-speech tags for a word in the sentence to be parsed. |
protected void |
initialize(SexpList sentence)
Initializes the chart for parsing the specified sentence. |
protected void |
initialize(SexpList sentence,
SexpList tags)
Initializes the chart for parsing the specified sentence, using the specified coordinated list of part-of-speech tags when assigning parts of speech to unknown words. |
protected boolean |
isPuncRaiseWord(Sexp word)
Returns whether the specified word was raised as part of the punctuation-raising procedure performed during training. |
protected void |
joinItems(CKYItem modificand,
CKYItem modifier,
boolean side)
Joins two chart items, one representing the modificand that has not yet received its stop probabilities, the other representing the modifier that has received its stop probabilities. |
protected Sexp |
parse(SexpList sentence)
Parses the specified sentence. |
protected Sexp |
parse(SexpList sentence,
SexpList tags)
Parses the specified sentence using the supplied list of part-of-speech tags. |
protected Sexp |
parse(SexpList sentence,
SexpList tags,
ConstraintSet constraints)
Parses the specified sentence using the supplied list of part-of-speech tags and the supplied set of parsing constraints. |
protected void |
postProcess(Sexp tree)
Performs post-processing on a sentence that has been parsed. |
protected void |
preProcess(SexpList sentence,
SexpList tags)
Performs all preprocessing to the specified coordinated lists of words and part-of-speech tags of the sentence that is about to be parsed. |
protected void |
removeWord(SexpList sentence,
SexpList tags,
int i)
A helper method used by preProcess(danbikel.lisp.SexpList, danbikel.lisp.SexpList) that removes words from
the specified sentence and originalWords lists, and also
from the specified tags list, if it is not null . |
protected int |
restoreOriginalWords(Sexp tree,
int wordIdx)
Restores the original words in the current sentence. |
protected void |
restorePrunedWords(Sexp tree)
Restores pruned words to a parsed sentence. |
protected int |
restorePrunedWordsRecursive(Sexp tree,
int wordIdx)
The recursive helper method for restorePrunedWords(Sexp) . |
protected void |
seedChart(Symbol word,
int wordIdx,
Symbol features,
boolean neverObserved,
SexpList tagSet,
boolean wordIsUnknown,
Symbol origWord,
ConstraintSet constraints)
Adds a chart item for every possible part of speech for the specified word at the specified index in the current sentence. |
protected void |
setCommaConstraintData()
Caches the locations of commas to be used for the comma constraint in the boolean array commaForPruning . |
protected SexpList |
setUnion(SexpList l1,
SexpList l2,
Set tmpSet)
Returns a new list that is the union of the two specified lists. |
void |
update(Map<String,String> changedSettings)
Invoked by this class to notify the requesting class that one or more settings have changed. |
protected void |
wrapCachingServer()
Wraps the normal DecoderServerRemote instance in a caching
version. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected static final boolean LEFT
The value of Constants.LEFT cached for better readability.
protected static final boolean RIGHT
The value of Constants.RIGHT cached for better readability.
protected static final double logOfZero
The value of Constants.logOfZero cached for readability.
protected static final double logProbCertain
The value of Constants.logProbCertain cached for readability.
protected static final Subcat[] zeroSubcatArr
An array of Subcat of length zero.
See Also: getPossibleSubcats(java.util.Map,HeadEvent,ProbabilityStructure,int)
protected static PrintWriter err
A writer wrapped around System.err for error messages that might contain encoding-specific characters. The encoding of the writer is Language.encoding().
protected final SexpList startList
A list containing only Training.startSym(), which is the type of list that should be used when there are zero real previous modifiers (to start the Markov modifier process).
See Also: Trainer.newStartList()
protected final WordList startWordList
A list containing only Training.startWord(), which is the type of list that should be used when there are zero real previous modifiers (to start the Markov modifier process).
See Also: Trainer.newStartWordList()
protected int id
protected DecoderServerRemote server
protected int sentenceIdx
protected SexpList sentence
protected int sentLen
protected int maxSentLen
See Also: Settings.maxSentLen
protected int kBest
The maximum number of top-scoring parses for the various parse methods to return.
See Also: Settings.kBest
protected int maxParseTime
See Also: Settings.maxParseTime
protected Time time
protected CKYChart chart
protected Map posMap
protected Map posToExampleWordMap
A cache derived from posMap that is a map of (presumably closed-class) parts of speech to random example words observed with the part of speech from which they are mapped.
protected Set posSet
The set of possible parts of speech, derived from posMap.
protected Symbol[] nonterminals
protected Map headToParentMap
protected Map leftSubcatMap
protected Map rightSubcatMap
protected ProbabilityStructure leftSubcatPS
protected int leftSubcatPSLastLevel
protected ProbabilityStructure rightSubcatPS
protected int rightSubcatPSLastLevel
protected Map modNonterminalMap
protected Map simpleModNonterminalMap
protected ProbabilityStructure modNonterminalPS
protected int modNonterminalPSLastLevel
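protected Map prunedPretermsPosMap
A map of each word pruned during training (see Training.removeWord) to its set of part-of-speech tags observed with its pruned instances.
See Also: DecoderServerRemote.prunedPreterms()
protected Set prunedPretermsPosSet
The set of part-of-speech tags of words pruned during training (see Training.removeWord).
See Also: DecoderServerRemote.prunedPreterms()
protected Map prunedPunctuationPosMap
See Also: DecoderServerRemote.prunedPunctuation()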
protected int cellLimit
protected double pruneFact
protected double maxPruneFact
See Also: Settings.decoderMaxPruneFactor
protected double pruneFactIncrement
See Also: Settings.decoderPruneFactorIncrement
protected boolean relaxConstraints
The value of Settings.decoderRelaxConstraintsAfterBeamWidening, cached here for readability and convenience.
protected boolean hardConstraints
The boolean to indicate whether to allow probability estimates equal to Constants.logOfZero and to allow other hard constraints (that amount to implicit log of zero probability estimates). If false, all estimates equal to Constants.logOfZero are modified to be Constants.logProbSmall and all other hard constraints except the comma-pruning constraint are relaxed. This data member is true by default, but is temporarily set to false by the decoder when no parse is produced after all beam widening.
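The interplay of the prune factor, its increment, its maximum, and the hard-constraints flag can be pictured with a rough control-flow sketch. Everything here is an assumption made for illustration: the `Attempt` interface, the additive widening step, and the choice to retry at the widest beam once constraints are relaxed are not taken from the decoder's source.

```java
/** Hypothetical sketch of beam widening followed by constraint relaxation. */
class BeamWideningSketch {
  /** One parse attempt; returns null when no parse is found (assumed interface). */
  interface Attempt {
    String parse(double pruneFact, boolean hardConstraints);
  }

  static String parseWithWidening(Attempt attempt, double pruneFact,
                                  double increment, double maxPruneFact) {
    if (increment <= 0) {
      throw new IllegalArgumentException("increment must be positive");
    }
    // Widen the beam step by step while hard constraints stay in force.
    for (double f = pruneFact; f <= maxPruneFact; f += increment) {
      String tree = attempt.parse(f, true);
      if (tree != null) return tree;
    }
    // No parse after all widening: relax hard constraints for one last try.
    return attempt.parse(maxPruneFact, false);
  }
}
```

A caller would supply the real chart-parsing routine as the `Attempt`; the sketch only fixes the order in which beam width and constraint strictness are varied.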
protected SexpList originalSentence
protected SexpList originalTags
protected boolean restorePrunedWords
The value of the Settings.restorePrunedWords setting.
protected SexpList originalWords
protected Subcat emptySubcat
protected boolean downcaseWords
The boolean value of the Settings.downcaseWords setting.
protected boolean useLowFreqTags
The boolean value of the Settings.useLowFreqTags setting.
protected boolean substituteWordsForClosedClassTags
The boolean value of the Settings.decoderSubstituteWordsForClosedClassTags setting.
protected boolean useOnlySuppliedTags
The boolean value of the Settings.decoderUseOnlySuppliedTags setting.
protected boolean useHeadToParentMap
The boolean value of the Settings.decoderUseHeadToParentMap setting.
protected boolean useSimpleModNonterminalMap
The boolean value of the Settings.useSimpleModNonterminalMap setting.
protected Symbol startSym
The value of Training.startSym(), cached here for efficiency and convenience.
protected Word startWord
The value of Training.startWord(), cached here for efficiency and convenience.
protected Symbol stopSym
The value of Training.stopSym(), cached here for efficiency and convenience.
protected Word stopWord
The value of Training.stopWord(), cached here for efficiency and convenience.
protected Symbol topSym
The value of Training.topSym(), cached here for efficiency and convenience.
protected int numPrevMods
The value of the setting Settings.numPrevMods.
protected int numPrevWords
The value of the setting Settings.numPrevWords.
protected List prevItemsAdded
One of a pair of lists used by addUnariesAndStopProbs(int, int).
protected List currItemsAdded
One of a pair of lists used by addUnariesAndStopProbs(int, int).
protected List topProbItemsToAdd
A temporary storage area used by addTopUnaries(int) for storing items to be added to the chart when iterating over a cell in the chart.
protected List unaryItemsToAdd
A temporary storage area used by addUnaries(danbikel.parser.CKYItem, java.util.List) for storing items to be added to the chart when iterating over a cell in the chart.
protected List stopProbItemsToAdd
A temporary storage area used by addStopProbs(danbikel.parser.CKYItem, java.util.List) for storing items to be added to the chart when iterating over a cell in the chart.
protected PriorEvent lookupPriorEvent
A reusable PriorEvent object for look-ups in tables.
protected HeadEvent lookupHeadEvent
A reusable HeadEvent object for look-ups in tables.
protected ModifierEvent lookupModEvent
A reusable ModifierEvent object for look-ups in tables.
protected ModifierEvent lookupLeftStopEvent
A reusable ModifierEvent object for look-ups in tables.
protected ModifierEvent lookupRightStopEvent
A reusable ModifierEvent object for look-ups in tables.
protected Word lookupWord
protected Map canonicalWords
protected Set wordSet
A reusable set for storing Word objects, used when seeding the chart in initialize(danbikel.lisp.SexpList).
protected SLNode tmpChildrenList
A reusable list node for use by getPrevMods(danbikel.parser.CKYItem, danbikel.util.SLNode) and getPrevModWords(danbikel.parser.CKYItem, danbikel.util.SLNode, boolean).
protected Map canonicalPrevModLists
A reflexive map in which to store canonical versions of SexpList objects that represent unlexicalized previous modifier lists.
protected SexpList prevModLookupList
protected WordList prevModWordLeftLookupList
protected WordList prevModWordRightLookupList
protected Subcat lookupSubcat
protected SexpList parentHeadSideLookupList
See Also: Settings.useSimpleModNonterminalMap, DecoderServerRemote.simpleModNonterminalMap()
protected SexpList partiallyLexedModLookupList
See Also: Settings.useSimpleModNonterminalMap, DecoderServerRemote.simpleModNonterminalMap()
protected boolean useCommaConstraint
The boolean value of Settings.decoderUseCommaConstraint.
protected boolean[] commaForPruning
A reusable array for storing which words are considered commas for the comma-pruning constraint. For each such word at index i, commaForPruning[i] will be true after setCommaConstraintData() has been invoked.
See Also: Settings.decoderUseCommaConstraint, setCommaConstraintData()
protected boolean[] conjForPruning
A reusable array for storing which words are considered conjunctions for the conjunction-pruning constraint. For each such word at index i, conjForPruning[i] will be true after setCommaConstraintData() has been invoked.
See Also: Settings.decoderUseCommaConstraint, setCommaConstraintData()
protected boolean keepAllWords
Cached value of Settings.keepAllWords, for efficiency and convenience.
protected ConstraintSet constraints
protected boolean findAtLeastOneSatisfyingConstraint
Caches the value of ConstraintSet.findAtLeastOneSatisfying(), if there are constraints for the current sentence; otherwise, this data member will be set to false.
See Also: constraints
protected boolean isomorphicTreeConstraints
Caches whether or not the ConstraintSet for the current sentence requires a tree that is isomorphic to the tree of constraints. This data member is true when the ConstraintSet.findAtLeastOneSatisfying() and ConstraintSet.hasTreeStructure() methods of the current sentence's constraint set both return true. If there is no constraint set for the current sentence, this data member is set to false.
See Also: constraints
protected boolean dontPostProcess
Indicates whether to perform post-processing on a tree after parsing, that is, whether to invoke Training.postProcess(Sexp) on the tree.
See Also: Settings.decoderDontPostProcess, Settings.decoderOutputInsideProbs
Constructor Detail |
---|
public Decoder(int id, DecoderServerRemote server)
Constructs a new decoder that will use the specified DecoderServer to get all information and probabilities required for decoding (parsing).
Parameters:
id - the id of this parsing client
server - the DecoderServerRemote implementor (either local or remote) that provides this decoder object with information and probabilities required for decoding (parsing)
Method Detail |
---|
protected void wrapCachingServer()
Wraps the normal DecoderServerRemote instance in a caching version.
See Also: Settings.decoderUseLocalProbabilityCache, CachingDecoderServer
protected void convertHeadToParentMap()
Converts the values of the read-only headToParentMap from Set objects to arrays of Symbol, that is, arrays of type Symbol[]. This is an optimization so that there is no need to create a new iterator object for each traversal of the set.
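The set-to-array conversion described for convertHeadToParentMap() might look roughly like the following. This is a sketch under assumptions, not the decoder's code: `SetToArraySketch` and `convertValues` are hypothetical names, and `String` stands in for `Symbol` so the example is self-contained.

```java
import java.util.*;

/** Hypothetical sketch: replace Set values of a map with arrays, in place. */
class SetToArraySketch {
  /** Replaces each Set value in the map with a String[] holding the same elements. */
  static void convertValues(Map<String, Object> map) {
    for (Map.Entry<String, Object> entry : map.entrySet()) {
      Set<?> set = (Set<?>) entry.getValue();
      // setValue is not a structural modification, so it is safe mid-iteration.
      entry.setValue(set.toArray(new String[0]));
    }
  }
}
```

After the conversion, hot loops can walk each value with an indexed `for` loop, which allocates nothing, instead of obtaining an `Iterator` per traversal.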
protected void convertSubcatMaps()
This helper method, used by the constructor, converts the values of the subcat maps from Set objects (containing Subcat objects) to Subcat arrays, that is, objects of type Subcat[]. This allows possible subcats for given contexts to be iterated over without the need to create Iterator objects during decoding.
protected void convertSubcatMap(Map subcatMap)
Helper method used by convertSubcatMaps().
Parameters:
subcatMap - the subcat map whose values are to be converted
protected boolean isPuncRaiseWord(Sexp word)
Parameters:
word - the word to be tested
See Also: Training.raisePunctuation(Sexp), prunedPunctuationPosMap
protected void removeWord(SexpList sentence, SexpList tags, int i)
A helper method used by preProcess(danbikel.lisp.SexpList, danbikel.lisp.SexpList) that removes words from the specified sentence and originalWords lists, and also from the specified tags list, if it is not null.
Parameters:
sentence - the sentence from which to remove a word
tags - the list of tag lists that is coordinated with the specified sentence from which an item is to be removed
i - the index of the word to be removed
protected void preProcess(SexpList sentence, SexpList tags) throws RemoteException
Parameters:
sentence - a list of words in a sentence to be parsed
tags - a list of part-of-speech tags in a sentence to be parsed, coordinated with the specified list of words
Throws:
RemoteException
protected void postProcess(Sexp tree)
Parameters:
tree - the parse tree of a sentence that has been parsed
See Also: Settings.restorePrunedWords, Training.postProcess(Sexp)
protected int restoreOriginalWords(Sexp tree, int wordIdx)
Parameters:
tree - the sentence for which to restore the original words, cached during execution of preProcess(danbikel.lisp.SexpList, danbikel.lisp.SexpList)
wordIdx - a threaded word index
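A "threaded" word index is one that each recursive call receives, advances past the words it visits, and returns to its caller. The sketch below illustrates that pattern on a simplified tree made of nested lists; the representation, the class name `RestoreWordsSketch`, and the extra `originalWords` parameter are all assumptions, since the real method operates on `Sexp` trees and the decoder's own cached word list.

```java
import java.util.*;

/** Hypothetical sketch of threading a word index through a tree recursion. */
class RestoreWordsSketch {
  /**
   * Replaces the words of the tree, left to right, with originalWords.
   * A tree is a List whose first element is a label and whose remaining
   * elements are subtrees (Lists) or words (Strings).
   * Returns the index of the next word not yet consumed.
   */
  static int restore(List<Object> tree, int wordIdx, List<String> originalWords) {
    for (int i = 1; i < tree.size(); i++) {
      Object child = tree.get(i);
      if (child instanceof List) {
        @SuppressWarnings("unchecked")
        List<Object> subtree = (List<Object>) child;
        // The callee hands back the advanced index.
        wordIdx = restore(subtree, wordIdx, originalWords);
      } else {
        tree.set(i, originalWords.get(wordIdx));
        wordIdx++;
      }
    }
    return wordIdx;
  }
}
```

Returning the index, rather than keeping it in a field, keeps the recursion free of shared mutable state, which matches the `int` return type in the signature above.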
protected void restorePrunedWords(Sexp tree)
Parameters:
tree - the parse tree of a sentence that has been parsed
See Also: postProcess(Sexp), Settings.restorePrunedWords
protected int restorePrunedWordsRecursive(Sexp tree, int wordIdx)
The recursive helper method for restorePrunedWords(Sexp). This method restores all words except those pruned from the very end of the original sentence.
Parameters:
tree - the tree whose pruned words are to be restored
wordIdx - the current word idx (threaded through this recursive function)
protected void setCommaConstraintData()
Caches the locations of commas to be used for the comma constraint in the boolean array commaForPruning. Also, sets up an array (initialized to be entirely false) of booleans to cache the locations of conjunctions, determined within initialize(SexpList,SexpList) (hence, the initialization of the conjForPruning array is not complete until after initialize(SexpList,SexpList) has finished executing).
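Caching comma positions in a reusable boolean array could be sketched as follows. The class `CommaCacheSketch`, the `String[]` sentence, and the literal `","` test are assumptions for illustration; the decoder presumably asks language-specific code what counts as a comma.

```java
/** Hypothetical sketch of caching comma positions in a reusable array. */
class CommaCacheSketch {
  // Reused across sentences and grown only when a longer sentence arrives.
  private boolean[] commaForPruning = new boolean[0];

  /** Records, for each position of the sentence, whether the word is a comma. */
  boolean[] setCommaData(String[] sentence) {
    if (commaForPruning.length < sentence.length) {
      commaForPruning = new boolean[sentence.length];
    }
    for (int i = 0; i < sentence.length; i++) {
      commaForPruning[i] = ",".equals(sentence[i]);
    }
    return commaForPruning;
  }
}
```

Because the array is reused, entries beyond the current sentence length may hold values from an earlier sentence, so callers of such a cache should index only within the current sentence.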
protected Symbol getExampleWordForTag(Symbol tag)
Parameters:
tag - a part of speech tag for which an example word is to be found
protected SexpList getTagSet(SexpList tags, int wordIdx, Symbol word, boolean wordIsUnknown, Symbol origWord, HashSet tmpSet)
Parameters:
tags - the list of supplied part-of-speech tags with the current sentence, or null if no tags were supplied
wordIdx - the index of the word whose possible tags are to be gotten
word - the word at the specified index whose possible tags are to be gotten
wordIsUnknown - whether the specified word is unknown, as far as the DecoderServerRemote is concerned
origWord - the original word before any mapping to a word-feature vector
tmpSet - a temporary set used during the invocation of this method
protected void seedChart(Symbol word, int wordIdx, Symbol features, boolean neverObserved, SexpList tagSet, boolean wordIsUnknown, Symbol origWord, ConstraintSet constraints) throws RemoteException
Parameters:
word - the current word
wordIdx - the index of the current word in the current sentence
features - the word-feature vector for the current word
neverObserved - indicates whether the current word was never observed during training (a truly unknown word)
tagSet - a list containing all possible part of speech tags for the current word
constraints - the constraint set for this sentence
Throws:
RemoteException - if any calls to the underlying DecoderServerRemote object throw a RemoteException
See Also: Chart.add(int,int,Item)
protected void initialize(SexpList sentence) throws RemoteException
Parameters:
sentence - the sentence to parse, which must be a list containing only symbols as its elements
Throws:
RemoteException
protected void initialize(SexpList sentence, SexpList tags) throws RemoteException
Parameters:
sentence - the sentence to parse, which must be a list containing only symbols as its elements
tags - a list that is the same length as sentence that will be used when seeding the chart with the parts of speech for unknown words; each element i of tags should itself be a SexpList containing all possible parts of speech for the ith word in sentence; if the value of this argument is null, then for each unknown word (or feature vector), all possible parts of speech observed in the training data for that unknown word will be used
Throws:
RemoteException
protected Word getCanonicalWord(Word lookup)
Gets the canonical Word object for the specified object.
Parameters:
lookup - the Word object to be canonicalized
Returns:
the canonical Word object for the specified object
See Also: canonicalWords
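A reflexive map, in which every key maps to itself, is a common way to obtain one canonical instance per equivalence class of objects. The generic `Canonicalizer` below is a hypothetical sketch of that idea and not the decoder's implementation; in particular, it stores the lookup object itself, which would only be safe if callers never mutate or reuse that object afterwards.

```java
import java.util.*;

/** Hypothetical sketch of canonicalization through a reflexive map. */
class Canonicalizer<T> {
  // Reflexive map: every key maps to itself.
  private final Map<T, T> canonical = new HashMap<>();

  /** Returns the first instance seen that equals the lookup object. */
  T getCanonical(T lookup) {
    T existing = canonical.get(lookup);
    if (existing != null) return existing;
    canonical.put(lookup, lookup);
    return lookup;
  }
}
```

Once all equal objects are collapsed to one instance, later code can share that instance freely and avoid holding many duplicate copies in chart items.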
protected SexpList setUnion(SexpList l1, SexpList l2, Set tmpSet)
Parameters:
l1 - the first list whose elements are to be in the union
l2 - the second list whose elements are to be in the union
tmpSet - a temporary set to be used during the invocation of this method
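The role of the caller-supplied temporary set can be illustrated with plain Java collections. This is a sketch only: `SetUnionSketch` is a hypothetical name, `List<String>` stands in for `SexpList`, and whether the real method clears the set first or preserves any particular element order is not stated in this documentation.

```java
import java.util.*;

/** Hypothetical sketch of list union using a caller-supplied scratch set. */
class SetUnionSketch {
  static List<String> setUnion(List<String> l1, List<String> l2, Set<String> tmpSet) {
    tmpSet.clear();      // assumption: the scratch set may hold stale contents
    tmpSet.addAll(l1);   // duplicates are dropped by the set
    tmpSet.addAll(l2);
    return new ArrayList<>(tmpSet); // a new list, as the summary describes
  }
}
```

Passing the scratch set in lets a caller reuse one set across many invocations instead of allocating a fresh one each time; the order of the result depends on the set implementation the caller provides.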
protected Sexp parse(SexpList sentence) throws RemoteException
sentence
- a list of symbols representing words of a sentence to be
parsed
null
if no
parse could be found or if a Decoder.TimeoutException
is thrown
RemoteException
- if the internal DecoderServerRemote
instance throws an exception, or some other
exception is thrownprotected Sexp parse(SexpList sentence, SexpList tags) throws RemoteException
sentence
- a list of symbols representing the words of a sentence to be parsed
tags
- a list of part-of-speech tags (symbols) coordinated with the specified list of words
null if no parse could be found or if a Decoder.TimeoutException is thrown
RemoteException
- if the internal DecoderServerRemote instance throws an exception, or some other exception is thrown
protected Sexp parse(SexpList sentence, SexpList tags, ConstraintSet constraints) throws RemoteException
sentence
- a list of symbols representing the words of a sentence to be parsed
tags
- a list of part-of-speech tags (symbols) coordinated with the specified list of words
constraints
- a set of parsing constraints for the specified sentence
null if no parse could be found or if a Decoder.TimeoutException is thrown
RemoteException
- if the internal DecoderServerRemote instance throws an exception, or some other exception is thrown
protected void addTopUnaries(int end) throws RemoteException
Training.topSym() has been multiplied to the existing item's score.
end
- the index of the last word of the sentence being parsed
RemoteException
protected void complete(int start, int end) throws RemoteException, Decoder.TimeoutException
start
- the index of the first word in the span for which all chart
items are to be created and added to the chart
end
- the index of the last word in the span for which all chart
items are to be created and added to the chart
RemoteException
Decoder.TimeoutException
- if the value of Settings.maxParseTime is greater than zero and that limit has been reached while parsing
joinItems(CKYItem,CKYItem,boolean)
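The interplay between the timeout thrown while completing spans and the null result documented for parse can be sketched as follows. This is only an illustration under stated assumptions: the time unit (milliseconds), the class TimeoutSketch, the exception ParseTimeoutException, and the injected clock readings are all hypothetical and are not taken from the Decoder source.

```java
// Hypothetical illustration of a parse-time budget; not the Decoder source.
final class TimeoutSketch {
    static final class ParseTimeoutException extends Exception {
        ParseTimeoutException(String message) {
            super(message);
        }
    }

    private final long maxParseTimeMillis;   // assumption: a value <= 0 means "no limit"
    private final long startMillis;

    TimeoutSketch(long maxParseTimeMillis, long startMillis) {
        this.maxParseTimeMillis = maxParseTimeMillis;
        this.startMillis = startMillis;
    }

    // Called once per span; throws only when a positive limit has been reached.
    void checkTimeout(long nowMillis) throws ParseTimeoutException {
        if (maxParseTimeMillis > 0 && nowMillis - startMillis >= maxParseTimeMillis) {
            throw new ParseTimeoutException("maximum parse time reached");
        }
    }

    // Mirrors the documented contract of parse: null is returned on timeout.
    // clockReadings stands in for the time observed at each span.
    static String parseOrNull(TimeoutSketch budget, long[] clockReadings) {
        try {
            for (long now : clockReadings) {
                budget.checkTimeout(now);
            }
            return "(parse)";
        } catch (ParseTimeoutException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        TimeoutSketch limited = new TimeoutSketch(100, 0);
        System.out.println(parseOrNull(limited, new long[] {10, 50}));   // prints (parse)
        System.out.println(parseOrNull(limited, new long[] {10, 150}));  // prints null
    }
}
```

Injecting the clock readings keeps the sketch deterministic; a real implementation would presumably read the system clock instead.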
protected boolean derivationOrderOK(CKYItem modificand, boolean modifySide)
complete(int,int).
protected void joinItems(CKYItem modificand, CKYItem modifier, boolean side) throws RemoteException
modificand
- the chart item representing a partially completed subtree, to be modified on side by modifier
modifier
- the chart item representing a completed subtree that will be added as a modifier on side of modificand's subtree
side
- the side on which to attempt to add the specified modifier
to the specified modificand
RemoteException
protected void addUnariesAndStopProbs(int start, int end) throws RemoteException
Training.stopSym() as a modifier on either side of a production.
start
- the index of the first word in the span
end
- the index of the last word in the span
RemoteException
addUnaries(CKYItem, java.util.List), addStopProbs(CKYItem, java.util.List)
protected List addUnaries(CKYItem item, List itemsAdded) throws RemoteException
itemsAdded list.
item
- the item for which unary productions are to be added
itemsAdded
- an empty list in which all new chart items will be stored
itemsAdded list having been modified
RemoteException
protected final Subcat[] getPossibleSubcats(Map subcatMap, HeadEvent headEvent, ProbabilityStructure subcatPS, int lastLevel)
Subcats for the context contained in the specified HeadEvent.
subcatMap
- the map of contexts to sets of possible Subcat objects (each set is an array of Subcat)
headEvent
- the head event for whose context possible subcats are to be retrieved
subcatPS
- the probability structure for generating subcats
lastLevel
- the last level of back-off for the specified subcat probability structure
Subcats for the context contained in the specified HeadEvent
protected List addStopProbs(CKYItem item, List itemsAdded) throws RemoteException
itemsAdded list. Stop probabilities are the probabilities associated with generating Training.stopSym() as a modifier on either side of a production.
item
- the item for which stop probabilities are to be added, creating a new “stopped” item
itemsAdded
- a list into which chart items added by this method are to be stored
itemsAdded list, modified by this method
RemoteException
protected SexpList getPrevMods(CKYItem item, SLNode modChildren)
item
- the item for which a previous-modifier list is to be
constructed
modChildren
- the last node of modifying children on a particular
side of the head of a chart item
itemPrevMods list, without its final element (which is "bumped off" the edge, since the previous-modifier list has a constant length)
protected WordList getPrevModWords(CKYItem item, SLNode modChildren, boolean side)
item
- the item for which a previous-modifier list is to be
constructed
modChildren
- the last node of modifying children on a particular side
of the head of a chart item
side
- the side of the specified item's head child on which the
specified modifier children occur
itemPrevMods list, without its final element (which is "bumped off" the edge, since the previous-modifier list has a constant length)
protected final boolean commaConstraintViolation(int start, int split, int end)
end and that word is not a comma and when it is not the case that the word at end is not a conjunction. The check for a conjunction is to allow chart items representing partial derivations of the form
P → α β γ
where
Treebank.isComma(Symbol) and Treebank.isConjunction(Symbol).
P → α , CC
This addition to Mike Collins' definition of the comma constraint was necessary because, unlike in Collins' parser, commas and conjunctions are generated in two separate steps.
public void update(Map<String,String> changedSettings)
Settings.Change
update in interface Settings.Change
changedSettings
- the keys of this map are the settings that have changed since the last time this method was invoked, and the values are the old values for those changed settings
Settings.register(Class,Settings.Change,Set), Settings.register(Settings.Change)
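The change-notification contract described for update, in which the map's keys are the changed settings and its values are their old values, can be sketched with a small stand-alone example. Everything here is hypothetical: SettingsSketch, its set and get methods, the CachingListener class, and the key string "downcaseWords" are illustrative stand-ins, and the sketch notifies listeners immediately after each change instead of batching changes between invocations.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration of a settings-change callback; not the Settings source.
final class SettingsSketch {
    interface Change {
        // Keys are the names of changed settings; values are their OLD values.
        void update(Map<String, String> changedSettings);
    }

    private final Map<String, String> values = new HashMap<>();
    private final List<Change> listeners = new ArrayList<>();

    void register(Change listener) {
        listeners.add(listener);
    }

    String get(String name) {
        return values.get(name);
    }

    void set(String name, String newValue) {
        String oldValue = values.put(name, newValue);
        if (Objects.equals(oldValue, newValue)) {
            return;                            // nothing changed, so no notification
        }
        Map<String, String> changed = new HashMap<>();
        changed.put(name, oldValue);           // old value may be null for a new setting
        for (Change listener : listeners) {
            listener.update(changed);
        }
    }

    // A listener that caches a boolean setting, refreshing it only when notified.
    static final class CachingListener implements Change {
        private final SettingsSketch settings;
        boolean downcaseWords;
        String lastOldValue;

        CachingListener(SettingsSketch settings) {
            this.settings = settings;
        }

        @Override
        public void update(Map<String, String> changedSettings) {
            if (changedSettings.containsKey("downcaseWords")) {
                lastOldValue = changedSettings.get("downcaseWords");
                downcaseWords = Boolean.parseBoolean(settings.get("downcaseWords"));
            }
        }
    }

    public static void main(String[] args) {
        SettingsSketch settings = new SettingsSketch();
        CachingListener listener = new CachingListener(settings);
        settings.register(listener);
        settings.set("downcaseWords", "true");
        System.out.println(listener.downcaseWords);   // prints true
    }
}
```

Because the map carries old values only, a listener that needs the new value reads it back from the settings object, as the caching listener does here.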