Parsing Engine

danbikel.parser
Class Decoder

java.lang.Object
  extended by danbikel.parser.Decoder
All Implemented Interfaces:
Settings.Change, Serializable
Direct Known Subclasses:
EMDecoder

public class Decoder
extends Object
implements Serializable, Settings.Change

Provides the methods necessary to perform CKY parsing on input sentences.

See Also:
Serialized Form

Nested Class Summary
protected static class Decoder.TimeoutException
          Exception to be thrown when the maximum parse time has been reached.
 
Field Summary
protected  Map canonicalPrevModLists
          A reflexive map in which to store canonical versions of SexpList objects that represent unlexicalized previous modifier lists.
protected  Map canonicalWords
          A reflexive map of Word objects, for getting a canonical version.
protected  int cellLimit
          The cell limit for the parsing chart (stored here for debugging).
protected  CKYChart chart
          The parsing chart.
protected  boolean[] commaForPruning
          A reusable array for storing which words are considered commas for the comma-pruning constraint.
protected  boolean[] conjForPruning
          A reusable array for storing which words are considered conjunctions for the conjunction-pruning constraint.
protected  ConstraintSet constraints
          Caches the ConstraintSet, if any, for the current sentence.
protected  List currItemsAdded
          One of a pair of lists used by addUnariesAndStopProbs(int, int).
protected  boolean dontPostProcess
          Indicates whether to perform post-processing on a tree after parsing, that is, whether to invoke Training.postProcess(Sexp) on the tree.
protected  boolean downcaseWords
          The boolean value of the Settings.downcaseWords setting.
protected  Subcat emptySubcat
          An instance of an empty subcat, for use when constructing lookup events.
protected static PrintWriter err
          A writer wrapped around System.err for error messages that might contain encoding-specific characters.
protected  boolean findAtLeastOneSatisfyingConstraint
          Caches the value of ConstraintSet.findAtLeastOneSatisfying(), if there are constraints for the current sentence; otherwise, this data member will be set to false.
protected  boolean hardConstraints
          The boolean to indicate whether to allow probability estimates equal to Constants.logOfZero and to allow other hard constraints (that amount to implicit log of zero probability estimates).
protected  Map headToParentMap
          A map from futures of the last back-off level of the head generation model to possible history contexts.
protected  int id
          The id of the parsing client that is using this decoder.
protected  boolean isomorphicTreeConstraints
          Caches whether or not the ConstraintSet for the current sentence requires a tree that is isomorphic to the tree of constraints.
protected  int kBest
          The maximum number of top-scoring parses for the various parse methods to return.
protected  boolean keepAllWords
          Cached value of Settings.keepAllWords, for efficiency and convenience.
protected static boolean LEFT
          The value of Constants.LEFT cached for better readability.
protected  Map leftSubcatMap
          A map from contexts of the last back-off level of the left subcat generation model to possible subcats.
protected  ProbabilityStructure leftSubcatPS
          The left subcat generation model structure.
protected  int leftSubcatPSLastLevel
          The last level of back-off in the left subcat generation model structure.
protected static double logOfZero
          The value of Constants.logOfZero cached for readability.
protected static double logProbCertain
          The value of Constants.logProbCertain cached for readability.
protected  HeadEvent lookupHeadEvent
          A reusable HeadEvent object for look-ups in tables.
protected  ModifierEvent lookupLeftStopEvent
          A reusable ModifierEvent object for look-ups in tables.
protected  ModifierEvent lookupModEvent
          A reusable ModifierEvent object for look-ups in tables.
protected  PriorEvent lookupPriorEvent
          A reusable PriorEvent object for look-ups in tables.
protected  ModifierEvent lookupRightStopEvent
          A reusable ModifierEvent object for look-ups in tables.
protected  Subcat lookupSubcat
          A (currently unused) reusable lookup object.
protected  Word lookupWord
          A lookup Word object, for obtaining a canonical version.
protected  int maxParseTime
          The timer (used when Settings.maxParseTime is greater than zero).
protected  double maxPruneFact
          The maximum prune factor (for beam-widening).
protected  int maxSentLen
          The maximum length of sentences to be parsed.
protected  Map modNonterminalMap
          A map from contexts of the last back-off level of the modifying nonterminal generation model to possible modifying nonterminal labels.
protected  ProbabilityStructure modNonterminalPS
          The modifying nonterminal generation model structure.
protected  int modNonterminalPSLastLevel
          The last level of back-off in the modifying nonterminal generation model structure.
protected  Symbol[] nonterminals
          An array of all nonterminals observed in training, that is initialized and filled in at construction time.
protected  int numPrevMods
          The value of the setting Settings.numPrevMods.
protected  int numPrevWords
          The value of the setting Settings.numPrevWords.
protected  SexpList originalSentence
          The original sentence, before preprocessing.
protected  SexpList originalTags
          The original tag list, before preprocessing.
protected  SexpList originalWords
          The original sentence, but with words removed to match pre-processing.
protected  SexpList parentHeadSideLookupList
          A reusable object used for constructing parent-head-side triples when employing the simpler of two methods for determining whether a particular modifier is possible in the context of a particular parent-head-side combination.
protected  SexpList partiallyLexedModLookupList
          A reusable object used for constructing a partially-lexicalized modifier nonterminal when employing the simpler of two methods for determining whether a particular modifier is possible in the context of a particular parent-head-side combination.
protected  Map posMap
          The map from vocabulary items to their possible parts of speech.
protected  Set posSet
          The set of possible parts of speech, derived from posMap.
protected  Map posToExampleWordMap
          A cache derived from posMap that is a map of (presumably closed-class) parts of speech to random example words observed with the part of speech from which they are mapped.
protected  List prevItemsAdded
          One of a pair of lists used by addUnariesAndStopProbs(int, int).
protected  SexpList prevModLookupList
          A reusable object for constructing previous modifier lists for chart items.
protected  WordList prevModWordLeftLookupList
          A reusable object for constructing previous left-modifier word lists for chart items.
protected  WordList prevModWordRightLookupList
          A reusable object for constructing previous right-modifier word lists for chart items.
protected  Map prunedPretermsPosMap
          A map of each word pruned during training to its set of part-of-speech tags observed with its pruned instances.
protected  Set prunedPretermsPosSet
          The set of part-of-speech tags of words pruned during training.
protected  Map prunedPunctuationPosMap
          A map of each punctuation word that was pruned during training to the set of its parts of speech observed with the pruned instances.
protected  double pruneFact
          The prune factor for the parsing chart (stored here for debugging).
protected  double pruneFactIncrement
          The prune factor increment used when doing beam-widening.
protected  boolean relaxConstraints
          The value of Settings.decoderRelaxConstraintsAfterBeamWidening, cached here for readability and convenience.
protected  boolean restorePrunedWords
          The value of the Settings.restorePrunedWords setting.
protected static boolean RIGHT
          The value of Constants.RIGHT cached for better readability.
protected  Map rightSubcatMap
          A map from contexts of the last back-off level of the right subcat generation model to possible subcats.
protected  ProbabilityStructure rightSubcatPS
          The right subcat generation model structure.
protected  int rightSubcatPSLastLevel
          The last level of back-off in the right subcat generation model structure.
protected  SexpList sentence
          The current sentence.
protected  int sentenceIdx
          The current sentence index for this decoder (starts at 0).
protected  int sentLen
          The length of the current sentence, cached here for convenience.
protected  DecoderServerRemote server
          The server for this decoder.
protected  Map simpleModNonterminalMap
          A map from unlexicalized parent-head-side triples to all possible partially-lexicalized modifying nonterminals.
protected  SexpList startList
          A list containing only Training.startSym(), which is the type of list that should be used when there are zero real previous modifiers (to start the Markov modifier process).
protected  Symbol startSym
          The value of Training.startSym(), cached here for efficiency and convenience.
protected  Word startWord
          The value of Training.startWord(), cached here for efficiency and convenience.
protected  WordList startWordList
          A list containing only Training.startWord(), which is the type of list that should be used when there are zero real previous modifiers (to start the Markov modifier process).
protected  List stopProbItemsToAdd
          A temporary storage area used by addStopProbs(danbikel.parser.CKYItem, java.util.List) for storing items to be added to the chart when iterating over a cell in the chart.
protected  Symbol stopSym
          The value of Training.stopSym(), cached here for efficiency and convenience.
protected  Word stopWord
          The value of Training.stopWord(), cached here for efficiency and convenience.
protected  boolean substituteWordsForClosedClassTags
          The boolean value of the Settings.decoderSubstituteWordsForClosedClassTags setting.
protected  Time time
          An object for keeping track of wall-clock time.
protected  SLNode tmpChildrenList
          A reusable list node for use by getPrevMods(danbikel.parser.CKYItem, danbikel.util.SLNode) and getPrevModWords(danbikel.parser.CKYItem, danbikel.util.SLNode, boolean).
protected  List topProbItemsToAdd
          A temporary storage area used by addTopUnaries(int) for storing items to be added to the chart when iterating over a cell in the chart.
protected  Symbol topSym
          The value of Training.topSym(), cached here for efficiency and convenience.
protected  List unaryItemsToAdd
          A temporary storage area used by addUnaries(danbikel.parser.CKYItem, java.util.List) for storing items to be added to the chart when iterating over a cell in the chart.
protected  boolean useCommaConstraint
          The boolean value of Settings.decoderUseCommaConstraint.
protected  boolean useHeadToParentMap
          The boolean value of the Settings.decoderUseHeadToParentMap setting.
protected  boolean useLowFreqTags
          The boolean value of the Settings.useLowFreqTags setting.
protected  boolean useOnlySuppliedTags
          The boolean value of the Settings.decoderUseOnlySuppliedTags setting.
protected  boolean useSimpleModNonterminalMap
          The boolean value of the Settings.useSimpleModNonterminalMap setting.
protected  Set wordSet
          A reusable set for storing Word objects, used when seeding the chart in initialize(danbikel.lisp.SexpList).
protected static Subcat[] zeroSubcatArr
          An array of Subcat of length zero.
 
Constructor Summary
Decoder(int id, DecoderServerRemote server)
          Constructs a new decoder that will use the specified DecoderServer to get all information and probabilities required for decoding (parsing).
 
Method Summary
protected  List addStopProbs(CKYItem item, List itemsAdded)
          Adds stop probabilities to the specified item and adds these items to the chart; as a side effect, all items successfully added to the chart are also stored in the specified itemsAdded list.
protected  void addTopUnaries(int end)
          Adds hidden root nonterminal probabilities.
protected  List addUnaries(CKYItem item, List itemsAdded)
          Finds all possible parent-head (or unary) productions using the root node of the specified chart item as the head, creates new items based on the specified item, multiplying in the parent-head probability.
protected  void addUnariesAndStopProbs(int start, int end)
          Finds all possible parent-head (or unary) productions using the root node of each existing chart item within the specified span as the head, creates new items based on these existing items, multiplying in the parent-head probability; then, using these new items, this method also creates additional new items in which stop probabilities have been multiplied; all new items are added to the chart.
protected  boolean commaConstraintViolation(int start, int split, int end)
          There is a comma constraint violation if the word at the split point is a comma and there exists a word following end and that word is not a comma and when it is not the case that the word at end is not a conjunction.
protected  void complete(int start, int end)
          Constructs all possible items spanning the specified indices and adds them to the chart.
protected  void convertHeadToParentMap()
          Converts the values of the read-only headToParentMap from Set objects to arrays of Symbol, that is, arrays of type Symbol[].
protected  void convertSubcatMap(Map subcatMap)
          Helper method used by convertSubcatMaps().
protected  void convertSubcatMaps()
          A helper method, used by the constructor, that converts the values of the subcat maps from Set objects (containing Subcat objects) to Subcat arrays, that is, objects of type Subcat[].
protected  boolean derivationOrderOK(CKYItem modificand, boolean modifySide)
          Enforces that modificand receives all its right modifiers before receiving any left modifiers, by ensuring that right-modification only happens when a modificand has no left-children (this is both necessary and sufficient to enforce derivation order).
protected  Word getCanonicalWord(Word lookup)
          Gets the canonical Word object for the specified object.
protected  Symbol getExampleWordForTag(Symbol tag)
          Returns a known word that was observed with the specified part of speech tag.
protected  Subcat[] getPossibleSubcats(Map subcatMap, HeadEvent headEvent, ProbabilityStructure subcatPS, int lastLevel)
          Gets all possible Subcats for the context contained in the specified HeadEvent.
protected  SexpList getPrevMods(CKYItem item, SLNode modChildren)
          Creates a new previous-modifier list given the specified current list and the last modifier on a particular side.
protected  WordList getPrevModWords(CKYItem item, SLNode modChildren, boolean side)
          Creates a new previous-modifier word list given the specified current list and the last modifier on a particular side.
protected  SexpList getTagSet(SexpList tags, int wordIdx, Symbol word, boolean wordIsUnknown, Symbol origWord, HashSet tmpSet)
          Gets the set of possible part-of-speech tags for a word in the sentence to be parsed.
protected  void initialize(SexpList sentence)
          Initializes the chart for parsing the specified sentence.
protected  void initialize(SexpList sentence, SexpList tags)
          Initializes the chart for parsing the specified sentence, using the specified coordinated list of part-of-speech tags when assigning parts of speech to unknown words.
protected  boolean isPuncRaiseWord(Sexp word)
          Returns whether the specified word was raised as part of the punctuation-raising procedure performed during training.
protected  void joinItems(CKYItem modificand, CKYItem modifier, boolean side)
          Joins two chart items, one representing the modificand that has not yet received its stop probabilities, the other representing the modifier that has received its stop probabilities.
protected  Sexp parse(SexpList sentence)
          Parses the specified sentence.
protected  Sexp parse(SexpList sentence, SexpList tags)
          Parses the specified sentence using the supplied list of part-of-speech tags.
protected  Sexp parse(SexpList sentence, SexpList tags, ConstraintSet constraints)
          Parses the specified sentence using the supplied list of part-of-speech tags and the supplied set of parsing constraints.
protected  void postProcess(Sexp tree)
          Performs post-processing on a sentence that has been parsed.
protected  void preProcess(SexpList sentence, SexpList tags)
          Performs all preprocessing to the specified coordinated lists of words and part-of-speech tags of the sentence that is about to be parsed.
protected  void removeWord(SexpList sentence, SexpList tags, int i)
          A helper method used by preProcess(danbikel.lisp.SexpList, danbikel.lisp.SexpList) that removes words from the specified sentence and originalWords lists, and also from the specified tags list, if it is not null.
protected  int restoreOriginalWords(Sexp tree, int wordIdx)
          Restores the original words in the current sentence.
protected  void restorePrunedWords(Sexp tree)
          Restores pruned words to a parsed sentence.
protected  int restorePrunedWordsRecursive(Sexp tree, int wordIdx)
          The recursive helper method for restorePrunedWords(Sexp).
protected  void seedChart(Symbol word, int wordIdx, Symbol features, boolean neverObserved, SexpList tagSet, boolean wordIsUnknown, Symbol origWord, ConstraintSet constraints)
          Adds a chart item for every possible part of speech for the specified word at the specified index in the current sentence.
protected  void setCommaConstraintData()
          Caches the locations of commas to be used for the comma constraint in the boolean array commaForPruning.
protected  SexpList setUnion(SexpList l1, SexpList l2, Set tmpSet)
          Returns a new list that is the union of the two specified lists.
 void update(Map<String,String> changedSettings)
          Invoked by this class to notify the requesting class that one or more settings have changed.
protected  void wrapCachingServer()
          Wraps the normal DecoderServerRemote instance in a caching version.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LEFT

protected static final boolean LEFT
The value of Constants.LEFT cached for better readability.

See Also:
Constant Field Values

RIGHT

protected static final boolean RIGHT
The value of Constants.RIGHT cached for better readability.

See Also:
Constant Field Values

logOfZero

protected static final double logOfZero
The value of Constants.logOfZero cached for readability.

See Also:
Constant Field Values

logProbCertain

protected static final double logProbCertain
The value of Constants.logProbCertain cached for readability.

See Also:
Constant Field Values

zeroSubcatArr

protected static final Subcat[] zeroSubcatArr
An array of Subcat of length zero.

See Also:
getPossibleSubcats(java.util.Map,HeadEvent,ProbabilityStructure,int)

err

protected static PrintWriter err
A writer wrapped around System.err for error messages that might contain encoding-specific characters. The encoding of the writer is Language.encoding().


startList

protected final SexpList startList
A list containing only Training.startSym(), which is the type of list that should be used when there are zero real previous modifiers (to start the Markov modifier process).

See Also:
Trainer.newStartList()

startWordList

protected final WordList startWordList
A list containing only Training.startWord(), which is the type of list that should be used when there are zero real previous modifiers (to start the Markov modifier process).

See Also:
Trainer.newStartWordList()

id

protected int id
The id of the parsing client that is using this decoder.


server

protected DecoderServerRemote server
The server for this decoder.


sentenceIdx

protected int sentenceIdx
The current sentence index for this decoder (starts at 0).


sentence

protected SexpList sentence
The current sentence.


sentLen

protected int sentLen
The length of the current sentence, cached here for convenience.


maxSentLen

protected int maxSentLen
The maximum length of sentences to be parsed. All sentences greater than this length will be skipped.

See Also:
Settings.maxSentLen

kBest

protected int kBest
The maximum number of top-scoring parses for the various parse methods to return.

See Also:
Settings.kBest

maxParseTime

protected int maxParseTime
The timer (used when Settings.maxParseTime is greater than zero).

See Also:
Settings.maxParseTime

time

protected Time time
An object for keeping track of wall-clock time.
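
As a rough illustration of how maxParseTime and time could work together, the sketch below compares an elapsed wall-clock interval against a limit. The class TimeoutSketch, the method timedOut, and the choice of milliseconds as the unit are assumptions made for this example only; the actual Decoder and Time classes may behave differently.

```java
final class TimeoutSketch {
  /**
   * Returns true when a positive limit has been exceeded.
   * A limit of zero or less disables the check, mirroring the note that
   * the timer is used only when Settings.maxParseTime is greater than zero.
   * The millisecond unit is an assumption of this sketch.
   */
  static boolean timedOut(long startMillis, long nowMillis, int maxParseTime) {
    return maxParseTime > 0 && (nowMillis - startMillis) > maxParseTime;
  }

  public static void main(String[] args) {
    long start = System.currentTimeMillis();
    // With a limit of 0 the check is disabled, whatever the elapsed time.
    System.out.println(timedOut(start, System.currentTimeMillis(), 0));
  }
}
```

A decoder built along these lines would presumably run such a check inside its main parsing loop and throw Decoder.TimeoutException when it returns true.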


chart

protected CKYChart chart
The parsing chart.


posMap

protected Map posMap
The map from vocabulary items to their possible parts of speech.


posToExampleWordMap

protected Map posToExampleWordMap
A cache derived from posMap that is a map of (presumably closed-class) parts of speech to random example words observed with the part of speech from which they are mapped.
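
One way such a cache could be derived from a word-to-tags map is sketched below. The class ExampleWordSketch and the method exampleWords are hypothetical, strings stand in for Symbol objects, and the "random" example word is simply the first word encountered for each tag, which is an assumption of this sketch rather than documented behavior.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

final class ExampleWordSketch {
  /**
   * Inverts a map from words to their possible tags into a map from each
   * tag to one example word seen with that tag. The first word encountered
   * for a tag wins; later words with the same tag are ignored.
   */
  static Map<String, String> exampleWords(
      Map<String, ? extends Collection<String>> posMap) {
    Map<String, String> posToExampleWord = new HashMap<>();
    for (Map.Entry<String, ? extends Collection<String>> entry : posMap.entrySet()) {
      for (String tag : entry.getValue()) {
        posToExampleWord.putIfAbsent(tag, entry.getKey());
      }
    }
    return posToExampleWord;
  }
}
```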


posSet

protected Set posSet
The set of possible parts of speech, derived from posMap.


nonterminals

protected Symbol[] nonterminals
An array of all nonterminals observed in training, that is initialized and filled in at construction time.


headToParentMap

protected Map headToParentMap
A map from futures of the last back-off level of the head generation model to possible history contexts.


leftSubcatMap

protected Map leftSubcatMap
A map from contexts of the last back-off level of the left subcat generation model to possible subcats.


rightSubcatMap

protected Map rightSubcatMap
A map from contexts of the last back-off level of the right subcat generation model to possible subcats.


leftSubcatPS

protected ProbabilityStructure leftSubcatPS
The left subcat generation model structure.


leftSubcatPSLastLevel

protected int leftSubcatPSLastLevel
The last level of back-off in the left subcat generation model structure.


rightSubcatPS

protected ProbabilityStructure rightSubcatPS
The right subcat generation model structure.


rightSubcatPSLastLevel

protected int rightSubcatPSLastLevel
The last level of back-off in the right subcat generation model structure.


modNonterminalMap

protected Map modNonterminalMap
A map from contexts of the last back-off level of the modifying nonterminal generation model to possible modifying nonterminal labels.


simpleModNonterminalMap

protected Map simpleModNonterminalMap
A map from unlexicalized parent-head-side triples to all possible partially-lexicalized modifying nonterminals.


modNonterminalPS

protected ProbabilityStructure modNonterminalPS
The modifying nonterminal generation model structure.


modNonterminalPSLastLevel

protected int modNonterminalPSLastLevel
The last level of back-off in the modifying nonterminal generation model structure.


prunedPretermsPosMap

protected Map prunedPretermsPosMap
A map of each word pruned during training to its set of part-of-speech tags observed with its pruned instances. This map is used by Training.removeWord.

See Also:
DecoderServerRemote.prunedPreterms()

prunedPretermsPosSet

protected Set prunedPretermsPosSet
The set of part-of-speech tags of words pruned during training. This set is used by Training.removeWord.

See Also:
DecoderServerRemote.prunedPreterms()

prunedPunctuationPosMap

protected Map prunedPunctuationPosMap
A map of each punctuation word that was pruned during training to the set of its parts of speech observed with the pruned instances.

See Also:
DecoderServerRemote.prunedPunctuation()

cellLimit

protected int cellLimit
The cell limit for the parsing chart (stored here for debugging).


pruneFact

protected double pruneFact
The prune factor for the parsing chart (stored here for debugging).


maxPruneFact

protected double maxPruneFact
The maximum prune factor (for beam-widening).

See Also:
Settings.decoderMaxPruneFactor

pruneFactIncrement

protected double pruneFactIncrement
The prune factor increment used when doing beam-widening.

See Also:
Settings.decoderPruneFactorIncrement
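
The relationship between pruneFact, pruneFactIncrement and maxPruneFact can be pictured as a retry loop that widens the beam until a parse is found or the maximum is reached. The sketch below is only an illustration under that assumption: BeamWideningSketch, widen and the tryParse callback are hypothetical names, and the real decoder's control flow is not documented on this page.

```java
import java.util.function.DoublePredicate;

final class BeamWideningSketch {
  /**
   * Tries successive prune factors, starting at pruneFact and adding
   * increment each time, until tryParse succeeds or the factor would
   * exceed maxPruneFact.
   *
   * @return the prune factor at which tryParse succeeded, or -1.0 if none did
   */
  static double widen(double pruneFact, double increment, double maxPruneFact,
                      DoublePredicate tryParse) {
    if (increment <= 0.0) {
      throw new IllegalArgumentException("increment must be positive");
    }
    for (double factor = pruneFact; factor <= maxPruneFact; factor += increment) {
      if (tryParse.test(factor)) {
        return factor;
      }
    }
    return -1.0;
  }
}
```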

relaxConstraints

protected boolean relaxConstraints
The value of Settings.decoderRelaxConstraintsAfterBeamWidening, cached here for readability and convenience.


hardConstraints

protected boolean hardConstraints
The boolean to indicate whether to allow probability estimates equal to Constants.logOfZero and to allow other hard constraints (that amount to implicit log of zero probability estimates). If false, all estimates equal to Constants.logOfZero are modified to be Constants.logProbSmall and all other hard constraints except the comma-pruning constraint are relaxed. This data member is true by default, but is temporarily set to false by the decoder when no parse is produced after all beam widening.
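
The effect of this flag on a single estimate can be sketched as follows. The constant values below are assumed stand-ins, not the actual values of Constants.logOfZero and Constants.logProbSmall, and HardConstraintSketch and adjust are hypothetical names introduced for illustration.

```java
final class HardConstraintSketch {
  // Assumed stand-ins; the real values are defined in Constants.
  static final double LOG_OF_ZERO = Double.NEGATIVE_INFINITY;
  static final double LOG_PROB_SMALL = -100.0;

  /**
   * Leaves the estimate untouched when hard constraints are in force;
   * otherwise replaces a log-of-zero estimate with a small but finite
   * log probability so that the derivation is not ruled out entirely.
   */
  static double adjust(double logProb, boolean hardConstraints) {
    if (!hardConstraints && logProb == LOG_OF_ZERO) {
      return LOG_PROB_SMALL;
    }
    return logProb;
  }
}
```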


originalSentence

protected SexpList originalSentence
The original sentence, before preprocessing.


originalTags

protected SexpList originalTags
The original tag list, before preprocessing.


restorePrunedWords

protected boolean restorePrunedWords
The value of the Settings.restorePrunedWords setting.


originalWords

protected SexpList originalWords
The original sentence, but with words removed to match pre-processing. This will be used to restore the original words after parsing.


emptySubcat

protected Subcat emptySubcat
An instance of an empty subcat, for use when constructing lookup events.


downcaseWords

protected boolean downcaseWords
The boolean value of the Settings.downcaseWords setting.


useLowFreqTags

protected boolean useLowFreqTags
The boolean value of the Settings.useLowFreqTags setting.


substituteWordsForClosedClassTags

protected boolean substituteWordsForClosedClassTags
The boolean value of the Settings.decoderSubstituteWordsForClosedClassTags setting.


useOnlySuppliedTags

protected boolean useOnlySuppliedTags
The boolean value of the Settings.decoderUseOnlySuppliedTags setting.


useHeadToParentMap

protected boolean useHeadToParentMap
The boolean value of the Settings.decoderUseHeadToParentMap setting.


useSimpleModNonterminalMap

protected boolean useSimpleModNonterminalMap
The boolean value of the Settings.useSimpleModNonterminalMap setting.


startSym

protected Symbol startSym
The value of Training.startSym(), cached here for efficiency and convenience.


startWord

protected Word startWord
The value of Training.startWord(), cached here for efficiency and convenience.


stopSym

protected Symbol stopSym
The value of Training.stopSym(), cached here for efficiency and convenience.


stopWord

protected Word stopWord
The value of Training.stopWord(), cached here for efficiency and convenience.


topSym

protected Symbol topSym
The value of Training.topSym(), cached here for efficiency and convenience.


numPrevMods

protected int numPrevMods
The value of the setting Settings.numPrevMods.


numPrevWords

protected int numPrevWords
The value of the setting Settings.numPrevWords.


prevItemsAdded

protected List prevItemsAdded
One of a pair of lists used by addUnariesAndStopProbs(int, int).


currItemsAdded

protected List currItemsAdded
One of a pair of lists used by addUnariesAndStopProbs(int, int).


topProbItemsToAdd

protected List topProbItemsToAdd
A temporary storage area used by addTopUnaries(int) for storing items to be added to the chart when iterating over a cell in the chart.


unaryItemsToAdd

protected List unaryItemsToAdd
A temporary storage area used by addUnaries(danbikel.parser.CKYItem, java.util.List) for storing items to be added to the chart when iterating over a cell in the chart.


stopProbItemsToAdd

protected List stopProbItemsToAdd
A temporary storage area used by addStopProbs(danbikel.parser.CKYItem, java.util.List) for storing items to be added to the chart when iterating over a cell in the chart.


lookupPriorEvent

protected PriorEvent lookupPriorEvent
A reusable PriorEvent object for look-ups in tables.


lookupHeadEvent

protected HeadEvent lookupHeadEvent
A reusable HeadEvent object for look-ups in tables.


lookupModEvent

protected ModifierEvent lookupModEvent
A reusable ModifierEvent object for look-ups in tables.


lookupLeftStopEvent

protected ModifierEvent lookupLeftStopEvent
A reusable ModifierEvent object for look-ups in tables.


lookupRightStopEvent

protected ModifierEvent lookupRightStopEvent
A reusable ModifierEvent object for look-ups in tables.


lookupWord

protected Word lookupWord
A lookup Word object, for obtaining a canonical version.


canonicalWords

protected Map canonicalWords
A reflexive map of Word objects, for getting a canonical version.
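
A reflexive map stores each object as both key and value, so that any equal object can be exchanged for the single stored instance. The generic sketch below shows the idea; CanonicalizerSketch and getCanonical are hypothetical names, and the real getCanonicalWord(Word) may, for instance, copy the lookup object before storing it.

```java
import java.util.HashMap;
import java.util.Map;

final class CanonicalizerSketch<T> {
  // Each stored object maps to itself.
  private final Map<T, T> canonical = new HashMap<>();

  /**
   * Returns the stored instance equal to lookup, storing lookup itself
   * as the canonical instance if no equal object has been seen before.
   */
  T getCanonical(T lookup) {
    T existing = canonical.get(lookup);
    if (existing != null) {
      return existing;
    }
    canonical.put(lookup, lookup);
    return lookup;
  }
}
```

After canonicalization, equal objects share one instance, which saves memory and allows identity comparison in place of equals.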


wordSet

protected Set wordSet
A reusable set for storing Word objects, used when seeding the chart in initialize(danbikel.lisp.SexpList).


tmpChildrenList

protected SLNode tmpChildrenList
A reusable list node for use by getPrevMods(danbikel.parser.CKYItem, danbikel.util.SLNode) and getPrevModWords(danbikel.parser.CKYItem, danbikel.util.SLNode, boolean).


canonicalPrevModLists

protected Map canonicalPrevModLists
A reflexive map in which to store canonical versions of SexpList objects that represent unlexicalized previous modifier lists.


prevModLookupList

protected SexpList prevModLookupList
A reusable object for constructing previous modifier lists for chart items.


prevModWordLeftLookupList

protected WordList prevModWordLeftLookupList
A reusable object for constructing previous left-modifier word lists for chart items.


prevModWordRightLookupList

protected WordList prevModWordRightLookupList
A reusable object for constructing previous right-modifier word lists for chart items.


lookupSubcat

protected Subcat lookupSubcat
A (currently unused) reusable lookup object.


parentHeadSideLookupList

protected SexpList parentHeadSideLookupList
A reusable object used for constructing parent-head-side triples when employing the simpler of two methods for determining whether a particular modifier is possible in the context of a particular parent-head-side combination.

See Also:
Settings.useSimpleModNonterminalMap, DecoderServerRemote.simpleModNonterminalMap()

partiallyLexedModLookupList

protected SexpList partiallyLexedModLookupList
A reusable object used for constructing a partially-lexicalized modifier nonterminal when employing the simpler of two methods for determining whether a particular modifier is possible in the context of a particular parent-head-side combination.

See Also:
Settings.useSimpleModNonterminalMap, DecoderServerRemote.simpleModNonterminalMap()

useCommaConstraint

protected boolean useCommaConstraint
The boolean value of Settings.decoderUseCommaConstraint.


commaForPruning

protected boolean[] commaForPruning
A reusable array for storing which words are considered commas for the comma-pruning constraint. If a word at index i is such a comma, then commaForPruning[i] will be true after setCommaConstraintData() has been invoked.

See Also:
Settings.decoderUseCommaConstraint, setCommaConstraintData()
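
The way such an array might be filled is sketched below. Treating a word as a comma exactly when it is the string "," is an assumption of this example, as are the names CommaDataSketch and markCommas; the real setCommaConstraintData() presumably relies on language-specific information and reuses its array instead of allocating a new one.

```java
import java.util.List;

final class CommaDataSketch {
  /**
   * Returns an array whose element i is true exactly when the word at
   * index i is considered a comma (here: the literal string ",").
   */
  static boolean[] markCommas(List<String> words) {
    boolean[] commaForPruning = new boolean[words.size()];
    for (int i = 0; i < words.size(); i++) {
      commaForPruning[i] = ",".equals(words.get(i));
    }
    return commaForPruning;
  }
}
```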

conjForPruning

protected boolean[] conjForPruning
A reusable array for storing which words are considered conjunctions for the conjunction-pruning constraint. If a word at index i is such a conjunction, then conjForPruning[i] will be true after setCommaConstraintData() has been invoked.

See Also:
Settings.decoderUseCommaConstraint, setCommaConstraintData()

keepAllWords

protected boolean keepAllWords
Cached value of Settings.keepAllWords, for efficiency and convenience.


constraints

protected ConstraintSet constraints
Caches the ConstraintSet, if any, for the current sentence.


findAtLeastOneSatisfyingConstraint

protected boolean findAtLeastOneSatisfyingConstraint
Caches the value of ConstraintSet.findAtLeastOneSatisfying(), if there are constraints for the current sentence; otherwise, this data member will be set to false.

See Also:
constraints

isomorphicTreeConstraints

protected boolean isomorphicTreeConstraints
Caches whether or not the ConstraintSet for the current sentence requires a tree that is isomorphic to the tree of constraints. Specifically, this data member will be set to true if the ConstraintSet.findAtLeastOneSatisfying() and ConstraintSet.hasTreeStructure() methods of the current sentence's constraint set both return true. If there is no constraint set for the current sentence, this data member is set to false.

See Also:
constraints

dontPostProcess

protected boolean dontPostProcess
Indicates whether to perform post-processing on a tree after parsing, that is, whether to invoke Training.postProcess(Sexp) on the tree.

See Also:
Settings.decoderDontPostProcess, Settings.decoderOutputInsideProbs

Constructor Detail

Decoder

public Decoder(int id,
               DecoderServerRemote server)
Constructs a new decoder that will use the specified DecoderServerRemote to get all information and probabilities required for decoding (parsing).

Parameters:
id - the id of this parsing client
server - the DecoderServerRemote implementor (either local or remote) that provides this decoder object with information and probabilities required for decoding (parsing)

Method Detail

wrapCachingServer

protected void wrapCachingServer()
Wraps the normal DecoderServerRemote instance in a caching version.

See Also:
Settings.decoderUseLocalProbabilityCache, CachingDecoderServer

convertHeadToParentMap

protected void convertHeadToParentMap()
Converts the values of the read-only headToParentMap from Set objects to arrays of Symbol, that is, arrays of type Symbol[]. This is an optimization so that there is no need to create a new iterator object for each traversal of the set.
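
The set-to-array conversion described here (and used again for the subcat maps) can be sketched as follows. This is a minimal illustration, not the decoder's code: the class SetToArrayMapSketch, the method convertValuesToArrays and the use of String in place of Symbol are all assumptions made for the example.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration; not part of danbikel.parser. */
class SetToArrayMapSketch {
  /**
   * Replaces every Set value in the map with an array of its elements, in place.
   * Only values are changed, so iterating over the entry set stays safe.
   * Assumes every set holds String elements (standing in for Symbol).
   */
  static void convertValuesToArrays(Map<String, Object> map) {
    for (Map.Entry<String, Object> entry : map.entrySet()) {
      Object value = entry.getValue();
      if (value instanceof Set) {
        Set<?> set = (Set<?>) value;
        entry.setValue(set.toArray(new String[0]));
      }
    }
  }
}
```

After the conversion, callers can traverse each value with an indexed for loop, which is the point of the optimization: no Iterator is allocated per traversal.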


convertSubcatMaps

protected void convertSubcatMaps()
This helper method, used by the constructor, converts the values of the subcat maps from Set objects (containing Subcat objects) to Subcat arrays, that is, objects of type Subcat[]. This allows possible subcats for given contexts to be iterated over without the need to create Iterator objects during decoding.


convertSubcatMap

protected void convertSubcatMap(Map subcatMap)
Helper method used by convertSubcatMaps().

Parameters:
subcatMap - the subcat map whose values are to be converted

isPuncRaiseWord

protected boolean isPuncRaiseWord(Sexp word)
Returns whether the specified word was raised as part of the punctuation-raising procedure performed during training.

Parameters:
word - the word to be tested
Returns:
whether the specified word was raised as part of the punctuation-raising procedure performed during training.
See Also:
Training.raisePunctuation(Sexp), prunedPunctuationPosMap

removeWord

protected void removeWord(SexpList sentence,
                          SexpList tags,
                          int i)
A helper method used by preProcess(danbikel.lisp.SexpList, danbikel.lisp.SexpList) that removes the word at the specified index from the specified sentence and from the originalWords list, and also from the specified tags list, if it is not null.

Parameters:
sentence - the sentence from which to remove a word
tags - the list of tag lists that is coordinated with the specified sentence from which an item is to be removed
i - the index of the word to be removed

preProcess

protected void preProcess(SexpList sentence,
                          SexpList tags)
                   throws RemoteException
Performs all preprocessing on the specified coordinated lists of words and part-of-speech tags of the sentence that is about to be parsed.

Parameters:
sentence - a list of words in a sentence to be parsed
tags - a list of part-of-speech tags in a sentence to be parsed, coordinated with the specified list of words
Throws:
RemoteException

postProcess

protected void postProcess(Sexp tree)
Performs post-processing on a sentence that has been parsed.

Parameters:
tree - the parse tree of a sentence that has been parsed.
See Also:
Settings.restorePrunedWords, Training.postProcess(Sexp)

restoreOriginalWords

protected int restoreOriginalWords(Sexp tree,
                                   int wordIdx)
Restores the original words in the current sentence.

Parameters:
tree - the sentence for which to restore the original words, cached during execution of preProcess(danbikel.lisp.SexpList, danbikel.lisp.SexpList)
wordIdx - a threaded word index
Returns:
the current value of the monotonically-increasing word index, after replacing all words in the current subtree
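
The idea of a word index "threaded" through a recursive tree walk can be sketched as below. This is only an illustration under stated assumptions: trees are modeled as nested java.util.List objects instead of Sexp, a preterminal is assumed to be a two-element list of (tag, word), and the names RestoreWordsSketch and restore are hypothetical.

```java
import java.util.List;

/** Hypothetical illustration; not part of danbikel.parser. */
class RestoreWordsSketch {
  /**
   * Walks the tree left to right, replacing each word with the original word
   * at the current index, and returns the index of the next word to restore.
   * Element 0 of every list is assumed to be the node label.
   */
  @SuppressWarnings("unchecked")
  static int restore(List<Object> tree, int wordIdx, List<String> originalWords) {
    // Preterminal: (tag word). Replace the word and advance the index.
    if (tree.size() == 2 && tree.get(1) instanceof String) {
      tree.set(1, originalWords.get(wordIdx));
      return wordIdx + 1;
    }
    // Nonterminal: recurse into each child, passing the updated index along.
    for (int i = 1; i < tree.size(); i++) {
      wordIdx = restore((List<Object>) tree.get(i), wordIdx, originalWords);
    }
    return wordIdx;
  }
}
```

Returning the updated index from each call is what lets sibling subtrees pick up where the previous subtree left off without any shared mutable counter.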

restorePrunedWords

protected void restorePrunedWords(Sexp tree)
Restores pruned words to a parsed sentence.

Parameters:
tree - the parse tree of a sentence that has been parsed
See Also:
postProcess(Sexp), Settings.restorePrunedWords

restorePrunedWordsRecursive

protected int restorePrunedWordsRecursive(Sexp tree,
                                          int wordIdx)
The recursive helper method for restorePrunedWords(Sexp). This method restores all words except those pruned from the very end of the original sentence.

Parameters:
tree - the tree whose pruned words are to be restored
wordIdx - the current word idx (threaded through this recursive function)
Returns:
the word index of the last word in the specified tree whose pruned words were restored

setCommaConstraintData

protected void setCommaConstraintData()
Caches the locations of commas to be used for the comma constraint in the boolean array commaForPruning. Also, sets up an array (initialized to be entirely false) of booleans to cache the locations of conjunctions, determined within initialize(SexpList,SexpList) (hence, the initialization of the conjForPruning array is not complete until after initialize(SexpList,SexpList) has finished executing).
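
A rough sketch of caching comma positions in a boolean array follows. It assumes a comma is simply the literal token ","; the real decoder presumably applies a language-specific test, and it reuses its array rather than allocating one per sentence. CommaCacheSketch and markCommas are hypothetical names.

```java
import java.util.List;

/** Hypothetical illustration; not part of danbikel.parser. */
class CommaCacheSketch {
  /** Returns a flag array in which element i is true when word i is a comma. */
  static boolean[] markCommas(List<String> words) {
    boolean[] flags = new boolean[words.size()];
    for (int i = 0; i < flags.length; i++) {
      // Assumption: a comma is the literal token ",".
      flags[i] = ",".equals(words.get(i));
    }
    return flags;
  }
}
```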


getExampleWordForTag

protected Symbol getExampleWordForTag(Symbol tag)
Returns a known word that was observed with the specified part of speech tag.

Parameters:
tag - a part of speech tag for which an example word is to be found
Returns:
a word that was observed with the specified part of speech tag.

getTagSet

protected SexpList getTagSet(SexpList tags,
                             int wordIdx,
                             Symbol word,
                             boolean wordIsUnknown,
                             Symbol origWord,
                             HashSet tmpSet)
Gets the set of possible part-of-speech tags for a word in the sentence to be parsed. The set returned is a list of symbols.

Parameters:
tags - the list of supplied part-of-speech tags with the current sentence, or null if no tags were supplied
wordIdx - the index of the word whose possible tags are to be gotten
word - the word at the specified index whose possible tags are to be gotten
wordIsUnknown - whether the specified word is unknown, as far as the DecoderServerRemote is concerned
origWord - the original word before any mapping to a word-feature vector
tmpSet - a temporary set used during the invocation of this method
Returns:
the set of possible part-of-speech tags for a word in the sentence to be parsed, as a list of symbols

seedChart

protected void seedChart(Symbol word,
                         int wordIdx,
                         Symbol features,
                         boolean neverObserved,
                         SexpList tagSet,
                         boolean wordIsUnknown,
                         Symbol origWord,
                         ConstraintSet constraints)
                  throws RemoteException
Adds a chart item for every possible part of speech for the specified word at the specified index in the current sentence.

Parameters:
word - the current word
wordIdx - the index of the current word in the current sentence
features - the word-feature vector for the current word
neverObserved - indicates whether the current word was never observed during training (a truly unknown word)
tagSet - a list containing all possible part of speech tags for the current word
wordIsUnknown - whether the specified word is unknown, as far as the DecoderServerRemote is concerned
origWord - the original word before any mapping to a word-feature vector
constraints - the constraint set for this sentence
Throws:
RemoteException - if any calls to the underlying DecoderServerRemote object throw a RemoteException
See Also:
Chart.add(int,int,Item)

initialize

protected void initialize(SexpList sentence)
                   throws RemoteException
Initializes the chart for parsing the specified sentence. Specifically, this method will add a chart item for each possible part of speech for each word.

Parameters:
sentence - the sentence to parse, which must be a list containing only symbols as its elements
Throws:
RemoteException

initialize

protected void initialize(SexpList sentence,
                          SexpList tags)
                   throws RemoteException
Initializes the chart for parsing the specified sentence, using the specified coordinated list of part-of-speech tags when assigning parts of speech to unknown words.

Parameters:
sentence - the sentence to parse, which must be a list containing only symbols as its elements
tags - a list that is the same length as sentence that will be used when seeding the chart with the parts of speech for unknown words; each element i of tags should itself be a SexpList containing all possible parts of speech for the ith word in sentence; if the value of this argument is null, then for each unknown word (or feature vector), all possible parts of speech observed in the training data for that unknown word will be used
Throws:
RemoteException

getCanonicalWord

protected Word getCanonicalWord(Word lookup)
Gets the canonical Word object for the specified object.

Parameters:
lookup - the Word object to be canonicalized
Returns:
the canonical Word object for the specified object.
See Also:
canonicalWords
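
The "reflexive map" idiom behind canonicalization, in which each object is stored as both key and value, can be sketched as follows. The generic class CanonicalizerSketch is hypothetical; the actual method works on Word objects and may, for instance, copy a reusable lookup object before storing it, which this sketch does not do.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration; not part of danbikel.parser. */
class CanonicalizerSketch<T> {
  // Reflexive map: every stored object maps to itself.
  private final Map<T, T> canonical = new HashMap<>();

  /** Returns the first object seen that equals the lookup object. */
  T canonicalize(T lookup) {
    T existing = canonical.get(lookup);
    if (existing != null) {
      return existing;
    }
    canonical.put(lookup, lookup);
    return lookup;
  }
}
```

Because equal objects collapse to a single instance, later comparisons between canonical objects can use reference equality, and duplicate instances become eligible for garbage collection.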

setUnion

protected SexpList setUnion(SexpList l1,
                            SexpList l2,
                            Set tmpSet)
Returns a new list that is the union of the two specified lists.

Parameters:
l1 - the first list whose elements are to be in the union
l2 - the second list whose elements are to be in the union
tmpSet - a temporary set to be used during the invocation of this method
Returns:
a new list that is the union of the two specified lists.
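
One way to compute such a union with a caller-supplied temporary set is sketched below. The names ListUnionSketch and union are hypothetical, java.util.List stands in for SexpList, and preserving first-occurrence order is an assumption of this sketch rather than something the documentation states.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration; not part of danbikel.parser. */
class ListUnionSketch {
  /**
   * Returns a new list holding each distinct element of l1 and l2 once,
   * in order of first occurrence. The temporary set is cleared first so
   * that the caller can reuse one set across many invocations.
   */
  static <T> List<T> union(List<T> l1, List<T> l2, Set<T> tmpSet) {
    tmpSet.clear();
    List<T> result = new ArrayList<>();
    for (T x : l1) {
      if (tmpSet.add(x)) { // add returns false for duplicates
        result.add(x);
      }
    }
    for (T x : l2) {
      if (tmpSet.add(x)) {
        result.add(x);
      }
    }
    return result;
  }
}
```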

parse

protected Sexp parse(SexpList sentence)
              throws RemoteException
Parses the specified sentence.

Parameters:
sentence - a list of symbols representing words of a sentence to be parsed
Returns:
a parse tree for the specified sentence, or null if no parse could be found or if a Decoder.TimeoutException is thrown
Throws:
RemoteException - if the internal DecoderServerRemote instance throws an exception, or some other exception is thrown

parse

protected Sexp parse(SexpList sentence,
                     SexpList tags)
              throws RemoteException
Parses the specified sentence using the supplied list of part-of-speech tags.

Parameters:
sentence - a list of symbols representing the words of a sentence to be parsed
tags - a list of part-of-speech tags (symbols) coordinated with the specified list of words
Returns:
a parse tree for the specified sentence, or null if no parse could be found or if a Decoder.TimeoutException is thrown
Throws:
RemoteException - if the internal DecoderServerRemote instance throws an exception, or some other exception is thrown

parse

protected Sexp parse(SexpList sentence,
                     SexpList tags,
                     ConstraintSet constraints)
              throws RemoteException
Parses the specified sentence using the supplied list of part-of-speech tags and the supplied set of parsing constraints.

Parameters:
sentence - a list of symbols representing the words of a sentence to be parsed
tags - a list of part-of-speech tags (symbols) coordinated with the specified list of words
constraints - a set of parsing constraints for the specified sentence
Returns:
a parse tree for the specified sentence, or null if no parse could be found or if a Decoder.TimeoutException is thrown
Throws:
RemoteException - if the internal DecoderServerRemote instance throws an exception, or some other exception is thrown

addTopUnaries

protected void addTopUnaries(int end)
                      throws RemoteException
Adds hidden root nonterminal probabilities. That is, for each derivation spanning the entire sentence from index 0 to the specified end index, this method produces new chart items in which the probability of producing that derivation given Training.topSym() has been multiplied into the existing item's score.

Parameters:
end - the index of the last word of the sentence being parsed
Throws:
RemoteException

complete

protected void complete(int start,
                        int end)
                 throws RemoteException,
                        Decoder.TimeoutException
Constructs all possible items spanning the specified indices and adds them to the chart. This involves joining modificands (items to be modified) with modifiers when the modificand has not yet received its stop probabilities and when the spans of both modificand and modifier cover the specified span.

Parameters:
start - the index of the first word in the span for which all chart items are to be created and added to the chart
end - the index of the last word in the span for which all chart items are to be created and added to the chart
Throws:
RemoteException
Decoder.TimeoutException - if the value of Settings.maxParseTime is greater than zero and that maximum parse time has been reached while parsing
See Also:
joinItems(CKYItem,CKYItem,boolean)
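
In generic CKY terms, covering a span means trying every split point and pairing an item from the left sub-span with an item from the right sub-span. The sketch below only enumerates those sub-span pairs for inclusive word indices; it is a textbook illustration, not the decoder's loop, and the names SpanSplitSketch and splits are hypothetical. In the decoder, either half could hold the modificand, with the other half holding the modifier.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; not part of danbikel.parser. */
class SpanSplitSketch {
  /**
   * Returns one array {leftStart, leftEnd, rightStart, rightEnd} for each way
   * of dividing the inclusive span [start, end] into two adjacent sub-spans.
   */
  static List<int[]> splits(int start, int end) {
    List<int[]> result = new ArrayList<>();
    for (int split = start; split < end; split++) {
      result.add(new int[] {start, split, split + 1, end});
    }
    return result;
  }
}
```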

derivationOrderOK

protected boolean derivationOrderOK(CKYItem modificand,
                                    boolean modifySide)
Enforces that modificand receives all its right modifiers before receiving any left modifiers, by ensuring that right-modification only happens when a modificand has no left-children (this is both necessary and sufficient to enforce derivation order). Also, in the case of left-modification, this method checks to make sure that the right subcat is empty (necessary but not sufficient to enforce derivation order). This method is called by complete(int,int).
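
The two checks described above can be written as a small predicate. This is a sketch under assumptions: the Side enum and the two boolean inputs are hypothetical stand-ins for the state of a CKYItem and for the boolean modifySide parameter, whose encoding is not given here.

```java
/** Hypothetical illustration; not part of danbikel.parser. */
class DerivationOrderSketch {
  enum Side { LEFT, RIGHT }

  /**
   * Right-modification is allowed only while the modificand has no left
   * children; left-modification is allowed only once the right subcat is empty.
   */
  static boolean derivationOrderOK(boolean hasLeftChildren,
                                   boolean rightSubcatEmpty,
                                   Side modifySide) {
    if (modifySide == Side.RIGHT) {
      return !hasLeftChildren;
    }
    return rightSubcatEmpty;
  }
}
```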


joinItems

protected void joinItems(CKYItem modificand,
                         CKYItem modifier,
                         boolean side)
                  throws RemoteException
Joins two chart items, one representing the modificand that has not yet received its stop probabilities, the other representing the modifier that has received its stop probabilities.

Parameters:
modificand - the chart item representing a partially-completed subtree, to be modified on side by modifier
modifier - the chart item representing a completed subtree that will be added as a modifier on side of modificand's subtree
side - the side on which to attempt to add the specified modifier to the specified modificand
Throws:
RemoteException

addUnariesAndStopProbs

protected void addUnariesAndStopProbs(int start,
                                      int end)
                               throws RemoteException
Finds all possible parent-head (or unary) productions using the root node of each existing chart item within the specified span as the head, creates new items based on these existing items, multiplying in the parent-head probability; then, using these new items, this method also creates additional new items in which stop probabilities have been multiplied; all new items are added to the chart. Stop probabilities are the probabilities associated with generating Training.stopSym() as a modifier on either side of a production.

Parameters:
start - the index of the first word in the span
end - the index of the last word in the span
Throws:
RemoteException
See Also:
addUnaries(CKYItem, java.util.List), addStopProbs(CKYItem, java.util.List)

addUnaries

protected List addUnaries(CKYItem item,
                          List itemsAdded)
                   throws RemoteException
Finds all possible parent-head (or unary) productions using the root node of the specified chart item as the head, creates new items based on the specified item, multiplying in the parent-head probability. All new items are added to the chart; those that are successfully added are also stored in the specified itemsAdded list.

Parameters:
item - the item for which unary productions are to be added
itemsAdded - an empty list in which all new chart items will be stored
Returns:
the specified itemsAdded list having been modified
Throws:
RemoteException

getPossibleSubcats

protected final Subcat[] getPossibleSubcats(Map subcatMap,
                                            HeadEvent headEvent,
                                            ProbabilityStructure subcatPS,
                                            int lastLevel)
Gets all possible Subcats for the context contained in the specified HeadEvent.

Parameters:
subcatMap - the map of contexts to sets of possible Subcat objects (each set is an array of Subcat)
headEvent - the head event for whose context possible subcats are to be gotten
subcatPS - the probability structure for generating subcats
lastLevel - the last level of back-off for the specified subcat probability structure
Returns:
all possible Subcats for the context contained in the specified HeadEvent

addStopProbs

protected List addStopProbs(CKYItem item,
                            List itemsAdded)
                     throws RemoteException
Adds stop probabilities to the specified item and adds these items to the chart; as a side effect, all items successfully added to the chart are also stored in the specified itemsAdded list. Stop probabilities are the probabilities associated with generating Training.stopSym() as a modifier on either side of a production.

Parameters:
item - the item for which stop probabilities are to be added, creating a new “stopped” item
itemsAdded - a list into which chart items added by this method are to be stored
Returns:
the specified itemsAdded list, modified by this method
Throws:
RemoteException

getPrevMods

protected SexpList getPrevMods(CKYItem item,
                               SLNode modChildren)
Creates a new previous-modifier list given the specified current list and the last modifier on a particular side.

Parameters:
item - the item for which a previous-modifier list is to be constructed
modChildren - the last node of modifying children on a particular side of the head of a chart item
Returns:
the list whose first element is the label of the specified modifying child and whose subsequent elements are those of the specified itemPrevMods list, without its final element (which is "bumped off" the edge, since the previous-modifier list has a constant length)
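
The fixed-length "shift in, bump off" behavior described in the return value can be sketched as follows. java.util.List and String stand in for SexpList and Symbol, the names PrevModsSketch and shiftIn are hypothetical, and "START" is a placeholder label, not necessarily the symbol the parser uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; not part of danbikel.parser. */
class PrevModsSketch {
  /**
   * Returns a new list of the same length as prevMods whose first element is
   * the newest modifier label, followed by all but the last previous element.
   */
  static List<String> shiftIn(List<String> prevMods, String newest) {
    List<String> result = new ArrayList<>(prevMods.size());
    if (prevMods.isEmpty()) {
      return result; // a zero-length history stays empty
    }
    result.add(newest);
    result.addAll(prevMods.subList(0, prevMods.size() - 1));
    return result;
  }
}
```

For a history of length two, shifting "PP" into (NP, START) gives (PP, NP): the oldest entry falls off the end and the length is unchanged.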

getPrevModWords

protected WordList getPrevModWords(CKYItem item,
                                   SLNode modChildren,
                                   boolean side)
Creates a new previous-modifier word list given the specified current list and the last modifier on a particular side.

Parameters:
item - the item for which a previous-modifier list is to be constructed
modChildren - the last node of modifying children on a particular side of the head of a chart item
side - the side of the specified item's head child on which the specified modifier children occur
Returns:
the list whose first element is the head word of the specified modifying child and whose subsequent elements are those of the specified itemPrevMods list, without its final element (which is "bumped off" the edge, since the previous-modifier list has a constant length)

commaConstraintViolation

protected final boolean commaConstraintViolation(int start,
                                                 int split,
                                                 int end)
There is a comma constraint violation if the word at the split point is a comma, there exists a word following end, that word is not a comma, and the word at end is not a conjunction. The check for a conjunction is to allow chart items representing partial derivations of the form
P → α β γ
where β is a comma and γ is a conjunction.

In the English Penn Treebank, the concrete form of this partial derivation would be

P → α , CC

This addition to Mike Collins’ definition of the comma constraint was necessary because, unlike in Collins' parser, commas and conjunctions are generated in two separate steps.
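
Read as a predicate over the cached flag arrays, the constraint might look like the sketch below. It treats the conjunction check as an exemption, so a span ending in a conjunction never counts as a violation; that reading, the array-based signature and the name CommaConstraintSketch are assumptions of this example. The start index is not needed for the check and is omitted.

```java
/** Hypothetical illustration; not part of danbikel.parser. */
class CommaConstraintSketch {
  /**
   * Returns true when the word at split is a comma, a word follows end,
   * that following word is not a comma, and the word at end is not a
   * conjunction. Both arrays are indexed by word position in the sentence.
   */
  static boolean violation(boolean[] commaForPruning,
                           boolean[] conjForPruning,
                           int split,
                           int end) {
    int sentLen = commaForPruning.length;
    return commaForPruning[split]
        && end < sentLen - 1
        && !commaForPruning[end + 1]
        && !conjForPruning[end];
  }
}
```

For the five words "a , b and c", a span split at the comma and ending at "b" is a violation, while the same split ending at "and" is exempt, and one ending at the last word is not a violation because no word follows it.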


update

public void update(Map<String,String> changedSettings)
Description copied from interface: Settings.Change
Invoked by this class to notify the requesting class that one or more settings have changed.

Specified by:
update in interface Settings.Change
Parameters:
changedSettings - the keys of this map are the settings that have changed since the last time this method was invoked, and the values are the old values for those changed settings
See Also:
Settings.register(Class,Settings.Change,Set), Settings.register(Settings.Change)

Author: Dan Bikel.