Parsing Engine

danbikel.parser.english
Class Training

java.lang.Object
  extended by danbikel.parser.lang.AbstractTraining
      extended by danbikel.parser.english.Training
All Implemented Interfaces:
Training, Serializable
Direct Known Subclasses:
NPArgThreadTraining

public class Training
extends AbstractTraining

Provides methods for language-specific processing of training parse trees. Even though this subclass of Training is in the default English language package, its primary purpose is simply to fill in the AbstractTraining.argContexts, AbstractTraining.semTagArgStopSet and AbstractTraining.nodesToPrune data members using a metadata resource. If this capability is desired in another language package, this class may be subclassed.

This class also re-defines the method Training.addBaseNPs(Sexp), with an important change that is possibly only relevant to the Penn Treebank.

See Also:
Serialized Form

Field Summary
 
Fields inherited from class danbikel.parser.lang.AbstractTraining
addGapInfo, argAugmentations, argContexts, argContextsSym, argNonterminals, baseNP, canonicalAugDelimSym, defaultArgAugmentation, delimAndGapStr, delimAndGapStrLen, gapAugmentation, headFinder, headPostSym, headPreSym, headSym, metadataPropertyPrefix, nodesToPrune, nodesToPruneSym, NP, prunedPreterms, prunedPunctuation, relabelHeadChildrenAsArgs, repairBaseNPs, semTagArgStopListSym, semTagArgStopSet, traceTag, treebank, wordsToPrune
 
Constructor Summary
Training()
          The default constructor, to be invoked by Language.
 
Method Summary
 Sexp fixSubjectlessSentences(Sexp tree)
          De-transforms sentence labels changed by relabelSubjectlessSentences(Sexp) when the subjectless sentence node has children prior to its head child that are arguments.
protected  boolean isTypeOfSentence(Symbol label)
          A helper method used by AbstractTraining.repairBaseNPs(Sexp,int,Sexp).
static void main(String[] args)
          Test driver for this class.
 void postProcess(Sexp tree)
          Post-processes a parse tree after decoding, essentially undoing the steps performed in preprocessing.
 Sexp preProcess(Sexp tree)
          The method to call before counting events in a training parse tree.
 Sexp relabelSubjectlessSentences(Sexp tree)
          We override Training.relabelSubjectlessSentences(Sexp) so that we can make the definition of a subjectless sentence slightly more restrictive: a subjectless sentence not only must have a null-element child that is marked with the subject augmentation, but also its head must be a VP (this is Mike Collins' definition of a subjectless sentence).
 boolean removeWord(Symbol word, Symbol tag, int idx, SexpList sentence, SexpList tags, SexpList originalTags, Set prunedPretermsPosSet, Map prunedPretermsPosMap)
          Invoked by the decoder as the first step in preprocessing (prior to the invocation of Training.preProcessTest(danbikel.lisp.SexpList, danbikel.lisp.SexpList, danbikel.lisp.SexpList)).
protected  Sexp unrepairBaseNPs(Sexp tree)
          Attempts to un-do the transformation performed by AbstractTraining.repairBaseNPs(Sexp), in which sentential nodes that occur to the right of the head child of a base NP are moved to become immediate right siblings of the base NP; accordingly, this method moves all such sentential nodes that occur immediately to the right of a base NP to be the rightmost child under that base NP.
 
Methods inherited from class danbikel.parser.lang.AbstractTraining
addArgAugmentation, addBaseNPs, addGapInformation, argNonterminals, canonicalizeNonterminals, collectPreterms, createArgAugmentationsList, createArgNonterminalsSet, defaultArgAugmentation, gapAugmentation, getCanonicalArg, getCanonicalArg, getPrunedPreterms, getPrunedPunctuation, hasGap, hasGap, hasPossessiveChild, headPostSym, headPreSym, headSym, identifyArguments, isAllNodesToPrune, isArgument, isArgument, isArgument, isArgumentFast, isCoordinatedPhrase, isValidTree, needToAddNormalNPLevel, preProcessTest, printMetadata, prune, raisePunctuation, readMetadata, readMetadataHook, relabelArgChildren, removeArgAugmentation, removeArgAugmentation, removeGapAugmentation, removeNullElements, removeOnlyChildBaseNPs, repairBaseNPs, repairBaseNPs, setUpFastArgMap, skip, startSym, startWord, staticSetUpFastArgMap, stopSym, stopWord, stripAugmentations, stripAugmentations, stripAugmentations, threadNPArgAugmentations, topSym, topWord, traceTag, transformSubjectNTs, unaryProductionsToNull
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Training

public Training()
         throws FileNotFoundException,
                IOException
The default constructor, to be invoked by Language. This constructor looks for a resource named by the property metadataPropertyPrefix + language where metadataPropertyPrefix is the value of the constant AbstractTraining.metadataPropertyPrefix and language is the value of Settings.get(Settings.language). For example, the property for English is "parser.training.metadata.english".

Throws:
FileNotFoundException
IOException
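
The property lookup described above can be sketched as follows. This is a minimal, self-contained illustration, not the parser's own code: the class MetadataPropertyDemo, the prefix value (inferred from the English example "parser.training.metadata.english"), the "parser.language" key and the resource path are all assumptions made for the example.

```java
import java.util.Properties;

// Illustrative sketch only; none of these names come from the parser itself.
class MetadataPropertyDemo {
  // Assumed value of the prefix, inferred from the English example in the text.
  static final String METADATA_PROPERTY_PREFIX = "parser.training.metadata.";

  /** Builds the property name for a language as prefix + language. */
  static String propertyName(String language) {
    return METADATA_PROPERTY_PREFIX + language;
  }

  public static void main(String[] args) {
    Properties settings = new Properties();
    // Hypothetical settings: the language key and the resource path are made up.
    settings.setProperty("parser.language", "english");
    settings.setProperty(propertyName("english"), "data/training-metadata.lisp");

    String language = settings.getProperty("parser.language");
    String name = propertyName(language);
    System.out.println(name + " = " + settings.getProperty(name));
  }
}
```

Another language package that subclasses this class would, under the same assumption, only need a property whose name ends in its own language name.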
Method Detail

preProcess

public Sexp preProcess(Sexp tree)
Description copied from class: AbstractTraining
The method to call before counting events in a training parse tree. This default implementation executes the following methods of this class in order:
  1. AbstractTraining.prune(Sexp)
  2. AbstractTraining.addBaseNPs(Sexp)
  3. AbstractTraining.repairBaseNPs(Sexp)
  4. AbstractTraining.addGapInformation(Sexp)
  5. AbstractTraining.relabelSubjectlessSentences(Sexp)
  6. AbstractTraining.removeNullElements(Sexp)
  7. AbstractTraining.raisePunctuation(Sexp)
  8. AbstractTraining.identifyArguments(Sexp)
  9. AbstractTraining.stripAugmentations(Sexp)
While every attempt has been made to make the default implementations of these preprocessing methods independent of one another, the order above is not entirely arbitrary.

Specified by:
preProcess in interface Training
Overrides:
preProcess in class AbstractTraining
Parameters:
tree - the parse tree to pre-process
Returns:
tree having been pre-processed
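
The fixed ordering of the nine steps can be pictured as an ordered chain of tree-to-tree functions. The sketch below is only an illustration of that idea: PreProcessOrderDemo is a hypothetical class, a String stands in for Sexp, and every step is a no-op stand-in rather than the real transformation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Illustrative sketch only: step bodies are no-ops, and String replaces Sexp.
class PreProcessOrderDemo {
  /** Applies each step in insertion order and logs the name of every step run. */
  static <T> T runInOrder(Map<String, UnaryOperator<T>> steps, T tree, List<String> log) {
    for (Map.Entry<String, UnaryOperator<T>> step : steps.entrySet()) {
      tree = step.getValue().apply(tree);
      log.add(step.getKey());
    }
    return tree;
  }

  /** The nine step names from the list above, each mapped to a stand-in operator. */
  static Map<String, UnaryOperator<String>> standInSteps() {
    String[] names = {
      "prune", "addBaseNPs", "repairBaseNPs", "addGapInformation",
      "relabelSubjectlessSentences", "removeNullElements", "raisePunctuation",
      "identifyArguments", "stripAugmentations"
    };
    // LinkedHashMap preserves insertion order, which is what fixes the sequence.
    Map<String, UnaryOperator<String>> steps = new LinkedHashMap<>();
    for (String name : names) {
      steps.put(name, UnaryOperator.identity());
    }
    return steps;
  }

  public static void main(String[] args) {
    List<String> log = new ArrayList<>();
    String result = runInOrder(standInSteps(), "(S (NP) (VP))", log);
    System.out.println(log);
    System.out.println(result);
  }
}
```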

removeWord

public boolean removeWord(Symbol word,
                          Symbol tag,
                          int idx,
                          SexpList sentence,
                          SexpList tags,
                          SexpList originalTags,
                          Set prunedPretermsPosSet,
                          Map prunedPretermsPosMap)
Description copied from interface: Training
Invoked by the decoder as the first step in preprocessing (prior to the invocation of Training.preProcessTest(danbikel.lisp.SexpList, danbikel.lisp.SexpList, danbikel.lisp.SexpList)). Returns whether the specified word should be removed from the sentence before parsing.

Specified by:
removeWord in interface Training
Overrides:
removeWord in class AbstractTraining
Parameters:
word - a word in the sentence about to be parsed
tag - the supplied part-of-speech tag of the specified word, or null if tags were not supplied
idx - the index of the specified word in the specified sentence
sentence - a list of Symbol objects that represent the words of the sentence to be parsed
tags - coordinated list of supplied part-of-speech tag lists for each of the words in the specified sentence, or null if no tags were supplied
originalTags - the cached copy of the specified tags list, used when Settings.restorePrunedWords is true
prunedPretermsPosSet - the set of part-of-speech tags that were pruned during training
prunedPretermsPosMap - a map of words pruned during training to their part-of-speech tags when they were pruned
Returns:
whether the specified word should be removed from the sentence before parsing

relabelSubjectlessSentences

public Sexp relabelSubjectlessSentences(Sexp tree)
We override Training.relabelSubjectlessSentences(Sexp) so that we can make the definition of a subjectless sentence slightly more restrictive: a subjectless sentence not only must have a null-element child that is marked with the subject augmentation, but also its head must be a VP (this is Mike Collins' definition of a subjectless sentence).

Specified by:
relabelSubjectlessSentences in interface Training
Overrides:
relabelSubjectlessSentences in class AbstractTraining
Parameters:
tree - the parse tree in which to relabel subjectless sentences
Returns:
the same tree that was passed in, with subjectless sentence nodes relabeled
See Also:
Treebank.isSentence(Symbol), Treebank.subjectAugmentation(), Treebank.isNullElementPreterminal(Sexp), Treebank.subjectlessSentenceLabel()
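
The more restrictive definition can be illustrated with a small stand-alone check. This is a sketch under stated assumptions, not the method's implementation: the Node class is hypothetical, the head-child index is passed in rather than computed by a head finder, and the Penn Treebank conventions "S", "VP", "-SBJ" and "-NONE-" are hard-coded where the real class would consult Treebank.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; labels are hard-coded Penn Treebank conventions.
class SubjectlessSentenceDemo {
  static final class Node {
    final String label;
    final List<Node> children;

    Node(String label, Node... children) {
      this.label = label;
      this.children = Arrays.asList(children);
    }
  }

  /** True for a subject-marked child that dominates only a null element. */
  static boolean isNullSubject(Node child) {
    return child.label.endsWith("-SBJ")
        && child.children.size() == 1
        && child.children.get(0).label.equals("-NONE-");
  }

  /** Both conditions must hold: a null-element subject child and a VP head. */
  static boolean isSubjectless(Node sentence, int headIdx) {
    if (!sentence.label.equals("S")) {
      return false;
    }
    if (headIdx < 0 || headIdx >= sentence.children.size()) {
      return false;
    }
    boolean headIsVP = sentence.children.get(headIdx).label.equals("VP");
    boolean hasNullSubject = false;
    for (Node child : sentence.children) {
      if (isNullSubject(child)) {
        hasNullSubject = true;
      }
    }
    return hasNullSubject && headIsVP;
  }

  public static void main(String[] args) {
    // (S (NP-SBJ (-NONE- *)) (VP (VB go))), with the VP at index 1 as head
    Node s = new Node("S",
        new Node("NP-SBJ", new Node("-NONE-", new Node("*"))),
        new Node("VP", new Node("VB", new Node("go"))));
    System.out.println(isSubjectless(s, 1));
  }
}
```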

isTypeOfSentence

protected boolean isTypeOfSentence(Symbol label)
Description copied from class: AbstractTraining
A helper method used by AbstractTraining.repairBaseNPs(Sexp,int,Sexp). While the default implementation here simply returns the result of calling Treebank.isSentence(Symbol) with the specified label, subclasses may override this method if different semantics are required for identifying sentences that occur as siblings of base NPs.

Overrides:
isTypeOfSentence in class AbstractTraining
Parameters:
label - the nonterminal label to test
Returns:
true if the specified nonterminal represents a sentence, false otherwise

fixSubjectlessSentences

public Sexp fixSubjectlessSentences(Sexp tree)
De-transforms sentence labels changed by relabelSubjectlessSentences(Sexp) when the subjectless sentence node has children prior to its head child that are arguments.
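
A rough sketch of this de-transformation follows. It is not the actual implementation: the Node class and its stored head index are hypothetical, and the labels "SG" (subjectless sentence), "S" and the "-A" argument suffix are assumed conventions used here only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; "SG", "S" and "-A" are assumed label conventions.
class FixSubjectlessSentencesDemo {
  static final class Node {
    String label;
    final int headIdx; // index of the head child; -1 for leaves
    final List<Node> children;

    Node(String label, int headIdx, Node... children) {
      this.label = label;
      this.headIdx = headIdx;
      this.children = Arrays.asList(children);
    }
  }

  /** Restores the plain sentence label when an argument precedes the head child. */
  static Node fix(Node tree) {
    if (tree.label.equals("SG")) {
      for (int i = 0; i < tree.headIdx && i < tree.children.size(); i++) {
        if (tree.children.get(i).label.endsWith("-A")) {
          tree.label = "S";
          break;
        }
      }
    }
    for (Node child : tree.children) {
      fix(child);
    }
    return tree;
  }

  public static void main(String[] args) {
    // (SG (NP-A (PRP she)) (VP (VB go))), head child at index 1
    Node tree = new Node("SG", 1,
        new Node("NP-A", 0, new Node("PRP", 0, new Node("she", -1))),
        new Node("VP", 0, new Node("VB", 0, new Node("go", -1))));
    System.out.println(fix(tree).label);
  }
}
```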


unrepairBaseNPs

protected Sexp unrepairBaseNPs(Sexp tree)
Attempts to un-do the transformation performed by AbstractTraining.repairBaseNPs(Sexp), in which sentential nodes that occur to the right of the head child of a base NP are moved to become immediate right siblings of the base NP; accordingly, this method moves all such sentential nodes that occur immediately to the right of a base NP to be the rightmost child under that base NP.

Parameters:
tree - the tree for which NPs that were transformed by AbstractTraining.repairBaseNPs(Sexp) are to be de-transformed
Returns:
the specified tree, with certain NPs de-transformed
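
The movement described above can be sketched on a toy tree. This is an illustration under assumptions rather than the method itself: the Node class is hypothetical, "NPB" is assumed to be the base NP label, and isSentence is a simplified stand-in that accepts only "S" where the real class would ask Treebank.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; "NPB" and the sentence test are assumptions.
class UnrepairBaseNPsDemo {
  static final class Node {
    final String label;
    final List<Node> children = new ArrayList<>();

    Node(String label, Node... kids) {
      this.label = label;
      children.addAll(Arrays.asList(kids));
    }

    @Override
    public String toString() {
      if (children.isEmpty()) {
        return label;
      }
      StringBuilder sb = new StringBuilder("(").append(label);
      for (Node child : children) {
        sb.append(' ').append(child);
      }
      return sb.append(')').toString();
    }
  }

  /** Simplified stand-in for a treebank-specific sentence test. */
  static boolean isSentence(String label) {
    return label.equals("S");
  }

  /** Moves sentential right siblings of each base NP to be its rightmost children. */
  static Node unrepair(Node tree) {
    for (int i = 0; i < tree.children.size(); i++) {
      Node child = tree.children.get(i);
      if (child.label.equals("NPB")) {
        // Absorb every sentential node immediately to the right of the base NP.
        while (i + 1 < tree.children.size()
            && isSentence(tree.children.get(i + 1).label)) {
          child.children.add(tree.children.remove(i + 1));
        }
      }
      unrepair(child); // recurse after moving, so moved subtrees are visited too
    }
    return tree;
  }

  public static void main(String[] args) {
    Node tree = new Node("NP",
        new Node("NPB",
            new Node("DT", new Node("the")),
            new Node("NN", new Node("plan"))),
        new Node("S",
            new Node("VP",
                new Node("TO", new Node("to")),
                new Node("VB", new Node("go")))));
    System.out.println(unrepair(tree));
    // prints "(NP (NPB (DT the) (NN plan) (S (VP (TO to) (VB go)))))"
  }
}
```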

postProcess

public void postProcess(Sexp tree)
Description copied from interface: Training
Post-processes a parse tree after decoding, essentially undoing the steps performed in preprocessing.

Specified by:
postProcess in interface Training
Overrides:
postProcess in class AbstractTraining
Parameters:
tree - the tree to be post-processed

main

public static void main(String[] args)
Test driver for this class.

Parameters:
args - usage: [-risan] <filename> where
  -r  raise punctuation
  -i  identify arguments
  -s  relabel subjectless sentences
  -a  strip nonterminal augmentations
  -n  add/relabel base NPs

Author: Dan Bikel.