Parsing Engine

danbikel.parser.english
Class BrokenTraining

java.lang.Object
  extended by danbikel.parser.lang.AbstractTraining
      extended by danbikel.parser.english.BrokenTraining
All Implemented Interfaces:
Training, Serializable

public class BrokenTraining
extends AbstractTraining

Provides methods for language-specific processing of training parse trees. This class’ primary purpose is simply to fill in the AbstractTraining.argContexts, AbstractTraining.semTagArgStopSet and AbstractTraining.nodesToPrune data members using a metadata resource. If this capability is desired in another language package, this class may be subclassed.

This class also redefines the method Training.addBaseNPs(Sexp), with an important change that is possibly relevant only to the Penn Treebank.

Important note: This class is similar to Training, except that it is “broken” in the sense that, instead of emulating Collins’ parsing model as closely as possible, it uses only the details found in Collins’ published papers. See Intricacies of Collins’ Parsing Model for details.

See Also:
Serialized Form

Field Summary
 
Fields inherited from class danbikel.parser.lang.AbstractTraining
addGapInfo, argAugmentations, argContexts, argNonterminals, baseNP, canonicalAugDelimSym, defaultArgAugmentation, delimAndGapStr, delimAndGapStrLen, gapAugmentation, headFinder, headPostSym, headPreSym, headSym, metadataPropertyPrefix, nodesToPrune, NP, prunedPreterms, prunedPunctuation, relabelHeadChildrenAsArgs, repairBaseNPs, semTagArgStopSet, traceTag, treebank, wordsToPrune
 
Constructor Summary
BrokenTraining()
          The default constructor, to be invoked by Language.
 
Method Summary
 Sexp fixSubjectlessSentences(Sexp tree)
          This method has been written to do nothing to the specified tree.
 Sexp identifyArguments(Sexp tree)
          Marks certain nodes as arguments by appending a suffix to their respective labels.
protected  boolean isTypeOfSentence(Symbol label)
          Unlike Mike's definition of a sentence for the purpose of relabeling subjectless sentences, which includes any label that starts with 'S', we require here that the label be strictly S, or S with some augmentations.
static void main(String[] args)
          Test driver for this class.
protected  boolean needToAddNormalNPLevel(Sexp grandparent, int parentIdx, Sexp tree)
          This method has been overridden so that the two unpublished conditions under which one needs to add a normal NP level are overlooked.
 void postProcess(Sexp tree)
          Post-processes a parse tree after decoding, essentially undoing the steps performed in preprocessing.
 Sexp preProcess(Sexp tree)
          The method to call before counting events in a training parse tree.
protected  Sexp unrepairBaseNPs(Sexp tree)
          De-transforms NPs that were transformed by the Training.repairBaseNPs(Sexp) method.
 
Methods inherited from class danbikel.parser.lang.AbstractTraining
addArgAugmentation, addBaseNPs, addGapInformation, argNonterminals, canonicalizeNonterminals, collectPreterms, createArgAugmentationsList, createArgNonterminalsSet, defaultArgAugmentation, gapAugmentation, getCanonicalArg, getCanonicalArg, getPrunedPreterms, getPrunedPunctuation, hasGap, hasGap, hasPossessiveChild, headPostSym, headPreSym, headSym, isAllNodesToPrune, isArgument, isArgument, isArgument, isArgumentFast, isCoordinatedPhrase, isValidTree, preProcessTest, printMetadata, prune, raisePunctuation, readMetadata, readMetadataHook, relabelArgChildren, relabelSubjectlessSentences, removeArgAugmentation, removeArgAugmentation, removeGapAugmentation, removeNullElements, removeOnlyChildBaseNPs, removeWord, repairBaseNPs, repairBaseNPs, setUpFastArgMap, skip, startSym, startWord, staticSetUpFastArgMap, stopSym, stopWord, stripAugmentations, stripAugmentations, stripAugmentations, threadNPArgAugmentations, topSym, topWord, traceTag, transformSubjectNTs, unaryProductionsToNull
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BrokenTraining

public BrokenTraining()
               throws FileNotFoundException,
                      IOException
The default constructor, to be invoked by Language. This constructor looks for a resource named by the property metadataPropertyPrefix + language where metadataPropertyPrefix is the value of the constant AbstractTraining.metadataPropertyPrefix and language is the value of Settings.get(Settings.language). For example, the property for English is "parser.training.metadata.english".

Throws:
FileNotFoundException
IOException
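
The property lookup described above can be sketched as follows. This is a minimal illustration, not the class's actual code: the prefix value is inferred from the English example in the text, java.util.Properties stands in for the parser's Settings, and the resource file name is hypothetical.

```java
import java.util.Properties;

public class MetadataPropertySketch {
    // Assumed prefix, inferred from the English example in the text; the real
    // value is the constant AbstractTraining.metadataPropertyPrefix.
    static final String METADATA_PROPERTY_PREFIX = "parser.training.metadata.";

    /** Builds the name of the property that points at the metadata resource. */
    static String metadataPropertyName(String language) {
        return METADATA_PROPERTY_PREFIX + language;
    }

    public static void main(String[] args) {
        // Stand-in for the parser's settings; the file name is hypothetical.
        Properties settings = new Properties();
        settings.setProperty("parser.training.metadata.english", "training-metadata.lisp");

        String name = metadataPropertyName("english");
        System.out.println(name + " -> " + settings.getProperty(name));
    }
}
```

Because the property name is derived from the language setting, another language package can supply its own metadata resource without changing the lookup logic.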
Method Detail

preProcess

public Sexp preProcess(Sexp tree)
Description copied from class: AbstractTraining
The method to call before counting events in a training parse tree. This default implementation executes the following methods of this class in order:
  1. AbstractTraining.prune(Sexp)
  2. AbstractTraining.addBaseNPs(Sexp)
  3. AbstractTraining.repairBaseNPs(Sexp)
  4. AbstractTraining.addGapInformation(Sexp)
  5. AbstractTraining.relabelSubjectlessSentences(Sexp)
  6. AbstractTraining.removeNullElements(Sexp)
  7. AbstractTraining.raisePunctuation(Sexp)
  8. AbstractTraining.identifyArguments(Sexp)
  9. AbstractTraining.stripAugmentations(Sexp)
While every attempt has been made to make the default implementations of these preprocessing methods independent of one another, the order above is not entirely arbitrary. In particular:

Specified by:
preProcess in interface Training
Overrides:
preProcess in class AbstractTraining
Parameters:
tree - the parse tree to pre-process
Returns:
tree having been pre-processed
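
The ordered steps listed above amount to a fixed pipeline applied to one tree. The sketch below shows only that ordering, under loud assumptions: trees are plain strings rather than Sexp objects, and each step is a placeholder that records its name instead of transforming the tree. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class PreProcessOrderSketch {
    // Step names taken from the documented list, in the documented order.
    static final String[] STEP_NAMES = {
        "prune", "addBaseNPs", "repairBaseNPs", "addGapInformation",
        "relabelSubjectlessSentences", "removeNullElements",
        "raisePunctuation", "identifyArguments", "stripAugmentations"
    };

    /** Applies each step to the tree in insertion order and returns the result. */
    static <T> T preProcess(T tree, Map<String, UnaryOperator<T>> steps) {
        for (UnaryOperator<T> step : steps.values()) {
            tree = step.apply(tree);
        }
        return tree;
    }

    /** Builds placeholder steps that only record their own name in the trace. */
    static Map<String, UnaryOperator<String>> tracingSteps(List<String> trace) {
        // LinkedHashMap preserves insertion order, so steps run as listed.
        Map<String, UnaryOperator<String>> steps = new LinkedHashMap<>();
        for (String name : STEP_NAMES) {
            steps.put(name, tree -> { trace.add(name); return tree; });
        }
        return steps;
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        String tree = preProcess("(S (NP (NNP John)) (VP (VBZ sleeps)))", tracingSteps(trace));
        System.out.println(trace);
        System.out.println(tree);
    }
}
```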

identifyArguments

public Sexp identifyArguments(Sexp tree)
Marks certain nodes as arguments by appending a suffix to their respective labels. Collins’ implementation of his parsing model has very specific conditions under which a nonterminal may be identified as an argument; this implementation ignores some of those conditions, making this one of the reasons this class is “broken”.

Specified by:
identifyArguments in interface Training
Overrides:
identifyArguments in class AbstractTraining
Parameters:
tree - the tree in which to identify argument nodes
Returns:
the specified tree modified so that argument nodes are identified via a nonterminal augmentation (suffix)
See Also:
Treebank.canonicalAugDelimiter()
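
To make the idea of marking arguments with a label suffix concrete, here is a simplified sketch. Everything in it is an assumption made for illustration: the Node class stands in for Sexp, the "-A" suffix and the parent/child context table are hypothetical, and the real conditions (read from the metadata resource) are more detailed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ArgumentMarkingSketch {
    /** Minimal tree node; a stand-in for the parser's Sexp type. */
    static final class Node {
        String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            children.addAll(Arrays.asList(kids));
        }

        @Override public String toString() {
            if (children.isEmpty()) return label;
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node c : children) sb.append(' ').append(c);
            return sb.append(')').toString();
        }
    }

    // Assumed suffix: a delimiter followed by an argument augmentation.
    static final String ARG_SUFFIX = "-A";

    // Hypothetical, simplified contexts: "parent>child" pairs treated as arguments.
    static final Set<String> ARG_CONTEXTS =
        new HashSet<>(Arrays.asList("S>NP", "VP>NP", "VP>SBAR"));

    /** Appends ARG_SUFFIX to the label of every child found in an argument context. */
    static Node identifyArguments(Node tree) {
        for (Node child : tree.children) {
            // Recurse first so the child's original label is used for its own children.
            identifyArguments(child);
            boolean isWord = child.children.isEmpty();
            if (!isWord && ARG_CONTEXTS.contains(tree.label + ">" + child.label)) {
                child.label = child.label + ARG_SUFFIX;
            }
        }
        return tree;
    }

    public static void main(String[] args) {
        Node tree = new Node("S",
            new Node("NP", new Node("NNP", new Node("John"))),
            new Node("VP",
                new Node("VBZ", new Node("sees")),
                new Node("NP", new Node("NNP", new Node("Mary")))));
        System.out.println(identifyArguments(tree));
    }
}
```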

isTypeOfSentence

protected boolean isTypeOfSentence(Symbol label)
Unlike Mike's definition of a sentence for the purpose of relabeling subjectless sentences, which includes any label that starts with 'S', we require here that the label be strictly S, or S with some augmentations. We also do *not* override the default definition of relabelSubjectlessSentences(Sexp), since we are pretending we are not aware that Mike defines subjectless sentences more strictly than is conveyed by his thesis. These are a couple of ways in which this class is “broken”.

Overrides:
isTypeOfSentence in class AbstractTraining
Parameters:
label - the nonterminal label to test
Returns:
true if the specified nonterminal represents a sentence, false otherwise
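
The difference between the two definitions of a sentence label can be sketched as a pair of predicates. This is an illustration only: labels are plain strings rather than Symbol objects, and the set of augmentation delimiters ('-' and '=') is an assumption based on Penn Treebank conventions, not taken from this class.

```java
public class SentenceLabelSketch {
    // Assumed augmentation delimiters (Penn Treebank style, e.g. S-TPC or S=2).
    static boolean isDelimiter(char c) {
        return c == '-' || c == '=';
    }

    /** Strict test: the label is exactly S, or S followed by augmentations. */
    static boolean isStrictSentence(String label) {
        if (!label.startsWith("S")) return false;
        return label.length() == 1 || isDelimiter(label.charAt(1));
    }

    /** Loose test: any label that starts with 'S'. */
    static boolean isLooseSentence(String label) {
        return label.startsWith("S");
    }

    public static void main(String[] args) {
        for (String label : new String[] {"S", "S-TPC", "SBAR", "SINV", "NP"}) {
            System.out.println(label
                + " strict=" + isStrictSentence(label)
                + " loose=" + isLooseSentence(label));
        }
    }
}
```

Labels such as SBAR and SINV are where the two tests disagree: the loose test accepts them, while the strict test does not.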

fixSubjectlessSentences

public Sexp fixSubjectlessSentences(Sexp tree)
This method has been written to do nothing to the specified tree. This is one way in which this class is “broken”.


unrepairBaseNPs

protected Sexp unrepairBaseNPs(Sexp tree)
De-transforms NPs that were transformed by the Training.repairBaseNPs(Sexp) method. This method is currently unused.

Parameters:
tree - the tree whose NPs are to be de-transformed
Returns:
a modified version of the specified tree

postProcess

public void postProcess(Sexp tree)
Description copied from interface: Training
Post-processes a parse tree after decoding, essentially undoing the steps performed in preprocessing.

Specified by:
postProcess in interface Training
Overrides:
postProcess in class AbstractTraining
Parameters:
tree - the tree to be post-processed

needToAddNormalNPLevel

protected boolean needToAddNormalNPLevel(Sexp grandparent,
                                         int parentIdx,
                                         Sexp tree)
This method has been overridden so that the two unpublished conditions under which one needs to add a normal NP level are overlooked. This is one reason why this class is “broken”.

Overrides:
needToAddNormalNPLevel in class AbstractTraining
Parameters:
grandparent - the parent of the "parent" that is a base NP
parentIdx - the index of the child of grandparent that is the base NP (that is, grandparent.list().get(parentIdx) == tree)
tree - the base NP, whose parent is grandparent

main

public static void main(String[] args)
Test driver for this class.



Author: Dan Bikel.