Parsing Engine
java.lang.Object
  danbikel.parser.lang.AbstractTraining
    danbikel.parser.english.Training
public class Training
extends AbstractTraining
Provides methods for language-specific processing of training parse trees.
Even though this implementation of the Training interface is in the default English language package, its primary purpose is simply to fill in the AbstractTraining.argContexts, AbstractTraining.semTagArgStopSet and AbstractTraining.nodesToPrune data members using a metadata resource. If this capability is desired in another language package, this class may be subclassed.
This class also redefines Training.addBaseNPs(Sexp), with an important change that is possibly only relevant to the Penn Treebank.
Field Summary |
---|
Fields inherited from class danbikel.parser.lang.AbstractTraining |
---|
addGapInfo, argAugmentations, argContexts, argContextsSym, argNonterminals, baseNP, canonicalAugDelimSym, defaultArgAugmentation, delimAndGapStr, delimAndGapStrLen, gapAugmentation, headFinder, headPostSym, headPreSym, headSym, metadataPropertyPrefix, nodesToPrune, nodesToPruneSym, NP, prunedPreterms, prunedPunctuation, relabelHeadChildrenAsArgs, repairBaseNPs, semTagArgStopListSym, semTagArgStopSet, traceTag, treebank, wordsToPrune |
Constructor Summary | |
---|---|
Training()
The default constructor, to be invoked by Language. |
Method Summary | |
---|---|
Sexp | fixSubjectlessSentences(Sexp tree): De-transforms sentence labels changed by relabelSubjectlessSentences(Sexp) when the subjectless sentence node has children prior to its head child that are arguments. |
protected boolean | isTypeOfSentence(Symbol label): A helper method used by AbstractTraining.repairBaseNPs(Sexp,int,Sexp). |
static void | main(String[] args): Test driver for this class. |
void | postProcess(Sexp tree): Post-processes a parse tree after decoding, essentially undoing the steps performed in preprocessing. |
Sexp | preProcess(Sexp tree): The method to call before counting events in a training parse tree. |
Sexp | relabelSubjectlessSentences(Sexp tree): We override Training.relabelSubjectlessSentences(Sexp) so that we can make the definition of a subjectless sentence slightly more restrictive: a subjectless sentence not only must have a null-element child that is marked with the subject augmentation, but also its head must be a VP (this is Mike Collins' definition of a subjectless sentence). |
boolean | removeWord(Symbol word, Symbol tag, int idx, SexpList sentence, SexpList tags, SexpList originalTags, Set prunedPretermsPosSet, Map prunedPretermsPosMap): Invoked by the decoder as the first step in preprocessing (prior to the invocation of Training.preProcessTest(danbikel.lisp.SexpList, danbikel.lisp.SexpList, danbikel.lisp.SexpList)). |
protected Sexp | unrepairBaseNPs(Sexp tree): Attempts to un-do the transformation performed by AbstractTraining.repairBaseNPs(Sexp), in which sentential nodes that occur to the right of the head child of a base NP are moved to become immediate right siblings of the base NP; accordingly, this method moves all such sentential nodes that occur immediately to the right of a base NP to be the rightmost child under that base NP. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public Training() throws FileNotFoundException, IOException
The default constructor, to be invoked by Language.
This constructor looks for a resource named by the property metadataPropertyPrefix + language, where metadataPropertyPrefix is the value of the constant AbstractTraining.metadataPropertyPrefix and language is the value of Settings.get(Settings.language). For example, the property for English is "parser.training.metadata.english".
Throws:
FileNotFoundException
IOException
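The property lookup described for the constructor can be illustrated with a minimal, stand-alone sketch. It does not use the parser's classes: the class name MetadataPropertyName and the method propertyFor are hypothetical, and the prefix value is an assumption inferred from the English example above rather than read from AbstractTraining.metadataPropertyPrefix.

```java
// Sketch only: mirrors the documented naming rule, not the parser's actual code.
class MetadataPropertyName {
    // Assumed value, inferred from the documented example "parser.training.metadata.english".
    static final String METADATA_PROPERTY_PREFIX = "parser.training.metadata.";

    // Concatenates the prefix and the language name, as the constructor documentation describes.
    static String propertyFor(String language) {
        return METADATA_PROPERTY_PREFIX + language;
    }

    public static void main(String[] args) {
        // In the real class the language would come from Settings.get(Settings.language).
        System.out.println(propertyFor("english"));
    }
}
```

A language package for, say, a hypothetical "chinese" setting would under the same assumption look for the property "parser.training.metadata.chinese".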
Method Detail |
---|
public Sexp preProcess(Sexp tree)
Description copied from class: AbstractTraining
The method to call before counting events in a training parse tree. Preprocessing steps, in order:
1. AbstractTraining.prune(Sexp)
2. AbstractTraining.addBaseNPs(Sexp)
3. AbstractTraining.repairBaseNPs(Sexp)
4. AbstractTraining.addGapInformation(Sexp)
5. AbstractTraining.relabelSubjectlessSentences(Sexp)
6. AbstractTraining.removeNullElements(Sexp)
7. AbstractTraining.raisePunctuation(Sexp)
8. AbstractTraining.identifyArguments(Sexp)
9. AbstractTraining.stripAugmentations(Sexp)
The order of these steps matters:
- AbstractTraining.addGapInformation(Sexp) should be run after methods that introduce new nodes, which in this case is AbstractTraining.addBaseNPs(Sexp), as these new nodes may need to be used to thread the gap feature
- AbstractTraining.relabelSubjectlessSentences(Sexp) should be run after AbstractTraining.addGapInformation(Sexp) because only those sentences whose empty subjects are not the result of WH-movement should be relabeled
- AbstractTraining.removeNullElements(Sexp) should be run after any methods that depend on the presence of null elements, such as
  - AbstractTraining.relabelSubjectlessSentences(Sexp) because a sentence cannot be determined to be subjectless unless a null element is present as a child of a subject-marked node
  - AbstractTraining.addGapInformation(Sexp) because the determination of the location of a trace requires the presence of indexed null elements
- AbstractTraining.raisePunctuation(Sexp) should be run after AbstractTraining.removeNullElements(Sexp) because a null element that is a leftmost or rightmost child can block detection of a punctuation element that needs to be raised after removal of the null element (if a punctuation element is the next-to-leftmost or next-to-rightmost child of an interior node)
- AbstractTraining.stripAugmentations(Sexp) should be run after all methods that may depend upon the presence of nonterminal augmentations: AbstractTraining.identifyArguments(Sexp), AbstractTraining.relabelSubjectlessSentences(Sexp) and AbstractTraining.addGapInformation(Sexp)
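The ordering requirements above can be checked mechanically. The following is a minimal sketch, not part of the parser: the class PreprocessOrderCheck and its members are hypothetical, steps are represented only by their method names, and each constraint pair restates one "should be run after" requirement from the description.

```java
import java.util.List;

// Sketch only: validates a step ordering against the documented "run after" requirements.
class PreprocessOrderCheck {
    // Step names in the order they are listed in the documentation.
    static final List<String> STEPS = List.of(
        "prune", "addBaseNPs", "repairBaseNPs", "addGapInformation",
        "relabelSubjectlessSentences", "removeNullElements",
        "raisePunctuation", "identifyArguments", "stripAugmentations");

    // Each pair {later, earlier} encodes "later should be run after earlier".
    static final String[][] CONSTRAINTS = {
        {"addGapInformation", "addBaseNPs"},
        {"relabelSubjectlessSentences", "addGapInformation"},
        {"removeNullElements", "relabelSubjectlessSentences"},
        {"removeNullElements", "addGapInformation"},
        {"raisePunctuation", "removeNullElements"},
        {"stripAugmentations", "identifyArguments"},
        {"stripAugmentations", "relabelSubjectlessSentences"},
        {"stripAugmentations", "addGapInformation"}
    };

    // Returns true only if every constrained step is present and appears after its prerequisite.
    static boolean satisfiesConstraints(List<String> order) {
        for (String[] constraint : CONSTRAINTS) {
            int later = order.indexOf(constraint[0]);
            int earlier = order.indexOf(constraint[1]);
            if (later < 0 || earlier < 0 || later <= earlier) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(satisfiesConstraints(STEPS));
    }
}
```

A check of this kind may be useful when a language package reorders or replaces steps in its own preProcess override, since swapping removeNullElements ahead of relabelSubjectlessSentences, for example, would violate a stated requirement.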
Specified by:
preProcess in interface Training
Overrides:
preProcess in class AbstractTraining
Parameters:
tree - the parse tree to pre-process
Returns:
tree having been pre-processed
public boolean removeWord(Symbol word, Symbol tag, int idx, SexpList sentence, SexpList tags, SexpList originalTags, Set prunedPretermsPosSet, Map prunedPretermsPosMap)
Description copied from interface: Training
Invoked by the decoder as the first step in preprocessing (prior to the invocation of Training.preProcessTest(danbikel.lisp.SexpList, danbikel.lisp.SexpList, danbikel.lisp.SexpList)).
Returns whether the specified word should be removed from the sentence before parsing.
Specified by:
removeWord in interface Training
Overrides:
removeWord in class AbstractTraining
Parameters:
word - a word in the sentence about to be parsed
tag - the supplied part-of-speech tag of the specified word, or null if tags were not supplied
idx - the index of the specified word in the specified sentence
sentence - a list of Symbol objects that represent the words of the sentence to be parsed
tags - coordinated list of supplied part-of-speech tag lists for each of the words in the specified sentence, or null if no tags were supplied
originalTags - the cached copy of the specified tags list, used when Settings.restorePrunedWords is true
prunedPretermsPosSet - the set of part-of-speech tags that were pruned during training
prunedPretermsPosMap - a map of words pruned during training to their part-of-speech tags when they were pruned
public Sexp relabelSubjectlessSentences(Sexp tree)
We override Training.relabelSubjectlessSentences(Sexp) so that we can make the definition of a subjectless sentence slightly more restrictive: a subjectless sentence not only must have a null-element child that is marked with the subject augmentation, but also its head must be a VP (this is Mike Collins' definition of a subjectless sentence).
Specified by:
relabelSubjectlessSentences in interface Training
Overrides:
relabelSubjectlessSentences in class AbstractTraining
Parameters:
tree - the parse tree in which to relabel subjectless sentences
Returns:
the tree that was passed in, with subjectless sentence nodes relabeled
See Also:
Treebank.isSentence(Symbol), Treebank.subjectAugmentation(), Treebank.isNullElementPreterminal(Sexp), Treebank.subjectlessSentenceLabel()
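The restrictive definition of a subjectless sentence can be sketched on a toy tree type. Everything here is an assumption made for illustration: the Node class and its headIndex field are hypothetical stand-ins for Sexp trees and the head finder, and the label conventions ("S", "VP", a "-SBJ" augmentation, a "-NONE-" null-element tag, and "SG" as the relabeled form) are Penn Treebank style guesses, not values read from Treebank.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a toy version of the "null subject AND VP head" test described above.
class SubjectlessSentenceSketch {
    // Hypothetical tree node; headIndex is the position of the head child, or -1 if unknown.
    static final class Node {
        String label;
        final List<Node> children = new ArrayList<>();
        int headIndex = -1;

        Node(String label) { this.label = label; }
        Node add(Node child) { children.add(child); return this; }
        Node head(int index) { headIndex = index; return this; }
    }

    // Assumed label conventions, not taken from the Treebank class.
    static final String SUBJECT_AUGMENTATION = "-SBJ";
    static final String NULL_ELEMENT_TAG = "-NONE-";
    static final String SUBJECTLESS_LABEL = "SG";

    // A subject-marked child that dominates only a null-element preterminal.
    static boolean isNullSubject(Node child) {
        return child.label.contains(SUBJECT_AUGMENTATION)
            && child.children.size() == 1
            && child.children.get(0).label.equals(NULL_ELEMENT_TAG);
    }

    // Both conditions must hold: a null subject child and a VP head child.
    static boolean isSubjectless(Node node) {
        if (!node.label.equals("S")) return false;
        if (node.headIndex < 0 || node.headIndex >= node.children.size()) return false;
        boolean headIsVP = node.children.get(node.headIndex).label.equals("VP");
        boolean hasNullSubject = node.children.stream()
            .anyMatch(SubjectlessSentenceSketch::isNullSubject);
        return headIsVP && hasNullSubject;
    }

    // Relabels matching nodes in place and returns the same tree that was passed in.
    static Node relabel(Node tree) {
        if (isSubjectless(tree)) tree.label = SUBJECTLESS_LABEL;
        for (Node child : tree.children) relabel(child);
        return tree;
    }

    public static void main(String[] args) {
        Node s = new Node("S")
            .add(new Node("NP-SBJ").add(new Node(NULL_ELEMENT_TAG)))
            .add(new Node("VP"))
            .head(1);
        System.out.println(relabel(s).label);
    }
}
```

Under these assumptions, a sentence node with a null subject whose head is not a VP keeps its original label, which is the extra restriction this override adds.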
protected boolean isTypeOfSentence(Symbol label)
Description copied from class: AbstractTraining
A helper method used by AbstractTraining.repairBaseNPs(Sexp,int,Sexp).
While the default implementation here simply returns the result of calling Treebank.isSentence(Symbol) with the specified label, subclasses may override this method if different semantics are required for identifying sentences that occur as siblings of base NPs.
Overrides:
isTypeOfSentence in class AbstractTraining
Parameters:
label - the nonterminal label to test
Returns:
true if the specified nonterminal represents a sentence, false otherwise
public Sexp fixSubjectlessSentences(Sexp tree)
De-transforms sentence labels changed by relabelSubjectlessSentences(Sexp) when the subjectless sentence node has children prior to its head child that are arguments.
protected Sexp unrepairBaseNPs(Sexp tree)
Attempts to un-do the transformation performed by AbstractTraining.repairBaseNPs(Sexp), in which sentential nodes that occur to the right of the head child of a base NP are moved to become immediate right siblings of the base NP; accordingly, this method moves all such sentential nodes that occur immediately to the right of a base NP to be the rightmost child under that base NP.
Parameters:
tree - the tree for which NPs that were transformed by AbstractTraining.repairBaseNPs(Sexp) are to be de-transformed
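The de-transformation described for unrepairBaseNPs can be sketched on a toy tree type. This is an illustration under stated assumptions, not the parser's implementation: the Node class is hypothetical, base NPs are assumed to carry the label "NPB", and any label starting with "S" is treated as sentential in place of the real isTypeOfSentence test.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch only: moves sentential right siblings of a base NP back under that base NP.
class UnrepairBaseNPsSketch {
    // Hypothetical tree node standing in for an Sexp list.
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            Collections.addAll(children, kids);
        }
    }

    // Assumed labeling conventions, not taken from the Treebank class.
    static boolean isBaseNP(Node n) { return n.label.equals("NPB"); }
    static boolean isSentence(Node n) { return n.label.startsWith("S"); }

    // Modifies the tree in place, visiting every node once.
    static void unrepair(Node tree) {
        List<Node> kids = tree.children;
        for (int i = 0; i < kids.size(); i++) {
            Node child = kids.get(i);
            if (isBaseNP(child)) {
                // Each sentential node immediately to the right becomes the new rightmost child.
                while (i + 1 < kids.size() && isSentence(kids.get(i + 1))) {
                    child.children.add(kids.remove(i + 1));
                }
            }
            unrepair(child);
        }
    }

    public static void main(String[] args) {
        Node npb = new Node("NPB", new Node("NN"));
        Node np = new Node("NP", npb, new Node("SBAR"), new Node("PP"));
        unrepair(np);
        System.out.println(np.children.size() + " " + npb.children.size());
    }
}
```

In the example, the SBAR sibling ends up as the last child of the NPB node while the PP stays where it was, since only sentential nodes adjacent to the base NP are moved.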
public void postProcess(Sexp tree)
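Description copied from interface: Training
Post-processes a parse tree after decoding, essentially undoing the steps performed in preprocessing.
Specified by:
postProcess in interface Training
Overrides:
postProcess in class AbstractTraining
Parameters:
tree - the tree to be post-processed
public static void main(String[] args)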
Test driver for this class.
Parameters:
args - usage: [-risan] <filename> where
-r | raise punctuation |
-i | identify arguments |
-s | relabel subjectless sentences |
-a | strip nonterminal augmentations |
-n | add/relabel base NPs |