Parsing Engine
java.lang.Object
  danbikel.parser.lang.AbstractTraining
    danbikel.parser.arabic.Training
public class Training
Provides methods for language-specific processing of training parse trees.
Even though this subclass of AbstractTraining is in the Arabic language package (danbikel.parser.arabic), its primary purpose is simply to fill in the AbstractTraining.argContexts, AbstractTraining.semTagArgStopSet and AbstractTraining.nodesToPrune data members using a metadata resource. If this capability is desired in another language package, this class may be subclassed.
This class also redefines the method AbstractTraining.hasPossessiveChild(Sexp).
Field Summary | |
---|---|
protected static String[] | caseMarkers - An array of case markers in Arabic Treebank part-of-speech tags. |
protected static String[] | definiteMarkers - An array of definite/indefinite markers in Arabic Treebank part-of-speech tags. |
protected static String[] | detPrefixMarkers - An array of determiner markers in Arabic Treebank part-of-speech tags. |
protected static String[] | genderMarkers - An array of gender markers in Arabic Treebank part-of-speech tags. |
protected static String[][] | markers - An array of the various markers arrays. |
protected static String[] | moodMarkers - An array of verb mood markers in Arabic Treebank part-of-speech tags. |
protected static String[] | nounSuffixMarkers - An array of noun markers in Arabic Treebank part-of-speech tags. |
protected static String[] | numberMarkers - An array of number markers in Arabic Treebank part-of-speech tags (Arabic has forms for singular, plural and dual). |
protected static String[] | personMarkers - An array of person/number markers (indicating information such as “first person singular”) in Arabic Treebank part-of-speech tags. |
protected static String[] | pronounMarkers - An array of pronoun markers in Arabic Treebank part-of-speech tags. |
protected static boolean | regularizeVerbs - If regularizeVerbs is true, it indicates that part-of-speech tags that contain any of the patterns in the verbPatterns array should be transformed simply into the pattern itself. |
protected static boolean[] | remove - Indicates which of the various types of markers should be removed from Arabic Treebank part-of-speech tags during preprocessing (currently unused). |
protected static Symbol | tagMapSym - The symbol associated with tag map metadata. |
protected static String[] | verbPatterns - The match patterns used when regularizeVerbs is true. |
Fields inherited from class danbikel.parser.lang.AbstractTraining |
---|
addGapInfo, argAugmentations, argContexts, argNonterminals, baseNP, canonicalAugDelimSym, defaultArgAugmentation, delimAndGapStr, delimAndGapStrLen, gapAugmentation, headFinder, headPostSym, headPreSym, headSym, metadataPropertyPrefix, nodesToPrune, NP, prunedPreterms, prunedPunctuation, relabelHeadChildrenAsArgs, repairBaseNPs, semTagArgStopSet, traceTag, treebank, wordsToPrune |
Constructor Summary | |
---|---|
Training() | The default constructor, to be invoked by Language. |
Method Summary | |
---|---|
protected void | canonicalizeNonterminals(Sexp tree) - For Arabic, we do not want to transform preterminals (parts of speech) to their canonical forms, so this method is overridden. |
protected int | contains(StringBuffer searchBuf, String[] searchPatterns, IntCounter patternIdx) - Helper method used by TagMap.transformTag(Word). |
protected void | createArgNonterminalsSet() - An overridden version of AbstractTraining.createArgNonterminalsSet() that adds argument nonterminal patterns, such as *-SBJ, to the set of argument nonterminals. |
protected boolean | hasPossessiveChild(Sexp tree) - We override this method so that it always returns false, so that the default implementation of addBaseNPs(Sexp) never considers an NP to be a possessive NP. |
boolean | isValidTree(Sexp tree) - If the specified tree has a root label with a print name equal to "X", then this method returns false; otherwise, this method returns the value of the default implementation in the superclass with the specified tree (super.isValidTree(tree)). |
static void | main(String[] args) - Test driver for this class. |
Sexp | preProcess(Sexp tree) - The method to call before counting events in a training parse tree. |
SexpList | preProcessTest(SexpList sentence, SexpList originalWords, SexpList tags) - Preprocesses the specified test sentence and its coordinated list of part-of-speech tags, leaving the original sentence untouched but providing a modified version of the coordinated list of tags, where each tag has been mapped from the original word and the original tag using TagMap.transformTag(Word). |
protected void | readMetadataHook(Symbol dataType, int metadataLen, SexpList metadata) - Reads the tag map metadata if the specified data type is equal to tagMapSym. |
Symbol | startSym() - Returns the symbol to indicate hidden nonterminals that precede the first in a sequence of modifier nonterminals. |
Word | startWord() - Returns the Word object that represents the hidden "head word" of the start symbol. |
Symbol | stopSym() - Returns the symbol to indicate a hidden nonterminal that follows the last in a sequence of modifier nonterminals. |
Word | stopWord() - Returns the Word object that represents the hidden "head word" of the stop symbol. |
Symbol | topSym() - Returns the symbol to indicate the hidden root of all parse trees. |
Word | topWord() - Returns the Word object that represents the hidden "head word" of the hidden root of all parse trees. |
protected Symbol | transformTagOld(Word word) - Deprecated. This method is the old mechanism by which to transform the part-of-speech tag associated with an Arabic word; it has been superseded by the method TagMap.transformTag(Word). |
protected Sexp | transformTags(Sexp tree) - Does an in-place transformation of the part-of-speech tags in the specified tree. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected static final Symbol tagMapSym
protected static final String[] nounSuffixMarkers
protected static final String[] detPrefixMarkers
protected static final String[] personMarkers
protected static final String[] numberMarkers
protected static final String[] genderMarkers
protected static final String[] caseMarkers
protected static final String[] definiteMarkers
protected static final String[] pronounMarkers
protected static final String[] moodMarkers
protected static final String[][] markers
An array of the various markers arrays: nounSuffixMarkers, detPrefixMarkers, personMarkers, numberMarkers, genderMarkers, caseMarkers, definiteMarkers, pronounMarkers, moodMarkers.
protected static final boolean[] remove
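Indicates which of the various types of markers should be removed from Arabic Treebank part-of-speech tags during preprocessing (currently unused). See markers.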
protected static final boolean regularizeVerbs
If regularizeVerbs is true, it indicates that part-of-speech tags that contain any of the patterns in the verbPatterns array should be transformed simply into the pattern itself. For example, the tag IV2D+VERB_IMPERFECT+IVSUFF_SUBJ:D_MOOD:SJ would be transformed into, simply, VERB_IMPERFECT.
protected static final String[] verbPatterns
The match patterns used when regularizeVerbs is true.
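The verb regularization described for regularizeVerbs and verbPatterns can be illustrated with a minimal, standalone sketch. It does not use the parser's own classes: the class name VerbTagRegularizer, the method regularize and the contents of VERB_PATTERNS are hypothetical (the actual values of verbPatterns are not listed on this page), and the real implementation may differ.

```java
public class VerbTagRegularizer {
    // Hypothetical pattern list; the real contents of verbPatterns are not
    // given in this documentation.
    static final String[] VERB_PATTERNS = {"VERB_IMPERFECT", "VERB_PERFECT"};

    /**
     * Returns the first pattern contained in the tag, or the tag unchanged
     * if no pattern matches.
     */
    static String regularize(String tag) {
        for (String pattern : VERB_PATTERNS) {
            if (tag.contains(pattern)) {
                return pattern;
            }
        }
        return tag;
    }

    public static void main(String[] args) {
        // The example tag from the regularizeVerbs description above.
        System.out.println(regularize("IV2D+VERB_IMPERFECT+IVSUFF_SUBJ:D_MOOD:SJ"));
        // A tag containing no verb pattern is left as-is.
        System.out.println(regularize("NOUN+CASE_DEF_NOM"));
    }
}
```

With these assumed patterns, the first call prints VERB_IMPERFECT and the second prints its input unchanged.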
Constructor Detail |
---|
public Training() throws FileNotFoundException, IOException
The default constructor, to be invoked by Language. This constructor looks for a resource named by the property metadataPropertyPrefix + language, where metadataPropertyPrefix is the value of the constant AbstractTraining.metadataPropertyPrefix and language is the value of Settings.get(Settings.language). For example, the property for English is "parser.training.metadata.english".
- Throws:
FileNotFoundException
IOException
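The property lookup described for the constructor can be sketched as follows. This is an illustration only: it uses java.util.Properties in place of the parser's Settings class, the prefix value is inferred from the documented English example rather than read from AbstractTraining.metadataPropertyPrefix, and the resource path is a made-up placeholder.

```java
import java.util.Properties;

public class MetadataPropertyLookup {
    // Assumed value, inferred from the example "parser.training.metadata.english".
    static final String METADATA_PROPERTY_PREFIX = "parser.training.metadata.";

    /** Builds the property name as prefix + language. */
    static String metadataProperty(String language) {
        return METADATA_PROPERTY_PREFIX + language;
    }

    /** Looks up the resource name for a language; returns null if the property is unset. */
    static String resolveResource(Properties settings, String language) {
        return settings.getProperty(metadataProperty(language));
    }

    public static void main(String[] args) {
        Properties settings = new Properties();
        // Hypothetical resource path, for illustration only.
        settings.setProperty("parser.training.metadata.english", "data/training-metadata.lisp");

        System.out.println(metadataProperty("english"));
        System.out.println(resolveResource(settings, "english"));
    }
}
```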
Method Detail |
---|
protected void readMetadataHook(Symbol dataType, int metadataLen, SexpList metadata)
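Reads the tag map metadata if the specified data type is equal to tagMapSym.
- Overrides:
readMetadataHook in class AbstractTraining
- Parameters:
dataType - the data type of the specified metadata resource; if the specified symbol is equal to tagMapSym, then this method will read and store the associated tag map metadata
metadataLen - the length of the metadata list
metadata - the metadata resource
public Symbol startSym()
Returns the symbol to indicate hidden nonterminals that precede the first in a sequence of modifier nonterminals.
- Specified by:
startSym in interface Training
- Overrides:
startSym in class AbstractTraining
- See Also:
Trainer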
public Word startWord()
Returns the Word object that represents the hidden "head word" of the start symbol. This method overrides the default implementation so as to return a Word containing symbols that do not contain a plus sign (+), which is a nonterminal augmentation delimiter in the Arabic Treebank.
- Specified by:
startWord in interface Training
- Overrides:
startWord in class AbstractTraining
- See Also:
startSym, Trainer
public Symbol stopSym()
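Returns the symbol to indicate a hidden nonterminal that follows the last in a sequence of modifier nonterminals.
- Specified by:
stopSym in interface Training
- Overrides:
stopSym in class AbstractTraining
- See Also:
Trainer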
public Word stopWord()
Returns the Word object that represents the hidden "head word" of the stop symbol. This method overrides the default implementation so as to return a Word containing symbols that do not contain a plus sign (+), which is a nonterminal augmentation delimiter in the Arabic Treebank.
- Specified by:
stopWord in interface Training
- Overrides:
stopWord in class AbstractTraining
- See Also:
stopSym, Trainer
public Symbol topSym()
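Returns the symbol to indicate the hidden root of all parse trees.
- Specified by:
topSym in interface Training
- Overrides:
topSym in class AbstractTraining
- See Also:
Trainer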
public Word topWord()
Returns the Word object that represents the hidden "head word" of the hidden root of all parse trees. This method overrides the default implementation so as to return a Word containing symbols that do not contain a plus sign (+), which is a nonterminal augmentation delimiter in the Arabic Treebank.
- Specified by:
topWord in interface Training
- Overrides:
topWord in class AbstractTraining
public Sexp preProcess(Sexp tree)
The method to call before counting events in a training parse tree. Preprocessing involves the following methods, listed in order:
transformTags(Sexp)
AbstractTraining.prune(Sexp)
AbstractTraining.addBaseNPs(Sexp)
AbstractTraining.removeNullElements(Sexp)
AbstractTraining.raisePunctuation(Sexp)
AbstractTraining.identifyArguments(Sexp)
AbstractTraining.stripAugmentations(Sexp)
AbstractTraining.raisePunctuation(Sexp) should be run after AbstractTraining.removeNullElements(Sexp) because a null element that is a leftmost or rightmost child can block detection of a punctuation element that needs to be raised after removal of the null element (if a punctuation element is the next-to-leftmost or next-to-rightmost child of an interior node).
AbstractTraining.stripAugmentations(Sexp) should be run after all methods that may depend upon the presence of nonterminal augmentations, such as AbstractTraining.identifyArguments(Sexp).
- Specified by:
preProcess in interface Training
- Overrides:
preProcess in class AbstractTraining
- Parameters:
tree - the parse tree to pre-process
- Returns:
tree having been pre-processed
protected void createArgNonterminalsSet()
An overridden version of AbstractTraining.createArgNonterminalsSet() that adds argument nonterminal patterns, such as *-SBJ, to the set of argument nonterminals.
- Overrides:
createArgNonterminalsSet in class AbstractTraining
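To make the idea of an argument nonterminal pattern such as *-SBJ concrete, here is a small standalone sketch. It assumes that a leading asterisk matches any base label and that the rest of the pattern must match the end of the label; that interpretation, along with the names ArgPatternSketch, matches and isArgument, is an assumption for illustration and not the parser's actual matching logic.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class ArgPatternSketch {
    /**
     * Assumed semantics: a pattern starting with '*' matches any label that
     * ends with the remainder of the pattern; otherwise an exact match is required.
     */
    static boolean matches(String pattern, String label) {
        if (pattern.startsWith("*")) {
            return label.endsWith(pattern.substring(1));
        }
        return pattern.equals(label);
    }

    /** Returns true if any entry in the set matches the label. */
    static boolean isArgument(Set<String> argNonterminals, String label) {
        for (String pattern : argNonterminals) {
            if (matches(pattern, label)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> argNonterminals = new LinkedHashSet<>();
        argNonterminals.add("*-SBJ"); // pattern mentioned in the description above

        System.out.println(isArgument(argNonterminals, "NP-SBJ")); // matches the pattern
        System.out.println(isArgument(argNonterminals, "NP"));     // no -SBJ augmentation
    }
}
```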
public SexpList preProcessTest(SexpList sentence, SexpList originalWords, SexpList tags)
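Preprocesses the specified test sentence and its coordinated list of part-of-speech tags, leaving the original sentence untouched but providing a modified version of the coordinated list of tags, where each tag has been mapped from the original word and the original tag using TagMap.transformTag(Word).
- Specified by:
preProcessTest in interface Training
- Overrides:
preProcessTest in class AbstractTraining
- Parameters:
sentence - the list of words, where a known word is a symbol and an unknown word is represented by a 3-element list (see DecoderServerRemote.convertUnknownWords(danbikel.lisp.SexpList))
originalWords - the list of unprocessed words (all symbols)
tags - the list of tag lists, where the list at index i is the list of possible parts of speech for the word at that index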
- Returns:
a two-element list, the first of which is sentence and the second of which is a processed version of tags; if tags is null, then the returned list will contain only one element (since SexpList objects are not designed to handle null elements)
- See Also:
TagMap.transformTag(Word)
public boolean isValidTree(Sexp tree)
If the specified tree has a root label with a print name equal to "X", then this method returns false; otherwise, this method returns the value of the default implementation in the superclass with the specified tree (super.isValidTree(tree)).
- Specified by:
isValidTree in interface Training
- Overrides:
isValidTree in class AbstractTraining
- Parameters:
tree - the tree to test for validity
- Returns:
false if the specified tree's root label is equal to Symbol.add("X"), or super.isValidTree(tree) otherwise
- See Also:
AbstractTraining.isAllNodesToPrune(Sexp), Treebank.isPreterminal(Sexp)
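The override pattern described for isValidTree (reject a root labeled "X", otherwise defer to the superclass) can be sketched with stand-in classes. BaseTraining, ArabicLikeTraining and the use of a plain String for the root label are hypothetical simplifications; the superclass check shown here is a placeholder, not the behavior of AbstractTraining.isValidTree.

```java
public class ValidTreeSketch {
    /** Stand-in for the superclass; its validity rule is a placeholder. */
    static class BaseTraining {
        boolean isValidTree(String rootLabel) {
            return !rootLabel.isEmpty();
        }
    }

    /** Stand-in for the Arabic subclass. */
    static class ArabicLikeTraining extends BaseTraining {
        @Override
        boolean isValidTree(String rootLabel) {
            // Trees rooted at "X" are rejected outright.
            if ("X".equals(rootLabel)) {
                return false;
            }
            // Every other tree gets the superclass's verdict.
            return super.isValidTree(rootLabel);
        }
    }

    public static void main(String[] args) {
        BaseTraining training = new ArabicLikeTraining();
        System.out.println(training.isValidTree("X"));
        System.out.println(training.isValidTree("S"));
    }
}
```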
protected int contains(StringBuffer searchBuf, String[] searchPatterns, IntCounter patternIdx)
Helper method used by TagMap.transformTag(Word).
protected Symbol transformTagOld(Word word)
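Deprecated. This method is the old mechanism by which to transform the part-of-speech tag associated with an Arabic word; it has been superseded by the method TagMap.transformTag(Word).
- Parameters:
word - the word whose part-of-speech tag is to be transformed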
- Returns:
the transformed part-of-speech tag for the specified Word object
- See Also:
TagMap.transformTag(Word)
protected Sexp transformTags(Sexp tree)
Does an in-place transformation of the part-of-speech tags in the specified tree.
- Parameters:
tree - the tree whose part-of-speech tags are to be mapped
protected boolean hasPossessiveChild(Sexp tree)
We override this method so that it always returns false, so that the default implementation of addBaseNPs(Sexp) never considers an NP to be a possessive NP. Thus, the behavior of addBaseNPs is much simpler: all and only NPs that do not dominate other NPs will be relabeled NPB.
- Overrides:
hasPossessiveChild in class AbstractTraining
- Parameters:
tree - the tree to be tested
- Returns:
false, regardless of the value of the specified tree
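The simplified base-NP rule stated above (all and only NPs that do not dominate other NPs are relabeled NPB) can be illustrated on a toy tree. The Node class and the methods dominatesNP, addBaseNPs and render are hypothetical stand-ins written for this sketch; they are not the parser's Sexp API or its addBaseNPs implementation, and word leaves are omitted so that every node carries a label.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BaseNPSketch {
    /** Minimal labeled tree node (hypothetical; not the parser's Sexp type). */
    static class Node {
        String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            children.addAll(Arrays.asList(kids));
        }
    }

    /** True if any proper descendant of the node is an NP (relabeled or not). */
    static boolean dominatesNP(Node node) {
        for (Node child : node.children) {
            if (child.label.equals("NP") || child.label.equals("NPB") || dominatesNP(child)) {
                return true;
            }
        }
        return false;
    }

    /** Relabels, bottom-up, every NP that dominates no other NP as NPB. */
    static void addBaseNPs(Node node) {
        for (Node child : node.children) {
            addBaseNPs(child);
        }
        if (node.label.equals("NP") && !dominatesNP(node)) {
            node.label = "NPB";
        }
    }

    /** Renders the tree in a bracketed form for inspection. */
    static String render(Node node) {
        if (node.children.isEmpty()) {
            return node.label;
        }
        StringBuilder sb = new StringBuilder("(").append(node.label);
        for (Node child : node.children) {
            sb.append(' ').append(render(child));
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        Node tree = new Node("NP",
            new Node("NP", new Node("DET"), new Node("NOUN")),
            new Node("PP", new Node("PREP"), new Node("NP", new Node("NOUN"))));
        addBaseNPs(tree);
        System.out.println(render(tree)); // (NP (NPB DET NOUN) (PP PREP (NPB NOUN)))
    }
}
```

The outer NP keeps its label because it dominates other NPs, while the two innermost NPs become NPB.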
protected void canonicalizeNonterminals(Sexp tree)
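For Arabic, we do not want to transform preterminals (parts of speech) to their canonical forms, so this method is overridden.
- Overrides:
canonicalizeNonterminals in class AbstractTraining
- Parameters:
tree - the tree for which nonterminals, but not parts of speech, are to be transformed into their canonical forms
- See Also:
Treebank.getCanonical(Symbol)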
public static void main(String[] args)
Test driver for this class.