Parsing Engine
public interface Treebank
A Treebank implementation provides data and methods specific to the structures found in a particular Treebank. A language package must provide an implementation of this interface.
Method Summary
Return type | Method | Description |
---|---|---|
void | addAugmentation(Nonterminal nonterminal, Symbol augmentation) | Adds the specified augmentation to the end of the (possibly empty) augmentation list of the specified Nonterminal object. |
String | augmentationDelimiters() | Returns a string whose characters are the set of delimiters for complex nonterminal labels. |
Symbol | baseNPLabel() | Returns the symbol with which Training.addBaseNPs(Sexp) will relabel core NPs. |
char | canonicalAugDelimiter() | Returns the first character of the string returned by augmentationDelimiters(), which will be considered the "canonical" augmentation delimiter when adding new augmentations, such as the argument augmentations added by implementations of Training.identifyArguments(Sexp). |
Sexp | constructPreterminal(Word word) | Converts a Word object into a preterminal subtree. |
boolean | containsAugmentation(Symbol nonterminal, Symbol augmentation) | Provides an efficient, thread-safe method for testing whether the specified nonterminal contains the specified augmentation (without parsing the nonterminal). |
void | defaultParseNonterminal(Symbol label, Nonterminal nonterminal) | Fills in the specified Nonterminal object to represent all the components of a complex nonterminal annotation: the base label, any augmentations and any index. |
Symbol | getCanonical(Symbol label) | Returns a canonical version of the specified nonterminal label; if label is already in canonical form, it is returned. |
Symbol | getCanonical(Symbol label, boolean stripAugmentations) | Returns a canonical version of the specified nonterminal label; if label is already in canonical form, it is returned. |
Symbol | getTag(Sexp preterminal) | Gets the component of the preterminal tree that corresponds to the part of speech tag. |
int | getTraceIndex(Sexp preterm, Nonterminal nonterminal) | Returns the index of a trace for the specified null element preterminal. |
boolean | isAugDelim(Sexp sexp) | Returns whether the specified S-expression is a symbol that is an augmentation delimiter for a complex nonterminal label. |
boolean | isBaseNP(Symbol label) | Returns whether the specified label is for a base NP. |
boolean | isComma(Symbol word) | Returns true if the specified word is a comma. |
boolean | isConjunction(Symbol label) | Returns true if the canonical version of the specified label is a conjunction tag or nonterminal in a particular Treebank. |
boolean | isLeftParen(Symbol word) | Returns true if the specified word is a left parenthesis. |
boolean | isNP(Symbol label) | Returns true if the canonical version of the specified label is an NP for the current language's Treebank. |
boolean | isNullElementPreterminal(Sexp tree) | Returns true if the specified S-expression represents a preterminal whose terminal element is the null element for the current language's Treebank. |
boolean | isPossessivePreterminal(Sexp tree) | Returns true if the specified S-expression represents a preterminal that is the possessive part of speech. |
boolean | isPreterminal(Sexp tree) | Returns whether tree represents a preterminal subtree in the parse trees for this language's Treebank. |
boolean | isPuncToRaise(Sexp preterm) | Returns true if the specified S-expression represents a preterminal and a part-of-speech tag that indicates punctuation to be raised when running Training.raisePunctuation(Sexp). |
boolean | isPunctuation(Symbol tag) | Returns true if the specified part of speech tag is one for which isPuncToRaise(Sexp) would return true. |
boolean | isRightParen(Symbol word) | Returns true if the specified word is a right parenthesis. |
boolean | isSentence(Symbol label) | Returns true if the specified nonterminal label represents a sentence in the current language's Treebank. |
boolean | isVerb(Sexp preterminal) | Returns true if the specified preterminal is that of a verb. |
boolean | isVerbTag(Symbol tag) | Returns true if the specified symbol is the part of speech tag of a verb. |
boolean | isWHNP(Symbol label) | Returns true if the canonical version of the specified label is an NP that undergoes WH-movement in a particular Treebank. |
Word | makeWord(Sexp preterminal) | Constructs a Word object from the specified preterminal subtree. |
char | nonTreebankDelimiter() | Returns a delimiter not already in use by the current treebank, for use when constructing lexicalized nonterminals when Settings.decoderOutputHeadLexicalizedLabels is true. |
char | nonTreebankLeftBracket() | Returns a left-bracket character that is not an existing metacharacter in the current treebank, for use when Settings.decoderOutputHeadLexicalizedLabels is true. |
char | nonTreebankRightBracket() | Returns a right-bracket character that is not an existing metacharacter in the current treebank, for use when constructing lexicalized nonterminals when Settings.decoderOutputHeadLexicalizedLabels is true. |
Symbol | NPLabel() | Returns the symbol that Training.addBaseNPs(Sexp) should add as a parent if a base NP is not dominated by an NP. |
Nonterminal | parseNonterminal(Symbol label) | Returns a Nonterminal object to represent all the components of a complex nonterminal annotation: the base label, any augmentations and any index. |
Nonterminal | parseNonterminal(Symbol label, Nonterminal nonterminal) | Identical to parseNonterminal(Symbol), except that instead of returning a newly-created Nonterminal object, this method merely modifies the specified Nonterminal object. |
boolean | removeAugmentation(Nonterminal nonterminal, Symbol augmentation) | Removes the specified augmentation from the augmentation list of the specified Nonterminal object, along with the preceding augmentation delimiter. |
Sexp | removeAugmentation(Sexp sexp, Nonterminal nonterminal, Symbol augmentation) | Removes the specified nonterminal augmentation from the specified S-expression, using the specified Nonterminal object for temporary storage. |
Symbol | sentenceLabel() | Returns the canonical label for a sentence, for de-transforming sentences that were transformed via Training.relabelSubjectlessSentences(Sexp). |
Symbol | stripAllButIndex(Symbol label) | Returns a symbol identical to the specified label, except all augmentations other than the index will be removed. |
Symbol | stripAllButIndex(Symbol label, Nonterminal nonterminal) | Identical to stripAllButIndex(Symbol), except that instead of creating a new Nonterminal object for use by parseNonterminal(Symbol,Nonterminal), this method uses the specified nonterminal object. |
Symbol | stripAugmentation(Symbol label) | Returns the Symbol created by stripping off all augmentations, that is, all characters after and including the first character that appears in the string returned by augmentationDelimiters(). |
Symbol | stripIndex(Symbol label) | Returns label, but stripped of any index augmentation. |
Symbol | stripIndex(Symbol label, Nonterminal nonterminal) | Identical to stripIndex(Symbol), except that instead of creating a new Nonterminal object for use by parseNonterminal(Symbol,Nonterminal), this method simply passes the specified nonterminal object. |
Symbol | subjectAugmentation() | Returns the symbol that is used to augment nonterminals to indicate matrix subjects in the current language's Treebank. |
Symbol | subjectlessSentenceLabel() | Returns the symbol with which Training.relabelSubjectlessSentences(Sexp) will relabel sentences when they have no subjects. |
Method Detail |
---|
boolean isPreterminal(Sexp tree)
Returns whether tree represents a preterminal subtree in the parse trees for this language's Treebank. Typically, preterminals are part-of-speech tags.
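Symbol getTag(Sexp preterminal)
Gets the component of the preterminal tree that corresponds to the part of speech tag.
Parameters:
preterminal - a tree that is assumed to be a preterminal
Returns:
the component of preterminal that is a part of speech
Word makeWord(Sexp preterminal)
Constructs a Word object from the specified preterminal subtree.
Parameters:
preterminal - a tree that is assumed to be a preterminal
Returns:
a Word object constructed from preterminal
Sexp constructPreterminal(Word word)
Converts a Word object into a preterminal subtree.
Parameters:
word - the word object from which to create a preterminal subtree
Returns:
a preterminal subtree built from word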
Symbol getCanonical(Symbol label)
Returns a canonical version of the specified nonterminal label; if label is already in canonical form, it is returned.
Parameters:
label - the label to be canonicalized
Symbol getCanonical(Symbol label, boolean stripAugmentations)
Returns a canonical version of the specified nonterminal label; if label is already in canonical form, it is returned.
Parameters:
label - the label to be canonicalized
stripAugmentations - indicates whether to strip any augmentations from the specified label before attempting to get its canonical form
boolean isSentence(Symbol label)
Returns true if the specified nonterminal label represents a sentence in the current language's Treebank. This method is intended to be used by implementations of Training.relabelSubjectlessSentences(Sexp).
Symbol sentenceLabel()
Returns the canonical label for a sentence, for de-transforming sentences that were transformed via Training.relabelSubjectlessSentences(Sexp).
Symbol subjectlessSentenceLabel()
Returns the symbol with which Training.relabelSubjectlessSentences(Sexp) will relabel sentences when they have no subjects.
Symbol subjectAugmentation()
Returns the symbol that is used to augment nonterminals to indicate matrix subjects in the current language's Treebank.
See Also:
Training.relabelSubjectlessSentences(Sexp)
boolean isNullElementPreterminal(Sexp tree)
Returns true if the specified S-expression represents a preterminal whose terminal element is the null element for the current language's Treebank. This method is intended to be used by implementations of Training.relabelSubjectlessSentences(Sexp).
See Also:
Training.relabelSubjectlessSentences(Sexp)
int getTraceIndex(Sexp preterm, Nonterminal nonterminal)
Returns the index of a trace for the specified null element preterminal. If preterm is not a null element preterminal (that is, a preterminal for which isNullElementPreterminal(Sexp) returns false), the semantics of this method are undefined.
Parameters:
preterm - the null element preterminal whose trace index is to be returned
nonterminal - the object used as the second argument to parseNonterminal(Symbol,Nonterminal)
Returns:
the trace index of preterm, or -1 if the null element does not have an index
boolean isPuncToRaise(Sexp preterm)
Returns true if the specified S-expression represents a preterminal and a part-of-speech tag that indicates punctuation to be raised when running Training.raisePunctuation(Sexp). If punctuation raising is not desirable for a particular language package, this method may be implemented simply to return false.
Parameters:
preterm - the preterminal to test
See Also:
Training.raisePunctuation(Sexp)
boolean isPunctuation(Symbol tag)
Returns true if the specified part of speech tag is one for which isPuncToRaise(Sexp) would return true.
Parameters:
tag - the part of speech to test
See Also:
isPuncToRaise(Sexp)
boolean isPossessivePreterminal(Sexp tree)
Returns true if the specified S-expression represents a preterminal that is the possessive part of speech. This method is intended to be used by implementations of Training.addBaseNPs(Sexp).
See Also:
Training.addBaseNPs(Sexp)
boolean isNP(Symbol label)
Returns true if the canonical version of the specified label is an NP for the current language's Treebank.
Parameters:
label - the label to test
See Also:
Training.addBaseNPs(Sexp)
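Symbol baseNPLabel()
Returns the symbol with which Training.addBaseNPs(Sexp) will relabel core NPs.
See Also:
isBaseNP(Symbol), Training.addBaseNPs(Sexp)
boolean isBaseNP(Symbol label)
Returns whether the specified label is for a base NP.
Parameters:
label - the label to test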
boolean isWHNP(Symbol label)
Returns true if the canonical version of the specified label is an NP that undergoes WH-movement in a particular Treebank. This method is used by Training.addGapInformation(Sexp). If a particular language package does not require gap information, then this method may be implemented simply to return false.
See Also:
Training.addGapInformation(Sexp)
Symbol NPLabel()
Returns the symbol that Training.addBaseNPs(Sexp) should add as a parent if a base NP is not dominated by an NP.
See Also:
Training.addBaseNPs(Sexp)
boolean isConjunction(Symbol label)
Returns true if the canonical version of the specified label is a conjunction tag or nonterminal in a particular Treebank.
boolean isVerb(Sexp preterminal)
Returns true if the specified preterminal is that of a verb. This method is used by HeadTreeNode to determine if a particular subtree contains a verb, which is in turn used by Trainer to calculate the distance metric, which depends on whether a verb occurs in the subtrees of the previous modifiers. It is the responsibility of the caller to ensure that preterminal is a Sexp object for which isPreterminal(Sexp) returns true.
See Also:
HeadTreeNode, Trainer
boolean isVerbTag(Symbol tag)
Returns true if the specified symbol is the part of speech tag of a verb. This method should return true for exactly the same parts of speech for which isVerb(Sexp) returns true, and is used to calculate the distance metric while decoding.
See Also:
CKYItem.containsVerb(), Decoder
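Because isVerbTag(Symbol) should agree with isVerb(Sexp) on every part of speech, one way to keep the two in step is to have both consult a single tag set. The following is a minimal sketch under stated assumptions, not the engine's implementation: plain strings stand in for Symbol and Sexp, a preterminal is modeled as a two-element array {tag, word}, the class name VerbTagSketch is hypothetical, and the tag set is assumed to be the Penn Treebank verb tags.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class VerbTagSketch {
    // Assumption: Penn Treebank verb tags; a real language package chooses its own set.
    static final Set<String> VERB_TAGS =
        new HashSet<>(Arrays.asList("VB", "VBD", "VBG", "VBN", "VBP", "VBZ"));

    /** Tag-level test, analogous to isVerbTag(Symbol). */
    static boolean isVerbTag(String tag) {
        return VERB_TAGS.contains(tag);
    }

    /**
     * Preterminal-level test, analogous to isVerb(Sexp).
     * The preterminal is modeled as {tag, word}; delegating to isVerbTag
     * guarantees that the two methods accept exactly the same tags.
     */
    static boolean isVerb(String[] preterminal) {
        return isVerbTag(preterminal[0]);
    }

    public static void main(String[] args) {
        System.out.println(isVerb(new String[] {"VBD", "ate"}));
        System.out.println(isVerbTag("NN"));
    }
}
```

Delegating one method to the other means a later change to the tag set cannot make training and decoding disagree about what counts as a verb.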
boolean isComma(Symbol word)
Returns true if the specified word is a comma. This method is used by the Decoder class when performing the comma constraint on chart items.
Parameters:
word - the word to test
See Also:
Settings.decoderUseCommaConstraint
boolean isLeftParen(Symbol word)
Returns true if the specified word is a left parenthesis. This method is used by the Decoder class when performing the comma constraint on chart items.
Parameters:
word - the word to test
See Also:
Settings.decoderUseCommaConstraint
boolean isRightParen(Symbol word)
Returns true if the specified word is a right parenthesis. This method is used by the Decoder class when performing the comma constraint on chart items.
Parameters:
word - the word to test
See Also:
Settings.decoderUseCommaConstraint
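String augmentationDelimiters()
Returns a string whose characters are the set of delimiters for complex nonterminal labels. These delimiters are used by the methods that take a Nonterminal as an argument or return a Nonterminal.
See Also:
isAugDelim(Sexp), stripAugmentation(Symbol), defaultParseNonterminal(Symbol,Nonterminal)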
char canonicalAugDelimiter()
Returns the first character of the string returned by augmentationDelimiters(), which will be considered the "canonical" augmentation delimiter when adding new augmentations, such as the argument augmentations added by implementations of Training.identifyArguments(Sexp).
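char nonTreebankLeftBracket()
Returns a left-bracket character that is not an existing metacharacter in the current treebank, for use when Settings.decoderOutputHeadLexicalizedLabels is true. For most treebanks, '[' is a good default.
char nonTreebankRightBracket()
Returns a right-bracket character that is not an existing metacharacter in the current treebank, for use when constructing lexicalized nonterminals when Settings.decoderOutputHeadLexicalizedLabels is true. For most treebanks, ']' is a good default.
char nonTreebankDelimiter()
Returns a delimiter not already in use by the current treebank, for use when constructing lexicalized nonterminals when Settings.decoderOutputHeadLexicalizedLabels is true.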
Symbol stripAugmentation(Symbol label)
Returns the Symbol created by stripping off all augmentations, that is, all characters after and including the first character that appears in the string returned by augmentationDelimiters().
Parameters:
label - the potentially-complex nonterminal label to be stripped
Returns:
label with all augmentations removed
Symbol stripIndex(Symbol label)
Returns label, but stripped of any index augmentation. This method assumes that the index will always be the final augmentation in a complex nonterminal label. This method creates a new Nonterminal object, to be filled in by stripIndex(Symbol,Nonterminal).
Parameters:
label - the nonterminal to be stripped of any possible index
Returns:
a Symbol that is identical to label, except that all characters after and including the final delimiter are removed if the final augmentation is composed entirely of digits
Symbol stripIndex(Symbol label, Nonterminal nonterminal)
Identical to stripIndex(Symbol), except that instead of creating a new Nonterminal object for use by parseNonterminal(Symbol,Nonterminal), this method simply passes the specified nonterminal object. In a sequential run, this method provides maximum efficiency, as only one Nonterminal object need be created at the beginning of the run.
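The string-level effect of stripAugmentation(Symbol) and stripIndex(Symbol) can be illustrated as follows. This is a minimal sketch under stated assumptions, not the engine's code: labels are plain strings rather than Symbol objects, the class name StripSketch is hypothetical, and the delimiter string "-=" is assumed purely for illustration. Labels that begin with a delimiter, such as -NONE-, would need special handling that the sketch omits.

```java
class StripSketch {
    // Assumption: stands in for the string returned by augmentationDelimiters().
    static final String DELIMS = "-=";

    /** Removes everything from the first delimiter character onward. */
    static String stripAugmentation(String label) {
        for (int i = 0; i < label.length(); i++) {
            if (DELIMS.indexOf(label.charAt(i)) >= 0) {
                return label.substring(0, i);
            }
        }
        return label; // no delimiter, so nothing to strip
    }

    /** Removes the final augmentation only if it consists entirely of digits. */
    static String stripIndex(String label) {
        int lastDelim = -1;
        for (int i = label.length() - 1; i >= 0; i--) {
            if (DELIMS.indexOf(label.charAt(i)) >= 0) {
                lastDelim = i;
                break;
            }
        }
        // No delimiter, or nothing after the last one: there is no index.
        if (lastDelim < 0 || lastDelim == label.length() - 1) {
            return label;
        }
        for (int i = lastDelim + 1; i < label.length(); i++) {
            if (!Character.isDigit(label.charAt(i))) {
                return label; // final augmentation is not an index
            }
        }
        return label.substring(0, lastDelim);
    }

    public static void main(String[] args) {
        System.out.println(stripAugmentation("NP-SBJ-1")); // NP
        System.out.println(stripIndex("NP-SBJ-1"));        // NP-SBJ
        System.out.println(stripIndex("NP-SBJ"));          // NP-SBJ
    }
}
```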
Symbol stripAllButIndex(Symbol label)
Returns a symbol identical to the specified label, except all augmentations other than the index will be removed. If label had no index to begin with, then this method is functionally identical to stripAugmentation(Symbol).
Parameters:
label - the nonterminal label to strip of non-index augmentations
Symbol stripAllButIndex(Symbol label, Nonterminal nonterminal)
Identical to stripAllButIndex(Symbol), except that instead of creating a new Nonterminal object for use by parseNonterminal(Symbol,Nonterminal), this method uses the specified nonterminal object. In a sequential run, this method provides maximum efficiency, as only one Nonterminal object need be created at the beginning of the run.
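A short string-based sketch of the stripAllButIndex behavior described above follows. It is illustrative only: the class name StripAllButIndexSketch is hypothetical, labels are plain strings rather than Symbol objects, the delimiter string "-=" is an assumption, and keeping the label's own final delimiter in front of the index is a choice made for this sketch rather than something the documentation specifies.

```java
class StripAllButIndexSketch {
    // Assumption: stands in for the string returned by augmentationDelimiters().
    static final String DELIMS = "-=";

    static String stripAllButIndex(String label) {
        int firstDelim = -1;
        int lastDelim = -1;
        for (int i = 0; i < label.length(); i++) {
            if (DELIMS.indexOf(label.charAt(i)) >= 0) {
                if (firstDelim < 0) {
                    firstDelim = i;
                }
                lastDelim = i;
            }
        }
        if (firstDelim < 0) {
            return label; // simple label with no augmentations
        }
        String base = label.substring(0, firstDelim);
        String last = label.substring(lastDelim + 1);
        // The final augmentation counts as an index only if it is all digits.
        boolean hasIndex = !last.isEmpty() && last.chars().allMatch(Character::isDigit);
        // Without an index, the result matches what stripAugmentation would give.
        return hasIndex ? base + label.charAt(lastDelim) + last : base;
    }

    public static void main(String[] args) {
        System.out.println(stripAllButIndex("NP-SBJ-1")); // NP-1
        System.out.println(stripAllButIndex("NP-SBJ"));   // NP
    }
}
```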
Nonterminal parseNonterminal(Symbol label)
Returns a Nonterminal object to represent all the components of a complex nonterminal annotation: the base label, any augmentations and any index. If there are no augmentations, the augmentations field of the returned object will contain a list with zero elements; if there is no index, the value of index will be -1. A final requirement of the contract of this method is to represent all the delimiters in the list of augmentations; this requirement is met, for example, by the helper method defaultParseNonterminal(Symbol,Nonterminal). This method creates a new Nonterminal object with every invocation.
Parameters:
label - a (possibly complex) nonterminal label from a Treebank
Returns:
a Nonterminal object representing any and all components of the specified complex nonterminal
See Also:
Nonterminal
Nonterminal parseNonterminal(Symbol label, Nonterminal nonterminal)
Identical to parseNonterminal(Symbol), except that instead of returning a newly-created Nonterminal object, this method merely modifies the specified Nonterminal object. This method may be used for efficiency: in a sequential training run, only one Nonterminal need be created and then repeatedly passed in to this method for modification.
Parameters:
label - a (possibly complex) nonterminal label from a Treebank
nonterminal - the representation of any and all components present in label
void defaultParseNonterminal(Symbol label, Nonterminal nonterminal)
Fills in the specified Nonterminal object to represent all the components of a complex nonterminal annotation: the base label, any augmentations and any index. If there are no augmentations, the augmentations field of the specified object will contain a list with no elements; if there is no index, the value of index will be -1. Augmentation delimiters are the characters in the string returned by augmentationDelimiters(). This method is a helper for implementations of parseNonterminal(Symbol,Nonterminal).
Parameters:
label - a (possibly complex) nonterminal label from a Treebank
See Also:
Nonterminal
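The parsing contract above (base label, augmentation list that includes every delimiter, index of -1 when absent) and the single-object reuse pattern can be sketched as follows. This is an illustration under stated assumptions, not the engine's implementation: ParsedLabel is a hypothetical stand-in for Nonterminal, labels are plain strings, the delimiter string "-=" is assumed, and the sketch assumes the delimiter in front of an index stays in the augmentation list while the index itself is stored separately.

```java
import java.util.ArrayList;
import java.util.List;

class ParseSketch {
    // Assumption: stands in for the string returned by augmentationDelimiters().
    static final String DELIMS = "-=";

    /** Hypothetical stand-in for the Nonterminal class. */
    static class ParsedLabel {
        String base;
        final List<String> augmentations = new ArrayList<>();
        int index = -1;
    }

    /** Fills in out from label, overwriting whatever it held before. */
    static void parseInto(String label, ParsedLabel out) {
        out.augmentations.clear();
        out.index = -1;
        // Split the label into alternating text and single-character delimiter tokens.
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < label.length(); i++) {
            if (DELIMS.indexOf(label.charAt(i)) >= 0) {
                tokens.add(label.substring(start, i));
                tokens.add(String.valueOf(label.charAt(i)));
                start = i + 1;
            }
        }
        tokens.add(label.substring(start));
        out.base = tokens.get(0);
        out.augmentations.addAll(tokens.subList(1, tokens.size()));
        // An all-digit final augmentation is treated as the index; its delimiter stays.
        int n = out.augmentations.size();
        if (n >= 2) {
            String last = out.augmentations.get(n - 1);
            if (!last.isEmpty() && last.chars().allMatch(Character::isDigit)) {
                out.index = Integer.parseInt(last);
                out.augmentations.remove(n - 1);
            }
        }
    }

    public static void main(String[] args) {
        ParsedLabel reusable = new ParsedLabel(); // created once, reused for every label
        for (String label : new String[] {"NP-SBJ-1", "VP", "PP-TMP"}) {
            parseInto(label, reusable);
            System.out.println(reusable.base + " " + reusable.augmentations + " " + reusable.index);
        }
    }
}
```

Passing the same holder object into every call mirrors the efficiency argument made for parseNonterminal(Symbol,Nonterminal): no per-label allocation is needed in a sequential run.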
boolean containsAugmentation(Symbol nonterminal, Symbol augmentation)
Provides an efficient, thread-safe method for testing whether the specified nonterminal contains the specified augmentation (without parsing the nonterminal).
N.B.: This method assumes that the augmentation is preceded by the canonical augmentation delimiter. To search for an augmentation preceded by any of the possible augmentation delimiters (as defined by augmentationDelimiters()), use
parseNonterminal(nonterminal).augmentations.contains(augmentation)
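One way to test for an augmentation without parsing is a plain substring search for the canonical delimiter followed by the augmentation, checked against the next delimiter or the end of the label. The sketch below is an assumption-laden illustration rather than the engine's method: the class name ContainsAugmentationSketch is hypothetical, labels are plain strings, and the delimiter string "-=" (with '-' as the canonical delimiter) is assumed. It keeps no shared mutable state, which is what makes this style of check safe to call from several threads.

```java
class ContainsAugmentationSketch {
    // Assumption: stands in for augmentationDelimiters(); the first character is canonical.
    static final String DELIMS = "-=";
    static final char CANONICAL = DELIMS.charAt(0);

    static boolean containsAugmentation(String label, String augmentation) {
        String needle = CANONICAL + augmentation;
        int from = 0;
        while (true) {
            int pos = label.indexOf(needle, from);
            if (pos < 0) {
                return false;
            }
            int end = pos + needle.length();
            // Accept only a whole augmentation: it must end at the label's end
            // or be followed immediately by another delimiter.
            if (end == label.length() || DELIMS.indexOf(label.charAt(end)) >= 0) {
                return true;
            }
            from = pos + 1;
        }
    }

    public static void main(String[] args) {
        System.out.println(containsAugmentation("NP-SBJ-1", "SBJ")); // true
        System.out.println(containsAugmentation("NP=SBJ", "SBJ"));   // false
    }
}
```

The second call returns false because the augmentation is preceded by a non-canonical delimiter, which is the limitation the N.B. above describes.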
void addAugmentation(Nonterminal nonterminal, Symbol augmentation)
Adds the specified augmentation to the end of the (possibly empty) augmentation list of the specified Nonterminal object. This method takes care to add the canonical augmentation delimiter before adding the augmentation itself, and also takes care to add these two elements before a final delimiter between the main augmentations and the index, if one exists.
Parameters:
nonterminal - the nonterminal to which to add an augmentation
augmentation - the augmentation to add to nonterminal's augmentation list
boolean removeAugmentation(Nonterminal nonterminal, Symbol augmentation)
Removes the specified augmentation from the augmentation list of the specified Nonterminal object, along with the preceding augmentation delimiter. If the specified augmentation is not preceded by an augmentation delimiter, meaning it is the base label itself, then it is not removed.
Parameters:
nonterminal - the nonterminal from which to remove an augmentation
augmentation - the augmentation to remove from nonterminal
Returns:
true if augmentation and a preceding augmentation delimiter were removed from nonterminal's augmentation list, or false otherwise
Sexp removeAugmentation(Sexp sexp, Nonterminal nonterminal, Symbol augmentation)
Removes the specified nonterminal augmentation from the specified S-expression, using the specified Nonterminal object for temporary storage. If the specified S-expression is a list, then each element will be destructively replaced with the return value of this method; otherwise, if the specified S-expression is a symbol, its augmentation is removed and the new symbol is returned.
N.B.: While the description of the behavior of this method on lists is recursive, a concrete implementation need not use a recursive algorithm.
Parameters:
sexp - the S-expression containing symbols whose augmentations are to be removed
nonterminal - an object used for temporary storage during the invocation of this method
augmentation - the augmentation to be removed from all symbols in the specified S-expression
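The list manipulation described for addAugmentation and the Nonterminal form of removeAugmentation can be sketched on a plain list of strings. Everything here is an illustrative assumption rather than the engine's code: the class name AugmentationListSketch is hypothetical, the augmentation list is modeled as alternating delimiter and augmentation strings, '-' is assumed to be the canonical delimiter, and the hasIndex flag stands in for checking the Nonterminal's index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AugmentationListSketch {
    // Assumptions: delimiter set and canonical delimiter chosen for illustration.
    static final String DELIMS = "-=";
    static final String CANONICAL_DELIM = "-";

    static boolean isDelim(String s) {
        return s.length() == 1 && DELIMS.indexOf(s.charAt(0)) >= 0;
    }

    /**
     * Adds the canonical delimiter and the augmentation. When the label has an
     * index, the list is assumed to end with the delimiter that precedes the
     * index, so the new pair is inserted before that final delimiter.
     */
    static void addAugmentation(List<String> augs, boolean hasIndex, String augmentation) {
        int insertAt = (hasIndex && !augs.isEmpty()) ? augs.size() - 1 : augs.size();
        augs.add(insertAt, CANONICAL_DELIM);
        augs.add(insertAt + 1, augmentation);
    }

    /** Removes the augmentation and its preceding delimiter, if both are present. */
    static boolean removeAugmentation(List<String> augs, String augmentation) {
        for (int i = 1; i < augs.size(); i++) {
            if (augs.get(i).equals(augmentation) && isDelim(augs.get(i - 1))) {
                augs.remove(i);     // the augmentation itself
                augs.remove(i - 1); // the delimiter in front of it
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Augmentation list for a label such as NP-SBJ-1 (index stored separately).
        List<String> augs = new ArrayList<>(Arrays.asList("-", "SBJ", "-"));
        addAugmentation(augs, true, "A");
        System.out.println(augs); // [-, SBJ, -, A, -]
        removeAugmentation(augs, "SBJ");
        System.out.println(augs); // [-, A, -]
    }
}
```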
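boolean isAugDelim(Sexp sexp)
Returns whether the specified S-expression is a symbol that is an augmentation delimiter for a complex nonterminal label.
Parameters:
sexp - the S-expression to be tested
See Also:
augmentationDelimiters()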