Parsing Engine
java.lang.Object
  danbikel.parser.lang.AbstractTreebank
    danbikel.parser.chinese.Treebank
public class Treebank
Provides data and methods specific to the structures found in the Chinese Treebank or any other treebank that conforms to the same annotation guidelines.
Field Summary |
---|
Fields inherited from class danbikel.parser.lang.AbstractTreebank |
---|
augmentationDelimSet, canonicalAugDelimSym, nonterminalExceptionSet |
Constructor Summary | |
---|---|
Treebank()
Constructs a Chinese Treebank object. |
Method Summary | |
---|---|
String |
augmentationDelimiters()
Returns a string of the three characters that serve as augmentation delimiters in the Chinese Treebank: "-=|" . |
Symbol |
baseNPLabel()
Returns the symbol with which AbstractTraining.addBaseNPs(Sexp) will
relabel base NPs. |
Symbol |
getCanonical(Symbol label)
Returns a canonical mapping for the specified nonterminal label; if label already is in canonical form, it is returned. |
Symbol |
getCanonical(Symbol label,
boolean stripAugmentations)
When the stripAugmentations argument is true, this method returns the same value as would be returned by getCanonical(Symbol)
when passed the label argument; otherwise, the specified nonterminal
is canonicalized unless it contains augmentations, in which case
it is returned untouched. |
boolean |
isComma(Symbol word)
Returns true if the specified word is a comma. |
boolean |
isConjunction(Symbol label)
Returns true if label is equal to the symbol
whose print name is "CC" . |
boolean |
isLeftParen(Symbol word)
Returns true if the specified word is a left
parenthesis. |
boolean |
isNP(Symbol label)
Returns true if the canonical version of the specified label
is an NP for the Chinese Treebank. |
boolean |
isNullElementPreterminal(Sexp tree)
Returns true if the specified S-expression represents a
preterminal whose terminal element is the null element
("-NONE-" ) for the Chinese Treebank. |
boolean |
isPossessivePreterminal(Sexp tree)
Returns true if the specified S-expression represents
a preterminal that is the possessive part of speech. |
boolean |
isPreterminal(Sexp tree)
Returns true if tree represents a preterminal
subtree (part-of-speech tag and word). |
boolean |
isPuncToRaise(Sexp preterm)
Returns true if the specified S-expression is a preterminal
whose part of speech is "," or ".". |
boolean |
isPunctuation(Symbol tag)
Returns true if the specified part of speech tag is one
for which AbstractTreebank.isPuncToRaise(Sexp) would return true . |
boolean |
isRightParen(Symbol word)
Returns true if the specified word is a right
parenthesis. |
boolean |
isSentence(Symbol label)
Returns true if the specified nonterminal label represents a
sentence in the Penn Treebank, that is, if the canonical version of
label is equal to "S" . |
boolean |
isVerb(Sexp preterminal)
Returns true if preterminal represents a
terminal with one of the following parts of speech: VB, VBD, VBG,
VBN, VBP or VBZ. |
boolean |
isVerbTag(Symbol tag)
Returns true if the specified symbol is the part of speech
tag of a verb. |
boolean |
isWHNP(Symbol label)
Returns true if the canonical version of the specified label
is a WHNP in the Chinese Treebank. |
Symbol |
NPLabel()
Returns the symbol that AbstractTraining.addBaseNPs(Sexp) should
add as a parent if a base NP is not dominated by an NP. |
Nonterminal |
parseNonterminal(Symbol label,
Nonterminal nonterminal)
Calls AbstractTreebank.defaultParseNonterminal(Symbol, Nonterminal) with
the specified arguments. |
Symbol |
sentenceLabel()
Returns the canonical label for a sentence, for de-transforming sentences that were transformed via Training.relabelSubjectlessSentences(Sexp) . |
Symbol |
subjectAugmentation()
Returns the symbol that is used to augment nonterminals to indicate matrix subjects in this language’s Treebank. |
Symbol |
subjectlessSentenceLabel()
Returns the symbol that relabelSubjectlessSentences
will use for sentences that have no subjects. |
Methods inherited from class danbikel.parser.lang.AbstractTreebank |
---|
addAugmentation, canonicalAugDelimiter, constructPreterminal, containsAugmentation, defaultParseNonterminal, getTag, getTraceIndex, isAugDelim, isBaseNP, makeWord, nonTreebankDelimiter, nonTreebankLeftBracket, nonTreebankRightBracket, parseNonterminal, removeAugmentation, removeAugmentation, stripAllButIndex, stripAllButIndex, stripAugmentation, stripIndex, stripIndex |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public Treebank()
Constructs a Chinese Treebank object.
Method Detail |
---|
public final boolean isPreterminal(Sexp tree)
Returns true if tree represents a preterminal subtree (part-of-speech tag and word). Specifically, this method returns true if tree is an instance of SexpList, has a length of 2 and has a first list element of type Symbol.
Specified by: isPreterminal in interface Treebank
Overrides: isPreterminal in class AbstractTreebank
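The structural test described for isPreterminal can be illustrated without the parser's own classes. The following is a minimal sketch, assuming a java.util.List stands in for SexpList and a String stands in for Symbol; PreterminalSketch and its method are hypothetical names, not part of the danbikel API. It follows the description as written, so only the first list element is checked.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins: a java.util.List plays the role of SexpList and a
// String plays the role of Symbol. This is not the danbikel implementation.
final class PreterminalSketch {

    // Applies the documented test: a list of length 2 whose first element is a symbol.
    static boolean isPreterminal(Object tree) {
        if (!(tree instanceof List)) {
            return false; // a bare symbol is not a list
        }
        List<?> list = (List<?>) tree;
        return list.size() == 2 && list.get(0) instanceof String;
    }

    public static void main(String[] args) {
        // (NN shu): a tag paired with a placeholder word
        Object preterminal = Arrays.asList("NN", "shu");
        // (NP (DT zhe) (NN shu)): a phrase with two children
        Object phrase = Arrays.asList("NP",
                Arrays.asList("DT", "zhe"), Arrays.asList("NN", "shu"));
        System.out.println(isPreterminal(preterminal)); // true
        System.out.println(isPreterminal(phrase));      // false: length is 3
        System.out.println(isPreterminal("NN"));        // false: not a list
    }
}
```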
public boolean isSentence(Symbol label)
Returns true if the specified nonterminal label represents a sentence in the Penn Treebank, that is, if the canonical version of label is equal to "S".
Specified by: isSentence in interface Treebank
Overrides: isSentence in class AbstractTreebank
See Also: Training.relabelSubjectlessSentences(Sexp)
public Symbol sentenceLabel()
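Description copied from class: AbstractTreebank
Returns the canonical label for a sentence, for de-transforming sentences that were transformed via Training.relabelSubjectlessSentences(Sexp).
Specified by: sentenceLabel in interface Treebank
Overrides: sentenceLabel in class AbstractTreebank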
public Symbol subjectlessSentenceLabel()
Returns the symbol that relabelSubjectlessSentences will use for sentences that have no subjects.
Specified by: subjectlessSentenceLabel in interface Treebank
Overrides: subjectlessSentenceLabel in class AbstractTreebank
public Symbol subjectAugmentation()
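Description copied from class: AbstractTreebank
Returns the symbol that is used to augment nonterminals to indicate matrix subjects in this language’s Treebank.
Specified by: subjectAugmentation in interface Treebank
Overrides: subjectAugmentation in class AbstractTreebank
See Also: Training.relabelSubjectlessSentences(Sexp)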
public boolean isNullElementPreterminal(Sexp tree)
Returns true if the specified S-expression represents a preterminal whose terminal element is the null element ("-NONE-") for the Chinese Treebank.
N.B.: Some null elements in the Chinese Treebank have indices appended. Consequently, this method simply checks whether the print name of the preterminal starts with the string -NONE-.
Specified by: isNullElementPreterminal in interface Treebank
Overrides: isNullElementPreterminal in class AbstractTreebank
See Also: Training.relabelSubjectlessSentences(Sexp)
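The prefix check mentioned in the note for isNullElementPreterminal can be sketched as follows. This is an illustration only, assuming the print name is available as a plain String; NullElementSketch, isNullElementTag and the indexed form "-NONE-1" are hypothetical and are not taken from the danbikel API or from the treebank itself.

```java
// Hypothetical stand-in: the print name is modeled as a plain String.
// This is not the danbikel implementation.
final class NullElementSketch {
    static final String NULL_ELEMENT_TAG = "-NONE-";

    // Prefix test, so that names with an appended index are still recognized.
    static boolean isNullElementTag(String printName) {
        return printName.startsWith(NULL_ELEMENT_TAG);
    }

    public static void main(String[] args) {
        System.out.println(isNullElementTag("-NONE-"));  // true
        System.out.println(isNullElementTag("-NONE-1")); // true (illustrative indexed form)
        System.out.println(isNullElementTag("NN"));      // false
    }
}
```

An exact equality test would miss the indexed forms, which is presumably why the documentation describes a prefix comparison.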
public boolean isPuncToRaise(Sexp preterm)
Returns true if the specified S-expression is a preterminal whose part of speech is "," or ".".
Specified by: isPuncToRaise in interface Treebank
Overrides: isPuncToRaise in class AbstractTreebank
Parameters: preterm - the preterminal to test
See Also: Training.raisePunctuation(Sexp)
public boolean isPunctuation(Symbol tag)
Description copied from class: AbstractTreebank
Returns true if the specified part of speech tag is one for which AbstractTreebank.isPuncToRaise(Sexp) would return true.
Specified by: isPunctuation in interface Treebank
Overrides: isPunctuation in class AbstractTreebank
Parameters: tag - the part of speech to test
See Also: AbstractTreebank.isPuncToRaise(Sexp)
public boolean isPossessivePreterminal(Sexp tree)
Returns true if the specified S-expression represents a preterminal that is the possessive part of speech. This method is intended to be used by implementations of AbstractTraining.addBaseNPs(Sexp).
Specified by: isPossessivePreterminal in interface Treebank
Overrides: isPossessivePreterminal in class AbstractTreebank
See Also: Training.addBaseNPs(Sexp)
public boolean isNP(Symbol label)
Returns true if the canonical version of the specified label is an NP for the Chinese Treebank.
Specified by: isNP in interface Treebank
Overrides: isNP in class AbstractTreebank
Parameters: label - the label to test
See Also: AbstractTraining.addBaseNPs(Sexp)
public Symbol baseNPLabel()
Returns the symbol with which AbstractTraining.addBaseNPs(Sexp) will relabel base NPs.
Specified by: baseNPLabel in interface Treebank
Overrides: baseNPLabel in class AbstractTreebank
See Also: AbstractTraining.addBaseNPs(danbikel.lisp.Sexp)
public boolean isWHNP(Symbol label)
Returns true if the canonical version of the specified label is a WHNP in the Chinese Treebank.
Specified by: isWHNP in interface Treebank
Overrides: isWHNP in class AbstractTreebank
See Also: AbstractTraining.addGapInformation(Sexp)
public Symbol NPLabel()
Returns the symbol that AbstractTraining.addBaseNPs(Sexp) should add as a parent if a base NP is not dominated by an NP.
Specified by: NPLabel in interface Treebank
Overrides: NPLabel in class AbstractTreebank
See Also: Training.addBaseNPs(Sexp)
public boolean isConjunction(Symbol label)
Returns true if label is equal to the symbol whose print name is "CC".
Specified by: isConjunction in interface Treebank
Overrides: isConjunction in class AbstractTreebank
public boolean isVerb(Sexp preterminal)
Returns true if preterminal represents a terminal with one of the following parts of speech: VB, VBD, VBG, VBN, VBP or VBZ. It is an error to call this method with a Sexp object for which isPreterminal(Sexp) returns false.
Specified by: isVerb in interface Treebank
Overrides: isVerb in class AbstractTreebank
Parameters: preterminal - the preterminal to test
Returns: true if preterminal is a verb
See Also: HeadTreeNode, Trainer
public boolean isVerbTag(Symbol tag)
Description copied from class: AbstractTreebank
Returns true if the specified symbol is the part of speech tag of a verb. This method should return true for exactly the same parts of speech for which AbstractTreebank.isVerb(Sexp) returns true, and is used to calculate the distance metric while decoding.
Specified by: isVerbTag in interface Treebank
Overrides: isVerbTag in class AbstractTreebank
See Also: CKYItem.containsVerb(), Decoder
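One way to keep isVerb and isVerbTag in agreement, as the description above requires, is to route both through a single tag set. The sketch below assumes Strings stand in for Symbol and a two-element List stands in for a preterminal; VerbTagSketch is a hypothetical name, and the tag list is simply the one quoted in the isVerb description, not a verified inventory of Chinese Treebank verb tags.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins: String for Symbol, List<String> for a (tag word)
// preterminal. This is not the danbikel implementation.
final class VerbTagSketch {
    // Tag list copied from the isVerb description in this document.
    static final Set<String> VERB_TAGS = new HashSet<>(
            Arrays.asList("VB", "VBD", "VBG", "VBN", "VBP", "VBZ"));

    static boolean isVerbTag(String tag) {
        return VERB_TAGS.contains(tag);
    }

    // Delegates to isVerbTag, so the two methods cannot drift apart.
    static boolean isVerb(List<String> preterminal) {
        return isVerbTag(preterminal.get(0));
    }

    public static void main(String[] args) {
        System.out.println(isVerbTag("VBD"));                     // true
        System.out.println(isVerb(Arrays.asList("VBD", "word"))); // true
        System.out.println(isVerb(Arrays.asList("NN", "word")));  // false
    }
}
```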
public boolean isComma(Symbol word)
Description copied from class: AbstractTreebank
Returns true if the specified word is a comma. This method is used by the Decoder class when applying the comma constraint to chart items.
Specified by: isComma in interface Treebank
Overrides: isComma in class AbstractTreebank
Parameters: word - the word to test
See Also: Settings.decoderUseCommaConstraint
public boolean isLeftParen(Symbol word)
Description copied from class: AbstractTreebank
Returns true if the specified word is a left parenthesis. This method is used by the Decoder class when applying the comma constraint to chart items.
Specified by: isLeftParen in interface Treebank
Overrides: isLeftParen in class AbstractTreebank
Parameters: word - the word to test
See Also: Settings.decoderUseCommaConstraint
public boolean isRightParen(Symbol word)
Description copied from class: AbstractTreebank
Returns true if the specified word is a right parenthesis. This method is used by the Decoder class when applying the comma constraint to chart items.
Specified by: isRightParen in interface Treebank
Overrides: isRightParen in class AbstractTreebank
Parameters: word - the word to test
See Also: Settings.decoderUseCommaConstraint
public final Symbol getCanonical(Symbol label)
Returns a canonical mapping for the specified nonterminal label; if label is already in canonical form, it is returned. The canonical mapping refers to transformations performed on nonterminals during the training process. A label is also stripped of all augmentations before its canonical form is obtained (see AbstractTreebank.stripAugmentation(Symbol)).
Specified by: getCanonical in interface Treebank
Overrides: getCanonical in class AbstractTreebank
Parameters: label - the label to be canonicalized
Returns: a Symbol with the same print name as label, except that all training transformations and Treebank augmentations have been undone and stripped
See Also: HeadFinder.findHead(Sexp)
public final Symbol getCanonical(Symbol label, boolean stripAugmentations)
When the stripAugmentations argument is true, this method returns the same value as would be returned by getCanonical(Symbol) when passed the label argument; otherwise, the specified nonterminal is canonicalized unless it contains augmentations, in which case it is returned untouched.
Specified by: getCanonical in interface Treebank
Overrides: getCanonical in class AbstractTreebank
Parameters: label - the nonterminal label for which a canonical form is to be returned
stripAugmentations - whether to strip augmentations from the specified nonterminal label before canonicalization
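The branching described for the two getCanonical overloads can be sketched as below. This is a rough model under several assumptions: labels are plain Strings, the canonical mapping is a hypothetical one-entry table (NPB to NP), stripping means cutting the label at the first delimiter character after position 0, and "NP-SBJ" is only an illustrative augmented label. None of these details is taken from the danbikel source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the documented behavior; not the danbikel implementation.
final class CanonicalSketch {
    static final String AUG_DELIMS = "-=|"; // delimiters documented for this class
    static final Map<String, String> CANONICAL = new HashMap<>();
    static {
        CANONICAL.put("NPB", "NP"); // assumed training transformation, for illustration
    }

    // Index of the first delimiter after position 0, or -1 if there is none.
    // Skipping position 0 is a simplifying assumption for labels such as "-NONE-".
    static int firstDelim(String label) {
        for (int i = 1; i < label.length(); i++) {
            if (AUG_DELIMS.indexOf(label.charAt(i)) >= 0) {
                return i;
            }
        }
        return -1;
    }

    static String stripAugmentation(String label) {
        int i = firstDelim(label);
        return i < 0 ? label : label.substring(0, i);
    }

    // One-argument form: strip augmentations, then undo training transformations.
    static String getCanonical(String label) {
        String stripped = stripAugmentation(label);
        return CANONICAL.getOrDefault(stripped, stripped);
    }

    // Two-argument form: augmented labels pass through untouched when not stripping.
    static String getCanonical(String label, boolean stripAugmentations) {
        if (stripAugmentations) {
            return getCanonical(label);
        }
        if (firstDelim(label) >= 0) {
            return label;
        }
        return CANONICAL.getOrDefault(label, label);
    }

    public static void main(String[] args) {
        System.out.println(getCanonical("NP-SBJ", true));  // NP
        System.out.println(getCanonical("NP-SBJ", false)); // NP-SBJ
        System.out.println(getCanonical("NPB", false));    // NP
    }
}
```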
public Nonterminal parseNonterminal(Symbol label, Nonterminal nonterminal)
Calls AbstractTreebank.defaultParseNonterminal(Symbol, Nonterminal) with the specified arguments.
Specified by: parseNonterminal in interface Treebank
Overrides: parseNonterminal in class AbstractTreebank
Parameters: label - the nonterminal label to parse
nonterminal - the Nonterminal object to fill with the components of label
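public String augmentationDelimiters()
Returns a string of the three characters that serve as augmentation delimiters in the Chinese Treebank: "-=|".
Specified by: augmentationDelimiters in interface Treebank
Overrides: augmentationDelimiters in class AbstractTreebank
See Also: AbstractTreebank.stripAugmentation(Symbol), AbstractTreebank.defaultParseNonterminal(Symbol,Nonterminal)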
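To show what a delimiter string such as "-=|" is useful for, the sketch below splits an augmented label into its components with the standard java.util.StringTokenizer. It is an illustration only: AugmentationDelimSketch is a hypothetical name, "NP-SBJ=1" is a made-up label, and the real defaultParseNonterminal may treat indices and exceptional labels differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical illustration; not the danbikel implementation.
final class AugmentationDelimSketch {
    static final String DELIMS = "-=|"; // the string documented for augmentationDelimiters()

    // Splits a label at every delimiter character, discarding the delimiters.
    static List<String> split(String label) {
        List<String> parts = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(label, DELIMS);
        while (tokens.hasMoreTokens()) {
            parts.add(tokens.nextToken());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("NP-SBJ=1")); // [NP, SBJ, 1]
        System.out.println(split("VP"));       // [VP]
    }
}
```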