Parsing Engine

danbikel.parser.chinese
Class Treebank

java.lang.Object
  extended by danbikel.parser.lang.AbstractTreebank
      extended by danbikel.parser.chinese.Treebank
All Implemented Interfaces:
Treebank, Serializable

public class Treebank
extends AbstractTreebank

Provides data and methods specific to the structures found in the Chinese Treebank or any other treebank that conforms to the same annotation guidelines.

See Also:
Serialized Form

Field Summary
 
Fields inherited from class danbikel.parser.lang.AbstractTreebank
augmentationDelimSet, canonicalAugDelimSym, nonterminalExceptionSet
 
Constructor Summary
Treebank()
          Constructs a Chinese Treebank object.
 
Method Summary
 String augmentationDelimiters()
          Returns a string of the three characters that serve as augmentation delimiters in the Chinese Treebank: "-=|".
 Symbol baseNPLabel()
          Returns the symbol with which AbstractTraining.addBaseNPs(Sexp) will relabel base NPs.
 Symbol getCanonical(Symbol label)
          Returns a canonical mapping for the specified nonterminal label; if label already is in canonical form, it is returned.
 Symbol getCanonical(Symbol label, boolean stripAugmentations)
          When the stripAugmentations argument is true, this method returns the same value as would be returned by getCanonical(Symbol) when passed the label argument; otherwise, the specified nonterminal is canonicalized unless it contains augmentations, in which case it is returned untouched.
 boolean isComma(Symbol word)
          Returns true if the specified word is a comma.
 boolean isConjunction(Symbol label)
          Returns true if label is equal to the symbol whose print name is "CC".
 boolean isLeftParen(Symbol word)
          Returns true if the specified word is a left parenthesis.
 boolean isNP(Symbol label)
          Returns true if the canonical version of the specified label is an NP for the Chinese Treebank.
 boolean isNullElementPreterminal(Sexp tree)
          Returns true if the specified S-expression represents a preterminal whose terminal element is the null element ("-NONE-") for the Chinese Treebank.
 boolean isPossessivePreterminal(Sexp tree)
          Returns true if the specified S-expression represents a preterminal that is the possessive part of speech.
 boolean isPreterminal(Sexp tree)
          Returns true if tree represents a preterminal subtree (part-of-speech tag and word).
 boolean isPuncToRaise(Sexp preterm)
          Returns true if the specified S-expression is a preterminal whose part of speech is "," or ".".
 boolean isPunctuation(Symbol tag)
          Returns true if the specified part of speech tag is one for which AbstractTreebank.isPuncToRaise(Sexp) would return true.
 boolean isRightParen(Symbol word)
          Returns true if the specified word is a right parenthesis.
 boolean isSentence(Symbol label)
          Returns true if the specified nonterminal label represents a sentence in the Penn Treebank, that is, if the canonical version of label is equal to "S".
 boolean isVerb(Sexp preterminal)
          Returns true if preterminal represents a terminal with one of the following parts of speech: VB, VBD, VBG, VBN, VBP or VBZ.
 boolean isVerbTag(Symbol tag)
          Returns true if the specified symbol is the part of speech tag of a verb.
 boolean isWHNP(Symbol label)
          Returns true if the canonical version of the specified label is a WHNP in the Chinese Treebank.
 Symbol NPLabel()
          Returns the symbol that AbstractTraining.addBaseNPs(Sexp) should add as a parent if a base NP is not dominated by an NP.
 Nonterminal parseNonterminal(Symbol label, Nonterminal nonterminal)
          Calls AbstractTreebank.defaultParseNonterminal(Symbol, Nonterminal) with the specified arguments.
 Symbol sentenceLabel()
          Returns the canonical label for a sentence, for de-transforming sentences that were transformed via Training.relabelSubjectlessSentences(Sexp).
 Symbol subjectAugmentation()
          Returns the symbol that is used to augment nonterminals to indicate matrix subjects in this language’s Treebank.
 Symbol subjectlessSentenceLabel()
          Returns the symbol that relabelSubjectlessSentences will use for sentences that have no subjects.
 
Methods inherited from class danbikel.parser.lang.AbstractTreebank
addAugmentation, canonicalAugDelimiter, constructPreterminal, containsAugmentation, defaultParseNonterminal, getTag, getTraceIndex, isAugDelim, isBaseNP, makeWord, nonTreebankDelimiter, nonTreebankLeftBracket, nonTreebankRightBracket, parseNonterminal, removeAugmentation, removeAugmentation, stripAllButIndex, stripAllButIndex, stripAugmentation, stripIndex, stripIndex
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Treebank

public Treebank()
Constructs a Chinese Treebank object.

Method Detail

isPreterminal

public final boolean isPreterminal(Sexp tree)
Returns true if tree represents a preterminal subtree (part-of-speech tag and word). Specifically, this method returns true if tree is an instance of SexpList, has a length of 2 and has a first list element of type Symbol.

Specified by:
isPreterminal in interface Treebank
Specified by:
isPreterminal in class AbstractTreebank
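
The three conditions named above can be illustrated with a minimal, self-contained sketch. It does not use the danbikel.lisp classes: a plain String stands in for Symbol and a java.util.List stands in for SexpList, and the class name PreterminalSketch and the sample trees are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class PreterminalSketch {
    // Assumption: String stands in for Symbol and List for SexpList.
    // Only the three conditions from the description are checked:
    // the tree is a list, its length is 2, its first element is a symbol.
    static boolean isPreterminal(Object tree) {
        if (!(tree instanceof List)) {
            return false;
        }
        List<?> list = (List<?>) tree;
        return list.size() == 2 && list.get(0) instanceof String;
    }

    public static void main(String[] args) {
        // (NN shu): a part-of-speech tag followed by a word
        Object preterminal = Arrays.asList("NN", "shu");
        // (NP (NN shu) (NN dian)): a phrase with two children
        Object phrase = Arrays.asList("NP",
                Arrays.asList("NN", "shu"),
                Arrays.asList("NN", "dian"));

        System.out.println(isPreterminal(preterminal)); // true
        System.out.println(isPreterminal(phrase));      // false (length is 3)
        System.out.println(isPreterminal("NN"));        // false (not a list)
    }
}
```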

isSentence

public boolean isSentence(Symbol label)
Returns true if the specified nonterminal label represents a sentence in the Penn Treebank, that is, if the canonical version of label is equal to "S".

Specified by:
isSentence in interface Treebank
Specified by:
isSentence in class AbstractTreebank
See Also:
Training.relabelSubjectlessSentences(Sexp)

sentenceLabel

public Symbol sentenceLabel()
Description copied from class: AbstractTreebank
Returns the canonical label for a sentence, for de-transforming sentences that were transformed via Training.relabelSubjectlessSentences(Sexp).

Specified by:
sentenceLabel in interface Treebank
Specified by:
sentenceLabel in class AbstractTreebank

subjectlessSentenceLabel

public Symbol subjectlessSentenceLabel()
Returns the symbol that relabelSubjectlessSentences will use for sentences that have no subjects.

Specified by:
subjectlessSentenceLabel in interface Treebank
Specified by:
subjectlessSentenceLabel in class AbstractTreebank

subjectAugmentation

public Symbol subjectAugmentation()
Description copied from class: AbstractTreebank
Returns the symbol that is used to augment nonterminals to indicate matrix subjects in this language’s Treebank.

Specified by:
subjectAugmentation in interface Treebank
Specified by:
subjectAugmentation in class AbstractTreebank
See Also:
Training.relabelSubjectlessSentences(Sexp)

isNullElementPreterminal

public boolean isNullElementPreterminal(Sexp tree)
Returns true if the specified S-expression represents a preterminal whose terminal element is the null element ("-NONE-") for the Chinese Treebank.

N.B.: Some null elements in the Chinese Treebank have indices appended. Consequently, this method simply checks if the print name of the preterminal starts with the string -NONE-.

Specified by:
isNullElementPreterminal in interface Treebank
Specified by:
isNullElementPreterminal in class AbstractTreebank
See Also:
Training.relabelSubjectlessSentences(Sexp)
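
The prefix test described in the note above can be sketched as follows. This is an illustration only: it works on the print name as a plain String rather than on Sexp or Symbol objects, and the class name NullElementSketch and the indexed sample tag are hypothetical.

```java
public class NullElementSketch {
    static final String NULL_ELEMENT_TAG = "-NONE-";

    // tag is assumed to be the print name of the preterminal's
    // part-of-speech symbol. A prefix test is used instead of an
    // equality test so that tags with an appended index still match.
    static boolean isNullElementTag(String tag) {
        return tag.startsWith(NULL_ELEMENT_TAG);
    }

    public static void main(String[] args) {
        System.out.println(isNullElementTag("-NONE-"));  // true
        System.out.println(isNullElementTag("-NONE-1")); // true (hypothetical indexed tag)
        System.out.println(isNullElementTag("NN"));      // false
    }
}
```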

isPuncToRaise

public boolean isPuncToRaise(Sexp preterm)
Returns true if the specified S-expression is a preterminal whose part of speech is "," or ".".

Specified by:
isPuncToRaise in interface Treebank
Specified by:
isPuncToRaise in class AbstractTreebank
Parameters:
preterm - the preterminal to test
See Also:
Training.raisePunctuation(Sexp)

isPunctuation

public boolean isPunctuation(Symbol tag)
Description copied from class: AbstractTreebank
Returns true if the specified part of speech tag is one for which AbstractTreebank.isPuncToRaise(Sexp) would return true.

Specified by:
isPunctuation in interface Treebank
Specified by:
isPunctuation in class AbstractTreebank
Parameters:
tag - the part of speech to test
See Also:
AbstractTreebank.isPuncToRaise(Sexp)

isPossessivePreterminal

public boolean isPossessivePreterminal(Sexp tree)
Returns true if the specified S-expression represents a preterminal that is the possessive part of speech. This method is intended to be used by implementations of AbstractTraining.addBaseNPs(Sexp).

Specified by:
isPossessivePreterminal in interface Treebank
Specified by:
isPossessivePreterminal in class AbstractTreebank
See Also:
Training.addBaseNPs(Sexp)

isNP

public boolean isNP(Symbol label)
Returns true if the canonical version of the specified label is an NP for the Chinese Treebank.

Specified by:
isNP in interface Treebank
Specified by:
isNP in class AbstractTreebank
Parameters:
label - the label to test
See Also:
AbstractTraining.addBaseNPs(Sexp)

baseNPLabel

public Symbol baseNPLabel()
Returns the symbol with which AbstractTraining.addBaseNPs(Sexp) will relabel base NPs.

Specified by:
baseNPLabel in interface Treebank
Specified by:
baseNPLabel in class AbstractTreebank
See Also:
AbstractTraining.addBaseNPs(danbikel.lisp.Sexp)

isWHNP

public boolean isWHNP(Symbol label)
Returns true if the canonical version of the specified label is a WHNP in the Chinese Treebank.

Specified by:
isWHNP in interface Treebank
Specified by:
isWHNP in class AbstractTreebank
See Also:
AbstractTraining.addGapInformation(Sexp)

NPLabel

public Symbol NPLabel()
Returns the symbol that AbstractTraining.addBaseNPs(Sexp) should add as a parent if a base NP is not dominated by an NP.

Specified by:
NPLabel in interface Treebank
Specified by:
NPLabel in class AbstractTreebank
See Also:
Training.addBaseNPs(Sexp)

isConjunction

public boolean isConjunction(Symbol label)
Returns true if label is equal to the symbol whose print name is "CC".

Specified by:
isConjunction in interface Treebank
Specified by:
isConjunction in class AbstractTreebank

isVerb

public boolean isVerb(Sexp preterminal)
Returns true if preterminal represents a terminal with one of the following parts of speech: VB, VBD, VBG, VBN, VBP or VBZ. It is an error to call this method with a Sexp object for which isPreterminal(Sexp) returns false.

Specified by:
isVerb in interface Treebank
Specified by:
isVerb in class AbstractTreebank
Parameters:
preterminal - the preterminal to test
Returns:
true if preterminal is a verb
See Also:
HeadTreeNode, Trainer

isVerbTag

public boolean isVerbTag(Symbol tag)
Description copied from class: AbstractTreebank
Returns true if the specified symbol is the part of speech tag of a verb. This method should return true for exactly the same parts of speech for which AbstractTreebank.isVerb(Sexp) returns true, and is used to calculate the distance metric while decoding.

Specified by:
isVerbTag in interface Treebank
Specified by:
isVerbTag in class AbstractTreebank
See Also:
CKYItem.containsVerb(), Decoder

isComma

public boolean isComma(Symbol word)
Description copied from class: AbstractTreebank
Returns true if the specified word is a comma. This method is used by the Decoder class when performing the comma constraint on chart items.

Specified by:
isComma in interface Treebank
Specified by:
isComma in class AbstractTreebank
Parameters:
word - the word to test
See Also:
Settings.decoderUseCommaConstraint

isLeftParen

public boolean isLeftParen(Symbol word)
Description copied from class: AbstractTreebank
Returns true if the specified word is a left parenthesis. This method is used by the Decoder class when performing the comma constraint on chart items.

Specified by:
isLeftParen in interface Treebank
Specified by:
isLeftParen in class AbstractTreebank
Parameters:
word - the word to test
See Also:
Settings.decoderUseCommaConstraint

isRightParen

public boolean isRightParen(Symbol word)
Description copied from class: AbstractTreebank
Returns true if the specified word is a right parenthesis. This method is used by the Decoder class when performing the comma constraint on chart items.

Specified by:
isRightParen in interface Treebank
Specified by:
isRightParen in class AbstractTreebank
Parameters:
word - the word to test
See Also:
Settings.decoderUseCommaConstraint

getCanonical

public final Symbol getCanonical(Symbol label)
Returns a canonical mapping for the specified nonterminal label; if label already is in canonical form, it is returned. The canonical mapping refers to transformations performed on nonterminals during the training process. Before obtaining a label's canonical form, it is also stripped of all augmentations (see AbstractTreebank.stripAugmentation(Symbol)).

Specified by:
getCanonical in interface Treebank
Specified by:
getCanonical in class AbstractTreebank
Parameters:
label - the label to be canonicalized
Returns:
a Symbol with the same print name as label, except that all training transformations and Treebank augmentations have been undone and stripped
See Also:
HeadFinder.findHead(Sexp)

getCanonical

public final Symbol getCanonical(Symbol label,
                                 boolean stripAugmentations)
When the stripAugmentations argument is true, this method returns the same value as would be returned by getCanonical(Symbol) when passed the label argument; otherwise, the specified nonterminal is canonicalized unless it contains augmentations, in which case it is returned untouched.

Specified by:
getCanonical in interface Treebank
Specified by:
getCanonical in class AbstractTreebank
Parameters:
label - the nonterminal label for which a canonical form is to be returned
stripAugmentations - whether to strip augmentations from the specified nonterminal label before canonicalization
Returns:
a canonical version of the specified nonterminal label, unless stripAugmentations is false and the specified label contains one or more augmentations
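
The interaction between the two getCanonical overloads can be sketched as below. This is a simplified model, not the library's implementation: labels are plain Strings, the canonical map holds a single hypothetical entry (NPB to NP), and augmentation detection simply looks for the first delimiter character after position 0.

```java
import java.util.HashMap;
import java.util.Map;

public class CanonicalSketch {
    static final String AUG_DELIMS = "-=|";

    // Hypothetical canonical map with one illustrative entry.
    static final Map<String, String> CANONICAL = new HashMap<>();
    static {
        CANONICAL.put("NPB", "NP");
    }

    // Simplifying assumption: scanning starts at index 1 so that a label
    // beginning with a delimiter is not reduced to an empty string.
    static int firstDelimiter(String label) {
        for (int i = 1; i < label.length(); i++) {
            if (AUG_DELIMS.indexOf(label.charAt(i)) != -1) {
                return i;
            }
        }
        return -1;
    }

    static String stripAugmentation(String label) {
        int i = firstDelimiter(label);
        return i == -1 ? label : label.substring(0, i);
    }

    // One-argument form: always strips augmentations first.
    static String getCanonical(String label) {
        String stripped = stripAugmentation(label);
        return CANONICAL.getOrDefault(stripped, stripped);
    }

    // Two-argument form: an augmented label is returned untouched
    // when stripAugmentations is false.
    static String getCanonical(String label, boolean stripAugmentations) {
        if (stripAugmentations) {
            return getCanonical(label);
        }
        if (firstDelimiter(label) != -1) {
            return label;
        }
        return CANONICAL.getOrDefault(label, label);
    }

    public static void main(String[] args) {
        System.out.println(getCanonical("NPB-SBJ", true));  // NP
        System.out.println(getCanonical("NPB-SBJ", false)); // NPB-SBJ
        System.out.println(getCanonical("NPB", false));     // NP
    }
}
```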

parseNonterminal

public Nonterminal parseNonterminal(Symbol label,
                                    Nonterminal nonterminal)
Calls AbstractTreebank.defaultParseNonterminal(Symbol, Nonterminal) with the specified arguments.

Specified by:
parseNonterminal in interface Treebank
Specified by:
parseNonterminal in class AbstractTreebank
Parameters:
label - the nonterminal label to parse
nonterminal - the Nonterminal object to fill with the components of label

augmentationDelimiters

public String augmentationDelimiters()
Returns a string of the three characters that serve as augmentation delimiters in the Chinese Treebank: "-=|".

Specified by:
augmentationDelimiters in interface Treebank
Specified by:
augmentationDelimiters in class AbstractTreebank
See Also:
AbstractTreebank.stripAugmentation(Symbol), AbstractTreebank.defaultParseNonterminal(Symbol,Nonterminal)
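
To show what the three delimiter characters mean in practice, the sketch below splits a label string on "-=|" using java.util.StringTokenizer. It is an illustration of the delimiter set only and makes no claim about how defaultParseNonterminal is implemented; the class name AugmentationSplitSketch and the sample label are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class AugmentationSplitSketch {
    // The delimiter string documented for augmentationDelimiters().
    static final String AUG_DELIMS = "-=|";

    // Splits a label into its base category followed by its augmentations.
    static List<String> split(String label) {
        List<String> parts = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(label, AUG_DELIMS);
        while (tokens.hasMoreTokens()) {
            parts.add(tokens.nextToken());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("NP-SBJ=2")); // [NP, SBJ, 2]
        System.out.println(split("VP"));       // [VP]
    }
}
```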

Author: Dan Bikel.