Parsing Engine

danbikel.parser.chinese
Class WordFeatures

java.lang.Object
  extended by danbikel.parser.lang.AbstractWordFeatures
      extended by danbikel.parser.chinese.WordFeatures
All Implemented Interfaces:
WordFeatures, Serializable

public class WordFeatures
extends AbstractWordFeatures

WordFeatures are orthographic and morphological features of words. Specifically, the word features encoded by the methods of this class are:

  1. capitalization
  2. hyphenation
  3. inflection
  4. derivation
  5. numeric

The features are encoded into a single symbol of the form CcHhIiDdNn, where c encodes capitalization, h encodes hyphenation, i encodes inflection, d encodes derivation and n encodes the numeric feature. For example, "C3H0I0D3N0" encodes the features of the word "Geography": capitalized in a non-sentence-initial position, no hyphenation, no inflection, "graphy" derivation and non-numeric.
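
The CcHhIiDdNn layout described above can be illustrated with a small standalone sketch. This is not the code of this class: FeatureSymbolSketch, encode and decode are hypothetical names, plain strings stand in for Symbol objects, and the sketch assumes every feature code is a non-negative decimal integer (the documentation only shows single-digit codes).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the CcHhIiDdNn layout; not part of danbikel.parser.
public class FeatureSymbolSketch {
    // Assumption: each feature code is a non-negative decimal integer.
    private static final Pattern FORMAT =
        Pattern.compile("C(\\d+)H(\\d+)I(\\d+)D(\\d+)N(\\d+)");

    /** Builds a feature string from the five feature codes, in the documented order. */
    static String encode(int cap, int hyph, int infl, int deriv, int num) {
        return "C" + cap + "H" + hyph + "I" + infl + "D" + deriv + "N" + num;
    }

    /** Splits a feature string back into its five codes: {c, h, i, d, n}. */
    static int[] decode(String symbol) {
        Matcher m = FORMAT.matcher(symbol);
        if (!m.matches()) {
            throw new IllegalArgumentException("not of the form CcHhIiDdNn: " + symbol);
        }
        int[] codes = new int[5];
        for (int k = 0; k < codes.length; k++) {
            codes[k] = Integer.parseInt(m.group(k + 1));
        }
        return codes;
    }

    public static void main(String[] args) {
        // The codes documented for the word "Geography".
        String symbol = encode(3, 0, 0, 3, 0);
        System.out.println(symbol); // prints C3H0I0D3N0

        int[] codes = decode(symbol);
        System.out.println("derivation code: " + codes[3]); // prints derivation code: 3
    }
}
```

Because all five features share one symbol, two words receive the same feature symbol exactly when they agree on every feature code.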

See Also:
Serialized Form

Field Summary
static String useUnderscoresProperty
          The property, obtained from the Settings class, that indicates whether underscores are considered when creating the feature vector.
 
Fields inherited from class danbikel.parser.lang.AbstractWordFeatures
unknownWordSym
 
Constructor Summary
WordFeatures()
           
 
Method Summary
 Symbol defaultFeatureVector()
          The symbol that represents the case where none of the features fires for a particular word.
 Symbol features(Symbol word, boolean firstWord)
          Returns the features of the specified word, encoded as a single symbol.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

useUnderscoresProperty

public static final String useUnderscoresProperty
The property, obtained from the Settings class, that indicates whether underscores are considered when creating the feature vector.

See Also:
Constant Field Values
Constructor Detail

WordFeatures

public WordFeatures()
Method Detail

features

public Symbol features(Symbol word,
                       boolean firstWord)
Returns the features of the specified word, encoded as a single symbol.

Specified by:
features in interface WordFeatures
Overrides:
features in class AbstractWordFeatures
Parameters:
word - the word whose features are to be encoded.
firstWord - indicates whether word is the first word of the sentence in which it occurs.
Returns:
the encoded feature symbol.
See Also:
AbstractWordFeatures.unknownWordSym
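
As a rough sketch of how the firstWord flag might be supplied by a caller, the example below walks a sentence and passes true only for the word at index 0. FirstWordSketch, FeatureFunction and featuresForSentence are hypothetical names introduced for illustration; the interface only mirrors the shape of features(Symbol, boolean), with plain strings standing in for Symbol objects, and the toy function in main is not the feature logic of this class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical caller-side sketch; none of these names belong to danbikel.parser.
public class FirstWordSketch {
    /** Stand-in with the same shape as features(Symbol word, boolean firstWord). */
    interface FeatureFunction {
        String features(String word, boolean firstWord);
    }

    /** Applies fn to every word, flagging only the word at index 0 as sentence-initial. */
    static List<String> featuresForSentence(List<String> words, FeatureFunction fn) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            result.add(fn.features(words.get(i), i == 0));
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy function that only records the flag it was given.
        FeatureFunction toy = (word, firstWord) -> word + ":" + firstWord;
        List<String> out = featuresForSentence(Arrays.asList("Geography", "matters"), toy);
        System.out.println(out); // prints [Geography:true, matters:false]
    }
}
```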

defaultFeatureVector

public Symbol defaultFeatureVector()
Description copied from class: AbstractWordFeatures
The symbol that represents the case where none of the features fires for a particular word.

Specified by:
defaultFeatureVector in interface WordFeatures
Specified by:
defaultFeatureVector in class AbstractWordFeatures

Author: Dan Bikel.