Parsing Engine

danbikel.parser
Class Chart

java.lang.Object
  extended by danbikel.parser.Chart
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
CKYChart

public abstract class Chart
extends Object
implements Serializable

Provides the skeletal infrastructure for a chart indexed by start and end words, as well as by arbitrary labels taken from the chart items.

See Also:
Item, Serialized Form

Nested Class Summary
protected static class Chart.Entry
          Contains all information and items covering a particular span.
 
Field Summary
protected  int cellLimit
          The maximum number of items allowed in a cell (span) of this chart.
protected  Chart.Entry[][] chart
          A chart is a two-dimensional array of maps, each of which maps Item objects to their logProbs.
protected static boolean debugNumItemsGenerated
          Indicats to keep track of the number of items generated by the decoder and print that information to System.err whenever reclaimItemsInChart() is invoked.
protected static boolean debugNumPrunedItems
          Indicats to keep track of the number of items pruned when parsing and print that information to System.err whenever reclaimItemsInChart() is invoked.
protected  ObjectPool itemPool
          The pool of chart items, to be used by the decoder instead of constructing new chart items while decoding.
protected  int numPrePruned
          The total number of items pre-pruned for a particular sentence, via calls to the toPrune(int,int,Item) method.
protected  int numPruned
          The total number of items pruned during the parse of a particular sentence.
protected  double pruneFact
          The natural log of the prune factor for this chart.
protected  boolean pruning
          Indicates whether the chart is currently doing any pruning.
protected  boolean relax
          Indicates whether to use a more relaxed form of pruning (wider beam in certain instances, to be determined by the concrete subclass).
protected  int size
          The current size of the chart.
protected  int totalItems
          The total number of items added to this chart for a particular sentence (between calls to clear()).
protected  int totalItemsGenerated
          The total number of items generated, that is, the total number of items that a decoder attempts to add to this chart (used for debugging).
 
Constructor Summary
protected Chart()
          Constructs a new chart with the default chart size.
protected Chart(int size)
          Constructs a new chart with the specified chart size.
protected Chart(int cellLimit, double pruneFact)
          Constructs a new chart with a default initial chart size, and with the specified cell limit and prune factor.
protected Chart(int size, int cellLimit, double pruneFact)
          Constructs a new chart with the specified initial chart size, cell limit and prune factor.
 
Method Summary
 boolean add(int start, int end, Item item)
          Adds the specified item covering the specified span to this chart.
protected abstract  boolean cellLimitShouldApplyTo(Item item)
          Returns true if cell limiting should apply to the specified item.
 void clear()
          Checks every map of the chart covering a span less than or equal to size and clears it; if a chart entry is null, then a new map is created.
 void dontDoPruning()
          Tells this chart not to do any pruning.
protected  void dontRelax()
          Tells this chart not to use a more relaxed form of pruning (the default behavior).
 void doPruning()
          Indicates that the chart should prune.
 Iterator get(int start, int end)
          Returns an iterator over all chart items (having all labels) covering the specified span.
 Item getTopItem(int start, int end)
          Returns the item with the highest log probability covering the specified span, or null if this span has no items.
 double getTopLogProb(int start, int end)
          Returns the highest log probability of an item covering the specified span.
 int numItems(int start, int end)
          Returns the number of chart items covering the specified span.
protected  boolean outsideBeam(Item item, double topProb)
          Returns whether the specified chart item is outside the beam given the specified top log prob.
 void postParseCleanup()
          Called by Decoder.parse after parsing has finished for a particular sentence.
 void prune(int start, int end)
          Prunes away items in the specified span that are either below the probability threshold of the top-ranked item for that span, or are outside the cell limit, if one has been specified.
protected  void reclaimItem(Item item)
          Reclaims this chart item.
protected abstract  void reclaimItemCollection(Collection c)
          A hook called by reclaimItemsInChart() to allow subclasses to reclaim each span's collection of chart items.
protected  void reclaimItemsInChart()
          Reclaims all items currently in the chart.
protected  void relax()
          Tells this chart to use a more relaxed form of pruning (for example, a wider beam in certain circumstances).
 void resetTopLogProb(int start, int end)
          Resets the highest log probability of the specified span to be Constants.logOfZero.
 void setPruneFactor(double pruneFact)
          Sets the prune factor.
 void setSize(int size)
          Ensures that the size of the chart is at least as large as the specified size.
 void setSizeAndClear(int size)
          Sets this chart to be the specified size and clears it for parsing sentences of length less than or equal to the specified size.
protected abstract  void setUpItemPool()
          Sets up the item object pools.
protected  boolean toPrune(int start, int end, Item item)
          Returns true if the specified chart item should not be added to the specified set of items because its probability is not within pruneFact of the highest-ranked item.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

debugNumPrunedItems

protected static final boolean debugNumPrunedItems
Indicats to keep track of the number of items pruned when parsing and print that information to System.err whenever reclaimItemsInChart() is invoked.

See Also:
Constant Field Values

debugNumItemsGenerated

protected static final boolean debugNumItemsGenerated
Indicats to keep track of the number of items generated by the decoder and print that information to System.err whenever reclaimItemsInChart() is invoked.

See Also:
Constant Field Values

chart

protected Chart.Entry[][] chart
A chart is a two-dimensional array of maps, each of which maps Item objects to their logProbs. More specifically, a chart is a two-dimensional array of Chart.Entry objects, each of which contains one such map, as well as a data member that stores the top log probability of all the items covering the span of the chart entry.

See Also:
Chart.Entry

size

protected int size
The current size of the chart. This value should always be equal to or greater than the length of the parsed sentence.

See Also:
setSizeAndClear(int)

cellLimit

protected int cellLimit
The maximum number of items allowed in a cell (span) of this chart.

See Also:
Settings.decoderUseCellLimit

pruneFact

protected double pruneFact
The natural log of the prune factor for this chart. If items have a log probability that is lower than the top-ranked item's log probability minus this prune factor, they are pruned away.


relax

protected boolean relax
Indicates whether to use a more relaxed form of pruning (wider beam in certain instances, to be determined by the concrete subclass).


totalItems

protected int totalItems
The total number of items added to this chart for a particular sentence (between calls to clear()).


itemPool

protected transient ObjectPool itemPool
The pool of chart items, to be used by the decoder instead of constructing new chart items while decoding.


totalItemsGenerated

protected int totalItemsGenerated
The total number of items generated, that is, the total number of items that a decoder attempts to add to this chart (used for debugging).


numPruned

protected int numPruned
The total number of items pruned during the parse of a particular sentence. Typically, after all items have been added for a particular span, a decoder will invoke the prune(int,int) method for that span. The value of this data member after parsing is complete will reflect the total number of items pruned via calls to this method.

See Also:
prune(int,int)

numPrePruned

protected int numPrePruned
The total number of items pre-pruned for a particular sentence, via calls to the toPrune(int,int,Item) method.

See Also:
toPrune(int,int,Item)

pruning

protected boolean pruning
Indicates whether the chart is currently doing any pruning.

See Also:
doPruning(), dontDoPruning()
Constructor Detail

Chart

protected Chart()
Constructs a new chart with the default chart size. This instructor will be called, often implicitly, by the constructor of a subclass.


Chart

protected Chart(int size)
Constructs a new chart with the specified chart size. This constructor must be called by the constructor of a subclass.

Parameters:
size - the initial size of this chart

Chart

protected Chart(int cellLimit,
                double pruneFact)
Constructs a new chart with a default initial chart size, and with the specified cell limit and prune factor.

Parameters:
cellLimit - the limit to the number of items per cell
pruneFact - that log of the prune factor
See Also:
cellLimit, pruneFact

Chart

protected Chart(int size,
                int cellLimit,
                double pruneFact)
Constructs a new chart with the specified initial chart size, cell limit and prune factor.

Parameters:
size - the initial size of this chart
cellLimit - the limit to the number of items per cell
pruneFact - that log of the prune factor
See Also:
cellLimit, pruneFact
Method Detail

setSizeAndClear

public void setSizeAndClear(int size)
Sets this chart to be the specified size and clears it for parsing sentences of length less than or equal to the specified size. This method should be called prior to parsing every sentence. The default implementation of this method simply calls setSize(size) and then clear().

Parameters:
size - the size to be set for this chart

setSize

public void setSize(int size)
Ensures that the size of the chart is at least as large as the specified size.

Parameters:
size - the size to be set for this chart

clear

public void clear()
Checks every map of the chart covering a span less than or equal to size and clears it; if a chart entry is null, then a new map is created.


relax

protected void relax()
Tells this chart to use a more relaxed form of pruning (for example, a wider beam in certain circumstances). The exact nature of the relaxed pruning is implementation-dependent.


dontRelax

protected void dontRelax()
Tells this chart not to use a more relaxed form of pruning (the default behavior).

See Also:
relax()

outsideBeam

protected boolean outsideBeam(Item item,
                              double topProb)
Returns whether the specified chart item is outside the beam given the specified top log prob.

Parameters:
item - the item to be tested for inclusion within the beam
topProb - the log prob of the top-ranked item for the specified item's span
Returns:
true if the specified item is outisde the beam given the specified top log prob, false otherwise

toPrune

protected boolean toPrune(int start,
                          int end,
                          Item item)
Returns true if the specified chart item should not be added to the specified set of items because its probability is not within pruneFact of the highest-ranked item. This method is guaranteed to be called by add(int,int,Item) when a sorted set of chart items already exists (and hence contains at least one chart item).

Parameters:
start - the start of the span for the specified item
end - the end of the span for the specified item
item - the item being added to items
Returns:
true if the specified chart item should not be added to the set of items covering the specified span

prune

public void prune(int start,
                  int end)
Prunes away items in the specified span that are either below the probability threshold of the top-ranked item for that span, or are outside the cell limit, if one has been specified.

Parameters:
start - the start of the span in which to prune chart items
end - the end of the span in which to prune chart items
See Also:
cellLimit, pruneFact

cellLimitShouldApplyTo

protected abstract boolean cellLimitShouldApplyTo(Item item)
Returns true if cell limiting should apply to the specified item.

Parameters:
item - the item to be tested
Returns:
whether cell limiting should apply to the specified item

add

public boolean add(int start,
                   int end,
                   Item item)
Adds the specified item covering the specified span to this chart.

Caution: By convention, the caller is responsible for reclaiming unadded chart items (via reclaimItem(Item)). However, in the case where an old item is removed from the chart because a new, equivalent item has a greater probability, this method does not immediately reclaim the old item, but rather marks it as "garbage", via its setGarbage method. This behvaior is to prevent the following case. Suppose that the behavior of this method were to reclaim such "booted" chart items immediately. Imagine that the caller is attempting to add a collection of chart items, and that an item in that collection was added, but then was removed due to a subsequent, equivalent item in the same collection having greater probability. In this case, the caller would have no way to know that the previous item in its collection had been silently reclaimed, so that if it operated over that collection again, there would be a garbage item that, as far as it knew, had been added and was still in the chart. By marking such "booted" items as garbage instead of immediately reclaiming them, callers can check for items' garbage status when they repeatedly operate over collections of "added" items.

Parameters:
start - the index of the first word in the span covered by item
end - the index of the last word in the span covered by item
item - the item to be added to this chart
Returns:
true if item was added to the chart, false otherwise

getTopLogProb

public double getTopLogProb(int start,
                            int end)
Returns the highest log probability of an item covering the specified span.

Parameters:
start - the start of the span for which to get the top log prob
end - the end of the span for which to get the top log prob
Returns:
the top log prob for the specified span

getTopItem

public Item getTopItem(int start,
                       int end)
Returns the item with the highest log probability covering the specified span, or null if this span has no items.

Parameters:
start - the start of the span for which to get the top-ranked item
end - the end of the span for which to get the top-ranked item
Returns:
the top-ranked item for the specified span

resetTopLogProb

public void resetTopLogProb(int start,
                            int end)
Resets the highest log probability of the specified span to be Constants.logOfZero.

Parameters:
start - the beginning of the span whose highest log prob is to be reset
end - the end of the span whose highest log prob is to be reset

numItems

public int numItems(int start,
                    int end)
Returns the number of chart items covering the specified span.

Parameters:
start - the start of the span for which to retrieve the number of items
end - the end of the span for which to retrieve the number of items
Returns:
the number of items in this chart covering the specified span

get

public Iterator get(int start,
                    int end)
Returns an iterator over all chart items (having all labels) covering the specified span. The iterator returned is for read-only access, and thus an UnsupportedOperationException will be thrown if its remove method is invoked.

Parameters:
start - the start index of the span for which to get all items
end - the end index of the span for which to get all items
Returns:
an iterator over all chart items covering the specified span

reclaimItem

protected void reclaimItem(Item item)
Reclaims this chart item. This method returns the specified item to the object pool of available items.

Parameters:
item - the item to be reclaimed

postParseCleanup

public void postParseCleanup()
Called by Decoder.parse after parsing has finished for a particular sentence. This default implementation simply calls reclaimItemsInChart().


reclaimItemsInChart

protected void reclaimItemsInChart()
Reclaims all items currently in the chart.


setPruneFactor

public void setPruneFactor(double pruneFact)
Sets the prune factor.


doPruning

public void doPruning()
Indicates that the chart should prune. THis method may be invoked during parsing.


dontDoPruning

public void dontDoPruning()
Tells this chart not to do any pruning. This method may be invoked during parsing.


reclaimItemCollection

protected abstract void reclaimItemCollection(Collection c)
A hook called by reclaimItemsInChart() to allow subclasses to reclaim each span's collection of chart items.

Parameters:
c - the collection of chart items to be reclaimed

setUpItemPool

protected abstract void setUpItemPool()
Sets up the item object pools. This allows subclasses to specify the type of item to be held in the object pool.


Parsing Engine

Author: Dan Bikel.