Parsing Engine

danbikel.parser
Class AnalyzeDisns

java.lang.Object
  extended by danbikel.parser.AnalyzeDisns

public class AnalyzeDisns
extends Object

An analysis and debugging class to analyze the probability distributions of all Models in a ModelCollection. It is important that

when the ModelCollection to be analyzed was created.

See Also:
Model.deleteCountsWhenPrecomputingProbs, Model.createHistBackOffMap

Field Summary
static int toPrevIdx
          The BiCountsTable index for retrieving the Jensen-Shannon divergence from a history context (distribution) at a particular back-off level to its corresponding previous back-off level (greater context) history context.
static int toZeroIdx
          The BiCountsTable index for retrieving the Jensen-Shannon divergence from a history context (distribution) at a particular back-off level to its corresponding zeroeth back-off level (maximal context) history context.
 
Method Summary
static void analyzeModWordDisn(ModelCollection mc, String eventStr)
          A debugging method for analyzing a particular event in the modifier word model.
static void computeEntropyAndJSStats(Model model, CountsTable[] entropy, BiCountsTable[] js)
          A method invoked by Model when Settings.modelDoPruning is true: entropy values and JS divergence values are used in the parameter-pruning method.
static CountsTable[] computeModelEntropies(Model model)
          A method to compute a model's entropy statistics for all estimated distributions.
static CountsTable[] computeModelEntropies(Model model, CountsTable[] entropy)
          A method to compute a model's entropy statistics for all estimated distributions.
static double entropy(double[] disn)
          Returns the entropy of the specified distribution.
static double entropy(double[] disn, int endIdx)
          Returns the entropy of the specified distribution.
static double entropyFromLogProbs(double[] disn)
          Returns the entropy of the specified distribution of log-probabilities.
static double entropyFromLogProbs(double[] disn, int endIdx)
          Returns the entropy of the specified distribution of log-probabilities.
static Set getFutures(Set futures, Model model, int level)
          Returns all possible futures for the specified model at the specified back-off level, using the specified set for storage (the specified set is first cleared before futures are stored).
static double[] getLogProbDisn(Model model, int level, Event hist, Set futures, double[] disn, Transition tmpTrans)
          Returns the smoothed log-probability distribution for the specified history at the specified back-off level in the specified model.
static double klDistFromLogProbs(double[] disnP, double[] disnQ)
          Returns D(disnP || disnQ), where D is the Kullback-Leibler divergence (relative entropy), and where each of the specified arguments is a distribution of log-probabilities.
static void main(String[] args)
          Analyzes and saves information about every distribution in every Model contained in a ModelCollection.
static CountsTable[] newEntropyCountsTables(Model model)
          Returns an array of CountsTable instances in which to store the entropy of every history at every back-off level.
static BiCountsTable[] newJSCountsTables(Model model)
          Returns an array of BiCountsTable instances in which to store the JS divergence of every history at every back-off level, both to the previous back-off level and to the zeroeth back-off level.
static void outputHistories(Model model)
          A debugging method that outputs all histories of the specified model to System.out.
static void writeKLDistStats(Model model)
          Creates two files named after the probability structure of the specified model, and writes Kullback-Leibler divergences (relative entropies) between the zeroeth-level back-off distributions and the other back-off distributions to one file and writes Jensen-Shannon divergences between zeroeth-level back-off distributions and the other back-off distributions to the other file.
static void writeModelStats(Model model)
          Creates a file named after the probability structure class of the specified model and writes information about every distribution contained in that model.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

toZeroIdx

public static final int toZeroIdx
The BiCountsTable index for retrieving the Jensen-Shannon divergence from a history context (distribution) at a particular back-off level to its corresponding zeroeth back-off level (maximal context) history context.

See Also:
Constant Field Values

toPrevIdx

public static final int toPrevIdx
The BiCountsTable index for retrieving the Jensen-Shannon divergence from a history context (distribution) at a particular back-off level to its corresponding previous back-off level (greater context) history context.

See Also:
Constant Field Values
Method Detail

entropy

public static double entropy(double[] disn)
Returns the entropy of the specified distribution.

Parameters:
disn - an array containing the probabilities of a distribution
Returns:
the entropy of the specified distribution.

entropy

public static double entropy(double[] disn,
                             int endIdx)
Returns the entropy of the specified distribution.

Parameters:
disn - an array containing the probabilities of a distribution
endIdx - the last index plus one in the specified array of probabilities
Returns:
the entropy of the specified distribution.
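
As a rough illustration of what the two entropy overloads compute, the sketch below evaluates Shannon entropy over the first endIdx entries of a probability array. The class name EntropySketch, the use of base-2 logarithms, and the convention that zero-probability entries contribute nothing are assumptions made for this example; the documentation above does not state which logarithm base or zero handling AnalyzeDisns uses.

```java
/** Hypothetical stand-alone sketch; not the danbikel.parser implementation. */
final class EntropySketch {
  private static final double LOG_2 = Math.log(2.0);

  /** Entropy of the whole array, in bits (the base is an assumption). */
  static double entropy(double[] disn) {
    return entropy(disn, disn.length);
  }

  /** Entropy of disn[0..endIdx-1], where endIdx is the last index plus one. */
  static double entropy(double[] disn, int endIdx) {
    double sum = 0.0;
    for (int i = 0; i < endIdx; i++) {
      double p = disn[i];
      if (p > 0.0) {
        // Skip zero entries: 0 * log 0 is treated as 0 (assumed convention).
        sum -= p * (Math.log(p) / LOG_2);
      }
    }
    return sum;
  }
}
```

The endIdx overload lets a caller reuse one oversized array for many distributions, with only the leading entries taking part in the sum.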

entropyFromLogProbs

public static double entropyFromLogProbs(double[] disn)
Returns the entropy of the specified distribution of log-probabilities.

Parameters:
disn - an array containing the log-probabilities of a distribution
Returns:
the entropy of the specified distribution of log-probabilities.

entropyFromLogProbs

public static double entropyFromLogProbs(double[] disn,
                                         int endIdx)
Returns the entropy of the specified distribution of log-probabilities.

Parameters:
disn - an array containing the log-probabilities of a distribution
endIdx - the last index plus one in the specified array of log-probabilities
Returns:
the entropy of the specified distribution of log-probabilities.
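
When the input holds log-probabilities rather than probabilities, the entropy is the negated sum of exp(lp) * lp over the entries. The following is a minimal sketch under the assumptions that the log-probabilities are natural logarithms and that an entry of negative infinity (probability zero) contributes nothing; the class name is hypothetical and the actual conventions of AnalyzeDisns are not stated in this documentation.

```java
/** Hypothetical stand-alone sketch; not the danbikel.parser implementation. */
final class EntropyFromLogProbsSketch {

  /** Entropy, in nats, of a whole array of natural-log probabilities. */
  static double entropyFromLogProbs(double[] disn) {
    return entropyFromLogProbs(disn, disn.length);
  }

  /** Entropy of disn[0..endIdx-1], where endIdx is the last index plus one. */
  static double entropyFromLogProbs(double[] disn, int endIdx) {
    double sum = 0.0;
    for (int i = 0; i < endIdx; i++) {
      double logProb = disn[i];
      if (logProb == Double.NEGATIVE_INFINITY) {
        continue; // probability zero: avoid 0 * -infinity, which would be NaN
      }
      sum -= Math.exp(logProb) * logProb;
    }
    return sum;
  }
}
```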

klDistFromLogProbs

public static double klDistFromLogProbs(double[] disnP,
                                        double[] disnQ)
Returns D(disnP || disnQ), where D is the Kullback-Leibler divergence (relative entropy), and where each of the specified arguments is a distribution of log-probabilities.

Parameters:
disnP - a distribution of log-probabilities
disnQ - a distribution of log-probabilities
Returns:
D(disnP || disnQ)
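
For reference, the Kullback-Leibler divergence of two distributions given as log-probabilities can be written as the sum over i of exp(lp_i) * (lp_i - lq_i). The sketch below is one way to compute it, assuming natural logarithms, arrays of equal length, and the usual convention that outcomes with P-probability zero are skipped; the class name is hypothetical and none of these conventions is confirmed by the documentation above.

```java
/** Hypothetical stand-alone sketch; not the danbikel.parser implementation. */
final class KlFromLogProbsSketch {

  /** Returns D(P || Q) in nats for two arrays of natural-log probabilities. */
  static double klDistFromLogProbs(double[] disnP, double[] disnQ) {
    if (disnP.length != disnQ.length) {
      throw new IllegalArgumentException("distributions must have equal length");
    }
    double sum = 0.0;
    for (int i = 0; i < disnP.length; i++) {
      double logP = disnP[i];
      if (logP == Double.NEGATIVE_INFINITY) {
        continue; // P assigns zero mass here, so the term is taken to be 0
      }
      // If Q assigns zero mass where P does not, this term is +infinity.
      sum += Math.exp(logP) * (logP - disnQ[i]);
    }
    return sum;
  }
}
```

Note that the divergence is asymmetric: swapping the two arguments generally gives a different value.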

analyzeModWordDisn

public static void analyzeModWordDisn(ModelCollection mc,
                                      String eventStr)
                               throws IOException
A debugging method for analyzing a particular event in the modifier word model.

Parameters:
mc - the model collection from which to access the modifier word model
eventStr - the string to be converted to an S-expression that represents the TrainerEvent to be analyzed
Throws:
IOException

outputHistories

public static void outputHistories(Model model)
A debugging method that outputs all histories of the specified model to System.out.

Parameters:
model - the model whose histories are to be output

getFutures

public static Set getFutures(Set futures,
                             Model model,
                             int level)
Returns all possible futures for the specified model at the specified back-off level, using the specified set for storage (the specified set is first cleared before futures are stored).

Parameters:
futures - the set in which to store futures
model - the model from which to collect possible futures
level - the back-off level at which to collect possible futures (should normally be irrelevant)
Returns:
the specified Set having been destructively modified to contain possible futures for the specified model at the specified back-off level

getLogProbDisn

public static double[] getLogProbDisn(Model model,
                                      int level,
                                      Event hist,
                                      Set futures,
                                      double[] disn,
                                      Transition tmpTrans)
Returns the smoothed log-probability distribution for the specified history at the specified back-off level in the specified model.

Parameters:
model - the model from which to get a distribution of smoothed log-probability estimates
level - the back-off level of the specified history
hist - the history whose distribution is to be retrieved
futures - the set of possible futures for the specified history
disn - the array in which to store all smoothed log-probability estimates
tmpTrans - a temporary Transition object, to be used during the estimation of smoothed log-probabilities
Returns:
the specified array of double, having been modified to contain a distribution of log-probabilities at indices 0 through futures.size() - 1
Throws:
ArrayIndexOutOfBoundsException - if the specified array of double (the disn parameter) is of length less than futures.size()
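
The caller-supplied-array contract described above (results written to indices 0 through futures.size() - 1, and an ArrayIndexOutOfBoundsException when the array is too short) can be sketched as follows. The class DisnBufferSketch, the method fillLogProbs, and the ToDoubleFunction standing in for the model's smoothed estimate are all hypothetical; this example only illustrates the calling convention, not how Model estimates probabilities.

```java
import java.util.Set;
import java.util.function.ToDoubleFunction;

/** Hypothetical stand-alone sketch; not the danbikel.parser implementation. */
final class DisnBufferSketch {

  /**
   * Writes one log-probability per future into disn, in the set's iteration
   * order, and returns the same array. Entries at index futures.size() and
   * beyond are left untouched.
   */
  static <F> double[] fillLogProbs(Set<F> futures, double[] disn,
                                   ToDoubleFunction<F> logProbOf) {
    int i = 0;
    for (F future : futures) {
      // Throws ArrayIndexOutOfBoundsException if disn.length < futures.size().
      disn[i++] = logProbOf.applyAsDouble(future);
    }
    return disn;
  }
}
```

Reusing one sufficiently large array across many histories avoids allocating a new array per distribution, which is presumably why the method takes disn as a parameter.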

computeModelEntropies

public static CountsTable[] computeModelEntropies(Model model)
A method to compute a model's entropy statistics for all estimated distributions.

Parameters:
model - the model whose entropy statistics are to be computed
Returns:
an array mapping every history of every back-off level to its entropy

computeModelEntropies

public static CountsTable[] computeModelEntropies(Model model,
                                                  CountsTable[] entropy)
A method to compute a model's entropy statistics for all estimated distributions.

Parameters:
model - the model whose entropy statistics are to be computed
entropy - an array of length model.getProbStructure().numLevels() in which to store entropy statistics for every history of every back-off level of the specified model
Returns:
an array mapping every history of every back-off level to its entropy

writeModelStats

public static void writeModelStats(Model model)
                            throws IOException
Creates a file named after the probability structure class of the specified model and writes information about every distribution contained in that model. Specifically, each line will contain the following six elements about a particular history context at a particular back-off level, where the elements are separated by tab characters:

Parameters:
model - the model whose distributions are to be analyzed
Throws:
IOException

newEntropyCountsTables

public static CountsTable[] newEntropyCountsTables(Model model)
Returns an array of CountsTable instances in which to store the entropy of every history at every back-off level. The array will necessarily be of length model.getProbStructure().numLevels().

Parameters:
model - the model for which entropies are to be computed
Returns:
an array of CountsTable instances in which to store the entropy of every history at every back-off level

newJSCountsTables

public static BiCountsTable[] newJSCountsTables(Model model)
Returns an array of BiCountsTable instances in which to store the JS divergence of every history at every back-off level, both to the previous back-off level and to the zeroeth back-off level. The array will necessarily be of length model.getProbStructure().numLevels().

Parameters:
model - the model for which JS divergences are to be computed
Returns:
an array of BiCountsTable instances in which to store the JS divergence of every history at every back-off level
See Also:
toPrevIdx, toZeroIdx

computeEntropyAndJSStats

public static void computeEntropyAndJSStats(Model model,
                                            CountsTable[] entropy,
                                            BiCountsTable[] js)
A method invoked by Model when Settings.modelDoPruning is true: entropy values and JS divergence values are used in the parameter-pruning method.

Parameters:
model - the model whose entropies and JS divergence statistics are to be computed
entropy - the array of counts tables in which to store the entropies of the specified model's distributions
js - the array of BiCountsTable objects in which to store the JS divergence statistics of the specified model's distributions

writeKLDistStats

public static void writeKLDistStats(Model model)
                             throws IOException
Creates two files named after the probability structure of the specified model, and writes Kullback-Leibler divergences (relative entropies) between the zeroeth-level back-off distributions and the other back-off distributions to one file and writes Jensen-Shannon divergences between zeroeth-level back-off distributions and the other back-off distributions to the other file. The KL divergence file will have the extension ".kl" and the JS divergence file will have the extension ".js".

Specifically, the KL divergence file will contain one line for each zeroeth-level (maximal-context) history with the following elements, separated by tab characters:

where hist_i is an S-expression of the history at back-off level i, D is the relative entropy function, and c is the count of a history context in training, meaning that the quantity c(hist_i-1)/c(hist_i) is the probability of seeing the extra context in hist_i-1 compared to hist_i.

For example, if a model has three back-off levels (a zeroeth, maximal-context level and two more levels, each with less context), then each line will contain 11 tab-separated elements: the S-expression of the zeroeth back-off level history, followed by five elements for each of the other two back-off levels.

The JS divergence file will contain one line for each non-zeroeth-level history with the following four elements, separated by tab characters:

where JS is the Jensen-Shannon divergence function.

Parameters:
model - the model whose distributions are to be analyzed
Throws:
IOException
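
The JS quantity written to the ".js" file can be illustrated with the standard, equally weighted Jensen-Shannon divergence: JS(P, Q) = 0.5 * D(P || M) + 0.5 * D(Q || M), where M is the average of P and Q. The sketch below takes plain probability arrays and uses natural logarithms; the class name, the input representation, and the equal weighting are assumptions, since the documentation above does not specify which variant AnalyzeDisns computes.

```java
/** Hypothetical stand-alone sketch; not the danbikel.parser implementation. */
final class JsDivergenceSketch {

  /** D(P || Q) in nats over probability arrays of equal length. */
  static double kl(double[] p, double[] q) {
    double sum = 0.0;
    for (int i = 0; i < p.length; i++) {
      if (p[i] > 0.0) {
        sum += p[i] * Math.log(p[i] / q[i]);
      }
    }
    return sum;
  }

  /** Equally weighted Jensen-Shannon divergence of P and Q, in nats. */
  static double js(double[] p, double[] q) {
    double[] m = new double[p.length];
    for (int i = 0; i < p.length; i++) {
      m[i] = 0.5 * (p[i] + q[i]); // m[i] > 0 whenever p[i] > 0 or q[i] > 0
    }
    return 0.5 * kl(p, m) + 0.5 * kl(q, m);
  }
}
```

Unlike the KL divergence, this quantity is symmetric in its arguments and always finite, being bounded above by ln 2 when natural logarithms are used.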

main

public static void main(String[] args)
Analyzes and saves information about every distribution in every Model contained in a ModelCollection. It is important that when the ModelCollection to be analyzed was created.

usage: <derived data file>
where <derived data file> was produced by Trainer.

Parameters:
args - an array containing at least one element that is the name of a model collection (derived data file) as produced by a Trainer instance
See Also:
Trainer, Trainer.writeModelCollection(ObjectOutputStream,String,String), Trainer.loadModelCollection(String)

Author: Dan Bikel.