Plot peeling trajectory for current rule:
traj()
Arguments: none.
Remarks:
This command plots the peeling trajectory for the current rule as
described in Friedman and Fisher (1997), Sections 9 and 13. Each point
represents the output mean and support for a potential box.
Individual boxes can be selected (repeatedly) by moving the cross-hair
with the mouse to the corresponding point and clicking the left mouse
button. A number then appears on the plot near the selected point,
identifying it. These numbers can then be entered (one at a time) at the
prompt
Different box {>traj()}. Box number = (=no)
in the SuperGEM dialog window. Entering a number produces a summary of the
indicated box. The first (unsolicited) summary is for the (upper) leftmost
point, representing the (smallest) box with the highest output mean.
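To make the idea of a trajectory concrete, the following is a minimal sketch of how a sequence of (support, mean) points could arise from top-down peeling. It is a simplified illustration, not SuperGEM's implementation: the function name `peel_trajectory`, the parameters `alpha` and `min_support`, and the restriction to real-valued inputs with no pasting step are all assumptions made here.

```python
def peel_trajectory(X, y, alpha=0.1, min_support=0.1):
    """Return a list of (support, mean) points, one per candidate box.

    X is a list of rows (lists of floats), y the list of outputs.
    Hypothetical simplification: real-valued inputs only, no pasting.
    """
    n = len(y)
    inside = list(range(n))           # indices of observations in the current box
    points = [(1.0, sum(y) / n)]      # the starting box covers all the data
    while True:
        best = None
        for j in range(len(X[0])):
            order = sorted(inside, key=lambda i: X[i][j])
            k = max(1, int(alpha * len(order)))   # observations to peel off
            # candidate peels: drop the k lowest or the k highest values of input j
            for keep in (order[k:], order[:-k]):
                if not keep or len(keep) / n < min_support:
                    continue
                mean = sum(y[i] for i in keep) / len(keep)
                if best is None or mean > best[0]:
                    best = (mean, keep)
        if best is None:              # no admissible peel is left
            break
        inside = best[1]
        points.append((len(inside) / n, best[0]))
    return points
```

In this sketch the leftmost point of the plot corresponds to `min(points, key=lambda p: p[0])`, the box with the smallest support.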
The format of the box summary in the SuperGEM dialog window is as follows.
The first line shows the rule number in the covering sequence, the selected
box number, and the number of deleted "redundant" variables (see below).
The next two lines form a table of statistics, with one row for the
training data and one for the test data. The first column is the output
mean over the entire data set used to construct the rule. This is either the
"global" mean (first rule) or that of the data "remaining" after data
covered by previous rules have been removed. The second column is the output
mean in the selected box, and the third column is its support.
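The three columns can be illustrated with a short sketch. The helper `box_summary` and its `limits` argument are hypothetical names introduced here, and support is taken to be the fraction of observations falling in the box; strict inequalities are assumed to match the > and < operators shown in the box definition.

```python
def box_summary(rows, y, limits):
    """Return (overall mean, box mean, support) for one data set.

    limits maps a column index to a (lower, upper) pair; None means
    that side is unbounded. Names and layout are illustrative only.
    """
    def in_box(row):
        for j, (lo, hi) in limits.items():
            if lo is not None and not row[j] > lo:
                return False
            if hi is not None and not row[j] < hi:
                return False
        return True

    inside = [yi for row, yi in zip(rows, y) if in_box(row)]
    overall_mean = sum(y) / len(y)
    box_mean = sum(inside) / len(inside) if inside else float("nan")
    support = len(inside) / len(y)
    return overall_mean, box_mean, support
```

Calling the function once on the training data and once on the test data would give the two rows of the table.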
Following the statistical summary is the definition of the box. This is
presented in the form of a table. Each input variable defining the box is
represented by one or more rows. Real-valued inputs are represented by at
least one and at most three rows, one row for each possible limit
(lower, upper, not equal to missing) on that variable. The first column
identifies the variable. The second column is a relational operator
(>, <, !=) indicating the nature of the bound on that variable (lower,
upper, not equal to missing). The third column is the boundary value, and the
fourth (in parentheses) is its quantile value (in percent) on the data
set used to construct the rule (global or remaining).
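As a rough illustration of the fourth column, the quantile of a boundary value can be read as the percentage of observations lying at or below it. The sketch below assumes that convention, and assumes missing values are coded as `None`; neither detail is stated in the text above, and `quantile_percent` is a hypothetical name.

```python
def quantile_percent(values, boundary):
    """Percent of non-missing values that fall at or below `boundary`.

    Assumptions: missing values are represented by None, and the
    "at or below" convention is used for the empirical quantile.
    """
    present = [v for v in values if v is not None]
    return 100.0 * sum(v <= boundary for v in present) / len(present)
```

The `values` passed in would be those of the data set used to construct the rule (global or remaining).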
Each categorical variable defining the box is represented by at least two
rows. The first row identifies the variable (first column), indicates that
it is categorical ("cat" - second column), and gives the fraction (percent)
of observations in the total sample (global or remaining) that assume the
categorical values that are in the box. Following this first row are one
or more rows listing the values of the categorical variable that are in
the box, four values per line.
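The layout of a categorical entry can be sketched as follows. The function name `categorical_rows` is hypothetical, and sorting the values before listing them is an assumption made only to keep the output deterministic.

```python
def categorical_rows(values, in_box, per_line=4):
    """Return (percent, lines) for one categorical variable.

    percent is the share of observations whose value is in the box;
    lines lists the in-box values, `per_line` values per row.
    """
    percent = 100.0 * sum(v in in_box for v in values) / len(values)
    ordered = sorted(in_box)          # assumed ordering, for illustration
    lines = [ordered[i:i + per_line] for i in range(0, len(ordered), per_line)]
    return percent, lines
```

With five values in the box, for example, the listing takes two rows: four values on the first and one on the second.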
The input variables are listed in this table in order of their estimated
importance to the box definition; more important variables appear higher in
the table [see Friedman and Fisher (1997), Section 10]. The last two
columns of the table have one entry for each variable. They show,
respectively, the box mean and support that result from removing that
variable and all those below it (less important) in the table.
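The last two columns can be illustrated with a small sketch: for each variable in the ordered list, drop it together with every variable below it and recompute the mean and support of the resulting (larger) box. The name `removal_columns` and the `(column, lower, upper)` layout are assumptions made for this example, which handles real-valued limits only.

```python
def removal_columns(rows, y, limits):
    """Return one (column, box mean, support) entry per variable.

    limits is a list of (column, lower, upper) triples ordered from most
    to least important; None means that side is unbounded.
    """
    table = []
    for i, (col, _, _) in enumerate(limits):
        kept = limits[:i]   # drop variable i and every variable below it
        inside = [
            yi for row, yi in zip(rows, y)
            if all((lo is None or row[j] > lo) and (hi is None or row[j] < hi)
                   for j, lo, hi in kept)
        ]
        mean = sum(inside) / len(inside) if inside else float("nan")
        table.append((col, mean, len(inside) / len(y)))
    return table
```

For the most important variable no limits remain, so its entry is simply the mean over the whole data set with support 1.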
One can go back and forth between the two windows, repeatedly clicking
additional points on the peeling trajectory and entering their numbers
in the SuperGEM window. When the most appropriate box has been determined,
its number is entered at the SuperGEM prompt, and then a null line
(carriage return only) is entered at the next prompt. This then becomes the
chosen box for further analysis and interpretation. To reactivate the S
command window, click the middle (Unix) or right (Wintel) mouse button on
the peeling trajectory plot.
After the appropriate box has been chosen, "redundant" variables can be
removed from its definition as described in Friedman and Fisher (1997),
Section 10. A table listing each of the variables (rows), in order of
their estimated importance to the box definition, is displayed in the
SuperGEM dialog window. The second column gives the name/number of the
respective variables. The two rightmost columns show respectively the
box mean and support resulting from removing that variable and all those
below it in the list. The numbers in the first column indicate the
corresponding number of variables that will be deleted by selecting each
respective row. The bottom row (0) gives the mean and support of the
current box with no variables deleted.
Following the table is the prompt
Redundant variable elimination: delete how many? ( = 0)
Respond with the desired number (first column). The corresponding input
variable (row), and all those below it, are thereby deleted from the box
definition, producing a new (larger/simpler) box for further analysis and
interpretation. If no variables are to be deleted, simply enter 0 or a
null line.
This approach allows input variables to be deleted only in nested subsets.
If desired, additional individual inputs can be deleted (in any order) at
a later stage in the analysis dialog (see "sens").