Specification of a Java Implemention of Modulated Sort Encoding

Hassan Aït-Kaci

HAK's Language Technology
All contents Copyright © by Hassan Aït-Kaci—All Rights Reserved
Last modified on Sat Apr 20 20:49:47 2013 by hak

Table of Contents


This is a Java specification of the modulated encoding method presented in the paper entitled "Efficient Implementation of Lattice Operations" published in the ACM's Transactions on Programming Languages and Systems in 1989. We start with describing the method as overviewed in that paper, then we elaborate on adapting it in specific details for a Java representation and implementation.

Implementation of modulation requires two things: a method to generate the codes, and an efficient implementation of the GLB, LUB, and BUTNOT operations on the resulting codes.


Generating Modules

In order to take advantage of the benefits of modulation, it is necessary to discover the group boundaries, or to draw circles around groups of elements. Below is a sketch of an algorithm to find such modules in an arbitrary poset. This algorithm may not find every possible module, but it does a pretty good job at finding the vast majority of them, and is able to modulate posets with thousands of elements in a few seconds. Also, if one wishes to create modules of modules, one simply need invoke the top level function on an already modulated poset.

The basic idea of this algorithm is to group elements together a few at a time until a group grows to some group size bound, in which case that group is not expanded any further, or the set of remaining objects (groups and ungrouped elements) is less than some threshold, in which case the entire process of modulation is complete.

For simplicity in the pidgin code below, a module is named after its first element. Also, it is assumed that the entire poset has been recorded as a set of related pairs, accessible through the previously seen functions Parents(x) and Children(x). It also assumes the operations Relate(x,y) which asserts that x≺y, and Unrelate(x,y) which asserts that x⊀y (recall that x≺y means "x immediately less than y"). A few of the procedures used here are not defined, but are obvious from the context.

The algorithm simply looks at each element, attempting to group it together with its parents. If that fails, it tries to group it with a sibling that has exactly the same set of parents as it does. In both types of grouping, it is possible that extra elements (children of the newly added elements) will need to be added to the group in order to satisfy the module requirements. Often, in trying to add the necessary extra children, the group will grow to encompass the entire poset. Thus checks for exceeding the group size bound are added to many of the procedures below. The bound used here is MAX_CODE_SIZE. This bound would depend on specific factors such as machine word length (e.g., 64 on a 64-bit machine).

The function Modulate is the main loop of the algorithm and can be described as follows:

function Modulate (s : set of elements) returns : set of set of elements
  {
    queue = s;
    modules = ∅
    while (queue != ∅ &&  |queue ∪ modules| > MAX_CODE_SIZE)
      {
        elt = queue.pop();
        newmodule = GrowUpward(elt);
        if (newmodule == ∅)
          newmodule = GrowSideways(elt);
        modules = (modules - newmodule) ∪ {newmodule};
        queue = (queue - newmodule) ∪ {newmodule};
        RecordNewModule(newmodule);
      }
    return modules;
  }
It begins by putting each element of the poset on a queue. It then examines the first thing in the queue. If it is able to grow that element into a larger module, then that new module is recorded, and is pushed on the queue (in order for it to grow further).

The procedure RecordNewModule simply updates the relatedness of elements and modules. Namely,

procedure RecordNewModule (s : set of elements)
  {
    base = ⋃ {Children(x) | x ∈ s};
    crown = ⋃ {Parents(x) | x ∈ s};

    for (x ∈ base)
      for (y ∈ s)
        {
          Unrelate(x,y);
          Relate(x,s);
        }

    for (x ∈ crown)
      for {y ∈ s}
        {
          Unrelate(y,x);
          Relate(s,x);
        }
  }
The function GrowUpward immediately adds all the parents of the element, since an element can never be grouped with only a subset of its parents. It then tries adding extra elements upward, until a homogeneous layer is found. Homogeneous(s) returns true if its argument s is a singleton set, false otherwise.
function GrowUpward (e : element) returns : set of elements
  {
    crown = Parents(e);

    while (!Homogeneous(crown) && |crown| < MAX_CODE_SIZE)
      TryAddingMoreAncestors(crown);

    if (|crown| ≥ MAX_CODE_SIZE)
      return ∅;

    return AddNecessaryChildren(crown, e);
  }
The function GrowSideways is similar to GrowUpward, except that it begins by finding its siblings (a cochain), and then finding those siblings which have exactly the same parents as the element. Those full siblings are then added, one at a time, until the bound is reached. As each new sibling is added, some children may have to be added in order to form a group.
function GrowSideways (e : element) returns : set of elements
  {
    family = {e};
    crown = Parents(e);
    siblings = ⋃ {Children(x) | x ∈ crown};
    fullsiblings = ⋃ {x ∈ siblings | Parents(x) == crown};
    oldfamily = ∅;

    while (fullsiblings != ∅ && |family| ≤ MAX_CODE_SIZE)
      {
        brother = fullsiblings.pop();
        family = AddNecessaryChildren({brother},family);
        if (|family| ≤ MAX_CODE_SIZE)
          oldfamily = family;
      }

    return oldfamily;
  }
The function AddNecessaryChildren begins adding children of the new elements until either a group is formed or the module-size limit is reached. In fact, it could be the case that by adding some new parents and some more new children, a group could be formed. However this is unlikely, and complicates the algorithm. Thus a simplification has been made to this function: Whenever one of the new children has any parents, there termed uncles, the function returns failure, preventing any such cancerous growth.
function AddNecessaryChildren (added, old : set of elements) returns : set of elements
  {
    new = added - old;

    if (new == ∅)
      return old;

    group = new ∪ old;

    if (|group| > MAX_CODE_SIZE)
      return ∅;

    newkids = ⋃ {Children(x) | x ∈ new};
    uncles = ⋃ {Parents(x) | x ∈ newkids};

    if (uncles ⊈ group)
      return ∅;

    return AddNecessaryChildren(newkids, group);
  }


It is possible to relax the requirements on modules to allow multiple topmost elements, and multiple bottommost elements in each module. If the upper surface (defined to be all the elements of a group that are immediate descendants of elements not in the group) is upward-homogeneous (defined here to mean that all elements have exactly the same set of parents), and the lower surface is downward-homogeneous, (defined similarly) then the group is a module. As described earlier, modules have singleton upper and lower surfaces, which are trivially homogeneous. In fact, the algorithm presented above is able to find modules that have large surfaces if the definition of Homogeneous is changed from a simple singleton test to one which tests whether all the elements presented have exactly the same set of parents. This better test can be used without changing the algorithm.


Operations on Modulated Codes

We will consider a sort poset encoded using only one level of modulation, so each code consists of only one group code, and a local code. The generalization of these operations for modules of modules is straightforward.

First we define >m to be a function on two modulated codes X and Y, where Xg is X's group code, Xl is X's local code, and Yg is Y's group code, Yl is Y's local code. We first compute the bitwise and (noted &) of Xg and Yg and call it Ag. Then we compute the bitwise and of Xl and Yl and call it Al. Within any group g, we denote by g the topmost sort's code in that group, and by g the bottommost sort's code in that group.

Then, we can define the modulated subsumption relation on two modulated X and Y codes as: X >m Y iff:

This definition says that X is greater than (or subsumes) Y if and only if either X and Y are in the same group and Xl subsumes Yl (i.e., Xl > Yl, where > is the local subsumption ordering in their common group), or if X and Y belong to different groups and Y's group is subsumed by X's group.

Then the GLB of two modulated codes X = (Xg,Xl) and Y = (Yg,Yl) is the modulated code X ∧ Y that can be defined using the GLB of group codes Ag = Xg & Yg and the GLB of local codes Al = Xl & Yl, as follows:

The first possibility is that X and Y are in the same group, and so the result is that group's code appended to the bitwise AND of their local codes. The second possibility is that X's group is subsumed by Y's, in which case the result is simply X's original modulated code. The third possibility is that Y's group is subsumed by X's, in which case the result is simply Y's original modulated code. The last possibility is that neither group subsumes the other, and thus the result is the topmost element in the group, which is the modulated code whose group code is the the bitwise AND of the group codes of X and Y, and whose local code is the greatest lower bound of the two groups—namely, the topmost element in that group.

In order to compute the LUB of modulated codes, we also compute the bitwise OR of the group and local codes, and call them Og = Xg | Yg and Ol = Xl | Yl respectively. Contrary to the modulated-code GLB computation, computing the LUB of two modulated codes from incomparable groups needs to resort to an explicit binary disjunctive constructor that we will denote as . Thus the LUB of X and Y is denoted as X ∨ Y and its value is given by:

The first possibility is that X and Y are in the same group, and so the result is that group's code appended to the bitwise OR of their local codes. The second possibility is that X's group subsumes Y's, in which case the result is simply X's original code. The third possibility is that Y's group subsumes X's, in which case the result is simply Y's original code. The last possibility is left as an explicit uninterpreted disjunction, which is necessary only when computing the LUB of modulated codes with two incomparable groups. The arguments of the modulated disjunction constructor are simply the original modulated codes. This constructor has an interpreted semantics upon being combined with the GLB and LUB of modulated coded. Namely: It is also interpreted upon checking modulated code subsumption as follows:

Last, we must explain how modulated code complementation is computed. As usual, relative complentation on modulated codes is a binary operation X\mY on a pair of modulated codes X and Y. The first argument subsumes all feasible codes while the second argument specifies all unfeasible codes that should not be subsumed. In other words, a modulated code complementation X is less than or equal to a complemented modulated code Y\mZ (denoted X ≤m Y\mZ) iff X ≤m Y and X ≰m Z.

As for modulated code disjunction, computing relative complementation of modulated codes cannot always be performed and, here too, must resort to be kept in constructor form. Namely, the expression X\mY where X and Y are modulated codes defined as X = (Xg,Xl) and Y = (Yg,Yl) is given by the following normalization rules:

Keeping some disjunctions and complementation in constructor forms is a slight price we have to pay. This has no incidence in eventually decoding modulated codes since subsumption is defined on modulated codes even in the presence of such constructors. Thus, as before, decoding a modulated code X is simply given by the formula:

In other words, γm-1(X) is the set of sorts whose codes are maximal modulated lower bounds of X.


Java Implementation

For a Java implementation of the above-described abstract modulation scheme, we must make concrete decisions concerning the various data structure we need.

The basic set-up will be similar to that described in the version without modulation. Namely, we first need to read in the "is-a" declarations and store each sort in an array called TAXONOMY of objects of class Sort at the index corresponding to their order of appearance. Since the purpose of the initial set-up is not to encode the sorts, but to modulate the set of sorts, the class Sort will need to maintain the set structures needed in the algorithms above. Namely, for each sort, the set of its immediate parents and the set of its immediate children. These sets will correspond to the value returned by the functions Parents and Children in the foregoing pseudo-code. From these, everything else we need can be calculated when needed (e.g., siblings, uncles, etc., ...). These will also be used and modified by the Relate and Unrelate procedures as the modulation proceeds.

Thus processing an "is-a" declaration such as s <| t means that s must be added to children(t), and that t must be added to parents(s).

As can be seen in the modulation pseudo-code, the sets of parents and children of a sort need to be represented by a data structure that supports set operations (such as intersection and union) efficiently. Since a sort is identified by its index in the TAXONOMY array, these sets are sets of elements of the Java primitive type int. We will use a hlt.language.util.IntSet that is a subclass of hlt.language.util.IntToIntMap. (These classes have the same relationship as the one existing between java.util.HashSet and java.util.HashMap.) The hlt.language.util.IntSet provides both static binary methods inter and union that do not modify their hlt.language.util.IntSet arguments, as well as two non-static unary methods with the same name that operate in-place on the hlt.language.util.IntSet object invoking them and modify it (but not the argument).

A module is also a set. It is a set of sorts for the first level of modulation, but is may also be a set of modules. Rather than making the design heavier by declaring them to be sets of integer objects as well as sets of non-integer objects, we shall follow the simplification used in the peudo-code by identifying a module to its first element. In this manner, we can represent a module as a subclass of hlt.language.util.IntSet with a distinct member called module of type int that is its first element. This element is the module's identifier.

In other words, a module is identified to a single sort element while denoting a sets of sorts (which could themselves be module identifiers). Therefore, a Sort object must have a means to identify which module it belongs to by pointing to the sort identifying this module. This is done by endowing each sort with a module attribute of type int which is the index of its module identifier. This attribute is initialized to the sort's own index since in the beginning all sorts may be considered as singleton modules. This attribute is updated to a module's identifier by the GrowUpward and GrowSideways procedures as they build a module by adding new sorts to it. An empty module will aways have -1 as the value of its module identifier and a null set of sorts.

Similarly, the Relate and Unrelate procedures, when seen as method of the sort class, update a sort's parents and/or children of a sort as follows. Given a sort s and a sort t:

[To be completed later...]