Picture of the Institute and Unilogo
home uni uni search search sitemap sitemap contact contact
unilogo University of Stuttgart

An Efficient Representation of Sets by Using Tree Structures and its Implementation in JavaTM

englishicon
 
Project Description:

This project offers a new possibility to represent and process sets of nonnegative integer numbers in an efficient way. The new structure is an ordered binary tree with intervals at each tree constituent. The leaves itself contain subsets realised by bit sets. This structure has been defined, described, analysed and finally implemented in JavaTM. The functionality and correctness of the implementation has been tested extensively. A comparison to other existing possibilities for sets of integer numbers demonstrates the eminent efficiency benefit of the new representation for many applications. The implementation of those trees and an improved implementation of pure bit sets has been integrated into an interface environment, which offers an expandability for further implementations.

Class Diagram Complexity Costs Normal
  Set Tests Inverted
  Set Tests Normal
  Set Size Inverted
  Set Size TreeIntegerSet
  Structure

The following files are available:

  • The JavaTM package intset (latest version from 11 August 2005) as a jar file (JavaTM-Version J2SETM 5.0 or later) and a Change log.
  • The documentation of the interfaces and classes created with javadoc, also available as a single jar file.
  • The backport to JavaTM-Version 1.4 of the package intset as a jar file.
  • A jar file (from 30 August 2005) with test classes for the correctness and performance of the other classes. It contains also an implementation of IntegerSet by using HashSet. A description is included as a readme file.

The new implementations BitIntegerSet and TreeIntegerSet can be used like the utility-class BitSet of JavaTM, except for the following restrictions:

  • The parameter of the methods equals(BitSet), intersects(BitSet), and(BitSet), andNot(BitSet), or(BitSet) and xor(BitSet) now has to be any set implementation that implements the interface IntegerSet, e.g. BitIntegerSet.
  • The method get(int, int) now returns a true subset, that is not shifted to the left interval border any more.
  • The method size() is deprecated and should not be used any more.

The new tree implementation is an enrichment for the group of integer set implementations. The linked list set implementation can't be seen as a rival, because its search operation is too slow. The bit set can't be efficiently applied to sets with a huge maximum element, because the memory needs and the performance are not acceptable in that case. So the only remaining candidate, the hash set implementation, is only slightly better than the tree sets for the search operation, when there are no large series of elements or empty intervals in the set. For the operations intersection and union the performance of the tree integer set implementation is always more efficient than that of the hash set by at least a factor of four, in many cases even by a factor of 20. The new implementation is also more space--saving by at least a factor of four, in many cases even by a factor larger than 100. Together with the possibility to insert or remove series of elements eminently efficient the tree structure of this project is a prospering new member inside the group of integer set implementations.

The parameterization of the leaf size shows that the standard leaf size of 1024 bit is a good compromise of the advantages of a tree structure and the advantages of a bit set, and has shown its supremacy against the 64 bit variant and the pure tree structure with the reduced one element leaf size.

The reduced option enhances the running time performance in most cases and reduces memory usage in all cases. Only for special set compositions and applications it makes sense to switch it off.

The result of this project offers now the possibility to use that new tree representation in combination with the powerful potential of the programming language JavaTM. It allows a performance gain for all kind of integer set applications. The open interface environment allows to add further implementations for a wide range of applications.

With this project a new structure to represent and process huge sets of integer values efficiently has become available. Especially for sets, in which it is expected, that there are large series of elements or clear intervals are a part of the set, this new implementation supplies a good grade of efficiency better than all the current available implementations for many applications.


Marco Buberl, Dietmar Lippold. For further information please send an e-mail to the project supervisor and now project admin Dietmar Lippold; Institute for Intelligent Systems