 |
 |
Project Description:
This project offers a new possibility to represent and process sets of
nonnegative integer numbers in an efficient way. The new structure is an
ordered binary tree with intervals at each tree constituent. The leaves itself
contain subsets realised by bit sets. This structure has been defined,
described, analysed and finally implemented in
JavaTM. The functionality and correctness of the
implementation has been tested extensively. A comparison to other existing
possibilities for sets of integer numbers demonstrates the eminent efficiency
benefit of the new representation for many applications. The implementation of those
trees
and an improved
implementation of pure bit sets
has been integrated into an
interface
environment, which offers an
expandability for further implementations.
The following files are available:
- The JavaTM package
intset (latest
version from 11 August 2005) as a jar file
(JavaTM-Version J2SETM
5.0 or later) and a Change log.
- The documentation of the interfaces and
classes created with javadoc, also available as a single
jar file.
- The backport to JavaTM-Version 1.4 of the
package
intset as a jar file.
- A jar file (from 30 August 2005) with test
classes for the correctness and performance of the other classes. It
contains also an implementation of
IntegerSet by using
HashSet. A description is included as a readme file.
The new implementations BitIntegerSet and
TreeIntegerSet can be used like the
utility-class BitSet of JavaTM, except
for the following restrictions:
-
The parameter of the methods
equals(BitSet),
intersects(BitSet), and(BitSet),
andNot(BitSet), or(BitSet) and
xor(BitSet) now has to be any set implementation that implements
the interface IntegerSet, e.g. BitIntegerSet.
-
The method
get(int, int) now returns a true subset, that is not
shifted to the left interval border any more.
-
The method
size() is deprecated and should not be used any more.
The new tree implementation is an enrichment for the group of integer set
implementations. The linked list set implementation can't be seen as a rival,
because its search operation is too slow. The bit set can't be efficiently
applied to sets with a huge maximum element, because the memory needs and the
performance are not acceptable in that case. So the only remaining candidate,
the hash set implementation, is only slightly better than the tree sets for the
search operation, when there are no large series of elements or empty intervals
in the set. For the operations intersection and union the performance of the
tree integer set implementation is always more efficient than that of the hash
set by at least a factor of four, in many cases even by a factor of 20. The new
implementation is also more space--saving by at least a factor of four, in many
cases even by a factor larger than 100. Together with the possibility to insert
or remove series of elements eminently efficient the tree structure of this
project is a prospering new member inside the group of integer set
implementations.
The parameterization of the leaf size shows that the standard leaf size of 1024
bit is a good compromise of the advantages of a tree structure and the advantages
of a bit set, and has shown its supremacy against the 64 bit variant and the
pure tree structure with the reduced one element leaf size.
The reduced option enhances the running time performance in
most cases and reduces memory usage in all cases. Only for special set
compositions and applications it makes sense to switch it off.
The result of this project offers now the possibility to use that new tree
representation in combination with the powerful potential of the programming
language JavaTM. It allows a performance gain for all
kind of integer set applications. The open interface environment allows to add
further implementations for a wide range of applications.
With this project a new structure to represent and process huge sets of integer
values efficiently has become available. Especially for sets, in which it is
expected, that there are large series of elements or clear intervals are a part
of the set, this new implementation supplies a good grade of efficiency better
than all the current available implementations for many applications.
Marco Buberl, Dietmar Lippold.
For further information please send an e-mail to the project supervisor
and now project admin
Dietmar Lippold;
Institute for Intelligent Systems
|
|