![]() |
  |
|
Union, Intersection and Complement: performance with a user-defined SameTest option As we mentioned in the text, the operations Union, Intersection and Complement with the SameTest option may perform slower or much slower than without it. We will only illustrate this for Union, but the same is true also for the other two. Consider some test list
The topics of rules, options and pure functions, which syntax we just used, may have not been covered yet, in which case ignore the syntax details for now. The problem is the following. The Union operation based on the default sorting function is very fast, but it may become a lot slower with a user-defined SameTest option. It shares this property with the Sort function (to be discussed next), essentially due to the way that Union operation is organized: it first sorts the list and then the same elements will always be adjacent to each other and thus easier to eliminate. To illustrate this point, consider a larger list:
We see that it is more then a 1000(!) times slower with this non-trivial "sameness" function, for this size of the list.
This illustrates several things. First of all, if we think of it, the specific problem and the notion of "sameness" as formulated above is ill-posed, because depending on the order in which the Union operation is performed, we will get different results. For example, consider a list {2,4,6}: if <4> is eliminated first, we get {2,6}, but if <6> is eliminated first, we get just {2}. Essentially the problem here is that our notion of sameness is not transitive. Perhaps a more meaningful formulation in this case would be to locate all chains of numbers with each one different by 2 from the next one in the chain. In any case, one has to make sure that the problem is well-formulated, and the sameness function better be transitive.
In fact, it is even better if we can reformulate a problem such that instead of the sorting function applied to elements of our list, we can use some key function which computes a key (say, integer number) for each element in the list, so that the majority of subsequent operations are performed with keys rather than the original elements. In Mathematica such approach often gives a large speed advantage since many operations are much faster with numbers than with arbitrary symbolic expressions. In cases when such key function is available, there are several techniques which can be efficiently used to replace Union. We will exploit this technique many times later, but for now let us just consider another problem as an illustration of these statements.
This is how long it takes to do so:
For versions of Mathematica earlier than 5.0, where Reap-Sow operations were not yet available, one can use the following code (slight extension of the technique used by Carl Woll), which takes about twice longer to execute than the one above (once again, we provide it here for illustration and timing comparison, so please ignore the code for now - we will revisit it later):
And this is the result of Union with the SameTest option and its timing:
We get about 25 times speed-up with Reap-Sow method and about 10 times with the Woll's technique, with respect to the one using Union, for this size of the list. If we make a list larger, the difference will be even more dramatic. To conclude this rather long discussion, there can be a huge difference in performance of Union depending on whether it is used in its "pure" form or some "sameness" function is provided. In the latter case, and if the list is any large, it is advisable to first analyze the problem, because there could be superior alternatives. Also, this behavior is not entirely the fault of Union, but partly reflects the fact that there is no general efficient solution for eliminating same elements from the list if all we have is just the sameness function, but not a comparison function.
|
|||||||||||||||||