"logo.gif"  
Main
Read Online
Download PDF
Additional resources
Revision history
Known typos/bugs
Report a bug
License terms
About the author
"node472_1.gif" "node472_2.gif" "node472_3.gif"

5.2.7.3.3   Example: words containing given letters - realizing alternative patterns programmatically

For this example we again will need some list of words, like this one (taken from  the  Mathematica  book)

"node472_4.gif"

Now, the problem: we want to pick all the words containing any of the symbols given by some symbol list, for instance {"a","n","k"}.

To solve it, let us start with a simple case when we have only one symbol, say "a". And, as a first step, let us work out the code that will check for some single word (string), whether it contains the symbol. So, let us pick some word:

"node472_5.gif"

"node472_6.gif"

Since I don't want to use the string-matching functions here, the first thing we need is to split the word into its letters. This is best done by using the built-in <Characters > command:

"node472_7.gif"

"node472_8.gif"

Now we need to test whether a  given symbol ("a") belongs to this list. This is done best by the built-in  <MemberQ> command:

"node472_9.gif"

"node472_10.gif"

Now we want to generalize to the case of several characters. One way would be to Map MemberQ on their list, and then use the built-in  <Or>. For instance, the second character is "k":

"node472_11.gif"

"node472_12.gif"

We need to Apply <Or> now (head List has to be "eaten up" by Or)

"node472_13.gif"

"node472_14.gif"

We are now ready to insert this condition, and use Cases command to find all the words containing either "a" or "k" (or both)

"node472_15.gif"

"node472_16.gif"

Finally, we package everything into a function, which will take two arguments: a list of words and a list of symbols to look for:

"node472_17.gif"

Note that we changed the specific symbol list by the function parameter  <symbols>. To check it, let us find all words containing say "e" or "r" letters :

"node472_18.gif"

"node472_19.gif"

We solved the problem, but rather inefficiently. Even within what we already know, there are ways to make it better. The main source of inefficiency which we can eliminate now is the Map-ping of < MemberQ[Characters[x],#]&> on a list of symbols. We can do better if we recall that the second argument of  MemberQ is a pattern, and as such, it may be more complicated than just one symbol. In particular, we can use alternative  patterns like "r"|"e", or, which is the same, Alternatives["r","e"].

"node472_20.gif"

"node472_21.gif"

Since again our letters are initially in a list, and Alternatives requires a sequence of elements, the head <List> has to be eaten up by the head "Alternatives", and therefore, we have to use Apply:

"node472_22.gif"

"node472_23.gif"

or, which is the same,

"node472_24.gif"

"node472_25.gif"

we can now rewrite our function:

"node472_26.gif"

Check:

"node472_27.gif"

"node472_28.gif"


The reason that the latter version is more efficient than the former one is the following: in the latter case, the "decision" about any given word is made already on the level of the MemberQ function, while in the former case, it is promoted to another function (Or). The rule of thumb is that one has to push as much computation as possible inside the built-in function. Basically, MemberQ does not care (almost), whether it checks a single character or an alternative pattern with many characters (for small numbers of characters such as considered here). On the other hand, by Mapping MemberQ on each character, we force it to check afresh for every character. Thus, roughly we expect that the difference in performance will be a factor of the order of the length of the character list.  We can check our expectations by measuring the timing for a list of symbols being the entire alphabet. We will use the <myTiming> function which measures small execution times (section 3.4.5.2). Now:

"node472_29.gif"

"node472_30.gif"

"node472_31.gif"

"node472_32.gif"

"node472_33.gif"

"node472_34.gif"

We see that we already gain more than an order of magnitude, for this example (the difference would be less for smaller list of symbols). As a byproduct, our new function is not only more efficient, but also more concise and transparent. In Mathematica programming this is very often the case. As one of our guiding principles has it, "Shorter programs usually run faster" (although, of course, there are exceptions).

Later we will see how to make this function yet more efficient, for example with the help of the Reap-Sow technique.

"node472_35.gif"

"node472_36.gif" "node472_37.gif" "node472_38.gif"

Created by Wolfram Mathematica 6.0  (05 February 2009) Valid XHTML 1.1!