From: David Latin on
Hello,
I am currently working on manipulating data in "vCard"-like format, and have
become confused by the actions of the Cases, StringCases and Select
functions.
Consider the small list:

In[1]:= list = {"DTEND:19260412T175900", "DTEND:20070207T050000",
"END:VCALENDAR", "MM"} ;

In[2]:= Cases[list, ___~~"END:"~~___]
Out[2]= {}
So pattern-matching obviously does not work with Cases for a list of
strings.

The documentation for Cases does not refer to patterns in strings, so I
tried

In[3]:= StringCases[list, ___~~"END:"~~___]
Out[3]=
{{"DTEND:19260412T175900"},{"DTEND:20070207T050000"},{"END:VCALENDAR"},{}}
The problem here is that empty elements can be returned.

So next I tried

In[4]:= Select[list, ___~~"END:"~~___]
Out[4]= {}
Obviously not working.

Next I tried

In[5]:= Select[ list, StringMatchQ[#, "*END:*"] & ]
Out[5]= {"DTEND:19260412T175900", "DTEND:20070207T050000", "END:VCALENDAR"}

This is fine.
But what if I only want the "END:" lines and not the "DTEND:" lines ?

It may be appropriate to make use of

In[6]:= Select[ list, StringFreeQ[#, "*DTEND:*"] & ]
Out[6]= {"DTEND:19260412T175900", "DTEND:20070207T050000", "END:VCALENDAR",
"MM"}
Not as expected!

But, in the end, what works is:

In[7]:= Select[ list, StringMatchQ[#, "*END:*"] && ! StringMatchQ[#,
"*DTEND:*"] & ]
Out[7]= {"END:VCALENDAR"}

I know I could have used "END*" instead of "*END*", but that's not the point
here.

My questions then are:
Why doesn't Cases work for a list of strings ?
Why doesn't Select work for patterns with the ~~ operator ?
Why doesn't StringFreeQ act in the same way as !StringMatchQ ?

Any help over this confusion would be very much appreciated!
Thank you,
David
From: Bill Rowe on
On 7/22/10 at 5:41 AM, d.latin(a)gmail.com (David Latin) wrote:

>Hello, I am currently working on manipulating data in "vCard"-like
>format, and have become confused by the actions of the Cases,
>StringCases and Select functions. Consider the small list:

>In[1]:= list = {"DTEND:19260412T175900", "DTEND:20070207T050000",
>"END:VCALENDAR", "MM"} ;

>In[2]:= Cases[list, ___~~"END:"~~___]
>Out[2]= {}

>So pattern-matching obviously does not work with Cases for a list of
>strings.

Patterns and string patterns simply aren't the same. So, do

In[12]:= Cases[list, _?(StringMatchQ[#, ___ ~~ "END:" ~~ ___] &)]

Out[12]= {DTEND:19260412T175900,DTEND:20070207T050000,END:VCALENDAR}
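
As a quick check of why the original Cases call found nothing (a sketch, using one string from the list above): the ~~ operator builds an expression with head StringExpression, and Cases compares that expression structurally against each list element rather than against the characters inside a string.

In[]:= Head[___ ~~ "END:" ~~ ___]
Out[]= StringExpression

In[]:= MatchQ["END:VCALENDAR", ___ ~~ "END:" ~~ ___]
Out[]= False

In[]:= StringMatchQ["END:VCALENDAR", ___ ~~ "END:" ~~ ___]
Out[]= True

If naming the element reads more clearly than PatternTest, a Condition should do the same job:

In[]:= Cases[list, s_String /; StringMatchQ[s, ___ ~~ "END:" ~~ ___]]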

>The documentation for Cases does not refer to patterns in strings,
>so I tried

>In[3]:= StringCases[list, ___~~"END:"~~___]
>Out[3]=
>{{"DTEND:19260412T175900"},{"DTEND:20070207T050000"},{"END:VCALENDAR"},{}}

>The problem here is that empty elements can be returned.

That is easily fixed by doing either

In[13]:= DeleteCases[StringCases[list, ___ ~~ "END:" ~~ ___], {}]

Out[13]= {{"DTEND:19260412T175900"}, {"DTEND:20070207T050000"},
{"END:VCALENDAR"}}

or

In[14]:= StringCases[list, ___ ~~ "END:" ~~ ___] /. {} -> Sequence[]

Out[14]= {{"DTEND:19260412T175900"}, {"DTEND:20070207T050000"},
{"END:VCALENDAR"}}
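
If a flat list of the matched strings is wanted rather than a list of one-element lists, Flatten is another option. This is only a sketch, and it assumes each string yields at most one match (true for this pattern, since the ___ on either side takes in the whole string):

In[]:= Flatten[StringCases[list, ___ ~~ "END:" ~~ ___]]
Out[]= {"DTEND:19260412T175900", "DTEND:20070207T050000", "END:VCALENDAR"}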

>So next I tried

>In[4]:= Select[list, ___~~"END:"~~___]
>Out[4]= {}

>Obviously not working.

Here, as with Cases, a pure function using StringMatchQ will do
what you need. That is,

In[15]:= Select[list, StringMatchQ[#, ___ ~~ "END:" ~~ ___] &]

Out[15]= {DTEND:19260412T175900,DTEND:20070207T050000,END:VCALENDAR}

>Next I tried

>In[5]:= Select[ list, StringMatchQ[#, "*END:*"] & ]
>Out[5]= {"DTEND:19260412T175900", "DTEND:20070207T050000", "END:VCALENDAR"}

>This is fine. But what if I only want the "END:" lines and not the
>"DTEND:" lines ?

Change the pattern to be matched. For example,

In[16]:= Select[list, StringMatchQ[#, "END:" ~~ ___] &]

Out[16]= {END:VCALENDAR}

>It may be appropriate to make use of

>In[6]:= Select[ list, StringFreeQ[#, "*DTEND:*"] & ]
>Out[6]= {"DTEND:19260412T175900", "DTEND:20070207T050000", "END:VCALENDAR",
>"MM"}

>Not as expected!

Since StringFreeQ[string, pattern] returns True when no substring
of string matches pattern, it isn't sensible to supply a pattern
like ___~~pattern~~___. This just causes Mathematica to do more
work than needed to achieve the desired result. So, do

In[17]:= Select[list, StringFreeQ[#, "DTEND:"] &]

Out[17]= {END:VCALENDAR,MM}

Also, note that the documentation for StringMatchQ, under More
Information, states "... ordinary StringExpression string
patterns, as well as abbreviated string patterns containing the
following metacharacters:" and specifically states that a "*" is
interpreted as zero or more characters. The documentation for
StringFreeQ does not have any similar statement. So, I suspect
that for StringFreeQ, a "*" is taken to be a literal asterisk. Since
none of the strings in your list contains a literal asterisk, all
would be selected if StringFreeQ is interpreting the "*" characters
in your pattern as literal asterisks.
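
That suspicion is easy to test directly. A sketch follows; the test strings "ab" and "a*b" are made up for illustration, and the outputs are what I would expect if "*" is a metacharacter only in StringMatchQ:

In[]:= StringMatchQ["ab", "*"]
Out[]= True

In[]:= StringFreeQ["ab", "*"]
Out[]= True

In[]:= StringFreeQ["a*b", "*"]
Out[]= False

The second and third results together indicate that StringFreeQ is looking for an actual asterisk character, which is consistent with Out[6] returning every element of the list.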

>But, in the end, what works is:

>In[7]:= Select[ list, StringMatchQ[#, "*END:*"] && ! StringMatchQ[#,
>"*DTEND:*"] & ]
>Out[7]= {"END:VCALENDAR"}

>I know I could have used "END*" instead of "*END*", but that's not
>the point here.

>My questions then are:
>Why doesn't Cases work for a list of strings ?
>Why doesn't Select work for patterns with the ~~ operator ?

Neither Cases nor Select is designed to use string patterns. You
can use string patterns with these by creating a pattern or
function that will evaluate to true or false using any of the
functions that do accept string patterns as arguments.

>Why doesn't StringFreeQ act in the same way as !StringMatchQ ?

Why are you expecting these to be the same? StringFreeQ[string,
pattern] returns true whenever no substring of string matches
pattern. !StringMatchQ[string, pattern] returns true whenever
the entire string fails to match pattern. There is a clear
difference between matching a substring of a given string and
the entire string.
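
A short illustration of the difference, using a string from the original list (a sketch; the outputs are what the definitions above imply):

In[]:= StringFreeQ["END:VCALENDAR", "END:"]
Out[]= False

In[]:= Not[StringMatchQ["END:VCALENDAR", "END:"]]
Out[]= True

To make a negated StringMatchQ behave like StringFreeQ, the pattern has to be allowed to sit anywhere inside the string:

In[]:= Select[list, Not[StringMatchQ[#, ___ ~~ "DTEND:" ~~ ___]] &]
Out[]= {"END:VCALENDAR", "MM"}

This agrees with Out[17] above.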