From: Andre Engels on
On Wed, Mar 24, 2010 at 10:07 AM, John Smithury <joho.smithury(a)gmail.com> wrote:
> Dear pythoners,
>
> I'm new to Python, and I want to study "regular
> expressions" handling like below:
>
> ==============source============
> <line>the</line>
> <line>is</line>
> <line>name</line>
> ==============source end=========
>
>
> After conversion, the result should look like below:
>
> -------------------------result------------------------
> {'t','h','e'},
> {'i','s'},
> {'n','a','m','e'}

What have you tried yourself, and where did you get stuck at the "I
don't know what to do now" point? Why do you think your problem has
to do with regular expressions?



--
André Engels, andreengels(a)gmail.com
From: Andre Engels on
On Wed, Mar 24, 2010 at 10:34 AM, John Smithury <joho.smithury(a)gmail.com> wrote:
> ==============source============
> <line>the</line>
> <line>is</line>
> <line>name</line>
> ==============source end=========
>
> First, get only the word (discard the "<line>" and "</line>"); that can
> be done with a regular expression, right?
>
> the
> is
> name
> Second, take each character in a word and compose a format like {'t','h','e'}
>>>>for a in line
>
>
> Most important is learning "regular expressions" via this example.

Okay, then I'll go into that part.

import re
regex = re.compile(r"<line>([^<>]*)</line>")

[^<>] here means "any character except < or >".
* means any number (zero or more) of such characters.
The parentheses mark the part of the expression we are
interested in (the group).
The expression as a whole thus means:
first <line>, then the part we are interested in, which is an arbitrary
string of characters that are not < or >, then </line>.
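
As a small sketch of what the group captures, here is the pattern applied
to single strings. The sample strings and the names 'match', 'empty' and
'no_match' are made up for illustration; they are not from your data.

```python
import re

regex = re.compile(r"<line>([^<>]*)</line>")

# A normal line: the whole match includes the tags, the group does not.
match = regex.search("<line>the</line>")
print(match.group(0))  # the whole matched text, tags included
print(match.group(1))  # only the part inside the parentheses

# Because * allows zero characters, an empty element still matches.
empty = regex.search("<line></line>")
print(repr(empty.group(1)))

# A stray < inside the content breaks the match, so search() returns None.
no_match = regex.search("<line>a<b</line>")
print(no_match)
```

Note that group(0) is always the entire match, while group(1) is the first
parenthesized part.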

To use this expression (assuming 'text' is the string you want to check):

result = regex.findall(text)

will find all occurrences of the regular expression, and give you
a list with the content of the group for each one.
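
Putting it together, a minimal sketch of the whole conversion might look
like this. It assumes the source is already in a string called 'text'; the
helper name 'braces' and the variable 'formatted' are my own inventions,
and the output layout is my reading of the format you asked for.

```python
import re

# Sample input copied from the question.
text = """<line>the</line>
<line>is</line>
<line>name</line>"""

regex = re.compile(r"<line>([^<>]*)</line>")

# With one group in the pattern, findall returns a list of the group contents.
words = regex.findall(text)
print(words)

def braces(word):
    # Quote each character, join with commas, and wrap in braces.
    return "{" + ",".join("'%s'" % ch for ch in word) + "}"

# One braced entry per word, separated by a comma and a newline.
formatted = ",\n".join(braces(w) for w in words)
print(formatted)
```

Only the first step needs a regular expression; splitting a word into
characters is plain iteration over the string.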


--
André Engels, andreengels(a)gmail.com