Multiline regex [Python]

Prev: An ODBC interface for Python 3?
Next: ANN: blist 1.2.0

From: Andreas Tawn on 21 Jul 2010 11:15

> I'm trying to read in and parse an ASCII file that contains
> information that can span several lines.
> Example:
>
> createNode animCurveTU -n "test:master_globalSmooth";
> setAttr ".tan" 9;
> setAttr -s 4 ".ktv[0:3]" 101 0 163 0 169 0 201 0;
> setAttr -s 4 ".kit[3]" 10;
> setAttr -s 4 ".kot[3]" 10;
> createNode animCurveTU -n "test:master_res";
> setAttr ".tan" 9;
> setAttr ".ktv[0]" 103 0;
> setAttr ".kot[0]" 5;
> createNode animCurveTU -n "test:master_faceRig";
> setAttr ".tan" 9;
> setAttr ".ktv[0]" 103 0;
> setAttr ".kot[0]" 5;
>
> I want to grab the information out in chunks, so
>
> createNode animCurveTU -n "test:master_faceRig";
> setAttr ".tan" 9;
> setAttr ".ktv[0]" 103 0;
> setAttr ".kot[0]" 5;
>
> would be one chunk my regex should grab.
> I'm currently only able to grab the first line and part of the
> second line, but no more.
> The regex is as follows:
>
> my_regexp = re.compile("createNode\ animCurve.*\n[\t*setAttr.*\n]*")
>
> I've run several variations of this, but none return me all of the
> expected information.
>
> Is there something special that needs to be done to have the regexp
> grab any number of the setAttr lines without specifying how many?
>
> Brandon L. Harris
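For reference, the quoted regex stops early because `[\t*setAttr.*\n]*` is a character class, not a group: it repeats single characters from the set {tab, `*`, `s`, `e`, `t`, `A`, `r`, `.`, newline}, so it halts at the space after `setAttr`. Repeating a whole line needs a non-capturing group. A minimal sketch, assuming each attribute line is optional spaces or tabs followed by `setAttr` and that every line ends in a newline; `CHUNK_RE` and `anim_curve_chunks` are hypothetical names:

```python
import re

# Sample data from the question. Tab-indented setAttr lines are an assumption;
# the indentation may have been lost in the email.
SAMPLE = (
    'createNode animCurveTU -n "test:master_res";\n'
    '\tsetAttr ".tan" 9;\n'
    '\tsetAttr ".ktv[0]" 103 0;\n'
    '\tsetAttr ".kot[0]" 5;\n'
    'createNode animCurveTU -n "test:master_faceRig";\n'
    '\tsetAttr ".tan" 9;\n'
    '\tsetAttr ".ktv[0]" 103 0;\n'
    '\tsetAttr ".kot[0]" 5;\n'
)

# The question's pattern: [...] is a character class, so it repeats single
# characters from {tab, *, s, e, t, A, r, ., newline}, not whole lines.
BROKEN_RE = re.compile("createNode\\ animCurve.*\n[\t*setAttr.*\n]*")

# Fixed pattern: (?:...)* repeats an entire "setAttr ... newline" line.
# re.MULTILINE lets ^ match at the start of every line, not just the string.
CHUNK_RE = re.compile(r"^createNode animCurve.*\n(?:[ \t]*setAttr.*\n)*", re.MULTILINE)


def anim_curve_chunks(text):
    """Return each 'createNode animCurve...' line plus the setAttr lines after it."""
    return CHUNK_RE.findall(text)


if __name__ == "__main__":
    print(repr(BROKEN_RE.match(SAMPLE).group()))  # stops right after "\tsetAttr"
    for chunk in anim_curve_chunks(SAMPLE):
        print(chunk)
```

Note that any regex approach needs the whole file in memory as one string, which matters for files of several million lines.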

Aren't you making life too complicated for yourself?

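blocks = []
currentBlock = []  # must exist before the loop (see the p.s. at the end)
for line in yourFile:  # yourFile: any open file object
    if line.startswith("createNode"):
        # a header line closes the previous block and starts a new one
        if currentBlock:
            blocks.append(currentBlock)
        currentBlock = [line]
    else:
        currentBlock.append(line)
if currentBlock:  # don't forget the last block
    blocks.append(currentBlock)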

Cheers,

Drea

From: Brandon Harris on 21 Jul 2010 11:42

I could make it that simple, but that is also incredibly slow: on a
file with several million lines, it takes somewhere in the region of
half an hour to grab all the data. I need this to grab data from many,
many files and return the data quickly.

Brandon L. Harris

From: Andreas Tawn on 21 Jul 2010 11:55

> I could make it that simple, but that is also incredibly slow: on a
> file with several million lines, it takes somewhere in the region of
> half an hour to grab all the data. I need this to grab data from many,
> many files and return the data quickly.
>
> Brandon L. Harris

That's surprising.

I just made a file with 13 million lines of your sample data (447 MB) and read it with my code. It took a little over 36 seconds. There must be something different in your setup or in the real data you've got.
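A sketch of how a timing like this could be reproduced. It assumes tab-indented attribute lines; `group_blocks`, `write_sample` and `time_read` are hypothetical helpers wrapping the loop from earlier in the thread, and the copy count is arbitrary. Timings will vary with the machine and Python version.

```python
import os
import tempfile
import time

# One block from the question; tab-indented attribute lines are an assumption.
SAMPLE_BLOCK = (
    'createNode animCurveTU -n "test:master_faceRig";\n'
    '\tsetAttr ".tan" 9;\n'
    '\tsetAttr ".ktv[0]" 103 0;\n'
    '\tsetAttr ".kot[0]" 5;\n'
)


def group_blocks(lines):
    """The loop from earlier in the thread: a createNode line starts a new block."""
    blocks, current = [], []
    for line in lines:
        if line.startswith("createNode"):
            if current:
                blocks.append(current)
            current = [line]
        else:
            current.append(line)
    if current:
        blocks.append(current)
    return blocks


def write_sample(path, copies):
    """Write `copies` repetitions of SAMPLE_BLOCK (4 lines each) to path."""
    with open(path, "w") as f:
        for _ in range(copies):
            f.write(SAMPLE_BLOCK)


def time_read(path):
    """Return (number of blocks, seconds taken) for one grouping pass over path."""
    start = time.perf_counter()
    with open(path) as f:
        n_blocks = len(group_blocks(f))
    return n_blocks, time.perf_counter() - start


if __name__ == "__main__":
    copies = 250_000  # 1 million lines; 3_250_000 gives the 13 million of the test above
    fd, path = tempfile.mkstemp(suffix=".ma")
    os.close(fd)
    try:
        write_sample(path, copies)
        n, secs = time_read(path)
        print(f"{n} blocks in {secs:.2f} s")
    finally:
        os.remove(path)
```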

Cheers,

Drea

From: Brandon Harris on 21 Jul 2010 11:57

Could it be that the file doesn't contain only that type of data? There
are many different types; that is just the one I'm trying to grab.

Brandon L. Harris

From: Andreas Tawn on 21 Jul 2010 12:27

> Could it be that the file doesn't contain only that type of data? There
> are many different types; that is just the one I'm trying to grab.
>
> Brandon L. Harris

I don't see why it would make such a difference.

If your data looks like...

<block header>
\t<attribute>
\t<attribute>
\t<attribute>

Just change this line...

if line.startswith("createNode"):

to...

if not line.startswith("\t"):

and it won't care what sort of data the file contains.
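With that change the loop collects every block, and the animCurve ones can then be picked out by their header. A minimal sketch, assuming attribute lines start with a tab as in the layout above; `iter_blocks` and `anim_curve_blocks` are hypothetical names. It is written as a generator, so only one block is held in memory at a time, which may help with files of several million lines.

```python
def iter_blocks(lines):
    """Yield lists of lines; every line that does not start with a tab opens a new block."""
    current = []
    for line in lines:
        if not line.startswith("\t"):
            if current:
                yield current
            current = [line]
        else:
            current.append(line)
    if current:  # the last block has no following header to close it
        yield current


def anim_curve_blocks(lines):
    """Keep only blocks whose header is a 'createNode animCurve...' line."""
    for block in iter_blocks(lines):
        if block[0].startswith("createNode animCurve"):
            yield block


if __name__ == "__main__":
    # Made-up demo lines: one non-animCurve node, then one animCurve node.
    demo = [
        'createNode transform -n "test:master";\n',
        '\tsetAttr ".v" no;\n',
        'createNode animCurveTU -n "test:master_res";\n',
        '\tsetAttr ".tan" 9;\n',
        '\tsetAttr ".ktv[0]" 103 0;\n',
    ]
    for block in anim_curve_blocks(demo):
        print(block[0].strip(), len(block) - 1, "attributes")
```

With a real file, the same call works on the open file object, e.g. `for block in anim_curve_blocks(f):` inside a `with open(path) as f:` block.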

Processing that data after you've collected it will still take a while, but that's the same whichever method you use to read it.

Cheers,

Drea

p.s. I just noticed I hadn't pre-declared the currentBlock list in my first snippet.
