From: MRAB on
vsoler wrote:
> On 5 ene, 19:35, MRAB <pyt...(a)mrabarnett.plus.com> wrote:
>> vsoler wrote:
>>> Hello,
>>> I am acessing an Excel file by means of Win 32 COM technology.
>>> For a given cell, I am able to read its formula. I want to make a map
>>> of how cells reference one another, how different sheets reference one
>>> another, how workbooks reference one another, etc.
>>> Hence, I need to parse Excel formulas. Can I do it by means only of re
>>> (regular expressions)?
>>> I know that for simple formulas such as "=3*A7+5" it is indeed
>>> possible. What about complex for formulas that include functions,
>>> sheet names and possibly other *.xls files?
>>> For example "=Book1!A5+8" should be parsed into ["=","Book1", "!",
>>> "A5","+","8"]
>>> Can anybody help? Any suggestions?
>> Do you mean "how" or do you really mean "whether", ie, get a list of the
>> other cells that are referred to by a certain cell, for example,
>> "=3*A7+5" should give ["A7"] and "=Book1!A5+8" should give ["Book1!A5"]?
>
> I'd like to know how to do it, should it be possible.
>
Something like this should work:

references = re.findall(r"\b((?:\w+!)?[A-Za-z]+\d+)\b", formula)
From: vsoler on
On 5 ene, 20:05, Mensanator <mensana...(a)aol.com> wrote:
> On Jan 5, 12:35 pm, MRAB <pyt...(a)mrabarnett.plus.com> wrote:
>
>
>
> > vsoler wrote:
> > > Hello,
>
> > > I am acessing an Excel file by means of Win 32 COM technology.
> > > For a given cell, I am able to read its formula. I want to make a map
> > > of how cells reference one another, how different sheets reference one
> > > another, how workbooks reference one another, etc.
>
> > > Hence, I need to parse Excel formulas. Can I do it by means only of re
> > > (regular expressions)?
>
> > > I know that for simple formulas such as "=3*A7+5" it is indeed
> > > possible. What about complex for formulas that include functions,
> > > sheet names and possibly other *.xls files?
>
> > > For example    "=Book1!A5+8" should be parsed into ["=","Book1", "!",
> > > "A5","+","8"]
>
> > > Can anybody help? Any suggestions?
>
> > Do you mean "how" or do you really mean "whether", ie, get a list of the
> > other cells that are referred to by a certain cell, for example,
> > "=3*A7+5" should give ["A7"] and "=Book1!A5+8" should give ["Book1!A5]
>
> Ok, although "Book1" would be the default name of a workbook, with
> default
> worksheets labeled "Sheet1". "Sheet2", etc.
>
> If I had a worksheet named "Sheety" that wanted to reference a cell on
> "Sheetx"
> OF THE SAME WORKBOOK, it would be =Sheet2!A7. If the reference was to
> a completely
> different workbook (say Book1 with worksheets labeled "Sheet1",
> "Sheet2") then
> the cell might have =[Book1]Sheet1!A7.
>
> And don't forget the $'s! You may see =[Book1]Sheet1!$A$7.

Yes, Mensanator, but... what re should I use? I'm looking for the re
statement. No doubt you can help!

Thank you.
From: Tim Chase on
vsoler wrote:
> Hence, I need to parse Excel formulas. Can I do it by means only of re
> (regular expressions)?
>
> I know that for simple formulas such as "=3*A7+5" it is indeed
> possible. What about complex for formulas that include functions,
> sheet names and possibly other *.xls files?

Where things start getting ugly is when you have nested function
calls, such as

=if(Sum(A1:A25)>42,Min(B1:B25), if(Sum(C1:C25)>3.14,
(Min(C1:C25)+3)*18,Max(B1:B25)))

Regular expressions don't do well with nested parens (especially
arbitrarily-nesting-depth such as are possible), so I'd suggest
going for a full-blown parsing solution like pyparsing.

If you have fair control over what can be contained in the
formulas and you know they won't contain nested parens/functions,
you might be able to formulate some sort of "kinda, sorta, maybe
parses some forms of formulas" regexp.

-tkc


From: vsoler on
On 5 ene, 20:21, vsoler <vicente.so...(a)gmail.com> wrote:
> On 5 ene, 20:05, Mensanator <mensana...(a)aol.com> wrote:
>
>
>
> > On Jan 5, 12:35 pm, MRAB <pyt...(a)mrabarnett.plus.com> wrote:
>
> > > vsoler wrote:
> > > > Hello,
>
> > > > I am acessing an Excel file by means of Win 32 COM technology.
> > > > For a given cell, I am able to read its formula. I want to make a map
> > > > of how cells reference one another, how different sheets reference one
> > > > another, how workbooks reference one another, etc.
>
> > > > Hence, I need to parse Excel formulas. Can I do it by means only of re
> > > > (regular expressions)?
>
> > > > I know that for simple formulas such as "=3*A7+5" it is indeed
> > > > possible. What about complex for formulas that include functions,
> > > > sheet names and possibly other *.xls files?
>
> > > > For example    "=Book1!A5+8" should be parsed into ["=","Book1", "!",
> > > > "A5","+","8"]
>
> > > > Can anybody help? Any suggestions?
>
> > > Do you mean "how" or do you really mean "whether", ie, get a list of the
> > > other cells that are referred to by a certain cell, for example,
> > > "=3*A7+5" should give ["A7"] and "=Book1!A5+8" should give ["Book1!A5]
>
> > Ok, although "Book1" would be the default name of a workbook, with
> > default
> > worksheets labeled "Sheet1". "Sheet2", etc.
>
> > If I had a worksheet named "Sheety" that wanted to reference a cell on
> > "Sheetx"
> > OF THE SAME WORKBOOK, it would be =Sheet2!A7. If the reference was to
> > a completely
> > different workbook (say Book1 with worksheets labeled "Sheet1",
> > "Sheet2") then
> > the cell might have =[Book1]Sheet1!A7.
>
> > And don't forget the $'s! You may see =[Book1]Sheet1!$A$7.
>
> Yes, Mensanator, but...  what re should I use? I'm looking for the re
> statement. No doubt you can help!
>
> Thank you.


Let me give you an example:

>>> import re
>>> re.split("([^0-9])", "123+456*/")
[’123’, ’+’, ’456’, ’*’, ’’, ’/’, ’’]

I find it excellent that one single statement is able to do a lexical
analysis of an expression!

If the expression contains variables, such as A12 or B9, I can try
another re expression. Which one should I use?

And if my expression contains parenthesis? And the sin() function?

Vicente Soler


From: MRAB on
Mensanator wrote:
> On Jan 5, 12:35 pm, MRAB <pyt...(a)mrabarnett.plus.com> wrote:
>> vsoler wrote:
>>> Hello,
>>> I am acessing an Excel file by means of Win 32 COM technology.
>>> For a given cell, I am able to read its formula. I want to make a map
>>> of how cells reference one another, how different sheets reference one
>>> another, how workbooks reference one another, etc.
>>> Hence, I need to parse Excel formulas. Can I do it by means only of re
>>> (regular expressions)?
>>> I know that for simple formulas such as "=3*A7+5" it is indeed
>>> possible. What about complex for formulas that include functions,
>>> sheet names and possibly other *.xls files?
>>> For example "=Book1!A5+8" should be parsed into ["=","Book1", "!",
>>> "A5","+","8"]
>>> Can anybody help? Any suggestions?
>> Do you mean "how" or do you really mean "whether", ie, get a list of the
>> other cells that are referred to by a certain cell, for example,
>> "=3*A7+5" should give ["A7"] and "=Book1!A5+8" should give ["Book1!A5]
>
> Ok, although "Book1" would be the default name of a workbook, with
> default
> worksheets labeled "Sheet1". "Sheet2", etc.
>
> If I had a worksheet named "Sheety" that wanted to reference a cell on
> "Sheetx"
> OF THE SAME WORKBOOK, it would be =Sheet2!A7. If the reference was to
> a completely
> different workbook (say Book1 with worksheets labeled "Sheet1",
> "Sheet2") then
> the cell might have =[Book1]Sheet1!A7.
>
> And don't forget the $'s! You may see =[Book1]Sheet1!$A$7.

I forgot about the dollars! In that case, the regex is:

references = re.findall(r"\b((?:\w+!)?\$?[A-Za-z]+\$?\d+)\b", formula)
First  |  Prev  |  Next  |  Last
Pages: 1 2 3 4 5
Prev: embedded python on mac - linking problem
Next: TypeError