From: John Nagle on
I'm working on street address parsing again, and I'm trying to deal
with some of the harder cases.

Here's a subparser, intended to take in things like "N MAIN" and
"SOUTH", and break out the "directional" from the street name.

from pyparsing import (CaselessKeyword, Combine, MatchFirst, OneOrMore,
                       Optional, Word, alphanums)

directionals = ['southeast', 'northeast', 'north', 'northwest',
                'west', 'east', 'south', 'southwest', 'SE', 'NE', 'N', 'NW',
                'W', 'E', 'S', 'SW']

direction = Combine(MatchFirst(map(CaselessKeyword, directionals)) +
                    Optional(".").suppress())

# Parenthesized so the expression can continue across lines.
streetNameParser = (Optional(direction.setResultsName("predirectional"))
                    + Combine(OneOrMore(Word(alphanums)),
                              adjacent=False, joinString=" ").setResultsName("streetname"))



This parses something like "N WEBB" fine; "N" is the "predirectional",
and "WEBB" is the street name.

"SOUTH" (which, when not followed by another word, is a streetname,
not a predirectional), raises a parsing exception:

Street address line parse failed for SOUTH : Expected W:(abcd...)
(at char 5), (line:1, col:6)

The problem is that "direction" matched SOUTH, and even though
"direction" is wrapped in an "Optional" and must be followed by at
least one more word, the parser didn't back up and retry without the
directional when it hit the end of the input without satisfying the
OneOrMore clause.

Pyparsing does some backup, but I'm not clear on how much,
or how to force it to happen. There's some discussion at
"http://www.mail-archive.com/python-list(a)python.org/msg169559.html".
Apparently the "Or" operator will force some backup, but it's not
clear how much lookahead and backtracking is supported.

John Nagle
From: John Nagle on
On 7/5/2010 3:19 PM, John Nagle wrote:
> I'm working on street address parsing again, and I'm trying to deal
> with some of the harder cases.

The approach below works for the cases given. The "Or" operator ("^")
supports backtracking, but "Optional()" apparently does not.


direction = Combine(MatchFirst(map(CaselessKeyword, directionals)) +
                    Optional(".").suppress())

streetNameOnly = Combine(OneOrMore(Word(alphanums)), adjacent=False,
                         joinString=" ").setResultsName("streetname")

# Try "directional + name" and "name only"; Or ("^") keeps the longest match.
streetNameParser = (
    (direction.setResultsName("predirectional") + streetNameOnly)
    ^ streetNameOnly)
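
For anyone who wants to try the two variants side by side, here is a minimal sketch that builds both from the same pieces. The names optionalVersion, orVersion and try_parse are made up for this illustration, and it assumes a pyparsing release that still accepts the camelCase method names used in this thread.

```python
from pyparsing import (CaselessKeyword, Combine, MatchFirst, OneOrMore,
                       Optional, ParseException, Word, alphanums)

directionals = ['southeast', 'northeast', 'north', 'northwest',
                'west', 'east', 'south', 'southwest',
                'SE', 'NE', 'N', 'NW', 'W', 'E', 'S', 'SW']

direction = Combine(MatchFirst([CaselessKeyword(d) for d in directionals])
                    + Optional(".").suppress())

streetNameOnly = Combine(OneOrMore(Word(alphanums)), adjacent=False,
                         joinString=" ").setResultsName("streetname")

# First attempt: once Optional() has matched the directional, it is not undone.
optionalVersion = (Optional(direction.setResultsName("predirectional"))
                   + streetNameOnly)

# Second attempt: Or ("^") evaluates both alternatives at the same position.
orVersion = ((direction.setResultsName("predirectional") + streetNameOnly)
             ^ streetNameOnly)


def try_parse(parser, text):
    """Hypothetical helper: named results as a dict, or None if parsing fails."""
    try:
        return parser.parseString(text).asDict()
    except ParseException:
        return None


if __name__ == "__main__":
    for text in ("N WEBB", "SOUTH"):
        print(text, try_parse(optionalVersion, text), try_parse(orVersion, text))
```

Only the two inputs from this thread are exercised here; other addresses may well need more alternatives.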



John Nagle
From: Thomas Jollans on
On 07/06/2010 04:21 AM, Dennis Lee Bieber wrote:
> On Mon, 05 Jul 2010 15:19:53 -0700, John Nagle <nagle(a)animats.com>
> declaimed the following in gmane.comp.python.general:
>
>> I'm working on street address parsing again, and I'm trying to deal
>> with some of the harder cases.
>>
>
> Hasn't it been suggested before, that the sanest method to parse
> addresses is from the end backwards...
>
> So that:
>
> 123 N South St.
>
> is parsed as
>
> St. South N 123

You will of course need some trickery for that to work with

Hauptstr. 12
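
To make the end-first idea concrete, here is a rough sketch in plain Python (no pyparsing). The lookup tables, the normalize helper and the parse_from_end function are all invented for this example and far too small for real use; the sketch also assumes US-style ordering, which is exactly what trips over the German example.

```python
# Hypothetical lookup tables, for illustration only.
DIRECTIONALS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW",
                "NORTH", "SOUTH", "EAST", "WEST",
                "NORTHEAST", "NORTHWEST", "SOUTHEAST", "SOUTHWEST"}
SUFFIXES = {"ST", "STREET", "AVE", "AVENUE", "RD", "ROAD"}


def normalize(token):
    """Drop a trailing period and upper-case, so "St." compares equal to "ST"."""
    return token.rstrip(".").upper()


def parse_from_end(line):
    """Classify tokens starting from the end of the line (assumed US ordering)."""
    tokens = line.split()[::-1]      # "123 N South St." -> St. South N 123
    result = {"number": None, "predirectional": None,
              "streetname": None, "suffix": None}

    # 1. The street-type suffix, if present, is the first reversed token.
    if tokens and normalize(tokens[0]) in SUFFIXES:
        result["suffix"] = tokens.pop(0)

    # 2. The house number, if present, is the last reversed token.
    if tokens and tokens[-1].isdigit():
        result["number"] = tokens.pop()

    # 3. A directional counts as a predirectional only if at least one
    #    other token remains to serve as the street name.
    if len(tokens) > 1 and normalize(tokens[-1]) in DIRECTIONALS:
        result["predirectional"] = tokens.pop()

    # 4. Whatever is left, put back in reading order, is the street name.
    result["streetname"] = " ".join(reversed(tokens)) or None
    return result


if __name__ == "__main__":
    for line in ("123 N South St.", "SOUTH", "Hauptstr. 12"):
        print(line, "->", parse_from_end(line))
```

With these tables, "123 N South St." comes out as number 123, predirectional N, street name South, suffix St., while "Hauptstr. 12" ends up with no number at all and the whole line as the street name, so number-last formats would need their own rule.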





From: Cousin Stanley on

> I'm working on street address parsing again,
> and I'm trying to deal with some of the harder cases.
> ....

For yet another test case
my actual address includes ....

... East South Mountain Avenue


Sometimes written as ....

... E. South Mtn Ave


--
Stanley C. Kitching
Human Being
Phoenix, Arizona