Extract a bordered, skewed rectangle from an image [Python]


From: Paul Hemans on 6 May 2010 21:03

We have a scanned document on which a label has been attached. The label has
been designed to have a border that makes it easy to determine the correct
orientation and area of the label. The label portion of the scanned image
needs to be extracted and deskewed as an image. The contents of the label
will change, but the border won't.

I originally posted this on RentACoder as a project, but I am not getting
many responses. It might be that I requested it be done in Python, that it's
too hard, or that I am too stingy. You can see the project here:
http://www.RentACoder.com/RentACoder/misc/BidRequests/ShowBidRequest.asp?lngBidRequestId=1402446

It may not be feasible to do this project without the use of an image
processing engine such as openCV. There is a routine in openCV called
cvMinAreaRect2() that may do the job of returning a matching rectangle that
is inclined. There is a Python to openCV interface available. So I think all
the pieces are there, but this is out of my league as I have had very little
experience with image processing.

I am wondering whether there are any people here that have experience with
openCV and Python. If so, could you either give me some pointers on how to
approach this, or if you feel so inclined, bid on the project. There are 2
problems:
How do I get openCV to ignore the contents of the label and just focus on
the border?
How to do this through Python into openCV? I am a newbie to Python, not
strong in Maths and ignorant of the usage of openCV.

Thanks.

From: David Bolen on 7 May 2010 17:41

"Paul Hemans" <darwin(a)nowhere.com> writes:

> I am wondering whether there are any people here that have experience with
> openCV and Python. If so, could you either give me some pointers on how to
> approach this, or if you feel so inclined, bid on the project. There are 2
> problems:

Can't offer actual services, but I've done image tracking and object
identification in Python with OpenCV so can suggest some approaches.

You might also try the OpenCV mailing list, though it sometimes
varies wildly in terms of S/N ratio.

And for OpenCV specifically, I definitely recommend the book "Learning
OpenCV" from O'Reilly. It's really hard to grasp the concepts and
applications of the raw OpenCV calls from the API documentation, and I
found the book (albeit not cheap) helped me out tremendously and was
well worth it.

I'll flip the two questions since the second is quicker to answer.

> How to do this through Python into openCV? I am a newbie to Python, not
> strong in Maths and ignorant of the usage of openCV.

After trying a few wrappers, the bulk of my experience is with the
ctypes-opencv wrapper and OpenCV 1.x (either 1.0 or 1.1pre). Things
change a lot with the recent 2.x (which needs C++ wrappers), and I'm
not sure the various wrappers are as stable yet. So if you don't have
a hard requirement for 2.x, I might suggest at least starting with 1.x
and ctypes-opencv, which is very robust, though I'm a little biased as
I've contributed code to the wrapper.

> How do I get openCV to ignore the contents of the label and just focus on
> the border?

There's likely no single answer, since multiple mechanisms for
identifying features in an image exist, and you can also derive
additional heuristics based on your own knowledge of the domain space
(your own images). Without knowing exactly how the border is designed
to make it easy to detect, it's hard to say anything definitive.

But in broad strokes, you'll often:

1. Normalize the image in some way. This can be to adjust for
brightness from various scans to make later processing more
consistent, or to switch color spaces (to make color matching more
effective) or even to remove color altogether if it just
complicates matters. You may also mask off entire portions of the
image if you have information that says they can't possibly be
part of what you are looking for.
2. Attempt to remove noise. Even when portions of an image look
like a solid color, at the pixel level there can be many
variations in pixel values. Operations such as blurring or
smoothing help to average out those values and simplify matching
entire regions.
3. Attempt to identify the regions or features of interest. Here's
where a ton of algorithms may apply depending on your needs, but the
simplest form to start with is basic color matching. For edge
detection (such as finding your label's border), convolutions (such
as gradient detection) might also be ideal.
4. Process identified regions to attempt to clean them up, if
possible weakening regions likely to be extraneous, and
strengthening those more likely to be correct. Morphology
operations are one class of processing likely to help here.
5. Select among features (if more than one) to identify the best
match, using any knowledge you may have that can be used to
rank them (e.g., size, position in image, etc...)

My own processing is ball tracking in motion video, so I have some
additional data in terms of adjacent frames that helps me remove
static background information and minimize the regions under
consideration for step 3, but a single image probably won't have
that. But given that you have scanned documents, there may be other
simplifying rules you can use, like eliminating anything too white or
too black (depending on label color).

My own flow works like:

1. Normalize each frame

1. Blur the frame (cvSmooth with CV_BLUR, 5x5 matrix). This
smooths out the pixel values, improving the color conversion.
2. Balance brightness (in RGB space). I ended up just offsetting
the image by a fixed (x,x,x) value to maximize the RGB values.
Found it worked better doing it in RGB before Lab conversion.
3. Convert the image to the "Lab" color space. I used Lab because
the conversion process was fastest, but when frame rate isn't
critical, HLS is likely better since hue/saturation are
completely separate from lightness which makes for easier color
matching.

2. Identify uninteresting regions in the current frame

This may not apply to you, but here is where I mask out static
information from prior background frames, based on difference
calculations with the current frame, or very dark areas that I
knew couldn't include what I was interested in.

In your case, for example, if you know the label is going to show
up fairly saturated (say it's a solid red or something), you could
probably eliminate everything that is below a certain saturation
level. Or if they are black and white documents, but the label has
a color, it might be very easy to filter out everything but the
label.

If you're lucky, some simple heuristics applied here might have the
net effect of masking the majority of your document image away,
leaving primarily the label.

3. Color matching

1. Mask off regions of the image not falling within a specific Lab
pixel range, sufficient to encompass my object under a variety of
lighting/camera conditions. I typically use cvInRangeS to set
the mask bits for pixels within the range.
2. Perform a morphology pass - cvMorphologyEx against the
mask as CV_MOP_CLOSE. A close applies a dilation followed by
an erosion. The dilation combines nearby features with each
other and fills small gaps, and the erosion then trims the result
back to roughly its original outline. Removing very small features
(likely unnecessary matches) takes the opposite order, an erosion
followed by a dilation, which is CV_MOP_OPEN. The net effect wanted
is to strengthen larger matched areas (and help them become
contiguous) while removing tiny features.

Note in my case I was looking for a relatively solid color ball (it
had gaps since it was a whiffle ball), so if, for example, your
label is alternating colors, or dashed lines or something like that
it might not work as well. There are more complicated algorithms
that can match more elaborate patterns, sometimes with initial
training on target images.

4. Object selection

1. Locate all top level contours of any remaining solid areas
in the mask (cvFindContours). This will identify connected
areas in the mask, so in your case, ideally one of the located
contours would be the label edge. This does assume that your
feature identification in the prior step is likely to create
contiguous areas. Even just a few pixels of gaps will net a
non-closed contour which is harder to work with, though the
morphology operation will sometimes close those gaps.
2. Evaluate "best" contour when multiple choices exist. Very small
areas are eliminated, and remaining areas are evaluated for
average Lab value distance from a target point (somewhat
arbitrarily chosen at this point to represent the "ideal" ball).
The nearest (in color distance) contour is picked, except in the
case of two "close" contours where the further contour can win
if it is at least 4x (arbitrarily chosen) as large. In your
case, for example, any contours located within the label itself
would necessarily be smaller than the label, so you could
probably just pick largest. Also, when calling cvFindContours
you can prevent it from finding "interior" contours.
3. Compute and return a minimum bounding circle (center, radius)
for the selected contour. In your case, you'd likely just use
the contour itself - you can use the contour (with 'n' line
segments) as is, or convert into an approximate polygon.

The nice thing about Python with OpenCV is the interactive
experimentation you can do right in the interpreter. Open a highgui
window, load in your image and then experiment. After performing
various processes, just quickly show the new image in the existing or
a new window. You can keep several windows up to date as you
run an image through several transforms, so you can compare the results.

Hope this at least gives you some thoughts as to how to proceed.

-- David

From: Paul Hemans on 9 May 2010 08:03

Thanks David, that is a 'tonne' of information. I am going to have a play
with it. Masking out the contents of the label and finding the label border
within the scanned document is probably the place to start.
Looks like there is going to be a learning curve here.

Thanks again for your help; you really put a lot of effort into this.
