New utf8string design may make UTF-8 the superior encoding [MFC]


From: Peter Olcott on 20 May 2010 00:36

On 5/19/2010 11:13 PM, Pete Delgado wrote:
> "Peter Olcott"<NoSpam(a)OCR4Screen.com> wrote in message
> news:5O2dnS2UptANt2nWnZ2dnUVZ_rqdnZ2d(a)giganews.com...
>> Here are the actual results from the working prototype of my original DFA
>> based glyph recognition engine.
>> http://www.ocr4screen.com/Unique.html
>> The new algorithm is much better than this.
>
> The salient points that you fail to mention is that the alternative
> solutions can perform OCR on *any* font while your implementation requires

Yes, and quite often with zero percent accuracy at screen resolutions.
The most accurate alternative system scored about 25% accuracy on the
sample image and was 872-fold slower. The market leader consistently
scored zero percent accuracy on any image of text at display screen
resolutions.

> the customer to tell the OCR system which font (including all specifics such
> as point size) is being used.

This is not true.

> In addition, the other systems can perform
> when the font is not consistent in the document or if different font weights
> are used, your implementation cannot and will fail miserably.

No, 100% accuracy there too.

>
> All in all, very misleading.
>
> PS: The information used in my critique of your OCR system was obtained by
> looking at your prior posts as well as your patent and are not merely
> conjecture.
>
> -Pete
>
>

From: Liviu on 20 May 2010 01:40

"Pete Delgado" <Peter.Delgado(a)NoSpam.com> wrote...
> "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote...
>> Here are the actual results from the working prototype of my
>> original DFA based glyph recognition engine.
>> http://www.ocr4screen.com/Unique.html
>> The new algorithm is much better than this.
>
> The salient points that you fail to mention is that the alternative
> solutions can perform OCR on *any* font while your implementation
> requires the customer to tell the OCR system which font (including
> all specifics such as point size) is being used.

Right, and also what liberties the OS might take with the rendering
of the font on a given device context. I believe this has been
established before in previous (and equally "entertaining") threads.

It's not so much "recognition" as "locating" some a priori known
pattern in an array of pixels. Come to think of it, this could be easily
done with some variation of regex. With the new development put
forward now, this has been postulated to be optimally implemented
with a DFA, which in turn resolves back to a regex. Iterating the
logic a few times, it could yield a near-0 cost for the whole process.
Too bad that (last I checked, at least) there was one step where
transactional integrity was proposed to be backed by email ;-)
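
To make the "locating, not recognizing" distinction concrete, here is a
minimal sketch of finding an a priori known glyph bitmap in a pixel array by
exact comparison. It is a naive scan, not the DFA approach discussed in this
thread, and the names Bitmap and locate_glyph are hypothetical. It assumes
each pixel has already been reduced to one byte and that the glyph is
rendered exactly as it appears on screen.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical row-major bitmap: pixels[y * width + x], one byte per pixel.
struct Bitmap {
    std::size_t width;
    std::size_t height;
    std::vector<unsigned char> pixels;
};

// Scans the screen top to bottom, left to right, and reports the first
// position where every pixel of the glyph matches exactly.
// Returns false if the glyph is empty, larger than the screen, or absent.
inline bool locate_glyph(const Bitmap& screen, const Bitmap& glyph,
                         std::size_t& x, std::size_t& y) {
    if (glyph.width == 0 || glyph.height == 0 ||
        glyph.width > screen.width || glyph.height > screen.height) {
        return false;
    }
    for (std::size_t sy = 0; sy + glyph.height <= screen.height; ++sy) {
        for (std::size_t sx = 0; sx + glyph.width <= screen.width; ++sx) {
            bool match = true;
            for (std::size_t gy = 0; gy < glyph.height && match; ++gy) {
                for (std::size_t gx = 0; gx < glyph.width; ++gx) {
                    if (screen.pixels[(sy + gy) * screen.width + (sx + gx)] !=
                        glyph.pixels[gy * glyph.width + gx]) {
                        match = false;
                        break;
                    }
                }
            }
            if (match) {
                x = sx;
                y = sy;
                return true;
            }
        }
    }
    return false;
}
```

Any difference in rendering (anti-aliasing, hinting, a different point size)
breaks an exact match like this, which is why the font has to be known up
front.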

Liviu

From: Mihai N. on 20 May 2010 02:46

> wchar_t is 16 bits on windows, and
> 32 bits on most Unix-like systems IIRC.

To make things worse, wchar_t can (in theory) be 8 bits
(the C standard allows it), and it is in no way guaranteed
to be some form of Unicode (in fact there is one system that
I know of that uses wchar_t for non-Unicode strings).
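
A minimal sketch of what that means for portable code: the width of wchar_t
has to be checked, not assumed. The helper encode_code_point below is
hypothetical, and it ASSUMES that a 16-bit wchar_t carries UTF-16 and a
32-bit one carries UTF-32, which is the common case but not something the
standard promises.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>
#include <string>

// Width of wchar_t on this compiler, in bits.
constexpr std::size_t kWcharBits = sizeof(wchar_t) * CHAR_BIT;

// Hypothetical helper: stores one Unicode code point in a std::wstring.
// Assumption (not guaranteed by the standard): wchar_t holds UTF-16 code
// units when it is 16 bits wide and UTF-32 code units when it is 32 bits.
// Returns an empty string if wchar_t is narrower than 16 bits.
inline std::wstring encode_code_point(std::uint32_t cp) {
    std::wstring out;
    if (kWcharBits < 16) {
        return out;  // e.g. an 8-bit wchar_t: nothing sensible to do here
    }
    if (kWcharBits >= 32 || cp < 0x10000) {
        out.push_back(static_cast<wchar_t>(cp));
    } else {
        // 16-bit wchar_t: split into a UTF-16 surrogate pair.
        cp -= 0x10000;
        out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
    }
    return out;
}
```

With a 16-bit wchar_t, encode_code_point(0x1F600) yields two elements; with
a 32-bit wchar_t it yields one. Code that counts "characters" with
wstring::size() gets different answers on the two platforms.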

> Then, locales in my experience have not been terribly portable,

Agreed. And I think that's again the fault of the C standard,
which in these areas feels more like a set of guidelines than
a standard :-)

--
Mihai Nita [Microsoft MVP, Visual C++]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email

From: Mihai N. on 20 May 2010 02:54

> If your compiler defines wchar_t as 16 bits, then
> it implies UTF-16 encoding

Nope. wchar_t does not imply Unicode.
I think this is caused by the great reluctance of the C/C++ standards
to refer to other standards. They try to be self-sufficient.

Happily enough, this seems to be changing lately (though still too slowly).
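
As a sketch of the difference, assuming a C++11 (C++0x) compiler: char16_t
and char32_t string literals are defined as UTF-16 and UTF-32, while for
wchar_t the most you can do is ask the implementation whether it declares
ISO 10646 values through the __STDC_ISO_10646__ macro. Reading "changing
lately" as a reference to these types is my assumption, and the function
names are made up for illustration.

```cpp
#include <cstddef>
#include <string>

// True only if the implementation itself declares that wchar_t values are
// ISO 10646 (Unicode) code points. Many implementations do not define this.
inline bool wchar_declared_iso10646() {
#ifdef __STDC_ISO_10646__
    return true;
#else
    return false;
#endif
}

// U+1F600 lies outside the BMP: it takes a surrogate pair in UTF-16
// and a single code unit in UTF-32, whatever the width of wchar_t.
inline std::size_t utf16_units() {
    return std::u16string(u"\U0001F600").size();
}

inline std::size_t utf32_units() {
    return std::u32string(U"\U0001F600").size();
}
```

utf16_units() is 2 and utf32_units() is 1 on every conforming C++11
compiler; no such statement can be made about a wchar_t literal.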

> Well, the locale names are supposed to be the ISO standard
> string designators

From what I know, that is not specified anywhere in the C/C++ standard.
A locale can be anything you want it to be.
POSIX added something, but it is quite outdated.

UTS-35 (Unicode Technical Standard #35, http://unicode.org/reports/tr35/)
is the best thing right now. And you can use it with ICU, again the best
platform-independent solution for locale-aware support (though ICU has its
own problems).
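
A minimal sketch of why locale names are not portable in standard C++: the
only way to learn whether a name is accepted is to try it. The helper name
locale_name_accepted is hypothetical; std::locale's constructor and its
std::runtime_error on an unknown name are standard.

```cpp
#include <locale>
#include <stdexcept>

// Returns true if this C++ runtime accepts `name` as a locale name.
// Apart from "C", which names are accepted is up to the implementation.
inline bool locale_name_accepted(const char* name) {
    try {
        std::locale loc(name);  // throws std::runtime_error if unknown
        (void)loc;
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

locale_name_accepted("C") is true everywhere. A name such as
"en_US.UTF-8" typically works on a Unix-like system only if that locale is
installed, and Windows runtimes use their own naming, so the same source
can behave differently on each platform.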

--
Mihai Nita [Microsoft MVP, Visual C++]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email

From: Hector Santos on 20 May 2010 02:54

Good points, P. Delgado. Let's also make a few major notes:

- He has no products,
- He has no customers,
- He has no competition!
- He has been at this for the past 9-10 years.
- He has the IQ of a Pre-Med student, hence he is smarter than us,
- His process once required 5 GB of resident PURE memory, then 3 GB,
then 1.5 GB.
- He wasn't familiar with memory virtualization, fragmentation

Remember that classic thread? When he was finally proven wrong, he
admitted that all his knowledge about an OS comes from a 25-year-old OS
class book, and that he forgot to read the 2nd half because the final exam
was cancelled, hence he didn't know about memory virtualization ideas.
But he was going to catch up now. :)

- He has no concept of threads even when provided thread based code,
- He believes Multiple Queue/Multiple Servant FIFO is superior
- He invented the fastest string class in the world
- He wanted to make SQLITE3 behave like MYSQL, SEQUEL
- He wanted to use ISAM offset ideas for SQL records
- He wants to do all this on a single-CPU computer.
- He wants fault tolerance without DISK I/O.

Did I miss anything? I'm pretty sure I did. Oh yeah..

- He wants a secured computer at customer sites that no one can touch
because they might steal his software.

Did I mention he has no products? no customers? and no competitor? <g>

Since 2006, his products would be available in the FALL, and they would be
available as ActiveX, but oh yeah

- He wants to use Linux with no GUI and in REAL TIME.

But the Linux people don't seem to be too helpful, and he needs to come to
the MFC forum because "this is where people answer his patent claim
questions."

Go figure.

--
HLS

On May 20, 12:13 am, "Pete Delgado" <Peter.Delg...(a)NoSpam.com> wrote:
> "Peter Olcott" <NoS...(a)OCR4Screen.com> wrote in message
>
> news:5O2dnS2UptANt2nWnZ2dnUVZ_rqdnZ2d(a)giganews.com...
>
> > Here are the actual results from the working prototype of my original DFA
> > based glyph recognition engine.
> > http://www.ocr4screen.com/Unique.html
> > The new algorithm is much better than this.
>
> The salient points that you fail to mention is that the alternative
> solutions can perform OCR on *any* font while your implementation requires
> the customer to tell the OCR system which font (including all specifics such
> as point size) is being used. In addition, the other systems can perform
> when the font is not consistent in the document or if different font weights
> are used, your implementation cannot and will fail miserably.
>
> All in all, very misleading.
>
> PS: The information used in my critique of your OCR system was obtained by
> looking at your prior posts as well as your patent and are not merely
> conjecture.
>
> -Pete
