From: Peter Olcott on

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:OshXHLFyKHA.5576(a)TK2MSFTNGP05.phx.gbl...
> Hector Santos wrote:
>
>>> <not yelling, emphasizing>
>>> MEMORY BANDWIDTH SPEED IS THE BOTTLENECK
>>> </not yelling, emphasizing>
>
> >
>
>> <BIG>
>> YOU DON'T KNOW WHAT YOU ARE DOING!
>> </BIG>
>>
>> You don't have a freaking CRAY application need! Plus if
>> you said your process time is 100ms or less, then YOU
>> DON'T KNOW what you are talking about if you say you
>> can't handle more than one thread.
>>
>> It means YOU PROGRAMMED YOUR SOFTWARE WRONG!
>
> Look, you can't take a single thread process that demands
> 4GB of meta processing and believe that this is optimized
> for a WINTEL QUAD machine to run as single thread process
> instances, and then use as a BASELINE for any other
> WEB-SERVICE design. It's foolish.

Do you want me to paypal you fifty dollars? All that I need
is some way to get your paypal email address. You can email
me at PeteOlcott(a)gmail.com. Only send me your paypal address
because I never check this mailbox. If you do send me your
paypal address, please tell me so I can check this email box
that I never otherwise check.

>
> You have to redesign your OCR software to make it
> thread-ready and use sharable data so that its only LOADED
> once and USED many times.
>
> If you have thousands of font glyph files, then you can
> use a memory mapped class array shared data. I will
> guarantee you that will allow you to run multiple threads.

I am still convinced that multiple threads for my OCR
process is a bad idea. I think that the only reason that you
are not seeing this is that you don't understand my
technology well enough. I also don't think that there exists
any possible redesign that would not reduce performance. The
design is fundamentally based on leveraging large amounts of
RAM to increase speed. Because I am specifically leveraging
RAM to increase speed, the architecture is necessarily
memory bandwidth intensive.

>
> But if you insist it can only be a FIFO single thread
> processor, well, you are really wasting people's time here
> because everything else you want to do contradicts your
> limitations. You want to put a web server INTO your OCR,
> when in reality, you need to put your OCR into your WEB
> SERVER.
>
> --
> HLS


From: Hector Santos on
Peter Olcott wrote:

>> Look, you can't take a single thread process that demands
>> 4GB of meta processing and believe that this is optimized
>> for a WINTEL QUAD machine to run as single thread process
>> instances, and then use as a BASELINE for any other
>> WEB-SERVICE design. It's foolish.
>
> Do you want me to paypal you fifty dollars?


No, please stop. You need it more than I do.

I am here on my own wishes and have the time currently to provide
input, a process I enjoy doing for FREE. If I didn't, I would not.


>> If you have thousands of font glyph files, then you can
>> use a memory mapped class array shared data. I will
>> guarantee you that will allow you to run multiple threads.
>
> I am still convinced that multiple threads for my OCR
> process is a bad idea. I think that the only reason that you
> are not seeing this is that you don't understand my
> technology well enough.


I do know *A LOT* more about OCR processing than you might believe. My
first computer company in the 80s was OptiSoft, Inc., which produced
electronic file cabinets with OCR requirements. So I know quite a bit
about the issues.

That said, a computer is a computer. When you become an expert at
something, specifically software engineering, with hundreds of
products and projects under your belt, you don't need to know the
specifics to know what it takes to design and optimize ANY processing.

> I also don't think that there exists
> any possible redesign that would not reduce performance.


That's because you lack design experience in high-end performance
programming.

> The
> design is fundamentally based on leveraging large amounts of
> RAM to increase speed. Because I am specifically leveraging RAM
> to increase speed, the architecture is necessarily memory
> bandwidth intensive.

That is what's called an optimal boundary condition. How it deviates
from the optimal is a natural part of designing for efficiency.

Your 4GB font library is READ ONLY - NOT WRITE; that right there tells
you that you should not have any multi-access contention bottlenecks.

Your problem is that for EACH process you want each PROCESS to have
4GB of READ ONLY memory. Your design is flawed right there.
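
A minimal Python sketch of the load-once, use-many-times idea, assuming the
read-only data can live in one file that is memory mapped. The file name, the
256-byte table, and the lookup() helper are hypothetical stand-ins, not the
actual font library. It shows only the sharing, not any speedup:

```python
import mmap
import os
import tempfile
import threading

# Hypothetical stand-in for the large read-only data: 256 bytes, table[i] == i.
path = os.path.join(tempfile.mkdtemp(), "glyph_table.bin")
with open(path, "wb") as f:
    f.write(bytes(range(256)))

results = {}

def lookup(view, worker_id, indexes):
    # Each worker only reads from the shared mapping; nothing is copied per thread.
    results[worker_id] = [view[i] for i in indexes]

with open(path, "rb") as f:
    # One read-only mapping, created once and shared by all worker threads.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
        workers = [
            threading.Thread(target=lookup, args=(view, n, range(n, 256, 64)))
            for n in range(4)
        ]
        for w in workers:
            w.start()
        for w in workers:
            w.join()

print(results[0])
```

Separate processes that map the same file read-only would typically share the
same physical pages as well, so the data need not be duplicated per process.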

--
HLS


From: Joseph M. Newcomer on
See below...
On Fri, 19 Mar 2010 23:11:24 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:

>
>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
>message news:2bg8q5p3k3pf1fdgnsvm5fv566jins4kuf(a)4ax.com...
>> See below...
>> On Fri, 19 Mar 2010 19:21:17 -0500, "Peter Olcott"
>> <NoSpam(a)OCR4Screen.com> wrote:
>>
>>>
>>>"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in
>>>message
>>>news:OdDHnT7xKHA.5480(a)TK2MSFTNGP06.phx.gbl...
>>>> Peter Olcott Asked:
>>>>
>>>>> If I can not reject a file at the HTTP level, then I
>>>>> have
>>>>> to work at a lower level. In the ideal case I can
>>>>> receive
>>>>> the file size before any of the rest of the file is
>>>>> sent,
>>>>> and reject it with only a few bytes of wasted
>>>>> bandwidth.
>>>>
>>>>
>>>> You will get the size (Content-Length:) of the HTTP
>>>> request body from the HTTP request header block.
>>>>
>>>> You can issue an error at that point after receiving the
>>>> header and before receiving the body. This is
>>>> described
>>>> in HTTP 1.1 standard
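
A rough Python sketch of deciding from the header block alone, before any of
the body is read. The 10 MB limit and the rejection_status() helper are
assumptions for illustration; a real server would get the parsed headers from
its HTTP library rather than splitting text by hand:

```python
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # hypothetical limit, pick your own

def rejection_status(header_block, limit=MAX_UPLOAD_BYTES):
    """Return an HTTP error status decided from the headers, or None to accept."""
    lines = header_block.split("\r\n")
    headers = {}
    for line in lines[1:]:            # skip the request line
        if not line:
            break                     # blank line ends the header block
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    length = headers.get("content-length")
    if length is None:
        return 411                    # Length Required
    if not (length.isascii() and length.isdigit()):
        return 400                    # Bad Request
    if int(length) > limit:
        return 413                    # Request Entity Too Large
    return None                       # size is acceptable; go on to read the body

request = (
    "POST /ocr HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Content-Length: 999999999\r\n"
    "\r\n"
)
print(rejection_status(request))
```

The header value is only what the client claims, so the server still has to
stop reading once the limit is reached.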
>>>
>>>This also includes all lower levels (TCP/IP, et cetera) of
>>>any data transmitted over HTTP?
>>>
>>>I imagine that there could be an underlying buffer that
>>>another lower level protocol uses, such that when HTTP
>>>sees
>>>the first 20 bytes, this lower level protocol has already
>>>eaten up 100K of my bandwidth quota.
>> ****
>> You are completely missing the point here, once again
>> getting hung up on irrelevant
>> concepts such as "buffers" and "packets". I repeat, you
>> have NO CONTROL over any of this,
>> which is purely magic. Your app gets a stream. You have
>> NO IDEA how much of that stream
>
>I have to take control of this to prevent denial of service
>attacks, even if I have to go to a much lower level than
>HTTP. From speaking with Hector, it looks like the lowest
>reasonable level is sockets. The next higher level that
>seems to make the most sense is HTTP.
D-O-S attacks can be prevented ONLY at the server; there is NOTHING you can do at the
client to prevent them. This involves having smart firewalls, stream validation built
into your app, etc. Nothing else you can do will stop me if I want to launch a D-O-S
attack on your site! A friend of mine is one of the world's top experts on network
security, and I learned a lot listening to him tell "war stories".
joe
****
>
>> is sitting in buffers in your server already; it might be
>> 276 bytes, and it might be 100K
>> bytes, and it is nothing you need to worry about. That's
>> what shutdown() is for: it says
>
>I want to prevent malicious users from eating up my service
>providers bandwidth allocation. I would like to do this in
>an optimal way. I do not yet understand the technology quite
>well enough to determine what this way would be.
****
You can't. That's your ISP's responsibility. They may provide you an API by which you
can inform them about malicious attacks, but you can't stop someone from doing this. It
isn't even worth trying.
joe
****
>
>> "kill off any incoming buffered data I haven't received
>> yet and don't bother receiving any
>> more" and "make sure any outgoing buffers are sent NOW!"
>> (each of these is indicated by a
>> 1-bit flag that forms the parameter value of shutdown()).
>> You are obsessing over
>> irrelevancies here. And go read about TCP/IP buffer
>> management and packet flow, and then
>> realize that NONE of these aspects of TCP/IP are remotely
>> visible to you as a TCP/IP
>> programmer, even if you are ignoring HTTP level and
>> writing raw socket code! For HTTP,
>> you are typically going to get the whole thing that was
>> sent, before the protocol bothers
>> to inform you that something has arrived; the "something"
>> is an atomic entity. It is
>> going to feed it as a stream to stdin, and you will have
>> NO IDEA what packet magic, MTU
>> boundaries, etc. are involved in getting that stream to
>> your app. So just go do it, and
>> stop worrying about cases that are not going to be any
>> problem in practice. It is simply
>> MANDATORY that your server will validate every detail of
>> the file format, including
>> illegal PNG encodings; anything else is a minor
>> implementation detail, and should be below
>> your radar.
>
>I was thinking that someone might be sending me multiple GB
>files in a tight loop to eat up my bandwidth budget. It just
>occurred to me that this would also cost them bandwidth too.
****
Sure, but they could be launching this via zombie attacks, so it isn't their bandwidth. If
I were evil and wanted to take you out, I know half a dozen ways I could do so, and not
one of your ideas would stop me. So you are focusing on the wrong problem. Get your
service up and working, and if you log a lot of bogus attempts, talk to your ISP about
what to do about it. They had better know. It's one reason I outsource my Web hosting; I
don't have to become a network security expert!
joe
****
>
>>
>> Yes, bad transmissions use up bandwidth. Technically, we
>> refer to this using the phrase
>> "life is hard". Meaning, tough, suck it up and live with
>> the fact. Until there is
>> convincing proof that this is a serious issue, it is not
>> worth worrying about. Note that
>> you can create a "blacklist" of IP addresses that you
>> refuse to accept connections from,
>> and if you do this, realize that these addresses (used by
>> crackers who are trying to break
>> your program) are probably spoofed addresses of
>> potentially legitimate users. Life is
>> hard. You can "age" the blacklist so repeated trials from
>> the same script kiddie will
>> probably be rejected right away, but in case they spoofed
>> a legitimate address, it will
>> become legal after N hours, for some N of your choice
>> (N=24, N=24*k for k a number of
>> days, are good initial ideas for aging parameters).
>
>Sounds like a good idea.
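
One possible shape for the aging blacklist, as a small Python sketch. The
AgingBlacklist class and its method names are made up for illustration, and
the injectable clock is there only so the expiry can be tested without waiting:

```python
import time

class AgingBlacklist:
    """Blocks an address for a fixed number of hours, then lets it expire."""

    def __init__(self, hours=24, clock=time.time):
        self.ttl = hours * 3600
        self.clock = clock
        self.blocked_at = {}

    def block(self, address):
        self.blocked_at[address] = self.clock()

    def is_blocked(self, address):
        started = self.blocked_at.get(address)
        if started is None:
            return False
        if self.clock() - started >= self.ttl:
            # Entry has aged out; a spoofed legitimate address becomes usable again.
            del self.blocked_at[address]
            return False
        return True

blacklist = AgingBlacklist(hours=24)
blacklist.block("192.0.2.1")
print(blacklist.is_blocked("192.0.2.1"))
```

Check is_blocked() when a connection is accepted and drop it before reading
any data.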
>
>> joe
>> ****
>> joe
>> ****
>>>
>>>> RFC 2068, section 8.2 Message Transmission Requirements:
>>>>
>>>> http://www.ietf.org/rfc/rfc2068.txt
>>>>
>>>> This is a feature of a HTTP 1.1 server, not HTTP 1.0
>>>> server which generally requires the entire payload to be
>>>> received first.
>>>>
>>>> Your HTTP 1.1 web server needs to do this very carefully,
>>>> otherwise it can cause resends by the clients.
>>>>
>>>>> After I determine that the file is not too large I then
>>>>> get however many minimal bytes are required to
>>>>> determine
>>>>> the file type, and then reject the rest of the file if
>>>>> it
>>>>> is not 24-bit PNG.
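
For the "minimal bytes" check, the PNG layout puts everything needed in the
first 26 bytes: the 8-byte signature, then the IHDR chunk with width, height,
bit depth, and color type. A Python sketch under that assumption follows; the
looks_like_24bit_png() name is hypothetical, and this is only a cheap first
filter, not a replacement for validating the whole file:

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def looks_like_24bit_png(first_bytes):
    """Check the first 26 bytes of an upload: signature plus start of IHDR."""
    if len(first_bytes) < 26 or first_bytes[:8] != PNG_SIGNATURE:
        return False
    length, chunk_type = struct.unpack(">I4s", first_bytes[8:16])
    if length != 13 or chunk_type != b"IHDR":
        return False
    width, height, bit_depth, color_type = struct.unpack(">IIBB", first_bytes[16:26])
    # Truecolor (color type 2) at 8 bits per channel is 24 bits per pixel.
    return width > 0 and height > 0 and bit_depth == 8 and color_type == 2

sample = PNG_SIGNATURE + struct.pack(">I4sIIBB", 13, b"IHDR", 640, 480, 8, 2)
print(looks_like_24bit_png(sample))
```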
>>>>
>>>>
>>>> Only HTTP 1.1 clients will gracefully support a
>>>> mid-stream
>>>> reject by the web server. Otherwise, resends can occur.
>>>>
>>>> In other words, if the client is using HTTP 1.0, you
>>>> will
>>>> see that in the first line of the HTTP request header
>>>> block, you could either reject the usage of this client
>>>> or
>>>> ignore the fact the user will see irregular
>>>> "disconnected"
>>>> error pages.
>>>>
>>>> --
>>>> HLS
>>>
>>>Since HTTP 1.1 has been around for 14 years I may simply
>>>reject all HTTP 1.0 calls and request the user update to a
>>>newer browser.
>>>
>>>Or I could force the user to use some sort of java applet
>>>that sends the data using the HTTP 1.1 format. This client
>>>side code could also verify the size and type of the file
>>>(specifically 24-bit PNG) before anything is sent. It
>>>could
>>>also provide a file search dialogbox that only looks for
>>>PNG
>>>files. With this scenario I would no longer be limited to
>>>HTTP, I could devise my own protocol that could strip off
>>>some of the extra HTTP baggage.
>>>
>>>Would I be back to sockets again if I did this? Would
>>>this
>>>be programming at the TCP/IP level or some other level?
>>>
>> Joseph M. Newcomer [MVP]
>> email: newcomer(a)flounder.com
>> Web: http://www.flounder.com
>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm


From: Joseph M. Newcomer on
For authentication, there's a lot of technology out there already. You should be holding
these conversations with your ISP to see what services they provide, and how much they
cost.
joe


On Fri, 19 Mar 2010 23:42:36 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:

>
>"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
>news:eUo4lS%23xKHA.2012(a)TK2MSFTNGP04.phx.gbl...
>> Peter Olcott wrote:
>>
>>> I want to prevent malicious users from eating up my
>>> service providers bandwidth allocation. I would like to
>>> do this in an optimal way.
>>
>>
>> Are you kidding me?
>>
>> Don't you have in your business plan to get your own WEB
>> SERVER and not use GoDaddy?
>>
>>
>> --
>> HLS
>
>I think I have a way to minimize abusive users. It does make
>it a little more difficult for my legitimate users, but, it
>is worth the cost. Everyone must be an authenticated user,
>even the free trial users. Every user must have a valid
>email address. Every user must prove that they are human
>(and not a bot) when setting up their user account. Non
>authenticated users have no access. Users that are abusive
>simply get their account cancelled. An email is sent to
>their valid email address explaining why the account was
>cancelled.
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm


From: Peter Olcott on

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:eCVZmuFyKHA.2644(a)TK2MSFTNGP04.phx.gbl...
> Peter Olcott wrote:
>
>>> Look, you can't take a single thread process that
>>> demands 4GB of meta processing and believe that this is
>>> optimized for a WINTEL QUAD machine to run as single
>>> thread process instances, and then use as a BASELINE for
>>> any other WEB-SERVICE design. It's foolish.
>>
>> Do you want me to paypal you fifty dollars?
>
>
> No, please stop. You need it more than I do.
>
> I am here on my own wishes and have the time currently to
> provide input, a process I enjoy doing for FREE. If I
> didn't, I would not.
>
>
>>> If you have thousands of font glyph files, then you can
>>> use a memory mapped class array shared data. I will
>>> guarantee you that will allow you to run multiple
>>> threads.
>>
>> I am still convinced that multiple threads for my OCR
>> process is a bad idea. I think that the only reason that
>> you are not seeing this is that you don't understand my
>> technology well enough.
>
>
> I do know *A LOT* more about OCR processing than you might
> believe. My first computer company in the 80s was
> OptiSoft, Inc., which produced electronic file cabinets
> with OCR requirements. So I know quite a bit about the
> issues.

I am really only using the term OCR in a figurative sense.
My technology is entirely different than OCR in that it is
completely deterministic rather than stochastic. It is based
on a deterministic finite automaton. The DFA is the
recognizer, and comprises most of my memory requirements.
Necessarily, this architecture would be memory bandwidth
intensive. Transitioning from one DFA state to another
mostly involves reading memory; this is inherent in the way
that DFAs work. All of the intelligence of a DFA is encoded
in a huge lookup table, thus looking things up in the table
is most of what a DFA does.
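
To illustrate the table-driven point with a toy case (a sketch only: this
three-state table and the accepts() helper are hypothetical and far smaller
than any real recognizer), here is a DFA that accepts strings of "a" and "b"
ending in "ab":

```python
# Hypothetical two-symbol DFA that accepts strings ending in "ab".
# All of the recognizer's logic lives in the transition table.
TRANSITIONS = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 0},
}
ACCEPTING = {2}

def accepts(text):
    state = 0
    for symbol in text:
        # One table read per input symbol; no other computation is needed.
        state = TRANSITIONS[state][symbol]
    return state in ACCEPTING

print(accepts("aab"), accepts("aba"))
```

Each input symbol costs one read of the table, which is why the work is
dominated by memory access once the table is large.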

>
> That said, a computer is a computer. When you become an
> expert at something, specifically software engineering,
> with hundreds of products and projects under your belt,
> you don't need to know the specifics to know what it takes
> to design and optimize ANY processing.
>
>> I also don't think that there exists any possible
>> redesign that would not reduce performance.
>
>
> That's because you lack design experience in high-end
> performance programming.
>
>> The design is fundamentally based on leveraging large
>> amounts of RAM to increase speed. Because I am
>> specifically leveraging RAM
>
> > to increase speed, the architecture is necessarily
> > memory
> > bandwidth intensive.
>
> That is what's called an optimal boundary condition. How
> it deviates from the optimal is a natural part of
> designing for efficiency.
>
> Your 4GB font library is READ ONLY - NOT WRITE; that right
> there tells you that you should not have any multi-access
> contention bottlenecks.
>
> Your problem is that for EACH process you want each
> PROCESS to have 4GB of READ ONLY memory. Your design is
> flawed right there.
>
> --
> HLS