From: Peter Olcott on

"Geoff" <geoff(a)invalid.invalid> wrote in message
news:hqaop5tffmmlfla5mjbbi6i7d3e7583sgb(a)4ax.com...
> On Sat, 13 Mar 2010 17:58:02 -0600, "Peter Olcott"
> <NoSpam(a)OCR4Screen.com> wrote:
>
>>I want to make a web service, and I don't need any complex
>>protocol such as SOAP. I only need to take a stream of
>>bytes
>>representing a PNG image file as input, and provide UTF-8
>>text as output.
>>
>>I might have a bunch of small nearly simultaneous
>>requests.
>>These can be processed in their order of arrival. A few of
>>these requests may be multi-megabytes. Is socket
>>programming
>>a good way to go on this?
>>
>
> This sounds like more questions about your OCR appliance.
>
> If the clients accessing your service are using the
> Internet protocol
> to access your server you do not need to ask this
> question. You have
> no choice.

The other choices that I was envisioning were much higher
levels of abstraction that hide all of the details of
sockets.

>
> If you were coding on the Linux platform you would not
> need to ask
> this question, it would be automatically assumed that you
> are going to
> use a sockets interface for your service. It would also
> take about 30
> minutes to code and debug a sockets interface and fork
> logic to pass
> this kind of data in and out of your OCR process.
>
> Outside of your program, one could program a Perl cgi
> script behind an
> Apache based web server to copy the received data streams
> to files
> with associated cookies or "handles" to the socket
> sessions hosted by
> the web server but you would have to process the files and
> respond to
> them in such a manner that you could guarantee a reply
> before the web
> session expired. The script would handle the socket
> sessions while
> your process merely dealt with file I/O (pipes?) to the
> scripts.

What if my system is really busy and takes ten minutes to
begin processing? Can I prevent the session from expiring?

>
> Even doing this in Windows with or without MFC, just
> writing a simple
> server, one could write a socket server that passed files
> to/from your
> OCR program in a few hours.
>
> I hope you also realize that your OCR appliance could
> probably very
> easily be perverted to defeat web based captcha codes on a
> vast scale.

There is already a good replacement that I just encountered.
Instead of asking me to type the word that I see, it asked
me a question requiring a little intelligent human
judgment.--->What color is an orange?


From: Peter Olcott on

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
message news:qkmop5ddia2p14ci5jlv35vfu780nd2mag(a)4ax.com...
> Typically, you would want to look at the HTTP protocol,
> specifically, PUT and POST
> requirements, as well as using an encoding such as
> uuencode to encode your binary data.
>
> The typical question is how you want to integrate this
> with a Web server such as IIS or
> Apache. This would be port 80 (or 8080) protocols.
> Otherwise, you are free to obtain a
> registered port number from IANA (www.iana.org, takes
> about six weeks, no charge, fill in
> a simple form), and you are free to send any message in
> any format to that port #. It
> does require that you have a static IP address so your
> program can open a connection to
> that IP address (obtain via the DNS server mechanisms,
> e.g., if I had a static IP for
> flounder.com you could request a DNS resolution of
> flounder.com and find the one-and-only
> IP address that represents it; note also that a static IP
> address usually costs extra, and
> I chose not to pay the price, so the IP address of
> flounder.com changes every 24 hours).
> joe

I want to have two types of interfaces to my web service:
one a simple web-based GUI, and the other a machine-to-machine
interface. I am not sure of the tradeoffs between
all of the options. For example, which would work better:
sockets, SOAP, or REST?
I want to minimize overhead, server development costs and
maximize ease of use. What approach would be easiest for
client side programmers on possibly diverse platforms and
languages?

>
> On Sat, 13 Mar 2010 17:58:02 -0600, "Peter Olcott"
> <NoSpam(a)OCR4Screen.com> wrote:
>
>>I want to make a web service, and I don't need any complex
>>protocol such as SOAP. I only need to take a stream of
>>bytes
>>representing a PNG image file as input, and provide UTF-8
>>text as output.
>>
>>I might have a bunch of small nearly simultaneous
>>requests.
>>These can be processed in their order of arrival. A few of
>>these requests may be multi-megabytes. Is socket
>>programming
>>a good way to go on this?
>>
> Joseph M. Newcomer [MVP]
> email: newcomer(a)flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm


From: Hector Santos on
Peter Olcott wrote:

> The other choices that I was envisioning were much higher
> levels of abstraction that hide all of the details of
> sockets.


It's called HTTP programming (if you are talking about a web service).

> What about if my system is really busy and take ten minutes
> to begin processing, can I prevent the session from
> expiring?


Generally, no, as the browser can have its own timeouts.

It can depend on the version of the HTTP protocol:

1.0 is not persistent by default; a new socket connection is made for
each request.

1.1 is persistent by default; the same socket connection can be
reused.
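
To make the 1.0 versus 1.1 difference concrete, here is a minimal sketch of the decision a server makes after each response. The function name is_persistent is only illustrative, and real servers apply further conditions (timeouts, request limits) that are left out here.

```python
from typing import Optional


def is_persistent(http_version: str, connection_header: Optional[str] = None) -> bool:
    """Return True if the socket should stay open after this response.

    HTTP/1.0 closes by default unless the client sends "Connection: keep-alive".
    HTTP/1.1 stays open by default unless either side sends "Connection: close".
    """
    # The Connection header is a comma-separated, case-insensitive token list.
    tokens = {
        token.strip().lower()
        for token in (connection_header or "").split(",")
        if token.strip()
    }
    if "close" in tokens:
        return False
    if http_version == "HTTP/1.1":
        return True
    # HTTP/1.0 (or anything older): persistent only on explicit request.
    return "keep-alive" in tokens
```

For example, is_persistent("HTTP/1.0") is False, while is_persistent("HTTP/1.0", "keep-alive") and is_persistent("HTTP/1.1") are both True.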

In the quest to move towards "interactive clients", some browsers have
special protocols to basically keep the line active.

But you can also do this with XHR (AJAX).

Session expiry can also depend on the web server you are using.

In general, the WEB is NOT an interactive process with the backend.
It's client/server.

Web 1.0 - client/server, traditional, no JavaScript

Web 2.0 - a little more interactive operation using
JavaScript, which allows for XHR

Web 3.0 - Web 2.0 with off-loaded clients: Flex, Silverlight,
Flash, etc.
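
Since a single request generally cannot be held open for ten minutes, the usual XHR-style workaround is to accept the upload, hand back a ticket right away, and let the client poll for the result. Below is a minimal in-memory sketch of that idea. The names JobQueue, submit, process_next, and poll are hypothetical, there is no HTTP layer here, and the worker is a stand-in for the real OCR call.

```python
import itertools
from collections import deque


class JobQueue:
    """Hands out tickets immediately and processes jobs in arrival order."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = deque()   # (job_id, payload) pairs, oldest first
        self._results = {}        # job_id -> finished result
        self._known = set()       # every job_id ever issued

    def submit(self, payload: bytes) -> int:
        """Queue the payload and return a ticket without waiting."""
        job_id = next(self._ids)
        self._pending.append((job_id, payload))
        self._known.add(job_id)
        return job_id

    def process_next(self, worker) -> bool:
        """Run the oldest pending job; return False if nothing is queued."""
        if not self._pending:
            return False
        job_id, payload = self._pending.popleft()
        self._results[job_id] = worker(payload)
        return True

    def poll(self, job_id: int):
        """Return ("done", result) or ("pending", None) for a known ticket."""
        if job_id not in self._known:
            raise KeyError(job_id)
        if job_id in self._results:
            return ("done", self._results[job_id])
        return ("pending", None)
```

Each poll is a short request of its own, so no browser or server timeout has to stretch across the whole processing time.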

>> I hope you also realize that your OCR appliance could
>> probably very easily be perverted to defeat web based
>> captcha codes on a vast scale.
>
> There is already a good replacement that I just encountered.
> Instead of asking me to type the word that I see, it asked
> me a question requiring a little intelligent human
> judgment.--->What color is an orange?


And why can't that be learned as well?

The reasons for CAPTCHA are slowly disappearing with more WEB 2.0+
directions. It's more of a fad, and more and more systems don't use it
anymore. We have it only because our silly customers THINK they ought
to have it in their SIGNUP forms. We don't need it for message posting
because users need to be logged in anyway.

Only anonymous systems have a reason for it, and anonymous systems
have been rapidly disappearing for all sorts of reasons. The anonymous
Web is how the industry got its jump start; it hurt systems like our
own and others that required logins - private by design, where you
don't need a CAPTCHA. But today, nearly all systems require logins,
and this has helped out business as I knew it would once the early
excitement was over.

So personally, if this is for CAPTCHA, I don't see it as a worthy endeavor.

--
HLS


From: Peter Olcott on

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
message news:vbnop5dhutg9604v8dgsdpqge5ija6d20f(a)4ax.com...
> See below...
> On Sat, 13 Mar 2010 16:35:46 -0800, Geoff
> <geoff(a)invalid.invalid> wrote:
>
>>On Sat, 13 Mar 2010 17:58:02 -0600, "Peter Olcott"
>><NoSpam(a)OCR4Screen.com> wrote:
>>
>>>I want to make a web service, and I don't need any
>>>complex
>>>protocol such as SOAP. I only need to take a stream of
>>>bytes
>>>representing a PNG image file as input, and provide UTF-8
>>>text as output.
>>>
>>>I might have a bunch of small nearly simultaneous
>>>requests.
>>>These can be processed in their order of arrival. A few
>>>of
>>>these requests may be multi-megabytes. Is socket
>>>programming
>>>a good way to go on this?
>>>
>>
>>This sounds like more questions about your OCR appliance.
>>
>>If the clients accessing your service are using the
>>Internet protocol
>>to access your server you do not need to ask this
>>question. You have
>>no choice.
>>
>>If you were coding on the Linux platform you would not
>>need to ask
>>this question, it would be automatically assumed that you
>>are going to
>>use a sockets interface for your service. It would also
>>take about 30
>>minutes to code and debug a sockets interface and fork
>>logic to pass
>>this kind of data in and out of your OCR process.
> ****
> Actually, on the server side it is faster than this,
> because it is just a cgi gateway
> script; the data is sent up uuencoded or some other
> suitable encoding, and passed as data
> to the CGI-invoked program. Of course, you have to code
> up the CGI code, and that can
> take a while, but that's going to be fixed overhead no
> matter what means is used to encode
> the client data. But an HTTP POST with suitable encoding
> on the client side is all that
> is required, plus waiting for the HTTP response, both of
> which are (a) easy and (b) the
> same on Windows and linux.
> ****

I want whatever code services the clients to always be
memory resident. I didn't think that CGI worked this way. I
also thought that you need a separate CGI instance for each
client; I can't have that.

>>
>>Outside of your program, one could program a Perl cgi
>>script behind an
>>Apache based web server to copy the received data streams
>>to files
>>with associated cookies or "handles" to the socket
>>sessions hosted by
>>the web server but you would have to process the files and
>>respond to
>>them in such a manner that you could guarantee a reply
>>before the web
>>session expired. The script would handle the socket
>>sessions while
>>your process merely dealt with file I/O (pipes?) to the
>>scripts.
> ****
> Remember his original question about 500ms? This is part
> of the overhead that is "beyond
> the control of the program" that I kept referring to. In
> fact, the cgi invocation
> overhead is one of the performance bottlenecks of most Web
> servers,

So I will have to use something else.

> and Apache is probably
> the most efficient platform around for handling this (I
> haven't followed the latest round
> of tricks, only to note that it now has improved this time
> considerably)
> ****
>>
>>Even doing this in Windows with or without MFC, just
>>writing a simple
>>server, one could write a socket server that passed files
>>to/from your
>>OCR program in a few hours.
>>
>>I hope you also realize that your OCR appliance could
>>probably very
>>easily be perverted to defeat web based captcha codes on a
>>vast scale.
> ****
> I didn't want to mention this...but it seems pretty
> obvious. But we've always known that
> captcha codes were going to be short-lived...
>
> joe
> ****
> Joseph M. Newcomer [MVP]
> email: newcomer(a)flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm


From: Hector Santos on
Peter Olcott wrote:

> I want to have two types of interfaces to my web service:
> one a simple web-based GUI, and the other a machine-to-machine
> interface. I am not sure of the tradeoffs between
> all of the options. For example, which would work better:
> sockets, SOAP, or REST?
> I want to minimize overhead, server development costs and
> maximize ease of use. What approach would be easiest for
> client side programmers on possibly diverse platforms and
> languages?


What you need to work on is your PROTOCOL - your state machine, then
that will help define all the above.

In principle, you have to:

1) Pick an UPLOAD/DOWNLOAD protocol:

- HTTP
- Proprietary (requires a special client)

2) Decide on the PAYLOAD format:

- XML
- SOAP
- MIME
- Proprietary (requires a special client)

But as a WEB SERVICE?

You need to work out your protocol, the state machine, the
client/server response framework. At that point, it really doesn't
matter what vendor or tools are used.

If you are talking about an extremely simple concept where you want a
"General service" to:

1) Transmit an image
2) Send text response

Then you need to first work out the INPUT protocol, (1) the method of
sending and the (2) payload format.

What you decide here will guide you with the development tools.
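
As an illustration of what "working out the payload format" can mean for the proprietary route, here is a minimal sketch of a length-prefixed frame and the small state machine that reads it back. The 4-byte big-endian length header and the names encode_frame and FrameDecoder are assumptions made for this example, not an established format.

```python
import struct
from typing import List

# Assumed wire format: 4-byte unsigned big-endian length, then the payload.
HEADER = struct.Struct("!I")


def encode_frame(payload: bytes) -> bytes:
    """Prefix the payload (PNG bytes or UTF-8 text) with its length."""
    return HEADER.pack(len(payload)) + payload


class FrameDecoder:
    """Two-state machine: waiting for a header, then waiting for the body."""

    def __init__(self):
        self._buffer = bytearray()
        self._expected = None  # None means "still waiting for the header"

    def feed(self, data: bytes) -> List[bytes]:
        """Accept bytes as they arrive; return every frame completed so far."""
        self._buffer.extend(data)
        frames = []
        while True:
            if self._expected is None:
                if len(self._buffer) < HEADER.size:
                    break  # header not complete yet
                (self._expected,) = HEADER.unpack(self._buffer[:HEADER.size])
                del self._buffer[:HEADER.size]
            if len(self._buffer) < self._expected:
                break  # body not complete yet
            frames.append(bytes(self._buffer[:self._expected]))
            del self._buffer[:self._expected]
            self._expected = None
        return frames
```

The decoder does not care how the bytes are split across reads, which matters because a multi-megabyte image will arrive from a socket in many pieces.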

For example:

1) For the WEB FORM SERVICE, you only have the HTTP protocol to use,
using a FORM submit concept. No choice here unless you want to force
users to use Firefox and others that support new "SEND DATA" protocols.

2) For the machine to machine, you can also still use the HTTP
protocol, but the client now has to do the same job the browser does;
that means learning the HTTP protocol. If you move away from this,
then you are designing a proprietary protocol. If that is what you
want, then the concepts are probably a little too deep for you here,
but here is the basic idea if you want to get started on a machine to
machine protocol:

http://www.codeproject.com/KB/IP/cspserver.aspx
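
For the plain HTTP route, a minimal sketch of both ends might look like the following, using only the Python standard library. The /ocr path, the fake_ocr placeholder, and the function names are hypothetical. The server stays memory resident and, being single-threaded, handles requests one at a time in arrival order. The PNG bytes go in the POST body as raw bytes with a Content-Length header, so no uuencode-style text encoding is needed in this sketch.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def fake_ocr(png_bytes: bytes) -> str:
    """Placeholder for the real OCR engine (hypothetical)."""
    return f"received {len(png_bytes)} bytes"


class OcrHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of body bytes the client announced.
        length = int(self.headers.get("Content-Length", 0))
        png_bytes = self.rfile.read(length)
        body = fake_ocr(png_bytes).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def post_image(host: str, port: int, png_bytes: bytes) -> str:
    """Machine-to-machine client: POST raw PNG bytes, get UTF-8 text back."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("POST", "/ocr", body=png_bytes,
                     headers={"Content-Type": "image/png"})
        return conn.getresponse().read().decode("utf-8")
    finally:
        conn.close()


def demo() -> str:
    """Start the server on a free local port, send one request, shut down."""
    server = HTTPServer(("127.0.0.1", 0), OcrHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return post_image("127.0.0.1", port, b"\x89PNG\r\n\x1a\n")
    finally:
        server.shutdown()
        server.server_close()
```

Any language with an HTTP client library can play the role of post_image, which is the main argument for staying with HTTP on the machine-to-machine side.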


--
HLS