From: Raul Parolari on
I have some final results on the problem described.

First, I must correct something on my initial post; I had stated that
there was an occasional high delay between the moment of the 'select'
call and the 'select' return (ie, although the timeout set in the select
was 50msec, the delay could be of 5 seconds).

Actually, tracing across all calls in that section of code, the delay
occurs between the select return and the 'recvfrom'. The details of what
follows may be of interest to anyone using Ruby for fast communication.
Test environment:
- pure Ruby 1.9.2 (no gems, just the 'socket' library) on an ubuntu
machine (lots of memory)
- Ruby sends 4 Udp msgs per second to a micro-controller
- The micro (C/assembler) responds (Udp) within a 10-30 milliseconds
range
- So it is 4 msgs sent and 4 responses rcvd every second

This is what I saw since midnight in one of the systems (the symbol
'<->' means 1 msg sent and response; the symbol '!!' was inserted to
grep all abnormal results):

# Time as Hour:Min:Sec:Msec; the 'delay_sel_rcv' (the time between
return of 'select' and 'recvfrom') value is in Seconds

# log from midnight; all perfect until 1:21 am

01:21:19:914: <->: !! delay_sel_rcv=10.006525661
01:21:29:928: <->: !! delay_sel_rcv=10.010217133
01:21:39:937: <->: !! delay_sel_rcv=10.004327574
01:21:49:954: <->: !! delay_sel_rcv=10.011541082
01:21:59:972: <->: !! delay_sel_rcv=10.005877574
01:22:05:973: <->: !! delay_sel_rcv= 5.998151639

# then all ok unti:

02:22:27:374: <->: !! delay_sel_rcv=10.008022384
02:22:37:394: <->: !! delay_sel_rcv=10.008430684
02:22:47:401: <->: !! delay_sel_rcv=10.004019076
02:22:57:409: <->: !! delay_sel_rcv=10.005836859
02:23:07:580: <->: !! delay_sel_rcv=10.008476556
02:23:17:610: <->: !! delay_sel_rcv=10.007506338
02:23:27:642: <->: !! delay_sel_rcv=10.007311141
02:23:37:655: <->: !! delay_sel_rcv=10.008225368
02:23:47:685: <->: !! delay_sel_rcv=10.018187389

# then all ok until
04:24:08:873: <->: !! delay_sel_rcv=10.006355125
...

We can see from the above:

- the first 80 minutes (from midnight to 01:21) went fine
- then we see several delays of 10 seconds, in the same minute (each 10
seconds apart from the other)
- for 1 hour all was pefect again, exchanging some 12,000 messages with
perfect timing
- then we have 9 delays of 10 seconds (again separated by 10 seconds)
- for 1 hour all went fine again; then the cycle repeats

This pattern can only indicate (in my view) the garbage collector, which
Ruby seems to run for 10 seconds several times in the same minute or so.
I could not put the calls to GC.disable/enable (to have the final
proof), around the select/recvfrom (not to interfere with a real
experiment that was moving heavy equipment). Notice that, if it is the
GC, disabling/enabling it will only shift the problem from one area of
the communication handler to another (and thus having a similar impact
on the applications using the comm handler).

Interestingly, this problem does not happen within 1 computer; I used
the identical Ruby program but replacing the Firmware with a Ruby
simulator (with same machines, same Udp and the same binary strings
exchanged); in a test of 10 hours, I only saw occasional "delays" betwen
select and recvfrom, but in the order of 100 milliseconds, and never of
5/10 seconds.
This would seem to indicate an inefficiency in the Udp stack (when used
across computers).

My conclusion is that if you want a predictable delay (with values
spread across a 'tight bell' curve, not just increasing the timeout to
cope with 'everything'), you must use (for that section of the software)
a compiled language; at least until the technology of garbage collector
changes.

I hope that this is useful to others who use Ruby for high speed
communication (and the ones working on garbage collectors).
--

Last note: one year ago I met in a party a JPL engineer working on the
Mars exploration program; he was admirative of Ruby, but after some
jokes on the expressivity and beauty in old and new languages, he added
that they would never use scripting languages because "we don't want the
garbage collector to enter in action just when we should to begin to
slow down the spacecraft near Mars and miss the landing! in fact, we
don't even use C++, as we did not find it totally predictable; so we
will still use C for years to come".

I never knew how well I would learn to appreciate his point

Raul Parolari

raulparolari(a)gmail.com

--
Posted via http://www.ruby-forum.com/.

From: Roger Pack on

> - for 1 hour all went fine again; then the cycle repeats
>
> This pattern can only indicate (in my view) the garbage collector, which
> Ruby seems to run for 10 seconds several times in the same minute or so.

You can test your hypothesis by running it against jruby with a
"constant time" GC.
If you're using 1.9 you could also test your hypothesis by using the
GC::Profiler
http://en.wikibooks.org/wiki/Ruby_Programming/Reference/Objects/IO/GC/Profiler

Also the "exact" 5 second delays makes me wonder if it isn't a DNS
lookup.

Try your code with
BasicSocket.do_not_reverse_lookup=true

-r
--
Posted via http://www.ruby-forum.com/.

From: Raul Parolari on
Roger Pack wrote:

> You can test your hypothesis by running it against jruby with a
> "constant time" GC.
> If you're using 1.9 you could also test your hypothesis by using the
> GC::Profiler
> http://en.wikibooks.org/wiki/Ruby_Programming/Reference/Objects/IO/GC/Profiler
>
> Also the "exact" 5 second delays makes me wonder if it isn't a DNS
> lookup.
>
> Try your code with
> BasicSocket.do_not_reverse_lookup=true

Three observations:

a) I do not think that there can be a DNS lookup between a 'select(x, y,
z)' and a 'socket.recvfrom'.

I also clarify that the 2 statements are almost sequential; after the
select, you verify that the there is a 'read condition' and then you
issue the 'recvfrom'. The whole section takes less than 10 lines of code
(with printouts, etc), and it should not generate any external query.

b) for the same reason I did not need a profiler (in a real system
moving machinery) to evaluate the time spent between those 2 statements.
Just computing the difference between the 2 times ('Time.now' when
select and recvfrom return), and then printing when the difference was >
a threshold was enough.

As I said, this happened regularly (after many thousands of messages
were exchanged with perfect timing), producing several 'processing gaps'
of either 5 or 10 seconds every hour (approx). This occurred only when
UDP was used over Ethernet between 2 machines (with the apps inside the
same computer I never saw the problem).

c) I could not try jruby. Although Java may not be immune to perhaps
similar problems (see problems found in the Android phone).

My conclusion is that dynamic allocation and deallocation of memory for
tasks that occur every 100 msec or so is to be minimized (if possible;
else, move that task to a C program).

Still, more fundamentally, I think that we need a different garbage
collector architecture for Ruby. All the games disabling/enabling Gc,
tinkering, restructuring code, etc, are not the solution (for me).

--
Posted via http://www.ruby-forum.com/.

From: Roger Pack on

>> Try your code with
>> BasicSocket.do_not_reverse_lookup=true
>
> Three observations:
>
> a) I do not think that there can be a DNS lookup between a 'select(x, y,
> z)' and a 'socket.recvfrom'.

It might, see
http://www.ensta.fr/~diam/ruby/online/ruby-doc-stdlib/libdoc/socket/rdoc/classes/Socket.html#M001258

Why don't you try it and find out?

> I could not try jruby.

Did it not install?

Also did you try 1.9.1 with it's GC::Profiler?

-r
--
Posted via http://www.ruby-forum.com/.

From: Joel VanderWerf on
Roger Pack wrote:
>>> Try your code with
>>> BasicSocket.do_not_reverse_lookup=true
>> Three observations:
>>
>> a) I do not think that there can be a DNS lookup between a 'select(x, y,
>> z)' and a 'socket.recvfrom'.
>
> It might, see
> http://www.ensta.fr/~diam/ruby/online/ruby-doc-stdlib/libdoc/socket/rdoc/classes/Socket.html#M001258
>
> Why don't you try it and find out?

Seconded: it's worth a try.

Also, if you really believe this is GC-related, you can prove it by
GC.disable before the select, GC.enable after the recvfrom.

It seems highly unlikely that GC would be eating up 10 seconds.

 |  Next  |  Last
Pages: 1 2 3 4
Prev: Threading with Activerecord
Next: Aliasing classes