From: hagman on
On 4 Oct., 10:15, federico <smokin...(a)gmail.com> wrote:
> Hello everybody!
>
> I write in the hope that somebody will be so kind as to show me the right way to solve my problem. It's a mixture of simulation theory and statistics, so I hope this is the right place to ask. :)
>
> In brief, I have a simulator (it simulates an IT system) from which I observe 2 correlated random variables, called "Events" and "Failures" ("Events" means "total requests submitted to the system", while "Failures" counts the number of requests which experienced a failure, i.e. requests which have not been successfully served by the system). During the simulation, every time a request is successfully completed, "Events" is updated, while every time a failure occurs both measures are updated. My simulation, like every simulation, consists of more than one run: at the end of each run I print the value of both variables to a trace file.
>
> My problem is that I'm not directly interested in the 2 measures, but rather in their ratio, which I use to calculate the reliability of the system:
>
> Rel=1-(Failures/Events)
>
> Since I have to compare such a value of Reliability with the value calculated by another tool (to which of course I submit the same system), I'd like to say, with statistical evidence, whether the results provided by the 2 tools are similar or not.
>
> This means that I should build a confidence interval for Rel: unfortunately, my sample of (Failures, Events) pairs is very small, since a simulation usually consists of no more than 10-15 runs (which means having 10-15 samples).
>
> So, my question is: is there an approach for building a confidence interval based on a sample of just 10-15 observations? If not, which approach could represent a good compromise?
>
> Thank you so much for your attention and help!
>
> Bye,
> Federico

Well, if you have only 15 events, there can only be 0 or 1 or ... or
15 failures.
In fact I assume you see only, say, 0, 1, 2, 3 or 4 failures (a
system with a significant probability of more failures than that
might not be worth investigating anyway).
With so few observations it is not really possible to end up with a
very high confidence.

If we assume that the events are independent and have a failure
probability of p, you can calculate the probability of observing k or
more failures in a sample of n, and you may want to reject your
hypothesis about p if that probability is below your confidence
threshold.
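
For example, a quick sketch of that tail calculation (plain Python;
n, k and p below are just made-up example numbers):

from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing k or more
    failures among n independent events when the failure probability is p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Made-up example: 15 events, 4 observed failures, hypothesised p = 0.10.
n, k, p = 15, 4, 0.10
print(f"P(X >= {k}) = {binom_tail(n, k, p):.4f}")
# Reject the hypothesis about p if this probability falls below your
# chosen threshold.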
From: federico on
Hi hagman, and thank you so much for your reply.
Maybe I didn't explain the problem well: each run is an execution of the system, independent of the others. So, for example, with 10 runs I got the following sample:

Fail. Events Fail/Events Reliability
224 1956 0,114519427 0,885480573
217 1950 0,111282051 0,888717949
192 1976 0,097165992 0,902834008
196 1966 0,099694812 0,900305188
190 1935 0,098191214 0,901808786
196 1937 0,101187403 0,898812597
202 1988 0,101609658 0,898390342
192 1908 0,100628931 0,899371069
192 1836 0,104575163 0,895424837
206 1927 0,10690192 0,89309808

The overall reliability over the 10 samples is 0,896434285.
The other tool I mentioned provided me with a value of 0.902 for reliability: I'd like to say, in a non-empirical way, that the two tools provided a statistically similar result.
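
For completeness, this is roughly how I compute the table and the overall value from the trace counts (a quick Python sketch with the numbers pasted in by hand; the 0,896434285 above is the pooled value 1 - total failures / total events, and the plain average of the ten per-run reliabilities comes out nearly the same, about 0,896424):

from statistics import mean

# (Failures, Events) for the 10 runs, copied from the trace file
runs = [(224, 1956), (217, 1950), (192, 1976), (196, 1966), (190, 1935),
        (196, 1937), (202, 1988), (192, 1908), (192, 1836), (206, 1927)]

# Per-run values: Fail/Events and Rel = 1 - Fail/Events
for f, e in runs:
    print(f"{f:4d} {e:6d} {f / e:.9f} {1 - f / e:.9f}")

# Overall (pooled) reliability: 1 - total failures / total events
total_f = sum(f for f, _ in runs)
total_e = sum(e for _, e in runs)
print("pooled reliability:", 1 - total_f / total_e)          # 0.896434...

# Unweighted mean of the per-run reliabilities, for comparison
print("mean of per-run reliabilities:", mean(1 - f / e for f, e in runs))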

Any idea?

Thanks again!
From: James Waldby on
On Sun, 04 Oct 2009 08:59:32 -0400, federico wrote:

[Re his earlier post of 04 Oct 2009 04:15:19 -0400 in which he wrote,
> In brief, I have a simulator ... from which I
> observe 2 correlated random variables, called "Events" and "Failures"
> (the meaning of "Events" is "total requests submitted to the system"
> while "Failures" counts the number of requests which experienced a
> failure (i.e. requests which have not been successfully served by the
> system)...
> Since I have to compare such [data] with the value
> calculated by another tool (to which of course I submit the same system),
> I'd like to say, with statistical evidence, whether the results provided
> by the 2 tools are similar or not.
]
> Hi hagman, and thank you so much for your reply. Maybe I didn't explain
> the problem well: each run is an execution of the system, independent
> of the others. So, for example, with 10 runs I got the following
> sample:
>
> Fail. Events Fail/Events Reliability
> 224 1956 0,114519427 0,885480573
> 217 1950 0,111282051 0,888717949
> 192 1976 0,097165992 0,902834008
> 196 1966 0,099694812 0,900305188
> 190 1935 0,098191214 0,901808786
> 196 1937 0,101187403 0,898812597
> 202 1988 0,101609658 0,898390342
> 192 1908 0,100628931 0,899371069
> 192 1836 0,104575163 0,895424837
> 206 1927 0,10690192 0,89309808
>
> The overall reliability over the 10 samples is 0,896434285. The other
> tool I mentioned provided me with a value of 0.902 for reliability: I'd
> like to say, in a non-empirical way, that the two tools provided a
> statistically similar result.

It probably would be better to randomly create n different system
settings, S1 ... Sn, and for each Si, run the system once in each of
the two simulators, and then use a paired t-test with n-1 degrees of
freedom (see <http://en.wikipedia.org/wiki/Paired_difference_test>
and <http://en.wikipedia.org/wiki/Statistical_hypothesis_testing>).

Usual t-test assumptions are that the random variables are
normally distributed, but as n grows larger (eg n > 30) that
provision becomes less important.
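
A rough sketch of such a paired comparison (Python with scipy; the
reliabilities below are invented placeholders for what the two
simulators might report on the same settings S1 ... Sn, not federico's
actual data):

from scipy import stats

# Hypothetical reliabilities from the two simulators on the same n
# randomly generated settings S1 ... Sn (placeholder numbers).
tool_a = [0.896, 0.901, 0.889, 0.903, 0.898, 0.895, 0.900, 0.892, 0.897, 0.899]
tool_b = [0.902, 0.905, 0.893, 0.904, 0.901, 0.899, 0.903, 0.896, 0.900, 0.902]

# Paired t-test on the per-setting differences (n-1 degrees of freedom).
res = stats.ttest_rel(tool_a, tool_b)
print(f"t = {res.statistic:.3f}, p = {res.pvalue:.4f}")
# A small p-value says the two tools' mean reliabilities differ; a large
# one says no significant difference was detected at this sample size.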

--
jiw
From: Ray Koopman on
[reply cross-posted to sci.stat.consult]

On Oct 4, 9:35 am, James Waldby <n...(a)no.no> wrote:
> On Sun, 04 Oct 2009 08:59:32 -0400, federico wrote:
>
> [Re his earlier post of 04 Oct 2009 04:15:19 -0400 in which he wrote,
>
>> In brief, I have a simulator ... from which I
>> observe 2 correlated random variables, called "Events" and "Failures"
>> (the meaning of "Events" is "total requests submitted to the system"
>> while "Failures" counts the number of requests which experienced a
>> failure (i.e. requests which have not been successfully served by the
>> system)...
>> Since I have to compare such [data] with the value
>> calculated by another tool (to which of course I submit the same system),
>> I'd like to say, with statistical evidence, whether the results provided
>> by the 2 tools are similar or not.
> ]
>> Hi hagman, and thank you so much for your reply. Maybe I didn't explain
>> the problem well: each run is an execution of the system, independent
>> of the others. So, for example, with 10 runs I got the following
>> sample:
>
>> Fail. Events Fail/Events Reliability
>> 224 1956 0,114519427 0,885480573
>> 217 1950 0,111282051 0,888717949
>> 192 1976 0,097165992 0,902834008
>> 196 1966 0,099694812 0,900305188
>> 190 1935 0,098191214 0,901808786
>> 196 1937 0,101187403 0,898812597
>> 202 1988 0,101609658 0,898390342
>> 192 1908 0,100628931 0,899371069
>> 192 1836 0,104575163 0,895424837
>> 206 1927 0,10690192 0,89309808
>
>> The overall reliability over the 10 samples is 0,896434285. The other
>> tool I mentioned provided me with a value of 0.902 for reliability: I'd
>> like to say, in a non-empirical way, that the two tools provided a
>> statistically similar result.
>
> It probably would be better to randomly create n different system
> settings, S1 ... Sn, and for each Si, run the system once in each of
> the two simulators, and then use a paired t-test with n-1 degrees of
> freedom (see <http://en.wikipedia.org/wiki/Paired_difference_test>
> and <http://en.wikipedia.org/wiki/Statistical_hypothesis_testing>).
>
> Usual t-test assumptions are that the random variables are
> normally distributed, but as n grows larger (eg n > 30) that
> provision becomes less important.

Yes, the same simulator settings should be used for both tools, so
that a paired-data analysis can be done. However, confidence intervals
for the reliabilities and for the differences in reliabilities would
be more to the point than would a t-test on the mean difference. And a
simple scatterplot of the reliability pairs also would be informative.
(Most statisticians would probably do the scatterplot first, as a
diagnostic.)
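
For instance, a rough sketch of that kind of summary (Python with scipy
and matplotlib; the paired reliabilities below are invented placeholders
for the two tools' output on common settings, not real data):

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Hypothetical paired reliabilities from the two tools on common settings.
tool_a = np.array([0.896, 0.901, 0.889, 0.903, 0.898,
                   0.895, 0.900, 0.892, 0.897, 0.899])
tool_b = np.array([0.902, 0.905, 0.893, 0.904, 0.901,
                   0.899, 0.903, 0.896, 0.900, 0.902])

def t_ci(x, level=0.95):
    """Two-sided t-based confidence interval for the mean of x."""
    n = len(x)
    m = x.mean()
    half = stats.t.ppf(0.5 + level / 2, df=n - 1) * x.std(ddof=1) / np.sqrt(n)
    return m - half, m + half

print("CI, tool A reliability:", t_ci(tool_a))
print("CI, tool B reliability:", t_ci(tool_b))
print("CI, difference (A - B):", t_ci(tool_a - tool_b))

# Diagnostic scatterplot of the reliability pairs, with a y = x reference line.
plt.scatter(tool_a, tool_b)
plt.axline((0.89, 0.89), slope=1, linestyle="--")
plt.xlabel("reliability, tool A")
plt.ylabel("reliability, tool B")
plt.show()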
From: James Waldby on
On Sun, 04 Oct 2009 11:18:53 -0700, Ray Koopman wrote:

> [reply cross-posted to sci.stat.consult]
>
> On Oct 4, 9:35 am, James Waldby <n...(a)no.no> wrote:
>> On Sun, 04 Oct 2009 08:59:32 -0400, federico wrote:
>>
>> [Re his earlier post of 04 Oct 2009 04:15:19 -0400 in which he wrote,
>>> In brief, I have a simulator ... from which I observe 2 correlated
>>> random variables, called "Events" and "Failures" (the meaning of
>>> "Events" is "total requests submitted to the system" while "Failures"
>>> counts the number of requests which experienced a failure (i.e.
>>> requests which have not been successfully served by the system)...
>>> Since I have to compare such [data] with the value calculated by
>>> another tool (to which of course I submit the same system), I'd like to
>>> say, with statistical evidence, whether the results provided by the 2
>>> tools are similar or not.
>> ]
>>> Hi hagman, and thank you so much for your reply. Maybe I didn't
>>> explain the problem well: each run is an execution of the system,
>>> independent of the others. So, for example, with 10 runs I got
>>> the following sample:
>>
>>> Fail. Events Fail/Events Reliability
>>> 224 1956 0,114519427 0,885480573
>>> 217 1950 0,111282051 0,888717949
>>> 192 1976 0,097165992 0,902834008
>>> 196 1966 0,099694812 0,900305188
>>> 190 1935 0,098191214 0,901808786
>>> 196 1937 0,101187403 0,898812597 [...]
>>
>>> The overall reliability over the 10 samples is 0,896434285. The
>>> other tool I mentioned provided me with a value of 0.902 for
>>> reliability: I'd like to say, in a non-empirical way, that the two tools
>>> provided a statistically similar result.
>>
>> It probably would be better to randomly create n different system
>> settings, S1 ... Sn, and for each Si, run the system once in each of
>> the two simulators, and then use a paired t-test with n-1 degrees of
>> freedom (see <http://en.wikipedia.org/wiki/Paired_difference_test> and
>> <http://en.wikipedia.org/wiki/Statistical_hypothesis_testing>).
>>
>> Usual t-test assumptions are that the random variables are normally
>> distributed, but as n grows larger (eg n > 30) that provision becomes
>> less important.
>
> Yes, the same simulator settings should be used for both tools, so that
> a paired-data analysis can be done. However, confidence intervals for
> the reliabilities and for the differences in reliabilities would be more
> to the point than would a t-test on the mean difference. And a simple
> scatterplot of the reliability pairs also would be informative. (Most
> statisticians would probably do the scatterplot first, as a diagnostic.)

Yes, a plot should be made. However, the OP wrote, "I'd like to say, in
a non-empirical way, that the two tools provided a statistically similar
result". You apparently see this as a question like "Are the confidence
intervals different?", while I think the more basic question, "Are the
means different?" should be answered first. If there is a statistically
significant difference in means, further statistics might be useless
except as a guide to correcting simulation-model errors.
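
As a crude first pass with the numbers already posted, something like
this would do (Python with scipy; it treats the other tool's 0.902 as a
fixed reference value, which is an approximation since that figure
presumably has sampling error of its own, and a paired test on common
settings would still be preferable):

from scipy import stats

# Per-run reliabilities (1 - Failures/Events) from federico's 10 runs.
rel = [0.885480573, 0.888717949, 0.902834008, 0.900305188, 0.901808786,
       0.898812597, 0.898390342, 0.899371069, 0.895424837, 0.893098080]

# One-sample t-test of H0: mean reliability equals 0.902 (the other tool's value).
res = stats.ttest_1samp(rel, popmean=0.902)
print(f"t = {res.statistic:.3f}, p = {res.pvalue:.4f}")
# A small p-value indicates the simulator's mean reliability differs from
# 0.902; with runs paired on common settings, a paired test would be better.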

--
jiw