From: Wolfgang Kern on

Hallo Guga,

> Thanks, Robert. I think I got it.

> Assuming I'm using them continuously, I made a simple formula that
> gives the number of clock cycles taken by those instructions when
> they are used that way (continuously):

> Clocks = Latency + Throughput*(N-1)

> N = number of instructions used (all of the same type), like the 1000
> example you gave.
> Latency = latency of the mnemonic
> Throughput = throughput of the mnemonic
> Clocks = total number of clocks for the sequence of mnemonics used
> continuously

> Is that it?

That would also work if the throughput is <1,
which means several instructions may execute in parallel.
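
As a rough illustration of the formula discussed above, here is a minimal Python sketch. It assumes the N instructions are independent of each other, so a new one can start every Throughput clocks and only the last one's latency is fully exposed, i.e. Clocks = Latency + Throughput*(N-1). The function name `estimated_clocks` and the latency/throughput figures in the calls are hypothetical, not values from any AMD or Intel table.

```python
def estimated_clocks(latency, throughput, n):
    """Estimate clocks for n back-to-back, independent instructions
    of the same type (assumption: no dependencies, no stalls)."""
    if n < 1:
        return 0
    # The first n-1 instructions each cost one throughput interval;
    # the last one still needs its full latency to finish.
    return latency + throughput * (n - 1)


# Hypothetical figures, only to show the arithmetic.
print(estimated_clocks(4, 2, 1000))    # 4 + 2*999
print(estimated_clocks(1, 0.5, 1000))  # throughput < 1: overlapping execution
```

If every instruction depended on the result of the previous one, this estimate would be too low, since each one would have to wait for the full latency of its predecessor.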

There is a timing calculation example in my AMD-docs ...
This formula spans half a page and is impractical for daily usage,
so I just use the lists as prepared by AMD:

|Instruction-group |Latency |Throughput |affected PIPES|

Intel has similar lists, but I found the SSE timings missing there too.

I once had timing information in my x86 disassembler,
but it used latency values only.
As this was neither exact nor even roughly right, I removed it for x86 altogether.
But other CPUs (good olde Z80 and followers) can work as RTCL :)

__
wolfgang



From: Guga on
On Mar 8, 8:46 pm, "Wolfgang Kern" <nowh...(a)never.at> wrote:

Hi Wolfgang,

Nice to see you again :)

Those lists are a bit confusing. I think I'll do as you did and just
use the AMD list with latencies. I'm trying to make a list
containing the clock cycles of each mnemonic, but there are so many
different processors, which just adds to the confusion.

The best list I have found so far is here:
http://www.logix.cz/michal/doc/i386/chp17-00.htm

Sure, it is old and it is for the 386, but it displays the clock cycles in
an easy-to-read way.


The list Robert provided also covers the general-purpose
mnemonics, like:

CMP/TEST latency: 1, Throughput = 0.5

So I presume they behave the same way as the SSE
instructions, right?

I mean, they work more or less like the formula I posted before,
right?

But if that is true, then why does this document say that Jcc doesn't
have a latency?

Table C10 says that for a 0F2 processor the latency of Jcc is not
applicable, but it has a throughput of 0.5. How is that
possible?

If an instruction has no latency from which to compute the clock
cycles it needs to be issued, how does it work? I mean, it _could_ work
from the throughput alone, but if the latency is 0, shouldn't the
throughput be 0 too? I thought throughput and latency were
related to each other.

Best Regards,

Guga


From: Guga on
On Mar 9, 7:53 am, "Guga" <Guga...(a)gmail.com> wrote:

Does anyone know where to get a list of the CPUID signatures of all
processors?

For example:
Pentium M - Banias is 0x69X
Pentium M - Dothan is 0x6DX
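
For readers who want to see what the digits in such a signature encode, here is a minimal Python sketch that splits the value returned in EAX by CPUID leaf 1 into family, model and stepping. The bit positions are the documented ones (stepping in bits 3:0, model in 7:4, family in 11:8, extended model in 19:16, extended family in 27:20); the function name `decode_signature` is hypothetical, and the rule for combining the extended fields follows the Intel convention, which is an assumption for other vendors. Under this layout the trailing "X" in 0x69X and 0x6DX presumably stands for any stepping.

```python
def decode_signature(eax):
    """Split a CPUID leaf 1 EAX signature into (family, model, stepping)."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model is only used for base family 6 or 0xF
    # (Intel convention; treated here as an assumption for other vendors).
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


family, model, stepping = decode_signature(0x06D8)
print(f"family {family:X}h, model {model:X}h, stepping {stepping:X}h")
```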



From: Guga on
On Mar 9, 10:20 am, "Guga" <Guga...(a)gmail.com> wrote:

Damn, this is a hell of a lot of work, but I'm building the list. :):) So
far I've got:

CPUID Name
04F4 AMD 5x86-133 P75 (X5) in 4x clock mode
0600 Cyrix/IBM 6x86MX PR166-266 or Cyrix MII PR300-433
0650 Pentium II / Celeron Processor Deschutes / Covington dA0 SECC / SEPP
0651 Pentium II / Celeron Processor Deschutes / Covington dA0 SECC / SECC2 / SEPP
0652 Pentium II Processor Deschutes dB0 SECC/SECC2
0653 Pentium II Processor Deschutes dB1 SECC/SECC2
0660 Intel Celeron-A 300/333/366/400 A0-step with 128 KB integrated L2 cache
0660 Intel Celeron Processor Mendocino mA0 SEPP
0665 Intel Celeron Processor Mendocino mB0 PPGA
0672 Pentium III Processor Katmai kB0 SECC2
0673 Pentium III Processor Katmai kC0 SECC2
0681 Pentium III Processor Coppermine cA2 SECC/SECC2
0681 Pentium III Processor Coppermine cA2 FC-PGA
0683 Pentium III Processor Coppermine cB0 SECC2
0683 Pentium III / Celeron Processor Coppermine cB0 FC-PGA / PPGA
0686 Pentium III Processor Coppermine cC0 SECC2
0686 Pentium III / Celeron Processor Coppermine cC0 FC-PGA / PPGA
068A Pentium III / Celeron Processor Coppermine cD0 FC-PGA / PPGA
069X 80686 - Pentium M - Banias
06B1 Intel® Celeron® processor
06B1 Pentium III / Celeron Processor Tualatin tA1 PPGA-370
06B4 Pentium III / Celeron Processor Tualatin tB1 PPGA-370
06DX 80686 - Pentium M - Dothan
06D8 Pentium M 740 Processor 1.73GHz Processor
06E8 Core Solo T1300 1.66GHz Processor - 32-bit Dynamic Execution Microarchitecture
06F5 Xeon Dual-Core 3040 1.86GHz Processor. 64-bit Core Microarchitecture
06F5 Xeon Dual-Core 3060 2.4GHz Processor. 64-bit Core Microarchitecture
06F5 Xeon Dual-Core 3050 2.13GHz Processor. 64-bit Core Microarchitecture
06F5 Xeon Dual-Core 3050 2.13GHz Processor - 64-bit Core Microarchitecture
06F6 Core2 Duo T5500 1.66GHz Mobile Processor - 64-bit Core Microarchitecture
06F6 Intel Core2 Duo T7400 Mobile Processor
06F6 Core2 Duo T5600 1.83GHz Mobile Processor. 64 Bit Core Microarchitecture
06F6 Core2 Duo T5200 2.0GHz Mobile Processor. 64 Bit Core Microarchitecture
06F6 Core2 Duo T7400 2.16GHz Mobile Processor. 64 Bit Core Microarchitecture
06F7 Intel Core 2 Extreme QX6700 Processor
06F7 Intel Core 2 Quad Q6600 Kentsfield
07A0 AMD Athlon XP 2600
0F07 Pentium 4 Processor Willamette B2 PPGA-423 INT2
0F0A Pentium 4 Processor Willamette C1 PPGA-423 INT2
0F0A Pentium 4 Processor Willamette C1 PPGA-478 FC-PGA2
0F12 Intel Pentium 4 P68, Willamette, A80528
0F12 Pentium 4 Processor Willamette D0 PPGA-423 INT2
0F12 Pentium 4 Processor Willamette D0 PPGA-478 FC-PGA2
0F13 Pentium 4 / Celeron Processor Willamette E0 PPGA-478 FC-PGA2
0F24 Pentium 4 Processor Northwood B0 PPGA-478
0F27 Pentium 4 / Celeron Processor Northwood C1 PPGA-478
0F29 Pentium 4 / Celeron Processor Northwood D1 PPGA-478
0F33 Pentium 4 / Celeron Processor Prescott C0 All
0F34 Xeon (Nocona)
0F34 Pentium 4 (Prescott)
0F41 Intel® Celeron® D 336 64 Bit NetBurst Microarchitecture
0F41 Intel® Celeron® D 346 64 Bit NetBurst Microarchitecture
0F41 Celeron D 331 2.66GHz Processor 64-bit
0F41 Celeron D 336 2.80GHz Processor 64-bit
0F41 Celeron D 351 3.20GHz Processor - 64 Bit NetBurst Microarchitecture
0F41 Celeron D 346 3.06GHz Processor - 64 Bit NetBurst Microarchitecture
0F41 Pentium 4 541 - 3.20GHz Processor 64-bit NetBurst Microarchitecture
0F48 Xeon 2.80GHz Dual-Core Processor. 64-bit NetBurst Microarchitecture
0F64 Intel Celeron D 347 - 64 Bit NetBurst Microarchitecture
0F64 Celeron D 347 3.06GHz Processor - 64 Bit NetBurst Microarchitecture


From: Guga on
I'm still completing it; 120 more CPUIDs to go :)

CPUID Name
04F4 AMD 5x86-133 P75 (X5) in 4x clock mode
0600 Cyrix/IBM 6x86MX PR166-266 or Cyrix MII PR300-433
0650 Pentium II / Celeron Processor Deschutes / Covington dA0 SECC / SEPP
0651 Pentium II / Celeron Processor Deschutes / Covington dA0 SECC / SECC2 / SEPP
0652 Pentium II Processor Deschutes dB0 SECC/SECC2
0653 Pentium II Processor Deschutes dB1 SECC/SECC2
0660 Intel Celeron-A 300/333/366/400 A0-step with 128 KB integrated L2 cache
0660 Intel Celeron Processor Mendocino mA0 SEPP
0665 Intel Celeron Processor Mendocino mB0 PPGA
0672 Pentium III Processor Katmai kB0 SECC2
0673 Pentium III Processor Katmai kC0 SECC2
0681 Pentium III Processor Coppermine cA2 SECC/SECC2
0681 Pentium III Processor Coppermine cA2 FC-PGA
0683 Pentium III Processor Coppermine cB0 SECC2
0683 Pentium III / Celeron Processor Coppermine cB0 FC-PGA / PPGA
0686 Pentium III Processor Coppermine cC0 SECC2
0686 Pentium III / Celeron Processor Coppermine cC0 FC-PGA / PPGA
068A Pentium III / Celeron Processor Coppermine cD0 FC-PGA / PPGA
069X 80686 - Pentium M - Banias
06B1 Intel® Celeron® processor
06B1 Pentium III / Celeron Processor Tualatin tA1 PPGA-370
06B4 Pentium III / Celeron Processor Tualatin tB1 PPGA-370
06DX 80686 - Pentium M - Dothan
06D8 Pentium M 740 Processor 1.73GHz Processor
06D8 Pentium M 780 2.26GHz Processor
06D8 Processor ( mobile ) - 1 x Intel Pentium M 760 2 GHz 32-bit Dynamic Execution Microarchitecture
06D8 Intel Celeron M - 1.5GHz Processor - 1.5GHz 32-bit Dynamic Execution Microarchitecture
06E8 Core Solo T1300 1.66GHz Processor - 32-bit Dynamic Execution Microarchitecture
06F5 Xeon Dual-Core 3040 1.86GHz Processor. 64-bit Core Microarchitecture
06F5 Xeon Dual-Core 3060 2.4GHz Processor. 64-bit Core Microarchitecture
06F5 Xeon Dual-Core 3050 2.13GHz Processor. 64-bit Core Microarchitecture
06F5 Xeon Dual-Core 3050 2.13GHz Processor - 64-bit Core Microarchitecture
06F5 Xeon Dual-Core 3050 2.13GHz Processor - 64-bit Core Microarchitecture
06F5 Xeon Dual-Core 3060 2.4GHz Processor - 64-bit Core Microarchitecture
06F5 Xeon Dual-Core 3070 2.66GHz Processor - 64-bit Core Microarchitecture
06F5 Intel Core 2 Extreme X6800 (Conroe rev. B1)
06F5 Processor - 1 x Intel Dual-Core Xeon 3060 / 2.4 GHz 64-bit Core Microarchitecture
06F5 Processor - 1 x Intel Dual-Core Xeon 3050 / 2.13 GHz 64-bit Core Microarchitecture
06F5 Processor - 1 x Intel Dual-Core Xeon 3040 / 1.86 GHz - 64-bit Core Microarchitecture
06F6 Core2 Duo T5500 1.66GHz Mobile Processor - 64-bit Core Microarchitecture
06F6 Intel Core2 Duo T7400 Mobile Processor
06F6 Core2 Duo T5600 1.83GHz Mobile Processor. 64 Bit Core Microarchitecture
06F6 Core2 Duo T5200 2.0GHz Mobile Processor. 64 Bit Core Microarchitecture
06F6 Core2 Duo T7400 2.16GHz Mobile Processor. 64 Bit Core Microarchitecture
06F6 Core2 Duo T7600 2.33GHz Mobile Processor. 64 Bit Core Microarchitecture
06F6 Core2 Duo T5600 1.83GHz Mobile Processor - 64-bit Core Microarchitecture
06F6 Core2 Duo T7200 2.0GHz Mobile Processor - 64-bit Core Microarchitecture
06F6 Core2 Duo T7400 2.16GHz Mobile Processor - 64-bit Core Microarchitecture
06F6 Core2 Duo T7600 2.33GHz Mobile Processor - 64-bit Core Microarchitecture
06F7 Intel Core 2 Extreme QX6700 Processor
06F7 Intel Core 2 Quad Q6600 Kentsfield
06F7 Core 2 Duo QX6700 Extreme Processor. 2.66 GHz.
07A0 AMD Athlon XP 2600
0F07 Pentium 4 Processor Willamette B2 PPGA-423 INT2
0F0A Pentium 4 Processor Willamette C1 PPGA-423 INT2
0F0A Pentium 4 Processor Willamette C1 PPGA-478 FC-PGA2
0F12 Intel Pentium 4 P68, Willamette, A80528
0F12 Pentium 4 Processor Willamette D0 PPGA-423 INT2
0F12 Pentium 4 Processor Willamette D0 PPGA-478 FC-PGA2
0F13 Pentium 4 / Celeron Processor Willamette E0 PPGA-478 FC-PGA2
0F24 Pentium 4 Processor Northwood B0 PPGA-478
0F25 Intel Pentium 4 Extreme Edition 3.46GHz Processor - 3.46GHz 32-bit NetBurst Microarchitecture
0F27 Pentium 4 / Celeron Processor Northwood C1 PPGA-478
0F29 Pentium 4 / Celeron Processor Northwood D1 PPGA-478
0F33 Pentium 4 / Celeron Processor Prescott C0 All
0F34 Xeon (Nocona)
0F34 Pentium 4 (Prescott)
0F41 Intel® Celeron® D 336 64 Bit NetBurst Microarchitecture
0F41 Intel® Celeron® D 346 64 Bit NetBurst Microarchitecture
0F41 Celeron D 331 2.66GHz Processor 64-bit
0F41 Celeron D 336 2.80GHz Processor 64-bit
0F41 Celeron D 351 3.20GHz Processor - 64 Bit NetBurst Microarchitecture
0F41 Celeron D 346 3.06GHz Processor - 64 Bit NetBurst Microarchitecture
0F41 Pentium 4 541 - 3.20GHz Processor 64-bit NetBurst Microarchitecture
0F41 Intel Pentium 4 541 - 3.20GHz Processor - 3.20GHz - 64-bit NetBurst Microarchitecture
0F41 Intel Celeron D 326 2.53GHz Processor - 2.53GHz 64-bit
0F43 Xeon 3.20GHz Processor
0F43 Xeon 3.4GHz Processor
0F43 Xeon 3.60GHz Processor - Extended Memory 64 Technology Hyper-Threading Technology
0F43 Intel Xeon 3.0GHz Processor - 3.0GHz
0F43 Intel Xeon 3.20GHz Processor - 3.20GHz
0F44 Processor - 1 x Intel Pentium D 830 3 GHz ( 800 MHz ) Dual-Core NetBurst Microarchitecture
0F48 Xeon 2.80GHz Dual-Core Processor. 64-bit NetBurst Microarchitecture
0F4A Xeon 2.8GHz Processor - 64-bit NetBurst Microarchitecture
0F4A Xeon 3.60GHz Processor - 64-bit NetBurst Microarchitecture
0F4A Xeon 3.80GHz Processor - Extended Memory 64 Technology Enhanced SpeedStep Technology Hyper-Threading Technology
0F4A Intel Xeon 2.80GHz Processor - 2.8GHz 64-bit NetBurst Microarchitecture
0F64 Intel Celeron D 347 - 64 Bit NetBurst Microarchitecture
0F64 Celeron D 347 3.06GHz Processor - 64 Bit NetBurst Microarchitecture
0F64 Processor - 1 x Intel Pentium D 945 / 3.4 GHz 64 Bit
0F64 Intel Celeron D 347 3.06GHz Processor - 3.06GHz 64-bit NetBurst Microarchitecture
020F32 AMD Athlon 64 X2 3800+, 2.0 GHz (Manchester rev. E6)
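
One way such a list could be queried is sketched below in Python. It is only an illustration under stated assumptions: the names `TABLE` and `lookup` are hypothetical, the table holds just a handful of rows copied from the list, and a trailing "X" in a pattern is treated as a wildcard for any hex digit. Because several products share one signature (0F34 appears for both Nocona and Prescott), the lookup returns a list of names, not a single name.

```python
# A few rows copied from the list above; "X" marks a wildcard digit.
TABLE = [
    ("069X", "80686 - Pentium M - Banias"),
    ("06DX", "80686 - Pentium M - Dothan"),
    ("0F34", "Xeon (Nocona)"),
    ("0F34", "Pentium 4 (Prescott)"),
    ("020F32", "AMD Athlon 64 X2 3800+, 2.0 GHz (Manchester rev. E6)"),
]


def lookup(signature, table=TABLE):
    """Return every name whose pattern matches the numeric signature."""
    digits = format(signature, "X")        # uppercase hex, no leading zeros
    names = []
    for pattern, name in table:
        pat = pattern.upper().lstrip("0")  # ignore zero padding in the list
        same_length = len(pat) == len(digits)
        if same_length and all(p in ("X", d) for p, d in zip(pat, digits)):
            names.append(name)
    return names


print(lookup(0x06D8))  # matches the 06DX wildcard row
print(lookup(0x0F34))  # two products share this signature
```

A signature alone therefore cannot always identify a product; at best it narrows things down to a core and stepping.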