From: rajesh on
Hi,

I worked on an implementation of the H.264 algorithm on Blackfin a
couple of years back. I used the processor's elegantly designed DMA
to move data in and out of internal memory (especially during
de-blocking), and I was competing with the cache in terms of cycles.
So I had a chance to experiment with the cache.

I observed a very strange phenomenon with the cache. I had written
two different versions of the code for the same de-blocking algorithm
(de-blocking is a part of H.264). One was supposedly optimized but
actually wasn't.

The order in which I accessed the pixel and other data was the same
in both cases. This point is very important: I had not changed the
order in which the data was accessed.

With the cache disabled, both versions consumed the same number of
cycles. Only when I enabled the cache was there a huge difference
(almost 40%; I don't remember exactly, but it was considerable).

Now tell me, how does the cache behave differently with two different
versions of code for the same algorithm?
Remember, I hadn't changed the order in which I accessed the data.

In both cases the code resided in internal memory.

There wasn't much difference between the two versions: an 'if'
statement had been moved outside a 'for' loop.
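
For illustration only, here is a minimal C sketch of that kind of change. It is not the original code: the function names and the shift arithmetic are hypothetical placeholders, not a real de-blocking filter. Both versions visit pix[0] to pix[n-1] in the same order; only the position of the 'if' differs.

```c
#include <stddef.h>
#include <stdint.h>

/* Version A: the loop-invariant flag is tested on every iteration. */
static void filter_if_inside(uint8_t *pix, size_t n, int strong)
{
    for (size_t i = 0; i < n; i++) {
        if (strong)
            pix[i] = (uint8_t)(pix[i] >> 1);   /* placeholder "strong" path */
        else
            pix[i] = (uint8_t)(pix[i] >> 2);   /* placeholder "weak" path */
    }
}

/* Version B: the test is moved outside, giving two specialised loops. */
static void filter_if_outside(uint8_t *pix, size_t n, int strong)
{
    if (strong) {
        for (size_t i = 0; i < n; i++)
            pix[i] = (uint8_t)(pix[i] >> 1);
    } else {
        for (size_t i = 0; i < n; i++)
            pix[i] = (uint8_t)(pix[i] >> 2);
    }
}
```

The two functions produce identical results and touch the data in the same sequence, so any cycle difference has to come from the instruction stream the compiler generates for each, not from the data access order.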

How can one explain a difference in cycles that occurs only when I
enable the cache, when there is no change in the order in which the
data is accessed?

No, I hadn't changed the cache mapping option; it was kept constant.

I observed the same phenomenon in another situation (in the same
H.264 code) on Blackfin (BF533).

From: Vladimir Vassilevsky on


rajesh wrote:
> Hi,
>
> I worked on an implementation of the H.264 algorithm on Blackfin a
> couple of years back. I used the processor's elegantly designed DMA
> to move data in and out of internal memory (especially during
> de-blocking), and I was competing with the cache in terms of cycles.
> So I had a chance to experiment with the cache.


BlackFin doesn't have any means for providing cache and DMA coherency.
Hence you generally can't DMA to the memory areas which are covered by
cache.


> I observed a very strange phenomenon with the cache. I had written
> two different versions of the code for the same de-blocking algorithm
> (de-blocking is a part of H.264). One was supposedly optimized but
> actually wasn't.
>
> The order in which I accessed the pixel and other data was the same
> in both cases. This point is very important: I had not changed the
> order in which the data was accessed.
>
> With the cache disabled, both versions consumed the same number of
> cycles. Only when I enabled the cache was there a huge difference
> (almost 40%; I don't remember exactly, but it was considerable).

I can't understand what you did. BTW, I compared the efficiency of the
data cache vs. L1 data memory on my tasks. The cache appears to be
about 10% slower, which is what is expected.


> Now tell me, how does the cache behave differently with two different
> versions of code for the same algorithm?
> Remember, I hadn't changed the order in which I accessed the data.
> In both cases the code resided in internal memory.
> There wasn't much difference between the two versions: an 'if'
> statement had been moved outside a 'for' loop.
> How can one explain a difference in cycles that occurs only when I
> enable the cache, when there is no change in the order in which the
> data is accessed?
> No, I hadn't changed the cache mapping option; it was kept constant.
> I observed the same phenomenon in another situation (in the same
> H.264 code) on Blackfin (BF533).

You are muddle-headed. Learn hardware.


Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
From: rajesh on
On May 6, 2:13 am, Vladimir Vassilevsky <antispam_bo...(a)hotmail.com>
wrote:
> You are muddle-headed. Learn hardware.

I didn't get it. Should I learn hardware because I am muddled, or
should I learn hardware to unmuddle myself?

In any case, a strange phenomenon like the one above can leave anyone
muddled.


"None of us knew much about staging a variety show, so we just had to
muddle through."
From: rajesh on
On May 6, 2:13 am, Vladimir Vassilevsky <antispam_bo...(a)hotmail.com>
wrote:

> BlackFin doesn't have any means for providing cache and DMA coherency.
> Hence you generally can't DMA to the memory areas which are covered by
> cache.

There are such means.

I used an instruction to invalidate the cache after the DMA
transfer.
From: rajesh on
On May 6, 9:21 am, rajesh <getrajes...(a)gmail.com> wrote:
> There are such means.
>
> I used an instruction to invalidate the cache after the DMA
> transfer.

FYI

iflush [ p2 ] ; /* Invalidate cache line containing address that
P2 points to */
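
An invalidate of this kind acts on one cache line at a time, so a DMA buffer needs one invalidate per line it spans. Below is a minimal sketch of that address arithmetic in portable C, under stated assumptions: a 32-byte line size (check the hardware reference for the part in use), and a hypothetical `invalidate_line` helper that stands in for the real instruction and only counts calls, so the sketch runs on a host. Which invalidate instruction suits a given cache (instruction or data) should also be confirmed in the processor's programming reference.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 32u   /* assumed line size, not taken from the thread */

/* Counts calls so the sketch can be checked on a host machine. */
static unsigned long lines_invalidated = 0;

/* Hypothetical stand-in for the per-line invalidate instruction. */
static void invalidate_line(uintptr_t line_addr)
{
    (void)line_addr;
    lines_invalidated++;
}

/* Invalidate every cache line overlapped by [start, start + len). */
static void invalidate_range(uintptr_t start, size_t len)
{
    if (len == 0)
        return;

    const uintptr_t mask = ~(uintptr_t)(CACHE_LINE_BYTES - 1u);
    uintptr_t addr = start & mask;               /* first line, rounded down */
    uintptr_t last = (start + len - 1u) & mask;  /* line holding the last byte */

    for (;;) {
        invalidate_line(addr);
        if (addr == last)
            break;
        addr += CACHE_LINE_BYTES;
    }
}
```

A buffer that is not line-aligned can share its first or last line with unrelated data, so aligning DMA buffers to the line size and padding them to a multiple of it avoids invalidating someone else's data.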