From: Luna Moon on
Now also posting to comp.dsp to see if the experts can help us.

The bottleneck is the "backfill" part.

There must be a "filter" way of doing "backfill" fast?

Thanks a lot!

On Jul 21, 3:53 pm, Luna Moon <lunamoonm...(a)gmail.com> wrote:
> How to align two time series fast?
>
> Hi all,
>
> I have two time series, both are in the following format:
>
> Date       Data
> 1/1/2010    5.3
> 1/2/2010    4.4
> ...
>
> Let's label the first time series MyDates1, MyData1 and the second
> time series MyDates2, MyData2,
>
> where MyDates1 and MyData1 have the same number of rows and MyDates2
> and MyData2 have the same number of rows,
>
> and where MyDates1 and MyDates2 are in fact in datenum format.
>
> The sets MyDates1 and MyDates2 are very different.
>
> How can I align time series two with time series one?
>
> That's to say, we want to modify MyDates2 and MyData2 to make them in
> line with MyDates1 and MyData1.
>
> Actions:
>
> (1) If a date is in MyDates1 but not in MyDates2, then insert that
> date into MyDates2 and put a NaN into the corresponding location in
> MyData2.
>
> (2) If a date is in MyDates2 but not in MyDates1, then delete that
> date from MyDates2 and delete the data in the corresponding location
> in MyData2.
>
> (3) The 2nd time series now may look like the following:
>
> Date         Data
> 1/1/2010     NaN
> 1/2/2010     NaN
> 1/5/2010     2.3
> 1/6/2010     NaN
> 1/7/2010     NaN
> 1/8/2010     3.1
> ...
>
> Then we need to backfill the holes ("NaN"s) in this 2nd time series.
>
> For example, the above data, after backfill, become:
>
> Date         Data
> 1/1/2010     NaN
> 1/2/2010     NaN
> 1/5/2010     2.3
> 1/6/2010     2.3
> 1/7/2010     2.3
> 1/8/2010     3.1
> ...
>
> Note that the first few missing values (NaNs) cannot be
> backfilled, because there is no earlier value to copy...
>
> The output is the modified MyData2, because the modified MyDates2
> should be exactly the same as MyDates1, which is used as the reference.
>
> MyData2 should now have the same number of rows as MyDates1, MyData1,
> and MyDates2 (modified).
>
> I currently do this using the Matlab Financial Toolbox,
>
> but it's very slow.
>
> Any thoughts on how I can do it fast?
>
> Thanks a lot!
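
Actions (1) and (2) in the post above can be sketched in one step with `ismember`. This is only a minimal sketch: the sample dates and the output name `Aligned2` are made up for illustration, and it assumes the datenum values are whole days with no repeats in MyDates2, so that exact equality is a safe way to match them.

```matlab
% Hypothetical sample data; datenum values are plain doubles.
MyDates1 = datenum(2010, 1, [1 2 5 6 7 8])';   % reference dates
MyDates2 = datenum(2010, 1, [3 5 8 9])';       % dates of the second series
MyData2  = [9.9; 2.3; 3.1; 7.0];

% tf(i) is true when MyDates1(i) also occurs in MyDates2;
% loc(i) is then its row in MyDates2 (0 where there is no match).
[tf, loc] = ismember(MyDates1, MyDates2);

% Start from all NaN (action 1) and copy over only the matching rows,
% which silently drops dates that exist only in MyDates2 (action 2).
Aligned2 = NaN(size(MyDates1));
Aligned2(tf) = MyData2(loc(tf));
% Aligned2 is now [NaN; NaN; 2.3; NaN; NaN; 3.1]
```

The result has one row per entry of MyDates1, so MyDates2 itself never needs to be modified; only the holes remain to be filled.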

From: Fred Marshall on

Your rules are obscured by the fact that you don't provide 3 or 4 time
series.
- Input 1
- Input 2
- Output 3
- Output 4

I have no idea what you are trying to do with Action 2...
I'd suggest a better naming convention similar to the set above:

Series 1
Series 2
Series 3 from Action 1 on Series 1 and Series 2
Series 4 from Action 2 on [what?] Series 2 or Series 3?

I don't see where the data that ends up at 1/6 and 1/7 comes from, so the
whole thing is still a bit fuzzy.

>> Date Data
>> 1/1/2010 NaN
>> 1/2/2010 NaN
>> 1/5/2010 2.3
>> 1/6/2010 2.3
>> 1/7/2010 2.3
>> 1/8/2010 3.1

Fred

From: Luna Moon on
On Jul 22, 9:22 am, dpb <n...(a)non.net> wrote:
> dpb wrote:
> > Luna Moon wrote:
> >> On Jul 21, 6:36 pm, dpb <n...(a)non.net> wrote:
> >>> Luna Moon wrote:
>
> >>> ...
>
> >>>> The key part is how to do back-fill fast!
> >>> How big of a series is this and what's the typical sparseness?
>
> >>> Wouldn't seem it should be particularly time consuming but an example
> >>> might help visualize.
>
> >>> --
>
> >> Very large, millions of rows... and I have lots of such time series...
>
> >> So let's just focus on how to write such backfill function better...
>
> > How about revising the algorithm?
>
> > Or, perhaps mex the function you have.
>
> > I'll consider it overnight; nothing pops to mind automagic...
>
> OK, what about
>
> idx = ~isnan(d) & isnan([d(2:end) -1])
>
> This is a logical array marking those locations that hold a value
> followed by a NaN (d is a row vector here).
>
> The next location in the array is to be replaced with the value at this
> location.
>
> Iterate this until idx is all false.
>
> Still iterative, but perhaps different from what you're currently doing...

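Written out as a loop, dpb's suggestion might look like the sketch below. The sample vector is the example from the original post; the `while` loop and the use of `find` are one assumed way of carrying out the "iterate until done" step, not code taken from the thread.

```matlab
d = [NaN NaN 2.3 NaN NaN 3.1];   % row vector, as the concatenation assumes

% True where a value is immediately followed by a NaN. The appended -1
% keeps the last element from ever being flagged.
idx = ~isnan(d) & isnan([d(2:end) -1]);
while any(idx)
    d(find(idx) + 1) = d(idx);   % copy each flagged value one step forward
    idx = ~isnan(d) & isnan([d(2:end) -1]);
end
% d is now [NaN NaN 2.3 2.3 2.3 3.1]
```

Each pass extends every run of filled values by one position, so the number of passes equals the length of the longest run of NaNs. That is cheap when the holes are short and slow when they are long.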
No, I am already doing it exactly this way...

but I need to get rid of the for loop.
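
One loop-free possibility, offered as a sketch rather than as the answer the thread settled on, is to use `cumsum` as the "filter": a running count of valid samples tells each row which valid sample is the most recent one. The names `valid`, `k`, `vals` and `filled` are illustrative. Note that the "backfill" described in this thread copies the last known value forward.

```matlab
d = [NaN; NaN; 2.3; NaN; NaN; 3.1];   % aligned series with holes

valid = ~isnan(d);
% Running count of valid samples: each row gets the ordinal of the most
% recent valid sample, or 0 if none has been seen yet.
k = cumsum(valid);
vals = d(valid);                      % the valid samples, in order

filled = NaN(size(d));                % leading holes stay NaN
filled(k > 0) = vals(k(k > 0));       % look up the most recent valid sample
% filled is now [NaN; NaN; 2.3; 2.3; 2.3; 3.1]
```

This makes a fixed number of passes over the data regardless of how long the runs of NaNs are. On MATLAB R2016b or later, `fillmissing(d, 'previous')` should give the same result without any helper code.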