From: Junhui Liao on
Dear all,

My script tries to read one original tsv file and distribute its
contents into multiple new tsv files.

Each line of the original file looks like this: time_1, signal_1, time_2,
signal_2... time_4096, signal_4096.
I would like to write them into file_1, file_2, ... file_4096, so that
file_1 contains time_1, signal_1; file_2 contains time_2, signal_2;
... and file_4096 contains time_4096, signal_4096.

My script works only if the original file contains ONE line.
If the original file has two or more lines, I get the following error
message:

new_split.rb:17: undefined method `+' for nil:NilClass (NoMethodError)

By tracing the output, it seemed to me that the script read just ONE line,
since the puts output looks like this:

........(many lines omitted here)
8182
"4.08963252844486E+00"
"-2.3E-03"
8184
"4.09063219413236E+00"
"-3.1E-03"
8186
"4.09163185987611E+00"
"-7E-04"
8188
"4.09263152560423E+00"
"-3.7E-03"
8190
"4.09363119136048E+00"
"3.6E-03"
8192
nil
nil


And my script is like this:

@a = []
@itemnum = 4096
@counter = 0
@linenum = 10
File.open("../original_data/test_2lines.tsv").each_line do |record| # "^M"
#File.open("../original_data/one_line.tsv").each_line do |record|
  @a = record.chomp.split("\t")

  @itemnum.times do |n|
    File.open("#{n}_debug_split"+".tsv" , "w") do |f|
      puts @counter
      puts @a[@counter].inspect + "\n"
      puts @a[@counter+1].inspect + "\n"
      f << @a[@counter] + "\t" + @a[@counter+1] + "\n"
      @counter += 2
    end
  end
end

Thanks a lot for your comments in advance !
Junhui

BTW, at the end of each line in the original tsv file, there is a "^M"
appended.
I don't know where it comes from or whether it causes any problem.
--
Posted via http://www.ruby-forum.com/.

From: Jesús Gabriel y Galán on
On Thu, Jul 29, 2010 at 1:43 PM, Junhui Liao <junhui.liao(a)uclouvain.be> wrote:
> Dear all,
>
> My script tried to read from one original tsv file and distribute into
> new multiple tsv files.
>
> Each line of the original file is like this: time_1, signal_1, time_2,
> signal_2... time_4096, signal_4096.
> I would like to write them into file_1, file_2, ... file_4096
> accordingly, and these files contain time_1, signal_1; time_2, signal_2;
> ... time_4096, signal_4096 separately.
>
> My script did well only if the original file contains ONE line.
> If the original file has two or more lines, the error message like
> following,
>
> new_split.rb:17: undefined method `+' for nil:NilClass (NoMethodError)

> And my script is like this:
>
>                @a = []

You don't need to initialize this, because you assign directly to @a
again later.

>                @itemnum = 4096
>                @counter = 0
>                @linenum = 10

By the way, you probably don't need instance variables; local
variables would suffice. itemnum looks like a constant and
linenum is not used, so:

ITEM_NUM = 4096
counter = 0

>                File.open("../original_data/test_2lines.tsv").each_line
> do |record|  # "^M"
>                #File.open("../original_data/one_line.tsv").each_line do
> |record|
>                @a = record.chomp.split("\t")

a = record.chomp.split("\t") # although fields or line_fields might be a better name than a

>             @itemnum.times do |n|
>
>                File.open("#{n}_debug_split"+".tsv" , "w") do |f|
>                 puts @counter
>                 puts @a[@counter].inspect + "\n"
>                 puts @a[@counter+1].inspect + "\n"
>                 f << @a[@counter] + "\t" + @a[@counter+1] + "\n"
>                 @counter += 2
>                end
>           end
> end

You are adding 2 to the counter on every iteration, but not resetting it
after each line. So, when the second line is read, counter is still 8192
(4096 iterations times 2, as your trace shows), and you try to get an
element from the array that is out of bounds. That returns nil, and
calling the + method on nil raises the NoMethodError. I think you are
complicating the issue with the counting and so on; usually the Ruby
iterators are a cleaner way to traverse lists of things. You can remove
the use of itemnum, counter and so on like this (untested):

File.open("../original_data/test_2lines.tsv").each_line do |record|
  a = record.chomp.split("\t")
  a.each_slice(2).with_index do |(time,signal), index|
    # "a" appends; "w" would truncate each file again for every input line
    File.open("#{index}_debug_split"+".tsv" , "a") do |f|
      f << "#{time}\t#{signal}\n"
    end
  end
end

Although this will open and close the 4096 files for every line. Are
there many lines? If not, you can read the whole file and build a
structure in memory (a hash of arrays) to store the lines that belong
to every file, and then write them at once to each file.
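
A minimal sketch of that in-memory variant, in case it helps. The helper name split_columns, the output directory argument and the tiny demo input are made up for illustration; only the "#{index}_debug_split.tsv" naming comes from the script above. It assumes the whole input fits in memory.

```ruby
require "tmpdir"

# Hypothetical helper: collect every (time, signal) pair per column index,
# then write each output file exactly once.
def split_columns(input_path, output_dir)
  # Missing keys start out as an empty array of output rows.
  rows_by_index = Hash.new { |hash, key| hash[key] = [] }

  File.foreach(input_path) do |record|
    # chomp also removes a trailing "\r\n", i.e. the "^M" at the line end.
    record.chomp.split("\t").each_slice(2).with_index do |(time, signal), index|
      rows_by_index[index] << "#{time}\t#{signal}\n"
    end
  end

  rows_by_index.each do |index, rows|
    File.open(File.join(output_dir, "#{index}_debug_split.tsv"), "w") do |f|
      f << rows.join
    end
  end
  rows_by_index.size
end

# Small demonstration: two input lines with two pairs each (made-up values).
DEMO_DIR = Dir.mktmpdir
DEMO_INPUT = File.join(DEMO_DIR, "input.tsv")
File.write(DEMO_INPUT, "0.1\t1.0\t0.2\t2.0\r\n0.3\t3.0\t0.4\t4.0\r\n")
DEMO_FILE_COUNT = split_columns(DEMO_INPUT, DEMO_DIR)
```

Each output file is opened only once here, at the cost of holding all rows in memory until the end.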

Jesus.

From: Junhui Liao on

> You are adding 2 to the counter every iteration, but not clearing it
> after every line. So, on the second line, counter will still be 4096,
> and so you will try to get an element from the array that is out of
> bounds, returning nil and raising the NoMethodError, because you are
> calling the + method on nil. I think you are complicated the issue
> with the counting and so on, usually the Ruby iterators are a cleaner
> way to traverse lists of things. You can remove the use of
> itemnum,counter and so on like this (untested):


Many thanks for your comment !


> File.open("../original_data/test_2lines.tsv").each_line do |record|
> a = record.chomp.split("\t")
> a.each_slice(2).with_index do |(time,signal), index|
> File.open("#{index}_debug_split"+".tsv" , "w") do |f|
> f << "#{time}\t#{signal}\n"
> end
> end
> end


I tried the script, after adding "require 'enumerator'".
Still, there is a problem like this:
new_split_Jesus.rb:1:in `each_slice': no block given (LocalJumpError)

After searching this forum, I gathered that this is because the Ruby on
my Mac is 1.8.6, while your code should work under 1.9+.
However, I don't know how to satisfy "requires a block to be passed to
it".
Please refer to this link: http://www.ruby-forum.com/topic/201095#new
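
For reference, a sketch of one way around the LocalJumpError, assuming the cause is that each_slice on 1.8.6 has to be given its block directly instead of being chained with with_index. The helper name each_pair_with_index and the sample record are hypothetical; the index is simply counted by hand.

```ruby
# On Ruby 1.8.6, add `require 'enumerator'` first so each_slice is available.

# Hypothetical helper: split one tab-separated record into (time, signal)
# pairs and yield each pair together with its position in the line.
def each_pair_with_index(record)
  index = 0
  record.chomp.split("\t").each_slice(2) do |time, signal|
    yield time, signal, index
    index += 1
  end
end

# Demonstration on a made-up record with two pairs and a trailing "^M".
PAIRS = []
each_pair_with_index("0.1\t1.0\t0.2\t2.0\r\n") do |time, signal, index|
  PAIRS << [index, time, signal]
end
```

Inside the block, time and signal are available separately, so either one can be printed or written on its own.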



> Although this will open and close the 4096 files for every line. Are
> there many lines? If not, you can read the whole file and build a
> structure in memory (a hash of arrays) to store the lines that belong
> to every file, and then write them at once to each file.


Yes, my file has 2048 lines in total, ~260M.
So, if I read the whole file into memory,
the efficiency may not be so good.


Thanks again for your help !

Best,
Junhui

From: Junhui Liao on
Dear Jesús Gabriel y Galán and all,

> File.open("../original_data/test_2lines.tsv").each_line do |record|
> a = record.chomp.split("\t")
> a.each_slice(2).with_index do |(time,signal), index|
> File.open("#{index}_debug_split"+".tsv" , "w") do |f|
> f << "#{time}\t#{signal}\n"
> end
> end
> end

This code ran well under Ruby 1.9.1; I tried it on our server, which
has that version.

Actually, I also need to do this: subtract the first line's time values
from the corresponding time values of the other lines.

First line: time_1.1, signal_1.1, time_1.2, signal_1.2... time_1.4096,
signal_1.4096.
Second line: time_2.1, signal_2.1, time_2.2, signal_2.2... time_2.4096,
signal_2.4096.
.......

I would like to compute time_2.1 = time_2.1 - time_1.1 , time_2.2 =
time_2.2 - time_1.2 ,
...... time_2.4096 = time_2.4096 - time_1.4096,
and similarly for the time values of the other lines.

I tried to use a counter to pick out the first line (a stupid way, I
know), then save it in an array and subtract that array from the other
lines' time values, but failed. It seemed that I could not access the
individual items through the enumerator. "puts a[index]" printed the two
items (time and signal) fine, but I could not print just the time or
just the signal value.

Thanks a lot in advance!
Best,
Junhui

From: Jesús Gabriel y Galán on
On Fri, Jul 30, 2010 at 1:58 AM, Junhui Liao <junhui.liao(a)uclouvain.be> wrote:
> Dear Jesús Gabriel y Galán and all,
>
>> File.open("../original_data/test_2lines.tsv").each_line do |record|
>>   a = record.chomp.split("\t")
>>   a.each_slice(2).with_index do |(time,signal), index|
>>     File.open("#{index}_debug_split"+".tsv" , "w") do |f|
>>       f << "#{time}\t#{signal}\n"
>>     end
>>   end
>> end
>
> This code ran well at 1.9.1 version of ruby. Since I tried at our
> server where ruby is this version.

BTW, I'm using 1.8.7. And also, File.open().each_line doesn't properly
close the file,
so we should be using File.foreach()
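
A small sketch of the difference, using a made-up temporary file in place of the real data. File.foreach yields each line and closes the file itself; the block form of File.open is the equivalent when you need the handle.

```ruby
require "tempfile"

# Hypothetical sample file standing in for the real tsv input.
SAMPLE = Tempfile.new(["sample", ".tsv"])
SAMPLE.write("0.1\t1.0\n0.2\t2.0\n")
SAMPLE.close

# File.foreach opens the file, yields each line, and closes it afterwards.
FIELDS = []
File.foreach(SAMPLE.path) do |record|
  FIELDS << record.chomp.split("\t")
end

# The block form of File.open also closes the file when the block ends.
LINE_COUNT = File.open(SAMPLE.path) { |file| file.each_line.count }
```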

>
> Actually, I need to do this also: make the first line's time value
> subtracted by other lines' corresponding time ones.
>
> First line: time_1.1, signal_1.1, time_1.2, signal_1.2... time_1.4096,
> signal_1.4096.
> Second line: time_2.1, signal_2.1, time_2.2, signal_2.2... time_2.4096,
> signal_2.4096.
> .......
>
> I would like to do,  time_2.1 = time_2.1 - time_1.1 , time_2.2 =
> time_2.2 - time_1.2 ,
> ...... time_2.4096 = time_2.4096 - time_1.4096.
>                             ......
> Similar to other lines' time value.
>
> I tried to use a counter to pick up the first line (stupid way, I know)
> than save in an array, and
> take other lines time values to subtract this array, but failed. Since
> it seemed
> to the enumerator I could not access individual ? But "puts a[index] "
> printed two items
> (time and signal) well.  However, i could not print just time or signal
> value.

What I'd do is create an array with the times of the first line, and
use it afterwards to subtract.
I've refactored a little bit to simplify (this is completely untested):

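# Defined first, because a top-level method must exist before it is called.
# Appends one line's (time, signal) pairs to the per-index files and
# subtracts the matching first-line time when base_times is given.
def write_line_to_file line_data, base_times = Hash.new(0.0)
  line_data.each_slice(2).with_index do |(time,signal), index|
    # "a" appends, so each file collects one row per input line
    File.open("#{index}_debug_split"+".tsv" , "a") do |f|
      # to_f rather than to_i: the times look like "4.08963252844486E+00"
      f << "#{time.to_f - base_times[index]}\t#{signal}\n"
    end
  end
end

File.open("../original_data/test_2lines.tsv") do |file|
  first_line_data = file.readline.chomp.split("\t")
  first_line_times = first_line_data.each_slice(2).map {|time,signal| time.to_f}
  write_line_to_file first_line_data
  file.each_line do |record|
    line_data = record.chomp.split("\t")
    write_line_to_file line_data, first_line_times
  end
end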

Hope this gives you an idea to explore,

Jesus.