Replace in large text file? [Python]


From: Nobody on 6 Jun 2010 15:55

On Sat, 05 Jun 2010 16:35:42 +0100, MRAB wrote:

>>> In plain language what I wish to do is:
>>>
Remove all commas
Replace all @ with commas

>> input_file = open("some_huge_file.txt", "r")
>> output_file = open("newfilename.txt", "w")
>> for line in input_file:

> I'd probably process it in larger chunks:
>
> CHUNK_SIZE = 1024 ** 2 # 1MB at a time
> input_file = open("some_huge_file.txt", "r")
> output_file = open("newfilename.txt", "w")
> while True:
>     chunk = input_file.read(CHUNK_SIZE)

This is fine for the exact problem at hand, where every replacement
involves a single character. The moment the problem evolves into replacing
a sequence of two or more characters, a chunk boundary can fall in the
middle of the sequence and that match is missed. Processing line-by-line
avoids this, provided the sequence doesn't itself contain a newline.
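
To illustrate the boundary issue, here is a minimal sketch. The function names, the sample text and the tiny chunk size are made up for the demonstration, and io.StringIO stands in for the real files:

```python
import io

def replace_in_chunks(src, dst, old, new, chunk_size):
    """Naive chunked replace: misses a match split across a chunk boundary."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk.replace(old, new))

def replace_by_line(src, dst, old, new):
    """Line-by-line replace: safe as long as `old` contains no newline."""
    for line in src:
        dst.write(line.replace(old, new))

# Hypothetical sample data; a chunk size of 3 forces a boundary inside "@@".
text = "ab@@cd\nef@@gh\n"

chunked = io.StringIO()
replace_in_chunks(io.StringIO(text), chunked, "@@", ",", chunk_size=3)

by_line = io.StringIO()
replace_by_line(io.StringIO(text), by_line, "@@", ",")

print(repr(chunked.getvalue()))  # 'ab@@cd\nef,gh\n'  (first "@@" was split)
print(repr(by_line.getvalue()))  # 'ab,cd\nef,gh\n'
```

With a realistic chunk size the failure would be rare rather than frequent, which arguably makes it worse: it only shows up when a match happens to straddle a boundary.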

From: hiral on 9 Jun 2010 06:27

On Jun 6, 7:27 am, Steve <vvw...(a)googlemail.com> wrote:
> On 5 June, 08:53, Steve <vvw...(a)googlemail.com> wrote:
>
> > I am new to Python and am wanting to replace characters in a very
> > large text file.....6 GB
> > In plain language what I wish to do is:
>
> > Remove all commas
> > Replace all @ with commas
> > Save as a new file.
>
> > Any of you clever people know the best way to do this......idiot guide
> > please.
>
> > Thanks
>
> > Steve
>
> Many thanks for your suggestions.
>
> sed -i 's/Hello/hello/g' file
>
> Run twice on the command line, with the hellos changed for my needs,
> it did the job in a few minutes.
>
> Again thanks
>
> Steve

Hi Steve,

You can do...

sed "s/,//g" your_file | sed "s/@/,/g" > new_file

Thank you.

From: Tim Chase on 9 Jun 2010 07:00

On 06/09/2010 05:27 AM, hiral wrote:
> On Jun 6, 7:27 am, Steve<vvw...(a)googlemail.com> wrote:
>> On 5 June, 08:53, Steve<vvw...(a)googlemail.com> wrote:
>>> Remove all commas
>>> Replace all @ with commas
>>> Save as a new file.
>>
>> Many thanks for your suggestions.
>>
>> sed -i 's/Hello/hello/g' file
>>
>> Run twice on the command line, with the hellos changed for my needs,
>> it did the job in a few minutes.
>
> You can do...
>
> sed "s/,//g" your_file | sed "s/@/,/g" > new_file

No need to use 2 sed processes:

sed 's/,//g;y/@/,/' your_file > new_file

(you could use "s/@/,/g" as well, but the internal implementation
of the transliterate "y" should be a lot faster)

-tkc
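
For anyone who would rather stay in Python, as the original question asked, a rough equivalent of the one-pass sed command is a translation table. This is only a sketch: the function name and sample data are invented for the example, and it assumes an ASCII-compatible encoding so that "," and "@" are single bytes.

```python
import io

# Map "@" to ","; every other byte maps to itself.
TABLE = bytes.maketrans(b"@", b",")

def strip_commas_and_swap_at(src, dst, chunk_size=1024 ** 2):
    """Copy src to dst, deleting "," and turning "@" into ",", in one pass."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # bytes.translate first drops the bytes listed in the second
        # argument, then maps what is left through the table.
        dst.write(chunk.translate(TABLE, b","))

# Hypothetical sample data; BytesIO stands in for files opened in binary mode.
src = io.BytesIO(b"1,000@2,500@x\n")
dst = io.BytesIO()
strip_commas_and_swap_at(src, dst, chunk_size=4)
print(dst.getvalue())  # b'1000,2500,x\n'
```

For the real 6 GB file, src and dst would be opened with open(..., "rb") and open(..., "wb"). Since only single bytes are involved, the chunk boundary problem mentioned earlier in the thread doesn't arise here.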
