From: Mike Williams on
"Bee" <Bee(a)discussions.microsoft.com> wrote in message
news:5F3E878D-D34E-456B-9CB1-18AAA879136C(a)microsoft.com...

> Converting to and from a byte array is very fast.
> I think this is legal.
> Dim aByte() as byte
> aByte=sString ' to byte array
> work on the byte array
> sString = StrConv(aByte, vbUnicode) ' back to string

Well it's legal, but it doesn't do what you appear to think it does. Rather
than explain what is going on I think it might be more instructive to just
show you the result and then allow you to work out for yourself what is
going on (remembering that a VB String has two bytes per character), so that
you can post again if you can't work it out. Try this, which is your above
code with an actual test string included:

Dim sString As String
sString = "abcd"
Dim aByte() As Byte
Print sString, Len(sString)
aByte = sString ' to byte array
'work on the byte array
sString = StrConv(aByte, vbUnicode) ' back to string
Print sString, Len(sString)
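
If the two printed lengths look puzzling, dumping the raw bytes of the array may help. A minimal sketch, assuming a helper named BytesToHex (hypothetical, not part of the code above) placed in the same Form:

```vb
' Hypothetical helper: returns the bytes of a String as hex pairs,
' exactly as they land in a Byte array after "aByte = sString".
Private Function BytesToHex(ByVal s As String) As String
    Dim b() As Byte, i As Long, out As String

    If Len(s) = 0 Then Exit Function   ' nothing to dump

    b = s                              ' plain assignment, no ANSI conversion
    For i = 0 To UBound(b)
        ' pad each value to two hex digits
        out = out & Right$("0" & Hex$(b(i)), 2) & " "
    Next i
    BytesToHex = RTrim$(out)
End Function
```

Calling Print BytesToHex("abcd") shows 61 00 62 00 63 00 64 00, i.e. eight bytes for four characters.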

Mike



From: Schmidt on

"David Kaye" <sfdavidkaye2(a)yahoo.com> schrieb im Newsbeitrag
news:hh8ipi$42m$1(a)news.eternal-september.org...

> >Repeated calls to Replace (scanning the string
> >over and over again, to replace different single-chars)
> >is "horribly inefficient" ... ;-)
>
> My experience wasn't like that at all, which was my point.
> An 80k file with about 5k of replacements or deletions took
> only 0.009 or 9/100ths of a second.

You probably mean "0.090 or 9/100ths of a second".

And yes, that matches the first results in the following
test routines (on a modern CPU). Nonetheless, the
"repeated replaces" approach is already about 50 times
slower than Larry's routine on these smaller input lengths
(such as your ~80 kByte). And 90 msec is already near the
human "I can feel it" barrier (of 1/10th of a second)...

And on larger input in the megabyte range, the OLE BSTR
cache helps much less...
On a ~2.5MB input string your routine is then already
about 160 times slower, needing about 13 seconds...
so be a bit patient until the demo below has finished.

Just test this yourself with the copy-and-paste code below
(please compile to native code with all advanced options
enabled to see the real difference).

'***Into a Form
Option Explicit

Private Type SafeArray1D
    cDims As Integer
    fFeatures As Integer
    cbElements As Long
    cLocks As Long
    pvData As Long
    cElements1d As Long
    lLBound1d As Long
End Type

Private Declare Sub BindArray Lib "kernel32" Alias "RtlMoveMemory" _
    (PArr() As Any, PSrc&, Optional ByVal cb& = 4)
Private Declare Sub ReleaseArray Lib "kernel32" Alias "RtlMoveMemory" _
    (PArr() As Any, Optional PSrc& = 0, Optional ByVal cb& = 4)
Private Declare Sub RtlMoveMemory Lib "kernel32" _
    (dst As Any, src As Any, ByVal nBytes&)


Private Declare Function QueryPerformanceFrequency& Lib "kernel32" (x@)
Private Declare Function QueryPerformanceCounter& Lib "kernel32" (x@)


Private Sub Form_Load()
    AutoRedraw = True
    Caption = "Click the Form"
End Sub

Private Sub Form_Click()
    Dim i&, T#, S$, S1$, S2$, S3$

    S = " abc" & vbTab & "123" & vbCrLf & "ABC" & Chr(0) & "123 "
    For i = 1 To 12
        S = S & S 'yields a test string of about 80k chars (18 * 2^12 = 73,728)
    Next i

    S1 = S
    S2 = S
    S3 = S

    Print "InputLen:", Len(S), vbCrLf

    T = HPTimer
    ScrubUsingReplace S1
    Print "ScrubUsingReplace", Round((HPTimer - T) * 1000, 2), Len(S1)

    T = HPTimer
    ScrubUsingLookupTable S2
    Print "ScrubUsingLookupTable", Round((HPTimer - T) * 1000, 2), Len(S2)

    T = HPTimer
    ScrubUsingSafeArray S3
    Print "ScrubUsingSafeArray", Round((HPTimer - T) * 1000, 2), Len(S3)


    '******* and the same thing again with larger input ********
    For i = 13 To 17
        S = S & S 'yields a test string of about 2.5M chars (2,359,296)
    Next i

    S1 = S
    S2 = S
    S3 = S

    Print
    Print "Now we leave the efficiency-range of the OLE-BSTR-cache..."
    Print "InputLen:", Len(S), vbCrLf

    T = HPTimer
    ScrubUsingReplace S1
    Print "ScrubUsingReplace", Round((HPTimer - T) * 1000, 2), Len(S1)

    T = HPTimer
    ScrubUsingLookupTable S2
    Print "ScrubUsingLookupTable", Round((HPTimer - T) * 1000, 2), Len(S2)

    T = HPTimer
    ScrubUsingSafeArray S3
    Print "ScrubUsingSafeArray", Round((HPTimer - T) * 1000, 2), Len(S3)

    Print
End Sub

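Private Sub ScrubUsingReplace(Text As String)
    Dim j%
    'one full Replace-scan per scrubbed char-code (0..64)
    For j% = 0 To 64
        Text$ = Replace(Text$, Chr$(j%), "")
    Next
    '...and another full scan for each of the codes 128..255
    For j% = 128 To 255
        Text$ = Replace(Text$, Chr$(j%), "")
    Next
End Sub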

Private Sub ScrubUsingLookupTable(Text As String)
    Dim inc() As Byte, txt() As Byte
    Dim i As Long, src As Long, dst As Long

    txt = Text 'two Bytes per char

    'lookup table: 2 = keep the char (advance dst), 0 = scrub it
    ReDim inc(255)
    For i = 0 To UBound(inc)
        If i > 64 And i < 128 Then inc(i) = 2
    Next i

    'single pass; only the low byte is copied and tested
    Do While src < UBound(txt)
        txt(dst) = txt(src)
        src = src + 2
        dst = dst + inc(txt(dst))
    Loop
    'pad the unused tail with spaces, which Trim$ then cuts off
    Do While dst < UBound(txt)
        txt(dst) = 32
        dst = dst + 2
    Loop
    Text = Trim$(txt)
End Sub

Private Sub ScrubUsingSafeArray(Text As String)
    Dim i&, j&, aSrc%(), saSrc As SafeArray1D
    saSrc.cDims = 1
    saSrc.cbElements = 2 'the width of a 16-bit Integer
    saSrc.cElements1d = Len(Text) + 2 'two more, to reflect the LBound
    saSrc.lLBound1d = -2 'include the 4 Len-Info-Bytes of the BSTR
    saSrc.pvData = StrPtr(Text) - 4 'adapt to the real start of the BSTR

    If saSrc.cElements1d = 2 Then Exit Sub 'nothing to replace

    BindArray aSrc, VarPtr(saSrc)

    For i = 0 To UBound(aSrc)
        Select Case aSrc(i)
            Case Is < 65, Is > 127 '<-- define the scrubbed Char-ranges here...
            Case Else: aSrc(j) = aSrc(i): j = j + 1
        End Select
    Next i
    RtlMoveMemory aSrc(-2), CLng(j + j), 4 'adjust to the new Len-Info

    ReleaseArray aSrc
End Sub

Private Function HPTimer#()
    Dim x@: Static Frq@
    If Frq = 0 Then QueryPerformanceFrequency Frq
    If QueryPerformanceCounter(x) Then HPTimer = x / Frq
End Function
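
For comparison, a single pass can also be written with plain string functions, without byte arrays or API calls. The following is only a sketch (the name ScrubSinglePass is hypothetical, and it was not part of the measurements mentioned in this post); it applies the same rule as ScrubUsingSafeArray, keeping only char-codes 65 to 127:

```vb
' Hypothetical single-pass variant using only intrinsic string functions.
' Kept chars are compacted towards the front of the same string.
Private Sub ScrubSinglePass(Text As String)
    Dim i As Long, j As Long, c As Long

    For i = 1 To Len(Text)
        c = AscW(Mid$(Text, i, 1))       ' negative for codes above 32767
        If c >= 65 And c <= 127 Then     ' same keep-range as above
            j = j + 1
            Mid$(Text, j, 1) = ChrW$(c)  ' overwrite in place, no new string
        End If
    Next i

    Text = Left$(Text, j)                ' cut off the unused tail
End Sub
```

This will probably be slower than the two array-based routines because of the per-character Mid$ calls, but it touches each character only once instead of scanning the whole string 193 times (65 + 128 Replace calls).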

Olaf


From: Schmidt on

"Bee" <Bee(a)discussions.microsoft.com> schrieb im Newsbeitrag
news:5F3E878D-D34E-456B-9CB1-18AAA879136C(a)microsoft.com...

> I am loading a notepad "compatible" file from disk (shows extra
> control characters as boxes, etc).
> It may or may not be totally pure printable text.
> I need to clean out all non-printing characters other than
> the Tab, CR and LF.
> I need to make proper paragraphs.
> So I look for non-end-of-sentence with a CRLF near after
> and remove the CRLF and other whitespace and replace
> with a space if necessary.
> So I scan forward then back up through the text and do
> a replace as necessary.
> I also look for other characters that I need to change or delete.
For such "pretty formatting" tasks it is difficult to give
concrete advice - can't you just post your current code?

> I think this is legal.
> Dim aByte() as byte
> aByte=sString ' to byte array
> work on the byte array
> sString = StrConv(aByte, vbUnicode) ' back to string

Nope, as Mike already pointed out, the correct pairing
would either be:
aByte = sString 'two Bytes per char (no ANSI-conversion)
....
sString = aByte 'two Bytes per Char back-conversion

or ANSI-based (ByteArray-StepWidth = 1)
aByte = StrConv(sString, vbFromUnicode)
....
sString = StrConv(aByte, vbUnicode)
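
A small sketch of how the two matched pairings and the mismatched one behave for a plain ASCII string (the Sub name DemoPairings is hypothetical; the counts in the comments assume a single-byte ANSI code page):

```vb
' Hypothetical demo: prints byte and char counts for each pairing.
Private Sub DemoPairings()
    Dim s As String, b() As Byte

    ' Pairing 1: plain assignment both ways (two bytes per char)
    s = "abcd"
    b = s
    Debug.Print UBound(b) + 1      ' 8 bytes
    s = b
    Debug.Print Len(s)             ' 4 chars

    ' Pairing 2: ANSI conversion both ways (one byte per char)
    b = StrConv(s, vbFromUnicode)
    Debug.Print UBound(b) + 1      ' 4 bytes
    s = StrConv(b, vbUnicode)
    Debug.Print Len(s)             ' 4 chars

    ' Mismatch: plain assignment in, ANSI conversion out
    b = s
    s = StrConv(b, vbUnicode)
    Debug.Print Len(s)             ' 8 chars: every zero high byte became a Chr(0)
End Sub
```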

> Currently, with a very fast InString and Replace string
> routine the 1M text file takes over a minute to process.
That's quite a lot for a 1MB input, yes.
As said, please post some code showing what you
currently do - that would be easier than "guessing".
Aside from that, I'd probably split it up into two
scans - the first one doing all single-char cleanups
(replacements with "nothing", using Larry's lookup
approach).
And in the second run, over the already roughly cleaned
up string, I'd try to handle your "pretty formatting" stuff.
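
Such a first scan could look roughly like the sketch below. It is only an illustration under stated assumptions: the name StripNonPrinting is hypothetical, and "printable" is taken here to mean ASCII 32 to 126 plus Tab, CR and LF, so everything else (including all chars above 126) is dropped.

```vb
' Hypothetical first pass: keeps Tab, LF, CR and ASCII 32..126, drops the rest.
Private Function StripNonPrinting(ByVal Text As String) As String
    Dim keep(0 To 255) As Boolean
    Dim b() As Byte
    Dim i As Long, src As Long, dst As Long

    If Len(Text) = 0 Then Exit Function

    ' build the lookup table once
    For i = 32 To 126
        keep(i) = True
    Next i
    keep(9) = True: keep(10) = True: keep(13) = True   ' Tab, LF, CR

    b = Text                         ' two bytes per char, no ANSI conversion
    For src = 0 To UBound(b) - 1 Step 2
        ' a char is kept only if its high byte is 0 and its low byte is listed
        If b(src + 1) = 0 Then
            If keep(b(src)) Then
                b(dst) = b(src)
                b(dst + 1) = 0
                dst = dst + 2
            End If
        End If
    Next src

    If dst = 0 Then Exit Function    ' everything was scrubbed
    ReDim Preserve b(0 To dst - 1)   ' cut off the unused tail
    StripNonPrinting = b             ' matched pairing: plain assignment back
End Function
```

The keep-table is the only place that would need to change if other characters should survive the cleanup.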

Olaf


From: Bee on
I have something working now using strings.
It is now taking about 40 secs for the 1M file.
It used to be many many minutes.

I am working on the ReplaceInByteArray() routine.
I will post that tomorrow as a new start post.
I have everything except for this ReplaceInByteArray().
So I will let you tear it apart. But be gentle.
And thanks to both of you for sticking with me on this.

I plan on this:

(1)Use very fast string replace to do the easy stuff.
I have a very fast Like search and replace now.
(2)Convert to a Byte Array and do the hard looping stuff.
(3)Then convert back.

The code is too large and convoluted for easy study, and I think I have it
down to just this one ReplaceInByteArray routine, because all else works.

And as I said, if all else fails, the sub is working correctly with
strings.
