Can this be "refactored"? A simple 'wrapper function' to display MySQL data sets in tabular form [Lisp]

From: Thomas A. Russ on 8 Mar 2010 13:59

Ariel Badichi <vermilionrush(a)gmail.com> writes:

> floaiza <floaiza2(a)gmail.com> writes:
> >
> > First a simple function to repeat a short string n-times:
> >
> > CL-USER>(defun repeat-string (n str)
> >           (setq retstr "")
> >           (dotimes (var1 n retstr)
> >             (setq retstr (concatenate 'string str retstr))))
> >
>
> Since you use SETQ to modify RETSTR (apparently a variable you have
> not defined), the behavior of this function is undefined by the
> Standard; use LET to introduce such a variable. To repeat an
> operation a number of times, it's not necessary to explicitly
> introduce a variable to keep count; use LOOP REPEAT n DO instead of
> DOTIMES. Whenever CONCATENATE is called, a new string is created, so
> if you repeat a string 3 times, 3 new strings will be created, two of
> which are simply garbage when the function returns. Here's one
> approach to writing this function clearly and efficiently:
>
> (defun repeat-string (n string)
>   (with-output-to-string (out)
>     (loop repeat n do (write-string string out))))
>
> We might also choose to make use of the fact that we know the length
> of the resulting string before actually constructing it:
>
> (defun repeat-string (n string)
>   (let* ((length (length string))
>          (result (make-string (* n length))))
>     (dotimes (i n)
>       (setf (subseq result (* i length)) string))
>     result))

This is a nice tutorial on how to fix up the original function, and a
good general set of techniques to know.

Techniques like these are needed when the thing being repeated is an
actual string, even a short one.
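
As a minimal sketch of the smallest change Ariel describes (an
illustration, not code from the thread): binding the accumulator with
LET keeps the original structure but makes the behavior well defined.
The name REPEAT-STRING-LET is made up here to avoid clashing with the
other definitions.

```lisp
;; Hypothetical name; same algorithm as the original, but RESULT is a
;; local variable introduced by LET instead of an undeclared global.
(defun repeat-string-let (n str)
  (let ((result ""))
    (dotimes (i n result)
      (setq result (concatenate 'string str result)))))

(repeat-string-let 3 "ab")   ; => "ababab"
```

This still allocates a fresh string on every iteration, which is the
inefficiency the two rewrites quoted above avoid.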

For a more specialized case, where there is only a single repeated
character, one can use a very simple idiom based on the MAKE-STRING
function:

(make-string n :initial-element character)
(make-string 10 :initial-element #\- ) => "----------"

For the string case, one can also make use of some fancy FORMAT options.
For the case of a fixed string and count, you can get the repeat by
(mis)using the ~{...~} iteration construct:

(format nil "~10{abc~}" '(t))

For a variable count but a fixed string, it is just as easy:

(format nil "~v{abc~}" 10 '(t))

For both a variable string and count, you would have to construct the
format string to use, but it wouldn't be that hard:

(defun repeat-string (n string)
  (format nil (format nil "~~~D{~A~~}" n string) '(t)))

Note that using FORMAT, although rather compact, will not be nearly as
fast as the second solution Ariel Badichi gave.
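
One caveat with building the control string, sketched here under the
assumption that STRING might contain a tilde: the inner FORMAT splices
STRING into a control string, so a "~" in it would be read as the start
of a directive. Doubling each tilde first avoids that. ESCAPE-TILDES
and REPEAT-STRING-FORMAT are hypothetical names, not part of the code
above.

```lisp
;; Hypothetical helper: double every tilde so that the text survives
;; being embedded in a FORMAT control string.
(defun escape-tildes (string)
  (with-output-to-string (out)
    (loop for ch across string
          do (when (char= ch #\~)
               (write-char #\~ out))
             (write-char ch out))))

;; Same two-step FORMAT idea as above, with the escaping added.
(defun repeat-string-format (n string)
  (format nil (format nil "~~~D{~A~~}" n (escape-tildes string)) '(t)))

(repeat-string-format 3 "abc")   ; => "abcabcabc"
(repeat-string-format 2 "a~b")   ; => "a~ba~b"
```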

--
Thomas A. Russ, USC/Information Sciences Institute

From: Zach Beane on 10 Mar 2010 10:55

floaiza <floaiza2(a)gmail.com> writes:

> ;;;
> ;;; Thomas A. Russ
> ;;;
> (format nil "~10{abc~}" '(t))
> ;;;
> ;;; Thomas A. Russ
> ;;;
> (format nil "~v{abc~}" 10 '(t))
> ;;;
> ;;; Thomas A. Russ
> ;;;
> (defun repeat-string (n string)
>   (format nil (format nil "~~~D{~A~~}" n string) '(t)))

Here's a variation:

(format nil "~v{~A~:*~}" n (list string))

Zach

From: Thomas A. Russ on 10 Mar 2010 13:41

Zach Beane <xach(a)xach.com> writes:

> floaiza <floaiza2(a)gmail.com> writes:
>
> > ;;;
> > ;;; Thomas A. Russ
> > ;;;
> > (format nil "~10{abc~}" '(t))
> > ;;;
> > ;;; Thomas A. Russ
> > ;;;
> > (format nil "~v{abc~}" 10 '(t))
>
> Here's a variation:
>
> (format nil "~v{~A~:*~}" n (list string))

Ah. I was wondering what an elegant way of getting the variable string
into the format directive would be. I didn't think about backing up in
order to keep from exhausting the list argument.
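
To spell out how the backing-up works, here is a small sketch wrapped
in a function (REPEAT-STRING-BACKUP is a hypothetical name). Because
the string is passed as an argument rather than spliced into the
control string, tildes in it need no special handling.

```lisp
;; Hypothetical wrapper around the control string quoted above.
;; ~A consumes the single list element, ~:* backs up over it again,
;; and the V prefix parameter caps the number of iterations at N.
(defun repeat-string-backup (n string)
  (format nil "~v{~A~:*~}" n (list string)))

(repeat-string-backup 3 "ab")    ; => "ababab"
(repeat-string-backup 2 "a~b")   ; => "a~ba~b"
```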

I did consider putting in a circular list, but decided that was a bit
too cumbersome. I suppose with a helper function it wouldn't have been
too bad:

(defun circular-list (arg &rest other-args)
  (let ((l (apply #'list arg other-args)))
    (setf (cdr (last l)) l)
    l))

(format nil "~v{~A~}" n (circular-list string))

Newbie note: If you want to experiment with CIRCULAR-LIST, make sure you
first do

(setf *print-circle* t)

or the printer will loop endlessly when the REPL tries to display the
list.
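
A slightly more cautious way to experiment, as a sketch: bind
*PRINT-CIRCLE* with LET around just the forms that print the list, so
the global setting is left alone. The definition is repeated only to
keep the example self-contained.

```lisp
(defun circular-list (arg &rest other-args)
  (let ((l (apply #'list arg other-args)))
    (setf (cdr (last l)) l)
    l))

;; Bind *PRINT-CIRCLE* only for the duration of the experiment, so the
;; printer uses #n= / #n# labels instead of trying to print forever.
(let ((*print-circle* t))
  (prin1-to-string (circular-list 1 2)))

;; FORMAT never prints the list itself here; it only takes elements
;; from it, and the V parameter stops it after three iterations.
(format nil "~v{~A~}" 3 (circular-list "ab"))   ; => "ababab"
```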

From: Ariel Badichi on 10 Mar 2010 14:04

floaiza <floaiza2(a)gmail.com> writes:

> Five possible ways have been suggested to address the question of how
> to generate a string that is N times some component string, e.g., "-".
>
> ;;;
> ;;; Ariel Badichi
> ;;;
> (defun repeat-string (n string)
>   (with-output-to-string (out)
>     (loop repeat n do (write-string string out))))
> ;;;
> ;;; Ariel Badichi
> ;;;
> (defun repeat-string (n string)
>   (let* ((length (length string))
>          (result (make-string (* n length))))
>     (dotimes (i n)
>       (setf (subseq result (* i length)) string))
>     result))
> ;;;
> ;;; Thomas A. Russ
> ;;;
> (make-string n :initial-element character)
> ;;;
> ;;; Thomas A. Russ
> ;;;
> (format nil "~10{abc~}" '(t))
> ;;;
> ;;; Thomas A. Russ
> ;;;
> (format nil "~v{abc~}" 10 '(t))
> ;;;
> ;;; Thomas A. Russ
> ;;;
> (defun repeat-string (n string)
>   (format nil (format nil "~~~D{~A~~}" n string) '(t)))
> ;;;
> ;;;
> ;;;
>
> QUESTION: Is there one or more criteria to decide which one to use?
>

Certainly, there are many criteria useful in judging code, and their
order of significance varies depending on the case. E.g.:

1. Correctness - does the code satisfy the functional requirements?

The third solution is not correct if you are talking about
replicating _strings_.

2. Clarity - is the code easy to follow and understand?

I find FORMAT distasteful. It has a cryptic language for
formatting descriptions that end up being stuffed into an opaque
control string, thereby making it hard to work with their
structure. The semantics are sometimes plain weird, e.g., the
nesting behavior of ~(. With some effort, a Lisp programmer can
come up with a nice extensible macro for formatting and never
look back. [0] [1] Of course, that is no excuse for not being
familiar with FORMAT, since it is part of the language, and as
long as people don't go overboard with it (in real code - not
"show-off" posts :) it's a minor annoyance. This is certainly
not to slight Thomas A. Russ's contribution: it's interesting to
see different approaches to solving a problem.

3. Efficiency - does the code satisfy requirements of performance?

You did not specify any such requirements, but it's a good idea
to avoid wanton waste of resources. Considerations about
efficiency may include the time it takes to run, the time it
takes to develop, the storage required by the process, the amount
of power consumed by the machine, etc.

The first solution I provided was correct, easy to understand, and
moderately efficient. It is the one that I would actually use by
default.
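
To make the correctness point in item 1 concrete: MAKE-STRING's
:INITIAL-ELEMENT must be a character, so that idiom covers only the
one-character case. Below is a sketch of one way to keep both
behaviors behind a single entry point; REPEAT-ANY is a hypothetical
name and this is only one possible design.

```lisp
;; Hypothetical dispatcher: use MAKE-STRING for a character and the
;; stream-based approach for a string. ETYPECASE signals an error for
;; any other kind of argument instead of silently misbehaving.
(defun repeat-any (n thing)
  (etypecase thing
    (character (make-string n :initial-element thing))
    (string (with-output-to-string (out)
              (loop repeat n do (write-string thing out))))))

(repeat-any 3 #\-)    ; => "---"
(repeat-any 3 "ab")   ; => "ababab"
```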

> Speed of execution is mentioned as something to keep in mind; is this
> the only criterion?
>
> Should one time each solution and then go with the fastest one?
>

No, that is the wrong way to go about such decisions. You should have
a set of desiderata, and an idea about their relative order of
significance. In the Lisp world, correctness usually is an absolute
requirement and comes first. Clarity is also regarded highly. To get
an idea of desiderata commonly valued in the Lisp world, see such
papers as "Lisp: Good News, Bad News, How to Win Big" [2] (especially
section 2.1, "The Rise of /Worse is Better/") and the "Tutorial on
Good Lisp Programming Style" [3].

In cases where efficiency matters, you should have a concrete
performance requirement for a chunk of computation. It is then
possible to judge in a particular case (code running in a certain
environment) whether the actual requirement is satisfied or not. If
the requirement is not satisfied, then the code, the environment, the
requirement itself, or some combination of them needs to be adjusted.

You don't just try a few pieces of code and choose the fastest one.
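
As a rough sketch of what checking a concrete requirement might look
like (WITHIN-BUDGET-P and the half-second budget are invented for
illustration; a real requirement would come from the application and
would be measured in its actual environment):

```lisp
(defun repeat-string (n string)
  (with-output-to-string (out)
    (loop repeat n do (write-string string out))))

;; Hypothetical helper: true if calling THUNK takes at most
;; BUDGET-SECONDS of wall-clock time.
(defun within-budget-p (thunk budget-seconds)
  (let ((start (get-internal-real-time)))
    (funcall thunk)
    (<= (/ (- (get-internal-real-time) start)
           internal-time-units-per-second)
        budget-seconds)))

;; The 0.5-second budget is an invented requirement for illustration.
(within-budget-p (lambda () (repeat-string 100000 "-")) 0.5)
```

The question asked is "does this meet the budget?", not "which
candidate is fastest?".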

> QUESTION: How does an expert CL programmer go about deciding how to
> put together the code for an application?
>
> As suggested in some of the responses one should avoid writing long
> functions, and choose instead a style where each 'result' is
> controlled by a small function with little or no 'side effects' (the
> functional programming paradigm).
>

Functional programming is only one of various styles Lisp programmers
may choose to think in. Unfortunately, there isn't an easy answer to
your question. Hopefully, a programmer uses his common sense,
intuition, ingenuity, and background knowledge when designing a
solution for a problem.

> However, many string manipulations can be accomplished by using the
> format macro. I have read that purist Lispers don't like to use that
> macro (ditto for the Loop one).
>
> In the end, does it matter?
>

FORMAT is not a macro; it is a function. (A FORMAT compiler macro may
also be present, of course.) I already wrote about FORMAT. Some
people do like it. I know how to read and write its language. I use
it when I don't want to bring in a dependency on some other formatting
facility.

I think LOOP, despite its flaws, is a good and usable operator. I
have no qualms about using it.

> =========
>
> Ariel Badichi recommends not to couple the application that does the
> display of the data returned by an SQL SELECT query with the database
> access proper.
>
> Is security the main reason for such a recommendation? Or is there
> something else that supports it?
>

No, the main reason is separation of concerns. What does table
formatting have to do with database access?

> =========
>
> I recognize that some of questions above are "open ended", but given
> the plethora of choices, I am trying to figure out what I should look
> for generally when making a decision regarding which solution to use.
>
> Thanks,
>
> Francisco
>

Ariel

[0] "Eliminating FORMAT from Lisp"
http://cs-www.cs.yale.edu/homes/dvm/format-stinks.html

[1] CONSTANTIA:OUT, a convenient way to print stuff (on top of FORMAT)
http://github.com/death/constantia/blob/master/out.lisp

[2] "Lisp: Good News, Bad News, How to Win Big"
http://www.dreamsongs.com/WIB.html

[3] "Tutorial on Good Lisp Programming Style"
http://norvig.com/luv-slides.ps

From: Thomas A. Russ on 10 Mar 2010 14:02

floaiza <floaiza2(a)gmail.com> writes:

> Five possible ways have been suggested to address the question of how
> to generate a string that is N times some component string, e.g., "-".
[snip]

> QUESTION: Is there one or more criteria to decide which one to use?

Well, programming is almost always an engineering enterprise, so that
usually means that there are competing criteria and one has to weigh the
relative importance of different factors in choosing how to proceed for
a particular application.

> Speed of execution is mentioned as something to keep in mind; is this
> the only criterion?
>
> Should one time each solution and then go with the fastest one?
>
> QUESTION: How does an expert CL programmer go about deciding how to
> put together the code for an application?

There are several criteria:

* Execution Speed -- How quick is it?
* Clarity of intention -- How easy is it to see what is being done?
* Elegance -- Is the result aesthetically pleasing?
* Flexibility -- How many related tasks can be performed?
* Extensibility -- How easy would it be to add new, related tasks?
* Ease of maintenance -- Can the code evolve easily?
* Cleverness -- Does the code use a particularly clever algorithm?

I'm probably leaving some other factors out as well, and you will see
that some of these work together (clarity of intention, ease of
maintenance) and others may be at cross purposes (flexibility, clarity
of intention, execution speed).

So you need to figure out which factors are the most important for what
you want to do. A function that you call once at startup does not need
to have quite the level of optimization applied as one that is called
millions of times during the run of a program. So for one-off
functions, clarity is (for me) much more important than speed. For
critical functions that are called often, speed is more important than
clarity, so more clever or complicated algorithms become attractive.
There is a greater documentation need for clever algorithms so that you
can still maintain and reuse the code.

> As suggested in some of the responses one should avoid writing long
> functions, and choose instead a style where each 'result' is
> controlled by a small function with little or no 'side effects' (the
> functional programming paradigm).

Yes. This speaks to both reusability and clarity. It also simplifies
testing since you can work on small parts of the problem at once. One
rule of thumb is that no function should take up more than one screenful
of space in your editor. Often it should be smaller than that.

Avoiding side effects makes understanding the program's operation a lot
easier, since you don't end up having "hidden" things happen when you
call functions. There are obviously times when you need this, but it is
a lot clearer when you try to pass parameters and use return values.
Lisp's support of multiple-value returns is great for that. I wonder
when that feature will make it into Java?
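
A small sketch of what that looks like in practice (the function name
is hypothetical): the second result comes back as an extra return
value instead of being stashed in a global variable.

```lisp
;; Hypothetical example: return the repeated string and its length as
;; two values. Callers that only want the string can ignore the second.
(defun repeat-string-with-length (n string)
  (let ((result (with-output-to-string (out)
                  (loop repeat n do (write-string string out)))))
    (values result (length result))))

(multiple-value-bind (line width)
    (repeat-string-with-length 4 "+-")
  (list line width))   ; => ("+-+-+-+-" 8)
```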

> However, many string manipulations can be accomplished by using the
> format macro. I have read that purist Lispers don't like to use that
> macro (ditto for the Loop one).

Well, I'm a pragmatist. I like elegance, partly for aesthetics and
partly for clarity. I also find that the specialized sub-languages of
FORMAT and LOOP have their places. LOOP can make it relatively easy to
create complicated iterative structures in a compact way. Format
strings, on the other hand, can be highly incomprehensible, especially
when they start getting complicated. But for its specialized task,
FORMAT does the job in a reasonable way using what is in effect a
domain-specific language.

The main objection to using format for string manipulations often has to
do with the overhead of using it. Often there are other, simpler,
clearer and faster (more efficient) ways of accomplishing the same
goal.

So, although I am a fan of format and clever uses of it, I would
certainly prefer STRING-UPCASE to format's "~:@(~A~)" if upcasing a
string were all that I was doing. On the other hand, I do often prefer
using format to join strings rather than (CONCATENATE 'STRING ...).
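
For comparison, here are the forms in question side by side. The
sample strings are arbitrary.

```lisp
;; Both forms upcase the string; STRING-UPCASE says so directly.
(string-upcase "abc")              ; => "ABC"
(format nil "~:@(~A~)" "abc")      ; => "ABC"

;; Joining: FORMAT accepts non-string arguments and can add separators.
(concatenate 'string "id-" "42")           ; => "id-42"
(format nil "~A-~D" "id" 42)               ; => "id-42"
(format nil "~{~A~^, ~}" '("a" "b" "c"))   ; => "a, b, c"
```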

So what it boils down to is that you need to develop a style that you
are comfortable with and which doesn't conflict with the general
community. The reason for community conformance is that programs are
often used to communicate with other programmers, and conventions
regarding style, indentation, etc. make that easier.

--
Thomas A. Russ, USC/Information Sciences Institute
