From: Ashley Sheridan on
On Sun, 2010-03-14 at 12:25 +0100, Rene Veerman wrote:

> On Sun, Mar 14, 2010 at 12:24 PM, Rene Veerman <rene7705(a)gmail.com> wrote:
> >
> > I'd love to have a copy of whatever function you use to filter out bad
> > HTML/js/flash for use cases where users are allowed to enter html.
> > I'm aware of strip_tags() "allowed tags" param, but haven't got a good list
> > for it.
> >
>
> oh, and even <img> tags can be used for cookie-stuffing on many browsers..
>


Yes, and you call strip_tags() before the data goes to the browser for
display, not before it gets inserted into the database. Essentially, you
need to keep as much original information as possible.

Thanks,
Ash
http://www.ashleysheridan.co.uk


From: Jochem Maas on
On 3/14/10 11:45 AM, Ashley Sheridan wrote:
> On Sun, 2010-03-14 at 12:25 +0100, Rene Veerman wrote:
>
>> On Sun, Mar 14, 2010 at 12:24 PM, Rene Veerman <rene7705(a)gmail.com> wrote:
>>>
>>> I'd love to have a copy of whatever function you use to filter out bad
>>> HTML/js/flash for use cases where users are allowed to enter html.
>>> I'm aware of strip_tags() "allowed tags" param, but haven't got a good list
>>> for it.
>>>
>>
>> oh, and even <img> tags can be used for cookie-stuffing on many browsers..
>>
>
>
> Yes, and you call strip_tags() before the data goes to the browser for
> display, not before it gets inserted into the database. Essentially, you
> need to keep as much original information as possible.

I disagree with both of you. I'm like that :)

let's assume we're talking about data that is not allowed to contain HTML.
in such cases I would do a strip_tags() on the incoming data, then compare
the output of strip_tags() to the original input ... if they don't match then
I would log the problem and refuse to store the data at all.

using strip_tags() on a piece of data every time you output it, when you know
that it shouldn't contain any tags in the first place, is a waste of resources ...
this does assume that you can trust the data source ... which in the case of a
database that you control should be the case.

at any rate, strip_tags() doesn't belong in an 'anti-sql-injection' routine as
it has nothing to do with sql injection at all.
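
A minimal sketch of the check-and-refuse approach described above, assuming
PHP 7.1 or later. The function names isTagFree() and acceptPlainText() are
made up for illustration; only strip_tags() and error_log() are real PHP
functions. Be aware that strip_tags() may also alter harmless text that
contains a bare "<", so a check like this can reject input that is not
actually HTML.

```php
<?php
// Hypothetical helper: true only if strip_tags() would leave $input unchanged.
function isTagFree(string $input): bool
{
    return strip_tags($input) === $input;
}

// Hypothetical handler: log and refuse the value instead of silently cleaning it.
function acceptPlainText(string $input): ?string
{
    if (!isTagFree($input)) {
        error_log('Rejected input containing markup');
        return null; // the caller should not store this value
    }
    return $input;
}
```

A caller would store the value only when acceptPlainText() returns a string,
and show an error to the user when it returns null.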


From: Colin Guthrie on
'Twas brillig, and Jochem Maas at 14/03/10 23:56 did gyre and gimble:
> On 3/14/10 11:45 AM, Ashley Sheridan wrote:
>> On Sun, 2010-03-14 at 12:25 +0100, Rene Veerman wrote:
>>
>>> On Sun, Mar 14, 2010 at 12:24 PM, Rene Veerman <rene7705(a)gmail.com> wrote:
>>>>
>>>> I'd love to have a copy of whatever function you use to filter out bad
>>>> HTML/js/flash for use cases where users are allowed to enter html.
>>>> I'm aware of strip_tags() "allowed tags" param, but haven't got a good list
>>>> for it.
>>>>
>>>
>>> oh, and even <img> tags can be used for cookie-stuffing on many browsers..
>>>
>>
>>
>> Yes, and you call strip_tags() before the data goes to the browser for
>> display, not before it gets inserted into the database. Essentially, you
>> need to keep as much original information as possible.
>
> I disagree with both of you. I'm like that :)
>
> let's assume we're talking about data that is not allowed to contain HTML.
> in such cases I would do a strip_tags() on the incoming data, then compare
> the output of strip_tags() to the original input ... if they don't match then
> I would log the problem and refuse to store the data at all.
>
> using strip_tags() on a piece of data every time you output it, when you know
> that it shouldn't contain any tags in the first place, is a waste of resources ...
> this does assume that you can trust the data source ... which in the case of a
> database that you control should be the case.

I used to think like that too, but I've relatively recently changed my
position.

While it's not as extreme an example, I used to keep data in the
database *after* I had processed it with htmlspecialchars() (not quite
the same as strip_tags, but the principle is the same).

The issue I had was that, over time, I found the need to output to
other formats - e.g. spreadsheets, plain text emails, PDFs etc. - in
which case this pre-encoded format is a pain and I have to call
html_entity_decode() to reverse the htmlspecialchars() I did in the
first place. This is a royal pain in the bum and it's really ugly in the
code, having to remember what format the data is in so that it can be
processed appropriately at the right points.

Nowadays I work rather differently and always escape at the point of
output (this does not exclude filtering at the point of input too, but I
do not keep things encoded any longer - I keep it raw).

Any halfway decently designed caching layer will prevent any major
performance impact from escaping at the point of output anyway.

Now you could argue that encoding at the save point and reversing the
encoding when needed is still a better approach and I won't argue too
heavily, but for the sake of my sanity I'm much happier working the way
I do now. The view layers are very clearly escaping everything that
needs escaping and no logic for the "is it or is it not already escaped"
leaks into this layer.

(I appreciate strip_tags() and htmlspecialchars() are not the same and my
general usage may not apply to a pure strip_tags() usage).
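
As a rough illustration of keeping the data raw and escaping only at the
point of output, here is a small sketch. The sample value and the function
names forHtml() and forPlainText() are assumptions for the example, not
anything from a real code base; htmlspecialchars() is the only library call
involved.

```php
<?php
// Hypothetical raw value, stored in the database exactly as entered.
$raw = 'Tom & Jerry <tom@example.com>';

// HTML view: escape at the point of output.
function forHtml(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Plain-text email or spreadsheet cell: HTML escaping would be wrong here,
// so the raw value is used unchanged.
function forPlainText(string $value): string
{
    return $value;
}

echo forHtml($raw), "\n";      // Tom &amp; Jerry &lt;tom@example.com&gt;
echo forPlainText($raw), "\n"; // Tom & Jerry <tom@example.com>
```

Because the stored value is never encoded, no output path has to ask whether
the data has already been escaped.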

> at any rate, strip_tags() doesn't belong in an 'anti-sql-injection' routine as
> it has nothing to do with sql injection at all.

Indeed, it's more about XSS and CSRF than SQL injection.

Col

--

Colin Guthrie
gmane(at)colin.guthr.ie
http://colin.guthr.ie/

Day Job:
Tribalogic Limited [http://www.tribalogic.net/]
Open Source:
Mandriva Linux Contributor [http://www.mandriva.com/]
PulseAudio Hacker [http://www.pulseaudio.org/]
Trac Hacker [http://trac.edgewall.org/]

From: Ashley Sheridan on
On Mon, 2010-03-15 at 12:48 +0000, Colin Guthrie wrote:

> 'Twas brillig, and Jochem Maas at 14/03/10 23:56 did gyre and gimble:
> > On 3/14/10 11:45 AM, Ashley Sheridan wrote:
> >> On Sun, 2010-03-14 at 12:25 +0100, Rene Veerman wrote:
> >>
> >>> On Sun, Mar 14, 2010 at 12:24 PM, Rene Veerman <rene7705(a)gmail.com> wrote:
> >>>>
> >>>> I'd love to have a copy of whatever function you use to filter out bad
> >>>> HTML/js/flash for use cases where users are allowed to enter html.
> >>>> I'm aware of strip_tags() "allowed tags" param, but haven't got a good list
> >>>> for it.
> >>>>
> >>>
> >>> oh, and even <img> tags can be used for cookie-stuffing on many browsers..
> >>>
> >>
> >>
> >> Yes, and you call strip_tags() before the data goes to the browser for
> >> display, not before it gets inserted into the database. Essentially, you
> >> need to keep as much original information as possible.
> >
> > I disagree with both of you. I'm like that :)
> >
> > let's assume we're talking about data that is not allowed to contain HTML.
> > in such cases I would do a strip_tags() on the incoming data, then compare
> > the output of strip_tags() to the original input ... if they don't match then
> > I would log the problem and refuse to store the data at all.
> >
> > using strip_tags() on a piece of data every time you output it, when you know
> > that it shouldn't contain any tags in the first place, is a waste of resources ...
> > this does assume that you can trust the data source ... which in the case of a
> > database that you control should be the case.
>
> I used to think like that too, but I've relatively recently changed my
> position.
>
> While it's not as extreme an example, I used to keep data in the
> database *after* I had processed it with htmlspecialchars() (not quite
> the same as strip_tags, but the principle is the same).
>
> The issue I had was that, over time, I found the need to output to
> other formats - e.g. spreadsheets, plain text emails, PDFs etc. - in
> which case this pre-encoded format is a pain and I have to call
> html_entity_decode() to reverse the htmlspecialchars() I did in the
> first place. This is a royal pain in the bum and it's really ugly in the
> code, having to remember what format the data is in so that it can be
> processed appropriately at the right points.
>
> Nowadays I work rather differently and always escape at the point of
> output (this does not exclude filtering at the point of input too, but I
> do not keep things encoded any longer - I keep it raw).
>
> Any halfway decently designed caching layer will prevent any major
> performance impact from escaping at the point of output anyway.
>
> Now you could argue that encoding at the save point and reversing the
> encoding when needed is still a better approach and I won't argue too
> heavily, but for the sake of my sanity I'm much happier working the way
> I do now. The view layers are very clearly escaping everything that
> needs escaping and no logic for the "is it or is it not already escaped"
> leaks into this layer.
>
> (I appreciate strip_tags() and htmlspecialchars() are not the same and my
> general usage may not apply to a pure strip_tags() usage).
>
> > at any rate, strip_tags() doesn't belong in an 'anti-sql-injection' routine as
> > it has nothing to do with sql injection at all.
>
> Indeed, it's more about XSS and CSRF than SQL injection.
>


You could run the content through strip_tags() and insert both copies
into the database if you're really worried about wasted resources. That
way, you keep a copy of the original data as well as the version you're
most likely going to display in a web page.

It's like the whole argument about modifying textarea content to replace
newlines with <br/> tags. At some point, you might need that content for
another use, and when you do, you'll wish you had the original. Just
because you don't see that use in your immediate future, it doesn't mean
it won't occur.
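
To make the textarea point concrete, here is a small sketch that stores the
text untouched and only converts newlines when rendering HTML. The sample
text and the function name renderTextareaAsHtml() are assumptions for
illustration; nl2br(), htmlspecialchars() and strip_tags() are standard PHP
functions.

```php
<?php
// Hypothetical textarea content, stored exactly as submitted.
$stored = "First line\nSecond <b>line</b>";

// HTML output: escape first, then turn newlines into <br /> tags.
function renderTextareaAsHtml(string $text): string
{
    return nl2br(htmlspecialchars($text, ENT_QUOTES, 'UTF-8'));
}

// Some other use that appears later, e.g. a plain-text email body.
$plain = strip_tags($stored);

echo renderTextareaAsHtml($stored), "\n";
echo $plain, "\n";
```

Had the <br /> tags been written into the database, the plain-text use would
first need to undo them.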

Thanks,
Ash
http://www.ashleysheridan.co.uk


From: Tommy Pham on
On Sat, Mar 13, 2010 at 11:10 AM, tedd <tedd.sperling(a)gmail.com> wrote:
> Hi gang:
>
> I just completed writing a survey that has approximately 180 questions in it
> and I need a fresh look at how to store the results so I can use them later.
>
> The survey requires the responder to identify themselves via an
> authorization script. After which, the responder is permitted to take the
> survey. Everything works as the client wants so there are no problems there.
>
> My question is how to store the results?
>
> I have the answers stored in a session variable, like:
>
> $_SESSION['answer']['e1']
> $_SESSION['answer']['e2']
> $_SESSION['answer']['e2a']
> $_SESSION['answer']['e2ai']
> $_SESSION['answer']['p1']
> $_SESSION['answer']['p1a']
> $_SESSION['answer']['p1ai']
>
> and so on. As I said, there are around 180 questions/answers.
>
> Most of the answers are integers (less than 100), some are text, and some
> will be null.
>
> Each "vote" will have a unique number (i.e., time) assigned to it as well as
> a common survey id.
>
> My first thought was to simply record the "vote" as a single record with the
> answers as a long string (maybe MEDIUMTEXT), such as:
>
> 1, 1268501271, e1, 1, e2, 16, e2a, Four score and ..., e2ai, ,
>
> Then I thought I might make the data xml, such as:
>
> <survey_id>1</survey_id><vote_id>1268501271</vote_id><e1>1</e1><e2>16</e2><e2a>Four
> score and ...</e2a><e2ai></e2ai>
>
> That way I can strip text entries for <> and have absolute control over
> question separation.
>
> Then I thought I could make each question/answer combination have its own
> record while using the vote_id to tie the "vote" together. That way I can
> use MySQL to do the heavy lifting during the analysis. While each "vote"
> creates 180 records, I like this way best.
>
> Then I thought, what would you guys do? So, what would you guys do?
>
> Keep in mind that this survey must be evaluated in terms of answers, such as
> "Of the ones who answered e1 as 1 how did they answer e2?"
>
> If there is something wrong with my preference, please let me know.
>
> Thanks,
>
> tedd
>

Tedd,

Sorry to be jumping in late, I'm trying to migrate from Yahoo Mail to
Gmail since I'm experiencing more problems with Yahoo Mail than I'd
like. Anyway, I'd go with DB storage for the results,
since it will give better and more flexible analysis and reporting
later. Below is how I'd do the DB structure:

tbl_survey_questions:
questionId = int / uid << your call
languageId = int / uid / char << your call if you intend to I18n it ;)
question = varchar << length is your requirement
PK > questionId + languageId

tbl_participants:
userId = int / uid
userName = varchar
PK > userId

tbl_answers:
userId = int / uid
questionId = int / uid
languageId = int / uid
answer = varchar / mediumtext / or another type of text field
PK > userId + questionId + languageId

The reason why I'd structure it like this is:

Let's say you have question 1 with 5 (a-e) multiple choices. You
aggregate your query (GROUP BY) to the DB for question 1 and see how many
responses there are for each of a to e. If your survey is I18n and your DB
reflects it, you can even analyze how/why people of a certain cultural
background would choose each of those answers. (Don't flame me... I know the
environment one grows up in comes into it too :p and that's way beyond the
scope of this list.)

For question 2, which could be a free-text user entry (not a multiple choice
selection), again, you see what their opinions are for that question.
You get the idea of how the rest may go.
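
The kind of analysis described above can be sketched in plain PHP. In
practice the database would do this work with GROUP BY; the arrays below only
mimic rows from tbl_answers (languageId left out for brevity). The sample
data and the function names countByAnswer() and crossTab() are made up for
illustration, and the sketch assumes PHP 7.0 or later.

```php
<?php
// Made-up rows shaped like tbl_answers: one row per user per question.
$answers = [
    ['userId' => 1, 'questionId' => 'e1', 'answer' => '1'],
    ['userId' => 1, 'questionId' => 'e2', 'answer' => '16'],
    ['userId' => 2, 'questionId' => 'e1', 'answer' => '1'],
    ['userId' => 2, 'questionId' => 'e2', 'answer' => '20'],
    ['userId' => 3, 'questionId' => 'e1', 'answer' => '2'],
    ['userId' => 3, 'questionId' => 'e2', 'answer' => '16'],
];

// Roughly: SELECT answer, COUNT(*) FROM tbl_answers
//          WHERE questionId = ? GROUP BY answer
function countByAnswer(array $rows, string $questionId): array
{
    $counts = [];
    foreach ($rows as $row) {
        if ($row['questionId'] === $questionId) {
            $counts[$row['answer']] = ($counts[$row['answer']] ?? 0) + 1;
        }
    }
    return $counts;
}

// "Of the ones who answered $filterQ as $filterA, how did they answer $targetQ?"
function crossTab(array $rows, string $filterQ, string $filterA, string $targetQ): array
{
    // Collect the users who gave the filtering answer.
    $users = [];
    foreach ($rows as $row) {
        if ($row['questionId'] === $filterQ && $row['answer'] === $filterA) {
            $users[$row['userId']] = true;
        }
    }
    // Keep only those users' rows, then count their answers to the target question.
    $matching = array_filter($rows, function ($row) use ($users) {
        return isset($users[$row['userId']]);
    });
    return countByAnswer($matching, $targetQ);
}

print_r(countByAnswer($answers, 'e1'));
print_r(crossTab($answers, 'e1', '1', 'e2'));
```

With one row per answer, both questions reduce to a filter plus a count,
which is the reason for preferring this layout over one long string per vote.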

I used to do lots of reporting with the real tool, Crystal Reports ;)

Regards,
Tommy