From: Mike Rhoads on
Venky,

Interesting point!

I don't mind the behavior you are describing (ignoring leading zeroes in
otherwise-valid date values) at all. However, my issue is different.
In your examples a and c below, the entire contents of the specified
9-column (for a) and 11-column (for c) widths are valid, once the
leading zeroes are ignored (and the system assumes a 2-digit year if
necessary). The ABCDs are irrelevant, because they are past the end of
the number of columns SAS was told to process.

In b, on the other hand, the "invalid" characters are within the
specified field width.
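
To make the width point concrete, here is a small sketch that uses SUBSTR to show which characters fall inside each specified width. The variable name seen is made up for illustration, and the step only slices the strings; it does not call the informats.

```sas
data _null_;
  length seen $11;
  /* a: DATE9. covers the first 9 of 13 characters */
  seen = substr('01JAN0005ABCD',   1,  9); put 'a: ' seen=;
  /* b: the string is exactly 9 characters, so DATE9. covers all of it */
  seen = substr('1JAN5ABCD',       1,  9); put 'b: ' seen=;
  /* c: DATE11. covers the first 11 of 15 characters */
  seen = substr('0001JAN0005ABCD', 1, 11); put 'c: ' seen=;
run;
```

For a and c the trailing ABCD falls outside the slice; for b it is inside.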

The behavior still seems undesirable to me, as well as inconsistent with
the way other informats seem to work. Off the top of my head I can't
think of a situation where the current behavior of ignoring invalid
characters is useful, but maybe someone else can come up with an
example.

Mike Rhoads
Westat
RhoadsM1(a)Westat.com

-----Original Message-----
From: Venky Chakravarthy [mailto:swovcc(a)HOTMAIL.COM]
Sent: Wednesday, January 17, 2007 10:09 AM
To: SAS-L(a)LISTSERV.UGA.EDU; Mike Rhoads
Subject: Re: Safe way to test if a date is valid ?


Hi Mike,

That was my feeling too. However, going back a few years on this list, I
seem to recall another thread that was similar and I think I can apply
that explanation here to make some sense of it.

Note that the documentation is clear on stating that leading zeroes do
not affect numeric values. So when you write

input('1JAN5ABCD',DATE9.)

it is similar to a and c below when viewed in the context of the
YEARCUTOFF option:

data _null_ ;
a = input ('01JAN0005ABCD',DATE9.) ;
b = input ('1JAN5ABCD',DATE9.) ;
c = input ('0001JAN0005ABCD',DATE11.) ;
put a=date9. b=date9. c=date9. ;
run ;

Which yields the following in the log:

a=01JAN2005 b=01JAN2005 c=01JAN2005

I agree that this behavior has a strange feel to it but I think it is
also instrumental in rendering some of the flexibility to SAS in reading
in date values. I guess there is more good than bad coming out of this
behavior so it may be more of a feature than a bug.

Here is my response to a similar question from a previous thread in 2001
and the link to a SAS Note that details this behavior.

http://listserv.uga.edu/cgi-bin/wa?A2=ind0112B&L=sas-l&P=R6447

http://support.sas.com/techsup/unotes/V6/0/0573.html

Regards,

Venky


On Wed, 17 Jan 2007 08:33:10 -0500, Mike Rhoads <RHOADSM1(a)WESTAT.COM>
wrote:

>Scott,
>
>This seems odd to me as well.
>
>INPUT ('123XYZ',6.) returns an error, even though within the specified
>width there is a valid numeric value prior to the invalid characters.
>Off the top of my head, I can't think of any good reason why INPUT
>('1JAN5ABCD',DATE9.) should not return an error. The underlying
>situation appears to be the same. (Trailing blanks should be OK, of
>course.)
>
>Mike Rhoads
>Westat
>RhoadsM1(a)Westat.com
>
>-----Original Message-----
>From: owner-sas-l(a)listserv.uga.edu [mailto:owner-sas-l(a)listserv.uga.edu]
>On Behalf Of Scott Barry
>Sent: Tuesday, January 16, 2007 5:20 PM
>To: SAS-L ListServ Group
>Subject: Re: Safe way to test if a date is valid ?
>
>
>...
>
>Also, I would consider the SAS DATE INFORMAT processing of
>alpha-characters in the year to be a serious defect without question.
>Hopefully SAS Institute can understand why, presuming someone
>reported the behavior?
>
>For many years I've used the DATA step technique with INPUT function
>and checking _ERROR_ to validate a user-specified date string as being
>correct. To see Dan's post is disquieting.
>
>Sincerely,
>
>Scott Barry
>SBBWorks, Inc.
From: Ian Whitlock on
Summary: I see it as a philosophical language problem.
#iw-value=1

Mike,

I seem to remember you sitting beside me in a SUGI BOF some years ago when
Rick Langston explained that the various date informats were, by design,
triggered to end when a non-date symbol occurred.

I agree that

INPUT ('1JAN5ABCD',DATE9.)

is a little hard to swallow, but informats are for reading data where it is
somewhat easier to understand. Since you find a blank acceptable, what do
you say to

INPUT ('1JAN5 BCD',DATE9.)

I find it hard to argue with

data _null_ ;
   input chk $15. @1 dt datetime. ;
   put chk= dt= datetime20. ;
   cards ;
1JAN5,15:35:40
;

chk=1JAN5,15:35:40 dt=01JAN2005:15:35:40

Perhaps one should have an agreed small set of allowable symbols that can
trigger the end of a date, but it seems somewhat like a "mother may I"
approach to the language which SAS has not traditionally adopted.

I think there are arguments for a permissive language and for a very strict
"you conform or else" language, but SAS has already made that decision.
The handling of date, time and date/time informats is consistent with that
decision, albeit not so well documented.

Ian Whitlock
From: Mike Rhoads on
Ian,

What a memory!!! ;-)

I certainly have no reason to doubt your recollection. The "end at a
non-date symbol" paradigm certainly explains this behavior, and Rick
would be the person to know. And, given how long this has been the
case, it is probably unlikely to change now.

Being the stubborn fellow that I am, however, I can still see no good
reason why trailing alpha characters should produce "valid" results with
date informats, when they don't with simple numeric informats. For
instance:

DATA _NULL_;
INPUT @1 D DATE11. @15 N 6. @22 X $CHAR1.;
PUT N=6. D=YYMMDD10.;
CARDS;
15APR2007     123    X
15APR2007XYZ  123XYZ X
RUN;

In other words, SAS could have a "permissive" philosophy where the idea
is to come up with acceptable input whenever there is some remotely
plausible way of doing so. In some respects it does, such as allowing
leading and trailing blanks. But here, with trailing "invalid"
characters other than blanks, the behavior with dates does not seem to
be consistent with that used with non-date values.

I would also reject your '1JAN5 BCD', if SAS is instructed to read 9
characters. There is, of course, a difference between "formatted" and
"list" input -- with the latter, it's certainly acceptable to read fewer
characters than specified if a field delimiter (say a blank or comma,
depending on context) is encountered. I'd argue, however, that the
INPUT function implies "formatted" input, where SAS should read and act
upon the full number of characters specified (or implied) with the
informat, unless the string is shorter than that.

BTW, to make things even a little stranger, Venky found the following in
the documentation:

"SAS can read date and time values that are delimited by the following
characters:

! # $ % & ( ) * + - . / : ; < = > ? [ \ ] ^ _ { | } ~

The blank character can also be used."
(http://support.sas.com/onlinedoc/913/getDoc/en/lrcon.hlp/a002200738.htm)

So, INPUT('15(APR(2007',DATE11.) returns a valid date!
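
A quick way to try this, as a sketch: the ?? modifier just suppresses the log note and the _ERROR_ flag if the value is rejected, and the variable name d is arbitrary.

```sas
data _null_;
  /* '(' is one of the delimiters listed in the documentation quoted above */
  d = input('15(APR(2007', ?? date11.);
  /* d is missing (.) if the informat rejects the string */
  put d= date9.;
run;
```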

Mike Rhoads
Westat
RhoadsM1(a)Westat.com

From: Scott Barry on
A follow-up on my SAS support track regarding how the DATEw. and MMDDYYw. informats (reading a
character string) handle "invalid characters" in the year-portion: SAS Support pointed me to a few
tech notes that appear to substantiate the "working as designed" behavior, though the SAS 9 Language
Dictionary is in conflict (without any warning otherwise), stating that the value "must be in the
form...two-digit or four-digit...year."

When I asked about an alternate INFORMAT I can trust to validate an incoming date string, none was
offered; instead I was told to consider using a character function (e.g., VERIFY) to ensure the
year-portion is intact and as expected. How unfortunate.
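
Along the lines Support suggested, a minimal sketch of such a check might look like the following. It assumes a fixed ddMONyyyy layout of exactly 9 characters, and the names s, ok and dt are made up for illustration; it is one possible approach, not an official SAS recommendation.

```sas
data _null_;
  length s $9;
  do s = '01jan2005', '01jan2xxx', '1JAN5ABCD';
    /* Assumed layout: 2-digit day, 3-letter month, 4-digit year */
    ok = (length(s) = 9)
         and verify(substr(s, 1, 2), '0123456789') = 0   /* day is all digits  */
         and verify(substr(s, 6, 4), '0123456789') = 0;  /* year is all digits */
    /* Only hand the string to DATE9. once the digit positions check out; */
    /* ?? keeps a rejected value from setting _ERROR_ or writing a note.  */
    if ok then dt = input(s, ?? date9.);
    else dt = .;
    put s= ok= dt= date9.;
  end;
run;
```

VERIFY returns 0 only when every character of its first argument appears in the second, so a year such as "2xxx" fails the check before the informat ever sees it. The month name is left for DATE9. itself to validate.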

Also, I was given no justification as to why the DATEw. and MMDDYYw. INFORMATs behave differently
when an invalid year-portion ends in a character (a possible condition with a masked date string?).
This is illustrated below, where mmddyy10. considered the year-string "2xxx" to invalidate the date
translation:

1 data _null_;
2 format dtvalue date9.;
3 dtvalue = input('01jan2xxx',date9.);
4 put _all_;
5 dtvalue = input('01/01/2xxx',mmddyy10.);
6 put _all_;
7 run;

dtvalue=01JAN2002 _ERROR_=0 _N_=1
NOTE: Invalid argument to function INPUT at line 5 column 12.
dtvalue=. _ERROR_=1 _N_=1
dtvalue=. _ERROR_=1 _N_=1
NOTE: Mathematical operations could not be performed at the following places. The results of
the operations have been set to missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 5:12
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds


Oh, well....looks like "user beware".

Sincerely,

Scott Barry
SBBWorks, Inc.
___________________________

SN-V8+-017774
DATEw. informat does not produce INVALID DATA message when alpha character follows year value


--------------------------------------------------------------------------------


The DATEw. informat reads date values structured like "02jan06", which
represents January 2, 2006. If there is an alpha character after the
year value, the informat reads the value without producing an "invalid
data" note in the SAS log. For example:

data _null_;
date=input('16jan5q',date7.);
put date= mmddyy10.;
run;

produces:

date=01/16/2005

The 'q' did not cause a problem when reading the value because the day
value had been satisfied.

The informat has behaved this way since SAS 82.4. The behavior is
"by design", and not considered a bug.

*****************************************************************
MMDDYY INFORMAT discussion - SAS 9 Language Reference Dictionary:


Details

The date values must be in the form mmddyy or mmddyyyy, where

mm
is an integer from 01 through 12 that represents the month.

dd
is an integer from 01 through 31 that represents the day of the month.

yy or yyyy
is a two-digit or four-digit integer that represents the year.

You can separate the month, day, and year fields by blanks or by special characters. However, if you
use delimiters, place them between all fields in the value. Blanks can also be placed before and
after the date.
Note: SAS interprets a two-digit year as belonging to the 100-year span that is defined by the
YEARCUTOFF= system option.
*****************************************************************
From: Ya Huang on
>When I asked about an alternate INFORMAT I can trust to validate an incoming date string, none was
>offered and I should consider using a character function (e.g., VERIFY) to ensure the year-portion
>is intact and as expected. How unfortunate.

Not even for v9?