Tuesday, November 24, 2009

Working with dates in sql server 2005


"Time is that great gift of nature which keeps everything from happening at once."--C.J. Overbeck(?)
All but the most rudimentary database applications store one or more expressions of time. They may record when an event occurred, its duration, the expected occurrence of a future event, or a combination of these. The values themselves are frequently less important than the calculations that depend on them: How much did we sell this month? How many hits did we get today? Is that hotel room available two weeks from now? How long before this asset is fully depreciated?
SQL Server has respectable facilities for storing, calculating, and reporting time and date information. If they do not meet your needs, the server is flexible enough to handle nearly any system you improvise.
This article won't discuss SQL Server's online analytical processing (OLAP) services. Regardless, these services almost always require time values obtained from an online transaction processing (OLTP) database. It will also omit the extensive application of time in administering SQL Server.
Out of the Box
SQL Server has four basic capabilities with regard to time values: Storage and retrieval; calculations and comparisons; formatted display; and conversion to and from other data types.
System data types
SQL Server provides two time data types: datetime and smalldatetime. They are distinguished by their range and precision, and corresponding storage requirements. Both reckon dates as the number of days either from or to January 1, 1900, the base date. Both reckon time as the period elapsed since midnight. Datetime and smalldatetime combine date and time into a single value--there is no distinct time data type in SQL Server. You can store only time values by omitting dates, or vice versa.
Datetime stores the range January 1, 1753, to December 31, 9999 with one three-hundredth of a second accuracy--it rounds to the nearest .000, .003, or .007 seconds. It requires eight bytes, four for the date, and four for the time
Smalldatetime covers the range January 1, 1900, through June 6, 2079 with one-minute resolution. It requires four bytes, two for the date, and two for the time.
CREATE TABLE #T(
id INT IDENTITY,
dt DATETIME NOT NULL DEFAULT GETDATE(),
sdt SMALLDATETIME NOT NULL DEFAULT GETDATE())
GO
 
INSERT #T VALUES(DEFAULT,DEFAULT)
INSERT #T VALUES('6/24/2001 7:05:01.544', '6/24/2001 7:05:01.544')
INSERT #T VALUES('6/24/2001', '6/24/2001')
INSERT #T VALUES('6/24/01', '6/24/01')
INSERT #T VALUES('6-24-2001','6-24-2001')
INSERT #T VALUES('20010624', '20010624') -- this is ISO/international format
INSERT #T VALUES('June 24, 2001', 'June 24, 2001')
INSERT #T VALUES('7:05:01.544', '7:05:01.544')
INSERT #T VALUES('7:05:01.544 PM', '7:05:01.544 PM')
GO
 
SELECT * FROM #T
 
On my 7.0 system, I got the following results:

Example 2:

id  dt                          sdt
--- --------------------------- ---------------------------
1   2001-06-24 07:36:56.910     2001-06-24 07:37:00
2   2001-06-24 07:05:01.543     2001-06-24 07:05:00
3   2001-06-24 00:00:00.000     2001-06-24 00:00:00
4   2001-06-24 00:00:00.000     2001-06-24 00:00:00
5   2001-06-24 00:00:00.000     2001-06-24 00:00:00
6   2001-06-24 00:00:00.000     2001-06-24 00:00:00
7   2001-06-24 00:00:00.000     2001-06-24 00:00:00
8   1900-01-01 07:05:01.543     1900-01-01 07:05:00
9   1900-01-01 19:05:01.543     1900-01-01 19:05:00
Note the difference in precision, and the effect of rounding. Also note the various means of populating the columns. The default value uses the GETDATE() function; the remaining rows use a variety of character strings enclosed in single quotes.
The server will transparently convert properly formatted strings into their correct datetime values. These examples do not cover the full scope of its capabilities. Be warned that formatting of string constants is very particular if the database must support multiple languages. See BOL for internationalization information. Also pay attention to the settings for DATEFORMAT and DATEFIRST.
With regard to two-digit years, the server uses the two digit year cutoff setting specified by sp_configure or the Enterprise Manager. The default is 2049--'6/24/49' is equivalent to '6/24/2049,' and '6/24/50' is equivalent to '6/24/1950.' There is little rationale for using two-digit years, and they should be avoided in the interest of clarity and consistency.
Conversion and Formatting
Implicit Conversion
SQL Server implicitly converts datetime and smalldatetime both among themselves and among the various char data types. I've already shown conversions from char. Things become interesting when mathematical operators are involved.
--This works fine
DECLARE @c CHAR(20)
SET @c = GETDATE()
SELECT 'The time is now ' + @c

-- This doesn't
SELECT 'The time is now ' + GETDATE()
The first example converts a datetime (smalldatetime , actually) to a char local variable. The server can then concatenate the variable with a string constant and announce the time.
The second example fails because the server considers the overloaded plus sign an addition operator, not a concatenation operator, and is trying to add two incompatible types. The way around this is to use the CAST function to convert the datetime to a string.
SELECT 'The time is now ' + CAST(GETDATE() AS CHAR)
CAST
CAST has two uses as regards dates: It explicitly converts converts the date data types to and from most of the other system data types, and it changes the output's length, usually to truncate it.
SELECT 'Today is ' + CAST(GETDATE() AS CHAR(11))
You can also use CAST to parse a text string into a time datatype.
DECLARE @c CHAR(20), @d DATETIME
SET @c = '06242001073656910'
SET @d = CAST(SUBSTRING(@c,5,4) +
         + SUBSTRING(@c,1,4) + ' '
         + SUBSTRING(@c,9,2) + ":"
         + SUBSTRING(@c,11,2) + ":"
         + SUBSTRING(@c,13,2) + "."
         + SUBSTRING(@c,15,3)
         AS DATETIME)
SELECT @d
CAST has minimal formatting capability and isn't supported prior to 7.0. Unless you need ANSI compliance, consider using the CONVERT function instead.
CONVERT
CONVERT provides a number of styles corresponding to prevailing date format of each of a number of countries. Most styles allow either two or four digit years.
--CONVERT (data_type[(length)],expression [, style])
--See BOL for list of styles
DECLARE @d DATETIME
SET @d = '20010624 21:48'

SELECT NULL, CONVERT(VARCHAR,@d)
SELECT 1, CONVERT(VARCHAR,@d,1)
SELECT 101, CONVERT(VARCHAR,@d,101)
SELECT 2, CONVERT(VARCHAR,@d,2)
SELECT 104, CONVERT(VARCHAR,@d,104)
SELECT 108, CONVERT(VARCHAR,@d,108) -- time only
SELECT 112, CONVERT(VARCHAR,@d,112)
SELECT 121, CONVERT(VARCHAR,@d,121)
This gives the following output on my system:
NULL       Jun 24 2001 9:48PM
1          06/24/01
101        06/24/2001
2          01.06.24
104        24.06.2001
108        21:48:00
112        20010624
121        2001-06-24 21:48:00.000
Like CAST, CONVERT can truncate values if desired. Just use the appropriate data type; i.e., CHAR(4), and style. You can also nest CONVERT. The following strips the time off a datetime and converts it back to datetime.
SELECT CONVERT(DATETIME,CONVERT(CHAR(8),GETDATE(),112))
To put it mildly, there are large number of ways to handle conversion and formatting. You do an implicit or explicit conversion, and then use one or more functions that handle that data type. You'll find some combinations run much faster than others. For instance, the above example could be rewritten as follows. The rewrite runs more than twice as quickly on my single user system.
SELECT CONVERT(DATETIME,FLOOR(CONVERT(FLOAT,GETDATE())))
Date Functions
SQL Server has a good collection of specialized functions for returning parts of dates. Most are also useful in calculations. The functions are DATENAME, DATEPART, DAY, MONTH, and YEAR. DATENAME returns a character string; the rest return integers. See the BOL for a full list of supported date parts and their abbreviations.
DECLARE @c DATETIME
SET @c = '20010624 07:36:56.910'
SELECT 'This is Week ' + DATENAME(WEEK, @c) + ' of FY' + DATENAME(YEAR, @c)
SELECT DATEPART(QUARTER, @c) AS QUARTER
SELECT DATEPART(YEAR, @c) AS YEAR,
       DATEPART(MONTH, @c) AS MONTH,
       DATEPART(DAY, @c) AS DAY
SELECT YEAR(@c) AS YEAR, MONTH(@c) AS MONTH, DAY(@c) AS DAY
This is Week 26 of FY2001

QUARTER
-----------
2

YEAR        MONTH       DAY
----------- ----------- -----------
2001        6           24

YEAR        MONTH       DAY
----------- ----------- -----------
2001        6           24
A few additional notes about conversion and formatting. Watch for unintentional loss of precision converting among data types. The effect will probably be subtle.
Datetime and smalldatetime have the highest precedence of all the data types. If you mix them with other data types using an operator, the server will attempt implicit conversion of the lower precedence data types to datetime or smalldatetime as required. If it can't do the conversion, it returns an error message. See the initial discussion of CAST.
With regard to formatting, if you intend to import date values into another program, keep that program's requirements in mind. In many cases, that means no formatting at all--just export the data as datetime or smalldatetime values, by casting if necessary. Otherwise, the program may treat the "dates" as nothing more than text and fail to sort or group them properly. You'll save yourself much grief with Excel pivot tables, among other things.
Calculations and Comparisons
Basic Arithmetic
SQL Server can handle addition and subtraction of time without resorting to functions, if you want. Use whichever is clearer.
-- Add 1.5 days to current time
SELECT GETDATE() AS Start, GETDATE()+1.5 AS 'Estimated Finish'
-- Done all your shopping yet?
SELECT CEILING(CAST('20011225'-GETDATE() AS FLOAT)) AS 'Shopping days left'
Functions
There are two workhorse time functions, DATEADD and DATEDIFF. As their names suggest, they add (or subtract) periods to a date, or return the difference between two dates, respectively.
DATEADD
DATEADD adds the specified interval to a date and returns a datetime or smalldatetime. You can use any of the usual collection of date parts.
DECLARE @C SMALLDATETIME

SET @c = '20010624'

SELECT @C AS 'Start'

SELECT DATEADD(S,172800,@c) AS '+172800 seconds',

       DATEADD(HH,48,@c) AS '+48 hours',

       DATEADD(D,2,@c) AS '+2 days'

SELECT DATEADD(WK,-3,@c)as '-3 weeks'

SELECT DATEADD(M,1,CONVERT(CHAR(8),@c,112))-DAY(@c) as 'End of Month'

SELECT DATEADD(YY,1,'20000229') AS '1 yr. after Feb. 29, 2000'

-- Constants are evaluated at runtime. The following won't work.

SET ARITHABORT OFF -- Set this to ON and run again

SELECT DATEADD(D,1,'20010229') AS '1 day after Feb. 29, 2001'

Start
---------------------------
2001-06-24 00:00:00

+172800 seconds             +48 hours                   +2 days
--------------------------- --------------------------- ---------------------------
2001-06-26 00:00:00         2001-06-26 00:00:00         2001-06-26 00:00:00

-3 weeks
---------------------------
2001-06-03 00:00:00

End of Month
---------------------------
2001-06-30 00:00:00.000

1 yr. after Feb. 29, 2000
---------------------------
2001-02-28 00:00:00.000

1 day. after Feb. 29, 2001
---------------------------
NULL

Arithmetic overflow occurred.
An interesting use for DATEADD is converting Julian dates to Gregorian ones. Strictly speaking, a Julian date is the number of days since noon, Universal Time on January 1, 4713 BCE. Some applications use Julian dates, but with a different base date. My employer's accounting system uses 1 A.D. as the base. December 31, 2000 was 730485. Since the dates we work with are within datetime's range, DATEADD can do the conversion.
DECLARE @j INT
SET @j = 730636
SELECT @j AS 'Julian', DATEADD(D,@j-730485,'12/31/2000') AS 'Gregorian'
Julian      Gregorian
----------- ---------------------------
730636      2001-05-31 00:00:00.000
DATEADD isn't useful for business days. It doesn't know about weekends or holidays. Rather than going through gyrations to avoid Sunday and New Years, many applications build lookup tables in advance and reference them instead.
DATEDIFF
DATEDIFF returns an integer representing the number of date parts between two dates. If the start date falls after the end date, the result is negative. Note that it only counts boundaries. If you ask it how many years elapsed between December 31, 1999, and January 1, 2001, it will reply, "2." To rehash some prior examples:
SELECT DATEDIFF(D,GETDATE(),'20011225') AS 'Shopping days left'

-- Convert from Gregorian to Julian. Must know base date and be in range.

SELECT DATEDIFF(D,'12/31/2000',GETDATE())+730485 as 'Julian'
The big gotcha with DATEDIFF is date boundaries. Even calculating age isn't straightforward.
DECLARE @BIRTH SMALLDATETIME, @AGEAT SMALLDATETIME
SELECT @BIRTH = '12/10/1990', @AGEAT = '6/24/2001'
-- This is the wrong way
SELECT DATEDIFF(YY,@BIRTH,@AGEAT) AS 'Age using straight DATEDIFF'

-- This is one correct way
SELECT DATEDIFF(YY,@BIRTH,@AGEAT) -
    CASE WHEN (MONTH(@BIRTH) = MONTH(@AGEAT) AND DAY(@BIRTH) > DAY(@AGEAT)
    OR MONTH(@BIRTH) > MONTH(@AGEAT))
    THEN 1 ELSE 0 END AS 'Age correcting DATEDIFF'

Age using straight DATEDIFF
---------------------------
11

Age correcting DATEDIFF
-----------------------
10
DATEDIFF increments whenever the year, or whatever date part you specified, changes. This could be useful for calculating how many model years old a vehicle is, but not helpful when you're trying to calculate anniversaries. It can trip up experts, as this discussion shows. (If the link breaks, look for a thread called "Finding anniversary dates - Ken Henderson Book" on Devdex, Google, etc.) Interestingly, not only is Henderson's query wrong, so is the "corrected" one propounded. See if you can understand how the second query works, and why it erroneously omits January rows once the system date reaches December 2.
Comparisons
SELECT statements to retrieve rows based on dates are straightforward. The main concerns are string constants, which were previously discussed, handling combined date and time values, and query optimization. Following is a simple table and some queries against it.
CREATE TABLE work_study(
id INT IDENTITY,
r_type CHAR(1) NOT NULL,
r_time SMALLDATETIME NOT NULL)
GO
INSERT work_study VALUES('A','1/5/2000')
INSERT work_study VALUES('B','1/5/2000 14:10')
INSERT work_study VALUES('B','1/5/2000 14:20')
INSERT work_study VALUES('A','2/8/2000')
INSERT work_study VALUES('B','2/8/2000 6:05')
INSERT work_study VALUES('B','2/8/2000 6:07')
INSERT work_study VALUES('A','4/10/2001')
INSERT work_study VALUES('B','4/10/2001 15:21')
INSERT work_study VALUES('B','4/10/2001 15:27')

SELECT COUNT(*) as '= April 10, 2001' FROM work_study WHERE r_time = '4/10/2001'

SELECT * FROM work_study WHERE r_time = '4/10/2001'

SELECT COUNT(*) as 'All April 10, 2001'
FROM work_study
WHERE r_time BETWEEN '4/10/2001' AND '4/10/2001 23:59:59'

SELECT COUNT(*) AS 'FY2000 Rows'
FROM work_study
WHERE YEAR(r_time) = 2000 and r_type = 'B'

--Use like to extract times
SELECT r_time as 'Like 6:05' FROM work_study WHERE r_time LIKE '%6:05%'

SELECT DATENAME(DW,r_time) as 'Weekday', COUNT(*) as 'Count'
FROM work_study
WHERE r_type = 'B'
GROUP BY DATENAME(DW,r_time)

--Crosstabs: See http://support.microsoft.com/support/kb/articles/Q175/5/74.ASP
SELECT SUM(CASE WHEN YEAR(r_time) = 2000 THEN 1 ELSE 0 END) AS '2000 Rows',
SUM(CASE WHEN YEAR(r_time) = 2001 THEN 1 ELSE 0 END) AS '2001 Rows'
FROM work_study
WHERE r_type = 'B'

= April 10, 2001
----------------
1

id          r_type r_time
----------- ------ ---------------------------
7           A      2001-04-10 00:00:00

All April 10, 2001
------------------
3

FY2000 Rows
-----------
4

Like 6:05
---------------------------
2000-02-08 06:05:00

Weekday                        Count
------------------------------ -----------
Tuesday                        4
Wednesday                      2

2000 Rows   2001 Rows
----------- -----------
4           2
Some things to note:
  1. Unless you never store time values with your dates, you have to make provisions for extracting all rows for the specified date.
  2. String constants must be correctly formatted for your locale or you will either get errors or strange results. Using YYYYMMDD should be foolproof.
  3. You can use LIKE to extract times. See BOL.
  4. You can use all of the usual date functions to do selection, sorting, and grouping.
One potential big issue with date functions is performance. Date columns should be indexed if date will used as a search argument. Unfortunately, the server won't use the index, except perhaps for index scans, once you use a function. Rewriting queries to avoid functions can lead to cumbersome code, but it's worth it if the query runs often. In extreme cases, splitting the date parts into separate, indexed columns, is beneficial.
-- Don't do this often.

SELECT COUNT(*)

FROM employees

WHERE YEAR(hire_date) = 1998




-- Do this instead.

SELECT COUNT(*)

FROM employees

WHERE hire_date BETWEEN '19980101' AND '19981231'




-- Same thing here. Try to avoid this.

SELECT COUNT(*)

FROM employees

WHERE MONTH(hire_date) = 11




-- If it's not too long, this is very effective.

SELECT COUNT(*)

FROM employees

WHERE hire_date BETWEEN '19901101' AND '19901130'

OR hire_date BETWEEN '19911101' AND '19911130'

OR hire_date BETWEEN '19921101' AND '19921130'

OR hire_date BETWEEN '19931101' AND '19931130'

OR hire_date BETWEEN '19941101' AND '19941130'

OR hire_date BETWEEN '19951101' AND '19951130'

OR hire_date BETWEEN '19961101' AND '19961130'

OR hire_date BETWEEN '19971101' AND '19971130'

OR hire_date BETWEEN '19981101' AND '19981130'

OR hire_date BETWEEN '19991101' AND '19991130'

OR hire_date BETWEEN '20001101' AND '20001130'

OR hire_date BETWEEN '20011101' AND '20011130'
I experimented while writing this article and got a consistent six-fold improvement using constants instead of functions. Your mileage may vary. The difference widened a little as the number of rows increased. Don't give up on functions, though. Queries on indexed columns should run quickly, no matter what approach you take. It's when you run the query 1,000 times per day that the milliseconds add up.
Miscellany
Keys
Dates are probably a bad idea for keys. You're betting heavily that you'll never need to insert two or more identical values into a column. Granted, datetime's resolution is 1/300th of a second, but why take the bet?
Constraints
Other than keys, constraints are particularly useful with date columns. Default constraints can record the time when each row is inserted. Check constraints can meet business needs, such as prohibiting time values in a column. That way, a hire date can never be '7/2/2001 10:00.' They can meet legal needs, such as prohibiting transactions on Sunday.
Summary
Almost any non-trivial database requires working with date or time values. SQL Server has a good collection of tools for manipulating them; however, with this power, comes complexity. The developer should strive to understand the server's approach to dates so they may get the results and performance they need.


No comments:

Post a Comment