Tech Tip from JD Paquet, Principal Support Engineer, for Wednesday, May 5, 2016.
Special Thanks to Richard Williamsom for sharing the following tip:
A customer showed me a slow-running query that is doing a join of a table to itself using
WHERE T1.1ST_COLUMN_OF_KEY = T2.1ST_COLUMN_OF_KEY
AND DATE(T1.TIMESTAMP) = DATE(T2.TIMESTAMP)
AND TIME(T1.TIMESTAMP) = TIME(T2.TIMESTAMP)
The table has 28 MILLION rows, and the cardinality of the 1st column is about 1,000. With the DATE and TIME scalar expressions preventing those predicates to be passed to CBS in the RQA, and also preventing a merge join, each T1 row was scanning about 28,000 T2 rows. That’s 28,000,000 x 28,000 T2 rows read! No wonder it ran 111 minutes.
They cannot join on the whole Timestamp column because they are matching a successful security event with an unsuccessful event within the same second. Timestamp requires an exact match down to 1 microsecond.
Other common use cases might be to match on the same day, just using the DATE function.
Well, fortunately, the following syntax does matching much more efficiently:
WHERE SUBSTR(CAST(T1.TSCOL AS CHAR(10) WITHOUT CONVERSION),1,8) =
AND SUBSTR(CAST(T2.TSCOL AS CHAR(10) WITHOUT CONVERSION),1,8)
The CAST allows the SUBSTR to work. The DATE and TIME portion of a TIMESTAMP are in the first 8 bytes. (Use 4 for just the date.)
CAST does not prevent the predicate from being sent to CBS. SUBSTR, when starting at column 1, also does not prevent the predicate from being passed to CBS. A shorter compare is done in CBS.