There is a new item (from the subscriber) in my collection of non-DB SQL tools at github:

https://explainextended.com/2023/12/31/happy-new-year-15/
GPT in 500 lines of SQL
SNOWFLAKE SPECIAL BEHAVIOR: REUSING COLUMN ALIASES

In any database, you can use aliases for both columns and expressions within queries:





SELECT
id AS user_id,
name AS user_name,
age AS user_age,
age * 2 AS user_age_doubled
FROM users;


But what can we do with these aliases?
Database vendors allow different things. For example, in MySQL and PostgreSQL, aliases can only be used in GROUP BY and ORDER BY.
In ClickHouse and Snowflake, however, aliases can be used everywhere. But, as usual, there are nuances 🙂

Let's create syntactically identical VIEWs:





CREATE TABLE abc AS
SELECT
1 AS a,
100 AS b,
1000 AS c
;

CREATE VIEW v13 AS
SELECT
a+1 AS b,
b+1 AS c
FROM abc;

CREATE VIEW v14 AS
SELECT
a+1 AS d,
d+1 AS e
FROM abc;


At first glance, the lineage for these views should be the same
But in v14 both columns will depend on ABC.A, and in v13 only column B.

Why is this so?
Because in Snowflake, when using an alias equivalent to the original column name from the source, the original column takes precedence!

By the way, in ClickHouse, it works differently…

What was meant by saying that aliases work everywhere?
Aliases can be reused in JOIN and WHERE clauses:





CREATE TABLE t12 AS SELECT 1 AS a, 100 AS b;
CREATE TABLE t13 AS SELECT 1 AS c, 100 AS d;

CREATE VIEW v15 AS
SELECT
a + b AS e,
c + d AS f
FROM
t12
JOIN t13 ON e = f;


or even like this:





CREATE VIEW v16 AS
SELECT
a1 + b1 AS e,
c1 + d1 AS f,
ROUND(e, f) AS g,
ROUND(f, e) AS h
FROM
t12 AS t121(a1, b1)
JOIN t13 AS t131(c1, d1) ON e = f
WHERE
g = h;

Since in dwh.dev we display not only the data flows but also the columns used in JOIN and WHERE, you will also see the original column sources in the corresponding data lineage section.

PS: Thumbs Up Here
PPS: https://github.com/dwh-dev/data-lineage-challenge
SNOWFLAKE: SELECT * ILIKE EXCLUDE REPLACE RENAME

Database vendors are competing to see who can come up with the most features for SELECT *.
Snowflake is not lagging behind and supports as many as 4 modifiers for SELECT *:
ILIKE EXCLUDE REPLACE RENAME

We all know that SELECT * is bad. Now it's 4 times worse :)
Ok, 3 times. You can't use them all together (either ILIKE or EXCLUDE).

But the most disgusting modifier is REPLACE. It allows you to replace one column with any expression. Good luck debugging, dudes :)

What happens to lineage when using these modifiers?
A few examples are collected here

Take a look at one of them:


CREATE TABLE t19(
id INT,
c1 BOOLEAN,
c2 BOOLEAN,
c3 BOOLEAN,
c4 BOOLEAN,
c12c BOOLEAN
);
CREATE TABLE t20(
c5 BOOLEAN,
c6 BOOLEAN
);

CREATE VIEW v20 AS
SELECT
*
ILIKE 'c%'
REPLACE (a.c1 OR b.$2 AS c1)
RENAME c1 AS c0
FROM t19 a, t20 b
;


The result of dwh.dev is on the screenshot.

Thumbs up here!
PS: Just a reminder about our data lineage challenge
SQL-WTF TIL (Snowflake edition)

WHAT FIELDS WILL I SEE AS A RESULT OF A SELECT QUERY THAT RETRIEVES MULTIPLE FIELDS WITH IDENTICAL NAMES?

Hold on before answering; it's not all that obvious 🙂

Let me explain what the answer depends on:

The type of client from which the query is made.
- Whether we are using USING or not.
- Whether there is data in the sources or not.
- Whether we are looking directly at the result of the executed query or at the result from the history using RESULT_SCAN.

Cool, right? Now, I'll briefly explain each point 🙂

Client Type
Database client creators love adding spices to the work process of data engineers 🙂 All examples discussed were executed in Snowsight (in SnowSQL, the result will be different). To be fair, it's not always clear which part of the described chaos is attributed to the client and which to the database itself. But we won't delve into that now…

The creators decided that when more than one column with the same name is encountered in the result, it would be a good idea to rename them! A, A1, A2… The idea is great, but how do I now understand if these are actually existing columns or not?

In reality, such columns do not exist. You cannot refer to them in SQL.

USING
USING is syntactic sugar to shorten the ON clause. But it affects not only ON, which might seem logical, but also the list of columns.

CREATE TABLE T1 (A INT, B INT);
SELECT
*
FROM T1
LEFT JOIN T1 as T1_1 USING(A);

It is implied that the result should be identical to:

SELECT
*
FROM T1
LEFT JOIN T1 as T1_1 ON T1.A = T1_1.A

and we should get A, B, A, B (or the renamed version described above).

But no! USING removes one column A. Really, why do you need 2 identical columns that will break everything for you later? So, it should be A, B, B.

Everything is fine here. But what if we continue to use JOIN with USING?

SELECT
*
FROM T1
LEFT JOIN T1 as T1_1 USING(A)
LEFT JOIN T1 as T1_2 USING(A)
...

Do you think that each time another column B will be added? Nope! 🙂 All subsequent JOIN USING with identically named columns will simply be ignored! Moreover, this will happen until a JOIN with ON is encountered!


SELECT
*
FROM T1
LEFT JOIN T1 as T1_1 USING(A)
LEFT JOIN T1 as T1_2 USING(A)
LEFT JOIN T1 as T1_3 ON T1.A = T1_3.A

will return A B B2 A2 B_3

Empty and Non-Empty Tables
Yes, Snowflake (Snowsight) renames columns differently depending on whether there is data in the tables or not!


-- EMPTY TABLES
SELECT * FROM t1 CROSS JOIN t1 a;
-- A B A_2 B_2

-- TABLE T1 IS NOT EMPTY
SELECT * FROM t1 CROSS JOIN t1 a;
-- A B A B


RESULT_SCAN
This function returns the result of a query by its UUID in QUERY_HISTORY:


SELECT * FROM t1 CROSS JOIN t1 a;
-- A B A_2 B_2

SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
-- A B A_1 B_1


But now, A1 and B1 are actual columns!


SELECT A_1 FROM
(
SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
);
-- OK


More examples are here: Github
Thumbs up here: Linkedin
Media is too big
VIEW IN TELEGRAM
We at dwh.dev decided to make some vertical videos (how do you do, fellow kids?).
What do you think?
Like it on YouTube, please.
2024/05/03 08:20:31
Back to Top
HTML Embed Code: