There is a new item (from the subscriber) in my collection of non-DB SQL tools at github:
https://explainextended.com/2023/12/31/happy-new-year-15/
GPT in 500 lines of SQL
https://explainextended.com/2023/12/31/happy-new-year-15/
GPT in 500 lines of SQL
SNOWFLAKE SPECIAL BEHAVIOR: REUSING COLUMN ALIASES
In any database, you can use aliases for both columns and expressions within queries:
But what can we do with these aliases?
Database vendors allow different things. For example, in MySQL and PostgreSQL, aliases can only be used in GROUP BY and ORDER BY.
In ClickHouse and Snowflake, however, aliases can be used everywhere. But, as usual, there are nuances 🙂
Let's create syntactically identical VIEWs:
At first glance, the lineage for these views should be the same
But in v14 both columns will depend on ABC.A, and in v13 only column B.
Why is this so?
Because in Snowflake, when using an alias equivalent to the original column name from the source, the original column takes precedence!
By the way, in ClickHouse, it works differently…
What was meant by saying that aliases work everywhere?
Aliases can be reused in JOIN and WHERE clauses:
or even like this:
Since in dwh.dev we display not only the data flows but also the columns used in JOIN and WHERE, you will also see the original column sources in the corresponding data lineage section.
PS: Thumbs Up Here
PPS: https://github.com/dwh-dev/data-lineage-challenge
In any database, you can use aliases for both columns and expressions within queries:
SELECT
id AS user_id,
name AS user_name,
age AS user_age,
age * 2 AS user_age_doubled
FROM users;
But what can we do with these aliases?
Database vendors allow different things. For example, in MySQL and PostgreSQL, aliases can only be used in GROUP BY and ORDER BY.
In ClickHouse and Snowflake, however, aliases can be used everywhere. But, as usual, there are nuances 🙂
Let's create syntactically identical VIEWs:
CREATE TABLE abc AS
SELECT
1 AS a,
100 AS b,
1000 AS c
;
CREATE VIEW v13 AS
SELECT
a+1 AS b,
b+1 AS c
FROM abc;
CREATE VIEW v14 AS
SELECT
a+1 AS d,
d+1 AS e
FROM abc;
At first glance, the lineage for these views should be the same
But in v14 both columns will depend on ABC.A, and in v13 only column B.
Why is this so?
Because in Snowflake, when using an alias equivalent to the original column name from the source, the original column takes precedence!
By the way, in ClickHouse, it works differently…
What was meant by saying that aliases work everywhere?
Aliases can be reused in JOIN and WHERE clauses:
CREATE TABLE t12 AS SELECT 1 AS a, 100 AS b;
CREATE TABLE t13 AS SELECT 1 AS c, 100 AS d;
CREATE VIEW v15 AS
SELECT
a + b AS e,
c + d AS f
FROM
t12
JOIN t13 ON e = f;
or even like this:
CREATE VIEW v16 AS
SELECT
a1 + b1 AS e,
c1 + d1 AS f,
ROUND(e, f) AS g,
ROUND(f, e) AS h
FROM
t12 AS t121(a1, b1)
JOIN t13 AS t131(c1, d1) ON e = f
WHERE
g = h;
Since in dwh.dev we display not only the data flows but also the columns used in JOIN and WHERE, you will also see the original column sources in the corresponding data lineage section.
PS: Thumbs Up Here
PPS: https://github.com/dwh-dev/data-lineage-challenge
SNOWFLAKE: SELECT * ILIKE EXCLUDE REPLACE RENAME
Database vendors are competing to see who can come up with the most features for SELECT *.
Snowflake is not lagging behind and supports as many as 4 modifiers for SELECT *:
ILIKE EXCLUDE REPLACE RENAME
We all know that SELECT * is bad. Now it's 4 times worse :)
Ok, 3 times. You can't use them all together (either ILIKE or EXCLUDE).
But the most disgusting modifier is REPLACE. It allows you to replace one column with any expression. Good luck debugging, dudes :)
What happens to lineage when using these modifiers?
A few examples are collected here
Take a look at one of them:
The result of dwh.dev is on the screenshot.
Thumbs up here!
PS: Just a reminder about our data lineage challenge
Database vendors are competing to see who can come up with the most features for SELECT *.
Snowflake is not lagging behind and supports as many as 4 modifiers for SELECT *:
ILIKE EXCLUDE REPLACE RENAME
We all know that SELECT * is bad. Now it's 4 times worse :)
Ok, 3 times. You can't use them all together (either ILIKE or EXCLUDE).
But the most disgusting modifier is REPLACE. It allows you to replace one column with any expression. Good luck debugging, dudes :)
What happens to lineage when using these modifiers?
A few examples are collected here
Take a look at one of them:
CREATE TABLE t19(
id INT,
c1 BOOLEAN,
c2 BOOLEAN,
c3 BOOLEAN,
c4 BOOLEAN,
c12c BOOLEAN
);
CREATE TABLE t20(
c5 BOOLEAN,
c6 BOOLEAN
);
CREATE VIEW v20 AS
SELECT
*
ILIKE 'c%'
REPLACE (a.c1 OR b.$2 AS c1)
RENAME c1 AS c0
FROM t19 a, t20 b
;
The result of dwh.dev is on the screenshot.
Thumbs up here!
PS: Just a reminder about our data lineage challenge
There will be no singularity
https://github-contributions.vercel.app
Forgot to post the year in review :)
SQL-WTF TIL (Snowflake edition)
WHAT FIELDS WILL I SEE AS A RESULT OF A SELECT QUERY THAT RETRIEVES MULTIPLE FIELDS WITH IDENTICAL NAMES?
Hold on before answering; it's not all that obvious 🙂
Let me explain what the answer depends on:
The type of client from which the query is made.
- Whether we are using USING or not.
- Whether there is data in the sources or not.
- Whether we are looking directly at the result of the executed query or at the result from the history using RESULT_SCAN.
Cool, right? Now, I'll briefly explain each point 🙂
Client Type
Database client creators love adding spices to the work process of data engineers 🙂 All examples discussed were executed in Snowsight (in SnowSQL, the result will be different). To be fair, it's not always clear which part of the described chaos is attributed to the client and which to the database itself. But we won't delve into that now…
The creators decided that when more than one column with the same name is encountered in the result, it would be a good idea to rename them! A, A1, A2… The idea is great, but how do I now understand if these are actually existing columns or not?
In reality, such columns do not exist. You cannot refer to them in SQL.
USING
USING is syntactic sugar to shorten the ON clause. But it affects not only ON, which might seem logical, but also the list of columns.
It is implied that the result should be identical to:
and we should get A, B, A, B (or the renamed version described above).
But no! USING removes one column A. Really, why do you need 2 identical columns that will break everything for you later? So, it should be A, B, B.
Everything is fine here. But what if we continue to use JOIN with USING?
Do you think that each time another column B will be added? Nope! 🙂 All subsequent JOIN USING with identically named columns will simply be ignored! Moreover, this will happen until a JOIN with ON is encountered!
will return A B B2 A2 B_3
Empty and Non-Empty Tables
Yes, Snowflake (Snowsight) renames columns differently depending on whether there is data in the tables or not!
RESULT_SCAN
This function returns the result of a query by its UUID in QUERY_HISTORY:
But now, A1 and B1 are actual columns!
More examples are here: Github
Thumbs up here: Linkedin
WHAT FIELDS WILL I SEE AS A RESULT OF A SELECT QUERY THAT RETRIEVES MULTIPLE FIELDS WITH IDENTICAL NAMES?
Hold on before answering; it's not all that obvious 🙂
Let me explain what the answer depends on:
The type of client from which the query is made.
- Whether we are using USING or not.
- Whether there is data in the sources or not.
- Whether we are looking directly at the result of the executed query or at the result from the history using RESULT_SCAN.
Cool, right? Now, I'll briefly explain each point 🙂
Client Type
Database client creators love adding spices to the work process of data engineers 🙂 All examples discussed were executed in Snowsight (in SnowSQL, the result will be different). To be fair, it's not always clear which part of the described chaos is attributed to the client and which to the database itself. But we won't delve into that now…
The creators decided that when more than one column with the same name is encountered in the result, it would be a good idea to rename them! A, A1, A2… The idea is great, but how do I now understand if these are actually existing columns or not?
In reality, such columns do not exist. You cannot refer to them in SQL.
USING
USING is syntactic sugar to shorten the ON clause. But it affects not only ON, which might seem logical, but also the list of columns.
CREATE TABLE T1 (A INT, B INT);
SELECT
*
FROM T1
LEFT JOIN T1 as T1_1 USING(A);
It is implied that the result should be identical to:
SELECT
*
FROM T1
LEFT JOIN T1 as T1_1 ON T1.A = T1_1.A
and we should get A, B, A, B (or the renamed version described above).
But no! USING removes one column A. Really, why do you need 2 identical columns that will break everything for you later? So, it should be A, B, B.
Everything is fine here. But what if we continue to use JOIN with USING?
SELECT
*
FROM T1
LEFT JOIN T1 as T1_1 USING(A)
LEFT JOIN T1 as T1_2 USING(A)
...
Do you think that each time another column B will be added? Nope! 🙂 All subsequent JOIN USING with identically named columns will simply be ignored! Moreover, this will happen until a JOIN with ON is encountered!
SELECT
*
FROM T1
LEFT JOIN T1 as T1_1 USING(A)
LEFT JOIN T1 as T1_2 USING(A)
LEFT JOIN T1 as T1_3 ON T1.A = T1_3.A
will return A B B2 A2 B_3
Empty and Non-Empty Tables
Yes, Snowflake (Snowsight) renames columns differently depending on whether there is data in the tables or not!
-- EMPTY TABLES
SELECT * FROM t1 CROSS JOIN t1 a;
-- A B A_2 B_2
-- TABLE T1 IS NOT EMPTY
SELECT * FROM t1 CROSS JOIN t1 a;
-- A B A B
RESULT_SCAN
This function returns the result of a query by its UUID in QUERY_HISTORY:
SELECT * FROM t1 CROSS JOIN t1 a;
-- A B A_2 B_2
SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
-- A B A_1 B_1
But now, A1 and B1 are actual columns!
SELECT A_1 FROM
(
SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
);
-- OK
More examples are here: Github
Thumbs up here: Linkedin
Media is too big
VIEW IN TELEGRAM
We at dwh.dev decided to make some vertical videos (how do you do, fellow kids?).
What do you think?
Like it on YouTube, please.
What do you think?
Like it on YouTube, please.
There will be no singularity
Final stop! Posgresql. WASM. Browser. https://www.crunchydata.com/blog/learn-postgres-at-the-playground
GitHub
GitHub - electric-sql/pglite: Lightweight Postgres packaged as WASM into a TypeScript library for the browser, Node.js, Bun and…
Lightweight Postgres packaged as WASM into a TypeScript library for the browser, Node.js, Bun and Deno - electric-sql/pglite
You are here:
TypeScript + SQL-based OPERATING SYSTEM by Mike Stonebraker
https://www.dbos.dev/blog/announcing-dbos
TypeScript + SQL-based OPERATING SYSTEM by Mike Stonebraker
https://www.dbos.dev/blog/announcing-dbos
www.dbos.dev
Hello DBOS - Announcing DBOS Cloud
Introducing transactional serverless computing for TypeScript, enabled by a revolutionary new operating system that puts a distributed DBMS in the kernel of the OS.
2024 MAD (Machine Learning, AI & Data) Landscape
https://mattturck.com/mad2024/
PDF: https://mattturck.com/landscape/mad2024.pdf
https://mattturck.com/mad2024/
PDF: https://mattturck.com/landscape/mad2024.pdf
Matt Turck
Full Steam Ahead: The 2024 MAD (Machine Learning, AI & Data) Landscape
This is our tenth annual landscape and “state of the union” of the data, analytics, machine learning and AI ecosystem.
In 10+ years covering the space, things have never been as exciting and promising as they are today. All trends and subtrends we described…
In 10+ years covering the space, things have never been as exciting and promising as they are today. All trends and subtrends we described…