Telegram Web Link
Lads, if you use Snowflake or other analytical databases, please like this post!

Maybe you remember, I used to make a service for static analysis of PostgreSQL (holistic.dev) and posted stuff about PG.

For quite some time now, I've been doing an observability platform for Snowflake (dwh.dev) with my team, and I'm fully immersed in this area.

And would love to understand how interested you guys are in this at all.
Is it worth sharing details about creating such software? Or is it necessary to bring back SQL-WTF about PostgreSQL?
In short, tell us what you would like to see.

PS: Yes, by the way, we have Snowflake chat rooms (RU/EN) and a channel with Snowflake news.
There will be no singularity
Days without explanation JOINs via Venn diagrams by data influencers: 0 https://blog.jooq.org/say-no-to-venn-diagrams-when-explaining-joins/
A bit of explanation in my own words:

1) Venn diagrams describe interactions between sets. However, when we perform a JOIN, we don't get a single set but a set of sets (a table). Therefore, Venn diagrams aren't suitable for describing what actually happens after a JOIN.

2) In such illustrations (even if supplemented with tables), usually only one, almost the rarest case is depicted: unique values in both columns and an equality condition for the join. There are no repetitions, no NULLs, and no complex join conditions.

Previously, I created a static analyzer for PostgreSQL - holistic.dev. Besides type inference, I wanted to infer two more important parameters: constraints and the row count class. If we have information about unique indexes and foreign keys, we can infer the uniqueness of values, NULLability, and the row count class.

What are these classes? NONE, ONE, ONE_OR_NONE, MANY, MANY_OR_NONE.
Why are they needed? For instance, automatic asserts in code can be created. These exist in the Node.js driver. Or if you store your SQL in separate files, automatic checks can be made to ensure nothing broke during query or schema updates and to detect Row Explosion.

So, outputting these classes and constraints were the most complex parts of the entire compiler :)
All because of JOIN. Maybe that's why my eye still twitches when I see such explainers...

People might look at this and then receive bills for 50 million...

Conclusion: Venn diagrams are suitable for explaining UNION/INTERSECT/EXCEPT. For JOIN, this picture by Andreas Martinson is more suitable.
Channel name was changed to «There will be no singularity»
Don't be alarmed. It's time to rewrite the name of the channel in English 🙂
New week, new "special behavior" in Snowflake 🙂
In Snowflake, there is a special function IDENTIFIER() that allows referencing objects using strings and session variables:


SELECT
*
FROM
IDENTIFIER($my_table)


Documentation can be found here: https://docs.snowflake.com/en/sql-reference/identifier-literal

Additionally, you can reference an object in the column list:


SELECT
IDENTIFIER($my_table).*
FROM
IDENTIFIER($my_table)


Wait, that's not the "special behavior" yet...

1)

IDENTIFIER($my_table).*

works, but

IDENTIFIER($my_table).column_name

does not

This looks very strange... But until you try this:

2)

SELECT
IDENTIFIER($my_table1).*,
IDENTIFIER($my_table2).*
FROM
IDENTIFIER($my_table1),
IDENTIFIER($my_table2)

... and you'll get twice of all the columns 🙂

Because IDENTIFIER(whats_ever).* gets replaced by just *!

As a result, the above query is converted to


SELECT
*,
*
FROM
IDENTIFIER($my_table1),
IDENTIFIER($my_table2);


PS: send your reactions here 🙂
https://jepsen.io/analyses/mysql-8.0.34

Reason for not using MySQL #1234:
MySQL’s “Repeatable Read” does not satisfy PL-2.99 Repeatable Read
In the first days of the new year, the dwh.dev team has prepared a bit more fun for you :)

Do you use data lineage tools in your work?
How about checking their quality?
Finding out how comprehensive and accurate your data observability is in reality, upon which you rely?

We're launching a new initiative - the Data Lineage Challenge.
This is a set of examples specific to various databases that will help you assess the quality of your data lineage tool:
https://github.com/dwh-dev/data-lineage-challenge

As dwh.dev specializes in Snowflake, the first set of examples will be dedicated to it.

We'll be updating this repository with examples for Snowflake as well as other databases. Subscribe not to miss out!

Feel free to send examples of the "special behavior" of the databases you work with, and we'll add them to this repository.

Additionally, we'd love to see the results of the data lineage tools you're currently using. Hopefully, this will assist their developers in making these tools even better 🙂

PS: Also, dwh.dev provides an API for SQL parsing and compilation. Contact us to get demo access. We collaborate with different data companies (even our competitors).

PPS: support us on linkedin
SQL-WTF SQL-TIL

It would seem that what can happen with escaping characters in strings?
Everything has been known for a long time and works the same everywhere...

But what happens if you escape characters that don't need to be escaped?








CREATE TABLE vals(s VARCHAR(5));
INSERT INTO vals(s) VALUES('%');
INSERT INTO vals(s) VALUES('\%');
INSERT INTO vals(s) VALUES('\\%');

SELECT
s,
s LIKE '%' as "%",
s LIKE '\%' as "\%",
s LIKE '\\%' as "\\%"
FROM vals
ORDER BY s;


Now let's run in different databases. Note the values of S.

Snowflake:

| S | % | \% | \\% |
|----|------|------|-------|
| % | TRUE | TRUE | FALSE |
| % | TRUE | TRUE | FALSE |
| \% | TRUE | TRUE | TRUE |


Sqlite:

| s | % | \% | \\% |
|-----|---|----|-----|
| % | 1 | 0 | 0 |
| \% | 1 | 1 | 0 |
| \\% | 1 | 1 | 1 |


Mysql/MariaDB/ClickHouse:

| s | % | \% | \% |
|----|---|----|----|
| % | 1 | 1 | 1 |
| \% | 1 | 0 | 0 |
| \% | 1 | 0 | 0 |


PostgreSQL:

| s | % | \% | \\% |
|-----|------|-------|-------|
| % | true | true | false |
| \% | true | false | true |
| \\% | true | false | true |


Oracle, BigQuery, Databricks and SQL Server don't support this syntax.

So, well... Good thing at least '%' works the same everywhere.....
There is a new item (from the subscriber) in my collection of non-DB SQL tools at github:

https://explainextended.com/2023/12/31/happy-new-year-15/
GPT in 500 lines of SQL
2024/06/01 14:48:38
Back to Top
HTML Embed Code: