Happy Friday!

Today’s Topic: Azure Synapse – Part 2

Talent: Edgar Cuellar, Lead Consultant, featuring Seth Werner, Sr. Lead Consultant.

S: “Hi everyone! So last week we talked about Azure Synapse Workspaces and some of the new features, and now we’re going to be talking about how that may apply specifically to data warehousing. So, one question I have for you, Edgar, is: I have a client that has a lot of data in a storage account that I need to move pretty quickly into a data warehouse. How can I do that?”

E: “I’m glad you brought that up. Although the Synapse Workspace is now introducing various capabilities, the actual SQL data warehouse within Synapse has also been updated. That particular use case does come up often, and traditionally methods such as PolyBase have been available for it. But now a statement called COPY INTO is being introduced that allows you to grab large amounts of data from a storage account and very quickly insert it into your SQL data warehouse, so it becomes available to you in persistent tables.”
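As a rough sketch of the pattern Edgar describes, a COPY INTO statement run against a Synapse dedicated SQL pool might look like the following. The storage account URL, container path, and target table name are placeholders, and the authentication method will depend on your environment:

```sql
-- Bulk-load CSV files from a storage account into a persistent table.
-- 'dbo.Sales' and the storage URL below are illustrative placeholders.
COPY INTO dbo.Sales
FROM 'https://mystorageaccount.blob.core.windows.net/mycontainer/sales/*.csv'
WITH (
    FILE_TYPE  = 'CSV',
    FIRSTROW   = 2,  -- skip the header row
    CREDENTIAL = (IDENTITY = 'Managed Identity')
);
```

Compared with PolyBase, this avoids having to define external tables and file formats up front, which is part of why it suits the "move it quickly" scenario.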

S: “Ok, now would that apply also with JSON Data because a lot of the data that I have in my storage account is in that particular format. What functions do I have available?”

E: “Right, so traditionally JSON is pretty challenging in a data warehouse environment, because ingesting that data and then being able to query it, and later update it, can get very complex. Fortunately, they’ve introduced out-of-the-box T-SQL JSON capability in SQL data warehouse and Synapse. So an example of that would be: you have JSON in your storage account, you move it over with COPY INTO, and you want to store it in the same shape it was given to you. Then, once you have it in those tables, you want to be able to run queries against certain elements in your JSON, update certain values, and so on. The good news is you no longer need to worry about separate tooling for that. It’s all available for you out of the box.”
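A minimal sketch of that workflow, using the built-in T-SQL JSON functions. The table and JSON paths here are illustrative assumptions, not part of the original conversation:

```sql
-- Store the JSON documents as-is, one document per row.
CREATE TABLE dbo.RawOrders (doc NVARCHAR(MAX));

-- Query individual elements out of the stored JSON.
SELECT
    JSON_VALUE(doc, '$.orderId')       AS OrderId,
    JSON_VALUE(doc, '$.customer.name') AS CustomerName
FROM dbo.RawOrders
WHERE ISJSON(doc) = 1;  -- only well-formed documents

-- Update a value inside the stored JSON in place.
UPDATE dbo.RawOrders
SET doc = JSON_MODIFY(doc, '$.status', 'shipped');
```

`JSON_VALUE`, `ISJSON`, and `JSON_MODIFY` are the standard T-SQL JSON functions, so no external parsing tooling is needed.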

S: “OK, now in this scenario: a lot of that data that we’ve moved with COPY INTO is in JSON format. I have a lot of end users that need to be able to query it for rollups, summaries, and other aggregates, but I also need to maintain high performance. How can I do that?”

E: “A lot of the data warehouse use cases do involve aggregates, rollups, and things like that, and traditionally this was done with standard views to simplify those query patterns. Unfortunately, with standard views what usually ends up happening is that as soon as you query the view, it has to recompute the joins, the rollups, the summaries, etc. And for big data warehouses, with lots of data and lots of joins, that can take a long time. Fortunately, they introduced the concept of a materialized view. What a materialized view allows you to do is essentially store that view as if it were a table, and under the covers, behind the scenes, it updates as the source data changes. For instance, if a new record was just inserted into one of the tables in the many joins, the materialized view is smart enough to handle that for you. You don’t have to worry about your materialized view being updated. You don’t have to worry about missing data. In addition, you don’t have to worry about each time that view is accessed, the query using all that compute to run all those joins for you again. It’s all persisted for you out of the box through the materialized view.”
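A sketch of what such a materialized view might look like in a Synapse dedicated SQL pool. The table, columns, and distribution key are assumed for illustration; note that Synapse requires a distribution option and aggregation (including `COUNT_BIG(*)`) in the definition:

```sql
-- Precompute a rollup so end users don't re-run the aggregation on every query.
-- 'dbo.Sales', 'Region', and 'Amount' are hypothetical names.
CREATE MATERIALIZED VIEW dbo.SalesByRegion
WITH (DISTRIBUTION = HASH(Region))
AS
SELECT
    Region,
    COUNT_BIG(*)             AS OrderCount,
    SUM(ISNULL(Amount, 0))   AS TotalAmount  -- ISNULL required for SUM on nullable columns
FROM dbo.Sales
GROUP BY Region;
```

Queries against `dbo.SalesByRegion` then read the persisted results, and the engine keeps them in sync as rows are inserted into `dbo.Sales`.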

S: “OK, so between COPY INTO, the JSON functionality, and the options for materialized views for aggregate data, it sounds like there’s a lot of really great functionality, and I assume there’s even more that we can talk about in future sessions.”

E: “Absolutely.”

S: “OK, looking forward to it.”

Next week: Azure Government

Want to check out more of Catapult’s videos? Visit our YouTube channel.