nexusstc/Snowflake: The Definitive Guide: Architecting, Designing, and Deploying on the Snowflake Data Cloud/471e6bb78a5c990adbdf5e5f18a22c86.pdf
SNOWFLAKE - THE DEFINITIVE GUIDE : architecting, designing, and deploying on the snowflake data... cloud
Joyce Kay Avila
O'Reilly Media, Incorporated, 1, 2023
English [en] · PDF · 27.1MB · 2023 · 📘 Book (non-fiction) · 🚀/lgli/lgrs/nexusstc/zlib
Description
Snowflake's ability to eliminate data silos and run workloads from a single platform creates opportunities to democratize data analytics, allowing users at all levels within an organization to make data-driven decisions. Whether you're an IT professional working in data warehousing or data science, a business analyst or technical manager, or an aspiring data professional wanting to get more hands-on experience with the Snowflake platform, this book is for you. You'll learn how Snowflake users can build modern integrated data applications and develop new revenue streams based on data. Using hands-on SQL examples, you'll also discover how the Snowflake Data Cloud helps you accelerate data science by avoiding replatforming or migrating data unnecessarily.
You'll be able to:
- Efficiently capture, store, and process large amounts of data at an amazing speed
- Ingest and transform real-time data feeds in both structured and semistructured formats and deliver meaningful data insights within minutes
- Use Snowflake Time Travel and zero-copy cloning to produce a sensible data recovery strategy that balances system resilience with ongoing storage costs
- Securely share data and reduce or eliminate data integration costs by accessing ready-to-query datasets available in the Snowflake Marketplace
Alternative filename
lgli/OReilly.Snowflake.The.Definitive.Guide.1098103823.pdf
Alternative filename
lgrsnf/OReilly.Snowflake.The.Definitive.Guide.1098103823.pdf
Alternative filename
zlib/Computers/Databases/Joyce Kay Avila/Snowflake: The Definitive Guide: Architecting, Designing, and Deploying on the Snowflake Data Cloud_26428850.pdf
Alternative author
Avila, Joyce
Alternative edition
United States, United States of America
Alternative edition
O'Reilly Media, Sebastopol, CA, 2022
Alternative edition
First edition, Sebastopol, CA, 2022
Alternative edition
Cambridge, 2022
Alternative edition
1, US, 2022
Metadata comments
Publisher's PDF
Metadata comments
{"edition":"1","isbns":["1098103823","9781098103828"],"last_page":465,"publisher":"O'Reilly Media"}
Alternative description
Cover
Copyright
Table of Contents
Preface
Origin of the Book
Who Is This Book For?
Goals of the Book
Navigating this Book
Using Code Examples
Conventions Used in This Book
O’Reilly Online Learning
How to Contact Us
Acknowledgments
Chapter 1. Getting Started
Snowflake Web User Interfaces
Prep Work
Snowsight Orientation
Snowsight Preferences
Navigating Snowsight Worksheets
Context Setting
Improved Productivity
Snowflake Community
Snowflake Certifications
Snowday and Snowflake Summit Events
Important Caveats About Code Examples in the Book
Code Cleanup
Summary
Knowledge Check
Chapter 2. Creating and Managing the Snowflake Architecture
Prep Work
Traditional Data Platform Architectures
Shared-Disk (Scalable) Architecture
Shared-Nothing (Scalable) Architecture
NoSQL Alternatives
The Snowflake Architecture
The Cloud Services Layer
Managing the Cloud Services Layer
Billing for the Cloud Services Layer
The Query Processing (Virtual Warehouse) Compute Layer
Virtual Warehouse Size
Scaling Up a Virtual Warehouse to Process Large Data Volumes and Complex Queries
Scaling Out with Multicluster Virtual Warehouses to Maximize Concurrency
Creating and Using Virtual Warehouses
Separation of Workloads and Workload Management
Billing for the Virtual Warehouse Layer
Centralized (Hybrid Columnar) Database Storage Layer
Introduction to Zero-Copy Cloning
Introduction to Time Travel
Billing for the Storage Layer
Snowflake Caching
Query Result Cache
Metadata Cache
Virtual Warehouse Local Disk Cache
Code Cleanup
Summary
Knowledge Check
Chapter 3. Creating and Managing Snowflake Securable Database Objects
Prep Work
Creating and Managing Snowflake Databases
Creating and Managing Snowflake Schemas
INFORMATION_SCHEMA
ACCOUNT_USAGE Schema
Schema Object Hierarchy
Introduction to Snowflake Tables
Creating and Managing Views
Introduction to Snowflake Stages: File Format Included
Extending SQL with Stored Procedures and UDFs
User-Defined Function (UDF): Task Included
Secure SQL UDTF That Returns Tabular Value (Market Basket Analysis Example)
Stored Procedures
Introduction to Pipes, Streams, and Sequences
Snowflake Streams (Deep Dive)
Snowflake Tasks (Deep Dive)
Code Cleanup
Summary
Knowledge Check
Chapter 4. Exploring Snowflake SQL Commands, Data Types, and Functions
Prep Work
Working with SQL Commands in Snowflake
DDL Commands
DCL Commands
DML Commands
TCL Commands
DQL Command
SQL Query Development, Syntax, and Operators in Snowflake
SQL Development and Management
Query Syntax
Query Operators
Long-Running Queries, and Query Performance and Optimization
Snowflake Query Limits
Introduction to Data Types Supported by Snowflake
Numeric Data Types
String and Binary Data Types
Date and Time Input/Output Data Types
Semi-Structured Data Types
Unstructured Data Types
How Snowflake Supports Unstructured Data Use
Snowflake SQL Functions and Session Variables
Using System-Defined (Built-In) Functions
Creating SQL and JavaScript UDFs and Using Session Variables
External Functions
Code Cleanup
Summary
Knowledge Check
Chapter 5. Leveraging Snowflake Access Controls
Prep Work
Creating Snowflake Objects
Snowflake System-Defined Roles
Creating Custom Roles
Functional-Level Business and IT Roles
System-Level Service Account and Object Access Roles
Role Hierarchy Assignments: Assigning Roles to Other Roles
Granting Privileges to Roles
Assigning Roles to Users
Testing and Validating Our Work
User Management
Role Management
Snowflake Multi-Account Strategy
Managing Users and Groups with SCIM
Code Cleanup
Summary
Knowledge Check
Chapter 6. Data Loading and Unloading
Prep Work
Basics of Data Loading and Unloading
Data Types
File Formats
Data File Compression
Frequency of Data Processing
Snowflake Stage References
Data Sources
Data Loading Tools
Snowflake Worksheet SQL Using INSERT INTO and INSERT ALL Commands
Web UI Load Data Wizard
SnowSQL CLI SQL PUT and COPY INTO Commands
Data Pipelines
Third-Party ETL and ELT Tools
Alternatives to Loading Data
Tools to Unload Data
Data Loading Best Practices for Snowflake Data Engineers
Select the Right Data Loading Tool and Consider the Appropriate Data Type Options
Avoid Row-by-Row Data Processing
Choose the Right Snowflake Virtual Warehouse Size and Split Files as Needed
Transform Data in Steps and Use Transient Tables for Intermediate Results
Code Cleanup
Summary
Knowledge Check
Chapter 7. Implementing Data Governance, Account Security, and Data Protection and Recovery
Prep Work
Snowflake Security
Controlling Account Access
Monitoring Activity with the Snowflake ACCESS_HISTORY Account Usage View
Data Protection and Recovery
Replication and Failover
Democratizing Data with Data Governance Controls
INFORMATION_SCHEMA Data Dictionary
Object Tagging
Classification
Data Masking
Row Access Policies and Row-Level Security
External Tokenization
Secure Views and UDFs
Object Dependencies
Code Cleanup
Summary
Knowledge Check
Chapter 8. Managing Snowflake Account Costs
Prep Work
Snowflake Monthly Bill
Storage Fees
Data Transfer Costs
Compute Credits Consumed
Creating Resource Monitors to Manage Virtual Warehouse Usage and Reduce Costs
Resource Monitor Credit Quota
Resource Monitor Credit Usage
Resource Monitor Notifications and Other Actions
Resource Monitor Rules for Assignments
DDL Commands for Creating and Managing Resource Monitors
Using Object Tagging for Cost Centers
Querying the ACCOUNT_USAGE View
Using BI Partner Dashboards to Monitor Snowflake Usage and Costs
Snowflake Agile Software Delivery
Why Do We Need DevOps?
Continuous Data Integration, Continuous Delivery, and Continuous Deployment
What Is Database Change Management?
How Zero-Copy Cloning Can Be Used to Support Dev/Test Environments
Code Cleanup
Summary
Knowledge Check
Chapter 9. Analyzing and Improving Snowflake Query Performance
Prep Work
Analyzing Query Performance
QUERY_HISTORY Profiling
HASH() Function
Web UI History
Understanding Snowflake Micro-Partitions and Data Clustering
Partitions Explained
Snowflake Micro-Partitions Explained
Snowflake Data Clustering Explained
Clustering Width and Depth
Choosing a Clustering Key
Creating a Clustering Key
Reclustering
Performance Benefits of Materialized Views
Exploring Other Query Optimization Techniques
Search Optimization Service
Query Optimization Techniques Compared
Summary
Code Cleanup
Knowledge Check
Chapter 10. Configuring and Managing Secure Data Sharing
Snowflake Architecture Data Sharing Support
The Power of Snowgrid
Data Sharing Use Cases
Snowflake Support for Unified ID 2.0
Snowflake Secure Data Sharing Approaches
Prep Work
Snowflake’s Direct Secure Data Sharing Approach
Creating Outbound Shares
How Inbound Shares Are Used by Snowflake Data Consumers
How to List and Shop on the Public Snowflake Marketplace
Snowflake Marketplace for Providers
Standard Versus Personalized Data Listings
Harnessing the Power of a Snowflake Private Data Exchange
Snowflake Data Clean Rooms
Important Design, Security, and Performance Considerations
Share Design Considerations
Share Security Considerations
Share Performance Considerations
Difference Between Database Sharing and Database Cloning
Data Shares and Time Travel Considerations
Sharing of Data Shares
Summary
Code Cleanup
Knowledge Check
Chapter 11. Visualizing Data in Snowsight
Prep Work
Data Sampling in Snowsight
Fixed-Size Sampling Based on a Specific Number of Rows
Fraction-Based Sampling Based on Probability
Previewing Fields and Data
Sampling Examples
Using Automatic Statistics and Interactive Results
Snowsight Dashboard Visualization
Creating a Dashboard and Tiles
Working with Chart Visualizations
Aggregating and Bucketing Data
Editing and Deleting Tiles
Collaboration
Sharing Your Query Results
Using a Private Link to Collaborate on Dashboards
Summary
Code Cleanup
Knowledge Check
Chapter 12. Workloads for the Snowflake Data Cloud
Prep Work
Data Engineering
Data Warehousing
Data Vault 2.0 Modeling
Transforming Data within Snowflake
Data Lake
Data Collaboration
Data Monetization
Regulatory and Compliance Requirements for Data Sharing
Data Analytics
Advanced Analytics for the Finance Industry
Advanced Analytics for the Healthcare Industry
Advanced Analytics for the Manufacturing Industry and Logistics Services
Marketing Analytics for Retail Verticals and the Communications and Media Industry
Data Applications
Data Science
Snowpark
Streamlit
Cybersecurity Using Snowflake as a Security Data Lake
Overcoming the Challenges of a SIEM-Only Architecture
Search Optimization Service Versus Clustering
Unistore
Transactional Workload Versus Analytical Workload
Hybrid Tables
Summary
Code Cleanup
Knowledge Check
Appendix A. Answers to the Knowledge Check Questions
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Chapter 9
Chapter 10
Chapter 11
Chapter 12
Appendix B. Snowflake Object Naming Best Practices
General (Character Related)
General (Not Character Related)
Standard Label Abbreviations
Appendix C. Setting Up a Snowflake Trial Account
Index
About the Author
Colophon
Date open sourced
2023-10-07
We strongly recommend that you support the author by buying or donating on their personal website, or borrowing in your local library.