Security in Snowflake
Introduction
Before we plunge headfirst into the details, let’s ensure we’re on the same page. Snowflake is not your average data warehouse; it’s a game-changer in the field of data warehousing and analytics, a leading cloud-based data platform that is changing how we handle data.
In this blog, we will learn about all the intriguing security features that Snowflake has to offer us.
Security Considerations
Before diving into the security considerations, we need to discuss the guiding principles that serve as the foundation of every organization’s security architecture. In practice, they act as the goals and objectives of any security program.
CIA stands for:
- Confidentiality: Confidentiality is about keeping sensitive data private, ensuring that only the people and systems authorized to see it can actually access it.
- Integrity: The idea here is to make sure that data is trustworthy and free from tampering at any stage of the data lifecycle.
- Availability: Systems and data should be readily available to their users. To ensure availability, the network administrator should maintain hardware, make regular upgrades, have a failover plan, and prevent bottlenecks in the network.
Keeping this gold standard in mind, we will now discuss security considerations at every stage of the data lifecycle.
Let’s walk through an example.
There is a company named “Alfa”. It produces large amounts of data, which is consumed on a regular basis by its employees to generate reports, perform reconciliation, handle analytical workloads, and support auditing.
Once the data has been generated, it immediately gets stored in a storage system or on a network drive.
On stored data, or while storing data, a user or program generally performs two types of actions:
- Transform or query the data.
- Load or unload data.
DATA USAGE
Since our company is big, different departments have different needs: HR is eligible to view employees’ sensitive information, and Finance is eligible to view the balance sheet, but an ordinary employee is not eligible to view either.
Therefore, now is the right time to set up an access control framework that helps us define data usage policies. This can be achieved with different authorization techniques such as role-based access control, rule-based access control, and risk-based access control (a minimal sketch follows below).
In this way, we can avoid data leakage and attacks from malicious insiders.
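To make this concrete, here is a minimal, hedged sketch of role-based access control in Snowflake SQL; the role, database, schema, table, and user names (hr_analyst, hr_db, core, employees, hr_user_01) are hypothetical examples rather than anything from the scenario above.

```sql
-- Hypothetical RBAC sketch: restrict HR data to an HR-specific role.
CREATE ROLE IF NOT EXISTS hr_analyst;

-- Grant the role read-only access to the HR data.
GRANT USAGE ON DATABASE hr_db TO ROLE hr_analyst;
GRANT USAGE ON SCHEMA hr_db.core TO ROLE hr_analyst;
GRANT SELECT ON TABLE hr_db.core.employees TO ROLE hr_analyst;

-- Assign the role to an HR user; other employees never receive it.
GRANT ROLE hr_analyst TO USER hr_user_01;
```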
DATA SHARING
The data we have generated will be shared and consumed via different mediums such as APIs, dashboards, drivers, or client screens used by other groups of users. Therefore, we need a secure way of granting access to our systems, which can be achieved through authentication methods like MFA, OAuth, etc.
In this way, we can avoid hijacking or misuse of our accounts.
Security Architecture
With all of these considerations in mind, we can now look at a well-designed security architecture that covers the security features provided by Snowflake.
Snowflake secures customer data using defence in depth, with three security layers:
- Network Security
- IAM
- Data Encryption
Network Security
The first line of defence against malicious individuals trying to access Snowflake customer accounts is network security.
Snowflake provides two forms of network security safeguards to protect against malicious users: network policies and private connectivity.
- Network Policies
Network access to the Snowflake data warehouse is managed and limited using Snowflake network policies, which designate allowed (and blocked) IP addresses or CIDR ranges. Only users with the SECURITYADMIN role or higher, as well as roles with the global CREATE NETWORK POLICY privilege, are able to create network policies, and ownership of a network policy can be transferred to another role. Network policies can be managed via the Snowflake web interface or SQL and applied at the account level or for individual users. To determine whether a network policy is set on your account or for a specific user, execute the SHOW PARAMETERS command, as in the sketch below.
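The following is a minimal sketch of that workflow; the policy name corp_policy, the IP ranges, and the user jsmith are placeholders, not values from the post.

```sql
-- Create a network policy that allows a CIDR range and blocks one address.
CREATE NETWORK POLICY corp_policy
  ALLOWED_IP_LIST = ('192.168.1.0/24')
  BLOCKED_IP_LIST = ('192.168.1.99');

-- Apply the policy at the account level (requires SECURITYADMIN or higher).
ALTER ACCOUNT SET NETWORK_POLICY = corp_policy;

-- Check whether a network policy is set on the account or on a specific user.
SHOW PARAMETERS LIKE 'network_policy' IN ACCOUNT;
SHOW PARAMETERS LIKE 'network_policy' IN USER jsmith;
```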
- Private Connectivity
A Business Critical Edition feature.
Private connectivity lets you connect to Snowflake over a private IP address, using the private connectivity services offered by cloud providers such as AWS PrivateLink or Azure Private Link.
With this feature, the Snowflake account shows up as a resource inside our own network. Here are a few guidelines for using it effectively.
- Setting up DNS to resolve Snowflake’s private URL is our responsibility. The best strategy is to use private DNS in our cloud provider network, since it enables clients running both on-premises and in the cloud provider network to resolve the Snowflake account. A DNS forwarding rule for the Snowflake account can then be created in our on-premises DNS.
- If we want to restrict access to the public endpoint after configuring private connectivity, we can create an account-level network policy that only allows connections from the private IP range of our network (a sketch follows this list).
- If client applications running outside our network still need to connect, they will reach the account via the public endpoint. Depending on the use case, we can add the client application’s IP range to the allowed list of an account-level, user-level, or OAuth-integration network policy in order to grant access.
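Here is a hedged sketch of that combination; the policy names, the private CIDR 10.0.0.0/8, the vendor range 203.0.113.0/24, and the user etl_service_user are all hypothetical.

```sql
-- Account-level policy: only the private CIDR range of our network is allowed,
-- so the public endpoint is effectively closed once private connectivity is set up.
CREATE NETWORK POLICY private_only_policy
  ALLOWED_IP_LIST = ('10.0.0.0/8');
ALTER ACCOUNT SET NETWORK_POLICY = private_only_policy;

-- User-level exception: one external client keeps access via the public endpoint.
CREATE NETWORK POLICY etl_vendor_policy
  ALLOWED_IP_LIST = ('10.0.0.0/8', '203.0.113.0/24');
ALTER USER etl_service_user SET NETWORK_POLICY = etl_vendor_policy;
```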
Identity and Access Management
Once the Snowflake account itself is reachable, the next step is to authenticate the user; users must be created in Snowflake before they can gain access.
Once the user has been verified, a session with one or more roles is created and used to authorize access to Snowflake.
This section covers best practices for:
1. Managing users and roles
2. Authentication and single sign-on
3. Sessions
4. Object-level access control (authorization)
5. Column-level access control
6. Row-level access control
Managing users and roles
Snowflake recommends using federated single sign-on (SSO) and reserving passwords for only certain use cases, such as service accounts and users with the Snowflake ACCOUNTADMIN system role. For these cases, the password management best practices are as follows:
- Enable built-in Duo multi-factor authentication for additional security.
- Use lengthy, complex passwords that are preferably managed by a privileged access management (PAM) platform; HashiCorp Vault, for example, can be used with Snowflake.
- Change passwords on a frequent basis. Snowflake does not presently enforce password expiry; however, we can force password changes by using secrets management or privileged access management (PAM) platforms (a sketch follows this list).
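As a minimal illustration, one way to force a rotation for an individual user is shown below; the user name jdoe and the password value are placeholders, and in practice the new password would come from the PAM or secrets platform.

```sql
-- Set a new password and force the user to change it at next login.
ALTER USER jdoe SET
  PASSWORD = 'N3w-Str0ng-Passw0rd!'
  MUST_CHANGE_PASSWORD = TRUE;
```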
Authentication and single sign-on
Snowflake offers a variety of authentication methods, and which ones are available depends on the interface being used, for example client applications using drivers, the UI, or Snowpipe.
Snowflake advises compiling a spreadsheet that details each client application that connects to Snowflake as well as its authentication capabilities. If an application supports multiple authentication methods, use them in the priority order listed below.
- OAuth (either Snowflake OAuth or External OAuth)
- If the application is a desktop programme and OAuth is not supported, use an external browser.
- Key pair authentication, which is mostly utilised by service account users. Because this requires the client application to manage private keys, complement it with our internal key management software (a sketch appears after this list).
- If none of the aforementioned alternatives is supported by the application, the last resort should be a password. Users connecting via third-party ETL apps typically use this option when using service account login credentials.
Additionally, Snowflake advises always employing MFA because it adds an extra layer of security for user access.
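Below is a hedged sketch of registering a key pair for a service account; the user name etl_service_user is hypothetical and the key values are truncated placeholders, with the private key assumed to live in our key management tooling rather than in Snowflake.

```sql
-- Register the public key for the service account (the private key stays outside Snowflake).
ALTER USER etl_service_user SET RSA_PUBLIC_KEY = 'MIIBIjANBgkqh...';

-- Rotate by staging a second key, switching clients over, then removing the old one.
ALTER USER etl_service_user SET RSA_PUBLIC_KEY_2 = 'MIIBIjANBgkqh...';
ALTER USER etl_service_user UNSET RSA_PUBLIC_KEY;

-- Verify which key fingerprints are currently registered.
DESC USER etl_service_user;
```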
Object-level Access Control
In Snowflake, roles are used to control access to objects like tables, views, and functions. Roles form hierarchies and can contain other roles. When a database session is created for a user, the user’s primary role is linked to it, and all roles in the primary role’s hierarchy are activated for authorisation throughout the session. We should spend some time upfront designing a proper role hierarchy model.
Snowflake recommends the following best practices for access control (a sketch combining several of them follows the list):
- Define functional roles and access roles
- Avoid granting access roles to other access roles
- Use future grants
- Set default_role property for the user
- Create a role per user for cross-database join use cases
- Use managed access schema to centralize grant management
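Here is a minimal, hedged sketch of the functional-role / access-role pattern with future grants, a managed access schema, and a default role; all object, role, and user names (sales_db, reporting, sales_db_read, sales_analyst, jdoe) are hypothetical.

```sql
-- Managed access schema: grant management is centralized with the schema owner.
CREATE DATABASE IF NOT EXISTS sales_db;
CREATE SCHEMA IF NOT EXISTS sales_db.reporting WITH MANAGED ACCESS;

-- Access role: holds the fine-grained privileges on objects.
CREATE ROLE IF NOT EXISTS sales_db_read;
GRANT USAGE ON DATABASE sales_db TO ROLE sales_db_read;
GRANT USAGE ON SCHEMA sales_db.reporting TO ROLE sales_db_read;
GRANT SELECT ON ALL TABLES IN SCHEMA sales_db.reporting TO ROLE sales_db_read;
GRANT SELECT ON FUTURE TABLES IN SCHEMA sales_db.reporting TO ROLE sales_db_read;

-- Functional role: maps to a job function and inherits the access role.
CREATE ROLE IF NOT EXISTS sales_analyst;
GRANT ROLE sales_db_read TO ROLE sales_analyst;

-- Give users the functional role and make it their default.
GRANT ROLE sales_analyst TO USER jdoe;
ALTER USER jdoe SET DEFAULT_ROLE = sales_analyst;
```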
Column-level Access Control
If we want to restrict access to sensitive information that is present in particular columns, such as PII, PHI, or financial data, Snowflake advises using the following data governance capabilities to limit column access for unauthorised users.
- Dynamic Data Masking: this is a built-in feature that can dynamically obfuscate column data based on who’s querying it.
- External Tokenization: It integrates with partner solutions to detokenize data at query time for authorized users.
- Secure Views: We can hide the columns entirely from unauthorized users using them.
Dynamic Data Masking and External Tokenization both use masking policies to restrict access to sensitive data to authorised users. Additionally, Snowflake suggests the following guidelines for masking policies (a sketch follows the list):
- Determine up-front if we want to take a centralized vs. decentralized approach for policy management.
- Use invoker_role() in the policy condition so that unauthorized users can view aggregate data while being unable to view individual values.
- Avoid using the SHA2 function in the policy to allow joins on protected columns for unauthorized users since it can lead to unintended query results.
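For illustration, here is a hedged sketch of a Dynamic Data Masking policy that uses current_role() (a simpler variant of the invoker_role() guideline above); the policy, table, column, and role names are hypothetical.

```sql
-- Unauthorized roles see a fixed masked value; the HR role sees the raw column.
CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('HR_ANALYST') THEN val
    ELSE '*** MASKED ***'
  END;

-- Attach the policy to the sensitive column.
ALTER TABLE hr_db.core.employees
  MODIFY COLUMN email SET MASKING POLICY email_mask;
```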
Row-level Access Control
Snowflake offers row-level security by using row access policies to choose which rows to return in the query result. A row access policy can be as basic as allowing one role to view rows or as sophisticated as including a mapping table in the policy definition to decide access to rows in the query result.
It is a schema-level object that controls whether a certain row in a table or view may be accessed using the following statements:
- SELECT clauses
- UPDATE, DELETE, and MERGE commands.
Row access policies can incorporate conditions and functions in the policy expression so that, when those conditions are met, they determine at query runtime which rows are returned. The policy-driven model encourages separation of duties, allowing governance teams to define policies that restrict the exposure of sensitive data.
This approach also applies to the object owner (i.e. the role with the OWNERSHIP privilege on the object, such as a table or view), who would otherwise have complete access to the underlying data. Note: a single policy can be applied to several tables and views at the same time.
The main advantage of a row access policy is that it provides an organisation with an extensible policy that allows it to correctly balance data security, governance, and analytics. The row access policy’s extensible design enables one or more conditions to be added or removed at any moment in order to keep the policy up to date with changes to the data, the mapping tables, and the RBAC hierarchy.
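Below is a hedged sketch of the mapping-table approach described above; the mapping table, policy, target table, and column names are hypothetical.

```sql
-- Mapping table: which role may see which region's rows.
CREATE TABLE IF NOT EXISTS governance.mappings.region_access (
  role_name   STRING,
  region_code STRING
);

-- Row access policy: a row is returned only if the querying role is mapped to its region.
CREATE ROW ACCESS POLICY sales_region_policy AS (row_region STRING) RETURNS BOOLEAN ->
  EXISTS (
    SELECT 1
    FROM governance.mappings.region_access m
    WHERE m.role_name = CURRENT_ROLE()
      AND m.region_code = row_region
  );

-- Attach the policy; SELECT, UPDATE, DELETE, and MERGE see only permitted rows.
ALTER TABLE sales_db.reporting.orders
  ADD ROW ACCESS POLICY sales_region_policy ON (region_code);
```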
Data Encryption
Snowflake provides us with end-to-end encryption (E2EE), which prevents third parties from reading data at rest or in transit to and from Snowflake.
Aside from E2EE, Snowflake provides us with two features that serve as the icing on the cake:
- Periodic Rekeying
- Tri-secret secure
Snowflake manages data encryption keys to safeguard customer data. This management happens automatically and requires no customer involvement. In addition, customers can manage their own encryption key using the key management service of the cloud platform that hosts their Snowflake account.
When this is enabled, a composite master key is produced by combining the customer-managed key with a Snowflake-maintained key to secure Snowflake data. This is known as Tri-Secret Secure.
Periodic Rekeying
All Snowflake-managed keys are automatically rotated by Snowflake when they are more than 30 days old. Active keys are retired, and new keys are created.
The following walkthrough illustrates key rotation for one table master key (TMK) over a period of three months.
The TMK rotation works as follows:
- Version 1 of the TMK is active in April. Data inserted into this table in April is protected with TMK v1.
- In May, this TMK is rotated: TMK v1 is retired and a new, completely random key, TMK v2, is created. TMK v1 is now used only to decrypt data from April. New data inserted into the table is encrypted using TMK v2.
- In June, the TMK is rotated again: TMK v2 is retired and a new TMK, v3, is created. TMK v1 is used to decrypt data from April, TMK v2 is used to decrypt data from May, and TMK v3 is used to encrypt and decrypt new data inserted into the table in June.
If periodic rekeying is enabled, then when the retired encryption key for a table is older than one year, Snowflake automatically creates a new encryption key and re-encrypts all data previously protected by the retired key using the new key. The new key is used to decrypt the table data going forward.
Periodic rekeying works as follows:
- In April of the following year, after TMK v1 has been retired for an entire year, it is rekeyed (generation 2) using a fully new random key.
- The data files protected by TMK v1 generation 1 are decrypted and re-encrypted using TMK v1 generation 2. Having no further purpose, TMK v1 generation 1 is destroyed.
- In May, Snowflake performs the same rekeying process on the table data protected by TMK v2.
- And so on.
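Periodic rekeying is controlled by an account-level parameter; a minimal sketch of enabling it (assuming the account is on an edition that supports it and the role is ACCOUNTADMIN) is shown below.

```sql
-- Enable automatic yearly rekeying of data protected by retired keys.
ALTER ACCOUNT SET PERIODIC_DATA_REKEYING = TRUE;

-- Confirm the current setting.
SHOW PARAMETERS LIKE 'periodic_data_rekeying' IN ACCOUNT;
```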