
Data Security in Snowflake

Introduction
Before we plunge headfirst into the awesomeness of Snowflake, let’s ensure we’re on the same page. Snowflake is not your average software; it’s a game-changer in the field of data warehousing and analytics. A leading cloud-based data platform, Snowflake is here to revolutionize how we handle data.
Snowflake also provides robust security features to secure data and ensure compliance with industry regulations.
In this blog, we will learn about all the intriguing security features that Snowflake has to offer us.
Security Considerations
Before diving into the security considerations, we need to discuss the guiding principles that serve as the foundation of every organization’s security architecture. In practice, they act as the goals and objectives of any security program.
The CIA triad, which is intended to direct policies for data security, is one of the most significant models when discussing security at the organizational level.
- Confidentiality: The idea here is to ensure that data is accessible only to authorized users and is shielded from everyone else, whether through encryption, access controls, or careful classification of sensitive information.
- Integrity: The next thing to talk about is integrity. The idea here is to make sure that data is trustworthy and free from tampering at any stage of the data lifecycle.
- Availability: This means that the network should be readily available to its users. This applies to both systems and data. To ensure availability, the network administrator should maintain hardware, make regular upgrades, have a plan for failover, and prevent bottlenecks in a network.
Keeping this gold standard in mind, we will now discuss security considerations at every level of the data lifecycle.
Consider a company named “Alfa”. It produces large amounts of data, which its employees consume on a regular basis to generate reports, perform reconciliations, handle analytical workloads, and support auditing.
Data Creation is the first step.
Once the data has been generated it immediately gets stored in a storage system or a network drive.
Once data is stored, or while it is being stored, a user or program generally performs two types of actions:
- Transformations and queries.
- Loading/unloading of data.
Since our company is producing very sensitive data, we need to make sure that our storage system is accessible only from a few trusted IP addresses, and that querying and loading/unloading of data happen over private channels only. Data should also be encrypted at rest and in motion.
In this way, we can mitigate phishing, DoS attacks, and man-in-the-middle attacks, which are major cyber threats to any organization.
Since our company is big, different departments have different entitlements: HR is eligible to view employees’ sensitive information, and Finance is eligible to view the balance sheet, while a regular employee is eligible to view neither.
Therefore, now is the right time to set up an access control framework that helps us define data usage policies. This can be achieved with different authorization techniques such as role-based, rule-based, and risk-based access control.
In this way, we can avoid Data Leakage & Attacks from Malicious Insiders.
The data we have generated will be shared and consumed through different mediums such as APIs, dashboards, drivers, or client screens used by other groups of users. Therefore, we need a secure way of gaining access to our systems, which can be achieved through authentication methods like MFA, OAuth, etc.
In this way, we can avoid hijacking or misuse of our accounts.
Security Architecture
With all these considerations in view, we can now walk through a well-designed security architecture that covers the security features provided by Snowflake.
It is crucial to ensure the security of data stored in Snowflake, as it allows organizations to benefit from scalable storage and analytics while protecting sensitive information.
Snowflake secures customer data using defence in depth, with three security layers:
- Network Security
- IAM
- Data Encryption
Data Security Fundamentals
Data security is a critical aspect of protecting sensitive data from unauthorized access, theft, or corruption. It involves a set of practices and procedures that ensure data is available to authorized individuals who need it, when they need it. Key components of data security include data encryption, access controls, data governance policies, and data loss prevention measures.
Data encryption converts readable data into an unreadable format, making it accessible only to those with the decryption key. Access controls, such as role-based access control (RBAC), ensure that only authorized users can access specific data. Data governance policies establish guidelines for data management and protection, while data loss prevention measures help safeguard data from various security risks, including insider threats, social engineering attacks, and ransomware. Together, these practices form a robust framework for protecting sensitive data and ensuring its integrity and availability.
Network Security
The first line of defence against malicious individuals trying to access Snowflake customer accounts is network security.
Snowflake provides two different forms of network security safeguards to protect against malicious users: employing network policies and private connectivity.
- Network Policies
Network access to the Snowflake data warehouse is managed and limited using Snowflake network policies. These policies designate allowed IP addresses or CIDR ranges and can also block specific addresses. Only users with the SECURITYADMIN role or higher, as well as roles with the global CREATE NETWORK POLICY privilege, are able to create network policies. A network policy’s ownership can be transferred to another role. Network policies can be managed via the Snowflake web interface or SQL commands and applied at the account level, at the user level, and on security integrations. To determine whether a network policy is set on your account or for a specific user, execute the SHOW PARAMETERS command, as in the sketch below.
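To make this concrete, here is a minimal sketch; the policy name, user name, and IP ranges are hypothetical placeholders:

```sql
-- Create a network policy (the IP ranges are placeholders).
CREATE NETWORK POLICY corp_policy
  ALLOWED_IP_LIST = ('203.0.113.0/24', '198.51.100.0/24')
  BLOCKED_IP_LIST = ('203.0.113.99');

-- Apply it at the account level (requires SECURITYADMIN or higher) ...
ALTER ACCOUNT SET NETWORK_POLICY = corp_policy;

-- ... or for a single user.
ALTER USER etl_user SET NETWORK_POLICY = corp_policy;

-- Check whether a policy is set on the account or on a specific user.
SHOW PARAMETERS LIKE 'network_policy' IN ACCOUNT;
SHOW PARAMETERS LIKE 'network_policy' IN USER etl_user;
```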
- Private Connectivity
Note: Private connectivity is a Business Critical (or higher) edition feature.
Using the private connectivity offerings of cloud service providers, such as AWS PrivateLink or Azure Private Link, we can connect to Snowflake over a private IP address. Thanks to this feature, our Snowflake account shows up as a resource inside our own network. Here are a few guidelines for using this feature effectively:
- Setting up DNS to resolve Snowflake’s private URL is our responsibility. The optimal strategy is to use private DNS in our cloud provider network, since it enables clients running both on-premises and in the cloud provider network to resolve Snowflake accounts. A DNS forwarding rule for the Snowflake account can then be created in our on-premises DNS.
- If we want to restrict access to the public endpoint after configuring private connectivity, we can create an account-level network policy that only allows connections from the private IP range of our network.
- If we want to let client apps running outside our network connect to our account via the public endpoint, we can add the client application’s IP range to the allowed list of an account-level, user-level, or OAuth integration network policy, depending on the use case. A sketch of both follows.
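As a sketch of both guidelines, assuming a hypothetical private CIDR range and user name:

```sql
-- Account-level policy: only our private VPC/VNet range may connect.
CREATE NETWORK POLICY private_only_policy
  ALLOWED_IP_LIST = ('10.20.0.0/16');
ALTER ACCOUNT SET NETWORK_POLICY = private_only_policy;

-- A user-level policy overrides the account-level one, e.g. for a
-- partner application that must come in over the public endpoint.
CREATE NETWORK POLICY partner_app_policy
  ALLOWED_IP_LIST = ('198.51.100.0/24');
ALTER USER partner_app_user SET NETWORK_POLICY = partner_app_policy;
```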
Identity and Access Management with Multi Factor Authentication
Once our Snowflake account has been made reachable, the next step in gaining access is authenticating the user. Users must be created in Snowflake before they can gain access.
Once the user has been verified, a session with roles is formed and utilised to grant access to Snowflake.
This section covers best practices for:
1. Managing users and roles
2. Authentication and single sign-on
3. Sessions
4. Object-level access control (authorization)
5. Column-level access control
6. Row-level access control
Snowflake recommends using federated single sign-on (SSO) while using passwords for only certain use cases such as for service accounts and users with the Snowflake ACCOUNTADMIN system role. For such cases, the password management best practices are as follows:
- Enable built-in Duo multi-factor authentication for additional security.
- Use lengthy, complex passwords that are preferably managed by platforms for privileged access management (PAM). To utilise HashiCorp Vault with Snowflake, see the sample.
- Passwords should be changed on a frequent basis. Snowflake does not presently enforce password expiry; however, we can force password changes by using secrets management or privileged access management (PAM) platforms, as in the sketch below.
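Since Snowflake does not expire passwords itself, a PAM or secrets management platform would rotate them by issuing statements like the following (a minimal sketch; the user name and password are placeholders):

```sql
-- Rotate the password and force a change at the next login.
ALTER USER report_user
  SET PASSWORD = 'N3w-Str0ng-Passw0rd!' MUST_CHANGE_PASSWORD = TRUE;
```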
Authentication and single sign-on
Snowflake offers a variety of authentication methods depending on the interface being used, whether client applications using drivers, the UI, or Snowpipe.
Snowflake advises compiling a spreadsheet that details each client application that connects to it, along with its authentication capabilities. If an app supports multiple authentication methods, use them in the priority order listed below.
- OAuth (either Snowflake OAuth or External OAuth)
- If the application is a desktop programme and OAuth is not supported, use an external browser.
- Key Pair Authentication, which is mostly utilised by service account users. Since this necessitates the client application managing private keys, complement it with our internal key management software. (A sketch follows after this list.)
- If none of the aforementioned alternatives is supported by the application, the last resort should be a password. Users connecting via third-party ETL apps typically use this option when using service account login credentials.
Additionally, Snowflake advises always employing MFA, because it adds an extra layer of security for user access.
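As referenced in the list above, here is a minimal sketch of setting up key pair authentication for a hypothetical service account. The key pair itself is generated outside Snowflake (for example with OpenSSL), and the key value below is a truncated placeholder:

```sql
-- Attach the public key to the service account user.
ALTER USER svc_etl SET RSA_PUBLIC_KEY = 'MIIBIjANBgkqh...';

-- Verify the key is set (check the RSA_PUBLIC_KEY_FP property).
DESCRIBE USER svc_etl;
```

The client application then authenticates by signing a token with the matching private key, so no password needs to be stored in its configuration.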
Object-level Access Control
In Snowflake, roles are used to control access to objects like tables, views, and functions. Roles form hierarchies and can contain other roles. When a session is created for a user, it is linked to the user’s primary role. To carry out authorisation, all roles in the primary role’s hierarchy are activated throughout the session. We should spend some time upfront creating a proper role hierarchy model.
Snowflake recommends the following best practices for access control (a sketch illustrating several of them follows this list):
- Define functional roles and access roles
- Avoid granting access roles to other access roles
- Use future grants
- Set default_role property for the user
- Create a role per user for cross-database join use cases
- Use managed access schema to centralize grant management
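To illustrate a few of these practices (functional vs. access roles, future grants, and the default_role property), here is a minimal sketch with hypothetical database, role, and user names:

```sql
-- Access role: holds privileges on objects in one schema.
CREATE ROLE sales_db_read;
GRANT USAGE ON DATABASE sales_db TO ROLE sales_db_read;
GRANT USAGE ON SCHEMA sales_db.public TO ROLE sales_db_read;
GRANT SELECT ON ALL TABLES IN SCHEMA sales_db.public TO ROLE sales_db_read;
-- Future grant: tables created later are covered automatically.
GRANT SELECT ON FUTURE TABLES IN SCHEMA sales_db.public TO ROLE sales_db_read;

-- Functional role: granted to people and composed of access roles.
CREATE ROLE sales_analyst;
GRANT ROLE sales_db_read TO ROLE sales_analyst;
GRANT ROLE sales_analyst TO USER jdoe;

-- Make it the user's default role.
ALTER USER jdoe SET DEFAULT_ROLE = sales_analyst;
```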
Column-level Access Control
If we want to restrict access to sensitive information that is present in particular columns, such as PII, PHI, or financial data, Snowflake advises using the following data governance capabilities to limit column access for unauthorised users.
- Dynamic Data Masking: this is a built-in feature that can dynamically obfuscate column data based on who's querying it.
- External Tokenization: It integrates with partner solutions to detokenize data at query time for authorized users.
- Secure Views: We can hide the columns entirely from unauthorized users using them.
Dynamic Data Masking and External Tokenization both use masking policies to limit unauthorised users’ access to sensitive data. Additionally, Snowflake suggests the following guidelines for masking policies (a minimal policy sketch follows this list):
- Determine up-front if we want to take a centralized vs. decentralized approach for policy management.
- Use invoker_role() in policy condition for unauthorized users to view aggregate data while unable to view individual data.
- Avoid using the SHA2 function in the policy to allow joins on protected columns for unauthorized users since it can lead to unintended query results.
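As referenced above, here is a minimal sketch of a Dynamic Data Masking policy, assuming a hypothetical HR_ROLE and an employees table:

```sql
-- Reveal email addresses only to the HR functional role.
CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('HR_ROLE') THEN val
    ELSE '*** MASKED ***'
  END;

-- Attach the policy to the sensitive column.
ALTER TABLE employees MODIFY COLUMN email SET MASKING POLICY email_mask;
```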
Snowflake offers row-level security by using row access policies to determine which rows to return in the query result. A row access policy can be as basic as allowing one role to view rows, or as sophisticated as including a mapping table in the policy definition to decide access to rows in the query result.
A row access policy is a schema-level object that controls whether a given row in a table or view can be accessed by the following statements:
- SELECT clauses
- UPDATE, DELETE, and MERGE commands.
Row access policies can incorporate conditions and functions in the policy expression to filter the rows returned at query runtime when those conditions are fulfilled. The policy-driven model encourages separation of duties, allowing governance teams to define policies that restrict the exposure of sensitive data.
This approach extends to the object owner (i.e. the role with the OWNERSHIP privilege on the object, such as a table or view), who otherwise generally has complete access to the underlying data. Note: a single policy can be applied to several tables and views at the same time.
The main advantage of a row access policy is that it provides an organisation with an extendable policy that allows it to correctly balance data security, governance, and analytics. The row access policy's extensible design enables one or more conditions to be added or withdrawn at any moment in order to keep the policy up to date with changes to the data, the mapping tables, and the RBAC hierarchy.
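A minimal sketch of both styles, assuming a hypothetical sales table and a region_mapping table that maps roles to the regions they may see:

```sql
-- Admins see everything; other roles see only rows their mapping allows.
CREATE ROW ACCESS POLICY region_policy AS (region STRING) RETURNS BOOLEAN ->
  CURRENT_ROLE() = 'SALES_ADMIN'
  OR EXISTS (
    SELECT 1
    FROM security.region_mapping m
    WHERE m.role_name = CURRENT_ROLE()
      AND m.region = region
  );

-- Attach the policy to the table on the relevant column.
ALTER TABLE sales ADD ROW ACCESS POLICY region_policy ON (region);
```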
Data Protection
Data protection is a crucial aspect of data security that involves safeguarding sensitive data from unauthorized access, theft, or corruption. This includes implementing data encryption, data masking, and data redaction techniques. Data encryption uses algorithms to convert text characters into an unreadable format, ensuring that only authorized individuals with the decryption key can access the data.
Data masking and redaction are techniques used to hide or remove sensitive information from data. Data masking, such as dynamic data masking, obfuscates data in real-time, making it unusable to unauthorized individuals. Data redaction involves permanently removing sensitive information, ensuring that it cannot be accessed by anyone. These data protection measures are essential for maintaining the confidentiality and integrity of sensitive information, and for complying with regulatory requirements.
Access Control and Authorization
Access control and authorization are critical components of data security that ensure only authorized individuals have access to sensitive data. Role-based access control (RBAC) is a common approach that assigns access privileges to roles, which are then assigned to users. This method ensures that users only have access to the data necessary for their role, minimizing the risk of unauthorized access.
Multi-factor authentication (MFA) adds an additional layer of security by requiring users to provide multiple forms of verification before accessing sensitive data. This can include something the user knows (like a password), something the user has (like a security token), and something the user is (like a fingerprint). Access controls can be implemented at various levels, including network policies, database objects, and column-level access control, ensuring comprehensive protection of sensitive data.
Data Masking and Redaction
Data masking and redaction are techniques used to protect sensitive data by hiding or removing sensitive information. Dynamic data masking is a technique that masks sensitive data in real-time, making it unusable to unauthorized individuals. This method ensures that sensitive information is protected while still allowing authorized users to access the data they need.
Data redaction involves permanently removing sensitive information from data, making it unusable to anyone. This technique is often used to comply with regulatory requirements and to protect sensitive information from unauthorized access. By implementing data masking and redaction, organizations can ensure that sensitive data is protected and that they are in compliance with data protection regulations.
Monitoring and Logging
Monitoring and logging are critical components of data security that help detect and respond to security risks in real-time. Monitoring involves tracking user activity, access attempts, and access violations, while logging involves recording and storing security-related events. These measures help identify potential security risks and enable swift action to prevent data breaches and unauthorized access.
Regular monitoring and logging also help ensure compliance with regulatory requirements and security standards. By keeping a detailed record of security-related events, organizations can quickly identify and respond to potential threats, ensuring the ongoing protection of sensitive data.
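In Snowflake specifically, much of this telemetry is exposed through the ACCOUNT_USAGE views. Here is a minimal sketch of a failed-login check (note that these views lag real time by up to a few hours):

```sql
-- Failed login attempts over the last 7 days.
SELECT user_name, client_ip, error_message, event_timestamp
FROM snowflake.account_usage.login_history
WHERE is_success = 'NO'
  AND event_timestamp > DATEADD('day', -7, CURRENT_TIMESTAMP())
ORDER BY event_timestamp DESC;
```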
Data Encryption
Snowflake provides us with end-to-end encryption (E2EE), a method that prevents third parties from reading data while at rest or in transit to and from Snowflake.
Aside from E2EE, Snowflake provides us with two features that serve as the icing on the cake:
- Periodic Rekeying
- Tri-secret secure
Snowflake controls data encryption keys to safeguard consumer information. There is no requirement for client involvement in this management; it happens automatically. Customers can manage their own extra encryption key using the key management feature of the cloud platform that houses their Snowflake account.
When enabled, a composite master key is produced by combining a customer-managed key with a Snowflake-maintained key to secure Snowflake data. This is known as Tri-Secret Secure.
All Snowflake-managed keys are automatically rotated by Snowflake when they are more than 30 days old. Active keys are retired, and new keys are created.
The following example illustrates key rotation for one table master key (TMK) over a period of three months:
The TMK rotation works as follows:
- Version 1 of the TMK is active in April. Data inserted into this table in April is protected with TMK v1.
- In May, this TMK is rotated: TMK v1 is retired and a new, completely random key, TMK v2, is created. TMK v1 is now used only to decrypt data from April. New data inserted into the table is encrypted using TMK v2.
- In June, the TMK is rotated again: TMK v2 is retired and a new TMK, v3, is created. TMK v1 is used to decrypt data from April, TMK v2 is used to decrypt data from May, and TMK v3 is used to encrypt and decrypt new data inserted into the table in June.
Key rotation replaces active keys with new keys on a periodic basis and retires the old keys. Periodic data rekeying completes the key life cycle.
If periodic rekeying is enabled, then when the retired encryption key for a table is older than one year, Snowflake automatically creates a new encryption key and re-encrypts all data previously protected by the retired key using the new key. The new key is used to decrypt the table data going forward.
Periodic rekeying works as follows:
- In April of the following year, after TMK v1 has been retired for an entire year, it is rekeyed (generation 2) using a fully new random key.
- The data files protected by TMK v1 generation 1 are decrypted and re-encrypted using TMK v1 generation 2. Having no further purpose, TMK v1 generation 1 is destroyed.
- In May, Snowflake performs the same rekeying process on the table data protected by TMK v2.
- And so on.
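Rekeying is opt-in and is controlled by an account parameter (Enterprise Edition or higher; run as ACCOUNTADMIN):

```sql
-- Enable annual rekeying of data protected by retired keys.
ALTER ACCOUNT SET PERIODIC_DATA_REKEYING = TRUE;

-- Confirm the setting.
SHOW PARAMETERS LIKE 'periodic_data_rekeying' IN ACCOUNT;
```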
Summary
In the age of data-driven digital transformation, ensuring the security of your business data has never been more important. As a leading digital transformation service provider, Bluepi understands the importance of robust security measures for businesses undergoing digital transformation. Our team of experts has extensive experience in providing the best digital transformation services and solutions, making us one of the top consulting firms for digital transformation.

About Pronam Chatterjee
A visionary with 25 years of technical leadership under his belt, Pronam isn’t just ahead of the curve; he’s redefining it. His expertise extends beyond the technical, making him a sought-after speaker and published thought leader. Whether strategizing the next technology and data innovation or his next chess move, Pronam thrives on pushing boundaries. He is a father of two loving daughters and a Golden Retriever. With a blend of brilliance, vision, and genuine connection, Pronam is more than a leader; he’s an architect of the future, building something extraordinary.
