Snowflake SnowPro Advanced Architect Exam Dumps & Practice Test Questions

Question 1:

Which Snowflake features utilize the change tracking metadata associated with a table? (Select two.)

A. The MERGE command
B. The UPSERT command
C. The CHANGES clause
D. A STREAM object
E. The CHANGE_DATA_CAPTURE command

Correct Answer: C, D

Explanation:

Snowflake provides built-in mechanisms that leverage change tracking metadata to monitor and manage changes made to tables. These features are designed to track inserts, updates, and deletes efficiently, enabling data engineers and developers to work with change data capture patterns without manually scanning entire tables for modifications.

Looking at each option:

MERGE command (A): The MERGE command is a versatile SQL operation in Snowflake that combines INSERT, UPDATE, and DELETE actions based on matched conditions between a source and a target. MERGE is frequently used together with streams to apply captured changes to a target table, but the command itself does not read change tracking metadata; it simply joins whatever source you supply against the target. It is therefore not one of the correct answers.

UPSERT command (B): Although "UPSERT" is a common term that generally refers to an operation that updates existing records or inserts new ones, Snowflake does not provide a dedicated UPSERT command distinct from MERGE. UPSERT is usually achieved via the MERGE command itself, so this option is not separately valid.

CHANGES clause (C): The CHANGES clause is valid Snowflake syntax (for example, SELECT * FROM t CHANGES (INFORMATION => DEFAULT) AT (TIMESTAMP => ...)) that queries the change tracking metadata of a table or view over a specified interval without requiring a stream. Because it reads that metadata directly, change tracking must be enabled on the object, and this option is correct.

STREAM object (D): A STREAM in Snowflake is a special object that records changes to a table (insertions, updates, and deletions) starting from when the stream is created. It tracks change metadata internally, allowing users to query the stream for incremental changes rather than scanning the entire table. This is a core feature that directly depends on change tracking metadata.

CHANGE_DATA_CAPTURE command (E): Snowflake does not offer a command called CHANGE_DATA_CAPTURE. Instead, CDC capabilities are implemented through Streams and Tasks. Therefore, this is not a valid choice.

In conclusion, the CHANGES clause and STREAM objects are the two Snowflake features that make direct use of change tracking metadata to efficiently expose table changes.
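
To make the distinction concrete, here is a minimal sketch of both features; the table name t and stream name t_stream are hypothetical:

-- Change tracking must be enabled (creating a stream enables it implicitly on the source table).
ALTER TABLE t SET CHANGE_TRACKING = TRUE;

-- STREAM object: records inserts, updates, and deletes on t from the stream's current offset onward.
CREATE OR REPLACE STREAM t_stream ON TABLE t;
SELECT * FROM t_stream;   -- returns only rows changed since the stream's offset

-- CHANGES clause: queries the same change tracking metadata for an explicit time window, no stream needed.
SELECT * FROM t CHANGES (INFORMATION => DEFAULT) AT (OFFSET => -60*5);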

Question 2:

When integrating Kafka with Snowflake using the Snowflake Connector for Kafka, which data formats are supported for Kafka message ingestion? (Select two.)

A. CSV
B. XML
C. Avro
D. JSON
E. Parquet

Correct Answer: C, D

Explanation:

The Snowflake Connector for Kafka enables seamless streaming of data from Kafka topics into Snowflake tables. To accommodate diverse data pipelines, the connector supports several data formats for messages, but not all common formats are supported natively.

Evaluating each format:

CSV (A): While CSV is a popular format for batch data processing and manual data loads in Snowflake, it is not natively supported by the Kafka connector. Streaming CSV messages from Kafka is not typical because CSV lacks schema enforcement and efficient serialization for real-time streaming.

XML (B): XML is widely used in some systems but is not supported by the Snowflake Kafka connector. Processing XML data in Snowflake typically requires batch ingestion or special parsing after loading, not direct Kafka streaming.

Avro (C): Avro is a binary, compact, and schema-based format widely adopted in Kafka environments. It supports schema evolution, making it ideal for streaming. The Snowflake Connector for Kafka fully supports Avro, allowing efficient, schema-aware ingestion of data streams.

JSON (D): JSON is another highly flexible and widely used format in streaming applications. It supports semi-structured data, which Snowflake handles efficiently. JSON is natively supported by the connector, making it a top choice for Kafka message streaming.

Parquet (E): Parquet is a columnar storage format optimized for batch processing and analytics. While Snowflake supports Parquet for file ingestion, it is not a supported message format for Kafka streaming through the connector.

In summary, Avro and JSON are the two data formats that the Snowflake Connector for Kafka supports for message ingestion, offering the flexibility and efficiency needed for streaming data pipelines.
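
For context, the connector lands every Kafka message into a target table with two VARIANT columns, RECORD_METADATA and RECORD_CONTENT. The sketch below shows how Avro- or JSON-sourced content can then be queried with Snowflake's semi-structured syntax; the table name kafka_events and the payload fields order_id and amount are hypothetical:

-- Messages arrive as VARIANT values, whether they were Avro or JSON on the topic.
SELECT
    record_metadata:partition::INT      AS kafka_partition,
    record_metadata:offset::INT         AS kafka_offset,
    record_content:order_id::STRING     AS order_id,
    record_content:amount::NUMBER(10,2) AS amount
FROM kafka_events
LIMIT 10;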

Question 3:

At which level of database object can privileges like APPLY MASKING POLICY, APPLY ROW ACCESS POLICY, and APPLY SESSION POLICY be assigned?

A. Global
B. Database
C. Schema
D. Table

Correct answer: A

Explanation:

The privileges APPLY MASKING POLICY, APPLY ROW ACCESS POLICY, and APPLY SESSION POLICY are specialized permissions used to enforce fine-grained data security and access controls. It is important to distinguish the privilege from the policy itself: the policies end up attached to columns, tables, or sessions, but the privilege that allows a role to attach or detach them is defined at the account level.

Masking policies hide or obfuscate sensitive column values so that users only see data they are authorized to view. Row access policies control which rows are returned to a given user, enabling patterns such as multi-tenant data separation or restricted views. Session policies govern session-level behavior, such as idle session timeouts.

In Snowflake, all three APPLY privileges are global privileges: they are granted on the account (for example, GRANT APPLY MASKING POLICY ON ACCOUNT TO ROLE security_admin), which authorizes the grantee to set or unset the corresponding policy type on eligible objects throughout the account. This centralizes policy administration with a dedicated governance role instead of scattering it across individual objects.

The other options are incorrect:

  • Database level privileges do not include these APPLY permissions; granting privileges on a database does not confer the ability to attach governance policies.

  • Schema level grants likewise do not carry these privileges, even though the policy objects themselves live in schemas.

  • Table level is where masking and row access policies are ultimately applied, but the privilege to apply them is not granted on a table.

Therefore, these data security privileges are assigned at the global (account) level, making option A the correct answer.
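
As a minimal illustration (the role, table, column, and policy names are hypothetical), the privileges are granted on the account, while the resulting policies are attached to specific objects:

-- Granted at the global (account) level:
GRANT APPLY MASKING POLICY    ON ACCOUNT TO ROLE policy_admin;
GRANT APPLY ROW ACCESS POLICY ON ACCOUNT TO ROLE policy_admin;
GRANT APPLY SESSION POLICY    ON ACCOUNT TO ROLE policy_admin;

-- The grantee can then attach policies to individual objects, for example:
ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY mask_email;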

Question 4:

An Architect uses the COPY INTO command with ON_ERROR=SKIP_FILE to load CSV files into TABLEA using its table stage. 

After fixing an error in file5.csv, which two commands should the Architect use to reload only this file from the stage?

A. COPY INTO tablea FROM @%tablea RETURN_FAILED_ONLY = TRUE;
B. COPY INTO tablea FROM @%tablea;
C. COPY INTO tablea FROM @%tablea FILES = ('file5.csv');
D. COPY INTO tablea FROM @%tablea FORCE = TRUE;
E. COPY INTO tablea FROM @%tablea NEW_FILES_ONLY = TRUE;
F. COPY INTO tablea FROM @%tablea MERGE = TRUE;

Correct answers: B and C

Explanation:

In this scenario, the Architect previously used the COPY INTO command with the ON_ERROR=SKIP_FILE option to bulk load CSV files. This caused file5.csv to be skipped due to errors. After correcting the errors, the Architect wants to reload only file5.csv from the stage without reloading all other files.

Option C, using FILES = ('file5.csv'), explicitly instructs the COPY INTO command to load only the specified file. This is a precise way to target the fixed file without touching the rest of the files in the stage, which is efficient and avoids duplicating already loaded data.

Option B, a plain COPY INTO tablea FROM @%tablea, also works because Snowflake keeps load metadata for the table for 64 days. Files that were previously loaded successfully are skipped automatically, while file5.csv, which was skipped due to errors and therefore never recorded as loaded, is picked up on the retry. The result is that only the corrected file is ingested.

Now, the other options do not fit this use case as well:

  • A only changes what the COPY command reports: RETURN_FAILED_ONLY = TRUE filters the statement output to show failed files, but it does not control which files are loaded.

  • D forces all files to be reloaded regardless of previous success, which is unnecessary when only one file needs reloading.

  • E is not a documented COPY INTO copy option; even read as "load only new files," file5.csv is a replaced file with the same name, so the behavior would be unreliable.

  • F is not a valid COPY INTO copy option; merging data is handled by the separate MERGE command, not by the loader.

Therefore, the best choices are C, to name the file explicitly, and B, to let Snowflake's load metadata skip the files that were already loaded. Either command reloads only file5.csv efficiently and correctly after its issues were fixed.
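
A minimal sketch of the two workable reloads, assuming the corrected file5.csv has been re-staged under the same name:

-- Option C: explicitly target only the corrected file.
COPY INTO tablea FROM @%tablea FILES = ('file5.csv');

-- Option B: rely on load metadata; files loaded in the last 64 days are skipped,
-- so only file5.csv (never successfully loaded) is picked up.
COPY INTO tablea FROM @%tablea;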

Question 5:

A large manufacturing company operates multiple Snowflake accounts across different divisions, with most accounts in the same cloud region except for some European divisions. The company aims to improve data sharing to optimize supply chains and boost purchasing power with vendors. The Snowflake Architects need a design that lets each division control what data to share while keeping configuration and management efforts low. 

According to Snowflake best practices, which solution should they implement?

A. Move the European accounts to the global region, manage shares using a connected graph, and deploy a Data Exchange.
B. Use a Private Data Exchange alongside data shares specifically for European accounts.
C. Publish data to the Snowflake Marketplace using invoker_share() in all secure views.
D. Implement a Private Data Exchange and use data replication to enable European accounts to share data within the Exchange.

Correct answer: D

Explanation:

The company’s goal is to enable efficient, manageable data sharing across multiple Snowflake accounts, including geographically distributed ones, particularly European divisions subject to data residency and compliance constraints. The recommended approach is to leverage Snowflake’s Private Data Exchange combined with replication to meet these needs.

A Private Data Exchange creates a centralized, secure platform for sharing data across accounts with minimal management overhead. It allows different business divisions to govern what they share while benefiting from simplified access control and security. However, since European accounts may reside in different cloud regions or data centers due to compliance (e.g., GDPR), direct sharing across regions can be complicated.

Replication addresses this challenge by copying data across regions, ensuring that European divisions can access data locally while still participating in the broader Data Exchange ecosystem. This maintains compliance and performance by keeping data geographically appropriate while providing seamless sharing capabilities. This setup reduces configuration complexity and operational overhead.

Option A suggests migrating European accounts to a global region, which risks violating compliance rules and adds operational challenges. The connected graph architecture is less aligned with Snowflake’s best practices for this scenario. Option B omits replication, meaning European accounts would face management complexity and potential latency issues. Option C, using the Snowflake Marketplace, is intended for external sharing, not for internal multi-account data collaboration.

In conclusion, option D best balances centralized control, compliance, performance, and ease of management by combining Private Data Exchange with replication, enabling efficient and secure data sharing across all divisions, including Europe.

Question 6:

If a user with permission to view unmasked data copies data from a column with a masking policy to another column that does not have any masking policy, what will be the outcome?

A. The unmasked data will be copied into the new column.
B. Masked data will be copied into the new column.
C. The unmasked data will be copied but only users with permissions can see the unmasked data.
D. The unmasked data will be copied but no users can see it.

Correct answer: A

Explanation:

Data masking policies in databases protect sensitive information by showing masked (obscured) values to unauthorized users while allowing privileged users to see the original unmasked data. The key to understanding this question lies in how masking policies apply during data copy operations and what happens to data visibility after copying.

If a user's role is authorized to see unmasked data, the masking policy evaluates to the original values for that user's queries. When they copy this data to a new column that has no masking policy applied, the database does not reapply any masking rules on the target column because none exist there. Therefore, the unmasked data is copied as-is into the new column, and the full, unmasked values become stored and visible in this new location.

Option A correctly states this: the unmasked data will be copied into the new column. Since the new column lacks any masking policy, all users who can query it will see the unmasked data, regardless of their privileges on the original column.

Option B is incorrect because masked data would only be copied if the user lacked privileges to see unmasked data in the first place. Since the user has the necessary privileges, masking is not applied during the copy.

Option C is wrong because after the data is copied to a column without masking, it becomes fully visible to all users. There is no additional privilege check on data visibility if no masking policy exists on that column.

Option D is false because no masking policy means no data is hidden; thus, users can see the data.

In summary, when a privileged user copies data from a masked column to an unmasked one, the data is copied in its unmasked form, and becomes visible to all users who have access to the new column. This emphasizes the need for applying masking policies on columns that store sensitive data to ensure data privacy is maintained across the system.
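
A minimal sketch of the scenario; the policy, table, column, and role names are hypothetical:

-- Masking policy returns the real value only to authorized roles.
CREATE OR REPLACE MASKING POLICY mask_ssn AS (val STRING) RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() IN ('PII_ADMIN') THEN val ELSE '***MASKED***' END;

ALTER TABLE customers MODIFY COLUMN ssn SET MASKING POLICY mask_ssn;

-- Run by a user whose role sees unmasked values: the plain-text SSNs are written
-- to a column with no masking policy and become visible to anyone who can query the copy.
CREATE TABLE customers_copy AS SELECT ssn FROM customers;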

Question 7:

How can a database architect configure clustering to maximize performance for various access paths on a single table?

A. Define multiple clustering keys for the table
B. Create several materialized views with different clustering keys
C. Use super projections to automatically handle clustering
D. Design a clustering key that includes all columns involved in the access paths

Correct Answer: B

Explanation:

Clustering in a database context refers to how data is physically stored and organized on disk or within the storage layer to improve query efficiency. Effective clustering can significantly reduce input/output operations by aligning data layout with typical query access patterns. This is especially important in columnar or distributed database systems that leverage clustering to optimize data retrieval.

When working with a given table, the primary goal is to enable fast query execution by matching the clustering strategy to the columns most commonly used in query predicates, joins, or sorting. This improves the “access paths” — the routes through which queries reach the data.

Option A suggests creating multiple clustering keys for the same table. Snowflake permits only one clustering key per table, so this is not possible; multiple keys would imply conflicting physical orderings of the same micro-partitions.

Option B is the recommended approach. A materialized view in Snowflake can define its own clustering key, so creating several materialized views over the same base table, each clustered for a different access path, gives every major query pattern a physical layout tuned for it. Snowflake's optimizer can also transparently rewrite a query against the base table to use a matching materialized view, so existing queries benefit without being changed.

Option C involves super projections, which are a feature of other database systems (notably Vertica) rather than Snowflake, so they cannot be used to handle clustering here.

Option D, packing every column used by any access path into one clustering key, is ineffective because the sort order is dominated by the leading columns of the key. Queries that filter only on trailing columns gain little pruning benefit, and a wide clustering key increases reclustering cost without serving the different access paths.

In conclusion, creating several materialized views, each with a clustering key matched to a distinct access path, is the best way to optimize performance for multiple access patterns on a single table. Hence, B is the correct answer.
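
A minimal sketch of this pattern (table, view, and column names are hypothetical; materialized views require Enterprise Edition or higher):

-- Base table clustered for its most common access path.
ALTER TABLE sales CLUSTER BY (order_date);

-- Materialized views, each clustered for a different access path;
-- the optimizer can transparently use a matching view for queries on the base table.
CREATE MATERIALIZED VIEW sales_by_customer CLUSTER BY (customer_id)
  AS SELECT * FROM sales;

CREATE MATERIALIZED VIEW sales_by_region CLUSTER BY (region, order_date)
  AS SELECT * FROM sales;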

Question 8:

What must be done to enable data sharing between Company A and Company B if they operate on different cloud platforms using Snowflake?

A. Build a data pipeline to move shared data to a storage location in the target cloud provider
B. Persist all views, since views cannot be shared across clouds otherwise
C. Establish data replication across the regions and cloud platforms of both companies
D. Both companies must use the same cloud platform, as data sharing requires a single cloud provider

Correct Answer: C

Explanation:

Snowflake provides a powerful data sharing feature that allows one account to securely share data with another account, even if they reside on different cloud platforms such as AWS, Azure, or Google Cloud. This cross-cloud data sharing capability enables organizations to collaborate seamlessly without moving data manually or using external storage intermediaries.

To share data between Company A and Company B on different cloud platforms, the key technical requirement is to set up data replication so that the shared data exists in the region and cloud platform where the consuming company operates. Once configured, Snowflake keeps the secondary database synchronized through refresh operations, enabling access without compromising data security or freshness.

Option A—creating a data pipeline to write data to cloud storage—is unnecessary because Snowflake’s native sharing does not require copying or exporting data to external locations. It uses secure metadata pointers and replication behind the scenes, eliminating extra data movement.

Option B—persisting all views—is incorrect. Snowflake allows sharing of tables and secure views across clouds. A view does not need to be materialized or persisted to be shared; it only needs to be defined as a secure view, and Snowflake manages access control and query resolution dynamically.

Option D—requiring both companies to use the same cloud provider—is false. Snowflake’s cross-cloud architecture explicitly supports data sharing across AWS, Azure, and Google Cloud, making it possible for companies on different platforms to collaborate without migrating to a single provider.

Therefore, the correct answer is C, as setting up replication between the relevant cloud platforms and regions is essential to enable smooth and secure cross-cloud data sharing in Snowflake.
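
As a hedged sketch of the replication piece (the organization, account, and database names are hypothetical), Company A replicates the database to its own account on the consumer's cloud and region, then shares it locally:

-- In Company A's source account (e.g., on AWS):
ALTER DATABASE sales_db ENABLE REPLICATION TO ACCOUNTS myorg.azure_account;

-- In Company A's account on the consumer's cloud/region:
CREATE DATABASE sales_db AS REPLICA OF myorg.aws_account.sales_db;
ALTER DATABASE sales_db REFRESH;

From that account, a share is then created over the replicated database and granted to Company B's account, following the same steps as ordinary same-region sharing.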

Question 9:

Which three statements correctly describe the properties of Snowflake’s result set cache?

A. Time Travel queries can be executed using the result set cache.
B. Snowflake retains cached query results for 24 hours.
C. The 24-hour retention timer resets each time cached results are accessed.
D. Data in the result set cache adds to storage costs.
E. The retention period can be extended up to 31 days.
F. Result set caches are unique to each warehouse and not shared.

Answer: B, C, E

Explanation:

Snowflake’s result set cache is designed to boost query performance by temporarily storing the results of executed queries. This mechanism allows identical subsequent queries to retrieve results rapidly without re-executing the query logic, thereby saving compute resources.

First, regarding option B, Snowflake automatically caches query results for 24 hours. If the exact same query is executed within that 24-hour window and the underlying data hasn’t changed, Snowflake serves the cached result, which greatly improves response time.

For option C, the retention period of 24 hours isn’t static; it resets each time the cached result is accessed. This means every time a query retrieves results from the cache, the timer restarts, ensuring that frequently accessed queries remain cached and available for another 24 hours from the latest use.

Option E is also correct. Because the 24-hour clock restarts on every reuse, the effective retention of a cached result can extend up to a maximum of 31 days from the time the query was first executed, after which the result is purged and the query must be re-run to be cached again.

On the other hand, option A is false since Time Travel allows querying historical versions of data, but result set cache stores only the actual executed query’s latest results, not historical snapshots. Thus, Time Travel queries do not use the result set cache.

Regarding option D, persisted query results do not count toward storage costs; Snowflake maintains the result cache internally in the cloud services layer, and it is not billed as customer table storage.

Finally, option F is incorrect because the result set cache is not tied to a virtual warehouse. Cached results live in the cloud services layer, so an identical query can be served from the cache regardless of which warehouse runs it, and even without an active warehouse at all. (Warehouse-local caching does exist separately as the data cache on warehouse SSD storage, but that is not the result set cache.)

In summary, Snowflake’s result set cache improves performance by storing query results for 24 hours (B), resetting the retention timer upon each access (C), and allowing that retention to extend up to 31 days from the initial execution (E).
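
A quick way to observe the behavior; the orders table is hypothetical, while USE_CACHED_RESULT is a standard session parameter:

-- The second execution of an identical query is served from the result cache
-- (no warehouse compute), provided the underlying data has not changed.
SELECT COUNT(*) FROM orders WHERE order_date >= '2024-01-01';
SELECT COUNT(*) FROM orders WHERE order_date >= '2024-01-01';

-- Result reuse can be disabled per session for testing:
ALTER SESSION SET USE_CACHED_RESULT = FALSE;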

Question 10:

Which three organization-level actions can be performed by a user with the ORGADMIN role?

A. Renaming the organization
B. Creating new accounts
C. Viewing all accounts within the organization
D. Changing an individual account’s name
E. Deleting an account
F. Enabling database replication

Answer: B, C, F

Explanation:

The ORGADMIN role is the top-level administrative role for a Snowflake organization. It is responsible for managing the accounts that make up the organization and for organization-wide settings, rather than for the objects and users inside any single account.

Regarding option A, renaming the organization is not something ORGADMIN can do on its own; an organization name change must be requested through Snowflake Support. This option is therefore incorrect.

For option B, creating new accounts is a core responsibility of the ORGADMIN. This lets the administrator provision new Snowflake accounts for divisions, environments, or workloads under the organization’s umbrella, specifying the edition, region, and initial administrator for each.

Option C is also within the ORGADMIN’s scope, as viewing a comprehensive list of all accounts associated with the organization is essential for effective governance and oversight. Access to this information helps maintain visibility of organizational structure and account activity.

However, options D and E are not the capabilities this question is testing. Renaming or deleting an individual account is an account lifecycle operation that has traditionally been handled through Snowflake Support rather than performed as a routine ORGADMIN task, so neither is among the three expected answers.

Option F, enabling database replication, is an ORGADMIN responsibility: the organization administrator must enable replication for the source and target accounts (for example, by calling SYSTEM$GLOBAL_ACCOUNT_SET_PARAMETER with ENABLE_ACCOUNT_DATABASE_REPLICATION) before databases can be replicated between accounts in the organization.

In summary, the ORGADMIN role focuses on organization-wide management tasks such as creating new accounts (B), viewing the accounts in the organization (C), and enabling database replication between those accounts (F); it does not rename the organization or handle the day-to-day lifecycle of individual accounts.
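
A brief sketch of typical ORGADMIN statements; the account, organization, and region names and the placeholder password are hypothetical:

USE ROLE ORGADMIN;

-- Create a new account in the organization:
CREATE ACCOUNT marketing_div
  ADMIN_NAME = admin
  ADMIN_PASSWORD = '********'            -- placeholder
  EMAIL = 'admin@example.com'
  EDITION = ENTERPRISE
  REGION = aws_us_west_2;

-- View all accounts in the organization:
SHOW ORGANIZATION ACCOUNTS;

-- Enable database replication for an account in the organization:
SELECT SYSTEM$GLOBAL_ACCOUNT_SET_PARAMETER(
  'MYORG.MARKETING_DIV', 'ENABLE_ACCOUNT_DATABASE_REPLICATION', 'true');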
