
Streamlining Large Dataset Processing in Tableau Prep
Making informed decisions depends on analyzing and interpreting large datasets efficiently. Large datasets, however, bring their own challenges: slow performance and a level of complexity that is hard to reason about. This is where data sampling comes into play.
Sampling lets you work with a smaller, manageable subset of your data, which speeds up testing and validation at each step of your analysis.
In this post, we look at data sampling for testing purposes in Tableau Prep. When you are dealing with a large dataset, testing against a sample is often the most effective approach.
Where do you find this component?
As soon as data is loaded into Tableau Prep, the Data Sample options appear in the Input step.

The sample only applies while you build and test the flow, and the sampling method you choose can significantly affect performance. When you run the flow and generate an output, Tableau Prep processes all the records in your dataset.
When a dataset contains a very large number of rows (say, 1 million), Tableau Prep typically samples it to improve performance. The sample size is determined by the number of fields in the dataset and their data types. You can control the amount of data sampled with one of the following options:
- Automatic (recommended): Loads data quickly and automatically determines the number of rows needed for a sample, with a maximum of 393,216 rows.
- Specify (Fixed number of rows): Loads the number of rows you enter. A small value gives fast load times and is enough for a quick look at the data structure. The number of rows must be less than 1 million.
- Maximum: Loads as many rows as possible, up to a limit of 1,048,576. Because a sample of this size can slow Tableau Prep down, make sure your machine meets the high-performance requirements for large datasets before using this setting.
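To make these limits concrete, here is a minimal Python sketch of the upper bound on sample rows for each option. It is only an illustration of the numbers quoted above, not Tableau Prep's implementation: the function name `sample_row_ceiling` and the option strings are hypothetical, and the real Automatic sample can be smaller than this ceiling because it also depends on the fields and their data types.

```python
# Limits quoted in the list above.
AUTOMATIC_MAX_ROWS = 393_216
MAXIMUM_ROWS = 1_048_576
SPECIFY_LIMIT = 1_000_000  # "Specify" must stay below this value


def sample_row_ceiling(total_rows, option, requested=None):
    """Return the most rows a sample could contain (hypothetical helper).

    This models only the caps; it does not model how Automatic picks
    its actual sample size from field count and data types.
    """
    if option == "automatic":
        return min(total_rows, AUTOMATIC_MAX_ROWS)
    if option == "specify":
        if requested is None or not 0 < requested < SPECIFY_LIMIT:
            raise ValueError("requested must be between 1 and 999,999")
        return min(total_rows, requested)
    if option == "maximum":
        return min(total_rows, MAXIMUM_ROWS)
    raise ValueError(f"unknown option: {option!r}")


# A dataset with 550,476 rows fits entirely under the Maximum cap,
# but exceeds the Automatic cap.
print(sample_row_ceiling(550_476, "automatic"))
print(sample_row_ceiling(550_476, "maximum"))
```

In this sketch, a 550,476-row dataset is capped at 393,216 rows under Automatic, while Maximum can hold every row because the dataset is smaller than 1,048,576.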
Which records end up in the sample is determined by the “Row Selection” method:
- Quick Select: Returns the requested number of rows as quickly as possible, taking either the first rows of the source or rows cached in memory from a previous query.
- Random: Returns a random selection of rows, up to the number of records requested.
- Stratified: Available from Tableau Prep 2023.3 onwards, Stratified row selection lets you control how Prep samples data. When you select a column of interest, such as Product Category, Prep samples an equal number of rows for each value in that column. This option takes longer to load data than Quick Select or Random, but it gives you the most representative sample.
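The difference between the three methods is easier to see on a tiny example. The following Python sketch imitates each method on an in-memory list of rows. It is a conceptual illustration only: the function names and the toy data are hypothetical, and it says nothing about how Tableau Prep implements row selection internally.

```python
import random
from collections import Counter, defaultdict


def quick_select(rows, n):
    """Take the first n rows in source order."""
    return rows[:n]


def random_select(rows, n, seed=0):
    """Take n rows uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))


def stratified_select(rows, field, per_value, seed=0):
    """Take up to per_value random rows for each distinct value of field."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[field]].append(row)
    sample = []
    for value in sorted(groups):
        members = groups[value]
        sample.extend(rng.sample(members, min(per_value, len(members))))
    return sample


# Hypothetical toy data: 90 Furniture rows followed by 10 Technology rows.
rows = [
    {"id": i, "Category": "Furniture" if i < 90 else "Technology"}
    for i in range(100)
]

for name, sample in [
    ("quick", quick_select(rows, 10)),
    ("random", random_select(rows, 10)),
    ("stratified", stratified_select(rows, "Category", 5)),
]:
    print(name, dict(Counter(r["Category"] for r in sample)))
```

Because the toy data is sorted by category, taking the first 10 rows never reaches a Technology row, while the stratified version returns five rows for each category. That is the kind of skew Stratified selection is meant to avoid.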
Let’s take the Superstore dataset, with 550,476 rows, as an example.


Stratified Rows
To ensure an equal distribution of rows across the specified field for improved data validation, you must specify a number of rows that doesn’t exceed the count of the field value with the fewest rows.
Using the same Sample Superstore example:
Technology has the fewest rows (100,710). So, make sure the number of rows you specify doesn’t exceed 100,710 to get an equal distribution across the unique values in the “Category” field.
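The rule above comes down to taking the minimum of the per-value row counts. Here is a minimal Python sketch of that check. Only the Technology count (100,710) and the 550,476 total come from the example; the Furniture and Office Supplies counts are hypothetical placeholders chosen to add up to that total, and the helper names are made up for illustration.

```python
def max_equal_rows(counts):
    """Largest row count that every field value can supply in full."""
    return min(counts.values())


def allows_equal_distribution(counts, requested):
    """True if each field value has at least `requested` rows."""
    return requested <= max_equal_rows(counts)


category_counts = {
    "Technology": 100_710,       # from the example above
    "Furniture": 150_000,        # hypothetical placeholder
    "Office Supplies": 299_766,  # hypothetical placeholder
}

print(max_equal_rows(category_counts))
print(allows_equal_distribution(category_counts, 100_710))
print(allows_equal_distribution(category_counts, 120_000))
```

With these counts, a request of 120,000 rows fails the check because Technology cannot supply that many, so the sample would no longer be evenly distributed across categories.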
Want to learn more about Tableau Prep?
Tableau Prep is a data preparation tool designed to help users clean, shape, and organize their data before analysis in Tableau Desktop. It simplifies the data preparation process, making it more accessible for users who may not have extensive technical expertise.