Softstuff Consulting
My occasional musings of a technical nature
  • Send us a message
  • About Softstuff Consulting
  • Privacy Policy
  • Main Website
T-SQL

Getting random data out of SQL Server

by Ian Blair February 23, 2019 No Comments

SQL Server is excellent for storing data and passing it back in nice ordered blocks, but what happens if you just want to take a random selection of data for example a random list of customers to test a new marketing campaign on.

Since SQL2005 there has been a clause called TABLESAMPLE that will allow you to get some random data from a table either as a percentage of the table or a set number of rows.

Select * from mytableTABLESAMPLE(10 PERCENT)

Or

Select * from mytableTABLESAMPLE(100 ROWS)

You can also specify a REPEATABLE option so that if no updates, inserts or deletes have been made to the database inbetween queries the same results will be returned.

Select * from mytableTABLESAMPLE(10 PERCENT) REPEATABLE(1)

The (1) is an arbitrary value, as long as this stays the same between queries and no other changes have been made you will see the same results.

Unfortunately due to the way this clause works under the hood the results may not be truly random as it uses the way the data is stored internally in pages to sample information rather than taking the database rows as a whole and returning the sample from there.

This also has some limitations in that it cannot be used in Views, or Rowsets.

 

Another more flexible method to get random data out from a sql table is to simply sort by NEWID(), this works by temporarily creating a Guid for each row and sorting by it, and is a very effective way especially when combined with a TOP command to extract a random set of rows. A Guid or Globally Unique Identifier is a computer generated unique reference number that is 128bits long and should be unique as it can be generated with 2^122 unique values. A Guid usually takes the form {21EC2020-3AEA-4069-A2DD-08002B30309D}.

Select TOP 50 * from mytable order by NEWID()

This will return a random 50 rows from the table, but if the query is run a second time it will return a different 50 rows, although it is possible that some of the rows from the first query may appear in the second.

This method has the advantage that the order by clause can be included in almost existing query or view but care has to be taken about the order it appears in if there is already an existing ORDER BY clause.

Select top 50 * from mytable order by name,NEWID()

This will probably not return the results you are looking for as the query will perform a fixed sort before the random part you are looking for.

The query should be rewritten as

Select top 50 * from mytable order by NEWID(),name

This will return a random sample sorted in the order defined after the NEWID().

datat-sql

  • Previous Using the CLR to create a Random Number generator4 years ago
  • Next A SQL Server random number generator4 years ago

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Recent Posts

  • How to make your Powershell scripts more annoying
  • What motherboard is in my PC
  • A Dynamics365 plugin thought experiment
  • Registering a Dynamics365 plugin and I get an error
  • Going back in time with Dynamics365

Categories

  • Bootstrap
  • C#
  • CSS
  • Dot Net Core
  • Dynamics365
  • JavaScript
  • Powershell
  • T-SQL
  • Thoughts
  • VBScript
  • Visual Studio
  • Windows
  • Xamarin

Recent Comments

  • S0ftStuffB055 on Call a Dynamics365 workflow from JavaScript
  • Siva on Call a Dynamics365 workflow from JavaScript
  • TC Sharpe on Throw exceptions in Dynamics365 workflow plugins, or don’t
  • BigOwl on Throw exceptions in Dynamics365 workflow plugins, or don’t
  • CRMGod on Access a Dynamics365 embedded webresource

Archives

  • January 2021
  • May 2020
  • March 2020
  • February 2020
  • January 2020
  • December 2019
  • October 2019
  • June 2019
  • May 2019
  • February 2019
2023 Softstuff Consulting. Donna Theme powered by WordPress
  • Twitter