Azure Tables store simple tabular data at petabyte scale on Microsoft Azure. Azure Queue storage provides messaging between application components so they can be decoupled and scale under heavy load.
This skill covers how to:
Implement CRUD with and without transactions
Design, manage, and scale table partitions
Query using OData
Add and process queue messages
Retrieve a batch of messages
Scale queues
Choose between Azure Storage Tables and Azure Cosmos DB Table API
Azure Tables are simple tables filled with rows and columns. They are a key-value database solution, which refers to how the data is stored and retrieved, not to how complex the table can be. Tables store data as a collection of entities, and each entity has a set of properties. An entity can have up to 255 properties (or columns, to borrow the relational vocabulary). The total entity size (or row size) cannot exceed 1 MB. That might seem small initially, but 1 MB can store a lot of tabular data per entity. Azure Tables are similar to Azure Storage blobs in that you are not charged for compute time for inserting, updating, or retrieving your data; you are only charged for the total storage of your data.
Azure Tables are stored in the same storage account as Azure Storage blobs discussed earlier. Where blobs organize data based on container, Azure Tables organize data based on table name. Entities that are functionally the same should be stored in the same table. For example, all customers should be stored in the Customers table, while their orders should be stored in the Orders table.
Azure Tables store entities based on a partition key and a row key. Partition keys are the partition boundary: all entities stored with the same PartitionKey property are grouped into the same partition and are served by the same partition server. Choosing the correct partition key is a key responsibility of the Azure developer. Having more partitions improves scalability, because it increases the number of partition servers handling your requests. Having too many partitions, however, will affect how you do batch operations like batch updates or large data retrieval. We will discuss this further at the end of this section.
Later in this chapter, we will discuss Azure SQL Database, which also allows you to store tabular data. Why would you use Azure Tables instead of Azure SQL Database? Why have two products with similar functions? In fact, they are very different.
The Azure Tables service does not enforce any schema for tables. It simply stores the properties of your entity based on the partition key and the row key. If the data in the entity matches the data in your object model, your object is populated with the right values when the data is retrieved. Developers need to enforce the schema on the client side; all business logic for your application should live in the application and not be expected to be enforced by Azure Tables. Azure SQL Database also has many features that Azure Tables does not, including stored procedures, triggers, indexes, constraints, functions, default values, row- and column-level security, SQL injection detection, and much more.
If Azure Tables is missing all of these features, why is the service so popular among developers? As we said earlier, you are not charged for compute resources when using Azure Tables, but you are with Azure SQL Database. This makes Azure Tables extremely affordable for large datasets. If you use table partitioning effectively, Azure Tables will also scale very well without sacrificing performance.
Now that you have a good overview of Azure Tables, let’s dive right in and look at using it. If you’ve been following along through Azure Storage blobs, some of this code will be familiar to you.
In this section, you learn how to access table storage programmatically.
Create a C# console application.
In your app.config file, add an entry under the Configuration element, replacing the account name and key with your own storage account details:
<configuration>
  <appSettings>
    <add key="StorageConnectionString"
         value="DefaultEndpointsProtocol=https;AccountName=<your account name>;AccountKey=<your account key>" />
  </appSettings>
</configuration>
Use NuGet to obtain the Microsoft.WindowsAzure.Storage.dll. An easy way to do this is by using the following command in the NuGet console:
Install-Package WindowsAzure.Storage
Add the following using statements to the top of your Program.cs file:
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Auth;
using Microsoft.WindowsAzure.Storage.Table;
using Microsoft.WindowsAzure;
using System.Configuration;
Add a reference to System.Configuration.
Add the following code to the Main function of Program.cs to retrieve your connection string:
var storageAccount = CloudStorageAccount.Parse(
    ConfigurationManager.AppSettings["StorageConnectionString"]);
Use the following command to create a table if one doesn’t already exist:
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
CloudTable table = tableClient.GetTableReference("orders");
table.CreateIfNotExists();
To add entries to a table, you create objects based on the TableEntity base class and serialize them into the table using the Storage Client Library. The following properties are provided for you in this base class:
PartitionKey: Used to partition data across storage infrastructure
RowKey: Unique identifier within a partition
Timestamp: Time of last update, maintained by Azure Storage
ETag: Used internally to provide optimistic concurrency
The combination of partition key and row key must be unique within the table. This combination is used for load balancing and scaling, as well as for querying and sorting entities.
Follow these steps to add code that inserts records:
Add a class to your project, and then add the following code to it:
using System;
using Microsoft.WindowsAzure.Storage.Table;

public class OrderEntity : TableEntity
{
    public OrderEntity(string customerName, string orderDate)
    {
        this.PartitionKey = customerName;
        this.RowKey = orderDate;
    }

    public OrderEntity() { }

    public string OrderNumber { get; set; }
    public DateTime RequiredDate { get; set; }
    public DateTime ShippedDate { get; set; }
    public string Status { get; set; }
}
Add the following code to the console program to insert a record:
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
CloudTable table = tableClient.GetTableReference("orders");
OrderEntity newOrder = new OrderEntity("Archer", "20141216");
newOrder.OrderNumber = "101";
newOrder.ShippedDate = Convert.ToDateTime("12/18/2014");
newOrder.RequiredDate = Convert.ToDateTime("12/14/2014");
newOrder.Status = "shipped";
TableOperation insertOperation = TableOperation.Insert(newOrder);
table.Execute(insertOperation);
You can group inserts and other operations into a single batch transaction. All operations in the batch must take place on the same partition, you can have up to 100 entities in a batch, and the total batch payload size cannot be greater than 4 MB.
The following code illustrates how to insert several records as part of a single transaction. It assumes the storage account object and table created earlier:
TableBatchOperation batchOperation = new TableBatchOperation();
OrderEntity newOrder1 = new OrderEntity("Lana", "20141217");
newOrder1.OrderNumber = "102";
newOrder1.ShippedDate = Convert.ToDateTime("1/1/1900");
newOrder1.RequiredDate = Convert.ToDateTime("1/1/1900");
newOrder1.Status = "pending";
OrderEntity newOrder2 = new OrderEntity("Lana", "20141218");
newOrder2.OrderNumber = "103";
newOrder2.ShippedDate = Convert.ToDateTime("1/1/1900");
newOrder2.RequiredDate = Convert.ToDateTime("12/25/2014");
newOrder2.Status = "open";
OrderEntity newOrder3 = new OrderEntity("Lana", "20141219");
newOrder3.OrderNumber = "104";
newOrder3.ShippedDate = Convert.ToDateTime("12/17/2014");
newOrder3.RequiredDate = Convert.ToDateTime("12/17/2014");
newOrder3.Status = "shipped";
batchOperation.Insert(newOrder1);
batchOperation.Insert(newOrder2);
batchOperation.Insert(newOrder3);
table.ExecuteBatch(batchOperation);
More Info: Entity Group Transactions
You can batch transactions that belong to the same table and partition group for insert, update, merge, delete, and related actions programmatically or by using the Storage API. For more information, see the reference at http://msdn.microsoft.com/en-us/library/dd894038.aspx.
You can select all of the entities in a partition or a range of entities by partition and row key. Wherever possible, you should try to query with the partition key and row key. Querying entities by other properties does not work well because it launches a scan of the entire table.
Within a table, entities are sorted by partition key; within a partition, they are sorted by row key. RowKey is a string property, so sorting is handled as a string sort. If you use a date value for your RowKey property, order it year, month, day (for instance, 20140108 for January 8, 2014) so that the string sort matches chronological order.
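As a quick illustration, a fixed-width format string keeps the string sort chronological. A minimal sketch, assuming you build the row key from a DateTime order date:
DateTime orderDate = new DateTime(2014, 1, 8);
// "yyyyMMdd" produces a fixed-width key, so string order matches date order.
string rowKey = orderDate.ToString("yyyyMMdd"); // "20140108"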
The following code requests all records within a partition using the PartitionKey property to query:
TableQuery<OrderEntity> query = new TableQuery<OrderEntity>().Where(
    TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "Lana"));
foreach (OrderEntity entity in table.ExecuteQuery(query))
{
    Console.WriteLine("{0}, {1}\t{2}\t{3}", entity.PartitionKey, entity.RowKey,
        entity.Status, entity.RequiredDate);
}
Console.ReadKey();
One technique you can use to update a record is InsertOrReplace(). This creates the record if one does not already exist, or updates an existing record, based on the partition key and the row key. In this example, we retrieve a record we inserted during the batch insert example, change the Status and ShippedDate properties, and then execute an InsertOrReplace operation:
TableOperation retrieveOperation = TableOperation.Retrieve<OrderEntity>("Lana", "20141217");
TableResult retrievedResult = table.Execute(retrieveOperation);
OrderEntity updateEntity = (OrderEntity)retrievedResult.Result;
if (updateEntity != null)
{
    updateEntity.Status = "shipped";
    updateEntity.ShippedDate = Convert.ToDateTime("12/20/2014");
    TableOperation insertOrReplaceOperation = TableOperation.InsertOrReplace(updateEntity);
    table.Execute(insertOrReplaceOperation);
}
To delete a record, first retrieve it as shown in earlier examples, and then execute a Delete operation with the retrieved entity:
TableOperation retrieveOperation = TableOperation.Retrieve<OrderEntity>("Lana", "20141218");
OrderEntity deleteEntity = (OrderEntity)table.Execute(retrieveOperation).Result;
TableOperation deleteOperation = TableOperation.Delete(deleteEntity);
table.Execute(deleteOperation);
Console.WriteLine("Entity deleted.");
The Storage API for tables supports OData, which exposes a simple query interface for interacting with table data. Table storage does not support anonymous access, so you must supply credentials using the account key or a Shared Access Signature (SAS) (discussed in “Manage Access”) before you can perform requests using OData.
To query what tables you have created, provide credentials, and issue a GET request as follows:
https://myaccount.table.core.windows.net/Tables
To query the entities in a specific table, provide credentials, and issue a GET request formatted as follows:
https://<your account name>.table.core.windows.net/<your table name>(PartitionKey='<partition-key>',RowKey='<row-key>')?$select=<comma-separated property names>
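For example, given the orders table and the entities inserted earlier, a request for a single entity's Status and ShippedDate properties might look like the following (myaccount is a placeholder account name):
https://myaccount.table.core.windows.net/orders(PartitionKey='Lana',RowKey='20141217')?$select=Status,ShippedDate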
Note: Query Limitations
The result is limited to 1,000 entities per request, and the query will run for a maximum of five seconds.
More Info: OData
For more information on OData, see the reference at http://msdn.microsoft.com/en-us/library/azure/dn535600.aspx.
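When a query hits the limits noted above, the response includes continuation tokens that you pass back to retrieve the next page of results. ExecuteQuery follows these tokens for you; if you want to page explicitly, the Storage Client Library exposes ExecuteQuerySegmented. A minimal sketch, reusing the table reference and OrderEntity class from earlier:
TableQuery<OrderEntity> pagedQuery = new TableQuery<OrderEntity>().Where(
    TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "Lana"));
TableContinuationToken token = null;
do
{
    // Each segment returns up to 1,000 entities plus a token pointing at the next page.
    TableQuerySegment<OrderEntity> segment = table.ExecuteQuerySegmented(pagedQuery, token);
    token = segment.ContinuationToken;
    foreach (OrderEntity entity in segment.Results)
    {
        Console.WriteLine("{0}, {1}", entity.PartitionKey, entity.RowKey);
    }
} while (token != null);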
The Azure Table service can scale to handle massive amounts of structured data and billions of records. To handle that amount, tables are partitioned. The partition key is the unit of scale for storage tables. The Table service will spread your table across multiple servers and keep all rows with the same partition key co-located on the same partition server. Thus, the partition key is an important grouping, not only for querying but also for scalability.
There are three types of partition keys to choose from:
Single value: There is one partition key for the entire table. This favors a small number of entities. It also makes batch transactions easier, since batch operations must share a partition key to run without error. It does not scale well for large tables, since all rows sit on the same partition server.
Multiple values: This might place each partition on its own partition server. If the partition size is smaller, it is easier for Azure to load balance the partitions. Partitions might get slower as the number of entities increases, which might make further partitioning necessary at some point.
Unique values: This creates many small partitions. It is highly scalable, but batch transactions are not possible.
For query performance, you should use the partition key and row key together when possible. This leads to an exact row match. The next best thing is to have an exact partition match with a row range. It is best to avoid scanning the entire table.
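To make those patterns concrete, here is a sketch of the two preferred query shapes against the orders table from earlier: an exact point query on partition key plus row key, and an exact partition match with a row key range:
// Point query: exact PartitionKey + RowKey match, the fastest lookup.
TableOperation pointQuery = TableOperation.Retrieve<OrderEntity>("Lana", "20141217");
OrderEntity order = (OrderEntity)table.Execute(pointQuery).Result;

// Range query: exact partition match with a row key range.
string partitionFilter = TableQuery.GenerateFilterCondition(
    "PartitionKey", QueryComparisons.Equal, "Lana");
string rowRangeFilter = TableQuery.CombineFilters(
    TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.GreaterThanOrEqual, "20141217"),
    TableOperators.And,
    TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.LessThan, "20141219"));
TableQuery<OrderEntity> rangeQuery = new TableQuery<OrderEntity>().Where(
    TableQuery.CombineFilters(partitionFilter, TableOperators.And, rowRangeFilter));
foreach (OrderEntity entity in table.ExecuteQuery(rangeQuery))
{
    Console.WriteLine("{0} {1}", entity.RowKey, entity.Status);
}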
The Azure Storage Queue service provides a mechanism for reliable inter-application messaging to support asynchronous distributed application workflows. This section covers a few fundamental features of the Queue service for adding messages to a queue, processing those messages individually or in a batch, and scaling the service.
More Info: Queue Service
For a general overview of working with the Queue service, see the reference at http://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-how-to-use-queues/.
You can access your storage queues and add messages to a queue using many storage browsing tools; however, it is more likely you will add messages programmatically as part of your application workflow.
The following code demonstrates how to add messages to a queue. To use it, you need a using statement for Microsoft.WindowsAzure.Storage.Queue. The example works against a queue named "queue," which you can create in the portal or, as shown here, in code:
CloudQueueClient queueClient = storageAccount.CreateCloudQueueClient();
CloudQueue queue = queueClient.GetQueueReference("queue");
// Create the queue if it doesn't already exist.
queue.CreateIfNotExists();
queue.AddMessage(new CloudQueueMessage("Queued message 1"));
queue.AddMessage(new CloudQueueMessage("Queued message 2"));
queue.AddMessage(new CloudQueueMessage("Queued message 3"));
In the Azure portal, you can browse to your storage account, open Queues, and click the queue in the list to see the messages added above.
Note: Message Identifiers
The Queue service assigns a message identifier to each message when it is added to the queue. This is opaque to the client, but it is used by the Storage Client Library to identify a message uniquely when retrieving, processing, and deleting messages.
More Info: Large Messages
There is a limit of 64 KB per message stored in a queue. It is considered a best practice to keep messages small and to store any data required for processing in a durable store, such as Azure SQL Database, table storage, or blob storage. This also increases system reliability, since each queued message can expire after seven days if not processed. For more information, see the reference at https://docs.microsoft.com/en-us/azure/service-bus-messaging/service-bus-azure-and-service-bus-queues-compared-contrasted.
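One common way to follow this guidance is to store the large payload in blob storage and enqueue only a reference to it. A minimal sketch, assuming the storageAccount and queue objects from earlier, plus a hypothetical payloads container and largePayloadJson string:
using Microsoft.WindowsAzure.Storage.Blob;

// Store the large payload in a blob...
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = blobClient.GetContainerReference("payloads");
container.CreateIfNotExists();
CloudBlockBlob blob = container.GetBlockBlobReference("order-101.json"); // hypothetical blob name
blob.UploadText(largePayloadJson);

// ...and enqueue only its name, keeping the message well under the 64 KB limit.
queue.AddMessage(new CloudQueueMessage(blob.Name));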
Messages are typically published by a separate application in the system from the application that listens to the queue and processes messages. As shown in the previous section, you can create a CloudQueue reference and then proceed to call GetMessage() to de-queue the next available message from the queue as follows:
CloudQueueMessage message = queue.GetMessage(new TimeSpan(0, 5, 0));
if (message != null)
{
    string theMessage = message.AsString;
    // your processing code goes here
    // Delete the message once processing succeeds so it is not picked up
    // again when the visibility timeout expires.
    queue.DeleteMessage(message);
}
Note: Invisibility Setting
By default, when you de-queue a message, it is invisible to other callers for 30 seconds. If message processing might exceed this timeframe, supply an alternate value when retrieving or updating the message. You can set the timeout to a value between one second and seven days. The visibility timeout can also exceed the message expiry time.
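If processing turns out to take longer than expected, you can extend the visibility timeout on a message you already hold by calling UpdateMessage. A minimal sketch, where message is the CloudQueueMessage retrieved above:
// Keep the in-flight message invisible to other consumers for five more minutes.
queue.UpdateMessage(message, TimeSpan.FromMinutes(5), MessageUpdateFields.Visibility);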
A queue listener can be implemented as single-threaded (processing one message at a time) or multi-threaded (processing messages in a batch on separate threads). You can retrieve up to 32 messages from a queue using the GetMessages() method to process multiple messages in parallel. As discussed in the previous sections, create a CloudQueue reference, and then proceed to call GetMessages(). Specify the number of items to de-queue up to 32 (this number can exceed the number of items in the queue) as follows:
IEnumerable<CloudQueueMessage> batch = queue.GetMessages(10, new TimeSpan(0, 5, 0));
foreach (CloudQueueMessage batchMessage in batch)
{
    Console.WriteLine(batchMessage.AsString);
    // Delete each message after it has been processed.
    queue.DeleteMessage(batchMessage);
}
Note: Parallel Processing Overhead
Consider the overhead of message processing before deciding the appropriate number of messages to process in parallel. If significant memory, disk space, or other network resources are used during processing, throttling parallel processing to an acceptable number will be necessary to avoid performance degradation on the compute instance.
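One simple way to apply such throttling is to cap the degree of parallelism when fanning the batch out across threads. A minimal sketch using Parallel.ForEach; the cap of four is an arbitrary example value:
using System.Threading.Tasks;

ParallelOptions options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(queue.GetMessages(32, new TimeSpan(0, 5, 0)), options, msg =>
{
    Console.WriteLine(msg.AsString);  // process each message on its own thread
    queue.DeleteMessage(msg);         // then remove it from the queue
});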
When working with Azure Storage queues, you need to consider a few scalability issues, including the messaging throughput of the queue itself and the design topology for processing messages and scaling out as needed.
Each individual queue has a target of approximately 20,000 messages per second (assuming a message is within 1 KB). You can partition your application to use multiple queues to increase this throughput value.
As for processing messages, it is more cost effective and efficient to pull multiple messages from the queue for processing in parallel on a single compute node; however, this depends on the type of processing and resources required. Scaling out compute nodes to increase processing throughput is usually also required.
You can configure VMs or cloud services to autoscale based on queue length. You specify the average number of messages to be processed per instance, and the autoscale engine runs scale actions to increase or decrease the number of available instances accordingly.
More Info: Back Off Polling
To control storage costs, you should implement a back off polling algorithm for queue message processing. This and other scale considerations are discussed in the reference at https://docs.microsoft.com/en-us/azure/storage/common/storage-performance-checklist.
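A minimal sketch of such a back off polling loop, doubling the delay whenever the queue is empty (up to an arbitrary 60-second cap) and resetting it when a message arrives:
TimeSpan delay = TimeSpan.FromSeconds(1);
TimeSpan maxDelay = TimeSpan.FromSeconds(60);
while (true)
{
    CloudQueueMessage msg = queue.GetMessage();
    if (msg != null)
    {
        Console.WriteLine(msg.AsString); // process the message
        queue.DeleteMessage(msg);
        delay = TimeSpan.FromSeconds(1); // reset the delay while the queue is busy
    }
    else
    {
        // Queue is empty: sleep, then double the delay to reduce transaction costs.
        System.Threading.Thread.Sleep(delay);
        delay = TimeSpan.FromTicks(Math.Min(delay.Ticks * 2, maxDelay.Ticks));
    }
}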
Azure Cosmos DB is a cloud-hosted NoSQL database that supports multiple data models. NoSQL databases can be key-value stores, table stores, and graph stores (along with several others). Azure Cosmos DB has different engines that accommodate these different models. The Azure Cosmos DB Table API is a key-value store that is very similar to Azure Storage Tables.
The main differences between these products are:
Azure Cosmos DB is much faster, with latency below 10 ms for reads and 15 ms for writes at any scale.
Azure Table Storage only supports a single region with one optional readable secondary for high availability. Azure Cosmos DB supports over 30 regions.
Azure Table Storage only indexes the partition key and the row key. Azure Cosmos DB automatically indexes all properties.
Azure Table Storage only supports strong or eventual consistency. Consistency refers to how up to date the data you read is, and whether you see the latest writes from other users. Stronger consistency means less overall throughput and concurrent performance, but more up-to-date data. Eventual consistency allows high concurrent throughput, but you might see older data. Azure Cosmos DB supports five different consistency models and allows those models to be specified at the session level, meaning that one user or feature can have a different consistency level than another.
Azure Table Storage only charges you storage fees, not compute fees, which makes it very affordable. Azure Cosmos DB charges per Request Unit (RU), which is effectively how a PaaS product charges for compute. If you need more RUs, you can scale them up. This makes Azure Cosmos DB significantly more expensive than Azure Storage Tables.
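Because the Table API is largely wire-compatible with Azure Storage Tables, the table code shown earlier can often be pointed at a Cosmos DB Table API account simply by swapping the connection string. A sketch of the connection string shape, with placeholder account name and key (check the portal for the exact endpoint of your account):
DefaultEndpointsProtocol=https;AccountName=<your account name>;AccountKey=<your account key>;TableEndpoint=https://<your account name>.table.cosmos.azure.com:443/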