segunda-feira, agosto 18, 2008

ASP.NET: DataSet vs DataReader (Pt. 02/03)

Introduction
A couple weeks ago (April 19th) I gave a talk to the local San Diego ASP.NET SIG and during my talk I mentioned how I, personally, rarely, if ever, use DataSets in my ASP.NET applications, sticking with the good ol' DataReader instead. Since then I have received a number of emails from attendees asking me why I don't use DataSets. Rather than responding to each questioner individually, I decided to write this article explaining my rationale. Read on to learn why I am a DataReader man all the way.

The Fundamentals of the DataReader
Before I can explain why I choose to use DataReaders over DataSets in my Web applications, it's imperative that we all have an understanding of the fundamentals of both DataSets and DataReaders. These two objects have different roles: DataSets are designed to be a mini-in-memory database whereas a DataReader is designed to be a ferry of data between the database layer and a .NET application.

In ADO.NET a provider is some source of data, and there exist provider-specific classes for working with particular providers. There's the SqlConnection, SqlCommand, SqlDataAdapter, and SqlDataReader classes for working with the SqlClient provider; there's the OleDbConnection, OleDbCommand, OleDbDataAdapter, and OleDbDataReader classes for working with the OleDb provider. Objects that are prefixed by a provider name (Sql, OleDb, Oracle, Odbc, etc.) are provider-specific objects. They are designed to work with the particular provider. A DataReader is one such object (i.e., SqlDataReader, OleDbDataReader, etc.).

To work with data via a DataReader you must first establish a connection to the data store and specify the query to execute. Next, the DataReader is created and acts as a bridge between the .NET application and the data store. For example, you might use code like the following:

' Establish Connection
Dim myConnection as New SqlConnection(connection string)
myConnection.Open()

' Create command
Dim myCommand as New SqlCommand(myConnection, SQL query)

' Create a DataReader e
Dim myReader as SqlDataReader
myReader = myCommand.ExecuteReader()

'Iterate through the results
While myReader.Read()
'... Work with the current record ...
End While

' Close the connection
myConnection.Close()

The DataReader loads one record from the data store at a time. Each time the DataReader's Read() method is called, the DataReader discards the current record, goes back to the database, and fetches the next record in the resultset. The Read() method returns True if a row was loaded from the database, and False if there are no more rows.

DataReaders are connected data objects because they require an active connection to the database. Remember, the DataReader is just a ferry of data between the application and database. Understandably, it cannot ferry information back from the database after the connection has been severed. Furthermore, a DataReader is limited to being read-only and forward-only. That is, the information retrieved from the database cannot be modified by the DataReader, nor can the DataReader retrieve records in a random order. Instead, a DataReader is limited to accessing the records in sequential order, from the first one to the last one, one record at a time.

The Fundamentals of the DataSet
DataSets are a more complex and feature-rich object than DataReaders. Whereas DataReaders simply scuttle data back from a data store, DataSets can be thought of as in-memory databases. Just like a database is comprised of a set of tables, a DataSet is made up of a collection of DataTable objects. Whereas a database can have relationships among its tables, along with various data integrity constraints on the fields of the tables, so too can a DataSet have relationships among its DataTables and constraints on its DataTables' fields.

Unlike the DataReader, a DataSet is a provider-neutral data object. There's no SqlDataSet or OleDbDataSet - just a plain, ol' DataSet. It's the responsibility of the provider's DataAdpater object to translate a particular provider's data into the provider-neutral DataSet. The following code snippet illustrates how to populate a DataSet with data from a SQL query.

' Establish Connection
Dim myConnection as New SqlConnection(connection string)
myConnection.Open()

' Create command
Dim myCommand as New SqlCommand(SQL query, myConnection)

' Create the DataAdapter
Dim myDataAdapter as New SqlDataAdapter(myCommand)

' Create the DataSet
Dim myDataSet as New DataSet

' Fill the DataSet
myDataAdapter.Fill(myDataSet)

' Close the connection
myConnection.Close()

'... Work with the contents of the DataSet ...

As the code snippet shows, the DataAdapter's Fill() method populates the DataSet with the results of the specified query. Behind the scenes, the DataAdapter is using a DataReader to read in the results of the query and fill the DataSet. The DataSet is a disconnected data object. Once the DataSet has been filled, the connection can be closed and the DataSet's contents can still be examined and manipulated.

Since a DataSet represents a separate, disconnected collection of data, it's no surprise that the DataSet's data is both editable and can be accessed randomly, two traits not exhibited by the DataReader. Additionally, the DataSet has some powerful XML-related capabilities. For example, you can serialize a DataSet into XML through its WriteXml() method; conversely, you can populate a DataSet from a properly formatted XML stream using the DataSet's ReadXml() method.

The Tradeoff Between the DataSet and DataReader
Regardless of whether or not you bring back database data using a DataSet or DataReader, you can display the underlying data in a DataGrid, DataList, or Repeater using the exact same code. Namely, you set the data Web control's DataSource property to the DataReader or DataSet and then call the Web control's DataBind() method. ASP.NET makes working with data so easy that ASP.NET developers might not give pause and think about what is the best data object to use. All things being equal, it doesn't really matter which data object you use.

But not all things are equal. Clearly there is a major difference in features supported between the DataReader and DataSet, so it's only logical that there be an inverse in the tradeoff between the two objects' performance. Simply put, the DataSet's increased feature set makes it a less performant choice for reading data than the DataReader.

According to A Speed Freak's Guide to Retrieving Data in ADO.NET, the DataReader is roughly thirty times more performant than the DataSet. For large amounts of data being brought back - several hundred or several thousand records - the absolute time differences between accessing data with these two objects can be quite pronounced. The graph below, for example, plots the results from A Speed Freak's Guide to Retrieving Data in ADO.NET for 100 to 1,000 retrieved records using a DataSet (the pink line) and a DataReader (the dark blue line). As the data shows, for retrieving 1,000 records the DataSet is more than 30 times slower than the DataReader (8.89 seconds vs. 0.29 seconds). Eep.


Be sure to read A Speed Freak's Guide to Retrieving Data in ADO.NET for the actual numbers for test runs between 1 to 100 records, 100 to 1,000 records, and 1,000 to 10,000 records, along with the test conditions used. In addition to examining the differences between the DataSet and DataReader, the article compares the performance of the SqlClient provider vs. the OleDb provider when accessing data from a SQL Server 2000 database.

Additional statistics can be found at Performance Comparison: Data Access Techniques, which compares the DataSet and DataReader against a number of common data access scenarios. The end result is that the DataReader is more performant than the DataSet, although this particular article's results do not show as large a performance difference between these two data objects as the Speed Freak article does. Also, the article notes:

In all of the preceding tests, we saw that DataReader outperformed DataSet. As mentioned earlier, the DataReader offers better performance because it avoids the performance and memory overhead associated with the creation of the DataSet. ... The DataReader is a better choice for applications that require optimized read-only and forward-only data access. The sooner you load the data off the DataReader, close the DataReader, and close the database connection, the better performance you get.

When is a DataSet Useful?
Despite the DataSet's performance limitations, there is a time and a place for the DataSet, otherwise it wouldn't be a core component in the .NET Framework. However, it is my contention that rarely, if ever, is the time and place for a DataSet in a Web application. In my experience, DataSets are useful in one of the two following situations:

  1. In a desktop, WinForms application. Consider a desktop-based data entry-type program. A user might fire up the program, load up the sales data from some database server, make some changes, and then want to save those changes. This is an ideal situation for the DataSet. It allows the data to be read into a DataSet residing in the client's memory, which affords the user the ability to work on the data without needing to constantly make trips back to the database. Upon completing editing the data, they can do a batch update, gracefully handling any changes that may have occurred while the user was working with the data in a disconnected state. Furthermore, since the DataSet is a disconnected data store, this data can be taken offline. A salesman traveling to a client's site could load this data and be able to review the data on his laptop while in transit, or while at the client's office.

    (A situation like this might arise in a Web application. I've worked on projects before where the client was adamant that they be able to interact with the Web application as discussed above. That is, they'd be able to visit a page, make a series of changes, and then click a single "Update" button. The underlying database data wouldn't be updated until that "Update" button was clicked. For this particular problem I used a Session-based DataSet, using the same techniques for a DataSet used in a desktop application for batch editing and updating.)

  2. For sending/receiving remote database information or for allowing communication between disparate platforms. Since a DataSet can be serialized/deserialized into XML so easily, they are a prime candidate for sending information across physical boundaries or as a means of serializing data into a platform-neutral format. For example, if you want to return database data from a Web service, one option is to simply read the database data into a DataSet and then return the DataSet from the Web service method. The DataSet will automatically be serialized into XML and sent over the wire. (Personally I don't recommend returning data from a Web service in this manner. Rather, I prefer to use custom business objects - it allows a finer degree of control over the XML serialization, provides a much lighter return payload, and appears less architecture-specific.)
Now, how often are you doing either of these things in your day-to-day ASP.NET development? Hardly ever, I'd wager, which is why you probably shouldn't be using DataSets! While the ASP.NET data Web controls are rather indifferent on what data object you use, you are suffering from a performance loss by choosing to use a DataSet.

Reasons Why You May Be Using a DataSet... and Reasons Why You Probably Shouldn't
In this article I have made a pretty blanket statement in saying, "Use DataReaders in Web applications and don't use DataSets." There are some scenarios in Web applications where it may seem like a DataSet is the only option. For example, imagine that you want to cache some database information that will be used on many pages across the site. This data may be user-specific and stored in the Session, or it may be the same across all users and therefore stored in the data cache. Regardless, a DataReader can't be cached because it is a connected data object, and connections to a database should be short-lived. That is, the absolutely last thing we want is an open connection sitting around in the cache. Therefore, if you want to have cached database data it may seem that the only option is to use a DataSet.

But it isn't the only option. You could, instead, create a class that has as its properties the database fields that you are storing in the DataSet. Then, when you wanted to cache the data, you could use a DataReader to read the query from the database and iterate through the records returned. For each record you'd create an instance of the custom class, set its properties to the field values of the query, and add the custom class to an ArrayList (or, preferably, a strongly-typed collection object). You'd then cache this collection of custom objects. Not only does this method prove more efficient, but, personally, I think it's more maintainable, as it removes the tight coupling between the database field names and accessing their values from the cached object (as with the DataSet). Furthermore, you can bind this custom collection to an ASP.NET data Web control exactly like you would a DataSet or DataReader. (See Displaying Custom Classes in a DataGrid for more information on this technique.)

Another reason you might use a DataSet is because you want to have random access to some data in order to search through the records, because these records may be used repeatedly. For example, when displaying a master/detail DataGrid where one column of a DataGrid contains the parent record and the other column contains another DataGrid with that particular row's children, it might make sense to use a DataSet to grab all of the children records from the database rather than having to requery the database for each DataGrid row's related set of children. (See An Extensive Examination of the DataGrid Web Control: Part 14 for more information on this technique. Essentially, it involves creating a DataGrid that looks similar to the one shown to the right.) The performance tradeoff here depends on how many rows are in the parent table. Since a DataSet is, roughly, 30 times slower than a DataReader, if there are more than 30 records in the parent table being displayed it probably makes sense to use a single DataSet rather than requerying for each parent record. (Ideally, a collection of custom objects would be used, as discussed earlier.)

Conclusion
In this article we examined the fundamentals of the two data access objects provided by ADO.NET: the DataReader and the DataSet. Both objects have their time and place in .NET applications but, in my opinion, DataSets are rarely, if ever, useful in ASP.NET Web applications. There are exceptions, granted, but for the majority of Web applications, DataReaders should be used exclusively.


Article written by Scott Mitchell (avaliable Here)

Sem comentários: