MSDN Documentation

Microsoft Developer Network

Introduction to DataSet

The DataSet object in ADO.NET represents a complete set of data, including tables, relationships, and constraints. It is an in-memory representation of relational data, making it ideal for disconnected data scenarios where you retrieve data from a data source, work with it without a constant connection, and then update the data source.

A DataSet can contain multiple DataTable objects, each representing a table of data. These tables can be related to each other through DataRelation objects, mirroring the structure of a relational database. Furthermore, DataSet supports constraints such as primary keys and foreign keys to enforce data integrity.

Key Benefit: The DataSet is designed for disconnected data access. This means you can fetch data, close the connection to the data source, manipulate the data in the DataSet, and then re-open the connection to send changes back to the data source.

Key Concepts

Tables

A DataSet contains a collection of DataTable objects. Each DataTable represents a single table of data, with columns defining the schema and rows containing the actual data.

A DataTable itself has a collection of DataColumn objects and a collection of DataRow objects.

Relations

DataRelation objects define how tables within a DataSet are related. These relationships are crucial for navigating between related data, similar to how foreign keys work in a database. They allow you to easily access parent or child records from a given record in a related table.

Constraints

Constraints, such as UniqueConstraint and ForeignKeyConstraint, are used to enforce data integrity within the DataSet. These constraints mimic database constraints and help maintain the validity of the data.

Creating and Populating a DataSet

You can create a new DataSet programmatically or populate it from various sources, including XML files or by executing commands against a database using DataAdapter objects.

Here's a basic example of creating a DataSet and adding a DataTable:

using System.Data;

// ...

DataSet myDataSet = new DataSet("MySampleData");

DataTable customersTable = new DataTable("Customers");

// Define columns
customersTable.Columns.Add("CustomerID", typeof(int));
customersTable.Columns.Add("CompanyName", typeof(string));
customersTable.Columns.Add("ContactName", typeof(string));

// Set primary key
customersTable.PrimaryKey = new DataColumn[] { customersTable.Columns["CustomerID"] };

// Add some rows
customersTable.Rows.Add(1, "Microsoft", "Bill Gates");
customersTable.Rows.Add(2, "Google", "Sundar Pichai");

// Add the table to the DataSet
myDataSet.Tables.Add(customersTable);

Working with Data in a DataSet

Accessing Rows and Columns

You can access specific tables, rows, and columns within a DataSet.

// Access a specific table
DataTable table = myDataSet.Tables["Customers"];

// Iterate through rows
foreach (DataRow row in table.Rows)
{
Console.WriteLine($"ID: {row["CustomerID"]}, Company: {row["CompanyName"]}");
}

// Access a specific row by primary key
DataRow specificRow = table.Rows.Find(1);
if (specificRow != null)
{
Console.WriteLine($"Found Customer: {specificRow["CompanyName"]}");
}

Filtering and Sorting

You can filter and sort the rows in a DataTable using DataView objects.

DataView view = new DataView(myDataSet.Tables["Customers"]);

// Filter rows where CustomerID is greater than 1
view.RowFilter = "CustomerID > 1";

// Sort by CompanyName
view.Sort = "CompanyName ASC";

Console.WriteLine("Filtered and Sorted Customers:");
foreach (DataRowView rowView in view)
{
Console.WriteLine($"{rowView["CustomerID"]} - {rowView["CompanyName"]}");
}

Modifying Data

Rows can be added, updated, or deleted. The DataRowState property tracks the status of a row (e.g., Added, Modified, Deleted, Unchanged).

DataRow newRow = table.NewRow();
newRow["CustomerID"] = 3;
newRow["CompanyName"] = "Amazon";
newRow["ContactName"] = "Jeff Bezos";
table.Rows.Add(newRow);

// Update a row
DataRow rowToUpdate = table.Rows.Find(2);
if (rowToUpdate != null)
{
rowToUpdate["ContactName"] = "Pichai Sundar";
}

// Delete a row
DataRow rowToDelete = table.Rows.Find(1);
if (rowToDelete != null)
{
rowToDelete.Delete();
}

Data Relationships

DataRelation objects link tables, enabling navigation between parent and child records. This is essential for representing one-to-many or one-to-one relationships.

// Assume you have an "Orders" table with "CustomerID" column
DataTable ordersTable = new DataTable("Orders");
ordersTable.Columns.Add("OrderID", typeof(int));
ordersTable.Columns.Add("CustomerID", typeof(int));
ordersTable.Columns.Add("OrderDate", typeof(DateTime));

myDataSet.Tables.Add(ordersTable);

// Add sample order data
ordersTable.Rows.Add(101, 2, new DateTime(2023, 10, 26));
ordersTable.Rows.Add(102, 2, new DateTime(2023, 10, 27));
ordersTable.Rows.Add(103, 3, new DateTime(2023, 10, 27));

// Create a relationship
DataColumn parentColumn = myDataSet.Tables["Customers"].Columns["CustomerID"];
DataColumn childColumn = myDataSet.Tables["Orders"].Columns["CustomerID"];
DataRelation relation = new DataRelation("CustomerOrders", parentColumn, childColumn);
myDataSet.Relations.Add(relation);

// Access child rows from a parent row
DataRow customerRow = myDataSet.Tables["Customers"].Rows.Find(2);
if (customerRow != null)
{
DataRow[] childOrders = customerRow.GetChildRows("CustomerOrders");
Console.WriteLine($"Orders for {customerRow["CompanyName"]}:");
foreach (DataRow order in childOrders)
{
Console.WriteLine($"- Order ID: {order["OrderID"]}, Date: {order["OrderDate"]}");
}
}

XML Support

DataSet has robust support for reading and writing data in XML format. This is invaluable for data exchange and persistence.

Tip: When writing XML, consider using WriteXmlSchema(string path) to also save the schema of the DataSet, which can be useful for re-creating the structure later.

Performance Considerations

While powerful, the DataSet can consume significant memory, especially with large datasets. Consider these points:

Conclusion

The DataSet is a versatile and powerful component in ADO.NET for managing in-memory datasets. Its ability to handle complex relational data, enforce constraints, and support disconnected scenarios makes it a cornerstone for many data-driven .NET applications. By understanding its core concepts and best practices, you can leverage it effectively to build robust and scalable applications.