DotNet First Side: Applying LINQ

LINQ is really the centerpiece of a wide range of data access technologiesfrom datasets to databases, text files to XML documents, objects to object graphs. The extensibility of LINQ is one of its most elegant features, allowing LINQ to query just about any enumerable data source.

In this section we'll look at some of the more interesting ways you can use LINQ to query DataSets, databases, XML documents, text files, and more.

LINQ to DataSets

LINQ supports the querying of both typed and untyped DataSets. Expanding on our theme of a hospital scheduling application, suppose we have a namespace DataSets with a typed dataset SchedulingDocs containing three tables: Doctors, Calls, and Vacations. The Doctors table contains one record for each doctor, the Calls table keeps track of which doctor is on call each day, and the Vacations table makes note of vacation requests. The DataSet is summarized in Figure 2.

Tables in the SchedulingDocs typed DataSet

Let's assume an instance of SchedulingDocs has been created and filled:

DataSets.SchedulingDocs ds = new DataSets.SchedulingDocs(); // create dataset

. // open connection to a database and fill each table?

To find all the doctors living within Chicago, the query is exactly as we've seen before:

var chicago = from d in ds.Doctors

where d.City == "Chicago"

select d;

In this case ds.Doctors denotes a DataTable object, and d represents a DataRow object. Since the DataSet is typed, these objects are strongly-typed as DoctorsTable and DoctorsRow, respectively.

NOTE

For completeness, here's the code to create and fill an instance of SchedulingDocs, declared within the DataSets namespace:

using System.Data.Common;

using DataSets;

SchedulingDocsTableAdapters.DoctorsTableAdapter doctorsAdapter;

SchedulingDocsTableAdapters.CallsTableAdapter callsAdapter;

SchedulingDocsTableAdapters.VacationsTableAdapter vacationsAdapter;

SchedulingDocs ds;

DbProviderFactory dbFactory;

dbFactory = DbProviderFactories.GetFactory(providerInfo);

ds = new SchedulingDocs();

doctorsAdapter = new SchedulingDocsTableAdapters.DoctorsTableAdapter();

callsAdapter = new SchedulingDocsTableAdapters.CallsTableAdapter();

vacationsAdapter = new SchedulingDocsTableAdapters.VacationsTableAdapter();

using (DbConnection dbConn = dbFactory.CreateConnection())

{

dbConn.ConnectionString = connectionInfo;

dbConn.Open();

doctorsAdapter.Fill(ds.Doctors);

callsAdapter.Fill(ds.Calls);

vacationsAdapter.Fill(ds.Vacations);

} //dbConn.Close();

LINQ supports the notion of joins, including inner and outer joins. For example, let's join the Doctors and Calls tables to see which doctors are scheduled in the month of October 2006:

var oct2006 = (

from d in ds.Doctors

join c in ds.Calls

on d.Initials equals c.Initials

where c.DateOfCall >= new DateTime(2006, 10, 1) &&

c.DateOfCall <= new DateTime(2006, 10, 31)

orderby d.Initials

select d.Initials

)

.Distinct();

This query expression uses a number of standard LINQ query operators, including Join, OrderBy, and Distinct; Join implements an inner equijoin.

LINQ is not limited to yielding tabular data, but will produce hierarchical results as appropriate. For example, suppose we want to know not only which doctors are on call in October 2006, but also the dates. In this case, we join the Doctors and Calls tables, now grouping the results:

var oct2006 = from d in ds.Doctors

join c in ds.Calls

on d.Initials equals c.Initials

where c.DateOfCall >= new DateTime(2006, 10, 1) &&

c.DateOfCall <= new DateTime(2006, 10, 31)

group c by d.Initials into g

select g;

Notice we group the calls ("c") on a per doctor basis ("d"). For each scheduled doctor, this yields an enumerable collection of calls. Here's how we process the query:

foreach (var group in oct2006)

{

System.Console.WriteLine("{0}: ", group.Key);

foreach (var call in group)

System.Console.WriteLine(" {0}", call.DateOfCall.ToString("dd MMM yyyy"));

System.Console.WriteLine(" calls = {0}", group.Count());

}

The hierarchical output appears as follows:

jl:

02 Oct 2006

calls = 1

mbl:

01 Oct 2006

31 Oct 2006

calls = 2

How about we re-order the results by those working the most, and select just the data we need in the result set:

var oct2006 = from d in ds.Doctors

join c in ds.Calls

on d.Initials equals c.Initials

where c.DateOfCall >= new DateTime(2006, 10, 1) &&

c.DateOfCall <= new DateTime(2006, 10, 31)

group c by d.Initials into g

orderby g.Count() descending

select new { Initials = g.Key,

Count = g.Count(),

Dates = from c in g

select c.DateOfCall };

foreach (var result in oct2006)

{

System.Console.WriteLine("{0}:", result.Initials);

foreach (var date in result.Dates)

System.Console.WriteLine(" {0}", date.ToString("dd MMM yyyy"));

System.Console.WriteLine(" calls = {0}", result.Count);

}

By projecting just the needed data, we can do things like data-bind the result for ease of display. The trade-off is that this potentially requires another set of custom objects to be instantiated. However, keep in mind that the design of LINQ enables the query operators to optimize away unnecessary object creation, much like compilers routinely eliminate unneeded code. This is especially true when applying LINQ in other situations, such as against a database (i.e., "LINQ to SQL").

5.2. LINQ to SQL

Instead of executing our queries against a DataSet, suppose we want to execute against the database directly? No problem. With LINQ to SQL, we change only the target of our query:

Databases.SchedulingDocs db = new Databases.SchedulingDocs(connectionInfo);

var oct2006 = ( // find all doctors scheduled for October 2006:

from d in db.Doctors

join c in db.Calls

on d.Initials equals c.Initials

where c.DateOfCall >= new DateTime(2006, 10, 1) &&

c.DateOfCall <= new DateTime(2006, 10, 31)

orderby d.Initials

select d.Initials

)

.Distinct();

In this case, SchedulingDocs is a class denoting a SQL Server 2005 database named SchedulingDocs. This class, and its associated entity classesD o ctors, Calls, and Vacationswere automatically generated by LINQ's SQLMetal tool to represent the database and its tables. As you would expect, the query expression is lazily evaluated, waiting for the query to be consumed:

foreach (var initials in oct2006) // execute the query:

System.Console.WriteLine("{0}", initials);

The query is now translated into parameterized SQL, sent to the database for execution, and the result set produced. How efficient is the generated SQL? In this case (and the May CTP of LINQ), a single select statement is executed against the database:

SELECT DISTINCT [t0].[Initials]

FROM [Doctors] AS [t0], [Calls] AS [t1]

WHERE ([t1].[DateOfCall] >= @p0) AND

([t1].[DateOfCall] <= @p1) AND

([t0].[Initials] = [t1].[Initials])

ORDER BY [t0].[Initials]

NOTE

In the May CTP of LINQ, SQLMetal is provided as a command-line tool. To run, first open a command window and cd to the install directory for LINQ (most likely C:\Program Files\LINQ Preview\Bin). Now ask SQLMetal to read the metadata from your database and generate the necessary class files. For the SQL Server 2005 database named SchedulingDocs, the command is:

C:\...\Bin>sqlmetal /server:. /database:SchedulingDocs

/language:csharp /code:SchedulingDocs.cs /namespace:Databases

This will generate a source code file SchedulingDocs.cs in the current directory.

Let's look at a more complex query that computes the number of calls for every doctor in the month of October 2006. Recall that an inner join produces results for only those doctors that are working:

var oct2006 = from d in db.Doctors

join c in db.Calls

on d.Initials equals c.Initials

where c.DateOfCall >= new DateTime(2006, 10, 1) &&

c.DateOfCall <= new DateTime(2006, 10, 31)

group c by d.Initials into g

select new { Initials = g.Key, Count = g.Count() };

An outer join is needed to capture the results for all doctors, whether scheduled or not. Outer joins are based on LINQ's join ... into syntax:

var allOct2006 = from d1 in db.Doctors // join all doctors

join d2 in oct2006 // with those working in Oct 2006

on d1.Initials equals d2.Initials

into j

from r in j.DefaultIfEmpty()

select new { Initials = d1.Initials,

Count = (r == null ? 0 : r.Count) };

This left outer join produces a result set "into" j, which is then enumerated across using the sub-expression "from r in ...". If a given doctor is working, then r is the joined result; if the doctor is not working then the result is empty, in which case j.DefaultIfEmpty() returns null. For each doctor, we then project their initials and the number of calls they are workingeither 0 or the count from the inner join. Iterating across the query:

foreach (var result in allOct2006)

System.Console.WriteLine("{0}: {1}", result.Initials, result.Count);

Yields:

ay: 7

bb: 0

ch: 3

When the query is executed, the following SQL is sent to the database (this is a test intended for the SQL wizards in the audience):

SELECT [t7].[Initials], [t7].[value] AS [Count]

FROM (

SELECT

(CASE

WHEN [t4].[test] IS NULL THEN 0

ELSE (

SELECT COUNT(*)

FROM [Doctors] AS [t5], [Calls] AS [t6]

WHERE ([t4].[Initials] = [t5].[Initials]) AND

([t6].[DateOfCall] >= @p0) AND

([t6].[DateOfCall] <= @p1) AND

([t5].[Initials] = [t6].[Initials])

)

END) AS [value], [t0].[Initials]

FROM [Doctors] AS [t0]

LEFT OUTER JOIN (

SELECT 1 AS [test], [t3].[Initials]

FROM (

SELECT [t1].[Initials]

FROM [Doctors] AS [t1], [Calls] AS [t2]

WHERE ([t2].[DateOfCall] >= @p0) AND

([t2].[DateOfCall] <= @p1) AND

([t1].[Initials] = [t2].[Initials])

GROUP BY [t1].[Initials]

) AS [t3]

) AS [t4] ON [t0].[Initials] = [t4].[Initials]

) AS [t7]

NOTE

In all fairness, it should be noted that the focus of this Short Cut is LINQ, and not particularly LINQ to SQL. For this reason, the picture painted of LINQ to SQL is quite superficial. For example, in LINQ to SQL, queries are translated to SQL and executed with SQL semantics. In comparison, most other LINQ queries are directly executed by .NET Framework objects with CLR semantics. Likewise, LINQ to SQL handles changes to the data quite differently than other flavors of LINQ. For more details, we encourage you to read the forthcoming Part 2 of this Short Cut series on LINQ (expected fall 2006), which focuses exclusively on LINQ to SQL. Watch for an announcement at http://oreilly.com

5.3. Create, Read, Update, and Delete with LINQ

Most of the discussion thus far has focused on data querying, and not data modification. Don't be mistaken, LINQ provides full support for read/write data access, commonly referred to as CRUD.

Interestingly, while data is read using an SQL-like query language, data modification is approached using more traditional, object-oriented mechanisms. For example, to schedule the doctor mbl for call on November 30, 2006 in our SchedulingDocs database, we do two things. First, we add a new row to the object representing the Calls table:

db.Calls.Add( new Databases.Calls{Initials="mbl",

DateOfCall=new DateTime(2006, 11, 30} );

Second, we flush the change back to the database:

db.SubmitChanges();

The first step makes a local, in-memory change only; the second step is what triggers the underlying SQL (or stored procedure) to update the database. LINQ to SQL will automatically generate the appropriate SQL for inserts, updates, and deletes, or interoperate with your custom stored procedures.

To delete a call, we find the corresponding object, remove it from the table, and update the database:

var del = from c in db.Calls

where c.Initials == "mbl" && c.DateOfCall == new DateTime(2006, 11, 30)

select c;

foreach (var c in del)

db.Calls.Remove(c);

db.SubmitChanges();

Since there is at most one doctor on call for any given day, we know the above query will return exactly one record. In this case we can use the standard query operator Single, passing a lambda expression for the search criteria:

var call = db.Calls.Single( c => c.Initials == "mbl" &&

c.DateOfCall == new DateTime(2006, 11, 30) );

db.Calls.Remove(call);

db.SubmitChanges();

If it's possible that the search may fail, use the query operator SingleOrDefault, and check the query result for null.

Finally, to update existing data, the approach is (1) query to find the corresponding objects, (2) update those objects, and (3) flush the changes. For example, if the doctor ay's pager number changes to 53301, we update the database as follows:

var ay = db.Doctors.Single( d => d.Initials == "ay" );

ay.PagerNumber = 53301;

db.SubmitChanges();

The same logic applies to other LINQ scenarios, such as XML documents and DataSets. For example, with the typed DataSet SchedulingDocs (see Figure 2), scheduling a doctor on call is simply a matter of adding a new row to the Calls table:

ds.Calls.AddCallsRow( "mbl", new DateTime(2006, 11, 30) );

Much like the database objects, DataSets are a local, in-memory collection of objects. To persist your changes, an updated DataSet must be written to some durable medium, such as the file system or a database:

dbConn.Open();

callsAdapter.Update( ds.Calls );

dbConn.Close();

Here we re-open the connection, update the database to match, and close the connectionthe equivalent of db.SubmitChanges(). The key difference is that in the case of LINQ to SQL, the SQLMetal tool generates the necessary code to update the underlying database. In the case of DataSets and XML documents (and other flavors of LINQ), it's typically our responsibility to load the data, and consequently to persist it back. 5.4. LINQ to XML

From its beginnings, LINQ was designed to manipulate XML data as easily as it manipulates relational data. LINQ to XML represents a new API for XML-based development, equivalent in power to XPath and XQuery yet far simpler for most developers to use.

For example, let's assume the data source for our hospital scheduling application is an XML document stored in the file SchedulingDocs.xml. Here's the basic structure of the document:

<?xml version="1.0" standalone="yes"?>

<Calls>

<Call>

</Call>

</Calls>

<EmailAddress>ayang@uhospital.edu</EmailAddress>

<StreetAddress>1400 Ridge Ave.</StreetAddress>

<City>Evanston</City>

</Doctor>

</Doctors>

</Vacation>

</Vacations>

</SchedulingDocs>

Using LINQ, we load this document as follows:

import System.Xml.XLinq; // LINQ to XML

XElement root, calls, doctors, vacations;

root = XElement.Load("SchedulingDocs.xml");

calls = root.Element("Calls");

doctors = root.Element("Doctors");

vacations = root.Element("Vacations");

We now have access to the three main elements of the XML document: calls, doctors, and vacations. To select all the doctors, it's a simple query expression:

var docs = from doc in doctors.Elements()

select doc;

And to find just those doctors living in Chicago:

var chicago = from doc in doctors.Elements()

where doc.Element("City").Value == "Chicago"

orderby doc.Element("Initials").Value

select doc;

As you can see, querying XML documents with LINQ is conceptually the same as that of relational databases, DataSets, and other objects. The difference is that the structure of the XML document must be taken into account, e.g., in this case the document's hierarchical design and its use of elements over attributes.

An important aspect of LINQ is the ability to easily transform data into other formats. In the world of XML, transformation is commonplace given the need to create XML documents as well as translate from one schema to another. For example, suppose we need to produce a new XML document containing just the names of the doctors, with their initials as an attribute:

<?xml version="1.0" standalone="yes"?>

<Doctor Initials="bb">Boswell, Bryan</Doctor>

<Doctor Initials="lg">Goldstein, Luther</Doctor>

</Doctors>

This document is easily produced by the following query, which simply projects new XElements:

var docs = from doc in doctors.Elements()

orderby doc.Element("FamilyLastName").Value,

doc.Element("GivenFirstName").Value

select new XElement("Doctor",

new XAttribute("Initials", doc.Element("Initials").Value),

doc.Element("FamilyLastName").Value +

", " +

doc.Element("GivenFirstName").Value);

XElement newroot = new XElement("Doctors", docs);

The last statement creates the root element <Doctors>, using the query to generate the <Doctor> sub-elements.

Finally, here's a real-world example of translating a text-based IIS logfile into an XML document. This example comes from a series of posts to the MSDN LINQ Project General Forum, "Transforming a TXT file into XML with LINQ to XML," http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=574140&SiteID=1. The logfile contains 0 or more lines of the form:

Time IP-address Method URI Status

For example:

#Software: Microsoft Internet Information Services 5.1

#Version: 1.0

#Date: 2006-06-23 12:37:18

#Fields: time c-ip cs-method cs-uri-stem sc-status

12:37:18 127.0.0.1 GET /cobrabca 404

12:37:25 127.0.0.1 GET /cobranca 401

Here's the LINQ query to produce an XML document from such a log:

var logIIS = new XElement("LogIIS",

from line in File.ReadAllLines("file.log")

where !line.StartsWith("#")

let items = line.Split(' ')

select new XElement("Entry",

new XElement("Time", items[0]),

new XElement("IP", items[1]),

new XElement("Url", items[3]),

new XElement("Status", items[4])

)

);

LINQ over a text file? The next section will discuss this, and other examples, in more detail.

NOTE

The general focus of this Short Cut precludes an in-depth treatment of LINQ to XML. For more details, we encourage you to read the forthcoming Part 3 of this Short Cut series (expected Q4 2006), which focuses exclusively on LINQ to XML. Watch for an announcement at http://oreilly.com 5.5. LINQ to IEnumerable

One of the elegant design aspects of LINQ is that queries can be executed against any enumerable data source. If an object implements IEnumerable, then LINQ can access the data behind that object. For example, suppose we need to search the current user's My Documents folder (and sub-folders) for all non-system files modified in the last hour. Using LINQ we do this as follows:

using SIO = System.IO;

string[] files;

string mydocs;

mydocs = Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments);

files = SIO.Directory.GetFiles(mydocs, "*.*", SIO.SearchOption.AllDirectories);

var query = from file in files

let lasthour = DateTime.Now.Subtract(new TimeSpan(0, 1, 0, 0))

where SIO.File.GetLastAccessTime(file) >= lasthour &&

(SIO.File.GetAttributes(file) & SIO.FileAttributes.System) == 0

select file;

Notice the presence of the let statement, which allows for the definition of values local to the query; let is used to improve readability and efficiency by factoring out common operations.

This example should not be very surprising, since what we are really doing is iterating across an array of filenames ("files"), not the file system itself. But this is a design artifact of the .NET Framework, not a limitation of LINQ. A similar example is the searching of a file, which is easily done in LINQ by iterating across the lines of the file:

string filename = ...; // file to search

var lines = from line in SIO.File.ReadAllLines(filename)

where line.Contains("class")

select line;

In this example, we are reading all the lines into an array and then searching the array to select those lines containing the character sequence "class."

Need to search the Windows event log? An event log is a collection of EventLogEntry objects, and is accessed in .NET by creating an instance of the EventLog class. For example, here's how we gain access to the Application event log on the current machine:

using SD = System.Diagnostics;

SD.EventLog applog = new SD.EventLog("Application", ".");

Suppose we need to find all events logged by our application for a particular user. This is easily expressed as a LINQ query:

string appname = ...; // name of our application, e.g. "SchedulingApp"

string username = ...; // login name for user, e.g. "DOMAIN\\hummel"

var entries = from entry in applog.Entries

where entry.Source == appname &&

entry.UserName == username

orderby entry.TimeWritten descending

select entry;

Interestingly, while this query makes perfect sense, it does not compile. The issue is that EnTRies is a pre-2.0 collection, which means it implements IEnumerable and not IEnumerable<T>. Since IEnumerable is defined in terms of object and not a specific type T, the C# type inference engine cannot infer the type of objects the query expression is working with. The designers of LINQ provide an easy solution in the form of the standard query operator Cast. The Cast operator wraps a generic enumerable object with a type-specific one suitable for LINQ:

var entries = from entry in applog.Entries.Cast<SD.EventLogEntry>()

where entry.Source == appname &&

entry.UserName == username

orderby entry.TimeWritten descending

select entry;

The Cast operator allows LINQ to support .NET 1.x collections.

As a final example, let's apply LINQ to the world of Visual Studio Tools for Office (VSTO). To search a user's Outlook contacts, the basic query is as follows:

Outlook.MAPIFolder folder = this.ActiveExplorer().Session.

GetDefaultFolder( Outlook.OlDefaultFolders.olFolderContacts );

var contacts = from contact in folder.Items.OfType<Outlook.ContactItem>()

where ... // search criteria, e.g. contact.Email1Address != null

select contact;

We take advantage of LINQ's OfType query operator, which (a) wraps pre-2.0 collections and (b) filters the collection to return only those objects of the desired type. Here's a query to collect all distinct email addresses from a user's Outlook contacts:

var emails = (

from contact in folder.Items.OfType<Outlook.ContactItem>()

where contact.Email1Address != null

select contact.Email1Address

)

.Distinct();

Finally, given collections of email addresses from different folders or users, we can use LINQ to perform various set operations over these collections:

var union = emails.Union(emails2);

var intersection = emails.Intersect(emails2);

var difference = emails.Except(emails2);

DotNet First Side

Wednesday, February 13, 2008

Applying LINQ - LINQ to DataSets

LINQ to DataSets

Tables in the SchedulingDocs typed DataSet

5.2. LINQ to SQL

5.3. Create, Read, Update, and Delete with LINQ

No comments:

Post a Comment

Dotnet-Interviews

Blog Archive

Blog Roll