Wednesday, 4 February 2009

LINQ Best Practice 2 – One Data Source Per Repository, Inner and Outer Joins in LINQ

I’ve been using LINQ a lot with my new project, in particular with the Entity Framework.  I have learned a lot about good practices with LINQ and I figured it would be a good idea to post them up here.  Contained in these posts are all those things I’ve learned and suggest could be useful to you.  They are best practices that work for me, if they work for you then great, but everyone’s solution is different.

My first post was about how you can use the repository pattern to increase testability and separate concerns within your code.  This second post extends on that into the data sources.

The Data Source

This is the main table that your repository is concerned with.  If at all possible you should have only one of these per repository.  The main reason for this is the testability and maintainability of your code, setting up a single data source is far more simple than setting up several.  You will also find that the code produced will be more standardised and far more simple to read.

Example Code:

public class TypeRepository : ITypeRepository
{
    private LynxEntities _Entities;
    private IQueryable<Type> _DataSource;
    
    // Implementation here.
}

One to One Inner Join

This is the single data source for my TypeRepository.  This is easy to implement with when you’re only selecting data relating to that type, but when you’re selecting data that crosses multiple tables this becomes a little more difficult.  For example if my Type table has a foreign key to the Usage table and I want to get all types that are within a particular usage I would immediately think of doing a join.  This isn’t really the LINQ way, instead you should use the generated object dependencies to your advantage:

rfType ITypeRepository.GetByUsage(string usage, string type)
{
    rfType result = (from t in _DataSource
                     where t.rfUsage.Name.Equals(usage)
                         && t.Name.Equals(type)
                     select t).FirstOrDefault();
    return result;
}

 

One to Many Inner Join

This becomes a lot more tricky when you’re dealing with a one to many reference, or even worse a many to many reference.    Say for example my Type table has many usages, the above code is not going to work because there are lots of usage records.  But you can do a simple join to ensure that data exists.  You don’t need a second data source to do a join, though most of the examples out there will tell you to do it that way, you can do it like this.

rfType ITypeRepository.GetByUsage(string usage, string type)
{
    rfType result = (from t in _DataSource
                     from u in t.rfUsage
                     where u.Name.Equals(usage)
                         && t.Name.Equals(type)
                     select t).FirstOrDefault();
    return result;
}

One to Many Left Outer Join

The above is fine if you know the dependency exists, but when if you want to get the item even if it’s usage doesn’t exist, but you also want to retrieve the usage?  The Entity Framework is not great at lazy loading through LINQ, if you need something, you really have to ask for it.  So if I were to write:

public Type Get(string type)
{
    return (from t in _DataSource
            where t.Name.Equals(type)
            select t).FirstOrDefault();
}
 
// Other Code
Type returnedType = _TypeRepository.Get("myType");
ProcessUsage(returnType.rfUsage);

The ProcessUsage method would get NULL every time I called the method, even if there were several usages available in the database.  This is because the lazy loading of L2E (LINQ to Entities) is not all that great.  So you need to outer join all the tables you want to use data from.

return (from t in _DataSource
        where t.Name.Equals(type)
 
        let u = (from uLO in t.rfUsage
                 where uLO.Name.Equals(usage)
                 select uLO)
 
        select new 
        {
            t, u
        }).FirstOrDefault().t;

The middle part is the left outer join.   We tell LINQ to get t and u from the database but then return only t.  The entity framework will get all the data and return it in the t object representing the type.  You can also use this to limit your data selection of course.

Test Cases

Now when writing your tests all you need to do to setup a source of data is:

[TestInitialize()]
public void InitialiseTest()
{
    _Items = GetTestData();
    _Target = new TypeRepository(_Items.AsQueryable());
}
 
private List<Type> GetTestData()
{
    List<Type> items = new List<Type>
    {
        new Type
        {
            TypeID = 1,
            Name = Constant.Type.Thickness,
            rfUsage = new rfUsage
            {
                UsageID = 1,
                Name = Constant.Usage.ItemSize
            }
        },
        new Type
        {
            TypeID = 1,
            Name = Constant.Type.Weight,
            rfUsage = new rfUsage
            {
                UsageID = 1,
                Name = Constant.Usage.ItemSize
            }
        }
    };
 
    return items;
}

If you had multiple data sources (like one for type and one for rfusage) you would have to set each of them up individually and the links between them would not be setup in the same way as the links setup in the Entity Framework.

No comments: