LINQ: Select records but enforce one field to be unique [CSharp]


From: Rob on 6 Mar 2010 23:35

On Sat, 06 Mar 2010 22:46:24 -0500, "Mr. Arnold" <Arnold(a)Arnold.com>
wrote:

>Rob wrote:
>> When selecting records by forcing one field to be unique, is it
>> necessary to use the .Distinct(...) function with an external
>> comparator?
>>
>> EX: Say you had a list of purchase records...multifields. You want to
>> get -just- a list of people who had bought items. The 'purchaser'
>> field must be unique, so you don't just get back the entire list, but
>> you want to retrieve all fields in the record, so you can't just use
>> .Distinct(). (with no args)
>>
>> I've seen examples that require coding an external class, creating an
>> instance, then passing that object to .Distinct(...) but this seems
>> unduly complex. Is there a simpler way? I have Calvert's book, but
>> that doesn't seem to be covered. I thought that this would be built
>> in.
>
>I don't know. If it were me, I might try this.
>
>
>var list = new List<T>();
>
>var distinctlist= (from a in resultset.distinct() select
>a.purchaser).tolist();
>
>foreach(var distinct in distinctlist)
>{
> var dist = (from a in resultset.Where(a => a.purchaser == distinct
>select a).first();
>
>list.add(dist);
>}

Ah, so my original query was understandable. Yeah, I believe that
will work! I should have thought of that, but was thinking in terms
of a single step.

And apparently there is no built-in function for enforcing uniqueness
of a given field. I'm surprised.

Thanks, Mr. Arnold.

From: Peter Duniho on 7 Mar 2010 00:15

Mr. Arnold wrote:
> [...]
> var list = new List<T>();
>
> var distinctlist= (from a in resultset.distinct() select
> a.purchaser).tolist();
>
> foreach(var distinct in distinctlist)
> {
> var dist = (from a in resultset.Where(a => a.purchaser == distinct
> select a).first();
>
> list.add(dist);
> }

Even not counting the compile errors, I don't see how the above is
useful. It will result in a new enumeration that has the same number of
elements as the original, but with each element in the enumeration
simply being a reference to the first element found for each key.

For example, a collection that looks something like this:

{
{ "One", 1 },
{ "One", 2 },
{ "Two", 3 },
{ "Two", 4 },
{ "Two", 5 },
{ "Two", 6 },
{ "Three", 7 },
{ "Three", 8 },
{ "Three", 9 }
}

will get converted to this:

{
{ "One", 1 },
{ "One", 1 },
{ "Two", 3 },
{ "Two", 3 },
{ "Two", 3 },
{ "Two", 3 },
{ "Three", 7 },
{ "Three", 7 },
{ "Three", 7 }
}

I'm almost certain the OP doesn't want that. It's hard for me to see
why anyone would. A somewhat more realistic interpretation of the
original question, though one I'm still skeptical of, is that the
original example above should map to this:

{
{ "One", 1 },
{ "Two", 3 },
{ "Three", 7 },
}

In other words, each unique key is represented once, with the row being
an arbitrarily chosen (e.g. first) element from each matching key. If
we take that as the goal, then the code above could be fixed by passing
an appropriate IEqualityComparer<T> to the Distinct() method to filter
the results only on the key, rather than the whole record.
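
For reference, here is a rough sketch of the loop-based approach with the compile errors fixed. It assumes a hypothetical Purchase type (not from the thread), and it de-duplicates the keys by projecting them first with Select(...).Distinct() rather than by passing a comparer, which is a swapped-in variation on the fix described above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type, used only for illustration.
public class Purchase
{
    public string Purchaser { get; set; }
    public int Item { get; set; }
}

public static class LoopApproach
{
    // Returns the first record found for each distinct purchaser.
    public static List<Purchase> FirstPerPurchaser(IEnumerable<Purchase> resultset)
    {
        var list = new List<Purchase>();

        // Project to the key first, then de-duplicate the keys.
        var purchasers = resultset.Select(a => a.Purchaser).Distinct().ToList();

        foreach (var purchaser in purchasers)
        {
            // One extra scan of the source per key.
            list.Add(resultset.First(a => a.Purchaser == purchaser));
        }

        return list;
    }
}
```

Each key triggers another scan of the source, so the cost grows with the number of records times the number of distinct keys.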

But if that's the goal, the above code is not very efficient (this is
especially true if the source is a real database, where each key results
in a whole new query to the database). The following should work, and
more importantly should be much more efficient:

    var result = resultset
        .GroupBy(a => a.purchaser)
        .Select(d => d.First());

or alternatively:

    var result = from d in
                     (from a in resultset
                      group a by a.purchaser)
                 select d.First();

or (yet another alternative):

    var result = from a in resultset
                 group a by a.purchaser into grouped
                 select grouped.First();

Of course any of the above results in selecting an arbitrary,
essentially random "representative" from each group of records, which
while a reasonable guess as to the original question's intent still
doesn't seem that useful to me.
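
If an arbitrary representative is not good enough, the grouped query can pick one explicitly. A minimal sketch, assuming a hypothetical PurchaseRecord type with a Date property (neither appears in the thread), that keeps each purchaser's most recent record:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type, used only for illustration.
public class PurchaseRecord
{
    public string Purchaser { get; set; }
    public DateTime Date { get; set; }
}

public static class Representative
{
    // One record per purchaser: the one with the latest Date.
    public static List<PurchaseRecord> LatestPerPurchaser(IEnumerable<PurchaseRecord> resultset)
    {
        return resultset
            .GroupBy(a => a.Purchaser)
            .Select(g => g.OrderByDescending(a => a.Date).First())
            .ToList();
    }
}
```

For in-memory sequences, Enumerable.GroupBy yields the elements of each group in source order, so a plain First() there returns the earliest record in the source. A database provider makes no such promise unless the query asks for an ordering.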

Hopefully the OP can clarify their question, so that we can stop
guessing as to what they're really trying to do. :)

Pete

From: Peter Duniho on 7 Mar 2010 00:29

Rob wrote:
> [...]
> And apparently there is no built-in function for enforcing uniqueness
> of a given field. I'm surprised.

There is. It's the Distinct() overload that has an IEqualityComparer<T>
parameter.
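
Used directly, that overload needs a small comparer class. A minimal sketch, assuming a hypothetical Order type and a hypothetical PurchaserComparer (both names are made up for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type, used only for illustration.
public class Order
{
    public string Purchaser { get; set; }
    public int Item { get; set; }
}

// Treats two orders as equal when their Purchaser fields match.
public class PurchaserComparer : IEqualityComparer<Order>
{
    public bool Equals(Order x, Order y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Purchaser == y.Purchaser;
    }

    public int GetHashCode(Order obj)
    {
        // Must agree with Equals: hash only the field being compared.
        return obj.Purchaser == null ? 0 : obj.Purchaser.GetHashCode();
    }
}

public static class DistinctOverload
{
    // Keeps one record per purchaser using Distinct(IEqualityComparer<T>).
    public static List<Order> OnePerPurchaser(IEnumerable<Order> orders)
    {
        return orders.Distinct(new PurchaserComparer()).ToList();
    }
}
```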

I think that GroupBy() really is what you want. It would be much more
efficient than repeatedly querying the original collection. However, if
you insist on using the Distinct() method, you may find this little
helper class useful:

    // Namespaces needed by this class and by the sample code further down.
    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class EqualityComparerFactory
    {
        // 'e' is never read; it exists only so the compiler can infer T.
        public static IEqualityComparer<T> ComparerForEnumerable<T>(
            IEnumerable<T> e, Func<T, T, bool> equals, Func<T, int> hash)
        {
            return new CustomEqualityComparer<T>(equals, hash);
        }

        public static IEqualityComparer<T> Comparer<T>(
            Func<T, T, bool> equals, Func<T, int> hash)
        {
            return new CustomEqualityComparer<T>(equals, hash);
        }

        private class CustomEqualityComparer<T> : IEqualityComparer<T>
        {
            private Func<T, T, bool> _equals;
            private Func<T, int> _hash;

            public CustomEqualityComparer(Func<T, T, bool> equals,
                Func<T, int> hash)
            {
                _equals = equals;
                _hash = hash;
            }

            #region IEqualityComparer<T> Members

            public bool Equals(T t1, T t2)
            {
                return _equals(t1, t2);
            }

            public int GetHashCode(T t)
            {
                return _hash(t);
            }

            #endregion
        }
    }

Here's some sample code showing how to use it (incorporating the basic
code provided by Mr. Arnold, just adding the comparer):

    static void Main(string[] args)
    {
        Random rnd = new Random();
        string[] rgstr = { "One", "Two", "Three" };

        var source = (from i in Enumerable.Range(0, 20)
                      select new { Key = rgstr[rnd.Next(rgstr.Length)], Value = i }).ToArray();

        var result0 = GetEmptyList(source);
        var comparer = EqualityComparerFactory.ComparerForEnumerable(source,
            (x1, x2) => x1.Key.Equals(x2.Key),
            x => x.Key.GetHashCode());

        var distinctlist = (from a in source.Distinct(comparer)
                            select a.Key).ToList();
        foreach (var distinct in distinctlist)
        {
            var dist = (from a in source.Where(a => a.Key == distinct)
                        select a).First();

            result0.Add(dist);
        }
    }

    static List<T> GetEmptyList<T>(IEnumerable<T> enumerable)
    {
        return new List<T>();
    }

The first factory method ComparerForEnumerable() is written to take the
IEnumerable<T> you're going to be working with, so that the generic type
can be inferred for the method arguments and the IEqualityComparer<T>
to return. This is necessary when dealing with anonymous types. Of
course, if you are always dealing with named types, you can just use the
Comparer() factory method instead, specifying the type parameter explicitly.
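
For the named-type case, a call to Comparer() might look like the sketch below. It assumes a hypothetical Customer type and non-null Name values, and it repeats a trimmed copy of the factory so that it compiles on its own:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical named type, used only for illustration.
public class Customer
{
    public string Name { get; set; }
    public int OrderId { get; set; }
}

// Trimmed copy of the factory from the post: only the Comparer() method.
public static class EqualityComparerFactory
{
    public static IEqualityComparer<T> Comparer<T>(
        Func<T, T, bool> equals, Func<T, int> hash)
    {
        return new CustomEqualityComparer<T>(equals, hash);
    }

    private class CustomEqualityComparer<T> : IEqualityComparer<T>
    {
        private readonly Func<T, T, bool> _equals;
        private readonly Func<T, int> _hash;

        public CustomEqualityComparer(Func<T, T, bool> equals, Func<T, int> hash)
        {
            _equals = equals;
            _hash = hash;
        }

        public bool Equals(T t1, T t2) { return _equals(t1, t2); }
        public int GetHashCode(T t) { return _hash(t); }
    }
}

public static class NamedTypeUsage
{
    public static List<Customer> OnePerName(IEnumerable<Customer> customers)
    {
        // The type argument is written out: the lambdas alone give the
        // compiler nothing to infer T from.
        var comparer = EqualityComparerFactory.Comparer<Customer>(
            (x, y) => x.Name == y.Name,
            x => x.Name.GetHashCode());

        return customers.Distinct(comparer).ToList();
    }
}
```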

Pete
