Venn SQL



We recently had to compare 2 datasets from excel to see which records existed in sheet a , which in sheet b and which were in both. This is probably possible just in Excel is you are an excel expert (which i'm not), so i loaded the data into oracle (using a wizard in apex) so i had the contents of each sheet as a table - i could then use my SQL knowledge to produce the result i wanted.

I've excluded the real data here but mocked up an example of how the simple comparison is done.

First up lets create some test data:

create table a (col1 number);

create table b (col1 number);

insert into a values (1);
insert into a values (2);
insert into a values (3);
insert into a values (4);
insert into a values (5);

insert into b values (4);
insert into b values (5);
insert into b values (6);
insert into b values (7);
insert into b values (8);

So we have two very simple tables, some of the data is only in table a , some in table b and some is repeated in both. So how do i easily write some SQL to solve this issue - i just want a list of values and where they exist.

The answer is to use some Venn logic (i still remember drawing these diagrams at school).

To solve the problem we make use of the union,minus and intersect features of SQL

First up lets find all the rows that are only in A

select 'A ONLY' as source, col1
  from a
 where col1 in (select col1
                  from a
                minus
                select col1 from b)

Here we are saying minus b from a, then display the results from a where the column value is present in the minus subquery. We only add the addional select wrapper round it for later use so we can show the derivation of the data - if we just wanted to minus b from a we have it just in the subquery.

Next we just reverse the logic to get the B only records.

select distinct 'B ONLY' as source, col1
  from b
 where col1 in (select col1
                  from b
                minus
                select col1 from a)


The final piece of the puzzle is to make use of the intersect function to find rows that exist in both.

select 'BOTH' as source, col1
  from a
 where col1 in (select col1
                  from b
                intersect
                select col1 from a)

Now we just need to union together all those individual pieces of logic to give us our answer


select 'A ONLY' as source, col1
  from a
 where col1 in (select col1
                  from a
                minus
                select col1 from b)
union
select distinct 'B ONLY' as source, col1
  from b
 where col1 in (select col1
                  from b
                minus
                select col1 from a)
union
select 'BOTH' as source, col1
  from a
 where col1 in (select col1
                  from b
                intersect
                select col1 from a)

Which gives us the nice result below



   SOURCECOL1
1A ONLY1
2A ONLY2
3A ONLY3
4B ONLY6
5B ONLY7
6B ONLY8
7BOTH4
8BOTH5


This is surprisingly easy to do when you get your head round the logic - the 'Venn' operators are incredibly powerful within SQL and making use of the correct one in your application logic can provide huge performance benefits and help solve complex issues quite simply.

Comments