xtrm Developer Tips: February 2012

13 February, 2012

How to calculate median in MSSQL

Definition of median:
Median is described as the numerical value separating the higher half from the lower half of a
sample, a population or a probability distribution.

The median value of a list of numbers can be found by arranging all the observations from lowest value to the highest value and picking the middle one.
If there is an uneven number of observations, then median will be the middle value.
If there is an even number of observations, the median is then usually defined to be the mean of the two middle values.

So lets say we have this sample data:

Lot	Wafer	ResultName	Value	Comment
8485474	1	someresult	50
8485474	1	someresult	40
8485474	1	someresult	30	<- Median 1 / Median 2 (Median1 + Median 2) / 2 = Median = 30
8485474	1	someresult	20
8485474	1	someresult	10
8485474	2	someresult	60
8485474	2	someresult	50
8485474	2	someresult	40	<- Median 1
8485474	2	someresult	30	<- Median 2 (Median1 + Median 2) / 2 = Median = 35
8485474	2	someresult	20
8485474	2	someresult	10

First we need to sort and number the resultvalues ascending.
Then we need to get the result count for each wafer.
To do this, we can use the row_number() function / count(*) function together with over / partition by.

(select l.lot, 
          w.waferno, 
          dr.resultvalue,
          row_number() over(partition by l.lot, w.waferno order by dr.resultvalue) as rownum,
          count(*) over(partition by l.lot, w.waferno) as resultcount
  from lot l inner join
  wafer w on l.lot = w.lot inner join
  results r on w.waferid = r.waferid inner join 
  resulttype rt on r.resulttypeid = rt.resulttypeid
where l.lot = 8485474 
and rt.resultname = 'someresult') as medianvalues

The above query will give us this resultset:

Lot	Wafer	ResultName	Value	RowNum	ResultCount
8485474	1	someresult	10	1	5
8485474	1	someresult	20	2	5
8485474	1	someresult	30	3	5
8485474	1	someresult	40	4	5
8485474	1	someresult	50	5	5
8485474	2	someresult	10	1	6
8485474	2	someresult	20	2	6
8485474	2	someresult	30	3	6
8485474	2	someresult	40	4	6
8485474	2	someresult	50	5	6
8485474	2	someresult	60	6	6

Now we can filter out the middle rows with the follwing formula:

(resultcount + 1) / 2 and (resultcount + 2) / 2)

So for wafer 1:

(5 + 1) / 2 = 3
(5 + 2) / 2 = 3  (3.5 will be 3 when converted to int)

And for wafer 2:

(6 + 1) / 2 = 3 (3.5 will be 3 when converted to int)
(6 + 2) / 2 = 4

So now that we have the two middle rows, we can just do an average of the two rows:
So for wafer 1:

30 + 30 / 2 = 30  (Row 3 + Row 3) / 2

And for wafer 2:

30 + 40 / 2 = 35  (Row 3 + Row 4) / 2

Now let's put it all together:

select lot, 
       waferno, 
       resultcount, 
       avg(resultvalue) as median, 
       (resultcount + 1) / 2 as middlerow1, 
       (resultcount + 2) / 2 as middlerow2
from
  (select l.lot, 
          w.waferno, 
          dr.resultvalue,
          row_number() over(partition by l.lot, w.waferno order by dr.resultvalue) as rownum,
          count(*) over(partition by l.lot, w.waferno) as resultcount
  from lot l inner join
  wafer w on l.lot = w.lot inner join
  results r on w.waferid = r.waferid inner join resulttype rt on r.resulttypeid = rt.resulttypeid
where l.lot = 8485474 
and rt.resultname = 'someresult') as medianvalues
where rownum in((resultcount + 1) / 2, (resultcount + 2) / 2)
group by lot,waferno, resultcount
order by lot,waferno

Or if you prefer the new with statement:

with medianvalues as
(
select l.lot, 
          w.waferno, 
          dr.resultvalue,
          row_number() over(partition by l.lot, w.waferno order by dr.resultvalue) as rownum,
          count(*) over(partition by l.lot, w.waferno) as resultcount
  from lot l inner join
  wafer w on l.lot = w.lot inner join
  results r on w.waferid = r.waferid inner join resulttype rt on r.resulttypeid = rt.resulttypeid
where l.lot = 8485474 
and rt.resultname = 'someresult'
)
select lot, 
       waferno, 
       resultcount, 
       avg(resultvalue) as median, 
       (resultcount + 1) / 2 as middlerow1, 
       (resultcount + 2) / 2 as middlerow2
from medianvalues
where rownum in((resultcount + 1) / 2, (resultcount + 2) / 2)
group by lot,waferno, resultcount
order by lot,waferno

The above query will give us this resultset:

Lot	Wafer	ResultCount	Median	MiddleRow1	MiddleRow2
8485474	1	5	30	3	3
8485474	2	6	35	3	4

09 February, 2012

Formatted TimeDiff UDF for MSSQL

I just created a UDF function for displaying a formatted time diff value in MSSQL.
Why ? I googled this and found a lot of solutions that did not work :(
So i just had to look into it.

I wanted the time difference to be displayed in the following format:
HHH:MM:SS (@type = 0)
Where HHH = Total number of hours, so if the difference is one week,
then this value will be 168. (168:00:00).

I also created another variant:
D MM:HH:SS (@type = 1)
Where D = Total number of days, so if the difference is one week,
then this value will be 7. (7 00:00:00).

create function [dbo].[TimeDiff](@date1 datetime, @date2 datetime, @type smallint)
returns varchar(50)
as
begin
declare @days int
declare @totalhours int
declare @hours int
declare @minutes int
declare @seconds int
declare @ret varchar(50)
set @seconds = datediff(ss, @date1,@date2) 
set @totalhours = @seconds / 3600 
set @days = @totalhours / 24 
set @hours = @totalhours - (@days * 24)
set @minutes = ((@seconds - (3600 * @totalhours)) / 60)
set @seconds = ((@seconds - (3600 * @totalhours) - (60 * @minutes)))
if(@type = 0)
begin
    --HHH:MM:SS 
    set @ret = case when @totalhours < 10 then '0' + convert(varchar(20), @totalhours) else convert(varchar(20), @totalhours) end + ':' + right('0' + convert(varchar(20), @minutes), 2) + ':' + right('0' + convert(varchar(20), @seconds), 2)
end
else
begin
    --D HH:MM:SS
    set @ret = convert(varchar(20), @days) + ' ' + right('0' + convert(varchar(20), @hours), 2) + ':' + right('0' + convert(varchar(20), @minutes), 2) + ':' + right('0' + convert(varchar(20), @seconds), 2)
end
return @ret
end

07 February, 2012

Custom multiply aggregation function in AS2008

In a recent project of mine i needed a custom aggregation function that needed to multiply som yield numbers of a specific measure in a given dimension.

My yield numbers are actually stored as a number between 0 and 100. And not between 0 and 1 which i guess are more common when expressed as percentage.
The reason for this is some limitations in our 3rd party reporting charts.
Any way this is how i made the custom aggregation function that will work with both variants:

Create a c# class library.
Add a reference to this dll (ADOMD.Net interfaces):

C:\Program Files\Microsoft Analysis Services\AS OLEDB\10\msmgdsrv.dll

Add a class named ASUtils with the following content:

namespace ASUtils
{
    using System;
    using Microsoft.AnalysisServices.AdomdServer;

    public class Multiplication
    {
        [SafeToPrepare(true)]
        public static Nullable<double> Multiply(Set inputSet, Expression inputExpression, double divider, bool excludeZeroValues)
        {            
            Nullable<double> num = null;
            bool first = true;
            foreach (Tuple tuple in inputSet.Tuples)
            {
                MDXValue mdxValue = inputExpression.Calculate(tuple);
                bool isEmpty = (mdxValue.ToChar() == 0x00);
                if (!isEmpty)
                {
                    double value = mdxValue.ToDouble();
                    if (value != 0 || (value == 0 && excludeZeroValues == false))
                    {
                        if (first == true)
                        {
                            num = value / divider;
                            first = false;
                        }
                        else
                        {
                            num *= value / divider;
                        }
                    }
                }
            }

            return num * divider;
        }
    }
}

I did not find any built in functions that could check if a value was empty or not, but this seems to work:

bool isEmpty = (mdxValue.ToChar() == 0x00);

Any other solutions out there ?

Build the project, then in your AS2008 cube project, add a reference to the new class library.
In your cube, under calculations. Add a new calculated member named YieldM with this code:

ASUtils.Multiply([Process].[ProcessNoGroup].Children,[Measures].[Yield], 100.0, false)

[Process] = Dimension
[ProcessNoGroup] = Hierarchy
[Measures].[Yield] = Measure containg numbers between 0 and 100.

If the measure contains numbers beteen 0 and 1, then you can change the multiplyer to 1 like:

ASUtils.Multiply([Process].[ProcessNoGroup].Children,[Measures].[Yield], 1.0, false)

Empty values are always eliminated from the multiply.
You can also exclude 0 (zero) values by changing the last parameter to true.

So if we now look at the yield measure over time and process, we have this test data:

Time Dimension	Process Dimension	Yield
W01	Process1	94.88
W01	Process2	96.42
W01	Process3	95.03
W02	Process1	98.86
W02	Process2	96.42
W02	Process3	94.03

When we aggregate this by week and look at our calculated YieldM measure:

Time Dimension	YieldM	Formula
W01	86.93	(94.8896.4295.03)
W01	89.63	(98.8696.4294.03)

And voila, our custom aggregation is working as it should.