Median is described as the numerical value separating the higher half from the lower half of a
sample, a population or a probability distribution.
The median value of a list of numbers can be found by arranging all the observations from lowest value to the highest value and picking the middle one.
If there is an uneven number of observations, then median will be the middle value.
If there is an even number of observations, the median is then usually defined to be the mean of the two middle values.
So lets say we have this sample data:
Lot | Wafer | ResultName | Value | Comment |
8485474 | 1 | someresult | 50 | |
8485474 | 1 | someresult | 40 | |
8485474 | 1 | someresult | 30 | <- Median 1 / Median 2 (Median1 + Median 2) / 2 = Median = 30 |
8485474 | 1 | someresult | 20 | |
8485474 | 1 | someresult | 10 | |
8485474 | 2 | someresult | 60 | |
8485474 | 2 | someresult | 50 | |
8485474 | 2 | someresult | 40 | <- Median 1 |
8485474 | 2 | someresult | 30 | <- Median 2 (Median1 + Median 2) / 2 = Median = 35 |
8485474 | 2 | someresult | 20 | |
8485474 | 2 | someresult | 10 |
First we need to sort and number the resultvalues ascending.
Then we need to get the result count for each wafer.
To do this, we can use the row_number() function / count(*) function together with over / partition by.
(select l.lot, w.waferno, dr.resultvalue, row_number() over(partition by l.lot, w.waferno order by dr.resultvalue) as rownum, count(*) over(partition by l.lot, w.waferno) as resultcount from lot l inner join wafer w on l.lot = w.lot inner join results r on w.waferid = r.waferid inner join resulttype rt on r.resulttypeid = rt.resulttypeid where l.lot = 8485474 and rt.resultname = 'someresult') as medianvalues
The above query will give us this resultset:
Lot | Wafer | ResultName | Value | RowNum | ResultCount |
8485474 | 1 | someresult | 10 | 1 | 5 |
8485474 | 1 | someresult | 20 | 2 | 5 |
8485474 | 1 | someresult | 30 | 3 | 5 |
8485474 | 1 | someresult | 40 | 4 | 5 |
8485474 | 1 | someresult | 50 | 5 | 5 |
8485474 | 2 | someresult | 10 | 1 | 6 |
8485474 | 2 | someresult | 20 | 2 | 6 |
8485474 | 2 | someresult | 30 | 3 | 6 |
8485474 | 2 | someresult | 40 | 4 | 6 |
8485474 | 2 | someresult | 50 | 5 | 6 |
8485474 | 2 | someresult | 60 | 6 | 6 |
Now we can filter out the middle rows with the follwing formula:
(resultcount + 1) / 2 and (resultcount + 2) / 2)So for wafer 1:
(5 + 1) / 2 = 3 (5 + 2) / 2 = 3 (3.5 will be 3 when converted to int)And for wafer 2:
(6 + 1) / 2 = 3 (3.5 will be 3 when converted to int) (6 + 2) / 2 = 4So now that we have the two middle rows, we can just do an average of the two rows:
So for wafer 1:
30 + 30 / 2 = 30 (Row 3 + Row 3) / 2And for wafer 2:
30 + 40 / 2 = 35 (Row 3 + Row 4) / 2
Now let's put it all together:
select lot, waferno, resultcount, avg(resultvalue) as median, (resultcount + 1) / 2 as middlerow1, (resultcount + 2) / 2 as middlerow2 from (select l.lot, w.waferno, dr.resultvalue, row_number() over(partition by l.lot, w.waferno order by dr.resultvalue) as rownum, count(*) over(partition by l.lot, w.waferno) as resultcount from lot l inner join wafer w on l.lot = w.lot inner join results r on w.waferid = r.waferid inner join resulttype rt on r.resulttypeid = rt.resulttypeid where l.lot = 8485474 and rt.resultname = 'someresult') as medianvalues where rownum in((resultcount + 1) / 2, (resultcount + 2) / 2) group by lot,waferno, resultcount order by lot,wafernoOr if you prefer the new with statement:
with medianvalues as ( select l.lot, w.waferno, dr.resultvalue, row_number() over(partition by l.lot, w.waferno order by dr.resultvalue) as rownum, count(*) over(partition by l.lot, w.waferno) as resultcount from lot l inner join wafer w on l.lot = w.lot inner join results r on w.waferid = r.waferid inner join resulttype rt on r.resulttypeid = rt.resulttypeid where l.lot = 8485474 and rt.resultname = 'someresult' ) select lot, waferno, resultcount, avg(resultvalue) as median, (resultcount + 1) / 2 as middlerow1, (resultcount + 2) / 2 as middlerow2 from medianvalues where rownum in((resultcount + 1) / 2, (resultcount + 2) / 2) group by lot,waferno, resultcount order by lot,waferno
The above query will give us this resultset:
Lot | Wafer | ResultCount | Median | MiddleRow1 | MiddleRow2 |
8485474 | 1 | 5 | 30 | 3 | 3 |
8485474 | 2 | 6 | 35 | 3 | 4 |