Monday, 21 November 2011

What are those outliers?

Looking at the previous scatter chart, there is an outlier with a decrease in population of more than 3%... We can easily find which country this is from the CIAWF data:
Rank  Country                   Growth rate (%)  Date
229   Cook Islands              -3.20            2011 est.
230   Northern Mariana Islands  -4.00            2011 est.

There's a problem here. The CIAWF says that we should have *two* points with population decreases over 3% - for some reason we're not seeing the Northern Mariana Islands in the chart.
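
As a sanity check outside BIRT, the expected outliers can be worked out directly from the two Factbook rows quoted above. This is only a minimal Python sketch: `rows` holds just those two rows, and `charted` is a hand-typed assumption recording the single outlier visible on the chart, not anything exported from BIRT.

```python
# The two rows quoted from the Factbook: (rank, country, growth rate in %).
rows = [
    (229, "Cook Islands", -3.20),
    (230, "Northern Mariana Islands", -4.00),
]

def below(rows, threshold):
    """Return the countries whose growth rate is lower than threshold."""
    return [name for _rank, name, rate in rows if rate < threshold]

# Countries that should show up as outliers (a decrease of more than 3%).
expected = below(rows, -3.0)

# Assumption: typed in by hand from looking at the chart.
charted = ["Cook Islands"]

# Anything expected but not charted is a point the chart has dropped.
missing = [name for name in expected if name not in charted]
print(missing)
```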


Adding a filter to the chart, we can restrict the data to countries with population growth rate rank > 200 (i.e. the countries with the lowest population growth rates).
Adding a filter to restrict the data in the chart

BIRT then correctly included the Northern Mariana Islands:
Look! Northern Mariana Islands!
With the filter removed, the Northern Mariana Islands (NMI) disappeared again. With a filter of Population Growth Rank > 5, NMI wasn't on the chart, whereas > 10 correctly included it. In fact, filters with thresholds up to > 8 didn't show NMI, whereas those from > 9 upwards did. Previewing the dataset shows that NMI is present there too.
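
What makes this odd is that a rank filter should be monotonic: NMI has rank 230, so any filter of the form rank > t with t below 230 ought to keep it. Here is a small Python sketch of that expectation, using only the two rows quoted from the Factbook. The helper name `rank_filter` is mine, purely for illustration, and has nothing to do with how BIRT implements its filters.

```python
# (rank, country) for the two rows quoted from the Factbook.
rows = [(229, "Cook Islands"), (230, "Northern Mariana Islands")]

def rank_filter(rows, threshold):
    """Keep countries whose growth-rate rank is strictly greater than threshold."""
    return [name for rank, name in rows if rank > threshold]

# Thresholds tried in this post; a correct filter keeps NMI for every one of them.
kept = {
    t: "Northern Mariana Islands" in rank_filter(rows, t)
    for t in (5, 8, 9, 10, 200)
}
print(kept)
```

So a chart that shows NMI for > 9 but not for > 8 is not behaving like a plain filter over the full dataset.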

Adjusting the join in the dataset to an outer join, rather than an inner join, we get 230 records returned rather than 223. However, NMI still appears or disappears at the same filter threshold.
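
The jump from 223 to 230 records is presumably down to rows on one side of the join that have no match on the other. A rough Python sketch of the difference between the two join types follows. The `other` table and its value are hypothetical placeholders, not the real dataset, and the two functions are my own illustration rather than anything BIRT does internally.

```python
# Growth rates quoted from the Factbook, keyed by country.
growth = {
    "Cook Islands": -3.20,
    "Northern Mariana Islands": -4.00,
}

# Hypothetical second table that lacks a row for one country.
# The value 1.0 is a placeholder, not real data.
other = {
    "Cook Islands": 1.0,
}

def inner_join(left, right):
    """Keep only keys present in both tables."""
    return [(k, left[k], right[k]) for k in left if k in right]

def left_outer_join(left, right):
    """Keep every key from the left table; unmatched right values become None."""
    return [(k, left[k], right.get(k)) for k in left]

inner_rows = inner_join(growth, other)
outer_rows = left_outer_join(growth, other)
print(len(inner_rows), len(outer_rows))
```

An unmatched row survives the outer join with an empty value on the other side, which would explain the extra records but not why a row that is present in both result sets goes missing from the chart.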

So. It looks as though we're not always getting all the rows back, and for no apparent reason. At some point, this should be revisited in debug mode to see what's going on, along with a proper search through the bugs on the BIRT/DTP websites.
