Friday, 18 November 2011

Getting CIAWF data into the reports using a Scripted Dataset

To create a scripted dataset to use with BIRT we need to:
  1. Create a scripted data source
  2. Create a scripted dataset
  3. Add code to the dataset for:
    1. Describing the columns in the dataset
    2. Opening the file
    3. Fetching a record
    4. Closing the file
Creating the data source and the dataset is easy. For the data source: right-click Data Sources in the report; select "New Data Source"; select "Scripted Data Source"; give it a name; finish. It's pretty much the same for the dataset.

We then need code against the dataset for the various I/O operations.

The first file available from the CIA World Factbook country comparisons, "rawdata_2001.txt", holds GDP by country. Its columns are: GDP rank, country name and GDP in US dollars.

In the dataset describe event, we need to return the column names and column types:

try {
    java.lang.System.out.println("desc:"+this.getDataSource().name + ":" + this.name);
    this.addDataSetColumn("gdp_rank","INTEGER");
    this.addDataSetColumn("country_name","STRING");
    this.addDataSetColumn("gdp_usd","DECIMAL");
}
catch (e) {
    java.lang.System.out.println(e);
    throw e;
};


Note that BIRT exception handling is... variable. Sometimes errors occur and are handled by BIRT (or maybe by DTP) without ever being shown to the user. Using try/catch blocks allows us to add exception handling of our own.
Running Eclipse in console mode (eclipsec.exe rather than eclipse.exe), we can write messages to the console using java.lang.System.out.println(...). This lets us trace where we are in the code and make sure that a message is output when an exception occurs.
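
As a minimal sketch of this tracing pattern, the try/catch and println pair can be wrapped in a small helper. The names traced, log and logToArray are hypothetical and not part of BIRT. In a BIRT script, log would be a function that calls java.lang.System.out.println; here an in-memory array stands in for the console so the sketch runs in any JavaScript engine.

```javascript
// Hypothetical helper: run one event handler body, trace entry and any
// exception through `log`, then rethrow so the caller still sees the failure.
function traced(label, log, body) {
    log(label + ": start");
    try {
        return body();
    } catch (e) {
        log(label + ": failed: " + e);
        throw e;
    }
}

// An in-memory log standing in for the Eclipse console.
var messages = [];
function logToArray(message) {
    messages.push(message);
}

// A handler that succeeds: its return value is passed through.
var opened = traced("open", logToArray, function () {
    return true; // stand-in for the real open logic
});

// A handler that fails: the message is logged and the exception rethrown.
var caught = null;
try {
    traced("fetch", logToArray, function () {
        throw new Error("disk unplugged");
    });
} catch (e) {
    caught = e;
}
```

Because the helper rethrows, a failure is still reported to BIRT as before; the only difference is that a trace line is guaranteed to be written first.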



In the open event, we open a Java FileInputStream to read the data and initialise assorted variables:

importPackage(java.io);

fileopened = false;
try {
    java.lang.System.out.println("open:"+this.getDataSource().name + ":" + this.name);
    ciawffile = new java.io.File("C:\\Users\\username\\Desktop\\Data\\CIA World Factbook 2011\\rankorder\\rawdata_2001.txt");
    filestream = new java.io.FileInputStream(ciawffile);
    endofinputfile = false;
    numrecordsread = 0;
    fileopened = true;
}
catch (e) {
    java.lang.System.out.println(e);
    throw e;
};
 
return fileopened;

The close event doesn't strictly need to do anything: the FileInputStream can be left to garbage collection, although calling filestream.close() here would release the file handle sooner. Tracing when closes occur does, however, give some indication of what BIRT is up to.


if (fileopened) {
    java.lang.System.out.println("clos:"+this.getDataSource().name + ":" + this.name + " (" + numrecordsread + " records read)");
} else {
    java.lang.System.out.println("clos:"+this.getDataSource().name + ":" + this.name + " (open failed)");
};
return true;



The core of the scripted dataset is the fetch event. This reads data from the file and adds it to the current record in the dataset and then returns true. Once all records have been read, the fetch event returns false. For simplicity's sake, we use single-byte reads rather than reading larger buffers and then parsing the data.
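
The fetch contract can be sketched outside BIRT as follows. This is an illustration under assumptions, not BIRT's own code: makeFetch, drain and the sample records are hypothetical, and an in-memory array stands in for the file. The fetch function fills in one row and returns true, then returns false once the data is exhausted; the loop in drain plays the part of the report engine.

```javascript
// Hypothetical stand-in for a scripted dataset: records come from an array
// instead of a file, and `row` is a plain object.
function makeFetch(records) {
    var next = 0;
    return function fetch(row) {
        if (next >= records.length) {
            return false; // no more data: tells the caller to stop
        }
        var fields = records[next].split("\t"); // TAB separates fields
        row.gdp_rank = parseInt(fields[0], 10);
        row.country_name = fields[1];
        row.gdp_usd = fields[2]; // left as text in this sketch
        next = next + 1;
        return true; // one row was filled in
    };
}

// Plays the role of the caller: keep fetching until fetch returns false.
function drain(fetch) {
    var rows = [];
    var row = {};
    while (fetch(row)) {
        rows.push(row);
        row = {};
    }
    return rows;
}

// Made-up sample records, not real Factbook figures.
var rows = drain(makeFetch(["1\tAtlantis\t$ 1,000", "2\tEl Dorado\t$ 900"]));
```

The fetch script for the CIAWF file does the same job, but reads from the FileInputStream one byte at a time: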


importPackage(java.io);

// Check the file has been opened
if (!fileopened) {
    java.lang.System.out.println("<fetch attempt : open failed>");
    return false;
};
// Check the file is not already finished
if (endofinputfile) { // already read to end of file
    return false;
};

try {
    // read bytes to either end of file or end of record (carriage return)
    var buffercontents = "";
    var endofrecord = false;
    var recorddata = new Array();
    var fieldnum = 0;
    while (!endofrecord) {
        var newdatabyte = filestream.read();
        if (newdatabyte == -1) { // End Of File
            endofrecord = true;
            endofinputfile = true;
        }
        else {
            var newdatachar = String.fromCharCode(newdatabyte);

            // Check for CR character indicating end of record
            if (newdatachar == "\r") {

                // Add field data
                recorddata[fieldnum] = formattedColumnData(this, fieldnum, buffercontents);
                buffercontents = "";
                fieldnum = fieldnum + 1;

                // New record
                endofrecord = true;
            }
            else if (newdatachar == "\t") { // TAB character separates fields
                // Add field data
                recorddata[fieldnum] = formattedColumnData(this, fieldnum, buffercontents);
                buffercontents = "";
                fieldnum = fieldnum + 1;
            }
            else {
                buffercontents = buffercontents.concat(newdatachar);
            };
        }
    };
}
catch (e) {
    java.lang.System.out.println("f = " + fieldnum);
    throw e;
};
// Was record data set up at all
if (fieldnum == 0) {
    return false;
};

row.gdp_rank = recorddata[0];
row.country_name = recorddata[1];
row.gdp_usd = recorddata[2];
numrecordsread = numrecordsread + 1;

return true;


The fetch script uses the helper function formattedColumnData to convert each field from the string that was read into the type of its column:


function formattedColumnData(dSet, columnNum, columnData) {
    var formattedData;
    try {
        // NB: First column of dataset is used internally. The actual data
        // columns start at column 1 (rather than 0).
        var colType = dSet.getColumnMetaData().getColumnType(columnNum + 1);
        // matches the dollar signs, commas and spaces to strip from numbers
        var crep = new RegExp("[$, ]", "g");
        if (colType == 5.0) {
            // string, just return the data
            formattedData = columnData;
        }
        else if (colType == 2.0) {
            // integer: strip commas, dollar signs etc. and convert to integer
            if (columnData.charAt(0) == "$") {
                formattedData = parseInt(columnData.replace(crep, ""), 10);
            }
            else {
                formattedData = parseInt(columnData, 10);
            };
        }
        else if (colType == 4.0) {
            // decimal: strip commas, dollar signs etc. and convert to a number
            if (columnData.charAt(0) == "$") {
                formattedData = parseFloat(columnData.replace(crep, ""));
            }
            else {
                formattedData = parseFloat(columnData);
            };
        }
        else {
            // unexpected type: trace it and return the raw data tagged with it
            java.lang.System.out.println(colType);
            formattedData = columnData + " (" + colType + ")";
        };
    }
    catch (e) {
        java.lang.System.out.println("F = " + columnNum);
        throw e;
    };

    return formattedData;
}
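
The clean-up step can be tried on its own in any JavaScript engine. The sketch below is a simplified variant rather than the function above: toNumber is a hypothetical name, it always strips dollar signs, commas and spaces instead of checking the first character, and the input values are made up.

```javascript
// Hypothetical helper: remove "$", "," and spaces, then parse as a number.
// Returns NaN when nothing numeric is left.
function toNumber(text) {
    var cleaned = text.replace(/[$, ]/g, "");
    return parseFloat(cleaned);
}

var gdp = toNumber("$ 1,234,500"); // made-up value
var rank = toNumber("17");
var bad = toNumber("n/a");         // not numeric, so NaN
```

Checking the result with isNaN before assigning it to the row is one way to catch fields that are not numeric.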


Doing this, we now get the GDP data into BIRT, albeit with some manual setup required for column names etc. We can then repeat this process for other CIAWF files.
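
One way to cut down the per-file setup might be to keep the column names and types in a single list and drive the row assignment from it. The following is only a sketch of that idea: gdpColumns, fillRow and the sample values are hypothetical, and the type names are the ones passed to addDataSetColumn earlier.

```javascript
// Hypothetical column list for rawdata_2001.txt: name and type per column.
var gdpColumns = [
    { name: "gdp_rank", type: "INTEGER" },
    { name: "country_name", type: "STRING" },
    { name: "gdp_usd", type: "DECIMAL" }
];

// Copy already-converted field values into `row` by column name.
// Returns false when the record has too few fields.
function fillRow(row, columns, values) {
    if (values.length < columns.length) {
        return false;
    }
    for (var i = 0; i < columns.length; i++) {
        row[columns[i].name] = values[i];
    }
    return true;
}

var row = {};
var ok = fillRow(row, gdpColumns, [1, "Atlantis", 1000]); // made-up values
var incomplete = fillRow({}, gdpColumns, [2, "El Dorado"]);
```

For another file, only the column list would need to change; the same list could also feed the addDataSetColumn calls in the describe event.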





