| Sample data from CIAWF raw data displayed in Notepad2. |
The BIRT data providers are part of the Eclipse Data Tools Platform (DTP) project. Examining the code (FlatFileQuery.java) reveals that the flat file data source does indeed rely on the presence of an LF (\n) to mark the end of a record.
| Useful comment saying that, yes, LF is necessary |
| The code that means CIAWF won't load. |
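The effect is easy to reproduce outside BIRT. The snippet below is a minimal sketch in plain JavaScript, not the DTP code itself, and the sample string is made up for illustration: it only assumes records that end in CR (\r) with no LF anywhere.

```javascript
// Hypothetical sample: three tab-separated records, each terminated by CR only.
var crData = "a1\tb1\tc1\ra2\tb2\tc2\ra3\tb3\tc3\r";

// A reader that treats LF as the only record terminator never finds a
// terminator, so the whole file comes back as a single record.
var lfRecords = crData.split("\n");

// Splitting on CR instead recovers the individual records; the filter drops
// the empty string left behind by the final terminator.
var crRecords = crData.split("\r").filter(function (record) {
  return record.length > 0;
});

console.log(lfRecords.length); // 1
console.log(crRecords.length); // 3
```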
The comments in the code do, however, at least indicate that the approach to tab-separated files is based on a spec for comma-separated (CSV) files... I guess it started life reading CSV and then expanded to handle tabs, semicolons and suchlike!
| Another useful comment indicating where the base definition for the flat-file formats comes from! |
Quite why the BIRT UI doesn't offer all the usual types of file import options for field and record terminators is beyond me. Especially given the humungous list of character sets that you can choose from for your data.
Anyway, the flat file data source doesn't suit the data, so there are three options:
- Massage the data so that all the CRs are LFs
- Hack / copy-and-edit the flat file dataset to handle CR-terminated records
- Use the Scripted Dataset instead.
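For the first option, the conversion itself is tiny. This is a rough sketch in plain JavaScript, assuming the whole file has already been read into a string; `normaliseLineEndings` is a made-up name, and reading and writing the files is left out.

```javascript
// Hypothetical helper: rewrite every record terminator as LF.
// The pattern matches CR optionally followed by LF, so a CRLF pair becomes
// one LF rather than two, and a lone CR becomes LF.
function normaliseLineEndings(text) {
  return text.replace(/\r\n?/g, "\n");
}
```

Text that already uses LF passes through unchanged, so running it twice over the same file does no harm.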
Given that there are 10 years' worth of files in the CIAWF data and I'd like to point it at any CIAWF file, my initial experiments are going to concentrate on the third option, the Scripted Dataset, and on seeing what needs to be done to load the data using JavaScript (or Java... but most likely JavaScript). However, updating the FlatFile dataset to be more flexible is a tempting future project!
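As a starting point, the parsing that a scripted dataset would need might look something like the sketch below. It is plain JavaScript with no BIRT-specific calls, `parseDelimited` is a hypothetical name, and it assumes fields never contain the separator or quoted line breaks (so it ignores the quoting rules in the CSV spec). Wiring it into the dataset's event handlers is not shown.

```javascript
// Hypothetical parser: split text into records on CR, LF or CRLF, then split
// each record into fields on the given separator (tab by default).
function parseDelimited(text, fieldSeparator) {
  var separator = fieldSeparator || "\t";
  // CRLF is listed first so a CR+LF pair counts as a single terminator.
  var lines = text.split(/\r\n|\r|\n/);
  var rows = [];
  for (var i = 0; i < lines.length; i++) {
    // Skip empty lines, including the one left after the final terminator.
    if (lines[i].length === 0) {
      continue;
    }
    rows.push(lines[i].split(separator));
  }
  return rows;
}
```

Accepting any of the three terminators means the same function should cope with a file whichever way its records are terminated, which matters if the files from different years turn out not to be consistent.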