I’d almost mark this to a new category titled: "Trivia".
At least, that’s what it feels like – particularly when it came to finding out how to do this.
Scenario: Client has a financial package running on an AS/400, IBM OS v5 r4, which has DB2 for the iSeries installed. We are building a lightweight application to do some gap filling on this package so some basic reporting can be done while they wait to move to the new one. New data will be stored in SQL Server, however, necessary key descriptions and financial data points live in the AS 400 – and no one on the staff wants to maintain an SSIS package to pull over just a little bit of data.
So what to do? Connect directly to the DB2 database via .Net and surface up the few necessary data points in the application. Sounds simple, right?
Ah – the other bit in the scenario – -we need to do this using .Net OleDB versus the IBM native .Net driver so we don’t have to configure the production web servers (also being upgraded soon).
Still sounds like it should be simple though – and connecting is.
And that is where you will probably come across text and other variable character data that comes over as binary. This is where the fun begins..
DB2 has some interesting data types for storing text data (reference this link for supported sql data types for the DB2 .Net data provider), and when you run across these in work with DB2, it can be frustrating trying to translate the binary that is coming over into usable text.
One of the keys to resolving the mystery is figuring out what code page the IBM machine is using – in many cases this can be a little known (now a days) and IBM only code page lovingly titled "Extended Binary Coded Decimal Interchange Code" or EBCDIC. In using C# to decode the binary into text, this is a crucial bit of information.
In this instance, I am using EntLib, and the connection is pretty much the same as any DBMS. Below is the pertinent code that decodes the record:
using (IDataReader reader = db.ExecuteReader(System.Data.CommandType.Text, commandtext))
{
while (reader.Read())
{
byte[] b = (byte[])reader[0];
//here is where we tell the Encoder what code page we need
Decoder ebcdicDecoder = Encoding.GetEncoding(37).GetDecoder();
//initialize a char array that will hold the decoded output
char[] c = new char[ebcdicDecoder.GetCharCount(b, 0, b.Length)];
//call into the decoder and get the decoded char array
ebcdicDecoder.GetChars(b,0,b.Length,c,0);
//convert to a string if necessary
string accountCodeId = new string(c);
//do whatever else you want to do with the results now
}
}
When I was first getting the byte array back, a coworker and I were puzzled at the values in the byte array, as they did not match up to most U.S. character sets we would have expected to see—then we stumbled across Wikipedia’s entry for IBM code pages (and EBCDIC in particular) and the pieces started to come together. We were able to look at the integer values coming across in the byte array and see in the code table what the text values were – and that intelligible values were in those arrays.
The next bit was finding what code page on the Microsoft side to use to decode the byte array – which did not turn out to be as easy as I expected, but in the end, I stumbled across this MSDN entry on Host Integration Server 2000 and EBCDIC code page support. With this last bit (the 37 you see in the code on Encoding.GetEncoding(37), I was finally able to get the records translated.
Hopefully this will save some other time, headache and pain pill abuse – please feel free to e-mail me if you have any questions about this or leave a comment!