Cobrix returning hexadecimal value in different format (qb) #625

Open
chsnarayana opened this issue May 23, 2023 · 5 comments
Labels
question Further information is requested

Comments

@chsnarayana

chsnarayana commented May 23, 2023

Hi All,

When I read hexadecimal data from a COBOL file using Cobrix, the output comes back in a different format. I have tried casting it with Spark SQL and PySpark, but neither helped.

The data type defined in the copybook is PIC X(06). In most cases, the value is converted to "qb" followed by one or two spaces. Can anyone please help me with this? Ab Initio, by contrast, reads the same data as "qbXXX", with some characters after "QB", and in Ab Initio they were able to cast that value to hexadecimal.

The data we are reading is highly sensitive, so we were not able to get a sample from our customer and I cannot share the file here.

Thanks in advance,
Narayana

@chsnarayana chsnarayana added the question Further information is requested label May 23, 2023
@yruslan
Collaborator

yruslan commented May 24, 2023

There could be several reasons for such behavior. Possibly, the data is in a different code page (not EBCDIC common).
You can use .option("debug", "hex") to see raw bytes of each field to debug the decoding process.

Please provide an example value and the corresponding value in the '_debug' field. I know the data is sensitive, hope a single QBxxxx number is not

@chsnarayana
Author

Hi Ruslan,

Thank you for your quick reply. I used the debug option, and the output is as follows:

"column":"qb","column_debug":"988200007600"

@yruslan
Collaborator

yruslan commented May 24, 2023

So EBCDIC 98 is q and 82 is b, but neither 00 nor 76 corresponds to any character in the EBCDIC common encoding (https://en.wikipedia.org/wiki/EBCDIC).

What output do you expect for this column? What does Ab Initio show for this field and this record?
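
The byte-by-byte reading above can be checked with a short, standalone Python sketch. It assumes Python's built-in `cp037` codec (EBCDIC US/Canada) as a stand-in for Cobrix's "common" code page; the two tables are not identical, so this only illustrates which bytes look like plain text and is not a reproduction of Cobrix's decoder. The helper name `describe_bytes` is hypothetical.

```python
# Sketch: inspect the six raw bytes reported in the debug column.
# Assumption: cp037 stands in for Cobrix's "common" EBCDIC code page;
# the two tables differ for some byte values.
RAW_HEX = "988200007600"


def describe_bytes(raw_hex, codec="cp037"):
    """Return (hex byte, decoded char, is printable ASCII) for each byte."""
    rows = []
    for byte in bytes.fromhex(raw_hex):
        char = bytes([byte]).decode(codec)
        printable = char.isascii() and char.isprintable()
        rows.append((f"{byte:02x}", char, printable))
    return rows


rows = describe_bytes(RAW_HEX)
for hex_byte, char, printable in rows:
    print(hex_byte, repr(char), "text" if printable else "not plain text")
```

Only the first two bytes come out as ordinary letters, which is consistent with the field holding binary data rather than text.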

@chsnarayana
Author

Hi Ruslan,

I got some information from the customer.

Ab Initio reads the data "988200007600" in the following format: "qb\x00\x00\x00\x00"

They then use a function to cast it back to 988200000000.

Transformation used: (decimal(12)) reinterpret_as(packed decimal(12,stripped), )
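
As a rough illustration of what that cast amounts to, the Python sketch below keeps the field as raw bytes and prints each byte as two hex digits instead of decoding it as text. It assumes `cp037` as the EBCDIC code page and mirrors only the idea; it is not Ab Initio's `reinterpret_as` implementation, and the helper name `bytes_to_hex_digits` is hypothetical.

```python
# Sketch: treat the field as raw bytes and render it as hex digits.
# Assumption: cp037 is used to turn the displayed string back into EBCDIC bytes.
def bytes_to_hex_digits(raw):
    """Render each byte as two lowercase hex digits."""
    return raw.hex()


# The string Ab Initio displays, re-encoded to its EBCDIC bytes.
displayed = "qb\x00\x00\x00\x00"
raw_field = displayed.encode("cp037")

hex_digits = bytes_to_hex_digits(raw_field)
print(hex_digits)
```

Under these assumptions the six bytes become the twelve digits the customer reported.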

@yruslan
Collaborator

yruslan commented Jun 7, 2023

I see. One workaround is to set .option("binary_as_hex", "true") and declare the field as:

PIC X(06) COMP.

This only works with the latest Cobrix (2.6.8).
