I am currently in the process of migrating data from Oracle to SQL Server and I'm encountering an issue trying to validate the data post-migration.
Oracle 12:

- AL32UTF8 database character set
- Client NLS_LANG: WE8MSWIN1252
- VARCHAR2 field

SQL Server 2016:

- Latin1_General_CI_AS collation
- NVARCHAR field
I'm using DBMS_CRYPTO.HASH in Oracle to generate a checksum of each whole row, then copying the data to SQL Server and using HASHBYTES to generate a checksum of the whole row there, and finally comparing the two checksums to validate that the data matches.
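As a sanity check outside both databases, the principle can be reproduced with Python's hashlib standing in for DBMS_CRYPTO.HASH and HASHBYTES (MD5 chosen arbitrarily; the specific algorithm doesn't matter): the digests agree only when the input *bytes* agree, not merely the text.

```python
import hashlib

row = "42|widget|2016-05-01"  # a sample concatenated row value

# Same text, same encoding -> same digest
a = hashlib.md5(row.encode("utf-16-le")).hexdigest()
b = hashlib.md5(row.encode("utf-16-le")).hexdigest()
print(a == b)  # True

# Same text, different byte order -> different digest
c = hashlib.md5(row.encode("utf-16-be")).hexdigest()
print(a == c)  # False
```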
The checksums match for all rows, except those with multibyte characters.
For example, rows containing this character: ◦ (U+25E6, WHITE BULLET) do not match, even though the data itself is transferred correctly. When I use DUMP in Oracle, or convert to VARBINARY in SQL Server, the bytes match exactly except for the bytes of this character.
In SQL Server, the bytes are 0xE625 and in Oracle they are 0x25E6.
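Those two byte sequences are the same UTF-16 code unit in opposite byte orders: SQL Server stores NVARCHAR as UTF-16 little-endian, while the bytes I see on the Oracle side match UTF-16 big-endian (the byte order of Oracle's AL16UTF16 national character set). A quick check outside either database (Python used here purely to show the two encodings):

```python
ch = "\u25e6"  # the WHITE BULLET character from the example above

print(ch.encode("utf-16-le").hex())  # e625 -- SQL Server NVARCHAR byte order
print(ch.encode("utf-16-be").hex())  # 25e6 -- the byte order seen in Oracle
```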
Why are they ordered differently, and is there a reliable way to convert one to the other, so that the checksums match on both ends for strings with multi-byte characters?
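For what it's worth, mechanically the conversion appears to be just a per-code-unit byte swap, sketched below in Python (the helper name is mine, not from either database); what I can't tell is whether applying this on one side before hashing is a reliable general fix:

```python
def swap_utf16(raw: bytes) -> bytes:
    """Swap each 2-byte UTF-16 code unit: little-endian <-> big-endian."""
    if len(raw) % 2:
        raise ValueError("UTF-16 data must have an even byte length")
    out = bytearray(len(raw))
    out[0::2] = raw[1::2]  # second byte of each pair comes first
    out[1::2] = raw[0::2]
    return bytes(out)

print(swap_utf16(bytes.fromhex("e625")).hex())  # 25e6
print(swap_utf16(bytes.fromhex("25e6")).hex())  # e625
```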