I am currently converting a C# .NET program into a Pig script that uses a Java UDF. The program processes a million+ gz files into tab-delimited text files.
One of the functions in the C# .NET program computes the MD5 hash of a few fields in the gz file and converts the hash to a long. I could not find an equivalent of BitConverter in Java.
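Below is the C# code:

    using System;
    using System.Security.Cryptography;
    using System.Text;

    private static long ComputeHash(string p)
    {
        MD5CryptoServiceProvider md5 = new MD5CryptoServiceProvider();
        byte[] hash = md5.ComputeHash(Encoding.ASCII.GetBytes(p));
        // Reads the first 8 bytes of the 16-byte digest in the machine's byte order
        return BitConverter.ToInt64(hash, 0);
    }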
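Since Java has no BitConverter, here is a minimal sketch of what BitConverter.ToInt64(hash, 0) does, written out by hand. It assumes the C# program runs on a little-endian machine (BitConverter uses the machine's byte order, which is little-endian on x86/x64). The class and method names (Int64Converter, toInt64) are hypothetical; the result should be the same as wrapping the bytes in a ByteBuffer set to ByteOrder.LITTLE_ENDIAN and calling getLong().

```java
/** Hypothetical helper mirroring BitConverter.ToInt64 on a little-endian machine. */
public final class Int64Converter {

    private Int64Converter() {}

    /**
     * Reads 8 bytes starting at startIndex as a little-endian signed 64-bit
     * integer: value[startIndex] is the least significant byte.
     */
    public static long toInt64(byte[] value, int startIndex) {
        if (startIndex < 0 || startIndex > value.length - 8) {
            throw new IndexOutOfBoundsException("need 8 bytes at index " + startIndex);
        }
        long result = 0L;
        // Walk from the most significant byte (startIndex + 7) down to the least significant.
        for (int i = 7; i >= 0; i--) {
            // Mask with 0xFF so a negative byte does not sign-extend into the upper bits.
            result = (result << 8) | (value[startIndex + i] & 0xFFL);
        }
        return result;
    }

    public static void main(String[] args) {
        // First 8 bytes of MD5("") = d41d8cd98f00b204...
        byte[] md5Prefix = {
            (byte) 0xd4, 0x1d, (byte) 0x8c, (byte) 0xd9,
            (byte) 0x8f, 0x00, (byte) 0xb2, 0x04
        };
        // Prints 4b2008fd98c1dd4
        System.out.println(Long.toHexString(toInt64(md5Prefix, 0)));
    }
}
```

The 0xFF mask is the part that is easy to get wrong: Java bytes are signed, so without it a byte such as 0xd4 would fill the upper bits of the long with ones.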
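The Java equivalent is below:

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    private long ComputeHash(String sHashString) throws NoSuchAlgorithmException
    {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        // Encode as ASCII to match Encoding.ASCII on the C# side (not the platform default charset)
        byte[] hash = md5.digest(sHashString.getBytes(StandardCharsets.US_ASCII));
        ByteBuffer buffer = ByteBuffer.wrap(hash);
        // Little-endian, like BitConverter on x86/x64
        buffer.order(ByteOrder.LITTLE_ENDIAN);
        return buffer.getLong();
    }

Below are the test results from the Java program and the C# .NET program. The input is a column from the gz file, and the output is the string and its MD5 hash converted to a long. Note that the results on the left are from the Pig script using the Java UDF, and the results on the right are from the C# .NET program.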
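To compare the two sides outside of Pig, a small stand-alone harness can print one "input<TAB>hash" line per value, which can then be diffed against the C# output. This is a sketch: the class name HashCompare is hypothetical, and it assumes the C# side ran on a little-endian machine and that the inputs are plain ASCII (non-ASCII characters are where the two encoders are most likely to disagree).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical harness: prints "input<TAB>hash" lines for diffing against the C# output. */
public final class HashCompare {

    private HashCompare() {}

    /** Same steps as the UDF: ASCII bytes, MD5, first 8 bytes read little-endian. */
    public static long computeHash(String s) {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE implementation is required to support MD5.
            throw new IllegalStateException(e);
        }
        byte[] hash = md5.digest(s.getBytes(StandardCharsets.US_ASCII));
        return ByteBuffer.wrap(hash).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    public static void main(String[] args) {
        // Pass sample values from the gz column as command-line arguments.
        for (String arg : args) {
            System.out.println(arg + "\t" + computeHash(arg));
        }
    }
}
```

Both Java's Long.toString and C#'s long.ToString() print a signed decimal, so the hash columns should compare directly. A quick sanity check: MD5("") starts with d4 1d 8c d9 8f 00 b2 04, so computeHash("") should equal 0x04b2008fd98c1dd4L on both sides.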