Friday, March 25, 2016

Handling $ sign in Pig

Today I came across a task of calculating the minimum value in a dataset. Though the task was straightforward, the issue was that the data had $ signs in it. Loading these fields using PigStorage was causing data loss. To handle this, I had to use a regular expression to remove the $ sign, perform the necessary aggregate functions, and get the results.

Input:

A,$820.48,$11992.70,996,891,1629
A,$817.12,$2105.57,1087,845,1630
B,$974.48,$5479.10,965,827,1634
B,$943.70,$9162.57,939,895,1635

PigScript:

A = LOAD 'test5.txt' USING TextLoader() as (line:chararray);
A1 = FOREACH A GENERATE REPLACE(line,'([^a-zA-Z0-9.,\\s]+)','');
B = FOREACH A1 GENERATE FLATTEN(STRSPLIT($0,','));
B1 = FOREACH B GENERATE $0,(float)$1,(float)$2,(int)$3,(int)$4,(int)$5;
C = GROUP B1 ALL;
D = FOREACH C GENERATE CONCAT('$',(chararray)MIN(B1.$1)),CONCAT('$',(chararray)MIN(B1.$2));

DUMP D;
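
As a quick cross-check of the same idea outside Pig, here is a minimal Python sketch. It assumes the four sample rows from the Input section are held in a list, and the helper names strip_symbols and column_min are made up for illustration. It is not part of the Pig job, just a way to verify the minimums by hand.

```python
import re

# Sample rows copied from the Input section above.
rows = [
    "A,$820.48,$11992.70,996,891,1629",
    "A,$817.12,$2105.57,1087,845,1630",
    "B,$974.48,$5479.10,965,827,1634",
    "B,$943.70,$9162.57,939,895,1635",
]


def strip_symbols(line):
    # Same character class as the Pig REPLACE call: drop anything that is
    # not a letter, digit, dot, comma, or whitespace.
    return re.sub(r"[^a-zA-Z0-9.,\s]+", "", line)


def column_min(lines, index):
    # Split each cleaned line on commas and take the smallest value
    # found in the requested column.
    return min(float(strip_symbols(line).split(",")[index]) for line in lines)


min_col1 = column_min(rows, 1)
min_col2 = column_min(rows, 2)

# Put the $ sign back, as the CONCAT step does in the Pig script.
print("${:.2f},${:.2f}".format(min_col1, min_col2))
```

One thing to keep in mind with this character class: it would also strip a leading minus sign, so it only suits data where the amounts are never negative.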

Output:



Tuesday, March 8, 2016

Split string using Capital letters

I came across a question about splitting a string on capital letters. The string was made up of names.

For Example:

s = "AaliyahAaronAarushiAbagail"

Expected Output:

Aaliyah
Aaron
Aarushi
Abagail

Below is the code

C#

using System;
using System.Text;

namespace StringSplitUsingCaps
{
    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                string s = "AaliyahAaronAarushiAbagail";
                string s1 = s.Substring(1, s.Length-1);
                StringBuilder sName = new StringBuilder(s.Substring(0,1));
                foreach (char c in s1.ToCharArray())
                {
                    if (!Char.IsUpper(c))
                        sName = sName.Append(c);
                    else
                    {
                        Console.WriteLine(sName);
                        sName.Clear();
                        sName.Append(c);
                    }
                }
                Console.WriteLine(sName);
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }
            Console.WriteLine("Done");
            Console.Read();
        }
    }

}

Output:







Python:

s = "AaliyahAaronAarushiAbagail"
name = s[:1]
s = s[1:]

for c in s:
    if c.isupper():
        print(name)
        name = ""
        name += c
    else:
        name += c

print(name)
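
The same split can also be sketched with a regular expression instead of a character loop. This is an alternative, not the approach used above, and the function name split_on_capitals is made up for illustration. It assumes the string starts with a capital letter; any lowercase characters before the first capital would be dropped, whereas the loop keeps them in the first name.

```python
import re


def split_on_capitals(text):
    # Each match starts at an uppercase letter and runs until just
    # before the next uppercase letter (or the end of the string).
    return re.findall(r"[A-Z][^A-Z]*", text)


names = split_on_capitals("AaliyahAaronAarushiAbagail")
for name in names:
    print(name)
```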

Output: