Starting with version 0.11, Pig lets you define a default set of statements that is loaded every time you start Pig. During ad hoc analysis it is common to reuse the same REGISTER and DEFINE statements to declare the additional jars and UDFs a script requires. You can now put these statements in a file (.pigbootup), and Pig will run them each time it starts.

Let’s see what it looks like. Here is my Grunt shell when I start Pig in local mode, before any bootup file exists:

localhost:pig-trunk pkommireddi$ bin/pig -x local
2013-02-05 22:32:16,171 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.0-SNAPSHOT (r1442025) compiled Feb 03 2013, 21:36:06
2013-02-05 22:32:16,172 [main] INFO org.apache.pig.Main - Logging error messages to: /Users/pkommireddi/work/pig/pig-trunk/pig_1360132336169.log
2013-02-05 22:32:16,192 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /Users/pkommireddi/.pigbootup not found
2013-02-05 22:32:16,345 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
grunt>

Now I create .pigbootup in my home directory with a few entries:

localhost:pig-trunk pkommireddi$ cat ~/.pigbootup 
REGISTER 'foo.jar';
DEFINE MY_UDF com.sfdc.BAR();
SET default_parallel 10;

Let’s start Pig again:

localhost:pig-trunk pkommireddi$ bin/pig -x local
2013-02-05 22:35:48,148 [main] INFO  org.apache.pig.Main - Apache Pig version 0.12.0-SNAPSHOT (r1442025) compiled Feb 03 2013, 21:36:06
2013-02-05 22:35:48,149 [main] INFO  org.apache.pig.Main - Logging error messages to: /Users/pkommireddi/work/pig/pig-trunk/pig_1360132548146.log
2013-02-05 22:35:48,324 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
grunt> REGISTER 'foo.jar';
grunt> DEFINE MY_UDF com.sfdc.BAR();
grunt> SET default_parallel 10;
grunt>

You can see that Pig has run the three statements from .pigbootup automagically.
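Conceptually, the bootup file's contents are processed ahead of whatever you type or run, which is why the three statements appear at the grunt prompt before you enter anything. The Python sketch below models only that ordering; `bootup_prefixed_input` is a hypothetical helper written for illustration, not Pig's implementation, and it assumes the bootup file is plain text that can simply be placed in front of the user's input.

```python
from pathlib import Path


def bootup_prefixed_input(user_input: str, bootup_path: str) -> str:
    """Return the bootup statements followed by the user's own input.

    Hypothetical helper for illustration only; this is not Pig's code.
    """
    path = Path(bootup_path)
    if not path.is_file():
        # No bootup file: Pig logs "Default bootup file ... not found" and
        # carries on, so the user's input is used unchanged.
        return user_input
    bootup = path.read_text()
    if bootup and not bootup.endswith("\n"):
        # Keep the last bootup statement from running into the user's input.
        bootup += "\n"
    return bootup + user_input


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as home:
        bootup_file = Path(home) / ".pigbootup"
        bootup_file.write_text(
            "REGISTER 'foo.jar';\n"
            "DEFINE MY_UDF com.sfdc.BAR();\n"
            "SET default_parallel 10;\n"
        )
        # The three bootup statements come first, then the user's own line.
        print(bootup_prefixed_input("A = LOAD 'data.txt';\n", str(bootup_file)), end="")
```

A missing file leaves the user's input untouched, mirroring the "not found" message in the first session, after which Pig still opened the grunt prompt normally.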

Location of the file .pigbootup is configurable

The location is set by the property “pig.load.default.statements”; by default Pig looks for .pigbootup in your home directory, as the “Default bootup file” log line in the first session shows. Here I have added an entry to pig.properties that points to a different location:

pkommireddi$ grep "pig.load.default.statements" pig.properties 
pig.load.default.statements=/Users/pkommireddi/work/.pigbootup
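The lookup rule can be modelled in a few lines of Python: use pig.load.default.statements when it is set, otherwise fall back to .pigbootup in the home directory. `parse_properties` and `resolve_bootup_file` are hypothetical helpers for illustration, not part of Pig, and the properties parser is deliberately simplified to plain key=value lines.

```python
from pathlib import Path

PROPERTY = "pig.load.default.statements"


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # or ! comments.

    Simplified on purpose: real Java .properties files also allow ':'
    separators, escapes and line continuations, which are not handled here.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def resolve_bootup_file(props: dict, home: str) -> Path:
    """Use pig.load.default.statements if set, else <home>/.pigbootup."""
    override = props.get(PROPERTY)
    if override:
        return Path(override)
    return Path(home) / ".pigbootup"


if __name__ == "__main__":
    conf = (
        "# custom bootup location\n"
        "pig.load.default.statements=/Users/pkommireddi/work/.pigbootup\n"
    )
    # With the property set, the override path wins.
    print(resolve_bootup_file(parse_properties(conf), "/Users/pkommireddi"))
    # Without it, the home-directory default is used.
    print(resolve_bootup_file({}, "/Users/pkommireddi"))
```

Keeping parsing and resolution separate makes the fallback rule easy to check on its own, independent of how the properties file is read.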

Time to upgrade to 0.11; there are a few other cool features to check out.
