Run Spark Shell with a Different User Directory

I needed to be able to the run Spark’s shell using a different home directory due to a permissions issue with my normal HDFS home directory. A little known feature with spark is that it allows overriding Hadoop configuration properties!

ClassCastException submitting Spark apps to HDInsight

I recently ran into an issue submitting Spark applications to a HDInsight cluster. The job would run fine until it attempted to use files in blob storage and then blow up with an exception: java.lang.ClassCastException: org.apache.xerces.parsers.XIncludeAwareParserConfiguration cannot be cast to org.apache.xerces.xni.parser.XMLParserConfiguration.

Writing a Spark DataFrame to ORC files

Spark includes the ability to write multiple different file formats to HDFS. One of those is ORC which is columnar file format featuring great compression and improved query performance through Hive.