Kit Menke is a Data Engineer / Software Architect from St. Louis, MO USA. He works mostly with open source Big Data technologies like Apache Hadoop, Apache Storm, Apache HBase, Apache Hive, and Apache Spark.
He is a member of the St. Louis Hadoop User Group and has spoken at multiple conferences, including DataWorks Summit and StampedeCon.
Running Apache Storm in production requires increasing the default nofile and nproc ulimits.
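As a sketch, these limits can be raised for the account running the Storm daemons in /etc/security/limits.conf; the `storm` user name and the exact values below are assumptions, so tune them for your cluster:

```
# /etc/security/limits.conf
# Raise open-file (nofile) and process (nproc) limits for the
# (assumed) storm service account; the values are examples only.
storm  soft  nofile  128000
storm  hard  nofile  128000
storm  soft  nproc   65536
storm  hard  nproc   65536
```

Changes take effect on the next login session for that user, so the Storm daemons need to be restarted afterwards.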
I recently ran into an issue submitting Spark applications to an HDInsight cluster. The job would run fine until it attempted to use files in blob storage, then blow up with an exception:
java.lang.ClassCastException: org.apache.xerces.parsers.XIncludeAwareParserConfiguration cannot be cast to org.apache.xerces.xni.parser.XMLParserConfiguration.
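This kind of ClassCastException typically means two copies of Xerces are on the classpath and the wrong one wins. As a hedged sketch (not necessarily the fix the full post describes), one common mitigation is to tell Spark to prefer the classes bundled with your application jar over the cluster's copies; `spark.driver.userClassPathFirst` and `spark.executor.userClassPathFirst` are real Spark options, but the class and jar names below are placeholders:

```shell
# Sketch: make the application's bundled Xerces classes take
# precedence over the cluster's copy (names are illustrative).
spark-submit \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  --class com.example.MyJob \
  my-job-assembly.jar
```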
My goal was to create a process for importing data into Hive using Sqoop 1.4.6. The process needs to be simple (or easily automated) and use a robust file format.
Importing data from a relational database into Hive should be easy. It's not. When you're first getting started there are a lot of snags along the way: configuration issues, missing jar files, formatting problems, schema issues, data type conversion, and … the list goes on. This post shines some light on how to use command line tools to import data as Avro files and create Hive schemas, plus solutions for some of the problems along the way.
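As a sketch of the approach, a Sqoop 1.4.6 import into Avro data files might look like the command below; the JDBC URL, credentials, table name, and HDFS path are all placeholders, not values from this post:

```shell
# Sketch: import a relational table into HDFS as Avro data files.
# All connection details are placeholders.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --as-avrodatafile \
  --target-dir /data/raw/orders
```

The import also produces an Avro schema (.avsc) file, which can then back a Hive table defined with STORED AS AVRO and an `avro.schema.url` pointing at that schema.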