When working in a corporate environment, you’ll often have to deal with self-signed certificates that are used to secure internal dev tools like Artifactory or a git server.
I needed to run Spark’s shell with a different home directory because of a permissions issue with my normal HDFS home directory. A little-known feature of Spark is that it allows overriding Hadoop configuration properties!
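As a rough sketch of that override: Spark copies any property prefixed with `spark.hadoop.` into the Hadoop configuration. The helper below (a hypothetical name) builds the matching `spark-shell` arguments. The property `dfs.user.home.dir.prefix` and the value `/tmp` are assumptions for illustration, not necessarily the setting used for the original fix.

```python
def hadoop_overrides_to_args(overrides):
    """Turn Hadoop properties into spark-shell --conf arguments.

    Spark passes properties prefixed with "spark.hadoop." through to the
    Hadoop Configuration with the prefix stripped.
    """
    args = []
    for key, value in sorted(overrides.items()):
        args += ["--conf", f"spark.hadoop.{key}={value}"]
    return args


# Assumed example: move the HDFS home directory prefix from /user to /tmp.
command = ["spark-shell"] + hadoop_overrides_to_args(
    {"dfs.user.home.dir.prefix": "/tmp"}
)
print(" ".join(command))
```

Building the arguments from a dictionary keeps the overrides in one place if more than one Hadoop property needs changing.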
Running Apache Storm in production requires increasing the nofile and nproc default ulimits.
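Before raising the limits it can help to see what the current process actually gets. The sketch below reads the soft `nofile` and `nproc` limits with Python's standard `resource` module on Linux. The minimum values in `MINIMUMS` are placeholders chosen for illustration, not recommendations from Storm.

```python
import resource

# Placeholder minimums (assumed values); pick numbers that suit your cluster.
MINIMUMS = {"nofile": 65536, "nproc": 65536}

# Map the limits.conf item names to the matching resource constants.
LIMITS = {
    "nofile": resource.RLIMIT_NOFILE,
    "nproc": resource.RLIMIT_NPROC,
}


def below_minimum(soft, minimum):
    """Return True when a soft limit is finite and under the minimum."""
    return soft != resource.RLIM_INFINITY and soft < minimum


def check_limits(minimums):
    """Return the names of limits whose soft value is below the minimum."""
    failing = []
    for name, minimum in minimums.items():
        soft, _hard = resource.getrlimit(LIMITS[name])
        if below_minimum(soft, minimum):
            failing.append(name)
    return failing


if __name__ == "__main__":
    for name in check_limits(MINIMUMS):
        print(f"{name} soft limit is below {MINIMUMS[name]}")
```

Run it as the user that launches the Storm daemons, since limits are applied per user session.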
A collection of code snippets and notes from working with Linux.
I recently ran into an issue submitting Spark applications to an HDInsight cluster. The job would run fine until it attempted to use files in blob storage and then blow up with an exception:
java.lang.ClassCastException: org.apache.xerces.parsers.XIncludeAwareParserConfiguration cannot be cast to org.apache.xerces.xni.parser.XMLParserConfiguration.
My goal was to create a process for importing data into Hive using Sqoop 1.4.6. It needs to be simple (or easily automated) and use a robust file format.
Importing data from a relational database into Hive should be easy. It’s not. When you’re first getting started there are a lot of snags along the way, including configuration issues, missing jar files, formatting problems, schema issues, data type conversion, and … the list goes on. This post shines some light on a way to use command line tools to import data as Avro files and create Hive schemas, plus solutions for some of the problems along the way.
Spark can write several different file formats to HDFS. One of those is ORC, a columnar file format featuring great compression and improved query performance through Hive.
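A minimal sketch of an ORC write, assuming the PySpark `DataFrameWriter` API from Spark 2.x or later. The function names, the sample rows, and the `/tmp/example_orc` path are all placeholders for illustration.

```python
def write_orc(df, path, mode="overwrite"):
    """Write a Spark DataFrame to `path` as ORC files."""
    df.write.mode(mode).orc(path)


def demo():
    """Write a tiny DataFrame as ORC; needs a working PySpark installation."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("orc-example").getOrCreate()
    # Sample rows and output path are assumed values.
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
    write_orc(df, "/tmp/example_orc")
    spark.stop()
```

Calling `demo()` in an environment with PySpark available writes the sample rows to the placeholder path. On a cluster the path would normally point at HDFS.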
I recently decided to migrate my blog from WordPress to Hugo. I don’t blog as often these days, so keeping up with new WordPress versions, securing my WordPress installation against hack attempts, and fighting comment spam was becoming overwhelming.
As we gradually replace the regular Windows command line with PowerShell, it will be useful to set up a PowerShell environment for Java / Maven development.
SQL Server Reporting Services (SSRS) reports support subscriptions. A subscription can run on a schedule to send emails, export reports to SharePoint document libraries, or save to Windows file shares.
Depending on your requirements, you may need to grant or remove access to the subscribe action. This can be managed by creating new SharePoint roles or editing the default ones: Read, Contribute, Full Control, etc.