Hugo Bitbucket Pipeline to Deploy to S3
Update 11/11/2024: The hugo deploy command was introduced, which lets you handle this entirely within Hugo. The pipeline below probably still works, but I'm no longer using it.
Previously, this post described how to create a working Bitbucket pipeline for deploying a Hugo static site to AWS S3.
Integrating AWS Lambda with API Gateway in JavaScript
My latest project in AWS has involved AWS API Gateway and AWS Lambda deployed using the Serverless Framework. The integration between AWS Lambda and API Gateway is a little tricky to get exactly right, so I’m documenting some key findings here.
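The full post walks through the details in JavaScript; as a rough Python sketch of the response shape API Gateway's Lambda proxy integration expects back from a handler (the query parameter and message field here are made up for illustration):

```python
import json

def handler(event, context):
    # With proxy integration, API Gateway passes the whole HTTP request in `event`
    # and expects a response object with statusCode, headers, and a *string* body.
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        # Returning a dict instead of a JSON string here is a classic cause of 502s.
        "body": json.dumps({"message": f"hello {name}"}),
    }
```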
How to list every Maven Dependency for a Project
I was working on a Scala big data project with a lot of dependencies. Most of the time the dependencies exist on the server already or get packaged up inside an uber jar. In this case, the server was missing the dependencies and I didn’t want to package everything in the uber jar due to super slow upload speeds. That, combined with the need for a bunch of trial and error, meant I wanted to upload all the dependencies to the server and then just tweak my few lines of code. I ended up using a nice Maven command combined with a few lines of Python to gather all of them up in one place.
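The full post has the exact commands; a minimal sketch of the idea, assuming the maven-dependency-plugin's build-classpath goal lists every jar and a few lines of Python copy them into a single directory (file names and paths here are illustrative):

```python
import os
import shutil
import subprocess

# Write the full dependency classpath (path-separated jar locations) to a file.
subprocess.run(
    ["mvn", "dependency:build-classpath", "-Dmdep.outputFile=classpath.txt"],
    check=True,
)

# Copy every jar on that classpath into ./deps so it can be uploaded in one go.
os.makedirs("deps", exist_ok=True)
with open("classpath.txt") as f:
    jars = f.read().strip().split(os.pathsep)

for jar in jars:
    shutil.copy2(jar, "deps")
```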
Dealing with Self-signed Certificates
When working in a corporate environment, you’ll often have to deal with self-signed certificates that are used to secure internal dev tools like Artifactory or a git server.
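One common way to handle it, and not necessarily the approach in the full post, is to point your client at the self-signed certificate (or internal CA) rather than turning verification off; a minimal Python sketch with a hypothetical certificate path and URL:

```python
import requests

# Hypothetical path to the exported self-signed cert / internal CA, in PEM format.
INTERNAL_CA = "/etc/ssl/certs/internal-dev-ca.pem"

# TLS verification still happens, just against the internal certificate
# instead of the public CA bundle.
resp = requests.get(
    "https://artifactory.internal.example.com/api/system/ping",
    verify=INTERNAL_CA,
)
print(resp.status_code)
```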
Run Spark Shell with a Different User Directory
I needed to run the Spark shell with a different home directory due to a permissions issue with my normal HDFS home directory. A little-known feature of Spark is that it allows overriding Hadoop configuration properties!
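The mechanism is Spark's spark.hadoop.* prefix: anything after the prefix is passed straight through to the underlying Hadoop Configuration. A minimal PySpark sketch; the property and path below are illustrative, and the full post covers the one I actually needed:

```python
from pyspark.sql import SparkSession

# Any config key prefixed with "spark.hadoop." has the prefix stripped and is
# injected into the Hadoop Configuration used by Spark.
# spark-shell equivalent: --conf spark.hadoop.<hadoop.property>=<value>
spark = (
    SparkSession.builder
    .appName("override-hadoop-conf")
    .config("spark.hadoop.dfs.user.home.dir.prefix", "/tmp/users")  # example property
    .getOrCreate()
)
```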
Apache Storm and ulimits
Running Apache Storm in production requires increasing the nofile and nproc default ulimits.
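The post covers the values to set; as a quick sanity check of my own (not from the post), you can confirm the limits a process on the node actually sees, since limit changes only apply to new sessions:

```python
import resource

# Print the soft/hard limits visible to this process; run it as the same user
# that runs the Storm daemons.
for name in ("RLIMIT_NOFILE", "RLIMIT_NPROC"):
    soft, hard = resource.getrlimit(getattr(resource, name))
    print(f"{name}: soft={soft} hard={hard}")
```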
ClassCastException submitting Spark apps to HDInsight
I recently ran into an issue submitting Spark applications to an HDInsight cluster. The job would run fine until it attempted to use files in blob storage and then blow up with an exception: java.lang.ClassCastException: org.apache.xerces.parsers.XIncludeAwareParserConfiguration cannot be cast to org.apache.xerces.xni.parser.XMLParserConfiguration.
Creating an Avro table in Hive automatically
My goal was to create a process for importing data into Hive using Sqoop 1.4.6. It needed to be simple (or easily automated) and use a robust file format.
Importing data from a relational database into Hive should be easy. It’s not. When you’re first getting started there are a lot of snags along the way including configuration issues, missing jar files, formatting problems, schema issues, data type conversion, and … the list goes on. This post shines some light on a way to use command-line tools to import data as Avro files and create Hive schemas, plus solutions for some of the problems along the way.
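As a rough sketch of the import step (not the exact commands from the post), Sqoop's --as-avrodatafile flag writes the table out as Avro files; the connection string, table, and target directory below are placeholders:

```python
import subprocess

# Placeholder connection details; substitute your own database, table, and paths.
sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://dbhost/sales",
    "--username", "etl_user", "-P",        # -P prompts for the password
    "--table", "orders",
    "--as-avrodatafile",                   # write Avro data files instead of text
    "--target-dir", "/data/avro/orders",
]
subprocess.run(sqoop_cmd, check=True)
```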
Writing a Spark DataFrame to ORC files
Spark includes the ability to write multiple different file formats to HDFS. One of those is ORC, which is a columnar file format featuring great compression and improved query performance through Hive.
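In PySpark that's essentially a one-liner on the DataFrameWriter; a minimal sketch with a placeholder output path:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orc-example").getOrCreate()

# A tiny example DataFrame; in practice this would come from your actual data.
df = spark.createDataFrame(
    [("alice", 1), ("bob", 2)],
    ["name", "count"],
)

# Write the DataFrame out as ORC files; the path is a placeholder HDFS location.
df.write.mode("overwrite").orc("/tmp/example_orc")
```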