Kit Menke is a Data Engineer and Practice Lead from St. Louis, MO, USA. He works with open source Big Data technologies like Apache Hadoop, Apache Storm, Apache HBase, Apache Hive, and Apache Spark.
He is a member of the St. Louis Hadoop User Group and has spoken at multiple conferences, including DataWorks Summit and StampedeCon.
My latest project in AWS has involved AWS API Gateway and AWS Lambda deployed using the Serverless Framework. The integration between AWS Lambda and API Gateway is a little tricky to get exactly right, so I’m documenting some key findings here.
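One spot where the integration commonly goes wrong is the response contract: with API Gateway’s Lambda proxy integration, the function must return an object with a `statusCode` and a string `body` (plus optional `headers`), or API Gateway answers with a 502 “Malformed Lambda proxy response”. A minimal Python sketch — the handler name and greeting logic are just illustrative:

```python
import json

def handler(event, context):
    # API Gateway's Lambda proxy integration expects exactly this response
    # shape; returning a bare dict or string produces a 502 from the gateway.
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),  # body must be a string
    }
```

In a Serverless Framework project you would point a function’s `handler` setting at this function and attach it to an `http` event in `serverless.yml`.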
I was working on a Scala big data project with a lot of dependencies. Most of the time the dependencies already exist on the server or get packaged up inside an uber jar. In this case, the server was missing the dependencies and I didn’t want to package everything in the uber jar due to super slow upload speeds. That, combined with the need for a bunch of trial and error, meant I wanted to upload all the dependencies to the server once and then just tweak my few lines of code. I ended up using a nice Maven command combined with a few lines of Python to gather them all up in one place.
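The exact command and script from the post aren’t reproduced here, but the usual Maven goal for this job is `dependency:copy-dependencies`, which drops each module’s jars under `target/dependency`. A sketch of the Python half, assuming the jars were staged that way (the function name and paths are hypothetical):

```python
# Assumed staging step (run first, per module or from the parent POM):
#   mvn dependency:copy-dependencies -DoutputDirectory=target/dependency
import shutil
from pathlib import Path

def gather_jars(project_root: str, dest: str) -> list:
    """Copy every *.jar found under project_root into dest, deduplicating by name."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    # sorted() materializes the file list before any copies happen
    for jar in sorted(Path(project_root).rglob("*.jar")):
        target = dest_dir / jar.name
        if not target.exists():  # skip duplicates from multi-module builds
            shutil.copy2(jar, target)
            copied.append(jar.name)
    return copied
```

With everything in one folder, a single upload (or rsync) puts the whole dependency set on the server, and subsequent iterations only need the tiny application jar.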
When working in a corporate environment, you’ll often have to deal with self-signed certificates that are used to secure internal dev tools like Artifactory or a git server.
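A few of the usual workarounds, assuming the internal CA certificate has been exported to a PEM file (all paths here are illustrative):

```shell
# Point git at the internal CA bundle instead of disabling verification:
git config --global http.sslCAInfo ~/certs/internal-ca.pem

# requests-based Python tools honor these environment variables:
export REQUESTS_CA_BUNDLE=~/certs/internal-ca.pem
export SSL_CERT_FILE=~/certs/internal-ca.pem

# JVM tools (e.g. Maven talking to Artifactory) need the cert in the JVM
# truststore; the cacerts path varies by JDK version, and "changeit" is the
# default truststore password:
keytool -importcert -alias internal-ca -file ~/certs/internal-ca.pem \
  -keystore "$JAVA_HOME/jre/lib/security/cacerts" -storepass changeit
```

Trusting the specific CA is safer than the common shortcut of turning certificate verification off entirely.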
I needed to run Spark’s shell using a different home directory due to a permissions issue with my normal HDFS home directory. A little-known feature of Spark is that it allows overriding Hadoop configuration properties!
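Spark copies any property named `spark.hadoop.*` into the underlying Hadoop Configuration (with the prefix stripped), so Hadoop settings can be overridden straight from the command line. A sketch, assuming the HDFS property `dfs.user.home.dir.prefix` (which controls where HDFS resolves user home directories, normally `/user`) is the one that needs to point somewhere writable — the path is illustrative:

```shell
# spark.hadoop.dfs.user.home.dir.prefix becomes dfs.user.home.dir.prefix
# in the Hadoop Configuration, changing where HDFS looks for home directories.
spark-shell --conf spark.hadoop.dfs.user.home.dir.prefix=/tmp/users
```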