Jonathan D Kelley

Resume & Portfolio


Linux Systems / Python Devops / Site Reliability Engineer

About me

I am a 14-year internet technology veteran with a passion for devops and site reliability. I got my start in the late 1990s in high school, on IRC and designing web pages on Red Hat Linux 9 (Shrike). I consider myself an expert at Linux, Kubernetes, networking, AWS and all the things in between.

Jonathan Kelley
37 years old
Kalamazoo, Michigan, USA, Earth

I built this resume with Python, Flask and Jinja2, using Twitter Bootstrap. It is deployed on Kubernetes with ArgoCD.

See my project on Github

Job Experiences

“Protons give an atom its identity, electrons its personality.”
- Bill Bryson, A Short History of Nearly Everything


Mezmo Inc.

June 2020 - Current

Site Reliability Engineer
Works on the site reliability team, managing reliability and growth projects and taking part in the on-call rotation.

  • Supports multiple cloud providers (AWS EKS, IBM IKS) managing around 7 terabytes of log ingestion every month across thousands of k8s nodes in 12 datacenters.
  • Deploys software through Terraform, Helm and IBM Razee deployment methods.
  • Manages reliability and patching for MongoDB, Redis and Elasticsearch clusters running as large-scale StatefulSets in Kubernetes.
  • Sponsored the adoption of the Linkerd service mesh (multiple milestones) on Kubernetes to tackle endpoint security and, eventually, end-to-end observability concerns.
  • Wrote an email deliverability collection tool that monitors our email vendor against key performance SLIs.
  • Expands the internal Python CLI client (logdnactl) with access tools for better integration into backend systems for the SRE team.
  • Regularly contributes to our internal tooling, which uses the Kubernetes API and pymongo libraries to manage administrative operations across the product.
  • Built a dashboard for the support team, integrating Flask/Rebrow Redis blueprints into the app along with Python-eve (REST toolkit) for full-search MongoORM via REST, all behind python-authlib and OpenID Connect/Okta for RBAC.
  • Rewrote the Ansible integration for the LogDNA logging library, adding a variety of new features for our customers.
  • Added functionality to the support dashboard to inspect Elasticsearch field mappings, to troubleshoot index limits and indicate growth needs for customers.
  • Developed a proxy request tool for webhooks so support can easily debug webhook payloads.
Kalamazoo, Michigan

Doximity, Inc.

June 2019 - May 2020

Devops Engineer
Worked in a cross-discipline devops team managing everything from Kubernetes via Terraform infrastructure-as-code to Chef on EC2 instances. Used best security practices in a strict HIPAA environment with segmented networks and RBAC/IAM policies on AWS.

  • Built Doximity's first platform under k8s on EKS with Istio using Ansible, Terraform and Helm charts.
  • Refactored Terraform across teams with multiple state files, using both Jenkins pipelines and Atlantis. Migrated to Terraform module patterns rather than sprawling HCL resources.
  • Configured Sensu monitoring scripts for production systems.
  • Wrote and managed Chef cookbooks and their dependencies for patch management, with better InSpec and Test Kitchen tests.
  • Moved Jenkins jobs from traditional EC2 swarm builders over to ECS-based pipelines for better resource utilization and cost savings.
  • Built Jenkins automation for Chef jobs using Jenkins Job Builder, and later helped implement JCasC and managed job creation using sandboxed Groovy scripts.
  • Built an Ansible/Terraform-based deployment system for Kubernetes and Istio onto EKS so the entire Doximity platform could be migrated.
San Francisco Bay Area


May 2018 - June 2019

Devops Engineer / Site Reliability Engineer
Managed a rapidly growing real estate SaaS platform, deployed through WordPress, with over 10,000 agent brokers.

  • Participates in an SRE-style on-call rotation for a split Windows / Linux environment.
  • Works with common AWS toolsets such as ECS, EC2, VPC, ELB, SQS and Lambda.
  • Subject matter expert and mentor for Linux applications and platform tools.
  • Managed TeamCity build pipelines along with Jenkins for operational tasks.
  • Re-tooled the container stack for the frontend WordPress product from 5-year-old shell magic to docker-compose v3 plus a python-paver build process (for gulp/yarn/phpunit).
  • Upgraded php-fpm to PHP 7 from the legacy PHP 5 stack, with weighted-load canary testing as proof to launch.
  • Works with Varnishd, Memcached, WordPress Network Sites, HAProxy, MySQL and Docker.
  • Uses Ansible to convert previously hand-hacked stacks into ground-up infrastructure as code.
Charleston, SC


March 2011 - May 2018

Linux Systems Engineer II
Worked on OpenStack / Cloud Servers as a systems administrator supporting the Rackspace Cloud Control Panel and backend systems.

  • Works closely with Cloud integration teams running deployment and maintenance on large-scale backend infrastructure used for the world's 2nd largest Public Cloud.
  • Expertise diagnosing complex Linux application and system problems.
  • Updates and writes deployment templates and syntax for Puppet/Chef.
  • Subject matter expert and senior support escalation for Cloud Load Balancers, Cloud Databases, Cloud Compute and infrastructure.
  • Designed migration plans for our multi-node Cloud identity token API environment. Moved the identity database cluster and load balancer stack into a new subnet while also performing schema upgrades and a software upgrade dependent on the schema changes, with record-setting minimal downtime.
  • Takes direct action to correct cloud infrastructure issues in an on-call 24/7 environment.
  • Identified the root cause of critical issues in various application stacks throughout Rackspace Cloud, and reported bugs or helped fix them with the appropriate product development groups.
  • Deploys Tomcat application code releases on a regular basis to staging and production, sometimes under tight deadlines and always with zero downtime.
  • Experience with the entire Rackspace Cloud suite, including parts of OpenStack, as well as experience writing and deploying Python applications that utilize public and internal APIs.
  • Conceptualized, wrote and built new monitoring tools and metrics collection systems that track the health of various products and their subsystems; they are currently used as an early warning system for service faults in various APIs within Rackspace.
  • Serves as an escalation point for all Rackspace Cloud products in the Rackspace suite.
  • Plans efficient cloud resource consumption in new/existing environments for internal infrastructure.
  • Interviews new candidates for jobs across team disciplines.
  • Built continuous integration in Jenkins for 5 new products using a Python API we wrote to trigger builds on a build server and cycle app deployments.
  • Wrote a Python SDK for the build and deployment systems so developers can hook, deploy and release 4 products via REST API from a Python CLI, Jenkins, or virtually anything else.
  • Manages staging, pre-prod and prod environments for a complex Ruby web app that 6000+ (and growing) employees use 24/7 to manage the administrative backends for the world's 2nd largest public cloud.
  • Manages deployments in one-click (okay, it's actually 3 clicks) fashion using Salt and yum repositories to distribute software and configurations.
  • Moved apps to a hosted Mongo service called ObjectRocket to keep in-house operations costs at a minimum.
  • Handles deployment and design considerations for a next-gen architecture that replaces the legacy internal controls design, moving from a serial to a fully async process with a service backend layer to connect to backends/DBs/APIs. The next gen is Python 3/Tornado with Angular.js web frontends. It replaces a traditionally monolithic, serial-threaded Ruby 1.8.7 application on Phusion Passenger which bundled business logic and just couldn't scale.
  • Seeing weak points in ops/QE tests, I wrote BDD Gherkin-style templates in Python using "behave" to test any REST API. Could be extremely helpful for ops or QA functional end-to-end testing. It's in my projects section, called testvAPI.
  • Helped start the conversion of 230 or so Jenkins jobs to YAML, redeployed with a tool called Jenkins Job Builder. This helps enable full Jenkins disaster recovery in <1 hour instead of >2 weeks.
  • Migrated tooling from Puppet 2 to Puppet 3 with PuppetDB, Hiera, puppetserver, directory environments and a security-compliant PuppetDB database security model, and wrote a tool to wrap r10k. This reduces complexity and applies patterns we were missing previously.
  • Maintains Ansible modules that allow us to store firewall configurations in git. Built a release pipeline to deploy on git push. This builds a "working documentation" and a git log to make instant rollbacks a snap in production environments.
  • Manages deployment of billing, integration and pubsub environments for the Rackspace Cloud.
Dallas, TX


February 2010 - March 2011

Embedded Device Support Engineer II
Supported an on-premise and SaaS cloud-hosted solution for HIPAA encryption. Supported a Java Tomcat application as well as a Postfix mail system. Managed network security policies and on-premise datacenter networks.

  • Worked in a HIPAA compliant environment dealing with personal data e-mail encryption with well-recognized companies and government agencies.
  • Followed UK data privacy and export laws as they applied to mail administration.
  • Worked as administrator for an e-mail appliance based on Postfix that implemented a FIPS-compliant e-mail encryption solution with SSL failover to an HTTPS webpage gateway for secure email transmission.
  • Handled telephone support with customers who hosted the appliance in their own datacenters, often assisting with network troubleshooting in very unfamiliar network environments.
  • Helped troubleshoot mail flow issues across different network topologies and layouts with dozens of companies on a daily basis.
Dallas, TX


July 2006 - February 2010

Linux Administrator II
Performed Linux technical support by phone and ticket. Did professional service migrations, hardware upgrades and cabinet wiring, and performed network maintenance in the datacenter on Cisco Catalyst 6500 series hardware. Supported cPanel and Plesk control panels.

  • Planned and executed the primary datacenter DNS cut-over to a new BIND stack with Linux Heartbeat fail-over. Wrote fail-over scripts to handle system failure with zero downtime in DNS lookups.
  • Performed phone and ticket work to meet customer SLA.
  • Managed OS patching, migrations, and upgrades as professional services to customers.
  • Worked as DC OPS managing hardware diagnostics, upgrades and provisioning.
  • Worked as DC network operations handling switch upgrades, troubleshooting, DDoS mitigation or disaster recovery.
  • Diagnosed routing and other concerns escalating to network operations when required.
  • Upgraded Nagios monitoring to latest version and scaled for performance.
  • Upgraded the MRTG graphing system for Catalyst switch monitoring to improve bandwidth graph capability.
  • Created SSL monitoring in Nagios for customers who paid for SSL.
Dallas, TX

C I Host

April 2006 - July 2006

Linux Technical Support 1
Handled general Linux phone and ticket-based technical support for a web-hosting firm. Supported Miva Merchant shopping software along with cPanel and Plesk control panels.

  • Handled Linux admin tasks, helping customers install and configure software within the hosting environment.
  • Handled configuration of Miva Merchant and other shopping utilities for customers who ordered the software.
  • Performed work in the data-center across the street when staffing was low, assisting colo clients as well as responding to NOC escalations.
Bedford, TX


“We all have ability. The difference is how we use it.”
- Charlotte Whitton


  • AWS EC2
  • XenServer
  • VMware ESXi
  • KVM
  • OpenStack

Devops Tools

  • Terraform/HCL
  • docker-compose
  • Vagrant
  • Puppet
  • SaltStack
  • Chef

Build / CI Tools

  • Jenkins
  • Bamboo
  • TeamCity
  • CircleCI
  • GitHub Actions
  • GitLab


  • Python
  • Ansible
  • Kubernetes
  • REST Frameworks
  • HTML(5)
  • CSS(3)
  • Observium
  • MongoDB
  • MySQL
  • Postgres
  • AWS DynamoDB
  • Docker
  • Linux
  • Nginx
  • Apache
  • JSON
  • XML
  • Bootstrap Framework
  • SMTP
  • DNS
  • Wordpress
  • Git (SCM)
  • Postfix
  • Nagios
  • Tomcat
  • ElasticSearch
  • Redis
  • Zabbix
  • Networking
  • Ruby
  • Sensu
  • PHP
  • JQuery
  • C/C++
  • Java
  • Golang
  • Oracle
  • Javascript


  • English

Projects / Portfolio

“To give anything less than your best, is to sacrifice the gift.”
- Steve Prefontaine


Weather Station telemetry software for Amateur Radio
This open source software generates packets in AX.25 (an amateur radio data link protocol derived from the ITU-T X.25 protocol suite for packet-switched data communication) to carry weather station telemetry. This telemetry is transmitted over amateur radio and can be heard by stations around the world.


Jenkins configurations and job creation right out of the box.
A showcase of Jenkins configurations and job creation out of the box. Using Groovy and the Configuration as Code plugin, you can create fully viable Jenkins instances on Docker.


The backend code for this website!
A showcase of Python/Flask/Jinja2/HTML(5)/Bootstrap/jQuery used to generate both this website and my hardcopy resume.


A paginator for LogDNA log export
This command recursively fetches all logs from the LogDNA export API. This is useful for getting logs beyond the 10,000 line limit, as the API does not natively provide pagination.
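
As an illustration only, here is a minimal Python sketch of one way a paginator like this could work. It assumes the export returns at most a fixed number of lines per time window, so any window that comes back full is split in half and fetched again. The `fetch_page` callable, the `ts` field and `fake_export` are hypothetical stand-ins, not the project's actual code or the real LogDNA API.

```python
from typing import Callable, Dict, List

Line = Dict[str, int]  # hypothetical record shape, e.g. {"ts": 1700000000}


def fetch_all(fetch_page: Callable[[int, int], List[Line]],
              start: int, end: int, limit: int = 10_000) -> List[Line]:
    """Return every line with start <= ts <= end.

    fetch_page(start, end) is assumed to return at most `limit` lines
    for the window, sorted by ts. A full page means the window may be
    truncated, so the window is halved and each half is fetched again.
    """
    page = fetch_page(start, end)
    if len(page) < limit or start >= end:
        return page  # complete window, or a window too small to split
    mid = (start + end) // 2
    return (fetch_all(fetch_page, start, mid, limit)
            + fetch_all(fetch_page, mid + 1, end, limit))


# Stand-in for the export API: 25 lines, capped at 10 per request.
STORE = [{"ts": t} for t in range(25)]


def fake_export(start: int, end: int) -> List[Line]:
    return [line for line in STORE if start <= line["ts"] <= end][:10]


result = fetch_all(fake_export, 0, 24, limit=10)
print(len(result))
```

Splitting by time window avoids needing a server-side cursor, at the cost of a few extra requests for windows that happen to be exactly full.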


Redis "workbench" style tool in Python/Flask.
Built for the Python developer who needs to look into a Redis store. Allows for inspection and deletion of keys and follows PubSub messages. Also displays some runtime and configuration information.


An ARP cache and search tool for multi-device Cisco networks.
Simple tool to locate Cisco IP/ARP entries and display the results. Useful for small to medium datacenters (1-100 network devices).


A monitoring plugin for Zabbix.
This was created to monitor HTTP / REST Endpoints under Zabbix.


REST API to drive electronic relays.
This was created to expose a REST API for a relay board microcontroller so we could build light-based alerting systems while at Rackspace.


Creative python suite which runs QA tests using plain English syntax.
Based on Python's behave, this project emulates a Cucumber-style syntax to run HTTP API tests and is able to forward the messages to an ELK stack. It uses a language called Gherkin to make plain-English QA testing a breeze.


Common Vagrantfiles I tend to use
I made this repo because I keep needing quick environments for setup and break/fix work from time to time.


“If you don't have time to read, you don't have the time (or the tools) to write. Simple as that.”
- Stephen King

Building my last resume, ever, in Kubernetes. An article about this website, my resume!
#resume #python #flask #pandoc #kubernetes #sidecar #docker-compose

Reconnecting after Postgres failover, an introductory guide for application developers. A brief article targeted at application developers on how to use reconnection-based connection strings with libpq.
#postgres #python #failover
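
A brief sketch of the kind of connection string such an article might cover, assuming it refers to libpq's multi-host support: listing several hosts and setting `target_session_attrs=read-write` lets the client skip servers that are not accepting writes. The host names, database name and user are placeholders, and `failover_dsn` is a hypothetical helper, not something taken from the article.

```python
from typing import List, Tuple


def failover_dsn(hosts: List[Tuple[str, int]], dbname: str, user: str,
                 connect_timeout: int = 5) -> str:
    """Build a libpq keyword/value connection string that lists every host.

    libpq tries the hosts in order and, with target_session_attrs=read-write,
    only accepts a server that allows writes (the current primary).
    """
    host_list = ",".join(host for host, _ in hosts)
    port_list = ",".join(str(port) for _, port in hosts)
    return (f"host={host_list} port={port_list} dbname={dbname} user={user} "
            f"connect_timeout={connect_timeout} "
            f"target_session_attrs=read-write")


# Placeholder hosts for illustration only.
dsn = failover_dsn(
    [("pg1.example.internal", 5432), ("pg2.example.internal", 5432)],
    dbname="app",
    user="app_user",
)
print(dsn)
```

The resulting string can be handed to a libpq-based driver, for example `psycopg2.connect(dsn)`. The application still has to open a new connection after a failover; the string only decides which server that new connection lands on.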

Error Handling from backends to the frontend! The history of error handling in computing, and how a modern developer can handle frontend and backend errors better.
#exceptions #stacktrace #frontend #backend

How to design subnets the right way. Whether you are running networks in a physical datacenter or across VPCs in the cloud, you only get one chance. Learn how to do it right the first time, every time.
#subnet #networking #vpc #vlan #dnat #snat
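
One small illustration of planning up front, using Python's standard `ipaddress` module: carve a block into equal, non-overlapping subnets before anything is deployed. The 10.0.0.0/16 block, the /20 size and the environment names are assumptions for illustration, not taken from the article.

```python
import ipaddress

# Assumed address block; a /16 split into /20s gives 16 equal subnets.
block = ipaddress.ip_network("10.0.0.0/16")
subnets = list(block.subnets(new_prefix=20))

# Assign the first few subnets; the rest stay reserved for growth.
names = ["prod", "staging", "dev"]
plan = dict(zip(names, subnets))

for name, net in plan.items():
    print(f"{name:8} {net}  ({net.num_addresses} addresses)")
```

Leaving most of the block unassigned is deliberate: it is far easier to hand out a reserved range later than to renumber a subnet that turned out to be too small.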

Contact Me