Crons in AWS Elastic Beanstalk

October 23, 2020

It’s as easy as adding the following file to your .ebextensions folder (the file name and the schedule are examples; adjust to your needs):

```yaml
files:
  "/etc/cron.d/mycron":
    mode: "000755"
    owner: root
    group: root
    content: |
      # m h dom mon dow user command
      * * * * * root /command/to/run
```

Running a cron on one server only?

Add a small leader-check script via .ebextensions (the script path is an example; the curl URLs are the standard EC2 instance metadata endpoints):

```yaml
files:
  "/usr/local/bin/leader-check.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/bin/bash
      INSTANCE_ID=`curl -s http://169.254.169.254/latest/meta-data/instance-id 2>/dev/null`
      REGION=`curl -s http://169.254.169.254/latest/dynamic/instance-identity/document 2>/dev/null | jq -r .region`

      # Find the Auto Scaling Group name from the Elastic Beanstalk environment
      ASG=`aws ec2 describe-tags --filters "Name=resource-id,Values=$INSTANCE_ID" \
        --region $REGION --output json | jq -r '.[][] | select(.Key=="aws:autoscaling:groupName") | .Value'`

      # Find the first in-service instance in the Auto Scaling Group
      FIRST=`aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names $ASG \
        --region $REGION --output json | \
        jq -r '.AutoScalingGroups[].Instances[] | select(.LifecycleState=="InService") | .InstanceId' | sort | head -1`

      # If this instance id is the first one, exit 0: we are the leader
      [ "$INSTANCE_ID" = "$FIRST" ]
```

Then make the cron entry run only on the leader (again, the path and schedule are examples):

```yaml
files:
  "/etc/cron.d/leadercron":
    mode: "000755"
    owner: root
    group: root
    content: |
      * * * * * root /usr/local/bin/leader-check.sh || exit 0; /command/to/run
```
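The leader test at the heart of this pattern can be exercised locally without touching AWS. A minimal sketch, with two stand-in instance ids (in the real script these come from the instance metadata service and the Auto Scaling API):

```shell
#!/bin/sh
# Stand-in values for a two-instance Auto Scaling Group.
INSTANCE_ID="i-0abc"
FIRST=$(printf 'i-0def\ni-0abc\n' | sort | head -1)

# Leader election by "alphabetically first in-service instance id".
if [ "$INSTANCE_ID" = "$FIRST" ]; then
  echo "leader: run the cron"
else
  echo "follower: skip"
fi
```

Every instance runs the same check each time the cron fires, but only one of them sorts first, so the command runs exactly once across the group.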

Crons not running on Amazon Linux?

When placing cron jobs in /etc/cron.d, do not use dots or hyphens in the file name; crond will skip such files.
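A quick way to vet a candidate file name against that rule before deploying (the helper name is mine, and it implements the rule exactly as stated above):

```shell
#!/bin/sh
# Succeed only for names with no dots and no hyphens,
# per the /etc/cron.d naming gotcha on Amazon Linux.
cron_name_ok() {
  case "$1" in
    *.*|*-*) return 1 ;;
    *)       return 0 ;;
  esac
}

cron_name_ok mycron      && echo "mycron: ok"
cron_name_ok my.cron.bak || echo "my.cron.bak: will be skipped"
cron_name_ok my-cron     || echo "my-cron: will be skipped"
```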

Amazon Aurora Row size limit.

October 31, 2018

We started using Aurora as soon as it was released, given its obvious advantages over plain MySQL, and we had never had an issue running our production workloads on it.

Recently we encountered an issue that one would have assumed would be a non-issue in Aurora: the size of a row is limited to a little less than 8KB. Our application had a description text area that needed to accept a little more than 8KB of data. [1]

“Amazon Aurora MySQL does not support compressed tables, that is, tables created with ROW_FORMAT=COMPRESSED; hence we request you to change the row_format to DYNAMIC in the BARRACUDA file format” [5]

What I found on the forums wasn’t helpful. The solutions there were a no-go (“use Oracle”, lol). [6]


The issue occurs when the file format chosen by InnoDB is ANTELOPE. (ANTELOPE is the original InnoDB file format; it supports the COMPACT and REDUNDANT row formats and is the default file format in MySQL 5.6, kept as the default so that earlier MySQL versions have maximum compatibility with 5.6.)

InnoDB also has a newer file format called BARRACUDA, which supports all InnoDB row formats: the newer COMPRESSED and DYNAMIC in addition to COMPACT and REDUNDANT from ANTELOPE. The COMPRESSED and DYNAMIC row formats support compressed tables, efficient storage of off-page columns, and index key prefixes up to 3072 bytes [4]. So when you create a table with the DYNAMIC or COMPRESSED row format, InnoDB can store long variable-length column values for the VARCHAR, BLOB and TEXT data types fully off-page. [4]

Note that while both the DYNAMIC and COMPRESSED row formats support index key prefixes up to 3072 bytes, to get prefixes longer than 768 bytes you must turn the innodb_large_prefix parameter ON. The DYNAMIC row format maintains the efficiency of storing the entire row in the index node while avoiding the problem of filling B-tree nodes with large numbers of data bytes from long columns.


1. By default the innodb_file_per_table parameter is set to ON in the cluster parameter group. You can use a custom cluster parameter group in case you would like to change the collation and character sets in future; for now, even the default cluster parameter group has innodb_file_per_table set to ON.

  • The default values are innodb_file_format = Antelope [2] and innodb_large_prefix = 0 [3]

2. Tweak these respective parameters in a custom DB parameter group called ‘aurora56’

  • innodb_file_format value to BARRACUDA
  • innodb_large_prefix value to 1 so that you can get up to 3072 bytes for key prefixes.

Assign the custom DB parameter group to your Aurora instance.

3. After changing the parameter values as suggested and attaching the custom parameter groups to your instance, and once you have confirmed the parameters have taken effect, perform the ALTER TABLE.
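The original post elides the actual statement, but step 3 boils down to converting each affected table to the DYNAMIC row format. A sketch that emits the statements (table names are placeholders; pipe the output into the mysql client against your Aurora endpoint to apply):

```shell
#!/bin/sh
# Emits one ALTER per table; valid once innodb_file_format=Barracuda
# and innodb_large_prefix=1 are in effect via the parameter group.
for t in my_table_with_long_rows another_table; do
  echo "ALTER TABLE \`$t\` ROW_FORMAT=DYNAMIC;"
done > alter.sql
cat alter.sql
```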



This solution was made available by AWS Support, and I thought it sounded pretty smart for a workaround. Credit to Ashika, who picked up my ticket.


[1] Limits on InnoDB Tables:

[2] innodb_file_format:

[3] innodb_large_prefix:

[4] InnoDB File-Format Management:

[5] Compressed tables not supported:

[6] Forums.

AWS Elastic Beanstalk: remove health checks from Apache logs.

June 5, 2018

We run a PHP web application on AWS Elastic Beanstalk and have been looking at improving our monitoring using CloudWatch Logs. Unfortunately the logs are polluted by the health checks originating from our ELB.

Here’s a quick guide on how to stop this.

Using .ebextensions, add the following file into your project (if you haven’t heard about EB extensions, read here). The Apache conf file name below is an example:

```yaml
files:
  "/etc/httpd/conf.d/dontlog.conf":
    mode: "000644"
    owner: root
    group: root
    content: |
      SetEnvIf Request_URI "^/elb-health-check\.php$" dontlog
      SetEnvIf Remote_Addr "127\.0\.0\.1" dontlog
      SetEnvIf Remote_Addr "::1" dontlog

container_commands:
  01_dontlog:
    command: sed -i.bak 's+CustomLog "logs/access_log" combined.*+CustomLog "logs/access_log" combined env=!dontlog+' /etc/httpd/conf/httpd.conf
  02_configtest:
    command: apachectl configtest
  03_restart:
    command: service httpd restart
```

Change the health check URL in the Request_URI; my health check is at /elb-health-check.php. Then simply eb deploy and you should have a clearer view of your logs. 😀
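The three SetEnvIf rules amount to “suppress logging when the URI is the health check or the client is localhost”. A local sketch of that decision, for sanity-checking your own URI before deploying (the helper name and sample values are mine):

```shell
#!/bin/sh
# Mirror the dontlog conditions: health-check URI, IPv4 loopback,
# or IPv6 loopback. Succeeds when the request SHOULD be logged.
should_log() {
  uri=$1
  addr=$2
  [ "$uri" = "/elb-health-check.php" ] && return 1
  [ "$addr" = "127.0.0.1" ] && return 1
  [ "$addr" = "::1" ] && return 1
  return 0
}

should_log /index.php 10.0.0.5            && echo "/index.php: logged"
should_log /elb-health-check.php 10.0.0.5 || echo "health check: suppressed"
should_log /index.php 127.0.0.1           || echo "loopback: suppressed"
```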

Articles that helped:

AWS ELB in Apache logs

Apache Logs in AWS



Passed AWS SA Pro!

April 27, 2017


After just over a month (one week of it full-time) of serious studying and stressing like crazy, I’ve passed my AWS SA Pro exam. Having received the exam reminder in October 2016, I did what most developers would do and left it to the last. The last day, to be exact. Adding ample pressure to get it right the first time. I mean, who wants to write the Associate exam again?!

Google was my friend. First have a read of other, more qualified bloggers than myself; have a google and there will be plenty. I won’t repeat everything they’ve said. Just a few notes for those looking to get certified soon; maybe it helps, maybe not.

Also, if you have the money, the paid courses ($99, and definitely the $29/m one) are worth a look. If I had to choose one, it would be the one with the free 7-day trial 🙂

My two cents is on time management. Before I started the exam I just made a column on my test paper to keep time with:

Question  Time
10        2:35
20        2:20
30        2:05
40        1:50
50        1:35
60        1:20
70        1:05
80        0:50
You really have no time to waste! And rather move on than waste even an extra minute. I’m not the quickest reader, but I couldn’t believe I didn’t make a second pass on that exam. That rarely happens with an online MCQ exam.

Also, a month part-time is not enough. I was eventually forced into taking a week’s leave to study full time, as I had a feeling the content was too much. It is not difficult; it’s just that the amount of material is pretty hectic.

I hope that helps someone else, with enough time, to make better choices than I did and keep some of their hair intact. 🙂


DMS instead of Datapipeline

April 27, 2017

In a previous post I detailed my trials and fails with using Datapipeline to archive Aurora tables to Redshift. This led to a comment from rjhintz about using AWS DMS instead. I initially went with Datapipeline because we would eventually truncate the source table and would not want that replicated to Redshift, deleting the archive data. But I would still take some time out to check out the DMS service.

AWS DMS’s initial use case is to help people migrate to AWS from on-premise DB installations, as it says in the name 😉 My use case would be archiving. We initially used Datapipeline to achieve this, and the setup was pretty tedious, to say the least. Even once it is up and running, we still have to check that the jobs have completed correctly and that nothing has gone wrong.

This weekly checking had become a chore. This is where DMS comes in. We only have one table that needs truncating, whereas all its related tables simply need to be kept in sync. It took a day to get up to speed with DMS; after that we migrated all our Datapipelines except one to DMS.

Create an instance that has access to your source and target databases. This was needed in Datapipeline as well. It’s just much much easier in DMS. No AMIs needed with larger instance stores.


Create your database endpoints.

Create a task that will migrate and then keep in sync your desired tables.
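The console steps above (instance, endpoints, task) map onto the CLI as well. A minimal sketch, with every ARN a placeholder for your own resources, wrapped in a function so it isn't run against a live account by accident:

```shell
#!/bin/sh
# All ARNs are placeholders. migration-type full-load-and-cdc performs
# the initial copy and then keeps the tables in sync, which is the
# "migrate and then keep in sync" task described above.
create_archive_task() {
  aws dms create-replication-task \
    --replication-task-identifier archive-sync \
    --source-endpoint-arn "$SOURCE_ENDPOINT_ARN" \
    --target-endpoint-arn "$TARGET_ENDPOINT_ARN" \
    --replication-instance-arn "$REPLICATION_INSTANCE_ARN" \
    --migration-type full-load-and-cdc \
    --table-mappings file://table-mappings.json
}
echo "create_archive_task defined"
```

The endpoints themselves come from `aws dms create-endpoint`, and table-mappings.json selects which tables to sync.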


That’s it. Done.

AWS re:Invent 2016 with Emirates

January 31, 2017

Thought I would rub it in and share the Emirates Business Class flight with my local buddies! It’s truly amazing.



Lounge Brekkie.


Time to chill


Legroom. (You can’t see my legs!)


Pre-flight medication.


Bar in the back


Some more medication


Over Cape Town

Truly had my head in the clouds.

AWS Summit Notes

July 12, 2016


What a great first AWS Summit on African soil!


Even had the honour of getting a quick pic with the legend himself 🙂

What a massive turnout as well! 600+ people showed up. I never knew the interest in AWS was this big in Cape Town.

We had great talks from Mix Telematics on how they manage 66TB of data.



And from Jumo World, on how they came to trust AWS security in the unsecured lending space. Some points I took with me were around compliance reporting, which I might need in the future.

Other key points of discussion were ECS and the Lambda service. Lambda shows real promise in the microservices space. The serverless architecture demo was great: non-technical, but reducing the cost of running a website from $4000 to $13 really got a lot of oohs and aahs.

There was a voice deploy demo using Amazon Echo. The backend was powered by AWS Lambda. It in turn triggered a CodeDeploy job to update code using a voice command. Some cool shit!


Werner ended off the day with a talk on machine learning (kudos for his AVB t-shirt and red chucks), discussing how machine learning has become the differentiator in some spheres of business.


All in all it turned out to be a good day. We even had some time to mingle over a beer or 6 and exceptional wines from Saronsberg 🙂

AWS Summit Cape Town

July 11, 2016


Yes, you heard correct! AWS Summit is coming to little old Cape Town, the heart of South Africa and probably the home of EC2 and a few other services. Not to mention a great place for developers to work, live and play. 🙂

If you haven’t already registered, it might not be too late.

Dr. Werner Vogels will be the keynote speaker. That should get you there by itself. Him being the CTO of Amazon and all. 😉

Looking forward to hearing some cool announcements pertaining to Africa. Maybe an edge location in the making?

AWS re:Invent 2016

July 6, 2016

The dates have been set! Time for the largest gathering of AWS users in the world. If you are lucky enough to attend this year, there are a few things you should know before going.

Book the closest hotel.

That would be the Venetian. As it all goes down there. Second choice would be the Mirage. The Encore/Wynn are very classy hotels, but they are quite a walk away from the Venetian.

Arrive early.

Try and be there a day early, so you can adjust to the weather and the side-effects of jet lag.

Don’t try and walk!

Everything is damn far from everything. Rather jump on a bus and walk from the nearest stop. You can get a 3-day bus pass for about $22. Money well spent.

Get connected.

Get yourself a SIM card for your phone. There’s wireless in the Venetian, but not all places outside are free. Install the official app to stay in touch with conference activities.

Nothing good happens after 12.

If you would like to partake in the majority of events, get back to your room by 12 (that’s my age talking). Always have a group with you; good to have buddies nearby.


It’s year end, and if you’re looking for bargains, there are bargains to be had, mainly off The Strip though. You might even decide to pack light and get there a day early to shop for some jeans and the like 🙂


Use Tripadvisor to see what’s cool to do in the area if you have spare time. There’s a ton of stuff to do! (Besides the obvious! 😉)



Dealing with Data Pipeline Failures

July 6, 2016

In my previous post on migrating a large amount of data I walked through the challenges faced when using AWS Data Pipeline. Now that we have our Data Pipeline job running weekly, we decided to copy related data tables to Redshift as well. This came with its own challenges. We use the Incremental copy of RDS MySQL table to Redshift template supplied by AWS.


Unlike the now-600GB+ table, the related table schemas have ids. This makes a difference: it’s somewhat easier to deal with duplicate entries, as you can define the id key in the Data Pipeline job and it will overwrite existing values in Redshift.

Set the id as the Redshift table distribution key. If the id is not the distribution key, set the id as one of the Redshift table sort keys.
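That advice, written out as DDL (the table and column names are placeholders, not from the original post):

```shell
#!/bin/sh
# Emit Redshift DDL making the id both the distribution key and the
# leading sort key, so the overwrite step can match rows efficiently.
tee ddl.sql <<'SQL'
CREATE TABLE my_archive_table (
    id          BIGINT NOT NULL,
    created_at  TIMESTAMP,
    payload     VARCHAR(65535)
)
DISTKEY (id)
SORTKEY (id);
SQL
```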


Choose the proper insert mode. Select OVERWRITE_EXISTING.


This should avoid the insertion of duplicates: Redshift creates a staging table before inserting into the original table, then copies only unique values into the original table.
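The net effect of OVERWRITE_EXISTING is “last write wins per id”. A tiny local simulation of that semantics, using awk in place of the staging table (file names and data are made up):

```shell
#!/bin/sh
# Incoming rows replace existing rows with the same id; new ids append.
printf '1,old\n2,old\n' > existing.csv
printf '1,new\n3,new\n' > incoming.csv
cat existing.csv incoming.csv |
  awk -F, '{v[$1] = $2} END {for (k in v) print k "," v[k]}' |
  sort > merged.csv
cat merged.csv
```

The merged result keeps one row per id, with the incoming value winning for id 1.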

Tables with No IDs

What if your table has no IDs, though, and you are not sure whether you have duplicates? To check whether duplicates exist you can run the following on Redshift (the row_number() window is my reconstruction of the snippet, which was partly lost):

```sql
SELECT count(*) FROM (
    SELECT t.*,
           row_number() over (partition by COL_1, COL_2, COL_n
                              order by COL_1, COL_2, COL_n) AS r
      from TABLENAME t
  ) x
where x.r > 1;
```

Replace COL_1 to COL_n with your columns and replace TABLENAME with your table name. A bit tedious to type all your column names; our table has 11. (Source)
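The same check works on a local CSV extract of the table, without touching the cluster. A sketch (file name and data are examples): fully-identical rows sort together, so `uniq` can count the extra copies.

```shell
#!/bin/sh
# Count rows that are exact duplicates of an earlier row.
printf 'a,1\nb,2\na,1\na,1\n' > extract.csv
dups=$(sort extract.csv | uniq -dc | awk '{t += $1 - 1} END {print t + 0}')
echo "duplicate rows beyond the first copy: $dups"
```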

Oh no, duplicates! What to do if you do have duplicates? Delete them, of course! Easier said than done, though: you can’t delete the duplicates in place. A temp table has to be created with unique rows only, and depending on your table size this can be a real issue. We’ve had to resize our cluster to accommodate removing duplicates. I’ve posted a feature request.

To remove duplicates, run the following on Redshift:

```sql
lock table public.TABLENAME;
-- (like ...) reconstructs the elided column list from the source table
create table if not exists public."TABLENAME_mig" (like public.TABLENAME);
alter table public."TABLENAME_mig" add primary key (COL_1);
insert into public."TABLENAME_mig"
SELECT COL_1, COL_2, COL_N FROM (
    SELECT t.*,
           row_number() over (partition by COL_1, COL_2, COL_N
                              order by COL_1, COL_2, COL_N) AS r
      from TABLENAME t
  ) x
where x.r = 1;
analyze public."TABLENAME_mig";
drop table public.TABLENAME cascade;
alter table public."TABLENAME_mig" rename to TABLENAME;
vacuum delete only;
```

Replace COL_1 to COL_N with your columns and replace TABLENAME with your table name. (Source)

Character encodings

Another issue that we constantly encounter is importing the S3 file into Redshift. This does not always succeed, as the data, which is plain CSV, might have encoding issues. In our case it was null-byte characters. Have a look at how to prepare your input data.


The problem is we don’t have control over the options that dump the records to CSV. The table records have been written to CSV and stored on S3 already. We can check the errors and evaluate what to do next.

Run the following on Redshift to check for errors:

```sql
select * from stl_load_errors order by starttime desc;
```

Produces output similar to:

Missing newline: Unexpected character 0x6f found at location 31                                     

In the err_reason column.

To resolve, check your Activity Logs of the RDSToS3CopyActivity to see the CSV S3 file name.


You should see the S3 file name at the end of the Activity Logs; it usually starts with s3://.

Now that you have the S3 location, execute the following on Redshift:

```sql
COPY TABLENAME
FROM 's3://PATH/TO/S3FILE.csv'
CREDENTIALS 'aws_access_key_id=ACCESS_KEY;aws_secret_access_key=SECRET_KEY'
NULL AS '\0';
```

Replace s3://PATH/TO/S3FILE.csv with the S3 location you got from the Activity Logs, replace TABLENAME with your table name, and fill in your access credentials ACCESS_KEY and SECRET_KEY. (Source)

Of course, you could have your application not save NULL bytes in the first place, but this is not always possible.

Multiple NULL Bytes

In the rare case that you get a shitload of NULL bytes in your data, the solution above will not work, as the NULL AS option only seems to handle a single NULL byte. What I currently do is download the S3 file to an EC2 instance (much quicker on EC2), then replace the NULL bytes and upload the file back to S3.

```sh
aws s3 cp s3://S3FILE.csv .
tr -d '\000' < S3FILE.csv > S3FILE_OUT.csv
aws s3 cp S3FILE_OUT.csv s3://S3FILE.csv
```
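The tr step is easy to verify locally before touching the real file (file names here are just examples):

```shell
#!/bin/sh
# Create a file containing NUL bytes, strip them, and show the result.
printf 'ab\0cd\0ef\n' > S3FILE.csv
tr -d '\000' < S3FILE.csv > S3FILE_OUT.csv
cat S3FILE_OUT.csv
```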


For now those are the only issues that I have encountered with my weekly and monthly job runs. Hopefully this saves somebody else some valuable time. In the end the rewards far outweigh the challenges! Our data scientists can now run queries in minutes to seconds on Redshift that took hours to days on Aurora.