Note: This blog post is the third in a series written by our Sr. Web Analyst, Adrian Palacios, and is designed to provide marketers the information and instruction required for installing the programming language, Python. In case you missed it, Adrian’s first post explains why marketers should consider doing so in the first place and his second post explains how to install Python.
When I began learning how to program, I had a lot of trouble with the concept of “data types.” It was always one of the first topics discussed, but also one of the most abstract. Out of frustration from not understanding, I’d usually skip any discussion of data types and rationalize it by telling myself, “Why do they even matter?” But after a few years of making programming part of my day-to-day work, data types have (slowly) started to make sense.
What are data types?
Data types are a set of rules that govern what you can (and cannot) do with Python. You experience similar constraints in the real world every day: try paying for a meal with a credit card when the restaurant only accepts cash. Try riding a bike down a river rather than using a canoe or kayak. Or try baking a cake with salt in place of sugar. You could try these things if you really wanted to, but I doubt you’d be happy with the outcome.
Similarly, if you think of programming as baking, then data types are the key ingredients of any Python code, just like eggs, flour, butter, baking powder and sugar are key ingredients for making a cake. Or, maybe you want a pie? Use less flour, drop the baking powder altogether, maybe add in a fruit filling and voila, you are on your way to making a pie. See where this is going? Understanding what each data type is capable of and how to mix them together is key to being a better programmer.
Here are just a few common data types in Python:
Numbers, which are further subdivided into other types, such as:
Integers: 0, 1, 2, 3
Floats: 0.0, 1.0, 2.5, 3.145
Strings: ‘Think Different’, ‘Netflix and Chill’
Booleans: True, False
There are many more data types in Python, but for now we’ll focus on these three.
In this post, you’ll learn some of the things you can do with the basic data types in Python and why data types are important.
In my experience, this tends to be the most boring part of any “Learn How To Program” book/video/online course, but because data types are so crucial to programming, I highly recommend not skipping this post.
The two main types of numbers in Python are integers (whole numbers) and floats. Floats can be thought of as decimals, but there are some big differences, one of which will be discussed later.
The most obvious thing we can do with numbers is…math! Here are some special characters used to do math in Python:
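As a quick reference, here is a minimal sketch of the standard arithmetic operators in Python 3. The numbers and variable names are arbitrary and only there for illustration.

```python
# Standard arithmetic operators in Python 3 (values are arbitrary examples)
addition = 7 + 2         # 9
subtraction = 7 - 2      # 5
multiplication = 7 * 2   # 14
division = 7 / 2         # 3.5 (dividing two integers returns a float in Python 3)
exponent = 7 ** 2        # 49
remainder = 7 % 2        # 1 (the "modulo" operator)

print(addition, subtraction, multiplication, division, exponent, remainder)
```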
With these basic operations, we can now work through a few scenarios.
Your content team reported that the blog generated 80,000 pageviews two weeks ago and this week it generated 105,000 pageviews; what’s the weekly growth in pageviews? Recall that to calculate the percent change you can do (New Number – Old Number) ÷ Old Number:
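Typed in exactly as the formula reads, but without the parentheses, the calculation might look like the sketch below. The variable names are illustrative labels; you could just as easily type the raw numbers.

```python
old_pageviews = 80000    # two weeks ago
new_pageviews = 105000   # this week

# No parentheses: Python divides first (80000 / 80000), then subtracts
growth = new_pageviews - old_pageviews / old_pageviews
print(growth)  # 104999.0
```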
Well, that doesn’t seem quite right. Why not? Like any good mathematician, Python followed the order of operations in the statement we typed. Here we can use parentheses to tell Python the order we really want:
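A minimal sketch of the corrected calculation, again using illustrative variable names:

```python
old_pageviews = 80000    # two weeks ago
new_pageviews = 105000   # this week

# Parentheses force the subtraction to happen before the division
growth = (new_pageviews - old_pageviews) / old_pageviews
print(growth)  # 0.3125
```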
That’s better. Remember, we are looking at a percentage, so in this case, moving the decimal two places to the right will give us what we need. Looks like traffic to the blog has grown 31.25% week-over-week.
You are working with an online publisher to run a special campaign that includes large-format banner and video ads. The publisher sent a proposal stating the package has a fixed cost of $15,000 and they expect it to generate 550,000 impressions; you would like to calculate the CPM to better compare this proposal with other options.
To find the CPM, we use the formula Cost of Campaign ÷ (Total Impressions ÷ 1000):
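Here is one way that formula could be typed in, using the numbers from the proposal. The variable names are assumptions added for readability, and the rounding to two decimal places is just for display.

```python
campaign_cost = 15000   # fixed cost of the package, in dollars
impressions = 550000    # impressions the publisher expects

# Cost of Campaign / (Total Impressions / 1000)
cpm = campaign_cost / (impressions / 1000)
print(round(cpm, 2))  # 27.27
```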
This time we got the order of operations correct; looks like the CPM for the campaign is approximately $27.27.
Using Python like this is really tedious, especially when typing in long formulas only returns a single metric. While these examples are simplistic, there are ways to apply more complicated mathematical operations to hundreds of thousands (or even millions) of rows of data.
Older versions of Python (version 2.7 and lower) give some strange answers when dividing integers whose result would be a float, such as 1 ÷ 3. In these two examples, we’ve been dividing integers and ending up with floats without any problems, which is part of the magic of using a newer version of Python. It does raise the question: why are numbers with a decimal point referred to as floats rather than decimals? Jackie Kazil and Katharine Jarmul have a great example in their book, Data Wrangling With Python: in many programming languages, 0.1 + 0.2 does not equal 0.3. Try it out yourself: type 0.3 into your terminal, then follow it with 0.1 + 0.2.
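If you would rather see it in one place, the experiment can be sketched like this. The `round()` call at the end is not part of the book's example; it is one common way to compare floats at the precision a marketer typically cares about.

```python
print(0.3)              # 0.3
total = 0.1 + 0.2
print(total)            # 0.30000000000000004
print(total == 0.3)     # False

# Rounding to a sensible number of decimal places makes the comparison behave
print(round(total, 2) == 0.3)  # True
```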
Weird, right? Mark Lutz goes into more depth about this oddity in his book, Learning Python, but since marketers don’t typically need to calculate numbers to the millionth degree, it’s not critical to fully understand this issue and it’s good enough to simply know it exists. But if you’re really curious, here are some posts that explain more:
Strings can be thought of as characters enclosed in quotes. This is a grossly oversimplified definition because it ignores the nuances between bytearrays, Unicode, ASCII, etc., but I think going into that kind of detail is unnecessary for now.
How does one properly enter characters enclosed in quotes? Turns out there are more than a few different ways. For example, most of the time it doesn’t matter if you use single quotes or double-quotes, so long as you remain consistent:
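A small sketch of three such strings follows. The third one mixes quote styles and would stop Python with a SyntaxError, so it is left as a comment here to keep the example runnable.

```python
single = 'Think Different'   # single quotes
double = "Think Different"   # double quotes
print(single == double)      # True: both styles create the same string

# Starts with a double-quote but ends with a single-quote.
# Uncommenting this line causes a SyntaxError:
# mixed = "Think Different'
```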
By beginning with a double-quote in the third string and ending with a single-quote, we encountered an error. You probably wouldn’t mix single and double quotes on purpose, but one thing to watch out for is apostrophes:
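Here is a sketch using a made-up sentence (the wording is hypothetical, any text with an apostrophe behaves the same way). The failing version is commented out so the example still runs.

```python
# Double quotes on the outside: the apostrophe is just another character
with_double = "Don't miss our summer sale"
print(with_double)  # Don't miss our summer sale

# Single quotes on the outside: Python treats the apostrophe as the end of
# the string, so uncommenting this line causes a SyntaxError:
# with_single = 'Don't miss our summer sale'
```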
Notice that the first string, which was enclosed in double-quotes, handled the apostrophe well, while the second string didn’t. Why? Once Python encountered the second single-quote it expected the string to end, but the letters kept going. Cutting off the sentence silences the error:
But having only part of a sentence is not useful. If you really, really want to use single quotes, one option is to “escape” the apostrophe. That can be done by placing a backslash (\) immediately before the apostrophe in your sentence. This tells Python to treat the next character in a special manner:
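As a quick illustration with a hypothetical sentence, escaping looks like this:

```python
# The backslash tells Python the apostrophe is part of the text,
# not the end of the string
escaped = 'Don\'t miss our summer sale'
print(escaped)  # Don't miss our summer sale

# The result is identical to the double-quoted version
print(escaped == "Don't miss our summer sale")  # True
```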
Escaping the apostrophe allowed us to keep using single-quotes and an apostrophe in the same string.
There is one other way you can create strings in Python: triple-quotes. Notice that when you type in a multi-line string, you have to hit enter to start the next line; this also changes Terminal to display …: on the left of your new line. This will show up again when we start typing multiple lines of code.
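A minimal sketch of a triple-quoted string spread over two lines (the text itself is just a placeholder). Printing the string with `repr()` shows roughly what the terminal echoes back when you type the string on its own.

```python
tagline = """Think
Different"""

print(repr(tagline))  # 'Think\nDifferent'
print(tagline)        # prints the text on two separate lines
```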
Finally, you might notice some funky characters in the string: \n. This tells the computer where you entered a new line; it’s helpful if you need to print the string again and want to be precise about the formatting.
Ok, now that we’ve spent all that time looking at how to properly enter a string (I know, right? So much work for something so simple…), it’s time to move on to the fun part: manipulating strings.
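The sketch below shows a handful of commonly used string methods. It is an illustrative selection rather than a complete list, and the sample text is made up.

```python
text = 'Cheap Flights to Paris'

print(text.upper())                    # CHEAP FLIGHTS TO PARIS
print(text.lower())                    # cheap flights to paris
print(text.replace('Paris', 'Rome'))   # Cheap Flights to Rome
print(text.startswith('Cheap'))        # True
print(text.split())                    # ['Cheap', 'Flights', 'to', 'Paris']
```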
There are many, many more methods for strings; there are also entire other topics such as pattern matching and slicing that we will address later. With the basics out of the way, let’s work through an actual example.
Let’s say you have new copy that needs to be uploaded into AdWords, but all the copy is lowercase and you’re not sure if the headlines are within the 30 character limit. Let’s check the length and print the headline “cheapest flights to paris” to title case:
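Both steps can be sketched as follows. The results are stored in variables here only so they can be printed side by side; in the terminal you can type each expression on its own.

```python
# A function: the string goes inside the parentheses
headline_length = len('cheapest flights to paris')
print(headline_length)  # 25

# A method: the string comes first, followed by .title()
title_cased = 'cheapest flights to paris'.title()
print(title_cased)  # Cheapest Flights To Paris
```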
The first example is technically a function, whereas the second is a method. The only thing we care about right now is that each is typed in differently: with a function we first type “len(”, then the string itself, and finally the last parenthesis “)”. The good news is that the headline is within the 30 character limit enforced by AdWords.
Next, with the method, we first type the string, then add .title() after the closing quote (with no spaces!).
Finally, you might be asking yourself “Why am I doing this in Python when the same formulas are available in Excel?” That’s a fair point.
While messing with headlines is a silly example, I hope the point is clear: there are many options available to manipulate text with Python. And similar to the previous scenario with calculating CPMs or percent change, being able to manipulate thousands of characters of text in just a few lines of code is a very powerful tool to have handy.
For example, the biggest headache this has saved me is cleaning up millions of rows of URLs from Google Analytics. The second point of this exercise is to point out something that may save other pain in the future: if you know how to use formulas in Excel or Google Docs, you are already one step ahead in understanding how to program. There are plenty of similarities that will translate from Excel into Python.
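To give a rough sense of what "a few lines of code" can look like, here is a sketch that checks several headlines at once. It uses a list and a loop, which haven't been covered yet, and the headlines themselves are invented for the example.

```python
# Hypothetical headlines waiting to be uploaded
headlines = [
    'cheapest flights to paris',
    'last minute weekend getaways to the french riviera',
    'book hotels in rome',
]

too_long = []
for headline in headlines:
    print(headline.title(), len(headline))
    if len(headline) > 30:          # 30 character headline limit
        too_long.append(headline)

print(too_long)
```

The same loop would work unchanged on a list of three headlines or three thousand.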
The two booleans in Python are True and False. Their meaning is really straightforward: True means true and False means false.
Until we dive into using logic in Python, it’s best to explain booleans through an analogy. When you create a Facebook video ad campaign, you need to decide what creative to use. Should it be the hip new video that’s aimed at brand awareness, or should you stick with the old but battle-tested video that has a clear call to action? It depends on what you’re trying to accomplish, right? Similarly, there will be a point where you will need to create some logic to tell your computer which path to follow, and booleans are one way to accomplish that task.
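Without getting into logic just yet, a small sketch can show where booleans come from: asking Python a yes-or-no question, such as a comparison, returns True or False. The variable name below is only illustrative.

```python
# Is the headline within the 30 character limit?
is_within_limit = len('cheapest flights to paris') <= 30
print(is_within_limit)        # True
print(type(is_within_limit))  # <class 'bool'>

# Is 105,000 smaller than 80,000?
print(105000 < 80000)         # False
```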
There are other objects in Python that can act like booleans, but for now just introducing True and False will suffice. We’ll deal with this subject more in-depth with a future post.
At the beginning of this post, I mentioned that when I first tried learning about data types the process was tedious and the subject matter dull. Just like the impatient teenager in high school geometry class goading a teacher with the question “When will I ever need to know this useless junk?”, I felt going through all these rules about data types was pointless. But I’ve grown to appreciate the rules because when you try to break them, most of the time an error will occur. If you are a responsible programmer and test your code ahead of time, running into these errors can save you from making critical mistakes in your code when it really matters. Let’s explore some ways in which errors could surface.
In the scenarios where you practiced math with number types, did you notice no commas were used when doing these calculations? You are probably used to typing numbers with commas (or periods, for our European friends!), but if you try that in Python, you will encounter some odd behavior:
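A minimal sketch of what happens when five hundred thousand is typed with a comma (the variable name is just an illustrative label):

```python
impressions = 500,000

print(impressions)        # (500, 0)
print(type(impressions))  # <class 'tuple'>
```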
Hmmm; not at all what you expected. By placing a comma in this number, we unknowingly created a “tuple.” It’s not necessary to know what a tuple is at the moment, but it is important to see that the comma split the number into 500 and 0, which is much different from five hundred thousand.
This example also highlights a crucial aspect of writing code: just a single mistyped character can cause big problems. Proofreading code is difficult at first, but with practice, you will get better. Unexpected results or errors are nothing to panic about; it can feel scary, but it usually only means something was lost in translation.
One rule you’d probably expect is that when trying to add an integer and a string you’d get an error:
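The sketch below tries both additions. In the terminal the second one simply stops with an error; here it is wrapped in `try`/`except` (something not covered in this post) only so the example runs to the end and prints the error instead of crashing.

```python
total = 1 + 1
print(total)  # 2

try:
    broken_total = 1 + '1'   # integer + string
except TypeError as error:
    error_name = type(error).__name__
    print(error_name, '-', error)  # prints the TypeError and its message
```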
And you would be correct. In the second line of code we tried to add the integer 1 to the string ‘1’, which resulted in an error. You may be thinking to yourself “That’s a lame example; when would someone ever try to add an integer to a string?” You’d be surprised how many APIs return numbers as a string. There are very good reasons for doing this, but when you are dealing with an API, presuming that a number will be a number is not a safe bet. One great example is the newest Google Analytics Reporting API. Take a look at the screenshot of the data Google Analytics gives you when you request a report. Surprise! All of the numbers (look at the “values” fields) are enclosed in quotes.
That kind of throws a wrench into things, doesn’t it? It would, but if you read the API documentation ahead of time, you’ll be expecting this issue. Luckily there are also some tools in Python that can help us, such as the int() function:
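A minimal sketch of the conversion follows. The pageview strings in the second half are invented stand-ins for the kind of quoted numbers an API might return, not real Google Analytics output.

```python
# Convert the string '1' to an integer before adding
total = 1 + int('1')
print(total)  # 2

# Hypothetical values as an API might return them: numbers wrapped in quotes
this_week = '105000'
last_week = '80000'
difference = int(this_week) - int(last_week)
print(difference)  # 25000
```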
By placing the string ‘1’ inside the int() function, we are telling Python we want to treat this like an integer; now you can add these numbers together and get on with your life.
These are just a few examples of how data types determine what you can do in Python, but they also demonstrate that there’s almost always a way to work around these issues. Please reach out to me on Twitter if you have any questions about data types.