Don't you love data?

Data Science Blog

Python Data Types. Tutorials For Mike Series

October, 2020

If you are completely new to programming, this is a very important topic to cover. Before we proceed to various data types, here are a few useful things to know:

  • You can use the print() function to print things. If you are playing in the console/terminal, where results are echoed back automatically, this function may not be necessary. Once you write your code in a code editor, though, explicitly printing out values can be really helpful, especially when you are trying to debug something.
  • To make your code readable and reusable, it's advisable to add comments to your code. In Python, adding # in front of a statement comments out that statement, and Python will ignore it. This can also be helpful when you debug your code and need to disable certain parts to see if the rest works. Check the keyboard shortcuts in your editor/operating system for commenting and uncommenting lines.
  • You can assign a value to a variable in Python by using =. This allows you to use the variable name later in your code rather than typing the value over and over again. It's good practice to use descriptive names when you create variables. Be careful not to unintentionally reuse a variable name in your code. A reminder: variable names cannot contain special characters aside from _ and can't start with a number.
  • You can use = to assign values to variables, but if you want to check whether two values are equal to each other, you need to use ==. I can't tell you how many times I forgot about it and my code returned an invalid syntax error. I'll remind you about it in the upcoming posts.
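Here is a minimal sketch that ties the four points above together. The variable names (city_name, visitor_count) are made up for illustration.

```python
# Anything after # is a comment and is ignored by Python.
city_name = "New York"      # = assigns a value to a variable
visitor_count = 3

print(city_name)            # prints New York
print(visitor_count == 3)   # == compares two values; prints True
print(visitor_count == 4)   # prints False

# print("this line is commented out, so it never runs")
```

Try removing the # from the last line and running the code again to see the difference.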

Numeric Data Types

The two most common numeric data types are integers (whole numbers such as 4) and floats (numbers with a decimal point such as 0.5). There are also complex numbers, but I've never used one in Python. One of the main purposes of types like integers and floats is to perform mathematical operations.

Here are a few basic examples that you can try to recreate in your editor.

    print(4)                   # prints 4
    2 + 2                      # evaluates to 4 (the console shows the result automatically)
    int_example = 2
    print(int_example)         # prints 2
    int_example = 4            # reassigns the value of int_example to 4
    print(int_example)         # prints 4
    print(type(int_example))   # prints <class 'int'>
    print(type(0.5))           # prints <class 'float'>
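Since math is the main reason to use these types, here is a short sketch of the basic arithmetic operators. The variables a and b are placeholders I picked for the example.

```python
a = 7
b = 2

print(a + b)    # 9
print(a - b)    # 5
print(a * b)    # 14
print(a / b)    # 3.5  (division with / always returns a float)
print(a // b)   # 3    (floor division drops the fractional part)
print(a % b)    # 1    (remainder)
print(a ** b)   # 49   (power)

print(type(a / b))     # <class 'float'>
print(type(a + 0.5))   # <class 'float'>: mixing an int and a float gives a float
```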

Strings

When working with strings in Python, you can use either single or double quotes, but do remember to use quotes. Otherwise Python will treat the text as code (for example, as a variable name) and raise an error. You can do a lot of cool things with strings, which I'll cover in the next post.

    print("hello")
    string_example = "Hi"
    print(string_example)         # prints Hi
    print(type(string_example))   # prints <class 'str'>
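To see the point about quotes in action, here is a small sketch. The variable names are my own, and the try/except part is only there so the script keeps running after the error.

```python
single = 'hello'
double = "hello"
print(single == double)   # prints True: the quote style does not matter

# Use one style on the outside when the text contains the other one.
sentence = "Don't you love data?"
print(sentence)           # prints Don't you love data?

# Without quotes, Python looks for a variable called hello.
try:
    print(hello)
except NameError as error:
    print("NameError:", error)   # NameError: name 'hello' is not defined
```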

Boolean

A Boolean can only take two values: True or False. You get a Boolean whenever you compare two expressions, for example to check whether one number is greater than another. I'll talk more about Boolean values in the Conditional Logic post.

    print(10 > 9)    # prints True
    print(10 == 9)   # prints False
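A comparison result can also be stored in a variable like any other value. A quick sketch (is_bigger is a name I made up):

```python
is_bigger = 10 > 9
print(is_bigger)         # prints True
print(type(is_bigger))   # prints <class 'bool'>

print(10 != 9)           # prints True: != means "not equal"
print(3 <= 3)            # prints True: <= means "less than or equal to"
```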

The data types mentioned below are also built-in, but compared to the ones mentioned above, they are a bit more complex. You can't successfully code in Python without knowing these data types.

Lists

If you are familiar with other programming languages, you may know lists as arrays. These are very useful data objects. A list is a collection of data elements. To tell Python that something is a list, you need to wrap the elements in [] and separate them with commas. Here are a few useful things to know about lists:

  • You can have a list with only one element. You may also sometimes want to initialize an empty list.
  • You can add, remove, and modify elements within a list.
  • Lists are ordered collections which means that you can select a certain element from a list by specifying its index.
  • Indexing in a list starts at 0, so to select the first element in the list, you specify an index of 0.
  • You can provide a negative index, which counts from the end of the list. Since there is no -0, an index of -1 returns the last element.
  • Lists can contain any data type, and the elements of one list can be of different data types. The example below contains numeric types, a string, a Boolean and even a list as its elements. Yes, a list inside of a list! Some real-world examples have really complicated nested lists.

Here is some code to try. You can create your own lists (including nested lists), and see if you can correctly select desired elements.

    my_list = [1, -1, 0, 3]
    print(my_list)        # prints [1, -1, 0, 3]
    # [] is used to specify the index of the element you want to select.
    print(my_list[0])     # selects the first element and prints 1
    print(my_list[1])     # selects the second element and prints -1
    print(my_list[3])     # selects the fourth element and prints 3
    # You can use a negative index to start from the end of the list.
    print(my_list[-1])    # prints 3
    # You can even have a crazy list like this.
    crazy_list = [1, True, 0.5, "Hi", [1, 3, 5]]
    print(crazy_list[-1][1])   # prints 3
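The bullets above also mention empty lists and adding, removing, and modifying elements. Here is one possible sketch of those operations; the fruits list is just an illustration, and append() and remove() are only two of several list methods you could use.

```python
fruits = ["apple", "banana"]

fruits.append("cherry")    # add an element to the end
print(fruits)              # prints ['apple', 'banana', 'cherry']

fruits[0] = "apricot"      # modify an element by its index
print(fruits)              # prints ['apricot', 'banana', 'cherry']

fruits.remove("banana")    # remove an element by its value
print(fruits)              # prints ['apricot', 'cherry']

empty_list = []            # an empty list you can fill later
one_element = [42]         # a list with a single element
print(len(empty_list), len(one_element))   # prints 0 1
```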

Lists are very useful. You'll commonly need to iterate through them which will be covered in my post on Loops. Sometimes you may need to know the number of elements in a list. Other times you may need to calculate the sum (or average) of all elements in a list. I'll talk more about various operations you can do with lists in the Functions post.
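As a small preview, counting, summing, and averaging can be done with the built-in len() and sum() functions. This is only a sketch using the list from the earlier example.

```python
my_list = [1, -1, 0, 3]

count = len(my_list)      # number of elements
total = sum(my_list)      # sum of all elements
average = total / count   # there is no built-in average, so divide

print(count)     # prints 4
print(total)     # prints 3
print(average)   # prints 0.75
```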

Dictionary

Similar to lists, dictionaries are collections of data elements, but elements are looked up by key rather than by position. (Since Python 3.7 a dictionary does remember the order in which its keys were inserted, but it still has no positional index.) In Python, dictionaries are written with curly brackets {}. You can add, remove, and modify elements within a dictionary. Because there is no positional index, [0] will not give you the "first" element; Python would look for a key equal to 0 instead. To extract a data element, you need to specify its key. This is what a dictionary looks like: {"key1": 1, "key2": "b"}, where "key1" and "key2" are keys, and 1 and "b" are the values that correspond to "key1" and "key2" respectively.
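Adding, modifying, and removing elements could look like the sketch below. It starts from the small dictionary shown above; the variable name example_dict and the new key "key3" are my own additions for illustration.

```python
example_dict = {"key1": 1, "key2": "b"}

example_dict["key3"] = True    # add a new key with its value
example_dict["key1"] = 100     # modify the value of an existing key
del example_dict["key2"]       # remove a key together with its value

print(example_dict)            # prints {'key1': 100, 'key3': True}
print("key2" in example_dict)  # prints False: the key is gone
```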

Values can be of any data type. They can be strings, numbers, Booleans, lists and even dictionaries. To extract an element from a dictionary, you use square brackets (similar to a list), but instead of an index, you specify a key. When the key is a string, it goes in quotes.

    simple_dict = {"country": "USA", "state": "NY", "city": "New York"}
    print(simple_dict)              # prints {'country': 'USA', 'state': 'NY', 'city': 'New York'}
    print(simple_dict['country'])   # prints USA
    # You can use single or double quotes to specify the key,
    # but make sure the opening and closing quotes are the same.
    print(simple_dict['city'])      # prints New York

    nested_dict = {
        "city": "New York",
        "boroughs": [
            {"name": "Manhattan", "population": 1628706},
            {"name": "Bronx", "population": 1418207},
            {"name": "Brooklyn", "population": 2559903},
            {"name": "Queens", "population": 2253858},
            {"name": "Staten Island", "population": 476143},
        ],
    }
    # The value for the "boroughs" key is a list of dictionaries.
    print(nested_dict['boroughs'])              # prints a list of dictionaries
    print(nested_dict['boroughs'][2]['name'])   # prints Brooklyn

Let's figure out why my code above returned "Brooklyn". I know that "Brooklyn" is contained within the "boroughs" value, which is a list of dictionaries. I can write nested_dict['boroughs'], which returns the entire list of dictionaries. A list is an ordered collection, so I need to know the position of the element I'm interested in within that list, which is the 3rd. Since the index count in Python starts from 0, we need to add [2]; combining these two pieces we get nested_dict['boroughs'][2]. This returns {"name": "Brooklyn", "population": 2559903}. Now all that is left is to extract "Brooklyn", which you can do by adding ['name'].

Examples like this are very common, and it's easy to get confused when a list is involved. If you forget to specify the index [2] (which I've done numerous times), Python will raise a TypeError that reads TypeError: list indices must be integers or slices, not str, which I think is pretty straightforward. You are trying to pass a string as a list index, which you can't do. List indices have to be integers! And if you think about it, "name" appears 5 times. How would Python know which name you are referring to? I recommend spending more time on understanding this.
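The same walk-through can be written one step at a time. This is only a sketch: it uses a shortened copy of the dictionary (three boroughs instead of five), and the intermediate names boroughs and third_borough are mine. The try/except is there only so the script can show the error message and keep going.

```python
# Shortened copy of the nested dictionary, for illustration only.
nested_dict = {
    "city": "New York",
    "boroughs": [
        {"name": "Manhattan", "population": 1628706},
        {"name": "Bronx", "population": 1418207},
        {"name": "Brooklyn", "population": 2559903},
    ],
}

boroughs = nested_dict["boroughs"]   # step 1: the list of dictionaries
third_borough = boroughs[2]          # step 2: the third dictionary (index 2)
print(third_borough["name"])         # step 3: prints Brooklyn

# Forgetting the index means using a string as a list index.
try:
    nested_dict["boroughs"]["name"]
except TypeError as error:
    print("TypeError:", error)       # list indices must be integers or slices, not str
```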

In the next post I'll talk about common operations you can do with lists and dictionaries.

Tuple

A tuple is another way to represent a collection. Tuples are written with parentheses () and are ordered, so you can safely use indexing to extract an element. Unlike lists and dictionaries, tuples are immutable, which means that once a tuple is created you cannot add, remove, or replace its elements.

    my_tuple = (0, -1, 10, 5)
    print(my_tuple[1])         # prints -1
    my_tuple1 = ((0, 1), (2, 4))
    print(type(my_tuple1))     # prints <class 'tuple'>
    print(my_tuple1[1][1])     # prints 4
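To see what immutable means in practice, here is a short sketch that tries to change a tuple and then does the same thing with a list copy. The try/except is only there to display the error without stopping the script.

```python
my_tuple = (0, -1, 10, 5)

# Trying to replace an element of a tuple raises a TypeError.
try:
    my_tuple[1] = 99
except TypeError as error:
    print("TypeError:", error)   # 'tuple' object does not support item assignment

# If you need to change the data, convert the tuple to a list first.
my_list = list(my_tuple)
my_list[1] = 99
print(my_list)    # prints [0, 99, 10, 5]
print(my_tuple)   # prints (0, -1, 10, 5): the original tuple is unchanged
```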

Data Frame

Data frames are data structures that are not native to Python. They come from the pandas library, which needs to be installed and imported before you can use them. I'll have an entire post on data frames, but for now all you need to know is that a data frame is a two-dimensional structure (or a table) that has columns and rows. This data structure is very useful for analytics purposes.

Here are a few links that you can use to read more about various data types.

1. Installing Python
2a. What can you do with Lists & Dictionaries?