Data Representation in Python

In the name of Allah, most gracious and most merciful,

1. Simple Data Types

  • bool: True, False
  • int: -1, 0, 3
  • float: 1.2, 3.3, 7.8
  • str: "some string"

2. Collections

2.1 Collection properties in Python

  1. Sized: The collection has a finite size, such as its length or number of items, which len() returns.
  2. Iterable: Its elements can be retrieved one at a time, for example with a for loop.
  3. Container: You can check whether an element is in it or not, using the in keyword. Containers are objects that hold some data.
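
As a rough illustration, the three properties correspond to the abstract base classes Sized, Iterable, and Container in the standard collections.abc module. The list named numbers below is made-up sample data, not something from the sections above.

```python
from collections.abc import Container, Iterable, Sized

numbers = [10, 20, 30]  # illustrative data

# Sized: len() reports the number of items
size = len(numbers)

# Iterable: a for loop hands over one element at a time
doubled = []
for n in numbers:
    doubled.append(n * 2)

# Container: the `in` keyword tests membership
has_twenty = 20 in numbers

print(size, doubled, has_twenty)

# A list satisfies all three properties
checks = [isinstance(numbers, abc) for abc in (Sized, Iterable, Container)]
print(checks)
```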

2.2 Important Concepts

  • If you can change a container, then it is mutable: you can remove, add, and replace its elements.
  • An immutable collection whose elements are all hashable is itself hashable. A mutable collection such as a list is never hashable, whatever it contains.
  • Hashable means that the value, and therefore its hash, doesn’t change during the object’s lifetime, so Python can compute a stable hash value for looking it up quickly. Hash values are not guaranteed to be unique, so Python also compares for equality. This is how dictionaries track their keys and how `sets` track unique values. That’s why dictionary keys must be hashable, which in practice usually means immutable.
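
A small sketch of these rules, using the built-in hash() function. The helper is_hashable and the sample values are hypothetical names introduced only for this illustration.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

point = (1, 2)          # immutable tuple of hashable ints
mixed = (1, [2, 3])     # immutable tuple that holds a mutable list

results = {
    "tuple of ints": is_hashable(point),
    "list": is_hashable([1, 2]),
    "tuple holding a list": is_hashable(mixed),
}
print(results)

# A hashable value can serve as a dictionary key
locations = {point: "home"}
print(locations[(1, 2)])
```

The tuple that holds a list shows why immutability of the outer container alone is not enough: every element has to be hashable too.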

2.3 In Python there are three main collections

2.3.1 Sequence

An ordered collection of heterogeneous elements.

Examples:

  1. Lists: [1, 2, "one", True, True, False, ["even a list inside a list"]]. A list can contain any type of data, and it is mutable, so you can remove, add, and replace its elements.
  2. Tuples: (1, 2, "three", ["even a list inside a tuple"]). A tuple can contain any type of data, but it is immutable, so you can’t remove, add, or replace its elements. You can only read data from it. (A mutable element, such as the inner list, can still be changed in place.)
  3. Strings: "One Two Three". A string is also immutable, so you can’t remove, add, or replace any of its characters. You can only read data from it.
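
The difference between mutable and immutable sequences can be sketched as follows. The variable names and sample values are chosen only for this example.

```python
# Lists are mutable: replace, add, and remove all work
items = [1, 2, "one"]
items[0] = 100
items.append(True)
items.remove("one")
print(items)  # [100, 2, True]

# Tuples are immutable: item assignment raises TypeError
fixed = (1, 2, "three")
try:
    fixed[0] = 100
except TypeError as error:
    tuple_error = str(error)

# Strings are immutable too
text = "One Two Three"
try:
    text[0] = "o"
except TypeError as error:
    string_error = str(error)

# Reading by index or slice is always allowed
print(fixed[2], text[:3])  # three One
```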

2.3.2 Set

An unordered collection of unique elements. A Python set is mutable. Ex: {1, 2, "one"}. Its elements must be hashable, so you can’t have a list inside a set because lists are unhashable.
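
A brief sketch of both points, with made-up values: duplicates collapse into a single element, and an unhashable element is rejected.

```python
# Duplicates are dropped when the set is built
values = {1, 2, "one", 2, 1}
print(len(values))  # 3

# Sets are mutable: elements can be added later
values.add("two")

# A list cannot be an element, because lists are unhashable
try:
    bad = {1, [2, 3]}
except TypeError as error:
    message = str(error)
print(message)

# A tuple of hashable items works where a list does not
ok = {1, (2, 3)}
```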

2.3.3 Mapping

An association between keys and values. Since Python 3.7 dictionaries preserve insertion order, but elements are looked up by key rather than by position.

Example:
The Python dictionary (dict), like this: {1: [1, 2], "one": {4, "7"}}. It is an association between keys like 1 and values like [1, 2]. Instead of accessing data by indices as in sequences, data is accessed by keys. These keys must be hashable, so you can’t have a list as a key because, again, lists are unhashable in Python.

Dictionaries are mutable in Python.
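
A minimal sketch of key-based access and mutation, starting from the example dictionary above. The extra keys "two" and "absent" are invented for the illustration.

```python
data = {1: [1, 2], "one": {4, "7"}}

first = data[1]               # access by key, not by position
data["two"] = (2, 2)          # add a new key
data[1] = [1, 2, 3]           # replace the value of an existing key
del data["one"]               # remove a key

# get() returns None (or a chosen default) instead of raising KeyError
missing = data.get("absent")

print(list(data))  # [1, 'two']
```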

3. Common Methods used with Python Data Structures

3.1 Lists

# define a list
some_list = list()
some_list = []

# extend list by appending iterable elements
some_list.extend(iterable)

# insert an object before certain index
some_list.insert(index, object)

# remove first value occurrence, or raise ValueError if value isn't found
some_list.remove(value)

# remove all elements
some_list.clear()

# count the number of occurrences of value
some_list.count(value)

# get first index of value, or raise ValueError if value isn't found
some_list.index(value, start, stop)

# remove and return the element at index (the last one by default),
# or raise IndexError if the list is empty or the index is out of range
some_list.pop(index)
 
# sort list in place (key is used to sort by something else, lambda could be used here)
some_list.sort(key=None, reverse=False)

# return a new sorted list and leave the original unchanged
sorted(some_list, key=None, reverse=False)

# reverse list in place
some_list.reverse()

# lists, tuples, and sets could be formed with the items of another iterable
list(range(10))
tuple(range(10))
set(range(10))
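
The calls above use placeholder names such as value and index. The following sketch runs the same methods on a concrete list; the numbers are arbitrary sample data.

```python
numbers = [3, 1, 2]

numbers.extend([2, 5])         # [3, 1, 2, 2, 5]
numbers.insert(0, 9)           # [9, 3, 1, 2, 2, 5]
numbers.remove(2)              # removes the first 2: [9, 3, 1, 2, 5]
twos = numbers.count(2)        # 1
where = numbers.index(1)       # 2
last = numbers.pop()           # 5, list is now [9, 3, 1, 2]
second = numbers.pop(1)        # 3, list is now [9, 1, 2]

ordered = sorted(numbers)      # new list [1, 2, 9], numbers is unchanged
snapshot = list(numbers)       # copy taken before sorting in place

numbers.sort(key=lambda n: -n)  # in place, descending: [9, 2, 1]
numbers.reverse()               # in place: [1, 2, 9]
print(numbers)
```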

3.2 Sets

# get unique characters in a string
some_set = set("abcabcaabc")
>> {'a', 'b', 'c'}

# create an empty set
some_set = set()

# add an item to the set
some_set.add(item)

# remove an item from the set, raises a KeyError if the item is not found
some_set.remove(item)

# remove an item from the set, won't raise an error if the item is not found
some_set.discard(item)

# remove and return an arbitrary element from the set (sets have no "last" element),
# raises a KeyError if the set is empty
some_set.pop()

# remove all elements from the set
some_set.clear()

Sets support mathematical set operations, many of which overload Python’s binary operators. There are also named versions of these operations, such as set.difference, set.union, set.intersection, set.symmetric_difference, and set.issubset, as well as in-place versions set.difference_update, set.update, set.intersection_update, and set.symmetric_difference_update, which correspond to the operators -=, |=, &=, and ^=.

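# example sets, chosen here so that the results below can be checked
set_1 = set('abracadabra')  # {'a', 'b', 'c', 'd', 'r'}
set_2 = set('alacazam')     # {'a', 'c', 'l', 'm', 'z'}

# difference
set_1 - set_2  # => {'b', 'd', 'r'}

# union
set_1 | set_2  # => {'a', 'b', 'c', 'd', 'l', 'm', 'r', 'z'}

# intersection
set_1 & set_2  # => {'a', 'c'}

# symmetric difference: the elements that are in either set, but not in both
set_1 ^ set_2  # => {'b', 'd', 'l', 'm', 'r', 'z'}

# check if set_1 is a subset of set_2
set_1 <= set_2  # => False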
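
The named and in-place methods mentioned above behave like the operators. A short sketch, with two invented sets named evens and small:

```python
evens = {0, 2, 4, 6}
small = {0, 1, 2, 3}

# Named methods return new sets, like the binary operators
diff = evens.difference(small)               # same as evens - small
union = evens.union(small)                   # same as evens | small
common = evens.intersection(small)           # same as evens & small
either = evens.symmetric_difference(small)   # same as evens ^ small
is_sub = {0, 2}.issubset(evens)              # same as {0, 2} <= evens

# In-place versions modify the set itself
evens |= {8}       # same as evens.update({8})
evens -= small     # same as evens.difference_update(small)

print(sorted(evens))  # [4, 6, 8]
```

Sorting before printing is only there to give a predictable display order, since sets themselves are unordered.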

3.3 Iterables

An iterable is a Python object that is capable of returning its items one at a time. Lists and sets are examples of iterables.

# get maximum item
max(iterable)
# get minimum item
min(iterable)

# check if all items in an iterable are truthy (also True for an empty iterable)
all(iterable)
# check if any of the items in an iterable is truthy
any(iterable)
 

# check if value in iterable or not
value in iterable
value not in iterable

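# combine several iterables element-wise into an iterator of tuples
zip(*iterables)
zip(iterable_1, iterable_2, ...)
# pair each item with its index, producing (index, item) tuples
enumerate(iterable)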


# keep only the items for which function returns a truthy value
filter(function, iterable)
# apply a function to every item of an iterable
map(function, iterable)
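
The following sketch applies these built-ins to two small made-up lists. Note that zip, enumerate, filter, and map return lazy iterators, so they are wrapped in list() here to show their contents.

```python
names = ["ada", "bob", "cy"]
ages = [36, 17, 52]

oldest = max(ages)                            # 52
youngest = min(ages)                          # 17
all_positive = all(age > 0 for age in ages)   # True
any_minor = any(age < 18 for age in ages)     # True

pairs = list(zip(names, ages))                # [('ada', 36), ('bob', 17), ('cy', 52)]
numbered = list(enumerate(names))             # [(0, 'ada'), (1, 'bob'), (2, 'cy')]

adults = list(filter(lambda age: age >= 18, ages))  # [36, 52]
shouted = list(map(str.upper, names))               # ['ADA', 'BOB', 'CY']

print(pairs, numbered, adults, shouted)
```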

4. Other Useful Python Data Structures

4.1 Comprehensions

They are very useful for making complex or unnecessarily long for loops more concise and readable. However, if misused, they can make simple code seem complex, so you should use them wisely.

# list comprehension
[x ** 2 for x in range(30)]

# set comprehension
{x ** 2 for x in range(30)}

# dictionary comprehension
{key_function(var): value_function(var) for var in iterable}
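
To make the dictionary form concrete, here is a sketch with an invented list of words, plus a list comprehension with a condition next to the for loop it replaces.

```python
words = ["apple", "kiwi", "fig"]

# dictionary comprehension: word -> length
lengths = {word: len(word) for word in words}

# set comprehension: unique first letters
first_letters = {word[0] for word in words}

# list comprehension with a filter condition
even_squares = [x ** 2 for x in range(10) if x % 2 == 0]

# the equivalent for loop, for comparison
loop_result = []
for x in range(10):
    if x % 2 == 0:
        loop_result.append(x ** 2)

print(lengths)       # {'apple': 5, 'kiwi': 4, 'fig': 3}
print(even_squares)  # [0, 4, 16, 36, 64]
```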

4.2 Collections module

A module that provides specialized container types that extend the functionality of the built-in containers such as list, dict, set, and tuple.

4.2.1 Counter

Counter is a subclass of dict, so it has all the methods of the dict class. Given an iterable, it returns a dictionary-like object whose keys are the iterable’s elements and whose values are the number of occurrences of each element. Given a mapping, it uses the mapping’s keys as the elements and its values as their counts.

Here are some use cases for more clarification.

from collections import Counter

# counter with a list input
cnt = Counter([1, 2, 3, 1, 2, 1, 2, 4, 5, 4, 4, 5])
print(cnt)
>> Counter({1: 3, 2: 3, 4: 3, 5: 2, 3: 1})

# get all elements in the counter
cnt_elements = list(cnt.elements())
print(cnt_elements)
>> [1, 1, 1, 2, 2, 2, 3, 4, 4, 4, 5, 5]

# get the elements and their counts, ordered from most to least common
print(cnt.most_common())
>> [(1, 3), (2, 3), (4, 3), (5, 2), (3, 1)]

# get the two most common elements, ordered from most to least common
print(cnt.most_common(2))
>> [(1, 3), (2, 3)]

# counter with a dictionary input
cnt_2 = Counter({1: 5, 2: 6, 3: 2})
print(cnt_2)
>> Counter({2: 6, 1: 5, 3: 2})
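
Two further behaviours follow from Counter being a dict subclass with counting semantics. This sketch uses the string "mississippi" as arbitrary sample input.

```python
from collections import Counter

letters = Counter("mississippi")

s_count = letters["s"]        # 4
z_count = letters["z"]        # 0: a missing key counts as zero, no KeyError
z_stored = "z" in letters     # False: the lookup did not add the key

# update() adds to the existing counts instead of replacing them
letters.update("ss")
top = letters.most_common(1)  # [('s', 6)]

print(s_count, z_count, z_stored, top, isinstance(letters, dict))
```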

4.2.2 defaultdict

It is the same as a Python dictionary except that:

  1. It doesn’t raise a KeyError when you access a key that doesn’t exist.
  2. It creates the missing key with a default value produced by the callable that you pass when creating the defaultdict. This callable is called the default_factory. So if you pass int, accessing a non-existent key inserts it with the value int(), which is 0, instead of raising a KeyError. In this example, some_dict = defaultdict(int), int is the default_factory.

from collections import defaultdict

some_dict = defaultdict(int)
some_dict['a'] = 1

# printing a non-existent key
print(some_dict['b'])
>> 0

# adding to a non-existent key
some_dict['b'] += 3
print(some_dict['b'])
>> 3
# you see? no problem because of the default initialization
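
Any callable that takes no arguments can act as the default_factory. A common pattern, sketched here with an invented word list, is defaultdict(list) for grouping items.

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Group words by first letter; each missing key starts as an empty list
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

print(dict(by_letter))

# Indexing a missing key inserts it with the default value ...
_ = by_letter["z"]
z_created = "z" in by_letter    # True

# ... while get() and `in` do not trigger the default_factory
q_value = by_letter.get("q")    # None
q_created = "q" in by_letter    # False
```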

4.2.3 OrderedDict

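It is a dictionary that preserves the order in which keys were inserted, so if you change the value of an existing key, the key keeps its position. Since Python 3.7 the built-in dict also preserves insertion order; OrderedDict additionally offers order-aware features such as move_to_end() and order-sensitive equality.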

from collections import OrderedDict

# starting with an empty dictionary
ordered_dict_1 = OrderedDict()
ordered_dict_1['one'] = 1
ordered_dict_1['two'] = 2
ordered_dict_1['three'] = 3
print(ordered_dict_1)
>> OrderedDict([('one', 1), ('two', 2), ('three', 3)])

# starting from a list of (key, value) pairs
ordered_dict_2 = OrderedDict([('one', 1), ('two', 2), ('three', 3)])
print(ordered_dict_2)
>> OrderedDict([('one', 1), ('two', 2), ('three', 3)])
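
The claim that a key keeps its position when its value changes can be checked directly. This sketch also shows move_to_end() and order-sensitive equality; the variable names are chosen only for the example.

```python
from collections import OrderedDict

scores = OrderedDict([("one", 1), ("two", 2), ("three", 3)])

# Updating a value does not move the key
scores["one"] = 100
after_update = list(scores)     # ['one', 'two', 'three']

# move_to_end() repositions a key explicitly
scores.move_to_end("one")
after_move = list(scores)       # ['two', 'three', 'one']

# Two OrderedDicts are equal only if their order matches too
a = OrderedDict([("x", 1), ("y", 2)])
b = OrderedDict([("y", 2), ("x", 1)])
ordered_equal = a == b                # False
plain_equal = dict(a) == dict(b)      # True: plain dicts ignore order

print(after_update, after_move, ordered_equal, plain_equal)
```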

4.2.4 namedtuple

It is a tuple subclass whose values can be accessed by index, like a tuple, and also by field name as attributes, which reads much like looking up keys in a dictionary.

from collections import namedtuple

person = namedtuple(typename="Person", field_names=["name", "age"])

# create an instance
p = person("Hossam", 23)

# access by field name
print(p.name, p.age)
>> Hossam 23

# access by index
print(p[0], p[1])
>> Hossam 23
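
Because a namedtuple is still a tuple, it is immutable and supports unpacking. A brief sketch, which names the class Person (capitalized, a common convention rather than a requirement):

```python
from collections import namedtuple

Person = namedtuple("Person", ["name", "age"])
p = Person(name="Hossam", age=23)

# Unpacking works as with any tuple
name, age = p

# Fields cannot be reassigned
try:
    p.age = 24
    immutable = False
except AttributeError:
    immutable = True

# _replace() returns a modified copy, _asdict() converts to a dictionary
older = p._replace(age=24)
as_dict = p._asdict()

print(name, age, immutable, older, as_dict)
```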

Finally

Thank you. I hope this post has been beneficial to you. I would appreciate comments from anyone who needs more clarification or who has spotted a mistake in what I have written, so that I can correct it, and I would also appreciate any enhancements or suggestions. We are humans, and errors are expected from us, but we can minimize those errors by learning from our mistakes and by seeking to improve what we do.

Allah bless our master Muhammad and his family.

