Intro
Sorting list data with python is a common task. The builtin list data type comes with .sort() method that does just that. However, depending on the content of each element in your list, the .sort() method may or may not work. This blog explores options and provides examples for sorting some more complex data types.
I’ve recently been enjoying listening to the Dune book series (Yes, after watching the show…). So we’ll use the Dune novels publishing data in the examples for this blog. I’ve included the book title: str, author: str and published_year: int as the included data.
Sorting Tuples
Tuples are an interesting Python data type. They are immutable and ordered, meaning the content and sequence of elements cannot change after creation.
For the Dune novel data we’ll start with the following list of tuples with values for title, author and published_year.
dune_tuples = [
("Dune Messiah", "Frank Herbert", 1969),
("Dune", "Frank Herbert", 1965),
("Children of Dune", "Frank Herbert", 1976),
("God Emperor of Dune", "Frank Herbert", 1981),
("Chapterhouse: Dune", "Frank Herbert", 1985),
("Heretics of Dune", "Frank Herbert", 1984)
]
Another interesting characteristic of Python Tuples is how they are compared to each other. The first element in each tuple provides the value for comparison operations. Given the tuples above, this means if you compared the Dune Messiah to Dune tuple with the following statement it results in True. This is because the two Tuples are compared lexicographically by the first element in each Tuple.
("Dune Messiah", "Frank Herbert", 1969) > ("Dune", "Frank Herbert", 1965)
True
To reinforce the lexicographical sorting of Tuples, let’s look at a different example. Below are two Tuples, the first with a greater value in the first element followed by two elements of lesser value compared to the second Tuple. The first Tuple is > than the second in this example.
(2, 1, 1) > (1, 5, 10)
True
Why discuss this? Because how the Tuples compare directly affects the result of sorting. Sorting the initial list of tuples from above using just dune_tuples.sort() ressults in the following:
[
('Chapterhouse: Dune', 'Frank Herbert', 1985),
('Children of Dune', 'Frank Herbert', 1976),
('Dune', 'Frank Herbert', 1965),
('Dune Messiah', 'Frank Herbert', 1969),
('God Emperor of Dune', 'Frank Herbert', 1981),
('Heretics of Dune', 'Frank Herbert', 1984)
]
This is useful if we’re interested in the sorting of the novels by alphebetical Title order. But with books, we often want to know the order of the books by published date. For this, we have a couple of options. We can reorder the data to have published_year as the first element of the tuples, then sort.
dune_tuples = [(book[2], book[0], book[1]) for book in dune_tuples]
[
(1985, 'Chapterhouse: Dune', 'Frank Herbert'),
(1976, 'Children of Dune', 'Frank Herbert'),
(1965, 'Dune', 'Frank Herbert'),
(1969, 'Dune Messiah', 'Frank Herbert'),
(1981, 'God Emperor of Dune', 'Frank Herbert'),
(1984, 'Heretics of Dune', 'Frank Herbert')
]
This results in a Tuple structure we can sort by the published date. Like the following:
dune_tuples.sort()
[
(1965, 'Dune', 'Frank Herbert'),
(1969, 'Dune Messiah', 'Frank Herbert'),
(1976, 'Children of Dune', 'Frank Herbert'),
(1981, 'God Emperor of Dune', 'Frank Herbert'),
(1984, 'Heretics of Dune', 'Frank Herbert'),
(1985, 'Chapterhouse: Dune', 'Frank Herbert')
]
But, what if we don’t want to reorder the elements? For this we can combine the sorted builtin function with a lambda expression to extract the key to use in the sorting method.
Python sorted(iterable, key=None, reverse=False) takes a positional argument (an iterable) and two key-word arguments; (key and reverse). In our example here dune_tuples is the iterable and we need to provide a sorting key. If a key is not provided the sorted() function will return dune_tuples sorted lexicographically the same way dune_tuples.sort() does.
dune_tuples = [
("Dune Messiah", "Frank Herbert", 1969),
("Dune", "Frank Herbert", 1965),
("Children of Dune", "Frank Herbert", 1976),
("God Emperor of Dune", "Frank Herbert", 1981),
("Chapterhouse: Dune", "Frank Herbert", 1985),
("Heretics of Dune", "Frank Herbert", 1984)
]
sorted(dune_tuples, key=lambda book: book[2])
[
('Dune', 'Frank Herbert', 1965),
('Dune Messiah', 'Frank Herbert', 1969),
('Children of Dune', 'Frank Herbert', 1976),
('God Emperor of Dune', 'Frank Herbert', 1981),
('Heretics of Dune', 'Frank Herbert', 1984),
('Chapterhouse: Dune', 'Frank Herbert', 1985)
]
With the sorted(dun_tuples, key=lambda book: book[2]) the initial, unsorted list of published novels is sorted by using the lambda expression to extract the key needed to sort the tuples from the the published_year element at index 2.
Dictionaries
Another common data type in Python is dictionaries. dictionaries are useful because they are self describing and are quick for accessing values by key. Let’s take a look at how we could sort dictionaries with the same Dune book data.
dune_dicts = [
{
'author': 'Frank Herbert',
'published_year': 1969,
'title': 'Dune Messiah'
},
{
'author': 'Frank Herbert',
'published_year': 1965,
'title': 'Dune'
},
{
'author': 'Frank Herbert',
'published_year': 1976,
'title': 'Children of Dune'},
{
'author': 'Frank Herbert',
'published_year': 1981,
'title': 'God Emperor of Dune'},
{
'author': 'Frank Herbert',
'published_year': 1985,
'title': 'Chapterhouse: Dune'},
{
'author': 'Frank Herbert',
'published_year': 1984,
'title': 'Heretics of Dune'
}
]
Unlike tuples, dictionaries cannot be compared with < or >. This means they cannot be sorted by sorting functions, attempting it results in a TypeError:
dune_dicts.sort()
TypeError: '<' not supported between instances of 'dict' and 'dict'
That means we need to lean on the same sorted() + lambda expression pattern as before:
sorted(dune_dicts, key=lambda book: book["published_year"])
[
{
'author': 'Frank Herbert',
'published_year': 1965,
'title': 'Dune'},
{
'author': 'Frank Herbert',
'published_year': 1969,
'title': 'Dune Messiah'
},
{
'author': 'Frank Herbert',
'published_year': 1976,
'title': 'Children of Dune'},
...
Custom Classes
Sometimes, you use have a list of classes you want to sort instead. A dataclass would work well in this example, and provides the same self-describing nature that dictionaries do:
from dataclasses import dataclass
@dataclass
class Book:
author: str
title: str
published_year: int
dune_books = [
Book(author='Frank Herbert', title='Dune Messiah', published_year=1969),
Book(author='Frank Herbert', title='Dune', published_year=1965),
Book(author='Frank Herbert', title='Children of Dune', published_year=1976),
Book(author='Frank Herbert', title='God Emperor of Dune', published_year=1981),
Book(author='Frank Herbert', title='Chapterhouse: Dune', published_year=1985),
Book(author='Frank Herbert', title='Heretics of Dune', published_year=1984)
]
This again, cannot be sorted directly.
dune_books[0] > dune_books[1]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[69], line 1
----> 1 dune_books[0] > dune_books[1]
TypeError: '>' not supported between instances of 'Book' and 'Book'
[!NOTE] There is a way with custom classes (and dataclasses) to get comparison working between classes using “dunder methods” defined for your class. But, it’s out of scope for this post.
With this simple dataclass and the sorted() + lambda expression from before we can quickly get a sorted value of the dune_books:
sorted(dune_books, key=lambda book: book.published_year)
[Book(author='Frank Herbert', title='Dune', published_year=1965),
Book(author='Frank Herbert', title='Dune Messiah', published_year=1969),
Book(author='Frank Herbert', title='Children of Dune', published_year=1976),
Book(author='Frank Herbert', title='God Emperor of Dune', published_year=1981),
Book(author='Frank Herbert', title='Heretics of Dune', published_year=1984),
Book(author='Frank Herbert', title='Chapterhouse: Dune', published_year=1985)]
Thanks for reading through my post, I hope you found my explanation and examples helpful!