How do I remove multiple duplicates from a list in Python?

How to Remove Duplicates From a Python List of Lists?

What’s the best way to remove duplicates from a Python list of lists? This is a popular coding interview question at Google, Facebook, and Amazon. In this article, I’ll show you how (and why) it works—so keep reading!

How to remove all duplicates of a given value in the list?

  • Method 1: Naive Method
  • Method 2: Temporary Dictionary Conversion
    • Do Python Dictionaries Preserve the Ordering of the Keys?
  • Method 3: Set Conversion
    • Linear-Runtime Method with Set to Remove Duplicates From a List of Lists
  • Where to Go From Here?

Method 1: Naive Method

Algorithm: Go over each element and check whether it already appears in the new, duplicate-free list. If it doesn't, append it; otherwise, skip it. The problem is that this method has quadratic time complexity: for each of the n elements, you check membership in a list, which itself takes O(n) time, so the total runtime is n * O(n) = O(n²).

lst = [[1, 1], [0, 1], [0, 1], [1, 1]]

dup_free = []
for x in lst:
    if x not in dup_free:
        dup_free.append(x)

print(dup_free)
# [[1, 1], [0, 1]]

Method 2: Temporary Dictionary Conversion

Algorithm: A more efficient way in terms of time complexity is to create a dictionary out of the elements in the list to remove all duplicates and convert the dictionary back to a list. This preserves the order of the original list elements.

lst = [[1, 1], [0, 1], [0, 1], [1, 1]]

# 1. Convert into list of tuples
tpls = [tuple(x) for x in lst]

# 2. Create dictionary with empty values and
# 3. convert back to a list (dups removed)
dct = list(dict.fromkeys(tpls))

# 4. Convert list of tuples to list of lists
dup_free = [list(x) for x in dct]

# Print everything
print(dup_free)
# [[1, 1], [0, 1]]

Each of the following four steps is a linear-runtime operation. Therefore, the overall algorithm has linear runtime complexity and is more efficient than the naive approach (Method 1).

  1. Convert the list of lists into a list of tuples using the list comprehension [tuple(x) for x in lst]. Tuples are hashable and can be used as dictionary keys, while lists cannot!
  2. Convert the list of tuples to a dictionary with dict.fromkeys(tpls) to map tuples to dummy values. Each dictionary key can exist only once so duplicates are removed at this point.
  3. Convert the dictionary into a list of tuples with list(...).
  4. Convert the list of tuples back into a list of lists using the list comprehension [list(x) for x in dct] (a condensed one-liner combining all four steps follows below).
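
If you prefer it compact, the four steps can be chained into a single expression. This is just a condensed variant of the code above, not a new method:

lst = [[1, 1], [0, 1], [0, 1], [1, 1]]

# Steps 1-4 in one line: lists -> tuples -> dict keys -> lists
dup_free = [list(t) for t in dict.fromkeys(tuple(x) for x in lst)]

print(dup_free)
# [[1, 1], [0, 1]]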

Each list element (itself a list) becomes a tuple, and each tuple becomes a new dictionary key. For example, the list [[1, 1], [0, 1], [0, 1]] becomes the list of tuples [(1, 1), (0, 1), (0, 1)] and then the dictionary {(1, 1): None, (0, 1): None}. All elements that occur multiple times map to the same key. Thus, the dictionary contains only unique keys; there cannot be multiple equal keys.

As dictionary values, dict.fromkeys() uses the dummy value None by default.

Then, you convert the dictionary back to a list of lists, throwing away the dummy values.
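
To make the intermediate representations concrete, here is the example from above traced step by step:

lst = [[1, 1], [0, 1], [0, 1]]

tpls = [tuple(x) for x in lst]
print(tpls)
# [(1, 1), (0, 1), (0, 1)]

dct = dict.fromkeys(tpls)
print(dct)
# {(1, 1): None, (0, 1): None}

print([list(t) for t in dct])
# [[1, 1], [0, 1]]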

Related blog articles:

  • Python List Remove
  • The Ultimate Guide to Python Dictionaries!
  • Remove Duplicates From Python Lists
  • Python List of Lists

Do Python Dictionaries Preserve the Ordering of the Keys?

Surprisingly, dictionary keys in Python preserve the insertion order of the elements. So, yes, the order of the elements is preserved: since Python 3.7, this is even guaranteed by the language specification (and CPython 3.6 already behaved this way as an implementation detail).

This is surprising to many readers because countless online resources argue that the order of dictionary keys is not preserved. They assume that the underlying implementation of the dictionary key iterable uses sets, and sets are well known to be agnostic to the ordering of elements. But this assumption is wrong: the built-in dictionary implementation in CPython preserves the order.

Here’s an example; feel free to create your own examples and tests to check whether the ordering is preserved.

lst = ['Alice', 'Bob', 'Bob', 1, 1, 1, 2, 3, 3]
dic = dict.fromkeys(lst)
print(dic)
# {'Alice': None, 'Bob': None, 1: None, 2: None, 3: None}

You see that the order of the keys is preserved, so when converting the dictionary back to a list, the original ordering of the list elements remains intact:

print(list(dic))
# ['Alice', 'Bob', 1, 2, 3]

Since Python 3.7, this is not just an implementation detail: preserving the insertion order of dictionary keys is part of the language specification, so every conforming Python implementation must behave this way. In older versions, only CPython 3.6 preserved the order, and only as an implementation detail; even earlier dictionaries did not preserve order at all.

If you want to make the reliance on ordering explicit in your code, or you need to support older Python versions, you can use collections.OrderedDict from the standard library, a dict subclass that has always guaranteed insertion order.
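
If you want to spell out the ordering intent, collections.OrderedDict works the same way with fromkeys():

from collections import OrderedDict

lst = [[1, 1], [0, 1], [0, 1], [1, 1]]
dup_free = [list(t) for t in OrderedDict.fromkeys(tuple(x) for x in lst)]
print(dup_free)
# [[1, 1], [0, 1]]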

Method 3: Set Conversion

As before, the goal is to remove duplicate elements so that each distinct element remains only once in the list.

Sets in Python allow only a single instance of an element. So by converting the list to a set, all duplicates are removed. In contrast to the naive approach (checking all pairs of elements if they are duplicates) that has quadratic time complexity, this method has linear runtime complexity. Why? Because the runtime complexity of creating a set is linear in the number of set elements. Now, you convert the set back to a list, and voilà, the duplicates are removed.

lst = list(range(10)) + list(range(10))
lst = list(set(lst))
print(lst)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Does this also work for tuples? Yes!

lst = [(10,5), (10,5), (5,10), (3,2), (3, 4)]
lst = list(set(lst))
print(lst)
# [(3, 4), (10, 5), (5, 10), (3, 2)]

However, converting a list to a set is not guaranteed to preserve the order of the list elements; the set loses all ordering information. Also, you cannot create a set of lists because lists are unhashable:

>>> set([[1,2], [1,1]])
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    set([[1,2], [1,1]])
TypeError: unhashable type: 'list'

But we can find a simple workaround to both problems as you’ll see in the following method.

Linear-Runtime Method with Set to Remove Duplicates From a List of Lists

This third approach uses an auxiliary set to check whether an element is already in the duplicate-free list. As checking membership on a set is much faster than checking membership on a list, this method has linear runtime complexity as well (a set membership check takes constant time on average).

lst = [[1, 1], [0, 1], [0, 1], [1, 1]]

dup_free = []
dup_free_set = set()
for x in lst:
    if tuple(x) not in dup_free_set:
        dup_free.append(x)
        dup_free_set.add(tuple(x))

print(dup_free)
# [[1, 1], [0, 1]]

This approach of removing duplicates from a list while maintaining the order of the elements has linear runtime complexity as well. It also translates directly to other programming languages because it doesn't rely on implementation details of Python's dictionary. On the other hand, it's a bit more verbose.
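
As a minimal sketch, you could wrap this pattern in a small helper function (the function name remove_duplicates is my own choice, not part of the original snippet). It handles both hashable elements and inner lists:

def remove_duplicates(lst):
    """Return a new list without duplicates, preserving order.

    Inner lists are converted to tuples only for the membership check.
    """
    seen = set()
    result = []
    for x in lst:
        key = tuple(x) if isinstance(x, list) else x
        if key not in seen:
            seen.add(key)
            result.append(x)
    return result

print(remove_duplicates([[1, 1], [0, 1], [0, 1], [1, 1]]))
# [[1, 1], [0, 1]]
print(remove_duplicates([3, 1, 3, 2, 1]))
# [3, 1, 2]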

Where to Go From Here?

Enough theory, let’s get some practice!

To become successful in coding, you need to get out there and solve real problems for real people. That’s how you can become a six-figure earner easily. And that’s how you polish the skills you really need in practice. After all, what’s the use of learning theory that nobody ever needs?

Practice projects are how you sharpen your saw in coding!

Do you want to become a code master by focusing on practical code projects that actually earn you money and solve problems for people?

Then become a Python freelance developer! It’s the best way of approaching the task of improving your Python skills—even if you are a complete beginner.

Join my free webinar “How to Build Your High-Income Skill Python” and watch how I grew my coding business online and how you can, too—from the comfort of your own home.

Join the free webinar now!


While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.

To help students reach higher levels of Python success, he founded the programming education website Finxter.com. He’s author of the popular programming book Python One-Liners (NoStarch 2020), coauthor of the Coffee Break Python series of self-published books, computer science enthusiast, freelancer, and owner of one of the top 10 largest Python blogs worldwide.

His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them to boost their skills. You can join his free email academy here.

How do I remove consecutive duplicates from a list in Python?

Using the itertools.groupby function, you can group consecutive equal elements together and keep only the first element of each group. This removes all duplicates that occur in direct succession while leaving a single occurrence in the list.
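
Here is a minimal sketch of that idea using itertools.groupby (only duplicates in direct succession are removed; non-adjacent duplicates are kept):

from itertools import groupby

lst = [1, 1, 2, 2, 2, 3, 1, 1, 4]
no_consecutive = [key for key, _group in groupby(lst)]
print(no_consecutive)
# [1, 2, 3, 1, 4]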

What is the easiest way to remove duplicates in Python?

There are many ways to remove duplicates from a Python list; two of them are shown briefly below:

  • Using a temporary list and looping
  • Using the set() built-in method
  • Using dictionary keys (dict.fromkeys())
  • Using the list count() function
  • Using list comprehension
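
For example, two of these approaches in brief (note that set() does not preserve the original order, while dict.fromkeys() does):

lst = [3, 1, 3, 2, 1]

print(list(set(lst)))            # order not guaranteed, e.g. [1, 2, 3]
print(list(dict.fromkeys(lst)))  # order preserved: [3, 1, 2]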
