Finding Duplicates in DAX

A prerequisite for creating a relationship in Tabular/Power Pivot is to have a primary key column in the table on the One side of the relationship. This column must have unique values. If it doesn’t, the relationship won’t get created and you’ll get an error that the both tables have duplicate keys. If you have a relatively large table, it might be difficult to find the duplicates.

ID

Column1

1

Foo

2

Foo

1

Foo

 

However, given the above table design, you can add a simple calculated column to the table to return the count of duplicates for column ID using the following DAX formula

=CALCULATE (COUNTROWS(), ALLEXCEPT(Table1, Table1[ID]))

This expression uses the COUNTROWS() function to count the number of rows of Table1. Coupled with the CALCULATE function, this expression will be resolved in the context of every row. To ignore the column that you want to count on (ID in this case), you need to exclude it from the context, so that the row counting happens across the entire table for each ID value. Once the column is created, you can filter on it in the Data View to find out the duplicate rows with values 2, 3, etc.

  • Marty Piecyk

    “Once the column is created, you can filter on it in the Data View to find out the duplicate rows with values 2, 3, etc.”
    Thanks for this helpful post (although it’s two years old at the moment). By filtering out duplicate rows with values 2 and greater, you’ll remove every instance of a value that has a duplicate. Do you have any advice how to keep the first instance of a duplicate?

    In your example above, the row with ID 2 would be present in your new calculated table, but customers will wonder where the row with ID 1 is.

    • Prologika

      Hi Marty. Which tool are using: Power BI Desktop, Excel? As you mentioned, this blog is two-year old. Have you tried the Remove Duplicates feature in the Query Editor which is available in both Power BI Desktop and Excel?

      • Marty Piecyk

        Thanks for the reply. I’m using Analysis Services and connecting to my server live through Power BI Desktop. So, I cannot edit queries or use the Remove Duplicates feature in Power BI Desktop. Also, I need to create relationships between my tables, but this can only be done after the duplicates are removed, so I would rather do this in my Analysis Services model.

        • Marty Piecyk

          Here’s one solution. I’m sharing this, since this posting shows up in the top results of search engines.

          For example, given this table “Host”:
          ID HostName IpAddress CreatedDate
          1 duplicatehost 192.168.1.2 2017-03-13
          2 duplicatehost 192.168.1.2 2017-01-01
          3 singlehost 192.168.1.3 2017-03-13

          This DAX can be used to generate the following calculated table using “HostName” and “IpAddress” as the key for finding duplicates and getting the latest CreatedDate:

          =FILTER(Host, [CreatedDate]=MAXX(FILTER(Host, [HostName]=EARLIER([HostName]) && [IpAddress]=EARLIER([IpAddress])), [CreatedDate]))

          ID HostName IpAddress CreatedDate
          1 duplicatehost 192.168.1.2 2017-03-13
          3 singlehost 192.168.1.3 2017-03-13

          There must not any duplication of the CreatedDate column for the key-grouping or duplicates will be present in the output. In my use case, duplicates are not possible.

          • Prologika

            The following modified expression will flag the first occurence of the duplicated row as 1, assuming the table has an unique identifier and the data is ordered by it, such as the DimProduct table in AdventureWorksDW.
            Index = CALCULATE(COUNTROWS(), ALLEXCEPT(Product, ‘Product'[ProductAlternateKey]), EARLIER(Product[ProductKey]) >= Product[ProductKey])