In a previous article, we were concerned with identifying and removing duplicate emails within a single table. But let's say that email addresses are located in multiple tables — how do we find which email addresses are duplicated in both lists?
To answer this question, we have to break out one of Strata's more powerful features called relationships.
In Strata, relationships allow you to match records in one table with records in another table based on a common value. In this example, the common value is a specific email address found in both tables. For instance, let's suppose we have two lists of email addresses stored in two tables, email_list1 and email_list2.
In email_list1, we have the following values:
email_list1 -------------------- email@example.com firstname.lastname@example.org email@example.com firstname.lastname@example.org email@example.com firstname.lastname@example.org
In email_list2, we have the following values:
email_list2 -------------------- email@example.com firstname.lastname@example.org email@example.com firstname.lastname@example.org email@example.com firstname.lastname@example.org
In order to find the addresses in email_list1 that are also found in email_list2, we'll set up a relationship. In this case the common information (email addresses) are stored in the same format in a single field called “list.” In some cases, information may be stored across multiple fields with multiple formats, but the general pattern can be used with any relationship.
To set up a relationship between these two tables, open the Relationship panel (Tools > Relationships) and then drag each table into the dialog from the Project panel. Once we have the tables, we set up the relationship by dragging the “common” field in the first table to the field in the second table. In our case, we just want to match on the information in the field called “list.”
Now we're all set, so we'll activate the relationship by clicking on the “Update Relationships” button. We can now compare our two lists.
Show Me the Related Fields
Now that we've updated the relationships, we can find our duplicated records between the two tables. First, we'll visibly show the records in email_list2 that match the email address in the cursor row of email_list1. To do this, we'll toggle on our related records filter (Tools > Filter Related Records).
Now, as we click on any record in email_list1, we can see the corresponding related record (if any) in the email_list2 table. If there are no matching records, the second (child) table is blank.
Insert a Related Field into the Parent Table
While filtering on related records is useful for getting a sense of how the values in the two email lists compare, it isn't particularly effective for comparing all the values at the same time. What we really want to do is find all the values in email_list2 that are also in email_list1. Fortunately, with Strata, this is easy: once we've set up a relationship between two tables, we can simply insert any of the fields in the related table into the main (parent) table. To do this, open up the Field list and double-click on the field you want to insert into the parent table. In the case of the email_list1 and email_list2, we simply insert the “list” field from email_list2 into email_list1:
Finally, to find the values in email_list1 that are also in email_list2, we simply filter for the non-empty records in the field. This can be done by just right-clicking a blank record, selecting Filter Records and then choosing the following option from the menu:
list != ''
This was a pretty trivial example of how to use relationships. If you want to get a little more complex, you can create relationships on multiple fields and multiple tables as well. For further insights and to see some relationships in action, check out these two video demonstrations: