Grouping Data

Home   Back   Next  

Performing Standard Data Operations > Grouping and Summarizing Data > Grouping Data

 

The Group Records operation groups together all like-records in a data set based upon a user-defined criteria. It also enables you to perform aggregate function calculations on the data set as well as apply advanced filtering criteria.

 

For example, let's say you have a table with the following six records:

 

Dept_no

Company

Amount

1

ABC INC

100

10

XYZ CORP

2000

1

ABC INC

400

500

RST LLC

50

10

XYZ CORP

1000

20

XYZ CORP

5000

 

We may want to know two things:

 

1.What is the total Amount per Company?
2.How many times does the Company appear in the table?

 

By using the grouping tool to generate a new table, we can quickly identify the answers to these questions:

 

Company

Amount

Count

ABC INC

500

2

XYZ CORP

8000

3

RST LLC

50

1

 

 

Accessing the Grouping Tool

 

 

To begin the grouping operation, you first need to open up a data set you wish to group.  Once the table is open, you can open the grouping tool by doing one of the following:

 

1. Select "Group Records" from the "Groups" item in the Data menu.

 

2. Select the Group Records icon (Sigma symbol) from the Bookmarks toolbar.

 

 

group_records_0

 

NOTE:  To pre-populate your dialog, simply highlight the field(s) you wish to use before selecting the Group Records option.

 

 

Grouping Basics

 

 

Grouping a Single Field

 

To group records in your data set, do the following:

 

1. Open the grouping tool, as discussed above.

 

2. Select the fields you wish to group together.  To do this, select the field name on the left side of the dialog (under "Name") and drag it to open area on the right side of the dialog.  For the field named "Company", the following would appear on the left:

 

group_records_1

 

The function "Group By" means that you wish to group based on this field.  The Input Field is the actual field name and the Output Field will be the name of the field that will appear in the new table generated by this process.

 

3.  Click the "Run" button to perform the grouping calculation.

 

We would get the following results table, which shows that there are three groups associated with the Company field:

 

Company

ABC INC

XYZ CORP

RST LLC

 

 

Grouping Multiple Fields

 

The grouping tool will combine data based on the unique groups you specify.  For instance, instead of just grouping on the Company field, we may prefer to know the group "Department and Company".  For this, we would Group By both Dept_no and Company:

 

group_records_2

 

We would get the following results table, which shows that there are four groups associated with both the Department and Company, since XYZ CORP has two different department numbers in the original table:

 

Dept_no

Company

1

ABC INC

10

XYZ CORP

20

XYZ CORP

500

RST LLC

 

 

The grouping tool groups items based upon the order in which the fields are listed.  To change the order, you can select a record in the dialog and drag it up or down to a new location in the list.

 

 

Using Counts and Group Identifiers

 

Let's say, besides just wanting to know the actual groups, we also wanted to know how many times the grouped records appeared in the original table.  To learn this information, we would add a "Count" field, by dragging in the <Count> item from under the "Name" area on the left:

 

group_records_3

 

We would get the following results table, which shows the number of times an individual group appears in the original table:

 

Dept_no

Company

Count

1

ABC INC

2

10

XYZ CORP

2

20

XYZ CORP

1

500

RST LLC

1

 

In addition to Count, you can also add a Group Identifier ("Group ID") that adds a new field with a unique number per group.

 

 

Aggregate Functions

 

 

In the example above, besides Department and Company information, each record also contains an Amount field.  Let's say we wanted to know the sum of all Amounts for each group.  In order to do this, we need to use an aggregate function.

 

Aggregate functions perform calculations on each group.  The following are available in the grouping tool:

 

Min:  Returns the minimum value in the group.  For numeric fields, it returns the smallest number in the group; for character fields, it returns the minimum alphabetical character (closest to "a").
Max:  Returns the maximum value in the group.  For numeric fields, it returns the largest number in the group; for character fields, it returns the maximum alphabetical character (closest to "z").
Sum:  Returns the sum of all values in a group.
Avg:  Returns the average of all values in a group.
Stddev:  Returns the standard deviation of a group.
Variance:  Returns the variance of a group.

 

Getting back to the example above, in order to identify the total amount per the group, we would use the "Sum" aggregate function.  You can do this as follows:

 

1.  Select the Amount field from the left side of the dialog (under "Name") and drag it to the area on the right side of the dialog.

 

2.  "Group By" will appear under the function name.  To change this, click on "Group By" and a drop-down menu will appear.  Select "Sum" from the drop-down menu.

 

 

group_records_4

 

After running this grouping operation, we would get the following results table, which shows the sum of the Amounts for each of the four groups associated with the department and company fields:

 

Dept_no

Company

Count

Sum_Amount

1

ABC INC

2

500

10

XYZ CORP

2

3000

20

XYZ CORP

1

5000

500

RST LLC

1

50

 

 

Filtering the Group Results

 

 

In addition to the standard grouping as discussed above, you may also perform queries on groups or aggregate functions in order to filter the output table to a more targeted set of records.  To do this, you can enter a formula in the "Group Filter" box at the bottom of the grouping tool.

 

Filtering Results

 

In the example above, we may only be interested in certain results in the output table, not all of them.  In this case, we can filter our output table prior to running the grouping operation.

 

To filter on the group operation, you need to type in a Boolean expression into the Group Filter box.  For example, if you only wish to see records that have a count greater than 1, you would use the following formula:

 

count() > 1

 

In our example above, we would get the following output table, which only shows the groups which had a count higher than 1:

 

Dept_no

Company

Count

Sum_Amount

1

ABC INC

2

500

10

XYZ CORP

2

3000

 

 

As another example, let's say that we wanted to limit the results to only show the groups where an individual record Amount was more than 1500.  We would use the following query:

 

max(Amount) > 1500

 

We would get the following output table, which shows these groups:

 

Dept_no

Company

Count

Sum_Amount

10

XYZ CORP

2

3000

20

XYZ CORP

1

5000

 

Note that "Dept 10" had two records, one with an Amount of 1000 and the other with an Amount of 2000.  This group appears in the output because the second record matched the formula criteria.

 

However, if we changed the formula to show where groups had records where Amount was greater than 2500:

 

max(Amount) > 2500

 

We would get the following output table, since the only record to match this criteria was from Department 20:

 

Dept_no

Company

Count

Sum_Amount

20

XYZ CORP

1

5000

 

Or, lastly, if we changed the formula to only show where groups had records where Amount was equal to 2000:

 

max(Amount) = 2000

 

We would get the following output table, since the only record to match this criteria was from Department 10:

 

Dept_no

Company

Count

Sum_Amount

10

XYZ CORP

2

3000

 

 

Filtering Results and Including Detailed Records

 

When using a filter as discussed above, it is often helpful to show all the detail results associated with a given group.  To show detailed records, simply click the "Include Detail Records" checkbox to the right of the Group Filter box.

 

group_records_5

 

 

Now we'll go through each of the examples above in the Filtered Results section, but this time we'll activate the Include Detail Records checkbox.

 

For the group filter:

 

count() > 1

 

We would get the following output table, which only shows the groups which had a count higher than one as well as the detail of what the groups were created from.  The field "Amount" for the original records which add up to the grouped Sum (note that in this example we do not include all the other fields in the table due to space considerations, but the normal output table will include all fields in the table):

 

Dept_no

Company

Count

Sum_Amount

Amount

1

ABC INC

2

500

100

1

ABC INC

2

500

400

10

XYZ CORP

2

3000

2000

10

XYZ CORP

2

3000

1000

 

For the group filter:

 

max(Amount) > 1500

 

We would get the following output table, which shows the groups as well as the related detail of those groups:

 

Dept_no

Company

Count

Sum_Amount

Amount

10

XYZ CORP

2

3000

2000

10

XYZ CORP

2

3000

1000

20

XYZ CORP

1

5000

5000

 

For the group filter:

 

max(Amount) > 2500

 

We would get the following output table (in this case there is only one detail record, since the entire group is made up of only a single record):

 

Dept_no

Company

Count

Sum_Amount

Amount

20

XYZ CORP

1

5000

5000

 

For the group filter:

 

max(Amount) = 2000

 

We would get the following output table, which shows the detail of Department 10, which has one record which matched the group filter criteria:

 

Dept_no

Company

Count

Sum_Amount

Amount

10

XYZ CORP

2

3000

2000

10

XYZ CORP

2

3000

1000