Introduction to Django GROUP BY
The following article provides an outline for Django GROUP BY. This operation is a fairly common operation for who are more familiar with the SQL part. The Django ORM is said to be an Abstraction layer which let us work with the database (which is an object-oriented), where finally it’s just a relational database and actually all the operations are translated into SQL statements. The database is a powerful tool with which you can do the work much faster and directly in it. The interesting information, when you get through the GROUP BY Queries, you will be no longer interested in each model instances and table row details.
GROUP BY in Django with SQL (Using Django Querysets & SQL)
We use to have a chaos among GROUP BY and aggregate. As both are not the same one but they can’t work without one-another.
A GROUP BY in Django is used with the AGGREGATE functions or the built-ins. The GROUP BY statement and aggregate functions like COUNT, MAX, MIN, SUM, AVG are used in a combo to group the result – set by one or more columns.
Syntax:
SELECT column _ name(s)
FROM table _ name
WHERE condition
GROUP BY column _ name(s)
ORDER BY column _ name (s);
Understanding this clause using the DJANGO GROUP BY Cheat Sheet.
This is the model from Django’s built-in django. contrib. auth app.
Count Rows:
Counting rows is one of the Django functions on the Query Set. Number of rows in a table can be identified using the method of Count Rows. Here the key name is derived from the ‘Name of the field’ and the ‘Name of the aggregate’.
Example:
The following SQL statement lists the no. of customers from each country:
Sample Database:
CustomerID | CustomerName | Address | City | PostalCode | Country |
01 | Anna Mary | Obere Str.57 | Berlin | 12209 | Germany |
02 | Brutto | Avda. De la Constitucion
2222 |
Mexico | 05021 | Mexico |
03 | Catty Comb | Mataderos 2312 | Mexico | 05023 | Mexico |
04 | Drake | 120 Hanover Sq | London | WA1 1DP | UK |
05 | Mark Zucker | Berguvsvagen 8 | Lulea | S-958 22 | Sweden |
Code:
SELECT COUNT (CustomerID), Country
FROM Customers
GROUP BY Country;
Output:
Example:
Count how many users we have.
Code:
SQL:
SELECT
COUNT(id) as total
FROM
auth_user;
Python:
from django.db.models import Count
queryset = (User.objects
.aggregate(
total=Count('id'),
)
)
Output:
Implementing the Process GROUP BY
When we use ‘aggregate’, the aggregate function is applied throughout the table (on the entire table). Eventhough this process makes work simple by applying applying the function on the entire table, the usual way is to apply ‘aggregation’ only on groups of rows.
Active users counting using GROUP BY:
Code:
SQL:
SELECT
is_active,
COUNT(id) AS total
FROM
auth_user
GROUP BY
is_active
Python:
queryset = (User.objects
.values(
'is_active',
)
.annotate(
total=Count('id'),
)
.order_by()
)
Output:
***A combination of ‘VALUES’ & ‘ANNOTATE’ should be used to produce a GROUP BY.
If we fail to call VALUES before annotate, then the aggregate results will not be produced. So, the ORDER is one of the most important things to keep in mind while doing GROUP BY function.
Using Filter in a Query Set:
When a particular or a precise column or row need to be counted, we should use ‘filter’. In the process of applying aggregation in a filtered query, we can use ‘filter’ anywhere in the query set.
Code:
SQL:
SELECT
is_active,
COUNT(id) AS total
FROM
auth_user
WHERE
is_staff = True
GROUP BY
is_active
Python:
queryset = (User.objects
.values(
'is_active',
)
.filter(
is_staff=True,
)
.annotate(
total=Count('id'),
)
)
Similar to filter, ‘order_by’ anywhere in the query to SORT a query set. and for our convenience, we can use both ‘ GROUP BY ‘ and the ‘aggregate’ field.
Code:
SQL:
SELECT
is_active,
COUNT(id) AS total
FROM
auth_user
WHERE
is_staff = True
GROUP BY
is_active
ORDER BY
is_active,
total
Python:
queryset = (User.objects
.values(
'is_active',
)
.filter(
is_staff=True,
)
.annotate(
total=Count('id'),
)
.order_by(
'is_staff',
'total',
)
)
Output:
Conditional Aggregation in GROUP BY
To aggregate a part of the group, we should use Conditional Aggregation.
Example:
To count the number of staff and non-staff users by the year they signed-upCode.
Code:
SELECT
EXTRACT(‘year’ FROM date_jointed),
COUNT(id) FILTER (
WHERE is_employee = True
) AS employee_users,
COUNT(id) FILTER (
WHERE is_employee = False
) AS non_employee_users
FROM
authentic_user
GROUP BY
EXTRACT(‘year’ FROM date_joined)
From Django.db.models import F, Q
(User.objects
.values(‘date_joined__year’)
.annotate(
employee_users=(
Count(‘id’, filter=Q(is_employee=True))
),
non_employee_users=(
Count(‘id’, filter=Q(is_employee=False))
),
))
Having Clause
In SQL WHERE keyword and the aggregate function should not be used together. Instead, the HAVING clause is added to the SQL.
Syntax:
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s);
Usage of Having GROUP BY
To filter out the required data from the result of an aggregate function, the HAVING clause is used.
Sample Case:
Code:
SELECT Employees.LastName, COUNT(Orders.OrderID) AS NumberOfOrders
FROM (Orders
INNER JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID)
GROUP BY LastName
HAVING COUNT(Orders.OrderID) > 10;
Output:
LastName | NumberOfOrders |
Buchanan | 11 |
Callahan | 27 |
Davolio | 29 |
Fuller | 20 |
King | 14 |
Leverling | 31 |
Peacock | 40 |
Suyama | 18 |
GROUP BY – Distinct:
In a COUNT function, sometimes it would be desirable to only count the DISTINCT OCCURRENCES.
Sample:
To count the number of different last names being there per user active status.
Code:
SELECT
is_active,
COUNT(id) AS total,
COUNT(DISTINCT last_name) AS unique_names
FROM
auth_user
GROUP BY
is_active
(User.objects
.values(‘is_active’)
.annotate(
total=Count(‘id’),
unique_names=Count(‘last_name’ distinct=True),
))
Conclusion – Django GROUP BY
The AGGREGATE functions are a very very powerful tool to analyze the data and to gain the useful business insights. The rows that have the same values are grouped using the GROUP BY clause in a SQL command, and this clause is used in the SELECT statement. As we have seen in the intro summary, the GROUP BY clause is used in combo with the AGGREGATE functions to produce the summary report from the database. The grouping collapses multiple rows into a single row, based on certain criteria.