Data Gathering

In this guide, we’ll detail how Intuitive gathers data from the target PaperCut MF environment. 

Whether in these training modules or through industry knowledge, you may have heard us refer to the “ETL”. ETL is an industry term for Extract, Transform and Load. In a standard Intuitive implementation, we have our own proprietary tool, also called the ETL, which fulfils the same objectives as that term.

It’s a piece of software included with the standard implementation that collects data from the print management software, transforms the requested data and loads it into a SQLite database for the dashboard software to read from.

Looking at the ETL itself, the commands it executes come from an “ETLService.config” file. By default, this file is located at the path below, and it can be edited with a normal text editor.

C:\Program Files\Intuitive Business Intelligence\Intuitive Dashboards\Tools\ETL Service 

Going through the general structure of this file, we first have the initialised file path and the log file path.

The initialised path typically contains an initialised file, which the ETL uses as a marker for whether it needs to perform an incremental or a full load of data. The file is only generated by the ETL service after it has run. When the file is not present, the ETL clears down the SQLite tables and restarts the data load from scratch.

In a typical installation of Intuitive, the initialised file is not present, so the first time the ETL runs it collects all available information and compiles it into the associated SQLite tables. Thereafter, it loads incrementally based on that file’s presence.
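
To make that decision logic concrete, here is a minimal Python sketch of the behaviour described above. The paths, table names and helper functions are illustrative assumptions, not the actual Intuitive implementation; the real values live in ETLService.config.

import os
import sqlite3

# Hypothetical paths for illustration; the real values come from ETLService.config.
INITIALISED_FILE = r"C:\ETL\initialised.dat"
SQLITE_DB = r"C:\ETL\papercut.db"

def full_load(conn):
    """Placeholder: collect all available history from PaperCut."""

def incremental_load(conn):
    """Placeholder: collect only records newer than the last run."""

def run_etl():
    conn = sqlite3.connect(SQLITE_DB)
    if not os.path.exists(INITIALISED_FILE):
        # No marker file: clear down the SQLite tables and restart from scratch.
        for table in ("printer_usage", "printers", "users", "accounts", "servers"):
            conn.execute(f"DELETE FROM {table}")
        full_load(conn)
        # The marker is only written once the service has run, so subsequent
        # runs know to load incrementally.
        open(INITIALISED_FILE, "w").close()
    else:
        incremental_load(conn)
    conn.commit()
    conn.close()

A useful consequence of this design is that deleting the initialised file is a simple way to force a full reload on the ETL’s next run.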

The log file path is a key area for troubleshooting: the logs detail any information, warnings or errors relating to the core Intuitive software applications, such as the ETL, Dashboard Service, Server Tool and so on.

Next up is the location of the SQLite database and any associated scripts.

The location for the SQLite database is just a single network path. As SQLite is just a file, it does not have an associated instance or service name in the way that SQL Server does. The file itself is usually quite lightweight: even enterprise customers performing millions of jobs per year typically only reach 5-10 GB in size over 2-3 years, and most SQLite databases stay under 5 GB.
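
To illustrate the point that SQLite needs no instance or service, here is a minimal sketch of connecting to the database directly by path. The path and table name are assumptions for illustration only.

import sqlite3

# No instance or service name required, just the path to the file itself.
conn = sqlite3.connect(r"\\server\share\Intuitive\papercut.db")  # hypothetical path
jobs = conn.execute("SELECT COUNT(*) FROM printer_usage").fetchone()[0]
print(f"printer_usage rows: {jobs}")
conn.close()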

PaperCut’s table structure comprises six key tables. The first of these is the “environment_values” table. It loads its data from a single CSV file, generated by PaperCut, which holds the environmental parameters we use to calculate the figures on the Environmental dashboard: the values for working out the kWh, CO2 and trees consumed per sheet of paper.
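
As a hedged sketch of how such per-sheet parameters might be applied, here is an illustrative calculation. The column names are assumptions; the real CSV layout and factor values are PaperCut’s own.

import sqlite3

conn = sqlite3.connect(r"C:\ETL\papercut.db")  # hypothetical path

# Assumed layout: a single row of per-sheet factors, loaded from PaperCut's CSV.
kwh, co2, trees = conn.execute(
    "SELECT kwh_per_sheet, co2_per_sheet, trees_per_sheet FROM environment_values"
).fetchone()

# Total sheets printed across every recorded job.
sheets = conn.execute("SELECT SUM(total_sheets) FROM printer_usage").fetchone()[0] or 0

# Environmental dashboard figures: per-sheet factor multiplied by sheets printed.
print(f"kWh consumed: {sheets * kwh:.2f}")
print(f"CO2 produced: {sheets * co2:.2f}")
print(f"Trees used:   {sheets * trees:.4f}")
conn.close()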

The next four tables are Printers, Users, Accounts and Servers. These act as a dictionary for the customer’s print estate: each table contains an ID for the entry, along with its supplementary information. Take the Printers table, for example: we would have an ID for the printer, its current name, its serial number and its location. When a device is moved around the estate, its location might change, potentially along with its name; the serial number, however, is static, as it should not be subject to change. Each of these four tables presents data as a “Type 1” slowly changing dimension (SCD), where changes overwrite the existing record rather than adding a new one.
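
A minimal sketch of what a Type 1 update might look like for the Printers table, assuming illustrative column names and an ID with a unique constraint:

import sqlite3

conn = sqlite3.connect(r"C:\ETL\papercut.db")  # hypothetical path

# Incoming record from PaperCut: same printer ID, but renamed and relocated.
printer = {"id": 42, "name": "MFD-Floor3-East",
           "serial": "SN123456", "location": "Floor 3, East Wing"}

# Type 1 SCD: overwrite the old name and location in place. No history row is
# kept, and the serial number is never expected to change.
conn.execute(
    """INSERT INTO printers (id, name, serial_number, location)
       VALUES (:id, :name, :serial, :location)
       ON CONFLICT(id) DO UPDATE SET
           name = excluded.name,
           location = excluded.location""",
    printer,
)
conn.commit()
conn.close()

The trade-off of a Type 1 dimension is simplicity: the dashboards always see the printer’s current name and location, at the cost of not being able to report on where it used to live.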

The last table is Printer Usage. This contains the individual jobs performed over the last two years (by default): the timestamp of the job, the document name, the document volumes and costs, and the associated lookups for the Printer, User and Account that performed the job. We’ll use these lookups later in the datafeeds, but to summarise, we join the IDs in the Printer Usage table to the corresponding IDs in the aforementioned Printers, Users, Accounts and Servers tables. Printer Usage can be thought of as a “Type 2” SCD, in that incremental changes are added to the table over time while historical records are preserved.
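
To make those lookups concrete, here is a sketch of the kind of join the datafeeds perform, again assuming illustrative table and column names rather than the actual schema.

import sqlite3

conn = sqlite3.connect(r"C:\ETL\papercut.db")  # hypothetical path

# Resolve the IDs stored against each job back to printer, user and account names.
rows = conn.execute(
    """SELECT u.job_timestamp, u.document_name, u.total_sheets, u.cost,
              p.name AS printer_name, us.name AS user_name, a.name AS account_name
       FROM printer_usage AS u
       JOIN printers AS p ON p.id = u.printer_id
       JOIN users AS us ON us.id = u.user_id
       JOIN accounts AS a ON a.id = u.account_id
       ORDER BY u.job_timestamp DESC
       LIMIT 10"""
).fetchall()

for row in rows:
    print(row)
conn.close()

This resolution is effectively what the datafeeds do, so that the dashboards can report on meaningful names rather than raw IDs.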