Data Gathering

In this guide, we’ll detail how Intuitive gathers data from the target SAFEQ6 environment. 


In these training modules, or through industry knowledge, you may have heard us refer to the “ETL”. It’s an industry term for Extract, Transform and Load. In a standard Intuitive implementation, we have our own proprietary tool, also called the ETL, which fulfills the same objectives as that term. 

It’s a piece of software that ships with the standard implementation: it collects data from the print management software, transforms the requested data and loads it into a SQLite database for the dashboard software to read from. 

Going into the ETL itself, the commands that it executes come from an “ETLService.config” file. By default, this file is located at the path below and can be edited with a normal text editor. 

C:\Program Files\Intuitive Business Intelligence\Intuitive Dashboards\Tools\ETL Service 

Going through the general structure of this file, we first have the initialised and log file paths. 

The initialised path typically contains an initialised file, which the ETL uses as a marker for whether it needs to perform an incremental or a full load of data. The file is only generated by the ETL service after it has run. When the file is not present, the ETL understands that it must clear down the SQLite tables and restart the data load from scratch. 

In a typical installation of Intuitive, the initialised file is not present, and therefore the first time the ETL starts it will collect all available information and compile it into the associated SQLite tables. Thereafter, it will load incrementally based on that file’s presence. 
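
As an illustration, a full load clears the staging table down and repopulates it from scratch, whereas an incremental load only requests jobs newer than the latest one already held in SQLite. The sketch below is purely illustrative; the column name and parameter are assumptions, not the actual ETL queries. 

    -- Full load: the initialised file is absent, so the staging table is
    -- cleared down and the extract pulls every available job.
    DELETE FROM Raw_SafeQ_Data;

    -- Incremental load: the initialised file is present, so the extract run
    -- against SAFEQ6 over ODBC is filtered to jobs newer than the latest one
    -- already held locally. (job_date is an illustrative column name, and
    -- :last_loaded_job_date would be taken from MAX(job_date) in Raw_SafeQ_Data.)
    SELECT *
    FROM   SAFEQ_STATS
    WHERE  job_date > :last_loaded_job_date;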

The log files path is a key area for troubleshooting: it records any information, warnings or errors relating to the core Intuitive software applications, such as the ETL, Dashboard Service, Server Tool and so on. 

Next up is the location of the SQLite database, and any associated scripts. 

The SQLite location is just a single network path. As SQLite is simply a file, it does not have an associated instance or service name in the way that SQL Server does. The file itself is usually quite lightweight: even enterprise customers performing millions of jobs per year typically only reach 5-10 GB in size over 2-3 years, and most SQLite databases stay under 5 GB. 

SAFEQ6’s table structure comprises four key tables. Of these four, three hold the raw data from SAFEQ6, with the last table being the staging area for the compiled data. 
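
If you need to confirm what the ETL has created, the SQLite file can be inspected directly with the sqlite3 command-line shell or any SQLite client; the standard query below lists the tables it contains. 

    -- List the tables held in the SQLite file.
    SELECT name
    FROM   sqlite_master
    WHERE  type = 'table'
    ORDER  BY name;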

Starting with the first table, this is named “Raw_SafeQ_Data”, and as the name suggests, it holds the raw data that we collect from SAFEQ6. This data is collected via an ODBC script, configured from the installation wizard, that captures the information from a target PostgreSQL or SQL Server database. 

The below screenshot illustrates a PostgreSQL implementation: 

This capture of information is controlled through the script “Full Load.sql”. This script collects data from two key tables on the SAFEQ6 target database, SAFEQ_STATS and SAFEQ_STATS_FULL. 

These two tables have an almost identical set of contents; however, the latter also contains the Document Names and File Type IDs. The extra data in SAFEQ_STATS_FULL is ordinarily kept for 30 days and then ‘moves’ to SAFEQ_STATS, at which point the document detail is scrubbed. Inside the “Full Load” script, you’ll also find the costing table that some customers may or may not use for representing costs inside the Intuitive dashboards. A key is provided in this script, and the table’s values allow users either to use the calculated costs from the SAFEQ6 solution or to use Intuitive’s global click costs. 
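
To make that concrete, the sketch below shows one plausible shape for this kind of extract: both statistics tables are read, with the document detail only available from SAFEQ_STATS_FULL. The column names are illustrative assumptions; the real script is generated by the installation wizard against the SAFEQ6 schema. 

    -- Illustrative only: combine the long-term statistics with the 30-day detail.
    SELECT s.job_id,
           s.job_date,
           s.user_name,
           s.accid,
           NULL AS document_name,   -- detail is scrubbed once jobs move here
           NULL AS filetype_id
    FROM   SAFEQ_STATS s
    UNION ALL
    SELECT f.job_id,
           f.job_date,
           f.user_name,
           f.accid,
           f.document_name,         -- still present in the 30-day table
           f.filetype_id
    FROM   SAFEQ_STATS_FULL f;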

Before moving on to how this data is transformed, there are two additional tables we import: ACCID and Filetypes. 

ACCID gives us a dictionary of terms for the job types, paper size and colour composition of documents. SAFEQ6 splits job types out into multiple different subsections, such as “color print A3”. From that, we can discern that it’s a colour document, that the job type is ‘print’ and that the paper size is A3. 
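
As a hedged sketch, decoding one of those combined descriptions might look like the query below; the ACCID column names and the exact wording of the descriptions are assumptions for illustration. 

    -- Illustrative decode of an ACCID description such as 'color print A3'.
    SELECT a.accid,
           CASE WHEN a.description LIKE 'color%' THEN 'Colour' ELSE 'Mono' END AS colour,
           CASE WHEN a.description LIKE '%print%' THEN 'Print'
                WHEN a.description LIKE '%copy%'  THEN 'Copy'
                WHEN a.description LIKE '%scan%'  THEN 'Scan'
           END AS job_type,
           CASE WHEN a.description LIKE '%A3%' THEN 'A3'
                WHEN a.description LIKE '%A4%' THEN 'A4'
           END AS paper_size
    FROM   ACCID a;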

Filetypes is much the same, in that it’s a dictionary of terms we can use. Typically, the datafeeds apply their own logic for handling extensions, as this allows us to build up our own terms to use. 
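
For example, a lookup of that kind might group raw extensions into our own reporting terms, along the lines of the sketch below (the column names and groupings are illustrative, not the actual datafeed logic). 

    -- Illustrative grouping of raw file extensions into our own reporting terms.
    SELECT f.filetype_id,
           f.extension,
           CASE WHEN f.extension IN ('doc', 'docx') THEN 'Word Processing'
                WHEN f.extension IN ('xls', 'xlsx') THEN 'Spreadsheet'
                WHEN f.extension = 'pdf'            THEN 'PDF'
                ELSE 'Other'
           END AS reporting_category
    FROM   Filetypes f;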

With the SAFEQ6 SQLite tables built up, the final table is “Processed_Data”. 

Processed_Data does not capture data directly from SAFEQ6; instead, it queries the internal tables created by the previous queries. Processed_Data effectively has two functions: to retrieve the data currently held in “Raw_SafeQ_Data”, and then to decide whether that data needs to be included in the processed data. Additionally, it creates another table to control the Cost metrics.
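
A minimal sketch of that incremental decision is shown below, assuming a job identifier is available to compare the two tables on; the column names are illustrative rather than the actual Processed_Data schema. 

    -- Illustrative only: carry forward raw rows that have not yet been processed.
    INSERT INTO Processed_Data (job_id, job_date, user_name, pages, colour, job_type)
    SELECT r.job_id, r.job_date, r.user_name, r.pages, r.colour, r.job_type
    FROM   Raw_SafeQ_Data r
    WHERE  NOT EXISTS (SELECT 1
                       FROM   Processed_Data p
                       WHERE  p.job_id = r.job_id);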

This latter table is effectively the main table the Intuitive datafeeds retrieve data from.