QlikView - How To Structure Load Scripts
A Qlik data load is accomplished using one or more qvw or qvf files, each of which has a load script within it.
The question comes up on how to best organize the data load's load script.
Data loads deal with the real world of complex database queries, slow data connections, corporations with multiple source database systems and so on. Load scripts can be as complex and take as long to run as other form of programming. This write up is to help someone new to this area obtain an overview of how to structure a load script.
Note that there are different technical abilities within Qlik that can be used to move data into and out of a qvw/qvf. The choices are simple and which one you use is based on where each qvw fits into the full data load. This is covered at the end of this document under "Movement Of Data".
Basic Rule To Follow
Separate the load script into logically related pieces that can be run and QA'd as a unit.
Load Script Levels
The full load script can be broken out into four basic levels.
Data Source
The data is read from the data sources and calculations done.
Merging
This takes information read by Data Sources level scripts and combines them or does further calculations on them.
Data Model
All of the information is put together. This contains all of the data used by the dashboard(s).
User Dashboards
The user oriented dashboards themselves.
There might be 15 level Data Source scripts and 6 Merging scripts. Then there is 1 Data Model and 1 or more User Dashboards.
Note that the users never see the Data Source, Merging and Data Model scripts. The only user interface objects in these are ones put there by the developer as part of the development and QA process.
The same client might require multiple Data Models due to different areas of data they are showing in their dashboards or due to the same data needing to be structured differently for different dashboards.
Programming Principles To Keep In Mind
The basic rules to follow in any complex programming project are the ones to follow here. These are the ones I consider most important.
Information Hiding
This means to keep specialized information or calculations within one script with the interface to that script not requiring a knowledge of what is in the script.
Each load script should deal with one logical area.When reading from the accounting database, you might have one level 1 load script for each module and then a level 2 load script to combine the different modules' data.
Keep It Simple
The person creating the load scripts is probably focused on that project.
This will not be the situation with someone else coming in to maintain that code a year later.
It might not even be the situation with the you coming in to fix a bug 6 months later.
Unless you have specific runtime constraints, it is better to have easy to understand code than the fastest code.
Runtime Speed
There is an old saying "A daily process must take less than 24 hours to run".
While you want to keep the code as simple as you can, you do need to ensure it runs quickly enough.
Ability To Efficiently QA
If you have 15 minutes to show a dashboard to a major corporation's CFO and it is supposed to match P&L values, you have better make sure it does.This is his point of reference and if it is wrong, then his acceptance of the dashboard will be compromised.
I insist that developers QA complex code to 99.9+% accuracy before they pass it to QA.The developers do a highly technical QA based on their knowledge of the code.They need to attest that they have looked at the results of each piece of code and that it does what it should.Then the QA staff can look at it with a strong baseline.
This is especially important with Qlik's sometimes weak handling of errors - sometimes it will plow on even if an obvious mistake was made.
There are two points to help with this QA process.
Keep each separate qvw or qvf script relatively small.This way it can be run the many times needed in development and QA without having to wait for already tested code to run as well.
Break complex calculations down into stages within one load script.Say you have eight fields and two functions being used to create a fractional value.Instead of trying to do this all at once, go through the records once to calculate the numerator and the denominator and save them in the records and then go through the records again to create the final value.This way you can QA the numerator and denominator separately.Reading through millions of records in a resident table is fast in Qlik.Doing it twice is not that much of a cost.
Case Study: Conglomerate
From Wikipedia: A conglomerate is the combination of two or more corporations engaged in entirely different businesses that fall under one corporate group.
I had one client who was a parent corporation with 17 completely separate business entities within it. For example, they had a restaurant, a large fair once a year, a catering company and a child care facility.
They wanted a dashboard that would combine the financial information from five of these entities.
Each entity had its own accounting and/or production databases.
The load script had to import the information from these different sources and merge them together.
The load scripts followed this basic pattern:
A separate load script to pull the information from each separate database.Each of the five entities had one or two databases and we needed to pull information from the parent corporation's accounting system as well.
A separate load script to combine the information from the different accounting systems.
A separate load script to combine the information from the different production systems.
A final load script to combine all of this data.
The different user dashboards.
Case Study: Large Amount Of Data
I had a client whose mainframes generate 16 million records of data per day for its dashboards.
The work site was in one city, the Qlik servers were in a different city and the mainframes were in a third city.
The pipe between the Qlik servers and the mainframes was slow. It took 15 minutes to pull a day's worth of records. Clearly, we were not going to reload the full dashboard data each day.
Instead, we loaded each day's data into its own qvd. Since this took so long to run, this load script did no calculations with the data.
Then a Merging level load script did the basic massaging of the day's records and stored the results into a separate qvd.
Finally, another Merging level load scripts combined all of these qvd's for the last year into one qvd.
Finally, the Data Model script brought in this qvd along with the other data used in the dashboard.
Multiple Merge Levels
All of the scripts to pull data from the source systems can be run at once, unless there are other constraints against this. For example, running them all at once might slow a server too much.
Once the Data Source scripts are finished, the merge level scripts can be run. As noted above in the "Large Amount Of Data" case study, the results of a single merge script might be used as input into another merge script. This would mean the merge scripts must be sequenced.
Naming Your Load Scripts
Each load script should be given a name that includes:
Its type.
Its sequence.
Its logical area.
From the example above for the estimated P&L, we had:
1_DS_ReadSales.qvw
2_MG_CalcRevenue.qvw
3_MG_CalcBadDebt.qvw
The data model was 4_DM_Projections.qvw.
Movement Of Data
Data Source scripts will load their information from the different original data sources. These should write their output to qvd files.
Merge scripts should read these qvd files and then write their output to other qvd files.
The DataModel should read the qvd files it needs and NOT output any data.
The user dashboards will do what is called a binary load of the DataModel. Putting the command "binary (qvw/qvf name)" as the first line of the load script will read in all data in the specified file.
What Is In the Load Script Of An Actual User Dashboard?
This is where you would put anything that is specific to the functionality of that dashboard.
Perhaps you might want a value added to the Fact table that will simplify the set analysis for charts in that dashboard. Or you might want a table with values that can be used to create a list box whose selection will be used by set analysis statements.