The Hadoop QVX Converter serves organisations that use both QlikView and Hadoop.
QlikView (QV) is a visual dashboarding tool that can visualize large datasets thanks to QlikTech's in-memory architecture. In-memory technology enables a new and interactive way of navigating and understanding relationships in datasets. Analysis paths for dashboards don't have to be conceived beforehand in a design process but arise on the fly in the data. QlikTech introduced the term Business Discovery to emphasize these new possibilities as opposed to more traditional BI tools.
QV is able to handle exceptionally large datasets, thanks to its native data compression (10x) and the scalability of RAM. Sometimes, though, just getting the large datasets into QV becomes the problem. QV has built-in features for parallel (multi-threaded) loading from relational database (RDBMS) sources, but loading from text formats like CSV is less efficient because of parsing and type conversions. Very large flat-file data sources (up to 100 GB) can take well over an hour to load. It does not make sense to introduce an additional RDBMS layer just to enable the multi-threaded bulk loaders of QV. Moreover, in many organisations flat files are the preferred way of exchanging data. Fortunately, there is now a way to load flat files into QV fast.
In 2011 QlikTech released documents that laid out the structure of a native but open file format named QVX. Data in QVX format can be loaded up to 3 times faster because the data is already preprocessed in a way QV understands. QVX is a semi-binary format: a header describing the fields in XML, followed by the data itself in binary form. In January 2012 Ralf Becher of TIQ Solutions wrote a Java-based library to convert CSV files to QVX format. Along with the open-source version, TIQ also sells a multi-threaded version to accommodate very large datasets. Remember, the rising volume of the datasets in dashboards is what prompted the QVX format in the first place. Now this has been taken a step further.
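To make the idea of a semi-binary file concrete, here is a minimal Java sketch that writes an XML header followed by binary records. It is not the QVX layout and not the TIQ library: the element names, the zero byte used as a separator, the two-column record (int id, double amount) and the class name SemiBinarySketch are all assumptions chosen for illustration. The real field descriptors are defined in QlikTech's QVX documents.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Illustrative only: a generic "XML header + binary body" writer, NOT the QVX spec.
final class SemiBinarySketch {

    // Builds an XML header that names the fields. Element names are hypothetical,
    // and names are not XML-escaped in this sketch.
    static String header(String table, List<String> fields) {
        StringBuilder xml = new StringBuilder();
        xml.append("<TableHeader><TableName>").append(table).append("</TableName><Fields>");
        for (String field : fields) {
            xml.append("<Field>").append(field).append("</Field>");
        }
        return xml.append("</Fields></TableHeader>").toString();
    }

    // Writes the header as UTF-8 text, one zero byte as separator (an assumption),
    // then every CSV row as a big-endian int followed by a big-endian double.
    static byte[] encode(String table, List<String> csvRows) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.write(header(table, List.of("id", "amount")).getBytes(StandardCharsets.UTF_8));
            out.writeByte(0);
            for (String row : csvRows) {
                String[] cols = row.split(",");
                // Parsing and type conversion happen once, at conversion time.
                out.writeInt(Integer.parseInt(cols[0].trim()));
                out.writeDouble(Double.parseDouble(cols[1].trim()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encode("sales", List.of("1,9.50", "2,12.25"));
        System.out.println(bytes.length + " bytes written");
    }
}
```

The point of the sketch is that text parsing and type conversion are paid once by the converter, so a reader of the binary part can consume typed values directly.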
Jasper Knulst, a Hadoop specialist working for Incentro, has collaborated with Ralf to refactor his QVX converter so that it can run on Hadoop as a MapReduce job. This means QVX files can be generated directly as Hadoop output, leveraging the processing power of Hadoop, which is horizontally scalable by design.
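The reason such a conversion scales horizontally is that each chunk of input can be converted independently of the others. The plain-Java sketch below imitates that pattern with parallel streams. It does not use the Hadoop API and is not the HQC code: the class name SplitConvertSketch, the split size and the stand-in convert method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java analogy of a map-only job: independent splits, one output part per split.
final class SplitConvertSketch {

    // Cuts the input into fixed-size splits, similar in spirit to input splits
    // handed to separate map tasks.
    static List<List<String>> splits(List<String> lines, int size) {
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += size) {
            result.add(lines.subList(i, Math.min(i + size, lines.size())));
        }
        return result;
    }

    // Stand-in for the real per-record conversion; here it only swaps the delimiter.
    static String convert(String line) {
        return line.trim().replace(',', '|');
    }

    // Converts every split in parallel. No split depends on another, so adding
    // workers (or, on a cluster, nodes) adds throughput.
    static List<List<String>> run(List<String> lines, int splitSize) {
        return splits(lines, splitSize).parallelStream()
                .map(split -> split.stream()
                        .map(SplitConvertSketch::convert)
                        .collect(Collectors.toList()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a,b", "c,d", "e,f"), 2));
    }
}
```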
The joint Hadoop QVX converter (HQC) will be announced this week, specifically for Qonnections 2012. The HQC serves organisations that use both QV and Hadoop. On the one hand, large datasets can be passed through a Hadoop MapReduce job, using it as a QVX converter on steroids. On the other hand, analytic or ETL jobs on Hadoop can produce their output data in QVX format directly. Incentro and TIQ Solutions both provide professional services for HQC implementations.
More details about the Hadoop QVX converter can be found here:
http://www.tiq-solutions.de/download/attachments/425996/Process_schema_Hadoop-QVX-Converter.pdf
About TIQ Solutions
TIQ Solutions is a Leipzig-based company specialized in data quality and data management (http://www.tiq-solutions.de/display/enghome/Home).
Ralf Becher: ralf.becher@tiq-solutions.de
About Incentro
Incentro is a company based in the Netherlands that specializes in Enterprise Information Management and Big Data processing on Hadoop (http://www.incentro.com).
Jasper Knulst: jasper.knulst@incentro.com
TIQ Solutions is a leading authority in Data Management and Data Quality solutions. We provide our customers with specialised methodological and technical consultancy services, quality-assured systems integration and application development. Our business approach is to serve our clients from the first problem analysis through to implementation, with business experience and technical excellence.
We also advise, coach and support our clients, especially in Data Quality Management. Our services range from workshops, through operational measurement, assessment, improvement and monitoring of data quality, to strategic solutions and change management.
Contact:
TIQ Solutions
Alexandra Sauer
Weißenfelser Straße 84
04229 Leipzig
+49 (0)341 35590322
http://www.tiq-solutions.de/display/enghome/Home
a.sauer@tiq-solutions.de
Press contact:
TIQ Solutions GmbH
Alexandra Sauer
Weißenfelser Str. 84
04229 Leipzig
a.sauer@tiq-solutions.de
+49 (0)341 35590322
http://www.tiq-solutions.de