By Nenshad Bardoliwalla.
Earlier this week, Paxata celebrated the issuance of US Patent US20170109402A1 covering algorithms that assist with the automated detection of joins in our self-service data preparation solution. While this is obviously a key milestone for a start-up software company and our engineers, what is more exciting is what this technology does to empower and drive business value for our customers.
In today’s world, we all strive to deliver better customer experiences, build smarter products, and even find cures for the world’s deadliest diseases – and data is the fuel that powers all these initiatives. But the speed of delivering data through traditional data management tools cannot match the speed of business. Self-service data preparation changes this paradigm in that business consumers can get access to the raw data and then combine, clean, and shape that data to fit their purpose.
Prepare your data without hurting yourself or the business
Self-service data prep may sound pretty straightforward, but a lot goes into ensuring that the business analyst does not get things wrong and produce a bad result. While the business analyst has the context of what the data means and what they wish to get out of the final insights, there are plenty of gotchas if you combine two data sets incorrectly and, for instance, create a Cartesian product, where rows are multiplied because the join key is not unique on either side. This is where great software steps in to help with powerful built-in intelligence!
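To make the Cartesian product risk concrete, here is a minimal Python sketch. The data sets, column names, and the `join_row_count` helper are hypothetical and purely illustrative; this is not how Paxata implements anything. It counts how many rows an inner join would produce for a given choice of key column.

```python
from collections import Counter

def join_row_count(left_keys, right_keys):
    """Count the rows an inner equi-join on these key values would produce."""
    right_counts = Counter(right_keys)
    # Each left row is paired with every right row that shares its key.
    return sum(right_counts[key] for key in left_keys)

# Hypothetical data: four orders and four customers.
orders_region = ["west", "west", "east", "east"]      # repeats, not a real key
customers_region = ["west", "west", "west", "east"]
orders_customer_id = [1, 2, 3, 4]                      # unique on both sides
customers_customer_id = [1, 2, 3, 4]

print(join_row_count(orders_region, customers_region))            # 8
print(join_row_count(orders_customer_id, customers_customer_id))  # 4
```

Joining on the repeating region column turns four orders into eight rows, while joining on the unique customer id keeps the expected four. A row count that grows after a join is a common sign that the wrong columns were chosen.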
US Patent US20170109402A1 covers one of several algorithmic intelligence methods that Paxata embeds in our self-service data preparation software.
Automatic Join Detection Explained
Invariably, when you are preparing and cleaning data for your project, you will end up with multiple data sets you wish to merge or combine in some fashion. The challenge of course is that there needs to be some way for the files to be associated – for rows in one data set to be matched with rows from the other data set. This could be easy when you have exact corresponding columns in both data sets, for example a customer-id or an email. But what should you do when you have First Name and Last Name in separate columns in one data set and Last Name, First Name (for example “Smith, Joe”) in a single column in the other data set?
Automatic Join Detection scans the content of the data sets and uses Natural Language Processing (NLP), search techniques, and other algorithms to identify candidate columns that could form the basis for joining two disparate data sets, even if those data sets have millions of rows and hundreds of columns. It can even match a single column on one side with multiple columns on the other side.
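The general idea of content-based join detection can be sketched in a few lines of Python. This is an assumption-laden illustration, not the patented Paxata algorithm: it scores column pairs by simple value overlap, and it handles the multi-column case only by trying every ordered pair of left-hand columns joined with a comma. All names (`normalize`, `containment`, `candidate_joins`) and the sample data are hypothetical.

```python
from itertools import permutations

def normalize(value):
    """Lowercase and collapse whitespace so near-identical values compare equal."""
    return " ".join(str(value).lower().split())

def containment(left_values, right_values):
    """Share of distinct left values that also appear on the right."""
    left, right = set(left_values), set(right_values)
    return len(left & right) / len(left) if left else 0.0

def candidate_joins(left, right, threshold=0.5):
    """Return (score, left_columns, right_column) tuples, best score first.

    `left` and `right` map column names to lists of values.
    """
    # Single columns, plus every ordered pair combined as "a, b".
    left_candidates = {
        (name,): [normalize(v) for v in values] for name, values in left.items()
    }
    for a, b in permutations(left, 2):
        left_candidates[(a, b)] = [
            normalize(f"{x}, {y}") for x, y in zip(left[a], left[b])
        ]

    results = []
    for columns, left_values in left_candidates.items():
        for right_name, right_values in right.items():
            score = containment(left_values, [normalize(v) for v in right_values])
            if score >= threshold:
                results.append((score, columns, right_name))
    results.sort(key=lambda item: -item[0])
    return results

# Hypothetical data sets: names split on one side, combined on the other.
crm = {"first_name": ["Joe", "Ann", "Raj"], "last_name": ["Smith", "Lee", "Patel"]}
billing = {
    "customer": ["Smith, Joe", "Lee, Ann", "Gomez, Eva"],
    "plan": ["gold", "basic", "gold"],
}

for score, columns, right_name in candidate_joins(crm, billing):
    print(f"{columns} -> {right_name}: {score:.2f}")
```

On this sample data, the only candidate above the threshold pairs `last_name` and `first_name` on one side with `customer` on the other. A production system would need sampling and indexing to cope with millions of rows, along with fuzzier matching than exact overlap.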
Combining Intelligent Software With Human Ideation To Power Your Business
A few weeks ago at the Evanta CDO Summit in Los Angeles, one of the speakers noted that many organizations are putting data at the fingertips of business users, but because they still rely on old data management techniques, they end up building armies of data engineers to clean and deliver these ready-to-go data sets.
Self-service data preparation solutions powered by intelligent algorithms augment the skills and knowledge of business analysts and empower them to perform complex data integration, transformation, and cleansing tasks by themselves, without the risk of getting it wrong. This drives not only better-quality insights but also greater productivity from your data initiatives: you are no longer dependent on the limited pool of skilled developers in IT, because business analysts prepare and shape their data themselves.
To see Paxata in action for yourself, sign up for a free trial here.