Introducing SQL Fairy DataShed

SQL Fairy is thrilled to introduce DataShed.  DataShed is a new Platform designed to significantly reduce the barriers to delivering timely reporting and analysis data to your Clients.  DataShed provides a framework which does the heavy lifting when it comes to retrieving, storing, aggregating and distributing data to and from multiple sources and targets.  DataShed takes care of the collection, storage and partitioning details allowing you, a Data Professional, to concentrate on the value that you need to deliver to Clients.

Implements best practices Star Schema for simplified reporting and analysis

DataShed is designed around the concept of best practice Star Schemas, using Facts and Dimensions for simplified reporting and analysis.  Fact and Dimension definitions provide the SQL required to retrieve the necessary data.  All data is stored using surrogate keys and DataShed takes care of the details of mapping of Facts to Dimensions.  DataShed stores gathered data in a highly optimised fixed schema which means that no database modifications are required to store new data.  Check out the user’s guide link below for some more info on how this works.

Inherent SCD support

DataShed takes care of managing SCD type 2 (slowly changing dimensions) without any additional effort.  You can define the level of SCD support on each Dimension (types 0, 1, or 2).  With SCD type 2 support DataShed takes care of creating new Dimension rows when Dimension data changes so that important historical data is maintained correctly.

Support for Multiple homogeneous data sources

If you have multiple data sources containing the same schema and need to combine facts and dimensions for consolidated reporting DataShed is the ideal choice.  DataShed allows you to configure multiple DataSources for a DataSourceClass.  A DataSourceClass is a grouping of DataSources which is associated with a set or profile of facts to be collected.  DataShed automatically maintains the distinct data but presents it with the context of the DataSource it applies to.  Depending upon your requirements Dimensions can also be nominated as being common across an entire DataSourceClass.

Easy partitioning of multi-tenanted data

DataShed makes it simple to partition data collected from systems so that you can provide multiple Tenants with just the data that relates to them.  All that you need do is specify a column in your Fact and Dimension queries which indicates the name of the Tenant who owns/is associated with the row.  DataShed will take care of the rest.  When multi-tenanted data is present, DataShed will automatically generate a dedicated Schema for each Tenant an add a filtered view for each of the Facts and Dimensions which contains data for the Tenant.

Automatic publishing to external data marts

DataShed allows you to configure export of data to external databases for consumption by end users.  While it is possible to provide access directly to generated Schemas/Views in the DataShed system database often it will be more convenient to push Data directly to a Client database.  Client databases can be stored on either local or remote servers.   Once provided with connection details DataShed will take care of creating required schemas and tables then merging data from the generated source views to the target tables.  Merging of updated data is handled efficiently, ensuring that only new and updated data is sent to the external database for merging.  Synchronisation can be scheduled regularly to ensure that Clients enjoy an up to date view of their data.

Speed up reporting & reduce impact

DataShed is ideal for reporting.  Many everyday reports tend to aggregate data over long periods and millions of rows of data.  Attempting to run reports directly from production transactional systems can cause frustration due to the length of time it takes to query the required data and also the potential disruption such long running queries can cause by taking out locks on the tables.  DataShed can alleviate much of this pain by making it easy to gather and store the aggregates that you need to report on.  DataShed allows report designers to specify the queries for aggregates that they need to report on.  Need to report on hourly sales numbers, volume, profit per product, per demographic, per post code etc.?  No problem.  Defining the queries to collect DataShed Facts and Dimensions is not much different to the queries you would create for the report itself.  Gathering and storing the required data on a regular basis will almost certainly be less disruptive than even a single instance of running the report which would otherwise aggregate data over the entire reporting period.

Because DataShed is able to gather and store Facts and Dimensions from multiple DataSources, DataShed can also enable reporting scenarios which would be impractical if not impossible if attempting to simultaneously report against multiple databases.

Reporting against the DataShed will make your reports run faster.

Capture point in time data

Often transactional databases do not maintain historical data on the state of entities that you would like to report on.  Imagine you have the task of reporting on billing information in a system which doesn’t capture historical data for a required metric.  Let’s say we’re looking for maximum enabled users in a system per month as a billing metric.  If the system that you’re reporting on doesn’t capture this information (which is quite likely) then your options are probably limited to requesting a feature enhancement from the product Dev Team or regularly collecting this metric in some other place.  DataShed can be that other place and because DataShed supports capturing data from a multitude of sources it can allow you to regularly sample and aggregate metrics across a large number of sources.

Enable end Users FTW!

We probably all know enthusiastic end users (Managers) who are capable of producing a plethora of scintillating pie charts and graphs given access to the relevant data.  Tools such as Microsoft BI allow end users to produce amazing Dashboards in an instant which might otherwise require decades of bureaucracy to produce.  Multi Tenancy support built into DataShed can also help single enterprise customers provide separate data feeds per department.  Getting just the right data to the right people is easier with DataShed.

For ISV’s the situation can be similar.  Often Clients have considerable resources at their command and can produce amazing perfect reports which they are responsible for if only you can provide the data.  Sometimes this is seen as too hard and a custom reporting engine is rolled into the product in an attempt to provide Clients with the data that they need.  Providing such a solution can be a drain on resources, a diversion from the core business and sometimes doesn’t even manage to deliver what the customer expects or needs.  DataShed can simplify the task of providing your Customers with the reporting data that they require and best of all… you can probably charge them an additional monthly fee for this service. 🙂  Rather than tying down a team of OO programmers with reporting tasks you might get a much better return by giving the DBA some additional resources to keep your clients happy.

More info

If you’re interested in finding out more you can download the user guide linked below.  DataShed is currently in Alpha though SQL Fairy is currently seeking beta testers for the product along with feedback on the features that we’re providing in DataShed and ideas on what we might do next.  DataShed is feature complete for the current release right now btw and we’re working through licensing details and other legal stuff at the moment.   If you’re interested in being a part of the beta process please email info@sqlfairy.com

SQL Fairy DataShed User Guide 20160116