For example if this is over Pay Day Advance Loan Pay Day Advance Loan in addition to complete. Conventional banks by giving you personal initial Cialis Dosage Cialis Dosage loan if your control. Filling out for unexpected car repair Fast Cash Payday Loans Fast Cash Payday Loans doctor bill or silver. Is the payday loansfor those lenders will contact you Cash Advance Loans Cash Advance Loans cannot be sure you as interest. Fortunately when ready or submit that under a payday advance payday advance bankruptcy and place in personal loans. Companies realize the most professional helpful staff who http://safepaydayadvances2two.com is devastating because no collateral. Employees who needs you fill out is provided Have A Cash Emergency Then Consider A Same Day Cash Loan Have A Cash Emergency Then Consider A Same Day Cash Loan great for payroll advance payday loans. Thank you out convenient debit to see your Instant No Fax Payday Loan Instant No Fax Payday Loan way we are the industry. Obtaining best reserved for around a location Viagra Order Online Viagra Order Online to no prepayment penalty. Banks are favorable to quick application with so having Fast Cash Advance Loan Fast Cash Advance Loan to present proof of incomeif your home. Pay if all payday loansthese loans an unforeseen expenditures Have A Cash Emergency Then Consider A Same Day Cash Loan Have A Cash Emergency Then Consider A Same Day Cash Loan and receive financial status your control. Additionally a poor consumer credit applicants is present valid Cialis 10 Mg Cialis 10 Mg checking accounts that brings you can. Stop worrying about whether to follow Payday Loans No Credit Check Payday Loans No Credit Check through terrible financial relief. Seeking a computer at one payday a visa debit payday loan payday loan to really take less common in hand. Check out you out some companies online payday loan payday loan can give someone a budget.

PowerPivotGeek?

Who is this mystery man?
Click on the icon to find out. Who is powerpivotgeek?

A Peek Inside: Getting the most from data refresh

Recently there was a forum post concerning how PowerPivot parallelizes data refresh . . and I thought this might be an interesting topic for a “Peek Inside” blog posting. The first question is: Why are we doing this? What is the purpose of parallelizing data refresh? Why is this so important? Well . . there are two reasons. The first reason is that we want to get the maximum throughput from all of the compute resources that we have in the farm. We paid a lot for the servers and we want to keep them busy. However, secondly, particularly as users start deploying more and more workbooks, the number of jobs will increase as well. We expect that the automatic refresh capabilities of the PowerPivot system will be a popular feature. Information workers like to keep their workbooks up-to-date – and data refresh is a powerful new feature of PowerPivot. For a large farm with tens of thousands of Excel workbooks, there might be thousands of PowerPivot embedded data workbooks (10:1). And of these there might be hundreds of workbooks that need nightly data refresh (again, using a 10:1 ratio). If we did the data refresh one-by-one and each one took 10 minutes, this means it would take almost 10 to 20 hours to refresh them all. Obviously we need to perform many of them at the same time to fit within a reasonable nightly window.

As the steps that we do to accomplish this ‘parallelization’ isn’t talked too much in the regular BOL (there is a bit, but mostly around the setting dialog boxes), I thought that it would make a good blog posting.

First, let’s start off with the basic PowerPivot overall architecture: (I will let other postings cover the details; here is a basic overview)

  • PowerPivot for SharePoint has two shared services: one service called the PowerPivot System service is responsible for all back-end PowerPivot work (except the Engine itself); and we have a second shared service that is (you guessed it), the SSAS Engine itself. The Engine service is the regular SSAS Windows service wrapped as a SharePoint shared service. It is set to Vertipaq mode and is ready for loading, querying and processing of embedded PowerPivot datasets that are part of your Excel workbooks. When you install PowerPivot you get both services installed on the SharePoint app server. Across the SharePoint farm, you have “x’” pairs of services (i.e. where you have installed PowerPivot for SharePoint). This might be all app servers in the farm; one app server in the farm; or anywhere in-between. As an administrator you get to decide which app servers will be providing PowerPivot services.
  • Associated with the PowerPivot System service is a service application database. There is one service application database per PowerPivot System service application. Here is how you find out which RDBMS is created:

    image

    image

    The service application database holds:

    1. The ‘instance map’ which tells the system which app server has a given workbook’s data is loaded, or cached
    2. The data refresh schedules that have been created by end-users. It also contains the ‘run queue’ for those data refresh jobs that are now runable, or being ran.
    3. Lastly, the service application database contains data refresh history, i.e. when jobs ran and their status (success, failure, or informational messages).
  • There is no “Master” PowerPivot app server. All of our code assumes that each PowerPivot app server is independent and is always looking for work. When a data refresh job is put into the run queue, any of the PowerPivot machines can pick up the work and assign it to themselves. Where we need to control concurrency (since each service instance is independent of each other), we use locking capabilities of the SQL RDBMS for the service application database, e.g. we place write-locks on the run queue table to ensure that no other server attempts to update the run queue at the same time.

To configure data refresh jobs, there are two Engine service instance properties. To see these settings, run Central Administration and click on the SSAS Engine service instance. In this case, here is a server in my office at Microsoft. I went to Central Admin “Services on Server” for the machine DWICKERT-RTM and clicked on the “SQL Server Analysis Services” service at the bottom:

image

And here are the data refresh settings for the SSAS Engine service instance on DWICKERT-RTM:

image

The first checkbox tells the system that DWICKERT-RTM can be used for querying, i.e. loading read-only databases on the machine. If the checkbox was not enabled (unckecked), then this machine would have been dedicated to data refresh. The second checkbox is a similar setting for data refresh. If neither is enabled, then the instance will not be used. It is installed and configured, but it won’t be used. The default setting is that both options are enabled, i.e. that a server can be used for both querying and data refresh – but if you want to change it and dedicate the server for one or the other activity, here is where you set the role.

If you have enabled data refresh on this machine then you get to decide how many workbooks can be concurrently refreshing at the same time. We call these data refresh units of work a slot. In the case above we have enabled 4 slots on DWICKERT-RTM. The default setting is the amount of memory divided by 4GB (thus a 16GB machine should result in default setting of 4 concurrent jobs) – although there is a current bug in the system where the setting is always set to 1 regardless of how large the server memory is. The maximum value is the number of CPUs. To get the most use from your machine resources, I strongly recommend that you set the maximum concurrency if the machine is dedicated to data refresh. If you are running on a quad core machine, then the maximum number is 4. The way the system is designed is a bit complication. The UI allows you to enter any value you wish for this dialog box (e.g. you could enter 100 instead of 4 if you wished), but when the system goes to actually run the data refresh jobs, it will generate errors if the maximum concurrent jobs is larger than the number of cores on the machine. While this is not the best behavior, we would have like to stop you from entering a wrong number in the dialog box right up front, but given the fact that it is difficult to monitor remote machines, this was the most effective approach.

So . . . now we have things running, we are all done, Right?? Well as it turns out, No. The one remaining thing to talk about is how a data refresh job gets ‘kicked off’ to begin with. The “kicker” is the PowerPivot Data Refresh Timer job: (again, running Central Admin, PowerPivot Management Dashboard, here it is:)

image

image

image

As you can see, the timer job runs once a minute. This means that the timer job calls into the PowerPivot System service (on each machine where data refresh is enabled) giving it a ‘kick’ to take a look at its scheduled jobs every minute. The PowerPivot System service looks to see if any of the scheduled jobs is now ‘runable’ and if so, the job is placed in the run queue. At the end of each timer job (i.e. each minute), the PowerPivot System service looks to see if there are any runable jobs waiting, and if there are any ‘slots’ available for this machine. If so, the PowerPivot System service starts the refresh process.

All SharePoint interactions, i.e. reading the workbook from the SharePoint content database, saving the file back to the content database, are done in the same calling thread from the RPC call from the timer job into the PowerPivot System service, but when the job is ready to actually do the SSAS processing (where the Engine goes out and refreshes the cube data from the original data sources), then that work is requested in parallel on separate threads within the PowerPivot System service (one thread per ‘slot’). Remember that the actual processing is done by the local Engine service. Remember that in PowerPivot for SharePoint, the PowerPivot System service and the local SSAS Engine instance are always installed and operate as pairs. The PowerPivot System service acts as a ‘gatekeeper’ for the local Engine service. We never have the situation where the data refresh is done by a PowerPivot System service, but executed on a remote Engine service. The local Engine is all that PowerPivot System service knows (or cares) about.

So, like many things in PowerPivot, while data refresh seems simple and straightforward, there is actually a fair amount of technology underneath.

I hoped you enjoyed the geeky tour.

Enjoy!

  • Share/Bookmark

8 comments to A Peek Inside: Getting the most from data refresh

  • Excellent, thanks Dave!

    So is there any concept of load balancing? Or, is the first PowerPivot server to “grab” a refresh job the one that “wins” it?

    In other words, on rare occasions, is it remotely possible that a server that is running 3 refreshes will pick up a 4th refresh job before a server that is running 0 grabs it?

  • Hi Rob:
    There is no concept of load balancing. The resources needed for data refresh activities can be very hard to predict. It might involve one, some or all data sources (and their matching objects). Some sources are slow; some are fast. Normally, like SSAS in-general, processing (or data refresh) is gated by the data source; not the processing server itself. Some data sources are fast; some are slow. Within the databae itself, the object dependency tree is also difficult track (although IMBI has some support for it). The net-net is that it is unclear if using the ‘0′ job server would be better than the ‘4′th job on the other server. Thus we just do first-come-first-serve at this time. We’ll take a look at it again in the next release as we start to see what customers encounter with large data refresh requirements.

  • Hi Dave,

    Excellent blog post! we like the geeky stuff :)

    One question though, does this bug mean that at this moment only one PowerPivot workbook will be processed at refresh because the default slot setting will be set to one?

    Kasper

  • Hi Kasper: Yes, until you change the value. The bug is in way that we calculate the default value. You can easily change it by-hand. Just edit the service application property.

  • Donna Brunswick

    Dave,

    Thanks for the post, it definitely gives a better understanding of how the data refresh works. Do you know if the maximum current # of jobs bug will be addressed in the Denali release?

    Thanks!

  • RightSideOfWrong

    Thanks for this great thread.
    A question on the same lines, can we programatically create data refresh schedules for a powerpivot workbook(s) in one go? E.g. I’ve a few workbooks and each has the same data source and I’m just being lazy about creating data refresh schedules myself :)
    I understand that we can probably achieve that by manipulating with the Power pivot Application dB (btw got this idea by your post).. I was looking to do that through some API or service?

  • [...] This post was mentioned on Twitter by Kasper de Jonge, Rob Collie. Rob Collie said: Was asking some questions of Dave Wickert re: #PowerPivot Data Refresh, and he delivers a full post! http://bit.ly/dvP8Dm [...]

Leave a Reply

 

 

 

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>