<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>PowerPivotGeek &#187; Data refresh</title>
	<atom:link href="http://powerpivotgeek.com/tag/data-refresh/feed/" rel="self" type="application/rss+xml" />
	<link>http://powerpivotgeek.com</link>
	<description>An adventure in managed self-service computing</description>
	<lastBuildDate>Wed, 19 Jan 2011 23:04:58 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>A Peek Inside: Getting the most from data refresh</title>
		<link>http://powerpivotgeek.com/2010/09/08/a-peek-inside-getting-the-most-from-data-refresh/</link>
		<comments>http://powerpivotgeek.com/2010/09/08/a-peek-inside-getting-the-most-from-data-refresh/#comments</comments>
		<pubDate>Wed, 08 Sep 2010 19:30:26 +0000</pubDate>
		<dc:creator>powerpivotgeek</dc:creator>
				<category><![CDATA[A Peek Inside]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Bug]]></category>
		<category><![CDATA[Data refresh]]></category>
		<category><![CDATA[Midtier]]></category>
		<category><![CDATA[applicaiton database]]></category>
		<category><![CDATA[jobs]]></category>
		<category><![CDATA[parallelization]]></category>
		<category><![CDATA[runable]]></category>
		<category><![CDATA[service instance]]></category>

		<guid isPermaLink="false">http://powerpivotgeek.com/2010/09/08/a-peek-inside-getting-the-most-from-data-refresh/</guid>
		<description><![CDATA[<p>Recently there was a forum post concerning how PowerPivot parallelizes data refresh . . and I thought this might be an interesting topic for a “Peek Inside” blog posting. The first question is: Why are we doing this? What is the purpose of parallelizing data refresh? Why is this so important? Well . . there [...]]]></description>
			<content:encoded><![CDATA[<p>Recently there was a forum post concerning how PowerPivot parallelizes data refresh . . and I thought this might be an interesting topic for a “Peek Inside” blog posting. The first question is: Why are we doing this? What is the purpose of parallelizing data refresh? Why is this so important? Well . . there are two reasons. The first is that we want to get the maximum throughput from all of the compute resources that we have in the farm. We paid a lot for the servers and we want to keep them busy. The second is that, as users deploy more and more workbooks, the number of data refresh jobs will increase as well. We expect that the automatic refresh capabilities of the PowerPivot system will be a popular feature. Information workers like to keep their workbooks up-to-date – and data refresh is a powerful new feature of PowerPivot. For a large farm with tens of thousands of Excel workbooks, there might be thousands of workbooks with embedded PowerPivot data (10:1). And of these there might be hundreds of workbooks that need nightly data refresh (again, using a 10:1 ratio). If we did the data refresh one-by-one and each one took 10 minutes, then even 100 workbooks would take 1,000 minutes, or almost 17 hours, to refresh. Obviously we need to perform many of them at the same time to fit within a reasonable nightly window.</p>
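<p>To put rough numbers on that nightly window, here is a minimal sketch of the arithmetic. The function name and the figure of 8 concurrent jobs are my own illustrative assumptions, and the sketch assumes every job takes the same time and that jobs run in full batches.</p>

```python
import math

def refresh_window_hours(workbooks: int, minutes_each: float, concurrent_jobs: int = 1) -> float:
    """Hours needed to refresh every workbook when jobs run in equal-length batches."""
    batches = math.ceil(workbooks / concurrent_jobs)
    return batches * minutes_each / 60

serial = refresh_window_hours(100, 10)        # one job at a time
parallel = refresh_window_hours(100, 10, 8)   # assumed: 8 jobs at once across the farm
print(f"serial: {serial:.1f} h, parallel: {parallel:.1f} h")
```

<p>With these assumed inputs the serial run needs almost 17 hours, while 8 concurrent jobs bring it down to a little over 2 hours.</p>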
<p>As the steps that we take to accomplish this ‘parallelization’ aren’t talked about much in the regular BOL (there is a bit, but mostly around the settings dialog boxes), I thought that it would make a good blog posting.</p>
<p> <span id="more-1222"></span></p>
<p>First, let’s start off with the basic PowerPivot overall architecture: (I will let other postings cover the details; here is a basic overview)</p>
<ul>
<li>PowerPivot for SharePoint has two shared services: one service called the <em>PowerPivot System service</em> is responsible for all back-end PowerPivot work (except the Engine itself); and we have a second shared service that is (you guessed it) the SSAS Engine itself. The Engine service is the regular SSAS Windows service wrapped as a SharePoint shared service. It is set to Vertipaq mode and is ready for loading, querying and processing of the embedded PowerPivot datasets that are part of your Excel workbooks. When you install PowerPivot you get both services installed on the SharePoint app server. Across the SharePoint farm, you have “x” pairs of services (i.e. one pair wherever you have installed PowerPivot for SharePoint). This might be all app servers in the farm; one app server in the farm; or anywhere in-between. As an administrator you get to decide which app servers will be providing PowerPivot services.</li>
<li>Associated with the PowerPivot System service is a service application database. There is one service application database per PowerPivot System service application. Here is how you find out which database was created:
<p><a href="http://powerpivotgeek.com/wp-content/uploads/2010/09/image.png"><img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://powerpivotgeek.com/wp-content/uploads/2010/09/image_thumb.png" width="574" height="305" /></a>       </p>
<p><a href="http://powerpivotgeek.com/wp-content/uploads/2010/09/image1.png"><img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://powerpivotgeek.com/wp-content/uploads/2010/09/image_thumb1.png" width="574" height="462" /></a>       </p>
<p>The service application database holds:</p>
<ol>
<li>The ‘instance map’, which tells the system which app server has a given workbook’s data loaded, or cached</li>
<li>The data refresh schedules that have been created by end-users. It also contains the ‘run queue’ for those data refresh jobs that are now runnable or currently running.</li>
<li>Lastly, the service application database contains data refresh history, i.e. when jobs ran and their status (success, failure, or informational messages).       </li>
</ol>
</li>
<li>There is no “Master” PowerPivot app server. All of our code assumes that each PowerPivot app server is independent and is always looking for work. When a data refresh job is put into the run queue, any of the PowerPivot machines can pick up the work and assign it to itself. Where we need to control concurrency (since the service instances are independent of one another), we use the locking capabilities of the SQL RDBMS for the service application database, e.g. we place write-locks on the run queue table to ensure that no other server attempts to update the run queue at the same time.</li>
</ul>
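<p>The “no master, everyone grabs work under a database lock” idea can be sketched as follows. This is only an illustration under my own assumptions: it uses SQLite from the Python standard library as a stand-in for the SQL Server service application database, and the table, column and server names are hypothetical, not the real schema.</p>

```python
import sqlite3

def make_queue(job_names):
    # isolation_level=None means autocommit; we issue BEGIN/COMMIT ourselves.
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.execute("CREATE TABLE run_queue (id INTEGER PRIMARY KEY, workbook TEXT, claimed_by TEXT)")
    db.executemany("INSERT INTO run_queue (workbook) VALUES (?)", [(n,) for n in job_names])
    return db

def claim_next_job(db, server_name):
    """Assign the oldest unclaimed job to server_name; return its workbook or None."""
    db.execute("BEGIN IMMEDIATE")  # take the write lock before reading the queue
    try:
        row = db.execute(
            "SELECT id, workbook FROM run_queue WHERE claimed_by IS NULL ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            db.execute("UPDATE run_queue SET claimed_by = ? WHERE id = ?", (server_name, row[0]))
        db.execute("COMMIT")
    except Exception:
        db.execute("ROLLBACK")
        raise
    return None if row is None else row[1]

queue = make_queue(["sales.xlsx", "inventory.xlsx"])
first = claim_next_job(queue, "APP1")
second = claim_next_job(queue, "APP2")
third = claim_next_job(queue, "APP1")   # queue is empty by now
print(first, second, third)
```

<p>Because the read and the update happen inside one write-locked transaction, two servers cannot both claim the same job, which is the property the write-locks on the run queue table are there to provide.</p>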
<p>To configure data refresh jobs, there are two Engine service instance properties. To see these settings, run Central Administration and click on the SSAS Engine service instance. In this case, here is a server in my office at Microsoft. I went to Central Admin “Services on Server” for the machine DWICKERT-RTM and clicked on the “SQL Server Analysis Services” service at the bottom:</p>
<p><a href="http://powerpivotgeek.com/wp-content/uploads/2010/09/image2.png"><img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://powerpivotgeek.com/wp-content/uploads/2010/09/image_thumb2.png" width="594" height="545" /></a> </p>
<p>And here are the data refresh settings for the SSAS Engine service instance on DWICKERT-RTM:</p>
<p><a href="http://powerpivotgeek.com/wp-content/uploads/2010/09/image3.png"><img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://powerpivotgeek.com/wp-content/uploads/2010/09/image_thumb3.png" width="594" height="324" /></a> </p>
<p>The first checkbox tells the system that DWICKERT-RTM can be used for querying, i.e. loading read-only databases on the machine. If the checkbox were not enabled (unchecked), then this machine would be dedicated to data refresh. The second checkbox is a similar setting for data refresh. If neither is enabled, then the instance will not be used. It is installed and configured, but it won’t be used. The default setting is that both options are enabled, i.e. that a server can be used for both querying and data refresh – but if you want to change it and dedicate the server to one or the other activity, here is where you set the role.</p>
<p>If you have enabled data refresh on this machine then you get to decide how many workbooks can be refreshing concurrently. We call each of these data refresh units of work a <em>slot</em>. In the case above we have enabled 4 slots on DWICKERT-RTM. The default setting is the amount of memory divided by 4GB (thus a 16GB machine should result in a default setting of 4 concurrent jobs) – although there is a current bug in the system where the setting is always set to 1 regardless of how large the server memory is. The maximum value is the number of CPUs. To get the most use from your machine resources, I strongly recommend that you set the maximum concurrency if the machine is dedicated to data refresh. If you are running on a quad core machine, then the maximum number is 4. The way the system is designed is a bit complicated. The UI allows you to enter any value you wish in this dialog box (e.g. you could enter 100 instead of 4 if you wished), but when the system goes to actually run the data refresh jobs, it will generate errors if the maximum concurrent jobs setting is larger than the number of cores on the machine. This is not the best behavior; we would have liked to stop you from entering a wrong number in the dialog box right up front, but given that it is difficult to monitor remote machines, this was the most effective approach.</p>
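<p>As a quick illustration of the slot rules, here is a small sketch of the intended default (memory divided by 4GB) and of the run-time check against the core count. The function names are hypothetical, and the floor of one slot for small machines is my own assumption rather than documented behavior.</p>

```python
def default_slots(memory_gb: int) -> int:
    """Intended default: one slot per 4 GB of memory (assumed floor of 1 slot)."""
    return max(1, memory_gb // 4)

def validate_slots(requested: int, cpu_cores: int) -> int:
    """Reject a setting above the core count, mirroring the run-time error described above."""
    if requested > cpu_cores:
        raise ValueError(f"{requested} concurrent jobs exceeds {cpu_cores} cores")
    return requested

print(default_slots(16))     # a 16GB machine
print(validate_slots(4, 4))  # quad core machine, 4 slots is allowed
```

<p>Entering 100 on a quad core machine would pass the dialog box but fail the equivalent of <code>validate_slots(100, 4)</code> when the jobs actually run.</p>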
<p>So . . . now we have things running, we are all done, Right?? Well as it turns out, No. The one remaining thing to talk about is how a data refresh job gets ‘kicked off’ to begin with. The “kicker” is the PowerPivot Data Refresh Timer job: (again, running Central Admin, PowerPivot Management Dashboard, here it is:)</p>
<p><a href="http://powerpivotgeek.com/wp-content/uploads/2010/09/image4.png"><img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://powerpivotgeek.com/wp-content/uploads/2010/09/image_thumb4.png" width="644" height="404" /></a> </p>
<p><a href="http://powerpivotgeek.com/wp-content/uploads/2010/09/image5.png"><img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://powerpivotgeek.com/wp-content/uploads/2010/09/image_thumb5.png" width="594" height="341" /></a> </p>
<p><a href="http://powerpivotgeek.com/wp-content/uploads/2010/09/image6.png"><img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://powerpivotgeek.com/wp-content/uploads/2010/09/image_thumb6.png" width="594" height="469" /></a> </p>
<p>As you can see, the timer job runs once a minute. This means that the timer job calls into the PowerPivot System service (on each machine where data refresh is enabled), giving it a ‘kick’ to take a look at its scheduled jobs every minute. The PowerPivot System service looks to see if any of the scheduled jobs is now ‘runnable’ and if so, the job is placed in the run queue. At the end of each timer job (i.e. each minute), the PowerPivot System service looks to see if there are any runnable jobs waiting, and if there are any ‘slots’ available on this machine. If so, the PowerPivot System service starts the refresh process. </p>
<p>All SharePoint interactions, i.e. reading the workbook from the SharePoint content database and saving the file back to the content database, are done on the calling thread of the RPC call from the timer job into the PowerPivot System service. But when the job is ready to actually do the SSAS processing (where the Engine goes out and refreshes the cube data from the original data sources), that work <u>is requested in parallel on separate threads within the PowerPivot System service</u> (one thread per ‘slot’). Remember that the actual processing is done by the local Engine service: in PowerPivot for SharePoint, the PowerPivot System service and the local SSAS Engine instance are always installed and operate as pairs. The PowerPivot System service acts as a ‘gatekeeper’ for the local Engine service. We never have the situation where a data refresh is managed by one PowerPivot System service but executed on a remote Engine service. The local Engine is all that the PowerPivot System service knows (or cares) about.</p>
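<p>Here is a minimal sketch of what one of those per-minute ‘kicks’ does on a single machine. The class, the function and the use of plain minute numbers for time are all hypothetical simplifications of mine; the real service works against the service application database rather than in-memory lists.</p>

```python
from dataclasses import dataclass, field

@dataclass
class Server:
    slots: int
    running: list = field(default_factory=list)

def timer_tick(now, schedules, run_queue, server):
    """One 'kick': queue jobs that have come due, then start as many as free slots allow."""
    for name, due_at in list(schedules.items()):
        if now >= due_at:              # the schedule has come due
            run_queue.append(name)
            del schedules[name]
    while run_queue and server.slots - len(server.running) > 0:
        server.running.append(run_queue.pop(0))
    return server.running

schedules = {"a.xlsx": 0, "b.xlsx": 0, "c.xlsx": 0, "d.xlsx": 5}
run_queue = []
server = Server(slots=2)
timer_tick(1, schedules, run_queue, server)
print(server.running, run_queue)
```

<p>With 2 slots, two of the three due jobs start and the third waits in the run queue for a later tick (or for another machine to pick it up).</p>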
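<p>The ‘one thread per slot’ idea maps naturally onto a bounded thread pool. The sketch below is only an analogy in Python, not the service’s actual code: <code>process_workbook</code> is a hypothetical stand-in for the processing request sent to the local Engine.</p>

```python
from concurrent.futures import ThreadPoolExecutor
import threading

def process_workbook(name):
    # Stand-in for the SSAS processing request sent to the local Engine.
    return f"{name} processed on {threading.current_thread().name}"

def run_refresh_jobs(workbooks, slots):
    # max_workers caps concurrency at the slot count: one worker thread per slot.
    with ThreadPoolExecutor(max_workers=slots, thread_name_prefix="slot") as pool:
        return list(pool.map(process_workbook, workbooks))

results = run_refresh_jobs(["a.xlsx", "b.xlsx", "c.xlsx"], slots=2)
for line in results:
    print(line)
```

<p>No matter how many workbooks are waiting, at most <code>slots</code> of them are being processed at any moment.</p>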
<p>So, like many things in PowerPivot, while data refresh seems simple and straightforward, there is actually a fair amount of technology underneath.</p>
<p>I hope you enjoyed the geeky tour.</p>
<p>Enjoy! </p>
]]></content:encoded>
			<wfw:commentRss>http://powerpivotgeek.com/2010/09/08/a-peek-inside-getting-the-most-from-data-refresh/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>When is a refresh not a refresh?</title>
		<link>http://powerpivotgeek.com/2009/11/15/when-is-a-refresh-not-a-refresh/</link>
		<comments>http://powerpivotgeek.com/2009/11/15/when-is-a-refresh-not-a-refresh/#comments</comments>
		<pubDate>Mon, 16 Nov 2009 05:21:00 +0000</pubDate>
		<dc:creator>powerpivotgeek</dc:creator>
				<category><![CDATA[Data refresh]]></category>
		<category><![CDATA[Excel Services]]></category>
		<category><![CDATA[data sources]]></category>

		<guid isPermaLink="false">http://powerpivotgeek.com/?p=190</guid>
		<description><![CDATA[<p>Ok. This post will be a bit complicated but stick with me. Hopefully, in the end, all will be clear. And the geek in you will love it.</p>
<p>One of the things that users just kind of glance over, but don’t realize the implication, is the fact that PowerPivot is a copy of the data. If [...]]]></description>
			<content:encoded><![CDATA[<p>Ok. This post will be a bit complicated but stick with me. Hopefully, in the end, all will be clear. And the geek in you will love it.</p>
<p>One of the things that users just kind of glance over, but don’t realize the implication, is the fact that PowerPivot is a copy of the data. If you haven’t already, let me suggest that you read my <a href="http://powerpivotgeek.com/2009/11/09/a-peek-inside-wheres-the-beef/">&quot;Where&#8217;s the beef?&quot; posting</a>. In that posting I talked about the fact that <u><strong>data itself</strong></u> is pulled into the workbook when you save it. When you click any of these buttons:</p>
<p><a href="http://powerpivotgeek.com/wp-content/uploads/2009/11/image63.png"><img title="image" style="border-right: 0px; border-top: 0px; display: inline; border-left: 0px; border-bottom: 0px" height="484" alt="image" src="http://powerpivotgeek.com/wp-content/uploads/2009/11/image_thumb64.png" width="630" border="0" /></a> </p>
<p>The import process runs. From then on, the source data can start to change and drift away from the values that are stored in memory and ultimately in the workbook. The data is ‘real-time’ only when the import is running; afterwards all calculations, pivots and slices are driven by the stored data. On the client this is clear because we have the ‘Refresh’ button (and its options) that provides refresh on the client. But how about the server?? Well, that is the core of this posting. Let’s take a closer look at it. We will start at the menu items for the Excel Services rendering of the workbook. Notice the options here:</p>
<p> <span id="more-190"></span>
</p>
<p><a href="http://powerpivotgeek.com/wp-content/uploads/2009/11/image64.png"><img title="image" style="border-right: 0px; border-top: 0px; display: inline; border-left: 0px; border-bottom: 0px" height="379" alt="image" src="http://powerpivotgeek.com/wp-content/uploads/2009/11/image_thumb65.png" width="644" border="0" /></a> </p>
<p>The question is “What does it mean to ‘refresh’ the connection?” The answer is that it depends on the data provider. For virtually every OLEDB and ODBC provider that Excel Services uses, ‘refreshing a connection’ means going out to the data source and re-querying it for its data. SQL Server RDBMS, Oracle, Teradata: for virtually all of them it means refreshing the data that Excel Services holds. And it means that in PowerPivot also, but in PowerPivot where is the data stored? (You know the answer to this already, don’t you?) The data is in the workbook. Has the workbook changed since you last opened the .xlsx file? Well, I suppose it might have – in which case, refreshing the connection might bring in new data. But in the vast, vast majority of cases, <em>refreshing the PowerPivot table means just re-reading the data that Excel Services already has</em>. In most cases, it has absolutely no effect at all.</p>
<p>To really drive this home, let’s shift into super-geek mode and drill down into the workbook itself. I will go back to the workbook in the first screenshot and first click on the Connections option in the Data ribbon. Notice that there is a connection that has been defined behind my back in the workbook. It is called “Sandbox”, which by the way was the name of our system prior to Gemini and prior to PowerPivot. I didn’t create that connection. It was created for me when the PowerPivot Excel add-in was first started. This is the connection that actually interfaces to the in-memory database. Now let’s drill down further into the “Sandbox” connection and look at its connection string. WOW! The “Data Source=” property, which would normally point to the server where the database is stored, instead points to “<strong>$Embedded$</strong>” – What’s that?? </p>
<p><a href="http://powerpivotgeek.com/wp-content/uploads/2009/11/image65.png"><img title="image" style="border-right: 0px; border-top: 0px; display: inline; border-left: 0px; border-bottom: 0px" height="484" alt="image" src="http://powerpivotgeek.com/wp-content/uploads/2009/11/image_thumb66.png" width="644" border="0" /></a> </p>
<p><strong>$Embedded$</strong> is the magic tag that tells PowerPivot for SharePoint that the data does not come from some server somewhere – instead the data comes from the workbook itself. One of the new OLEDB interfaces created for PowerPivot is a property that Excel Services sets which contains the URL for the workbook that Excel Services is opening. The msolap OLEDB provider takes that URL and replaces the $Embedded$ string with the URL itself, and thus the infrastructure will read its data from the workbook itself.</p>
<p>But – and this is the critical “BUT” – notice that the embedded content never changes. After you upload a workbook, that workbook doesn’t change on its own. Thus neither does the data. Remember the data is a <strong><u>copy</u></strong> of the data that is embedded in a workbook. If Excel Services refreshes it, the ECS calc engine gets the same data over and over again. The SSAS database embedded in the workbook hasn’t changed – so the data refresh is a no-op. Refreshing a connection to an embedded PowerPivot database doesn’t refresh anything. You get the same data over and over again.</p>
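<p>To make the substitution concrete, here is a small sketch of the idea in Python. It is not the msolap provider’s code: the function name, the sample connection string and the workbook URL are all made up for illustration, and the real provider may handle many more cases.</p>

```python
EMBEDDED_TAG = "$Embedded$"

def resolve_data_source(connection_string: str, workbook_url: str) -> str:
    """Swap an $Embedded$ data source for the workbook URL; leave other strings alone."""
    parts = []
    for part in connection_string.split(";"):
        key, sep, value = part.partition("=")
        if key.strip().lower() == "data source" and value.strip() == EMBEDDED_TAG:
            part = f"{key}{sep}{workbook_url}"
        parts.append(part)
    return ";".join(parts)

# Illustrative connection string and URL, not copied from a real workbook.
conn = "Provider=MSOLAP;Data Source=$Embedded$;Initial Catalog=Sandbox"
resolved = resolve_data_source(conn, "http://server/docs/sales.xlsx")
print(resolved)
```

<p>A connection string that already names a real server passes through unchanged, which matches the idea that only the magic tag triggers the read-from-workbook behavior.</p>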
<p>So, how does the workbook data get refreshed? After all, there must be some way to do it . . . In fact, there are two ways:</p>
<ol>
<li>Bring the workbook down on the client and refresh the data in the workbook. Then re-publish the workbook back to the same location in SharePoint. New data is automatically given to Excel Services and existing connections. </li>
<li>Use the data refresh facility; see the <a href="http://powerpivotgeek.com/misc/my-other-blog-articles/powerpivot-data-refresh/">data refresh posting</a> and <a href="http://powerpivotgeek.com/2009/11/12/steps-taken-during-a-powerpivot-data-refresh/">detailed steps posting</a> for more information. In this case the PowerPivot System Service will reach out and pull new data into the workbook. A new version of the workbook is created and new data is automatically given to Excel Services and existing connections. </li>
</ol>
<p>And before you ask, <u>No</u>, PowerPivot V1 has no option to monitor the data in real-time and update its data in-memory as the source data changes. The workbook captures the data at a point in time – and then users work with that data. There are no provisions for real-time access to data while doing analytics / calculations / pivot table operations. </p>
]]></content:encoded>
			<wfw:commentRss>http://powerpivotgeek.com/2009/11/15/when-is-a-refresh-not-a-refresh/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Steps taken during a PowerPivot data refresh</title>
		<link>http://powerpivotgeek.com/2009/11/12/steps-taken-during-a-powerpivot-data-refresh/</link>
		<comments>http://powerpivotgeek.com/2009/11/12/steps-taken-during-a-powerpivot-data-refresh/#comments</comments>
		<pubDate>Fri, 13 Nov 2009 05:08:55 +0000</pubDate>
		<dc:creator>powerpivotgeek</dc:creator>
				<category><![CDATA[Data refresh]]></category>
		<category><![CDATA[credentials]]></category>
		<category><![CDATA[data sources]]></category>
		<category><![CDATA[ULS]]></category>

		<guid isPermaLink="false">http://powerpivotgeek.com/?p=177</guid>
		<description><![CDATA[<p>In this posting we will take a more detailed technical look at how the data refresh facility works and the steps that it takes to accomplish a data refresh cycle. Rather than starting with the &#8220;Manage data refresh” page, we will assume that you know how to set up a schedule – in this posting, we [...]]]></description>
			<content:encoded><![CDATA[<p>In this posting we will take a more detailed technical look at how the data refresh facility works and the steps that it takes to accomplish a data refresh cycle. Rather than starting with the &#8220;Manage data refresh” page, we will assume that you know how to set up a schedule – in this posting, we will take a deep dive into the cycle itself.</p>
<h4>What steps are taken when the data is refreshed?</h4>
<p>Now that you have configured your schedule(s) for the workbook, let’s take a step back and examine more closely what data refresh actually means. I think that it is valuable to understand, at some basic level, exactly what the system is going to do on your behalf at 2am. When a job actually runs, the data refresh facility goes through the following steps:</p>
<ol>
<li>First, the system looks for schedules that are ‘runnable’, meaning that their scheduled time has come due. As all of the jobs might be scheduled at close to the same time (midnight, for example, is a popular time <img src='http://powerpivotgeek.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  ), the system tries to run the job as soon as it can. All of the PowerPivot SharePoint servers are doing this at the same time. Ultimately one of them detects that your job has “come due” and is runnable.</li>
<li>After impersonating the Windows user specified in the schedule, the system extracts the workbook from the content database using the SharePoint binary OM. The user must supply a valid Windows account in the schedule and he or she must ensure that that account has contributor (read/write) access rights to the workbook. The workbook is stored in a temporary folder (in the SSAS Backup folder) so it can be used later (see step 9 below).</li>
<li>The system extracts the embedded database from the workbook and loads the database into the local SSAS Engine instance. The database is loaded read/write (so it can be updated). This database is only used for this data refresh job – the system ensures that it is <span style="text-decoration: underline;">not</span> used for querying while updating is going on (the SSAS processing commands).</li>
<li>If any data source in this schedule has custom data source credentials specified for the job, then that data source has its connection string properties changed (in V1 we only support changing the “Username” and “Password” properties). This is done using an XMLA command to the data source.</li>
<li>The system impersonates the Windows user for a second time and sends processing commands to the database. This causes the Engine to reach out to the sources and pull updated data into the database. The processing command is not sent to all tables/dimensions. The process commands are sent just to those objects that are dependent on the data source(s) included in the schedule.</li>
<li>The data source credentials (if any) are reset.</li>
<li>The database is saved back to the workbook.</li>
<li>If it is not set already, the embedded connection’s property “Refresh data when opening the file” is set to True. This ensures that users immediately see the new data when the workbook is opened. It also means that snapshot generation will include the new data in the thumbnail.</li>
<li>Impersonating the Windows account yet a 3rd time, the workbook is saved back to the content database using the SharePoint binary OM. If the document library is a PowerPivot Gallery, then the OM fires its ‘new file’ event handler, which starts the snapshot generation process. The ‘new file’ event handler was added by the Gallery content type.</li>
<li>The schedule’s status is updated with information about the job, i.e. its success, failure, error messages, etc.</li>
<li>And finally, the database is converted to a read-only database so it is available to users immediately for querying. This makes the user’s first query as fast as possible and lessens the load on the SharePoint content database since the PowerPivot database is already loaded into memory. </li>
</ol>
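<p>Step 5 above says that processing commands go only to the objects that depend on the data sources in the schedule. A minimal sketch of that selection, assuming a simple table-to-data-source mapping (the function, table and data source names are all hypothetical), might look like this:</p>

```python
def objects_to_process(table_sources, scheduled_sources):
    """Pick only the tables whose data source is included in the schedule."""
    wanted = set(scheduled_sources)
    return sorted(table for table, source in table_sources.items() if source in wanted)

# Hypothetical workbook: two tables fed by one source, one table fed by another.
tables = {"Sales": "SqlDW", "Customers": "SqlDW", "Budget": "ExcelFeed"}
selected = objects_to_process(tables, ["SqlDW"])
print(selected)
```

<p>Here only the two tables fed by the scheduled source would be processed; the third keeps the data it already has.</p>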
<p>The end result is that a new, updated workbook has been stored back in the original workbook’s document library – the overall system is primed and ready to go when the workbook is viewed.</p>
<p> </p>
<p><span id="more-177"></span>A few observations:</p>
<ul>
<li>Remember that to edit the schedule, you must enable it. I am always forgetting to check the “Enable” box at the top of the schedule. The radio buttons can still be selected if the schedule is disabled, but the options will not expand. I cannot tell you how often I’ve sat staring at a page wondering what was wrong, only to realize that I forgot to enable the schedule.</li>
<li>The schedule is kept independent from the workbook itself. It is stored in the service application database indexed by the SPFile.FileID. This uniquely defines a file on the SharePoint farm. A file can be deleted and its schedule remains. Publish a new file and the schedule automatically picks up.</li>
<li>The schedule history (success or failure results w/ error messages) is also kept in the service application database so it can remain a long time after the file has been deleted. While not available from the end-user’s UI (unless they recreate the file), the history information is available via the Mgmt Dashboard – so the information can be in a report (again, long after the file has been deleted).</li>
<li>An important point to remember: You specify the <span style="text-decoration: underline;">Windows user</span> here:  (one per schedule; pick your favorite method – one of the three)<a href="http://powerpivotgeek.com/wp-content/uploads/2009/11/image8.png"><img style="display: inline" title="image" src="http://powerpivotgeek.com/wp-content/uploads/2009/11/image_thumb9.png" alt="image" width="700" height="215" /></a>
<p>You specify the <span style="text-decoration: underline;">data source user</span> here: (one per data source; again, pick your favorite one of three methods)</p>
<p><a href="http://powerpivotgeek.com/wp-content/uploads/2009/11/image9.png"><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" src="http://powerpivotgeek.com/wp-content/uploads/2009/11/image_thumb10.png" border="0" alt="image" width="709" height="601" /></a></p>
<p>Get these two types of users mixed up and it will be <span style="text-decoration: underline;">very</span> confusing.</p></li>
<li>Troubleshooting: The ULS logs are your friend. Search Codeplex (<a href="http://www.codeplex.com">http://www.codeplex.com</a>) or your favorite SharePoint web site and pick up a good viewer. You will use it *a lot*.<br />
Another good ULS viewer is at: <a title="http://code.msdn.microsoft.com/ULSViewer" href="http://code.msdn.microsoft.com/ULSViewer">http://code.msdn.microsoft.com/ULSViewer</a> </li>
</ul>
<p>Enjoy.</p>
]]></content:encoded>
			<wfw:commentRss>http://powerpivotgeek.com/2009/11/12/steps-taken-during-a-powerpivot-data-refresh/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>

