Tuesday, October 31, 2006

At the end of this month I'll be moving to Florida. During the transition my email and web services will be offline, which means my blog will be too.

While my email server and web server are hosted within VMs, I do not know if I'll have enough computer equipment available to me to have both up and running at the same time. So don't be too concerned when my site goes down.

I'll bring everything back up as quickly as possible; lord knows I cannot live without email anymore. :-)

----- Rom

Miw of PrimeGrid has raised some interesting points regarding my thoughts in this thread:
http://www.primegrid.com/orig/forum_thread.php?id=432&sort=5

First off let me say that Miw is right about the per TCP connection overhead. It applies to file uploads, file downloads, scheduler requests, trickles, forum requests, and now AMS requests.

I also agree with him that if a public-facing BOINC project ran on a single server, it would keel over after an outage because of the file upload requests.

The thing about both the upload and download servers is that there can be any number of them for a project. As a matter of fact, all the components except for the scheduler and database can exist on any number of machines. So most of the time we are involved in scale-up vs. scale-out debates when brainstorming about future optimizations.

I'll have to check on the scheduler again to be sure though, as I have a funny feeling I remember some code from Carl C. of CPDN that fiddled around with the feeder query and he may have introduced a way to run multiple schedulers.

The basic gist I want to get across though is that most, if not all, of the components in a BOINC server farm can scale for a project with unlimited funds. Only the database server proves to be difficult to change out as S@H experienced during their database server upgrade. In BOINC's defense on that issue, I would like to point out that the database file formats changed when switching from Solaris to Linux, so the database had to be dumped to a flat file and reloaded on the new machine.

I believe that the file upload and download servers act as dams most of the time, keeping the rest of the system from keeling over. For instance, if those servers were not keeping the hordes of machines at bay and everything was gated on the database, then after an outage nobody would be able to use the website or read/post in the forums.

By far the easiest servers to replace in a BOINC server farm are the upload/download servers; all you need is a Linux box and Apache. File uploads are handled with a small CGI program.

I'll talk to David tomorrow and see if accepting 2 or 3 files during an upload request makes sense. It sounds good on the surface, but I'm concerned about the increased disk bandwidth requirements. S@H, for instance, has a shared disk array for file uploads and downloads; when that array is bogged down, the whole pipeline bogs down.

Thanks for all the great feedback.

----- Rom

References:
http://en.wikipedia.org/wiki/TCP/IP
http://httpd.apache.org/
http://en.wikipedia.org/wiki/Common_Gateway_Interface

Original Article:
BOINC Client: The evils of 'Returning Results Immediately'

Monday, October 30, 2006

So all-in-all I'm really impressed with wxWidgets. Most of the time it is pretty straightforward and easy to figure out, but on Wednesday night through Thursday I was frustrated with the framework. The documentation made it sound easy, and in the end it was easy, but getting there wasn't so easy.

I went through about 15 iterations before finding the right solution. My worst solution took 2 minutes to finish a re-paint, but it finally did paint everything the right way. During the 2 minutes, though, there was enough flicker that it might have sent somebody into convulsions. I even attempted to use Google to see what others had done; most of the references were really old and stale, and none of them seemed to work anymore with 2.6. So I'm posting my solution for others to find and use.

Header File:

    #include <wx/wx.h>

    // A static text control that lets its parent's background show through.
    class CTransparentStaticText : public wxStaticText
    {
        DECLARE_DYNAMIC_CLASS (CTransparentStaticText)

    public:
        CTransparentStaticText();
        CTransparentStaticText(
            wxWindow* parent,
            wxWindowID id,
            const wxString& label,
            const wxPoint& pos = wxDefaultPosition,
            const wxSize& size = wxDefaultSize,
            long style = 0,
            const wxString& name = wxStaticTextNameStr
        );

        bool Create(
            wxWindow* parent,
            wxWindowID id,
            const wxString& label,
            const wxPoint& pos = wxDefaultPosition,
            const wxSize& size = wxDefaultSize,
            long style = 0,
            const wxString& name = wxStaticTextNameStr
        );

        // Report that this control has a transparent background.
        virtual bool HasTransparentBackground() { return true; }

        // Paint handler that draws the label text directly.
        virtual void OnPaint(wxPaintEvent& event);

        DECLARE_EVENT_TABLE()
    };
 

Source File:

    IMPLEMENT_DYNAMIC_CLASS (CTransparentStaticText, wxStaticText)

    BEGIN_EVENT_TABLE(CTransparentStaticText, wxStaticText)
        EVT_PAINT(CTransparentStaticText::OnPaint)
    END_EVENT_TABLE()


    CTransparentStaticText::CTransparentStaticText() {}

    CTransparentStaticText::CTransparentStaticText(
        wxWindow* parent, wxWindowID id, const wxString& label,
        const wxPoint& pos, const wxSize& size, long style, const wxString& name
    ) {
        Create(parent, id, label, pos, size, style, name);
    }


    bool CTransparentStaticText::Create(
        wxWindow* parent, wxWindowID id, const wxString& label,
        const wxPoint& pos, const wxSize& size, long style, const wxString& name
    ) {
        // Add the transparent window flag to whatever style the caller passed in.
        bool bRetVal = wxStaticText::Create(parent, id, label, pos, size, style|wxTRANSPARENT_WINDOW, name);

        // Take the colours from the parent so the text blends in with it.
        SetBackgroundColour(parent->GetBackgroundColour());
        SetBackgroundStyle(wxBG_STYLE_COLOUR);
        SetForegroundColour(parent->GetForegroundColour());

        return bRetVal;
    }


    // Draw just the label text using the control's current font.
    void CTransparentStaticText::OnPaint(wxPaintEvent& /*event*/) {
        wxPaintDC dc(this);
        dc.SetFont(GetFont());
        dc.DrawText(GetLabel(), 0, 0);
    }
 

That's it. It took a while to find the simple solution; I'm glad I finally figured it out.

My previous iterations were embarrassing.

----- Rom

Is the boinc core client and manager going to support IPv6 in the near future?

All of the communication between the core client and project servers is done through a library called libcurl. It has an awesome feature set and it wouldn't surprise me if it already supported IPv6. A quick pass over their comparison chart says it does. At this point I'm not sure there is anything more we have to do.

Does anybody have some IPv6 gear to test things out on?

Will a future BOINC have an interface tab or an options extension or the like to set any of the 'override' parameters?

I'm not sure what you mean by override parameters. If you are referring to the global preferences then yes, the manager will include the ability to override the global preferences. That feature will first make its debut with the BSG with a small subset of the overall features. Probably within a release after that I'll add the rest of the global preferences to an enhanced preferences dialog, which will be available through the advanced interface.

Currently the simple preferences dialog looks like this:

To be fair though, I just got done butchering everything on Friday to take care of a usability issue, and WCG hasn't had a chance to give me an updated bitmap, which is why you can see the magenta border. The general layout is there though, and how it works should be pretty intuitive.

Everybody should feel free to provide any first impression feedback, we are all interested in what you all have to say.

any update on new BOINC client interface? can anyone sign up for beta testing?

Well for the last several weeks I've been saying we will hit beta this week. So without further ado, we'll hit beta this week. Kevin and I will probably chat tomorrow and decide what to do. My new target date for a beta release is Wednesday.

As with all of our beta releases, it will be available for those who really want to try things out. Just be advised that beta releases have bugs and things may not work. In the worst-case scenario there could even be data corruption.

When BOINC is updated, it ignores already installed folder; user have to manually choose correct folder - every new BOINC version, every machine running over and over. Any good reason for that?

Nope, none. It is on my plate to fix. I was hoping to have more time in this release to do a couple of things like storing setup information and version upgrade notification. I still might after we get the beta process underway, but right now I'm head down on the Simple GUI until things have stabilized.

So when can those of US who run Windows XP x64 see a Native 64bit Boinc and app?

I suppose when I can get my hands on a 64-bit machine.

I generally buy my own hardware. I have expensive tastes and really don't like low-budget computer hardware or base configuration models. The downside is that I don't upgrade often; I've had my current workstation for several years and it probably has another couple of years left in it. Although I have been looking at a few of the dual-processor/dual-core/hyperthread-enabled workstations from Dell. Who knows, I might pick one up next year.

If there is enough demand for a 64-bit build, and for whatever reason Crunch3r and crew are having problems releasing builds, I'm sure David would hook me up with a 64-bit machine.

considering it's clock-changing weekend: does boinc take into account the fact that the clocks change when recording/calculating processing time?

For the most part BOINC uses Epoch time internally, which counts seconds since January 1, 1970 UTC and is not affected by daylight saving changes. I suspect BOINC will be superseded by something else before we run into time keeping issues.
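As a minimal sketch of why epoch time sidesteps the clock change: elapsed time is just the difference between two second counts, with no time zone involved. The function name is made up for illustration, and the two timestamps are example values chosen on either side of the UK change on October 29, 2006, when 01:30 appeared twice on the wall clock.

```cpp
#include <cstdint>

// Elapsed seconds between two epoch timestamps (seconds since
// 1970-01-01 00:00:00 UTC). No local time or DST offset is consulted.
std::int64_t elapsed_seconds(std::int64_t start_epoch, std::int64_t end_epoch) {
    return end_epoch - start_epoch;
}

// Example instants (illustrative values):
//   1162081800 = 2006-10-29 00:30:00 UTC, shown as 01:30 on a UK clock (BST)
//   1162085400 = 2006-10-29 01:30:00 UTC, shown as 01:30 on a UK clock (GMT)
const std::int64_t kBeforeChange = 1162081800;
const std::int64_t kAfterChange  = 1162085400;
```

A task that started at the first instant and finished at the second ran for 3600 seconds, even though a local wall clock would have read 01:30 at both moments.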

why doesn't boinc use actual CPU time directly?

Any place BOINC can use CPU time to account for the amount of CPU time an application has used, it does. Some operating systems don't provide a very good way to get at that information, and in those cases wall clock time is used.
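The fallback idea can be sketched in standard C++. This is not BOINC's code; the struct and function names are hypothetical. It uses std::clock(), which reports processor time on conforming implementations (some platforms return elapsed time instead), and falls back to a steady wall clock when the CPU figure is unavailable.

```cpp
#include <chrono>
#include <ctime>

struct UsageSample {
    double seconds;      // time charged to the task
    bool   is_cpu_time;  // false when wall-clock time was used instead
};

// Hypothetical helper: run `work` and report how much time to charge for it.
UsageSample measure(void (*work)()) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();
    work();
    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    const std::clock_t unavailable = static_cast<std::clock_t>(-1);
    if (cpu_start != unavailable && cpu_end != unavailable) {
        // Preferred: processor time actually consumed.
        return {static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC, true};
    }
    // Fallback: elapsed wall-clock time.
    return {std::chrono::duration<double>(wall_end - wall_start).count(), false};
}
```

The caller can check is_cpu_time to know which kind of figure it received.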

about crashes etc., when something fails/crashes in windows, the user is asked to send a "report" to a Microsoft server somewhere.
Are these reports actually collected from MS for debugging purposes?

Short answer: no. The crashes are uploaded to a Microsoft server, but Microsoft only investigates its own application crashes. Microsoft does offer access to the crash reports to the 'owners' of the software so they can download the crash dump files and try to figure out what is going on. You actually shouldn't be seeing the 'Error Reporting' tool, which I'll refer to as Doctor Watson.

BOINC is supposed to be completely autonomous, meaning it just runs in the background and if an application crashes it silently handles it and any diagnostic data that we can get at is analyzed in the background and then uploaded to the project server in a condensed form. I participated in debugging both S@H and R@H applications using this technology and have started to collect and publish little nuggets of information about common crashes. You can find it here:

http://boinc.berkeley.edu/app_debug_win.php

I'll continue to add to the list as I find them, or am called in to help isolate bugs in another application. Most of the examples are R@H crash dumps, I should have started the document during the S@H beta cycle, but I didn't think about it then.

1) the most annoying one is the upload+report & download of new work process with a short cache.
I have a tiny cache (something like 0.0001 days) because i have a premanent Inet connection. When a task is very near finishing, due to my small cache a new workunit is downloaded, and then the near-finishing WU is uploaded and reported. The problem here is that 2 requests are being made, one for new work, then one soon after to report finished work, it would be more sensible to wait for the unit to finish, upload, then report and get new work in one operation, rather than hammer the servers as "return results immediately" does
Will the new CPU scheduler avoid this problem?

John really is the best person to ask about CPU scheduling issues; I'm just a consumer of his and David's work, same as you.

That said, I do not believe the new CPU scheduler will avoid the problem. One of the goals is to keep the CPU busy; if you finish your result and have to wait for the client to download another one, the CPU isn't busy.

If I were in your shoes, after the new scheduler is released, I would set my cache size to 1 day and let the client re-normalize on that. The days of having a very small cache to keep from missing a deadline should be coming to a close.

4) not a bug, but a question, i've got some changes to some of the web code, and i want to checkin my changes to the CVS/SVN system, but obviously i don't have the permissions to do so. how do i go about getting my changes merged?

Send them to David and/or Rytis and let them look over the changes.

Carl was unable to trap all the exceptions within Visual Studio (unlike the Linux environment which was more helpful) which is why I suggested having a call-back process so that Boinc could get the science app to help with 'difficult' exceptions. So you'd still have a black box, just not a cubic one :-)

Yeah, I've been working with AutoDock@Home a little bit trying to help them get set up in their Fortran environment. It appears that the Intel Fortran compiler uses a different form of exceptions than Windows knows about. I found some interoperability documentation between C/C++ and Fortran and suggested some changes. When they let me know how things went we might be able to provide some extra information for those using Fortran in the BOINC environment.

 

 

To submit questions for next week just click on the comments link below and submit your question.

Thanks in advance.

----- Rom

References:

http://curl.oc1.mirrors.redwire.net/
http://curl.oc1.mirrors.redwire.net/docs/comparison-table.html
http://en.wikipedia.org/wiki/Unix_time
http://boinc.berkeley.edu/app_debug_win.php
http://boinc.berkeley.edu/sched.php

Wednesday, October 25, 2006

Is anybody going to ask me some BOINC related questions this week?

----- Rom

Tuesday, October 24, 2006

A significant amount of time and energy has gone into making BOINC's communication infrastructure efficient, yet there are still many who believe that it really doesn't cost the projects any more to return the results immediately vs. returning the results when BOINC believes it ought to.

For the purposes of this article I'm going to define the cost of a query at $1.00 per query to cover the cost of electricity, air conditioning, maintenance, and the cost of personnel to manage the database server. Now in real life that number is greatly exaggerated, but it is easier to describe the relative cost of something based off of something tangible.

Here is a basic rundown of query cost per MySQL documentation:

                           Insert                  Select   Update
Connecting                 3                       3        3
Sending query to server    2                       2        2
Parsing query              2                       2        2
Inserting row              1 x size of row         -        -
Inserting indexes          1 x number of indexes   -        -
Selecting row              -                       0.01     -
Updating row               -                       -        0.05
Updating indexes           -                       -        1 x number of index changes
Sending results to caller  -                       2        -
Closing                    1                       1        1

Notes:

  • Scheduler only does selects and updates
  • Selects happen really fast since that is what databases are optimized around
  • Updates are only a little more expensive than a select because they have to acquire an exclusive lock on the row to make sure nobody else is trying to write to that record and then change the record.

Now with FastCGI we can throw out the connecting and closing costs, since the database connection is always available for the life of a single scheduler process, of which there can be 100-150 running at a time.
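To make the cost model concrete, here is a minimal sketch of the per-query arithmetic with the connecting and closing costs dropped. The constant and function names are made up for illustration, and costs are kept in hundredths of a query part so the math stays in exact integers.

```cpp
// Costs from the table above, in hundredths of a query part.
// Connecting (3) and closing (1) are left out because of FastCGI.
const int kSendQuery      = 200;  // sending query to server
const int kParseQuery     = 200;  // parsing query
const int kSelectRow      = 1;    // 0.01
const int kUpdateRow      = 5;    // 0.05
const int kPerIndexChange = 100;  // 1 x number of index changes
const int kSendResults    = 200;  // sending results to caller

// Cost of one SELECT: send + parse + select row + return results.
int select_cost() {
    return kSendQuery + kParseQuery + kSelectRow + kSendResults;
}

// Cost of one UPDATE that touches `index_changes` indexes.
int update_cost(int index_changes) {
    return kSendQuery + kParseQuery + kUpdateRow + kPerIndexChange * index_changes;
}
```

Under these assumptions a select comes to 601 (6.01 query parts), an update touching two indexes to 605 (6.05), and an update touching one index to 505 (5.05).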

We'll keep track of the number of queries executed and the number of query parts used so we can calculate the cost per query part.

We'll break out the results for the following two scenarios:

  1. Reporting 20 results individually.
  2. Reporting 20 results at once.

 

Scheduler RPC

A scheduler RPC does many things as it has to do authentication, preferences, receive incoming result status, and send out new results to be processed. I'll tackle each section one at a time.

Authentication

Authentication consists of a query for host, user, and team. Each query is independent, although we have talked about batching them into a single query; we just haven't gotten that far yet. Now this part of the RPC may result in a new host record being created if you're connecting for the first time or something is wrong with what you have sent to the scheduler.

Scenario 1: 60 Queries, 360.6 Query parts.

Scenario 2: 3 Queries, 18.03 Query parts.

Platform Check

Checks to see if your platform is supported.

Scenario 1: 20 Queries, 120.2 Query parts.

Scenario 2: 1 Query, 6.01 Query parts.

Preferences Check

Determines if the preferences on the client need to be updated or the server needs to be updated. If the server needs to be updated then an update query is submitted.

Scenario 1: 20 Queries, 120.2 Query parts.

Scenario 2: 1 Query, 6.01 Query parts.

Handle Reported Results

Here each result is looked up to see if it was assigned to the person reporting it and to update its values. The workunit record for each result record has to be updated so the transitioner will look at the workunit and decide what to do next. Two indexes have to be updated in the result table and 1 in the workunit table for each result. What is important to point out here is that in scenario 2 we batch all of the selects and updates for results in scenario 1 into a single select and update. The workunit updates are also batched in scenario 2.

Scenario 1: 60 Queries, 342.2 Query parts.

Scenario 2: 3 Queries, 17.11 Query parts.
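The batching idea can be illustrated with a small sketch that builds one SELECT covering every reported result instead of issuing one per result. This is not the scheduler's actual code, and the table and column names are placeholders rather than BOINC's real schema.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Build a single query that looks up all reported results at once.
// "result", "id" and "hostid" are placeholder names for illustration.
std::string batched_select(const std::vector<long>& result_ids) {
    if (result_ids.empty()) {
        return "";  // nothing reported, nothing to ask the database
    }
    std::string sql = "SELECT id, hostid FROM result WHERE id IN (";
    for (std::size_t i = 0; i < result_ids.size(); ++i) {
        if (i != 0) {
            sql += ",";
        }
        sql += std::to_string(result_ids[i]);
    }
    sql += ")";
    return sql;
}
```

With 20 ids the send, parse, and return costs are paid once rather than 20 times, which is where the savings in scenario 2 come from.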

Assign New Results

Most of the preparation work for this phase is actually done by the feeder. So here we get the latest information about the result, then update the result, then update the workunit.

Scenario 1: 60 Queries, 342.2 Query parts.

Scenario 2: 60 Queries, 342.2 Query parts.

 

Totals

Now that we have broken down the scheduler into each of its parts and isolated the number and types of queries we can calculate what it would cost the project if each query cost $1. The query parts metric is useful in determining how much wasted database time is spent for each operation. All around scenario 2 costs the project less in time and maintenance on equipment.

Scenario 1 costs a project $11 per result, and scenario 2 costs a project $3.40 per result.

Scenario 2 is roughly 70% more efficient than scenario 1 in the amount of time used to process 20 results: 389.36 query parts versus 1,285.4.
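For anyone who wants to check the totals, the sketch below adds up the per-phase figures listed above. The struct and function names are invented for this example; query parts are stored in hundredths (18.03 becomes 1803) so the sums stay exact.

```cpp
#include <cstddef>

// One scheduler phase, with the figures quoted for each scenario.
struct Phase {
    const char* name;
    int  queries_individual;  // scenario 1: 20 results reported one by one
    int  queries_batched;     // scenario 2: 20 results reported at once
    long parts_individual;    // query parts, in hundredths
    long parts_batched;
};

const Phase kPhases[] = {
    {"Authentication",          60,  3, 36060,  1803},
    {"Platform check",          20,  1, 12020,   601},
    {"Preferences check",       20,  1, 12020,   601},
    {"Handle reported results", 60,  3, 34220,  1711},
    {"Assign new results",      60, 60, 34220, 34220},
};
const std::size_t kPhaseCount = sizeof(kPhases) / sizeof(kPhases[0]);
const int kResults = 20;

int total_queries(bool batched) {
    int total = 0;
    for (std::size_t i = 0; i < kPhaseCount; ++i) {
        total += batched ? kPhases[i].queries_batched : kPhases[i].queries_individual;
    }
    return total;
}

long total_parts(bool batched) {
    long total = 0;
    for (std::size_t i = 0; i < kPhaseCount; ++i) {
        total += batched ? kPhases[i].parts_batched : kPhases[i].parts_individual;
    }
    return total;
}

// Cost per result in cents, at $1.00 per query.
int cents_per_result(bool batched) {
    return total_queries(batched) * 100 / kResults;
}
```

The sums come to 220 queries for scenario 1 and 68 for scenario 2, which at $1 per query over 20 results gives the $11 and $3.40 figures.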

 

So be kind to your project(s), let BOINC report the results in batches. The project admins will be able to support more people and more machines with the same hardware.

 

----- Rom

[Edit: Since originally writing this I hunted down a few numbers from jocelyn, which is the S@H database server:

On average jocelyn is processing 314 queries per second.
In the last 5 days jocelyn has processed 144.7 million queries.

]

 

Saturday, October 21, 2006

Can we get more (unlimited - well, within reason!) preferences than home, school and work? Three profiles isn't enough for me and I'm only running a small number of computers. I know these can be overridden (although the project preferences for Rosetta (i.e. runtimes) cannot)I'd find it really useful if these profiles could be added to as required, and please can you make them renamable?!?

I believe the account manager folks are working on some features which will allow greater configuration flexibility. The BOINC client is capable of dealing with a greater number of zones; there just hasn't been an easy way of configuring them on a project's web site. Rytis is now at the helm of the project web site and forum features. I'm looking forward to seeing what he is going to cook up.

Also, any update on BOINC on the consoles?

Well there is a lot of buzz, but nobody has signed on the dotted line yet. David and Eric are going to a Sony R&D center next week to meet some engineers for the PS3. I haven't heard anything new about the XBOX 360. The XNA Game Studio from Microsoft is a bust for BOINC, since it assumes all of the game code is going to be managed code on the 360. So that leaves us needing the same development kit the professional game studios use.

Again moor of a request i am attached to a lot of projects and when I need to take a box out of service(without throwing away wu) I have to click "no new tasks" over 30 times. A bit tedious especially over VNC. A global (per host)no new tasks button would be of great use to me.


Is the global update ever returning? Although I can see where it can be abused.

Right now many things are on hold until after we can get the BSG out the door. Tentatively I have some time allocated to re-work the Advanced UI and playing around with Vista has inspired me on how to handle the multi-selection cases in a list view control. We shall see though.

'Retry Communications' is about as close as you're going to get for an update all type function. It basically resets the countdown timer for any pending action.

With regards to the whole '-return_results_immediately' thing, from a project perspective it is altogether evil. I'll write up another post about that separately.

1) What are the typical things which cause the work unit to fail?
(Environmental - antivirus, graphics drivers, excessive overclocking, PC crashes, playing games for hours, video encoding, etc.
Human factors - Misunderstanding boinc messages, for example incorrect URL - they detach and attach, then get upset that x months of work is 'down the pan'. Ditto installation of berkeley version over bbc version, easy to fix but they don't know how)

You have nailed the majority of cases. I mean we could go off into the really obscure cases like cosmic rays and the like, but you covered all the things in the majority case.

In the future we won't be allowing a directory name change for any software package that we build for others, so that should take care of any potential future BBC issues. Now before you all think I'm making up the whole cosmic ray thing here is an article from ZDNet about eBay suffering one to two crashes a month due to a defect in their ECC memory which left them prone to cosmic rays.

2) Is there anything which can be done to avoid these, either by the science app or by Boinc itself?
(Uploading partial results as the WU runs. Exception handlers, both at science app and callbacks at boinc? Restart from checkpoint/backup if error code 0,-107...,etc etc received? Going into hibernation if PC is very busy, out of memory, etc)

This is one of those really cool but really tough questions. Each environment handles things a bit differently. About the best advice I can give is for each project to really understand how the programming language they are using interacts with the operating system they are using.

CPDN is advancing the trickle model to the point where they could resend out a workunit that has timed out and take the previous user's trickles and reuse them as the starting point of the new work unit.

One thing I would like to point out is that BOINC itself cannot do anything about a science application failure except fail the workunit and move on to the next one. To BOINC each of the science applications is a little black box, and the only way BOINC knows anything about what is going on inside is through a little 8k chunk of shared memory broken up into 8 channels. Simple commands are passed around in these channels like show graphics, hide graphics, and here is the amount of CPU time I've used.
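To picture that layout, here is a rough sketch of an 8k segment split into 8 equal channels, each holding one pending text message. The type names, the one-message-per-channel rule, and the message text are assumptions made for illustration; they are not taken from the BOINC source, and a real shared segment would also need the two processes to coordinate access.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

const std::size_t kSegmentBytes = 8 * 1024;
const std::size_t kChannelCount = 8;
const std::size_t kChannelBytes = kSegmentBytes / kChannelCount;  // 1024 bytes each

// One channel: a NUL-terminated message, or an empty buffer when free.
struct Channel {
    char buf[kChannelBytes];

    // Post a message; fails if the last one is unread or the text does not fit.
    bool send(const std::string& msg) {
        if (buf[0] != '\0') return false;
        if (msg.empty() || msg.size() >= kChannelBytes) return false;
        std::memcpy(buf, msg.c_str(), msg.size() + 1);
        return true;
    }

    // Take the pending message, if any, and mark the channel free again.
    bool receive(std::string& out) {
        if (buf[0] == '\0') return false;
        out = buf;
        buf[0] = '\0';
        return true;
    }
};

struct SharedSegment {
    Channel channels[kChannelCount];
};

static_assert(sizeof(SharedSegment) == kSegmentBytes, "segment should be exactly 8k");
```

Because everything the two sides know about each other has to fit through small buffers like these, only short commands and counters can cross the boundary.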

Now exceptions, and error tracking in general, use pointers in the local address space for the science application. For BOINC to be able to track exceptions in a science application would mean that BOINC would have to act like a debugger while the science application is running which would cause a 20-30% performance decrease for all science applications, and would more than likely negate any optimizations available to an application.

We did add a little something to the BOINC API library which we internally refer to as the 'BOINC runtime debugger'. This little chunk of code is compiled into the science application and informs the OS that if any unhandled exceptions happen, it needs to execute a chunk of code. Using StackWalker as a template we expanded the functionality and improved the data returned to the project using a Microsoft library on Windows to dump out as much information about the exception as possible. This code isn't ever executed or used unless an unhandled exception happens within an application, so no performance decrease is experienced.
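The Windows mechanism described here is OS-specific, so as a portable stand-in the sketch below uses standard C++'s std::set_terminate to show the same "register once, run only on failure" pattern. It is not the BOINC runtime debugger; the function names and the message are made up for illustration.

```cpp
#include <cstdio>
#include <cstdlib>
#include <exception>

// Runs only if an exception escapes every handler in the program.
// A real reporter would dump as much diagnostic data as it can here.
[[noreturn]] void report_unhandled() {
    std::fputs("unhandled exception: writing diagnostics\n", stderr);
    std::abort();
}

// Register the reporter at startup; returns the handler it replaced.
std::terminate_handler install_crash_reporter() {
    return std::set_terminate(report_unhandled);
}
```

Registration is a single pointer swap at startup, and the reporter body never runs during normal operation, which matches the point about there being no performance cost.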

I'm going to need to write a whole different article on this topic.

3) What support does Boinc have / plan to have which relate to this category of work unit specifically?
(e.g.) some ideas, many of which may be impractical -
* Separation of graphics from the work unit so that a temporary problem with the graphics drivers doesn't cause the WU to fail

Separation of the graphics code from the worker code will probably start at the beginning of next year. It is going to be a requirement for supporting Vista and other OS's as they increase in their defense in depth models.

* Automatic backups
* Backups which are per-workunit rather than for all workunits which happen to be running

There are other tools that can be used for backups. Frankly, trying to tackle that role is complicated and really outside the design scope for BOINC.

* Callbacks from Boinc into science app to allow the science app to handle boinc exceptions it wouldn't normally be able to trap

What kind of exceptions do you think the science applications need to handle?

* Handling of the situation where the PC is very busy, out of memory or other resources, about to crash, TCP/IP stack blocked...)

We are adding more smarts into the CPU scheduler to handle the memory/paging cases.

Crashing is a random event, the only way you could know something is about to crash would be to already know what the bug is.

We added some code a while back to test the various communication mechanisms when BOINC is first launched, which should have taken care of the TCP/IP blocks. If you know of any cases we haven't covered with recent builds let me know.

how's the progress with allowing AMS/BMS/BAM (whatever it's called these days) to control the state of projects and WUs
such as setting NNW, or suspending a project/task?

I believe this code is in for the 5.8.x release.

Farm Managers ?
Farm Manager ability came with Account Managers, I cannot find any programs on the BOINC website to install a Farm Manager on my computer, what is it? is it working? or has it been abandoned?

A farm manager is an idea that James Drews had, I believe, that is geared towards managing hundreds of machines. Basically you set up a web server which acts as a private AMS, and the BOINC client includes its IP address, port number, and GUI RPC password (I think) when it first connects to the farm manager. After that if you want to do something specific to a machine the farm manager can issue a GUI RPC just like the BOINC Manager. I'm not sure if anybody besides James has done anything about creating a farm manager package.

BOINCView is probably the best bet unless you come by several hundred machines.

Auto update of 'BOINC' ?

Funny you should ask this, WCG was asking about this very same thing. We'll probably start looking into something like this for the 5.10 release.

We were always concerned if we had put something like that in place it might be exploited by an attack vector we never even thought of. At least with a human at the other end of the equation the amount of damage would be limited.

Now with WCG as a contributor we can get the IBM security department to look things over and let us know if something is really wrong. IBM has looked over the BOINC source once already so we are confident we have our i's dotted and our t's crossed but with auto-deployment of code without user intervention you can never be too careful.

I am new to BOINC and I'm loving it, but I was wondering: are any plans for BOINC to use the powerful new age GPU's and PhysX processors that are perfect for floating point computations?

FluffyChicken Wrote:

I can answer the last one,
ATI(AMD) have asked BOINC if they would like help, though it would be the projects that would need the help if the GPU is capable. NVIDIA would probably need to jump in if your(we) are going to get it running on that, or somebody like Microsoft developes an easy to use API (Accelerator in research ?)
As for PhysX, we (some members in the forum) contacted them from Rosetta@home and had no real rosponse.
Rosetta@Home are in talks with Microsoft for the XBOX360 though, apparently.

I would just like to add that as of the next release BOINC detects your video card and processor capabilities and reports them to the project. If/when a project commits to using a graphics card or physics accelerator we could go through with the rest of the work items to turn them into a resource that can be scheduled for use.

We added in the detection code so we could try and get the stats sites to break down video card usage and processor capabilities, maybe spur on the projects to develop specific customized applications to harness the untapped capabilities of the machines.

It is much easier to go to a project and sell them with hard numbers than to say we think this could help you by 'x' amount.

To submit questions for next week just click on the comments link below and submit your question.

Thanks in advance.

----- Rom

Previous Articles:

BOINC Q&A -- 13/10/06
BOINC Q&A -- 10/06/06
BOINC Q&A -- 09/30/06
BOINC Q&A -- 09/22/06

References:

http://news.zdnet.com/2100-9595_22-525403.html
http://www.codeproject.com/threads/StackWalker.asp

Thursday, October 19, 2006

Yesterday Scott Hanselman blogged about an analysis tool called 'Ohloh'. I checked out what it had to say about BOINC.

Direct project url:
http://www.ohloh.com/projects/3215

 

It generates many charts and graphics about the changes it detects in the source tree over time. I have looked over quite a few things and got to heckle David a bit about some of the graphs. I really got a kick out of this chart:

To be fair though I need to point out that David checks in code using 'davea', 'boincadm', and 'sorabji' depending on where or when he has checked in code.

This is one chart we both got a kick out of:

 

How cool is that?

----- Rom

Tuesday, October 17, 2006

Is anybody going to ask me some BOINC related questions this week?

----- Rom

Monday, October 16, 2006

So I spent some time this weekend tweaking my blog's look and feel. Saturday afternoon I just got a burr up my butt and decided to do something about it.

I ran into an article that had some useful pointers:
http://www.useit.com/alertbox/weblogs.html

I took some of his advice and made the following changes:

  • Added a profile
  • Moved my contact info and email signing certificate to the header
  • Optimized the viewing space for 1024x768

I went so far as to include links to translate my postings to Spanish, German, French, Portuguese, Japanese, Chinese, and Arabic.

I've contacted the Feedburner crew about fixing the link text for the actual links within the RSS feed itself. For some reason the encodings get borked.

Hopefully the translations Google does for me won't offend anybody. :-)

Well, catch ya all later...

----- Rom

Previous articles:
Blog Moved

Friday, October 13, 2006

Advanced Memory Management, what is the idea/aim behind that?

Well that is a good question, the advanced memory management is more about setting boundary conditions on how much BOINC and related processes are allowed to use.

We still get a few reports of BOINC causing systems to become unresponsive or sluggish. Most of the investigations we have done revealed a machine that was paging a lot during the times BOINC was running. Paging is the process the OS uses to free up less frequently used memory to make room for active tasks by writing those pages of memory to disk. Each page of memory is roughly 4KB in size on a x86 processor.

So let's say you are running a machine with 512MB of memory. Windows XP uses roughly 128MB of that on boot-up and will allow parts of itself to be paged out to disk. The last round of virus scanners I looked at want around 100MB of memory, and the little system tray icons in the lower right part of your screen generally take about 5MB apiece, with the notable exception of the various IM clients, which have bloated out to 20-60MB apiece. Any additional programs running on your machine, such as a web browser or email client, can take anywhere from 20MB up to 100MB.

When the OS comes under memory pressure it starts looking for chunks of memory that haven't been touched in a while, writes them out to disk, and then loads something more relevant into that chunk of memory.

So let us say that you are attached to R@H and you walk away from your computer for an hour or so. During that time R@H has used over 256MB of memory continuously for at least 30 minutes and the OS has had to page a lot of stuff out to make room for it, including parts of itself. When you come back, your Start menu, or whichever application you happened to be using before you left, has to be reread from disk. All of that paging takes a few moments and makes your computer feel really, really slow.
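
To put rough numbers on that scenario, here is a back-of-the-envelope sketch in Python. The OS, virus scanner, and R@H figures come from the paragraphs above; the number of tray icons and the exact IM client and browser sizes are assumptions I picked from within the stated ranges, so treat the result as illustrative only.

```python
# Rough memory budget for the 512MB machine described above.
# Entries marked "assumed" are illustrative picks from the ranges in the text.
PAGE_SIZE_KB = 4       # typical x86 page size
INSTALLED_MB = 512

demand_mb = {
    "Windows XP": 128,
    "virus scanner": 100,
    "tray icons (assumed 4 at 5MB each)": 4 * 5,
    "IM client (assumed 40MB)": 40,
    "web browser (assumed 60MB)": 60,
    "R@H science application": 256,
}

total_mb = sum(demand_mb.values())
overcommit_mb = max(0, total_mb - INSTALLED_MB)
pages_to_evict = overcommit_mb * 1024 // PAGE_SIZE_KB

print(f"total demand: {total_mb}MB of {INSTALLED_MB}MB installed")
print(f"has to live on disk: {overcommit_mb}MB ({pages_to_evict} pages)")
```

With these picks the machine is overcommitted by about 92MB, which is more than twenty thousand 4KB pages that have to be written out and, later, read back in.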

With the introduction of this feature we hope we can finally close one of the last remaining loopholes to user responsiveness.

Right now we have the following two settings planned:

  1. Percentage of memory use while user is active.
  2. Percentage of memory use while user is idle.

What should happen is that BOINC will detect how much memory is installed on the machine and, every 10 seconds or so, look at how much memory each science application is using. If a science application exceeds the total allotment, BOINC will shut it down and look for another application to schedule.
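
The check described above could look something like the following Python sketch. It is not the BOINC client code (the client is written in C++); the `Task` class, the function names, and the 25%/75% preference values are all hypothetical and only there to show the shape of the logic.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    working_set_mb: float  # memory the science application is using right now


def allowed_mb(installed_mb, user_active, pct_active, pct_idle):
    """Memory allotment for the current state (user active or idle)."""
    pct = pct_active if user_active else pct_idle
    return installed_mb * pct / 100.0


def tasks_to_preempt(tasks, limit_mb):
    """Names of the tasks whose own usage exceeds the allotment."""
    return [t.name for t in tasks if t.working_set_mb > limit_mb]


# Hypothetical numbers: a 2GB machine, 25% while active, 75% while idle.
tasks = [Task("small_app", 300.0), Task("big_app", 900.0)]

active_limit = allowed_mb(2048, True, 25, 75)   # 512.0
idle_limit = allowed_mb(2048, False, 25, 75)    # 1536.0

over_while_active = tasks_to_preempt(tasks, active_limit)
over_while_idle = tasks_to_preempt(tasks, idle_limit)
```

In a real client this check would sit inside the periodic poll (the "every 10 seconds or so" part), and anything returned by `tasks_to_preempt` would be shut down so another application can be scheduled.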

I'm really looking forward to this feature since my 2GB machine uses about 1.2GB of memory without BOINC even running and I have four processors to feed. Up until the middle of last year I only had 1GB in my machine and if I had BOINC running it was pretty painful when BOINC rescheduled all the science applications on the machine while I was working.

Scheduler Improvements (already implemented?): how do these help?

As far as I know John Mcleod has finished the work on the new scheduler and work-fetch policy. The new system should reduce the number of cycles wasted between an application's last checkpoint and the moment it has to quit because of a reschedule to honor resource shares.

John is really the wizard in this area.

How are any other improvements going to help us and the projects?

I believe the two major work items over the next year will probably be letting projects use torrents in their file download process and letting projects send out science applications optimized for each processor type, and possibly GPU-enabled applications.

Is there anybody working on boinczilla? Bug reports are rising and nobody is sorting them out :/

My bad, I'll see what I can do about that this weekend.

Why not run the benchmark at higher priority, so each system produces a constant value rather than a haphazard one, particularly as it occurs only every 5 days?

The idea behind running the benchmarks at the same priority level as the science applications is to get a rough idea of how many cycles the science applications will get. If you run the benchmarks at a normal thread priority they won't be that much more consistent, and if you run them at the highest thread priority a user mode application can have, you'll get numbers that are not very realistic for a science application running as an idle process.

The systems are benchmarked every 5 days or so to handle changes to the environment, such as a more resource intensive virus scanner or any content indexing systems that might have been installed.

When are we going to see the first alpha/beta with the BSG?

Hopefully next week.

With regard to the idea of switching tasks at a checkpoint, what happens (as in, are there any checks etc.) when an application gets "stuck" and doesn't make any progress? This also applies to a similar situation with current apps, where they get stuck and the client tries and tries to get it done by the approaching deadline, but obviously never will. This pushes the client into NNW and EDF. Will BOINC abandon the unit if no progress is made, or the deadline is met?

To be honest, I don't know. I'll have to bug John and David about that.

Is there any possibility of releasing 5.6.4 or 5.6.5 as alternate versions?

I don't intend to put them on the download page. But if you are comfortable enough with the quality of the client to recommend it to people, then go ahead and give them the link. I think we were far enough along in the testing process to know it isn't going to cause any major problems and might have only a few small bugs left before it was ready to be released.

The reason for not adding it to the download page is that people would then receive a message in the message log requesting they upgrade to it. If all goes according to plan we'll be able to release 5.8 in a few weeks, and it would be a bad experience to bug people about upgrading twice in one month.

I suspect that if somebody was experiencing a bug that is fixed in 5.6 they would be happy to start using it now and not be so annoyed when they see the upgrade notice for 5.8.

Is there any chance of a purge function being implemented?

I haven't heard any talk of one. I'll bring it up with David, it sounds like something a project might want.

Hot topic: Why is the hourly benchmark value between Linux and Windows different, or so it's claimed? When done with stock BOINC 5.4.9, e.g. on Windows it kicks out 8.1 per hour; when the same is done under Linux, it kicks out 5.0. The WU's are processed at equal speed, i.e. a job on Windows taking 2 CPU hours would take near equal time on Linux.

It has been my experience that the Microsoft compiler has been better at optimization than the GCC compiler. I'm sure I'll get flamed by the OSS crowd but most of the projects are experiencing the same result.

I should point out that the optimizers have been able to equal things out by a lot of trial and error by turning off and on the various optimization switches for GCC.

If the optimizers want to submit a patch that contains different non-CPU specific optimizations I'm sure we could use them.

 

 

To submit questions for next week just click on the comments link below and submit your question.

Thanks in advance.

----- Rom

Thursday, October 12, 2006

Are you more productive with multiple monitors?

Personally, I am more productive with multiple monitors. I have three 17" LCD displays. My roommates have three displays apiece.

Microsoft Research has been looking into this for a while, and now a French researcher states roughly the same thing about larger monitors.

I'm not sure I would ever want to go back to a single monitor for a desktop rig again.

On my left display I have Outlook and my IM clients, my center display is for whatever I'm working on at the moment, and my right display is for my web browser and online books.

It is nice to be able to glance and see if there is an email I really need to deal with without having to change the focus of the application I'm working on.

----- Rom

References:

Robert Scoble
Toward Characterizing the Productivity Benefits of Very Large Displays
Could a 30-in. monitor help you do your job faster?

Is anybody going to ask me some BOINC related questions this week? Or am I going to have to make something up?

----- Rom

Monday, October 9, 2006

Sorry about the erratic web server behavior today.

Most of my 24/7 computerized services run off of one machine. The machine runs Windows XP Media Center 2005 and has virtual machines for my email and web services. Since it is my everything computer I normally don't pay attention to what I do with it, but today I started a video conversion job which took 12 hours or more. A few hours ago I noticed that the web server had started rejecting people with a web server busy message.

The disk queue length grew large enough that all the services running on the machine started having problems.

In the future if I start another video encoding job I'll be sure to adjust the process priority such that I won't have to worry about it again.

----- Rom

So today I discovered why my notebook was failing to hibernate when I closed the lid on Windows XP, thanks to Scott Hanselman.

Scott's article can be found here.

The bug affects all computers that have more than 1GB of RAM. Microsoft finally released a hotfix for the problem here: http://support.microsoft.com/?kbid=909095

Why did Microsoft wait so long to release this to the public? It appears they had this available for a while, but you had to call PSS to get it.

----- Rom

Friday, October 6, 2006

Recently it was announced that the 5.6 BOINC release has been canceled to concentrate on the 5.8 code (as 5.7 for the time being).
Why is this, and why can you not release 5.6 as it is now?

The BSG (BOINC Simple GUI) is nearing completion and the 5.6 release was nearing completion but wasn't done baking yet.

After looking over the schedules it became pretty clear that managing two different test efforts was going to create a lot of confusion and management hassles.

We believe we have stabilized most, if not all of the 5.6 features, and the remaining testing work will be focused on the BSG and improved memory management support. We believe we are a few weeks out from having a stable BSG build ready for the public, so instead of asking the community at large to do two back-to-back upgrades within a month, we decided to bag 5.6 and focus on 5.8.

I personally believe this is for the best.

If the tech savvy people want to check out 5.6 then by all means go ahead and play around with it. We won't be releasing any bug fix releases for that version of the client though.

----- Rom

Thursday, October 5, 2006

I was wondering if you could shed some light on DNS caching, and why the BOINC client apparently keeps records for days, which would seem to ignore the TTLs associated with the records? (the recent DNS changes for Leiden would indicate this; requiring a client restart)

Actually libcurl handles all the DNS stuff. We just pass the server name to libcurl and it handles all the OS details. I took a quick peek at the libcurl source and it looks like it has an internal DNS cache. It also appears to have a way to expire the DNS cache entries. It isn't clear to me at the moment if we are supposed to call an API to expire DNS cache entries or if that is handled automatically as part of the easy API set.
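
To make the "cache with expiring entries" idea concrete, here is a minimal Python sketch of how such a cache behaves. This is not libcurl's code; the `DnsCache` class, the resolver callback, the host name, and the 60 second maximum age are all made up for the example.

```python
import time


class DnsCache:
    """Toy DNS cache: entries are reused until they are older than max_age."""

    def __init__(self, resolve, max_age_seconds, clock=time.monotonic):
        self._resolve = resolve          # function: hostname -> address
        self._max_age = max_age_seconds
        self._clock = clock              # injectable so the demo can fake time
        self._entries = {}               # hostname -> (address, time stored)

    def lookup(self, hostname):
        now = self._clock()
        entry = self._entries.get(hostname)
        if entry is not None and now - entry[1] < self._max_age:
            return entry[0]              # still fresh, skip the resolver
        address = self._resolve(hostname)
        self._entries[hostname] = (address, now)
        return address


# Demo with a fake resolver and a fake clock (documentation-only addresses).
fake_now = [0.0]
records = {"project.example": "192.0.2.1"}
cache = DnsCache(records.__getitem__, max_age_seconds=60,
                 clock=lambda: fake_now[0])

first = cache.lookup("project.example")
records["project.example"] = "192.0.2.2"   # the project moves its server

fake_now[0] = 30.0
still_cached = cache.lookup("project.example")   # old address, entry is fresh

fake_now[0] = 61.0
refreshed = cache.lookup("project.example")      # entry expired, re-resolved
```

If entries never expire, the client keeps talking to the old address until it is restarted, which matches what people saw with the Leiden change. For what it is worth, libcurl does document a `CURLOPT_DNS_CACHE_TIMEOUT` option on its easy handles for setting how long cached entries are kept; whether the BOINC client sets it is still the open question above.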

I'll look into it a bit more to see if I can figure it out.

 

To submit questions for next week just click on the comments link below and submit your question.

Thanks in advance.

----- Rom

So on my vacation in Switzerland my notebook had this annoying problem where it would fail to hibernate when I closed the lid under the Toshiba Portege M400 Windows XP install. Let me tell you, it is annoying to have a notebook fail to hibernate and drain two batteries because it cannot hibernate. I have both the built-in battery and the add-on slice battery.

The notebook appeared to get stuck in a loop where it would suspend and then when it had been inactive for a period of time it would attempt to hibernate, fail to do so with some sort of System API failure system tray balloon, and then suspend after a period of time. Rinse and Repeat. Uninstalling Toshiba's power management toolset didn't fix the hibernation problem, so I suspect it is another device driver that was causing me grief.

Anyway, when I returned home I had received an email from Microsoft informing me they had released Windows Vista RC1 for beta testing, so I repartitioned the HD of the notebook and installed Vista RC1, and I am impressed. My only complaint so far is that I still haven't gotten BitLocker working: I keep receiving an error stating that the BIOS wasn't able to pass along information to the MBR, even though I did upgrade the BIOS to 1.70. Oh well.

Toshiba published a set of Vista drivers for the M400 here:
Microsoft Windows Vista Drivers for the M400

I have only experienced the BSOD two times in a week, both times happened shortly after I installed the fingerprint reader drivers. I uninstalled the fingerprint reader drivers and haven't experienced a BSOD since.

The touchpad driver package really isn't Vista friendly. It adds startup references to the registry, and Windows Defender complains every time you reboot the box, asking if it is okay to start them up again. Either Microsoft needs to add them to the approved list, since I couldn't find a way to do it myself, or the ALPS touchpad people just need to add themselves via the startup group instead of the registry. The good news is that the touchpad works without the touchpad driver package so you can skip it if you want to.

Intel finally got around to getting video drivers for the M400 put into Windows so Aero Glass is supported the very first time you boot up. Hurray.

If you use the tablet features on the M400 you'll need to install the XP version of the rotation utility found here:
Toshiba Rotation Utility for Windows XP

All in all, I'm a pretty happy camper right now. To top it all off, hibernation works as intended.

----- Rom

Previous Articles:
Toshiba Portege M400, Windows Vista, and BOINC

Tuesday, October 3, 2006

Can you explain more on Average CPU efficiency and Result Duration Correction Factor? There seems to be some confusion about this, and little definite knowledge. For instance, some say a lower RDCF is better, others say an RDCF closer to 1.0 is best. Which is the truth?

CPU efficiency is the ratio of the CPU time a process received to the amount of wall clock time that has passed. It is the answer to the question of "In the last ten minutes or so, how much CPU did BOINC based science applications receive?" The thing to remember here is that the OS is constantly doing things in the background and each of those things eats a little bit of the CPU.
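
As a quick worked example of that measurement, here is a Python sketch. The function name and the sample numbers are hypothetical; the client's real bookkeeping (averaging over time and so on) is more involved than this.

```python
def cpu_efficiency(cpu_seconds, wall_seconds):
    """Fraction of the elapsed wall clock time that was spent on the CPU."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return cpu_seconds / wall_seconds


# Hypothetical sample: over a ten minute window the science application
# was actually on the CPU for nine minutes.
window_seconds = 10 * 60
efficiency = cpu_efficiency(9 * 60, window_seconds)
print(f"CPU efficiency: {efficiency:.2f}")
```

The missing minute is the background activity mentioned above: the OS, the virus scanner, and whatever else needed the processor during that window.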

Duration Correction Factor is a per-project value that measures the difference between the expected time to process a result based on the benchmark versus what it actually took. A score of 1.0 means that the benchmark and the application processing time are in sync. The lower the score, the greater the variance between what the benchmarks predict versus what it actually took to complete the result.

BOINC tries very hard not to ask for more work than it can actually process in a given period of time, so it tries to keep track of the machine overhead by the CPU efficiency score and Duration Correction Factor. Another thing to keep in mind is that memory speed plays a big part in the Duration Correction Factor. When you see similar processing times for a result for a 3.0GHz processor and a 2.0GHz processor it normally means that the 3.0GHz processor is running with memory that cannot keep up with the processor. Or that both processors are bottlenecked by the memory speed.
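
Here is a rough Python sketch of how two correction values like these could feed into sizing a work request. Everything in it is an assumption for illustration: I am treating the Duration Correction Factor as the ratio of actual to predicted time (so 1.0 means in sync), and the function names and sample numbers are made up. It is not the client's actual work-fetch code.

```python
def duration_correction_factor(actual_seconds, predicted_seconds):
    """Assumed definition: actual processing time over the benchmark estimate."""
    return actual_seconds / predicted_seconds


def corrected_wall_estimate(predicted_seconds, dcf, cpu_eff):
    """Scale the benchmark estimate by the DCF, then stretch it to wall clock
    time using the CPU efficiency."""
    return predicted_seconds * dcf / cpu_eff


def results_that_fit(window_seconds, predicted_seconds, dcf, cpu_eff):
    """How many results can be finished within the given wall clock window."""
    per_result = corrected_wall_estimate(predicted_seconds, dcf, cpu_eff)
    return int(window_seconds // per_result)


# Hypothetical numbers: the benchmark predicts one hour per result, the last
# result really took 75 minutes of CPU, and the app gets 75% of the CPU.
dcf = duration_correction_factor(actual_seconds=4500, predicted_seconds=3600)
per_result_seconds = corrected_wall_estimate(3600, dcf, cpu_eff=0.75)
one_day = 24 * 60 * 60
fits_in_a_day = results_that_fit(one_day, 3600, dcf, cpu_eff=0.75)
```

Without the corrections a naive client would think 24 one-hour results fit in a day; with them, this hypothetical machine only asks for 14.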

We haven't come up with a good solution for measuring the memory bandwidth problem yet. However, we are working on it.

BOINC version release notes do not seem as complete as they were before, or am I looking in the incorrect places?

You can check out the latest and greatest changes to BOINC at this web address:
http://setiathome.berkeley.edu/cgi-bin/cvsweb.cgi/boinc/

The file you'll want to look at is 'checkin_notes', which contains the latest changes made to the client and server packages.

You can see the check-in history for a specific branch by changing the tag specified near the bottom of the web page. The 5.6 branch tag is 'boinc_core_release_5_6'. If you want to see the changes for 5.4 you would use 'boinc_core_release_5_4' and on it goes.
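
The tag naming pattern is regular enough to write down as a one-liner. A tiny Python sketch, where the helper name is my own invention and the pattern is only inferred from the two tags quoted above:

```python
def release_branch_tag(major, minor):
    """Build a branch tag name following the pattern of the two tags above."""
    return f"boinc_core_release_{major}_{minor}"


tag_5_6 = release_branch_tag(5, 6)
tag_5_4 = release_branch_tag(5, 4)
```

Whether a tag exists for any other version is something you would have to confirm in the tag list on the web page itself.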

Any plans on releasing the full minutes of what went on (when you're back)? I read up on the first one but was a bit disappointed with the info on show; it only gave a brief overview of what went on.

You can find the workshop proceedings here:
http://boinc.berkeley.edu/ws_06.php

How was the vacation?

I had a blast. I met a bunch of great people. I'm looking forward to going again next year.

wxWidgets 2.7 has been released. Is this going to be used in 5.6 or is it too late?

Too late for this release.

I've seen others try, and have tried myself, to compile 5.4.x using Microsoft's free Visual Studio Express 2005 editions (with all the bits and bobs needed: wxWidgets, the SDK, etc.). Errors show up and it does not compile. Is this fixed in 5.6, given this would probably be the major environment used by people trying to develop BOINC under Windows (since it's free)?

The BOINC DLL relies upon the ATL libraries which are not included in the express editions of the MS Development tools. I'm not sure if this is going to change in the future or not. I suspect that if we can incorporate a torrent library that doesn't use COM/DCOM on Windows then I'll invest more time into removing the need for ATL/COM/DCOM so that the DLL can be built with VS Express.

On a side note, I do not believe the express editions of the Visual Studio toolset contain the optimizing compilers or linkers. You might have to upgrade for those, or use the GCC toolset.

Would it be possible for you (since afaik you compile the final Windows releases) to put up instructions on how you compile BOINC? This may help a lot of people who just wish to dabble.

I'll see what I can do.

How is the progress on low-latency-computing? Which projects expressed their interest in this feature?

I believe this feature was put in for a hospital that wished to be able to process MRI images faster than with their current method. I'm not sure this feature will be used by a public project in the near future.

 

 

To submit questions for next week just click on the comments link below and submit your question.

Thanks in advance.

----- Rom