IO performance monitoring

If you can measure it, you can manage it. I’m a measurement, monitoring, analysis and statistics addict :-)

That’s why I’ve always wanted to be able to monitor the IO load of the Linux systems I’ve worked with. While there are well-established monitoring and accounting tools for CPU usage – both system-wide and per-process – there were virtually none for the IO subsystem until very recently.

Two of the more important reasons why I’d like to see better IO load monitoring are:

  • Mechanical drives have high latency. In general, a system copes with an overloaded CPU much better than with overloaded disks. For example, if a load average of 10 is caused by CPU-bound processes, the system feels much more responsive than under the same load caused by IO-bound processes. A CPU load average of 10 on a server with two processors isn’t very noticeable. At the same time, an IO load average of 10 on the same system with 2x 7200 rpm disk drives in RAID1 feels very sluggish.
  • Hard disk drives have failed to keep up with the performance improvements in microprocessor technology. Disk capacity has grown quite well, but speed and especially access times are far behind. IO performance is the most common bottleneck and the most precious resource in today’s systems. Or at least in the systems I work with :-)
    At the beginning of my Linux career, ten years ago, there was only one metric – blocks read/written. And that was it. You could only guess how busy the disk was by looking at the load average and checking how many processes were stuck in D state. I wish there were separate load average readings for CPU and IO…
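The "count the processes stuck in D state" check can be sketched in a few lines of Python. This is only an illustration: the helper names `process_state` and `count_states` and the sample lines are made up here, and the sample is shortened to the first few fields of a `/proc/<pid>/stat` line.

```python
from collections import Counter


def process_state(stat_line: str) -> str:
    """Return the one-letter state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' instead of on whitespace.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def count_states(stat_lines):
    """Count how many processes are in each state (R, S, D, ...)."""
    return Counter(process_state(line) for line in stat_lines)


# Hypothetical, shortened sample lines (real lines have many more fields).
sample = [
    "1 (init) S 0 1 1 0 -1",
    "812 (kjournald) D 2 0 0 0 -1",
    "4021 (my (odd) cmd) D 1 4021 4021 0 -1",
    "4100 (bash) R 4000 4100 4100 34816 4100",
]

states = count_states(sample)
print("processes in D state:", states["D"])
```

On a live Linux system the lines would come from reading `/proc/<pid>/stat` for every numeric directory under `/proc`. As noted above, the count only says that something is waiting on IO, not who caused it.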

    At some point (Linux 2.5 times?) extended statistics were added, and things like queue size and utilization in % became available. Much better. Still, it was hard to tell who exactly was causing the load. On a multi-user system all you can see is multiple processes in D state. It’s unclear whether these are the ones causing the IO havoc or just victims waiting on an already overloaded IO subsystem.
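As a rough sketch of where the utilization percentage comes from: the kernel exposes, per device, the number of milliseconds spent doing IO (the 13th whitespace-separated column of `/proc/diskstats`), and utilization is the growth of that counter divided by the sampling interval. The function names and the two sample lines below are invented for illustration.

```python
def io_ticks_by_device(diskstats_text: str) -> dict:
    """Map device name -> milliseconds spent doing IO."""
    ticks = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or truncated lines
        # fields: major, minor, name, then the per-device counters;
        # index 12 is "time spent doing I/Os (ms)".
        ticks[fields[2]] = int(fields[12])
    return ticks


def utilization_percent(before: dict, after: dict, interval_ms: float) -> dict:
    """Percentage of the interval during which each device was busy."""
    return {
        dev: 100.0 * (after[dev] - before[dev]) / interval_ms
        for dev in after
        if dev in before
    }


# Two made-up samples taken 1000 ms apart.
sample_before = "   8       0 sda 1000 10 80000 5000 2000 20 160000 9000 0 7000 14000"
sample_after = "   8       0 sda 1100 10 88000 5600 2300 25 184000 9900 3 7850 15500"

util = utilization_percent(
    io_ticks_by_device(sample_before),
    io_ticks_by_device(sample_after),
    interval_ms=1000,
)
print(util)
```

This is a per-device number, which is exactly the limitation described above: it tells you the disk is busy, not which process keeps it busy.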

    In Linux 2.6.20 another step was made by adding per-process IO accounting. I was very excited when I heard about this feature and eager to try it. It turned out that this per-process IO accounting counts only the bytes read and written by a process. Not much better. A modern 7200 rpm SATA drive is only capable of about 90 random IOPS, so it could be choked by a pathetic 90 bytes per second if every byte needs its own seek…
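A back-of-the-envelope sketch of why byte counters say so little. The figure of roughly 90 IOPS follows from the rotational speed plus an average seek time; the 7 ms seek used below is an assumed, typical-looking value rather than a measurement, and the sample `/proc/<pid>/io` text is made up.

```python
def estimate_random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough random-IO ceiling of a single mechanical drive."""
    rotation_ms = 60_000.0 / rpm                  # one full revolution
    avg_rotational_latency_ms = rotation_ms / 2   # on average half a turn
    return 1000.0 / (avg_seek_ms + avg_rotational_latency_ms)


def parse_proc_io(text: str) -> dict:
    """Parse the 'key: value' counters found in /proc/<pid>/io."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)
    return counters


# Assumption: 7 ms average seek for a 7200 rpm SATA drive.
iops = estimate_random_iops(rpm=7200, avg_seek_ms=7.0)
print(f"estimated random IOPS: {iops:.1f}")

# Made-up sample of a process's IO counters.
sample_io = """rchar: 323934931
wchar: 323929600
syscr: 632687
syscw: 632675
read_bytes: 0
write_bytes: 323932160
cancelled_write_bytes: 0
"""
counters = parse_proc_io(sample_io)
print("bytes written to storage:", counters["write_bytes"])
```

The byte counters carry no information about how many seeks those bytes cost. A process issuing tiny scattered requests can hit the IOPS ceiling while its `read_bytes` and `write_bytes` barely move.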

    Then there are the atop patches. These add a per-process IO occupation percentage. That sounds great but… when you have a lot of small random writes, they go to the page cache first and only later are periodically flushed to the physical device. This is a performance feature and is generally a (very) good thing, as it allows the elevators to group writes together etc. Unfortunately, atop ends up attributing all these writes and the resulting IO utilization to pdflush and kjournald.

    OK, let’s see what the state of affairs is in some other operating systems. Everybody talks about dtrace, so it’s time to check it out. Linux doesn’t have dtrace, at least not yet; there is work in progress by Paul Fox. Linux does have SystemTap, but it doesn’t look very mature to me. Anyway, there are a number of operating systems that support dtrace: as it was created by Sun engineers, Solaris and OpenSolaris come first. Then there are the FreeBSD port and Apple’s OS X. I’m familiar with FreeBSD, but I wanted to check the current state of the OpenSolaris kernel. On the other hand, I wanted to keep the learning curve less steep, so I opted for Nexenta Core 2 RC1. Nexenta combines a GNU userspace (Debian/Ubuntu) with the OpenSolaris kernel.

    Download, install – everything was smooth. The install defaulted to root fs on ZFS. Good! I was thinking about playing with ZFS these days anyway.

    And the moment of truth:

    I started dbench -S 1, ran dtrace -s iotop.d, and here’s the output:

      UID    PID   PPID CMD              DEVICE  MAJ MIN D   %I/O
        0      0      0 sched            cmdk0   102   0 W     17
    

    Hm, that looks somewhat familiar. I see a pattern there. Isn’t sched the ZFS cousin of pdflush/kjournald? Oh, well it is: http://opensolaris.org/jive/thread.jspa?threadID=39545&tstart=285

    No luck… dtrace’s iotop works with UFS but has a problem with ZFS.

    It turns out that proper IO monitoring is a very tricky business.

Noise canceling

Wearing noise canceling headphones in a noisy data centre.

Listening to industrial/ebm music.

Sounds weird.

Tasks of the Day

So, as I was not very happy with how my projects were advancing, I had to research and implement various systems to help me move forward. After much fiddling, GTD was the first breakthrough. It helped me get my current affairs in order and gave me the peace of mind to move to the upper levels. I identified my big goals and sought to align my actions with them. I learned to distinguish between a task’s urgency and its importance. Things gradually started to get better, but still not good enough. I was still missing something… and it turned out that my daily routine was weak and allowed me to procrastinate :-) Basically, I didn’t have a daily routine. I used to just open the task list and diligently delay the more difficult-looking tasks until they fell off the current day and were left for the next one. Probably these are just my personal flaws, but the good news is that there are fixes. The concept of the Most Important Tasks of the day saved the day.

I had been making daily task lists for some time when I stumbled upon the term MIT on the Zen Habits blog and adopted it (I’m not a native English speaker).

Essentially, you need to pick a handful of tasks to do for the day. These are the Most Important Tasks. Of course you may do other things as well, but you should throw all of your energy at completing these MITs. Make a separate list with these tasks and stick to it.

Start with the most difficult or daunting task. This is important. Direct your efforts at the task that takes the most energy to accomplish. Keep an eye on the other important and urgent tasks, but you have a good chance of doing these later if they are easier and not as scary as the MITs. Thanks to my GTD-based approach, my tasks already have attributes like Energy (mental and physical), Importance, Urgency, Context and Time (ETA), so it’s rather easy for me to sort and choose.
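For the programmers among the readers, the sort-and-choose step can be sketched as below. Everything here is illustrative: the `Task` class, the 1 to 5 scales, the limit of three and the sample tasks are assumptions, not a description of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    energy: int      # hypothetical scale: 1 (trivial) .. 5 (daunting)
    importance: int  # hypothetical scale: 1 .. 5
    urgency: int     # hypothetical scale: 1 .. 5


def pick_mits(tasks, limit=3):
    """Choose the most important tasks, then order them hardest first."""
    # Step 1: keep only the top `limit` tasks by importance, then urgency.
    chosen = sorted(
        tasks, key=lambda t: (t.importance, t.urgency), reverse=True
    )[:limit]
    # Step 2: do the most energy-hungry task first, while still fresh.
    return sorted(chosen, key=lambda t: t.energy, reverse=True)


tasks = [
    Task("Write capacity report", energy=5, importance=5, urgency=3),
    Task("Reply to vendor email", energy=1, importance=2, urgency=4),
    Task("Fix backup script", energy=4, importance=4, urgency=4),
    Task("Renew domain", energy=1, importance=4, urgency=5),
    Task("Tidy desk", energy=1, importance=1, urgency=1),
]

mits = pick_mits(tasks)
for position, task in enumerate(mits, start=1):
    print(position, task.name)
```

Context and Time are left out to keep the sketch short; in practice they would filter the candidates before the sort.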

It’s important to start with the tasks that require the most energy because most people’s energy levels drop as the day advances. If you delay the MITs too much, you will not have the energy to start or complete them.

For example, I discovered that my ability to concentrate varies greatly throughout the day. Even though I might think that in the evening I’m at the same energy level as in the morning, I can easily prove myself wrong. I just have to try to focus on something difficult while there is some distraction, say the TV. I ignore distractions much more easily in the morning, and it’s nearly impossible for me to do the same in the evening.

Because the MITs list is small, it allows for better focus. Most people have tens if not hundreds of tasks in their lists (or worse – in their heads). If you keep this enormous pile of tasks in front of you, it easily makes you feel overwhelmed and hesitant to start working on it.

The process of choosing tasks for the MITs list is essentially a planning process. The usual disclaimer about plans applies: circumstances may force you to abandon your plan, but the planning process is still important. Planning forces you to do the required thinking. As the saying goes: the failure to plan is a plan for failure. By keeping the more important tasks first, you have the chance to complete them even if you’ve underestimated how much time they would take.

And finally, the MITs list is a commitment. You bind yourself to a course of action.

For example, here’s how I do it:

First I identify the hard landscape for the day. Are there any tasks that must be done at a specific time? Any meetings? At the very least there’s your lunch, and it’s definitely important.

After putting my hard landscape on the calendar, I’m ready to distribute the other tasks between the fixed ones. I keep in mind the context, energy levels, urgency and importance. You can’t do a task if you are not in the right context. And it’s a whole lot better to do the hard work early in the day, when you are still fresh. Don’t overcommit! You can always pick some more things to do if you finish early, but people tend to underestimate the time required to complete a task. And in my case new things pop up every now and then during the day.

It sounds like common sense, but unfortunately it took me some time to figure it out and, more importantly, to establish the habit. I was underestimating how important it is until I forced myself to include it in my daily routine and never miss it. Almost every time I skip it, I get sloppy results. Of course GTD still applies: if you find yourself stuck somewhere or your energy level drops dramatically for some reason, you can always pick another Next Action that matches your current context or energy level.
