disk, disk, disk

I started adding 165 TB of disk to one of our clusters today. This is what that looks like, 55 disks of 3 TB each:

[image: 165 TB]

The packaging was not too great; while all disks were well packaged individually, the big boxes that contained the individual drive boxes were flimsy. As a consequence, one of the disks got rather damaged (the one on the right):

[image: damaged disk]

I don’t know what it got hit with, but it must have been a pretty serious blow. The aluminium enclosure of the drive is severely dented and even cracked; the white line in the image below is an actual crack in the metal:

[image: cracked disk]

Back in October 2009 I added 130 TB of disk to another cluster, which looked like this, prior to install:

[image: 130 TB]

That was 65 WD2002FYPS drives of 2 TB each.

So this time around – almost 18 months later – we get 27% more capacity using 15% fewer drives. Got to love the computer industry and the progress it makes.
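
Those percentages are easy to check. A quick Ruby sketch using only the drive counts and sizes mentioned above (the variable names are just for illustration):

```ruby
# Drive counts and per-drive sizes from the two purchases described above.
old_drives, old_tb_each = 65, 2   # October 2009: 65 x WD2002FYPS (2 TB)
new_drives, new_tb_each = 55, 3   # this time: 55 x 3 TB

old_capacity = old_drives * old_tb_each   # 130 TB
new_capacity = new_drives * new_tb_each   # 165 TB

# Relative change in capacity and in number of drives, in percent.
capacity_gain   = (new_capacity - old_capacity) * 100.0 / old_capacity
drive_reduction = (old_drives - new_drives) * 100.0 / old_drives

puts "capacity: +#{capacity_gain.round}%"
puts "drives:   -#{drive_reduction.round}%"
```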

Posted in Hardware, Sysadmin

idle power draw of modern Opteron CPUs

I’ve been curious for a while about how much power Opteron CPUs draw when idle, so I set aside a bit of time to do some measurements. I used a Supermicro 1U system with a redundant power supply. The motherboard model is H8DGU-F. The system has 32GB of DDR3 ECC RAM, and two Intel X25-M 120GB SSDs. There are two Opteron 6128 CPUs installed. These Opterons have 8 cores each, and they run at 2.0GHz. These are the CPU power specs:

  Average CPU Power 80W
  Thermal Design Power (TDP) 115W

The ‘Average CPU Power’ is based on ‘average’ use, which is explained on Wikipedia.

According to Wikipedia, the Thermal Design Power is the maximum power consumption for thermally significant periods running worst-case non-synthetic workloads (cf. this article). If we assume that the bulk of the electrical power consumed by a CPU is converted into waste heat, then the TDP can be a reasonable approximation for the amount of electrical power a CPU would consume under a worst-case, real-world load.

I used cpuburn to generate such a load. There was no IO load on the system during the tests. I measured power draw with an off-the-shelf Kill-a-watt, so these results should be taken with a grain of salt.

  16 cores idle 145W (153VA)
  8 cores loaded on 1 cpu 215W (221VA)
  8 cores loaded spread over 2 cpus 235W (243VA)
  14 cores loaded 277W (285VA)
  16 cores loaded 290W (297VA)

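The data indicates that the difference between idle and full-load power consumption for one CPU is 70 to 75W: loading all 8 cores of one CPU adds 70W (215W vs. 145W), and loading all 16 cores adds 145W (290W vs. 145W), or 72.5W per CPU.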

If we assume the power consumption under full load is 115W (the TDP for the processor), then idle power consumption would be 40 to 45W per CPU. That would put idle power consumption at 35-39% of its TDP for this particular CPU.
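
The same estimate as a small Ruby sketch. It uses the wall-power readings from the table above and rests on the same assumption: that a fully loaded CPU draws exactly its TDP. Power supply efficiency and the extra draw of RAM and fans under load are ignored, so treat the result as rough.

```ruby
# Wall-power readings (watts) from the measurements above.
idle_w      = 145
one_cpu_w   = 215   # 8 cores loaded on 1 CPU
both_cpus_w = 290   # 16 cores loaded
tdp_w       = 115   # TDP of one Opteron 6128

# Extra wall power per fully loaded CPU, estimated two ways.
delta_one  = one_cpu_w - idle_w             # 70 W
delta_both = (both_cpus_w - idle_w) / 2.0   # 72.5 W

# Assumption: a fully loaded CPU draws exactly its TDP, so whatever is
# left after subtracting the idle-to-load delta is the idle draw.
def idle_estimate(tdp_w, delta_w)
  tdp_w - delta_w
end

[delta_one, delta_both].each do |delta|
  idle_cpu_w = idle_estimate(tdp_w, delta)
  share = idle_cpu_w * 100.0 / tdp_w
  puts format("delta %.1fW -> idle %.1fW (%.0f%% of TDP)", delta, idle_cpu_w, share)
end
```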

Posted in Environment, Sysadmin

acts_as_paranoid and acts_as_versioned on Rails 3

A few years ago, I described how to combine acts_as_paranoid and acts_as_versioned in order to make deleted records end up in your versioning tables.

In order to do the same thing under Rails 3, I had to make a few adjustments. First of all, you need the rails3_acts_as_paranoid gem, which is a total rewrite of acts_as_paranoid for Rails 3. Add these lines to your Gemfile:

gem 'rails3_acts_as_paranoid'
gem 'acts_as_versioned'

Then put a file in config/initializers with these contents:

module ActiveRecord
  module Acts
    module Versioned
      def acts_as_paranoid_versioned(options = {})
        acts_as_paranoid
        acts_as_versioned options

        # Override the destroy method. We want deleted records to end up in the versioned table,
        # not in the non-versioned table.
        self.class_eval do
          def destroy()
            with_transaction_returning_status do
              run_callbacks :destroy do
                # call the acts_as_paranoid delete function
                self.class.delete_all(:id => self.id)

                # get the 'deleted' object
                tmp = self.class.unscoped.find(id)

                # run it through the equivalent of acts_as_versioned's
                # save_version(). We used to call that function but it is a
                # noop when @saving_version is not set. That only gets done in
                # a protected function set_new_version(). Easier to just
                # replicate the meat of the save_version() function here.
                rev = tmp.class.versioned_class.new
                clone_versioned_model(tmp, rev)
                rev.send("#{tmp.class.version_column}=", send(tmp.class.version_column))
                rev.send("#{tmp.class.versioned_foreign_key}=", id)
                rev.save

                # and finally really destroy the original
                self.class.delete_all!(:id => self.id)
              end
            end
          end
        end

        # protect the versioned model
        self.versioned_class.class_eval do
          def self.delete_all(conditions = nil); return; end
        end
      end
    end
  end
end

I wonder if there is a more elegant way to achieve this…

Note: code updated on 2011-05-28 to make sure :dependent => :destroy on has_many associations does the right thing.

Posted in Rails

compression

Before:

-rw-r--r-- 1 root  root  1.1G 2010-10-31 20:19 10125-127-2010-10.error

After:

-rw-r--r-- 1 root  root   11M 2010-10-31 20:19 10125-127-2010-10.error.bz2

Bzip2 reduced the file to 1% of its original size. Not bad!

Posted in Sysadmin

resistor captcha

Adafruit Industries uses an awesome captcha. For an example, look at the Kinect bounty page (scroll all the way to the bottom).

[image: resistor captcha]

Posted in Hardware

64K hours

Some disks last a long time. This is an old IBM IDE drive (IC35L040AVVA07-0).

smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%        66         -
# 2  Extended offline    Completed without error       00%     65434         -
# 3  Extended offline    Completed without error       00%     65266         -

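The LifeTime column of the self-test log has wrapped around: the most recent test is logged at 66 hours, right after tests at 65266 and 65434 hours, which suggests a 16-bit counter rolling over at 65536. Interestingly, the Power_On_Hours field did not wrap. Bug in smartctl? Bug in the drive firmware?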

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   095   095   060    Pre-fail  Always       -       458761
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       210
  3 Spin_Up_Time            0x0007   105   105   024    Pre-fail  Always       -       160 (Average 154)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       27
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   111   111   020    Pre-fail  Offline      -       43
  9 Power_On_Hours          0x0012   091   091   000    Old_age   Always       -       65616
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       27
192 Power-Off_Retract_Count 0x0032   099   099   050    Old_age   Always       -       1387
193 Load_Cycle_Count        0x0012   099   099   050    Old_age   Always       -       1387
194 Temperature_Celsius     0x0002   189   189   000    Old_age   Always       -       29 (Lifetime Min/Max 21/39)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
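
If the LifeTime stamp in the self-test log really is a 16-bit counter (an assumption, based on the jump from 65434 to 66 hours), the real hour count can be reconstructed from the Power_On_Hours value. A minimal Ruby sketch; unwrap_lifetime is a made-up helper, not something smartctl provides:

```ruby
# Assumption: the self-test log stores its LifeTime stamp in 16 bits,
# so it wraps at 2**16 = 65536 hours.
WRAP = 2**16

# Hypothetical helper: reconstruct the full hour count of a wrapped stamp,
# using the (unwrapped) Power_On_Hours value as a reference. It picks the
# largest candidate that does not lie in the future, and assumes the
# logged value is not larger than power_on_hours.
def unwrap_lifetime(logged_hours, power_on_hours)
  wraps = (power_on_hours - logged_hours) / WRAP
  logged_hours + wraps * WRAP
end

power_on_hours = 65_616   # raw value of attribute 9 above
[66, 65_434, 65_266].each do |logged|
  puts "#{logged} -> #{unwrap_lifetime(logged, power_on_hours)}"
end
```

Under that assumption the most recent self-test ran at 65602 hours, 14 hours before this output was taken.
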
Posted in Sysadmin

microsoft discovers remote attestation

Via Slashdot: Microsoft’s corporate VP for trustworthy computing, Scott Charney, has published a position paper that boils down to remote attestation: let ISPs cut off internet access for computers that are not deemed free of malware.

So… how would this work? Presumably the computer would run some code that is not under the control of the user/owner of the machine, and protected by the TPM. That code would then check whether the machine is free of malware or not, somehow. I have no idea how that could possibly be foolproof, but let’s assume for a moment there is a way to do this.

First problem: your computer would have to run code that most likely comes without source, is hard or impossible to inspect, and cannot be changed.

Let’s say for the sake of argument that this validation code is somehow optional. Or perhaps you are an enterprising person, and you’ve managed to kick this stuff off your computer (TPM-ectomy, anyone?). Next problem: now you can’t validate your computer with your ISP to prove that it is free of malware. To do that, you need access to the secret encryption key buried in the TPM.

This is called remote attestation: the machine(s) your computer communicates with can see information about your computer (say, what operating system you run, and at what patch level) and because that data is signed or encrypted by your TPM chip, you cannot change it.

Note that it’s already pretty easy for remote machines to see what (version of) an operating system a computer runs, for instance with TCP/IP fingerprinting, but that is easy to fake.
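
The difference between a claim that is easy to fake and one that is signed with a key the owner cannot reach can be sketched in a few lines of Ruby. This is only an illustration of the signing idea, not of how a real TPM or any actual attestation protocol works: the key below is an ordinary in-memory RSA key standing in for the one buried in the chip, and the report format is made up.

```ruby
require 'openssl'

# Stand-in for the key sealed inside the TPM. On real hardware the private
# half never leaves the chip; here it is an ordinary in-memory RSA key.
tpm_key = OpenSSL::PKey::RSA.new(2048)

# The remote party (ISP, bank, ...) only ever sees the public half.
verifier_pubkey = tpm_key.public_key

# Made-up report format describing the state of the machine.
report    = 'os=ExampleOS;patch_level=42'
signature = tpm_key.sign(OpenSSL::Digest.new('SHA256'), report)

# The verifier accepts a report only if the signature matches it.
def trusted?(pubkey, report, signature)
  pubkey.verify(OpenSSL::Digest.new('SHA256'), signature, report)
end

# The owner edits the report but cannot produce a new signature,
# because the private key is out of reach.
forged = 'os=ExampleOS;patch_level=99'

puts "original report accepted: #{trusted?(verifier_pubkey, report, signature)}"
puts "forged report accepted:   #{trusted?(verifier_pubkey, forged, signature)}"
```

An unsigned claim, like whatever TCP/IP fingerprinting infers, has no such check attached to it, which is why it can be faked.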

Remote attestation is the real danger of ‘trustworthy’ computing. They can try to put all sorts of things in the hardware; if people have physical access, someone will find a way around it. But if they make it impossible to network your computer without an operational TPM chip then we might as well kiss all our free software and free hardware goodbye. It won’t be any good to run a computer with GNU/Linux, if we can’t go online with it… Or if our online banking refuses to talk to our computer because our machine is not deemed to be running a fully patched version of Windows.

Given that this position paper comes from Microsoft, it’s not too hard to see where they want to go. Microsoft would love to be in a position where ISPs and banks require certain patch levels of its software. Can you imagine a better way to force people to keep upgrading their Windows licenses? Or to force people to stop using free operating systems?

I have a better idea to combat the malware problem, Mr. Charney. Why don’t we ask people to stop using Windows? Without Windows, the malware/botnet problem would not be nearly as bad as it is today.

Posted in Free Software/Open Source, Hardware

Intel selling crippled CPUs

Via boingboing.net: Intel is now selling crippled CPUs that can be ‘upgraded’ through the purchase of scratch cards (!) with a code. That code can be entered in the BIOS of the computer, thus unlocking additional horsepower.

I’m running out of CPU – quick, head over to the corner store for an Intel scratch card!

Is this an alternate universe? And, how long before they sell cards that will unlock extra features, but only for a limited time?

I guess now we know at least one concrete reason why Intel does not like coreboot. You can’t restrict people like this when their computer does not have a proprietary BIOS.

Posted in DRM, Hardware

View from my ‘office’

[image: Lac Leman]

Sadly, this is just for a few weeks…

Posted in Personal

failed US broadband policy – excellent lecture by Larry Lessig

There is a good article over at PCWorld titled Why America’s Telecom System Stinks. It refers to a lecture by Larry Lessig which you can view in its entirety here. The first 25 minutes or so deal with broadband policy.

I wish lawmakers took the time to view that lecture. Maybe then they would come to understand the problem. With a few exceptions, it seems that most people high up in government do not have the faintest clue that broadband policy in the US is fundamentally broken.

It’s pretty bad when a big honcho from the FCC publicly states that he is not interested in returning to a competitive broadband landscape by reinstituting unbundling because it would result in lengthy legal battles with the big telcos. Can you say regulatory capture?

Posted in Broadband