Upgrading to 14.04.1 LTS or If It Ain’t Broke, Don’t Fix it

I should have left my Ubuntu 12.04 LTS well alone. Yes, it is over 2 years old, but it worked rock solid and I’ve been good about installing updates on it.

I don’t know what devil rode me last Friday, but when the system informed me that an upgrade to 14.04.1 LTS was available, I went ahead and gave it a try. I should have known better.

When the upgrade finished many hours later, POP access to the dovecot server was no longer working and rsync using modules was broken (rsync daemon not running). I had accepted all the defaults to keep existing configuration files during the upgrade. It turned out that dovecot needed some changes for namespace inbox:

namespace inbox {
...
inbox=yes
}

The rsync daemon needed to be manually enabled again via

sudo vi /etc/default/rsync

RSYNC_ENABLE=true

Hopefully I won’t stumble across more problems that will need fixing, but the experience was a reminder not to needlessly mess with a working system.

Adding sudo on Debian Linux

For a long time I had been using the sudo command on Ubuntu and other Linux versions, but my main server did not have it installed. I always had to use ‘su’ with the root password to do be able to do administrative jobs. It turns out it was really easy to fix. Simply follow these steps as root (using your actual user name in place of jsmith):

apt-get install sudo
adduser jsmith sudo

This installs the sudo package, creates a sudo user group and the /etc/sudoers configuration file. It then adds your user to the user group sudo, which per the default /etc/sudoers file is permitted to run sudo.

Note that these changes do not take effect for any ssh sessions already open. If you have a running session logged in as the user you just added to the sudoers list and you attempt to use sudo from there, it will ask for your password and then fail with this error message:

jsmith is not in the sudoers file. This incident will be reported.

The fix is simple: log out and log back in again. On the new login, the new configuration will be picked up and you will be able to use sudo as intended.

If you would like to do multiple commands from sudo like you could from su, it’s very easy. Simply use sudo to launch a copy of bash and exit after you’re done:

sudo bash

Acer One D260 system restore

The hard disk in my wife’s Acer One D260 netbook got damaged. A new hard disk is about a quarter the price of a new netbook, so I wanted to install a new drive. Like with most PCs these days there aren’t any Windows install DVDs included.

The netbook came with Windows 7 Starter, which we needed to somehow install on the new hard disk. Fortunately, the damaged hard disk was still limping along enough to use the Acer eRecovery system to create two Recovery DVDs. These should allow restoring the initial system state to a hard disk in the machine, wiping all the data on the drive.

To replace the hard disk, I had to undo seven clips around the edge of the keyboard, lift off the keyboard and disconnect the keyboard ribbon cable to the motherboard connector. Then I needed to undo 4 screws underneath and push through, to pop out the cover on the bottom of the machine. This opened access to the single memory slot and drive cage.

The 1 GB memory module on the motherboard can be replaced with a 2 GB PC3-8500 1066MHZ DDR3 module available for about $20. This is a wortwhile investment and I already have the module on order.

I replaced the damaged 250 GB WD Scorpio Blue drive with a spare 500 GB drive (available new for about $60-$80). Then I closed the cover and reinstalled the screws and then the keyboard.

With the new drive it was possible to boot off the first Recovery DVD using a USB DVD drive. The eRecovery software copied data from both DVDs to the hard disks and then rebooted. However, that reboot failed because the new drive did not yet have a Windows Master Boot record (MBR) on it. You can install an MBR from within Windows, but not from the bootable eRecovery DVD. So I had a chicken and egg problem.

I overcame this hurdle by booting off a Ubuntu Live DVD (32 bit), installing the ‘lilo’ package and telling it to install the Linux equivalent of Microsoft’s MBR code:

sudo apt-get install lilo
sudo lilo -M /dev/sda mbr

At the next attempt to boot off the hard disk, Windows started installing its components and drivers and launched into its initial configuration, just like the first time we had unboxed the machine more than two years ago. So we are back to a working Winmdows 7 machine!

Thank you, Linux — you saved my day again! 🙂

Western Digital 4 KB sector drive alignment for Windows XP and 2003 server

If your existing Windows XP or Windows 2003 Server machine needs a new C: drive, there are ways of upgrading to one of the latest drives without a complete software reinstall, but you may encounter some stumbling blocks due to the new Advanced Format technology, which uses 4 KB sectors.

When one of my PCs developed hard disk problems and I had to upgrade one of its drives, I also checked out my other machines. I found the C: drive of a Windows 2003 Server machine was about to fail. Windows 2003 is basically the server version of Windows XP, with which it shares most components. I opted for a 1 TB WD Red drive (WD10EFRX) by Western Digital, since these drives are designed for 24/7 operation, primarily for use in Network Attached Storage (NAS) appliances (desktop drives are only designed for an 8 hours on, 16 hours off use pattern).

I did not want to reinstall everything from scratch on that machine, so I used a Linux boot DVD and the GNU dd utility to mirror the failing drive onto the new WD Red drive (“sudo dd if=/dev/sda of=/dev/sdb”). As a result, all the partitions were in the same place and the same size as on the old drive, a Seagate Barracuda 7200.11 320 GB. The partitions on the old drive had not been aligned on 4 KB boundaries as is recommended to get decent performance on modern Advanced Format drives, so I needed to run an align tool to move the partition to the proper place. Western Digital offers one free to its customers, so that should be easy then, right?

No quite. I encountered all the troubles described by others in this thread: Basically, the download link for the WD Align tool (AcronisAlignTool_s_e_2_0_111.exe) takes you back to the same page, over and over, without error message. It turns out that you need to be registered and logged in to the WD site for the download link to do anything. You need to register both your contact details (name, e-mail address, postal address, phone number) and your hard disk’s serial number. For the latter I had to shut down the machine again and take out the drive once more to take a look, because the number is not printed on the cardboard box, only on the drive itself.

Once I registered my new drive, a download link did appear next to the registered product, but from it I found I could only download Acronis True Image and not the Acronis Align Tool (Advanced Format Software, WD Align). The WD Red series drives are all Advanced Format Drives, as is pretty much every drive made since 2011, but WD say it is designed for NAS use and hence don’t see the need for a fix for what they see as a Windows XP problem.

Various people online recommended a download site in Ukraine that apparently offers a copy of that program, but if you’re downloading from sites like that you risk installing malware on your computer. Beware!

There is a safer solution. I had to register another Western Digital drive, an old WD10EARS to get a usable download link for Advanced Format Software. If you don’t happen to have one lying around, a Google image search for WD10EARS will show you many photographs of disk drives with clearly readable serial numbers on the label. And apparently, these serial numbers will do the trick! 😉

After I downloaded the software, I ran it to make a bootable CD (it also seems to be Linux-based), booted and ran it and 1 hour and 30 minutes later my C: partition was showing up as properly aligned.

I can understand that Western Digital wants to restrict the use of licensed Acronis software to its own customers, denying other brands a free ride. However, the hoops it is making people jump through to be able to use one of their new drives as an upgrade to an existing Windows XP machine is just ridiculous. If a login is required to do the download, it should clearly say so. And if a drive uses 4 KB sectors (Advanced Format), its serial number should qualify you for the download. There are millions of existing XP users out there still and many will need new hard disks before they need a new computer.

Upgrading to a Western Digital WD20EFRX hard disk

All hard disks will die, sooner or later. They only way to avoid that is to retire a drive early enough. Often I upgrade drives because I run out of disk space, and migrate the data to a bigger drive. However, this times it looks like one of my drives is about to die.

Over the last couple of months, one of my PCs that is processing data 24/7 has been seizing up periodically, so I was starting to get suspicious about its hard drives (it has two of them). This week the Windows 7 event viewer reported that NTFS had encountered write errors on the secondary drive. It’s a Samsung SpinPoint F2 EG (Samsung HD154UI, 1.5 TB) which basically has been busy non stop for over three years.

I installed smartmontools for Windows and it showed errors:

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 099 065 051 Pre-fail Always - 5230
(...)
13 Read_Soft_Error_Rate 0x000e 099 065 000 Old_age Always - 5223
(...)
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 12379
(...)
197 Current_Pending_Sector 0x0012 099 099 000 Old_age Always - 24

“Reported_Uncorrect” are fatal errors and “Current_Pending_Sector” are bad sectors the drive wants to replace with spare sectors as soon as it can. Neither is a good sign. So I have ordered a new drive, started a backup to another machine and will replace the drive with a new disk that I have ordered from Amazon.

The new drive is a 2 TB Western Digital WD20EFRX, which is part of WD’s “Red” series. These drives are specifically designed for 24/7 operation (as opposed for 8/5 office computers). The drive is 0.5 GB bigger, which is just as well as the old drive was getting close to filling up. Gradually I will be moving my processing to an Ubuntu server, which I already use as my main archive machine with a RAID6 drive array.

Good bye, Dennis Ritchie!

Back in the early 1980s I learnt programming in C by reading “Kernighan and Ritchie”, as everyone around me called this book then: “The C programming language” by Brian Kernighan and Dennis Ritchie.

It is no exaggeration to say that C and its derivatives are to computers what hydrogen is to the universe.

Dennis Ritchie, who passed away today at the age of 70, was a co-creator of both the C programming language and of the Unix operating system, after which open source Linux is modelled today. Mac OS X and iOS are direct descendants of Unix (NetBSD), while Android, which runs on millions of smart phones, is based on Linux. Virtually every operating system that matters these days (including all versions of Microsoft Windows) is written in C or C++ or another C-derived language.

Dennis Ritchie may not have become as much of a household name as Steve Jobs, but the software he created probably brought about much more fundamental changes than anything Steve Jobs did, and in fact most of what Jobs created would have been unthinkable without either C or Unix.

See also:

strcpy data corruption on Core i7 with Linux 64bit

If you’re C programmer, does this code look OK to you?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char* argv[])
{
  char szBuffer[80];
  strcpy(szBuffer, "abcdefghijklmnopqrstuvwxyz");
  printf("Before: %s\n", szBuffer);
  strcpy(szBuffer, szBuffer+2);
  printf(" After: **%s\n", szBuffer);

  return 0;
}

Here is the output on my server, a Core i7 running Debian 6:

Before: abcdefghijklmnopqrstuvwxyz
After: **cdefghijklmnopqrstuvwzyz

What the program does is dropping two characters from a text string in a buffer, moving the rest of it left by two characters. You expect the moved characters to stay in sequence, but if you compare the last three characters of the output you see that that isn’t the case. The ‘x’ has been obliterated by a duplicate ‘z’. The code is broken.

It’s a bug, and not a straightforward one, as I’ll explain.

I first came across it a couple of months ago, as I was moving some code of mine from an Athlon 64 Linux server to a new Intel Core i7 server. Subsequently I observed strange corruption in data it produced. I tracked it down to strcpy() calls that looked perfectly innocent to me, but when I recoded them as in-line loops doing the same job the bug went away.

Yesterday I came across the same problem on a CentOS 6 server (also a Core i7, x86_64) and figured out what the problem really was.

Most C programmers are aware that overlapping block moves using strcpy or memcpy can cause problems, but assume they’re OK as long as the destination lies outside (e.g. below) the source block. If you read the small print in the strcpy documentation, it warns that results for overlapping moves are unpredicable, but most of us don’t take that at face value and think we’ll get away with it as long as we observe the above caveat.

That is no longer the case with the current version of the GNU C compiler on 64-bit Linux and the latest CPUs. The current strcpy implementation uses super-fast SSE block operation that only reliably work as expected if the source and destination don’t overlap at all. Depending on alignment and block length they may still work in some cases, but you can’t rely on it any more. The same caveat theoretically applies to memcpy (which is subject to the same warnings and technically very similar), though I haven’t observed the problem with it yet.

If you do need to remove characters from the middle of a NUL terminated char array, instead of strcpy use your own function based on the memmove and strlen library functions, for example something like this:

void myStrCpy(char* d, const char* s)
{
  memmove(d, s, strlen(s)+1);
}
...
  char szBuffer[80];
...
  // remove n characters i characters into the buffer:
  myStrCpy(szBuffer+i, szBuffer+i+n);

I don’t know how much existing code the “optimzed” strcpy library function broke in the name of performance, but I imagine there are many programmers out there that got caught by it like I was.

See also:

Gateway M-6750 with Intel Ultimate-N 6300 under Ubuntu and Vista

My Gateway M-6750 laptop uses a Marvell MC85 wireless card, for which there is no native Linux driver. Previously I got it working with Ubuntu 9.10 using an NDIS driver for Windows XP. Recently I installed Ubuntu 11.04 from scratch on this machine (i.e. wiping the Linux ext4 partition) and consequently lost wireless access again.

Instead of trying to locate, extract and install the XP NDIS driver again, this time I decided to solve the problem in hardware. Intel’s network hardware has good Linux support. I ordered an Intel Centrino Ultimate-N 6300 half-size mini PCIE networking card, which cost me about $35. Here is how I installed it.

Here is a picture of the bottom of the laptop. Remove the three screws on the cover closest to you (the one with a hard disk icon and “miniPCI” written on it) and open the cover. Use a non-magnetic screwdriver because the hard disk is under that cover too. As a matter of caution, use only non-magnetic tools near hard disks or risk losing your data.

Remove the screw that holds the MC85 card in the mini PCI slot on the right. Remove the network card. Carefully unplug the three antenna wires. Connect those wires to the corresponding locations on the Intel card. Insert the Intel card into the socket on the left. Note: I had first tried the Intel card in the socket on the right but in that case it always behaved as if the Wireless On/Off switch was in the Off position, regardless of its actual state. Even rebooting didn’t make it recognize the switch state. The left mini PCI socket did not have this problem 🙂

Because the Intel card is a half size card you will also need a half size to full size miniPCI adapter to be able to screw down the card to secure it. Instead I simply used a stiff piece of cardboard (an old business card) to hold it in place and closed the cover again. If you take your laptop PC on road a lot I recommend doing it properly (don’t sue me if the cardboard trick melts your motherboard or burns down your house).

Download the Intel driver and utility set for Windows from the Intel website using a wired connection. Under Ubuntu the card seemed to work first time I rebooted into it. I just had to connect to the WLAN.

UPDATE:

I fixed it properly using a half size to full size Mini PCI-E (PCI Express) adapter converter bracket by Shenzhen Fenvi Technology Co., Ltd. in Guangdong. I had found it on Alibaba. I paid $9.50 by Paypal and a bit over a week later five sets of brackets and matching screws arrived by mail from Hong Kong (one set is only $1.90 but the minimum order was 5, so that’s what I ordered). The brackets come with about a dozen each of two kinds of screws. Four of the smaller screws worked fine for me.

VIA PC3500 board revives old eMachines PC

Last September one of my desktop machines died and I bought a new Windows 7 machine to replace it. Today I brought it back to life again by transplanting a motherboard from an old case that I had been using as my previous Linux server. The replacement board is a VIA MM3500 (also known as VIA PC3500), with a 1.5 GHz VIA C7 CPU, 2 GB of DDR2 RAM and on-board video. It still has two IDE connectors as well as two SATA connectors, allowing me to use both my old DVD and parallel ATA HD drives, as well as newer high capacity SATA drives.

After the motherboard swap I had to reactivate Windows XP because it detected a major change in hardware. Most of the hardware of the new board worked immediately, I could boot and had Internet access without any reconfiguration. When I started with the new machine. I just had to increase video resolution from the default 640×480 to get some dialogs working.

I then downloaded drivers for the mother board and video from the VIA website. I now have the proper CN896 (Chrome IGP9) video driver working too.

When I tested the board as a server with dual 1 TB drives (RAID1), it was drawing 41W at idle. Running in my eMachines T6212 case with a single PATA hard drive it draws 38W at idle.

Before removing the old motherboard I made a note of all the cable connections on both motherboards. The front-mounted USB ports and card reader have corresponding internal cables, which connected to spare on-board USB connectors. The analog sound connectors connect to the motherboard too. The only port at the front left unconnected was the IEEE-1394 (FireWire / iLink) port, which has no counterpart on the VIA board.

It feels great to have my old, fully configured machine with all its data and applications back thanks to a cheap motherboard that works flawlessly.

Ubuntu 11.4, GA-H67MA-UD2H-B3, EarthWatts EA-380D, Centurion 5 II, 5K3000

CoolerMaster Centurion 5 II

It’s been 2 months since I have written a blog post that wasn’t about the Tohoku earthquake and tsunami or the Fukushima 1 nuclear disaster, but today I am taking a break from those subjects. The reason is that I replaced my local Ubuntu server with newer hardware. The primary requirements were:

  • GNU/Linux (Ubuntu)
  • Reasonably low power usage
  • Large and very reliable storage
  • Affordability

I was considering boards ranging from the new AMD Zacate E-350 dual core to LGA-1155 (“Sandy Bridge”) boards with the Core i5 2500K. First Intel’s P67/H67 chip set problems and then the disaster in Japan prompted me to postpone the purchase.

Finally I picked the GigaByte GA-H67MA-UD2H-B3, a MicroATX board with 4 SIMM slots in conjunction with the Core i3 2100T, a 35W TDP part with dual cores and 4 threads. The boxed version of the Intel chip comes with a basic fan that didn’t sound too noisy to me. I installed two 4 GB DDR3 modules for a total of 8 GB of RAM, with two slots still available. When you install two memory modules on this board you should install them in memory slots of the same colour (either the blue or the white pair) to get the benefit of dual channel.

Gigabyte GA-H67MA-UD2H-B3

I chose a H67 board because of the lower power usage of the on-chip video and the 2100T has the lowest TDP of any Core 2000 chip. I don’t play games and my video needs are like for basic office PCs. Unlike P67 boards, H67 boards can not be overclocked. If you’re a gamer and care more about ultimate performance than power usage you would probably go for a P67 or Q67 board with an i5 2500K or i7 2600K with a discrete video card.

To minimize power use at the wall socket I picked an 80 Plus power supply (PSU), the Antec EarthWatts EA-380D Green. It meets the 80 Plus Bronze standard, which means it converts AC to DC with at least 82% efficiency at 20% load, at least 85% load at 50% load and at least 82% at full load. It’s the lowest capacity 80Plus PSU I could find here. 20% load for a 380W PSU is 76W. Since the standard does not specify the efficiency achieved below 20% of rated output and typically efficiency drops at the lower end, it doesn’t pay to pick an over-sized PSU.

Disk storage is provided by four Hitachi Deskstar 5K3000 drives of 2 TB each (HDS5C3020ALA632). These are SATA 6 Gbps drives, though that was not really a criterium (the 3 Gbps interface is still fast enough for any magnetic disks). I just happened to find them cheaper than the Samsung HD204UI that I was also considering and the drive had good reports from people who had used them for RAID5. The 2TB Deskstar is supposed to draw a little over 4W per drive at idle. I don’t use 7200 rpm drives in my office much because of heat, noise and power usage. Both types that I had considered have three platters of 667 GB each instead of 4 platters of 500 GB in older 2 TB drives: Fewer platters means less electricity and less heat. A three platter 2 TB drive should draw no more power than a 1.5 TB (3×500 TB) drive.

There are “enterprise class” drives designed specifically for RAID, but they cost two to three times more than desktop drives — so much for the “I” in RAID that is supposed to stand for “inexpensive”. These drives support a special error handling mode known as CCTL or TLER which some hardware RAID controllers and Windows require, but apparently the Linux software RAID driver copes fine with cheap desktop drives. The expensive drives also have better seek mechanisms to deal with vibration problems, but at least some of those vibration problems are worse with 7200 rpm drives than the 5400 rpm drives that I tend to buy.

Motherboard, PSU and 4 RAID drives in case

The case I picked was the CoolerMaster Centurion 5 II, which as you can see above is pretty large for a MicroATX board like the GA-H67MA-UD2H-B3, but I wanted enough space for at least 4 hard disks without crowding them in. Most cases that take only MicroATX boards and not full size ATX tend to have less space for internal hard disks or squeeze them in too tightly for good airflow. This case comes with two 12 cm fans and space to install three more 12 or 14 cm fans, not that I would need them. One of these fans blows cool air across the hard disks, which should minimize thermal problems even if you work those disks hard.

One slight complication was that the hard disks in the internal 3 1/4″ slots needed to be installed the opposite way most people expect: You have to take off both covers of the case, then connect power and SATA cables from the rear end (view to bottom of the motherboard) after sliding the drives in from the front side (view to top of motherboard). Once you do that you don’t even need L-shaped SATA cables. I could use the 4 SATA 6 Gbps cables that came with the GigaByte board. Most people expect to be able to install the hard disks just opening the front cover of the case and then run into trouble. It’s not a big deal once you figure it out, but quite irritating until then.

4 RAID drives in case

I installed Ubuntu 11.4, which has just been released, using the AMD64 alternate CD using a USB DVD drive. I configured the space for the /boot file system as a RAID1 with 4 drives and the / file system as a RAID6 with 4 drives with most of the space. Initially I had problems installing Grub as a boot loader after the manual partitioning, but the reason was that I needed to create a “bios_grub” partition on every drive before creating my boot and data RAID partitions.

RAID6 is like RAID5 but with two sets of parity data. Where the smallest RAID5 consists of three drives, a minimal RAID6 has four, with both providing two drives’ worth of net storage space. A degraded RAID6 (i.e. with one dead drive) effectively becomes a RAID5. That avoids nasty surprises that can happen with RAID5 when one of the other disks goes bad during a rebuild of a failed drive. If you order a spare when you purchase a RAID5 set and plan to keep the drive in a drawer until one of the others fails, you might as well go for a RAID6 to start with and gain the extra safety margin from day 1.

I had problems getting the on-board network port to work, so I first used a USB 2.0 network adapter and later installed an Intel Gigabit CT Desktop Adapter (EXPI9301CT). With two network interfaces you can use any Linux machine as a broadband router, there are various pre-configured packets for that.

While the RAID6 array was still syncing (writing checksums computed from data on two drives to two other drives) and therefore keeping all disks and partly the CPU busy the machine was drawing about 58W at the wall socket, as measured by my WattChecker Plus. Later, when the RAID had finished rebuilding and the server was just handling my spam feed traffic, power usage dropped to 52W at the wall socket. That’s about 450 kWh per year.

The total cost for the server with Core i3 2100T, 8 GB DDR3 RAM (1333), H67 MicroATX board, PCIe Ethernet card, 4 x 2 TB SATA drives, case and 380W PSU was just under 80,000 yen including tax, under US$1,000.