How to put Coreboot on Wyse S30/S50

This post is long overdue following the teaser back in January but unfortunately I have been swamped with work so the following instructions are not as thorough as I would have liked and come with a ‘some assembly required’ sticker.

Is it worth it?

Coreboot support of this board is far from perfect. The following stuff does not work:

  • ACPI
  • early VGA (VGA ROM) (?)
  • serial port IRQ may or may not conflict with EHCI IRQ
  • power button, LED and beeper

Lack of ACPI means that there is no S5, so you can’t turn the device off (soft off) but merely halt it. Without early VGA support you are left with a blank screen until the kernel takes over from coreboot/SeaBIOS (assuming you put the gxfb module in your initramfs or compiled it in).

However, Coreboot does fix problems encountered with the original manufacturer’s BIOS:

  • booting from USB and virtually anything else supported by Coreboot/SeaBIOS payloads
  • large USB drives work
  • no PCI subsystem hangs on boot
  • normal ATA access with pata_cs5536 kernel module when booted from USB

Still interested? Very well…

What am I going to need?

  • Wyse S30/S50 Revision 02L. Rev. 02 (with soldered-on RAM) is not tested.
  • Linux installation capable of booting with the original WYSE BIOS
  • SST49LF080A PLCC flash memory chip (about 4 euros in your corner electronics store or free if sampled from Microchip)
  • flashrom >= 0.9.5
  • coreboot around commit eb84f6a978147fbe543fbe15af254632f215098a (I can’t vouch for current trunk to be working)
  • GX2 VSA binary blob
  • (optionally) a null modem cable for rs-232 debugging
  • (optionally) PLCC extractor or a steady hand

GX2 VSA (Virtual System Architecture) is a low-level system library that needs to be compiled with an antediluvian and esoteric toolchain from sources that are almost nowhere to be found and embedded within the coreboot image (and before you ask, OpenVSA never took off and is now defunct). A fearless person under the name of Nils Jacobs from the coreboot mailing list hacked together a working VSA binary for me but asked me not to distribute it as it was a bit buggy. His intention was to get it into a working shape and contribute to the coreboot upstream but I have not seen him post on the mailing list in a while. I recommend jumping straight to the mailing list and asking about his progress. Don’t even try building coreboot without the VSA code as it won’t work.

Are you still reading? Got everything? Okay…

Let’s do this.

Disclaimer: the following instructions are general and require understanding of what you actually do. Don’t begin before you are entirely familiar with the topic at hand.

  1. Build the coreboot image. Select S50 as the mainboard model (S50 and S30 are identical), 1MB for ROM size, add a VSA binary image (make sure the path is correct), enable serial console, add a SeaBIOS payload and it might just work.
  2. Open up your Wyse thin client (you need to remove the feet first).
  3. Boot Linux on the client and install flashrom.
  4. Back up the original BIOS with flashrom.
  5. Hot-swap the Wyse BIOS with a blank SST49LF080A chip. This is safe as BIOS is only read during the boot time and immediately saved to RAM. The flash chip is not accessed during normal operation. If you don’t have a PLCC extractor consider super-gluing a push pin to the chip.
  6. Write your coreboot image to the blank chip with flashrom.
  7. Optionally: modify your bootloader and Linux installation to enable a serial console. Attach a null-modem cable and open a console on the second PC.
  8. Cross your fingers and reboot.
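Steps 4 and 6 might look roughly like the sketch below. It is not the exact procedure I ran: the file names are placeholders, build/coreboot.rom is simply where a coreboot build normally leaves its image, and passing -c to pin the chip name is an extra precaution I am assuming you want. Everything is printed rather than executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run wrapper: prints each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 4: read the original chip twice and compare the two dumps.
backup_bios() {
    run flashrom -p internal -r "$1.1" &&
    run flashrom -p internal -r "$1.2" &&
    run cmp "$1.1" "$1.2"
}

# Step 6: write the coreboot image to the swapped-in chip.
# -c makes flashrom refuse to touch any chip other than the new one.
write_image() {
    run flashrom -p internal -c SST49LF080A -w "$1"
}

case "${1:-plan}" in
    backup) backup_bios wyse-original.bin ;;   # placeholder file name
    write)  write_image build/coreboot.rom ;;  # assumed image path
    plan)
        # Default: only show what would be run, in order.
        DRY_RUN=1
        backup_bios wyse-original.bin
        echo "(hot-swap the chip here, step 5)"
        write_image build/coreboot.rom
        ;;
esac
```

Saved under a name of your choice, the script would be run once with DRY_RUN=0 and the argument backup before the swap, and once with write after it. Keeping the two halves separate is deliberate, so the write can never follow the backup automatically.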

If you did everything right you will see something like this on your serial console. Good luck!

Watching high-definition video from a NAS over WiFi

So you want to watch HD videos stored on a network-attached storage in your wireless local area network. Getting this right is actually not that trivial. There are at least four potential points of failure:

1. The media player does not cache files.

For the bulk of SD/720p content the choice of a media player is usually not that important. However, for 720p video with sudden changes of bitrate or 1080p you need to keep a small cache of the file being played (for reasons including but not limited to latencies and packet retransmission).

Interestingly, MPC-HC fails in this regard (#218, #1264). It works better or worse depending on the amount of buffering (often hardcoded) in the media splitter. For now, I am going to tentatively recommend VLC 2.0 which detects if a file is played from a network share and adjusts the buffer size accordingly. Alternatively, MPC-HC with LAV Filters works well most of the time.

2. You cannot play the videos locally to begin with.

Most netbooks and very old Celeron/Core2Duo systems will lack the required processing power to play back a high-bitrate (>20mbps) video file. If possible, take advantage of hardware decoding technology such as DXVA. Verify that you can successfully play high-bitrate videos from the hard drive without heavy CPU usage and stuttering before moving on.

3. The NAS is not up for the task.

Usually not a problem if you use a commercial NAS. However, if you use an ancient or esoteric device turned into a NAS with NTFS USB hard drives attached, you need to make sure it is actually able to send the file at a sufficient rate. To test this, connect both the NAS and the client (e.g. the laptop that will later connect to the wireless LAN) to your router with a CAT5 Ethernet cable (100BASE-TX) and measure the copy transfer speed from a network share. Preferably you want to see over 10MB/s (80 mbps) with a low CPU load on the NAS.
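File managers report copy speed in megabytes per second while video bitrates are quoted in megabits, so a small conversion helper can save some head-scratching. This is only a sketch: the function names are made up and the 4.2 in the loop is just a sample value. The 80 mbps threshold is the 10MB/s figure from above.

```shell
#!/bin/sh
# Convert a copy speed in MB/s to mbps (1 byte = 8 bits).
mbytes_to_mbps() {
    awk -v r="$1" 'BEGIN { printf "%.1f\n", r * 8 }'
}

# Succeeds if the measured copy speed reaches 80 mbps (10 MB/s).
nas_fast_enough() {
    awk -v r="$1" 'BEGIN { exit !(r * 8 >= 80) }'
}

# Sample readings in MB/s; replace with your own measurements.
for rate in 10 4.2; do
    if nas_fast_enough "$rate"; then verdict="fine"; else verdict="too slow"; fi
    printf '%s MB/s = %s mbps (%s)\n' "$rate" "$(mbytes_to_mbps "$rate")" "$verdict"
done
```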

4. Your WLAN is not set up correctly.

802.11g provides a theoretical throughput of 54 mbps. However, if you take into account the overhead of the transmission protocol and latencies you are looking at about 21-25 mbps of maximum practical throughput. This will generally suffice for standard definition and some 720p videos but 1080p is mostly out of the question. With that in mind, make sure that your signal is strong and stable and that you are using an empty WLAN channel (check with e.g. Inssider). If necessary, identify and eliminate sources of interference such as wireless keyboards, Bluetooth headsets, microwave ovens or clueless neighbours.

802.11n is usually advertised to work at 300 mbps. However, most consumer-grade 802.11n networks operate at a theoretical throughput of only 72 mbps (with real-world performance at about 60% of that, or 40-42 mbps). Here’s why.

For a full theoretical throughput of 300 mbps to work two conditions need to be met:

1. Two spatial streams must be supported by both the router and the client

This requires that both the transmitter and the receiver have two antennas, one for each stream. This cuts out a significant number of netbooks and laptops right away. Moreover, it requires duplicate electronic logic (radio frequency chains and analog-to-digital converters) which translates into higher implementation costs. As a result, many cheap WLAN cards only support a single stream (I am looking at you, Realtek).

2. Two non-overlapping channels must be available for transmission (AKA “wide 40MHz channel”).

Most consumer-grade 802.11n networks operate in the 2.4GHz band, where two empty non-overlapping channels are not that easy to come by. For practical reasons you are mostly limited to the following combinations: 1+5, 6+2, 6+10 and 7+11. Check your airspace with e.g. Inssider and pick a pair of channels that do not overlap with other networks (note: in a city it may be next to impossible). Configure your router and verify, again with Inssider, that both channels are being used. If your hardware detects interference it may back down to a single channel.

Now, if you meet only one of the above conditions you end up with 150 mbps of theoretical and about 70-75 mbps of practical throughput. This is sufficient for most 1080p videos.
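To put the numbers above side by side, here is a rough sketch that checks whether a video of a given bitrate fits on each kind of link. The mode names and the 25 mbps sample bitrate are made up for illustration, and I take the lower bound of each practical range quoted above to stay on the safe side.

```shell
#!/bin/sh
# Practical throughput in mbps, lower bound of each range quoted in the text.
practical_mbps() {
    case "$1" in
        g)        echo 21 ;;  # 802.11g, 54 mbps theoretical
        n-single) echo 40 ;;  # 802.11n, one stream and one channel, 72 mbps theoretical
        n-half)   echo 70 ;;  # 802.11n, one of the two conditions met, 150 mbps theoretical
        *)        return 1 ;;
    esac
}

# Succeeds if a video of the given bitrate (whole mbps) fits on the link.
can_stream() {
    link=$(practical_mbps "$1") || return 2
    [ "$2" -le "$link" ]
}

# Sample bitrate of 25 mbps; substitute the peak bitrate of your own file.
for mode in g n-single n-half; do
    if can_stream "$mode" 25; then verdict="ok"; else verdict="too slow"; fi
    echo "$mode: 25 mbps video -> $verdict"
done
```

Real links fluctuate, so treat a pass with little headroom as a maybe rather than a yes.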

And there you have it. With all four points taken care of you should be able to watch videos directly from your NAS over WiFi.

 

Running Debian GNU/Linux on WYSE S30/S50

updated March 2012

I have recently purchased a second-hand WYSE S30 thin client for a paltry 12 euros with a plan to use it as my NAS/torrent box and retire Neoware e90. It turned out to be one annoying little bugger. Let’s have a look at the hardware first.

Hardware

Motherboard: Custom, 02L revision. 44-pin IDE connector with a 64MB Disk-On-Module, SO-DIMM memory slot, beeper, RTL8100C 100Mbit ethernet chip (8139too), cpu, chipset, BIOS. It is remarkably small as the entire device measures just 17.5cm x 3cm x 12.5cm and is actually smaller than the 3.5″ USB drive attached to it. The entire platform uses only about 6W of power, less than half of what my previous box used.

Processor: AMD Geode GX 500 @366MHz with a staggering 32KB of cache. This is technically an i686 processor which does not support the NOPL instruction. This is not an issue unless you happen to use an unlucky version of binutils. CMOV and a handful of other flags are supported. This architecture can actually be traced back to National Semiconductor GX2 (2002) which was derived from Cyrix MediaGXm which debuted in… 1997.

Chipset: CS5536 Companion Device. AC97 audio, virtual 66MHz PCI bus, ATA-6 IDE, EHCI/OHCI USB. Connects with the CPU over a shared 66MHz PCI bus (“Geode Link PCI Southbridge”) which has rather sad implications for I/O-intensive operations.

RAM: 128MB of user-replaceable 200-pin SO-DIMM DDR SDRAM. Some people got 512MB running in a single bank.

BIOS: 256KB SST49LF020A. To get in, hold Del while the device is powered off and, still holding Del, turn it on. You can only change the boot order (IDE/USB/PXE) and password. The default password is ‘Fireport’.

Software

Bootloaders & Kernels: Good grief. Arch Linux and Ubuntu server installers both failed. Debian installed correctly but grub2 failed to boot. So did grub-legacy. I finally managed to boot using LILO (config here) with lba32 and compact enabled only to find out that the kernel is ‘unable to enumerate USB device’ once I plug in my keyboard. Finally, using Wheezy with the 3.0 kernel resulted in a working installation.

Five caveats you should be aware of:

1. The BIOS disables ATA when set to boot from USB, preventing the user from accessing the 64MB DOM. The workaround is to unload the pata_amd, pata_cs5536 and ata_generic modules and run ‘modprobe ata_generic all_generic_ide=1’. Preferably, set it up in modprobe.d by blacklisting the first two and appending the aforementioned option to ata_generic.
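The modprobe.d variant could look something like the fragment printed by the sketch below. The function name and the target file name are placeholders of mine; the module names and the option are the ones from the workaround above.

```shell
#!/bin/sh
# Print a modprobe.d fragment implementing the workaround.
wyse_ata_conf() {
    cat <<'EOF'
# Keep the native PATA drivers from claiming the controller.
blacklist pata_amd
blacklist pata_cs5536
# Let ata_generic drive any IDE controller it finds.
options ata_generic all_generic_ide=1
EOF
}

wyse_ata_conf
```

As root you would redirect the output into a file, e.g. wyse_ata_conf > /etc/modprobe.d/wyse-ata.conf (file name hypothetical). If the drivers are loaded from the initramfs, it most likely has to be regenerated for the change to take effect.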

2. The device may not boot from USB if there are several USB devices connected. As a matter of fact, it won’t boot anything at all and freezes until it is power-cycled. A simple workaround is to put the /boot directory on the 64MB DOM.

3. You can’t boot from USB if there is nothing connected to the IDE.

4. The kernel occasionally freezes at boot time during PCI initialization. The problem is intermittent but usually happens on a warm reboot. I have tried booting with pci=nocrs|noacpi|nobios to no avail.

5. The device may fail to boot if there is a large USB drive connected (500GB in my case) even if ATA is first in the BIOS boot order.

Basically, you want to boot it once and hope for the best. Fortunately once booted the device is quite stable (weeks of uptime) and barely gets warm.

With hindsight, I’d suggest the following way to install Debian Wheezy:

1. Boot the installer in Expert text-mode

2. Open a console, remove autoloaded ata/pata modules and use modprobe ata_generic all_generic_ide=1 to gain access to the 64MB DOM.

3. Optional: back up the contents of the DOM with dd.

4. Make an ext3 partition on the DOM and return to the installer

5. When preparing drives and setting up mount points in the installer choose the ext3 partition on the DOM for /boot and your USB drive (preferably <=80GB) for /

6. When choosing packages/kernel pick linux-image-3.0

7. Choose LILO as the bootloader. Before the installation is finished, chroot into /, mount /boot, edit /etc/lilo.conf accordingly and run /sbin/lilo to update the boot sector
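Steps 2 and 3 from the installer console might look roughly like this. The device node /dev/sda and the backup path are assumptions (check dmesg for the actual name of the DOM), and the commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run wrapper: prints each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

DOM=${DOM:-/dev/sda}                   # assumed device node of the 64MB DOM
BACKUP=${BACKUP:-/mnt/dom-backup.img}  # placeholder path on some other medium

# Step 2: drop the autoloaded drivers, then reload ata_generic with the option.
free_dom() {
    for module in pata_amd pata_cs5536 ata_generic; do
        run modprobe -r "$module"
    done
    run modprobe ata_generic all_generic_ide=1
}

# Step 3 (optional): raw image of the whole DOM.
backup_dom() {
    run dd if="$DOM" of="$BACKUP" bs=1M
}

free_dom
backup_dom
```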

Other software: OpenSSH, Transmission, Samba, openntpd, ntfs-3g, iptables. Memory is a bit scarce, especially when Samba starts to thrash the page cache reading large files but everything seems to work well.

Performance

As this is supposed to be a NAS box I am most interested in Samba performance. The following was measured between the terminal and a Windows 7 box over a 100Mbps Ethernet connection. Copying from an attached USB hard drive with a NTFS partition (with fuse and ntfs-3g) and an ext3 partition results in a sustained transfer of 4.2MB/s (33Mbps) and 6MB/s (48Mbps) respectively. Watching a 720p movie from the samba share is entirely possible (as long as your media player uses a small file cache in case of a sudden bitrate change).

The bottleneck here is actually the CPU. When copying from the ext3 partition, smbd and usb-storage use over 80% of the processor time.

Downloading torrents with transmission-daemon (encryption allowed) maxes out at roughly 2MB/s with a 50Mbps connection. Once again, the CPU is the bottleneck.

A few quick OpenSSL and GnuTLS benchmarks can be found here.

Hax!

I would not be myself if I did not pause to think about potential ways of abusing the device. Clearly, the main source of woes is the BIOS. Interestingly, the WYSE S50, which is an identical device with the exception of RAM and DOM size, is supported by coreboot. The SST49LF020A BIOS ROM is also supported by flashrom. I have purchased a few SST49LF080As and will see how it goes. Follow-up: hints on replacing the BIOS with Coreboot/SeaBIOS.

I was curious how the manufacturer handles BIOS updates for WYSE s50 and took apart their software update. To my horror, it appears that they have a custom binary kernel module for 2.6.16 (sic!) which enables them to write to/from BIOS through a /proc interface. I have compared the BIOS downloaded from my S30 with flashrom with the one pulled from the manufacturer’s update and they are in fact binary identical.

Simple Transmission profiling and VIA Padlock Montgomery Multiplier

In my desperate procrastination attempts I have decided to tinker with the Transmission BitTorrent client. It is fairly lightweight, feature-full and written in C. I noticed that it puts quite a bit of CPU load on my VIA C7 box running at 400MHz. Let’s narrow it down.

The first problem was the choice of profiler. I went with valgrind’s callgrind tool first: it has nifty source annotation, is easy to use and the output can be nicely visualised. For starters I got this (excerpt):

It is immediately apparent that the application spends a lot of time in bn_mul_mont (side note: valgrind’s estimate is a bit off, I am unsure why), which is the OpenSSL implementation of Montgomery multiplication used in DH key generation. Transmission uses those keys for handshakes with encryption-enabled peers. Montgomery multiplication, hmm, that sure rings a bell. Oh, right:

The PadLock Montgomery Multiplier (PMM) in VIA C5Q/C5J series processors implements the Montgomery Multiplication algorithm.

And guess what, there is even a sample implementation in OpenSSL! I have quickly whipped up libcrypto.so with this function crudely included, fired up valgrind and…

vex x86->IR: unhandled instruction bytes: 0xF3 0xF 0xA6 0xC0 
==21570==  Illegal opcode at address 0x415DAE4
==21570==    at 0x415DAE4: bn_mul_mont (in /usr/lib/libcrypto.so.1.0.0)

Well. Of course. 0xF3 0xF 0xA6 0xC0 is REP MONTMUL: the Padlock instruction.

This called for a different approach and I went with oprofile. After some initial issues with NMI watchdog being broken on new kernels I got it up and running. To cut a long story short:

samples  %        diff %    image name               symbol name
2190     18.9332  +22.8178  libc-2.14.1.so           /lib/libc-2.14.1.so
1537     13.2878  -55.8762  libcrypto.so.1.0.0       bn_mul_mont
973       8.4119  +18.8992  libcrypto.so.1.0.0       bn_mul_add_words
804       6.9508  +39.7453  libcrypto.so.1.0.0       RC4
613       5.2996  +40.2968  libcrypto.so.1.0.0       bn_mul_comba8
496       4.2881  +20.8645  transmission-daemon      bandwidthPulse
429       3.7088  +15.9008  transmission-daemon      comparePeerCandidates

The time spent in bn_mul_mont went down by 55% to a mere 13% of total execution time. Not bad at all. The libc function which moved up to the first place with 19% (I used a stock binary with no debugging symbols, fortunately callgrind was able to figure this out) is actually qsort called in peer-mgr.c:3916:

I don’t see an obvious way to shave its execution time much further.

In any case, I decreased CPU load when seeding quite a bit (~15%) by taking advantage of the Padlock engine in the VIA C7 processor. Of course, I could decrease it even further if I just disabled encryption, but what’s the fun in that.

I am wondering if torrent piece hashing while downloading also takes a heavy toll on the system, but that’s a story for another time.

VIA Padlock support in GnuTLS

My distribution has recently updated GnuTLS and suddenly some of the applications that I use on charlie started to crash. It turned out that GnuTLS had implemented Padlock-based acceleration which was a little buggy on VIA C7. I filed a bug report, hooked up one of the developers with SSH access and a little while later it got fixed. Benchmark results on an 800MHz Esther processor below.

ARCFOUR, 3DES and SHA512 are not supported by the Padlock hardware crypto engine. There is however a marked improvement in the performance for other functions, especially AES-128-CBC which skyrockets to 0.56 GB per second and doesn’t even fit on the graph. SHA-1 and SHA-256 are not too shabby either. Raw results from one of the runs here: http://pastebin.com/UzbiuDxy

I am happy to see the padlock engine supported by another major crypto library. Special thanks to Nikos Mavrogiannopoulos for implementing this in GnuTLS.

Photos

Today, I end my little adventure with photography. During the past 5 years I took merely 7 reels of film with my Nikon F55 and Nikkor 28-80mm f/3.5-5.6D lens. I have less time, money and heart for that than ever. I did not see much of an improvement in the quality of my work either. Here are, however, a few unprocessed photos that I actually liked (despite glaring technical mistakes):


New Livebox (still Sagem 3202)

I had my old faulty livebox replaced and got a shiny new one with FAST3202_2601E4 firmware. And would you look at that:

Compilation flags: LIC=/filer1/dev_projets/fast_rg1_linux_2_6/dev/cluzeau/integration/4.1.7_2/lastcheckout/license/jpkg_fast3202.lic DIST=FAST3202_POL

Let the hacking resume!

Update: Admittedly, I had less time than I had thought, but so far it seems I will need to break out a soldering iron. The busybox environment is inaccessible from the telnet CLI, the old trick of fishing out the OpenRG config file doesn’t work and no particularly weird services are running. There is also no firmware to analyse this time around. TR-069 support draws attention: I suppose it would be feasible to try a MITM attack with a fake ACS and some traffic manipulation to replace the configuration file in order to elevate CLI user access rights (assuming this option is enabled). Alternatively, there is still a serial port to be found. As I technically don’t own the device but lease it from the ISP, I’d have to acquire another second-hand one, which probably won’t happen any time soon.

Update 2: In cooperation with a frequent visitor to my blog, quetzalcoatl, some progress has been made. We managed to pull out the rg.conf file which holds a few surprises. Also, a few minor information leaks were discovered. Currently, the work focuses on static analysis of a very similar firmware. Hopefully something will come up soon.

Update 3: One tutorial on MIPS assembler and several hours later, we managed to discover a way of replacing the openrg config file. It can be downloaded through 192.168.1.1/save_rg_conf.cgi and replaced with an authenticated POST request to 192.168.1.1/replace_rg_conf.cgi (e.g. curl -u admin:admin -F new_rg_conf=@Livebox.conf 192.168.1.1/replace_rg_conf.cgi). However, I was unable to find a way to access the busybox shell: it appears that all OpenRG functions that would allow breaking out got axed. At this point one could try fooling around with a serial cable (boot menu and its options are referenced in the binary) or find a way to exploit OpenRG to execute /bin/sh.
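The download and upload round trip could be wrapped up roughly as follows. The upload command is the one quoted above; sending the same credentials with the download request is an assumption on my part, as are the function names. Nothing is sent unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run wrapper: prints each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

ROUTER=${ROUTER:-192.168.1.1}
AUTH=${AUTH:-admin:admin}

# Save the current config to a local file (credentials assumed to be needed).
fetch_conf() {
    run curl -u "$AUTH" -o "$1" "http://$ROUTER/save_rg_conf.cgi"
}

# Upload an edited config as a multipart form field named new_rg_conf.
replace_conf() {
    run curl -u "$AUTH" -F "new_rg_conf=@$1" "http://$ROUTER/replace_rg_conf.cgi"
}

fetch_conf Livebox.conf
replace_conf Livebox.conf
```

Keep an untouched copy of the downloaded file around before uploading an edited one.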