This blog will shut down on 01/10/14. No new content will be published until then.
This post is long overdue after the teaser back in January, but I have been swamped with work, so the following instructions are not as thorough as I would have liked and come with a ‘some assembly required’ sticker.
Coreboot support of this board is far from perfect. The following stuff does not work:
The lack of ACPI means there is no S5 (soft off) state, so you cannot turn the device off but merely halt it. Without early VGA support you are left with a blank screen until the kernel takes over from coreboot/SeaBIOS (assuming you put the gxfb module in your initramfs or compiled it into the kernel).
However, Coreboot does fix problems encountered with the original manufacturer’s BIOS:
Still interested? Very well…
GX2 VSA (Virtual System Architecture) is a low-level system library that must be compiled with an antediluvian and esoteric toolchain, from sources that are almost nowhere to be found, and then embedded within the coreboot image (and before you ask, OpenVSA never took off and is now defunct). A fearless person by the name of Nils Jacobs from the coreboot mailing list hacked together a working VSA binary for me but asked me not to distribute it, as it was a bit buggy. His intention was to get it into working shape and contribute it to coreboot upstream, but I have not seen him post on the mailing list in a while. I recommend going straight to the mailing list and asking about his progress. Don’t even try building coreboot without the VSA code, as it won’t work.
Are you still reading? Got everything? Okay…
Disclaimer: the following instructions are general and require an understanding of what you are actually doing. Don’t begin until you are entirely familiar with the topic at hand.
If you did everything right you will see something like this on your serial console. Good luck!
So you want to watch HD videos stored on a network-attached storage (NAS) device over your wireless local area network. Getting this right is actually not that trivial. There are at least four potential points of failure:
For the bulk of SD/720p content the choice of media player is usually not that important. However, for 1080p video, or 720p video with sudden bitrate changes, the player needs to keep a small cache of the file being played (for reasons including, but not limited to, network latency and packet retransmission).
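Here is a rough way to size "a small cache". When the video's bitrate briefly exceeds what the network sustains, the player's cache drains at the difference between the two rates. The following is a back-of-the-envelope sketch that assumes a constant link rate and a single bitrate spike. It ignores outright stalls caused by retransmission, and the helper name is hypothetical.

```python
import math


def cache_needed_bytes(peak_mbps: float, link_mbps: float, spike_seconds: float) -> int:
    """Bytes of read-ahead cache needed to ride out one bitrate spike.

    Hypothetical helper: assumes the link delivers a constant link_mbps while
    the video needs peak_mbps for spike_seconds, so the cache drains at the
    difference between the two rates (1 Mbit = 10**6 bits).
    """
    deficit_bits_per_s = max(0.0, peak_mbps - link_mbps) * 1_000_000
    return math.ceil(deficit_bits_per_s * spike_seconds / 8)


if __name__ == "__main__":
    # e.g. a 35 Mbps peak lasting 10 s over a link that sustains about 22 Mbps
    print(cache_needed_bytes(35, 22, 10))  # 16250000 bytes, roughly 16 MB
```

If the peak never exceeds the link rate, the function returns 0. In that case any cache only has to absorb latency jitter.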
Interestingly, MPC-HC fails in this regard (#218, #1264). It works better or worse depending on the amount of buffering (often hardcoded) in the media splitter. For now, I am going to tentatively recommend VLC 2.0 which detects if a file is played from a network share and adjusts the buffer size accordingly. Alternatively, MPC-HC with LAV Filters works well most of the time.
Most netbooks and very old Celeron/Core 2 Duo systems lack the processing power required to play back a high-bitrate (>20 Mbps) video file. If possible, take advantage of hardware decoding technology such as DXVA. Before moving on, verify that you can play high-bitrate videos from the local hard drive without heavy CPU usage or stuttering.
This is usually not a problem if you use a commercial NAS. However, if you use an ancient or esoteric device turned into a NAS with NTFS-formatted USB hard drives attached, you need to make sure it can actually send the file at a sufficient rate. To test this, connect both the NAS and the client (e.g. the laptop that will later connect to the wireless LAN) to your router with a Cat 5 Ethernet cable (100BASE-TX) and measure the transfer speed when copying from a network share. Preferably you want to see over 10 MB/s (80 Mbps) with a low CPU load on the NAS.
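If you would rather script the copy test than eyeball a file-copy dialog, below is a minimal Python sketch. It assumes the share is already reachable as an ordinary path, for example a mounted share on Linux or a UNC path on Windows. The function name and example paths are hypothetical. Use a large file that has not been read recently, otherwise the client's page cache will inflate the result. MB here means 10^6 bytes, so 10 MB/s corresponds to 80 Mbps as in the text.

```python
import sys
import time


def measure_copy(src: str, dst: str, chunk_size: int = 1 << 20):
    """Copy src to dst in chunks and return (MB/s, Mbit/s), with MB = 10**6 bytes."""
    total = 0
    start = time.perf_counter()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            total += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid dividing by zero
    mb_per_s = total / elapsed / 1_000_000
    return mb_per_s, mb_per_s * 8


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python measure_copy.py /mnt/nas/movie.mkv /dev/null
    mbs, mbits = measure_copy(sys.argv[1], sys.argv[2])
    print(f"{mbs:.1f} MB/s ({mbits:.0f} Mbps)")
```

Using /dev/null (or NUL on Windows) as the destination keeps the client's local disk out of the measurement.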
802.11g provides a theoretical throughput of 54 Mbps. However, once you take into account transmission protocol overhead and latency, you are looking at about 21-25 Mbps of maximum practical throughput. This will generally suffice for standard definition and some 720p videos, but 1080p is mostly out of the question. With that in mind, make sure that your signal is strong and stable and that you are using an empty WLAN channel (check with e.g. inSSIDer). If necessary, identify and eliminate sources of interference such as wireless keyboards, Bluetooth headsets, microwave ovens or clueless neighbours.
802.11n is usually advertised as working at 300 Mbps. However, most consumer-grade 802.11n networks operate at a theoretical throughput of only 72 Mbps, with real-world performance at about 60% of that, or 40-42 Mbps. Here’s why.
For a full theoretical throughput of 300 mbps to work two conditions need to be met:
This requires that both the transmitter and the receiver have two antennas, one for each stream. This rules out a significant number of netbooks and laptops right away. Moreover, it requires duplicate electronic logic (radio frequency chains and analog-to-digital converters), which translates into higher implementation costs. As a result, many cheap WLAN cards support only a single stream (I am looking at you, Realtek).
Most consumer-grade 802.11n networks operate in the 2.4GHz band, where two empty non-overlapping channels are not that easy to come by. For practical reasons you are mostly limited to the following combinations: 1+5, 6+2, 6+10 and 7+11. Check your airspace with e.g. inSSIDer and pick a pair of channels that do not overlap with other networks (note: in a city this may be next to impossible). Configure your router and, again with inSSIDer, verify that both channels are being used. If your hardware detects interference it may fall back to a single channel.
Now, if you meet only one of the above conditions, you end up with 150 Mbps of theoretical and about 70-75 Mbps of practical throughput. This is sufficient for most 1080p videos.
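You can reproduce the 72, 150 and 300 Mbps figures from standard 802.11n (HT) parameters. A 20 MHz channel carries 52 data subcarriers and a 40 MHz channel carries 108. The top single-stream modulation is 64-QAM (6 bits per subcarrier) at coding rate 5/6. An OFDM symbol lasts 4.0 µs with the normal guard interval, or 3.6 µs with the optional short guard interval. Below is a small sketch of that arithmetic; the function and parameter names are illustrative. The practical-throughput percentages quoted above are empirical and are not derived from this formula.

```python
# Standard 802.11n (HT) OFDM parameters used below:
#   data subcarriers: 52 in a 20 MHz channel, 108 in a 40 MHz channel
#   OFDM symbol time: 4.0 us with the normal 800 ns guard interval,
#                     3.6 us with the optional 400 ns "short" guard interval
DATA_SUBCARRIERS = {20: 52, 40: 108}
SYMBOL_TIME_US = {"long": 4.0, "short": 3.6}


def ht_phy_rate_mbps(streams: int, width_mhz: int, guard: str = "short",
                     bits_per_subcarrier: int = 6, coding_rate: float = 5 / 6) -> float:
    """Theoretical PHY data rate in Mbit/s. Defaults describe the top
    modulation: 64-QAM (6 bits per subcarrier) at coding rate 5/6."""
    data_bits_per_symbol = (streams * DATA_SUBCARRIERS[width_mhz]
                            * bits_per_subcarrier * coding_rate)
    return data_bits_per_symbol / SYMBOL_TIME_US[guard]  # bits per us == Mbit/s


if __name__ == "__main__":
    for streams, width in [(1, 20), (2, 20), (1, 40), (2, 40)]:
        rate = ht_phy_rate_mbps(streams, width)
        print(f"{streams} stream(s), {width} MHz: {rate:.1f} Mbps")
```

This prints 72.2, 144.4, 150.0 and 300.0 Mbps. Meeting only one condition therefore means either one stream on a 40 MHz channel (150 Mbps) or two streams on a single 20 MHz channel (144.4 Mbps). With the normal guard interval, the single-stream 20 MHz figure drops from 72.2 to 65 Mbps.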
And there you have it. With all four points taken care of you should be able to watch videos directly from your NAS over WiFi.
So I heard that you don’t like the BIOS in Wyse S30/S50?
coreboot-4.0-1943-gf91cf9f-0.1 Sat Jan 7 18:13:55 UTC 2012 starting...
POST: 0xa0
POST: 0xa1
Edit: A follow-up is posted here.
updated March 2012
I recently purchased a second-hand WYSE S30 thin client for a paltry 12 euros, planning to use it as my NAS/torrent box and retire my Neoware e90. It turned out to be one annoying little bugger. Let’s have a look at the hardware first.
Motherboard: Custom, 02L revision. 44-pin IDE connector with a 64MB Disk-On-Module, SO-DIMM memory slot, beeper, RTL8100C 100Mbit Ethernet chip (8139too), CPU, chipset, BIOS. It is remarkably small: the entire device measures just 17.5cm x 3cm x 12.5cm and is actually smaller than the 3.5″ USB drive attached to it. The entire platform uses only about 6W of power, less than half of what my previous box used.
Processor: AMD Geode GX 500 @366MHz with a staggering 32KB of cache. This is technically an i686 processor, but it does not support the NOPL instruction. This is not an issue unless you happen to use an unlucky version of binutils. CMOV and a handful of other flags are supported. The architecture can actually be traced back to the National Semiconductor GX2 (2002), which was derived from the Cyrix MediaGXm, which debuted in… 1997.
Chipset: CS5536 Companion Device. AC97 audio, virtual 66MHz PCI bus, ATA-6 IDE, EHCI/OHCI USB. Connects to the CPU over a shared 66MHz PCI bus (“Geode Link PCI Southbridge”), which has rather sad implications for I/O-intensive operations.
RAM: 128MB of user-replaceable 200-pin SO-DIMM DDR SDRAM. Some people got 512MB running in a single bank.
BIOS: 256KB SST49LF020A. To enter setup, hold Del while the device is powering off and, still holding Del, turn it back on. You can only change the boot order (IDE/USB/PXE) and the password. The default password is ‘Fireport’.
Bootloaders & Kernels: Good grief. The Arch Linux and Ubuntu Server installers both failed. Debian installed correctly but grub2 failed to boot, and so did grub-legacy. I finally managed to boot using LILO (config here) with lba32 and compact enabled, only to find out that the kernel is ‘unable to enumerate USB device’ once I plug in my keyboard. In the end, Wheezy with the 3.0 kernel resulted in a working installation.
Five caveats you should be aware of:
1. The BIOS disables ATA when set to boot from USB, preventing the user from accessing the 64MB DOM. The workaround is to unload the pata_amd, pata_cs5536 and ata_generic modules and run ‘modprobe ata_generic all_generic_ide=1’. Better still, you can make this persistent in modprobe.d by blacklisting the first two modules and adding the aforementioned option for ata_generic.
2. The device may not boot from USB if several USB devices are connected. As a matter of fact, it won’t boot anything at all and will freeze, requiring a power cycle. A simple workaround is to put the /boot directory on the 64MB DOM.
3. You can’t boot from USB if there is nothing connected to the IDE.
4. The kernel occasionally freezes at boot time during PCI initialization. The problem is intermittent but usually happens on a warm reboot. I have tried booting with pci=nocrs|noacpi|nobios to no avail.
5. The device may fail to boot if there is a large USB drive connected (500GB in my case) even if ATA is first in the BIOS boot order.
Basically, you want to boot it once and hope for the best. Fortunately once booted the device is quite stable (weeks of uptime) and barely gets warm.
With hindsight, I’d suggest the following way to install Debian Wheezy:
1. Boot the installer in Expert text-mode
2. Open a console, remove the autoloaded ata/pata modules and run modprobe ata_generic all_generic_ide=1 to gain access to the 64MB DOM.
3. Optional: back up the contents of the DOM with dd.
4. Make an ext3 partition on the DOM and return to the installer
5. When preparing drives and setting up mount points in the installer choose the ext3 partition on the DOM for /boot and your USB drive (preferably <=80GB) for /
6. When choosing packages/kernel pick linux-image-3.0
7. Choose LILO as the bootloader. Before the installation is finished, chroot into /, mount /boot, edit /etc/lilo.conf accordingly and run /sbin/lilo to update the boot sector
Other software: OpenSSH, Transmission, Samba, openntpd, ntfs-3g, iptables. Memory is a bit scarce, especially when Samba starts to thrash the page cache while reading large files, but everything seems to work well.
As this is supposed to be a NAS box, I am most interested in Samba performance. The following was measured between the thin client and a Windows 7 box over a 100Mbps Ethernet connection. Copying from an attached USB hard drive with an NTFS partition (via FUSE and ntfs-3g) and an ext3 partition results in sustained transfers of 4.2MB/s (33Mbps) and 6MB/s (48Mbps) respectively. Watching a 720p movie from the Samba share is entirely possible (as long as your media player keeps a small file cache in case of a sudden bitrate change).
The bottleneck here is actually the CPU: when copying from the ext3 partition, smbd and usb-storage use over 80% of the processor time.
Downloading torrents with transmission-daemon (encryption allowed) maxes out at roughly 2MB/s with a 50Mbps connection. Once again, the CPU is the bottleneck.
A few quick OpenSSL and GnuTLS benchmarks can be found here.
I would not be myself if I did not pause to think about potential ways of abusing the device. Clearly, the main source of woes is the BIOS. Interestingly, the WYSE S50, which is identical apart from the RAM and DOM size, is supported by coreboot. The SST49LF020A BIOS ROM is also supported by flashrom. I have purchased a few SST49LF080A chips and will see how it goes. Follow-up: hints on replacing the BIOS with Coreboot/SeaBIOS.
I was curious how the manufacturer handles BIOS updates for the WYSE S50 and took apart their software update. To my horror, it appears that they use a custom binary kernel module for 2.6.16 (sic!) which lets them read from and write to the BIOS through a /proc interface. I compared the BIOS image I read from my S30 with flashrom against the one pulled from the manufacturer’s update, and they are in fact binary identical.
Shrinking a partition and moving it to the right by two megabytes with GParted takes twenty three hours with a 500GB USB drive. I wish I were kidding.
In my desperate procrastination attempts I have decided to tinker with the Transmission BitTorrent client. It is fairly lightweight, feature-rich and written in C. I noticed that it puts quite a bit of CPU load on my VIA C7 box running at 400MHz. Let’s narrow it down.
The first problem was the choice of profiler. I went with valgrind’s callgrind tool first: it has nifty source annotation, is easy to use and the output can be nicely visualised. For starters I got this (excerpt):
It is immediately apparent that the application spends a lot of time in bn_mul_mont (side note: valgrind’s estimate is a bit off; I am unsure why), which is OpenSSL’s implementation of Montgomery multiplication, used in DH key generation. Transmission uses those keys for handshakes with encryption-enabled peers. Montgomery multiplication, hmm, that sure rings a bell. Oh, right:
The PadLock Montgomery Multiplier (PMM) in VIA C5Q/C5J series processors implements the Montgomery Multiplication algorithm.
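Montgomery multiplication replaces the expensive division by the modulus n with shifts by a power of two, R. For readers who have not met it before, here is a minimal pure-Python sketch of the textbook REDC formulation, assuming an odd modulus. The function names are illustrative. This is neither OpenSSL’s bn_mul_mont, which works word by word on bignum limbs, nor the PMM’s interface.

```python
"""Textbook Montgomery multiplication (REDC) and a modular exponentiation
built on it. Illustration only; real bignum code works word by word."""


def montgomery_params(n: int):
    """Return (r_bits, n_prime) for an odd modulus n, with R = 2**r_bits > n
    and n * n_prime == -1 (mod R)."""
    if n < 3 or n % 2 == 0:
        raise ValueError("Montgomery arithmetic needs an odd modulus > 1")
    r_bits = n.bit_length()
    r = 1 << r_bits
    n_prime = (-pow(n, -1, r)) % r  # pow(x, -1, m) needs Python 3.8+
    return r_bits, n_prime


def redc(t: int, n: int, r_bits: int, n_prime: int) -> int:
    """Montgomery reduction: return t * R^-1 mod n, valid for 0 <= t < n * R."""
    mask = (1 << r_bits) - 1
    m = ((t & mask) * n_prime) & mask  # m = (t mod R) * n' mod R
    u = (t + m * n) >> r_bits          # t + m*n is divisible by R
    return u - n if u >= n else u      # u < 2n, so one subtraction suffices


def mont_mul(a_bar: int, b_bar: int, n: int, r_bits: int, n_prime: int) -> int:
    """Multiply two numbers already in Montgomery form (x_bar = x * R mod n)."""
    return redc(a_bar * b_bar, n, r_bits, n_prime)


def mont_pow(base: int, exp: int, n: int) -> int:
    """Square-and-multiply modular exponentiation (the core of DH key
    generation), carried out entirely in Montgomery form."""
    r_bits, n_prime = montgomery_params(n)
    r = 1 << r_bits
    x_bar = (base * r) % n  # base converted to Montgomery form
    acc = r % n             # 1 in Montgomery form
    for bit in bin(exp)[2:]:
        acc = mont_mul(acc, acc, n, r_bits, n_prime)
        if bit == "1":
            acc = mont_mul(acc, x_bar, n, r_bits, n_prime)
    return redc(acc, n, r_bits, n_prime)  # convert back to normal form


if __name__ == "__main__":
    p = 2**127 - 1  # a Mersenne prime, used here only as a convenient odd modulus
    print(mont_pow(5, 2**64 + 12345, p) == pow(5, 2**64 + 12345, p))  # True
```

Every reduction is a multiply, a mask, an add and a shift. This inner loop is what dominates the profile above, and it is the operation that REP MONTMUL moves into hardware.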
And guess what, there is even a sample implementation in OpenSSL! I quickly whipped up a libcrypto.so with this function crudely included, fired up valgrind and…
vex x86->IR: unhandled instruction bytes: 0xF3 0xF 0xA6 0xC0
==21570== Illegal opcode at address 0x415DAE4
==21570==    at 0x415DAE4: bn_mul_mont (in /usr/lib/libcrypto.so.1.0.0)
Well, of course. 0xF3 0xF 0xA6 0xC0 is REP MONTMUL, the Padlock instruction, and valgrind cannot emulate it.
samples  %        diff %     image name           symbol name
2190     18.9332  +22.8178   libc-2.14.1.so       /lib/libc-2.14.1.so
1537     13.2878  -55.8762   libcrypto.so.1.0.0   bn_mul_mont
973       8.4119  +18.8992   libcrypto.so.1.0.0   bn_mul_add_words
804       6.9508  +39.7453   libcrypto.so.1.0.0   RC4
613       5.2996  +40.2968   libcrypto.so.1.0.0   bn_mul_comba8
496       4.2881  +20.8645   transmission-daemon  bandwidthPulse
429       3.7088  +15.9008   transmission-daemon  comparePeerCandidates
The time spent in bn_mul_mont went down by 55%, to a mere 13% of total execution time. Not bad at all. The libc function that moved up to first place with 19% (I used a stock binary with no debugging symbols; fortunately callgrind was able to figure this out) is actually qsort, called in peer-mgr.c:3916:
In any case, I decreased CPU load when seeding quite a bit (~15%) by taking advantage of the Padlock engine in the VIA C7 processor. Of course, I could decrease it even further by simply disabling encryption, but what’s the fun in that?
I am wondering if torrent piece hashing while downloading also takes a heavy toll on the system, but that’s a story for another time.
My distribution recently updated GnuTLS, and suddenly some of the applications I use on charlie started to crash. It turned out that GnuTLS had implemented Padlock-based acceleration which was a little buggy on the VIA C7. I filed a bug report, gave one of the developers SSH access, and a little while later it was fixed. Benchmark results on an 800MHz Esther processor are below.
ARCFOUR, 3DES and SHA512 are not supported by the Padlock hardware crypto engine. There is, however, a marked improvement in performance for the other functions, especially AES-128-CBC, which skyrockets to 0.56 GB per second and doesn’t even fit on the graph. SHA-1 and SHA-256 are not too shabby either. Raw results from one of the runs here: http://pastebin.com/UzbiuDxy
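Which functions get accelerated depends on which PadLock units the CPU exposes. To my knowledge, x86 Linux reports these units as /proc/cpuinfo flags: rng, ace, ace2, phe and pmm, each with an "_en" companion flag that indicates the unit is enabled. Below is a small sketch that reads those flags. The unit descriptions and the helper name are illustrative, and the flag names should be checked against your kernel.

```python
import os

# VIA PadLock units as (to my knowledge) reported in the x86 Linux
# /proc/cpuinfo "flags" line; "<unit>_en" means the unit is enabled.
PADLOCK_UNITS = {
    "rng": "hardware random number generator",
    "ace": "Advanced Cryptography Engine (AES)",
    "ace2": "second-generation ACE",
    "phe": "PadLock Hash Engine (SHA-1, SHA-256)",
    "pmm": "PadLock Montgomery Multiplier",
}


def padlock_status(cpuinfo_text: str) -> dict:
    """Map each PadLock unit to (present, enabled) using the first flags line."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            break
    return {unit: (unit in flags, unit + "_en" in flags) for unit in PADLOCK_UNITS}


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        status = padlock_status(f.read())
    for unit, (present, enabled) in status.items():
        print(f"{unit:5} {PADLOCK_UNITS[unit]}: present={present} enabled={enabled}")
```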
I am happy to see the padlock engine supported by another major crypto library. Special thanks to Nikos Mavrogiannopoulos for implementing this in GnuTLS.
Today, I end my little adventure with photography. During the past 5 years I shot merely 7 rolls of film with my Nikon F55 and Nikkor 28-80mm f/3.5-5.6D lens. I have less time, money and heart for it than ever. I did not see much of an improvement in the quality of my work either. Here are, however, a few unprocessed photos that I actually liked (despite glaring technical mistakes):
I had my old faulty livebox replaced and got a shiny new one with FAST3202_2601E4 firmware. And would you look at that:
Compilation flags: LIC=/filer1/dev_projets/fast_rg1_linux_2_6/dev/cluzeau/integration/4.1.7_2/lastcheckout/license/jpkg_fast3202.lic DIST=FAST3202_POL
Let the hacking begin.
Update: Admittedly, I had less time than I had thought, but so far it seems I will need to break out a soldering iron. The busybox environment is inaccessible from the telnet CLI, the old trick of fishing out the OpenRG config file doesn’t work and no particularly weird services are running. There is also no firmware to analyse this time around. TR-069 support draws attention: I suppose it would be feasible to try a MITM attack with a fake ACS and some traffic manipulation to replace the configuration file in order to elevate CLI user access rights (assuming this option is enabled). Alternatively, there is still a serial port to be found. As I technically don’t own the device but lease it from the ISP, I’d have to acquire another second-hand one, which probably won’t happen any time soon.
Update 2: In cooperation with a frequent visitor to my blog, quetzalcoatl, some progress has been made. We managed to pull out the rg.conf file which holds a few surprises. Also, a few minor information leaks were discovered. Currently, the work focuses on static analysis of a very similar firmware. Hopefully something will come up soon.
Update 3: One tutorial on MIPS assembler and several hours later, we managed to discover a way of replacing the OpenRG config file. It can be downloaded through 192.168.1.1/save_rg_conf.cgi and replaced with an authenticated POST request to 192.168.1.1/replace_rg_conf.cgi (e.g. curl -u admin:admin -F new_rg_conf=@Livebox.conf 192.168.1.1/replace_rg_conf.cgi). However, I was unable to find a way to access the busybox shell: it appears that all OpenRG functions that would allow one to break out got axed. At this point one could try fooling around with a serial cable (the boot menu and its options are referenced in the binary) or find a way to exploit OpenRG to execute /bin/sh.