Slow download, very small packet sizes #1170

Hi,

I've been running an older image (undionly.kpxe from around 2020 or 2021) successfully for quite a while, and this only started showing up with newer machines. I've since updated to a current version (not exactly sure how old; likely only a few days or weeks).

I've seen that slow downloads are a recurring theme and I've tried doing my homework ...

When downloading over HTTPS I'm getting less than 10 Mbit/s. Looking at a tcpdump I see that the window doesn't seem to increase and only wobbles between 1.5k and 3k bytes. The latency is around 10ms (a WAN link), so this makes initrd downloads extremely non-fun.

The interface stats also don't look too good, but the linked explainers don't help ...

The version that is shown when booting is a bit unspecific (1.0.0+) and I'm not 100% sure whether I might still accidentally be running an old image, but as far as I can tell my TFTP server is delivering the correct image file, which I've taken from my distro as /nix/store/3fm734b6ci0klbsijc8mi04rryfhfh10-ipxe-unstable-2023-07-19.

Thanks for any help ...

Comments
Please use current master; you will have to build it yourself. You can also use the builds from boot.ipxe.org, but for any debugging to be done you will need to be able to modify and build new versions. Could you please dump the HTTP headers you get from the server?
Will do.
As it's HTTPS, I can give you the headers from a curl call to the same URL. Is that what you want?
You could start with what curl shows you, but we really want what iPXE gets.
Alright, here's the curl story:
This is a public URL, so if you like you can poke it directly for debugging. How do I get to see the headers that iPXE sees?
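(One way to see the request and response headers exactly as iPXE sees them is a debug-enabled build, e.g. `make bin/undionly.kpxe DEBUG=httpcore`, which logs the HTTP exchange to the console. The object name httpcore is an assumption based on the current source layout; adjust it to match the HTTP source files in your tree.)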
I ran a quick test downloading that URL with both curl and iPXE just now (curl: 54.3s), so I am unable to reproduce your problem. Since you have a packet capture: could you please provide the raw .pcapng file? It doesn't need to include the whole download: the first 10 seconds or so should be sufficient to observe the problem.
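(If you need to produce such a capture: Wireshark's dumpcap writes .pcapng natively, e.g. `dumpcap -i eth0 -a duration:10 -w ipxe-boot.pcapng` to grab the first 10 seconds; the interface and file names here are illustrative.)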
Thanks a lot. I'll get a pcap file; it could be a couple of days, though, as travel is coming up.
Alright, here's a pcap file (unfortunately it doesn't compress well due to the encryption). Something I noticed while going through it is a high number of duplicate ACKs. I'm not aware of an underlying issue in our network here, as I can use another host attached to the same network and switch and get the download within 10s, which is close to 1 Gbit/s and almost identical to the speed of the slowest link on the path.
Ugh. Seems like some corruption is happening here as well. The archive itself is intact; I downloaded it using curl on the neighbouring machine. I'm double-checking this on other hardware now to see whether this is specific to that one machine. Edit: actually, I'm going to try a manually compiled current version of iPXE (based on 226531e) first.
After trying a couple of times I was able to boot one of the initrds we have available, and on that machine, using the same link, the initrd downloaded within 1 minute. So it's not an issue with the machine itself.
Thanks. The capture file is taken from an interface with some kind of TCP offload enabled, so it is not showing the actual packets that went over the wire. For example: packet 115 is shown as being 15994 bytes long, which is longer than an Ethernet jumbo frame. We therefore cannot trust what the capture shows about duplicate ACKs, etc., since we are seeing a resynthesis of a TCP conversation rather than the actual TCP conversation. Could you try disabling the assorted segmentation offload features on the capture interface?
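(The exact command was elided above. On Linux the relevant tool is ethtool: `ethtool -k eth0` lists the offload features the driver supports, and something like `ethtool -K eth0 tso off gso off gro off lro off` disables them; the interface name and feature list here are illustrative.)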
Ok, so, this is quite fiddly to set up and I only managed to get an excerpt from the middle of the conversation. It might be that this doesn't help yet, but I think I managed to get a better dump now. I used a router in the middle and disabled its offloading settings. Looking at the dump in Wireshark now shows only packet sizes around 1514, i.e. the correct L2 overhead for a 1500-byte link MTU. I still see messages about reassembled PDUs, though, as well as bursts of retransmissions and duplicate ACKs ... Any ideas? Let me know if you do need the beginning of the conversation instead.
Great, so we can rule out any problem relating to packet sizes.
I see normal-length packets and ACK RTT times (at the point of the Wireshark capture) of <1ms from iPXE. TCP SACK is in use and is working as expected. I think you're using undionly.kpxe, which means that we have no direct control over the NIC and no visibility into things like RX buffer exhaustion. Are you able to use ipxe.pxe (i.e. a build with a native driver) instead?
The cards are ... I'm using undionly mostly due to (very long-term) historical reasons, from when I tried to get things working reliably around 10+ years ago ... so this choice is likely cargo cult by now. I can try using ipxe.pxe; I'm curious whether this might be a driver issue and would resolve itself by switching to the native driver ...
Ok, so I chainloaded into ipxe.pxe and had the impression that the kernel loaded faster, but the initrd is still as slow: 1% in 10 seconds. I cancelled the download and here's the data from the interfaces:
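(Presumably this refers to the output of iPXE's ifstat shell command, which prints per-interface TX/RX packet counts and error totals.)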
The relevant interface is net1 (or potentially net0, which is the same) ... and ... right at this moment I'm noticing that we usually booted from net0 and not net1. There was a slight firewall misconfiguration that caused the TFTP server not to respond on net0, only on net1. Interestingly ... I now chained this to
This shows much, much lower error rates ... I'm 95% sure that this isn't a problem with the actual network it's connected to. I can double-check that once I've booted. Consider me puzzled.
Ah, I chained this again into the ...
Interesting! In the absence of any information to the contrary, I'm going to assume that this is most likely a configuration issue on the network side. If you are able to test that it really does depend on whether the NIC is using port 0 or port 1 (e.g. by physically swapping cables and observing that the slow/fast behaviour can be reproduced the other way round), then we can investigate further.
Yes. I'm a bit tight on hands-on resources at the moment, so the first thing I can check is whether this also happens in a regular Linux environment. I'm happy to experiment with swapping the cables in a few days.
So, within Linux on the same machine, downloading over the two interfaces shows no differences. I'll try with swapped cables in a couple of days.
Are the 2 interfaces connected to identically configured ports? Are there any LACP or other link-group functions enabled on the ports? What about the STP configuration?
Both are connected to identical switches, with no LACP or other functions enabled. The faster network has a bit less traffic on the router (both are 1 switch away from the same router), but both are 1G interfaces that aren't fully utilized either way.
Commenting here just so I can follow along. We've seen this when updating the Triton Data Center version of iPXE to March 2024 (ending with upstream commit 926816c) from October 2023 (ending with upstream commit 8b14652). I can detail our own commits and whatnot if need be, but we are seeing the problem with the most recent merge, and bisection has not helped us much in digging into the problem yet.
I've added a packet trace of the very slow HTTP download of our "unix" binary here: https://kebe.com/~danmcd/webrevs/2440-variants/httpboot-stock-failed-only.snoop. This was captured by one of our community members.
I've been able to do some testing locally. I'm seeing very slow HTTP downloads with a recent merge with our upstream. Given the 2MB window size in the snoop I linked above, I considered undoing this commit (2d180ce), and the resultant undionly.kpxe appears to be noticeably faster at downloading our 3.5MB boot archives.

I believe this is a problem EXCLUSIVELY with undionly.kpxe. I have other methods of iPXE booting in deployment on my test cloud: an EFI netboot chain to snponly, and off-disk ipxe.lkrn. Both of these have NO issues with the larger 2MB max buffer size. I get the feeling undionly is special for some reason.

The community member who has set up the "woodchipper" to confirm/deny things won't be back until Monday. I'll report back here with the woodchipper's results. Those who have this problem (@ctheune) and can recompile undionly.kpxe with the max window size shrunk back down to 256k, please try it and see if it helps (see the sketch just below). I do think the window size is exposing an undionly.kpxe problem, not causing it, given my positive experience with other iPXE binary artifacts.
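For anyone who wants to try this, here is a minimal sketch of the reversion, assuming the macro still lives in src/include/ipxe/tcp.h as in current trees (an illustration of the experiment, not the exact diff that was tested):

```c
/* src/include/ipxe/tcp.h (illustrative sketch) */

/** Maximum advertised TCP window size
 *
 * Commit 2d180ce raised this to 2MB; shrinking it back to 256kB is
 * the experiment suggested above.
 */
#define TCP_MAX_WINDOW_SIZE ( 256 * 1024 )	/* was ( 2048 * 1024 ) */
```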
Makes sense that this is an issue with the underlying UNDI stack. Maybe more info needs to be collected on which NICs and ROMs/BIOSes this happens with, because it doesn't happen on every device?
I'm not 100% sure whether it's every device or not, because when I first heard of this bug I had not noticed the severe slowdown in my otherwise successful boot_archive download. There may be others out there who are experiencing a problem without outright failure, so they dismiss it. I will make sure I gather information on my slowed-down-but-not-failed case, as will anyone else from Triton-land with failures or slowdowns who can gather that as well.
So, for my Supermicro, Xeon E5 v3 (Haswell), booting off the Intel X540:

BIOS version:

An interesting data point for me is that I have "DUAL" boot support selected but also have "LEGACY to EFI support" DISABLED. NOTE that this machine does boot, but MUCH MORE SLOWLY with a 2MiB TCP max buffer vs. the 256KiB max buffer.
Hello, I'm a user of Triton and the downstream user of @danmcd mentioned above, with the "woodchipper" that I've been running tests through. The test system is a Dell OptiPlex 980. It was updated to BIOS version A18 during this debugging. It has been configured in the BIOS to use PXE boot, and has no UEFI mode. During the boot sequence, the following information is displayed, which may help in identifying the exact model of card.
A packet trace of a failed boot, using the default-for-Triton version of undionly, is here: https://manta.matrix.msu.edu/goekesmi/public/iPXE-debug/2024-0520-0001/AgentSmith-3a.2024-0502.undionly-d0c0252a89de00b943aa2017c39c204b.snoop

A packet trace of a successful boot, using a variant that @danmcd provided that backs out 2d180ce, is available at: https://manta.matrix.msu.edu/goekesmi/public/iPXE-debug/2024-0520-0001/AgentSmith-3a.ipxe-256k-tcp-buffer-undionly.kpxe-610c7eaf10dd2e585671fae58afc1577-bootsequence.snoop

The capture was done with a mirror port on a switch, so temporal packet reordering is possible in the trace, along with the occasional dropped packet. @danmcd can speak to the specific build versions and options. The embedded MD5 hashes in the file names refer to the undionly.kpxe version that was used for that boot and packet trace. Hope this helps.
Thanks for the packet trace. I have a working theory as to what may be happening.

Using undionly is known to be slower than using a native driver, so packet drops due to receiver overruns are much more likely than with a native driver. This exercises portions of the TCP RX queue management that don't normally get much use.

A 2MB TCP window is necessary in order to get close to expected throughput on a modern network (as documented in the commit message for commit 2d180ce). However, this 2MB window is now larger than iPXE's internal heap, which is limited to 512kB. There are good reasons for keeping the heap size small: not least of these is that in some boot scenarios (such as iSCSI boot under BIOS) any memory used by iPXE is lost to the operating system.

With a 2MB TCP window, a 512kB heap, and a high rate of packet loss, there will inevitably be scenarios in which iPXE is forced to discard packets from the TCP receive queue, i.e. to "renege" in the terminology used by RFC 2018. This is permitted by the RFC, but is expected to cause the overall behaviour to fall back to relying upon a retransmission timer on the sender side, which degrades performance back to roughly what it would have been without SACK. This is a significant degradation: as noted in commit e0fc8fe78, the improvement from adding SACK was in the region of a 400-700% throughput increase.

It would be interesting to try undionly.kpxe in the known-bad setup with a single modification: change the heap size from 512kB to 8MB.

I am also noticing some oddities in the SACK values shown in the packet capture. For example, for two consecutive ACKs sent by iPXE:
i.e. we seem to have discarded ("reneged upon") some packets from earlier on in the TCP receive queue, rather than dropping the later packets. This is not how iPXE is supposed to behave: under memory pressure, the TCP cache discarder is supposed to drop packets from the end of the receive queue, leaving earlier data intact.

I will check the behaviour of the TCP cache discarder. In the meantime, @danmcd @goekesmi could you please carry out the test with the 8MB heap?
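(To make the oddity concrete, with made-up numbers rather than values from the capture: if iPXE sends an ACK for byte 1000 with SACK block 3000-5000, and the very next ACK still acknowledges 1000 but advertises SACK block 4000-5000, then bytes 3000-4000, data previously reported as received, have been silently dropped from the receive queue. RFC 2018 permits this, but a well-behaved receiver under memory pressure should instead decline to queue new out-of-order data, leaving previously SACKed ranges intact.)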
Rebuilding the Triton iPXE from 20240502/master but with this:
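The actual diff was not included above; a presumed shape of the change, assuming the heap size is still a compile-time constant in src/core/malloc.c as in iPXE trees of this vintage, would be:

```c
/* src/core/malloc.c (illustrative sketch, not the exact tested diff) */

/** The size of the heap
 *
 * Raised from the default 512kB to 8MB for this experiment, so that
 * the heap can hold a full 2MB TCP window of out-of-order data.
 */
#define HEAP_SIZE ( 8192 * 1024 )	/* default: ( 512 * 1024 ) */
```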
@goekesmi ==> Same kebe.com location, but the file is ...
For me and my Supermicro it was faster than stock, but after an initial smooth burst it trickled down to a much slower transfer rate. I should probably recapture TCP snoops of all three scenarios: stock 20240502, reversion of the max buffer, and the 8MiB heap. I can't do that at this moment, but hope to in the next 24-48 hours (sooner if I'm lucky).
Okay, I took snoops of three variants:

1.) "stock" == the current iPXE downstream in TritonDataCenter, last merged with upstream commit:
2.) "8m" == stock, but with the heap size raised to 8MiB.
3.) "256k" == stock, but with the TCP max buffer size reverted to 256KiB.

The winner in my environment is still "256k" by a long shot. Here are the highlights:

Note the 32 sec for stock, 23 sec for the 8m heap, and 3.5 sec for the 256k max TCP buffer size. These snoops are available for download at https://kebe.com/~danmcd/webrevs/2440-variants/.
Can/should we make HEAP_SIZE and TCP_MAX_WINDOW_SIZE configurable?
No, that's definitely not a solution I'd accept. Those aren't meaningful user configuration choices, and making them configurable would just be papering over the problem and putting the burden onto all future users to guess what the "correct" values might happen to be for their use case. |
To my great surprise, the test node using the 8m heap variant did boot. It was slow, with inconsistent transfer speeds, but it did complete the boot. Twice, which is as many times as I have tested it. A packet capture of the boot is at https://manta.matrix.msu.edu/goekesmi/public/iPXE-debug/2024-0522-0001/AgentSmith-3a.8m-heap-undionly.kpxe-79223488ec603400a2c638bd47b5f2dd-bootsequence.snoop
This matches my experience (see my timings above).