ipxe.efi hangs at downloading large files over HTTP #1023
Comments
I say it works perfectly fine as is. (That is, it works for me and many others on the hardware we have)
Since a few versions back of wimboot you no longer need boot.sdi or BCD |
@NiKiZe I understand. I figure it hasn't been mentioned in the GitHub issues lately because most people aren't running into the same problem that I see with my specific hardware. To answer your questions:
Thanks for the pointer on wimboot, I did not know that! I've only been getting my feet wet with PXE booting in general over the last couple of weeks. At the end of the day, the ipxe.efi I compiled after doing these steps made the difference
And it worked perfectly and allowed me to fully download my boot.wim file from my web server. Going back in git history shows that something went wrong somewhere, so I doubt it has much to do with my machine; more likely it is a regression that nobody noticed until now. 🤷♂️ I brought up the issue so that others can find it if they run into the same thing, or in case the problem is obvious to anyone familiar with the codebase. In the meantime I can try to pinpoint the commit where this starts to break, but if you or anyone else has an idea, happy to hear it! |
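Pinpointing the breaking commit between a known-good and a known-bad build is a binary search over the history, which `git bisect` automates. The idea can be sketched in Python as follows. This is only an illustration: `first_bad_commit` and `is_bad` are hypothetical names, and in practice `is_bad` would mean "build ipxe.efi at this commit and check whether the HTTP download hangs".

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and every commit after the first bad one is
    also bad (a single regression point).
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]


if __name__ == "__main__":
    # Made-up history of ten commits where the regression lands in "c6".
    history = [f"c{i}" for i in range(10)]
    print(first_bad_commit(history, lambda c: int(c[1:]) >= 6))
```

Each step halves the remaining range, so even a few thousand commits need only around a dozen rebuild-and-test cycles.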
Some more info. This is my exact NIC. Also, simply reverting commit 059c4dc on top of the latest master commit makes downloading over HTTP work without issues. But looking at the diff, I don't really see what the issue could be 😕, specifically with device ID 16D8 like mine. Is it possible this affected all BNXT NICs? |
That is strange. There is nothing of substance in that commit that should affect behaviour. Could you check the output of |
@mcb30 Here's the output of an ifstat on: I do see a slight change in that top line using |
Yes, that's a major difference. 🙂 For some reason, the commit is causing the NIC to be recognised by the In the earlier commit, the NIC is for some reason not recognised by the |
Thanks for the explanation, things are starting to make sense!
It sounds like the current logic is somehow more correct, but for some reason the bnxt driver just doesn't want to work, right? I have updated the NIC firmware before and it still shows the same problem, so I'm not sure whether the problem is solely with this specific driver or with bnxt drivers in general that iPXE tries to handle specifically. Do you think all bnxt drivers should just fall back to |
The Broadcom driver has issues on many machines. Often it depends on the firmware in the machine (on Macs), where it works in older but not in newer. Consider using snponly.efi or snp.efi, at least on these machines. |
@NiKiZe Thanks for the suggestion, snp.efi seems to work just fine. I'm only now realizing that it was an option 😁 Feel free to mark this as a duplicate of another bad-driver issue, then. |
I'm not sure how useful this is, but I am also experiencing this issue. I have an ASRock Rack motherboard with the BCM57416 (the same chip as on the card linked earlier). Reverting 059c4dc on master (currently 8b14652) fixed it for me as well. Once I finish my redundant server setup, I'd be happy to do some testing and maybe contribute. I'm not great with C, but I can kind of kludge something together and take comments in a review to clean it up. This commit doesn't seem too complicated, but I'm missing some of the context. If someone who understands it could point me in the right direction, I'd love to give it a go. Otherwise, I might circle back to it later. |
Could you grab the ifstat output from iPXE, both with and without that commit? Unless that commit changed which devices are "supported", we wouldn't expect any changes. |
Sorry for the late response. I haven't had a good opportunity to take my server down for troubleshooting lately without making someone upset. Here are the results of running |
Errors reported by ifstat are common, and often not an issue. The interesting part is comparing the ifstat output between the working and non-working iPXE builds. |
Did you have a chance to compare ifstat between the working and non-working builds? |
I had a booting issue with a BCM57414 (14e4:16d7) where, depending on the server I'm loading a large ramfs (500MB+) from, iPXE remains stuck at a given (random) percentage of the file. It was reproducible, so starting from the upstream version (26d3ef0), I enabled debug.
It appears that bnxt_rx_complete() (https://github.com/ipxe/ipxe/blob/master/src/drivers/net/bnxt/bnxt.c#L484) returns NO_MORE_CQ_BD_TO_SERVICE in a loop. This makes the driver loop on the same packet without reading new packets, which blocks the boot process. It's unclear to me whether it's a firmware or a driver issue. I tried a workaround (attached to this comment): with a reduced number of RX buffers, all my servers were fine booting large files. This patch can affect the download speed when operating in perfect conditions, but it solved my issue here. If anyone wants to test it and give feedback, I'd be happy about it. I can also offer a PR, but I'd love to have more comments on it first. For reference, please find my card info. I also emailed the original driver author to inform him about the issue and my workaround; we'll see how it goes. |
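The symptom described above, a poll routine that keeps reporting nothing left to service, can be modelled with a toy example. This is a minimal Python sketch and not the bnxt driver code: `CompletionRing`, `RING_SIZE`, `STALL_LIMIT`, and `drain` are hypothetical names, and the stall counter only illustrates how such a hang could be detected. It is not the workaround patch mentioned in the comment, which reduces the number of RX buffers.

```python
from dataclasses import dataclass, field

RING_SIZE = 8        # hypothetical ring size, not the value used by bnxt
STALL_LIMIT = 1000   # hypothetical number of empty polls treated as a stall


@dataclass
class CompletionRing:
    # valid[i] is True once the (simulated) device has written entry i
    valid: list = field(default_factory=lambda: [False] * RING_SIZE)
    cons: int = 0          # next entry the driver will inspect
    empty_polls: int = 0   # consecutive polls that found nothing

    def poll(self) -> bool:
        """Service one completion; return False if none is ready."""
        if not self.valid[self.cons]:
            self.empty_polls += 1
            return False
        self.valid[self.cons] = False
        self.cons = (self.cons + 1) % RING_SIZE
        self.empty_polls = 0
        return True

    @property
    def stalled(self) -> bool:
        return self.empty_polls >= STALL_LIMIT


def drain(ring: CompletionRing, max_polls: int = 5000) -> int:
    """Poll until the ring looks stalled; return the number of packets serviced."""
    serviced = 0
    for _ in range(max_polls):
        if ring.stalled:
            break
        if ring.poll():
            serviced += 1
    return serviced


if __name__ == "__main__":
    ring = CompletionRing()
    ring.valid[0] = ring.valid[1] = True  # the device delivered two packets, then nothing
    print(drain(ring), ring.stalled)
```

In this model the consumer index stays on the same entry once the device stops producing, which matches the "looping on the same packet" behaviour reported. Whether the real cause lies in the firmware or the driver is not something the sketch can answer.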
Hi, I tried reproducing this issue by downloading a 2.5GB test file using TFTP and HTTP, but I was not able to observe the hang. Is the boot.wim or large ramfs available to download so that I can test it directly on my setup? |
I cannot share the RAMFS I'm using in production, but it's a 600MB one. Please note that this is on a real production network infrastructure involving several switches and routing between the server and the client. |
@jw14812 thanks for testing! @ErwanAliasr1 can you try simplifying the setup by e.g. trying a different (and public) large image to download, or by using a direct connection that eliminates the variety of switches and routers from the scenario? |
@mcb30 Trying to download another file will be easy, but bypassing the whole infra will be complicated for me. I can't hijack the infra like this :( |
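Hey guys! Thanks for the clues!
We were trying to boot a Supermicro server with an H12SSL-NT motherboard and these NICs:
45:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
45:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
When I used ipxe.efi, booting over HTTP got stuck while downloading the initrd (around 30Mb), with the error "Error: No buffer space available" (https://ipxe.org/err/2a2c40). As soon as I rebuilt ipxe.efi with @ErwanAliasr1's fix (thanks mate!), the server could download the initrd and boot (the download was veeeery slow, but successful). Sorry for the lack of debug logs; unfortunately I only have IPMI access and can't copy-paste. Br, Alexey |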
So it's not an issue with my infra but a common issue shared by many. My patch only works around the real issue, about which I have no clue, but it should be useful for the author when digging into the code/firmware.
Glad it helped at least one person.
On Sun, 25 Feb 2024, 19:28, Alexey Grevtsev ***@***.***> wrote:
… Hey guys! Thanks for clues!
Were trying to boot Supermicro server with H12SSL-NT m/b, and such NICs
45:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
45:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
When i used ipxe.efi, booting over HTTP were stuck while downloading
initrd (around 30Mb), with error Error: No buffer space available
<https://ipxe.org/err/2a2c40>.
As soon as i rebuild ipxe.efi with @ErwanAliasr1
<https://github.com/ErwanAliasr1> (thanks mate!) fix - server could
download initrd and boot (download were veeeery slow - but successful)
Sorry for lack of debug logs - unfortunately have only IPMI access, can't
copy-paste.
Br, Alexey
|
I've been trying to boot to WinPE, which requires me to download a boot.wim file of ~400MB. It always stops after the first block it downloads (could be 2%, 16%, 29%, etc.) and never progresses further afterwards.
Interestingly, I am able to download the same boot.wim file over TFTP with the latest iPXE source code, but I definitely had some data corruption issues with that boot.wim file, since I'd run into this red screen of death half the time, and the other half it would boot properly. Perhaps because TFTP is using UDP instead of TCP and is dropping/corrupting some packets? Not sure, but anyway...
I spent a couple of days trying to get HTTP working and wondering what the issue could be. Eventually I said screw it and tried compiling ipxe.efi from commit 1295b4acff1f2014261c40d9f9d2107ffd668d92 instead, after reading issue 155.
The ipxe.efi file I compiled from that commit now allows the boot.wim file to download over HTTP in < 1 second and boots straight to WinPE properly. Also, I no longer get a red screen of death half the time. It's consistent when downloading over HTTP instead of TFTP!
I have no clue what happened between 2020, when that commit was made, and now, but I wanted to shed light on the fact that downloading a large file over HTTP seems to be busted currently.
If someone makes a patch to try to address this, I am happy to recompile ipxe.efi with it to test :). Until then, I'll keep using this older version.