
Transition all size variables to ssize_t #592

Open
johnsonjh opened this issue Dec 28, 2022 · 12 comments
Labels: bug (Something is not working), TODO (Things to be done before releasing)

Comments

@johnsonjh

johnsonjh commented Dec 28, 2022

Reproducer:

$ ls -la bigfile
-rw-r--r-- 1 jhj jhj 2751926272 Dec 27 19:25 bigfile
$ BIG="$(cat bigfile)"
$ print -v BIG | wc -c
[1]    647915 segmentation fault (core dumped)  ksh93

bigfile contains ASCII text; this is ksh93 1.0.4 on a 64-bit Linux system.

@phidebian

Avoiding the bigfile:

TC$ echo $KSH_VERSION
Version AJM 93u+m/1.1.0-alpha+4ec93745 2022-10-31

TC$  printf -v v '%02147483648d' 0
Segmentation fault (core dumped)

@phidebian

Actually, I can't reproduce your test case :-(

TC$ echo $KSH_VERSION
Version AJM 93u+m/1.0.4+d4503c15/MOD 2022-10-22

TC$ ll yo
-rw-rw-r-- 1 phi phi 3221225472 Dec 29 11:54 yo

TC$ BIG="$(cat yo)"

TC$ print -v BIG  | wc -l
48220802

I can't reproduce it either with the production version (Ubuntu 22.04):

TC$ echo $KSH_VERSION
Version AJM 93u+m/1.0.0-beta.2 2021-12-17

TC$ ll yo
-rw-rw-r-- 1 phi phi 3221225472 Dec 29 11:54 yo

TC$ BIG="$(cat yo)"

TC$ print -v BIG  | wc -l
48220802

Indeed the reported wc count is completely wrong, but it doesn't core dump.

@johnsonjh
Author

Strange; this is a Red Hat-supplied package (ksh-1.0.4 from https://kojipkgs.fedoraproject.org//packages/ksh/1.0.4/1.fc38/x86_64/ksh-1.0.4-1.fc38.x86_64.rpm), and it happens on a clean install of a Fedora 38 (Rawhide) container as well as on Fedora 37, reproducing every time.

I’ll try rebuilding from source myself later just to rule out a bad package.

@phidebian

Maybe the RH setup on your VM has some limit (ulimit -a) not handled by ksh (memory or swap exhaustion), leading to the segfault?

@johnsonjh
Author

I haven’t had a chance to look at it further, but that’s not the issue …

 » ulimit -a
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             unlimited
-c: core file size (blocks)         unlimited
-m: resident set size (kbytes)      unlimited
-u: processes                       unlimited
-n: file descriptors                99999998
-l: locked-in-memory size (kbytes)  unlimited
-v: address space (kbytes)          unlimited
-x: file locks                      unlimited
-i: pending signals                 unlimited
-q: bytes in POSIX msg queues       unlimited
-e: max nice                        0
-r: max rt priority                 unlimited
-N 15: rt cpu time (microseconds)   unlimited

@posguy99

> Avoiding the bigfile:
>
> TC$ echo $KSH_VERSION
> Version AJM 93u+m/1.1.0-alpha+4ec93745 2022-10-31
>
> TC$  printf -v v '%02147483648d' 0
> Segmentation fault (core dumped)

I get this one...

$ echo $KSH_VERSION                                                                                        
Version AJM 93u+m/1.1.0-alpha+4ec93745 2022-10-31

 $ printf -v v '%02147483648d' 0
Bus error
$ 

But not the bigfile one. Is my bigfile not "big" enough?

$ ls -la bigfile
-rw-r--r--  1 mwilson  staff  1199931414 Dec 31 08:36 bigfile
$ BIG="$(cat bigfile)"
$ print -v BIG | wc -l
 4208606
$ 

Well, SOMETHING happens if I make it big enough...

$ ls -la biggerfile
-rw-r--r--  1 mwilson  staff  3146375859 Dec 31 08:45 biggerfile
$ BIGGER="$(cat biggerfile)"
$ print -v BIGGER | wc -l
       1
$ wc -l biggerfile
 11035511 biggerfile
$ 

macOS 12.6.2 on an M1 Pro.

@phidebian

The printf version seems to be another issue (a simple wraparound on the buffer size) and can easily be avoided.
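What that wraparound looks like can be sketched in a few lines of C (illustrative only, not ksh's actual width parser): a field width accumulated into an int goes negative once it passes INT_MAX, and any buffer size derived from it is garbage.

#include <stdio.h>
#include <limits.h>

int main(void)
{
    const char *s = "2147483648";   /* INT_MAX + 1, as text */
    int w = 0;                      /* how a format width is often parsed */
    long long correct = 0;          /* what a wide type would have kept */
    for (; *s; s++) {
        w = w * 10 + (*s - '0');    /* overflows: UB in C, wraps negative in practice */
        correct = correct * 10 + (*s - '0');
    }
    printf("int width: %d, correct value: %lld\n", w, correct);
    return 0;
}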

So the real problem is with loading the bigfile. I took care to create a bigfile with pure ASCII data (no zero-filled holes).

I still don't get a segfault on freshly installed Fedora (x86_64), yet I hit a resource problem with the cat version even though I have plenty of space (and swap space).

$ uname -a
Linux localhost.localdomain 6.0.15-300.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Dec 21 18:33:23 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

$ echo $KSH_VERSION
Version AJM 93u+m/1.0.3 2022-08-25

$ od /dev/random | dd bs=4K count=786432 >bigfile
786432+0 records in
786432+0 records out
3221225472 bytes (3.2 GB, 3.0 GiB) copied, 76.5784 s, 42.1 MB/s

$ ls -l bigfile 
-rw-r--r--. 1 phi phi 3221225472 Dec 31 18:07 bigfile

$ df /
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda2       12275848 9160200   2470272  79% /

$  sudo swapon --show
NAME       TYPE      SIZE USED PRIO
/dev/zram0 partition 5.3G   0B  100
/dev/sda3  partition   4G   0B   -2
/swapfile  file        4G   0B   -3

$ BIG="$(cat bigfile)"
cat: write error: No space left on device

This cat: error is strange, since I thought I had plenty of space. Trying < instead of cat:

$ BIG="$(< bigfile)"  
$ print -v BIG | wc -c
3221225473

No errors, no seg fault.

@phidebian

On a fresh Fedora install in a VM guest I get a problem with the 'cat' version of your test case:

$ BIG="$(cat bigfile)"
cat: write error: No space left on device

This is due to ksh93 using /dev/shm to implement pipes (instead of pipe(2)).

/dev/shm is implemented on top of tmpfs and is therefore limited to 50% of main memory by default. In my case the VM guest has 4 GB, so /dev/shm is 2 GB, which is exactly the size at which I get the 'No space left on device', even though I have plenty of space on my disks.

FED$ df /dev/shm
Filesystem     1K-blocks    Used Available Use% Mounted on
tmpfs            2761136 2761136         0 100% /dev/shm
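That limit can also be checked programmatically with statvfs(3), which reads the same numbers df shows (a small standalone sketch, nothing ksh-specific):

#include <stdio.h>
#include <sys/statvfs.h>

int main(void)
{
    /* Report total and available space on /dev/shm; on a default
     * Linux setup the total is about half of physical RAM. */
    struct statvfs vfs;
    if (statvfs("/dev/shm", &vfs) != 0) {
        perror("statvfs");
        return 1;
    }
    unsigned long long total = (unsigned long long)vfs.f_blocks * vfs.f_frsize;
    unsigned long long avail = (unsigned long long)vfs.f_bavail * vfs.f_frsize;
    printf("/dev/shm: %llu bytes total, %llu available\n", total, avail);
    return 0;
}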

To get rid of this you can unmount /dev/shm.

But then the pipes end up in /tmp; still no joy:

FED$ df /tmp
Filesystem     1K-blocks    Used Available Use% Mounted on
tmpfs            2761140 2761140         0 100% /tmp

To get rid of this one, comment out the [Mount] section in /usr/lib/systemd/system/tmp.mount and reboot the VM, then unmount /dev/shm again. At this point there is no more tmpfs for either /dev/shm or /tmp:

FED$ df | grep tmpfs
devtmpfs            4096       0      4096   0% /dev
tmpfs            1104456    1004   1103452   1% /run
tmpfs             552224       0    552224   0% /run/user/1000

FED$ ksh
$ BIG="$(cat /d/bigfile)"                                                    
$ print -v BIG | wc -c
3221225473
$ 

Note that tmpfs DOESN'T overflow to the swap device, as I had thought it would. At some point in time /tmp was a RAM disk that spilled over to the swap device, so the tmpfs behaviour took me by surprise.

I think that ksh should catch the error on the reading end of the pipe (i.e. after the cat failure) and tell the user that the failure was due to tmpfs usage (i.e. point the user at something to look for).

I also think that a ksh option (or an environment variable, or both) should be available to tell ksh not to use tmpfs in src/lib/libast/sfio/sftmp.c::_tmpfd(); after the tmpfs warning suggested above, the user could be pointed at that option to avoid the problem. A sketch of such a check is shown below.
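A rough sketch of the kind of opt-out check this could be (KSH_NO_TMPFS is an invented name for illustration, and this is not the real _tmpfd() code, just the shape of the decision):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: pick the directory for the anonymous temp file that
 * backs a command substitution. KSH_NO_TMPFS is an invented variable
 * name; the real _tmpfd() in src/lib/libast/sfio/sftmp.c differs. */
static const char *pipe_tmp_dir(void)
{
    const char *e = getenv("KSH_NO_TMPFS");
    if (e && *e != '\0' && strcmp(e, "0") != 0)
        return "/var/tmp";   /* usually disk-backed, not tmpfs */
    return "/dev/shm";       /* fast, but capped at ~50% of RAM */
}

int main(void)
{
    printf("temp files would go to: %s\n", pipe_tmp_dir());
    return 0;
}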

Generally speaking, tmpfs is a good idea gone bad, especially on systems with small main memory that are still expected to handle huge files. /tmp has always had a history of being filled up; it has always randomly crashed systems when mishandled, and tmpfs just exacerbates that. Generally, /tmp should hold small, short-lived files, and those files have to be cleaned up to recover filesystem space. Often this is done by unlinking them so they are reclaimed automatically at process exit; even if they are not cleaned up, a reboot may do it, but some systems never reboot. So, bottom line: using tmpfs for long-lived files that can be big is a misuse/abuse of tmpfs, and that is what ksh does.

The whole purpose of pipe(2) was precisely to reduce filesystem usage between two processes: the producer fills a buffer, the consumer grabs it and then tells the producer it can trash (reuse) its buffer to produce the next one, and so on. Using tmpfs to implement ksh pipes is bad because it is equivalent to having the producer produce its entire output first and then letting the consumer read it (not exactly the same, but at the least the produced data is never cleaned up as it is consumed). A minimal pipe(2) sketch is shown below.
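A minimal pipe(2) producer/consumer for contrast: the kernel pipe buffer is small and fixed (64 KiB by default on Linux), the writer blocks when it fills, and no filesystem space is consumed however much data flows through:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd[2];
    char buf[32];
    ssize_t n;

    if (pipe(fd) == -1)
        return 1;
    if (fork() == 0) {          /* child: the producer */
        close(fd[0]);
        write(fd[1], "some data", 9);
        close(fd[1]);
        _exit(0);
    }
    close(fd[1]);               /* parent: the consumer */
    while ((n = read(fd[0], buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);
    close(fd[0]);
    putchar('\n');
    return 0;
}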

One could argue that the tmpfs pipe implementation avoids two buffer copies between userland and the kernel; in that case ksh should use some heuristic to go the tmpfs way only when the producer will not produce massive output. For instance, $(echo blah) is fine with tmpfs.

So before cutting a patch for this, a discussion is needed to figure out all the available options.
But the way it is now is far from optimal, even close to a DoS.

@johnsonjh
Author

johnsonjh commented Jan 2, 2023 via email

@phidebian

phidebian commented Jan 3, 2023

Yes, the 'cat bigfile' failure on 'small' systems may well have nothing to do with your segfault situation. The fact is that Ubuntu and Fedora don't behave the same at the moment on my side.

Maybe ksh is just missing a more helpful error message; it took me ages to converge on tmpfs after the first 'No space left on device' occurrence.

EDIT:
OK, one difference between Fedora and Ubuntu is that Fedora uses tmpfs for /tmp while Ubuntu does not (out of the box). That's why Fedora still fails after removing (unmounting) /dev/shm: ksh falls back to /tmp when /dev/shm is not there, and so still ends up in the small, uncontrollable tmp area.
I think that since ksh has several strategies for placing the anonymous temporary pipe files, an option or environment variable should be available to fine-tune which tmp area to use. But that's another story; it should have its own issue number.

@McDutchie

$ printf -v v "%$(getconf INT_MAX)d" 0
$ echo ${#v}
2147483647
$ printf -v v "%$(( $(getconf INT_MAX) + 1))d" 0
Bus error

Clearly, the problem is due to int being widely used for sizes. In the olden days it was inconceivable we'd ever have more than INT_MAX bytes of RAM, but here we are. The fix will be a comprehensive transition to using ssize_t for all variables containing sizes. This is going to take some serious work and testing.
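A minimal illustration of the bug class (stand-in variable names, not actual ksh code): a length just above INT_MAX survives a trip through ssize_t but is destroyed by a trip through int.

#include <stdio.h>
#include <limits.h>
#include <sys/types.h>   /* ssize_t */

int main(void)
{
    size_t len = (size_t)INT_MAX + 1;   /* e.g. a >2 GiB variable value */
    int n32 = (int)len;                 /* implementation-defined: wraps negative */
    ssize_t n64 = (ssize_t)len;         /* intact on LP64 systems */
    printf("as int: %d\nas ssize_t: %zd\n", n32, n64);
    return 0;
}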

@McDutchie McDutchie added bug Something is not working TODO Things to be done before releasing labels Mar 5, 2023
@McDutchie McDutchie changed the title Segmentation fault trying to work with an extremely large variable (>2GB) Transition all size variables to ssize_t Mar 5, 2023
@McDutchie

Related (?) problem: AST getconf outputs a wrong (i.e. 32-bit) value for SSIZE_MAX.

$ uname -a
Darwin greybeard.local 21.6.0 Darwin Kernel Version 21.6.0: Mon Dec 19 20:43:09 PST 2022; root:xnu-8020.240.18~2/RELEASE_ARM64_T6000 arm64
$ /opt/ast/bin/getconf SSIZE_MAX
2147483647
$ /usr/bin/getconf SSIZE_MAX
9223372036854775807
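For comparison, the value straight from <limits.h> (where POSIX defines SSIZE_MAX); on an LP64 system this agrees with /usr/bin/getconf, not with the AST output:

#include <stdio.h>
#include <limits.h>   /* SSIZE_MAX */

int main(void)
{
    printf("SSIZE_MAX = %lld\n", (long long)SSIZE_MAX);
    return 0;
}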
