Kill with zero exit code

Killing a UNIX process is easy; the kill command does just that. Technically it sends a signal to the process, and the process then acts on that signal, either by running a handler it has installed or by taking the signal's default action. For most signals the default action is to terminate the process, and that termination is reported as a failure. No signal's default action causes an exit with a zero exit code.

But what if one wishes to kill a process in such a way that it returns a zero exit code (success)? This might be useful if something else is monitoring its exit code, and will take unwanted action if the exit code records an error. Simple use of kill will not suffice, but, provided that one can ptrace the process, it is possible to make it exit immediately with a zero exit code.
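The claim that a simple kill never looks like success can be checked with a short sketch. The helper name status_after_signal is invented for this illustration; it forks a child that merely waits, sends it a signal, and returns the wait status that a monitoring parent would see.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that just waits, send it `sig`,
   and return the raw wait status collected by the parent (-1 on error). */
int status_after_signal(int sig)
{
  int status;
  pid_t pid;

  assert(sig > 0);           /* signal 0 only probes and would never end the child */
  pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {            /* child: sleep until a signal arrives */
    for (;;) pause();
  }
  if (kill(pid, sig) != 0) return -1;
  if (waitpid(pid, &status, 0) != pid) return -1;
  return status;
}
```

For SIGTERM the returned status satisfies WIFSIGNALED() rather than WIFEXITED(), so there is no exit code of zero to report. A shell such as bash presents this case as exit status 128 plus the signal number.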

The following assumes Linux and x86_64.

Using ptrace one can stop a process, modify its memory and registers, and restart it. So the question becomes how to modify its memory and registers so that, on being restarted, it immediately exits.

When a process wishes to exit, it calls the kernel. For x86_64 Linux, the instruction to make a kernel call is syscall, and its opcode is two bytes, 0x0f 0x05. The number of the call to be made is passed in register %rax, with a value of 231 corresponding to sys_exit_group, the standard exit call. (A value of 60 corresponds to sys_exit, but that causes just the single calling thread to exit.) The value that will be returned as the process's exit code is passed in the register %rdi. This is almost equivalent to the C library call _exit(), but not the same as exit(), which also flushes buffers, runs atexit() handlers, and so on.
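To see the exit call in isolation, one can make it from C without going through exit() or _exit(). This is only a sketch: the helper name exit_group_child is invented here, and it relies on the glibc syscall() wrapper and the SYS_exit_group constant from <sys/syscall.h>.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that calls exit_group directly with
   `code`, and return the exit code seen by the parent (-1 on error). */
int exit_group_child(int code)
{
  int status;
  pid_t pid;

  assert(code >= 0 && code <= 255);   /* only the low 8 bits are reported */
  pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    /* SYS_exit_group is 231 on x86_64 Linux; there syscall() places it in
       %rax and `code` in %rdi before executing the syscall instruction. */
    syscall(SYS_exit_group, code);
    _exit(127);                       /* not reached */
  }
  if (waitpid(pid, &status, 0) != pid) return -1;
  if (!WIFEXITED(status)) return -1;
  return WEXITSTATUS(status);
}
```

Calling exit_group_child(0) should yield 0, which is exactly the outcome one wants to force on another process.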

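So the recipe is simple. Stop the process. Place 231 in %rax and 0 in %rdi. Place the two bytes 0x0f 0x05 at the address of the next instruction to be executed, that is the address given by %rip, and then restart the process. Intel is little-endian, so the byte sequence 0x0f 0x05 read as a word is 0x050f. PTRACE_POKETEXT writes a whole word rather than two bytes, so the bytes following the new instruction are overwritten with zeros, but the process exits before it could execute them. The code will look something like: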

#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <errno.h>

int main(int argc, char * argv[]){

  pid_t pid;
  long err;
  int status;
  struct user_regs_struct regs;
  
  /* Check argc first: argv[1] is NULL when no argument is given */
  if (argc != 2 || (pid=atoi(argv[1])) <= 0) {
    fprintf(stderr,"Usage: %s pid\n",argv[0]);
    exit(1);
  }

  err=ptrace(PTRACE_ATTACH,pid,NULL,NULL);
  if (err) {perror(NULL); exit(1);}
  fprintf(stderr,"Successfully attached\n");

  waitpid(pid,&status,WUNTRACED);
  fprintf(stderr,"Wait over\n");

  err=ptrace(PTRACE_GETREGS, pid, NULL, &regs);
  if (err) {perror(NULL); exit(1);}
  fprintf(stderr,"Registers fetched\n");

  regs.rax=231;  /* sys_exit_group */
  regs.rdi=0;
  err=ptrace(PTRACE_SETREGS, pid, NULL, &regs);
  if (err) {perror(NULL); exit(1);}

  fprintf(stderr,"Registers set\n");

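  /* Overwrite the next instruction with syscall (0x0f 0x05) */
  err=ptrace(PTRACE_POKETEXT, pid, (void*)regs.rip, (void*)0x050f);
  if (err) {perror(NULL); exit(1);}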
  
  ptrace(PTRACE_DETACH, pid, NULL, NULL);
  fprintf(stderr,"Target resumes\n");
  exit(0);
}

This skeletal program has little in the way of error checking or help, and takes as its sole argument the PID of the process to kill.
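The constant 0x050f passed to PTRACE_POKETEXT can be checked rather than taken on trust. The following sketch, with the invented helper name syscall_opcode_word, lays the two opcode bytes out in memory order and reads them back as a long. It assumes a little-endian machine, as x86_64 is.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: place the two opcode bytes of `syscall` at the
   start of a zero-filled long, in the order they must appear in memory,
   and return that long. */
long syscall_opcode_word(void)
{
  const unsigned char opcode[2] = { 0x0f, 0x05 };
  long word = 0;

  assert(sizeof word >= sizeof opcode);
  memcpy(&word, opcode, sizeof opcode);  /* 0x0f goes to the lowest address */
  return word;
}
```

On a little-endian machine the result equals 0x050f, the value used in the program. On a big-endian machine it would differ, but the rest of the recipe would not apply there either.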

Depending on the paranoia level of one's kernel,

# echo 0 > /proc/sys/kernel/yama/ptrace_scope

may be necessary before users can use this on their own processes.
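Before changing the setting, one may wish to see what it currently is. A minimal sketch follows, with the invented helper name yama_ptrace_scope. It reads the same file and returns -1 if the file is absent, which is assumed here to mean that Yama is not enabled in the running kernel.

```c
#include <stdio.h>

/* Hypothetical helper: return the current Yama ptrace_scope (0 to 3),
   or -1 if the file cannot be opened or parsed. */
int yama_ptrace_scope(void)
{
  int scope = -1;
  FILE *f = fopen("/proc/sys/kernel/yama/ptrace_scope", "r");

  if (!f) return -1;
  if (fscanf(f, "%d", &scope) != 1) scope = -1;
  fclose(f);
  return scope;
}
```

A value of 0 permits attaching to any process of the same user. A value of 1 restricts attaching to one's own descendants, which is why the unrelated sleep process in the test below cannot be attached to until the value is lowered. Values of 2 and 3 are stricter still.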

How can one test this? In one window type

$ sleep 1000 && echo OK

Then, from another window, find the PID of the sleep process with something like ps aux | grep sleep, and compare the result of killing it with the result of using the above code on it. A simple kill gives

$ sleep 1000 && echo OK
Terminated
$

whereas the above method yields

$ sleep 1000 && echo OK
OK
$

showing that this time sleep exited with a zero return code so the echo command ran.