OSDN Git Service

qmiga/qemu.git
4 years agovirtiofsd: fix libfuse information leaks
Stefan Hajnoczi [Fri, 22 Nov 2019 11:31:30 +0000 (11:31 +0000)]
virtiofsd: fix libfuse information leaks

Some FUSE message replies contain padding fields that are not
initialized by libfuse.  This is fine in traditional FUSE applications
because the kernel is trusted.  virtiofsd does not trust the guest and
must not expose uninitialized memory.

Use C struct initializers to automatically zero out memory.  Not all of
these code changes are strictly necessary but they will prevent future
information leaks if the structs are extended.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: set maximum RLIMIT_NOFILE limit
Stefan Hajnoczi [Fri, 22 Mar 2019 15:54:13 +0000 (15:54 +0000)]
virtiofsd: set maximum RLIMIT_NOFILE limit

virtiofsd can exceed the default open file descriptor limit easily on
most systems.  Take advantage of the fact that it runs as root to raise
the limit.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: Drop CAP_FSETID if client asked for it
Vivek Goyal [Tue, 13 Aug 2019 19:29:44 +0000 (15:29 -0400)]
virtiofsd: Drop CAP_FSETID if client asked for it

If client requested killing setuid/setgid bits on file being written, drop
CAP_FSETID capability so that setuid/setgid bits are cleared upon write
automatically.

pjdfstest chown/12.t needs this.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
  dgilbert: reworked for libcap-ng
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: cap-ng helpers
Dr. David Alan Gilbert [Tue, 3 Dec 2019 12:23:44 +0000 (12:23 +0000)]
virtiofsd: cap-ng helpers

libcap-ng reads /proc during capng_get_caps_process, and virtiofsd's
sandboxing doesn't have /proc mounted; thus we have to do the
caps read before we sandbox it and save/restore the state.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: Parse flag FUSE_WRITE_KILL_PRIV
Vivek Goyal [Tue, 13 Aug 2019 19:29:42 +0000 (15:29 -0400)]
virtiofsd: Parse flag FUSE_WRITE_KILL_PRIV

Caller can set FUSE_WRITE_KILL_PRIV in write_flags. Parse it and pass it
to the filesystem.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: add seccomp whitelist
Stefan Hajnoczi [Wed, 13 Mar 2019 09:32:51 +0000 (09:32 +0000)]
virtiofsd: add seccomp whitelist

Only allow system calls that are needed by virtiofsd.  All other system
calls cause SIGSYS to be directed at the thread and the process will
coredump.

Restricting system calls reduces the kernel attack surface and limits
what the process can do when compromised.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
with additional entries by:
Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: piaojun <piaojun@huawei.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: move to a new pid namespace
Stefan Hajnoczi [Wed, 16 Oct 2019 16:01:57 +0000 (17:01 +0100)]
virtiofsd: move to a new pid namespace

virtiofsd needs access to /proc/self/fd.  Let's move to a new pid
namespace so that a compromised process cannot see another other
processes running on the system.

One wrinkle in this approach: unshare(CLONE_NEWPID) affects *child*
processes and not the current process.  Therefore we need to fork the
pid 1 process that will actually run virtiofsd and leave a parent in
waitpid(2).  This is not the same thing as daemonization and parent
processes should not notice a difference.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: move to an empty network namespace
Stefan Hajnoczi [Wed, 16 Oct 2019 16:01:56 +0000 (17:01 +0100)]
virtiofsd: move to an empty network namespace

If the process is compromised there should be no network access.  Use an
empty network namespace to sandbox networking.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: sandbox mount namespace
Stefan Hajnoczi [Tue, 12 Mar 2019 15:51:38 +0000 (15:51 +0000)]
virtiofsd: sandbox mount namespace

Use a mount namespace with the shared directory tree mounted at "/" and
no other mounts.

This prevents symlink escape attacks because symlink targets are
resolved only against the shared directory and cannot go outside it.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Peng Tao <tao.peng@linux.alibaba.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: use /proc/self/fd/ O_PATH file descriptor
Stefan Hajnoczi [Tue, 12 Mar 2019 15:48:50 +0000 (15:48 +0000)]
virtiofsd: use /proc/self/fd/ O_PATH file descriptor

Sandboxing will remove /proc from the mount namespace so we can no
longer build string paths into "/proc/self/fd/...".

Keep an O_PATH file descriptor so we can still re-open fds via
/proc/self/fd.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: prevent ".." escape in lo_do_readdir()
Stefan Hajnoczi [Tue, 5 Mar 2019 09:32:49 +0000 (09:32 +0000)]
virtiofsd: prevent ".." escape in lo_do_readdir()

Construct a fake dirent for the root directory's ".." entry.  This hides
the parent directory from the FUSE client.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: prevent ".." escape in lo_do_lookup()
Stefan Hajnoczi [Mon, 4 Mar 2019 10:38:46 +0000 (10:38 +0000)]
virtiofsd: prevent ".." escape in lo_do_lookup()

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: check input buffer size in fuse_lowlevel.c ops
Stefan Hajnoczi [Thu, 28 Feb 2019 16:38:31 +0000 (16:38 +0000)]
virtiofsd: check input buffer size in fuse_lowlevel.c ops

Each FUSE operation involves parsing the input buffer.  Currently the
code assumes the input buffer is large enough for the expected
arguments.  This patch uses fuse_mbuf_iter to check the size.

Most operations are simple to convert.  Some are more complicated due to
variable-length inputs or different sizes depending on the protocol
version.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: validate input buffer sizes in do_write_buf()
Stefan Hajnoczi [Thu, 28 Feb 2019 11:25:32 +0000 (11:25 +0000)]
virtiofsd: validate input buffer sizes in do_write_buf()

There is a small change in behavior: if fuse_write_in->size doesn't
match the input buffer size then the request is failed.  Previously
write requests with 1 fuse_buf element would truncate to
fuse_write_in->size.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: add fuse_mbuf_iter API
Stefan Hajnoczi [Thu, 28 Feb 2019 10:30:20 +0000 (10:30 +0000)]
virtiofsd: add fuse_mbuf_iter API

Introduce an API for consuming bytes from a buffer with size checks.
All FUSE operations will be converted to use this safe API instead of
void *inarg.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: Pass write iov's all the way through
Dr. David Alan Gilbert [Fri, 4 Jan 2019 16:47:39 +0000 (16:47 +0000)]
virtiofsd: Pass write iov's all the way through

Pass the write iov pointing to guest RAM all the way through rather
than copying the data.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: Plumb fuse_bufvec through to do_write_buf
Dr. David Alan Gilbert [Fri, 4 Jan 2019 18:23:00 +0000 (18:23 +0000)]
virtiofsd: Plumb fuse_bufvec through to do_write_buf

Let fuse_session_process_buf_int take a fuse_bufvec * instead of a
fuse_buf;  and then through to do_write_buf - where in the best
case it can pass that straight through to op.write_buf without copying
(other than skipping a header).

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: validate path components
Stefan Hajnoczi [Tue, 26 Feb 2019 17:58:59 +0000 (17:58 +0000)]
virtiofsd: validate path components

Several FUSE requests contain single path components.  A correct FUSE
client sends well-formed path components but there is currently no input
validation in case something went wrong or the client is malicious.

Refuse ".", "..", and paths containing '/' when we expect a path
component.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: passthrough_ll: add fallback for racy ops
Miklos Szeredi [Wed, 14 Nov 2018 15:52:03 +0000 (16:52 +0100)]
virtiofsd: passthrough_ll: add fallback for racy ops

We have two operations that cannot be done race-free on a symlink in
certain cases: utimes and link.

Add racy fallback for these if the race-free method doesn't work.  We do
our best to avoid races even in this case:

  - get absolute path by reading /proc/self/fd/NN symlink

  - lookup parent directory: after this we are safe against renames in
    ancestors

  - lookup name in parent directory, and verify that we got to the original
    inode,  if not retry the whole thing

Both utimes(2) and link(2) hold i_lock on the inode across the operation,
so a racing rename/delete by this fuse instance is not possible, only from
other entities changing the filesystem.

If the "norace" option is given, then disable the racy fallbacks.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: passthrough_ll: add fd_map to hide file descriptors
Stefan Hajnoczi [Thu, 31 Jan 2019 06:02:40 +0000 (14:02 +0800)]
virtiofsd: passthrough_ll: add fd_map to hide file descriptors

Do not expose file descriptor numbers to clients.  This prevents the
abuse of internal file descriptors (like stdin/stdout).

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Fix from:
Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
dgilbert:
  Added lseek
Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: passthrough_ll: add dirp_map to hide lo_dirp pointers
Stefan Hajnoczi [Thu, 31 Jan 2019 06:01:28 +0000 (14:01 +0800)]
virtiofsd: passthrough_ll: add dirp_map to hide lo_dirp pointers

Do not expose lo_dirp pointers to clients.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: passthrough_ll: add ino_map to hide lo_inode pointers
Stefan Hajnoczi [Thu, 31 Jan 2019 04:57:34 +0000 (12:57 +0800)]
virtiofsd: passthrough_ll: add ino_map to hide lo_inode pointers

Do not expose lo_inode pointers to clients.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: passthrough_ll: add lo_map for ino/fh indirection
Stefan Hajnoczi [Thu, 31 Jan 2019 04:49:39 +0000 (12:49 +0800)]
virtiofsd: passthrough_ll: add lo_map for ino/fh indirection

A layer of indirection is needed because passthrough_ll cannot expose
pointers or file descriptor numbers to untrusted clients.  Malicious
clients could send invalid pointers or file descriptors in order to
crash or exploit the file system daemon.

lo_map provides an integer key->value mapping.  This will be used for
ino and fh fields in the patches that follow.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: passthrough_ll: create new files in caller's context
Vivek Goyal [Wed, 15 Aug 2018 15:05:29 +0000 (17:05 +0200)]
virtiofsd: passthrough_ll: create new files in caller's context

We need to create files in the caller's context. Otherwise after
creating a file, the caller might not be able to do file operations on
that file.

Changed effective uid/gid to caller's uid/gid, create file and then
switch back to uid/gid 0.

Use syscall(setresuid, ...) otherwise glibc does some magic to change EUID
in all threads, which is not what we want.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofs: Add maintainers entry
Dr. David Alan Gilbert [Mon, 21 Oct 2019 10:41:36 +0000 (11:41 +0100)]
virtiofs: Add maintainers entry

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: add --print-capabilities option
Stefan Hajnoczi [Tue, 27 Aug 2019 09:54:35 +0000 (10:54 +0100)]
virtiofsd: add --print-capabilities option

Add the --print-capabilities option as per vhost-user.rst "Backend
programs conventions".  Currently there are no advertised features.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: add vhost-user.json file
Stefan Hajnoczi [Tue, 27 Aug 2019 09:54:37 +0000 (10:54 +0100)]
virtiofsd: add vhost-user.json file

Install a vhost-user.json file describing virtiofsd.  This allows
libvirt and other management tools to enumerate vhost-user backend
programs.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: make -f (foreground) the default
Stefan Hajnoczi [Tue, 27 Aug 2019 09:54:34 +0000 (10:54 +0100)]
virtiofsd: make -f (foreground) the default

According to vhost-user.rst "Backend program conventions", backend
programs should run in the foregound by default.  Follow the
conventions so libvirt and other management tools can control virtiofsd
in a standard way.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: add --fd=FDNUM fd passing option
Stefan Hajnoczi [Tue, 25 Jun 2019 16:18:00 +0000 (17:18 +0100)]
virtiofsd: add --fd=FDNUM fd passing option

Although --socket-path=PATH is useful for manual invocations, management
tools typically create the UNIX domain socket themselves and pass it to
the vhost-user device backend.  This way QEMU can be launched
immediately with a valid socket.  No waiting for the vhost-user device
backend is required when fd passing is used.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: Fast path for virtio read
Dr. David Alan Gilbert [Wed, 15 Aug 2018 19:26:05 +0000 (20:26 +0100)]
virtiofsd: Fast path for virtio read

Readv the data straight into the guests buffer.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
With fix by:
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: Add Makefile wiring for virtiofsd contrib
Dr. David Alan Gilbert [Thu, 7 Feb 2019 12:17:21 +0000 (12:17 +0000)]
virtiofsd: Add Makefile wiring for virtiofsd contrib

Wire up the building of the virtiofsd in tools.

virtiofsd relies on Linux-specific system calls and seccomp.  Anyone
wishing to port it to other host operating systems should do so
carefully and without reducing security.

Only allow building on Linux hosts.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: Keep track of replies
Dr. David Alan Gilbert [Thu, 21 Jun 2018 09:38:03 +0000 (10:38 +0100)]
virtiofsd: Keep track of replies

Keep track of whether we sent a reply to a request; this is a bit
paranoid but it means:
  a) We should always recycle an element even if there was an error
     in the request
  b) Never try and send two replies on one queue element

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: Send replies to messages
Dr. David Alan Gilbert [Mon, 18 Jun 2018 17:46:01 +0000 (18:46 +0100)]
virtiofsd: Send replies to messages

Route fuse out messages back through the same queue elements
that had the command that triggered the request.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: Start reading commands from queue
Dr. David Alan Gilbert [Thu, 14 Jun 2018 18:52:23 +0000 (19:52 +0100)]
virtiofsd: Start reading commands from queue

Pop queue elements off queues, copy the data from them and
pass that to fuse.

  Note: 'out' in a VuVirtqElement is from QEMU
        'in' in libfuse is into the daemon

  So we read from the out iov's to get a fuse_in_header

When we get a kick we've got to read all the elements until the queue
is empty.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: Poll kick_fd for queue
Dr. David Alan Gilbert [Thu, 14 Jun 2018 11:07:07 +0000 (12:07 +0100)]
virtiofsd: Poll kick_fd for queue

In the queue thread poll the kick_fd we're passed.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: Start queue threads
Dr. David Alan Gilbert [Wed, 13 Jun 2018 19:17:51 +0000 (20:17 +0100)]
virtiofsd: Start queue threads

Start a thread for each queue when we get notified it's been started.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
fix by:
Signed-off-by: Jun Piao <piaojun@huawei.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: get/set features callbacks
Dr. David Alan Gilbert [Wed, 13 Jun 2018 19:17:17 +0000 (20:17 +0100)]
virtiofsd: get/set features callbacks

Add the get/set features callbacks.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: Add main virtio loop
Dr. David Alan Gilbert [Tue, 12 Jun 2018 15:31:24 +0000 (16:31 +0100)]
virtiofsd: Add main virtio loop

Processes incoming requests on the vhost-user fd.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: Start wiring up vhost-user
Dr. David Alan Gilbert [Fri, 8 Jun 2018 18:59:20 +0000 (19:59 +0100)]
virtiofsd: Start wiring up vhost-user

Listen on our unix socket for the connection from QEMU, when we get it
initialise vhost-user and dive into our own loop variant (currently
dummy).

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: Open vhost connection instead of mounting
Dr. David Alan Gilbert [Thu, 7 Jun 2018 19:11:14 +0000 (20:11 +0100)]
virtiofsd: Open vhost connection instead of mounting

When run with vhost-user options we conect to the QEMU instead
via a socket.  Start this off by creating the socket.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: add -o source=PATH to help output
Stefan Hajnoczi [Fri, 8 Mar 2019 12:23:55 +0000 (12:23 +0000)]
virtiofsd: add -o source=PATH to help output

The -o source=PATH option will be used by most command-line invocations.
Let's document it!

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: Add options for virtio
Dr. David Alan Gilbert [Thu, 7 Jun 2018 16:22:33 +0000 (17:22 +0100)]
virtiofsd: Add options for virtio

Add options to specify parameters for virtio-fs paths, i.e.

   ./virtiofsd -o vhost_user_socket=/tmp/vhostqemu

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: Make fsync work even if only inode is passed in
Vivek Goyal [Thu, 30 Aug 2018 18:22:10 +0000 (14:22 -0400)]
virtiofsd: Make fsync work even if only inode is passed in

If caller has not sent file handle in request, then using inode, retrieve
the fd opened using O_PATH and use that to open file again and issue
fsync. This will be needed when dax_flush() calls fsync. At that time
we only have inode information (and not file).

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovitriofsd/passthrough_ll: fix fallocate() ifdefs
Xiao Yang [Mon, 13 Jan 2020 09:37:34 +0000 (17:37 +0800)]
vitriofsd/passthrough_ll: fix fallocate() ifdefs

1) Use correct CONFIG_FALLOCATE macro to check if fallocate() is supported.(i.e configure
   script sets CONFIG_FALLOCATE intead of HAVE_FALLOCATE if fallocate() is supported)
2) Replace HAVE_POSIX_FALLOCATE with CONFIG_POSIX_FALLOCATE.

Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  Merged from two of Xiao Yang's patches

4 years agovirtiofsd: Trim out compatibility code
Dr. David Alan Gilbert [Wed, 27 Nov 2019 17:31:24 +0000 (17:31 +0000)]
virtiofsd: Trim out compatibility code

virtiofsd only supports major=7, minor>=31; trim out a lot of
old compatibility code.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: Fix common header and define for QEMU builds
Dr. David Alan Gilbert [Fri, 8 Feb 2019 11:48:42 +0000 (11:48 +0000)]
virtiofsd: Fix common header and define for QEMU builds

All of the fuse files include config.h and define GNU_SOURCE
where we don't have either under our build - remove them.
Fixup path to the kernel's fuse.h in the QEMUs world.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: Fix fuse_daemonize ignored return values
Dr. David Alan Gilbert [Fri, 8 Feb 2019 16:05:24 +0000 (16:05 +0000)]
virtiofsd: Fix fuse_daemonize ignored return values

QEMU's compiler enables warnings/errors for ignored values
and the (void) trick used in the fuse code isn't enough.
Turn all the return values into a return value on the function.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: Remove unused enum fuse_buf_copy_flags
Xiao Yang [Wed, 22 Jan 2020 02:40:08 +0000 (10:40 +0800)]
virtiofsd: Remove unused enum fuse_buf_copy_flags

Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: remove unused notify reply support
Stefan Hajnoczi [Thu, 28 Feb 2019 11:22:58 +0000 (11:22 +0000)]
virtiofsd: remove unused notify reply support

Notify reply support is unused by virtiofsd.  The code would need to be
updated to validate input buffer sizes.  Remove this unused code since
changes to it are untestable.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: remove mountpoint dummy argument
Stefan Hajnoczi [Fri, 8 Mar 2019 13:24:31 +0000 (13:24 +0000)]
virtiofsd: remove mountpoint dummy argument

Classic FUSE file system daemons take a mountpoint argument but
virtiofsd exposes a vhost-user UNIX domain socket instead.  The
mountpoint argument is not used by virtiofsd but the user is still
required to pass a dummy argument on the command-line.

Remove the mountpoint argument to clean up the command-line.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: Format imported files to qemu style
Dr. David Alan Gilbert [Mon, 9 Dec 2019 19:53:47 +0000 (19:53 +0000)]
virtiofsd: Format imported files to qemu style

Mostly using a set like:

indent -nut -i 4 -nlp -br -cs -ce --no-space-after-function-call-names file
clang-format -style=file -i -- file
clang-tidy -fix-errors -checks=readability-braces-around-statements file
clang-format -style=file -i -- file

With manual cleanups.

The .clang-format used is below.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed by: Aleksandar Markovic <amarkovic@wavecomp.com>

Language:        Cpp
AlignAfterOpenBracket: Align
AlignConsecutiveAssignments: false # although we like it, it creates churn
AlignConsecutiveDeclarations: false
AlignEscapedNewlinesLeft: true
AlignOperands:   true
AlignTrailingComments: false # churn
AllowAllParametersOfDeclarationOnNextLine: true
AllowShortBlocksOnASingleLine: false
AllowShortCaseLabelsOnASingleLine: false
AllowShortFunctionsOnASingleLine: None
AllowShortIfStatementsOnASingleLine: false
AllowShortLoopsOnASingleLine: false
AlwaysBreakAfterReturnType: None # AlwaysBreakAfterDefinitionReturnType is taken into account
AlwaysBreakBeforeMultilineStrings: false
BinPackArguments: true
BinPackParameters: true
BraceWrapping:
  AfterControlStatement: false
  AfterEnum:       false
  AfterFunction:   true
  AfterStruct:     false
  AfterUnion:      false
  BeforeElse:      false
  IndentBraces:    false
BreakBeforeBinaryOperators: None
BreakBeforeBraces: Custom
BreakBeforeTernaryOperators: false
BreakStringLiterals: true
ColumnLimit:     80
ContinuationIndentWidth: 4
Cpp11BracedListStyle: false
DerivePointerAlignment: false
DisableFormat:   false
ForEachMacros:   [
  'CPU_FOREACH',
  'CPU_FOREACH_REVERSE',
  'CPU_FOREACH_SAFE',
  'IOMMU_NOTIFIER_FOREACH',
  'QLIST_FOREACH',
  'QLIST_FOREACH_ENTRY',
  'QLIST_FOREACH_RCU',
  'QLIST_FOREACH_SAFE',
  'QLIST_FOREACH_SAFE_RCU',
  'QSIMPLEQ_FOREACH',
  'QSIMPLEQ_FOREACH_SAFE',
  'QSLIST_FOREACH',
  'QSLIST_FOREACH_SAFE',
  'QTAILQ_FOREACH',
  'QTAILQ_FOREACH_REVERSE',
  'QTAILQ_FOREACH_SAFE',
  'QTAILQ_RAW_FOREACH',
  'RAMBLOCK_FOREACH'
]
IncludeCategories:
  - Regex:           '^"qemu/osdep.h'
    Priority:        -3
  - Regex:           '^"(block|chardev|crypto|disas|exec|fpu|hw|io|libdecnumber|migration|monitor|net|qapi|qemu|qom|standard-headers|sysemu|ui)/'
    Priority:        -2
  - Regex:           '^"(elf.h|qemu-common.h|glib-compat.h|qemu-io.h|trace-tcg.h)'
    Priority:        -1
  - Regex:           '.*'
    Priority:        1
IncludeIsMainRegex: '$'
IndentCaseLabels: false
IndentWidth:     4
IndentWrappedFunctionNames: false
KeepEmptyLinesAtTheStartOfBlocks: false
MacroBlockBegin: '.*_BEGIN$' # only PREC_BEGIN ?
MacroBlockEnd:   '.*_END$'
MaxEmptyLinesToKeep: 2
PointerAlignment: Right
ReflowComments:  true
SortIncludes:    true
SpaceAfterCStyleCast: false
SpaceBeforeAssignmentOperators: true
SpaceBeforeParens: ControlStatements
SpaceInEmptyParentheses: false
SpacesBeforeTrailingComments: 1
SpacesInContainerLiterals: true
SpacesInParentheses: false
SpacesInSquareBrackets: false
Standard:        Auto
UseTab:          Never
...

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: Trim down imported files
Dr. David Alan Gilbert [Fri, 8 Feb 2019 12:49:54 +0000 (12:49 +0000)]
virtiofsd: Trim down imported files

There's a lot of the original fuse code we don't need; trim them down.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
with additional trimming by:
Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: Add passthrough_ll
Dr. David Alan Gilbert [Thu, 28 Nov 2019 12:22:49 +0000 (12:22 +0000)]
virtiofsd: Add passthrough_ll

passthrough_ll is one of the examples in the upstream fuse project
and is the main part of our daemon here.  It passes through requests
from fuse to the underlying filesystem, using syscalls as directly
as possible.

From libfuse fuse-3.8.0

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  Fixed up 'GPL' to 'GPLv2' as per Dan's comments and consistent
  with the 'LICENSE' file in libfuse;  patch sent to libfuse to fix
  it upstream.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: Add fuse_lowlevel.c
Dr. David Alan Gilbert [Thu, 28 Nov 2019 12:22:49 +0000 (12:22 +0000)]
virtiofsd: Add fuse_lowlevel.c

fuse_lowlevel is one of the largest files from the library
and does most of the work.  Add it separately to keep the diff
sizes small.
Again this is from upstream fuse-3.8.0

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: Add auxiliary .c's
Dr. David Alan Gilbert [Thu, 28 Nov 2019 12:22:49 +0000 (12:22 +0000)]
virtiofsd: Add auxiliary .c's

Add most of the non-main .c files we need from upstream fuse-3.8.0

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: Pull in kernel's fuse.h
Dr. David Alan Gilbert [Mon, 30 Sep 2019 13:01:47 +0000 (14:01 +0100)]
virtiofsd: Pull in kernel's fuse.h

Update scripts/update-linux-headers.sh to add fuse.h and
use it to pull in fuse.h from the kernel; from v5.5-rc1

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agovirtiofsd: Pull in upstream headers
Dr. David Alan Gilbert [Thu, 28 Nov 2019 12:22:49 +0000 (12:22 +0000)]
virtiofsd: Pull in upstream headers

Pull in headers fromlibfuse's upstream fuse-3.8.0

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
4 years agoMerge remote-tracking branch 'remotes/vivier2/tags/linux-user-for-5.0-pull-request...
Peter Maydell [Thu, 23 Jan 2020 14:38:43 +0000 (14:38 +0000)]
Merge remote-tracking branch 'remotes/vivier2/tags/linux-user-for-5.0-pull-request' into staging

Fix mmap guest space and brk
Add FS/FD/RTC/KCOV ioctls

# gpg: Signature made Thu 23 Jan 2020 08:21:41 GMT
# gpg:                using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C
# gpg:                issuer "laurent@vivier.eu"
# gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full]
# gpg:                 aka "Laurent Vivier <laurent@vivier.eu>" [full]
# gpg:                 aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full]
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* remotes/vivier2/tags/linux-user-for-5.0-pull-request:
  linux-user: Add support for read/clear RTC voltage low detector using ioctls
  linux-user: Add support for getting/setting RTC PLL correction using ioctls
  linux-user: Add support for getting/setting RTC wakeup alarm using ioctls
  linux-user: Add support for getting/setting RTC periodic interrupt and epoch using ioctls
  linux-user: Add support for getting/setting RTC time and alarm using ioctls
  linux-user: Add support for enabling/disabling RTC features using ioctls
  linux-user: Add support for TYPE_LONG and TYPE_ULONG in do_ioctl()
  linux-user: Add support for KCOV_INIT_TRACE ioctl
  linux-user: Add support for KCOV_<ENABLE|DISABLE> ioctls
  configure: Detect kcov support and introduce CONFIG_KCOV
  linux-user: Add support for FDFMT<BEG|TRK|END> ioctls
  linux-user: Add support for FD<SETEMSGTRESH|SETMAXERRS|GETMAXERRS> ioctls
  linux-user: Add support for FS_IOC32_<GET|SET>VERSION ioctls
  linux-user: Add support for FS_IOC32_<GET|SET>FLAGS ioctls
  linux-user: Add support for FS_IOC_<GET|SET>VERSION ioctls
  linux-user: Reserve space for brk
  linux-user:Fix align mistake when mmap guest space

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4 years agoMerge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
Peter Maydell [Thu, 23 Jan 2020 13:41:47 +0000 (13:41 +0000)]
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

virtio, pc: fixes, features

Bugfixes all over the place.
CPU hotplug with secureboot.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Thu 23 Jan 2020 07:08:32 GMT
# gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg:                issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream:
  vhost: coding style fix
  i386:acpi: Remove _HID from the SMBus ACPI entry
  vhost: Only align sections for vhost-user
  vhost: Add names to section rounded warning
  vhost-vsock: delete vqs in vhost_vsock_unrealize to avoid memleaks
  virtio-scsi: convert to new virtio_delete_queue
  virtio-scsi: delete vqs in unrealize to avoid memleaks
  virtio-9p-device: convert to new virtio_delete_queue
  virtio-9p-device: fix memleak in virtio_9p_device_unrealize
  bios-tables-test: document expected file update
  acpi: cpuhp: add CPHP_GET_CPU_ID_CMD command
  acpi: cpuhp: spec: add typical usecases
  acpi: cpuhp: introduce 'Command data 2' field
  acpi: cpuhp: spec: clarify store into 'Command data' when 'Command field' == 0
  acpi: cpuhp: spec: fix 'Command data' description
  acpi: cpuhp: spec: clarify 'CPU selector' register usage and endianness
  tests: q35: MCH: add default SMBASE SMRAM lock test
  q35: implement 128K SMRAM at default SMBASE address

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4 years agoMerge remote-tracking branch 'remotes/rth/tags/pull-tcg-20200121' into staging
Peter Maydell [Thu, 23 Jan 2020 13:01:14 +0000 (13:01 +0000)]
Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20200121' into staging

Remove another limit to NB_MMU_MODES.
Fix compilation using uclibc.
Fix defaulting of -accel parameters.
Tidy cputlb basic routines.
Adjust git.orderfile for decodetree.

# gpg: Signature made Wed 22 Jan 2020 02:44:18 GMT
# gpg:                using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg:                issuer "richard.henderson@linaro.org"
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full]
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* remotes/rth/tags/pull-tcg-20200121:
  scripts/git.orderfile: Display decodetree before C source
  cputlb: Hoist timestamp outside of loops over tlbs
  cputlb: Initialize tlbs as flushed
  cputlb: Partially merge tlb_dyn_init into tlb_init
  cputlb: Split out tlb_mmu_flush_locked
  cputlb: Hoist tlb portions in tlb_flush_one_mmuidx_locked
  cputlb: Hoist tlb portions in tlb_mmu_resize_locked
  cputlb: Pass CPUTLBDescFast to tlb_n_entries and sizeof_tlb
  cputlb: Make tlb_n_entries private to cputlb.c
  cputlb: Merge tlb_table_flush_by_mmuidx into tlb_flush_one_mmuidx_locked
  vl: Only choose enabled accelerators in configure_accelerators
  vl: Remove useless test in configure_accelerators
  vl: Reduce scope of variables in configure_accelerators
  vl: Remove unused variable in configure_accelerators
  util/cacheinfo: fix crash when compiling with uClibc
  cputlb: Handle NB_MMU_MODES > TARGET_PAGE_BITS_MIN

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4 years agovhost: coding style fix
Michael S. Tsirkin [Wed, 22 Jan 2020 08:06:47 +0000 (03:06 -0500)]
vhost: coding style fix

Drop a trailing whitespace. Make line shorter.

Fixes: 76525114736e8 ("vhost: Only align sections for vhost-user")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
4 years agolinux-user: Add support for read/clear RTC voltage low detector using ioctls
Filip Bozuta [Wed, 15 Jan 2020 19:36:40 +0000 (20:36 +0100)]
linux-user: Add support for read/clear RTC voltage low detector using ioctls

This patch implements functionalities of following ioctls:

RTC_VL_READ - Read voltage low detection information

    Read the voltage low for RTCs that support voltage low.
    The third ioctl's' argument points to an int in which
    the voltage low is returned.

RTC_VL_CLR - Clear voltage low information

    Clear the information about voltage low for RTCs that
    support voltage low. The third ioctl(2) argument is
    ignored.

Implementation notes:

    Since one ioctl has a pointer to 'int' as its third agrument,
    and another ioctl has NULL as its third argument, their
    implementation was straightforward.

Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Filip Bozuta <Filip.Bozuta@rt-rk.com>
Message-Id: <1579117007-7565-7-git-send-email-Filip.Bozuta@rt-rk.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
4 years agolinux-user: Add support for getting/setting RTC PLL correction using ioctls
Filip Bozuta [Wed, 15 Jan 2020 19:36:39 +0000 (20:36 +0100)]
linux-user: Add support for getting/setting RTC PLL correction using ioctls

This patch implements functionalities of following ioctls:

RTC_PLL_GET - Getting PLL correction

    Read the PLL correction for RTCs that support PLL. The PLL correction
    is returned in the following structure:

        struct rtc_pll_info {
            int pll_ctrl;        /* placeholder for fancier control */
            int pll_value;       /* get/set correction value */
            int pll_max;         /* max +ve (faster) adjustment value */
            int pll_min;         /* max -ve (slower) adjustment value */
            int pll_posmult;     /* factor for +ve correction */
            int pll_negmult;     /* factor for -ve correction */
            long pll_clock;      /* base PLL frequency */
        };

    A pointer to this structure should be passed as the third
    ioctl's argument.

RTC_PLL_SET - Setting PLL correction

    Sets the PLL correction for RTCs that support PLL. The PLL correction
    that is set is specified by the rtc_pll_info structure pointed to by
    the third ioctl's' argument.

Implementation notes:

    All ioctls in this patch have a pointer to a structure rtc_pll_info
    as their third argument. All elements of this structure are of
    type 'int', except the last one that is of type 'long'. That is
    the reason why a separate target structure (target_rtc_pll_info)
    is defined in linux-user/syscall_defs. The rest of the
    implementation is straightforward.

Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Filip Bozuta <Filip.Bozuta@rt-rk.com>
Message-Id: <1579117007-7565-6-git-send-email-Filip.Bozuta@rt-rk.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
4 years agolinux-user: Add support for getting/setting RTC wakeup alarm using ioctls
Filip Bozuta [Wed, 15 Jan 2020 19:36:38 +0000 (20:36 +0100)]
linux-user: Add support for getting/setting RTC wakeup alarm using ioctls

This patch implements functionalities of following ioctls:

RTC_WKALM_SET, RTC_WKALM_GET - Getting/Setting wakeup alarm

    Some RTCs support a more powerful alarm interface, using these
    ioctls to read or write the RTC's alarm time (respectively)
    with this structure:

        struct rtc_wkalrm {
            unsigned char enabled;
            unsigned char pending;
            struct rtc_time time;
        };

    The enabled flag is used to enable or disable the alarm
    interrupt, or to read its current status; when using these
    calls, RTC_AIE_ON and RTC_AIE_OFF are not used. The pending
    flag is used by RTC_WKALM_RD to report a pending interrupt
    (so it's mostly useless on Linux, except when talking to the
    RTC managed by EFI firmware). The time field is as used with
    RTC_ALM_READ and RTC_ALM_SET except that the tm_mday, tm_mon,
    and tm_year fields are also valid. A pointer to this structure
    should be passed as the third ioctl's argument.

Implementation notes:

    All ioctls in this patch have a pointer to a structure
    rtc_wkalrm as their third argument. That is the reason why
    corresponding definition is added in linux-user/syscall_types.h.
    Since all  elements of this structure are either of type
    'unsigned char' or 'struct rtc_time' (that was covered in one
    of previous patches), the rest of the implementation is
    straightforward.

Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Filip Bozuta <Filip.Bozuta@rt-rk.com>
Message-Id: <1579117007-7565-5-git-send-email-Filip.Bozuta@rt-rk.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
4 years agolinux-user: Add support for getting/setting RTC periodic interrupt and epoch using...
Filip Bozuta [Wed, 15 Jan 2020 19:36:37 +0000 (20:36 +0100)]
linux-user: Add support for getting/setting RTC periodic interrupt and epoch using ioctls

This patch implements functionalities of following ioctls:

RTC_IRQP_READ, RTC_IRQP_SET - Getting/Setting IRQ rate

    Read and set the frequency for periodic interrupts, for RTCs
    that support periodic interrupts. The periodic interrupt must
    be separately enabled or disabled using the RTC_PIE_ON,
    RTC_PIE_OFF requests. The third ioctl's argument is an
    unsigned long * or an unsigned long, respectively. The value
    is the frequency in interrupts per second. The set of allow‐
    able frequencies is the multiples of two in the range 2 to
    8192. Only a privileged process (i.e., one having the
    CAP_SYS_RESOURCE capability) can set frequencies above the
    value specified in /proc/sys/dev/rtc/max-user-freq. (This
    file contains the value 64 by default.)

RTC_EPOCH_READ, RTC_EPOCH_SET - Getting/Setting epoch

    Many RTCs encode the year in an 8-bit register which is either
    interpreted as an 8-bit binary number or as a BCD number. In
    both cases, the number is interpreted relative to this RTC's
    Epoch. The RTC's Epoch is initialized to 1900 on most systems
    but on Alpha and MIPS it might also be initialized to 1952,
    1980, or 2000, depending on the value of an RTC register for
    the year. With some RTCs, these operations can be used to
    read or to set the RTC's Epoch, respectively. The third
    ioctl's argument is an unsigned long * or an unsigned long,
    respectively, and the value returned (or assigned) is the
    Epoch. To set the RTC's Epoch the process must be privileged
    (i.e., have the CAP_SYS_TIME capability).

Implementation notes:

    All ioctls in this patch have a pointer to 'ulong' as their
    third argument. That is the reason why corresponding parts
    of added code in linux-user/syscall_defs.h contain special
    handling related to 'ulong' type: they use 'abi_ulong' type
    to make sure that ioctl's code is calculated correctly for
    both 32-bit and 64-bit targets. Also, 'MK_PTR(TYPE_ULONG)'
    is used for the similar reason in linux-user/ioctls.h.

Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Filip Bozuta <Filip.Bozuta@rt-rk.com>
Message-Id: <1579117007-7565-4-git-send-email-Filip.Bozuta@rt-rk.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
4 years agolinux-user: Add support for getting/setting RTC time and alarm using ioctls
Filip Bozuta [Wed, 15 Jan 2020 19:36:36 +0000 (20:36 +0100)]
linux-user: Add support for getting/setting RTC time and alarm using ioctls

This patch implements functionalities of following ioctls:

RTC_RD_TIME - Getting RTC time

    Returns this RTC's time in the following structure:

        struct rtc_time {
            int tm_sec;
            int tm_min;
            int tm_hour;
            int tm_mday;
            int tm_mon;
            int tm_year;
            int tm_wday;     /* unused */
            int tm_yday;     /* unused */
            int tm_isdst;    /* unused */
        };

    The fields in this structure have the same meaning and ranges
    as the tm structure described in gmtime man page. A pointer
    to this structure should be passed as the third ioctl's argument.

RTC_SET_TIME - Setting RTC time

    Sets this RTC's time to the time specified by the rtc_time
    structure pointed to by the third ioctl's argument. To set
    the RTC's time the process must be privileged (i.e., have the
    CAP_SYS_TIME capability).

RTC_ALM_READ, RTC_ALM_SET - Getting/Setting alarm time

    Read and set the alarm time, for RTCs that support alarms.
    The alarm interrupt must be separately enabled or disabled
    using the RTC_AIE_ON, RTC_AIE_OFF requests. The third
    ioctl's argument is a pointer to a rtc_time structure. Only
    the tm_sec, tm_min, and tm_hour fields of this structure are
    used.

Implementation notes:

    All ioctls in this patch have pointer to a structure rtc_time
    as their third argument. That is the reason why corresponding
    definition is added in linux-user/syscall_types.h. Since all
    elements of this structure are of type 'int', the rest of the
    implementation is straightforward.

Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Filip Bozuta <Filip.Bozuta@rt-rk.com>
Message-Id: <1579117007-7565-3-git-send-email-Filip.Bozuta@rt-rk.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
4 years agolinux-user: Add support for enabling/disabling RTC features using ioctls
Filip Bozuta [Wed, 15 Jan 2020 19:36:35 +0000 (20:36 +0100)]
linux-user: Add support for enabling/disabling RTC features using ioctls

This patch implements functionalities of following ioctls:

RTC_AIE_ON, RTC_AIE_OFF - Alarm interrupt enabling on/off

    Enable or disable the alarm interrupt, for RTCs that support
    alarms.  The third ioctl's argument is ignored.

RTC_UIE_ON, RTC_UIE_OFF - Update interrupt enabling on/off

    Enable or disable the interrupt on every clock update, for
    RTCs that support this once-per-second interrupt. The third
    ioctl's argument is ignored.

RTC_PIE_ON, RTC_PIE_OFF - Periodic interrupt enabling on/off

    Enable or disable the periodic interrupt, for RTCs that sup‐
    port these periodic interrupts. The third ioctl's argument
    is ignored. Only a privileged process (i.e., one having the
    CAP_SYS_RESOURCE capability) can enable the periodic interrupt
    if the frequency is currently set above the value specified in
    /proc/sys/dev/rtc/max-user-freq.

RTC_WIE_ON, RTC_WIE_OFF - Watchdog interrupt enabling on/off

    Enable or disable the Watchdog interrupt, for RTCs that sup-
    port this Watchdog interrupt. The third ioctl's argument is
    ignored.

Implementation notes:

    Since all of involved ioctls have NULL as their third argument,
    their implementation was straightforward.

    The line '#include <linux/rtc.h>' was added to recognize
    preprocessor definitions for these ioctls. This needs to be
    done only once in this series of commits. Also, the content
    of this file (with respect to ioctl definitions) remained
    unchanged for a long time, therefore there is no need to
    worry about supporting older Linux kernel version.

Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Filip Bozuta <Filip.Bozuta@rt-rk.com>
Message-Id: <1579117007-7565-2-git-send-email-Filip.Bozuta@rt-rk.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
4 years agolinux-user: Add support for TYPE_LONG and TYPE_ULONG in do_ioctl()
Filip Bozuta [Wed, 15 Jan 2020 19:36:47 +0000 (20:36 +0100)]
linux-user: Add support for TYPE_LONG and TYPE_ULONG in do_ioctl()

Function "do_ioctl()" located in file "syscall.c" was missing
an option for TYPE_LONG and TYPE_ULONG. This caused some ioctls
to not be recognised because they had the third argument that was
of type 'long' or 'unsigned long'.

For example:

Since implemented ioctls RTC_IRQP_SET and RTC_EPOCH_SET
are of type IOW(writing type) that have unsigned long as
their third argument, they were not recognised in QEMU
before the changes of this patch.

Signed-off-by: Filip Bozuta <Filip.Bozuta@rt-rk.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <1579117007-7565-14-git-send-email-Filip.Bozuta@rt-rk.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
4 years agolinux-user: Add support for KCOV_INIT_TRACE ioctl
Aleksandar Markovic [Thu, 16 Jan 2020 22:49:51 +0000 (23:49 +0100)]
linux-user: Add support for KCOV_INIT_TRACE ioctl

KCOV_INIT_TRACE ioctl plays the role in kernel coverage tracing.
This ioctl's third argument is of type 'unsigned long', and the
implementation in QEMU is straightforward.

Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Message-Id: <1579214991-19602-13-git-send-email-aleksandar.markovic@rt-rk.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
4 years agolinux-user: Add support for KCOV_<ENABLE|DISABLE> ioctls
Aleksandar Markovic [Thu, 16 Jan 2020 22:49:50 +0000 (23:49 +0100)]
linux-user: Add support for KCOV_<ENABLE|DISABLE> ioctls

KCOV_ENABLE and KCOV_DISABLE play the role in kernel coverage
tracing. These ioctls do not use the third argument of ioctl()
system call and are straightforward to implement in QEMU.

Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Message-Id: <1579214991-19602-12-git-send-email-aleksandar.markovic@rt-rk.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
4 years agoconfigure: Detect kcov support and introduce CONFIG_KCOV
Aleksandar Markovic [Thu, 16 Jan 2020 22:49:49 +0000 (23:49 +0100)]
configure: Detect kcov support and introduce CONFIG_KCOV

kcov is kernel code coverage tracing tool. It requires kernel 4.4+
compiled with certain kernel options.

This patch checks if kcov header "sys/kcov.h" is present on build
machine, and stores the result in variable CONFIG_KCOV, meant to
be used in linux-user code related to the support for three ioctls
that were introduced at the same time as the mentioned header
(their definition was a part of the first version of that header).

Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <1579214991-19602-11-git-send-email-aleksandar.markovic@rt-rk.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
4 years agolinux-user: Add support for FDFMT<BEG|TRK|END> ioctls
Aleksandar Markovic [Thu, 16 Jan 2020 22:49:47 +0000 (23:49 +0100)]
linux-user: Add support for FDFMT<BEG|TRK|END> ioctls

FDFMTBEG, FDFMTTRK, and FDFMTEND ioctls provide means for controlling
formatting of a floppy drive.

FDFMTTRK's third agrument is a pointer to the structure:

struct format_descr {
    unsigned int device,head,track;
};

defined in Linux kernel header <linux/fd.h>.

Since all fields of the structure are of type 'unsigned int', there is
no need to define "target_format_descr".

FDFMTBEG and FDFMTEND ioctls do not use the third argument.

Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Message-Id: <1579214991-19602-9-git-send-email-aleksandar.markovic@rt-rk.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
4 years agolinux-user: Add support for FD<SETEMSGTRESH|SETMAXERRS|GETMAXERRS> ioctls
Aleksandar Markovic [Thu, 16 Jan 2020 22:49:46 +0000 (23:49 +0100)]
linux-user: Add support for FD<SETEMSGTRESH|SETMAXERRS|GETMAXERRS> ioctls

FDSETEMSGTRESH, FDSETMAXERRS, and FDGETMAXERRS ioctls are commands
for controlling error reporting of a floppy drive.

FDSETEMSGTRESH's third agrument is a pointer to the structure:

struct floppy_max_errors {
    unsigned int
      abort,      /* number of errors to be reached before aborting */
      read_track, /* maximal number of errors permitted to read an
                   * entire track at once */
      reset,      /* maximal number of errors before a reset is tried */
      recal,      /* maximal number of errors before a recalibrate is
                   * tried */
      /*
       * Threshold for reporting FDC errors to the console.
       * Setting this to zero may flood your screen when using
       * ultra cheap floppies ;-)
       */
      reporting;
};

defined in Linux kernel header <linux/fd.h>.

Since all fields of the structure are of type 'unsigned int', there is
no need to define "target_floppy_max_errors".

FDSETMAXERRS and FDGETMAXERRS ioctls do not use the third argument.

Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Message-Id: <1579214991-19602-8-git-send-email-aleksandar.markovic@rt-rk.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
4 years agolinux-user: Add support for FS_IOC32_<GET|SET>VERSION ioctls
Aleksandar Markovic [Thu, 16 Jan 2020 22:49:42 +0000 (23:49 +0100)]
linux-user: Add support for FS_IOC32_<GET|SET>VERSION ioctls

These FS_IOC32_<GET|SET>VERSION ioctls are identical to
FS_IOC_<GET|SET>VERSION ioctls, but without the anomaly of their
number defined as if their third argument is of type long, while
it is treated internally in kernel as is of type int.

Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Message-Id: <1579214991-19602-4-git-send-email-aleksandar.markovic@rt-rk.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
4 years agolinux-user: Add support for FS_IOC32_<GET|SET>FLAGS ioctls
Aleksandar Markovic [Thu, 16 Jan 2020 22:49:41 +0000 (23:49 +0100)]
linux-user: Add support for FS_IOC32_<GET|SET>FLAGS ioctls

These FS_IOC32_<GET|SET>FLAGS ioctls are identical to
FS_IOC_<GET|SET>FLAGS ioctls, but without the anomaly of their
number defined as if their third argument is of type long, while
it is treated internally in kernel as is of type int.

Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Message-Id: <1579214991-19602-3-git-send-email-aleksandar.markovic@rt-rk.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
4 years agolinux-user: Add support for FS_IOC_<GET|SET>VERSION ioctls
Aleksandar Markovic [Thu, 16 Jan 2020 22:49:40 +0000 (23:49 +0100)]
linux-user: Add support for FS_IOC_<GET|SET>VERSION ioctls

A very specific thing for these two ioctls is that their code
implies that their third argument is of type 'long', but the
kernel uses that argument as if it is of type 'int'. This anomaly
is recognized also in commit 6080723 (linux-user: Implement
FS_IOC_GETFLAGS and FS_IOC_SETFLAGS ioctls).

Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Message-Id: <1579214991-19602-2-git-send-email-aleksandar.markovic@rt-rk.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
4 years agolinux-user: Reserve space for brk
Richard Henderson [Fri, 17 Jan 2020 23:02:45 +0000 (13:02 -1000)]
linux-user: Reserve space for brk

With bad luck, we can wind up with no space at all for brk,
which will generally cause the guest malloc to fail.

This bad luck is easier to come by with ET_DYN (PIE) binaries,
where either the stack or the interpreter (ld.so) gets placed
immediately after the main executable.

But there's nothing preventing this same thing from happening
with ET_EXEC (normal) binaries, during probe_guest_base().

In both cases, reserve some extra space via mmap and release
it back to the system after loading the interpreter and
allocating the stack.

The choice of 16MB is somewhat arbitrary.  It's enough for libc
to get going, but without being so large that 32-bit guests or
32-bit hosts are in danger of running out of virtual address space.
It is expected that libc will be able to fall back to mmap arenas
after the limited brk space is exhausted.

Launchpad: https://bugs.launchpad.net/qemu/+bug/1749393
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20200117230245.5040-1-richard.henderson@linaro.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
4 years agolinux-user:Fix align mistake when mmap guest space
Xinyu Li [Fri, 13 Dec 2019 02:29:19 +0000 (10:29 +0800)]
linux-user:Fix align mistake when mmap guest space

In init_guest_space, we need to mmap guest space. If the return address
of first mmap is not aligned with align, which was set to MAX(SHMLBA,
qemu_host_page_size), we need unmap and a new mmap(space is larger than
first size). The new size is named real_size, which is aligned_size +
qemu_host_page_size. alugned_size is the guest space size. And add a
qemu_host_page_size to avoid memory error when we align real_start
manually (ROUND_UP(real_start, align)). But when SHMLBA >
qemu_host_page_size, the added size will smaller than the size to align,
which can make a mistake(in a mips machine, it appears). So change
real_size from aligned_size +qemu_host_page_size
to aligned_size + align will solve it.

Signed-off-by: Xinyu Li <precinct@mail.ustc.edu.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20191213022919.5934-1-precinct@mail.ustc.edu.cn>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
4 years agoi386:acpi: Remove _HID from the SMBus ACPI entry
Corey Minyard [Mon, 20 Jan 2020 17:07:25 +0000 (11:07 -0600)]
i386:acpi: Remove _HID from the SMBus ACPI entry

Per the ACPI spec (version 6.1, section 6.1.5 _HID) it is not required
on enumerated buses (like PCI in this case), _ADR is required (and is
already there).  And the _HID value is wrong.  Linux appears to ignore
the _HID entry, but Windows 10 detects it as 'Unknown Device' and there
is no driver available.  See https://bugs.launchpad.net/qemu/+bug/1856724

Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20200120170725.24935-6-minyard@acm.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
4 years agovhost: Only align sections for vhost-user
Dr. David Alan Gilbert [Thu, 16 Jan 2020 20:24:14 +0000 (20:24 +0000)]
vhost: Only align sections for vhost-user

I added hugepage alignment code in c1ece84e7c9 to deal with
vhost-user + postcopy which needs aligned pages when using userfault.
However, on x86 the lower 2MB of address space tends to be shotgun'd
with small fragments around the 512-640k range - e.g. video RAM, and
with HyperV synic pages tend to sit around there - again splitting
it up.  The alignment code complains with a 'Section rounded to ...'
error and gives up.

Since vhost-user already filters out devices without an fd
(see vhost-user.c vhost_user_mem_section_filter) it shouldn't be
affected by those overlaps.

Turn the alignment off on vhost-kernel so that it doesn't try
and align, and thus won't hit the rounding issues.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200116202414.157959-3-dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agovhost: Add names to section rounded warning
Dr. David Alan Gilbert [Thu, 16 Jan 2020 20:24:13 +0000 (20:24 +0000)]
vhost: Add names to section rounded warning

Add the memory region names to section rounding/alignment
warnings.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200116202414.157959-2-dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
4 years agovhost-vsock: delete vqs in vhost_vsock_unrealize to avoid memleaks
Pan Nengyuan [Wed, 15 Jan 2020 06:25:35 +0000 (14:25 +0800)]
vhost-vsock: delete vqs in vhost_vsock_unrealize to avoid memleaks

Receive/transmit/event vqs forgot to cleanup in vhost_vsock_unrealize. This
patch save receive/transmit vq pointer in realize() and cleanup vqs
through those vq pointers in unrealize(). The leak stack is as follow:

Direct leak of 21504 byte(s) in 3 object(s) allocated from:
  #0 0x7f86a1356970 (/lib64/libasan.so.5+0xef970)  ??:?
  #1 0x7f86a09aa49d (/lib64/libglib-2.0.so.0+0x5249d)  ??:?
  #2 0x5604852f85ca (./x86_64-softmmu/qemu-system-x86_64+0x2c3e5ca)  /mnt/sdb/qemu/hw/virtio/virtio.c:2333
  #3 0x560485356208 (./x86_64-softmmu/qemu-system-x86_64+0x2c9c208)  /mnt/sdb/qemu/hw/virtio/vhost-vsock.c:339
  #4 0x560485305a17 (./x86_64-softmmu/qemu-system-x86_64+0x2c4ba17)  /mnt/sdb/qemu/hw/virtio/virtio.c:3531
  #5 0x5604858e6b65 (./x86_64-softmmu/qemu-system-x86_64+0x322cb65)  /mnt/sdb/qemu/hw/core/qdev.c:865
  #6 0x5604861e6c41 (./x86_64-softmmu/qemu-system-x86_64+0x3b2cc41)  /mnt/sdb/qemu/qom/object.c:2102

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Message-Id: <20200115062535.50644-1-pannengyuan@huawei.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
4 years agovirtio-scsi: convert to new virtio_delete_queue
Pan Nengyuan [Fri, 17 Jan 2020 07:55:47 +0000 (15:55 +0800)]
virtio-scsi: convert to new virtio_delete_queue

Use virtio_delete_queue to make it more clear.

Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Message-Id: <20200117075547.60864-3-pannengyuan@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
4 years agovirtio-scsi: delete vqs in unrealize to avoid memleaks
Pan Nengyuan [Fri, 17 Jan 2020 07:55:46 +0000 (15:55 +0800)]
virtio-scsi: delete vqs in unrealize to avoid memleaks

This patch fix memleaks when attaching/detaching virtio-scsi device, the
memory leak stack is as follow:

Direct leak of 21504 byte(s) in 3 object(s) allocated from:
  #0 0x7f491f2f2970 (/lib64/libasan.so.5+0xef970)  ??:?
  #1 0x7f491e94649d (/lib64/libglib-2.0.so.0+0x5249d)  ??:?
  #2 0x564d0f3919fa (./x86_64-softmmu/qemu-system-x86_64+0x2c3e9fa)  /mnt/sdb/qemu/hw/virtio/virtio.c:2333
  #3 0x564d0f2eca55 (./x86_64-softmmu/qemu-system-x86_64+0x2b99a55)  /mnt/sdb/qemu/hw/scsi/virtio-scsi.c:912
  #4 0x564d0f2ece7b (./x86_64-softmmu/qemu-system-x86_64+0x2b99e7b)  /mnt/sdb/qemu/hw/scsi/virtio-scsi.c:924
  #5 0x564d0f39ee47 (./x86_64-softmmu/qemu-system-x86_64+0x2c4be47)  /mnt/sdb/qemu/hw/virtio/virtio.c:3531
  #6 0x564d0f980224 (./x86_64-softmmu/qemu-system-x86_64+0x322d224)  /mnt/sdb/qemu/hw/core/qdev.c:865

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Message-Id: <20200117075547.60864-2-pannengyuan@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
4 years agovirtio-9p-device: convert to new virtio_delete_queue
Pan Nengyuan [Fri, 17 Jan 2020 06:09:27 +0000 (14:09 +0800)]
virtio-9p-device: convert to new virtio_delete_queue

Use virtio_delete_queue to make it more clear.

Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Message-Id: <20200117060927.51996-3-pannengyuan@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
4 years agovirtio-9p-device: fix memleak in virtio_9p_device_unrealize
Pan Nengyuan [Fri, 17 Jan 2020 06:09:26 +0000 (14:09 +0800)]
virtio-9p-device: fix memleak in virtio_9p_device_unrealize

v->vq forgot to cleanup in virtio_9p_device_unrealize, the memory leak
stack is as follow:

Direct leak of 14336 byte(s) in 2 object(s) allocated from:
  #0 0x7f819ae43970 (/lib64/libasan.so.5+0xef970)  ??:?
  #1 0x7f819872f49d (/lib64/libglib-2.0.so.0+0x5249d)  ??:?
  #2 0x55a3a58da624 (./x86_64-softmmu/qemu-system-x86_64+0x2c14624)  /mnt/sdb/qemu/hw/virtio/virtio.c:2327
  #3 0x55a3a571bac7 (./x86_64-softmmu/qemu-system-x86_64+0x2a55ac7)  /mnt/sdb/qemu/hw/9pfs/virtio-9p-device.c:209
  #4 0x55a3a58e7bc6 (./x86_64-softmmu/qemu-system-x86_64+0x2c21bc6)  /mnt/sdb/qemu/hw/virtio/virtio.c:3504
  #5 0x55a3a5ebfb37 (./x86_64-softmmu/qemu-system-x86_64+0x31f9b37)  /mnt/sdb/qemu/hw/core/qdev.c:876

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Message-Id: <20200117060927.51996-2-pannengyuan@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Acked-by: Greg Kurz <groug@kaod.org>
4 years agobios-tables-test: document expected file update
Michael S. Tsirkin [Fri, 17 Jan 2020 12:18:18 +0000 (07:18 -0500)]
bios-tables-test: document expected file update

Document the flow for the case where contributor
updates the expected files.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
4 years agoacpi: cpuhp: add CPHP_GET_CPU_ID_CMD command
Igor Mammedov [Mon, 9 Dec 2019 13:09:02 +0000 (14:09 +0100)]
acpi: cpuhp: add CPHP_GET_CPU_ID_CMD command

Firmware can enumerate present at boot APs by broadcasting wakeup IPI,
so that woken up secondary CPUs could register them-selves.
However in CPU hotplug case, it would need to know architecture
specific CPU IDs for possible and hotplugged CPUs so it could
prepare environment for and wake hotplugged AP.

Reuse and extend existing CPU hotplug interface to return architecture
specific ID for currently selected CPU in 2 registers:
 - lower 32 bits in ACPI_CPU_CMD_DATA_OFFSET_RW
 - upper 32 bits in ACPI_CPU_CMD_DATA2_OFFSET_R

On x86, firmware will use CPHP_GET_CPU_ID_CMD for fetching the APIC ID
when handling hotplug SMI.

Later, CPHP_GET_CPU_ID_CMD will be used on ARM to retrieve MPIDR,
which serves the similar to APIC ID purpose.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1575896942-331151-10-git-send-email-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
4 years agoacpi: cpuhp: spec: add typical usecases
Igor Mammedov [Mon, 9 Dec 2019 13:09:01 +0000 (14:09 +0100)]
acpi: cpuhp: spec: add typical usecases

Document work-flows for
  * enabling/detecting modern CPU hotplug interface
  * finding a CPU with pending 'insert/remove' event
  * enumerating present and possible CPUs

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1575896942-331151-9-git-send-email-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
4 years agoacpi: cpuhp: introduce 'Command data 2' field
Igor Mammedov [Mon, 9 Dec 2019 13:09:00 +0000 (14:09 +0100)]
acpi: cpuhp: introduce 'Command data 2' field

No functional change in practice, patch only aims to properly
document (in spec and code) intended usage of the reserved space.

The new field is to be used for 2 purposes:
  - detection of modern CPU hotplug interface using
    CPHP_GET_NEXT_CPU_WITH_EVENT_CMD command.
    procedure will be described in follow up patch:
      "acpi: cpuhp: spec: add typical usecases"
  - for returning upper 32 bits of architecture specific CPU ID,
    for new CPHP_GET_CPU_ID_CMD command added by follow up patch:
      "acpi: cpuhp: add CPHP_GET_CPU_ID_CMD command"

Change is backward compatible with 4.2 and older machines, as field was
unconditionally reserved and always returned 0x0 if modern CPU hotplug
interface was enabled.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1575896942-331151-8-git-send-email-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
4 years agoacpi: cpuhp: spec: clarify store into 'Command data' when 'Command field' == 0
Igor Mammedov [Mon, 9 Dec 2019 13:08:59 +0000 (14:08 +0100)]
acpi: cpuhp: spec: clarify store into 'Command data' when 'Command field' == 0

Write section of 'Command data' register should describe what happens
when it's written into. Correct description in case the last stored
'Command field' value is equal to 0, to reflect that currently it's not
supported.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <1575896942-331151-7-git-send-email-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
4 years agoacpi: cpuhp: spec: fix 'Command data' description
Igor Mammedov [Mon, 9 Dec 2019 13:08:58 +0000 (14:08 +0100)]
acpi: cpuhp: spec: fix 'Command data' description

Correct returned value description in case 'Command field' == 0x0,
it's not PXM but CPU selector value with pending event

In addition describe 0 blanket value in case of not supported
'Command field' value.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <1575896942-331151-6-git-send-email-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
4 years agoacpi: cpuhp: spec: clarify 'CPU selector' register usage and endianness
Igor Mammedov [Mon, 9 Dec 2019 13:08:57 +0000 (14:08 +0100)]
acpi: cpuhp: spec: clarify 'CPU selector' register usage and endianness

* Move reserved registers to the top of the section, so reader would be
  aware of effects when reading registers description.
* State registers endianness explicitly at the beginning of the section
* Describe registers behavior in case of 'CPU selector' register contains
  value that doesn't point to a possible CPU.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <1575896942-331151-5-git-send-email-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
4 years agotests: q35: MCH: add default SMBASE SMRAM lock test
Igor Mammedov [Mon, 9 Dec 2019 13:46:57 +0000 (14:46 +0100)]
tests: q35: MCH: add default SMBASE SMRAM lock test

test lockable SMRAM at default SMBASE feature, introduced by
patch "q35: implement 128K SMRAM at default SMBASE address"

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1575899217-333105-1-git-send-email-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
4 years agoq35: implement 128K SMRAM at default SMBASE address
Igor Mammedov [Mon, 9 Dec 2019 13:08:55 +0000 (14:08 +0100)]
q35: implement 128K SMRAM at default SMBASE address

It's not what real HW does, implementing which would be overkill [**]
and would require complex cross stack changes (QEMU+firmware) to make
it work.
So considering that SMRAM is owned by MCH, for simplicity (ab)use
reserved Q35 register, which allows QEMU and firmware easily init
and make RAM at SMBASE available only from SMM context.

Patch uses commit (2f295167e0 q35/mch: implement extended TSEG sizes)
for inspiration and uses reserved register in config space at 0x9c
offset [*] to extend q35 pci-host with ability to use 128K at
0x30000 as SMRAM and hide it (like TSEG) from non-SMM context.

Usage:
  1: write 0xff in the register
  2: if the feature is supported, follow up read from the register
     should return 0x01. At this point RAM at 0x30000 is still
     available for SMI handler configuration from non-SMM context
  3: writing 0x02 in the register, locks SMBASE area, making its contents
     available only from SMM context. In non-SMM context, reads return
     0xff and writes are ignored. Further writes into the register are
     ignored until the system reset.

*) https://www.mail-archive.com/qemu-devel@nongnu.org/msg455991.html
**) https://www.mail-archive.com/qemu-devel@nongnu.org/msg646965.html

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1575896942-331151-3-git-send-email-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
4 years agoscripts/git.orderfile: Display decodetree before C source
Philippe Mathieu-Daudé [Mon, 30 Dec 2019 08:28:56 +0000 (09:28 +0100)]
scripts/git.orderfile: Display decodetree before C source

To avoid scrolling each instruction when reviewing tcg
helpers written for the decodetree script, display the
.decode files (similar to header declarations) before
the C source (implementation of previous declarations).

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20191230082856.30556-1-philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
4 years agocputlb: Hoist timestamp outside of loops over tlbs
Richard Henderson [Sat, 7 Dec 2019 22:36:01 +0000 (14:36 -0800)]
cputlb: Hoist timestamp outside of loops over tlbs

Do not call get_clock_realtime() in tlb_mmu_resize_locked,
but hoist outside of any loop over a set of tlbs.  This is
only two (indirect) callers, tlb_flush_by_mmuidx_async_work
and tlb_flush_page_locked, so not onerous.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
4 years agocputlb: Initialize tlbs as flushed
Richard Henderson [Thu, 9 Jan 2020 00:23:56 +0000 (11:23 +1100)]
cputlb: Initialize tlbs as flushed

There's little point in leaving these data structures half initialized,
and relying on a flush to be done during reset.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
4 years agocputlb: Partially merge tlb_dyn_init into tlb_init
Richard Henderson [Sat, 7 Dec 2019 21:22:19 +0000 (13:22 -0800)]
cputlb: Partially merge tlb_dyn_init into tlb_init

Merge into the only caller, but at the same time split
out tlb_mmu_init to initialize a single tlb entry.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
4 years agocputlb: Split out tlb_mmu_flush_locked
Richard Henderson [Sat, 7 Dec 2019 20:08:04 +0000 (12:08 -0800)]
cputlb: Split out tlb_mmu_flush_locked

We will want to be able to flush a tlb without resizing.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>