Category Archives: troubleshooting

Random notes on GNOME, GDM3, systemd –user and ephemeral environment variables such as SSH_AUTH_SOCK

This is totally a thought dump, as I just spent hours (!) figuring out why my environment had been persistently setting SSH_AUTH_SOCK across login sessions. It’s not a solution for readers’ particular issues, nor a tutorial on how to resolve my particular issue either: just a log of surprising things I found out today about a machine I’m, using.

The managed machine I’m using has switched to GDM3 as the display manager, and the default environment is GNOME3. I don’t enjoy GNOME3, and prefer i3 for work uses. I gave it a chance, but after restoring my homedir, I decided to go back to i3.

Note: During the homedir restoration, I had the GNOME3 session running. I moved my homedir away, signed out, and rsynced away. I hope this order of operation got GNOME3 confused and made it forget to clean things up.

Symptom: tools have been complaining they can’t talk to a valid-looking value for SSH_AUTH_SOCK. The socket file and its directory were both missing. ssh-agent was not running in the session.

  • I use ~/.Xsession to configure my graphical session before starting i3. My first suspicion was something ran, set SSH_AUTH_SOCK and the ssh-agent crashed afterwards. This was not the case.
  • I still had the terminal session running. It had an ssh-agent in it. Could that be the cause? No, after nuking the terminal session and signing in and out in the graphical one, the issue was stil present.
  • Was ~/.Xsession supposed to execute /etc/X11/Xsession? No, that happens separately. This is fine.
  • Is /etc/X11/Xsession.d/90x11-common_ssh-agent getting executed? Yes, it is. But $STARTUP is not getting updated. (Oh, right: /etc/X11/Xsession.d scripts on Debian OSes are not executing things directly, but scheduling later execution by updating envvarSTARTUP`.)
  • Is /etc/X11/Xsession getting executed at all? At what point is SSH_AUTH_SOCK set? No, we are not running it at all. lightdm did (I think), but gdm3 has to be a special puppy.
  • What’s executed instead? /etc/gdm3/Xsession which closely resembles /etc/X11/Xsession, but is not exactly the same.
  • Is /etc/gdm3/Xsession executing /etc/X11/Xsession.d scripts? Yes, it is.
  • So, which of the scripts is setting SSH_AUTH_SOCK? Well, in my individual situation, it looks like it’s happening before any of the script executes.
  • Something in ~/.config? No. Envvar name or value not found.
  • Something in ~/.local? No. Envvar name or value not found.
  • Is something else run? Well, gnome-keychain-daemon is running for some reason and gets restarted upon session restart. It’s run by /etc/xdg/autostart/gnome-keyring-secrets.desktop file, and can be disabled by putting
    [Desktop Entry]
    # maybe other fields are required too?

    into ~/.config/autostart/gnome-keyring-secrets.desktop, blocking system file from starting up. (Remember, this is a managed machine; even if it weren’t, I don’t want to touch distribution-installed files.)

    However, no, blocking gnome-keyring-daemon from starting up doesn’t fix the issue.

So this is very confusing. A bad environment variable is surviving logout and seems set before any Xsession script is run.

Is gdm3 remembering things for us? Where would it be writing them anyway if not into homedir?

Turns out that no. gdm3 isn’t remembering anything.

Here’s what happened.

systemd can run in per-user mode (systemd --user). It keeps the environment in RAM and can also survive logouts. systemd --user is shared between all logged-in sessions of the current users.

The feature that caused trouble is — management of environment for daemons. systemctl --user show-environment shows that something wrote the entire environment of the GNOME3 session into systemd --user‘s environment. From what I can tell, all daemons started after login will inherit the environment from this. And it had rather ephemeral things like SSH_AUTH_SOCK, XAUTHORITY GPG_AGENT_INFO or XDG_SESSION_DESKTOP written into it!

Killing systemd --user process and restarting the session fixed everything. /etc/gdm3/Xsession no longer had SSH_AUTH_SOCK set when it started (in fact, it was not set by the time /etc/X11/Xsession.d/99x11-common_start was starting to read the $STARTUP envvar.

So, something in GNOME3 decided to write very ephemeral environment variables into systemd --user, never cleaned them up, and systemd --user did not get reaped even after I signed out from both the graphical and the terminal sessions! There’s a chance cleanup of systemd --user did not happen because the homedir was moved away at the time, but isn’t this stuff working with environment variables such as DBUS_SESSION_BUS_ADDRESS envvar, cat /run/systemd/user/$(id -u), /run/user/$(id -u) and other files under /run? How would have moving /home/${LOGNAME} prevented reaping of systemd --user?

I can see some value in these things being per-user rather than per-session, but given how systemd has been pushing for per-session stuff too, this is leaving a bad taste in the mouth, and makes me believe even further that systemd should not try to be “the runtime for Linux” (note, not the other OSes), it should not infect user sessions, and it should simply stick to what it does reasonably well: manage service startup. I really only want the ability to mount a mountpoint after a service has started, and to start a service after a mountpoint appeared. And otherwise similar dependencies on devices, perhaps.

I really don’t appreciate systemd getting into the business of managing cross-session environment variables. Is this why modern free software desktops refuse to start more than one session for a single user? I suspect so.

Previously, I didn’t think whatever we gained by giving up multiple-sessions-per-user was worth it, and after today, I’m not quite encouraged to give up on this gut feeling.

How do headsets know they may trigger Google Assistant or Siri?

I don’t know what the Bose QC35-ii is doing: the Action button refuses to do anything unless it’s sure it’s talking to either Google Assistant or Alexa (no Siri mentioned in the app, interestingly).

I can’t get the 2021 version of the Star Trek TNG Bluetooth Combadge to trigger anything when connected to a Linux machine. The regular press is triggering KEY_PLAYCD and KEY_PAUSECD, thus mapping onto the relevant X events and interacting well with my desktop’s media players (particularly Chrome) — but doublepress, which normally activates Siri on my iPad, sends no input device events on the relevant /dev/input/event* special file. There’s just no traffic.

btmon is an interesting discovery, and it pointed me in the direction of the world of AT commands flowing as ACL Data on my local hci0 device. Many of the ones flowing are documented on Qt Extended’s modem emulator component documentation from 2009: it starts with the combadge sending AT+BRSF and seeing a response, then sending AT+CIDN and getting and response, and so on and on and on.

If I am reading everything right, the values returned are decimal numbers representing a binary mask. btmon output seems to indicate the combadge (‘hands-free’ device) claims it supports 127 (i.e. all 8 functionalities in the Modem Emulator docs), and the desktop (‘audio gateway’) says it supports 1536, which is binary 110 0000 0000, meaning the only bits that are set are in the reserved range from the perspective of the 2009 Modem Emulator documentation.

A list of flags can also be found in 2013 bluez test for HFP. Over there, one of the formerly ‘reserved’ bits is specified as being AG_CODEC_NEGOTIATION, but we can luckily find the other one in ChromiumOS’s source code: inside something called adhd (apparently, ChromiumOS Audio Daemon) and its cras component’s server part, the constants are in cras_hfp_slc.h. So, the other bit the desktop claims to support is AG_HF_INDICATORS, which also has nothing to do with remote control.

That source code also indicates we can read the Hands-Free Profile specification, the latest one being version 1.8 available on

So, if I am interpreting everything correctly, the combadge says it supports “everything”, but the desktop doesn’t tell it back that it knows what voice recognition is. No wonder we’re not seeing any traffic.

So, we don’t quite need to support Apple-specific HFP commands such as AT+XAPL (bluetooth accessory identification), AT+APLSIRI (confirming the device supports specifically Siri) or AT+IPHONEACCEV (sharing battery level), which is nice. Both of the platforms documented by the combadge’s marketing materials and the manual (Google Now i.e. Assistant and Siri) document they support AT+BVRA from the Hands-Free Profile specification; see Google Assistant’s “Voice Activation Optimization” document for Bluetooth devices, as well as the “Accessory Design Guidelines for Apple Devices (release R16 talks about this in section 30.3.1).

Instead, it looks like we mainly need to trick the desktop to respond to combadge’s AT+BRSF request with a bitmask that includes the voice recognition bit, and move on from that, hoping the combadge starts emitting AT+BVRA, and that we can easily programmatically capture that!

But that’s a topic for another post.

ddwrt router refusing to forward multicast packets over an OpenVPN tap device joined to a bridge

Trying out the go-chromecast CLI, I was able to see mDNS requests coming to my home network over OpenVPN (a L2 tunnel using a TAP device bridged using br0 on both ends). I could also see responses being generated, but none of them were seen on the other end. The build of dd-wrt I use has no tcpdump, making it hard to observe both ends of the tunnel.

There’s a lot of sources suggesting a bunch of actions:

However, that was all tapping in the dark.

But when someone mentioned ebtables, nat and PREROUTING, this led me to the right path: what if one of the chains in one of the tables was dropping outgoing packets?

# ebtables -t nat -L POSTROUTING
Bridge table: nat

Bridge chain: POSTROUTING, entries: 1, policy: ACCEPT
-o tap1 --pkttype-type multicast -j DROP

Voila. All multicast packets were being dropped on L2.

# ebtables -t nat -D POSTROUTING -o tap1 --pkttype-type multicast -j DROP

This is fine because I happen to control both ends of the tunnel. Because my systems use multicast only for mDNS, I don’t expect traffic to require drops.

On the bridge device (br0 on both ends), I currently have multicast_router set to 2 on both ends; I have multicast_querier set to 1 on both ends; non-ddwrt system has multicast_igmp_version set to 2; and multicast_snooping is set to 1 on both ends. I don’t claim correctness of any of these, nor do I claim them to be optimal. But, getting mDNS traffic through is exactly what I wanted, so I’m happy right now.

Exploring the sudoers file: "sudo: sorry, you are not allowed to preserve the environment"

On my NAS I’ve received sudo: sorry, you are not allowed to preserve the environment today when running sudo -E to transfer the environment from whatever the current shell allows sudo to see, into the new shell.

That’s because I was copying something from a remote machine to my NAS, and I (temporarily) wanted to tell remote rsync to use sudo rsync on my NAS to be sure files will be replicated correctly. While this is an absolutely horrible idea to do permanently, a temporary workaround in a safe environment is that sudo rsync should not require a root password:

ivucica ALL=NOPASSWD:/usr/bin/rsync

However, compare this to some of the defaults in /etc/sudoers:

# User privilege specification
root    ALL=(ALL:ALL) ALL

# Allow members of group sudo to execute any command
%sudo   ALL=(ALL:ALL) ALL

Without reading the sudoers(5) manpage, I decided to add ALL at the end:

ivucica ALL=NOPASSWD:/usr/bin/rsync ALL

This seemed to work, but despite a lack of a syntax error, it’s wrong (see below): now no password is required for rsync nor for any other command.

# Wrong order:
 ALL=NOPASSWD: /usr/bin/rsync, PASSWD: ALL
# Correct order:
ivucica ALL=PASSWD: ALL, NOPASSWD: /usr/bin/rsync

The best option, however, is to remove the entry completely given that allowing rsync root without a password is dangerous; even if someone managed to hijack my own account’s non-password NAS credentials, without the password itself they can’t do much.

However, looking online for this error, I realized I’ve never looked into the syntax of the sudoers file. What’s Defaults env_reset, for instance? Turns out sudoers(5) is an interesting read. If you’re seriously going to play with editing /etc/sudoers beyond just adding a user (even just specifying the NOPASSWD tag), please read the manpage and carefully experiment. Cover all the edge cases you can think of. Have someone else review your changes.

Here’s some interesting stuff:

  • #include (and presumably #includedir) directive to include a file can contain %h (a hostname) in file path
  • to match all users except root, instead of ALL we can specify ALL,!root
  • you can do things like disable setting utmp (set_utmp) or prohibit the capability to disable env_reset (setenv) — which is what I did
  • the first-time “lecture” status (‘has this user seen the sudo lecture?’) is in /var/lib/sudo/lectured and configurable using lecture_status_dir
  • the password prompt can be overridden with passprompt and can include %H FQDN, %h hostname, %p the user whose password is requested, %U the user whom we’re switching to [caveat: PAM module’s output must match Password: or username's Password:
  • there are ways to specify directives per running host (important for standardized configs deployed across an org), per requesting user, per run-as user, per command
  • there are tags to tweak execution of a command

While I haven’t actually tried all of the following, here are some “notes-to-self” on what individual fields mean. Please prefer learning the details from the manpage.

What does ALL before = mean? Those are the hosts. For instance, we can let roger run anything on all the fileservers:

Host_Alias FILESERVERS = fs1, fs, fs3
Host_Alias FINANCE =,


(Just like Host_Alias, there’s User_Alias, Runas_Alias and Cmnd_Alias.)

What’s (ALL:ALL) after =? That’s called a Runas_Spec and it’s saying “You can pretend to be any user of any group.” Let’s only allow roger to run only /usr/bin/rsync and only to do it as mainweb:

User_Alias     FINANCE_DEPLOY      = james, mike
roger          FILESERVERS         = (mainweb) /usr/bin/rsync
richard        richard-workstation = (www-data) ALL
FINANCE_DEPLOY FINANCE             = (financeweb : financeservices) /usr/bin/rsync

And what’s the ALL after (ALL:ALL)? As demonstrated, that’s the command specification.

Alright, so what’s NOPASSWD:? That’s a Tag_Spec. Each command may be prefixed with zero or more tags associated with it, such as EXEC, NOEXEC, PASSWD, NOPASSWD, MAIL, NOMAIL. For instance:

# Allow ls and cat without password, but require a password for vi.
richard richard-workstation = (www-data) NOPASSWD: /usr/bin/ls, /usr/bin/cat, NOEXEC: PASSWD: /usr/bin/vi
richard richard-workstation = (www-data) NOPASSWD: /usr/bin/ls, /usr/bin/cat, PASSWD: /usr/bin/vi

Allowlisting only some commands results in the following when a non-allowed command is run:

Sorry, user richard is not allowed to execute '/bin/bash' as root on my.machine.hostname.

And attempting to run a command from within vi when NOEXEC is specified results in:

Cannot execute shell /bin/bash

There’s even a way to set things like timeouts. But, again, read the manpage for details.

As usual, this is not advice; these are personal notes from trying to resolve an immediate issue. As this is a security-sensitive feature, for even one person that reads this and configures an important system incorrectly, please note: these notes are written as a hobby. Just like you would with PAM, NSS, LDAP, Kerberos and other things, please carefully read more authoritative documentation sources; I’m not responsible for breakage you may cause with my non-advice.

Scanning on Linux from Canon TS5050 over the network

scangearmp2 screenshot

Turns out that ScanGear MP 3.70 works perfectly with my Canon TS5050. After unpacking the archive and dpkg -i scangearmp2_3.70-1_amd64.deb, I ran scangearmp2, got simple GTK-based GUI, the TS5050 was recognized, and that was it.

scangearmp2 screenshot

I have no idea if it integrates in desktop GUIs, since when I’m under Debian GNU/Linux these days, I am happily using i3 and terminals.

It would be really nice if Canon listed any software for Linux on their official support page for Canon TS5050, but they don’t. There are no Linux “bundles” listed (whew!), but this also means there is no software offered at all. Once you choose Linux, you get random offers for Windows and Mac software — but that’s just generic ads displayed under the ‘no results’ area.

Samba 4 + Windows 10 time synchronization issues

Where does ListeningThread -- Recvd 52 of 48/68 bytes come from?

If you follow the instructions for setting up Samba 4 AD DC for time synchronization, ntpd (coming out of Debian’s ntp package at some version 4.2.8) should just work.1

I came to this discovery after giving up and discarding my /etc/ntp.conf. Suddenly, after restarting ntpd and running w32tm /resync, things just worked. It’s not the software that’s broken — it’s me that was crazy.

The packet was now 110 bytes in Wireshark (68 of which was data). This was a stark improvement over seeing a 94 byte packet (52 of which was data). C:\temp\ntpDebug.log 2 no longer contained this:

ListeningThread -- Recvd 52 of 48/68 bytes

Hoozah! Now I wanted to figure out what was causing ntpd to send 52b packets, and not either 48b or 68b packets.

Turns out that my restrict statements had unexpected side effects. For instance, Samba wiki-recommended config tries to unrestrict localhost using restrict 3

But I wanted to do the same for IPv6 localhost, so I did restrict ::1. This seems to have greatly confused ntpd.

The way out?

restrict -4
restrict -6 ::1

Second mistake was restrict mask It didn’t specify that mssntp should be enabled. For good measure I threw in -4:

restrict -4 mask mssntp

Given that Samba config doesn’t recommend any special allowlisting for my internal IP range, I’ll just remove this line completely; the default restriction from the wiki should cover everything clients need to do anyway:

# Access control
# Default restriction: Allow clients only to query the time
restrict default kod nomodify notrap nopeer mssntp

Moral of the story? ntpd seems to be awfully sensitive to restrict statements. If w32time service complains or breaks in some way, be sure to remove the statements bit by bit, or make sure IPv4 and IPv6 statements don’t stomp over each other.

  1. Granted, I needed to modify the path to the socket to say /var/lib/samba/ntp_signd/ instead of /usr/local/samba/var/lib/ntp_signd/, but otherwise it just worked. 
  2. That file was created using w32tm /debug /enable /file:C:\temp\ntpDebug.log /size:102400 /entries:0-300 which I found somewhere online. 
  3. Apparently, passing no restrictions at all after the address simply means “unrestrict these peers”.