How do variables really work in Dockerfiles?

Whether you're using a Dockerfile ARG, an ENV variable, or a regular shell script variable, inside your Dockerfile they're all referenced as simply $var. In Docker Compose files there's a special $$var syntax for variables that you don't want Docker Compose to interpolate.

Every Dockerfile RUN command runs in a completely separate process/environment, so if you're using a regular shell script variable, you must set and read it within a single RUN command.
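To see this without waiting on a build, here is a minimal sketch that uses one `/bin/sh -c` call as a stand-in for each RUN command. That stand-in is an assumption for illustration only (no Docker is involved), and `demo_var` is a made-up name.

```shell
# Each `/bin/sh -c` call stands in for one RUN instruction (an assumption
# made for illustration; no Docker build is involved).
unset demo_var

# First "RUN": set the variable and read it in the same shell process.
first=$(/bin/sh -c 'demo_var=1 && echo "inside: $demo_var"')

# Second "RUN": a brand-new shell process, so the variable is gone.
second=$(/bin/sh -c 'echo "next: ${demo_var:-<unset>}"')

echo "$first"
echo "$second"
```

The second call falls back to `<unset>` because nothing carries the assignment over from the first shell to the next one.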

If you need to share a shell script variable $var across several RUN commands, it's preferable to either 1) refactor them into a single RUN command or 2) set the value in a build-arg and write a wrapper script. If neither of those works, you can write the value to a file and read it back. For example:

RUN echo 1 > /tmp/__var_1
RUN echo "$(cat /tmp/__var_1)"
RUN rm -f /tmp/__var_1

If your RUN command works in your shell but not via your Dockerfile RUN, it's likely a quoting issue.

Docker RUN commands use /bin/sh -c by default (the SHELL instruction can change this). On many systems sh will be dash or ash or another very minimal shell with slightly different rules than the shell you typically use.
Use a RUN echo ... command to make sure that any variables your command depends on are in fact set to what you think they're set to.
Try to make the command you're having trouble with the first command in your file so that you can keep running your docker build with --no-cache and still minimize your wait time.
When I was troubleshooting what also ended up being a quoting problem, I should have written my simplified test version like so:

RUN SHELL_PATH=$(head -n 1 /etc/shells) &&\
    useradd --shell $SHELL_PATH --uid 1000 foo

Even if you use zsh as your normal shell, I would recommend testing your RUN commands through /bin/sh -c, which is how Docker runs them by default, like:

/bin/sh -c 'SHELL_PATH=$(head -n 1 /etc/shells) &&\
    useradd --shell $SHELL_PATH --uid 1000 foo'
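If you'd rather not create a real user while experimenting, a variation on the same idea is to swap useradd for echo and point the command at a throwaway file. This is only a sketch: `shells_file` and `SHELLS_FILE` are hypothetical names standing in for /etc/shells.

```shell
# Hypothetical fixture standing in for /etc/shells.
shells_file=$(mktemp)
printf '/bin/sh\n/bin/bash\n' > "$shells_file"

# Run the RUN body through /bin/sh -c, Docker's default for RUN.
# `echo` replaces the real useradd call so nothing on the system changes.
preview=$(SHELLS_FILE="$shells_file" /bin/sh -c 'SHELL_PATH=$(head -n 1 "$SHELLS_FILE") &&\
    echo "useradd --shell $SHELL_PATH --uid 1000 foo"')

echo "$preview"
rm -f "$shells_file"
```

Printing the fully expanded command first makes it easy to spot an empty or wrongly quoted variable before running the real thing.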

A Dockerfile ARG can share a name with a shell script variable that you set, which sounds like a recipe for trouble. To see how the different kinds of variables behave, say we have this Dockerfile:

FROM alpine:latest

ARG ArgFoo
ENV EnvFoo="Must be Set"

RUN echo "Value of ArgFoo is $ArgFoo"
RUN echo "Value of EnvFoo is $EnvFoo"
RUN ShFoo="awesome" && echo "Value of ShFoo is $ShFoo"
RUN echo "Our \$ShFoo is Gone Again: $ShFoo"

When we run:

$ docker build . --no-cache

Step 4/7 : RUN echo "Value of ArgFoo is $ArgFoo"
Value of ArgFoo is

Step 5/7 : RUN echo "Value of EnvFoo is $EnvFoo"
Value of EnvFoo is Must be Set

Step 6/7 : RUN ShFoo="awesome" && echo "Value of ShFoo is $ShFoo"
Value of ShFoo is awesome

Step 7/7 : RUN echo "Our \$ShFoo is Gone Again: $ShFoo"
Our $ShFoo is Gone Again:

If we run it again, this time with a --build-arg that sets $ShFoo, it's surprisingly still difficult to get ourselves into trouble. First, update the Dockerfile to try to cause some trouble with $ShFoo:

FROM alpine:latest

ARG ShFoo
RUN echo "$ShFoo" && ShFoo="awesome" && echo "Value of ShFoo is $ShFoo"

$ docker build . --build-arg ShFoo="Difficult to cause trouble" --no-cache

Step 3/3 : RUN echo "$ShFoo" && ShFoo="awesome" && echo "Value of ShFoo is $ShFoo"
Difficult to cause trouble
Value of ShFoo is awesome
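The same behavior can be reproduced in a plain shell, assuming the build-arg reaches the RUN shell as an ordinary environment variable. The helper `run_with_arg` below is hypothetical and only imitates that; it is not a Docker feature.

```shell
# Hypothetical stand-in for `--build-arg ShFoo=...`: the value is placed in
# the environment of the shell that runs the command.
run_with_arg() {
    ShFoo="Difficult to cause trouble" /bin/sh -c "$1"
}

# Same command as the RUN line: read the arg, reassign it, read it again.
first_run=$(run_with_arg 'echo "$ShFoo" && ShFoo="awesome" && echo "Value of ShFoo is $ShFoo"')

# A later "RUN" starts a fresh shell, so it sees the build-arg value again.
second_run=$(run_with_arg 'echo "$ShFoo"')

echo "$first_run"
echo "$second_run"
```

The reassignment only lives as long as the shell that made it, which is why a later command sees the original value again.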

So Docker does a surprisingly good job of letting you reference any variable as just $var and having everything just work.

The main difference when writing RUN commands really has more to do with /bin/sh -c than it does with Docker.

For example, I was working on a Dockerfile that would automatically set the permissions of the running Docker container to match the current user. Ultimately the command that worked was like this:

RUN shell=$(grep -E -m 1 \.\*\\b$USER_SHELL\\b /etc/shells) && \
    echo "DUMP: $shell $USER_ID:$GROUP_ID $USER_NAME:$GROUP_NAME" && \
    groupadd --gid $GROUP_ID $GROUP_NAME && \
    useradd --shell $shell --uid $USER_ID --gid $GROUP_ID $USER_NAME

The subshell portion is:

grep -E -m 1 \.\*\\b$USER_SHELL\\b /etc/shells

Yet when executing in a normal shell I only need to use:

grep -E -m 1 '.*\bzsh\b' /etc/shells
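One way to compare the two forms is to print the pattern that the shell would hand to grep. The sketch below assumes USER_SHELL is set to zsh purely for illustration. It also tries a double-quoted variant, which is my own addition rather than something from the original Dockerfile.

```shell
# Hypothetical value; in the Dockerfile this would come from an ARG or ENV.
USER_SHELL=zsh
export USER_SHELL

# Unquoted form from the RUN command. printf '%s' prints the argument as
# grep would receive it (echo is avoided because some sh builds expand \b).
from_run=$(/bin/sh -c 'printf "%s\n" \.\*\\b$USER_SHELL\\b')

# Double-quoted form: still expands $USER_SHELL, with less escaping.
from_dq=$(/bin/sh -c 'printf "%s\n" ".*\\b${USER_SHELL}\\b"')

# Single-quoted form as typed in an interactive shell.
literal='.*\bzsh\b'

echo "$from_run"
echo "$from_dq"
echo "$literal"
```

All three end up as the same pattern, which suggests the extra escaping comes from needing $USER_SHELL expanded (single quotes would block that) rather than from anything Docker does.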

When you have a fairly complex shell command inside a Dockerfile RUN, the format above is probably a good way to go if you run into anything unexpected: assign a shell variable, then echo everything so that you can be sure you really have what you think you have.