Using Indexed and Associative Arrays in Bash

2024-07-10 · 12 min

Shell scripting is vital for automating repetitive tasks and simplifying complex commands into a more straightforward format. Understanding the details of shell scripting enables us to write more robust, portable, and efficient scripts.

There are a few features, however, that aren’t part of the POSIX standard, like arrays, both indexed and associative. Despite this, many shells support them and provide ample functionality.

In the context of shell scripting, there are two types of arrays: indexed and associative.

This article will look at arrays and how to use them in the Bash shell. Many other shells support them, too, but their usage often differs slightly.

Indexed Arrays

Arrays are a fundamental data structure in computer programming. They store multiple values behind a single variable and provide the tools for accessing and manipulating their content in different ways.

An indexed array is what most people think of when they hear array: an ordered collection of elements that are accessed by a numeric index (starting at 0) via subscripting.

Creating Indexed Arrays

Indexed arrays can either be created implicitly or explicitly.

The implicit variant is the simplest:

shell

EMPTY=()

COLORS=(red green blue)

When the elements are written out literally, like above, the shell separates them at unquoted whitespace (spaces, tabs, and newlines). The value of IFS, the internal field separator, comes into play when an unquoted expansion appears between the parentheses: its result is split into separate elements at the characters listed in IFS.

The default value is IFS=$' \t\n' (space, tab, newline), but it can be changed as needed:

shell

# CREATE ARRAY WITH CUSTOM SEPARATOR
IFS=','
MEMBERS="Jimmy Page,Robert Plant,John Paul Jones,John Bonham"
LED_ZEPPELIN=($MEMBERS)

# RESET IFS
IFS=$' \t\n'

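We need the additional step of storing the names in the MEMBERS variable because IFS has no effect on the literal words between the parentheses: written there directly, the names would still be split at the spaces instead of the commas. Expanding $MEMBERS unquoted, however, does the right thing, thanks to the custom IFS. Keep in mind that an unquoted expansion is also subject to globbing, so this only works safely for values without characters like *, ? or [.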

Don’t forget to reset IFS! It’s used in other contexts, too, such as the read command.
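A common alternative, not used in the original script, avoids touching the global IFS at all: prefixing read with an IFS assignment applies the custom separator to that single command only. A minimal sketch, reusing the MEMBERS string from above; it assumes single-line input, as read stops at the first newline:

```shell
#!/usr/bin/env bash

MEMBERS="Jimmy Page,Robert Plant,John Paul Jones,John Bonham"

# IFS=',' only applies to this read call; the global IFS stays untouched.
# -r keeps backslashes literal, -a stores the fields in an indexed array.
IFS=',' read -r -a LED_ZEPPELIN <<< "$MEMBERS"

printf '%s\n' "${LED_ZEPPELIN[@]}"

# OUTPUT:
# Jimmy Page
# Robert Plant
# John Paul Jones
# John Bonham
```

As nothing needs to be reset afterwards, this variant can't leak a modified IFS into the rest of the script.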

Personally, I always quote the elements, so I never forget it for an element that actually needs quoting, like one containing spaces. And since newlines also separate elements, an array declaration can span multiple lines and stay quite readable, for example by grouping arguments and their values on the same line while still keeping them as separate elements:

shell

FFMPEG_ARGS=(
    "ffmpeg" "-y"
    "-r" "${FFMPEG_FRAMERATE}"
    "-force_key_frames" "expr:gte(t,n_forced*${FORCE_KEY_FRAMES_EVERY_X_SEC:-2})"
    "-movflags" "+faststart"
    "-flags" "+global_header"
)
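To see why quoting matters, here’s a minimal comparison (UNQUOTED and QUOTED are illustrative names): unquoted literal words are split at whitespace, while a quoted string stays a single element.

```shell
#!/usr/bin/env bash

# Unquoted: the shell splits at the space, creating two elements.
UNQUOTED=(Jimmy Page)

# Quoted: the whole string becomes exactly one element.
QUOTED=("Jimmy Page")

echo "Unquoted: ${#UNQUOTED[@]} element(s)"
echo "Quoted:   ${#QUOTED[@]} element(s)"

# OUTPUT:
# Unquoted: 2 element(s)
# Quoted:   1 element(s)
```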

The simpler, implicit array declaration is suitable for most use cases.

However, we can make it explicit by using the declare command, which allows setting attributes on any kind of shell variable:

shell

declare -a COLORS=("red" "green" "blue")

declare -a EMPTY

The -a attribute marks the declared variable as an array.
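To check which attributes a variable actually carries, declare -p prints its definition. A minimal sketch; the exact output format varies slightly between Bash versions, so the comments only describe how each line starts:

```shell
#!/usr/bin/env bash

declare -a COLORS=("red" "green" "blue")
NAME="Jimmy"

# Indexed array: the printed definition starts with "declare -a COLORS=".
declare -p COLORS

# Plain string without attributes: it starts with "declare -- NAME=".
declare -p NAME
```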

Although POSIX-based shells don’t enforce types like a strongly typed language, it still makes sense to declare variables with their correct type.

First and foremost, it signals explicit intent to any reader of your code, which improves clarity and maintainability.

Second, the script’s other recipients, like the shell that runs it or a linting tool such as ShellCheck, are told about your intent, too. This way, they have the opportunity to ensure that it’s treated correctly, which might prevent unexpected behavior down the line.

Furthermore, using declare inside a function creates a locally scoped variable, just like local does, avoiding conflicts with other variables and unintended side effects, like accidental re-assignment:

shell

COLORS=("red" "green" "blue")

new_scope () {
    declare -a COLORS=("cyan" "yellow" "magenta" "black")
    echo "(local) The second color is ${COLORS[1]}"
}

new_scope

echo "(global) The second color is ${COLORS[1]}"

Without the declare command, the second echo would also print out yellow.

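There’s also a way to create an array by using the read command and word splitting:

shell

read -r -a COLORS <<< "red green blue"

Once again, -a stands for “array” (here, it’s an option of read rather than a variable attribute), and the word boundaries are defined by the value of IFS. The added -r option keeps backslashes from being treated as escape characters.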
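For input with one element per line, such as command output, Bash 4+ also provides the mapfile builtin (also available as readarray), which isn’t covered in this article otherwise. A minimal sketch, with printf standing in for a real command:

```shell
#!/usr/bin/env bash

# Every line becomes one element, so spaces inside a line are preserved.
# -t strips the trailing newline from each line.
mapfile -t COLORS < <(printf '%s\n' "red" "dark green" "blue")

echo "Count: ${#COLORS[@]}"
printf -- '- %s\n' "${COLORS[@]}"

# OUTPUT:
# Count: 3
# - red
# - dark green
# - blue
```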

Assigning and Adding Values

Values can either be assigned to a specific index or appended in bulk by adding another array with the += operator:

shell

COLORS[4]="orange"

COLORS+=("pink")

The array doesn’t have a fixed size, and we can assign any index we want, not just the next free one, which leaves gaps in the indices. That’s why I suggest always using the += operator for adding and only using index assignment for explicitly overriding a value.
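A short sketch of why index assignment can be surprising: once an index is skipped, the length no longer points at the next free slot, while += always appends after the highest assigned index. ${!COLORS[*]} lists the assigned indices, the same ! syntax Bash uses for associative array keys:

```shell
#!/usr/bin/env bash

COLORS=("red" "green" "blue")
COLORS[4]="orange"              # index 3 is skipped, leaving a gap

echo "Indices: ${!COLORS[*]}"
echo "Length:  ${#COLORS[@]}"

# Using the length as the "next" index overwrites "orange" at index 4.
COLORS[${#COLORS[@]}]="pink"
echo "Index 4: ${COLORS[4]}"

# += appends after the highest assigned index instead.
COLORS+=("brown")
echo "Indices: ${!COLORS[*]}"

# OUTPUT:
# Indices: 0 1 2 4
# Length:  4
# Index 4: pink
# Indices: 0 1 2 4 5
```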

A single value must be added as an array, too, or it gets appended as a string to the first element:

shell

COLORS=("red" "green" "blue")
COLORS+="orange"

echo "First element: ${COLORS[0]}"
# OUTPUT:
# First element: redorange

Getting all Elements

The special index @ (at) expands all elements of an array:

shell

COLORS=("red" "green" "blue")

echo "Colors: ${COLORS[@]}"

# OUTPUT:
# Colors: red green blue

With potentially empty arrays, quoting the expansion as "${array[@]}" ensures that it results in zero arguments instead of one empty string. It also keeps every element as exactly one word, even if the element contains spaces.

We can also use the special index * (asterisk) to access all elements. Quoted as "${array[*]}", however, it doesn’t expand each element into a separate word but joins all elements into a single string, separated by the first character of IFS.

For a simple echo it doesn’t matter, but we can’t safely use it for iteration.
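To make the difference visible, a sketch with a hypothetical count_args helper that prints how many arguments it received:

```shell
#!/usr/bin/env bash

COLORS=("red" "dark green" "blue")
EMPTY=()

# Hypothetical helper: prints the number of arguments it was called with.
count_args () {
    echo "$#"
}

echo "Quoted [@]: $(count_args "${COLORS[@]}") words"   # one word per element
echo "Quoted [*]: $(count_args "${COLORS[*]}") word"    # all joined into one
echo "Empty  [@]: $(count_args "${EMPTY[@]}") words"    # no words at all

# [*] joins with the first character of IFS; the subshell keeps the change local.
( IFS=','; echo "Joined: ${COLORS[*]}" )

# OUTPUT:
# Quoted [@]: 3 words
# Quoted [*]: 1 word
# Empty  [@]: 0 words
# Joined: red,dark green,blue
```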

Check for Elements

There’s no built-in way to check whether an indexed array contains a specific element.

One (problematic) option is using a regular expression:

shell

COLORS=("red" "green" "blue")

if [[ "${COLORS[*]}" =~ "red" ]]; then
    echo "There's some red in there."
fi

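Here, the array gets expanded to a single string, and =~ checks whether “red” occurs anywhere in it. As the right-hand side is quoted, it’s matched literally instead of as a regular expression, so this is effectively a plain substring match.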

However, if the array contains the Crayola color “Cultured Pearl”, a white color defined as #f5f5f5, it also matches, because “Cultured” contains “red”!

We could add spaces into the mix, but it gets out of hand quite quickly:

shell

if [[ " ${COLORS[*]} " =~ " red " ]]; then
    echo "I'm more certain there's red in there."
fi

This still isn’t fail-safe, as an element containing spaces, like “dark red”, would match, too. So let’s create a function to check for an exact element:

shell

COLORS=("red" "green" "blue" "black coral pearl")

contains_element () {
    local TARGET="$1"
    shift
    local ARRAY=("$@") # ASSIGNS THE REMAINING ARGUMENTS

    local ELEMENT

    for ELEMENT in "${ARRAY[@]}"; do
        if [[ "$ELEMENT" == "${TARGET}" ]]; then
            return 0 # Found
        fi
    done

    return 1 # Not found
}

if contains_element "red" "${COLORS[@]}"; then
    echo "Definitely contains exactly red."
fi

if contains_element "black" "${COLORS[@]}"; then
    echo "There's also definitely black in there."
fi

We find red but don’t find black, as there’s no exact black element in the array.
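If the same array is checked many times, a different technique can help: building an associative array (Bash 4+, covered below) whose keys are the elements, so each check becomes a single lookup instead of a loop. A sketch with a hypothetical has_color helper, assuming no element is an empty string, which Bash doesn’t accept as a key:

```shell
#!/usr/bin/env bash

COLORS=("red" "green" "blue" "black coral pearl")

# Build the lookup table once: every element becomes a key.
declare -A COLOR_SET=()
for COLOR in "${COLORS[@]}"; do
    COLOR_SET["$COLOR"]=1
done

# Hypothetical helper: succeeds only if the exact element exists.
# ${COLOR_SET[$1]+set} expands to "set" if the key exists, else to nothing.
has_color () {
    [[ -n "${COLOR_SET[$1]+set}" ]]
}

has_color "red" && echo "Contains exactly red."
has_color "black" || echo "No exact black element."

# OUTPUT:
# Contains exactly red.
# No exact black element.
```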

Length of an Indexed Array

The length of an array is retrieved by accessing all elements but prefixing the name with # (hash) after the opening curly brace:

shell

COLORS=("red" "green" "blue")

echo "Count: ${#COLORS[@]}"

# OUTPUT:
# Count: 3

The returned value is an integer, so it can be used to check if an array is empty:

shell

if [ ${#COLORS[@]} -eq 0 ]; then
    echo "No colors for you!"
else
    echo "We have colors!"
fi
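The subscript makes a big difference here: ${#COLORS[@]} counts the elements, while ${#COLORS[0]} returns the length of the first element’s string. A quick comparison:

```shell
#!/usr/bin/env bash

COLORS=("magenta" "cyan")

echo "Elements:              ${#COLORS[@]}"
echo "Characters in index 0: ${#COLORS[0]}"

# OUTPUT:
# Elements:              2
# Characters in index 0: 7
```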

Iterating Over Elements

The special index @ (at) is also used for iteration:

shell

COLORS=("red" "green" "blue")

for COLOR in "${COLORS[@]}"; do
    echo "- ${COLOR}"
done

# OUTPUT:
# - red
# - green
# - blue

Quoted, it yields each element of the array to the loop as one word, even if the element itself contains spaces.
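When the position matters, too, a sketch iterating over the indices instead: "${!COLORS[@]}" expands to every assigned index, and the value is looked up per index.

```shell
#!/usr/bin/env bash

COLORS=("red" "green" "blue")

# "${!COLORS[@]}" yields the indices, not the values.
for INDEX in "${!COLORS[@]}"; do
    echo "${INDEX}: ${COLORS[$INDEX]}"
done

# OUTPUT:
# 0: red
# 1: green
# 2: blue
```

As only assigned indices are yielded, this also works with arrays that have gaps.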

Slicing an Array

Slicing an array means selecting a subset of elements from the array based on specified indices:

${array[@]:start:length}

This means we want to take length elements from array, starting at index start.

shell

COLORS=("red" "green" "blue" "orange" "pink" "brown")

SLICED=("${COLORS[@]:2:3}")

for COLOR in "${SLICED[@]}"; do
    echo "- ${COLOR}"
done

# OUTPUT:
# - blue
# - orange
# - pink
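Two variations not shown above, sketched with the same COLORS array: omitting the length slices to the end, and a negative start counts from the end of the array.

```shell
#!/usr/bin/env bash

COLORS=("red" "green" "blue" "orange" "pink" "brown")

# No length: everything from index 3 to the end.
FROM_THREE=("${COLORS[@]:3}")

# Negative start: counted from the end. The space before "-" is required,
# otherwise ":-" is parsed as the "default value" expansion.
LAST_TWO=("${COLORS[@]: -2}")

echo "From index 3: ${FROM_THREE[*]}"
echo "Last two:     ${LAST_TWO[*]}"

# OUTPUT:
# From index 3: orange pink brown
# Last two:     pink brown
```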

Remove a Value

Values at a specified index can be unset with the aptly named command unset:

shell

unset 'COLORS[2]' # QUOTED, SO THE SHELL DOESN'T TREAT IT AS A GLOB PATTERN

Remember that this is an indexed array, not a stack- or queue-like structure! The remaining elements aren’t shifted down, so there’s now a gap where index 2 used to be. Asking for the array’s length will only return the count of set elements, and iterating over the elements will skip the unset index.
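To see the gap, and one common way to close it, a sketch that unsets an element and then re-assigns the array from its own expansion, which packs the remaining elements into consecutive indices again:

```shell
#!/usr/bin/env bash

COLORS=("red" "green" "blue" "orange")
unset 'COLORS[2]'

echo "Indices: ${!COLORS[*]}"
echo "Length:  ${#COLORS[@]}"

# Re-assigning the expanded elements renumbers them from 0.
COLORS=("${COLORS[@]}")

echo "Indices: ${!COLORS[*]}"
echo "Index 2: ${COLORS[2]}"

# OUTPUT:
# Indices: 0 1 3
# Length:  3
# Indices: 0 1 2
# Index 2: orange
```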

Associative Arrays (Bash 4+)

An associative array is a key-value-based data structure, like a hashtable or dictionary.

Unlike indexed arrays, which use numeric indices, associative arrays use strings as their keys.

Creating an Associative Array

To create an associative array, we must use the declare command; there’s no implicit declaration possible, at least in Bash.

It’s declared similarly to an indexed array but uses the following syntax for its key-value pairs:

[key]=element

Simple keys don’t need to be quoted, as the brackets form the actual boundary, but keys containing spaces or other special characters do. For elements, the previous rules apply. And just like before, I recommend always quoting elements:

shell

declare -A COLORS=(
    [red]="#ff0000"
    [green]="#00ff00"
    [blue]="#0000ff"
)

declare -A EMPTY

The uppercase -A marks the variable as an associative array.

Assigning and Adding Key-Value Pairs

Assignment works like with an indexed array by using the key instead of an index:

shell

declare -A COLORS

COLORS[red]="#ff0000"
COLORS[green]="#00ff00"
COLORS[blue]="#0000ff"

If a value exists for a key, it’s overwritten.
If not, it’s added to the array.

The += (plus-equals) operator does accept additional literal [key]=value pairs, but there’s no way to merge another associative array variable into an existing one with it.
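A sketch of both sides, with a hypothetical EXTRA array: += adds literal [key]=value pairs to an associative array, while merging another associative array variable takes a loop over its keys.

```shell
#!/usr/bin/env bash

declare -A COLORS=([red]="#ff0000")
declare -A EXTRA=([green]="#00ff00" [blue]="#0000ff")

# Literal pairs: added (or overwritten) via +=.
COLORS+=([orange]="#ffa500")

# Another associative array: copied key by key.
for KEY in "${!EXTRA[@]}"; do
    COLORS["$KEY"]="${EXTRA[$KEY]}"
done

echo "Count: ${#COLORS[@]}"

# OUTPUT:
# Count: 4
```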

Getting All Values

The special key @ (at) is used to get all values, just like it gives us all elements in an indexed array:

shell

declare -A COLORS=(
    [red]="#ff0000"
    [green]="#00ff00"
    [blue]="#0000ff"
)

echo "Colors: ${COLORS[@]}"

# OUTPUT:
# Colors: #0000ff #ff0000 #00ff00

As you can see from the output, the insertion order is not maintained!

There’s no natural order, like alphanumeric order, either. The order depends on Bash’s internal hashing, so don’t rely on a specific “out-of-order” combination.

Internally, associative arrays in Bash are implemented using hash tables, which inherently do not guarantee any ordering of keys. If you need a particular order, either sort them with things like the sort command or use an indexed array.
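A minimal sketch of the sorting approach, assuming keys without newlines: the keys are piped through the external sort command and read back into an indexed array with mapfile (Bash 4+, like associative arrays themselves).

```shell
#!/usr/bin/env bash

declare -A COLORS=([red]="#ff0000" [green]="#00ff00" [blue]="#0000ff")

# One key per line into sort, one sorted key per element back out.
mapfile -t SORTED_KEYS < <(printf '%s\n' "${!COLORS[@]}" | sort)

for KEY in "${SORTED_KEYS[@]}"; do
    echo "${KEY} => ${COLORS[$KEY]}"
done

# OUTPUT:
# blue => #0000ff
# green => #00ff00
# red => #ff0000
```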

Getting All Keys

To access all keys, we use the ! (exclamation mark) after the opening curly brace:

shell

declare -A COLORS=(
    [red]="#ff0000"
    [green]="#00ff00"
    [blue]="#0000ff"
)

echo "Colors: ${!COLORS[@]}"

# OUTPUT:
# Colors: blue red green

Just like with the values, there’s no particular guarantee on the order.

Length of an Associative Array

As with an indexed array, the # (hash) prefix after the opening curly brace, in combination with the @ (at) key, is used to get the length:

shell

declare -A COLORS=(
    [red]="#ff0000"
    [green]="#00ff00"
    [blue]="#0000ff"
)

echo "Count: ${#COLORS[@]}"

# OUTPUT:
# Count: 3

Trying to use all keys to get the length with something like ${#!COLORS[@]} results in a “bad substitution” error.

Iterating over Keys and Values

There are only two ways to iterate over an associative array: keys or values.

There’s no combined “iterate over a key-value pair”, so we need to extract the values ourselves:

shell

declare -A COLORS=(
    [red]="#ff0000"
    [green]="#00ff00"
    [blue]="#0000ff"
)

for COLOR_NAME in "${!COLORS[@]}"; do
    COLOR_HEX="${COLORS[$COLOR_NAME]}"
    echo "${COLOR_NAME} => ${COLOR_HEX}"
done

# OUTPUT:
# blue => #0000ff
# red => #ff0000
# green => #00ff00

Removing Key-Value Pairs

Like with indexed arrays, unset is used to remove a key-value pair:

shell

declare -A COLORS=(
    [red]="#ff0000"
    [green]="#00ff00"
    [blue]="#0000ff"
)

unset 'COLORS[red]' # QUOTED, SO THE SHELL DOESN'T TREAT IT AS A GLOB PATTERN

echo "Count: ${#COLORS[@]}"

# OUTPUT:
# Count: 2

Thanks to the arbitrary keys, we don’t have to worry about disrupted consecutive index-numbering.
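One caveat for unset: an unquoted COLORS[red] is also a valid glob pattern, matching file names like COLORSr, COLORSe, or COLORSd, so a matching file in the current directory silently changes what gets unset. A sketch that creates such a file in a throwaway temporary directory to show the difference quoting makes:

```shell
#!/usr/bin/env bash

declare -A COLORS=([red]="#ff0000" [green]="#00ff00")

# Work in a throwaway directory and create a file matching the glob COLORS[red].
cd "$(mktemp -d)" || exit 1
touch COLORSd

# Unquoted: pathname expansion turns the argument into "COLORSd",
# so a different (non-existent) variable is unset instead.
unset COLORS[red]
AFTER_UNQUOTED=${#COLORS[@]}

# Quoted: the "red" key is removed as intended.
unset 'COLORS[red]'
AFTER_QUOTED=${#COLORS[@]}

echo "After unquoted unset: ${AFTER_UNQUOTED}"
echo "After quoted unset:   ${AFTER_QUOTED}"

# OUTPUT:
# After unquoted unset: 2
# After quoted unset:   1
```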

Common Use-Cases

Arrays are a versatile yet low-level programming feature with such a wide range of applications across various domains that it’s nearly impossible to pick only a few.

However, here’s one example for each type that I use in my scripts.

(Indexed) Building Commands

Many of my scripts help tame more complicated programs.

Take ffmpeg, for example. It’s a mighty but complicated tool with so many arguments that their documentation is over 16k words long…

The simplest way would be creating an argument string with variables:

shell

FFMPEG_ARGS="-i ${INPUT} -b:v ${FFMPEG_BITRATE_VIDEO:-3672k} -s ${FFMPEG_SIZE:-1920x1080} ${OUT_FILE}"

ffmpeg $FFMPEG_ARGS

As the way ffmpeg gets called is decided over multiple steps in my script, that’s not an option.

Maybe we could concatenate a string as needed? That would also make it more readable:

shell

FFMPEG_ARGS="-i ${INPUT}"
FFMPEG_ARGS+=" -b:v ${FFMPEG_BITRATE_VIDEO:-3672k}"

# SOME OTHER CODE FOR DECISION MAKING

FFMPEG_ARGS+=" -s ${FFMPEG_SIZE:-1920x1080}"
FFMPEG_ARGS+=" ${OUT_FILE}"

ffmpeg $FFMPEG_ARGS

That’s better, but it still has some drawbacks: we need to remember the spaces between the arguments, and arguments that need quoting break, because the unquoted $FFMPEG_ARGS is split at every space, including the ones inside a file name.

Let’s try again with an indexed array!

shell

FFMPEG_ARGS=("-i" "${INPUT}")
FFMPEG_ARGS+=("-b:v" "${FFMPEG_BITRATE_VIDEO:-3672k}")

# SOME OTHER CODE FOR DECISION MAKING

FFMPEG_ARGS+=("-s" "${FFMPEG_SIZE:-1920x1080}")
FFMPEG_ARGS+=("${OUT_FILE}")

ffmpeg "${FFMPEG_ARGS[@]}"

Even better!

Now, each argument (and its value, if needed) is added to the array as a quoted element without any extra spacing, because the array expansion takes care of it.
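The difference shows up as soon as a value contains spaces. A sketch with a hypothetical show_args function standing in for ffmpeg and a hypothetical file name:

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for ffmpeg: prints how many arguments it got and each one.
show_args () {
    echo "$# argument(s):"
    printf '  [%s]\n' "$@"
}

INPUT="my holiday video.mp4"    # hypothetical file name containing spaces

ARGS_STRING="-i ${INPUT}"
show_args $ARGS_STRING          # unquoted string: split at every space

ARGS_ARRAY=("-i" "${INPUT}")
show_args "${ARGS_ARRAY[@]}"    # quoted array: one argument per element

# OUTPUT:
# 4 argument(s):
#   [-i]
#   [my]
#   [holiday]
#   [video.mp4]
# 2 argument(s):
#   [-i]
#   [my holiday video.mp4]
```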

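We could even put the ffmpeg call itself into the array and run it by expanding the array as the command. There’s no need for eval, which would re-parse every element and break, or even execute, arguments containing spaces or other special characters:

shell

CMD=("ffmpeg")
CMD+=("-i" "${INPUT}")

# ...

"${CMD[@]}"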
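Building the command step by step also makes optional arguments and dry runs easy. A minimal sketch with hypothetical values; instead of running ffmpeg, it prints the command via printf %q, which shell-quotes each element:

```shell
#!/usr/bin/env bash

# Hypothetical values for illustration.
INPUT="in put.mp4"
OUT_FILE="out.mp4"
FFMPEG_SIZE=""    # empty, so no -s argument gets added

CMD=("ffmpeg" "-i" "${INPUT}")

# Only add an optional argument when its variable is non-empty.
if [[ -n "${FFMPEG_SIZE}" ]]; then
    CMD+=("-s" "${FFMPEG_SIZE}")
fi

CMD+=("${OUT_FILE}")

# Dry run: %q prints every element shell-quoted, so spaces stay visible.
printf '%q ' "${CMD[@]}"
echo

# To actually run it, the expanded array itself is the command:
# "${CMD[@]}"
```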

(Associative) Storing Arguments

Writing a script that accepts multiple key-value arguments or flags to make it as flexible as possible can be daunting. But with a while loop, a case, and an associative array, we have it up and running in no time!

Here’s a script parsing arguments and flags for connecting to a server. For simplicity, there’s no error handling of the parsed arguments:

shell

#!/bin/bash

usage() {
    echo "Usage: $0 -s/--server <server> -p/--port <port> [--insecure]"
    exit 1
}

# DECLARE ARGUMENT STORAGE AND DEFAULT VALUES

declare -A ARGS=(
    ["server"]="localhost"
    ["port"]="8123"
)

# PARSE SCRIPT ARGUMENTS TO FILL/OVERRIDE THE GAPS

while [[ "$#" -gt 0 ]]; do
    case $1 in
        -s|--server) ARGS["server"]="$2"; shift ;;
        -p|--port) ARGS["port"]="$2"; shift ;;
        --insecure) ARGS["insecure"]="1" ;;
        *) usage ;;
    esac

    shift
done

# ...
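To try the parsing without calling a real script, the loop can be moved into a function and fed simulated arguments. A sketch using the --insecure flag from the usage line; parse_args is a hypothetical name, the explicit "0" default for insecure is an assumption, and unknown arguments return an error instead of calling usage:

```shell
#!/usr/bin/env bash

# Defaults; the "insecure" entry is an added assumption.
declare -A ARGS=(
    ["server"]="localhost"
    ["port"]="8123"
    ["insecure"]="0"
)

# Hypothetical function wrapping the same while/case loop; it fills the global ARGS.
parse_args () {
    while [[ "$#" -gt 0 ]]; do
        case "$1" in
            -s|--server) ARGS["server"]="$2"; shift ;;
            -p|--port) ARGS["port"]="$2"; shift ;;
            --insecure) ARGS["insecure"]="1" ;;
            *) echo "Unknown argument: $1" >&2; return 1 ;;
        esac
        shift
    done
}

# Simulates calling the script with: --server example.org --insecure
parse_args --server example.org --insecure

echo "server=${ARGS[server]} port=${ARGS[port]} insecure=${ARGS[insecure]}"

# OUTPUT:
# server=example.org port=8123 insecure=1
```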

The advantages and disadvantages of using an associative array over dedicated variables for each option are debatable. But personally, I like the encapsulation of all arguments into the single variable ARGS instead of ARGS_SERVER, ARGS_PORT, etc.

Conclusion

Even though arrays aren’t a POSIX-standard feature, they are worth exploring in the shell of your choice. This article focused on the Bash implementation, but arrays are available in many other shells, often with slight variations.

If you’re not using Bash, you should still check out the specifics of your shell. Most of the concepts and syntax that work in Bash also apply to many other shells. Just look out for the differences!

You could also use the shebang #!/usr/bin/env bash, which runs your script with whichever Bash comes first in the PATH, making it “kind of” portable. That’s actually what I usually do, as my weapon of choice is Zsh, but I have Bash 4+ available if needed.

