Unable to mkfs EBS volume and AWS provisioning confusion

Hi all,

I’m learning how to deploy mindLamp and I need help with provisioning resources on AWS.

I’ve followed the instructions to launch an instance.

Now I’m in session manager to run these commands, starting with mkfs -t xfs /dev/xvda which results in mkfs.xfs: cannot open /dev/xvda: Permission denied.

So I run sudo mkfs -t xfs /dev/xvda, which results in mkfs.xfs: cannot open /dev/xvda: Device or resource busy.

Is this because the EBS volume is attached to EC2 instance? Do I need to format the EBS volume before attaching to EC2 instance?

Another thing I’m confused about is hostname.
In the instance commands we run sudo hostnamectl set-hostname <MY_DNS_NAME>.
What is MY_DNS_NAME?
Also, in the session manager commands we run hostnamectl set-hostname node-01.example.com. Why? Is that redundant to the earlier command?

Thank you for your help.

Okay, so I needed to target the partition at /dev/xvda1 instead of the disk at /dev/xvda.

Next, I notice that there is no value when I run blkid -s UUID -o value /dev/xvda1.

Is that expected?

Is this because the EBS volume is attached to EC2 instance? Do I need to format the EBS volume before attaching to EC2 instance?

No, the EBS volume should be formatted after it is attached by AWS, and once formatted it should then be mounted.
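
A rough sketch of that order of operations, assuming the data volume shows up as /dev/xvdf and gets mounted at /data. Both are placeholders, not values from this thread, so check lsblk for the real device name. The device_in_use helper is hypothetical: it only checks whether a device, or one of its partitions, is already mounted, which is the situation that produces "Device or resource busy" on the root disk.

```shell
# Hypothetical helper: succeeds if DEVICE, or a partition of it, is listed as a
# mount source in MOUNTS_FILE (defaults to /proc/mounts).
# The prefix match is deliberately conservative: /dev/xvda also matches /dev/xvda1.
device_in_use() {
  awk -v dev="$1" '
    index($1, dev) == 1 { found = 1 }
    END { exit found ? 0 : 1 }
  ' "${2:-/proc/mounts}"
}

# Assumed usage on the instance (placeholders: /dev/xvdf, /data):
#   lsblk                                  # find the attached, unmounted volume
#   if device_in_use /dev/xvdf; then
#     echo "/dev/xvdf is mounted, refusing to format"
#   else
#     sudo mkfs -t xfs /dev/xvdf           # format once, after attaching
#     sudo mkdir -p /data
#     sudo mount /dev/xvdf /data           # then mount
#   fi
```

The guard is there because formatting the wrong device destroys its contents; it is a sanity check, not a replacement for reading the lsblk output.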

Okay, so I needed to target the partition at /dev/xvda1 instead of the disk at /dev/xvda.

Formatting the partition will likely work, but I’m not sure that partitioning the disk first is supported: you may not be able to resize the whole disk at a later point.

Next, I notice that there is no value when I run blkid -s UUID -o value /dev/xvda1.

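This can be expected: blkid prints the UUID of the filesystem on a device, so it returns nothing for a partition that has not been formatted yet. Running blkid without sudo can also print nothing.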

What is MY_DNS_NAME?

This is what you’ve configured as your public DNS name for your instance of the LAMP Platform - example.com should be replaced by this domain too. (For example, lamp.digitalpsych.org, etc. which you may purchase and configure through AWS Route53 or other domain name services.)

Also, in the session manager commands we run hostnamectl set-hostname node-01.example.com. Why? Is that redundant to the earlier command?

The set-hostname command should only be run once with your DNS-configured domain name, not node-01.example.com.

Hi Aditya!

Thank you so much for taking the time to help me learn how to properly deploy mindLamp.

Here’s my current setup, all in text form so that it is searchable and hopefully helpful to others having issues, while shedding light on the config errors I’m making that are producing my current deployment issues.

I’m just learning how to deploy so I want to use Public IPv4 DNS assigned to EC2 instance.

Note: all values are for throwaway environment for learning purposes and won’t be used in production.

EC2 Instance Info:

Public IPv4 address: 34.216.140.177
Private IPv4 addresses: 172.31.27.154
Public IPv4 DNS: ec2-34-216-140-177.us-west-2.compute.amazonaws.com

Value of static hostname, as output by hostnamectl: ec2-34-216-140-177.us-west-2.compute.amazonaws.com

Security groups

Security group rule ID  Port range  Protocol  Source                Security groups
sgr-04287eeb60794cfec   7946        TCP       sg-0148b5470d87e4246  launch-wizard-3
sgr-013d641263afef19e   2375        TCP       sg-0148b5470d87e4246  launch-wizard-3
sgr-04e122617bc709ab9   7946        UDP       sg-0148b5470d87e4246  launch-wizard-3
sgr-03d8e309104422c83   2377        TCP       sg-0148b5470d87e4246  launch-wizard-3
sgr-0ca0b9a870f551d1d   80          TCP       ::/0                  launch-wizard-3
sgr-0d78f609a178aa2bb   443         TCP       ::/0                  launch-wizard-3
sgr-0ee0630eecafaaed9   22          TCP       0.0.0.0/0             launch-wizard-3
sgr-0c5cc59a06ad4a897   80          TCP       0.0.0.0/0             launch-wizard-3
sgr-03c04a9f1905482de   2376        TCP       sg-0148b5470d87e4246  launch-wizard-3
sgr-051f9d184a2c7d0b2   4789        UDP       sg-0148b5470d87e4246  launch-wizard-3

Route53 configuration:

Record Name                                           Type  Routing Policy  Differentiator  Value/Route traffic to
ec2-34-216-140-177.us-west-2.compute.amazonaws.com    A     Simple          -               34.216.140.177
ec2-34-216-140-177.us-west-2.compute.amazonaws.com    NS    Simple          -               ns-14.awsdns-01.com. ns-879.awsdns-45.net. ns-1107.awsdns-10.org. ns-1955.awsdns-52.co.uk.
ec2-34-216-140-177.us-west-2.compute.amazonaws.com    SOA   Simple          -               ns-14.awsdns-01.com. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400
*.ec2-34-216-140-177.us-west-2.compute.amazonaws.com  A     Simple          -               34.216.140.177

Pertinent configuration values for lamp.yml, otherwise all other configuration values are copied from lamp.yml file.

...
CDB: 'http://admin:6628680d88278f32@db.ec2-34-216-140-177.us-west-2.compute.amazonaws.com:5984/'
...
traefik.http.routers.lamp_server.rule: 'Host(`api.ec2-34-216-140-177.us-west-2.compute.amazonaws.com`)'
...
traefik.http.routers.lamp_database.rule: 'Host(`db.ec2-34-216-140-177.us-west-2.compute.amazonaws.com`)'
...
COUCHDB_USER: 'admin'
COUCHDB_PASSWORD: '6628680d88278f32'
...

When I run docker stack deploy --compose-file lamp.yml lamp to deploy components, I see 4 services:

ID            NAME                MODE        REPLICAS  IMAGE                                            PORTS
jv95ft0a9cqg  lamp_cache          replicated  1/1       redis:6.0.8-alpine
knl6mezjd5d9  lamp_database       replicated  0/1       apache/couchdb:3.1.1
2d853i4erema  lamp_message_queue  replicated  1/1       nats:2.1.9-alpine3.12
kp9ug9w4aooc  lamp_server         replicated  0/1       ghcr.io/bidmcdigitalpsychiatry/lamp-server:2022

When checking the logs of lamp_cache and lamp_message_queue I see no errors and a final log stating that the service is listening for connections.

Logs for lamp_database returns no results.

Logs for lamp_server:
I see multiple

lamp_server.1.kvg1dvu468hw@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |     Error in Connecting to nats pub server
lamp_server.1.kvg1dvu468hw@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |     redis connection error Error: getaddrinfo ENOTFOUND cache
lamp_server.1.kvg1dvu468hw@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |         at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:72:26) {
lamp_server.1.kvg1dvu468hw@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       errno: -3008,
lamp_server.1.kvg1dvu468hw@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       code: 'ENOTFOUND',
lamp_server.1.kvg1dvu468hw@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       syscall: 'getaddrinfo',
lamp_server.1.kvg1dvu468hw@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       hostname: 'cache'
lamp_server.1.kvg1dvu468hw@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |     }

followed by

lamp_server.1.kvg1dvu468hw@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |     Connected to redis
lamp_server.1.kvg1dvu468hw@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |     Connected to nats pub server

It looks like the CouchDB connection is failing:

lamp_server.1.srrfvk62fgnn@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    | COUCHDB adapter in use 
lamp_server.1.srrfvk62fgnn@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    | Initializing LAMP API server...
lamp_server.1.9d9lxy5eyiht@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |   Trying to connect redis
lamp_server.1.9d9lxy5eyiht@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |   Initializing database connection...
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |     Error: getaddrinfo ENOTFOUND db.ec2-34-216-140-177.us-west-2.compute.amazonaws.com
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |         at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:72:26) {
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       scope: 'socket',
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       errid: 'request',
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       errno: -3008,
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       code: 'ENOTFOUND',
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       syscall: 'getaddrinfo',
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       hostname: 'db.ec2-34-216-140-177.us-west-2.compute.amazonaws.com',
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       description: 'getaddrinfo ENOTFOUND db.ec2-34-216-140-177.us-west-2.compute.amazonaws.com',
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       stacktrace: [
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |         'Error: getaddrinfo ENOTFOUND db.ec2-34-216-140-177.us-west-2.compute.amazonaws.com',
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |         '    at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:72:26)'
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       ]
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |     }

I’m thinking that if the CouchDB connection were successful I would be seeing further logs from this file below line 143.

Please let me know if there’s config information that I’m missing which would help to shed light on my issue.

Thank you.

@hunterlester I think the issue is in your lamp.yml file:

CDB: 'http://admin:6628680d88278f32@db.ec2-34-216-140-177.us-west-2.compute.amazonaws.com:5984/'

The domain name used here should NOT be the external one used by your server, it should be the local one under which the docker service is named. In lamp.yml, the CouchDB service is named database, so your URL should be:

CDB: 'http://admin:6628680d88278f32@database:5984/'

Another note - you will need to set up SSL (either using Traefik as we suggest in the docs, or another load balancer using AWS certs, etc.) as LAMP does not allow unencrypted connections.

//thanks to @LukeS for his assistance on this reply

Thank you Aditya.

I changed back to http://admin:6628680d88278f32@database:5984, which results in the same set of errors.

The reason I originally changed it to the public IPv4 DNS subdomain was because of the ENOTFOUND error. Thought maybe I need to explicitly define address since no database defined in /etc/hosts, but maybe Traefik handles that, idk.

lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |     Error: getaddrinfo ENOTFOUND database
lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |         at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:72:26) {
lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       scope: 'socket',
lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       errid: 'request',
lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       errno: -3008,
lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       code: 'ENOTFOUND',
lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       syscall: 'getaddrinfo',
lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       hostname: 'database',
lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       description: 'getaddrinfo ENOTFOUND database',
lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       stacktrace: [
lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |         'Error: getaddrinfo ENOTFOUND database',
lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |         '    at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:72:26)'
lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       ]
lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |     }

I suppose I’m expecting to see a different set of error messages for lack of SSL connection such as ECONNRESET or ERR_SSL_PROTOCOL_ERROR.

Good learnings.
We’ll get there.
Thank you for the help.

Looking for my next clue, I’m looking at the syscall associated with the error.

A DNS lookup seems to be attempted to find the IP address associated with database.

So that DNS to IP mapping either doesn’t exist or isn’t being looked for in the right place.

This is the current list of networks:

NETWORK ID    NAME             DRIVER   SCOPE
d6a05ef5be61  bridge           bridge   local
e36401672375  docker_gwbridge  bridge   local
7ae9605a3d7e  host             host     local
l5xgqzhrseq4  ingress          overlay  swarm
nngab982syo4  lamp_default     overlay  swarm
f0fbfb0ea3ab  none             null     local
i4nnvq9eahxv  public           overlay  swarm

When inspecting a network called lamp_default I see all services exposed on this overlay:

[
    {
        "Name": "lamp_default",
        "Id": "nngab982syo4y2jkw0ztuhhkc",
        "Created": "2022-02-21T07:11:32.514985557Z",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.2.0/24",
                    "Gateway": "10.0.2.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "627a874e1b733b1615d670b5ec418522fdba272f1ce57d872300b85542fa5b68": {
                "Name": "lamp_server.1.0hlt3kdtipnkejowtm5889md0",
                "EndpointID": "44d1c8ee4deb3c561b3ae63b8cb95e147374054fa6eb4a2474653a24425150db",
                "MacAddress": "02:42:0a:00:02:0d",
                "IPv4Address": "10.0.2.13/24",
                "IPv6Address": ""
            },
            "9709463fec510c9b011ea6fd78901afa60cccdbfe8ed1da96406a0378c603f66": {
                "Name": "lamp_message_queue.1.iuqpcku1vrei3giu8g5fqy2un",
                "EndpointID": "1ec64821202d89450b518dff1ea234dbdcfc6a870f53cc34232ec7c4c10fdcfb",
                "MacAddress": "02:42:0a:00:02:03",
                "IPv4Address": "10.0.2.3/24",
                "IPv6Address": ""
            },
            "cd7d2005e4ade08c0eff309d553bf73400f36654947534857e66d878ab6c4e22": {
                "Name": "lamp_cache.1.mrj4tlzjrzcz6uumfred2d0nr",
                "EndpointID": "4e86093bbf52f5d02000ea591525db58d19c85f38e59147609d5981a99674f06",
                "MacAddress": "02:42:0a:00:02:08",
                "IPv4Address": "10.0.2.8/24",
                "IPv6Address": ""
            },
            "lb-lamp_default": {
                "Name": "lamp_default-endpoint",
                "EndpointID": "f68f232403132f767ec868714efc3f78fd610152b0f0945db3e3ae362b36a08a",
                "MacAddress": "02:42:0a:00:02:04",
                "IPv4Address": "10.0.2.4/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4098"
        },
        "Labels": {
            "com.docker.stack.namespace": "lamp"
        },
        "Peers": [
            {
                "Name": "1d46652e82f6",
                "IP": "172.31.27.154"
            }
        ]
    }
]

One thing I notice in the Containers property is that I don’t see a database container.

Then when I list out containers I see server, redis, and message_queue server but no couch database server.

This feels like progress.

I just need to figure out why the database container is failing to start.
Something must be wrong with my configuration.

You may not have the networks: - public definition as part of the database service. Docker should handle all the DNS/hosts-mapping related work for you there.

I’ve checked that the config file has networks: - public in the database configuration.
I’ve verified that curl --fail --silent http://localhost:5984/_up fails.

database:
  image: apache/couchdb:3.1.1
  healthcheck:
    test: curl --fail --silent http://localhost:5984/_up || exit 1
  environment:
    COUCHDB_USER: 'admin'
    COUCHDB_PASSWORD: '6628680d88278f32'
  volumes:
    - /data/couchdb:/opt/couchdb/data
  networks:
    - public
  deploy:
    mode: replicated
    update_config:
      order: stop-first
      failure_action: rollback
    labels:
      traefik.enable: 'true'
      traefik.http.routers.lamp_database.entryPoints: 'websecure'
      traefik.http.routers.lamp_database.rule: 'Host(`db.ec2-34-219-47-208.us-west-2.compute.amazonaws.com`)'
      traefik.http.routers.lamp_database.tls.certresolver: 'default'
      traefik.http.services.lamp_database.loadbalancer.server.port: 5984
    placement:
      constraints:
        - node.role == manager

Back to the lamp_default network, I should verify the expected output of inspecting the network before I assume that the lamp_database container is not running. Should I be seeing a container on that network called something like lamp_database.1.xxxxxxxxxxx?

@hunterlester Interestingly, this might be an issue with CouchDB (i.e. the Dockerfile/some related Swarm setup) itself?

Instead of continuing down this path, you might actually be better off switching to MongoDB. (See this post for an example of others using Mongo instead of Couch.)

Should I be seeing a container on that network called something like lamp_database.1.xxxxxxxxxxx?

That’s correct – if it’s not appearing, that means it’s not connected to the network or failed to initialize correctly.
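
One way to make that check less error-prone than reading the JSON by eye, as a minimal sketch. has_task_for is a hypothetical helper, and it assumes task containers are named SERVICE.SLOT.ID as in the inspect output earlier in this thread.

```shell
# Hypothetical helper: reads container names (one per line) on stdin and
# succeeds if any of them is a task of SERVICE, i.e. starts with "SERVICE.".
has_task_for() {
  grep -q "^$1\."
}

# Assumed usage on a swarm manager (the --format template prints one
# container name per line from the network's Containers map):
#   docker network inspect lamp_default \
#     --format '{{range .Containers}}{{.Name}}{{"\n"}}{{end}}' \
#     | has_task_for lamp_database && echo "attached" || echo "not attached"
```

Fed the names from the inspect output above, this reports lamp_cache as attached and lamp_database as not attached.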

Okay, so my main lesson here was that I need to pay more attention to logs, both at the container and service level.

I finally inspected the logs of the container running the lamp_database service and saw a Shutdown status for the container due to apache/couchdb:3.1.1 image not existing.
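
For anyone landing here later, a minimal sketch of how that check could be scripted. under_replicated is a hypothetical helper name; the docker commands in the comments are the standard service-level ones, but treat the exact workflow as an assumption rather than the documented LAMP procedure.

```shell
# Hypothetical helper: reads "NAME REPLICAS" lines (e.g. "lamp_database 0/1")
# on stdin and prints the names of services running fewer replicas than desired.
under_replicated() {
  awk '{ split($2, r, "/"); if (r[1] + 0 < r[2] + 0) print $1 }'
}

# Assumed usage on a swarm manager:
#   docker service ls --format '{{.Name}} {{.Replicas}}' | under_replicated |
#   while read -r svc; do
#     # --no-trunc shows the full task error instead of a cut-off one
#     docker service ps --no-trunc "$svc"
#     docker service logs --tail 50 "$svc"
#   done
```

With the service list from earlier in this thread, the helper would print lamp_database and lamp_server, which are the two services worth inspecting first.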

I looked in Docker Hub and found that the tag for 3.1.1 no longer exists.
I changed image tag version for the couchdb image in my lamp.yml to 3.1 and now upon running docker stack deploy --compose-file lamp.yml lamp, when inspecting lamp_default network, I see the lamp_database service exposed and available on the network.
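
A small sketch for catching a missing tag before deploying, assuming the compose file keeps each image on its own "image:" line. list_images is a hypothetical helper, and using docker manifest inspect for the registry check is my assumption; docker pull would answer the same question.

```shell
# Hypothetical helper: prints every image reference found on an "image:" line
# of the given compose file, with surrounding quotes removed.
list_images() {
  sed -n "s/^[[:space:]]*image:[[:space:]]*//p" "$1" | tr -d "'\""
}

# Assumed usage (needs registry access; a non-zero exit means the tag
# could not be found or fetched):
#   list_images lamp.yml | while read -r img; do
#     docker manifest inspect "$img" > /dev/null 2>&1 || echo "missing: $img"
#   done
```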

Now all services are running successfully, I just need to figure out SSL stuff.

Thank you!

Update 07-deploying.md by hunterlester · Pull Request #552 · BIDMCDigitalPsychiatry/LAMP-platform (github.com)

Glad the issue is resolved! Just to keep in mind, we do recommend using MongoDB or a managed MongoDB solution (AWS DocumentDB or Azure CosmosDB) in production though.