Unable to mkfs EBS volume and AWS provisioning confusion

hunterlester · February 6, 2022, 6:07am

Hi all,

I’m learning how to deploy mindLamp and I need help with provisioning resources on AWS.

I’ve followed the instructions to launch an instance.

Now I’m in session manager to run these commands, starting with mkfs -t xfs /dev/xvda which results in mkfs.xfs: cannot open /dev/xvda: Permission denied.

So I run sudo mkfs -t xfs /dev/xvda, which results in mkfs.xfs: cannot open /dev/xvda: Device or resource busy.

Is this because the EBS volume is attached to EC2 instance? Do I need to format the EBS volume before attaching to EC2 instance?

Another thing I’m confused about is hostname.
In the instance commands we run sudo hostnamectl set-hostname <MY_DNS_NAME>.
What is MY_DNS_NAME?
Also, in the session manager commands we run hostnamectl set-hostname node-01.example.com. Why? Is that redundant to the earlier command?

Thank you for your help.

hunterlester · February 7, 2022, 4:00am

Okay, so I needed to target the partition at /dev/xvda1 instead of the disk at /dev/xvda.

Next, I notice that there is no value when I run blkid -s UUID -o value /dev/xvda1.

Is that expected?

avaidyam · February 7, 2022, 2:46pm

Is this because the EBS volume is attached to EC2 instance? Do I need to format the EBS volume before attaching to EC2 instance?

No, the EBS volume should be formatted after it is attached by AWS, and once formatted it should then be mounted.
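
mkfs typically reports "Device or resource busy" when the target is in use, for example because it or one of its partitions is already mounted. The following is a minimal sketch of a pre-flight check, assuming a Linux host where /proc/self/mounts is available. The helper name device_in_use is hypothetical, and its prefix match is deliberately loose so that /dev/xvda also counts as busy when /dev/xvda1 is mounted.

```shell
#!/bin/sh
# Sketch: succeed if DEVICE, or a partition whose path starts with DEVICE,
# is listed as a mount source. MOUNTS_FILE defaults to /proc/self/mounts
# (Linux); it is a parameter only so the check can be tried on sample data.
# device_in_use is a hypothetical helper name, not part of any tool.
device_in_use() {
    device="$1"
    mounts_file="${2:-/proc/self/mounts}"
    awk -v dev="$device" 'index($1, dev) == 1 { found = 1 } END { exit !found }' "$mounts_file"
}

# Example use before formatting (not run here):
#   lsblk -f                      # show disks, partitions, filesystems, mount points
#   if device_in_use /dev/xvda; then
#       echo "/dev/xvda is in use; do not format it"
#   fi
```

If the check succeeds for the device you are about to format, it is probably the root volume rather than the newly attached data volume.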

Okay, so I needed to target the partition at /dev/xvda1 instead of the disk at /dev/xvda.

This will likely work, but I’m not sure that partitioning the disk first is supported; you may not be able to resize the whole disk at a later point.

Next, I notice that there is no value when I run blkid -s UUID -o value /dev/xvda1.

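That can happen: blkid prints a UUID only when it detects a filesystem on the device, and it usually has to run as root (sudo) to probe the device. A partition is a block device too, so once it has been formatted it should report a UUID.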

What is MY_DNS_NAME?

This is the public DNS name you’ve configured for your instance of the LAMP Platform; example.com should be replaced by this domain too. (For example, lamp.digitalpsych.org, which you can purchase and configure through AWS Route53 or another domain name service.)

Also, in the session manager commands we run hostnamectl set-hostname node-01.example.com. Why? Is that redundant to the earlier command?

The set-hostname command should only be run once with your DNS-configured domain name, not node-01.example.com.
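
As a rough sketch of the two steps described here (set the hostname once, then swap the example.com placeholder for your own domain), assuming lamp.yml uses example.com as its placeholder. The helper name render_domain and the output file name lamp.rendered.yml are hypothetical.

```shell
#!/bin/sh
# Sketch: print FILE with the placeholder domain example.com replaced by DOMAIN.
# render_domain is a hypothetical helper; the input file is left unchanged.
render_domain() {
    domain="$1"
    file="$2"
    sed "s/example\.com/${domain}/g" "$file"
}

# Example use (not run here):
#   MY_DNS_NAME=lamp.digitalpsych.org   # replace with your own domain
#   sudo hostnamectl set-hostname "$MY_DNS_NAME"
#   render_domain "$MY_DNS_NAME" lamp.yml > lamp.rendered.yml
```

Writing to a separate file keeps the original lamp.yml available for comparison.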

hunterlester · February 9, 2022, 7:11am

Hi Aditya!

Thank you so much for taking the time to help me learn how to properly deploy mindLamp.

Here’s my current setup, all in text form so that it is searchable and hopefully helpful to others having issues, while shedding light on the config errors I’m making that are producing my current deployment issues.

I’m just learning how to deploy so I want to use Public IPv4 DNS assigned to EC2 instance.

Note: all values are for throwaway environment for learning purposes and won’t be used in production.

EC2 Instance Info:

Public IPv4 address: 34.216.140.177
Private IPv4 addresses: 172.31.27.154
Public IPv4 DNS: ec2-34-216-140-177.us-west-2.compute.amazonaws.com

Value of static hostname, as output by hostnamectl: ec2-34-216-140-177.us-west-2.compute.amazonaws.com

Security groups

Security group rule ID	Port range	Protocol	Source	Security groups
sgr-04287eeb60794cfec	7946	TCP	sg-0148b5470d87e4246	launch-wizard-3
sgr-013d641263afef19e	2375	TCP	sg-0148b5470d87e4246	launch-wizard-3
sgr-04e122617bc709ab9	7946	UDP	sg-0148b5470d87e4246	launch-wizard-3
sgr-03d8e309104422c83	2377	TCP	sg-0148b5470d87e4246	launch-wizard-3
sgr-0ca0b9a870f551d1d	80	TCP	::/0	launch-wizard-3
sgr-0d78f609a178aa2bb	443	TCP	::/0	launch-wizard-3
sgr-0ee0630eecafaaed9	22	TCP	0.0.0.0/0	launch-wizard-3
sgr-0c5cc59a06ad4a897	80	TCP	0.0.0.0/0	launch-wizard-3
sgr-03c04a9f1905482de	2376	TCP	sg-0148b5470d87e4246	launch-wizard-3
sgr-051f9d184a2c7d0b2	4789	UDP	sg-0148b5470d87e4246	launch-wizard-3

Route53 configuration:

Record Name	Type	Routing Policy	Differentiator	Value/Route traffic to
ec2-34-216-140-177.us-west-2.compute.amazonaws.com	A	Simple	–	34.216.140.177
ec2-34-216-140-177.us-west-2.compute.amazonaws.com	NS	Simple	–	ns-14.awsdns-01.com. ns-879.awsdns-45.net. ns-1107.awsdns-10.org. ns-1955.awsdns-52.co.uk.
ec2-34-216-140-177.us-west-2.compute.amazonaws.com	SOA	Simple	–	ns-14.awsdns-01.com. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400
*.ec2-34-216-140-177.us-west-2.compute.amazonaws.com	A	Simple	–	34.216.140.177

Pertinent configuration values for lamp.yml, otherwise all other configuration values are copied from lamp.yml file.

...
CDB: 'http://admin:6628680d88278f32@db.ec2-34-216-140-177.us-west-2.compute.amazonaws.com:5984/'
...
traefik.http.routers.lamp_server.rule: 'Host(`api.ec2-34-216-140-177.us-west-2.compute.amazonaws.com`)'
...
traefik.http.routers.lamp_database.rule: 'Host(`db.ec2-34-216-140-177.us-west-2.compute.amazonaws.com`)'
...
COUCHDB_USER: 'admin'
COUCHDB_PASSWORD: '6628680d88278f32'
...

When I run docker stack deploy --compose-file lamp.yml lamp to deploy components, I see 4 services:

ID	NAME	MODE	REPLICAS	IMAGE
jv95ft0a9cqg	lamp_cache	replicated	1/1	redis:6.0.8-alpine
knl6mezjd5d9	lamp_database	replicated	0/1	apache/couchdb:3.1.1
2d853i4erema	lamp_message_queue	replicated	1/1	nats:2.1.9-alpine3.12
kp9ug9w4aooc	lamp_server	replicated	0/1	ghcr.io/bidmcdigitalpsychiatry/lamp-server:2022

When checking the logs of lamp_cache and lamp_message_queue I see no errors and a final log stating that the service is listening for connections.

Logs for lamp_database returns no results.
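
When a service shows 0/1 replicas and its logs are empty, the task state is usually the next place to look. Here is a small sketch, assuming the "NAME REPLICAS" layout produced by the docker service ls format string shown in the comments. The helper name under_replicated is hypothetical.

```shell
#!/bin/sh
# Sketch: read lines of "NAME REPLICAS" (for example "lamp_database 0/1") on
# stdin and print the names of services running fewer replicas than desired.
# under_replicated is a hypothetical helper name.
under_replicated() {
    awk '{ split($2, r, "/"); if (r[1] + 0 < r[2] + 0) print $1 }'
}

# Sample data copied from the service list above.
printf '%s\n' 'lamp_cache 1/1' 'lamp_database 0/1' \
    'lamp_message_queue 1/1' 'lamp_server 0/1' | under_replicated

# On a swarm manager (not run here):
#   docker service ls --format '{{.Name}} {{.Replicas}}' | under_replicated
#   docker service ps --no-trunc lamp_database   # each task's state and error
```

The docker service ps output includes an error column that can explain why a task never started, even when docker service logs prints nothing.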

Logs for lamp_server:
I see multiple

lamp_server.1.kvg1dvu468hw@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |     Error in Connecting to nats pub server
lamp_server.1.kvg1dvu468hw@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |     redis connection error Error: getaddrinfo ENOTFOUND cache
lamp_server.1.kvg1dvu468hw@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |         at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:72:26) {
lamp_server.1.kvg1dvu468hw@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       errno: -3008,
lamp_server.1.kvg1dvu468hw@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       code: 'ENOTFOUND',
lamp_server.1.kvg1dvu468hw@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       syscall: 'getaddrinfo',
lamp_server.1.kvg1dvu468hw@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       hostname: 'cache'
lamp_server.1.kvg1dvu468hw@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |     }

followed by

lamp_server.1.kvg1dvu468hw@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |     Connected to redis
lamp_server.1.kvg1dvu468hw@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |     Connected to nats pub server

It looks like the CouchDB connection is failing:

lamp_server.1.srrfvk62fgnn@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    | COUCHDB adapter in use 
lamp_server.1.srrfvk62fgnn@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    | Initializing LAMP API server...
lamp_server.1.9d9lxy5eyiht@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |   Trying to connect redis
lamp_server.1.9d9lxy5eyiht@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |   Initializing database connection...
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |     Error: getaddrinfo ENOTFOUND db.ec2-34-216-140-177.us-west-2.compute.amazonaws.com
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |         at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:72:26) {
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       scope: 'socket',
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       errid: 'request',
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       errno: -3008,
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       code: 'ENOTFOUND',
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       syscall: 'getaddrinfo',
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       hostname: 'db.ec2-34-216-140-177.us-west-2.compute.amazonaws.com',
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       description: 'getaddrinfo ENOTFOUND db.ec2-34-216-140-177.us-west-2.compute.amazonaws.com',
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       stacktrace: [
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |         'Error: getaddrinfo ENOTFOUND db.ec2-34-216-140-177.us-west-2.compute.amazonaws.com',
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |         '    at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:72:26)'
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       ]
lamp_server.1.ri4yk7zybk3z@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |     }

I’m thinking that if the CouchDB connection were successful I would be seeing further logs from this file below line 143.

Please let me know if there’s config information that I’m missing which would help to shed light on my issue.

Thank you.

avaidyam · February 10, 2022, 5:19pm

@hunterlester I think the issue is in your lamp.yml file:

CDB: 'http://admin:6628680d88278f32@db.ec2-34-216-140-177.us-west-2.compute.amazonaws.com:5984/'

The domain name used here should NOT be the external one used by your server, it should be the local one under which the docker service is named. In lamp.yml, the CouchDB service is named database, so your URL should be:

CDB: 'http://admin:6628680d88278f32@database:5984/'
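
Inside a swarm overlay network, Docker's built-in DNS resolves service names such as database, so the host part of CDB should be that short name rather than a public domain. A minimal sketch for checking which host a CDB value points at follows. The helper name cdb_host is hypothetical, and the parsing assumes a simple scheme://user:password@host:port/ URL.

```shell
#!/bin/sh
# Sketch: print the host portion of a connection URL such as the CDB value.
# cdb_host is a hypothetical helper; it assumes scheme://[user:pass@]host[:port]/...
cdb_host() {
    printf '%s\n' "$1" | sed -E 's#^[a-z]+://([^@/]*@)?([^:/]+).*#\2#'
}

# PASSWORD is a placeholder for the CouchDB password in lamp.yml.
cdb_host 'http://admin:PASSWORD@database:5984/'
```

A result containing dots (a fully qualified domain) suggests the value is still pointing at the external name.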

Another note - you will need to set up SSL (either using Traefik as we suggest in the docs, or another load balancer using AWS certs, etc.) as LAMP does not allow unencrypted connections.

//thanks to @LukeS for his assistance on this reply

hunterlester · February 11, 2022, 3:39am

Thank you Aditya.

I changed back to http://admin:6628680d88278f32@database:5984, which results in the same set of errors.

The reason I originally changed it to the public IPv4 DNS subdomain was the ENOTFOUND error. I thought I might need to define the address explicitly, since there is no database entry in /etc/hosts, but maybe Traefik handles that, idk.

lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |     Error: getaddrinfo ENOTFOUND database
lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |         at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:72:26) {
lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       scope: 'socket',
lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       errid: 'request',
lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       errno: -3008,
lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       code: 'ENOTFOUND',
lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       syscall: 'getaddrinfo',
lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       hostname: 'database',
lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       description: 'getaddrinfo ENOTFOUND database',
lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       stacktrace: [
lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |         'Error: getaddrinfo ENOTFOUND database',
lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |         '    at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:72:26)'
lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |       ]
lamp_server.1.pxllxks98ass@ec2-34-216-140-177.us-west-2.compute.amazonaws.com    |     }

I suppose I’m expecting to see a different set of error messages for lack of SSL connection such as ECONNRESET or ERR_SSL_PROTOCOL_ERROR.

Good learnings.
We’ll get there.
Thank you for the help.

hunterlester · February 21, 2022, 7:38am

Looking for my next clue, I’m looking at the syscall associated with the error.

A DNS lookup seems to be attempted to find the IP address associated with database.

So that DNS to IP mapping either doesn’t exist or isn’t being looked for in the right place.

This is the current list of networks:

NETWORK ID	NAME	DRIVER	SCOPE
d6a05ef5be61	bridge	bridge	local
e36401672375	docker_gwbridge	bridge	local
7ae9605a3d7e	host	host	local
l5xgqzhrseq4	ingress	overlay	swarm
nngab982syo4	lamp_default	overlay	swarm
f0fbfb0ea3ab	none	null	local
i4nnvq9eahxv	public	overlay	swarm

When inspecting a network called lamp_default I see all services exposed on this overlay:

[
    {
        "Name": "lamp_default",
        "Id": "nngab982syo4y2jkw0ztuhhkc",
        "Created": "2022-02-21T07:11:32.514985557Z",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.2.0/24",
                    "Gateway": "10.0.2.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "627a874e1b733b1615d670b5ec418522fdba272f1ce57d872300b85542fa5b68": {
                "Name": "lamp_server.1.0hlt3kdtipnkejowtm5889md0",
                "EndpointID": "44d1c8ee4deb3c561b3ae63b8cb95e147374054fa6eb4a2474653a24425150db",
                "MacAddress": "02:42:0a:00:02:0d",
                "IPv4Address": "10.0.2.13/24",
                "IPv6Address": ""
            },
            "9709463fec510c9b011ea6fd78901afa60cccdbfe8ed1da96406a0378c603f66": {
                "Name": "lamp_message_queue.1.iuqpcku1vrei3giu8g5fqy2un",
                "EndpointID": "1ec64821202d89450b518dff1ea234dbdcfc6a870f53cc34232ec7c4c10fdcfb",
                "MacAddress": "02:42:0a:00:02:03",
                "IPv4Address": "10.0.2.3/24",
                "IPv6Address": ""
            },
            "cd7d2005e4ade08c0eff309d553bf73400f36654947534857e66d878ab6c4e22": {
                "Name": "lamp_cache.1.mrj4tlzjrzcz6uumfred2d0nr",
                "EndpointID": "4e86093bbf52f5d02000ea591525db58d19c85f38e59147609d5981a99674f06",
                "MacAddress": "02:42:0a:00:02:08",
                "IPv4Address": "10.0.2.8/24",
                "IPv6Address": ""
            },
            "lb-lamp_default": {
                "Name": "lamp_default-endpoint",
                "EndpointID": "f68f232403132f767ec868714efc3f78fd610152b0f0945db3e3ae362b36a08a",
                "MacAddress": "02:42:0a:00:02:04",
                "IPv4Address": "10.0.2.4/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4098"
        },
        "Labels": {
            "com.docker.stack.namespace": "lamp"
        },
        "Peers": [
            {
                "Name": "1d46652e82f6",
                "IP": "172.31.27.154"
            }
        ]
    }
]

One thing I notice is that the Containers property does not list a database container.

Then when I list out containers I see server, redis, and message_queue server but no couch database server.

This feels like progress.

I just need to figure out why the database container is failing to start.
Something must be wrong with my configuration.

avaidyam · February 21, 2022, 1:28pm

You may not have the networks: - public definition as part of the database service. Docker should handle all the DNS/hosts-mapping related work for you there.

hunterlester · March 4, 2022, 4:15am

I’ve checked that the config file has networks: - public in the database configuration.
I’ve verified that curl --fail --silent http://localhost:5984/_up fails.

database:
  image: apache/couchdb:3.1.1
  healthcheck:
    test: curl --fail --silent http://localhost:5984/_up || exit 1
  environment:
    COUCHDB_USER: 'admin'
    COUCHDB_PASSWORD: '6628680d88278f32'
  volumes:
    - /data/couchdb:/opt/couchdb/data
  networks:
    - public
  deploy:
    mode: replicated
    update_config:
      order: stop-first
      failure_action: rollback
    labels:
      traefik.enable: 'true'
      traefik.http.routers.lamp_database.entryPoints: 'websecure'
      traefik.http.routers.lamp_database.rule: 'Host(`db.ec2-34-219-47-208.us-west-2.compute.amazonaws.com`)'
      traefik.http.routers.lamp_database.tls.certresolver: 'default'
      traefik.http.services.lamp_database.loadbalancer.server.port: 5984
    placement:
      constraints:
        - node.role == manager

Back to the lamp_default network: I should verify the expected output of inspecting the network before I assume that the lamp_database container is not running. Should I be seeing a container on that network called something like lamp_database.1.xxxxxxxxxxx?

avaidyam · March 4, 2022, 4:26pm

@hunterlester Interestingly, this might be an issue with CouchDB (i.e. the Dockerfile/some related Swarm setup) itself?

Instead of continuing down this path, you might actually be better off switching to MongoDB. (See this post for an example of others using Mongo instead of Couch.)

Should I be seeing a container on that network called something like lamp_database.1.xxxxxxxxxxx?

That’s correct – if it’s not appearing, that means it’s not connected to the network or failed to initialize correctly.
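
One way to check this without reading the full JSON is to list only the container names on the network and look for a task of the service. The sketch below assumes task containers are named service.slot.taskid, as in the inspect output earlier in this thread. The helper name has_task_for is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed if any container name on stdin belongs to the given service.
# Task containers are assumed to be named like lamp_database.1.<task id>.
# has_task_for is a hypothetical helper name.
has_task_for() {
    grep -q "^$1\.[0-9]"
}

# Sample names copied from the lamp_default inspect output in this thread.
printf '%s\n' 'lamp_server.1.0hlt3kdtipnkejowtm5889md0' \
    'lamp_cache.1.mrj4tlzjrzcz6uumfred2d0nr' |
    has_task_for lamp_database || echo "no lamp_database task on this network"

# On the swarm node (not run here):
#   docker network inspect lamp_default \
#       --format '{{range .Containers}}{{println .Name}}{{end}}' | has_task_for lamp_database
```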

hunterlester · March 7, 2022, 6:04am

Okay, so my main lesson here was that I need to pay more attention to logs, both at the container and service level.

I finally inspected the logs of the container running the lamp_database service and saw a Shutdown status for the container due to apache/couchdb:3.1.1 image not existing.

I looked in Docker Hub and found that the tag for 3.1.1 no longer exists.
I changed the image tag for the couchdb image in my lamp.yml to 3.1. Now, after running docker stack deploy --compose-file lamp.yml lamp and inspecting the lamp_default network, I see the lamp_database service exposed and available on the network.
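
A missing tag like this can be caught before deploying by checking every image the compose file references. This is a rough sketch, assuming image values in lamp.yml are unquoted and each sits on its own "image:" line. The helper name compose_images is hypothetical, and docker manifest inspect may require a reasonably recent Docker CLI.

```shell
#!/bin/sh
# Sketch: print the image reference from every "image:" line of a compose file.
# compose_images is a hypothetical helper; it assumes unquoted image values.
compose_images() {
    awk '$1 == "image:" { print $2 }' "$1"
}

# Example use (not run here; needs Docker and registry access):
#   for image in $(compose_images lamp.yml); do
#       docker manifest inspect "$image" > /dev/null 2>&1 ||
#           echo "cannot resolve: $image"
#   done
```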

Now all services are running successfully; I just need to figure out the SSL setup.

Thank you!

Update 07-deploying.md by hunterlester · Pull Request #552 · BIDMCDigitalPsychiatry/LAMP-platform (github.com)

avaidyam · March 8, 2022, 1:43pm

Glad the issue is resolved! Keep in mind that we recommend using MongoDB or a managed MongoDB-compatible service (AWS DocumentDB or Azure CosmosDB) in production.
