LanguageTool Docker

Telos · 8. Sep 2022

one-eyed-king said:
Does it allow installing the ngrams?

Funny you should ask... I just did.

one-eyed-king · 9. Sep 2022

I decided not to fork the GitHub project because some of its design choices don't make sense to me (why build the LanguageTool distribution when a download is available?). Instead, I use the languagetool image as the base for my own image, which adds the fasttext binary and fixes a nasty PID 1 problem in the Dockerfile.

In case someone wants to try it, just store the content of the code block in a file called Dockerfile and run the command docker build -t languagetool:latest . (the dot at the end is not a mistake) to build your own image with fasttext:

Code:

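# Build stage: compile fasttext on Alpine so the binary runs on the Alpine-based LanguageTool image
FROM alpine:3.16.2 AS build
RUN apk add --no-cache git build-base
RUN git clone https://github.com/facebookresearch/fastText.git \
    && cd fastText \
    && make

# Final stage: reuse the existing LanguageTool image and add the fasttext binary
FROM erikvl87/languagetool:latest
COPY --chown=languagetool --from=build /fastText/fasttext .

# fix pid1 issue: make the start script executable and declare it in exec form
RUN chmod +x /LanguageTool/start.sh
ENTRYPOINT ["/LanguageTool/start.sh"]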

Set these environment variables to enable fasttext:
langtool_fasttextBinary with the value /LanguageTool/fasttext, pointing to the fasttext binary.
langtool_fasttextModel with the path to the fasttext model you downloaded and mapped as a volume into the container.

Note: the entrypoint script also needs a PID 1 fix. The way the ENTRYPOINT (which was a CMD before) is declared, and the way languagetool is called from inside the entrypoint script, prevent proper signal handling in the original image when the container is stopped or killed.
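
To illustrate the signal-handling point: if a script starts the server as a child process, the script's shell keeps its PID and the child gets a new one. Starting it with exec replaces the shell instead, so the command inherits the PID. The following is only a sketch, not the image's actual start.sh. Plain sh processes stand in for the entrypoint script and for Java.

```shell
#!/bin/sh
# Stand-in demo: the outer "sh -c" plays the entrypoint script,
# the inner "sh -c" plays the java process.

# Without exec the inner command runs as a child with its own PID.
# The trailing ":" keeps the outer shell from exec-ing its last command on its own.
without_exec=$(sh -c 'echo $$; sh -c "echo \$\$"; :' | sort -u | wc -l | tr -d ' ')

# With exec the inner command replaces the outer shell and keeps its PID,
# so inside a container it would run as PID 1 and receive signals directly.
with_exec=$(sh -c 'echo $$; exec sh -c "echo \$\$"' | sort -u | wc -l | tr -d ' ')

echo "distinct PIDs without exec: $without_exec"
echo "distinct PIDs with exec: $with_exec"
```

In an entrypoint script this means the final line takes the form exec java ... instead of java ..., so that the TERM signal sent when the container is stopped reaches the JVM rather than the wrapper shell.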

Both will be fixed when I find time to test the image and push it to dockerhub.

one-eyed-king · 9. Sep 2022

I forgot to mention that the fasttext model can be downloaded from here: Language identification · fastText

Dr_Frankenstein · 9. Sep 2022

OH SOZ When I posted this it was not showing all the recent replies above - I had not refreshed the page since yesterday!

OK, it looks like any environment variable prefixed with 'langtool_' has the rest of its name passed into the config file as a setting. I checked the config file inside the container to confirm this.
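
A rough sketch of what such a mapping could look like in a start script. This is an assumption for illustration, not the code from the image: the function name write_langtool_config and the output file name are hypothetical, and the example values are the ones used in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the image): copy every langtool_*
# variable into a properties file, dropping the "langtool_" prefix.
write_langtool_config() {
    env | sed -n 's/^langtool_//p' > "$1"
}

# Example values as used in this thread.
export langtool_languageModel=/ngrams
export langtool_fasttextModel=/fasttext/lid.176.bin

config_file="${TMPDIR:-/tmp}/lt-config-demo.properties"
write_langtool_config "$config_file"

# Shows lines such as "languageModel=/ngrams"; the order follows env and may vary.
cat "$config_file"
```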

So I have made some progress. I'm just trying to get the fasttext executable working at this point. I worked through some errors on the mount points, which confirms the fasttext files are being recognised.

This is my current compose file. I have amended the mount points so the ngrams are in their own folder, and the fasttext binary and lid.176.bin are in /fasttext.

I'm just trying to get past the error now. I have linked the startup log, where fasttext is not being run correctly. This stops the whole container from coming up. So close!
PrivateBin

YAML:

version: "3.8"
services:
  languagetool:
    image: erikvl87/languagetool
    container_name: languagetool
    environment:
      - langtool_languageModel=/ngrams
      - langtool_fasttextModel=/fasttext/lid.176.bin
      - langtool_fasttextBinary=/fasttext/fasttext
      - Java_Xms=1g
      - Java_Xmx=2g
    volumes:
      - /volume3/docker/languagetool/ngrams:/ngrams
      - /volume3/docker/languagetool/fasttext:/fasttext
    ports:
      - "8010:8010"
    network_mode: synobridge

one-eyed-king · 10. Sep 2022

@Dr_Frankenstein you might want to build your own image using the approach I shared above. It specifically compiles fasttext in Alpine, which is what the languagetool image is based on, and copies that version into the self-built languagetool image. This ensures that fasttext works. The resulting image is just 1 MB larger, as it reuses all layers of erikvl87/languagetool.

I fixed the pid1 problem, but didn't like that the container still terminates with an exit code of 143 (TERM) or 137 (KILL). Because the Java application itself does not implement a termination handler, exit code 143 is the correct one, and it is inevitable as long as no handler is implemented. Only if they implement a termination handler can the exit code become 0.
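
For anyone wondering where those numbers come from: a process killed by a signal is reported with status 128 plus the signal number (TERM is 15, KILL is 9). A small sketch that shows this with sleep as a stand-in for a process that has no termination handler:

```shell
#!/bin/sh
# sleep stands in for a process without a TERM handler.
sleep 30 &
pid=$!
kill -TERM "$pid"
term_status=0
wait "$pid" || term_status=$?

sleep 30 &
pid=$!
kill -KILL "$pid"
kill_status=0
wait "$pid" || kill_status=$?

echo "after SIGTERM: $term_status"   # 128 + 15
echo "after SIGKILL: $kill_status"   # 128 + 9
```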

The way the original image handles the entrypoint script makes it possible to get zombie processes, which is unacceptable from my point of view.

one-eyed-king · 10. Sep 2022

I created an image and pushed it to Dockerhub: meyay/languagetool. It's written completely from scratch and borrows some ideas from already existing languagetool images.

It already comes with the fasttext binary baked in and optionally allows downloading the ngrams and the fasttext model. It also uses a more recent Java version (17 instead of 11), which is not going to be EOL next year. And it chowns the model folders to align with the UID/GID of the restricted user that executes the process inside the container.

If something is unclear in the Dockerhub description let me know.

I prepared the image so that I can add user mapping later (most of you refer to it as PUID, PGID), but I didn't add it right away.

one-eyed-king · 10. Sep 2022

I now added user mapping as well.

Dr_Frankenstein · 17. Sep 2022

Hey – fantastic work – I will spin this up on my personal setup and do some testing!

Dr_Frankenstein · 21. Sep 2022

OK everything seems to be working well, I am going to migrate my guide to use your image!

one-eyed-king · 21. Sep 2022

When I find time, I will add GitHub Actions to build the image, test it, scan it for vulnerabilities and push it to Dockerhub whenever the base image or the LanguageTool version updates.

Long story short, adding automated builds to keep the image as safe as possible is on my todo list.

one-eyed-king · 28. Sep 2022

I just pushed an updated image with languagetool 5.9.

Telos · 6. Jun 2023

In the docker-compose model, there is no volume mount that would give persistence to words added to the user's custom dictionary, or white-listed sites (when "all sites" is not enabled).

Before I dive into the container files, has anyone located these, so that they may be mapped to a persistent directory? I'm tired of re-entering this stuff when I restart the container. TIA!

one-eyed-king · 6. Jun 2023

I never used a custom dictionary.

It can't be that impossible to add one.

Figure out the target location in the container, then add a volume to that location, then add your custom dictionary.

What does your current process for adding a custom dictionary look like?

Telos · 6. Jun 2023

one-eyed-king said:
What does your current process for adding a custom dictionary look like?

Maybe I've used the incorrect terminology, but when LT flags a word as a misspelling, for example "SynoForum", I can whitelist it so that it does not come up in later corrections. That way I can automatically disregard terms like Plex, RAID5, btrfs... which are not found in most dictionaries.

one-eyed-king · 6. Jun 2023

Ah, I never noticed the feature

The location inside the container shouldn't be so hard to find out.

I had no idea this feature was hidden under the dictionary item.

-- post merged: 6. Jun 2023 --

Are you sure the dictionary is actually stored on the languagetool server and not in the browser plugin itself?

I just added a word, and looked inside the container using find /languagetool -type f -mtime -1. No files have been added or changed inside the container.
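
A somewhat tighter variant of this check is to drop a marker file first and then list only files modified after it, instead of everything from the last 24 hours. A minimal sketch, using a temporary directory as a stand-in for /languagetool and made-up file names:

```shell
#!/bin/sh
# Local stand-in for the container's /languagetool directory.
workdir=$(mktemp -d)
echo "old" > "$workdir/existing.txt"

# 1. Drop a marker before performing the action under test.
touch "$workdir/.marker"
sleep 1   # make sure later writes get a strictly newer timestamp

# 2. Perform the action (here: simulate a file written by the server).
echo "SynoForum" > "$workdir/changed.txt"

# 3. List regular files modified after the marker.
changed=$(find "$workdir" -type f -newer "$workdir/.marker")
echo "$changed"
```

Against the real container, the same touch and find commands could be run through docker exec, with the word added in the browser plugin as step 2. An empty result then points to the word not being stored on the server.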

-- post merged: 6. Jun 2023 --

I just installed the LT plugin in a different browser and configured it to use my existing server.
The personal dictionary was empty, and words I stored using the other browser are identified as misspelled.

It appears the personal dictionary is not synched to the lt server.

Dr_Frankenstein · 6. Jun 2023

I will have a play with this as well, see if it syncs between devices..

Telos · 6. Jun 2023

For me neither the white-listed domains nor the personal dictionary entries seem to sync. So either they are stored separately by device, or are tied up with the browser extensions.

one-eyed-king · 6. Jun 2023

Looks like the sync feature is only available when a lt-account (as in subscribed online account) is used.

-- post merged: 6. Jun 2023 --

Telos said:
So either they are stored separately by device

I tested two browsers on the same device. It is tied to the browser extension.

Telos · 6. Jun 2023

That's discouraging. There seems to be no simple way to back up your settings.
