LanguageTool Docker


I decided not to fork the GitHub project because some of its design choices don't make sense to me (why build the LanguageTool distribution when a download is available?). Instead, I use the languagetool image as the base for my own image, which adds the fasttext binary and fixes a nasty PID 1 problem in the Dockerfile.

In case someone wants to try it: save the content of the code block in a file called Dockerfile and run the command docker build -t languagetool:latest . (the dot at the end is not a mistake) to build your own image with fasttext:

Code:
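# Stage 1: compile fasttext on Alpine so the binary matches the base image's libc (musl)
FROM alpine:3.16.2 AS build
RUN apk add --no-cache git build-base
RUN git clone https://github.com/facebookresearch/fastText.git \
    && cd fastText \
    && make

# Stage 2: copy the compiled binary into the existing languagetool image
FROM erikvl87/languagetool:latest
COPY --chown=languagetool --from=build /fastText/fasttext .

# fix pid1 issue: the exec-form ENTRYPOINT starts start.sh directly, without a wrapping shell
RUN chmod +x /LanguageTool/start.sh
ENTRYPOINT ["/LanguageTool/start.sh"]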

Set these environment variables to enable fasttext:
langtool_fasttextBinary with the value /LanguageTool/fasttext, pointing to the fasttext binary.
langtool_fasttextModel with the path to the fasttext model you downloaded and mapped as a volume into the container.

Note: the entrypoint script also needs a PID 1 fix. The way the ENTRYPOINT (which was a CMD before) is declared, and the way languagetool is called from inside the entrypoint script, prevent proper signal handling in the original image (when the container is stopped or killed).

Both will be fixed when I find time to test the image and push it to dockerhub.
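
To illustrate the PID 1 point, here is a minimal shell sketch. It is not the actual start.sh, and the function names are made up for this example. It shows why the last command of an entrypoint script should be started with exec: with exec, the started process takes over the shell's PID, so in a container it would be the process that receives the stop signal. Without exec, the shell keeps that PID and the real process runs as a child. A second sh stands in for the Java process here.

```shell
#!/bin/sh
# Hypothetical demo helpers, not part of any languagetool image.

# Prints two PIDs: the outer shell's PID, then the PID of the process
# started via exec. exec replaces the shell, so both lines are identical.
pids_with_exec() {
    sh -c 'echo $$; exec sh -c "echo \$\$"'
}

# Same, but the inner process is started as a normal child.
# The trailing "true" keeps the shell from optimising the call into an exec.
# The two printed PIDs differ.
pids_without_exec() {
    sh -c 'echo $$; sh -c "echo \$\$"; true'
}

pids_with_exec
pids_without_exec
```

In an entrypoint script this would mean ending the script with something like exec java ... instead of a plain java ... call.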
 
Oh, sorry! When I posted this, the page was not showing all the recent replies above. I had not refreshed it since yesterday!

OK, it looks like any environment variable starting with 'langtool_' has whatever follows the prefix passed into the config file. I checked the config file inside the container to confirm this.
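
A rough sketch of that mapping, going only by what I saw in the config file. The function name is made up and the image's real start script may well do it differently:

```shell
#!/bin/sh
# Hypothetical helper: print one "key=value" config line for every
# environment variable whose name starts with "langtool_", with the
# prefix stripped. This only mimics the observed behaviour.
render_langtool_config() {
    env | grep '^langtool_' | sed 's/^langtool_//' | sort
}

# Example: these two variables would end up as
# fasttextModel=... and languageModel=... lines.
export langtool_languageModel=/ngrams
export langtool_fasttextModel=/fasttext/lid.176.bin
render_langtool_config
```

So the part after langtool_ has to match the config key exactly, including upper and lower case.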

So I have made some progress. I'm just trying to get the fasttext executable working at this point. I worked through some errors on the mount points, which confirms the fasttext files are being recognised.

This is my current compose file. I have amended the mount points so the ngrams are in their own folder and the fasttext binary and lid.176.bin are in /fasttext.

I'm just trying to get past the error. I have linked the startup log where fasttext is not being run correctly. This stops the whole container from coming up. So close!
PrivateBin

YAML:
version: "3.8"
services:
  languagetool:
    image: erikvl87/languagetool
    container_name: languagetool
    environment:
      - langtool_languageModel=/ngrams
      - langtool_fasttextModel=/fasttext/lid.176.bin
      - langtool_fasttextBinary=/fasttext/fasttext
      - Java_Xms=1g
      - Java_Xmx=2g
    volumes:
      - /volume3/docker/languagetool/ngrams:/ngrams
      - /volume3/docker/languagetool/fasttext:/fasttext
    ports:
      - "8010:8010"
    network_mode: synobridge
 
@Dr_Frankenstein you might want to build your own image using the approach I shared above. It specifically compiles fasttext in Alpine, which is what the languagetool image is based on, and copies that build into the self-built languagetool image. This ensures that fasttext works. The resulting image is just 1 MB larger, as it reuses all layers of erikvl87/languagetool.

I fixed the PID 1 problem, but didn't like that the container terminates with an exit code of 143 (TERM) or 137 (KILL). Because the Java application itself does not implement a termination handler, exit code 143 is the correct one, and it is inevitable as long as no handler is implemented. Only if they implement a termination handler can the exit code become 0.
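
For anyone wondering where 143 and 137 come from: the shell reports 128 plus the signal number for a process that was killed by a signal (TERM is 15, KILL is 9). A small sketch to see it outside of Docker. The function name is made up for the example:

```shell
#!/bin/sh
# Hypothetical demo helper: start a child shell that sends the given
# signal to itself, then print the exit status the parent shell sees.
exit_status_for_signal() {
    sh -c "kill -$1 \$\$"
    echo $?
}

exit_status_for_signal TERM   # 128 + 15
exit_status_for_signal KILL   # 128 + 9
```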

The way the original image handles the entrypoint script, it is possible to get zombie processes, which is unacceptable from my point of view.
 
I created an image and pushed it to Docker Hub: meyay/languagetool. It's completely written from scratch and borrows some ideas from already existing languagetool images.

It already comes with the fasttext binary baked in and optionally allows downloading the ngrams and the fasttext model. It also uses a more recent Java version (17 instead of 11), which is not going to be EOL next year. And it chowns the model folders to align with the UID/GID of the restricted user that executes the process inside the container.

If something is unclear in the Docker Hub description, let me know.

I prepared the image so that I can add user mapping later (most of you refer to it as PUID, PGID), but I didn't add it right away.
 
In the docker-compose model, there is no volume mount that would give persistence to words added to the user's custom dictionary, or white-listed sites (when "all sites" is not enabled).

Before I dive into the container files, has anyone located these, so that they may be mapped to a persistent directory? I'm tired of re-entering this stuff when I restart the container. TIA!
 
I never used a custom dictionary.

Can't be that impossible to add :) Figure out the target location in the container, then add a volume for that location, then add your custom dictionary.

What does your current process for adding a custom dictionary look like?
 
What does your current process for adding a custom dictionary look like?
Maybe I've used the incorrect terminology, but when LT flags a word as a misspelling, for example "SynoForum", I can whitelist it so that it does not come up in later corrections. So terms like Plex, RAID5, btrfs, which are not natural to most dictionaries, I can auto-disregard.
 
Ah, I never noticed the feature :oops:

The location inside the container shouldn't be so hard to find out.

I had no idea this feature was hidden under the dictionary item.

-- post merged: --

Are you sure the dictionary is actually stored on the languagetool server and not in the browser plugin itself?

I just added a word, and looked inside the container using find /languagetool -type f -mtime -1. No files have been added or changed inside the container.
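
For reference, this is roughly how that check works, wrapped in a small function so it can be pointed at any directory. The function name is made up. -mtime -1 matches files modified within the last 24 hours:

```shell
#!/bin/sh
# Hypothetical helper: list regular files below a directory that were
# modified within the last 24 hours.
recently_changed_files() {
    find "$1" -type f -mtime -1
}

# Quick demo in a scratch directory: one fresh file, one backdated file.
demo_dir="$(mktemp -d)"
touch "$demo_dir/new.txt"
touch -t 200001010000 "$demo_dir/old.txt"
recently_changed_files "$demo_dir"   # lists only new.txt
rm -rf "$demo_dir"
```

Inside the container you would run it against the install path, e.g. recently_changed_files /languagetool.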
-- post merged: --

I just installed the LT plugin in a different browser and configured it to use my existing server.
The personal dictionary was empty, and words I stored using the other browser are flagged as misspelled.

It appears the personal dictionary is not synced to the LT server.
 
For me neither the white-listed domains nor the personal dictionary entries seem to sync. So either they are stored separately by device, or are tied up with the browser extensions.
 
