Today I released version 0.3 of TinySegmenter, a Japanese tokenizer in pure Python (released under the New BSD license), with a single minor fix for proper installation on systems not using UTF-8 (apparently those still exist! :P). Thanks to Mišo Belica for the patch. Apparently some of his Japanese users use it with Sumy, his software for extracting summaries from texts.
About TinySegmenter and Japanese tokenization
It’s not much of a release, but it is a good occasion to talk about TinySegmenter. This is a “tokenizer” for Japanese. What is a tokenizer? Basically, it breaks sentences into words. For those who don’t know Japanese: the language doesn’t use spaces or any other symbol to separate words. Theybasicallywritelikethis. Yet there are ways to break these sentences into words, usually based on statistical analysis (like most things in Natural Language Processing and Artificial Intelligence in general). For anyone who wants to know a bit more, this message from the KyTea developer (KyTea is another tokenizer, and a great one) explains the two main methods, with links to software using them (TinySegmenter among them) and, especially, keywords to help you search further.
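To make “statistical analysis” a bit more concrete, here is a minimal Python sketch of one classic family of methods: among all ways of cutting a string into known words, pick the one with the highest product of word probabilities (a Viterbi search over a lattice of candidate words). The word counts are made up for illustration, and this is not how TinySegmenter works: as far as I know, it uses no dictionary at all and instead scores each possible word boundary from the surrounding characters.

```python
import math

# Made-up unigram counts, for illustration only; a real segmenter learns
# such statistics from a large corpus.
WORD_COUNTS = {
    "the": 120, "they": 50, "y": 1, "basic": 8, "ally": 3, "basically": 5,
    "writ": 1, "write": 20, "like": 40, "t": 1, "his": 30, "this": 60,
}
TOTAL = sum(WORD_COUNTS.values())
MAX_LEN = max(len(word) for word in WORD_COUNTS)


def log_prob(word):
    """Unigram log-probability of a word; -inf for unknown strings."""
    count = WORD_COUNTS.get(word, 0)
    return math.log(count / TOTAL) if count else -math.inf


def segment(text):
    """Most probable split of `text` into known words (Viterbi over a lattice)."""
    # best[i] holds (score, start) for the best segmentation of text[:i].
    best = [(0.0, 0)] + [(-math.inf, 0)] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - MAX_LEN), end):
            score = best[start][0] + log_prob(text[start:end])
            if score > best[end][0]:
                best[end] = (score, start)
    if best[-1][0] == -math.inf:
        raise ValueError("text cannot be split with the known vocabulary")
    # Walk the back-pointers from the end of the text to recover the words.
    words, end = [], len(text)
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]
```

With these counts, segment("Theybasicallywritelikethis".lower()) returns ["they", "basically", "write", "like", "this"]: splits such as "the" + "y" or "t" + "his" lose because multiplying two probabilities costs more than one likely word. Adding log-probabilities instead of multiplying probabilities avoids floating-point underflow on long sentences.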
The reason you want to “tokenize” Japanese or Chinese is that it is often the first step of further natural language analysis (for instance automatic translation, grammar analysis, pronunciation and hence speech synthesis, etc.).
Now for the obligatory example: “my name is Jehan” in Japanese is 私の名前はJehanです。 TinySegmenter breaks it up like this:
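Character types alone already carry a lot of information in Japanese, since the script (kanji, hiragana, katakana, Latin letters) often changes at word boundaries. TinySegmenter uses character classes like these as some of the features behind its learned boundary scores (again, as far as I know). The Python sketch below is only a naive baseline, not its algorithm: start a new token whenever the character class changes. The class names and Unicode ranges are a simplification chosen for this sketch; Latin letters are detected through their Unicode names, so accented European names such as “Mišo” stay in one piece.

```python
import unicodedata


def char_type(ch):
    """Coarse character class (a simplification for this sketch)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:  # CJK Unified Ideographs (common kanji)
        return "kanji"
    if ch.isdigit():
        return "digit"
    # Covers ASCII as well as Latin-1 and extended Latin letters (é, š, ...).
    if ch.isalpha() and unicodedata.name(ch, "").startswith("LATIN"):
        return "latin"
    return "other"


def split_on_type_change(text):
    """Naive baseline: start a new token whenever the character class changes."""
    tokens = []
    previous = None
    for ch in text:
        current = char_type(ch)
        if tokens and current == previous:
            tokens[-1] += ch
        else:
            tokens.append(ch)
        previous = current
    return tokens
```

On the example sentence it happens to produce 私 | の | 名前 | は | Jehan | です | 。, but it fails as soon as a word mixes scripts: 食べます (“to eat”, polite form) comes out as 食 | べます, with the べ of the verb stem 食べ attached to the ending. That is exactly where a trained model is needed.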
I am not planning to hack on TinySegmenter much anymore. I never was planning to; when I took over maintainership, I just wanted to use it for a project (which never went through) and the original developers were not answering. So I just packaged it properly, made minor changes (for instance better support for European words using Latin-1 and extended Latin Unicode characters), added some tests, and that’s it. I don’t even use it anymore. Yet if more people are interested and want to use it, feel free to send me patches. I could also give commit rights, and even co-maintainership after a few patches. I just wanted to get these words out. 🙂
I also discovered today the existence of a TinySegmenter3 on PyPI. It has fewer downloads than TinySegmenter (the older one, which I maintain; yes, I know that’s a bit confusing, and I wonder why they kept the same name and just added a 3), but it is worth looking at since they apparently improved performance a good deal (I haven’t checked, but that’s what it says). Maybe I should look at their code and merge their commits at some point, after talking to them?
We’ll see…
Crossroad 0.7
Last month, I released Crossroad 0.7. Do you remember Crossroad? It is my tool to cross-compile for Windows from a Linux platform, which I wrote about a year ago. Well, there is not much to say: a small release with bug fixes, minor improvements, an update of the third-party pre-built Windows package repository (thanks openSUSE!), and so on.
Also, there used to be a bug in pip which broke any Crossroad installed through it (I had a quick look at the time, and I think it messed up the install prefix). Fortunately, this bug appears to be fixed, so installing Crossroad through pip is once again the recommended method:
pip3 install crossroad
The example from last year is still mostly valid, so have a look if you want a better idea of what Crossroad can do.
Future: Android, ARM, MIPS…
Although I originally started this project to build GIMP for Windows (when debugging on that platform), I have wanted to go further for some time now. Android cross-compilation, or even bare-metal builds, come to mind.
Ten days ago, I started working on support for more cross-compilers. It’s not available in 0.7, but it should be in 0.8! I have successfully cross-built GLib, babl and GEGL (and half a dozen of their dependencies) for Android quite easily, in barely a few dozen minutes (for Android ARM, x86, MIPS, etc.). Crossroad really makes cross-compilation just as easy as native compilation. 🙂
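For readers wondering what a tool like Crossroad has to take care of, here is a hypothetical Python sketch (not Crossroad’s actual code) of the core idea for an autotools project: the build recipe stays the same and only the target triplet changes. The short target names, the prefix layout and the assumption that compilers are named after the triplet (e.g. arm-linux-androideabi-gcc) are illustrative; the triplets themselves are the usual GNU ones for MinGW-w64 and the Android NDK. Setting PKG_CONFIG_LIBDIR makes pkg-config look only in the target prefix, so the build machine’s libraries are never picked up by mistake.

```python
import os
from pathlib import PurePosixPath

# Short target names are hypothetical; the values are standard GNU triplets.
TRIPLETS = {
    "w32": "i686-w64-mingw32",
    "w64": "x86_64-w64-mingw32",
    "android-arm": "arm-linux-androideabi",
    "android-arm64": "aarch64-linux-android",
    "android-x86": "i686-linux-android",
    "android-x86-64": "x86_64-linux-android",
    "android-mips": "mipsel-linux-android",
}


def configure_command(target, prefix):
    """Return (argv, env) to run an autotools ./configure for `target`.

    Assumes cross compilers named "<triplet>-gcc" are on PATH and that
    previously cross-built dependencies live under `prefix`.
    """
    host = TRIPLETS[target]
    prefix = PurePosixPath(prefix)
    env = dict(os.environ)
    env.update({
        "CC": f"{host}-gcc",
        "CXX": f"{host}-g++",
        # Search only the target's .pc files, never the build machine's.
        "PKG_CONFIG_LIBDIR": str(prefix / "lib" / "pkgconfig"),
        "CPPFLAGS": f"-I{prefix / 'include'}",
        "LDFLAGS": f"-L{prefix / 'lib'}",
    })
    argv = ["./configure", f"--host={host}", f"--prefix={prefix}"]
    return argv, env
```

Inside an unpacked source tree, subprocess.run(argv, env=env, check=True) would then run the configure step; switching from Windows to Android is just a different target name.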
I will write a blog post with examples of cross-compiling GLib and GEGL for Android once Crossroad 0.8 is out (not now, since I may change a few things before the release). But really… if you already know how to use Crossroad to build for Windows, then it’s exactly the same for Android (except there is no pre-built package installer; does anyone know if such a repository exists somewhere?). Just give the git version a go if you can’t wait.
Going to mobile? Wait… is that… GIMP for tablets?
As always, I don’t develop just for the sake of it: I code because I need this for a longer-term project. And I have grown interested in small devices, even though I resisted for a long time (I still barely use my phone for anything other than calling, and I don’t even call much). I don’t think small devices will simply replace full-grown desktops and laptops any time soon (contrary to what some would tell you), but they are definitely fun devices. So let’s have some fun building Android (or other small device) programs! 🙂
Now, I know that a lot of people have asked for GIMP on Android. Let me tell you I’m not sure it will happen just now. Not that it can’t. I don’t see why we could not build it for this platform (I will probably do a cross-build at some point, just for the sake of trying), but I believe it would be utter crap as is. GIMP was not designed for small devices at all (I sometimes even have GUI size issues on my laptop display!), and therefore we should either heavily modify its GUI with conditional code for small touch devices, or simply create a brand new GUI, which is probably a much better idea anyway, given such different usage paradigms. Maybe we could create a new piece of Free Software adapted to smaller devices? If other developers are interested in making one as a continuation of the GIMP project, this could be interesting.
That said, making the main GIMP more touch-aware would also be a very good thing (for screen-tablet users), so who knows how things will evolve…
My first GEGL-powered Android “App”
Still, I really wanted to have a go at this, so I developed my first application for applying GEGL filters to images. This was also my first Android application, period, so I discovered a lot more than just how to use native libraries on Android.
I know, there are thousands of these “image effects” applications. Sorry! 😛
Really, I just wanted something small and easy based on GEGL, and that is what popped into my mind. For now, it’s called by the stupid name “Robogoat”, and you are free to look at the code, released under GPLv3. The current version only applies a sepia effect (the “gegl:sepia” operation) to test that the cross-compiled libgegl works well inside Android (it does!). When it is ready, you should be able to select any effect from a wide range of GEGL operations. 🙂
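For the curious, here is what a sepia tone boils down to on a single pixel, as a minimal Python sketch using the commonly cited sepia matrix plus a strength slider. This is an assumption for illustration only: GEGL’s gegl:sepia operation may use different coefficients and works on its own pixel formats, so do not read this as its implementation.

```python
# Commonly cited sepia matrix: each row gives one output channel (R, G, B)
# as a weighted sum of the input R, G and B.
SEPIA_MATRIX = (
    (0.393, 0.769, 0.189),
    (0.349, 0.686, 0.168),
    (0.272, 0.534, 0.131),
)


def sepia_pixel(rgb, strength=1.0):
    """Tone one 8-bit (r, g, b) pixel; strength 0.0 = unchanged, 1.0 = full sepia."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    result = []
    for weights, original in zip(SEPIA_MATRIX, rgb):
        toned = sum(w * c for w, c in zip(weights, rgb))
        toned = min(toned, 255.0)  # bright pixels overshoot the 8-bit range
        # Linear blend between the original channel and the toned one.
        blended = (1.0 - strength) * original + strength * toned
        result.append(int(round(blended)))
    return tuple(result)


def sepia_image(rows, strength=1.0):
    """Apply sepia_pixel to an image given as a list of rows of (r, g, b) tuples."""
    return [[sepia_pixel(px, strength) for px in row] for row in rows]
```

For instance, sepia_pixel((100, 100, 100)) gives (135, 120, 94): every output channel mixes all three inputs, which warms up the greys. Bright pixels overshoot 255, hence the clamp (pure white becomes (255, 255, 239)).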
If anyone wants to have fun with it, build it, or even provide patches, you are more than welcome!
In conclusion, I would like to remind you that I am trying to make a living by developing Free Software, and for the time being, it doesn’t work that well. All my coding is supported through the ZeMarmot project, which funds us to make an animation film while contributing to Free Software, in particular GIMP, but others too. For instance, while working on this Android stuff over the past week, I improved Crossroad, contributed patches and a bug report to Meson and to Gradle (and I may have discovered a bug in json-glib, but I must check to be sure before filing a new bug report), and I also have a few commits pending for babl (for Android support)…
P.S.: By the way, thanks to Free Electrons (an embedded Linux development company which contributes quite a lot back to the kernel; I like that, so here is my small contribution by mentioning them, even though I was not required to!) for offering me training in Android system development a year ago. This is not the reason I first got interested in hand-held devices (rather the opposite: I went there because I already had the interest), nor has it been that much help for what I did above, but it sure showed me how easy it actually was and gave me a preview of the world of embedded Linux.