Looking for SQL hero: database structure for synonyms database

Scito

New member
Jan 2, 2007
150
3
0
The Netherlands
Hi guys,

I'm working on article spinning software for the Dutch market (because all the article spinners saying they can do international languages SUCK).

Getting the words isn't the problem, it's a matter of just scraping some sites. The real question is though: what would be the best setup of my database? What indexes should I use? How many fields?

Of anyone can give me some insights on this: thanks :)!
 


Have you looked into Database Design any? It's hard to say what you will need without making lists of information you want collected and then doing some Data Modeling.

Look into MySQL Workbench. It's great for modeling databases and then exporting it out to the queries used to make the tables.

I'd also highly recommend getting this book. The guy who wrote it was a past professor and he really makes it easy to understand SQL (I understood it before, but he really makes you think about it all differently). You can get some good info from his powerpoints on his website too.
 
I eat databases for lunch. Your question right now is way too broad. Give me an example step by step of what you want your software to do. What data are you going to add, what do you want to look up, are you going to want to alias words or phrases ? Where is this software going to run (PC/Web)?

More input.
 
Alright, good point, I'll try to be more specific.

The type of database I want to use is MySQL and it will be working together with a PHP script on a dedicated server (Linux & FreeBSD).

The information I want to store in the database is the following:
- original word
- synonym of the word
- type of word (e.g. verb)
- plural or singular

So, I'll be inputting a text in the PHP script. The PHP script will strip the text into words. Every word then has to be checked in the database if there's a corresponding synonym. If yes, replace it.

This is what I've come up with so far.
 
Yes, but will it also store spintax, dates of the changes and additions of words, etc.?

There's a ton of stuff you aren't telling us (and rightly so in a public forum) that needs to be discussed before a good data model can be done to make you a good, Normalized, database.

That's what I'm trying to get you to understand. We can't tell you if you need x numbers of tables and fields that use recursion or other good data storage practices without that information - and you can't tell us all that without risking the competition stealing your ideas.
 
You don't even need an SQL database (yes, SQL database is the theoretically better way, but I have an aversion to them, and hey! this works!). My spinner stores everything in a 4.6 MB text file with this format:
{afterward|after|later|subsequently|next|then|shortly....}

... but is still spins anything I put into it in less than a second. The way my program works is:

-Upon initiation, read all words into a list of Phrase objects. The Phrase objects look like this:
string originalWord;
List<string> replacements;
so you will have the original word and a list of replacements for it at hand.

-Sort the phrase list using String.Compare on originalWord.

-Go through the entire text we want to spin, split by space/newline

-For each word in the text, do a binary search on the Phrase list to see if one of the Phrase objects has the originalWord you are looking for.

-If it does, get the replacements, and replace the original word in the text with {originalWord|replacements[0]|replacements[1]|replacements[2]...}, and continue on to the next word.

Now, I didn't give you everything. You still have to figure out how to handle phrase synonyms such as:
{the most|probably the most|one of the most|essentially the most|by far the most|the best|just about the most|quite possibly the most|the foremost|some of the most|the more|one|the maximum|...}

... in a timely manner, but I leave that as an exercise to the reader ;)

-t
 
You just need one table. The fields are exactly what you listed, primary index includes all of the columns. I don't see why you need the part of speech and singular or plaurel though. I built a pretty good database just starting with a dictionary file then scraping synonyms for each, adding records as I went along.
 
You just need one table. The fields are exactly what you listed, primary index includes all of the columns. I don't see why you need the part of speech and singular or plaurel though. I built a pretty good database just starting with a dictionary file then scraping synonyms for each, adding records as I went along.

So. Much. Fail.

Primary index = 1 field or 1 set of field segments. If you are going to include every field in the db the index search will be slower than a table scan...

Hit me up - I have worked with Oracle and MySQL for a decade. I know this shit inside and out.
 
When you making a choice of what database engines to go with, you have to start from the queries that you want to run then concern the data type. The indexes and the structure are totally depend on them. For the big amount of simple SQL requests go with MySQL, for long complicated queries I would rather use Oracle or Postgres. Also what stops you from using several DBs? I assume you run your own hosting servers. That is the concern also, how to build them and what OS to run. For Ex., we have discovered that ubuntu servers rapes mac osx big time in a matter of connections.
I'm sorry if it made you more confused than you were before but I think it's better to consider everything that's involved into DB architecture, unless you just want to do some makeup for your current system.
 
So. Much. Fail.

Primary index = 1 field or 1 set of field segments. If you are going to include every field in the db the index search will be slower than a table scan...

Hit me up - I have worked with Oracle and MySQL for a decade. I know this shit inside and out.

Isn't every query the app does going to be a table scan on one field? If you don't set that index how do you avoid duplicate entries? Not trying to be an ass, curious if you can tell me something I don't know.