I am looking for an SQL statement to shuffle the last name column in a persons table. To be able to make a demo on the basis of existing data. A solution with the ability to shuffle more than one column in one statement would be welcome. Thanks a lot and best regards, Robert |
You can use a WINDOW clause to access the GivenName (or whatever value) of a neighbouring row. As you have not supplied a test script, here's a sample from the SQL Anywhere (V16) Demo database, there's a table "employees" with the mentioned columns Surname, GivenName and Sex. In that case, I have reduced the access to employess of one department and have rotated their surnames, so that each person gets the Surname of the previous person (ordered by EmployeeID) but separated for the sexes. select EmployeeID, Surname, isnull( first_value(Surname) over (partition by Sex order by EmployeeID rows between 1 preceding and 1 preceding), last_value (Surname) over (partition by Sex order by EmployeeID rows between 1 following and unbounded following)) as PreviousRotatedSurname, GivenName, Sex from Employees where DepartmentID = 400 order by Sex, EmployeeID Note the usage of the ISNULL() expression to supply the first person of each partition (i.e. the first female/male) with the surname of the last person. The result contains both the original surname and the rotated value. Result: (Ah, I want to upload an image, however, that fails again...) So here's the result as text output: EmployeeID Surname PreviousRotatedSurname GivenName Sex ----------- -------------------- ---------------------- -------------------- --- 184 Espinoza Hildebrand Melissa F 207 Francis Espinoza Jane F 591 Barletta Francis Irene F 992 Butterfield Barletta Joyce F 1062 Blaikie Butterfield Barbara F 1507 Wetherby Blaikie Ruth F 1643 Lambert Wetherby Elizabeth F 1684 Hildebrand Lambert Janet F 318 Crow Ahmed John M 409 Weaver Crow Bruce M 888 Charlton Weaver Doug M 1191 Bucceri Charlton Matthew M 1576 Evans Bucceri Scott M 1607 Morris Evans Mark M 1740 Nielsen Morris Robert M 1751 Ahmed Nielsen Alex M I don't know whether you might finally need to make sure that the original surname and the "rotated" one are different. Aside: A completely different approach would be to use a CROSS JOIN between persons of the same sex and join when their surnames are different. Of course that will usually generate many more rows than the original table has. Just saying. What I was looking for :-) Thanks very much, Volker
(28 Apr '17, 08:51)
robert
Replies hidden
If that solved your question, feel free to mark the answer as accepted.
(29 Apr '17, 08:21)
Volker Barth
Of course I will do that Volker, thought it was easy to convert this to an UPDATE statement, but doesn't seem so. As in the end I'd like to update the surname column, I tried: UPDATE Employees SET Surname = isnull( first_value(Surname) over (partition by Sex order by EmployeeID rows between 1 preceding and 1 preceding), last_value (Surname) over (partition by Sex order by EmployeeID rows between 1 following and unbounded following)) where DepartmentID = 400 But that doesn't work. Any idea how to convert the SELECT statement into an UPDATE of the Surname?
(02 May '17, 10:12)
robert
Replies hidden
1
I'd put the original SELECT into a derived query (say, named dt) and then join the original TABLE with that dt, such as UPDATE Employees e SET SurName = dt.PreviousRotatedSurname FROM Employees e INNER JOIN (...< the original SELECT, probably without the ORDER BY> ...) dt ON e.EmployeeID = dt.EmployeeID ORDER BY e.EmployeeID; Note, for such ad-hoc updates, I do always use a SELECT to check whether the join will work as expected...
(02 May '17, 10:27)
Volker Barth
|
Please show us the sample table schema, some initial data and the desired "shuffled" outcome.
Thanks, Volker, for asking.
The (simplified) content of the table is:
...and after shuffling (the last name), the outcome would be for example like this:
It would be nice to have a SQL statement where I can shuffle on a condition like sex code, i. e. shuffling only first names which have sex code w, for example.
Hope this clarifies it a bit.
What does "shuffle" mean? Exactly like a card deck, where the values are preserved but the order is randomized? What are the constraints?
FWIW, are you trying to randomize real data to generate test data or to "anonymize" data?
@Volker: There are not much constraints, as long as the name is not obviously recognisable it's ok. It's not that I need a strict randomisation of the data.
@Breck: It would not even be necessary to preserve the last names. Preserving the names and putting them in a different order would be ok. As far as I understand you, if the last names wouldn't preserved, they would have to be newly generated? That is not needed.
I don't want to stop the party here, but why don' you export the data as CSV, use Excel to "shuffle" the data and update the existing records in the new (or the same) database?
You need the result fast, or you want to invest the time into the SQL statement? :)