# main.py

from news_graph import NewsMining

content1 = """

We’re living through a fascinating era of rapid change for the blockbuster movie model. American producers, eager to get their $200 million movies into the lucrative Chinese market, are increasingly looking for Chinese production partners, shooting in Chinese locations, and adding China-friendly characters and plotlines to American movies, even including extra scenes just for the Chinese cuts of films. But simultaneously, China and other countries are moving toward the blockbuster model themselves, creating homegrown films that don’t need to involve American partners at all.

And just as American films attempt to find paydays in foreign markets, foreign blockbusters are coming to America. The Wandering Earth, described as China’s first big-budget science fiction thriller, quietly made it onto screens at AMC theaters in North America this weekend, and it shows a new side of Chinese filmmaking — one focused toward futuristic spectacles rather than China’s traditionally grand, massive historical epics. At the same time, The Wandering Earth feels like a throwback to a few familiar eras of American filmmaking. While the film’s cast, setting, and tone are all Chinese, longtime science fiction fans are going to see a lot on the screen that reminds them of other movies, for better or worse.

The film, based on a short story by Three-Body Problem author Cixin Liu, lays out a crisis of unprecedented proportions: the sun has become unstable, and within a hundred years, it will expand to consume Earth. Within 300, the entire solar system will be gone. Earth’s governments rally and unite to face the problem, and come up with a novel solution: they speckle the planet with 10,000 gigantic jets, and blast it out of its orbit and off on a hundred-generation journey to a new home 4.2 light-years away. The idea is to use Jupiter’s gravitational well to pick up speed for the trip, but a malfunction of the Earth Engine system leaves the planet caught in Jupiter’s gravity, and gradually being pulled toward destruction. A frantic group of workers have to scramble to reactivate the jets and correct the Earth’s course.

The action takes place in two arenas simultaneously. On the Earth’s frigid surface, self-proclaimed genius Liu Qi (Qu Chuxiao) and his younger adopted sister Han Duoduo (Zhao Jinmai) get roped into the rescue efforts after they run away from home. Han is just curious to see the planet’s surface — most of humanity now lives in crowded underground cities, and the surface is for workers only — but Liu Qi is nursing a deeper grudge against his astronaut father Liu Peiqiang (longtime martial-arts movie star Wu Jing) and grandfather (Ng Man-tat, whom Western audiences might recognize from Stephen Chow’s Shaolin Soccer). When Liu Qi was a child, his father moved to a newly-built international space station, designed to move ahead of Earth as a guide and pathfinder. Now an adult, Liu Qi feels his father abandoned him, and wants to strike out independently.

Meanwhile, on the space station, Liu Peiqiang is ironically a day away from completing his 17-year tour of duty and returning to Earth and his family when the crisis hits. The station’s artificial intelligence, MOSS, insists on putting the station’s personnel in hibernation to save energy, but Liu Peiqiang realizes the computer has a secret agenda, and he and a Russian cosmonaut set out to defy it.

The entire space plot may feel suspiciously familiar to American audiences, who have a strong emotional touchstone when it comes to a calm-voiced computer in space telling a desperate astronaut that it can’t obey his orders, even when human lives are on the line, because it has orders of its own. MOSS even looks something like the HAL 9000 from 2001: A Space Odyssey: it’s represented as a red light on a gimbaled panel, like a single unblinking, judgmental red eye. But a good deal of Liu Peiqiang’s space adventure also plays out like a sequence from Alfonso Cuarón’s 2013 Oscar-winner Gravity, with dizzying sequences of astronauts trying to navigate clouds of debris and find handholds on a treacherous moving station while tumbling through space.

Meanwhile, the Earthside half of the mission resembles nothing so much as the 2003 nonsense-thriller The Core, about a team trying to drill their way to the center of the Earth to set the planet’s core spinning again. Liu Qi and Han pick up a few distinctive allies along the way, including biracial Chinese-Australian gadabout Tim (viral video star Mike Sui), but mostly, the characters are drawn as blandly and broadly as in any American action movie, and a fair number of them get killed along the journey without ever having developed enough personality for audiences to feel the loss.

Pretty much every flaw The Wandering Earth can claim (flashy action scenes without much substance, a marked bent toward sticky sentimentality, an insistently pushy score that demands an emotional response from the audience at every given moment) is familiar from past blockbusters. Where the film really stands out, though, is in its eye for grandiose spectacle. Director Frant Gwo gives the film a surprising stateliness, especially in the scenes of the mobile Earth wandering the cosmos, wreathed in tiny blue jets that leave eerie space-contrails behind. His attention to detail is marvelous: in scenes where characters stand on Earth’s surface, contemplating Jupiter’s malicious beauty, the swirling colors of the Great Red Spot are clearly visible in reflections in their suit helmets.

No matter how familiar the plot beats feel, that level of attention not just to functional special effects, but to outright beauty, makes The Wandering Earth memorable. Not every CGI sequence is aesthetically impeccable — sequences like a vehicle chase through a frozen Shanghai sometimes look brittle and false. But everything having to do with Jupiter, Earth as seen from space, and the space station subplot is visually sumptuous. This is frequently a gorgeously rendered film, with an emphasis on intimidating space vistas that will look tremendous on IMAX screens.

And while the constant attempts to flee the destructive power of changing weather have their own echoes in past films, from The Day After Tomorrow to 2012, Gwo mostly keeps the action tight and propulsive. The Wandering Earth is frequently breathless, though the action occasionally gets a little muddled in editing. At times, particularly on the surface scenes where everyone is wearing identical pressure suits, it can be easy to lose track of which character is where. It’s often easy to feel that Gwo cares more about the collective rescue project than about any individual character — potentially a value that will work better for Chinese audiences than American viewers, who are looking for a single standout hero to root for.

But the film’s biggest strengths are in its quieter moments, where Gwo takes the time to contemplate Jupiter’s gravity well slowly deepening its pull on Earth’s atmosphere, or Liu Qi staring up, awestruck, at the gas giant dwarfing his home. In those chilly sequences, the film calls back to an older tradition of slower science fiction, in epic-scale classics like 1951’s When Worlds Collide or 1956’s Forbidden Planet. The interludes are brief, but they’re a welcome respite from chase sequences and destruction.

The Wandering Earth gets pretty goofy at times, with jokes about Tim’s heritage, or Liu Qi’s inexperienced driving and overwhelming arrogance, or with high-speed banter over an impossibly long technical manual that no one has time to digest in the middle of an emergency. At times, the humor is even a little dry, as when MOSS responds to Liu Peiqiang’s repeated rebellions with a passive-aggressive “Will all violators stop contact immediately with Earth?” But Gwo finds time for majesty as well, and makes a point of considering the problem on a global scale, rather than just focusing on the few desperate strivers who’ve tied the Earth’s potential destruction into their own personal issues.

Much like the Russian space blockbuster Salyut-7 was a fascinating look into the cultural differences between American films and their Russian equivalents, The Wandering Earth feels like a telling illustration of the similarities and differences between Chinese and American values. Gwo’s film is full of images and moments that will be familiar to American audiences, and it has an equally familiar preoccupation with the importance of family connections, and the nobility of sacrifice. But it also puts a strong focus on global collective action, on the need for international cooperation, and for the will of the group over the will of the individual.

None of these things will be inherently alien to American viewers, who may experience The Wandering Earth as a best-of mash-up of past science fiction films, just with less-familiar faces in the lead roles. But as China gets into the action-blockbuster business, it’ll continue to be fascinating to see how the country brings its own distinctive voices and talents into a global market. The Wandering Earth feels like the same kind of project American filmmakers are making: accessible, thrill-focused, and at least somewhat generic, in an attempt to go down easy with any audience. But there’s enough specific personality in it to point to a future of more nationally inflected blockbusters. Once every country is making would-be international crossovers, the strongest appeal may come from the most distinctive, personal visions with the most to say about the cultures they come from.
"""

content2 = """
Ever since 2017, when the new flat and fast course was introduced in the 11th edition of the Tokyo Marathon, fast times have been recorded. The Tokyo Marathon is now a truly world-class race.

Two years ago, former world record holder Wilson Kipsang (KEN) set the Japanese all-comers record of 2:03:58 on this course. In the same year, marathon debutant Yuta Shitara (JPN, Honda) ran aggressively and, for a while, stayed close to Kipsang. The race gave hope and courage to Japanese runners and fans, leading them to believe that "Japanese can compete well at the world-class level."
In the sequel to that drama, Shitara set a Japanese record of 2:06:11 last year, breaking the national record for the first time in 16 years. To add further excitement, the Japan Industrial Track & Field Association awarded Shitara a 100-million-yen prize as part of "Project Exceed", a program launched to encourage athletes to break national records. Additionally, nine Japanese runners broke 2:10, showing that many of them are ready to compete at the world-class level. Last year's race was like nothing I remember from previous years, and it marked the start of a new era for the Japanese men's marathon.

This year's race, the 13th edition of the Tokyo Marathon, is expected to be even more exciting than those of previous years.

On the men's side, Kenenisa Bekele (ETH), whose personal best of 2:03:03 is the third-fastest marathon in history on a standard course, heads the field. Kenenisa, who is endowed with superior speed, won Olympic gold medals on the track at both the 2004 and 2008 Games. He is known for his aggressive running style and is thus likely to be a force to be reckoned with in the Tokyo Marathon. A total of five 2:04 runners will start the race, including the great tactician Dickson Chumba, the only two-time Tokyo Marathon champion (2014, 2018) on the men's side, and El Hassan El Abassi (BRN), who battled Hiroto Inoue (JPN, MHPS) at the 2018 Asian Games. Two Kenyans with 2:05 marathon bests are also in the field. It will be very interesting to see how the race unfolds.

The most fascinating runner in the field, at least from the Japanese perspective, is Suguru Osako (JPN, Nike Oregon Project), who set the national record of 2:05:50 at the Chicago Marathon in October. How will he run his first Tokyo Marathon? In his debut at the 2017 Boston Marathon, Osako ran 2:10:28. In his second marathon, the 2017 Fukuoka Marathon, he improved his personal best to 2:07:19 before setting the national record in Chicago. He is improving steadily, and his potential seems limitless. By battling well against the runners from abroad, including Kenenisa Bekele, perhaps he can improve his own national record again.

At present, two sets of pacemakers are planned for the men's race. The first set will lead at 2:57-2:58/km until 30km, targeting a finishing time of around 2:04:30 to 2:05:10. Osako, along with the African contingent led by Kenenisa Bekele, is expected to follow the first set of pacemakers. I would also like to see other Japanese runners follow these pacers to experience a world-class race.
The second set of pacemakers will lead the runners at 3:00/km, targeting a finishing time of around 2:06:35. Ryo Kiname (JPN, MHPS), who finished seventh in 2:08:08 last year in Tokyo, and Shogo Nakamura (JPN, Fujitsu), who has a best of 2:08:16, both have the potential to run a 2:06 marathon. They will probably target at least a 2:07 marathon in Tokyo.

It is hard not to expect a great marathon from Kenta Murayama (JPN, Asahi Kasei). In the 10th edition of the Tokyo Marathon in 2016, Murayama was the only Japanese runner to stay with the lead pack as far as 22km. The impact he made in his debut marathon is unforgettable. Many believe that Murayama runs better in a fast-paced marathon. If he is pulled along by the fast pace, Murayama could move up several levels as a marathon runner.

In recent years, the Japanese men's marathon has been on the rise. Last year, Shitara and Inoue ran 2:06 marathons in Tokyo. Later in the year, Inoue won the Asian Games marathon, Osako set the national record in Chicago with the first 2:05 marathon by a Japanese runner, and in December Yuma Hattori (JPN, Toyota Motors) became the first Japanese runner in 14 years to win the Fukuoka Marathon. With young, up-and-coming runners on the rise, the Japanese are closing in on the best in the world. I am determined to do everything possible for the race so that at least five Japanese runners will run 2:06 marathons before the 2020 Tokyo Olympics.

The women's field is also loaded with fast runners this year, so expectations for a great competition are high. The rising star from abroad is Ruti Aga (ETH), who set her personal best of 2:18:34 at the Berlin Marathon last September. Perhaps one set of women's pacemakers will aim for a 2:17 finishing time. In addition, three other runners with personal bests of 2:19, including Florence Kiplagat, are in the field.

Among the Japanese women, Honami Maeda (JPN, Tenmaya), who was second at the 2018 Osaka Women's Marathon, and Keiko Nogami (JPN, 18 Bank), who won the silver medal at the 2018 Asian Games, will start the race. Both have already qualified for the Marathon Grand Championships (MGC, the Japanese Olympic trial marathon). Yuka Takashima (JPN, Shiseido), who ran the 10000m at the 2016 Olympic Games in Rio de Janeiro, is running her second marathon in an attempt to qualify for the MGC. Mao Ichiyama (JPN, Wacoal), making her marathon debut, is expected to run aggressively. The excitement never ends in this year's race.

With the Tokyo Olympics just around the corner, elite runners, both men and women, are gathering in Tokyo to experience the Olympic course. It is exciting to see world-class competition. If Osako and his Japanese rivals come close to the national record, it will add to the excitement. History may be in the making. Please enjoy the Tokyo Marathon 2019, where world-class runners race over a world-class course.
"""

content3 = """
RESEARCHERS WHO STUDY stylometry, the statistical analysis of linguistic style, have long known that writing is a unique, individualistic process. The vocabulary you select, your syntax, and your grammatical decisions leave behind a signature. Automated tools can now accurately identify the author of a forum post, for example, as long as they have adequate training data to work with. But newer research shows that stylometry can also apply to artificial language samples, like code. Software developers, it turns out, leave behind a fingerprint as well.

Rachel Greenstadt, an associate professor of computer science at Drexel University, and Aylin Caliskan, Greenstadt's former PhD student and now an assistant professor at George Washington University, have found that code, like other forms of stylistic expression, is not anonymous. At the DefCon hacking conference Friday, the pair will present a number of studies they've conducted using machine learning techniques to de-anonymize the authors of code samples. Their work, some of which was funded by and conducted in collaboration with the United States Army Research Laboratory, could be useful in a plagiarism dispute, for instance, but also has privacy implications, especially for the thousands of developers who contribute open source code to the world.

How To De-Anonymize Code
Here's a simple explanation of how the researchers used machine learning to uncover who authored a piece of code. First, the algorithm they designed identifies all the features found in a selection of code samples. That's a lot of different characteristics. Think of every aspect that exists in natural language: there are the words you choose, the way you put them together, sentence length, and so on. Greenstadt and Caliskan then narrowed the features to only include the ones that actually distinguish developers from each other, trimming the list from hundreds of thousands to around 50 or so.

The researchers don't rely on low-level features, like how code was formatted. Instead, they create "abstract syntax trees," which reflect code's underlying structure, rather than its arbitrary components. Their technique is akin to prioritizing someone's sentence structure, instead of whether they indent each line in a paragraph.

The method also requires examples of someone's work to teach an algorithm to know when it spots another one of their code samples. If a random GitHub account pops up and publishes a code fragment, Greenstadt and Caliskan wouldn't necessarily be able to identify the person behind it, because they only have one sample to work with. (They could possibly tell that it was a developer they hadn't seen before.) Greenstadt and Caliskan, however, don't need your life's work to attribute code to you. It only takes a few short samples.

For example, in a 2017 paper, Caliskan, Greenstadt, and two other researchers demonstrated that even small snippets of code on the repository site GitHub can be enough to differentiate one coder from another with a high degree of accuracy.

Most impressively, Caliskan and a team of other researchers showed in a separate paper that it’s possible to de-anonymize a programmer using only their compiled binary code. After a developer finishes writing a section of code, a program called a compiler turns it into a series of 1s and 0s that can be read by a machine, called binary. To humans, it mostly looks like nonsense.

Caliskan and the other researchers she worked with can decompile the binary back into the C++ programming language, while preserving elements of a developer’s unique style. Imagine you wrote a paper and used Google Translate to transform it into another language. While the text might seem completely different, elements of how you write are still embedded in traits like your syntax. The same holds true for code.

“Style is preserved,” says Caliskan. “There is a very strong stylistic fingerprint that remains when things are based on learning on an individual basis.”

To conduct the binary experiment, Caliskan and the other researchers used code samples from Google’s annual Code Jam competition. The machine learning algorithm correctly identified a group of 100 individual programmers 96 percent of the time, using eight code samples from each. Even when the sample size was widened to 600 programmers, the algorithm still made an accurate identification 83 percent of the time.

Plagiarism and Privacy Implications
Caliskan and Greenstadt say their work could be used to tell whether a programming student plagiarized, or whether a developer violated a noncompete clause in their employment contract. Security researchers could potentially use it to help determine who might have created a specific type of malware.

More worryingly, an authoritarian government could use the de-anonymization techniques to identify the individuals behind, say, a censorship circumvention tool. The research also has privacy implications for developers who contribute to open source projects, especially if they consistently use the same GitHub account.

“People should be aware that it’s generally very hard to 100 percent hide your identity in these kinds of situations,” says Greenstadt.

For example, Greenstadt and Caliskan have found that some off-the-shelf obfuscation methods, tools used by software engineers to make code more complicated and thus more secure, aren't successful in hiding a developer's unique style. The researchers say that in the future, however, programmers might be able to conceal their styles using more sophisticated methods.

“I do think as we proceed, one thing we’re going to discover is what kind of obfuscation works to hide this stuff,” says Greenstadt. “I’m not convinced that the end point of this is going to be everything you do forever is traceable. I hope not, anyway.”

In a separate paper, for instance, a team led by Lucy Simko at the University of Washington found that programmers could craft code with the intention of tricking an algorithm into believing it had been authored by someone else. The team found that a developer may be able to spoof their "coding signature," even if they're not specifically trained in creating forgeries.

Future Work
Greenstadt and Caliskan have also uncovered a number of interesting insights about the nature of programming. For example, they have found that experienced developers appear easier to identify than novice ones. The more skilled you are, the more unique your work apparently becomes. That might be in part because beginner programmers often copy and paste code solutions from websites like Stack Overflow.

Similarly, they found that code samples addressing more difficult problems are also easier to attribute. Using a sample set of 62 programmers, who each solved seven "easy" problems, the researchers were able to de-anonymize their work 90 percent of the time. When the researchers used seven "hard" problem samples instead, their accuracy bumped to 95 percent.

In the future, Greenstadt and Caliskan want to understand how other factors might affect a person’s coding style, like what happens when members of the same organization collaborate on a project. They also want to explore questions like whether people from different countries code in different ways. In one preliminary study, for example, they found they could differentiate between code samples written by Canadian and by Chinese developers with over 90 percent accuracy.

There’s also the question of whether the same attribution methods could be used across different programming languages in a standardized way. For now, the researchers stress that de-anonymizing code is still a mysterious process, though so far their methods have been shown to work.

“We’re still trying to understand what makes something really attributable and what doesn't,” says Greenstadt. “There’s enough here to say it should be a concern, but I hope it doesn’t cause anybody to not contribute publicly on things.”

Updated 8/14/18 2:53 PM PST: This article has been updated to reflect the contributions of the US Army Research Laboratory.
"""

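if __name__ == "__main__":
    # Mine the first sample article; content2 and content3 can be passed the same way.
    miner = NewsMining()
    miner.main(content1)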